
NIAS-Server: Neighbors Influence of Amino Acids and Secondary Structures - Server

NIAS is a server to help the analysis of the conformational preferences of amino acid residues in proteins.

Submission form fields:
  • Email Address
  • Short identifier for submission
  • Search by Genbank id: GenInfo Identifier (gi), Definition
  • Search AA by manual input: Target AA Sequence
  • PSIPRED v4.0 or Search SS by manual input: Target SS Sequence
  • PDBs to Ignore
  • B-factor Coil SS, B-factor Regular SS
  • DSSP v2.1, STRIDE v1.0
  • APL 1, APL 2, APL 3, Centroid: APL 5, APL 7, APL 9
  • Feedback

For inquiries about NIAS:

mdorn@inf.ufrgs.br or bborguesan@inf.ufrgs.br

Please cite NIAS as shown below:

Borguesan, B.; Inostroza-Ponta, M.; Dorn, M. NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins. Journal of Computational Biology. March 2017, 24(3): 255-265.

NIAS-Server: Database

The Angle Probability List (APL) represents the normalized frequency of observed pairs of amino acid residues and secondary structure in the Protein Data Bank [1]. It combines the conformational preferences of amino acid residues (aa, torsion angles) in proteins with their secondary structure information (ss). The APL database consists of a set of 11,130 protein structures (the PDB list is available for download) experimentally determined by X-ray diffraction with resolution ≤ 2.5 Å and stored in the PDB until December 2014. Only 3D protein structures with an R-factor below 0.2 are considered. Redundancy is removed at a 30% sequence identity threshold: among homologous protein chains, only one is kept. A set of 5,255,768 amino acid residues with occupancy = 1 is used for further analysis. For each amino acid residue, the dihedral angles phi and psi are computed, and its secondary structure is assigned using STRIDE [2] (Table 1) and DSSP [3] (Table 2).
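The structure-level selection criteria above can be sketched as a simple filter. This is only an illustration, not the server's code: the `Entry` record, its field names, and the four-letter IDs are hypothetical placeholders, and the redundancy and occupancy steps are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record; field names are illustrative only.
    pdb_id: str
    method: str
    resolution: float  # in Angstrom
    r_factor: float


def passes_filters(entry: Entry) -> bool:
    """Apply the criteria stated in the text: X-ray, resolution <= 2.5, R-factor < 0.2."""
    return (
        entry.method == "X-RAY DIFFRACTION"
        and entry.resolution <= 2.5
        and entry.r_factor < 0.2
    )


# Placeholder entries, not real PDB records.
entries = [
    Entry("AAAA", "X-RAY DIFFRACTION", 1.8, 0.17),
    Entry("BBBB", "X-RAY DIFFRACTION", 3.1, 0.19),  # resolution too low
    Entry("CCCC", "SOLUTION NMR", 0.0, 0.0),        # not X-ray
    Entry("DDDD", "X-RAY DIFFRACTION", 2.5, 0.20),  # R-factor not below 0.2
]
selected = [e.pdb_id for e in entries if passes_filters(e)]
```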
An Angle Probability List (APL) is built from a matrix H_{aa,ss} of [−180, 180] × [−180, 180] cells for each amino acid residue (aa) and secondary structure (ss). Each cell (i, j) holds the number of times that a given amino acid residue aa in secondary structure ss has a pair of torsion angles (phi, psi) falling in that cell. For each pair of amino acid residue and secondary structure, APL_{aa,ss} is computed by dividing each cell count by the total count, APL_{aa,ss}(i, j) = H_{aa,ss}(i, j) / \sum_{i',j'} H_{aa,ss}(i', j'), and represents the normalized frequency of each pair. A higher frequency associated with a pair phi and psi indicates that this combination is more common in nature. The Ramachandran plots in Fig. 1 show the conformational preferences of the amino acids Alanine, Cysteine and Glycine in Turn secondary structure.




Fig 1. Ramachandran plots for Alanine (left), Cysteine (center) and Glycine (right) amino acid residues in Turn secondary structure (T). The dark red color marks the most densely occupied regions of the Ramachandran plot.
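The APL construction described above (count angle pairs per cell, then normalize) can be sketched as follows. This is a minimal illustration under stated assumptions: cells are taken to be 1 degree wide and indexed by the floor of each angle, and `build_apl` and the sample angles are hypothetical, not part of NIAS.

```python
import math
from collections import Counter


def build_apl(angles):
    """Return {(i, j): normalized frequency} for one (aa, ss) pair.

    `angles` is an iterable of (phi, psi) in degrees.
    Assumption: 1-degree cells, indexed by floor(phi), floor(psi).
    """
    counts = Counter((math.floor(phi), math.floor(psi)) for phi, psi in angles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


# Made-up angle pairs for illustration only.
sample = [(-63.2, -42.7), (-63.9, -42.1), (-120.4, 130.0), (-63.5, -42.9)]
apl = build_apl(sample)
```

Three of the four sample pairs fall in the same cell, so that cell carries a frequency of 0.75 and the frequencies sum to 1.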

With APL it is possible to analyze the conformational preferences (dihedral angles) of each amino acid residue and its secondary structure with DSSP and STRIDE (Fig. 2). For a given secondary structure it is possible to see different conformational preferences (phi and psi) depending on the amino acid residue.
Table 1. Stride 1-letter code used in APL.
Secondary Structure 1-Letter-Code
Alpha helix H
3-10 helix G
PI-helix I
Extended conformation E
Isolated bridge B or b
Turn T
Coil (none of the above) C
Table 2. DSSP 1-letter code used in APL.
Secondary Structure 1-Letter-Code
Alpha helix H
3-10 helix G
PI-helix I
Extended strand, participates in β ladder E
Residue in isolated β-bridge B
Hydrogen bonded Turn T
Bend S
Coil (none of the above) C

[Fig. 2 panels: one Ramachandran plot per secondary structure for DSSP and STRIDE, labeled DSSP/STRIDE: H/H, G/G, I/I, E/E, B/B, T/T, C/C, S/b]

Fig 2. Individual Ramachandran plots for STRIDE and DSSP secondary structures. Torsion angles values were computed from a set of 11,130 protein structures obtained from the PDB.
The dark red color marks the most densely occupied regions of the Ramachandran plot.

[Fig. 3 panels: one Ramachandran plot per amino acid: ALA, ARG, ASN, ASP, CYS, GLU, GLN, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL]


Fig 3. Conformational preferences of the 20 amino acid residues. Torsion angles values were computed from a set of 11,130 protein structures obtained from the PDB.
The dark red color marks the most densely occupied regions of the Ramachandran plot.

The tables below summarize the conformational preferences per secondary structure of the 5,255,768 amino acid residues obtained from the 11,130 selected protein structures when STRIDE and DSSP are considered.


Please cite APL as shown below:

BORGUESAN, B.; BARBACHAN e SILVA, M.; GRISCI, B. I.; INOSTROZA-PONTA, M.; DORN, M. APL: an Angle Probability List to improve knowledge-based metaheuristics for the three-dimensional protein structure prediction. Computational Biology and Chemistry (Print), v. 59, p. 142-157, 2015.

References:

[1] Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N. and Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 28(1):235–242 (2000).
[2] Frishman, D. and Argos, P. Knowledge-Based Protein Secondary Structure Assignment. Proteins: Structure, Function, and Genetics 23:566-579 (1995).
[3] Kabsch, W. and Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637 (1983).

NIAS-Server: Structure

The Angle Probability List comes in four types: APL1, without neighbor influence; APL2, with Single Neighbor-Dependent influence (left or right neighbor); APL3, with Complete Neighbor-Dependent influence (left and right neighbors); and APL-Centroid, which considers the full secondary structure of a window of five, seven or nine residues but only the amino acid residue at its center. These APL structures are described below.

Fig1. Schematic representation of APL1.

APL1:


Fig1 represents the structure used to build the Angle Probability List with the NIAS-Server. The target sequences aa1,...,n and ss1,...,n are used to search the APL Database and build the APL with the relative frequency of occurrence for each pair (aa, ss). The NIAS-Server ignores the PDB IDs supplied by the user. It also ignores amino acid residues in “Coil”, “Turn” or “Bend” secondary structure whose average backbone B-factor is greater than the 'B-factor Coil SS' value supplied by the user. For the other secondary structures, the threshold is the 'B-factor Regular SS' value supplied in the NIAS-Server. The implemented APL1 is similar to the Angle Probability List developed by Borguesan et al., 2015.
The structure of the output data is demonstrated in the APL Docs.
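The exclusion rules described for APL1 can be sketched as a single predicate. This is an illustration under assumptions, not the server's implementation: the function name, its parameters, the use of DSSP/STRIDE one-letter codes C, T and S for Coil, Turn and Bend, and the placeholder PDB IDs are all hypothetical.

```python
COIL_LIKE = {"C", "T", "S"}  # Coil, Turn, Bend (one-letter codes, assumed)


def keep_residue(pdb_id, ss, mean_backbone_bfactor,
                 ignored_pdbs, bfactor_coil, bfactor_regular):
    """Return True if a database residue may contribute to the APL.

    A residue is dropped when its structure is in the user's ignore list,
    or when its average backbone B-factor exceeds the threshold that
    applies to its secondary structure class.
    """
    if pdb_id.upper() in ignored_pdbs:
        return False
    threshold = bfactor_coil if ss in COIL_LIKE else bfactor_regular
    return mean_backbone_bfactor <= threshold


ignored = {"2XYZ"}  # placeholder ID, not a real selection
flexible_coil = keep_residue("1ABC", "C", 45.0, ignored, 40.0, 30.0)
rigid_helix = keep_residue("1ABC", "H", 25.0, ignored, 40.0, 30.0)
ignored_entry = keep_residue("2xyz", "H", 10.0, ignored, 40.0, 30.0)
```

Here the coil residue is rejected (45.0 exceeds the coil threshold of 40.0), the helix residue is kept, and the residue from the ignored structure is rejected regardless of its B-factor.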


Fig2. Schematic representation of APL2.

APL2:


APL2 is the Single Neighbor-Dependent Angle Probability List. Instead of using only one pair (aai,ssi), APL2 uses the Neighbor-Dependent pair (aai,ssi) (aai+1,ssi+1). This approach produces two files. The first returns the relative frequency of occurrence of the pair at left (aai,ssi) under the influence of the pair at right (aai+1,ssi+1). The second returns the relative frequency of occurrence of the pair at right (aai+1,ssi+1) under the influence of the pair at left (aai,ssi). Fig2 represents the structure of the Single Neighbor-Dependent Angle Probability List.
APL2 uses the same strategy as APL1 to ignore PDB IDs and filter by B-factor.
The structure of the output data is demonstrated in the APL Docs.


Fig3. Schematic representation of APL3.

APL3:


APL3 is the Complete Neighbor-Dependent Angle Probability List. In APL3, both the pair at left (aai-1,ssi-1) and the pair at right (aai+1,ssi+1) influence the middle pair (aai,ssi). This approach has fewer occurrences than APL2, which in turn has fewer occurrences than APL1. Fig3 represents the structure of the Complete Neighbor-Dependent Angle Probability List. APL3 uses the same strategy as APL1 to ignore PDB IDs and filter by B-factor.
The structure of the output data is demonstrated in the APL Docs.
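The way a target is cut into single pairs, neighbor pairs and triples for APL1, APL2 and APL3 can be sketched with a sliding window. This is a hedged illustration: `window_keys` is a hypothetical helper, the one-residue step between windows is an assumption, the key layout simply mirrors output file names such as `KA_TT` and `QAK_EEE`, and the secondary structure string is made up.

```python
def window_keys(aa, ss, width):
    """Return 'AA_SS' keys for every window of `width` residues.

    Assumption: windows slide along the target one residue at a time.
    """
    if len(aa) != len(ss):
        raise ValueError("aa and ss must have the same length")
    return [
        f"{aa[i:i + width]}_{ss[i:i + width]}"
        for i in range(len(aa) - width + 1)
    ]


aa, ss = "FNAAANF", "CHHHHHC"  # ss string is illustrative only
apl1_keys = window_keys(aa, ss, 1)  # single pairs
apl2_keys = window_keys(aa, ss, 2)  # neighbor pairs
apl3_keys = window_keys(aa, ss, 3)  # triples
```

For this seven-residue target the sketch yields seven APL1 keys, six APL2 keys and five APL3 keys, which also shows why longer windows find fewer occurrences in the database.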


Fig4. Schematic representation of APL5 (centroid).

APL-Centroid:


This APL uses only the central amino acid residue, with any combination of amino acids in the neighboring positions, while keeping the full target secondary structure. Fig4 shows the structure of APL5, where "*" can be replaced by any amino acid residue. APL7 and APL9 use the same structure, changing only the length of the selected part of the target to seven and nine, respectively. These APLs use the same strategy as APL1 to ignore PDB IDs and filter by B-factor.
The structure of the output data is demonstrated in the APL Docs.
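The centroid scheme can be sketched in the same spirit: each window keeps its full secondary structure but only the amino acid at its center. This is an illustration under assumptions: `centroid_keys` is a hypothetical helper, windows are assumed to slide one residue at a time, the key layout mirrors file names such as `A_EEEEC`, and the secondary structure string is made up.

```python
def centroid_keys(aa, ss, width):
    """Return 'central-aa_SS-window' keys for an odd window width (5, 7 or 9)."""
    if len(aa) != len(ss):
        raise ValueError("aa and ss must have the same length")
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a center")
    half = width // 2
    return [
        f"{aa[i + half]}_{ss[i:i + width]}"
        for i in range(len(aa) - width + 1)
    ]


# Target sequence with an illustrative secondary structure string.
apl5_keys = centroid_keys("FNAAANF", "CEEEECC", 5)
```

The neighbors' amino acids never enter the key, which corresponds to the "*" positions in Fig4.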

NIAS-Server: Documentation



NIAS is a server to help the analysis of the conformational preferences of amino acid residues in proteins or fragments.
How to use NIAS-server:

1. Enter your e-mail address. You will receive an e-mail with instructions for downloading the APL.

2. Enter the target amino acid sequence to be searched in the NIAS-server Database. The target sequence must use the one-letter code in CAPITAL letters, for example "FNAAANF".

            Amino Acid      1-Letter
            Alanine         A
            Arginine        R
            Asparagine      N
            Aspartic acid   D
            Cysteine        C
            Glutamic acid   E
            Glutamine       Q
            Glycine         G
            Histidine       H
            Isoleucine      I
            Leucine         L
            Lysine          K
            Methionine      M
            Phenylalanine   F
            Proline         P
            Serine          S
            Threonine       T
            Tryptophan      W
            Tyrosine        Y
            Valine          V
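Before submitting, a target sequence can be checked against the one-letter codes in the table above. The following is a small sketch; `invalid_positions` is a hypothetical helper, not part of NIAS.

```python
# The 20 one-letter codes from the table above.
AMINO_ACIDS = set("ARNDCEQGHILKMFPSTWYV")


def invalid_positions(sequence):
    """Return (index, character) for every character that is not a valid capital code."""
    return [(i, ch) for i, ch in enumerate(sequence) if ch not in AMINO_ACIDS]


ok = invalid_positions("FNAAANF")
bad = invalid_positions("FNaXANF")  # lowercase 'a' and 'X' are rejected
```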

3. Enter the secondary structure of the target amino acid sequence. It must be given in the one-letter code of STRIDE or DSSP, as presented below.

       Table 1. Stride 1-letter code used in APL.     Table 2. DSSP 1-letter code used in APL. 
       Secondary Structure        1-Letter-Code       Secondary Structure                    1-Letter-Code
          Alpha helix                   H             Alpha helix                                  H
          3-10 helix                    G             3-10 helix                                   G
          PI-helix                      I             PI-helix                                     I
          Extended conformation         E             Extended strand, participates in β ladder    E
          Isolated bridge               B or b        Residue in isolated β-bridge                 B
          Turn                          T             Hydrogen bonded Turn                         T
          Coil (none of the above)      C             Bend                                         S
                                                      Coil (none of the above)                     C
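A secondary structure string can be checked against the code set of the chosen scheme, as listed in Tables 1 and 2. This is a sketch with assumptions: `check_ss` is a hypothetical helper, and the requirement of exactly one code per residue is assumed rather than stated by the server.

```python
SS_CODES = {
    "STRIDE": set("HGIEBbTC"),  # Table 1
    "DSSP": set("HGIEBTSC"),    # Table 2
}


def check_ss(aa, ss, scheme):
    """Return a list of problems; an empty list means the input looks consistent."""
    problems = []
    if len(aa) != len(ss):
        # Assumption: one secondary structure code per residue.
        problems.append(f"length mismatch: {len(aa)} residues, {len(ss)} codes")
    bad = sorted(set(ss) - SS_CODES[scheme])
    if bad:
        problems.append(f"codes not valid for {scheme}: {''.join(bad)}")
    return problems


fine = check_ss("FNAAANF", "CHHHHHC", "DSSP")
wrong_scheme = check_ss("FNAAANF", "CSHHHHC", "STRIDE")  # 'S' (Bend) is DSSP-only
```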

4. Enter the PDB IDs to ignore among the 11,130 protein structures listed in 'NIAS Database'.

5. Enter the B-factor Coil SS and B-factor Regular SS values:
The 'B-factor Coil SS' is the threshold for coil and turn secondary structures, which are normally the most flexible.
The 'B-factor Regular SS' is the threshold for helix and sheet secondary structures.

6. Choose the secondary structure scheme and the APL structures to generate.
Select STRIDE or DSSP to assign the secondary structure. Make sure it matches the secondary structure code entered in step 3.
Select your APL output based on the explanation in 'NIAS Structure'.

7. Submit :D

Building the APL can take some time. When the process is over, you will receive an e-mail from apl.send@gmail.com with a download link to your files.

Each selected APL is placed in its own folder, except APL2, which generates one folder for the left neighbor and one folder for the right neighbor, as explained in 'NIAS Structure'.

For each strip of the target sequence, whose length depends on the APL type, the server generates one APL file and one Ramachandran plot. The structure of the APL file is presented below.


    Example:
    File A_H_histogram.dat represents the APL for the Alanine (A) amino acid in alpha-Helix (H) secondary structure.

    File KA_TT_histogram.dat from folder "Right" represents the APL for the Alanine (A) amino acid in Turn (T) secondary structure under the influence of a Lysine (K), also in Turn (T) secondary structure.

    File KA_TT_histogram.dat from folder "Left" represents the APL for the Lysine (K) amino acid in Turn (T) secondary structure under the influence of an Alanine (A), also in Turn (T) secondary structure.

    File QAK_EEE_histogram.dat represents the APL for the Alanine (A) amino acid in beta-Sheet (E) secondary structure, flanked by a Glutamine (Q) on the left and a Lysine (K) on the right, both also in beta-Sheet (E) secondary structure.

    File A_EEEEC_histogram.dat represents the APL for the Alanine (A) amino acid in beta-Sheet (E) secondary structure, flanked by two beta-Sheet (E) positions on the left and by a beta-Sheet (E) position followed by a coil (C) position on the right.
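The naming pattern in these examples can be decoded mechanically. The sketch below is an illustration only: `parse_apl_filename` is a hypothetical helper, and the rule it uses to tell neighbor-dependent files from centroid files (equal lengths versus a single amino acid with a longer secondary structure window) is inferred from the examples above.

```python
SUFFIX = "_histogram.dat"


def parse_apl_filename(name):
    """Split 'QAK_EEE_histogram.dat' into (aa, ss, kind)."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not an APL histogram file: {name}")
    aa, ss = name[: -len(SUFFIX)].split("_")
    if len(aa) == len(ss):
        kind = f"APL{len(aa)}"           # APL1, APL2 or APL3
    elif len(aa) == 1:
        kind = f"APL{len(ss)}-Centroid"  # APL5, APL7 or APL9
    else:
        raise ValueError(f"unexpected name layout: {name}")
    return aa, ss, kind


single = parse_apl_filename("A_H_histogram.dat")
triple = parse_apl_filename("QAK_EEE_histogram.dat")
centroid = parse_apl_filename("A_EEEEC_histogram.dat")
```

For APL2 the file name alone does not say which residue the histogram describes; that depends on whether the file sits in the "Left" or "Right" folder.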

    Each *_histogram.dat file has four columns; the last column is a list of [OMEGA, CHI's, Protein_ID] sets:
     PHI PSI Frequency [[OMEGA, CHI's, Protein_ID],]

    Example:
     QAK_EEE_histogram.dat
       PHI        PSI        Freq.    [[OMEGA, CHI's, Protein_ID],]
      -103.000000 168.000000 0.100000 [[175.5, 999.9, 999.9, 999.9, 999.9, '3RHTC'] ,]

    ***The value 999.9 marks a CHI angle that does not exist for the amino acid.
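A data line in this layout can be read with the standard library alone. The sketch below is an assumption-laden illustration: `parse_apl_line` is a hypothetical helper, it expects the bracketed part to be a valid Python-style list literal as in the example above, and it does not handle the header line.

```python
import ast

NO_CHI = 999.9  # placeholder for a CHI angle that does not exist


def parse_apl_line(line):
    """Parse 'PHI PSI Freq [[OMEGA, CHI's, Protein_ID],]' into Python values."""
    phi, psi, freq, rest = line.split(None, 3)
    entries = []
    for *angles, protein_id in ast.literal_eval(rest.strip()):
        omega, *chis = angles
        entries.append({
            "omega": omega,
            "chis": [c for c in chis if c != NO_CHI],
            "protein_id": protein_id,
        })
    return float(phi), float(psi), float(freq), entries


line = "      -103.000000 168.000000 0.100000 [[175.5, 999.9, 999.9, 999.9, 999.9, '3RHTC'] ,]"
phi, psi, freq, entries = parse_apl_line(line)
```

`ast.literal_eval` is used instead of `eval` so that only literals are accepted from the file.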
            

NIAS-Server: Results


BORGUESAN, B.; BARBACHAN e SILVA, M.; GRISCI, B. I.; INOSTROZA-PONTA, M.; DORN, M. APL: An angle probability list to improve knowledge-based metaheuristics for the three-dimensional protein structure prediction. Computational Biology and Chemistry (Print), v. 59, p. 142-157, 2015.

INOSTROZA-PONTA, M.; FARFÁN, C.; DORN, M. 2015. A Memetic Algorithm for Protein Structure Prediction based on Conformational Preferences of Amino Acid Residues. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO Companion '15), Sara Silva (Ed.). ACM, New York, NY, USA, 1403-1404.

GRISCI, B. I.; BORGUESAN, B.; DORN, M.; INOSTROZA, M. Using conformational preferences of amino acid residues and meta-heuristics to predict 3-D protein structures. In: Third International Society for Computational Biology Latin America, 2014, Belo Horizonte. Proceedings of third International Society for Computational Biology Latin America. La Jolla: International Society for Computational Biology, 2014.

DORN, MARCIO; INOSTROZA-PONTA, MARIO; BURIOL, L.S.; VERLI, H. A knowledge-based genetic algorithm to predict three-dimensional structures of polypeptides. In: 2013 IEEE Congress on Evolutionary Computation (CEC), 2013, Cancun. 2013 IEEE Congress on Evolutionary Computation. p. 1233-8.



        
Structural Bioinformatics and Computational Biology Lab @2017