APL: Angle Probability List

The Angle Probability List (APL) represents the normalized frequency of observed pairs of amino acid residues and secondary structure in the Protein Data Bank (PDB). We combine the conformational preferences of amino acid residues (AA, torsion angles) in proteins with their secondary structure information (SS).

We selected a set of 6,650 protein structures from the PDB. All 3-D protein structures were determined experimentally by X-ray diffraction at a resolution ≤ 2.0 Å and deposited in the PDB by December 2014. We removed all structures with an R-factor greater than 0.2. When homologous protein chains were found (30% sequence identity cutoff), only one of them was retained. We kept only amino acid residues with B-factor ≤ 30 Å² and occupancy equal to 1. Similar filtering parameters were used previously by Hovmoller and Ohlson (2002).
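
The thresholds above can be written as two simple predicates. This is only an illustrative sketch, not the pipeline used to build the APL: the `Structure` and `Residue` classes and their field names are hypothetical, and the homology filter is left out because it needs a sequence-clustering tool.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical per-structure metadata (field names are assumptions)."""
    method: str        # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float  # in Angstrom
    r_factor: float


@dataclass
class Residue:
    """Hypothetical per-residue metadata (field names are assumptions)."""
    b_factor: float    # in Angstrom^2
    occupancy: float


def structure_passes(s: Structure) -> bool:
    # X-ray structures with resolution <= 2.0 A and R-factor not above 0.2.
    return (
        s.method == "X-RAY DIFFRACTION"
        and s.resolution <= 2.0
        and s.r_factor <= 0.2
    )


def residue_passes(r: Residue) -> bool:
    # Residues with B-factor <= 30 A^2 and occupancy equal to 1.
    return r.b_factor <= 30.0 and r.occupancy == 1.0
```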

For more details please contact us: mdorn@inf.ufrgs.br

Download

APL Files (15.4 MB)

Please cite APL as shown below:

BORGUESAN, B.; BARBACHAN e SILVA, M.; GRISCI, B. I.; INOSTROZA-PONTA, M.; DORN, M. APL: an Angle Probability List to improve knowledge-based metaheuristics for the three-dimensional protein structure prediction. Computational Biology and Chemistry (Print), v. 59, p. 142-157, 2015.

        The 'APL.zip' file contains all APLs, arranged by the 20 standard amino
        acid residues (3-letter code) and the eight secondary structure codes
        assigned by STRIDE.

                    Amino Acid      3-Letter-Code
                    Alanine         Ala
                    Arginine        Arg
                    Asparagine      Asn
                    Aspartic acid   Asp
                    Cysteine        Cys
                    Glutamic acid   Glu
                    Glutamine       Gln
                    Glycine         Gly
                    Histidine       His
                    Isoleucine      Ile
                    Leucine         Leu
                    Lysine          Lys
                    Methionine      Met
                    Phenylalanine   Phe
                    Proline         Pro
                    Serine          Ser
                    Threonine       Thr
                    Tryptophan      Trp
                    Tyrosine        Tyr
                    Valine          Val

                    Secondary Structure         1-Letter-Code
                    Alpha helix                 H
                    3-10 helix                  G
                    PI-helix                    I
                    Extended conformation       E
                    Isolated bridge             B or b
                    Turn                        T
                    Coil (none of the above)    C

        Example:
        The file ALA_H_histogram.dat holds the APL for alanine (ALA) residues
        in alpha-helix (H) secondary structure.
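
The naming pattern in this example can be sketched as a small helper. It assumes that every file follows the `<AA>_<SS>_histogram.dat` pattern with the 3-letter code in upper case, and that `B` and `b` are stored as separate files; neither is confirmed beyond the ALA_H and ASN_G examples. The function name `histogram_filename` is hypothetical.

```python
# 3-letter codes of the 20 standard amino acids, upper case as in the example.
AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLU", "GLN", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)

# STRIDE 1-letter codes listed above; treating "B" and "b" as two separate
# codes is an assumption.
SS_CODES = ("H", "G", "I", "E", "B", "b", "T", "C")


def histogram_filename(amino_acid: str, ss_code: str) -> str:
    """Build the expected APL file name for one residue/structure pair."""
    aa = amino_acid.upper()
    if aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {amino_acid!r}")
    if ss_code not in SS_CODES:
        raise ValueError(f"unknown secondary structure code: {ss_code!r}")
    return f"{aa}_{ss_code}_histogram.dat"
```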

        In each *_histogram.dat file, every entry has four fields:
        PHI PSI Frequency {OMEGA}

        When the amino acid residue has side-chain CHI angles, they are
        appended as additional fields:
        PHI PSI Frequency {OMEGA} {CHI}'s

        Example:
        ALA_H_histogram.dat
        PHI        PSI       Freq.    {OMEGA}
        -94.000000 11.000000 0.000010 {0: [169.7]}
        ASN_G_histogram.dat
        PHI        PSI       Freq.    {OMEGA}       {CHI-1}       {CHI-2}
        -99.000000 11.000000 0.000240 {0: [-176.0]} {0: [-171.3]} {0: [48.1]}

        When a (PHI, PSI) pair is observed more than once, the {OMEGA} and {CHI}
        groups list more values as well.
        Example:
        ALA_H_histogram.dat
        PHI        PSI       Freq.    {OMEGA}
        -95.000000 20.000000 0.000019 {0: [172.6, 173.5]}
        ASN_G_histogram.dat
        PHI        PSI        Freq.    {OMEGA}               {CHI-1}             {CHI-2}
        -95.000000 -16.000000 0.000480 {0: [-175.5, -159.3]} {0: [-68.3, -66.1]} {0: [-51.5, -33.8]}

        When a (PHI, PSI) pair is observed more than once but the values of
        {OMEGA} or {CHI} differ, they are split into numbered sub-groups
        (0, 1, ...).
        Example:
        ALA_H_histogram.dat
        PHI        PSI       Freq.    {OMEGA}
        -95.000000 -5.000000 0.000019 {0: [-179.1], 1: [172.0]}
        ASN_G_histogram.dat
        PHI        PSI        Freq.    {OMEGA}               {CHI-1}             {CHI-2}
        -95.000000 -11.000000 0.000480 {0: [-176.6, -163.6]} {0: [-79.8, -74.1]} {0: [-23.1], 1: [131.0]}

        PHI        PSI      Freq.    {OMEGA}                   {CHI-1}             {CHI-2}
        -91.000000 8.000000 0.000480 {0: [-179.2], 1: [173.4]} {0: [-76.7, -75.1]} {0: [-13.8, -9.4]}

        PHI        PSI        Freq.    {OMEGA}                   {CHI-1}                   {CHI-2}
        -71.000000 -24.000000 0.000480 {0: [-176.1], 1: [175.3]} {0: [-170.2], 1: [-68.8]} {0: [-36.1], 1: [51.2]}
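
The entries shown above can be read with a few lines of Python, since each `{...}` group has the same shape as a Python dict literal. The sketch below is a reading aid rather than an official parser: the function name `parse_apl_line` and the returned keys are hypothetical, and it assumes that fields are whitespace-separated and that header or blank lines, if the files contain any, can simply be skipped.

```python
import ast
import re

# One brace-delimited group such as "{0: [-176.6, -163.6]}".
_GROUP = re.compile(r"\{[^{}]*\}")


def parse_apl_line(line):
    """Parse one APL entry; return None for blank or non-numeric lines."""
    fields = line.split(None, 3)
    if len(fields) < 3:
        return None
    try:
        phi, psi, freq = (float(value) for value in fields[:3])
    except ValueError:
        # Header rows such as "PHI PSI Freq. {OMEGA}" are skipped.
        return None
    rest = fields[3] if len(fields) > 3 else ""
    # Each group maps a sub-group index to a list of angles in degrees.
    groups = [ast.literal_eval(group) for group in _GROUP.findall(rest)]
    return {
        "phi": phi,
        "psi": psi,
        "freq": freq,
        "omega": groups[0] if groups else {},
        "chi": groups[1:],  # CHI-1, CHI-2, ... in file order; may be empty
    }
```

For the ASN_G entry at PHI -95, PSI -11, the `"chi"` list would hold two dicts, the second with the two sub-groups `0` and `1`.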