Journal of Physical Chemistry B, Vol.119, No.34, 11136-11145, 2015
From Ramachandran Maps to Tertiary Structures of Proteins
Predicting the structure of a protein from its sequence remains an unsolved problem. One possible coarse-grained approach is to specify all the torsional (φ, ψ) angles along the backbone of the polypeptide chain. The Ramachandran map elegantly depicts the allowed conformational (φ, ψ) space of proteins, but this space is still very large for the purposes of accurate structure generation. We have divided the allowed (φ, ψ) space in Ramachandran maps into 27 distinct conformations, which are sufficient to regenerate a structure to within 5 Å of the native, at least for small proteins. This reduces the structure prediction problem to the specification of an alphanumeric string, i.e., the amino acid sequence together with one of the 27 conformations preferred by each amino acid residue. In theory, this still results in 27^n conformations for a protein comprising n amino acids. We then investigated the spatial correlations at the two-residue (dipeptide) and three-residue (tripeptide) levels in what may be described as higher order Ramachandran maps, on the premise that the allowed conformational space starts to shrink as neighborhood effects are introduced. We found, for instance, that for a tripeptide, which can potentially exist in any of the 27^3 "allowed" conformations, three-fourths of these conformations are redundant at the 95% confidence level, suggesting sequence-context-dependent preferred conformations. We then created a look-up table of preferred conformations at the tripeptide level and correlated them with energetically favorable conformations.
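As a rough illustration of the encoding idea described above, the following Python sketch maps backbone (Phi, Psi) pairs to single-character state labels and counts the size of the resulting search space. Only the number of states (27) comes from the text. The function names, the nearest-center assignment rule, and the three example centers (approximate helical, extended, and left-handed regions) are assumptions made for illustration; they are not the 27 conformations defined in the paper.

```python
import math

N_STATES = 27  # number of conformational states per residue, as stated in the text


def search_space_size(n_residues, n_states=N_STATES):
    """Number of possible conformational strings for a chain of n_residues."""
    return n_states ** n_residues


def angular_diff(a, b):
    """Smallest absolute difference between two angles given in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def assign_state(phi, psi, centers):
    """Return the label of the center closest to (phi, psi).

    Distances respect the 360-degree periodicity of torsion angles.
    Nearest-center assignment is an assumption of this sketch.
    """
    return min(
        centers,
        key=lambda label: math.hypot(
            angular_diff(phi, centers[label][0]),
            angular_diff(psi, centers[label][1]),
        ),
    )


# Hypothetical centers for illustration only; NOT the paper's 27 states.
EXAMPLE_CENTERS = {
    "A": (-60.0, -45.0),    # roughly right-handed helical region
    "B": (-120.0, 130.0),   # roughly extended (beta) region
    "L": (60.0, 45.0),      # roughly left-handed helical region
}


def encode(torsions, centers=EXAMPLE_CENTERS):
    """Convert a list of (phi, psi) pairs into a conformational string."""
    return "".join(assign_state(phi, psi, centers) for phi, psi in torsions)
```

For example, `search_space_size(3)` gives the 27^3 = 19683 tripeptide conformations mentioned in the text; if three-fourths of these are redundant, roughly 4,900 remain. With the illustrative centers, `encode([(-57, -47), (-119, 113), (57, 47)])` yields a three-character string with one label per residue.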
We found in particular that Boltzmann probabilities calculated from van der Waals energies for each conformation of tripeptides correlate well with the observed populations in the structural database (the average correlation coefficient is approximately 0.8). An alphanumeric string, and hence the tertiary structure, can be generated for any sequence from the look-up table within minutes on a single processor, and to a higher level of accuracy if the secondary structure can be specified. We tested the methodology on 100 small proteins, and in 90% of the cases a structure within 5 Å of the native was recovered. We thus believe that the method presented here provides the missing link between Ramachandran maps and tertiary structures of proteins. A Web server that converts a tertiary structure to an alphanumeric string and predicts the tertiary structure from the sequence of a protein using the above methodology has been created and made freely accessible.
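The comparison between Boltzmann probabilities and observed populations can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the temperature (298.15 K), the energy units (kcal/mol), and the function names are choices made here for illustration, and the text does not say how the energies were computed beyond their being van der Waals energies.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=298.15):
    """Convert conformational energies (kcal/mol) to Boltzmann probabilities.

    p_i = exp(-E_i / kT) / sum_j exp(-E_j / kT)
    The temperature is an assumption; the text does not specify one.
    """
    kt = K_B * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)  # partition function over the listed conformations
    return [w / z for w in weights]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Given a list of energies for the conformations of one tripeptide and the matching list of observed population fractions from a structural database, `pearson(boltzmann_probabilities(energies), observed)` gives a correlation coefficient of the kind the text reports on average as approximately 0.8.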