Journal of Physical Chemistry B, Vol. 113, No. 16, 5520-5527, 2009
Statistical Theory of Protein Sequence Design by Random Mutation
A self-consistent mean-field theory is developed to evaluate site-specific amino acid pair probabilities in a library of sequences, thereby accounting for the effect of correlated mutations. The approach computes the entire residue-residue substitution pattern by characterizing all possible residue-residue combinations consistent with a given protein structure. Design involves screening a library of sequences with different monomer types to estimate the number and composition of sequences as a function of a generalized foldability criterion. The theory is applied to a simple lattice model of proteins. The theoretical results are compared with real sequences drawn from the lysozyme protein fold and from 1789 nonhomologous globular proteins. The pairwise sequence probability profile of the real proteins shows a reasonably good match with that of the lattice proteins using a simple coarse-grained potential. The theory may provide a framework for exploring site-directed mutagenesis strategies, both for engineering known proteins and for designing proteins de novo.
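To make the idea of a site-specific pair probability concrete, the following is a minimal Python sketch under explicitly hypothetical assumptions. It does not reproduce the authors' self-consistent mean-field theory. Instead it swaps in exact enumeration of a tiny library: every sequence over an assumed two-letter HP alphabet is threaded onto one fixed 6-residue square-lattice conformation and given a Boltzmann weight exp(-beta E). The contact map `CONTACTS`, the energies `EPS`, the parameter `beta`, and the helper names `energy`, `pair_probabilities`, and `site_probabilities` are all illustrative inventions. The point it shows is that correlated substitutions appear as P_ij(a, b) differing from P_i(a) P_j(b) for sites in contact, while non-contacting sites factorize.

```python
import itertools
import math

# Hypothetical toy model (not the paper's): a 6-mer in the 2x3 compact
# square-lattice conformation 0-1-2 / 5-4-3, whose non-bonded contacts
# are (0,5) and (1,4).
ALPHABET = "HP"
N = 6
CONTACTS = [(0, 5), (1, 4)]
# Assumed HP-style contact energies: only H-H contacts are favorable.
EPS = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def energy(seq):
    """Energy of a sequence threaded onto the fixed target structure."""
    return sum(EPS[(seq[i], seq[j])] for i, j in CONTACTS)


def pair_probabilities(i, j, beta=1.0):
    """Boltzmann-weighted joint probability P_ij(a, b) over the full library."""
    totals = {(a, b): 0.0 for a in ALPHABET for b in ALPHABET}
    z = 0.0
    # Exact enumeration of all len(ALPHABET)**N sequences (feasible only for toy sizes).
    for seq in itertools.product(ALPHABET, repeat=N):
        w = math.exp(-beta * energy(seq))
        z += w
        totals[(seq[i], seq[j])] += w
    return {ab: w / z for ab, w in totals.items()}


def site_probabilities(i, beta=1.0):
    """Single-site marginal P_i(a), obtained by summing the pair table over b."""
    j = (i + 1) % N  # any partner site j != i gives the same marginal
    pij = pair_probabilities(i, j, beta)
    return {a: sum(pij[(a, b)] for b in ALPHABET) for a in ALPHABET}


if __name__ == "__main__":
    # Contacting sites (0,5) are correlated; non-contacting sites (0,1) are not.
    for i, j in [(0, 5), (0, 1)]:
        p_pair = pair_probabilities(i, j)[("H", "H")]
        p_ind = site_probabilities(i)["H"] * site_probabilities(j)["H"]
        print(f"sites {i},{j}: P(H,H)={p_pair:.3f}  P(H)P(H)={p_ind:.3f}")
```

In this toy setting the contacting pair (0, 5) gives P(H,H) of about 0.475 against an independent-site estimate of about 0.423, whereas sites 0 and 1 factorize exactly. Exhaustive enumeration grows exponentially with chain length and alphabet size, which is why a mean-field treatment such as the one described above is needed for realistic proteins.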