Rapid Protein Global Fold Determination Using Ultrasparse Sampling, High-Dynamic Range Artifact Suppression, and Time-Shared NOESY

Coggins BE; Werner-Allen JW; Yan A; Zhou P

Journal of the American Chemical Society, Vol.134, No.45, 18619-18630, 2012

In structural studies of large proteins by NMR, global fold determination plays an increasingly important role in providing a first look at a target's topology and reducing assignment ambiguity in NOESY spectra of fully protonated samples. In this work, we demonstrate the use of ultrasparse sampling, a new data processing algorithm, and a 4-D time-shared NOESY experiment (1) to collect all NOEs in 2H/13C/15N-labeled protein samples with selectively protonated amide and ILV methyl groups at high resolution in only four days, and (2) to calculate global folds from this data using fully automated resonance assignment. The new algorithm, SCRUB, incorporates the CLEAN method for iterative artifact removal but applies an additional level of iteration, permitting real signals to be distinguished from noise and allowing nearly all artifacts generated by real signals to be eliminated. In simulations with 1.2% of the data required by Nyquist sampling, SCRUB achieves a dynamic range over 10000:1 (250× better artifact suppression than CLEAN) and completely quantitative reproduction of signal intensities, volumes, and line shapes. Applied to 4-D time-shared NOESY data, SCRUB processing dramatically reduces aliasing noise from strong diagonal signals, enabling the identification of weak NOE crosspeaks with intensities 100× less than those of diagonal signals. Nearly all of the expected peaks for interproton distances under 5 Å were observed. The practical benefit of this method is demonstrated with structure calculations for 23 kDa and 29 kDa test proteins using the automated assignment protocol of CYANA, in which unassigned 4-D time-shared NOESY peak lists produce accurate and well-converged global fold ensembles, whereas 3-D peak lists either fail to converge or produce significantly less accurate folds. The approach presented here succeeds with an order of magnitude less sampling than required by alternative methods for processing sparse 4-D data.
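
The abstract names CLEAN as the iterative artifact-removal step that SCRUB builds on. As a rough illustration of that underlying idea only, the following is a minimal 1-D sketch of a generic CLEAN loop on sparsely sampled data. It is not the authors' SCRUB algorithm and does not include SCRUB's additional level of iteration. The function names (`synthesize`, `dft_sampled`, `clean`), the loop gain, the threshold, and the sampling schedule are all assumptions chosen for the example.

```python
import cmath


def synthesize(peaks, times, n):
    """Time-domain samples of on-grid complex exponentials.

    `peaks` maps a frequency index (0..n-1) to an amplitude (hypothetical test input).
    """
    return [sum(a * cmath.exp(2j * cmath.pi * k * t / n) for k, a in peaks.items())
            for t in times]


def dft_sampled(values, times, n):
    """DFT of data known only at `times`, normalised by the number of samples.

    Unsampled points are implicitly zero, which is what produces sampling artifacts.
    """
    m = len(times)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for v, t in zip(values, times)) / m
            for k in range(n)]


def clean(values, times, n, gain=0.5, threshold=1e-3, max_iter=500):
    """Generic CLEAN loop (assumed parameters, not taken from the paper).

    Repeatedly finds the strongest point in the artifact-laden spectrum,
    moves a fraction `gain` of it into the component list, and subtracts
    that component's contribution from the sampled time-domain residual,
    which removes the artifacts it generated as well.
    """
    residual = list(values)
    components = [0j] * n
    for _ in range(max_iter):
        spectrum = dft_sampled(residual, times, n)
        k = max(range(n), key=lambda i: abs(spectrum[i]))
        if abs(spectrum[k]) < threshold:
            break  # nothing left above the stopping threshold
        amp = gain * spectrum[k]
        components[k] += amp
        residual = [r - amp * cmath.exp(2j * cmath.pi * k * t / n)
                    for r, t in zip(residual, times)]
    return components, residual


if __name__ == "__main__":
    n = 64
    times = [0, 1, 3, 7, 12, 20, 30, 44, 45, 50, 57, 63]  # 12 of 64 points
    data = synthesize({10: 1.0, 37: 0.1}, times, n)
    components, residual = clean(data, times, n)
    for k, c in enumerate(components):
        if abs(c) > 0.01:
            print(k, round(abs(c), 3))
```

The sketch subtracts each component in the time domain at the sampled points, which is equivalent to subtracting a shifted and scaled point-spread function from the spectrum. With one strong and one weak signal, a plain loop like this may still assign some intensity to spurious positions, which is the kind of limitation the abstract says SCRUB's extra iteration is meant to address.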