Polymer, Vol.46, No.17, 6461-6473, 2005
Stochastic molecular descriptors for polymers. 3. Markov electrostatic moments as polymer 2D-folding descriptors: RNA-QSAR for mycobacterial promoters
Stochastic molecular descriptors have been applied in QSAR studies on small molecules and polymers, including our series in Polymer [H. Gonzalez-Diaz, A.R. Ramos de, R.R. Molina, Bioinformatics 19 (2003) 2079-2087; H. Gonzalez-Diaz, R.R. Molina, E. Uriarte, Bioorg Med Chem Lett 14 (2004) 4691-4695; H. Gonzalez-Diaz, R.R. Molina, E. Uriarte, Polymer (1) 45 (2004) 3845-3853; H. Gonzalez-Diaz, E. Olazabal, N. Castanedo, S.I. Hernadez, A. Morales, H.S. Serrano, et al., J Mol Mod 8 (2002) 237-245; H. Gonzalez-Diaz, E. Uriarte, A.R. Ramos de, Bioorg Med Chem 13 (2005) 323-331; Polymer (11) (2005) accepted. [40,41,42,44,48]]. However, no QSAR study covering multiple polymeric RNA molecules, which are among the most important biopolymers, has been reported to date. The work described here extends this research by introducing, for the first time, stochastic moments for the secondary structure of polymeric RNA molecules. These moments are then used to seek a QSAR model that classifies a polymeric DNA sequence as a mycobacterial promoter (mps) or not, on the basis of its putative RNA secondary structure. In training, the model correctly classified 83.7% of 132 mps and 98.89% of 274 control sequences. Four cross-validation experiments using a re-substitution technique gave similar results, with average values of 93.9% for robustness and 94.1% for predictability over the 407 sequences used. The present model (mps = 14.2(1)O(0) - 13.4(2)O(2) - 1.1) has only two variables and therefore compares very favorably in terms of complexity with the model previously reported by Kalate et al., who used a non-linear artificial neural network and a large parameter space [R.N. Kalate, S.S. Tambe, B.D. Kulkarni, Comput Biol Chem 27 (2003) 555-564. [82]]. The model can also be back-projected to derive maps showing the influence of sub-structural RNA patterns on the biological activity of the polymer as a whole.
© 2005 Elsevier Ltd. All rights reserved.
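
The two-variable model quoted in the abstract is a linear discriminant function, so evaluating it for one sequence is a single weighted sum. The sketch below is illustrative only. The coefficients 14.2, -13.4 and -1.1 are taken from the abstract; the names x1 and x2 are placeholders for the two stochastic-moment descriptors, whose computation is not described here, and the decision threshold of zero is an assumption rather than a value stated by the authors.

```python
# Minimal sketch of scoring a sequence with a two-variable linear
# discriminant. Coefficients come from the abstract; everything else
# (names, threshold) is a hypothetical choice made for illustration.

COEF_X1 = 14.2     # weight of the first descriptor in the abstract's model
COEF_X2 = -13.4    # weight of the second descriptor in the abstract's model
INTERCEPT = -1.1   # constant term in the abstract's model


def mps_score(x1: float, x2: float) -> float:
    """Return the linear discriminant score for descriptor values x1, x2."""
    return COEF_X1 * x1 + COEF_X2 * x2 + INTERCEPT


def is_promoter(x1: float, x2: float, threshold: float = 0.0) -> bool:
    """Classify as promoter when the score exceeds the threshold.

    Assumption: a cutoff of 0.0 is used here; the abstract does not
    state the actual decision rule.
    """
    return mps_score(x1, x2) > threshold


if __name__ == "__main__":
    # Made-up descriptor values, not data from the paper.
    for x1, x2 in [(1.0, 0.5), (0.1, 0.5)]:
        print(x1, x2, round(mps_score(x1, x2), 3), is_promoter(x1, x2))
```

Because the model has only two inputs and three fitted constants, it can be inspected and applied by hand, which is the sense in which the abstract contrasts its complexity with a neural network.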