Applied Biochemistry and Biotechnology, Vol. 174, No. 1, pp. 437-451, 2014
Classification of DNA Minor and Major Grooves Binding Proteins According to the NLSs by Data Analysis Methods
High-mobility group (HMG) proteins are a superfamily of DNA-binding proteins that bind to the DNA minor groove and bend it, whereas most transcription factors, such as centromere protein B (CENP-B), octamer 1 (Oct-1), growth factor independence 1 (Gfi-1), and WRKY, bind to the major groove of DNA. The aim of this study is to classify proteins according to their DNA-binding features. Because nuclear localization signals (NLSs) play important roles in importing DNA-binding proteins into the nucleus, where they carry out their functions, NLSs were considered a feature relevant to the DNA-binding mode of these proteins. NLSs were predicted with two prediction web servers, and their sequence-derived features were then extracted with Chou's pseudo amino acid composition (PseAAC) and ProtParam. A multilayer perceptron (MLP) artificial neural network was used to analyze the features, with performance assessed by the correlation coefficient and 30-fold cross-validation. Principal component analysis (PCA) in the Minitab software was used as a second data-analysis method. Based on the eigenvalues of the first five principal components, NLS sequence length was identified as the best feature for classifying DNA-binding proteins. The minimum mean squared error (MSE = 0.1098) and the highest coefficient of determination (R² = 0.963) indicate a significant difference between the NLS lengths of DNA major groove and minor groove binding proteins. The results show that DNA major groove and minor groove binding proteins can be classified using their NLS sequences as a feature.
Keywords: High-mobility group proteins; Nuclear localization signal; Principal component analysis; Multilayer perceptron; Pseudo amino acid composition
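The PCA step summarized in the abstract (computing eigenvalues and inspecting which feature dominates each principal component) can be illustrated with a short sketch. The study itself used Minitab's PCA on PseAAC and ProtParam descriptors; the code below is only a from-scratch stand-in under explicit assumptions: the peptides in `SEQUENCES` are illustrative and are not the study's dataset, the four descriptors in `FEATURE_NAMES` (length and the fractions of Lys, Arg, and Pro) are a hypothetical toy subset rather than full PseAAC vectors, and the eigen-decomposition uses cyclic Jacobi rotations on the correlation matrix of z-scored features. Each eigenvalue gives the variance explained by its component, and the largest absolute loading indicates which descriptor drives that component.

```python
import math

# ASSUMPTION: illustrative NLS-like peptides only, not the study's dataset.
SEQUENCES = ["PKKKRKV", "KRPAATKKAGQAKKKK", "PAAKRVKLD", "RKRKRR", "KKKRK", "PRRRRSPKRK"]
# ASSUMPTION: hypothetical toy descriptors standing in for PseAAC/ProtParam features.
FEATURE_NAMES = ["length", "frac_K", "frac_R", "frac_P"]


def nls_features(seq):
    """Sequence length plus the fractions of Lys, Arg and Pro residues."""
    n = len(seq)
    return [float(n), seq.count("K") / n, seq.count("R") / n, seq.count("P") / n]


def standardize(rows):
    """Z-score each column using the population standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already z-scored data (Z^T Z / n)."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / n for j in range(p)] for i in range(p)]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A J and V <- V J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # rows p, q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    eigenvalues = [a[i][i] for i in order]
    eigenvectors = [[v[k][i] for k in range(n)] for i in order]  # one loading vector per PC
    return eigenvalues, eigenvectors


def pca(rows):
    """PCA on the correlation matrix: returns (eigenvalues, loading vectors), largest first."""
    return jacobi_eigen(correlation_matrix(standardize(rows)))


if __name__ == "__main__":
    X = [nls_features(s) for s in SEQUENCES]
    eigenvalues, loadings = pca(X)
    total = sum(eigenvalues)
    for i, (lam, vec) in enumerate(zip(eigenvalues, loadings), start=1):
        top = max(range(len(vec)), key=lambda j: abs(vec[j]))
        print(f"PC{i}: eigenvalue={lam:.3f} ({lam / total:.1%}), largest loading: {FEATURE_NAMES[top]}")
```

Running the script prints one line per component. Because the eigenvalues of a correlation matrix sum to the number of features, each eigenvalue divided by that total is the fraction of variance explained. The output for these toy peptides says nothing about the study's actual results; with real data, the descriptor rows would come from the predicted NLSs.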