Pattern identification and classification in gene expression data using an autoassociative neural network model

Bicciato S; Pandin M; Didone G; Di Bello C

Biotechnology and Bioengineering, Vol.81, No.5, 594-606, 2003

The application of DNA microarray technology to the analysis of gene expression creates enormous opportunities to accelerate the understanding of living systems and the identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and for the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced basis for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets: Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of distinctive associations among genes and physiological states. At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances. (C) 2003 Wiley Periodicals, Inc.
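As a rough illustration of the general idea of an autoassociative network (a network trained to reproduce its own input through a narrow hidden layer), the following minimal sketch may help. It is not the authors' architecture or training procedure: the single tanh bottleneck, the plain gradient-descent loop, the synthetic two-class data, and all names (`make_toy_data`, `train_autoassociative`, `encode`, `rank_genes`) are assumptions made purely for illustration. The hidden activations stand in for the dimensionally reduced features, and the input-weight magnitudes stand in for one simple way of inspecting the internal structure for candidate marker genes.

```python
import numpy as np


def make_toy_data(n_per_class=20, n_genes=20, seed=0):
    """Synthetic stand-in for expression data (NOT real microarray data).

    Genes 0-4 are shifted up in class 0 and genes 5-9 in class 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    X[:n_per_class, :5] += 2.0
    X[n_per_class:, 5:10] += 2.0
    y = np.repeat([0, 1], n_per_class)
    # Standardize each gene to zero mean and unit variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, y


def train_autoassociative(X, n_hidden=2, lr=0.1, epochs=500, seed=0):
    """Fit input -> tanh bottleneck -> linear output to reproduce X (MSE loss)."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_genes, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_genes))
    b2 = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # bottleneck activations
        err = (H @ W2 + b2) - X           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, losses


def encode(X, params):
    """Project samples onto the bottleneck (the reduced feature space)."""
    return np.tanh(X @ params["W1"] + params["b1"])


def rank_genes(params, k=5):
    """Rank genes by total absolute input weight into the bottleneck."""
    score = np.abs(params["W1"]).sum(axis=1)
    return np.argsort(score)[::-1][:k]
```

A possible usage, under the same assumptions: call `X, y = make_toy_data()`, then `params, losses = train_autoassociative(X)`, and inspect `encode(X, params)` for class structure or `rank_genes(params)` for the genes the bottleneck relies on most. A classifier for unseen samples would be a separate step applied to the encoded features.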

Keywords: gene expression data; autoassociative neural networks; pattern recognition; feature extraction; cancer classification