Energy & Fuels, Vol.21, No.6, 3394-3400, 2007
Use of principal component analysis (PCA) and linear discriminant analysis (LDA) in gas chromatographic (GC) data in the investigation of gasoline adulteration
Chemometric data analysis was applied to chromatographic data as a modeling tool to identify the presence of solvents in gasoline obtained at gas stations in the state of Minas Gerais. As a training set, 75 samples were formulated by mixing pure gasolines with varying concentrations of four solvents and analyzed by gas chromatography-mass spectrometry. Selected chromatographic peak areas were used in the chemometric analysis. Sample distribution patterns were investigated with principal component analysis (PCA). Score plots revealed clear clustering of the samples according to the solvent added. Classification models were created with linear discriminant analysis (LDA). Because gasoline presents a very complex profile and the chromatographic data contain too many variables, two approaches were tested to reduce the dimensionality of the data before LDA: in the first, Fisher weights were used as a criterion to exclude less discriminating variables; in the second, the original variables were replaced by a few principal components obtained from the covariance matrix. To test the quality of the models, a test set of 31 new samples was prepared using certified gasolines mixed with the same solvents used in the training set. Both models indicated the presence of solvent in gasoline effectively, failing only for samples with low solvent concentrations. The PCA plus LDA model was more efficient in identifying solvent-free samples, which reduced the number of false positives.
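
The Fisher-weight screening mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula or software the authors used, so the code below assumes one common definition (between-class sum of squares divided by within-class sum of squares, computed per variable). The function names, the toy peak areas, and the class labels are hypothetical and serve only to show the idea of ranking variables before LDA.

```python
from statistics import mean


def fisher_weights(X, y):
    """Return one Fisher weight per column of X.

    Assumed definition (not taken from the paper): for each variable,
    between-class sum of squares / within-class sum of squares.
    """
    classes = sorted(set(y))
    weights = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            class_mean = mean(values)
            between += len(values) * (class_mean - grand_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # A variable with no within-class spread separates the classes perfectly.
        weights.append(between / within if within > 0 else float("inf"))
    return weights


def select_variables(X, y, keep):
    """Return the sorted indices of the `keep` variables with the largest weights."""
    weights = fisher_weights(X, y)
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical peak areas: rows are samples, columns are chromatographic peaks.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.0, 2.5],
    [0.8, 6.0, 1.5],
    [3.0, 5.5, 2.6],
    [3.2, 4.5, 3.1],
    [2.8, 5.0, 2.1],
]
# Hypothetical class labels for the six samples above.
y = ["pure", "pure", "pure", "adulterated", "adulterated", "adulterated"]

weights = fisher_weights(X, y)
selected = select_variables(X, y, keep=2)
print("Fisher weights:", weights)
print("Variables kept for LDA:", selected)
```

In this toy data the second peak has the same mean in both classes, so its weight is close to zero and it is the one discarded. The retained columns would then be passed to an LDA classifier. The alternative approach described in the abstract would instead project all columns onto a few principal components before LDA.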