Energy & Fuels, Vol.19, No.6, 2350-2356, 2005
Identification of adulteration of gasoline applying multivariate data analysis techniques HCA and KNN in chromatographic data
Chemometric data analysis tools were applied to chromatographic data to identify the presence of solvents in gasoline samples from gas stations in Minas Gerais state, Brazil. A training set of 75 samples was prepared by mixing pure gasoline with various concentrations of four complex solvents. The samples were analyzed by GC-MS, and selected peaks were used as variables in the chemometric studies. Hierarchical cluster analysis (HCA) was used to search for patterns in the sample distribution according to the solvent added. K-nearest neighbor (KNN) classification was used to build a scheme that differentiates pure from mixed samples and indicates the type of solvent present. HCA revealed a clear tendency for samples containing the same solvent to cluster together. However, samples with low solvent concentrations could be separated only after the less discriminating variables (peaks) were excluded by means of Fisher weights. After optimization of the KNN algorithm, 88% of the training-set samples were classified correctly. To check the quality of the model, another group of samples was prepared with certified gasoline and the same solvents; the algorithm again classified the great majority of these samples correctly.
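The variable-selection step can be illustrated with a minimal sketch. The abstract does not give the exact Fisher weight formula, so this assumes the common multi-class form: for each peak, the between-class scatter divided by the within-class scatter. The data layout (one list of peak areas per sample) and the names `fisher_weights` and `select_variables` are hypothetical.

```python
from collections import defaultdict


def fisher_weights(X, y):
    """Per-variable ratio of between-class to within-class scatter.

    X: list of samples, each a list of peak areas (assumed layout).
    y: class label for each sample (e.g. "pure" or a solvent name).
    """
    n_vars = len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    weights = []
    for j in range(n_vars):
        overall = sum(row[j] for row in X) / len(X)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            mean_k = sum(r[j] for r in rows) / len(rows)
            between += len(rows) * (mean_k - overall) ** 2
            within += sum((r[j] - mean_k) ** 2 for r in rows)
        # A peak with zero within-class spread separates the classes perfectly
        weights.append(between / within if within > 0 else float("inf"))
    return weights


def select_variables(X, y, threshold):
    """Keep only peaks whose Fisher weight reaches the (hypothetical) threshold."""
    w = fisher_weights(X, y)
    keep = [j for j, wj in enumerate(w) if wj >= threshold]
    return keep, [[row[j] for j in keep] for row in X]
```

A large weight marks a peak whose area differs strongly between classes relative to its spread within each class; discarding low-weight peaks removes variables that mostly add noise to the distance calculations used by HCA and KNN.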
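The KNN classification scheme can be sketched as follows. The abstract does not say how the algorithm was optimized; this sketch assumes Euclidean distance, majority voting, and choosing the number of neighbours `k` by leave-one-out accuracy on the training set. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Assign x the majority label among its k nearest training samples."""
    # Sort training samples by Euclidean distance to x
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    neighbours = [label for _, label in dists[:k]]
    votes = Counter(neighbours)
    best = max(votes.values())
    # Tie-break: among tied labels, the one with the nearest member wins
    for label in neighbours:
        if votes[label] == best:
            return label


def leave_one_out_accuracy(X, y, k):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def choose_k(X, y, k_values):
    """Return the k with the best leave-one-out accuracy (first one on ties)."""
    return max(k_values, key=lambda k: leave_one_out_accuracy(X, y, k))
```

Labels could be "pure" plus one label per solvent, so a single classifier both flags adulteration and names the likely solvent.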
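The HCA step groups samples without using their labels. The abstract does not state the distance measure or linkage rule, so this minimal sketch assumes Euclidean distance with average linkage and simply merges the closest pair of clusters until a requested number remains. The function name `average_linkage` is hypothetical.

```python
import math
from itertools import combinations


def average_linkage(X, n_clusters):
    """Agglomerate samples until n_clusters remain; return sorted index lists."""
    clusters = [[i] for i in range(len(X))]

    def cluster_distance(a, b):
        # Average Euclidean distance over all cross-cluster sample pairs
        return sum(math.dist(X[i], X[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of current clusters
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)
```

Inspecting which samples end up together (for example, whether samples spiked with the same solvent share a cluster) mirrors the pattern search described in the abstract; a full dendrogram would record every merge rather than stopping at a fixed cluster count.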