Multivariate Classification Methods for Spectroscopic Data with Multiple Class Structure
- Master's theses (IMT) 
The classification of microorganisms is an important task in many fields, such as food production, medicine, and biotechnology. Fourier transform infrared (FTIR) spectroscopy provides comprehensive biochemical information about microorganisms in the form of spectra. Extracting this information requires an appropriate chemometric technique to process the data and obtain reliable classification results. It has long been known that exploiting the hierarchical structure of the data is advantageous, but doing so can be tedious and time-consuming. In this study we evaluate how best to set up a classification scheme for identifying microorganisms by FTIR spectroscopy. Our task was to classify ten different genera of food spoilage yeasts, which were cultivated in five different media and subsequently analyzed by FTIR spectroscopy. The methods used in this study are: principal component analysis (PCA); partial least squares discriminant analysis (PLSDA); Fisher linear discriminant analysis (FLDA); PLSDA and FLDA combined with hierarchical cluster analysis (HCA); PLSDA and FLDA combined with a one-versus-all (OVA) approach; PLSDA and FLDA combined with a one-versus-one (OVO) approach; and random forest (RF). RF showed the best performance of all the methods used, achieving a validation success rate (SR) of 97.5% for one of the media. The other successful methods were PLSDA combined with HCA and PLSDA applied directly to the ten groups, with SRs of 96.3% and 94.4%, respectively. Our results suggest that RF can identify microorganisms rapidly and very accurately, even without utilizing a hierarchical structure in the data. Moreover, the performance of RF improved when information from other blocks of data, representing different cultivation media, was included.
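
To make the OVA and OVO decompositions mentioned above concrete, the following minimal Python sketch enumerates the binary sub-problems that each approach would create for a ten-class task. It is an illustration only, not the thesis's implementation: the class names are hypothetical placeholders (the abstract does not name the genera), and the helper `success_rate` assumes that SR means the percentage of validation samples assigned to their correct class, which the abstract does not define explicitly.

```python
from itertools import combinations

# Hypothetical placeholder labels; the abstract does not name the ten genera.
classes = [f"genus_{i}" for i in range(1, 11)]


def ova_tasks(classes):
    """One binary problem per class: that class versus all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def ovo_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def success_rate(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label.

    Assumption: this is what the abstract's "success rate (SR)" measures.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


ova = ova_tasks(classes)
ovo = ovo_tasks(classes)
print(len(ova), len(ovo))  # 10 45
```

With ten classes, OVA requires 10 binary classifiers and OVO requires 45 (one per pair), which gives a sense of the extra bookkeeping these schemes add compared with a single multi-class model.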