Truncation-PLS for Variable Selection: a simulation study
- Master's theses (KBM) 
Partial least squares (PLS) is a class of statistical methods for multivariate data analysis. The PLS regression (PLSR) algorithm performs regression, dimension reduction and analysis of the correlations among variables simultaneously. Over the past 20 years, as high-dimensional data have become common, PLS has been refined and applied in many fields. In this research, a variable-selection procedure derived from Lenth's method was embedded into PLSR. The resulting algorithm, known as Truncation PLS, was tried out on several simulated datasets with different parameter designs. The R package relsim was used to simulate datasets with different properties. Jackknife PLS, another well-known wrapper method, was applied to the same datasets as a reference. The purpose of this research is to evaluate these two methods and to explore how the properties of a dataset affect the performance of each method. After the two PLS methods had been applied to the datasets, the root mean squared error of prediction (RMSEP) for every parameter setting was obtained through cross-validation. RMSEP is a statistic that indicates the predictive capability of a model. In addition, the variables selected by each method were compared with the variables known beforehand to be relevant, and the resulting selection accuracy was used to evaluate each method's capability for variable selection. Both methods performed well and produced satisfactory values of RMSEP and accuracy. However, Truncation PLS dealt better with datasets that have high multicollinearity in the X-variables and a smaller variance in the relevant component. In addition, Truncation PLS is more efficient than Jackknife PLS in terms of computation and time consumption.
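The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The pseudo standard error follows the standard definition of Lenth's method, but the cutoff `multiplier`, the function names, and the accuracy definition (share of variables correctly classified as relevant or irrelevant) are assumptions made here for illustration; the abstract does not specify how Truncation PLS applies the cutoff or how accuracy was computed.

```python
import math
from statistics import median


def lenth_pse(effects):
    """Lenth's pseudo standard error (PSE) of a list of effect estimates."""
    abs_eff = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_eff)
    # Drop apparently active effects before re-estimating the scale.
    trimmed = [a for a in abs_eff if a < 2.5 * s0]
    if not trimmed:  # e.g. all effects are exactly zero
        return 0.0
    return 1.5 * median(trimmed)


def select_by_lenth(effects, multiplier=2.0):
    """Indices whose |effect| exceeds multiplier * PSE.

    The multiplier is a hypothetical cutoff, not a value from the thesis.
    """
    cutoff = multiplier * lenth_pse(effects)
    return {j for j, e in enumerate(effects) if abs(e) > cutoff}


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on held-out observations."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def selection_accuracy(selected, relevant, n_variables):
    """Share of variables classified correctly (assumed definition)."""
    correct = sum((j in selected) == (j in relevant) for j in range(n_variables))
    return correct / n_variables


# Illustrative coefficients: variables 3 and 5 stand out from the noise.
coefs = [0.1, -0.2, 0.05, 3.0, -0.15, 2.5, 0.1, -0.05]
selected = select_by_lenth(coefs)
accuracy = selection_accuracy(selected, relevant={3, 5}, n_variables=len(coefs))
```

In a simulation study of this kind, `rmsep` would be evaluated on the held-out fold of each cross-validation split and averaged, while `selection_accuracy` is only available because the simulated datasets come with a known set of relevant variables.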