Classification and regression tree analysis for molecular descriptor selection and retention prediction in chromatographic quantitative structure–retention relationship studies
Put, R., Perrin, C., Questier, F., Coomans, D., Massart, D.L., and Vander Heyden, Y. (2003) Classification and regression tree analysis for molecular descriptor selection and retention prediction in chromatographic quantitative structure–retention relationship studies. Journal of Chromatography A, 988 (2). pp. 261-276.
The classification and regression tree (CART) methodology was studied in a quantitative structure–retention relationship (QSRR) context on a data set consisting of the retentions of 83 structurally diverse drugs on a Unisphere PBD column, using isocratic elutions at pH 11.7. The response (dependent variable) in the tree models was the predicted retention factor (log kw) of the solutes, while a set of 266 molecular descriptors served as explanatory variables in the tree building. From these 266 descriptors, CART selected descriptors related to the hydrophobicity (log P and Hy) and the size (TPC) of the molecules to describe and predict retention. In addition, CART selected hydrogen-bonding and molecular complexity descriptors. Since these variables are expected from QSRR knowledge, their selection demonstrates the potential of CART as a methodology for understanding retention in chromatographic systems. The potential of CART to predict retention, and thus occasionally to select an appropriate system for a given mixture, was also evaluated. Reasonably good prediction, i.e. only 9% serious misclassification, was observed. Moreover, some of the misclassifications are probably inherent to the data set used.
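To illustrate how a tree-building step can act as descriptor selection, the following is a minimal sketch of a single regression-style CART split: every descriptor and every candidate threshold is tried, and the pair that minimizes the summed squared error of the two resulting groups is kept. The descriptor names, the toy values, and the helper functions `sse` and `best_split` are hypothetical and chosen only for illustration; the record above does not state the software or settings used in the study, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(
    X: List[List[float]], y: List[float], names: List[str]
) -> Tuple[str, float, float]:
    """Return (descriptor name, threshold, total SSE) of the best single split."""
    best = None
    for j in range(len(names)):
        # Candidate thresholds lie midway between consecutive distinct values.
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (names[j], thr, total)
    if best is None:
        raise ValueError("no split possible: every descriptor is constant")
    return best


# Hypothetical toy data: six solutes, two descriptors.
# "logP" drives the response; "noise" is unrelated to it.
names = ["logP", "noise"]
X = [[0.5, 3.0], [1.0, 1.0], [1.5, 4.0], [3.0, 2.0], [3.5, 5.0], [4.0, 0.0]]
y = [1.0, 1.1, 0.9, 3.0, 3.2, 2.9]  # stand-in for a retention response

name, threshold, total = best_split(X, y, names)
print(f"selected descriptor: {name}, threshold: {threshold}")
```

In this toy case the hydrophobicity-like descriptor is chosen over the unrelated one because splitting on it yields the most homogeneous groups. A full tree would repeat the same search recursively within each group and then prune.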
|Item Type:||Article (Refereed Research - C1)|
|Keywords:||molecular descriptors; regression analysis; retention prediction; structure-retention relationships|
|Date Deposited:||16 Jun 2009 06:52|
|FoR Codes:||01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%|
|SEO Codes:||92 HEALTH > 9299 Other Health > 929999 Health not elsewhere classified @ 100%|