Wavelet basis selection for spectroscopic data analysis

Donald, David Andrew (2012) Wavelet basis selection for spectroscopic data analysis. PhD thesis, James Cook University.

View at Publisher Website: https://doi.org/10.25903/knk9-ne60


The discrete wavelet transform using adaptive wavelet bases was investigated in classification, regression and experimental design applications for spectroscopic data. Adaptive wavelets have been used previously in near infrared spectroscopy for classification and regression; however, the parameters required by the adaptive wavelet algorithm have largely been selected through human interaction. This thesis develops methods to select parameters for adaptive wavelets and investigates the hypothesis that using multiple wavelet bases improves the predictive performance of classification and regression models.

Use of the adaptive discrete wavelet transform (ADWT) is illustrated using a repeated measures experiment. Near infrared (NIR) spectra of wine grape homogenates from the Australian viticulture industry underwent feature extraction via the ADWT and were then modelled using penalised discriminant analysis (PDA), random forests and multivariate adaptive regression splines. The correct classification rates of all three methods were substantially improved when the ADWT was applied. Scores from the ADWT-based PDA were analysed via multivariate analysis of variance (MANOVA), which found all main and interaction effects to be significant. A bi-plot of the PDA scores illustrated the ease with which the ADWT extracted features from the spectra that were pertinent to the experimental design.

A method of ADWT parameter selection was derived using the Bayesian information criterion (BIC) and demonstrated in an unsupervised classification problem. Using the BIC to select ADWT parameters removed the need for human interaction in selecting good, optimised adaptive wavelets. Standard wavelet types gave similar unsupervised classification performance, but this outcome highlighted an advantage of adaptive wavelets: they need to span only a relatively small set of parameters to give good models, whereas a prohibitively large number of standard wavelet types would need to be trialled.
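For readers unfamiliar with the criterion, the general textbook form of the BIC is given below. The abstract does not state which likelihood the thesis uses for the ADWT parameter search, so the symbols here are generic assumptions rather than the thesis's own notation.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

Here $\hat{L}$ is the maximised likelihood of a candidate model, $k$ is its number of free parameters and $n$ is the number of observations. Under this sign convention, the candidate with the smallest BIC is preferred, which allows a set of candidate parameter settings to be ranked without manual judgement.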

The use of multiple wavelet transforms to improve model performance, a new hypothesis in the field of chemometrics, was investigated in supervised classification and regression applications. In the classification example, SELDI-TOF mass spectra from a cancer study were pre-processed with a variety of standard wavelet types prior to variable elimination via a t-statistic and random forest approach. The retained variables were subsequently modelled using TreeBoost; the specificity and sensitivity of the modelling process were improved by using multiple standard wavelet types compared with models using only one wavelet type. Models derived from wavelet processing were superior to models without pre-processing.
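The general idea of pooling coefficients from several wavelet types and then screening them with a t-statistic can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the choice of the Haar and Daubechies-2 filters, the single decomposition level, the periodic boundary handling, the Welch form of the t-statistic and all function names are assumptions made for the example, and the random forest stage is omitted.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Low-pass (scaling) filters for two standard orthogonal wavelets.
# Which wavelet types the thesis trialled is not stated; these are assumptions.
FILTERS = {
    "haar": [1 / SQRT2, 1 / SQRT2],
    "db2": [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
            (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)],
}


def dwt_level(signal, lowpass):
    """One level of a periodic DWT; returns (approximation, detail).

    The signal length is assumed to be even.
    """
    n, m = len(signal), len(lowpass)
    # Quadrature mirror high-pass filter: g[k] = (-1)^k * h[m-1-k].
    highpass = [(-1) ** k * lowpass[m - 1 - k] for k in range(m)]
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(lowpass[k] * signal[(i + k) % n] for k in range(m)))
        detail.append(sum(highpass[k] * signal[(i + k) % n] for k in range(m)))
    return approx, detail


def multi_wavelet_features(signal, names=("haar", "db2")):
    """Concatenate the coefficients obtained from each named wavelet."""
    features = []
    for name in names:
        approx, detail = dwt_level(signal, FILTERS[name])
        features.extend(approx + detail)
    return features


def welch_t(x, y):
    """Welch two-sample t-statistic (each group needs at least two values)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / se if se > 0 else 0.0


def rank_features(class_a, class_b):
    """Return feature indices ordered by decreasing |t| between two classes."""
    n_features = len(class_a[0])
    scores = [
        abs(welch_t([row[j] for row in class_a], [row[j] for row in class_b]))
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In use, each spectrum would be passed through `multi_wavelet_features`, and `rank_features` would then be applied to the resulting feature vectors of the two classes so that only the highest-ranked coefficients are carried forward to the classifier.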

Further evidence supporting the multiple wavelet feature extraction hypothesis was gained in the regression application. Using a publicly available and well-documented NIR dataset, a Bayes Metropolis regression was modified to incorporate multiple wavelet transforms by using constrained stacking rather than Bayesian model averaging as the model ensemble method. Multiple adaptive wavelets and multiple standard wavelets were trialled. The multiple adaptive wavelet approach produced a superior predictive regression model when compared with all single standard wavelet models, single adaptive wavelet models, multiple standard wavelet models and models previously cited in the literature for the same data set.
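As a rough illustration of constrained stacking, the sketch below combines the predictions of several models using weights that are non-negative and sum to one, chosen to minimise the squared error of the combined prediction. The abstract does not state which constraints or solver the thesis uses; the simplex constraint, the projected gradient solver and the function names are all assumptions made for this example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying the condition fixes the shift
    return [max(x - theta, 0.0) for x in v]


def constrained_stack(preds, y, iters=2000):
    """Find stacking weights by projected gradient descent.

    preds[m][i] is the prediction of model m for sample i (ideally obtained
    out-of-sample, e.g. by cross-validation); y[i] is the observed response.
    """
    m, n = len(preds), len(y)
    w = [1.0 / m] * m
    # The trace of P'P bounds its largest eigenvalue, giving a safe step size.
    trace = sum(p * p for row in preds for p in row)
    step = 1.0 / (2.0 * trace) if trace > 0 else 0.0
    for _ in range(iters):
        resid = [sum(w[k] * preds[k][i] for k in range(m)) - y[i]
                 for i in range(n)]
        grad = [2.0 * sum(preds[k][i] * resid[i] for i in range(n))
                for k in range(m)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(m)])
    return w
```

With one model per wavelet basis, the returned weights indicate how much each basis contributes to the ensemble prediction; a basis that adds nothing receives a weight of zero.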

Methods for using adaptive wavelets, with both multiple and single wavelet bases, are outlined in this thesis, with the general conclusion that the modelling process for NIR data (or juxta-positional data) can be substantially improved by the use of these wavelet transforms.

Item ID: 29969
Item Type: Thesis (PhD)
Keywords: wavelet; constrained regression; ensemble; Bayes; chemometrics; modelling processes
Date Deposited: 30 Oct 2013 01:37
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 34%
03 CHEMICAL SCIENCES > 0301 Analytical Chemistry > 030106 Quality Assurance, Chemometrics, Traceability and Metrological Chemistry @ 33%
08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining @ 33%
SEO Codes: 91 ECONOMIC FRAMEWORK > 9105 Measurement Standards and Calibration Services > 910503 Manufacturing Standards and Calibrations @ 50%
82 PLANT PRODUCTION AND PLANT PRIMARY PRODUCTS > 8203 Industrial Crops > 820306 Wine Grapes @ 25%
82 PLANT PRODUCTION AND PLANT PRIMARY PRODUCTS > 8205 Winter Grains and Oilseeds > 820507 Wheat @ 25%