A comparison of non-linear regression methods for improved on-line near infrared spectroscopic analysis of a sugarcane quality measure

Sexton, Justin, Everingham, Yvette, Donald, David, Staunton, Steve, and White, Ronald (2018) A comparison of non-linear regression methods for improved on-line near infrared spectroscopic analysis of a sugarcane quality measure. Journal of Near Infrared Spectroscopy, 26 (5). pp. 297-310.

View at Publisher Website: https://doi.org/10.1177/0967033518802448


On-line near infrared (NIR) spectroscopic analysis systems play an important role in assessing the quality of sugarcane in Australia. Because quality measures are used to calculate the payment made to growers, it is imperative that NIR models are both accurate and robust. Machine learning and non-linear modelling approaches have been explored as methods for developing improved NIR models in a variety of industrial settings, yet there has been little research into their application to cane quality measures. The objective of this paper was to compare chemometric models of commercial cane sugar (CCS) based on four calibration techniques: partial least squares (PLS) regression, support vector regression (SVR), artificial neural networks (ANNs) and gradient boosted trees (GBTs). Model performance was assessed on an independent validation data set using root mean square error of prediction (RMSEP) and r² values. SVR (RMSEP = 0.37%; r² = 0.92) and ANN (RMSEP = 0.36%; r² = 0.93) performed similarly to PLS (RMSEP = 0.37%; r² = 0.92) on the validation data set, while GBT showed much lower skill (RMSEP = 0.51%; r² = 0.85). Analysis of important wavelengths in each model showed that the PLS, SVR and ANN techniques emphasized similar spectral regions. Future research should consider testing model robustness over seasons and/or regions. Comparisons of chemometric models should consider reporting variable importance as a way of understanding how models use spectral information.
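
The two validation statistics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the CCS values are invented for illustration only, and r² is assumed here to be the squared Pearson correlation between observed and predicted values (the abstract does not state which definition the authors used).

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over paired samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / len(observed))


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r^2)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r^2 is undefined when either input is constant")
    return (cov * cov) / (var_o * var_p)


if __name__ == "__main__":
    # Invented validation-set values in % CCS, for illustration only.
    lab_ccs = [12.1, 13.4, 14.0, 15.2]
    nir_ccs = [12.4, 13.1, 14.3, 15.0]
    print(f"RMSEP = {rmsep(lab_ccs, nir_ccs):.2f}%")
    print(f"r^2   = {r_squared(lab_ccs, nir_ccs):.2f}")
```

Under this definition r² is insensitive to a constant offset between predictions and laboratory values, whereas RMSEP is not, which is one reason for reporting the two statistics together.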

Item ID: 56063
Item Type: Article (Research - C1)
ISSN: 1751-6552
Keywords: commercial cane sugar, sugarcane, gradient boosting, neural networks, cane analysis system, variable importance
Related URLs:
Copyright Information: © The Author(s) 2018.
Additional Information: A version of this publication was included as Chapter 3 of the following PhD thesis: Sexton, Justin David (2020) Statistical data mining algorithms for optimising analysis of spectroscopic data from on-line NIR mill systems. PhD thesis, James Cook University, which is available Open Access in ResearchOnline@JCU. Please see the Related URLs for access.

Funders: Sugar Research Australia (SRA), James Cook University (JCU)
Date Deposited: 07 Nov 2018 08:40
FoR Codes: 49 MATHEMATICAL SCIENCES > 4905 Statistics > 490501 Applied statistics @ 50%
30 AGRICULTURAL, VETERINARY AND FOOD SCIENCES > 3002 Agriculture, land and farm management > 300299 Agriculture, land and farm management not elsewhere classified @ 50%
SEO Codes: 82 PLANT PRODUCTION AND PLANT PRIMARY PRODUCTS > 8203 Industrial Crops > 820304 Sugar @ 100%