A comparison of data mining algorithms for improving NIR models of cane quality measures

Sexton, J., Everingham, Y., and Donald, D. (2017) A comparison of data mining algorithms for improving NIR models of cane quality measures. In: Proceedings of the 39th Annual Conference of the Australian Society of Sugar Cane Technologists (39) pp. 557-567. From: ASSCT 2017: 39th Annual Conference of the Australian Society of Sugar Cane Technologists, 3-5 May 2017, Cairns, QLD, Australia.


View at Publisher Website: https://www.assct.com.au/


Near infrared (NIR) analysis systems are used to estimate cane quality measures such as brix and pol in juice, and apparent purity. Within the Australian sugarcane industry, partial least squares regression (PLSR) has been used to build NIR models of cane quality measures in the lab, on-line and in the field. PLSR relies on a linear relationship between sample constituents and electromagnetic absorption at NIR wavelengths. In practice, this linear relationship can often break down, resulting in relationships that are more complex. Recently, machine learning techniques have become popular for their skill with complex data and their ability to produce robust calibrations. The objective of this paper was to compare PLSR with the machine learning technique support vector regression (SVR). The two techniques were used to estimate three cane quality parameters: brix in juice, pol in juice and (apparent) purity. Results from the PLSR models were consistent with previous industry studies and justified the use of PLSR as a baseline against which to compare more sophisticated approaches. The SVR models slightly reduced prediction error compared with PLSR models for brix and pol in juice, but slightly increased prediction error for purity. The marginal improvement in model skill using SVR was not considered sufficient to recommend SVR over PLSR, given the relative ease of use and interpretability of PLSR. However, this study showed that certain samples were difficult to model with either approach. Future research should consider other machine learning algorithms as well as techniques to identify difficult-to-model samples. This will allow researchers to seek greater improvements in the use of NIR modelling techniques.

Item ID: 48855
Item Type: Conference Item (Research - E1)
ISSN: 0726-0822
Keywords: near infrared, quality, machine learning, SVM, PLS
Related URLs:
Additional Information:

A version of this publication was included as Chapter 3 of the following [PhD] thesis: Sexton, Justin David (2020) Statistical data mining algorithms for optimising analysis of spectroscopic data from on-line NIR mill systems. PhD thesis, James Cook University, which is available Open Access in ResearchOnline@JCU. Please see the Related URLs for access.

Funders: Sugar Research Australia (SRA), James Cook University
Projects and Grants: SRA Scholarship (2014/109)
Date Deposited: 11 May 2017 00:06
FoR Codes: 49 MATHEMATICAL SCIENCES > 4905 Statistics > 490501 Applied statistics @ 50%
30 AGRICULTURAL, VETERINARY AND FOOD SCIENCES > 3002 Agriculture, land and farm management > 300299 Agriculture, land and farm management not elsewhere classified @ 50%
SEO Codes: 82 PLANT PRODUCTION AND PLANT PRIMARY PRODUCTS > 8203 Industrial Crops > 820304 Sugar @ 100%