A comparison of data mining algorithms for improving NIR models of cane quality measures

Sexton, J., Everingham, Y., and Donald, D. (2017) A comparison of data mining algorithms for improving NIR models of cane quality measures. In: Proceedings of the 39th Annual Conference of the Australian Society of Sugar Cane Technologists (39) pp. 557-567. From: ASSCT 2017: 39th Annual Conference of the Australian Society of Sugar Cane Technologists, 3-5 May 2017, Cairns, QLD, Australia.

View at Publisher Website: https://www.assct.com.au/


Near infrared (NIR) analysis systems are used to estimate cane quality measures such as brix and pol in juice and apparent purity. Within the Australian sugarcane industry, partial least squares regression (PLSR) has been used to build NIR models of cane quality measures in the lab, on-line and in the field. PLSR relies on the linear relationship between sample constituents and electromagnetic absorption at NIR wavelengths. In practice, this linear relationship often breaks down, and the resulting relationships are more complex. Recently, machine learning techniques have become popular for their skill with complex data and their ability to produce robust calibrations. The objective of this paper was to compare PLSR with the machine learning technique support vector regression (SVR). The two techniques were used to estimate three cane quality parameters: brix in juice, pol in juice and (apparent) purity. Results from the PLSR models were consistent with previous industry studies and justified the use of PLSR as a baseline against which to compare more sophisticated approaches. The SVR models slightly reduced prediction error compared with the PLSR models for brix and pol in juice, but slightly increased prediction error for purity. The marginal improvement in model skill using SVR was not considered sufficient to recommend SVR over PLSR, given the relative ease of use and interpretability of PLSR. However, this study showed that certain samples were difficult to model with either approach. Future research should consider machine learning algorithms as well as techniques to identify difficult-to-model samples. This will allow researchers to seek greater improvement in our ability to utilise NIR modelling techniques.

Item ID: 48855
Item Type: Conference Item (Research - E1)
ISSN: 0726-0822
Keywords: near infrared, quality, machine learning, SVM, PLS
Funders: Sugar Research Australia (SRA), James Cook University
Projects and Grants: SRA Scholarship (2014/109)
Date Deposited: 11 May 2017 00:06
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 50%
07 AGRICULTURAL AND VETERINARY SCIENCES > 0701 Agriculture, Land and Farm Management > 070199 Agriculture, Land and Farm Management not elsewhere classified @ 50%
SEO Codes: 82 PLANT PRODUCTION AND PLANT PRIMARY PRODUCTS > 8203 Industrial Crops > 820304 Sugar @ 100%