Scrutinizing different predictive modeling validation methodologies and data-partitioning strategies: new insights using groundwater modeling case study

Lal, Alvin, Sharan, Ashneel, Sharma, Krishneel, Ram, Arisha, Roy, Dilip Kumar, and Datta, Bithin (2024) Scrutinizing different predictive modeling validation methodologies and data-partitioning strategies: new insights using groundwater modeling case study. Environmental Monitoring and Assessment, 196 (623).

PDF (Published Version)
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.1007/s10661-024-12794...
 


Abstract

Groundwater salinity is a critical factor affecting water quality and ecosystem health, with implications for sectors including agriculture, industry, and public health. The reliability and accuracy of groundwater salinity predictive models are therefore paramount for effective decision-making in groundwater resource management. This study tests three validation methodologies, each combined with several data-partitioning strategies, while developing a group method of data handling (GMDH)-based model for predicting groundwater salinity concentrations in a coastal aquifer system. The three methods are the hold-out strategy (last and random selection), k-fold cross-validation, and the leave-one-out method. The prediction model's validation results are assessed using statistical indices: root mean square error (RMSE), mean squared error (MSE), and coefficient of determination (R²). The results indicate that for monitoring wells 1, 2, and 3, the hold-out (random) method with a 40% data-partitioning strategy gave the most accurate predictive model in terms of RMSE. The results also suggest that GMDH-based models behave differently under different validation methodologies and data-partitioning strategies, with some combinations giving better salinity predictive capability than others. In general, the results confirm that model validation methodologies and data-partitioning strategies yield different results because of inherent differences in how they partition the data, assess model performance, and handle sources of bias and variance. Therefore, it is important to use them in conjunction to obtain a comprehensive understanding of the groundwater salinity prediction model's behavior and performance.
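
The validation schemes and error indices named in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names, the `test_fraction` parameter, and the sample values are hypothetical, the abstract does not say whether its percentages refer to the training or the testing share, and a mean predictor stands in for the GMDH model.

```python
import math
import random


def holdout_last(n, test_fraction):
    """Hold-out (last): the final share of the record is held out for testing."""
    n_test = max(1, round(n * test_fraction))
    idx = list(range(n))
    return idx[:-n_test], idx[-n_test:]


def holdout_random(n, test_fraction, seed=0):
    """Hold-out (random): a randomly drawn share of the record is held out."""
    n_test = max(1, round(n * test_fraction))
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def k_fold(n, k):
    """k-fold: yield (train, test) index lists over k contiguous folds."""
    idx = list(range(n))
    bounds = [round(i * n / k) for i in range(k + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        yield idx[:start] + idx[stop:], idx[start:stop]


def leave_one_out(n):
    """Leave-one-out: k-fold with k equal to the number of samples."""
    return k_fold(n, n)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_model_rmse(y, train, test):
    """Stand-in for the GMDH model: predict the training mean, score by RMSE."""
    mean = sum(y[i] for i in train) / len(train)
    return rmse([y[i] for i in test], [mean] * len(test))


if __name__ == "__main__":
    # Made-up salinity values for illustration only (not data from the study).
    salinity = [410.0, 415.0, 423.0, 430.0, 428.0, 441.0, 450.0, 447.0, 459.0, 466.0]
    n = len(salinity)
    print("hold-out (last):  ", mean_model_rmse(salinity, *holdout_last(n, 0.4)))
    print("hold-out (random):", mean_model_rmse(salinity, *holdout_random(n, 0.4)))
    folds = [mean_model_rmse(salinity, tr, te) for tr, te in k_fold(n, 5)]
    print("5-fold mean RMSE: ", sum(folds) / len(folds))
    loo = [mean_model_rmse(salinity, tr, te) for tr, te in leave_one_out(n)]
    print("leave-one-out:    ", sum(loo) / len(loo))
```

Running the same stand-in model through each scheme gives a different error estimate for the same data, which is the effect the abstract describes: the score depends on how the record is partitioned as well as on the model.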

Item ID: 82995
Item Type: Article (Research - C1)
ISSN: 1573-2959
Keywords: Groundwater salinity, Machine learning, GMDH, FEMWATER, Data partitioning strategies
Copyright Information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Date Deposited: 20 Jun 2024 00:17
FoR Codes: 40 ENGINEERING > 4005 Civil engineering > 400513 Water resources engineering @ 100%
SEO Codes: 19 ENVIRONMENTAL POLICY, CLIMATE CHANGE AND NATURAL HAZARDS > 1901 Adaptation to climate change > 190101 Climate change adaptation measures (excl. ecosystem) @ 20%
18 ENVIRONMENTAL MANAGEMENT > 1803 Fresh, ground and surface water systems and management > 180301 Assessment and management of freshwater ecosystems @ 20%
18 ENVIRONMENTAL MANAGEMENT > 1803 Fresh, ground and surface water systems and management > 180399 Fresh, ground and surface water systems and management not elsewhere classified @ 60%
