Quantifying natural organic matter concentration in water from climatological parameters using different machine learning algorithms

Published in H2Open Journal, 2020

Recommended citation: Moradi, S., Agostino, A., Gandomkar, Z., Kim, S., Hamilton, L., Sharma, A., Henderson, R. and Leslie, G., 2020. Quantifying natural organic matter concentration in water from climatological parameters using different machine learning algorithms. H2Open Journal, 3(1), pp. 328-342. https://iwaponline.com/h2open/article/3/1/328/76304

Abstract

Current understanding of how changing climate conditions will affect the flux of natural organic matter (NOM) from terrestrial to aquatic environments, and thus aquatic dissolved organic carbon (DOC) concentrations, is limited. In this study, three machine learning algorithms were used to predict variations in DOC concentrations in an Australian drinking water catchment as a function of climate, catchment and physical water quality data. Four independent variables, namely precipitation, temperature, leaf area index and turbidity (n = 5,540), were selected from a large dataset to develop and train each model. The accuracy of multivariable linear regression, support vector regression (SVR) and Gaussian process regression, with the SVR and Gaussian process models tested using different kernel functions, was assessed using the adjusted R-squared (adj. R²), root-mean-squared error (RMSE) and mean absolute error (MAE). Model accuracy was highly sensitive to the time interval over which climate observations were averaged before being paired with DOC observations. The SVR model with a quadratic kernel function and a 12-day interval between climate and water quality observations outperformed the other algorithms (adj. R² = 0.71, RMSE = 1.9, MAE = 1.35). Analysis of the area under the receiver operating characteristic curve (AUC) confirmed that the SVR model could predict 92% of the elevated DOC observations; however, it could not estimate DOC values at specific sampling sites in the catchment, probably because of the complex local geological and hydrological conditions at the sites that directly surround and feed each sampling point. Further research is required to establish potential relationships between climatological data and NOM concentration in other water catchments, especially in the face of a changing climate.
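The abstract reports accuracy as adj. R², RMSE and MAE, and notes that accuracy depends on the interval used to average climate observations before pairing them with DOC samples. The following plain-Python sketch illustrates both steps. It is not the authors' code: the function names are hypothetical, and it assumes daily climate values keyed by date and a trailing averaging window that ends on (and includes) the DOC sampling day, which may differ from how the study defined its interval.

```python
import math
from datetime import date, timedelta
from statistics import mean


def trailing_mean(daily, sample_day, window_days):
    """Average a daily climate series over the window_days days ending on sample_day.

    ASSUMPTION: the window is trailing and includes sample_day itself.
    daily maps datetime.date -> value; days absent from the dict are skipped.
    Returns None when no observations fall inside the window.
    """
    days = (sample_day - timedelta(days=k) for k in range(window_days))
    values = [daily[d] for d in days if d in daily]
    return mean(values) if values else None


def pair_with_doc(daily_climate, doc_samples, window_days):
    """Pair each (date, DOC value) sample with its window-averaged climate value.

    Samples with no climate data in their window are dropped.
    """
    pairs = []
    for day, doc in doc_samples:
        x = trailing_mean(daily_climate, day, window_days)
        if x is not None:
            pairs.append((x, doc))
    return pairs


def regression_metrics(y_true, y_pred, n_predictors):
    """Return (adjusted R^2, RMSE, MAE) for observed and predicted values."""
    n = len(y_true)
    if n <= n_predictors + 1:
        raise ValueError("need more observations than n_predictors + 1")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalises R^2 for the number of predictors p:
    # adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return adj_r2, rmse, mae
```

Setting `window_days=12` would correspond to the 12-day interval reported for the best-performing SVR model, and `n_predictors=4` to the four independent variables. The model fit itself is deliberately left out; a quadratic SVR kernel is commonly implemented as a degree-2 polynomial kernel.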