Nitrate concentration modeling in water resources using Support Vector Regression and the Harris Hawks metaheuristic optimization algorithm

Document Type: Original Article

Authors

1 Department of Civil Engineering, Taft Branch, Islamic Azad University, Taft, Iran.

2 Department of Water Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran.

Abstract

Objective: Simulating and predicting nitrate concentration is one of the most important problems in water resources management. This study develops a machine learning model for simulating nitrate concentration.
 
Method: After data collection, the nitrate concentration data are first clustered using the Jenks natural breaks (JNB) method, and a separate support vector regression (SVR) model is then developed for each cluster. The sequential floating forward selection (SFFS) algorithm selects the input variables of each model during its training. The error indices are then averaged over the three resulting models, giving RMSE = 0.2387, MAE = 0.2236, and R^2 = 0.9874 for the training stage and RMSE = 0.2474, MAE = 0.2350, and R^2 = 0.9841 for the test stage. At this stage, the kernel parameters are set by trial and error. In the next step, the Harris hawks optimization (HHO) algorithm is used to determine the optimal values of the kernel-function parameters. In this case, the values of R^2, MAE, and RMSE are 0.9961, 0.1169, and 0.1502, respectively, for the training phase and 0.9845, 0.1308, and 0.9978, respectively, for the test phase.
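
The three error indices reported above can be computed as in the following minimal sketch. It assumes that R^2 is the coefficient of determination, 1 - SS_res/SS_tot (the study may instead use the squared correlation coefficient), and that the per-cluster indices are combined with a simple unweighted mean. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def error_indices(observed, predicted):
    """Collect the three indices for one cluster model."""
    return {
        "RMSE": rmse(observed, predicted),
        "MAE": mae(observed, predicted),
        "R2": r2(observed, predicted),
    }


def average_indices(per_cluster):
    """Unweighted mean of each index over the cluster models (assumption)."""
    keys = per_cluster[0].keys()
    return {k: sum(d[k] for d in per_cluster) / len(per_cluster) for k in keys}


# Hypothetical observed/predicted nitrate values for three cluster models.
clusters = [
    ([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]),
    ([10.0, 12.0, 15.0], [10.5, 11.5, 15.0]),
    ([30.0, 35.0, 42.0], [31.0, 34.0, 41.0]),
]
per_cluster = [error_indices(obs, pred) for obs, pred in clusters]
averaged = average_indices(per_cluster)
```

Averaging without weights treats each cluster model equally regardless of how many samples it contains; weighting by cluster size is an equally plausible choice.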
 
Results: The results show, first, that using HHO to tune the SVR model can significantly increase its accuracy in predicting nitrate concentration and, second, that combining different machine learning models can play an effective role in increasing the accuracy of regression models such as SVR.
 
Conclusions: Clustering the data before developing machine learning models can improve the accuracy of nitrate concentration prediction. With a properly selected kernel function, the hybrid HHO-SVR model performed better across the different clusters and gave the best results. The differing statistical characteristics of each cluster also have a significant effect on model performance. Therefore, to predict nitrate concentration in groundwater more accurately, it is recommended to first cluster the data and then develop a specific model for each cluster.
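
The cluster-first workflow recommended above can be illustrated with a small one-dimensional clustering sketch. It assumes that the clustering step minimizes the total within-class sum of squared deviations over sorted values, which is the criterion commonly associated with Jenks natural breaks; the exact JNB variant used in the study is not stated here. The function name `jenks_groups` and the sample nitrate values are hypothetical.

```python
def jenks_groups(values, n_classes):
    """Split sorted values into n_classes contiguous groups that minimize the
    total within-group sum of squared deviations (dynamic programming)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us evaluate the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j], with j > i."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k groups.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk the stored split points backwards to recover the groups.
    groups, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical nitrate concentrations; one model would then be fit per group.
nitrate = [1.0, 1.2, 0.9, 5.0, 5.3, 10.1, 9.8]
groups = jenks_groups(nitrate, 3)
```

Each returned group would then receive its own regression model, so that every model is trained on data with similar statistical characteristics.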

