The effect of preprocessing and reducing the input dimensions of the flow prediction model on optimized support vector regression by genetic algorithm

Document Type: Original Article

Authors

1 Department of Water Science and Engineering, Faculty of Agriculture, Razi University, Kermanshah, Iran.

2 M.Sc. graduate in Water Resources, Department of Water Science and Engineering, Faculty of Agriculture, Razi University, Kermanshah, Iran.

3 Assistant Professor, Department of Water Science and Engineering, Faculty of Agriculture, Razi University, Kermanshah, Iran.

Abstract

Accurate prediction of surface water flow plays an important role in the sound planning and management of water resources. To this end, various prediction models that use mathematical relationships based on hydrological information can be employed. In this study, the monthly discharge recorded at the Polechehr hydrometric station over a 48-year period (October 1971 to September 2018) was used.
Two main scenarios, with and without pre-processing (standardization), and two approaches, time series and non-time series, were considered. In addition, two cases were examined: with and without feature selection by a random forest algorithm.
In all cases, 80% of the data were used for model training and 20% for testing. All coding was done in Python. A genetic algorithm was used to optimize the parameters of the support vector regression (SVR) method. The results showed that standardization, the non-time series approach, reducing the dimensionality of the model input through feature selection, and using the genetic algorithm to optimize the SVR parameters had the greatest effect on prediction accuracy, in that order. The highest coefficient of determination was 0.85 for the training data and 0.6 for the testing data.
If standardization is not applied to the data, adopting a time series approach together with feature selection leads to better predictions from the SVR model, and using the genetic algorithm optimizer instead of the simple model significantly improves the results.
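The workflow summarized in the abstract (lagged inputs, standardization, random forest feature selection, an 80%/20% split, and genetic tuning of the SVR parameters) can be illustrated with a minimal Python sketch. It is not the authors' code. The discharge series is synthetic, not the Polechehr record. The 12 lags, the four retained features, the chronological split, the log-scale search bounds, the hold-out fitness and the genetic operators (truncation selection, uniform crossover, Gaussian mutation, elitism) are all assumptions chosen for illustration, and only the inputs are standardized here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic monthly discharge (NOT the Polechehr data): seasonal cycle plus noise.
n_months = 48 * 12
t = np.arange(n_months)
flow = 50 + 30 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 8, n_months)

# Time series approach: predict each month from the previous N_LAGS months.
N_LAGS = 12  # assumed number of lags
X = np.column_stack([flow[N_LAGS - k : n_months - k] for k in range(1, N_LAGS + 1)])
y = flow[N_LAGS:]

# 80% / 20% split, kept in chronological order (an assumption).
split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Standardization of the inputs, fitted on the training part only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Feature selection: keep the lags with the highest random forest importance.
N_KEEP = 4  # assumed number of retained features
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train_s, y_train)
keep = np.sort(np.argsort(forest.feature_importances_)[-N_KEEP:])
X_train_s, X_test_s = X_train_s[:, keep], X_test_s[:, keep]

# Genes are log10(C), log10(epsilon), log10(gamma); the bounds are illustrative.
LOW = np.array([-1.0, -3.0, -4.0])
HIGH = np.array([3.0, 1.0, 1.0])

def make_svr(genes):
    C, epsilon, gamma = 10.0 ** genes
    return SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)

# Fitness: R^2 on a validation slice held out from the end of the training data.
val_split = int(0.8 * len(y_train))

def fitness(genes):
    model = make_svr(genes).fit(X_train_s[:val_split], y_train[:val_split])
    return r2_score(y_train[val_split:], model.predict(X_train_s[val_split:]))

def genetic_search(pop_size=12, generations=8, mutation_scale=0.3):
    pop = rng.uniform(LOW, HIGH, size=(pop_size, 3))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(3) < 0.5, a, b)  # uniform crossover
            child = child + rng.normal(0, mutation_scale, 3)  # Gaussian mutation
            children.append(np.clip(child, LOW, HIGH))
        pop = np.vstack([pop[order[0]]] + children)  # elitism: best one survives
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]

best = genetic_search()
model = make_svr(best).fit(X_train_s, y_train)
r2_train = r2_score(y_train, model.predict(X_train_s))
r2_test = r2_score(y_test, model.predict(X_test_s))
print("selected lags:", keep + 1)
print("C, epsilon, gamma:", 10.0 ** best)
print(f"R2 train = {r2_train:.2f}, R2 test = {r2_test:.2f}")
```

Dropping the lag construction, the scaler, or the feature selection step in this sketch gives the corresponding scenarios without the time series approach, without pre-processing, or without dimensionality reduction. Scores on the synthetic series say nothing about the values reported for the real station.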

