Predicting and Analyzing Water Quality using Machine Learning By Yafra Khan & Chai Soo See



Predicting and Analyzing Water Quality using Machine Learning

 A Comprehensive Model


INTRODUCTION

The deteriorating quality of natural water resources such as lakes, streams and estuaries is one of the direst and most worrisome issues faced by humanity. The effects of water contamination can be tackled efficiently if data is analyzed and water quality is predicted beforehand. Management of water resources is therefore crucial to optimizing water quality. This issue has been addressed in much previous research; however, more work needs to be done on the effectiveness, reliability, accuracy and usability of current water quality management methodologies. The goal of this study is to develop a water quality prediction model from water quality factors using an Artificial Neural Network (ANN) and time-series analysis. Previous work on water quality prediction is also analyzed, and future improvements are proposed.

This research uses historical water quality data for the year 2014, recorded at 6-minute intervals. The data is obtained from the National Water Information System (NWIS), an online resource of the United States Geological Survey (USGS). For this paper, the data includes measurements of four parameters that influence water quality.

An ANN with a Nonlinear Autoregressive (NAR) time series model is used to develop a comprehensive methodology for efficient water quality prediction and analysis. Multivariate statistical techniques such as Principal Component Analysis (PCA) have been used to determine the relationships among the different water quality parameters.
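To make the PCA step concrete, here is a minimal sketch of how the relationship among the four parameters could be examined. It is an illustration, not the authors' procedure: the readings are made-up numbers, the function names are hypothetical, and only the first principal component is extracted (by power iteration on the correlation matrix) to keep the example dependency-free. In practice a numerical library would normally be used.

```python
import math

def standardize(column):
    """Rescale one column to zero mean and unit (sample) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]

def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    k = len(matrix)
    vec = [1.0 / (i + 1) for i in range(k)]  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient; vec has unit length at this point.
    eigenvalue = sum(
        v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix)
    )
    return eigenvalue, vec

# Made-up readings, for illustration only (NOT taken from the USGS data).
readings = {
    "turbidity": [2.1, 2.4, 3.0, 3.8, 3.1, 2.6],
    "dissolved_oxygen": [8.9, 8.7, 8.1, 7.4, 8.0, 8.5],
    "chlorophyll": [4.2, 4.6, 5.9, 7.1, 6.0, 5.0],
    "specific_conductance": [41000, 41200, 40100, 39500, 40300, 40900],
}
names = list(readings)
corr = correlation_matrix([readings[name] for name in names])
eigenvalue, loadings = first_component(corr)

# The trace of a correlation matrix equals the number of variables,
# so this ratio is the share of total variance on the first component.
explained = eigenvalue / len(names)

for name, loading in zip(names, loadings):
    print(f"{name:>22}: loading {loading:+.3f}")
print(f"variance explained by first component: {explained:.1%}")
```

Parameters whose loadings share a sign move together along the first component, while opposite signs indicate an inverse relationship.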


To carry out useful and efficient water quality analysis and predict water quality patterns, it is important to include a temporal dimension in the analysis, so that the seasonal variation of water quality is addressed. The sample data for this research has been acquired from the U.S. Geological Survey's (USGS) National Water Information System (NWIS), an open data repository supporting the acquisition, processing and long-term storage of water quality data across the U.S. The study area lies in Island Park village, situated in south-western Nassau County in the state of New York, at latitude 40°36'31.8" and longitude 73°39'22.0" (Figure 1).



THEORETICAL BACKGROUND OF APPLIED METHODOLOGY

The algorithmic architecture of an ANN attempts to simulate the structure and networks of the human brain, with an input layer, a hidden layer and an output layer, each consisting of nodes. It can efficiently describe the non-linear relationships in complex water quality datasets. Moreover, it adapts well to changes that might occur in the water environment of a particular area. In the feed-forward process, the inputs are multiplied by the weights and the resulting value is passed on to the next layer, until it reaches the output layer. The back-propagation process determines the error by calculating the difference between the estimated value and the expected value, working from the output layer back towards the input layer.

The data used in this study falls in the category of continuous-time time series, as it consists of values of water quality factors observed at 6-minute intervals. Since the ANN is used to interpret the non-linear relationships in the data, the time series model used in this study is the Non-linear Autoregressive (NAR) model. In this scenario, the general ANN model changes slightly to take the mathematical form

$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-d)\big)$$

where $y(t)$ is the output time series, $y(t-1), \ldots, y(t-d)$ is the input time series of past values, and $d$ is the number of lagged values fed to the network. This non-linear model defines the input and output in terms of time, so it can be estimated as a regression problem.
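The NAR idea can be sketched in a few lines: past values of a series become the inputs, the next value becomes the target, and a small feed-forward network is trained by back-propagation. This is a minimal sketch under stated assumptions, not the authors' implementation. The series is synthetic, the class name `TinyNAR` and all parameter values (3 lags, 5 hidden nodes, learning rate 0.1) are hypothetical, the output node is assumed to be linear, and plain gradient descent is used in place of Scaled Conjugate Gradient for brevity.

```python
import math
import random

def logsig(x):
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def make_lagged(series, d):
    """Build ([y(t-d), ..., y(t-1)], y(t)) pairs from a single series."""
    inputs = [series[t - d:t] for t in range(d, len(series))]
    targets = [series[t] for t in range(d, len(series))]
    return inputs, targets

class TinyNAR:
    """One hidden layer (log-sigmoid) and a linear output node (an assumption)."""

    def __init__(self, d, hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Feed-forward pass: weighted inputs flow layer by layer to the output."""
        h = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr):
        """One back-propagation update; returns the squared error before the update."""
        h, y = self.forward(x)
        err = y - target  # gradient of 0.5 * err**2 with respect to y
        for j, hj in enumerate(h):
            # Error pushed back through the hidden node (uses the old w2[j]).
            grad_h = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err
        return err * err

# Synthetic stand-in for one water quality series (NOT real data).
series = [0.5 + 0.4 * math.sin(0.3 * t) for t in range(120)]
inputs, targets = make_lagged(series, d=3)

model = TinyNAR(d=3, hidden=5)
history = []  # mean squared error per epoch
for epoch in range(300):
    total = sum(model.train_step(x, y, lr=0.1) for x, y in zip(inputs, targets))
    history.append(total / len(targets))

# One-step-ahead forecast from the three most recent values.
_, forecast = model.forward(series[-3:])
print(f"MSE first epoch: {history[0]:.5f}, last epoch: {history[-1]:.5f}")
print(f"next-value forecast: {forecast:.4f}")
```

The per-epoch MSE list plays the same role as a training curve: it shows how the error changes as the number of epochs grows.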

RESULTS AND DISCUSSION

Some statistics about the selected water quality parameters for the year 2014 were collected from USGS, including Minimum Value, Maximum Value and Mid-Range value, in order to depict the range of values (Table 1). 
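As a small illustration of these summary statistics, the sketch below computes the minimum, maximum and mid-range of a list of readings, taking the mid-range as the average of the minimum and maximum. The function name and sample values are hypothetical and do not come from Table 1.

```python
def range_summary(values):
    """Minimum, maximum and mid-range ((min + max) / 2) of a list of readings."""
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "mid_range": (lo + hi) / 2}

# Made-up turbidity readings, for illustration only.
summary = range_summary([2.0, 8.0, 5.0, 3.5])
print(summary)
```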
A test was conducted to forecast the selected water quality factors from their past values. A feed-forward neural network with the NAR time series model was used, with Scaled Conjugate Gradient (SCG) as the training algorithm and log-sigmoid as the activation function. The test consists of four models, one for each of the selected water quality factors: Turbidity, Dissolved Oxygen concentration, Chlorophyll and Specific Conductance.
After running the test, the performance measures of Regression (R), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were calculated. The values of these measures for the four ANN models, for the training and testing processes, are shown in Table 2.
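The three performance measures can be computed directly from target and predicted values. The sketch below assumes that the Regression value R is the Pearson correlation coefficient between targets and predictions, which is a common convention but is not spelled out in the text. The function names and sample numbers are made up for illustration.

```python
import math

def mse(targets, predictions):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def rmse(targets, predictions):
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(targets, predictions))

def regression_r(targets, predictions):
    """Pearson correlation between targets and predictions (assumed meaning of R)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)

# Made-up values, for illustration only.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.1, 1.9, 3.2, 3.8]
print(f"MSE  = {mse(targets, predictions):.4f}")
print(f"RMSE = {rmse(targets, predictions):.4f}")
print(f"R    = {regression_r(targets, predictions):.4f}")
```

An R close to 1 together with a small MSE indicates that the predictions track the targets closely.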






The MSE graphs show the number of epochs (iterations) it takes for the function to converge and the corresponding MSE for training, testing and validation. The regression analysis graphs show how well the data fits the function for training, testing and validation.

CONCLUSION
This paper analyzes and forecasts the values of four water quality parameters, Chlorophyll, Dissolved Oxygen, Turbidity and Specific Conductance, and analyzes the results. The proposed ANN-NAR model proves to be a reliable one, with the lowest MSE being 3.7 × 10⁻⁴ for Turbidity and the best Regression value being 0.99 for Specific Conductance. With technology improving continuously, the outlook for water quality modeling is promising. Besides further improvements in prediction accuracy, a more user-centric approach is needed: involving all relevant stakeholders and using user-friendly tools and an interactive environment, so that the solution actually benefits its target users.





