Predicting and Analyzing Water Quality using Machine Learning: A Comprehensive Model
By Yafra Khan & Chai Soo See
This issue has been addressed in many previous studies; however, more work
needs to be done to improve the effectiveness, reliability, accuracy and
usability of current water quality management methodologies.
The goal of this study is to develop a water quality prediction model with the
help of water quality factors using Artificial Neural Network (ANN) and
time-series analysis. Previous work on water quality prediction is also
reviewed, and future improvements are proposed in this paper. The
deteriorating quality of natural water resources such as lakes, streams and
estuaries is one of the direst and most worrisome issues faced by humanity.
This research uses historical water quality data for the year 2014, recorded
at 6-minute intervals. The effects of water contamination can be tackled
efficiently if data is analyzed and water quality is predicted beforehand.
For this paper, the data includes measurements of four parameters that
influence water quality. Data is obtained from the National Water Information
System (NWIS), an online resource of the United States Geological Survey (USGS).
Therefore, management of water resources is crucial in order to optimize the
quality of water. Artificial Neural Networks (ANN) with a Nonlinear
Autoregressive (NAR) time series model are used to develop a comprehensive
methodology for efficient water quality prediction and analysis. Multivariate
statistical techniques such as Principal Component Analysis (PCA) have been
used to determine the relationships among different water quality parameters.
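
As a rough illustration of how PCA can expose relationships among parameters,
the sketch below builds a correlation matrix from parameter columns and
extracts its first principal component by power iteration. The function names
and the toy columns are assumptions made for this example, not the authors'
code or USGS data, and power iteration is just one simple way to obtain the
leading component.

```python
import math

def standardize(columns):
    """Scale each column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / std for v in col])
    return scaled

def correlation_matrix(columns):
    """Pairwise correlation between every two parameter columns."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and eigenvector of a symmetric matrix."""
    k = len(matrix)
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    vec = [float(i + 1) for i in range(k)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, vec

# Toy columns for illustration only (hypothetical values, one list per parameter).
columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
corr = correlation_matrix(columns)
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(columns)  # share of total variance on component 1
```

Parameters with large loadings of the same sign on the first component vary
together, while opposite signs indicate an inverse relationship.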
In order to carry out useful and efficient water quality analysis and to
predict water quality patterns, it is important to include a temporal
dimension in the analysis, so that the seasonal variation of water quality is
addressed. The sample data for this research has been acquired from the U.S.
Geological Survey’s (USGS) National Water Information System (NWIS), an open
data repository supporting the acquisition, processing and long-term storage
of water quality data across the U.S. The study area of this research lies in
the village of Island Park, situated in southwestern Nassau County in the
state of New York, at Latitude 40°36'31.8", Longitude 73°39'22.0" (Figure 1).
THEORETICAL BACKGROUND OF APPLIED METHODOLOGY
The algorithmic architecture of ANN attempts to
simulate the structure and networks in a human brain, with an input layer,
hidden layer and output layer each consisting of nodes. Moreover, it has strong
adaptability to depict the changes that might occur in the water environment of
a particular area. It has the ability to efficiently describe the non-linear
relationship of the complex water quality datasets. The data used in this
study falls into the category of continuous-time time series, as it consists
of the values of water quality factors observed at a time interval of 6 minutes.
Since ANN is used to interpret the non-linear relationships in the data, the
time series model used in this study is the Non-linear Autoregressive (NAR)
model. In this scenario, the general ANN model changes slightly to take the
mathematical form $y(t) = f(y(t-1))$, where $y(t)$ is the output time series
and $y(t-1)$ is the input time series. This non-linear model defines the input
and output in terms of time and is readily estimated by regression. In the
feed-forward process, the inputs are multiplied by the weights and the
resulting value is passed forward to the next layer, until it reaches the
output layer. The
back-propagation process determines the error value by calculating the
difference between estimated value and expected value, starting from output
layer towards the input layer.
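
The feed-forward and back-propagation steps described above can be sketched
for a small NAR network as follows. This is a minimal illustration under
stated assumptions, not the authors' implementation: it trains with plain
stochastic gradient descent rather than Scaled Conjugate Gradient, uses a
linear output node, allows d lagged inputs (d = 1 gives the single-lag case),
and runs on a synthetic sine series instead of USGS data. All function and
parameter names are hypothetical.

```python
import math
import random

def logsig(x):
    """Log-sigmoid activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def make_lagged(series, d):
    """Pair each value y(t) with its d previous values [y(t-d), ..., y(t-1)]."""
    return [(series[t - d:t], series[t]) for t in range(d, len(series))]

def forward(w1, w2, x):
    """Feed-forward pass: weighted inputs -> hidden layer -> linear output."""
    hidden = [logsig(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden)) + w2[-1]
    return hidden, output

def train_nar(series, d=2, n_hidden=4, lr=0.1, epochs=500, seed=0):
    """Fit y(t) = f(y(t-1), ..., y(t-d)) by plain stochastic gradient descent."""
    rng = random.Random(seed)
    # Each hidden row: d input weights followed by one bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(n_hidden)]
    # Output weights: one per hidden node, followed by one bias term.
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    pairs = make_lagged(series, d)
    mse_per_epoch = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in pairs:
            hidden, output = forward(w1, w2, x)
            error = output - target  # estimated value minus expected value
            squared_error += error * error
            # Back-propagation: push the error from the output layer
            # back through the hidden layer towards the inputs.
            for j in range(n_hidden):
                delta = error * w2[j] * hidden[j] * (1.0 - hidden[j])
                w2[j] -= lr * error * hidden[j]
                for i in range(d):
                    w1[j][i] -= lr * delta * x[i]
                w1[j][d] -= lr * delta
            w2[n_hidden] -= lr * error
        mse_per_epoch.append(squared_error / len(pairs))
    return w1, w2, mse_per_epoch

# Synthetic series for illustration only; these are not USGS measurements.
series = [0.5 + 0.4 * math.sin(0.3 * t) for t in range(60)]
w1, w2, mse_per_epoch = train_nar(series)
_, next_value = forward(w1, w2, series[-2:])  # one-step-ahead forecast
```

Plotting `mse_per_epoch` against the epoch number gives the kind of
convergence curve that is usually inspected when judging such a model.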
RESULTS AND DISCUSSION
Some statistics about the selected water quality
parameters for the year 2014 were collected from USGS, including Minimum Value,
Maximum Value and Mid-Range value, in order to depict the range of values
(Table 1).
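
For reference, the three range statistics can be computed as in the short
sketch below, taking the mid-range as the average of the minimum and maximum.
The function name and the sample readings are assumptions for illustration
and are not the values reported in Table 1.

```python
def range_summary(values):
    """Return (minimum, maximum, mid-range) for a list of readings."""
    lowest, highest = min(values), max(values)
    return lowest, highest, (lowest + highest) / 2.0

# Made-up readings for illustration only; these are not the USGS values.
readings = [1.2, 3.4, 0.8, 5.0]
lowest, highest, mid_range = range_summary(readings)
```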
A test was conducted in order to forecast the selected water quality factors
based upon their past values. This test consists of four models, one for each
of the four selected water quality factors, i.e. Turbidity, Dissolved Oxygen
concentration, Chlorophyll and Specific Conductance.
A feed-forward Neural Network with the NAR time series model has been used,
with Scaled Conjugate Gradient (SCG) as the training algorithm and Log Sigmoid
as the activation function. After running the test, the performance measures
of Regression (R), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
have been calculated. The values of these performance measures for the four
ANN models, for the training and testing processes, are shown in Table 2.
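
A minimal sketch of how these three measures can be computed from observed
and predicted values is given below. It assumes that R denotes the Pearson
correlation between the two series, which is a common reading of a regression
value but is not stated explicitly here; the function names and sample numbers
are hypothetical.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def regression_r(actual, predicted):
    """Pearson correlation between actual and predicted values (assumed R)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Made-up numbers for illustration only; these are not results from Table 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (regression_r(actual, predicted), mse(actual, predicted),
          rmse(actual, predicted))
```

An R close to 1 together with a low MSE indicates that the forecasts follow
the observed series closely.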
The graphs for MSE show the number of epochs (iterations) it takes for the
function to converge and the corresponding MSE for training, testing and
validation. The graphs for Regression Analysis show how well the data fits the
function for training, testing and validation.
The proposed ANN-NAR model proves to be a reliable one, with prediction
accuracy showing much improved values: the lowest MSE is 3.7 × 10^-4 for
Turbidity and the best Regression value is 0.99 for Specific Conductance. With
continuous improvement in technology, the outlook for water quality modeling
is promising. Besides further improvements in prediction accuracy, there needs to be a
more user-centric approach to water quality issues, involving all the relevant
stakeholders and using user-friendly tools and an interactive environment, so
that the solution actually benefits its target users. This paper analyzes and
forecasts the values of four water quality parameters (Chlorophyll, Dissolved
Oxygen, Turbidity and Specific Conductance) and discusses the results.

