Algorithms and tools for time series mining
8 September, 2020
1. Introduction
1.1 Background
a. Bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one position and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising more than 500 thousand bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues. (1)
b. Climate change analysis requires datasets that not only cover a long span of time but are also homogeneous through time. Dedicated datasets, carefully curated from weather station sites with long records and subjected to complex quality control to address inconsistencies and errors, have been developed for this purpose. The daily minimum temperature record of Melbourne, Australia has been used for time series analysis in this project.
c. The solar radiation arriving at Earth (once known as the “solar constant”, now usually referred to as Total Solar Irradiance (TSI)), is the most fundamental of climate parameters as it indicates the totality of the energy driving the climate system. All climate models need to prescribe a value for it, either explicitly or implicitly, but its measurement with the precision and stability needed for climate studies has proved challenging.
1.2 Objective
Our goal is to use optimized Machine Learning and Deep Learning regression models that effectively predict the future values of the target variable (or the dependent variable) using the independent variables for the aforementioned datasets.
For the bike-sharing dataset, predictions of the total count of rental bikes for the period between August and December 2012 will be obtained, using the available information about past rental counts for 2011 and 2012. The time series data will be analyzed using various time series regression models, and the models will then be evaluated using their RMSE values. This analysis is helpful for getting an idea of the future trend of bike rentals based on season, temperature, windspeed, weather situation and other environmental conditions. [1]
For the daily minimum temperature data, predictions of temperatures will be obtained using univariate regression models by introducing lag values. Finally, the performance of each model will be analyzed, taking into consideration its root mean square error value.
For the TCTE and SORCE data, total solar irradiance magnitude predictions will be obtained from the past values. For the SORCE data, dates are given in Julian format; they will be converted to calendar date format (yyyy-mm-dd) and then used in the regression models. The Total Irradiance Monitor uses an ambient temperature active cavity radiometer with an at-launch estimated absolute accuracy of 350 parts per million (ppm) and a long-term relative accuracy (stability) of 10 ppm per year. Daily TSI values reported at a mean solar distance of 1 astronomical unit (AU) and zero relative line-of-sight velocity with respect to the Sun are available in the dataset.
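A minimal sketch of this Julian-to-calendar conversion (the column name julian_date is a hypothetical placeholder; pandas understands Julian Day numbers through origin='julian'):

import pandas as pd

# Hypothetical layout: the raw SORCE file holds Julian Day numbers
# in a column named 'julian_date'.
sorce = pd.DataFrame({'julian_date': [2458484.5, 2458485.5]})

# Convert Julian Day numbers to timestamps, then format as yyyy-mm-dd.
sorce['date'] = pd.to_datetime(sorce['julian_date'], unit='D', origin='julian')
print(sorce['date'].dt.strftime('%Y-%m-%d'))  # 2019-01-01, 2019-01-02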
1.3 Datasets
1.3.1 Bike-Sharing Dataset: The bike-sharing rental process is highly correlated with environmental and seasonal settings.
For instance, weather conditions, precipitation, day of week, season, hour of the day, etc. can affect rental behavior. The core dataset is the two-year historical log for 2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA, which is publicly available at http://capitalbikeshare.com/system-data. The data was aggregated on an hourly and a daily basis, and the corresponding weather and seasonal information was then extracted and added (2). Weather information was extracted from http://www.freemeteo.com. (3)
1.3.2 Daily Minimum Temperatures Dataset: This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia. The units are degrees Celsius and there are 3650 observations. The source of the data is credited as the Australian Bureau of Meteorology. (4)
1.3.3 SORCE and TCTE dataset: Recently, Total Solar Irradiance has been measured by the Total Irradiance Monitor (TIM); two versions of this instrument have flown on the SORCE spacecraft (providing TSI measurements since 2003) and the
TCTE platform (providing TSI measurements since 2013). SORCE and TCTE data are available through the University of Colorado's Laboratory for Atmospheric and Space Physics. (5)
2. Exploratory data analysis
2.1 Bike Sharing Data
Python has been used for this project. For data visualization, the matplotlib and seaborn libraries were imported.
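A minimal sketch of this setup (assuming the UCI daily file day.csv with its documented columns, where 'dteday' is the date and 'cnt' the total rental count):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the daily aggregated bike-sharing data (UCI column names).
day_df = pd.read_csv('day.csv', parse_dates=['dteday'])

sns.set_style('whitegrid')
plt.figure(figsize=(12, 4))
plt.plot(day_df['dteday'], day_df['cnt'])   # daily total rentals
plt.xlabel('Date')
plt.ylabel('Total rental count')
plt.title('Daily bike rentals, 2011 and 2012')
plt.show()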
The following plot depicts the trend for bike rentals for the year 2011 and 2012 based on the factors described above.
As seen from the above plot, the total count of rental bikes varies with the seasons in both years. There is a rise between April and September in both years, and rentals then decline as the winter season approaches. It is also to be noted that the total count for 2012 is higher than that for 2011.
2.1.a Plot for Windspeed Vs Bike rentals count
As seen from the above plot, windspeed and the rental bike count are negatively correlated, i.e. as windspeed increases, people are less inclined to rent bikes.
To visualize the variation in windspeed for 2011 and 2012, a histogram has been plotted, which may help explain the fall in bike rentals during windy weather conditions.
2.1.b Histogram for variation in wind speed for 2011 and 2012
2.1.c Distribution of bike rentals per season
As can be seen from the pie chart above, most bike rentals occurred during the Fall season, followed by Summer and Winter. Spring accounts for the lowest count of bike rentals for both years combined. The reason behind fewer rentals in spring could be weather situations such as light snow, light rain + thunderstorm + scattered clouds, light rain + scattered clouds, etc., which will be analyzed later.
2.1.d Total count of bike rentals at each hour per month for 2011 and 2012
From the above plots, the count of total bikes rented per hour for each month of 2011 and 2012 can be observed. Bikes were mostly rented during the afternoon or evening (usually between 3 pm and 8 pm), with the count dropping steeply during night hours. On a monthly basis, most bikes were rented between April and October for both years combined.
A box and whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary. It is useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved, and they usually involve categorical data. The boxplot above shows the count of rental bikes on each day of the week and indicates whether it was a holiday or not. As seen from the plot, most bikes were rented on non-holidays, except for Sundays (which also have an average count nearly the same as any other day of the week). The mean for each day does not show much variation, but the days can be distinguished on the basis of minimum and maximum count. There is an outlier value for Sunday (weekday 1) when the count was nearly negligible. This value was later removed from the dataset.
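A minimal sketch of how such a boxplot can be produced with seaborn (reusing day_df from above; 'weekday', 'holiday' and 'cnt' are the UCI column names):

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
# One box per weekday, split by the holiday flag.
sns.boxplot(x='weekday', y='cnt', hue='holiday', data=day_df)
plt.show()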
2.1.e Scatter plot of bike rentals for each season varied as per weather condition
There is a positive correlation between real-feel temperature and the count of total bike rentals, which can be observed by plotting a scatter plot of the two features.
The other scatter plot shows the variation in bike rentals per season under different weather conditions. For the spring and summer seasons, the count of rental bikes is linearly proportional to the normalized temperature, with fewer rentals during Heavy Rain + Ice Pellets + Thunderstorm + Mist and Snow + Fog in comparison to other weather situations. For the Fall season, the relationship is less discernible, but most rentals occur in Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds and Mist conditions. For winter, the relationship is quadratic and non-linear, with rentals occurring mostly during Light Snow, Light Rain + Thunderstorm + Scattered clouds and Light Rain + Scattered clouds.
2.2 Daily Minimum Temperature Data, Melbourne, Australia
2.2.a Plot for recorded daily temperature (degree Celsius)
The above plot shows the variation in daily temperature over the 10-year period (1981-1990). The series appears stationary, and an ADF test is performed later to confirm the stationarity of the series.
2.2.b Lag plot
A lag plot checks whether a data set or time series is random. Random data should not exhibit any identifiable structure in the lag plot; non-random structure indicates that the underlying data are not random.
Here the data shows a linear pattern, which suggests that autocorrelation is present. A positive linear trend (i.e. going upwards from left to right) is suggestive of positive autocorrelation.
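A minimal sketch of a lag-1 plot with pandas (assuming the temperatures are held in a Series named temps):

import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

# Scatter temps[t] against temps[t+1]; a tight upward-sloping cloud
# indicates positive autocorrelation.
lag_plot(temps, lag=1)
plt.show()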
2.2.c Year-wise and Month-wise boxplot depicting daily temperature variation
From the year-wise plot, it can be seen that there is some variation in the mean temperature over the 10-year period. There are also outliers present for certain years in the dataset. The mean minimum temperature in Melbourne, Australia is around 11 to 12 degrees Celsius, with a maximum value of about 25 degrees Celsius.
From the month-wise plot, it can be observed that beginning in April the temperature decreases, with June and July recorded as the coldest months for Melbourne. The average temperature then gradually increases over the remaining months.
2.3 TSI values- TCTE and SORCE data
2.3.a TSI Values- Calibration transfer experiment, for January 2018
The above plot shows the trend in total solar irradiance for each day of January 2018. The ordinate corresponds to TSI values and the abscissa represents each day of the month. Note that the range of TSI values has been set to 1361.10 to 1361.275.
2.3.b Scatter plot
The above scatter plot shows the variation in the magnitude of total solar irradiance as per TCTE for each month over the years 2013- 2019. The range for TSI on the ordinate has been set to 1360-1363.
2.3.c TSI Values- Solar radiation and climate experiment, for January 2019
The plot above shows the variation in TSI values for January 2019. The x-axis marks each day of the month and the y-axis represents the TSI values at 1 astronomical unit. Note that the range of TSI values has been set to 1360.6 to 1360.74 for better visualization of the variation in magnitude.
2.3.d Scatter plot for SORCE
The scatter plot above shows the recorded total solar irradiance values for each day of each month over the years 2003 to 2019. The maximum TSI value on the plot is set to 1364, and although the minimum value in the dataset is 0, for better visualization I have set the plot minimum to 1359.
3. Accounting for Stationarity of the time series
3.1 Augmented Dickey-Fuller Test (6)
In statistics, an augmented Dickey–Fuller test (ADF) tests the null hypothesis that a unit root is present in a time series sample. The alternative hypothesis is different depending on which version of the test is used, but is usually stationarity or trend-stationarity. It is an augmented version of the Dickey–Fuller test for a larger and more complicated set of time series models.
The augmented Dickey–Fuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence. The test allows for higher-order autoregressive processes by including the lagged differences Δy_{t−1}, …, Δy_{t−p} in the model, but the test is still whether γ = 0:
Δy_t = α + βt + γ·y_{t−1} + δ_1·Δy_{t−1} + δ_2·Δy_{t−2} + … + δ_p·Δy_{t−p}
The null hypothesis of the Augmented Dickey-Fuller is that there is a unit root, with the alternative that there is no unit root. If the p-value is above a critical size, then we cannot reject that there is a unit root.
The p-values are obtained through regression surface approximation. If the p-value is close to significant, then the critical values should be used to judge whether to reject the null.
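A minimal sketch of running the test with statsmodels, printing the same statistics reported below (the helper name adf_report is my own):

from statsmodels.tsa.stattools import adfuller

def adf_report(series):
    # adfuller returns (statistic, p-value, lags used, observations used,
    # critical values, ...) in that order.
    stat, pvalue, lags, nobs, crit = adfuller(series.dropna(), autolag='AIC')[:5]
    print('Test Statistic =', round(stat, 4))
    print('p-value =', round(pvalue, 4))
    print('No. Lags Chosen =', lags)
    print('No. Observations Used =', nobs)
    for level, value in crit.items():
        print('Critical value', level, '=', round(value, 3))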
3.1.a Results of ADF test on the bike sharing dataset:
Test Statistic = -1.817991
p-value = 0.371567
No. Lags Used = 13
No. Observations Used = 716
Critical value 1% = -3.439516
Critical value 5% = -2.865585
Critical value 10% = -2.568924
Here, since the p-value > 0.05, we cannot reject the null hypothesis that there is a unit root, and thus the time series is non-stationary.
3.1.b Results of ADF test on daily minimum temperature data:
Significance Level = 0.05
Test Statistic = -4.4448
No. Lags Chosen = 20
Critical value 1% = -3.432
Critical value 5% = -2.862
Critical value 10% = -2.567
=> P-Value = 0.0002. Rejecting Null Hypothesis.
=> Series is Stationary.
3.1.c Results of ADF test on TCTE data:
Significance Level = 0.05
Test Statistic = -2.6122
No. Lags Chosen = 22
Critical value 1% = -3.434
Critical value 5% = -2.863
Critical value 10% = -2.568
=> P-Value = 0.0905. Weak evidence to reject the Null Hypothesis.
=> Series is Non-Stationary.
Since the time series for the Total Solar Irradiance Calibration Transfer Experiment was found to be non-stationary, a difference of order 1 was applied to transform the series into a stationary one, as the following ADF results confirm.
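A minimal sketch of this differencing step (assuming the TSI values are in a pandas Series named tsi, and reusing the adf_report helper sketched earlier):

# First-order difference removes the slowly varying level of the series;
# dropna() discards the first, undefined difference.
tsi_diff = tsi.diff().dropna()
adf_report(tsi_diff)   # the differenced series now passes the ADF test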
Significance Level = 0.05
Test Statistic = -9.1561
No. Lags Chosen = 26
Critical value 1% = -3.434
Critical value 5% = -2.863
Critical value 10% = -2.568
=> P-Value = 0.0. Rejecting Null Hypothesis.
=> Series is Stationary.
3.1.d Results of ADF test on Solar Radiation and Climate Experiment time series
Significance Level = 0.05
Test Statistic = -4.5328
No. Lags Chosen = 31
Critical value 1% = -3.431
Critical value 5% = -2.862
Critical value 10% = -2.567
=> P-Value = 0.0002. Rejecting Null Hypothesis.
=> Series is Stationary.
Rolling mean and standard deviation for Bike sharing data:
From the plot above, it can be observed that since neither the mean nor the standard deviation is a flat line (i.e. the series does not have constant mean and constant variance), the series is non-stationary.
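A minimal sketch of these rolling statistics (reusing day_df; the 12-observation window is an assumption, and any reasonable span shows the same behaviour):

import matplotlib.pyplot as plt

rolling_mean = day_df['cnt'].rolling(window=12).mean()
rolling_std = day_df['cnt'].rolling(window=12).std()

plt.plot(day_df['cnt'], label='Original')
plt.plot(rolling_mean, color='red', label='Rolling mean')
plt.plot(rolling_std, color='black', label='Rolling std')
plt.legend()
plt.show()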
4. Separating seasonality and trend from rental bikes count time series
Since the original time series was found to be non-stationary, rolling mean and standard deviation were applied to smooth the curve, and the trend and seasonality were then separated from the time series. The resultant series becomes stationary through this process. There is no clear pattern in the seasonal component of this time series. The residuals in a time series model are what is left over after fitting a model; they are equal to the difference between the observations and the corresponding fitted values. We want the residual errors to be random, because that means the model has captured all of the structure and the only error left is the random fluctuation in the time series that cannot be modeled.
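A minimal sketch of this decomposition with statsmodels (period=365 assumes one yearly cycle in the daily data):

from statsmodels.tsa.seasonal import seasonal_decompose

# Splits the series into trend + seasonal + residual components.
decomposition = seasonal_decompose(day_df['cnt'], model='additive', period=365)
decomposition.plot()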
Later in the process, logarithmic values of the count were taken to reduce its magnitude.
The rolling mean and standard deviation have been calculated and are shown in the plot above. The graph displays one-sided moving averages over the daily data for 2011 and 2012. Moving averages remove the seasonal pattern and make the underlying trend visible; each moving-average point is the average of the preceding window of days.
5. Applying Machine learning algorithms
The original data is split into train and test sets before applying machine learning algorithms to the dataset.
For further analysis, 70% of the overall data is used for training and the remaining 30% is considered as test data. The plot above shows the division of the original dataset for both years combined.
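A minimal sketch of the chronological split (series stands for whichever target series is being modeled; time series data must not be shuffled):

# 70% of the observations, in time order, form the training set.
split_point = int(len(series) * 0.7)
train, test = series[:split_point], series[split_point:]
print(len(train), 'training samples,', len(test), 'test samples')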
5.1 ARIMA model:
• AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
• I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
• MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations. (7)
I have used the Auto-ARIMA procedure to determine the optimal parameters for training the ARIMA (Auto Regressive Integrated Moving Average) model, i.e. to find the optimal p, d, q values (a sketch of this search follows the list below).
• p is the parameter associated with the auto-regressive aspect of the model, which incorporates past values.
• d is the parameter associated with the integrated part of the model, which affects the amount of differencing to apply to a time series.
• q is the parameter associated with the moving average part of the model.
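A minimal sketch of this search using the pmdarima package (trace=True prints the AIC of each candidate order):

import pmdarima as pm

# Stepwise search over (p, d, q) combinations, scored by AIC.
auto_model = pm.auto_arima(train, seasonal=False,
                           suppress_warnings=True, trace=True)
print(auto_model.order)   # e.g. (1, 1, 1) for the bike-sharing series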
5.1.a Results of the ARIMA model when trained with optimal parameters for rental bikes data.
ARIMA Model Results
==============================================================================
Dep. Variable:                  D.cnt   No. Observations:                  509
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -106.847
Method:                       css-mle   S.D. of innovations              0.298
Date:                Sun, 16 Aug 2020   AIC                            221.694
Time:                        16:16:38   BIC                            238.624
Sample:                             1   HQIC                           228.332
===============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const          0.0033      0.002      1.878      0.060      -0.000       0.007
ar.L1.D.cnt    0.2540      0.049      5.203      0.000       0.158       0.350
ma.L1.D.cnt   -0.9027      0.019    -46.445      0.000      -0.941      -0.865
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            3.9363           +0.0000j            3.9363            0.0000
MA.1            1.1078           +0.0000j            1.1078            0.0000
-----------------------------------------------------------------------------
5.1.b Results of the ARIMA (SARIMAX) model when trained with optimal parameters for daily minimum temperature dataset.
5.1.c Results of the ARIMA (SARIMAX) model when trained with optimal parameters for TSI values- TCTE data
5.1.d Results of the ARIMA (SARIMAX) model when trained with optimal parameters for TSI values- SORCE data.
5.1.e Interpreting the results:
Method
This field tells us which calculation was used for estimating the parameters. There are various methods available, such as the Yule-Walker procedure, the method of moments, or maximum likelihood estimation (MLE). In this case, 'css-mle' stands for 'conditional sum of squares' and 'maximum likelihood estimation'. The statsmodels documentation tells us that "the conditional sum of squares likelihood is maximized and its values are used as starting values for the computation of the exact likelihood via the Kalman filter." MLE's role in the algorithm is to determine the values of the model parameters that maximize the probability that the model's results will be close to the observed (given) data.
Log-Likelihood
The log-likelihood value is a simpler representation of the maximum likelihood estimation. It is created by taking logs of the previous value. This value on its own is quite meaningless, but it can be helpful if we compare multiple models to each other. Generally speaking, the higher the log-likelihood, the better. However, it is not the only parameter to evaluate a model’s performance.
AIC
AIC stands for Akaike's Information Criterion. It is a metric that helps to evaluate the strength of a particular model. It takes in the results of maximum likelihood as well as the total number of given parameters. Since adding more parameters will always increase the value of the maximum likelihood, the AIC balances this by penalizing the number of parameters, hence searching for models with few parameters that fit the data well. Looking at the models with the lowest AIC is a good way to select the best one: the lower this value is, the better the model is performing.
BIC
BIC (Bayesian Information Criterion) is very similar to AIC, but also considers the number of rows in the dataset. Again, the lower the BIC, the better the model works. BIC induces a higher penalization for models with more parameters compared to AIC. (8)
5.1.f ARIMA model predictions on bike sharing test data:
The test RMSE for the ARIMA model on the bike-sharing data was 0.239, and the model was able to predict counts close to the actual values. Note that, to reduce the magnitude of the total count of rental bikes, I have taken logarithmic values, and in order to make the series stationary, the first-order difference of the series has been taken.
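A minimal sketch of the fit-and-forecast step, using the older statsmodels arima_model interface that produced the summary above (train/test hold the log-transformed counts):

import numpy as np
from statsmodels.tsa.arima_model import ARIMA   # statsmodels <= 0.12 API
from sklearn.metrics import mean_squared_error

model = ARIMA(train, order=(1, 1, 1))           # order from Auto-ARIMA
fit = model.fit(disp=0)
forecast, stderr, conf_int = fit.forecast(steps=len(test))
rmse = np.sqrt(mean_squared_error(test, forecast))
print('Test RMSE:', round(rmse, 3))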
5.1.g ARIMA model predictions on daily minimum temperature test data:
The test RMSE for the ARIMA model on the daily minimum temperature data was 2.207. The model performed quite decently and was able to predict values near the actual values. The p, q, d values for the model were selected as 3, 0, 1 respectively.
5.1.h ARIMA model predictions on Total Solar Irradiance Calibration Transfer Experiment (TCTE) TSI data.
Some predicted values from the ARIMA model for the TCTE test data are shown above. The RMSE value turned out to be 232.572, so it can be noted that the ARIMA model was not able to predict most of the test samples correctly.
5.1.i ARIMA model predictions on Solar Radiation and Climate Experiment (SORCE) TSI data.
The plot above shows the ARIMA model evaluation for the SORCE data. It can be seen that the model performed quite well, with predictions close to the actual values. For better visualization, I have plotted the first 400 samples of the test data and set the ordinate limits to 1350-1364. The abscissa marks each day of the month over the years covered by the test data. The RMSE value turned out to be 5.108.
5.2 Long Short-Term Memory Neural Network (LSTM)
5.2.a Data scaling
Some preprocessing has been done to standardize features by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as:
z = (x - u) / s, where u is the mean of the training samples (or zero if with_mean=False) and s is the standard deviation of the training samples (or one if with_std=False).
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform.
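A minimal sketch with scikit-learn's StandardScaler (fit on the training portion only, so that no test-set statistics leak into training):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Values reshaped to a single 2-D column, as scikit-learn expects.
train_scaled = scaler.fit_transform(train.values.reshape(-1, 1))
test_scaled = scaler.transform(test.values.reshape(-1, 1))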
5.2.b Training and fitting the LSTM model to time series data:
A Recurrent Neural Network (RNN) works like this: first, the inputs get transformed into machine-readable vectors. Then the RNN processes the sequence of vectors one by one. While processing, it passes the previous hidden state to the next step of the sequence. The hidden state acts as the neural network's memory: it holds information on previous data the network has seen. At each step, the input and the previous hidden state are combined to form a vector that carries information on the current and previous inputs. The vector goes through the tanh activation, and the output is the new hidden state, or the memory of the network.
An LSTM has a similar control flow as a recurrent neural network. It processes data passing on information as it propagates forward. The differences are the operations within the LSTM’s cells. These operations are used to allow the LSTM to keep or forget information. For implementing LSTM, Keras library has been used. (9)
5.2.c Fitting LSTM model to Bike sharing dataset.
The hidden layer has 120 units, and a final dense layer is then added to the network. Mean-squared error loss is used along with the Adam optimizer with a learning rate of 0.001.
For fitting the model to the training data, 30 epochs are performed with a batch size of 20. Below is the variation in train and test loss with respect to the number of epochs.
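A minimal sketch of this network in Keras (X_train must have the 3-D shape (samples, timesteps, features) that LSTMs expect; n_steps and n_features depend on how the input windows were built and are assumptions here):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(LSTM(120, input_shape=(n_steps, n_features)))  # hidden layer
model.add(Dense(1))                                      # final dense layer
model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))

history = model.fit(X_train, y_train, epochs=30, batch_size=20,
                    validation_data=(X_test, y_test))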
As can be seen, after almost 20 epochs the loss saturates: it no longer decreases substantially, though it retains a slight downward trend. So I have considered the model to be essentially done learning after 30 epochs.
5.2.d LSTM model evaluation for Bike sharing data:
In the above plot, the red curve shows the predicted values as obtained from the LSTM model. The model seems to be doing a great job of capturing the general pattern of the data. It fails to capture random fluctuations, which is a good thing as it avoids chances of overfitting. The RMSE value was 164.975.
5.2.e Fitting LSTM model to daily minimum temperature dataset.
The hidden layer has 128 units with relu activation, and a final dense layer is then added to the network. Mean-squared error loss is used along with the Adam optimizer. For fitting the model to the training data, 40 epochs are performed.
5.2.f Model evaluation for minimum temperature test data.
As seen from the above actual vs. predictions plot, the LSTM model was able to perform quite well for the given dataset. The RMSE value came out to about 2.203.
5.2.g Fitting LSTM model to TCTE data.
A train/test split was performed, which now includes the time steps in the input shape. After splitting the log-differenced series, the model was fitted to the data, with 64 hidden units and relu activation. A total of 30 epochs were performed for fitting the model.
5.2.h LSTM Model evaluation for TCTE data.
The above plot shows actual vs. predicted values from the LSTM model for the given TCTE data. The RMSE value came out to about 154.494.
5.2.i Fitting LSTM model to SORCE data.
A train/test split was performed, which now includes the time steps in the input shape. After splitting the TSI value series, the model was fitted to the data, with 64 hidden units and relu activation. A total of 30 epochs were performed for fitting the model.
5.2.j LSTM Model evaluation for SORCE data.
It can be observed from the above plot that the LSTM model was able to predict values for the test samples close to the actual TSI values for the SORCE data, with an RMSE value of 2.135.
5.3 Auto-Regression (AR) Model for time series
An autoregressive model is one in which a value from a time series is regressed on previous values from that same time series, for example y_t on y_{t−1}:
In this regression model, the response variable in the previous time period has become the predictor and the errors have our usual assumptions about errors in a simple linear regression model. The order of an autoregression is the number of immediately preceding values in the series that are used to predict the value at the present time. (10)
5.3.a Fitting auto-regression model on bike sharing data.
An AR(p) model is an autoregressive model where specific lagged values of y_t are used as predictor variables. Lags are where results from one time period affect following periods. The value for "p" is called the order.
y_t = δ + φ_1·y_{t−1} + φ_2·y_{t−2} + … + φ_p·y_{t−p} + A_t
Where:
y_{t−1}, y_{t−2}, …, y_{t−p} are the past series values (lags), A_t is white noise (i.e. randomness), and δ is defined by the following equation (with μ the process mean):
δ = (1 − φ_1 − φ_2 − … − φ_p)·μ
For the bike-sharing data, an AR model with 55 lags has been used, i.e. the outcome variable in this AR process at time t depends on the values from up to 55 preceding time periods.
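A minimal sketch with statsmodels' AutoReg (requires statsmodels 0.11 or later):

from statsmodels.tsa.ar_model import AutoReg

# AR(55): regress each value on its 55 preceding values.
ar_model = AutoReg(train, lags=55)
ar_fit = ar_model.fit()
preds = ar_fit.predict(start=len(train), end=len(train) + len(test) - 1)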
5.3.b AR model evaluation on bike sharing data.
With respect to the given time period, the total count of rental bikes has been plotted, with the red curve showing the predicted values from the AR model and the blue curve representing the actual test samples. The RMSE value equals 0.285.
5.3.c Fitting auto-regressive model on daily minimum temperature dataset.
For the daily minimum temperature data, an AR model with a lag value of 500 has been used, i.e. the outcome variable in this AR process at time t depends on the values from up to 500 preceding time periods.
5.3.d Auto-regressive model evaluation on daily minimum temperature dataset.
The RMSE value for the given dataset equals 2.866.
5.3.e Fitting Auto-Regression model on TCTE data.
In order to fit the AR model on the TCTE data, the log- and first-order-differenced TSI value series was first split into train and test parts. A lag value of 15 was then introduced to the time series.
5.3.f AR model evaluation on TCTE data.
The plot above shows some predicted TSI values obtained from the AR model. The root mean square error for the autoregression model turned out to be 151.056. As seen from the aforementioned predicted values for total solar irradiance, the model was able to predict values close to the actual values of the test set.
5.3.g Fitting AR model on SORCE data.
For the TSI values recorded by the Solar Radiation and Climate Experiment, an AR model with lag 2 has been used, i.e. the outcome variable in this AR process at time t depends on the values from the 2 preceding time periods.
5.3.h AR model evaluation on SORCE data.
From the above predictions obtained from the autoregression model fitted on the total solar irradiance values for the Solar Radiation and Climate Experiment, it can be seen that the model predicted nearby values initially, but later on, with more samples, it did not perform well in comparison to the other models used. The test RMSE was about 74.070. All prediction values are available in the project link provided.
5.4 Gaussian Process Regression (GPR) for time series
Gaussian process regression (GPR) is a nonparametric, Bayesian approach to regression that is making waves in the area of machine learning. GPR has several benefits, working well on small datasets and having the ability to provide uncertainty measurements on the predictions. (11)
In GPR, we first assume a Gaussian process prior, which can be specified using a mean function, m(x), and a covariance function, k(x, x′):
f(x) ~ GP(m(x), k(x, x′))
A popular kernel is the composition of the constant kernel with the radial basis function (RBF) kernel, which encodes the smoothness of functions (i.e. similarity of inputs in space corresponds to similarity of outputs); this combination is used for all datasets:
k(x, x′) = σ²·exp(−(x − x′)² / (2l²))
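A minimal sketch with scikit-learn (alpha is the label-noise variance; n_restarts_optimizer reruns the kernel hyperparameter optimization from several initializations because the log marginal likelihood is not convex):

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# Constant kernel (sigma squared) composed with an RBF kernel (length scale l).
kernel = ConstantKernel(constant_value=1.0) * RBF(length_scale=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.1,
                               n_restarts_optimizer=30, normalize_y=True)
gpr.fit(X_train, y_train)
y_pred, y_std = gpr.predict(X_test, return_std=True)  # mean and uncertainty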
5.4.a Fitting GPR model on bike sharing data.
A GPR model with following parameters has been used for given data.
{'k1': 0.955**2,
 'k2': RBF(length_scale=0.001),
 'k1__constant_value': 0.9118828521096602,
 'k1__constant_value_bounds': (0.1, 1000.0),
 'k2__length_scale': 0.0010000000000000002,
 'k2__length_scale_bounds': (0.001, 1000.0)}
Here, length scale corresponds to the value l used in the kernel equation and the constant value corresponds to the value of sigma in the kernel equation.
5.4.b GPR model evaluation for bike sharing data.
The RMSE value is equal to 0.303.
5.4.c Fitting GPR model on daily minimum temperature dataset.
A composition of the constant kernel and the radial basis function (RBF) kernel has been used for the GPR model, with hyperparameters: signal variance equal to 2.0 and length scale equal to 0.1. The variance of the i.i.d. noise on the labels is set to 0.1.
5.4.d GPR model evaluation on daily minimum temperature data.
As seen from the above plot, the GPR model was not able to predict the temperature values as accurately as the other regression models used for analysis. The RMSE value came out to about 3.820.
5.4.e Fitting GPR model on TCTE data.
The hyperparameters are set as: signal variance equal to 3.0 and length scale equal to 0.01. The variance of the i.i.d. noise on the labels is set to 0.1. A popular approach to tuning the hyperparameters of the covariance kernel function is to maximize the log marginal likelihood of the training data; a gradient-based optimizer is typically used for efficiency. Because the log marginal likelihood is not necessarily convex, multiple restarts of the optimizer with different initializations are used (n_restarts_optimizer=30).
5.4.f GPR model evaluation for TCTE data.
The RMSE value for the GPR model trained on the total solar irradiance values obtained from TCTE turned out to be 151.056. Note that the performance of the GPR model, with the hyperparameters defined above, is almost the same as that of the autoregression model.
5.4.g Training GPR model with SORCE train data.
A composition of the constant kernel and the radial basis function (RBF) kernel has been used for the GPR model, with hyperparameters: signal variance equal to 10.0 and length scale equal to 1.5. The variance of the i.i.d. noise on the labels is set to 0.1 and normalize_y is set to true, i.e. the constant mean function is the training data mean (it would be zero if False).
5.4.h GPR model evaluation on SORCE test data
The root mean square error for the GPR model, when trained on the SORCE data, is equal to 917.738.
5.5. Vector Auto-regression (VAR)
VAR models (vector autoregressive models) are used for multivariate time series. The structure is that each variable is a linear function of past lags of itself and past lags of the other variables.
In general, for a VAR(p) model, the first p lags of each variable in the system would be used as regression predictors for each variable.
If we suppose that we measure three different time series variables, denoted by x_{t,1}, x_{t,2} and x_{t,3}, then the vector autoregressive model of order 1, denoted VAR(1), is as follows:
x_{t,1} = α_1 + φ_11·x_{t−1,1} + φ_12·x_{t−1,2} + φ_13·x_{t−1,3} + w_{t,1}
x_{t,2} = α_2 + φ_21·x_{t−1,1} + φ_22·x_{t−1,2} + φ_23·x_{t−1,3} + w_{t,2}
x_{t,3} = α_3 + φ_31·x_{t−1,1} + φ_32·x_{t−1,2} + φ_33·x_{t−1,3} + w_{t,3}
Each variable is a linear function of the lag 1 values for all variables in the set. (12)
5.5.a Training VAR model on bike sharing dataset
For this dataset, we have 4 different time series variables, namely 'season', 'temperature', 'real-feel temp' and 'count'. The order of the VAR model is selected as 14, so for the given dataset I have used VAR(14). The order is selected on the basis of the minimum AIC value, which in this case is -17.427.
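A minimal sketch with statsmodels (train_df holds the four series as columns; ic='aic' selects the lag order by minimum AIC up to maxlags):

from statsmodels.tsa.api import VAR

var_model = VAR(train_df)
var_fit = var_model.fit(maxlags=14, ic='aic')
print(var_fit.k_ar)   # selected lag order

# Forecast len(test_df) steps ahead from the last k_ar observations.
forecast = var_fit.forecast(train_df.values[-var_fit.k_ar:],
                            steps=len(test_df))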
5.5.b VAR model evaluation on bike sharing test data.
The RMSE value is obtained as 0.300.
5.5.c Training VAR model on daily minimum temperature data, Melbourne, Australia.
For this dataset, we have 4 different time series variables, namely 'year', 'month', 'day' and 'temperature'. The order of the VAR model is selected as 300, so for the given dataset I have used VAR(300). The order is selected on the basis of the minimum AIC value.
5.5.d VAR model evaluation on test set of daily minimum temperature dataset.
The above plot shows the predicted values for daily temperature obtained from the VAR model defined above. The root mean square error value is equal to 2.724.
5.5.e Training VAR model on TCTE data.
For the TSI values recorded in the TCTE data, we have 4 different time series variables, namely 'year', 'month', 'day' and 'total solar irradiance' (here, I have used the first-order-differenced TSI values). The order of the VAR model is selected as 13, so for the given dataset I have used VAR(13). The order is selected on the basis of the minimum AIC value, which in this case was found to be -2.978, as shown below.
5.5.f VAR model evaluation on test set of TCTE data.
The RMSE value is found to be 151.072. It can be observed that for the total solar irradiance values obtained from the TCTE data, the VAR model performed similarly to the autoregression and Gaussian process regression models.
5.5.g Training VAR model on SORCE data.
For the TSI values recorded in the SORCE data, we have 4 different time series variables, namely 'year', 'month', 'day' and 'total solar irradiance'. The order of the VAR model is selected as 32, so for this dataset I have used VAR(32). The order is selected on the basis of the minimum AIC value, which in this case was found to be -6.899.
5.5.h VAR model evaluation when trained on SORCE data.
The RMSE value is found to be 138.423. It can be seen that for the total solar irradiance values obtained from the SORCE data, the VAR model could not perform well in comparison to the other time series regression models used.
6. Results
6.1 Bike Sharing Dataset
As per the RMSE value calculated for each model, the ARIMA model showed the minimum error among all the ML models, with a root mean square error value of 0.239. So, for this dataset we can say that the ARIMA model works best.
6.2 Daily Minimum Temperature data, Melbourne, Australia
From the above comparison table, it can be noted that for the minimum temperature dataset, the LSTM model (128 hidden units, relu activation, 40 epochs) performs best with an RMSE value of 2.203, followed by the ARIMA model with an RMSE value of 2.207.
6.3 Total Solar Irradiance Calibration Transfer Experiment (TCTE) data
From the above table, it can be seen that for the TCTE data, the AR model and the GPR model performed best among all the time series regression models, each with an RMSE value of 151.056.
6.4 Solar Radiation and Climate Experiment (SORCE) data
As seen from the above table, for the SORCE data, the LSTM model (64 hidden units, relu activation, 30 epochs) performed best with an RMSE value of 2.135.
7. Conclusion
In this report, I have performed data analysis on four different time series datasets and then used machine learning algorithms to predict the target variable by learning from past data. The analysis combines the strengths of ARIMA, LSTM, AR, VAR and GPR models for enhanced expressiveness and robust prediction. These models were then compared based on their root mean square error values.
8. Acknowledgements
This project was successfully completed under the guidance of Prof. Jason T. L. Wang, New Jersey Institute of Technology for the course CS 700B.
9. Bibliography
1. Hadi Fanaee-T. Bike Sharing Data Dataset. UCI Machine Learning Repository. [Online] https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset#.
2. Repository, UCI Machine Learning. Bike Sharing Data Dataset. UCI Machine Learning Repository. [Online] https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset#.
3. Hadi Fanaee-T. UCI Machine Learning Repository. Bike Sharing Data dataset- Description File. [Online] https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset#.
4. Jason Brownlee. Sensitivity Analysis of History Size to Forecast Skill with ARIMA in Python. Machine Learning Mastery. [Online] https://machinelearningmastery.com/sensitivity-analysis-history-size-forecast-skill-arima-python/.
5. University of Colorado - Laboratory for Atmospheric and Space Physics. University of Colorado Boulder. [Online] https://lasp.colorado.edu/home/sorce/data/.
6. statsmodels. statsmodels.tsa.stattools.adfuller. [Online] https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html.
7. Brownlee, Jason. How to Create an ARIMA Model for Time Series Forecasting in Python. Machine Learning Mastery. [Online] https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/.
8. Holicka, Nikol. Interpreting ARMA model results in Statsmodels for absolute beginners. Analytics Vidhya. [Online] https://medium.com/analytics-vidhya/interpreting-arma-model-results-in-statsmodels-for-absolute-beginners-a4d22253ad1c.
9. Venelin Valkov. Time Series Forecasting with LSTMs using TensorFlow 2 and Keras in Python. Medium.com. [Online] https://towardsdatascience.com/time-series-forecasting-with-lstms-using-tensorflow-2-and-keras-in-python-6ceee9c6c651.
10. Penn State- Eberly college of science. Regression methods. Penn state- Eberly college of science. [Online] https://online.stat.psu.edu/stat501/lesson/14/14.1.
11. Hilarie Sit. Quick Start to Gaussian Process Regression. Medium.com. [Online] https://towardsdatascience.com/quick-start-to-gaussian-process-regression-36d838810319.
12. Penn state- Eberly college of science. Applied time series analysis. Penn state- Eberly college of science. [Online] https://online.stat.psu.edu/stat510/lesson/11/11.2.