We began this series with a deep dive into some of the most well-known statistical models used for time series forecasting, and how effectively they perform. (Read Part 1 here.) We saw that while the results were adequate for short-term forecasts, any slight increase in the forecasting horizon caused not only a decrease in performance but also an increase in training time. Above all, these models weren't easy to tune. Along with a good understanding of the data, one would also need a significant statistical background to optimize the results produced by the likes of ARIMA, SARIMA, SARIMAX, or Prophet.
In this second part, we'll look at two of the most commonly used machine learning and deep learning algorithms for time series forecasting: LSTMs and LightGBM. We won't do any exploratory data analysis; that was done in Part 1, so I'll refer you to that article if you're interested in seeing what the Bike Share Demand time series we're using looks like.
Note: The entire notebook, with all the code, can be found in this public repository.
All models will be trained and tested on a machine with 128 GB of memory and 32 cores, in a Databricks environment.
We'll start with a widely used machine learning algorithm, not only for time series but for tabular data in general: LightGBM. LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It's designed to be distributed and efficient, with the advantages of being fast, scalable, and accurate. We'll use the LGBMRegressor class to forecast, but before doing so, we need to preprocess our data into a form LightGBM can work with. Specifically, we'll:
1. Convert the data so that it can be treated as a supervised learning problem. This is done by adding columns for the observations at times t-1, t-2, ..., t-4, and using these lagged observations to predict demand at time t. Note that we could have looked further into the past, e.g. t-10, but chose to stick with four time steps.
2. Standardize the data before feeding it to the model.
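The lag-feature construction in step 1 can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `demand` values and the `make_supervised` helper are made up for the example, and in practice the series would come from the Bike Share Demand dataset.

```python
import pandas as pd

def make_supervised(series, n_lags=4):
    """Turn a univariate series into a lagged feature table:
    columns t-1 .. t-n_lags are used to predict the value at time t."""
    df = pd.DataFrame({"t": series})
    for lag in range(1, n_lags + 1):
        df[f"t-{lag}"] = df["t"].shift(lag)
    # the first n_lags rows have incomplete history, so drop them
    return df.dropna()

# illustrative demand values, one per hour
demand = pd.Series([10, 12, 15, 14, 18, 20, 19, 23], dtype=float)
supervised = make_supervised(demand, n_lags=4)
X, y = supervised.drop(columns="t"), supervised["t"]
```

The resulting `X` and `y` are what would then be standardized and passed to `LGBMRegressor.fit`.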
Figure 1: LightGBM forecasting results
Very interesting results. Not only is the training time extremely low, but the RMSLE is also low for both short- and long-term forecasting. Of course, an error of 0.44 for a forecasting horizon of 300 hours isn't great, but compared to the earlier models, LightGBM is well ahead. As for the number of hyperparameters, LightGBM has over 20 possible parameters that can be tuned.
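For reference, the RMSLE metric used throughout this series penalizes relative rather than absolute errors, which suits count data like hourly demand. A straightforward implementation:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error: RMSE computed on
    log(1 + y), so over- and under-prediction are compared on
    a relative scale."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))
```

A perfect forecast gives an RMSLE of 0, and the metric is undefined for negative values, which is fine for demand counts.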
The ones we chose are described in LightGBM's documentation as the most important. Selecting these parameters wasn't difficult; we simply followed the documentation's suggestions for setting them and obtained the results in the graphs above. Also note that the amount of data preparation needed before the data can be fed to the model is considerably greater than it was for all the other models.
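As a sketch of what such a configuration looks like, here are some of the parameters LightGBM's tuning guide highlights as most impactful. The values below are illustrative defaults, not the ones used for the results above:

```python
# Parameters highlighted in LightGBM's parameters-tuning guide;
# values here are illustrative, not the article's tuned settings.
params = {
    "num_leaves": 31,         # main control on tree complexity
    "max_depth": -1,          # -1 means no explicit depth limit
    "learning_rate": 0.1,     # shrinkage applied at each boosting step
    "n_estimators": 200,      # number of boosting rounds
    "min_child_samples": 20,  # minimum data per leaf, guards overfitting
}
# model = lightgbm.LGBMRegressor(**params)  # then model.fit(X, y)
```

Following the documentation's rule of thumb, `num_leaves` should stay well below 2^max_depth when a depth limit is set, to avoid overfitting.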
The final model we'll use to forecast is a type of recurrent neural network (RNN): the long short-term memory (LSTM) network. The beauty of LSTMs lies in their ability to maintain a memory. While feed-forward neural networks are great for tasks such as image classification, they're limited in their ability to process sequential data. With no notion of time, feed-forward neural networks aren't the best deep learning models for time series forecasting. RNNs, and LSTMs in particular, do take into account the time and order in which data is presented, making them great candidates for time series forecasting.
The architecture we'll use is very basic, and includes:
1. Three LSTM layers with 100 nodes each
2. An output layer with a ReLU activation
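The two-item architecture above can be sketched in Keras as follows. This is an assumed reconstruction, not the notebook's exact code: the input shape of (4, 1) reflects the four lag steps described earlier, and the optimizer and loss are plausible choices rather than the article's settings.

```python
from tensorflow.keras import layers, models

# Three stacked LSTM layers of 100 units, then a single-unit
# output layer with ReLU activation (demand is non-negative).
model = models.Sequential([
    layers.Input(shape=(4, 1)),              # 4 lag steps, 1 feature each
    layers.LSTM(100, return_sequences=True),  # pass full sequence onward
    layers.LSTM(100, return_sequences=True),
    layers.LSTM(100),                         # last layer returns a vector
    layers.Dense(1, activation="relu"),
])
model.compile(optimizer="adam", loss="mse")
```

Note that `return_sequences=True` is required on all but the last LSTM layer so that each layer feeds a full sequence to the next.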
We also preprocess our data in the same way we did for LightGBM, to get it ready for use by an LSTM.
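One extra step the LSTM needs beyond the LightGBM preparation is reshaping: Keras-style recurrent layers expect 3-D input of shape (samples, timesteps, features). A minimal sketch with made-up lag values:

```python
import numpy as np

# Lagged feature matrix as built for LightGBM: each row holds the
# t-4 .. t-1 observations for one sample (values are illustrative).
X = np.array([[10, 12, 15, 14],
              [12, 15, 14, 18],
              [15, 14, 18, 20]], dtype=np.float32)

# An LSTM expects (samples, timesteps, features): treat each lag
# as one timestep carrying a single feature.
X_lstm = X.reshape((X.shape[0], X.shape[1], 1))
```

This reshape changes only the layout, not the values, so the same standardized lag features serve both models.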
Figure 8: LSTM forecasting results
The training time is low, and the LSTM seems to achieve low forecasting errors for both short- and long-term forecasts. What stands out most is how our network performs compared to LightGBM. The latter had a much shorter training time while achieving better forecasting results across all time horizons. While the data preprocessing was the same, hyperparameter tuning was much simpler for LightGBM than for the LSTM. Whereas the suggestions in LightGBM's documentation worked reasonably well, selecting the right architecture, as well as the right number of epochs, batch size, and learning rate, was a much more difficult task when working with LSTMs, one that involved many iterations and plenty of trial and error.
Now, to answer the question: is deep learning needed for time series forecasting? Of course, your decision should be based on your business's goals, resources, and expertise. However, despite the current trend in both academia and industry to rely primarily on deep models, they aren't always the answer. For the bike sharing demand dataset, we saw that LightGBM outperformed the LSTM. While deep models have become easier to create and use, they still come with their fair share of complexity. As your dataset becomes more complicated, so too will your neural network, leading to problems with scalability, explainability, and efficiency.
As for the autoregressive models, they were good for short-term forecasting but struggled as we increased the forecasting horizon. More complicated models such as SARIMA and SARIMAX came at the cost of higher training time, with little benefit to forecasting accuracy. They also involve more parameters, which means a greater need for statistical knowledge. Prophet's training time was significantly better than SARIMAX's, with comparable forecasting results, so it can serve as a substitute for SARIMAX. This doesn't mean, however, that the likes of ARIMA, SARIMA, and SARIMAX should be completely ignored. For smaller datasets that aren't too complex, these autoregressive models can be very useful.
I leave you with the table below, which summarizes the results we obtained in this article.
Table: Summary of results