Author(s): Yashveer Singh Sohi

Data Visualization

Photo by Chris Liverani on Unsplash

In this series of articles, the S&P 500 Market Index is analyzed using popular statistical models: SARIMA (Seasonal Autoregressive Integrated Moving Average) and GARCH (Generalized AutoRegressive Conditional Heteroskedasticity).

In the first part, the series was retrieved using the yfinance API in Python. It was cleaned and used to derive the S&P 500 Returns (percent change in successive prices) and Volatility (magnitude of returns). The second part used a variety of time series exploration methods to extract insights about characteristics of the series such as trend, seasonality and stationarity. These insights were used to explore the SARIMA and GARCH classes of models in the third and fourth parts. Links to these parts are at the end of this article.

In this part, the two models introduced previously (SARIMA and GARCH) are combined to build predictions and effective confidence intervals for S&P 500 Returns. This article uses code from the Returns Models/ARMA GARCH for SPX returns.ipynb notebook in this repository.

Table of Contents
Importing Data
Train-Test Split
Parameter Estimation for ARMA Model
ARMA Model of Returns
ARMA Model Residuals
GARCH Model of ARMA Residuals
Predictions of ARMA and GARCH models
Conclusions
Links to other parts of the series

Importing Data

You can refer to part 1 for the complete data preparation (linked at the end), or you can download data.csv from this repository.

https://medium.com/media/700bcc1ce5d188851dc14f292716d9ee/href

Output of the previous code cell showing the first 5 rows of the dataset

Since this is the same code used in the previous parts of this series, the individual lines are not explained in detail here for brevity.

Train-Test Split

The data is now divided into two sets: train and test. All observations on or after 2019-01-01 form the test set, and all observations before that date form the train set.
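A date-based split like the one described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the series here is random numbers on a business-day index, and the names returns, split_date, train and test are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the returns series (random numbers, NOT S&P 500 data)
dates = pd.date_range("2018-12-24", "2019-01-08", freq="B")
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 1, len(dates)), index=dates, name="spx_ret")

# Everything before the split date is training data;
# everything on or after it is test data.
split_date = pd.Timestamp("2019-01-01")
train = returns.loc[returns.index < split_date]
test = returns.loc[returns.index >= split_date]

print("Train shape:", train.shape)
print("Test shape:", test.shape)
```

Splitting on a date rather than at random keeps the test set strictly in the future of the train set, which is what a forecasting model will face in practice.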
https://medium.com/media/11740302b8e44b550cc9d91211c2f8d1/href

Output of the previous code cell showing the shape of the training and testing sets

Parameter Estimation for ARMA Model

The ARMA model is a subset of the ARIMA model discussed previously in this series. Its parameters are written ARMA(p, q). As with ARIMA, the number of significant lags in the PACF plot indicates the order p, which controls the effect of past values on the present value. The number of significant lags in the ACF plot indicates the order q, which controls the effect of past residuals on the present value.

https://medium.com/media/64cf54fc493921f8c15c11372bfbaed0/href

ACF and PACF plots for S&P 500 Returns

The plot_acf and plot_pacf functions from the statsmodels.graphics.tsaplots module are used to draw the ACF and PACF plots for S&P 500 Returns. Both plots show significant early lags, after which the significance drops abruptly and then rises again. Thus, to keep the model simple, it is reasonable to set the initial parameters as:

p = 1, or p = 2
q = 1, or q = 2

Fitting ARMA Model on Returns

Let's build the ARMA(1, 1) model on S&P 500 Returns.

Note: Before fitting any model in the SARIMA class, it is important to verify that the input series is stationary. The summary statistics of a stationary series are stable over time, which means the underlying process that generated the data remains the same over time. Intuitively, a series can only be modelled if the process that generated it does not keep changing. The stationarity of the series spx_ret was tested in previous parts (part 2 and part 3) of this series using the adfuller test from statsmodels.tsa.stattools.

https://medium.com/media/8cbc28b734d5f58ca4ae6f3d97171ae4/href

Output of the previous code cell showing the summary table for ARMA(1, 1) on S&P 500 Returns

The SARIMAX class in the statsmodels.tsa.statespace.sarimax module can be used to fit any member of the SARIMAX family of models.
To define the ARMA(1, 1) model, the series spx_ret and the order, order = (1, 0, 1), are passed to SARIMAX. The model is then trained by calling its fit method. The summary method of the fitted model prints the summary table shown in the image. This table shows that all the coefficients in the model are significant.

ARMA Model Predictions & Confidence Intervals

The ARMA(1, 1) model is used to predict the returns of the test set. The confidence intervals are generated using the same model.

https://medium.com/media/272e1cb4c6c961eba5e7fb1f3915a5d5/href

ARMA(1, 1) model Predictions (in red) and Confidence Intervals (in green) plotted against Actual Returns (in blue)

The get_forecast method is used to build a forecast object, from which the confidence intervals can be derived using the conf_int method. The predict method can be used to get predictions for the test set. To measure how accurate the predicted returns are relative to the actual ones, the RMSE (Root Mean Squared Error) metric is used. To check the accuracy of the model visually, the predictions are plotted with the confidence intervals against the actual test returns.

The plot lets the predictions be checked visually; in some cases they are accurate. However, the confidence intervals give no indication of how the predictions will perform in different time periods. Sometimes the confidence intervals are not conservative enough and the returns exceed their limits. At other times the intervals are too conservative. This issue is addressed in the following section, where we analyze the residuals and attempt to predict the times when the ARMA predictions are off.

ARMA Model Residuals

This section shows how the residuals are plotted and explores some of their properties.
https://medium.com/media/7a09e04798b97469ffd195473f14d71b/href

Plot of the Residuals of ARMA(1, 1) on S&P 500 Returns

The residuals generated by the ARMA(1, 1) model can be accessed using the resid attribute of the fitted model. The plot clearly shows the volatility clustering phenomenon: if the series has high volatility at one time step, it tends to show high volatility in the neighbouring time steps as well. This phenomenon can be clearly seen in the periods between 2004 and …, and in the periods between … (periods with high volatility). The GARCH model can successfully be used to model series that exhibit such clustering (as shown in part 4, linked at the end of this article). Next, we will estimate the parameters required to fit a GARCH model on the residuals of ARMA(1, 1).

GARCH Parameter Estimation

The GARCH model has two parameters, GARCH(p, q). These parameters can be estimated using the significant lags of the PACF plot. The code is similar to that used earlier to create the PACF plot; the only adjustment needed is the series that is passed in. In this instance, the series passed is model_results.resid.

PACF for ARMA(1, 1) Residuals

This plot shows that no

Statistical Forecasting for Time Series Data Part 5: ARMA+GARCH model for Time Series Forecasting
