# Case Studies

If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions will remove even that requirement by providing fully automated forecasting systems. The platform is designed so that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available for experts. In our case studies, we often present screenshots of the program so that you can easily replicate the results.

Did you know you can take a screenshot of a TSL program window? Press Ctrl + p to open a window that lets you save a screenshot of the program. The TSL window should be located on your main monitor.

Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.

# Regression with ARMA errors

Date: July 05, 2022

Software: Time Series Lab - Home Edition

Topics: Linear regression with ARMA errors

#### Linear regression with ARMA errors

TSL makes extensive use of models in state space form because of the many advantages this brings. Once a model has been put in state space form, the way is open for the application of a number of important algorithms. At the centre of these is the Kalman filter. The Kalman filter is a recursive procedure for computing the optimal estimator of the state vector at time $t$, based on the information available at time $t$; see also Harvey (1989).

The dynamics of state space models come from stochastic components; see also Appendix B of the manual. If the components are deterministic (meaning the error terms of the state components have variance zero and therefore disappear from the equation), the Kalman filter is equivalent to the ordinary least squares (OLS) recursions. This means that if you include only explanatory variables in TSL and no time-varying components, you are estimating a standard regression model as given by
\[
y_i = \alpha + X_i \beta + \varepsilon_i, \qquad i = 1,\ldots,n.
\]
The estimates of $\beta$, denoted by $\hat{\beta}$, are quickly found with the equation
\[
\hat{\beta} = (X'X)^{-1}X'y
\]
and you normally would not use the Kalman filter to find $\hat{\beta}$. However, the results from the Kalman filter should be exactly the same as the $\hat{\beta}$ from above and it is illustrative to see the results from the static regression model in TSL. These results will later be extended with ARMA(p,q) errors.
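The equivalence can also be checked numerically outside TSL. The following is a minimal sketch, not TSL's implementation: it uses synthetic data, NumPy, and a large prior variance as a stand-in for a diffuse initialisation (all of these are assumptions made for illustration). The Kalman filter is run with a constant state vector, so the recursion reduces to recursive least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only (not the El Nino dataset).
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([0.5, -1.0, 0.25]) + rng.normal(scale=0.3, size=n)

# A column of ones plays the role of the fixed Level (the constant alpha).
Z = np.column_stack([np.ones(n), X])

# Batch OLS: solve the normal equations (Z'Z) b = Z'y.
beta_ols = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Kalman filter with a constant state vector (zero state variance).
# Assumption: a large prior variance approximates a diffuse initialisation.
a = np.zeros(k + 1)          # state estimate: (alpha, beta)
P = 1e6 * np.eye(k + 1)      # state covariance
sigma2 = 1.0                 # observation variance (hypothetical value)
for t in range(n):
    z = Z[t]
    v = y[t] - z @ a             # one-step-ahead prediction error
    F = z @ P @ z + sigma2       # prediction error variance
    K = P @ z / F                # Kalman gain
    a = a + K * v                # updated state estimate
    P = P - np.outer(K, z @ P)   # updated state covariance

max_abs_diff = np.max(np.abs(a - beta_ols))
print("OLS:   ", beta_ols)
print("Kalman:", a)
print("max abs difference:", max_abs_diff)
```

With a near-diffuse prior the final filtered state and the OLS solution should agree up to a small numerical tolerance; an exact diffuse initialisation would remove the remaining dependence on the prior.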

#### Regression model in TSL

Load the El Nino dataset, which can be found in the data folder located in the install folder of TSL. Select the EN3.4 series from the loaded dataset. Go to the Build your own model page, switch on the **Explanatory variables**, and select all variables except the **Date** variable from the pop-up window. If you need to include a constant in the regression model, you can add a column of ones to the dataset, but it is more convenient to add a fixed Level component to the model. A fixed Level component is in this scenario exactly the same as the constant $\alpha$ in the standard regression model above. Go to the Estimation page and estimate the model. The result should be:

```
Regression coefficients:
Beta                       Value  Std.Err    t-stat        Prob
beta_RB                   0.1591   0.0510    3.1229      0.0019
beta_WPAC                -0.1293   0.1998   -0.6470      0.5180
beta_WPAC2                0.9271   0.2026    4.5750  6.4481e-06
beta_WPAC3               -0.1793   0.1903   -0.9419      0.3468
beta_WPAC4               -0.3629   0.1686   -2.1525      0.0320
beta_50fin                0.5637   0.2164    2.6046      0.0096
beta_100cold             -0.0055   0.0256   -0.2166      0.8286
beta_100fin1             -0.1267   0.0915   -1.3848      0.1669
beta_100fin2             -0.4103   0.0824   -4.9781  9.7349e-07
beta_150fin1             -0.2586   0.1310   -1.9738      0.0491
beta_150fin2              0.1432   0.0693    2.0656      0.0395
beta_200fin1              0.2533   0.1456    1.7403      0.0826
beta_200fin2             -0.0347   0.1005   -0.3454      0.7300
beta_250fin1              0.2143   0.1494    1.4346      0.1522
beta_250fin2             -0.0651   0.1389   -0.4688      0.6395
beta_300fin1             -0.1427   0.2210   -0.6458      0.5188
beta_300fin2             -0.1130   0.2308   -0.4894      0.6248
beta_400fin1             -0.3206   0.2428   -1.3202      0.1876
beta_400fin2              0.1122   0.2680    0.4186      0.6758
beta_500fin1             -0.6584   0.2839   -2.3193      0.0209
beta_500fin2             -0.3397   0.2786   -1.2193      0.2235
beta_wnd160.200_0.10    -33.6202   2.8740  -11.6982      0.0000
beta_wnd180.220_-4.4     86.0116   5.3723   16.0103      0.0000
beta_wnd180.210_-10.0   -16.4366   4.5229   -3.6341  3.1714e-04

State vector at period 2015-11-01:
Component                  Value  Std.Err    t-stat        Prob
Level                      28.02    3.808     7.358  1.1555e-12
```

which is exactly equal to the OLS estimate $\hat{\beta}$ as we would calculate it from $\hat{\beta} = (X'X)^{-1}X'y$.

But what do Predicting, Filtering, and Smoothing mean in the case of the regression model we just estimated? Remember that, at time point $t$, Predicting uses the data up to time $t-1$, Filtering uses the data up to time $t$, and Smoothing uses all the data. If we plot the fixed level (the constant $\alpha$) for Predicting, Filtering, and Smoothing, we see that Smoothing gives a straight line, while Predicting and Filtering build the level up over time towards the end of the dataset; see also the figure below. By this logic, the estimates for Filtering and Smoothing should coincide at time $t=T$, when all data is used. The bottom panel of the figure shows that this is indeed the case.

#### Predicted, Filtered, and Smoothed constant in regression model
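The difference between the three estimates can be illustrated with a deliberately simplified sketch. It assumes a model with only a constant, $y_t = \alpha + \varepsilon_t$, and a diffuse prior on $\alpha$; in that special case the Kalman gain at time $t$ is $1/t$ and the filtered level is the running mean. The function name and the data values are hypothetical, and the regression model above additionally estimates $\beta$ jointly with the level.

```python
def fixed_level_estimates(y):
    """Predicted, filtered and smoothed estimates of a fixed level.

    Assumes y_t = alpha + eps_t with a diffuse prior on alpha, so the
    Kalman gain at time t equals 1/t.
    """
    filtered = []
    a = 0.0
    for t, obs in enumerate(y, start=1):
        gain = 1.0 / t              # Kalman gain for a constant level
        a = a + gain * (obs - a)    # update with the prediction error
        filtered.append(a)          # uses data up to and including time t
    # Predicting at time t uses data up to t-1; undefined at t=1 (diffuse).
    predicted = [None] + filtered[:-1]
    # The level is constant, so smoothing gives the full-sample estimate everywhere.
    smoothed = [filtered[-1]] * len(y)
    return predicted, filtered, smoothed


y = [27.1, 26.4, 28.0, 27.5, 26.9, 27.7]   # hypothetical observations
predicted, filtered, smoothed = fixed_level_estimates(y)
for t, (p, f, s) in enumerate(zip(predicted, filtered, smoothed), start=1):
    print(t, p, f, s)
```

The smoothed column is a flat line, the filtered column converges towards it, and the two coincide at the last time point, which mirrors the pattern described for the figure.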

If you would like to end up with a set of only significant variables, based on a user-specified t-stat bound, you can select the **Automatically** option for the explanatory variables on the Build your own model page. Estimating the model leads to the following estimates.

```
Regression coefficients:
Beta                       Value  Std.Err    t-stat        Prob
beta_RB                   0.1692   0.0474     3.570  4.0098e-04
beta_WPAC2                0.9083   0.1476     6.153  1.8695e-09
beta_WPAC4               -0.5950   0.1118    -5.322  1.7235e-07
beta_50fin                0.4916   0.1719     2.860      0.0045
beta_100fin1             -0.1781   0.0533    -3.344  9.0407e-04
beta_100fin2             -0.3515   0.0643    -5.466  8.1630e-08
beta_150fin2              0.0894   0.0320     2.795      0.0054
beta_500fin1             -0.9462   0.2136    -4.430  1.2228e-05
beta_wnd160.200_0.10    -35.4201   2.2560   -15.701      0.0000
beta_wnd180.220_-4.4     92.1050   4.6275    19.904      0.0000
beta_wnd180.210_-10.0   -20.3677   3.8036    -5.355  1.4566e-07

State vector at period 2015-11-01:
Component                  Value  Std.Err    t-stat        Prob
Level                      24.19    2.799     8.640  2.2204e-16
```

from which we can see that all variables are significant with an absolute t-stat of at least 2.795. The following figure shows the contribution of all X's combined $(X \hat{\beta})$ in the top panel and the individual contributions of the X's in a sandgraph in the bottom panel.

#### Contribution of all significant X's in a Sandgraph
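For readers who want to mimic the selection step and the plotted contributions outside TSL, the sketch below may help. How TSL selects variables internally is not described here, so plain backward elimination is used as a stand-in: the regressor with the smallest absolute t-stat is dropped until all remaining ones reach the bound. The function names, the synthetic data, and the bound of 2.0 are assumptions made for illustration.

```python
import numpy as np

def ols_with_tstats(Z, y):
    """OLS coefficients and t-statistics for design matrix Z."""
    n, k = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ (Z.T @ y)
    resid = y - Z @ beta
    s2 = resid @ resid / (n - k)            # residual variance
    se = np.sqrt(s2 * np.diag(ZtZ_inv))     # standard errors
    return beta, beta / se

def backward_select(X, y, t_bound=2.0):
    """Drop the weakest regressor until every |t-stat| >= t_bound.

    The constant is always kept. Returns the kept column indices and
    the coefficients (constant first) of the final model.
    """
    keep = list(range(X.shape[1]))
    while True:
        Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in keep])
        beta, tstat = ols_with_tstats(Z, y)
        if not keep:
            break
        abs_t = np.abs(tstat[1:])           # skip the constant
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_bound:
            break
        del keep[worst]
    return keep, beta

# Synthetic data: only columns 0, 2 and 4 truly matter.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 6))
y = 1.0 + X[:, 0] - 0.8 * X[:, 2] + 0.6 * X[:, 4] + rng.normal(scale=0.5, size=n)

keep, beta = backward_select(X, y, t_bound=2.0)
contributions = X[:, keep] * beta[1:]            # column j holds x_j * beta_j
total_contribution = contributions.sum(axis=1)   # X beta-hat, constant excluded
print("kept columns:", keep)
```

Stacking the columns of `contributions` gives the kind of decomposition shown in the bottom panel, while `total_contribution` corresponds to $X \hat{\beta}$ in the top panel.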

#### Regression model with ARMA(p,q) errors

The ACF plot of the predicted residuals shows that autocorrelation remains at the first and second lags. We can address this by introducing ARMA(p,q) errors in the model. Select an additional ARMA(2,1) model on the Build your own model page, select all variables except the **Date** variable, set **Explanatory variables** to **Automatically**, and estimate the model. The output is:

```
Variance of disturbances:
Variance type              Value  q-ratio
Level variance            0.0000        0
ARMA variance             0.0769        1

ARMA properties:
Parameter type             Value
Unconditional variance    1.0645
AR2 phi1                  1.6105
AR2 phi2                 -0.7024
MA1 theta1               -0.1509

Regression coefficients:
Beta                       Value  Std.Err    t-stat        Prob
beta_WPAC3                0.6416   0.1025     6.258  1.0117e-09
beta_WPAC4               -0.2615   0.0889    -2.941      0.0035
beta_150fin2             -0.1771   0.0412    -4.304  2.1204e-05
beta_200fin1              0.1417   0.0423     3.345  9.0050e-04
beta_250fin2              0.3072   0.0981     3.133      0.0019
beta_wnd160.200_0.10     -5.5813   1.4499    -3.849  1.3797e-04
beta_wnd180.220_-4.4     11.9443   2.6044     4.586  6.0621e-06
beta_wnd180.210_-10.0    -6.6402   2.0543    -3.232      0.0013

State vector at period 2015-11-01:
Component                  Value  Std.Err    t-stat        Prob
Level                     14.058   1.5445     9.102           0
ARMA(p,q)                  1.990   0.1953    10.189           0
```
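As a consistency check on the ARMA properties in this output, the unconditional variance can be recomputed from the reported parameters. The sketch below assumes the sign convention $u_t = \phi_1 u_{t-1} + \phi_2 u_{t-2} + \eta_t + \theta_1 \eta_{t-1}$ and sums the squared weights of the MA($\infty$) representation; both the convention and the function name are assumptions, not something taken from TSL.

```python
def arma_unconditional_variance(phi, theta, sigma2, n_terms=2000):
    """Variance of a stationary ARMA(p,q) process.

    Assumed convention:
        u_t = sum_i phi_i * u_{t-i} + eta_t + sum_j theta_j * eta_{t-j}
    The variance equals sigma2 times the sum of squared psi weights of
    the MA(infinity) representation, truncated at n_terms.
    """
    psi = []
    for j in range(n_terms):
        value = 1.0 if j == 0 else 0.0
        if 1 <= j <= len(theta):
            value += theta[j - 1]               # direct MA contribution
        for i, phi_i in enumerate(phi, start=1):
            if j - i >= 0:
                value += phi_i * psi[j - i]     # AR recursion on earlier weights
        psi.append(value)
    return sigma2 * sum(w * w for w in psi)


# Parameters as printed in the TSL output (rounded to four decimals).
variance = arma_unconditional_variance(
    phi=[1.6105, -0.7024], theta=[-0.1509], sigma2=0.0769
)
print(variance)
```

Under this convention the result should land close to the reported unconditional variance of 1.0645; small deviations are to be expected because the printed parameters are rounded.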

The ACF shows no residual correlation in the first lags, but the 12th lag has a large spike. This is due to a missing seasonal component. Possible remedies are adding lagged explanatory variables or a monthly seasonal component.
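A residual ACF of this kind can be reproduced with a few lines of code. The sketch below uses simulated residuals with a seasonal dependence at lag 12 (an assumption standing in for the TSL residuals) and flags lags whose sample autocorrelation exceeds the rule-of-thumb bound $2/\sqrt{n}$; the bounds drawn in TSL's plots may be computed differently.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)                # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - lag] for t in range(lag, n)) / c0
        for lag in range(1, max_lag + 1)
    ]


# Simulated monthly residuals with leftover seasonality:
# e_t = 0.6 * e_{t-12} + u_t (hypothetical process).
random.seed(1)
n = 240
resid = []
for t in range(n):
    prev = resid[t - 12] if t >= 12 else 0.0
    resid.append(0.6 * prev + random.gauss(0, 1))

acf = sample_acf(resid, max_lag=24)
bound = 2 / math.sqrt(n)                        # approximate 95% bound
flagged = [lag for lag, r in enumerate(acf, start=1) if abs(r) > bound]
print("lags outside the bound:", flagged)
```

A spike at lag 12 in monthly data, as in this simulation, is the typical signature of seasonality that the model has not yet captured.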

# Bibliography

### References

Durbin, J. and Koopman, S. J. (2012). *Time Series Analysis by State Space Methods*. Oxford University Press.

Harvey, A. (1989). *Forecasting, Structural Time Series Models and the Kalman Filter*. Cambridge: Cambridge University Press. doi:10.1017/CBO9781107049994