# Case Studies

If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions will remove the need for a basic knowledge altogether by providing fully automated forecasting systems. The platform is designed and developed in a way such that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available for the experts. In our case studies, we often present screenshots of the program so that you can easily replicate results.

Did you know you can make a screenshot of a TSL program window? Press Ctrl + p to open a window which allows you to save a screenshot of the program. The TSL window should be located on your main monitor.

Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.

# El Niño

Author: Rutger Lit
Date: July 04, 2022
Software: Time Series Lab - Home Edition
Topics: climate modelling; seasonal and two cycles

#### El Niño

El Niño is a well-known phenomenon in climate science, characterized by higher-than-average sea surface temperatures in the central and eastern equatorial Pacific Ocean. It has a substantial impact on the climate in many parts of the world. Hence, it receives much coverage in the popular media and is the subject of extensive scientific research. El Niño typically causes changes in weather patterns related to temperature, pressure and rainfall. A warm event may therefore not only harm local economies but also have negative consequences for public health, because in some regions these changes substantially increase the risk of water-borne and/or vector-borne diseases.

Given its large impact, particularly on some developing countries bordering the Pacific Ocean, a timely forecast of the next El Niño event is important, and much scientific research has been devoted to the development of forecasting methods for El Niño. The oscillation has an irregular period of between 2 and 7 years. Currently, forecasts are issued regularly for up to three seasons in advance, but long-term forecasts of more than one year ahead remain a real challenge. At the same time, one of the two main theories about the physics underlying El Niño implies that it may be a self-sustaining climatic fluctuation that is quasi-periodic, with several dominant peaks in its spectrum: the main one at a period of about 4-5 years and a secondary one at about 2 years. This suggests that it may be predictable at lead times of several years; see also Li et al. (2020). In this case study we show that TSL can produce accurate forecasts. We build up the model in several steps and show that each step improves both the training sample model fit and the validation sample forecast performance.

Load and select the elnino.csv data set from the file system by pressing the Load database button or by clicking File ► Load data.
Our series of interest is the EN3.4 time series. This monthly series, referred to as the Niño3.4 time series, is the area-averaged sea surface temperature in the region (5°N-5°S, 170°W-120°W). El Niño events are identified in this area; see also the discussion in Barnston et al. (1997). The National Centers for Environmental Information (NCEI), part of NOAA, defines an El Niño (La Niña) event as a phenomenon in the equatorial Pacific Ocean characterised by five consecutive 3-month running means of sea surface temperature (SST) anomalies in the Niño3.4 region that are above (below) the threshold of +0.5°C (-0.5°C). In our empirical study, the Niño3.4 time series, denoted by $y_t$, is the variable of interest. The variable is observed from January 1982 to the end of 2015 with 34 years of data and 407 monthly observations. For this period, observations are available for 24 predictor variables, which consist of physical measures of zonal wind stress and sea temperatures at different depths in the ocean and at different locations. Petrova et al. (2017) give a detailed account of the selection of these variables. For graphs, acronyms and references to data sources for all time series, we refer to Appendix B of Li et al. (2020).
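
The threshold rule quoted above can be sketched in a few lines of Python. This is only an illustration: the function names, the use of a centred 3-month running mean, and the toy anomaly values are assumptions made here, not part of TSL or of the official procedure.

```python
def running_mean_3(values):
    """Centred 3-month running mean; the two end points are dropped,
    so element j of the result is centred on month j + 1 of the input."""
    return [sum(values[i - 1:i + 2]) / 3.0 for i in range(1, len(values) - 1)]


def find_warm_events(anomalies, threshold=0.5, min_run=5):
    """Return (start, end) index pairs, relative to the running-mean series,
    of runs of at least `min_run` consecutive values above `threshold`."""
    smoothed = running_mean_3(anomalies)
    events, start = [], None
    # A sentinel value at the end closes a run that lasts until the final month.
    for i, value in enumerate(smoothed + [float("-inf")]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                events.append((start, i - 1))
            start = None
    return events


# Toy monthly SST anomalies in degrees Celsius (made-up numbers).
toy_anomalies = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.2, 0.0, -0.1]
events = find_warm_events(toy_anomalies)
print(events)
```

Cold (La Niña) events could be found with the same function by flipping the sign of the anomalies.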
Click the vertical arrow bar on the right of the screen to see additional information about the selected time series. The Statistical tests panel shows the results of the Augmented Dickey-Fuller (ADF) test and the KPSS test. The ADF test (with intercept) strongly rejects the null hypothesis of a unit root. Based on the KPSS test, we cannot reject the null hypothesis of trend stationarity, which suggests that the Niño3.4 time series is generated by a process that is stationary around a fixed mean.
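
To see what such a unit-root test does, here is a simplified Dickey-Fuller regression (with intercept, but without the lag augmentation that the ADF test adds) applied to a simulated stationary series. The simulated data, the function name and the seed are assumptions for illustration, and TSL's own implementation may differ. The 5% critical value of about -2.86 is the usual large-sample value for the case with an intercept.

```python
import math
import random


def dickey_fuller_tstat(y):
    """t-statistic of rho in the regression  dy_t = alpha + rho * y_{t-1} + e_t."""
    x = y[:-1]                                   # lagged levels
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences
    n = len(d)
    x_bar, d_bar = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    rho = sxd / sxx
    alpha = d_bar - rho * x_bar
    rss = sum((di - alpha - rho * xi) ** 2 for xi, di in zip(x, d))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return rho / std_err


random.seed(0)
# Simulated stationary AR(1) series around a fixed mean of 27 (illustrative values).
series = [27.0]
for _ in range(399):
    series.append(27.0 + 0.5 * (series[-1] - 27.0) + random.gauss(0.0, 0.3))

t_stat = dickey_fuller_tstat(series)
CRITICAL_5PCT = -2.86  # approximate large-sample 5% value, intercept case
print(f"DF t-statistic: {t_stat:.2f}, reject unit root: {t_stat < CRITICAL_5PCT}")
```

A t-statistic far below the critical value, as for this simulated series, corresponds to rejecting the unit root.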

#### Periodicity in the time series

The spectral density is a very useful plot for getting an idea of cyclical patterns in the time series. Click on the spectral density button in the bottom right corner of the graph and change the number of lags to 100 in the spinbox under other settings. The screen should look like the one below.

#### Spectral density of the EN3.4 time series

From the sample spectrum, we can identify four peaks which correspond to periods of approximately 6, 12, 18, and 51 months. The 6 and 12 month periods correspond to the monthly seasonality and the 18 and 51 month periods will be modelled with cycles.
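
How spectral peaks translate into periods can be illustrated with a raw periodogram. The sketch below uses a synthetic monthly series with known 12-month and 48-month components. It is not the smoothed, lag-based estimate that TSL plots, and the series and function name are assumptions made for this example.

```python
import cmath
import math


def periodogram(y):
    """Raw periodogram at the Fourier frequencies k/n, k = 1..n//2.
    Returns a list of (period_in_observations, power) pairs."""
    n = len(y)
    mean = sum(y) / n
    centred = [v - mean for v in y]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centred))
        result.append((n / k, abs(coef) ** 2 / n))
    return result


# Synthetic monthly series with a 12-month and a 48-month component (made up).
n_obs = 240
toy = [math.sin(2 * math.pi * t / 12) + 0.5 * math.sin(2 * math.pi * t / 48)
       for t in range(n_obs)]

top_two = sorted(periodogram(toy), key=lambda pair: pair[1], reverse=True)[:2]
peak_periods = sorted(period for period, _ in top_two)
print(peak_periods)
```

The two largest ordinates sit at the periods that were built into the series, which is the same reading applied to the EN3.4 spectrum above.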

#### Model: level + slope + seasonal

We build a model with a time-varying level, time-varying slope, and time-varying seasonal. Select these components on the Build your own model page. The cycles come at a later stage. After selecting the components, go to the Estimation page and change the end of the sample to 324 (27 years), which leaves 83 months as the Validation sample. Click the Estimate button and, once TSL has finished estimating, go to the Text output page where we see:


—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————

Variance of disturbances:

Variance type                      Value        q-ratio
Level variance                    0.0872         1.0000
Slope variance                    0.0000         0.0000
Seasonal variance                 0.0000         0.0000
Irregular variance            8.4884e-05     9.7383e-04

State vector at period 2008-12-01:

Component                          Value        Std.Err         t-stat           Prob
Level                            26.2030         0.0576       454.9773         0.0000
Slope                            -0.0032         0.0164        -0.1922         0.8477

—————————————————————————————————— MODEL FIT ———————————————————————————————————

Model: TSL006
variable: EN3.4

TSL006
Log likelihood                               -95.6977
Akaike Information Criterion (AIC)           221.3954
Bias corrected AIC (AICc)                    222.9539
Bayesian Information Criterion (BIC)         278.1066
in-sample MSE                                  0.1042
... RMSE                                       0.3229
... MAE                                        0.2526
... MAPE                                       0.9364
Sample size                                       324
Effective sample size                             311
* based on one-step-ahead forecast errors


The Variance of disturbances section shows something interesting: two of the four variances are estimated at zero. A value of zero for a variance indicates that the corresponding component is deterministic. If this is the case, a standard regression-type significance test can be carried out on the corresponding component in the state. If it is not significantly different from zero, it may be possible to simplify the model by eliminating that particular component. Since the Slope component has a variance of zero and the t-stat for the Slope component at time 2008-12-01 has a value of -0.1922 with a corresponding p-value of 0.8477, we can safely remove the Slope component from the model.
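
The numbers used in this argument can be checked by hand. The sketch below recomputes the two-sided p-value of the slope t-stat under a standard normal approximation, and the q-ratios as each variance divided by the largest one. Both choices are assumptions that appear consistent with the table; TSL may use a different reference distribution, and the published values are rounded.

```python
import math


def two_sided_normal_pvalue(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Values copied from the text output above.
slope_t_stat = -0.1922
p_value = two_sided_normal_pvalue(slope_t_stat)

variances = {"Level": 0.0872, "Slope": 0.0, "Seasonal": 0.0, "Irregular": 8.4884e-05}
largest = max(variances.values())
# q-ratio: each variance relative to the largest variance (assumed definition).
q_ratios = {name: value / largest for name, value in variances.items()}

print(f"p-value for slope: {p_value:.4f}")
print(q_ratios)
```
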
De-select the Slope component on the Build your own model page and re-estimate the model. Go to the Text output page where we see:


—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————

Variance of disturbances:

Variance type                      Value        q-ratio
Level variance                    0.0869         1.0000
Seasonal variance                 0.0000         0.0000
Irregular variance            8.5250e-05     9.8151e-04

Seasonal short properties:

Period                             Value        Std.Err         t-stat           Prob
1                                -0.5053         0.0568         -8.901         0.0000
2                                -0.3412         0.0567         -6.018     4.9541e-09
3                                 0.1496         0.0566          2.642         0.0087
4                                 0.7176         0.0566         12.680         0.0000
5                                 0.8035         0.0566         14.205         0.0000
6                                 0.6069         0.0565         10.732         0.0000
7                                 0.2026         0.0565          3.583     3.9341e-04
8                                -0.1848         0.0566         -3.267         0.0012
9                                -0.2843         0.0566         -5.023     8.5673e-07
10                               -0.3200         0.0566         -5.650     3.6164e-08
11                               -0.3806         0.0567         -6.712     9.0447e-11
12                               -0.4640         0.0568         -8.172     7.5495e-15

Value                                         Prob
Seasonal chi2 test                 302.4                                   2.7131e-58

State vector at period 2008-12-01:

Component                          Value        Std.Err         t-stat           Prob
Level                              26.20         0.0575          455.7              0

—————————————————————————————————— MODEL FIT ———————————————————————————————————

Model: TSL017
variable: EN3.4

TSL017
Log likelihood                               -92.5260
Akaike Information Criterion (AIC)           213.0520
Bias corrected AIC (AICc)                    214.4112
Bayesian Information Criterion (BIC)         265.9824
in-sample MSE                                  0.0990
... RMSE                                       0.3146
... MAE                                        0.2471
... MAPE                                       0.9165
Sample size                                       324
Effective sample size                             312
* based on one-step-ahead forecast errors


Since the variance of the seasonal component is zero as well, we have a deterministic seasonal, which is a special case of the stochastic seasonal. Alternatively, the fixed seasonal can be incorporated within the explanatory variables $X_t$, as is usually done in regression models. If the seasonal component is deterministic, either because it is specified to be 'fixed' at the outset or because its disturbance variance is estimated to be zero, a joint test of significance can be carried out on the $s-1$ seasonal effects. The test is essentially the same as a test for the joint significance of a set of explanatory variables in regression. Under the null hypothesis of no seasonality, the large sample distribution of the test statistic, denoted by Seasonal chi2 test in the output, is $\chi^2_{s-1}$. The Prob value is the probability of a $\chi^2_{s-1}$ variable exceeding the value of the test statistic. In the case of a stochastic seasonal, the joint seasonal test is also produced, although a formal joint test of significance of the seasonal effect is then inappropriate. However, when the seasonal pattern is persistent throughout the series and changes relatively slowly, which is usually the case, the test statistic can provide a useful guide to the relative importance of the seasonal. The formal definition of the test statistic is $a'P^{-1}a$, where $a$ contains the estimates of the $s-1$ seasonal effects at time $T$ and $P$ is the corresponding variance matrix. From the text output we see that the seasonal pattern is strongly significant, with a p-value close to zero.
Finally, we have the Model fit, which we will use to compare this model with other models. The results above are for the Training sample, but we need Validation results as well to compare models thoroughly. Go to the Model comparison page and click the Start loss calculation button in the top right corner of the screen. We refer to Chapter 9 of the manual for more information on loss calculations. After the loss calculations are made, an entry appears under user defined models in the top left corner of the screen. Tick the box and also tick the Last observation and Average of last 10 observations boxes to add two benchmark models to the graph. The resulting graph should look like the one presented in the following figure:
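
The Prob value of the Seasonal chi2 test can be reproduced approximately from the reported statistic of 302.4 with $s-1 = 11$ degrees of freedom. The sketch below uses the closed-form upper-tail expression that holds for odd degrees of freedom. The function name is made up for this example, and because the statistic is rounded in the output, the result only agrees with the reported p-value to about two significant digits.

```python
import math


def chi2_sf_odd_df(statistic, df):
    """Upper-tail probability of a chi-squared variable for ODD degrees of
    freedom, using the closed-form finite series (standard library only)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    chi = math.sqrt(statistic)
    normal_pdf = math.exp(-statistic / 2.0) / math.sqrt(2.0 * math.pi)
    series, term = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        # term_r = chi^(2r-1) / (1 * 3 * ... * (2r-1))
        term = chi if r == 1 else term * statistic / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * normal_pdf * series


s = 12                                   # monthly data: s - 1 = 11 seasonal effects
p_value = chi2_sf_odd_df(302.4, s - 1)   # statistic copied from the text output
print(f"{p_value:.2e}")
```
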

#### Forecast losses for different lags

We can see that, compared to the (simple) benchmark models, our model performs better for 1- to 18-step-ahead forecasts. For forecast horizons of 19 and higher, however, it no longer outperforms a very simple benchmark model. You should always compare your model to simple benchmark models, and sometimes you will come to the conclusion that you cannot do better. But of course we do not end this case study here, because we can improve further.
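
To make the two benchmarks concrete, the sketch below computes an RMSE per forecast horizon for a "last observation" forecast and an "average of the last 10 observations" forecast on a made-up series. How TSL defines forecast origins and aggregates the losses is not stated here, so the rolling-origin scheme and the function name are assumptions.

```python
import math


def benchmark_rmse(y, first_origin, max_horizon, window=10):
    """RMSE per forecast horizon for two naive benchmarks.

    An origin t means that y[0..t-1] are known; the h-step-ahead target
    is y[t - 1 + h]. Returns {h: (rmse_last_obs, rmse_window_mean)}.
    Requires window <= first_origin <= len(y) - max_horizon.
    """
    losses = {}
    for h in range(1, max_horizon + 1):
        sq_last, sq_mean = [], []
        for t in range(first_origin, len(y) - h + 1):
            target = y[t - 1 + h]
            last_obs = y[t - 1]
            window_mean = sum(y[t - window:t]) / window
            sq_last.append((target - last_obs) ** 2)
            sq_mean.append((target - window_mean) ** 2)
        losses[h] = (math.sqrt(sum(sq_last) / len(sq_last)),
                     math.sqrt(sum(sq_mean) / len(sq_mean)))
    return losses


toy = [float(v) for v in range(30)]   # a made-up upward-trending series
losses = benchmark_rmse(toy, first_origin=20, max_horizon=3)
for h, (rmse_last, rmse_mean) in losses.items():
    print(h, rmse_last, rmse_mean)
```

On a trending toy series the last observation beats the 10-observation average at every horizon. On a series that reverts to its mean, the ranking can reverse at longer horizons, which is why both benchmarks are worth plotting.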

#### Model: level + seasonal + cycle1 + cycle2

We continue the modelling process by adding two cycles to our existing model on the Build your own model page. Next, go to the Estimation page. Under the header Edit and fix parameter values, all parameters that can be estimated are listed. For specific model specifications we can fix parameters at certain values. If User defined starting values is switched off (which is the default), an algorithm determines the starting values before optimization. Note that the current values in the column Value are not the algorithmically determined starting values; that process starts only after the green Estimate button has been clicked. In the majority of cases, it is best to let TSL determine the starting values. An exception is the period length of a cycle, which can be set as a starting value by the user regardless of whether User defined starting values is switched on or off. Since we target a period of 18 months and a period of around 51 months (information obtained from the spectral density), we set the starting value of the cycle 1 period to 18 and of the cycle 2 period to 51. Note that we always have cycle period 1 < cycle period 2 < cycle period 3. The resulting screen should look like the following figure:

#### Estimation page of TSL

Click the Estimate button and go to the Text output page. Among other output, we have:


—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————

Cycle properties:

Parameter type                   Cycle 1        Cycle 2
Variance                          0.1708         0.7423
Period                           17.8175        48.1535
Frequency                         0.3526         0.1305
Damping factor                    0.9622         0.9721
Amplitude                         0.5931         0.9511

—————————————————————————— TRAINING SAMPLE MODEL FIT ———————————————————————————

Variable: EN3.4
Model: TSL008
TSL008
Log likelihood                               -53.8648
Akaike Information Criterion (AIC)           147.7296
Bias corrected AIC (AICc)                    150.5019
Bayesian Information Criterion (BIC)         223.3445
in-sample MSE                                  0.0775
... RMSE                                       0.2783
... MAE                                        0.2146
... MAPE                                       0.7985
Sample size                                       324
Effective sample size                             312
* based on one-step-ahead forecast errors
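
The information criteria in the three model fit tables can be reproduced from the log likelihoods with the usual textbook formulas. In the sketch below, the parameter counts (15, 14 and 20) are not reported by TSL in this output; they are inferred here from the gap between the AIC and minus twice the log likelihood, so treat them as an assumption. The sample size of 324 is the Training sample.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """AIC, small-sample corrected AIC (AICc) and BIC from a log likelihood."""
    aic = -2.0 * log_lik + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, aicc, bic


# Log likelihoods are copied from the model fit tables above; the parameter
# counts are inferred from the reported AIC values (an assumption).
models = {
    "level + slope + seasonal": (-95.6977, 15),
    "level + seasonal": (-92.5260, 14),
    "level + seasonal + 2 cycles": (-53.8648, 20),
}
results = {name: information_criteria(ll, k, 324) for name, (ll, k) in models.items()}
for name, (aic, aicc, bic) in results.items():
    print(f"{name:30s} AIC={aic:.4f}  AICc={aicc:.4f}  BIC={bic:.4f}")
```

All three criteria fall with each modelling step, so the improvement in fit is not just a result of adding parameters.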


We see that the periods of the cycles are estimated at around 18 months (1.5 years) and 48 months (4 years), which corresponds nicely with the information from the spectral density. The training sample fit has also improved: the log likelihood is higher and the in-sample losses are lower.
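
The Frequency row of the cycle table follows from the Period row through frequency = 2π / period. The sketch below checks this and traces the deterministic part of a damped cycle using the textbook rotation recursion for stochastic cycles, with the disturbances set to zero. That TSL uses exactly this parameterization is an assumption, and the function names are made up for this example.

```python
import math


def cycle_frequency(period):
    """Frequency in radians per observation for a cycle of the given period."""
    return 2.0 * math.pi / period


def simulate_cycle(period, damping, n_steps, psi0=1.0):
    """Deterministic part of the damped cycle recursion (no disturbances)."""
    lam = cycle_frequency(period)
    psi, psi_star, path = psi0, 0.0, []
    for _ in range(n_steps):
        path.append(psi)
        # Rotate the pair (psi, psi_star) by lam and shrink it by the damping factor.
        psi, psi_star = (damping * (math.cos(lam) * psi + math.sin(lam) * psi_star),
                         damping * (-math.sin(lam) * psi + math.cos(lam) * psi_star))
    return path


# Periods and damping factor copied from the text output above.
freq_1 = cycle_frequency(17.8175)
freq_2 = cycle_frequency(48.1535)
path = simulate_cycle(17.8175, 0.9622, 36)
print(round(freq_1, 4), round(freq_2, 4))
```

A damping factor close to one, as estimated for both cycles, means that the oscillation dies out slowly, so the cyclical signal persists over many months.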
Go to the Model comparison page and add the loss of the latest model to the graph. You should see a loss line that is below all other loss lines meaning the latest model performs better for all forecast horizons, see also the following figure:

#### Further exploration

• Verify that the ACF of the standardized residuals from the model with time-varying level, slope, and seasonal (no cycles) has a cyclical pattern indicating that there is signal left to explain.
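
For the exercise above, a sample ACF is easy to compute outside TSL once the residuals have been exported. The sketch below uses a made-up series with an 18-observation cycle as a stand-in for the standardized residuals; the function name and the rough white-noise band of 1.96/√n are choices made for this example.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (index 0 holds r_0 = 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


# Stand-in for exported standardized residuals: a made-up series that still
# contains an 18-observation cycle.
residuals = [math.sin(2 * math.pi * t / 18) for t in range(360)]
acf = sample_acf(residuals, 24)
band = 1.96 / math.sqrt(len(residuals))   # rough 95% band for white noise
print([lag for lag, r in enumerate(acf) if lag > 0 and abs(r) > band])
```

Autocorrelations that swing from positive to negative and back at regular lags, well outside the band, are the cyclical pattern the exercise refers to.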

# Bibliography

### References

Li, M., S. J. Koopman, R. Lit, and D. Petrova (2020). Long-term forecasting of El Niño events via dynamic factor simulations. Journal of Econometrics 214(1), 46–66.

Barnston, A. G., M. Chelliah, and S. B. Goldenberg (1997). Documentation of a highly ENSO-related SST region in the equatorial Pacific: Research note. Atmosphere-Ocean 35(3), 367–383.

Petrova, D., S. J. Koopman, J. Ballester, and X. Rodo (2017). Improving the long-lead predictability of El Niño using a novel forecasting scheme based on a dynamic components model. Climate Dynamics 48(3), 1249–1276.