# Case Studies

If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions will remove the need for even that by providing fully automated forecasting systems. The platform is designed so that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available to experts. In our case studies, we often present screenshots of the program so that you can easily replicate results.

Did you know you can take a screenshot of a TSL program window? Press Ctrl + p to open a dialog that lets you save a screenshot of the program. The TSL window should be located on your main monitor.

Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.

# El Niño

Date: July 04, 2022

Software: Time Series Lab - Home Edition

Topics: climate modelling; seasonal and two cycles

Batch program: elnino.txt

#### El Niño

El Niño is a well-known phenomenon in climate science and is characterized by higher than average sea surface temperatures in the central and eastern equatorial Pacific Ocean. It has a substantial impact on the climate in many parts of the world. Hence, it has been given much coverage in the popular media, and it is the subject of extensive research in the scientific world. El Niño typically causes changes in weather patterns related to temperature, pressure and rainfall. Thus, a warm event may not only have a negative impact on local economies, but can also have negative consequences for public health, as in some regions these changes substantially increase the risk of water-borne and/or vector-borne diseases. Given its huge impact, particularly on some developing countries bordering the Pacific Ocean, it is self-evident that a timely forecast of the next El Niño event is important, and much scientific research has been devoted to the development of forecasting methods for El Niño.

The oscillation is characterized by an irregular period of between 2 and 7 years. Currently, forecasts are issued regularly for up to three seasons in advance, but forecasts more than one year ahead remain a real challenge. At the same time, one of the two main theories about the physics underlying El Niño implies that it may be a self-sustaining climatic fluctuation that is quasi-periodic, with several dominant peaks in its spectrum, the main one at about 4-5 years and a secondary one at about 2 years. This suggests that it may be predictable at lead times of several years, see also Li et al. (2020). In this case study we show that we can make accurate forecasts with TSL. We build up the model in several steps and show improvements in training sample model fit and validation sample forecast performance at each step.

#### Loading and inspecting the data

Load and select the elnino.csv data set from the file system by pressing the Load database button or by clicking File ► Load data.

Our series of interest is the EN3.4 time series. This is a time series of monthly temperature values, referred to as the Niño3.4 time series, which is the area-averaged sea surface temperature in the region (5°N-5°S, 170°W-120°W). In this area the El Niño events are identified, see also the discussion in Bamston et al. (1997). The National Centers for Environmental Information (NOAA) define an El Niño (La Niña) event as a phenomenon in the equatorial Pacific Ocean characterised by five consecutive 3-month running means of sea surface temperature (SST) anomalies in the Niño3.4 region above (below) the threshold of +0.5°C (-0.5°C).
In our empirical study, the Niño3.4 time series, denoted by $y_t$, is the variable of interest.
The variable is observed from January 1982 to the end of 2015 with 34 years of data and 407
monthly observations. For this period, observations for 24 predictor variables are available which consist of physical measures of zonal wind stress and sea temperatures at different
depths in the ocean and at different locations. Petrova et al. (2017) give a detailed account
of the selection of these variables. For graphs, acronyms and references to data sources for
all time series, we refer to Appendix B of Li et al. (2020).
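The NOAA definition above is easy to operationalise. The sketch below is our own illustration (not TSL code): it flags El Niño months from a vector of SST anomalies using a 3-month running mean, a +0.5°C threshold, and a minimum run of five consecutive exceedances.

```python
import numpy as np

def running_3month_mean(anomalies):
    """3-month running mean: each value averages three consecutive months."""
    a = np.asarray(anomalies, dtype=float)
    return (a[:-2] + a[1:-1] + a[2:]) / 3.0

def el_nino_months(anomalies, threshold=0.5, run=5):
    """Flag running-mean months belonging to an El Nino event: at least
    `run` consecutive 3-month running means above `threshold` (deg C).
    A La Nina detector would use means below -threshold instead."""
    m = running_3month_mean(anomalies)
    above = m > threshold
    flags = np.zeros(len(m), dtype=bool)
    count = 0
    for i, hit in enumerate(above):
        count = count + 1 if hit else 0
        if count >= run:
            flags[i - run + 1:i + 1] = True  # mark the whole qualifying run
    return flags

# Synthetic example: 8 warm months embedded in a neutral series
anoms = [0.0] * 6 + [0.9] * 8 + [0.0] * 6
flags = el_nino_months(anoms)
print(flags.sum())  # number of flagged running-mean months
```

A short run of warm anomalies that does not sustain five consecutive exceedances would produce no flags at all, which is exactly the point of the five-month requirement.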

Click the vertical arrow bar on the right of the screen to see additional information about
the selected time series.
The Statistical tests panel shows the results of the Augmented Dickey-Fuller (ADF) test and the KPSS test. The ADF test (with intercept) strongly rejects the null hypothesis of a unit root. Based on the KPSS test, we cannot reject the null hypothesis of stationarity, which suggests that the Niño3.4 time series is generated by a stationary process around a fixed mean.

#### Periodicity in the time series

The spectral density is a very useful plot for getting an idea about cyclical patterns in the time series. Click on the spectral density button in the bottom right corner of the graph and change the number of lags to 100 in the spinbox under Other settings. The screen should look like the one below.

#### Spectral density of the EN3.4 time series

From the sample spectrum, we can identify four peaks which correspond to periods of approximately 6, 12, 18, and 51 months. The 6 and 12 month periods correspond to the monthly seasonality and the 18 and 51 month periods will be modelled with cycles.
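For readers who want to reproduce this kind of diagnostic outside TSL, a raw periodogram can be computed with an FFT. The sketch below is a toy illustration, not the EN3.4 data: it builds a series with known 12- and 48-month components and recovers their periods from the two largest spectral peaks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 432  # 36 years of monthly data, a multiple of both periods
t = np.arange(n)
# Toy series with a 12-month seasonal and a 48-month cycle plus noise
y = np.sin(2*np.pi*t/12) + 0.8*np.sin(2*np.pi*t/48) + 0.3*rng.normal(size=n)

# Raw periodogram: squared FFT amplitudes at the Fourier frequencies
freqs = np.fft.rfftfreq(n, d=1.0)[1:]           # cycles per month, skip freq 0
power = np.abs(np.fft.rfft(y - y.mean())[1:])**2

# Report the two dominant periods (in months)
top = np.argsort(power)[::-1][:2]
periods = sorted(1.0 / freqs[top])
print([round(p, 1) for p in periods])  # -> [12.0, 48.0]
```

For the real Niño3.4 series the peaks are less clean than in this toy example, which is why a smoothed spectral density (as TSL plots) is the more practical diagnostic.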

#### Model: level + slope + seasonal

We build a model with a time-varying level, time-varying slope, and time-varying seasonal. Select these components on the Build your own model page. The cycles come at a later stage. After selecting the components, go to the Estimation page and change the end of the sample to 324 (27 years) which leaves 83 months as Validation sample. Click the Estimate button and after TSL is done estimating, go to the Text output page where we see:

```
—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————
Variance of disturbances:
Variance type         Value        q-ratio
Level variance        0.0872       1.0000
Slope variance        0.0000       0.0000
Seasonal variance     0.0000       0.0000
Irregular variance    8.4884e-05   9.7383e-04
State vector at period 2008-12-01:
Component    Value     Std.Err   t-stat     Prob
Level        26.2030   0.0576    454.9773   0.0000
Slope        -0.0032   0.0164    -0.1922    0.8477
—————————————————————————————————— MODEL FIT ———————————————————————————————————
Model: TSL006
variable: EN3.4
                                            TSL006
Log likelihood                            -95.6977
Akaike Information Criterion (AIC)        221.3954
Bias corrected AIC (AICc)                 222.9539
Bayesian Information Criterion (BIC)      278.1066
in-sample MSE                               0.1042
... RMSE                                    0.3229
... MAE                                     0.2526
... MAPE                                    0.9364
Sample size                                    324
Effective sample size                          311
* based on one-step-ahead forecast errors
```

The **Variance of disturbances** table shows something interesting: two of the four variances are estimated at zero. A value of zero for a variance indicates that the corresponding component is deterministic. In that case, a standard regression-type significance test can be carried out on the corresponding component in the state. If the component is not significantly different from zero, it may be possible to simplify the model by eliminating it. Since the Slope component has a variance of zero and the t-stat for the Slope component at time 2008-12-01 has a value of -0.1922 with corresponding p-value of 0.8477, we can safely remove the Slope component from the model.

De-select the Slope component on the Build your own model page and re-estimate the
model. Go to the Text output page where we see:

```
—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————
Variance of disturbances:
Variance type         Value        q-ratio
Level variance        0.0869       1.0000
Seasonal variance     0.0000       0.0000
Irregular variance    8.5250e-05   9.8151e-04
Seasonal short properties:
Period    Value     Std.Err   t-stat    Prob
1         -0.5053   0.0568    -8.901    0.0000
2         -0.3412   0.0567    -6.018    4.9541e-09
3          0.1496   0.0566     2.642    0.0087
4          0.7176   0.0566    12.680    0.0000
5          0.8035   0.0566    14.205    0.0000
6          0.6069   0.0565    10.732    0.0000
7          0.2026   0.0565     3.583    3.9341e-04
8         -0.1848   0.0566    -3.267    0.0012
9         -0.2843   0.0566    -5.023    8.5673e-07
10        -0.3200   0.0566    -5.650    3.6164e-08
11        -0.3806   0.0567    -6.712    9.0447e-11
12        -0.4640   0.0568    -8.172    7.5495e-15
                      Value   Prob
Seasonal chi2 test    302.4   2.7131e-58
State vector at period 2008-12-01:
Component    Value   Std.Err   t-stat   Prob
Level        26.20   0.0575    455.7    0
—————————————————————————————————— MODEL FIT ———————————————————————————————————
Model: TSL017
variable: EN3.4
                                            TSL017
Log likelihood                            -92.5260
Akaike Information Criterion (AIC)        213.0520
Bias corrected AIC (AICc)                 214.4112
Bayesian Information Criterion (BIC)      265.9824
in-sample MSE                               0.0990
... RMSE                                    0.3146
... MAE                                     0.2471
... MAPE                                    0.9165
Sample size                                    324
Effective sample size                          312
* based on one-step-ahead forecast errors
```

Since the variance of the seasonal component is zero as well, we have a deterministic seasonal, which is a special case of the stochastic seasonal. Alternatively, the fixed seasonal can be incorporated within $X_t$, as is usually done in regression models. If the seasonal component is deterministic, either because it is specified to be 'fixed' at the outset or because its disturbance variance is estimated to be zero, a joint test of significance can be carried out on the $s-1$ seasonal effects. The test is essentially the same as a test for the joint significance of a set of explanatory variables in regression. Under the null hypothesis of no seasonality, the large-sample distribution of the test statistic, denoted by **Seasonal chi2 test** in the output, is $\chi^2_{s-1}$. The Prob value is the probability of a $\chi^2_{s-1}$ variable exceeding the value of the test statistic. In the case of a stochastic seasonal, the joint seasonal test is also produced, although a formal joint test of significance of the seasonal effect is then inappropriate. However, when the seasonal pattern is persistent throughout the series and changes relatively slowly, which is usually the case, the test statistic can still provide a useful guide to the relative importance of the seasonal. The formal definition of the test statistic is
\[
a^\top P^{-1} a,
\]
where $a$ contains the estimates of the $s-1$ seasonal effects at time $T$ and $P$ is the corresponding variance matrix. From the text output we see that we have a strongly significant seasonal pattern with a p-value close to zero.
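As a numerical illustration of this Wald-type test, the sketch below evaluates the statistic for hypothetical seasonal effects close to those in the table above. The variance matrix is simplified to a diagonal one built from the reported standard errors; the real $P$ has off-diagonal terms, so the value here differs from TSL's 302.4.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical smoothed seasonal effects (s-1 = 11 of them), roughly the
# first eleven values from the table above
a = np.array([-0.51, -0.34, 0.15, 0.72, 0.80, 0.61,
              0.20, -0.18, -0.28, -0.32, -0.38])
# Simplified variance matrix: independent effects, Std.Err ~ 0.0568
P = 0.0568**2 * np.eye(11)

stat = float(a @ np.linalg.solve(P, a))  # Wald statistic a' P^{-1} a
pval = chi2.sf(stat, df=len(a))          # chi-squared with s-1 = 11 dof
print(round(stat, 1), pval < 1e-10)
```

The conclusion matches the TSL output: the seasonal effects are jointly very far from zero, so the seasonal pattern is strongly significant.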

Finally, we have the **Model fit** section, which we will use to compare this model with other models.
The results above are for the Training sample, but we need Validation results as well to be able
to thoroughly compare models with each other. Go to the Model comparison page and click
the Start loss calculation button in the top right corner of the screen. We refer to Chapter 9 of the manual for more information on loss calculations.
After the loss calculations are made, an entry appears under User defined models in the top left corner of the screen. Tick its box and also
tick the Last observation and Average of last 10 observations boxes to add two benchmark
models to the graph. The resulting graph should look like the one presented in the following figure:

#### Forecast losses for different lags

We can see that, compared to the (simple) benchmark models, our model performs better for 1- to 18-step-ahead forecasts, but for forecast horizons of 19 and higher it no longer beats a very simple benchmark model. You should always compare your model to simple benchmark models, and sometimes you will come to the conclusion that you cannot do better. But of course we do not end this case study here, because we can improve further.
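The benchmark losses themselves are simple to compute. The sketch below is our own illustration of the idea, not TSL's exact loss routine: it evaluates per-horizon MSE on a toy series for the two benchmarks TSL offers, Last observation and Average of last 10 observations.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
# Toy series: slowly drifting level plus a 12-month seasonal
y = 0.1 * np.cumsum(rng.normal(size=n)) + np.sin(2*np.pi*np.arange(n)/12)

def benchmark_losses(y, horizons=range(1, 25), window=10):
    """MSE per forecast horizon for two naive benchmarks:
    the last observation and the average of the last `window` observations."""
    n = len(y)
    mse_last, mse_avg = [], []
    for h in horizons:
        # Last-observation forecast: y[o+h] predicted by y[o]
        e_last = y[window + h - 1:] - y[window - 1:-h]
        # Average-of-last-`window` forecast from each origin o
        origins = np.arange(window - 1, n - h)
        e_avg = np.array([y[o + h] - y[o - window + 1:o + 1].mean()
                          for o in origins])
        mse_last.append(np.mean(e_last**2))
        mse_avg.append(np.mean(e_avg**2))
    return np.array(mse_last), np.array(mse_avg)

mse_last, mse_avg = benchmark_losses(y)
print(np.round(mse_last[:3], 3), np.round(mse_avg[:3], 3))
```

A structural model only earns its keep at the horizons where its loss line sits below both of these curves, which is exactly the comparison the Model comparison page draws.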

#### Model: level + seasonal + cycle1 + cycle2

We continue the modelling process by adding two cycles to our existing model on the Build your own model page. Next, go to the Estimation page. Under the header **Edit and fix parameter values**, all parameters that need to be estimated are summarized. For specific model specifications we can fix parameters at certain values. If **User defined starting values** is switched off (the default), an algorithm determines the starting values before optimization. Note that the current values in the column **Value** are not the algorithmically determined starting values; that process starts only after the green Estimate button has been clicked. In the majority of cases, it is best to let TSL determine the starting values. An exception is the **period length** of the cycle, which can be set as a starting value by the user regardless of whether **User defined starting values** is switched on or off. Since we target a period of 18 months and a period of around 51 months (information obtained from the spectral density), we set the starting value of the cycle 1 period to 18 and the cycle 2 period to 51. Note that we always have cycle period 1 < cycle period 2 < cycle period 3. The resulting screen should look like the following figure:

#### Estimation page of TSL

Click the Estimate button and go to the Text output page. Among other output, we have:

```
—————————————————————————————— PARAMETER SUMMARY ———————————————————————————————
Cycle properties:
Parameter type    Cycle 1   Cycle 2
Variance          0.1708    0.7423
Period            17.8175   48.1535
Frequency         0.3526    0.1305
Damping factor    0.9622    0.9721
Amplitude         0.5931    0.9511
—————————————————————————— TRAINING SAMPLE MODEL FIT ———————————————————————————
Variable: EN3.4
Model: TSL008
                                            TSL008
Log likelihood                            -53.8648
Akaike Information Criterion (AIC)        147.7296
Bias corrected AIC (AICc)                 150.5019
Bayesian Information Criterion (BIC)      223.3445
in-sample MSE                               0.0775
... RMSE                                    0.2783
... MAE                                     0.2146
... MAPE                                    0.7985
Sample size                                    324
Effective sample size                          312
* based on one-step-ahead forecast errors
```

We see that the periods of the cycles are estimated at around 18 months (1.5 years) and 48 months (4 years), which corresponds nicely with the information from the spectral density. The model also improves in training-sample fit, since we have a higher log likelihood and lower in-sample losses.
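The reported Frequency column is consistent with the standard damped stochastic cycle recursion $\psi_{t+1} = \rho\, R(\lambda)\, \psi_t + \kappa_t$ with $\lambda = 2\pi/\text{period}$ (e.g. $2\pi/17.8175 \approx 0.3526$). A minimal simulation sketch of such a cycle pair, using the estimated periods and damping factors but illustrative disturbance scales:

```python
import numpy as np

def simulate_cycle(n, period, damping, sigma, seed=0):
    """Simulate a damped stochastic cycle
    psi_{t+1} = rho * R(lambda) psi_t + kappa_t, with lambda = 2*pi/period."""
    rng = np.random.default_rng(seed)
    lam = 2 * np.pi / period
    # Rotation matrix scaled by the damping factor rho
    R = damping * np.array([[np.cos(lam), np.sin(lam)],
                            [-np.sin(lam), np.cos(lam)]])
    psi = np.zeros(2)
    out = np.empty(n)
    for t in range(n):
        out[t] = psi[0]                   # first element is the cycle value
        psi = R @ psi + rng.normal(scale=sigma, size=2)
    return out

# Periods and damping factors taken from the estimation output above;
# the sigma values are illustrative, not the estimated variances
c1 = simulate_cycle(324, period=17.8, damping=0.9622, sigma=0.1, seed=1)
c2 = simulate_cycle(324, period=48.2, damping=0.9721, sigma=0.2, seed=2)
cycles = c1 + c2
print(cycles.shape)
```

Because both damping factors are below one, each simulated cycle is stationary: shocks keep it alive but its influence decays, which is what makes multi-year El Niño forecasts feasible at all.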

Go to the Model comparison page and add the loss of the latest model to the graph. You should see a loss line that is below all other loss lines, meaning the latest model performs better for all forecast horizons; see also the following figure:

#### Forecast losses for different lags

#### Further exploration

- Verify that the ACF of the standardized residuals from the model with time-varying level, slope, and seasonal (no cycles) has a cyclical pattern indicating that there is signal left to explain.

# Bibliography

Bamston, A. G., M. Chelliah, and S. B. Goldenberg (1997). Documentation of a highly ENSO-related SST region in the equatorial Pacific: Research note. *Atmosphere-Ocean 35*(3), 367–383.

Li, M., S. J. Koopman, R. Lit, and D. Petrova (2020). Long-term forecasting of El Niño events via dynamic factor simulations. *Journal of Econometrics 214*(1), 46–66.

Petrova, D., S. J. Koopman, J. Ballester, and X. Rodo (2017). Improving the long-lead predictability of El Niño using a novel forecasting scheme based on a dynamic components model. *Climate Dynamics 48*(3), 1249–1276.