# Case Studies

If you're interested in time series analysis and forecasting, this is the right place to be. The Time Series Lab (TSL) software platform makes time series analysis available to anyone with a basic knowledge of statistics. Future versions aim to remove even that requirement by providing fully automated forecasting systems. The platform is designed so that results can be obtained quickly and verified easily. At the same time, many advanced time series and forecasting operations are available for experts. In our case studies, we often present screenshots of the program so that you can easily replicate the results.

Did you know you can take a screenshot of a TSL program window? Press Ctrl + p to open a window that lets you save a screenshot of the program. The TSL window should be located on your main monitor.

Click on the buttons below to go to our case studies. At the beginning of each case study, the required TSL package is mentioned. Our first case study, about the Nile data, is meant to illustrate the basic workings of the program and we advise you to start with that one.

# Nile

Author: Rutger Lit
Date: June 30, 2022
Software: Time Series Lab - Home Edition
Topics: basic workings of program

#### Nile data

In this first case study we illustrate the fundamentals of TSL using observations from the river Nile. The data set consists of a series of readings of the annual flow volume at Aswan from 1871 to 1970. The Nile dataset is part of any TSL installer file and can be found in the data folder located in the install folder of TSL. Many time series concepts can be explained by the Nile time series alone.

Let's start the modelling process. First go to the Database page of TSL by clicking the Database button. On this page we load, visually inspect, and prepare our data for the modelling process. The data set is loaded and selected from the file system by pressing the Load data button or by selecting Load data from the File menu. Locate the file Nile.csv in the data folder of the TSL install folder.

Important: The data set should be in column format with headers. Supported file formats are *.xls(x), *.csv, and *.txt, with commas as field separators in the text formats. The program deliberately does not sort the data, so the observations must already be in the correct time order before you load them.
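
The format requirements above can be checked before loading a file. The following is a minimal sketch in Python, not part of TSL: the function name `check_time_series_csv`, the column names `date` and `Nile`, and the ISO date format in the first column are assumptions made for illustration, and the actual layout of Nile.csv may differ.

```python
import csv
import io
from datetime import date


def check_time_series_csv(text):
    """Check comma-separated text with a header row and dates in column one.

    Returns (header, rows) when every row has one field per header column
    and the dates are in ascending order; raises ValueError otherwise.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=",") if row]
    if len(rows) < 2:
        raise ValueError("need a header row and at least one observation")
    header, body = rows[0], rows[1:]
    if any(len(row) != len(header) for row in body):
        raise ValueError("every row must have one field per header column")
    # Assumption: the first column holds ISO dates such as 1871-01-01.
    dates = [date.fromisoformat(row[0]) for row in body]
    if dates != sorted(dates):
        raise ValueError("rows are not in time order; sort them before loading")
    return header, body


# Illustrative rows only; the column names are hypothetical.
sample = "date,Nile\n1871-01-01,1120\n1872-01-01,1160\n1873-01-01,963\n"
header, body = check_time_series_csv(sample)
print(header, len(body))
```

A file that fails the ordering check should be sorted outside TSL first, since the program keeps the rows in the order it finds them.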

After loading, click on the name Nile in the database field. If we click the arrow bar at the right side of the screen, a new area unfolds which shows us Data characteristics of the selected time series. It shows that the Nile time series has a length of $T = 100$ observations with $0$ missing values, among other characteristics. The TSL window should look like the Figure below.

#### Data inspection and preparation page

The highlighted variable Nile also appears in the Select dependent variable drop-down menu. This is the so-called y-variable of the time series equation and it is the time series variable of interest, i.e. the time series variable you want to model, analyse, and forecast. Optionally, a time series axis can be specified. The program's algorithm tries to auto-detect the time axis specification (e.g. annual data, daily data) from the first column of the data set. In the case of the Nile data illustration, it finds an annual time axis specification. If auto-detection fails, the program selects the Index axis option which is just a number for each observation, $1,2,3,...$

#### Pre-built models

Click on the Pre-built models button in the button bar at the left of your screen. Switch on the Local Level model. Make sure this is the only selected model; see also the Model selection summary in the blue pane at the bottom of the screen. Select a 100%/0% ratio for the Training and Validation sample. The settings are shown in the Figure below. Click the Process Dashboard button, which is the green arrow located at the bottom right of your screen. After pressing this button, two things happen:

• TSL estimates the selected models and prints results to the Text output page. The results are: progress results from the optimizer and model fit of the selected models.
• Once processing of the selected models is complete, TSL plots the information it found and shows the Graphics page.

#### Graphical output

After processing the selected models, TSL automatically takes you to the Graphics page. Components, or combinations of components, can be easily plotted and removed from the plot by checking or unchecking the tickboxes in the top left corner of the page. You can add subplots as well to create a grid of plots.

Click on a (sub)plot to activate it. Notice that by clicking on a subplot, the check-boxes in the top left of the window correspond to the current selection of lines in the subplot. If not all checkbox settings correspond with the lines in the subplot, switch tabs to show the rest of the selection.

To see what this means, switch from Smoothing to Filtering. The Level checkbox is now unchecked, because the level currently plotted is the Smoothed level and not the Filtered level. The plotted Smoothed level does not automatically switch to the Filtered level when you change the type, because we sometimes want to compare Smoothed, Filtered, and Predicted components in one plot. If you click on Level, the resulting graph should look like the following figure.

#### Time Series Lab Graph page

For State Space models, confidence intervals can be included in the plot as well. A major benefit of State Space models is that error bounds are easily obtained; see Appendix B of the manual for more on State Space models. You can also choose between Predicting, Filtering, and Smoothing by changing the Type in the top left corner. The three differ in the subset of the data used to determine the statistical properties of the components: the data up to time $t-1$ (predicting), the data up to time $t$ (filtering), or the whole data set (smoothing); see also Appendix B of the manual.
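
For the Local Level model selected earlier, the three types can be written out explicitly. The notation below follows the common textbook form of the model (as in Durbin and Koopman, 2012) and is an assumption here; the manual's Appendix B may use different symbols.

$$
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2), \\
\mu_{t+1} &= \mu_t + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2),
\end{aligned}
$$

$$
\begin{aligned}
\text{Predicted level:} \quad & E(\mu_t \mid y_1, \ldots, y_{t-1}), \\
\text{Filtered level:} \quad & E(\mu_t \mid y_1, \ldots, y_t), \\
\text{Smoothed level:} \quad & E(\mu_t \mid y_1, \ldots, y_T),
\end{aligned}
$$

with $T = 100$ for the Nile series. The smoothed estimate conditions on the most data, which is why it is typically the least erratic of the three.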

#### Missing data

We continue this case study with a version of the Nile data that contains missing values, to illustrate one of the many advantages of TSL: it handles missing values with ease. Missing values in time series can occur for a variety of reasons, and they are problematic for some time series algorithms.

Such algorithms often resort to deleting the missing values or filling them in with certain values. In TSL there is no need for such drastic measures. Missing values are part of time series analysis and should be handled correctly.
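
One standard way to handle missing values in a state space model is to skip the Kalman filter's update step whenever an observation is absent, so that the estimate is simply carried forward with growing uncertainty. The sketch below illustrates this for a local level model; it is not TSL's implementation, and the function name, the variance values, and the large initial variance `p1` (a rough stand-in for a diffuse prior) are assumptions chosen for illustration.

```python
import math


def local_level_filter(y, var_eps, var_eta, a1=0.0, p1=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    Missing observations are passed as None or NaN. For those time points
    the update step is skipped, so the filtered level equals the predicted
    level and only the state variance grows.
    """
    a, p = a1, p1  # predicted level and its variance
    predicted, filtered = [], []
    for obs in y:
        predicted.append(a)
        if obs is None or math.isnan(obs):
            a_filt, p_filt = a, p  # nothing observed: no update
        else:
            gain = p / (p + var_eps)  # Kalman gain
            a_filt = a + gain * (obs - a)
            p_filt = p * (1.0 - gain)
        filtered.append(a_filt)
        # Prediction step for the next time point.
        a, p = a_filt, p_filt + var_eta
    return predicted, filtered


# Short toy series with a two-period gap (not the Nile_missing file).
y = [1120.0, 1160.0, None, None, 1210.0]
predicted, filtered = local_level_filter(y, var_eps=15000.0, var_eta=1500.0)
print(filtered)
```

No observation is deleted or filled in: during the gap the level estimate stays flat, and the first observation after the gap receives a larger weight because the state variance has accumulated in the meantime.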

Go back to the Database page of TSL and select the Nile_missing time series by clicking on the name. We see that the Data characteristics are updated by selecting the new time series. It shows us 40 missing values, among other characteristics. The TSL window should now look like the following figure.

#### Data inspection and preparation page

We will estimate two models and compare them with each other. Click on the Pre-built models button in the button bar at the left of your screen and switch on the models Exponential Smoothing and Local Level. Make sure these are the only selected models; see also the Model selection summary in the blue pane at the bottom of the screen. Select a 100%/0% ratio for the Training and Validation sample and click the Process Dashboard button, which is the green arrow located at the bottom right of your screen.

#### Comparing results

Go to the Graphics and diagnostics page and click the Clear all button (eraser icon, bottom right) to start with a clean graph window. From the Individual tab select Y data to plot the Nile_missing time series. From the drop-down menu select the Exp Smoothing model and plot the Total signal from the Composite tab. Next, from the drop-down menu select the Local Level model and make sure the Type in the top left corner says Predicting, then plot the Total signal from the Composite tab. The resulting graph should look like the figure below. We see that the Local Level model reacts more strongly to changes in the time series after periods of missing values.

#### Time Series Lab graph page

We can also see the difference in model fit expressed in numbers. Go to the Text output page where at the end of the estimation, model fit of the selected models is summarized. Looking at in-sample MSE we see that the loss of the Local Level model is lower.

Variable: Nile_missing
Model(s):
TSL005 Exp Smoothing
TSL006 Local Level

| | TSL005 | TSL006 |
|---|---:|---:|
| Log likelihood | - | -380.01 |
| Akaike Information Criterion (AIC) | - | 766.02 |
| Bias corrected AIC (AICc) | - | 766.44 |
| Bayesian Information Criterion (BIC) | - | 772.30 |
| in-sample MSE | 23735.33 | 23069.80 |
| ... RMSE | 154.06 | 151.89 |
| ... MAE | 119.86 | 118.60 |
| ... MAPE | 14.06 | 13.70 |
| Sample size | 100 | 100 |
| Effective sample size | 99 | 99 |

* based on one-step-ahead forecast errors
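
The loss measures in the output above are functions of the one-step-ahead forecast errors. The sketch below uses the usual textbook definitions; whether TSL uses exactly these formulas, and how it treats time points with missing observations, is an assumption here (the sketch simply skips them). The function name and the toy numbers are hypothetical.

```python
import math


def forecast_error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) of one-step-ahead errors.

    Time points where the actual value is None are skipped.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a is not None]
    errors = [a - p for a, p in pairs]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for (a, _), e in zip(pairs, errors)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Toy numbers for illustration only; the third observation is missing.
actual = [100.0, 200.0, None, 400.0]
predicted = [110.0, 180.0, 300.0, 400.0]
metrics = forecast_error_metrics(actual, predicted)
print(metrics)
```

As a consistency check on the reported output, the RMSE is the square root of the MSE: $\sqrt{23735.33} \approx 154.06$ and $\sqrt{23069.80} \approx 151.89$.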

If you want to compare models and conclude something like "model A is better than model B", it is important to note that looking only at in-sample (Training sample) model fit can be misleading. It is often a good idea to take forecast performance into account as well. If model A performs better on both model fit and forecast performance, that is a good indication that model A should be preferred over model B. We see examples of comparing forecast performance in other case studies.

The forecasts of both our models can be visually inspected on the Forecasting page. The figure below plots the forecasts of both models in one graph. Since no new data is coming in, the forecasts are just straight lines, but the level (height) of the lines differs per model. Note that the local level model is not just a theoretical model; it has practical value as well. For inflation modelling, for example, the local level model is a strong contender. We will see more complex forecasting patterns in other case studies.
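
Why the local level forecasts are straight lines can be seen from a short derivation. It assumes the common textbook form of the model, $y_t = \mu_t + \varepsilon_t$ and $\mu_{t+1} = \mu_t + \eta_t$, with independent disturbances of variance $\sigma_\varepsilon^2$ and $\sigma_\eta^2$; this notation is not taken from the TSL output. With $T$ the last observed time point and forecast horizon $h \geq 1$:

$$
\begin{aligned}
y_{T+h} &= \mu_T + \sum_{j=0}^{h-1} \eta_{T+j} + \varepsilon_{T+h}, \\
E(y_{T+h} \mid y_1, \ldots, y_T) &= E(\mu_T \mid y_1, \ldots, y_T), \\
\operatorname{Var}(y_{T+h} \mid y_1, \ldots, y_T) &= \operatorname{Var}(\mu_T \mid y_1, \ldots, y_T) + h\,\sigma_\eta^2 + \sigma_\varepsilon^2 .
\end{aligned}
$$

The point forecast does not depend on $h$, which gives the horizontal line, while the forecast variance grows linearly with the horizon.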

#### Outliers and Structural breaks

Intervention analysis, also called anomaly detection, is an important part of time series analysis. We distinguish two types of anomalies: Outliers and Structural breaks. Early warning systems, for example, rely on outlier and break detection. Could a catastrophic event have been seen in advance? Take sensor readings from an important piece of heavy machinery. A breakdown of this machine would cost a company a lot of money. If anomalies had been detected in the sensor readings, preventive maintenance might have saved the company from the breakdown.

Intervention variables are dummy (or indicator) variables which are used to take account of outlying observations and structural breaks. These data irregularities are usually thought of as arising from a specific event, for example a strike in the case of an outlier or a change in policy in the case of a structural break. An outlier can be thought of as an unusually large value of the irregular disturbance at a particular time. It can be captured by an impulse intervention variable which takes the value one at the time of the outlier and zero elsewhere. A structural break in which the level of the series shifts up or down is modelled by a step intervention variable which is zero before the event and one after. Alternatively, it can be modelled in exactly the same way by adding an outlying intervention to the level equation. In other words, the break is identified with an unusually large value of the level disturbance.

TSL is able to propose a set of potential outliers and structural breaks for a time series. It uses an effective multi-step procedure based on the auxiliary residuals; see Harvey and Koopman (1992) for details. First the selected model is estimated and the diagnostics are investigated. Then a first (larger) set of potential outliers and trend breaks is selected from the auxiliary residuals. After re-estimation of the model, only those interventions that are sufficiently significant survive. After the automatic selection, the results are reported. All considered outliers and breaks are kept in the intervention dialog, and they can be deleted from the model or added to the model.

The Nile time series has some interesting features with regard to Intervention analysis. To see this, go back to the Database page and select the original Nile time series (the one without missing values) again. Next, go to the Build your own model page and select a time-varying level and a time-varying slope. Together, these two components form the Local Linear Trend model. On top of that, select Intervention variables with the automatic setting. Next, go to the Estimation page, make sure the sample starts at $t = 1$ and ends at $t = 100$, and click the green Estimate button. Once TSL is done estimating, you should see the graph as presented in the figure below. We see from the figure that TSL finds a structural break and an outlier. We can also inspect these in more detail by looking at the Text output page, where we see:

| Beta | Value | Std.Err | t-stat | Prob |
|---|---:|---:|---:|---:|
| beta_outlier_1913-01-01 | -389.4 | 123.92 | -3.143 | 0.0022 |
| beta_break_1899-01-01 | -265.5 | 43.67 | -6.079 | 2.4458e-08 |

TSL locates the structural break at 1899, which is very plausible since that year corresponds to the building of a dam at Aswan. Interestingly, adding the outlier and the structural break removes certain dynamics from the data. We can see this from the straight lines in the graph, which result from the (close to) zero variances of the Level and Slope components.
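
The impulse and step intervention variables described in this section are easy to construct by hand. The sketch below builds them for the two dates reported above; the helper names are hypothetical, and setting the step variable to one from the break year onward (rather than from the year after) is an assumed convention, not something confirmed by the TSL output.

```python
def impulse_dummy(n, t0):
    """One at position t0 (0-based) and zero elsewhere: models an outlier."""
    return [1 if t == t0 else 0 for t in range(n)]


def step_dummy(n, t0):
    """Zero before position t0 and one from t0 onward: models a level break."""
    return [1 if t >= t0 else 0 for t in range(n)]


years = list(range(1871, 1971))  # annual observations, 1871 to 1970
outlier = impulse_dummy(len(years), years.index(1913))
level_break = step_dummy(len(years), years.index(1899))

# The first difference of the step variable is an impulse at the break date,
# which mirrors the view of a break as one large level disturbance.
step_diff = [level_break[0]] + [
    level_break[t] - level_break[t - 1] for t in range(1, len(years))
]
print(sum(outlier), sum(level_break), step_diff.index(1))
```

In a regression setting, the coefficient on the step variable corresponds to the size of the level shift, and the coefficient on the impulse variable to the size of the outlier.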

#### Further exploration

• On the graph page, plot the Autocorrelation Function (ACF) of the Predicted standardized residuals for the Local Level model. Are all plotted lags within the confidence bounds?
• Performing diagnostic tests can be done via the Print diagnostics button located on the Graph page. Can you print the Residual diagnostics for the Exponential Smoothing model? Are all Probabilities for the Normality test above 0.05?
• Outliers and structural breaks can be added (and removed) manually to (from) the model by selecting the Manual option of the Intervention variables. Estimate a Local Level model with only the structural break.
• On the Estimation page, specify the end of the estimation sample at 90 instead of 100. You have now created a test sample which you can use to analyse out-of-sample forecast accuracy. Estimate the model with the new sample (1 - 90). You should see a new button (Model comparison) appear at the bottom left of the screen (in the button bar), which allows you to do a forecast comparison with other models.
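
For the first exercise above, it can help to know what the ACF plot computes. The sketch below gives the usual sample autocorrelation and flags lags outside approximate 95% bounds of $\pm 1.96/\sqrt{n}$. Whether TSL draws exactly these bounds is an assumption, and the function names and the alternating toy series are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def lags_outside_bounds(acf, n, z=1.96):
    """Lags whose autocorrelation exceeds the approximate bounds z/sqrt(n)."""
    bound = z / math.sqrt(n)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Toy residual series that alternates in sign, so it is strongly autocorrelated.
residuals = [1.0, -1.0] * 10
acf = sample_acf(residuals, max_lag=2)
print(acf, lags_outside_bounds(acf, len(residuals)))
```

For well-behaved standardized residuals, most lags would be expected to fall inside the bounds; many lags outside them suggest dynamics the model has not captured.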

# Bibliography

### References

Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Oxford: Oxford University Press.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press. doi:10.1017/CBO9781107049994

Harvey, A. C. and Koopman, S. J. (1992). Diagnostic checking of unobserved-components time series models. Journal of Business & Economic Statistics 10(4), 377–389.