Time Series Differencing: A Complete Guide

What is time series difference analysis?

Time series analysis is a method of analyzing data points collected at regular time intervals over a set period. From it, we derive crucial information such as how the variance of a variable changes across data points over time. This shows how the data evolves and lets us analyze it under different trends at different time intervals. It can also help verify whether the time series is stationary or non-stationary, and how to analyze either.

Time series analysis is generally used by commercial, scientific, and other types of organizations for better predictive analysis. Time series analysis seeks to find the change in data patterns over different time periods. This provides a better picture of how the data trends move, allowing a user to develop a model based on these observations that captures most of the trends of the data.

Stationary and non-stationary are two categories of time series, based on how the data behaves over a period of time. A stationary time series has statistical properties, such as its mean and variance, that stay constant over time, with no trend or seasonal fluctuation. A non-stationary time series does not: its mean or variance changes over time, often because of a trend or seasonal fluctuations. In this article, we’ll mostly be focusing on non-stationary time series with a brief summary of stationary ones.

Stationary time series

A stationary time series is one where the mean and variance of the data are constant over different periods of time. There is no trend or seasonal fluctuation in the data. Any autocorrelation depends only on how far apart two data points are, not on when they occur. This can usually be clearly seen in a plot of the data points: a stationary time series fluctuates around a horizontal level with constant variance, so by definition it has no trend to report.
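
As a rough illustration of this definition, the sketch below generates white noise with NumPy and compares the mean and variance of its two halves. The synthetic series and the helper name half_stats are assumptions made for this example, not data or code from this article. For a stationary series, the two halves should give similar values.

```python
import numpy as np

# Synthetic white noise: an assumed example of a stationary series
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)


def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    first, second = np.array_split(np.asarray(series, dtype=float), 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


(first_mean, first_var), (second_mean, second_var) = half_stats(noise)
print(f"first half:  mean={first_mean:.3f}, variance={first_var:.3f}")
print(f"second half: mean={second_mean:.3f}, variance={second_var:.3f}")
```

This is only an informal check. A formal test, such as the one in the next section, is still needed before drawing conclusions.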

Testing a time series: the Dickey-Fuller test

The Dickey-Fuller test in statistics is used to test whether a given time series is stationary or non-stationary. The null hypothesis (Ho) is that the series has a unit root, which means it is non-stationary. In the test equations below, this corresponds to δ = 0. The alternative hypothesis (Ha) is δ < 0, which means the series is stationary. Below, we have three tests for the unit root under different sets of conditions. The three versions of the test are:

  1. Test for unit root: ΔY_t = δY_{t-1} + A_t, where ΔY_t = Y_t - Y_{t-1} is the change from the previous data point, Y_{t-1} is the previous value, and A_t is the error term, as in a random walk model.
  2. Test for unit root with constant: ΔY_t = δY_{t-1} + A_t + C_0, where C_0 is the constant.
  3. Test for unit root with constant and deterministic time trend (t): ΔY_t = δY_{t-1} + A_t + C_0 + C_1·t, where C_1 is the coefficient of the time trend.

However, the Dickey-Fuller test assumes that the error terms are not autocorrelated, which often does not hold in practice. That is why we’ll be using the Augmented Dickey-Fuller (ADF) test, which adds lagged difference terms to the equation to account for autocorrelation in the series. The Augmented Dickey-Fuller test follows similar versions of the test as the general Dickey-Fuller test. Furthermore, the ADF has the same null hypothesis as the general Dickey-Fuller test.

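The null hypothesis is not rejected if the p-value > 0.05, in which case we treat the series as having a unit root. However, it’s rejected for cases where the p-value < significance level of 0.05. The test statistic for the ADF is given as DF = δ/S.E(δ), where DF is the Dickey-Fuller value, δ is the estimated coefficient, and S.E is its standard error, taken from the equation that accounts for the autocorrelation in the series. The p-value comes from comparing this statistic with Dickey-Fuller critical values rather than with a standard t-distribution.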
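
To make the ratio δ/S.E(δ) concrete, here is a minimal sketch of the basic (non-augmented) Dickey-Fuller statistic for version 1 of the test, computed with NumPy by ordinary least squares. The function name dickey_fuller_statistic and the synthetic random walk are assumptions made for illustration. It leaves out the lagged terms that the augmented test adds, so it is not a replacement for a library implementation.

```python
import numpy as np

# Synthetic random walk: an assumed example series, not data from this article
rng = np.random.default_rng(seed=1)
y = np.cumsum(rng.normal(size=200))


def dickey_fuller_statistic(series):
    """Estimate delta in dY_t = delta * Y_{t-1} + error and return
    (delta, standard error, delta / standard error)."""
    series = np.asarray(series, dtype=float)
    dy = np.diff(series)   # dY_t = Y_t - Y_{t-1}
    y_lag = series[:-1]    # Y_{t-1}
    # Least-squares slope for a regression with no constant
    delta = (y_lag @ dy) / (y_lag @ y_lag)
    residuals = dy - delta * y_lag
    # Residual variance with one estimated parameter
    sigma2 = (residuals @ residuals) / (len(dy) - 1)
    std_err = np.sqrt(sigma2 / (y_lag @ y_lag))
    return delta, std_err, delta / std_err


delta, std_err, df_stat = dickey_fuller_statistic(y)
print(f"delta: {delta:.4f}, standard error: {std_err:.4f}, DF statistic: {df_stat:.4f}")
```

A strongly negative statistic points toward stationarity, while a value near zero is what a unit root tends to produce.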

Non-stationary time series

A non-stationary time series is one whose mean or variance changes over a period of time, for example because of a trend or seasonal fluctuations. Unlike a stationary time series, where these properties are constant over a period of time, the non-stationary time series shows a change or fluctuation in them. This provides information on the data trend over a period of time.
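
A random walk is a common textbook example of a non-stationary series. The sketch below is an illustration under assumed settings (2000 simulated walks of 200 steps each, built with NumPy), and it shows that the variance across walks keeps growing as time passes instead of staying constant.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Each row is one random walk: the running sum of independent random steps
steps = rng.normal(size=(2000, 200))
walks = np.cumsum(steps, axis=1)

# Variance across all walks at an early and a late point in time
var_early = walks[:, 9].var()    # after 10 steps
var_late = walks[:, 199].var()   # after 200 steps
print(f"variance after 10 steps:  {var_early:.1f}")
print(f"variance after 200 steps: {var_late:.1f}")
```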

When is a time series non-stationary? ADF and failing to reject the null hypothesis

In the Augmented Dickey-Fuller test, we treat a time series as non-stationary if the null hypothesis is not rejected.

A time series is considered to be non-stationary if the p-value is greater than the significance level of 0.05. If the p-value < 0.05, then the null hypothesis (Ho) is rejected in favor of the alternative hypothesis (Ha) that the series is stationary.
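
The decision rule can be written as a small helper. This is a sketch only: the function name interpret_adf and its wording are hypothetical, and the 0.05 default simply mirrors the significance level used in this article.

```python
def interpret_adf(p_value, alpha=0.05):
    """Turn an ADF p-value into a decision at significance level alpha."""
    if p_value < alpha:
        return "reject Ho: evidence that the series is stationary"
    return "fail to reject Ho: treat the series as non-stationary"


print(interpret_adf(0.20))
print(interpret_adf(0.01))
```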

We can use the statsmodels package in Python, the tseries package in R, or HypothesisTests in Julia, but we’ll only be covering code for Python in this post.

To apply the Augmented Dickey-Fuller test in Python:

First, install the required packages:


pip3 install statsmodels matplotlib

Then start the code by creating a list with random values:


import random

data = [random.random() for x in range(10)]

Now, use matplotlib to plot the datapoints:


import random

import matplotlib.pyplot as plt

data = [random.random() for x in range(10)]

# Plot the data

plt.plot(data)

While the plots can be different, I got a plot that looks something like this:

[Image: line plot of the generated data points]

Now, test the null hypothesis using the Augmented Dickey-Fuller function adfuller from statsmodels. The final code looks like:


import random

import matplotlib.pyplot as plt

from statsmodels.tsa.stattools import adfuller

data = [random.random() for x in range(10)]

# Plot the data

plt.plot(data)

# Finally, run the Augmented Dickey-Fuller test

adf_result = adfuller(data)

print(f"Test Statistic: {adf_result[0]}\n p-value: {adf_result[1]}")

The final p-value when the code above was executed was:


p-value: 0.200584620

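This is greater than the significance level of 0.05. Therefore, the null hypothesis is not rejected, and we treat the given data as non-stationary. Keep in mind that with only 10 data points the test has little power, so failing to reject the null hypothesis is weak evidence by itself.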

Below, we’ll discuss first-order and second-order time series differencing. In the first-order section, we’ll also implement the logarithmic transformation, which is often applied alongside differencing when converting a non-stationary time series to a stationary one.

First-order time series differencing

As the title suggests, first-order time series differencing transforms a non-stationary time series toward a stationary one by replacing each value with its change from the previous value, Y_t - Y_{t-1}. This removes a trend in the level of the data so that it can be analyzed further. A transformation like the logarithmic transformation is often applied first, because it stabilizes the variance when the fluctuations grow with the level of the data.

We’ll be implementing a logarithmic transformation based on the above code by using the NumPy log function to apply log transformation to the data. This can be achieved by adding the code below to our earlier code:


import numpy as np

log_data = np.log(data)

We also added a line of code to save our graphs as a PNG image:


plt.savefig('Graph_cmpr.png')

The final code will be something like this:


import random

import matplotlib.pyplot as plt

from statsmodels.tsa.stattools import adfuller

import numpy as np

data = [random.randrange(5, 20) for x in range(10)]

# Plot the original data

plt.plot(data)

# Run the Augmented Dickey-Fuller test

adf_result = adfuller(data)

# Stop here if the series already looks stationary (p-value <= 0.05)

if adf_result[1] <= 0.05:

    print("The p-value is at or below the 0.05 significance level, so the data already looks stationary and needs no transformation.")

    raise SystemExit  # end the script early

# Apply the log transformation to the data

log_data = np.log(data)

# Plot the log-transformed data on the same axes

plt.plot(log_data)

# Save the plot of both lines for comparison

plt.savefig('Graph_cmpr.png')

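The final graph that was generated by executing the code in the above case was:

[Image: line plot comparing the original data with its logarithmic transformation]

The blue graph line is for the actual value of the data points, while the orange graph line is the logarithmic transformation of the data points. The log-transformed series varies over a much narrower range, which is why the transformation is used to stabilize variance. A log transformation alone does not remove a trend. For that, the series is differenced, and the result can be checked again with the ADF test.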
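
The log transformation in the code above does not take a difference. As a minimal sketch of first-order differencing, np.diff from NumPy subtracts each value from the one that follows it. The small trending series here is an assumed example, chosen so the result is easy to check by hand.

```python
import numpy as np

# Assumed example: a series with a steady upward trend
data = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0])

# First-order difference: each value minus the previous one
first_diff = np.diff(data)
print(first_diff)  # [2. 2. 2. 2. 2.]

# Differencing the log-transformed series instead gives approximate relative changes
log_diff = np.diff(np.log(data))
print(log_diff)
```

The differenced series is one element shorter than the original, because the first value has no predecessor. It can be passed to adfuller again to check whether the trend is gone.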

Second-order time series differencing

In the case where the first-order difference of the time series is still not stationary, we can apply second-order differencing, which takes the difference of the first-order differences, to make the time series stationary for further analysis. Usually, first-order time series differencing is sufficient for most cases, but when the trend itself changes over time, as with a curved or accelerating trend, second-order differencing can be used.
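
As a sketch of the idea, assume a series with a quadratic trend. Its first-order difference still trends upward, while its second-order difference is constant. The example uses np.diff with n=2, which applies the difference twice.

```python
import numpy as np

# Assumed example: a quadratic trend 0, 1, 4, 9, ...
t = np.arange(8)
data = t ** 2

first_diff = np.diff(data)        # still rising: the trend is not removed yet
second_diff = np.diff(data, n=2)  # the difference of the differences

print(first_diff)   # [ 1  3  5  7  9 11 13]
print(second_diff)  # [2 2 2 2 2 2]
```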

Final thoughts

This post gives an introduction to time series analysis and the difference between stationary and non-stationary time series. It also introduced testing for non-stationary time series using the Dickey-Fuller test. Then it covered the logarithmic transformation along with first-order and second-order differencing. Finally, it showed how they can be used to work toward a stationary representation of the data.