Exponential smoothing, or exponential moving average (EMA), is a rule-of-thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for the analysis of time-series data.
Exponential smoothing is one of many window functions commonly applied to smooth data in signal processing, acting as a low-pass filter to remove high-frequency noise. The method was preceded by Poisson's use of recursive exponential window functions in convolutions in the 19th century, as well as Kolmogorov and Zurbenko's use of recursive moving averages in their studies of turbulence in the 1940s.
The raw data sequence is often represented by \{x_t\} beginning at time t = 0, and the output of the exponential smoothing algorithm is commonly written as \{s_t\}, which may be regarded as a best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulas:
\begin{align}
s_0 &= x_0\\
s_t &= \alpha x_t + (1-\alpha) s_{t-1}, \quad t > 0
\end{align}
where \alpha is the smoothing factor, and 0 < \alpha < 1. If s_{t-1} is substituted into s_t repeatedly, so that s_t is fully expressed in terms of the raw data x_t, x_{t-1}, \ldots, x_0, exponentially decaying weighting factors on each raw observation are revealed, which is how exponential smoothing gets its name.
Simple exponential smoothing is not able to predict what would be observed at time t + m based on the raw data up to time t, while double exponential smoothing and triple exponential smoothing can be used for prediction due to the presence of the trend term b_t.
The use of the exponential window function is first attributed to Poisson[1] as an extension of a numerical analysis technique from the 17th century, and was later adopted by the signal processing community in the 1940s. Here, exponential smoothing is the application of the exponential, or Poisson, window function. Exponential smoothing was first suggested in the statistical literature without citation to previous work by Robert Goodell Brown in 1956,[2] and then expanded by Charles C. Holt in 1957.[3] The formulation below, which is the one commonly used, is attributed to Brown and is known as "Brown's simple exponential smoothing".[4] All the methods of Holt, Winters and Brown may be seen as a simple application of recursive filtering, first found in the 1940s,[1] to convert finite impulse response (FIR) filters to infinite impulse response (IIR) filters.
The simplest form of exponential smoothing is given by the formula:
s_t = \alpha x_t + (1-\alpha) s_{t-1} = s_{t-1} + \alpha (x_t - s_{t-1}),
where:

\alpha is the smoothing factor, and 0 \le \alpha \le 1;
s_t is the smoothed statistic at time t;
x_t is the current raw observation; and
s_{t-1} is the previous smoothed statistic.

Larger values of \alpha reduce the level of smoothing and give greater weight to recent changes in the data; in the limiting case \alpha = 1 the output series is just the current observation. Values of \alpha closer to 0 have a greater smoothing effect and are less responsive to recent changes.

There is no formally correct procedure for choosing \alpha. Sometimes the statistician's judgment is used to choose an appropriate factor, based on prior knowledge of the data. Alternatively, a statistical technique may be used to optimize the value of \alpha; for example, the method of least squares might be used to determine the value of \alpha for which the sum of the quantities (s_t - x_{t+1})^2 is minimized.
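To make the recursion concrete, here is a minimal Python sketch of simple exponential smoothing; the function name and the sample data are hypothetical, not from the original text:

```python
def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    s = [x[0]]  # initialize with the first observation
    for t in range(1, len(x)):
        s.append(alpha * x[t] + (1 - alpha) * s[-1])
    return s

# A small alpha smooths heavily; a large alpha tracks the data closely.
data = [3.0, 10.0, 12.0, 13.0, 12.0, 10.0, 12.0]
print(exponential_smoothing(data, alpha=0.5))
```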
Unlike some other smoothing methods, such as the simple moving average, this technique does not require any minimum number of observations to be made before it begins to produce results. In practice, however, a "good average" will not be achieved until several samples have been averaged together; for example, a constant signal will take approximately 3/\alpha stages to reach 95% of the actual value.
This simple form of exponential smoothing is also known as an exponentially weighted moving average (EWMA). Technically it can also be classified as an autoregressive integrated moving average ARIMA(0,1,1) model with no constant term.[6]
The time constant of an exponential moving average is the amount of time for the smoothed response of a unit step function to reach 1 - 1/e \approx 63.2\% of the original signal. The relationship between this time constant \tau and the smoothing factor \alpha is given by the formula

\alpha = 1 - e^{-\Delta T / \tau}, \quad \text{thus} \quad \tau = -\frac{\Delta T}{\ln(1 - \alpha)},

where \Delta T is the sampling time interval of the discrete-time implementation. If the sampling time is fast compared to the time constant (\Delta T \ll \tau), then

\alpha \approx \frac{\Delta T}{\tau}.
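As a small illustration of these relations (the sampling interval and time constant below are hypothetical values):

```python
import math

dt, tau = 0.1, 1.0                  # hypothetical sampling interval and time constant
alpha = 1 - math.exp(-dt / tau)     # exact relation alpha = 1 - exp(-dt/tau)
print(round(alpha, 4), dt / tau)    # 0.0952 vs the approximation 0.1, since dt << tau
```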
Note that in the definition above, s_0 is being initialized to x_0. Because exponential smoothing requires that at each stage we have the previous smoothed statistic s_{t-1}, it is not obvious how to get the method started. Exponential smoothing puts substantial weight on past observations, especially for small \alpha, so the initial value can have an unreasonably large effect on early smoothed values. This problem can be mitigated by setting s_0 to the average of the first several observations instead.
For every exponential smoothing method, we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (α), but for the methods that follow there is usually more than one smoothing parameter.
There are cases where the smoothing parameters may be chosen in a subjective manner – the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values of the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the sum of squared errors (SSE). The errors are specified as e_t = y_t - \hat{y}_{t \mid t-1} for t = 1, \ldots, T (the one-step-ahead within-sample forecast errors), where y_t and \hat{y}_{t \mid t-1} are the observed value and the forecast of the variable at time t, respectively. Hence, we find the values of the unknown parameters and the initial values that minimize

\mathrm{SSE} = \sum_{t=1}^{T} \left( y_t - \hat{y}_{t \mid t-1} \right)^2 = \sum_{t=1}^{T} e_t^2.
Unlike the regression case (where we have formulae to directly compute the regression coefficients which minimize the SSE), this involves a non-linear minimization problem, and we need to use an optimization tool to perform it.
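For simple exponential smoothing this reduces to a one-dimensional search over α, which any generic numerical optimizer can handle. The sketch below is an illustration, assuming SciPy's minimize_scalar is available and treating s_{t-1} as the one-step-ahead forecast of x_t; the data values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sse(alpha, x):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    s = x[0]
    total = 0.0
    for t in range(1, len(x)):
        total += (x[t] - s) ** 2              # forecast of x[t] is s[t-1]
        s = alpha * x[t] + (1 - alpha) * s    # update the smoothed statistic
    return total

x = np.array([3.0, 10.0, 12.0, 13.0, 12.0, 10.0, 12.0])  # hypothetical data
res = minimize_scalar(sse, bounds=(0.0, 1.0), args=(x,), method="bounded")
print("alpha minimizing SSE:", res.x)
```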
The name 'exponential smoothing' is attributed to the use of the exponential window function during convolution; it is not derived from the names of Holt, Winters, or Brown.
By direct substitution of the defining equation for simple exponential smoothing back into itself we find that
\begin{align}
s_t &= \alpha x_t + (1-\alpha) s_{t-1}\\[3pt]
&= \alpha x_t + \alpha(1-\alpha) x_{t-1} + (1-\alpha)^2 s_{t-2}\\[3pt]
&= \alpha \left[ x_t + (1-\alpha) x_{t-1} + (1-\alpha)^2 x_{t-2} + (1-\alpha)^3 x_{t-3} + \cdots + (1-\alpha)^{t-1} x_1 \right] + (1-\alpha)^t x_0.
\end{align}
In other words, as time passes the smoothed statistic s_t becomes the weighted average of a greater and greater number of the past observations x_t, x_{t-1}, \ldots, x_{t-n}, \ldots, and the weights assigned to previous observations are proportional to the terms of the geometric progression 1, (1-\alpha), (1-\alpha)^2, \ldots, (1-\alpha)^n, \ldots
A geometric progression is the discrete version of an exponential function, which is where the name of this smoothing method originates.
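As a quick sanity check on this expansion (an addition here, not part of the original exposition), the weights do sum to 1 for every t, since the geometric series telescopes:

\begin{align}
\alpha \sum_{i=0}^{t-1} (1-\alpha)^i + (1-\alpha)^t
&= \alpha \cdot \frac{1 - (1-\alpha)^t}{1 - (1-\alpha)} + (1-\alpha)^t\\
&= \left( 1 - (1-\alpha)^t \right) + (1-\alpha)^t = 1,
\end{align}

so the smoothed statistic is a genuine weighted average of the past observations.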
Exponential smoothing and moving average have a similar defect: both introduce a lag relative to the input data. While this can be corrected by shifting the result by half the window length for a symmetrical kernel, such as a moving average or Gaussian, it is unclear how appropriate this would be for exponential smoothing. They also both have roughly the same distribution of forecast error when α = 2/(k + 1), where k is the number of past data points considered by the moving average. They differ in that exponential smoothing takes into account all past data, whereas moving average only takes into account k past data points. Computationally speaking, they also differ in that moving average requires the past k data points (or the data point at lag k + 1 plus the most recent forecast value) to be kept, whereas exponential smoothing only needs the most recent forecast value to be kept.[10]
In the signal processing literature, the use of non-causal (symmetric) filters is commonplace, and the exponential window function is broadly used in this fashion, but a different terminology is used: exponential smoothing is equivalent to a first-order infinite-impulse response (IIR) filter and moving average is equivalent to a finite impulse response filter with equal weighting factors.
Simple exponential smoothing does not do well when there is a trend in the data.[11] In such situations, several methods were devised under the name "double exponential smoothing" or "second-order exponential smoothing", which is the recursive application of an exponential filter twice, hence the name.[12] The basic idea behind double exponential smoothing is to introduce a term that takes into account the possibility of a series exhibiting some form of trend. This slope component is itself updated via exponential smoothing.
One method works as follows:[13]
Again, the raw data sequence of observations is represented by x_t, beginning at time t = 0. We use s_t to represent the smoothed value for time t, and b_t is our best estimate of the trend at time t. The output of the algorithm is now written as F_{t+m}, an estimate of the value of x_{t+m} for m > 0 based on the raw data up to time t. Double exponential smoothing is given by the formulas
\begin{align}
s_0 &= x_0\\
b_0 &= x_1 - x_0
\end{align}
And for t > 0 by

\begin{align}
s_t &= \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})\\
b_t &= \beta (s_t - s_{t-1}) + (1-\beta) b_{t-1}
\end{align}
where \alpha is the data smoothing factor, 0 \le \alpha \le 1, and \beta is the trend smoothing factor, 0 \le \beta \le 1.
To forecast beyond x_t:

F_{t+m} = s_t + m \cdot b_t
Setting the initial value b_0 is a matter of preference. An option other than the one listed above is (x_n - x_0)/n for some n.
Note that F_0 is undefined (there is no estimate for time 0) and, according to the definition, F_1 = s_0 + b_0, which is well defined; thus further values can be evaluated.
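As an illustration, a minimal Python sketch of this method (function name, parameter values, and sample data are hypothetical) might look like:

```python
def double_exponential_smoothing(x, alpha, beta, m):
    """Holt's double exponential smoothing; returns the m-step-ahead
    forecast F[t+m] = s[t] + m*b[t] computed from all the data."""
    s, b = x[0], x[1] - x[0]          # s0 = x0, b0 = x1 - x0
    for t in range(1, len(x)):
        s_prev = s
        s = alpha * x[t] + (1 - alpha) * (s + b)
        b = beta * (s - s_prev) + (1 - beta) * b
    return s + m * b

data = [30.0, 21.0, 29.0, 31.0, 40.0, 48.0, 53.0, 47.0, 37.0]
print(double_exponential_smoothing(data, alpha=0.9, beta=0.9, m=2))
```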
A second method, referred to as either Brown's linear exponential smoothing (LES) or Brown's double exponential smoothing, works as follows.[14]
\begin{align}
s'_0 &= x_0\\
s''_0 &= x_0\\
s'_t &= \alpha x_t + (1-\alpha) s'_{t-1}\\
s''_t &= \alpha s'_t + (1-\alpha) s''_{t-1}\\
F_{t+m} &= a_t + m b_t,
\end{align}
where a_t, the estimated level at time t, and b_t, the estimated trend at time t, are:

\begin{align}
a_t &= 2 s'_t - s''_t\\[5pt]
b_t &= \frac{\alpha}{1-\alpha} (s'_t - s''_t).
\end{align}
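A corresponding Python sketch of Brown's method (same caveats as above) applies the same filter twice:

```python
def brown_les(x, alpha, m):
    """Brown's linear (double) exponential smoothing forecast F[t+m]."""
    s1 = s2 = x[0]                             # s'0 = s''0 = x0
    for t in range(1, len(x)):
        s1 = alpha * x[t] + (1 - alpha) * s1   # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2     # second pass on the smoothed series
    a = 2 * s1 - s2                            # estimated level
    b = alpha / (1 - alpha) * (s1 - s2)        # estimated trend
    return a + m * b

data = [30.0, 21.0, 29.0, 31.0, 40.0, 48.0, 53.0, 47.0, 37.0]
print(brown_les(data, alpha=0.5, m=1))
```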
Triple exponential smoothing applies exponential smoothing three times and is commonly used when a time series exhibits both a trend and a seasonal pattern. There are two types of seasonality: 'multiplicative' and 'additive' in nature.
If every month of December we sell 10,000 more apartments than we do in November, the seasonality is additive in nature. However, if we sell 10% more apartments in the summer months than we do in the winter months, the seasonality is multiplicative in nature: it is represented as a constant factor, not an absolute amount.[15]
Triple exponential smoothing was first suggested by Holt's student, Peter Winters, in 1960,[16] and the method is accordingly often called Holt-Winters smoothing. Suppose we have a sequence of observations x_t, beginning at time t = 0, with a cycle of seasonal change of length L. The method calculates a trend line for the data as well as seasonal indices that weight the values in the trend line based on where that time point falls in the cycle of length L.
Let s_t represent the smoothed value of the constant part for time t, b_t the best estimate of the linear trend that is superimposed on the seasonal changes, and c_t the sequence of seasonal correction factors. c_t is the expected proportion of the predicted trend at any time t \bmod L in the cycle that the observations take on. As a rule of thumb, a minimum of two full seasons (or 2L periods) of historical data is needed to initialize a set of seasonal factors.
The output of the algorithm is again written as F_{t+m}, an estimate of the value of x_{t+m} at time t + m > 0 based on the raw data up to time t. Triple exponential smoothing with multiplicative seasonality is given by the formulas:
\begin{align}
s_0 &= x_0\\[5pt]
s_t &= \alpha \frac{x_t}{c_{t-L}} + (1-\alpha)(s_{t-1} + b_{t-1})\\[5pt]
b_t &= \beta (s_t - s_{t-1}) + (1-\beta) b_{t-1}\\[5pt]
c_t &= \gamma \frac{x_t}{s_t} + (1-\gamma) c_{t-L}\\[5pt]
F_{t+m} &= (s_t + m b_t) \, c_{t-L+1+(m-1) \bmod L},
\end{align}
where \alpha is the data smoothing factor, 0 \le \alpha \le 1; \beta is the trend smoothing factor, 0 \le \beta \le 1; and \gamma is the seasonal change smoothing factor, 0 \le \gamma \le 1.
The general formula for the initial trend estimate b_0 is:

\begin{align}
b_0 &= \frac{1}{L} \left( \frac{x_{L+1} - x_1}{L} + \frac{x_{L+2} - x_2}{L} + \cdots + \frac{x_{L+L} - x_L}{L} \right)
\end{align}
Setting the initial estimates for the seasonal indices c_i for i = 1, 2, \ldots, L is a bit more involved. If N is the number of complete cycles present in your data, then:

c_i = \frac{1}{N} \sum_{j=1}^{N} \frac{x_{L(j-1)+i}}{A_j} \quad \text{for } i = 1, 2, \ldots, L,

where

A_j = \frac{\sum_{i=1}^{L} x_{L(j-1)+i}}{L} \quad \text{for } j = 1, 2, \ldots, N.

Note that A_j is the average value of x in the jth cycle of your data.
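Putting the update equations and these initializations together, a compact Python sketch of the multiplicative method might look as follows; the function names are hypothetical, the data must span at least two full cycles, and forecasting is limited to one step ahead for brevity:

```python
def initial_trend(x, L):
    # b0: average of the L cycle-over-cycle slopes (general formula above)
    return sum((x[L + i] - x[i]) / L for i in range(L)) / L

def initial_seasonals(x, L):
    # c_i: observation divided by its cycle's average A_j, averaged over N cycles
    N = len(x) // L
    A = [sum(x[L * j:L * (j + 1)]) / L for j in range(N)]
    return [sum(x[L * j + i] / A[j] for j in range(N)) / N for i in range(L)]

def holt_winters_multiplicative(x, L, alpha, beta, gamma):
    # Returns one-step-ahead forecasts F[t+1]; needs len(x) >= 2*L to initialize.
    s, b = x[0], initial_trend(x, L)
    c = initial_seasonals(x, L)   # cyclic buffer of the L seasonal factors
    forecasts = []
    for t in range(1, len(x)):
        s_prev = s
        s = alpha * x[t] / c[t % L] + (1 - alpha) * (s + b)
        b = beta * (s - s_prev) + (1 - beta) * b
        c[t % L] = gamma * x[t] / s + (1 - gamma) * c[t % L]
        forecasts.append((s + b) * c[(t + 1) % L])
    return forecasts
```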
Triple exponential smoothing with additive seasonality is given by:
\begin{align}
s_0 &= x_0\\
s_t &= \alpha (x_t - c_{t-L}) + (1-\alpha)(s_{t-1} + b_{t-1})\\
b_t &= \beta (s_t - s_{t-1}) + (1-\beta) b_{t-1}\\
c_t &= \gamma (x_t - s_{t-1} - b_{t-1}) + (1-\gamma) c_{t-L}\\
F_{t+m} &= s_t + m b_t + c_{t-L+1+(m-1) \bmod L},
\end{align}
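For comparison, here is a sketch of the additive form under the same conventions as the multiplicative sketch above; the seasonal buffer c is assumed to hold L initial offsets (for example, deviations of each position in the first cycle from that cycle's average), and the initial trend is a simplified one-cycle estimate:

```python
def holt_winters_additive(x, L, alpha, beta, gamma, c):
    """One-step-ahead forecasts with additive seasonality.
    c: list of L initial seasonal offsets (hypothetical initialization)."""
    s, b = x[0], (x[L] - x[0]) / L   # simplified initial level and trend
    forecasts = []
    for t in range(1, len(x)):
        s_prev, b_prev = s, b
        s = alpha * (x[t] - c[t % L]) + (1 - alpha) * (s_prev + b_prev)
        b = beta * (s - s_prev) + (1 - beta) * b_prev
        c[t % L] = gamma * (x[t] - s_prev - b_prev) + (1 - gamma) * c[t % L]
        forecasts.append(s + b + c[(t + 1) % L])
    return forecasts
```

The only substantive differences from the multiplicative version are that the seasonal component is subtracted and added rather than divided and multiplied.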