Forecasting Time Series

February 7, 2017

Dow Jones Industrial Average

Log(Dow)

Take logs to adjust for exponential growth and level-dependent volatility.
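A minimal sketch of why the log transform helps, using a simulated series rather than the Dow data. The growth rate `mu`, the volatility `sigma`, and the starting level of 40 are made-up values chosen only for illustration.

```r
# Simulated stand-in for a price index; mu and sigma are illustrative, not estimated.
set.seed(1)
n     <- 5000
mu    <- 5e-4   # assumed average daily log growth
sigma <- 0.01   # assumed daily volatility of log changes
price <- 40 * exp(cumsum(rnorm(n, mean = mu, sd = sigma)))

# Spread of day-to-day changes in the first and second half of the sample.
half       <- n / 2
sd_level_1 <- sd(diff(price[1:half]))
sd_level_2 <- sd(diff(price[(half + 1):n]))
sd_log_1   <- sd(diff(log(price[1:half])))
sd_log_2   <- sd(diff(log(price[(half + 1):n])))

# Raw changes scale with the level of the series; log changes keep a roughly
# constant spread, so the second ratio should sit close to 1.
c(level_ratio = sd_level_2 / sd_level_1, log_ratio = sd_log_2 / sd_log_1)
```

In this setup the spread of raw changes tracks the level of the series, while the spread of log changes is about the same in both halves.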

Stylized Fact #1

Many datasets exhibit exponential growth and level-dependent volatility.

Regress Log(Dow) on Time

Call:
lm(formula = log_dow ~ time)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.08022 -0.40173 -0.02597  0.42103  1.31975 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.708e+00  5.667e-03   654.5   <2e-16 ***
time        2.079e-04  3.586e-07   579.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4687 on 27367 degrees of freedom
Multiple R-squared:  0.9247,    Adjusted R-squared:  0.9247 
F-statistic: 3.36e+05 on 1 and 27367 DF,  p-value: < 2.2e-16
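One way to read the fitted coefficients: in a regression of a log level on time, the slope is the average growth rate per time step. The sketch below copies the two estimates from the output above. It assumes that `time` counts trading days and that a year has about 252 of them; neither is stated in the output.

```r
# Coefficients copied from the regression output above.
intercept <- 3.708
slope     <- 2.079e-04

# A slope b in log(y) = a + b * time means y grows by exp(b) - 1 per time step.
growth_per_step <- exp(slope) - 1

# Assumption: `time` counts trading days and a year has about 252 of them.
steps_per_year  <- 252
growth_per_year <- exp(slope * steps_per_year) - 1

# Fitted level at time = 0, back on the original (unlogged) scale.
fitted_start <- exp(intercept)

c(per_step = growth_per_step, per_year = growth_per_year, start_level = fitted_start)
```

Under the 252-day assumption, the trend line corresponds to growth of roughly 5.4% per year.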

Log(Dow) vs. Time, with Fit

Today's vs. Yesterday's Log(Dow)

Regress Today on Yesterday

Call:
lm(formula = log_dow ~ lag1_log_dow)

Residuals:
        Min          1Q      Median          3Q         Max 
-0.25654574 -0.00470490  0.00025780  0.00508635  0.14253642 

Coefficients:
                Estimate  Std. Error     t value Pr(>|t|)    
(Intercept)  1.57303e-04 2.63292e-04     0.59745  0.55021    
lag1_log_dow 1.00001e+00 3.88805e-05 25720.06238  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0109845 on 27366 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.999959,  Adjusted R-squared:  0.999959 
F-statistic: 6.61522e+08 on 1 and 27366 DF,  p-value: < 2.22e-16

Slope significantly different from 1?

                 Estimate   Std. Error      t value  Pr(>|t|)
(Intercept)  0.0001573034 0.0002632925 5.974476e-01 0.5502135
lag1_log_dow 1.0000089250 0.0000388805 2.572006e+04 0.0000000

T statistic:

[1] 0.2295497

95% Confidence Interval:

                2.5 %  97.5 %
lag1_log_dow 0.999933 1.00009
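The T statistic and interval above can be reproduced from the coefficient table alone. This is a sketch of the arithmetic, not necessarily the code used to produce the slide; the numbers are copied from the table, so the last digits can differ from values computed at full precision.

```r
# Estimate and standard error copied from the coefficient table above.
est <- 1.0000089250
se  <- 0.0000388805
df  <- 27366  # residual degrees of freedom reported by lm

# t statistic for H0: slope = 1 (summary() reports the test of slope = 0 instead).
t_stat  <- (est - 1) / se
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval from the t distribution.
ci <- est + c(-1, 1) * qt(0.975, df) * se

t_stat
p_value
ci
```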

We probably shouldn't trust these results. (Why?)

Stylized Fact #2

Nearby observations are highly correlated with each other.
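A simulated illustration of this fact: a pure random walk, whose increments are independent noise, produces the same picture of a slope near 1 and an R-squared near 1 when today's value is regressed on yesterday's. The series, seed, and noise scale below are arbitrary choices, not the Dow data.

```r
set.seed(42)
n <- 10000
# Pure random walk: each value is yesterday's value plus independent noise.
walk      <- cumsum(rnorm(n, sd = 0.01))
lag1_walk <- c(NA, head(walk, -1))  # yesterday's value, aligned with today's

fit <- lm(walk ~ lag1_walk)  # lm drops the first row, where the lag is NA
coef(fit)
summary(fit)$r.squared

# The increments themselves carry essentially no lag-1 correlation.
steps    <- diff(walk)
step_cor <- cor(steps[-1], head(steps, -1))
step_cor
```

A high R-squared in the levels regression therefore says little about how predictable the day-to-day changes are.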

Log(Dow) vs. Time (again)

Histogram of Log(Dow)

Problem: the series is not ergodic; this histogram tells us nothing about the generative process.

Return: Today's minus Yesterday's Log(Dow)
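A small sketch of the return calculation, assuming a vector of closing levels. The four values in `dow` are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical closing levels, for illustration only.
dow     <- c(100, 102, 101, 105)
log_dow <- log(dow)

# Return = today's log level minus yesterday's; the first day has no yesterday.
ret_dow <- c(NA, diff(log_dow))

# Equivalent form: log of the ratio of consecutive levels.
ratio_form <- c(NA, log(dow[-1] / dow[-length(dow)]))

# For small moves the log return is close to the simple percentage change.
simple_ret <- c(NA, diff(dow) / dow[-length(dow)])

cbind(dow, ret_dow, simple_ret)
```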

Predicting Returns

Regress Today's on Yesterday's Return

Call:
lm(formula = ret_dow ~ lag1_ret_dow)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.255316 -0.004710  0.000247  0.005095  0.141891 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.088e-04  6.638e-05   3.146  0.00166 ** 
lag1_ret_dow 2.575e-02  6.042e-03   4.262 2.03e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01098 on 27365 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.0006634, Adjusted R-squared:  0.0006269 
F-statistic: 18.17 on 1 and 27365 DF,  p-value: 2.032e-05
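A sketch of how a regression like this can be set up, run on simulated log levels rather than the Dow data. The names `log_x`, `ret`, and `lag1_ret` are illustrative. It also shows where the two deleted observations come from: differencing leaves the first return undefined, and lagging leaves one more row incomplete.

```r
set.seed(7)
# Simulated log levels standing in for log_dow (not the real data).
log_x <- cumsum(rnorm(1000, mean = 2e-4, sd = 0.01))

ret      <- c(NA, diff(log_x))      # first return is undefined
lag1_ret <- c(NA, head(ret, -1))    # yesterday's return; adds a second NA

fit <- lm(ret ~ lag1_ret)           # default na.action drops incomplete rows

# Number of rows dropped because of the NAs introduced above.
n_dropped <- length(ret) - nobs(fit)
n_dropped

# With a single predictor, R-squared equals the squared correlation.
r2 <- summary(fit)$r.squared
all.equal(r2, cor(ret, lag1_ret, use = "complete.obs")^2)
```

The same single-predictor structure explains why the reported F statistic, 18.17, is the square of the slope's t value, 4.262, up to rounding.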

Stylized Fact #2 (cont.)

Nearby observations are highly correlated with each other. Differences (returns) are only weakly correlated and revert quickly to their mean.

Histogram of Returns

Normal Probability Plot of Returns

Kurtosis and Anderson-Darling (AD) Test of Returns

library(e1071) # assumed source of kurtosis(); its default estimator reports excess kurtosis
kurtosis(ret_dow, na.rm=TRUE) # excess kurtosis, 0 for Gaussian

[1] 22.68784

library(nortest) # provides ad.test()
ad.test(ret_dow)

    Anderson-Darling normality test

data:  ret_dow
A = 519.86, p-value < 2.2e-16

Stylized Fact #3

Returns exhibit leptokurtosis (heavy tails).
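To make the kurtosis figure concrete, here is a hand-rolled moment estimator of excess kurtosis in base R, applied to simulated Gaussian and Student t draws. The helper `excess_kurtosis` is illustrative; package implementations may apply small-sample corrections, so their values can differ slightly.

```r
# Moment-based excess kurtosis: fourth central moment over squared variance, minus 3.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

set.seed(123)
gauss <- rnorm(1e5)        # Gaussian draws: excess kurtosis near 0
heavy <- rt(1e5, df = 5)   # Student t with 5 df: heavier tails than the Gaussian

excess_kurtosis(gauss)
excess_kurtosis(heavy)
```

A value well above 0, like the one reported for the returns, indicates far more extreme observations than a Gaussian with the same variance would produce.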

Return vs. Time (again)

Persistent volatility?

Volatility (Squared Return)

Volatility (Absolute Return)

Today's vs. Yesterday's Abs. Return

Call:
lm(formula = abs_ret_dow ~ lag1_abs_ret_dow)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.031320 -0.004634 -0.001939  0.002399  0.238004 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      5.216e-03  6.378e-05   81.78   <2e-16 ***
lag1_abs_ret_dow 2.781e-01  5.805e-03   47.91   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.007946 on 27365 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.0774,    Adjusted R-squared:  0.07737 
F-statistic:  2296 on 1 and 27365 DF,  p-value: < 2.2e-16

Stylized Fact #4

Volatility is persistent.
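One standard way to generate this pattern is a GARCH(1,1) process, sketched below on simulated data. The slides do not name a volatility model, so this is a swapped-in illustration, and the parameters `omega`, `alpha`, and `beta` are assumed values rather than estimates from the Dow.

```r
set.seed(2017)
n     <- 20000
# Assumed GARCH(1,1) parameters, chosen only for illustration.
omega <- 1e-6
alpha <- 0.10
beta  <- 0.85

ret    <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)  # start at the unconditional variance
ret[1]    <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # Today's variance depends on yesterday's squared return and variance.
  sigma2[t] <- omega + alpha * ret[t - 1]^2 + beta * sigma2[t - 1]
  ret[t]    <- sqrt(sigma2[t]) * rnorm(1)
}

# Lag-1 correlation of returns versus absolute returns.
lag1_cor <- function(x) cor(x[-1], x[-length(x)])
c(returns = lag1_cor(ret), abs_returns = lag1_cor(abs(ret)))
```

In such a process the returns themselves are close to uncorrelated, while their absolute values are clearly positively correlated from one day to the next.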

Stylized Facts

1. Many datasets exhibit exponential growth and level-dependent volatility.
2. Nearby observations are highly correlated with each other.
3. Returns exhibit leptokurtosis (heavy tails).
4. Volatility is persistent.