February 7, 2017
Take logs to adjust for exponential growth and level-dependent volatility.
Many datasets exhibit exponential growth and level-dependent volatility.
Call:
lm(formula = log_dow ~ time)
Residuals:
Min 1Q Median 3Q Max
-1.08022 -0.40173 -0.02597 0.42103 1.31975
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.708e+00 5.667e-03 654.5 <2e-16 ***
time 2.079e-04 3.586e-07 579.6 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4687 on 27367 degrees of freedom
Multiple R-squared: 0.9247, Adjusted R-squared: 0.9247
F-statistic: 3.36e+05 on 1 and 27367 DF, p-value: < 2.2e-16
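A fit like the one above could be set up roughly as follows. This is a minimal sketch on simulated data, not the original index series: the starting level of 40, the drift, and the volatility are made-up values, and only the names `log_dow` and `time` come from the output.

```r
# Hypothetical stand-in for the index: a simulated series with exponential growth.
set.seed(1)
n <- 1000
time <- seq_len(n)                       # assumed time index: 1, 2, ..., n
ret <- rnorm(n, mean = 2e-4, sd = 0.01)  # assumed per-period log returns
dow <- 40 * exp(cumsum(ret))             # level: swings scale with the level itself

# Taking logs turns exponential growth into a roughly linear trend
# and makes the size of the fluctuations independent of the level.
log_dow <- log(dow)

fit_trend <- lm(log_dow ~ time)
print(summary(fit_trend))
```

On the log scale the slope reads as an approximate growth rate per unit of `time`.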
Call:
lm(formula = log_dow ~ lag1_log_dow)
Residuals:
Min 1Q Median 3Q Max
-0.25654574 -0.00470490 0.00025780 0.00508635 0.14253642
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.57303e-04 2.63292e-04 0.59745 0.55021
lag1_log_dow 1.00001e+00 3.88805e-05 25720.06238 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0109845 on 27366 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.999959, Adjusted R-squared: 0.999959
F-statistic: 6.61522e+08 on 1 and 27366 DF, p-value: < 2.22e-16
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0001573034 0.0002632925 5.974476e-01 0.5502135
lag1_log_dow 1.0000089250 0.0000388805 2.572006e+04 0.0000000
t statistic for H0: slope on lag1_log_dow = 1:
[1] 0.2295497
95% Confidence Interval:
2.5 % 97.5 %
lag1_log_dow 0.999933 1.00009
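The test statistic and interval above can be reproduced from the reported estimate and standard error alone. The sketch below assumes the statistic tests the slope against 1 rather than against 0, which is what the reported value of about 0.23 implies; the variable names are illustrative.

```r
# Values copied from the coefficient table above.
est <- 1.0000089250   # estimated slope on lag1_log_dow
se  <- 0.0000388805   # its standard error
df  <- 27366          # residual degrees of freedom

# t statistic for H0: slope = 1, instead of the default H0: slope = 0.
t_stat <- (est - 1) / se

# 95% confidence interval: estimate +/- t quantile * standard error.
ci <- est + c(-1, 1) * qt(0.975, df) * se

print(t_stat)
print(ci)
```

With a fitted model object, `confint()` on the `lm` fit returns the same kind of interval directly.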
We probably shouldn't trust these results. (Why?)
Nearby observations are highly correlated with each other.
Problem: the series is not ergodic; this histogram tells us nothing about the generative process.
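One way to see the problem is to simulate two paths from the same random walk and compare them. This is only an illustration with made-up parameters (path length and step size are assumptions), not a model of the actual index.

```r
set.seed(42)
n <- 5000

# Two independent paths from the same process: x_t = x_{t-1} + e_t.
path_a <- cumsum(rnorm(n, sd = 0.01))
path_b <- cumsum(rnorm(n, sd = 0.01))

# Summaries of the levels depend on the particular path that was drawn,
# so a histogram of one path says little about the process itself.
print(rbind(a = range(path_a), b = range(path_b)))
print(c(mean_a = mean(path_a), mean_b = mean(path_b)))

# The first differences, by contrast, have similar summaries on both paths.
print(c(sd_diff_a = sd(diff(path_a)), sd_diff_b = sd(diff(path_b))))
```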
Call:
lm(formula = ret_dow ~ lag1_ret_dow)
Residuals:
Min 1Q Median 3Q Max
-0.255316 -0.004710 0.000247 0.005095 0.141891
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.088e-04 6.638e-05 3.146 0.00166 **
lag1_ret_dow 2.575e-02 6.042e-03 4.262 2.03e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01098 on 27365 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.0006634, Adjusted R-squared: 0.0006269
F-statistic: 18.17 on 1 and 27365 DF, p-value: 2.032e-05
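Nearby observations of the log level are highly correlated with each other. The differences (returns) are only weakly autocorrelated (slope 0.026, R-squared below 0.001) and are mean-reverting.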
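The return regression above could be built along these lines. This is a sketch on a simulated log level, assuming `ret_dow` is the first difference of `log_dow` and `lag1_ret_dow` is that series shifted by one period; the slides do not show how the variables were actually constructed.

```r
set.seed(7)
log_dow <- cumsum(rnorm(500, mean = 2e-4, sd = 0.01))  # hypothetical log level

# Log return: first difference of the log level, NA-padded to keep the length.
ret_dow <- c(NA, diff(log_dow))

# One-period lag: shift the series down by one and pad with NA.
lag1_ret_dow <- c(NA, head(ret_dow, -1))

# lm() drops rows containing NA: one from differencing, one more from lagging.
fit_ret <- lm(ret_dow ~ lag1_ret_dow)
print(summary(fit_ret))
```

Under this construction two rows are lost, which is consistent with the "2 observations deleted due to missingness" note in the output.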
kurtosis(ret_dow, na.rm=TRUE) # excess kurtosis, 0 for Gaussian
[1] 22.68784
ad.test(ret_dow)
Anderson-Darling normality test
data: ret_dow
A = 519.86, p-value < 2.2e-16
Returns exhibit leptokurtosis (heavy tails).
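For reference, excess kurtosis can be computed by hand as the fourth central moment divided by the squared variance, minus 3. The helper below is a hypothetical illustration using plain moment estimators; the `kurtosis()` call above comes from a package, and its exact small-sample formula may differ slightly.

```r
# Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a Gaussian).
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]               # same effect as na.rm = TRUE
  m2 <- mean((x - mean(x))^2)     # second central moment
  m4 <- mean((x - mean(x))^4)     # fourth central moment
  m4 / m2^2 - 3
}

set.seed(3)
print(excess_kurtosis(rnorm(1e5)))        # Gaussian sample: close to 0
print(excess_kurtosis(rt(1e5, df = 5)))   # heavy-tailed sample: well above 0
```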
Persistent volatility?
Call:
lm(formula = abs_ret_dow ~ lag1_abs_ret_dow)
Residuals:
Min 1Q Median 3Q Max
-0.031320 -0.004634 -0.001939 0.002399 0.238004
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.216e-03 6.378e-05 81.78 <2e-16 ***
lag1_abs_ret_dow 2.781e-01 5.805e-03 47.91 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.007946 on 27365 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.0774, Adjusted R-squared: 0.07737
F-statistic: 2296 on 1 and 27365 DF, p-value: < 2.2e-16
Volatility is persistent.
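The same diagnostic can be tried on simulated data where volatility is persistent by construction. The sketch below uses an ARCH(1) process purely as an illustration; the model choice and the parameters `omega` and `alpha` are assumptions, not something estimated from the index.

```r
set.seed(11)
n <- 2000
omega <- 1e-5   # hypothetical baseline variance
alpha <- 0.5    # hypothetical weight on yesterday's squared return

ret <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha)           # start at the unconditional variance
ret[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # Today's variance rises after a large move yesterday, in either direction.
  sigma2[t] <- omega + alpha * ret[t - 1]^2
  ret[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# Regress absolute returns on their own lag, as in the output above.
abs_ret <- abs(ret)
lag1_abs_ret <- c(NA, head(abs_ret, -1))
fit_vol <- lm(abs_ret ~ lag1_abs_ret)
print(coef(fit_vol))
```

A clearly positive slope on the lagged absolute return is the signature of volatility clustering: large moves tend to follow large moves.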
Many datasets exhibit exponential growth and level-dependent volatility.
Nearby observations are correlated with each other.
Returns exhibit leptokurtosis.
Volatility is persistent.