February 7, 2017
Take logs to adjust for exponential growth and level-dependent volatility.
Many datasets exhibit exponential growth and level-dependent volatility.
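A minimal sketch of why logs help, using simulated data rather than the Dow series. The names `price` and `log_price` and the drift and volatility values are illustrative assumptions, not taken from the source. A geometric random walk grows exponentially and its step-to-step changes scale with the level; after taking logs, the changes have a roughly constant spread.

```r
set.seed(1)
n <- 10000

# Assumed parameters: positive drift, constant volatility in log space
log_ret   <- rnorm(n, mean = 5e-4, sd = 0.005)
log_price <- log(50) + cumsum(log_ret)
price     <- exp(log_price)

# Compare the spread of changes in the first and second half of the sample
half <- n / 2
sd_raw_early <- sd(diff(price[1:half]))
sd_raw_late  <- sd(diff(price[(half + 1):n]))
sd_log_early <- sd(diff(log_price[1:half]))
sd_log_late  <- sd(diff(log_price[(half + 1):n]))

# Raw changes: spread grows with the level. Log changes: spread is stable.
ratio_raw <- sd_raw_late / sd_raw_early
ratio_log <- sd_log_late / sd_log_early
print(c(ratio_raw = ratio_raw, ratio_log = ratio_log))
```

A `ratio_raw` well above 1 alongside a `ratio_log` near 1 is the pattern that motivates working with the log series.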
Call:
lm(formula = log_dow ~ time)

Residuals:
     Min       1Q   Median       3Q      Max
-1.08022 -0.40173 -0.02597  0.42103  1.31975

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.708e+00  5.667e-03   654.5   <2e-16 ***
time        2.079e-04  3.586e-07   579.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4687 on 27367 degrees of freedom
Multiple R-squared:  0.9247,    Adjusted R-squared:  0.9247
F-statistic: 3.36e+05 on 1 and 27367 DF,  p-value: < 2.2e-16
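In a log-linear trend fit, the slope is a growth rate per unit of `time`. The following worked check uses only the two coefficients printed above; the units of `time` are not stated in the source, so "per step" is left generic, and the variable names are illustrative.

```r
intercept <- 3.708      # reported (Intercept)
slope     <- 2.079e-04  # reported coefficient on time

# Fitted trend on the original scale is exp(intercept + slope * time),
# so each unit of time multiplies the fitted level by exp(slope).
growth_per_step <- expm1(slope)     # proportional growth per unit of time
doubling_steps  <- log(2) / slope   # units of time for the trend to double

print(growth_per_step)
print(doubling_steps)
```

Under this fit the trend grows by about 0.021% per step and doubles roughly every 3,334 steps.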
Call:
lm(formula = log_dow ~ lag1_log_dow)

Residuals:
        Min          1Q      Median          3Q         Max
-0.25654574 -0.00470490  0.00025780  0.00508635  0.14253642

Coefficients:
                Estimate  Std. Error     t value Pr(>|t|)
(Intercept)  1.57303e-04 2.63292e-04     0.59745  0.55021
lag1_log_dow 1.00001e+00 3.88805e-05 25720.06238  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0109845 on 27366 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.999959,  Adjusted R-squared:  0.999959
F-statistic: 6.61522e+08 on 1 and 27366 DF,  p-value: < 2.22e-16
                 Estimate   Std. Error      t value  Pr(>|t|)
(Intercept)  0.0001573034 0.0002632925 5.974476e-01 0.5502135
lag1_log_dow 1.0000089250 0.0000388805 2.572006e+04 0.0000000
t statistic for testing whether the slope on lag1_log_dow equals 1:
[1] 0.2295497
95% Confidence Interval:
                2.5 %  97.5 %
lag1_log_dow 0.999933 1.00009
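The t statistic and interval above can be recovered from the reported slope estimate, standard error and residual degrees of freedom. This is a sketch of the conventional calculation, which takes the usual t-distribution theory at face value; the variable names are illustrative.

```r
est <- 1.0000089250   # reported slope on lag1_log_dow
se  <- 0.0000388805   # reported standard error
df  <- 27366          # reported residual degrees of freedom

# Test H0: slope = 1, rather than the default H0: slope = 0
t_stat <- (est - 1) / se

# Conventional 95% interval: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se

print(t_stat)
print(ci)
```

The default `t value` column in the regression output tests a slope of 0, which is why the statistic has to be recomputed against 1 for this question.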
We probably shouldn't trust these results. (Why?)
Nearby observations are highly correlated with each other.
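One standard reason for distrust, offered here as an assumption about what the "(Why?)" is pointing at: when the true slope is exactly 1 (a random walk), the t statistic for "slope = 1" does not follow the usual t distribution. The simulation below is a sketch with assumed sample sizes; it fits the same lagged regression to many simulated random walks and looks at where the t statistics fall.

```r
set.seed(42)
n_obs  <- 500   # length of each simulated series (assumed)
n_sims <- 500   # number of simulated series (assumed)

t_stats <- replicate(n_sims, {
  y    <- cumsum(rnorm(n_obs))   # pure random walk: true slope is exactly 1
  y_t  <- y[-1]                  # current value
  y_l1 <- y[-n_obs]              # lagged value
  cf   <- summary(lm(y_t ~ y_l1))$coefficients
  (cf["y_l1", "Estimate"] - 1) / cf["y_l1", "Std. Error"]
})

# Under standard theory these would be centred on 0, and about 5%
# would fall below qnorm(0.05).
print(mean(t_stats))
print(mean(t_stats < qnorm(0.05)))
```

In runs like this the statistics are shifted well below zero and far more than 5% fall in the nominal lower tail, so p-values and intervals based on the ordinary t distribution are unreliable for this regression.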
Problem: the series is not ergodic; this histogram tells us nothing about the generative process.
Call:
lm(formula = ret_dow ~ lag1_ret_dow)

Residuals:
      Min        1Q    Median        3Q       Max
-0.255316 -0.004710  0.000247  0.005095  0.141891

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.088e-04  6.638e-05   3.146  0.00166 **
lag1_ret_dow 2.575e-02  6.042e-03   4.262 2.03e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01098 on 27365 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.0006634, Adjusted R-squared:  0.0006269
F-statistic: 18.17 on 1 and 27365 DF,  p-value: 2.032e-05
Nearby observations of the level are highly correlated with each other. Differences (returns) are mean-reverting: the slope on the lagged return is only about 0.026.
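A sketch of how a return series and its lag can be built for a regression like the one above. It assumes returns are first differences of the log series, which is consistent with the "2 observations deleted due to missingness" note; the data are simulated and the names `log_p`, `ret` and `lag1_ret` are stand-ins for the source's variables.

```r
set.seed(7)
# Stand-in for the log price series; the source's log_dow is not reproduced here
log_p <- cumsum(rnorm(1000, mean = 2e-4, sd = 0.01))

ret      <- c(NA, diff(log_p))      # log return; the first value is undefined
lag1_ret <- c(NA, head(ret, -1))    # previous period's return

fit <- lm(ret ~ lag1_ret)           # lm drops rows containing NA by default
print(coef(summary(fit)))
print(nobs(fit))
```

Differencing costs one observation and lagging costs another, so the fit uses two fewer rows than the original series.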
kurtosis(ret_dow, na.rm=TRUE) # excess kurtosis, 0 for Gaussian
[1] 22.68784
ad.test(ret_dow)
        Anderson-Darling normality test

data:  ret_dow
A = 519.86, p-value < 2.2e-16
Returns exhibit leptokurtosis (heavy tails).
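For reference, excess kurtosis can be computed directly in base R. The helper `excess_kurtosis` below is a hypothetical name and uses the plain moment estimator; package implementations may apply small-sample corrections and differ slightly. The Student t sample is only a heavy-tailed stand-in, not a model taken from the source.

```r
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3   # fourth standardized moment minus 3
}

set.seed(3)
gauss <- rnorm(1e5)          # Gaussian benchmark: excess kurtosis near 0
heavy <- rt(1e5, df = 5)     # Student t with 5 df: heavy tails

print(excess_kurtosis(gauss))
print(excess_kurtosis(heavy))
```

A clearly positive value, as in the heavy-tailed sample, means extreme observations occur more often than a Gaussian with the same variance would produce.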
Persistent volatility?
Call:
lm(formula = abs_ret_dow ~ lag1_abs_ret_dow)

Residuals:
      Min        1Q    Median        3Q       Max
-0.031320 -0.004634 -0.001939  0.002399  0.238004

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.216e-03  6.378e-05   81.78   <2e-16 ***
lag1_abs_ret_dow 2.781e-01  5.805e-03   47.91   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.007946 on 27365 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.0774,    Adjusted R-squared:  0.07737
F-statistic: 2296 on 1 and 27365 DF,  p-value: < 2.2e-16
Volatility is persistent.
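To see what persistent volatility looks like in a regression of absolute returns on their lag, the sketch below simulates returns whose variance depends on the recent past. The GARCH(1,1) recursion and its parameter values are assumptions chosen for illustration; the source does not name a volatility model.

```r
set.seed(11)
n     <- 5000
omega <- 1e-6    # assumed GARCH(1,1) parameters, for illustration only
alpha <- 0.10
beta  <- 0.85

sigma2 <- numeric(n)
ret    <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)   # start at the unconditional variance
ret[1]    <- sqrt(sigma2[1]) * rnorm(1)
for (i in 2:n) {
  # Today's variance depends on yesterday's squared return and variance
  sigma2[i] <- omega + alpha * ret[i - 1]^2 + beta * sigma2[i - 1]
  ret[i]    <- sqrt(sigma2[i]) * rnorm(1)
}

abs_ret      <- abs(ret)
lag1_abs_ret <- c(NA, head(abs_ret, -1))
fit_abs      <- lm(abs_ret ~ lag1_abs_ret)
slope_abs    <- unname(coef(fit_abs)["lag1_abs_ret"])
print(slope_abs)
```

The simulated returns are serially uncorrelated by construction, yet their absolute values are positively related to the previous absolute value, which is the same qualitative pattern as the positive slope on `lag1_abs_ret_dow`.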
Many datasets exhibit exponential growth and level-dependent volatility.
Nearby observations are correlated with each other.
Returns exhibit leptokurtosis.
Volatility is persistent.