Linear Regression

Fit a linear regression model with response sqrt(passing.distance) and predictors helmet, vehicle, and kerb. lm() includes an intercept by default.

bikedata <- read.csv("bikedata.csv")
model <- lm(sqrt(passing.distance) ~ helmet + vehicle + kerb,
            data = bikedata)
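As a rough illustration of what the formula expands to, the sketch below builds the design matrix for a tiny made-up data frame. The data frame `toy`, its values, and the level name "Bus" are hypothetical placeholders and are not taken from bikedata.csv. It assumes R's default treatment contrasts, under which each factor's first level (in sorted order) is absorbed into the intercept.

```r
# Toy data, invented for illustration only (not the real bikedata values).
toy <- data.frame(
  helmet  = factor(c("N", "Y", "N", "Y")),
  vehicle = factor(c("Bus", "Car", "HGV", "Car")),  # "Bus" is a placeholder level
  kerb    = c(0.25, 0.5, 0.75, 1)
)

# model.matrix() shows the columns lm() would estimate coefficients for:
# an intercept column, one 0/1 indicator per non-baseline factor level,
# and the numeric predictor unchanged.
X <- model.matrix(~ helmet + vehicle + kerb, data = toy)
colnames(X)
X
```

The column names follow the pattern factor name plus level name, which is why the fitted model reports terms such as helmetY and vehicleCar rather than helmet and vehicle.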

Get the coefficient estimates, their standard errors, t statistics, and p-values.

summary(model)

Call:
lm(formula = sqrt(passing.distance) ~ helmet + vehicle + kerb, 
    data = bikedata)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.6761 -0.0932 -0.0063  0.0882  0.7455 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.21756    0.02204   55.26  < 2e-16 ***
helmetY     -0.02255    0.00603   -3.74  0.00019 ***
vehicleCar   0.11207    0.02164    5.18  2.4e-07 ***
vehicleHGV   0.04895    0.02674    1.83  0.06726 .  
vehicleLGV   0.09134    0.02299    3.97  7.3e-05 ***
vehiclePTW   0.09773    0.03275    2.98  0.00287 ** 
vehicleSUV   0.11193    0.02456    4.56  5.4e-06 ***
vehicleTaxi  0.06884    0.02977    2.31  0.02083 *  
kerb        -0.10326    0.00850  -12.15  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.145 on 2346 degrees of freedom
Multiple R-squared:  0.0906,	Adjusted R-squared:  0.0875 
F-statistic: 29.2 on 8 and 2346 DF,  p-value: <2e-16
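To see how the columns of the table relate to one another, the following sketch recomputes the t statistic and two-sided p-value for helmetY, and the overall F statistic, from the rounded numbers printed above. The variable names are illustrative. Because the inputs are rounded, the results should agree with the printed output only up to rounding.

```r
# Values copied from the printed summary (rounded as shown there).
estimate <- -0.02255   # helmetY coefficient estimate
std_err  <-  0.00603   # its standard error
df_resid <-  2346      # residual degrees of freedom

# t statistic: estimate divided by its standard error.
t_value <- estimate / std_err

# Two-sided p-value from the t distribution with df_resid degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df_resid)

# Overall F statistic expressed through R-squared:
# F = (R^2 / k) / ((1 - R^2) / df_resid), with k = 8 non-intercept terms.
r_squared <- 0.0906
k_terms   <- 8
f_value   <- (r_squared / k_terms) / ((1 - r_squared) / df_resid)

signif(c(t = t_value, p = p_value, F = f_value), 3)
```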

Check regression assumptions with diagnostic plots.

plot(model)
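By default, plot() on an lm object draws four panels in sequence: residuals versus fitted values, a normal Q-Q plot of the residuals, a scale-location plot, and residuals versus leverage. One possible way to view them together is a 2 x 2 grid, sketched below. The helper name `plot_diagnostics_grid` is hypothetical, and the demonstration uses R's built-in `cars` data as a stand-in so that the block runs without bikedata.csv.

```r
# Hypothetical helper: draw the default lm diagnostic panels on one page.
plot_diagnostics_grid <- function(fit) {
  old_par <- par(mfrow = c(2, 2))  # switch to a 2 x 2 plotting grid
  on.exit(par(old_par))            # restore the previous layout on exit
  plot(fit)                        # the four default diagnostic panels
  invisible(fit)
}

# Stand-in fit on the built-in `cars` data (not the bike data).
demo_fit <- lm(dist ~ speed, data = cars)
plot_diagnostics_grid(demo_fit)
```

With the model fitted above, the equivalent call would be plot_diagnostics_grid(model).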

Scatter plot of standardized residuals versus kerb.

plot(bikedata$kerb, rstandard(model))

Boxplots of standardized residuals versus helmet.

boxplot(rstandard(model) ~ bikedata$helmet)

A 99% confidence interval for the coefficient of helmetY.

confint(model, "helmetY", level = 0.99)
           0.5 %    99.5 %
helmetY -0.03809 -0.007009
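The interval can be reproduced by hand as estimate ± t quantile × standard error, which is the standard t-based interval for a linear model coefficient. The sketch below uses the rounded estimate and standard error from the summary table, so it should match the confint() output only up to rounding. The variable names are illustrative.

```r
# Values copied from the printed summary (rounded as shown there).
estimate <- -0.02255   # helmetY coefficient estimate
std_err  <-  0.00603   # its standard error
df_resid <-  2346      # residual degrees of freedom
level    <-  0.99      # confidence level

# Critical value: the 99.5% quantile of the t distribution.
t_crit <- qt(1 - (1 - level) / 2, df_resid)

# Lower and upper limits of the interval.
ci <- estimate + c(-1, 1) * t_crit * std_err
signif(ci, 4)
```

Both limits are negative, so the interval excludes zero. This is consistent with the helmetY p-value in the summary table being below 0.01.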