# Linear Regression

Fit a linear regression model with response variable `sqrt(passing.distance)` and predictors `helmet`, `vehicle`, and `kerb`. The `lm` formula includes an intercept by default.

```
bikedata <- read.csv("bikedata.csv")
model <- lm(sqrt(passing.distance) ~ helmet + vehicle + kerb,
            data = bikedata)
```
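Written out, the formula above corresponds to the following model. This is a sketch in notation that is not part of the original: it assumes `helmet` and `vehicle` are treated as categorical with one reference level each, `kerb` is numeric, and the errors $\varepsilon_i$ are independent with mean zero and constant variance.

$$
\sqrt{\text{passing.distance}_i} = \beta_0 + \beta_{\text{helmet}}\,\mathbb{1}[\text{helmet}_i = \text{Y}] + \sum_{v} \beta_{v}\,\mathbb{1}[\text{vehicle}_i = v] + \beta_{\text{kerb}}\,\text{kerb}_i + \varepsilon_i
$$

The sum runs over the non-reference levels $v$ of `vehicle`; the reference level is absorbed into the intercept $\beta_0$.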

Get the coefficient estimates, their standard errors, t statistics, and p values.

``summary(model)``
```
Call:
lm(formula = sqrt(passing.distance) ~ helmet + vehicle + kerb,
    data = bikedata)

Residuals:
    Min      1Q  Median      3Q     Max
-0.6761 -0.0932 -0.0063  0.0882  0.7455

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.21756    0.02204   55.26  < 2e-16 ***
helmetY     -0.02255    0.00603   -3.74  0.00019 ***
vehicleCar   0.11207    0.02164    5.18  2.4e-07 ***
vehicleHGV   0.04895    0.02674    1.83  0.06726 .
vehicleLGV   0.09134    0.02299    3.97  7.3e-05 ***
vehiclePTW   0.09773    0.03275    2.98  0.00287 **
vehicleSUV   0.11193    0.02456    4.56  5.4e-06 ***
vehicleTaxi  0.06884    0.02977    2.31  0.02083 *
kerb        -0.10326    0.00850  -12.15  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.145 on 2346 degrees of freedom
Multiple R-squared:  0.0906,	Adjusted R-squared:  0.0875
F-statistic: 29.2 on 8 and 2346 DF,  p-value: <2e-16
```
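As a check on how to read the table, each t value is the estimate divided by its standard error. Using the rounded values printed for `helmetY` and `kerb`:

$$
t_{\text{helmetY}} = \frac{-0.02255}{0.00603} \approx -3.74,
\qquad
t_{\text{kerb}} = \frac{-0.10326}{0.00850} \approx -12.15
$$

Both agree with the `t value` column. Small differences in the last digit can occur for other rows because the printed estimates and standard errors are rounded.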

Check regression assumptions with diagnostic plots.

``plot(model)``
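`plot(model)` draws four diagnostic plots one after another. A minimal sketch of one way to view them together in a 2 x 2 grid follows. Because `bikedata.csv` is not available here, the sketch fits the same formula to simulated stand-in data; the data frame `sim`, its values, and the level names are hypothetical and only serve to make the example runnable.

```r
# Simulated stand-in for bikedata (hypothetical values, not the real data)
set.seed(1)
n <- 200
sim <- data.frame(
  helmet  = factor(sample(c("N", "Y"), n, replace = TRUE)),
  vehicle = factor(sample(c("Bus", "Car", "HGV"), n, replace = TRUE)),
  kerb    = sample(c(0.25, 0.5, 0.75, 1, 1.25), n, replace = TRUE)
)
# Generate the response on the square-root scale, then square it
sim$passing.distance <- (1.2 - 0.02 * (sim$helmet == "Y") - 0.1 * sim$kerb +
                         rnorm(n, sd = 0.15))^2

sim_model <- lm(sqrt(passing.distance) ~ helmet + vehicle + kerb, data = sim)

# Show the four default diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings
old_par <- par(mfrow = c(2, 2))
plot(sim_model)
par(old_par)
```

With the real data, the same `par(mfrow = c(2, 2))` call before `plot(model)` should give the same layout.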

Make a scatter plot of standardized residuals versus `kerb`.

``plot(bikedata$kerb, rstandard(model))``

Make boxplots of standardized residuals versus `helmet`.

``boxplot(rstandard(model) ~ bikedata$helmet)``

Compute a 99% confidence interval for the coefficient of `helmetY`.

``confint(model, "helmetY", level = 0.99)``
```
           0.5 %    99.5 %
helmetY -0.03809 -0.007009
```
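The interval can be reproduced approximately by hand from the summary table. This is a sketch using the rounded estimate and standard error for `helmetY`, the 2346 residual degrees of freedom, and a t quantile of roughly 2.578 (close to the normal quantile 2.576 because the degrees of freedom are large):

$$
\hat\beta_{\text{helmetY}} \pm t_{0.995,\,2346}\,\mathrm{SE}(\hat\beta_{\text{helmetY}})
\approx -0.02255 \pm 2.578 \times 0.00603
\approx (-0.0381,\ -0.0070)
$$

This matches the `confint` output up to rounding of the inputs. The interval excludes zero, consistent with the p value below 0.01 for `helmetY` in the summary.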