# Regression

Martijn Wieling
University of Groningen

## This lecture

• Correlation
• Regression
• Linear regression
• Multiple regression
• Interpreting interactions
• Regression assumptions and model criticism

## Correlation

• Quantify relation between two numerical variables (interval or ratio scale)
• Correlation coefficient $$r$$, with $$-1 \leq r \leq 1$$: the magnitude indicates strength (effect size), the sign indicates direction
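
As a small illustration, a correlation between two numerical columns of R's built-in `iris` data can be computed with `cor()` and tested with `cor.test()`. The variable pair used here (`Sepal.Length` and `Petal.Length`) is only an illustrative choice and is not taken from the slides.

```r
# Correlation between two numerical variables in the built-in iris data
# (the variable pair is an illustrative choice)
r <- cor(iris$Sepal.Length, iris$Petal.Length)
r                       # sign gives direction, magnitude gives strength

# cor.test() additionally reports a test of the null hypothesis r = 0
ct <- cor.test(iris$Sepal.Length, iris$Petal.Length)
ct$estimate             # same value as r
ct$p.value
```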

## Linear regression

• To assess relationship between numerical dependent variable and one (simple regression) or more (multiple regression) quantitative or categorical predictor variables
• Measures impact of each individual variable on dependent variable, while controlling for other variables in the model
• Note that regression is equivalent to ANOVA, but the focus is different: relation between numerical variables vs. group comparisons

## Linear regression: formula

• Linear regression captures relationship between dependent variable and independent variables using a formula
• $$y_i = \beta_1 x_i + \beta_0 + \epsilon_i$$
• With $$y_i$$: dependent variable, $$x_i$$: independent variable, $$\beta_0$$: intercept (predicted value of $$y_i$$ when $$x_i$$ equals 0), $$\beta_1$$: coefficient (slope) of $$x_i$$, i.e. the change in predicted $$y_i$$ per unit increase in $$x_i$$, and $$\epsilon_i$$: error (residuals; assumed to follow a normal distribution with mean 0)
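
The formula can be made concrete with a small simulation: generate data from a known intercept and slope, add normally distributed errors, and check that `lm()` recovers values close to the true ones. This is only a sketch; the seed, sample size, and the true values `b0 = 1` and `b1 = 2` are arbitrary assumptions chosen for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n   <- 200
x   <- runif(n, min = 0, max = 10) # independent variable
b0  <- 1                           # true intercept (illustrative value)
b1  <- 2                           # true slope (illustrative value)
eps <- rnorm(n, mean = 0, sd = 1)  # errors: normal with mean 0
y   <- b1 * x + b0 + eps           # y_i = beta_1 x_i + beta_0 + epsilon_i

fit <- lm(y ~ x)
coef(fit)                          # estimates should lie close to b0 and b1
```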

## Dataset for this lecture

#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
# 1          5.1         3.5          1.4         0.2  setosa       0.14
# 2          4.9         3.0          1.4         0.2  setosa       0.14
# 3          4.7         3.2          1.3         0.2  setosa       0.13
# 4          4.6         3.1          1.5         0.2  setosa       0.15
# 5          5.0         3.6          1.4         0.2  setosa       0.14
# 6          5.4         3.9          1.7         0.4  setosa       0.34
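
The `Petal.Area` column is not part of R's built-in `iris` data. The six rows shown are consistent with half the product of petal length and petal width, so one way to reproduce the column, assuming that is how it was computed, might be:

```r
# Assumption: Petal.Area = Petal.Length * Petal.Width / 2
# (inferred from the six rows shown; the slides do not state the formula)
iris$Petal.Area <- iris$Petal.Length * iris$Petal.Width / 2
head(iris)
```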

## Fitting a simple regression model in R

m0 <- lm(Petal.Area ~ Sepal.Length, data = iris)
summary(m0)
#
# Call:
# lm(formula = Petal.Area ~ Sepal.Length, data = iris)
#
# Residuals:
#    Min     1Q Median     3Q    Max
# -2.671 -0.794 -0.099  0.730  3.489
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)   -11.357      0.711   -16.0   <2e-16 ***
# Sepal.Length    2.439      0.120    20.3   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 1.22 on 148 degrees of freedom
# Multiple R-squared:  0.735,   Adjusted R-squared:  0.733
# F-statistic:  410 on 1 and 148 DF,  p-value: <2e-16
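
Several numbers in this summary are related to each other. The sketch below recomputes three of them by hand: each t value is the estimate divided by its standard error, R-squared in a simple regression equals the squared correlation between the two variables, and the residual degrees of freedom are the number of observations minus the number of estimated coefficients. It assumes `Petal.Area` is half the product of petal length and width, which the slides do not state explicitly.

```r
# Assumption: Petal.Area is half the product of petal length and width
iris$Petal.Area <- iris$Petal.Length * iris$Petal.Width / 2
m0 <- lm(Petal.Area ~ Sepal.Length, data = iris)

s  <- summary(m0)
ct <- coef(s)                    # coefficient table as a matrix

# t value = Estimate / Std. Error
t_manual <- ct[, "Estimate"] / ct[, "Std. Error"]

# With a single predictor, R-squared equals the squared correlation
r2_manual <- cor(iris$Petal.Area, iris$Sepal.Length)^2

# Residual degrees of freedom: n minus the number of estimated coefficients
df_manual <- nrow(iris) - length(coef(m0))
```

Comparing `t_manual` with `ct[, "t value"]`, `r2_manual` with `s$r.squared`, and `df_manual` with `m0$df.residual` shows that the values agree.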

## Visualization

library(visreg)  # package containing visualization function visreg
visreg(m0)  # visualize regression line together with data points

• The blue regression line shows the predicted (fitted) values of the model

## Numerical interpretation

• $$y_i = \beta_1 x_i + \beta_0 + \epsilon_i$$
round(m0$coefficients, 2)
# (Intercept) Sepal.Length
#      -11.36         2.44

• Petal.Area = 2.44 $$\times$$ Sepal.Length + -11.36
• For sepal length of 5.1, predicted (fitted) petal area: 2.44 $$\times$$ 5.1 + -11.36 = 1.08

iris$FittedPA <- fitted(m0)
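
The hand calculation for a sepal length of 5.1 can be cross-checked against R's own predictions. The sketch below computes the value from the unrounded coefficients, from `predict()` with new data, and from `fitted()` for the first flower (which has a sepal length of 5.1). It again assumes `Petal.Area` is half the product of petal length and width; the names `by_hand`, `by_predict`, and `by_fitted` are introduced here for illustration.

```r
# Assumption: Petal.Area is half the product of petal length and width
iris$Petal.Area <- iris$Petal.Length * iris$Petal.Width / 2
m0 <- lm(Petal.Area ~ Sepal.Length, data = iris)

# By hand, using the unrounded coefficients: beta_1 * x + beta_0
b <- coef(m0)
by_hand <- b[["Sepal.Length"]] * 5.1 + b[["(Intercept)"]]

# Using predict() for a new observation with Sepal.Length = 5.1
by_predict <- predict(m0, newdata = data.frame(Sepal.Length = 5.1))

# fitted(m0) holds the same kind of prediction for every row of the data
by_fitted <- fitted(m0)[1]       # the first flower has Sepal.Length 5.1
```

All three values coincide; any small difference from the slide's result comes from rounding the coefficients to two decimals before multiplying.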