Statistiek I

(Multiple) linear regression

Martijn Wieling

Question 1: last lecture

Last lecture

Correlation
Linear regression with a single numerical independent variable
- Formula: \(y = \beta_0 + \beta_1 \times x\)
- Example: m <- lm(ratingNL ~ BAC, data=lls)
  - Dependent variable: ratingNL
  - Independent variable: BAC
- Interpretation of intercept and slope (influence centering and \(z\)-transformation)

This lecture

Part I: Linear regression with a single nominal independent variable
Part II: Multiple linear regression with multiple independent variables
Part III: Introduction to multiple linear regression with an interaction

Data: Lowlands Science data from last lecture

RQ: Is there a different influence of alcohol (BAC) on L1 vs. L2 language proficiency?
- Hypothesis (alternative): BAC has a stronger negative effect on L1 than L2
BAC and L1 and L2 speech samples collected for about 90 native Dutch speakers
Dutch speech samples were rated by native Dutch speakers (at Lowlands)
English speech samples were rated by native American English speakers (online)
Recall from lecture and lab of last week:
- Significant negative effect of BAC for Dutch
- Small (non-significant) negative effect of BAC for English
Here we assess whether this really represents a difference
- Note that we again (incorrectly!) ignore variability associated with individual speakers (each speaker received two ratings)
We start simple, by first comparing ratings across the two languages

Part I: Comparing group means using regression

Regression allows for the inclusion of nominal variables
Behind the scenes, several dummy variables (binary: value 0 or 1) are created for this
Number of dummy variables created equals nr. of levels of nominal variable - 1
- At most a single dummy variable associated with a nominal variable can be set to 1
- Intercept represents one of the levels (all dummy variables are 0) = reference level
Example: for modeling 3 colors (blue, green, red), 2 dummy variables are needed
- colorGreen equals 1 when green, 0 otherwise
- colorRed equals 1 when red, 0 otherwise
- When both colorGreen and colorRed equal 0, color equals blue

Comparing group means using regression

Hypotheses:
- \(H_0\): no differences between group means
- \(H_a\): group means differ
- Specifically for comparing two groups \(a\) and \(b\)
  - \(H_0\): \(\mu_a = \mu_b\), \(H_a\): \(\mu_a \neq \mu_b\) (for two-tailed hypothesis)

First example: comparing Dutch vs English proficiency

m <- lm(rating ~ lang, data=lls) # (Intercept) is included by default
summary(m) # significant difference between EN and NL


Call:
lm(formula = rating ~ lang, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9843 -0.4177  0.0753  0.4134  1.6346 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.0128     0.0689   -0.19   0.8525   
langNL        0.2939     0.0980    3.00   0.0031 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.665 on 182 degrees of freedom
Multiple R-squared:  0.0471,    Adjusted R-squared:  0.0418 
F-statistic: 8.99 on 1 and 182 DF,  p-value: 0.0031

Question 2

Interpretation

summary(m)$coef

             Estimate Std. Error  t value  Pr(>|t|)
(Intercept) -0.012838   0.068947 -0.18621 0.8524890
langNL       0.293941   0.098040  2.99818 0.0030957

Regression formula: \(y = \beta_0 + \beta_1 \times x \implies\) rating = -0.01 + 0.29 * langNL
- For EN (langNL == 0) fitted rating of: -0.01 + 0.29 \(\times\) 0 = -0.01 (= Intercept)
- For NL (langNL == 1) fitted rating of: -0.01 + 0.29 \(\times\) 1 = 0.28

Examining dummy variables and releveling

lls$lang <- as.factor(lls$lang) # ensuring it is a factor (nominal) variable
contrasts(lls$lang) # shows the levels mapped to the single dummy variable (alphabetic)

   NL
EN  0
NL  1

We can change the reference level using the function relevel

lls$lang <- relevel(lls$lang, 'NL') # set reference level to NL
contrasts(lls$lang) # shows changed contrasts

   EN
NL  0
EN  1

Model using releveled variable

m2 <- lm(rating ~ lang, data=lls) 
summary(m2)$coef

            Estimate Std. Error t value   Pr(>|t|)
(Intercept)  0.28110    0.06970  4.0330 8.0925e-05
langEN      -0.29394    0.09804 -2.9982 3.0957e-03

Regression formula: \(y = \beta_0 + \beta_1 \times x \implies\) rating = 0.28 + -0.29 * langEN
- For EN (langEN == 1) fitted rating of: 0.28 + -0.29 \(\times\) 1 = -0.01
- For NL (langEN == 0) fitted rating of: 0.28 + -0.29 \(\times\) 0 = 0.28 (= Intercept)

Question 3

Side-by-side comparison: identical predictions

summary(m)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.013      0.069   -0.19   0.8525
langNL         0.294      0.098    3.00   0.0031

visreg(m)

summary(m2)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.281     0.0697    4.03 8.09e-05
langEN        -0.294     0.0980   -3.00 3.10e-03

visreg(m2)

Effect size

summary(m)$r.squared

[1] 0.0471

Effect size is small to medium

Second example: comparing males and females

m3 <- lm(rating ~ sex, data=lls)
summary(m3) # significant difference between males and females


Call:
lm(formula = rating ~ sex, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7478 -0.4059  0.0715  0.4830  1.5074 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.2218     0.0657    3.38   0.0009 ***
sexM         -0.2080     0.1003   -2.07   0.0395 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.673 on 182 degrees of freedom
Multiple R-squared:  0.0231,    Adjusted R-squared:  0.0177 
F-statistic:  4.3 on 1 and 182 DF,  p-value: 0.0395

Interpretation

summary(m3)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.222     0.0657    3.38 0.000898
sexM          -0.208     0.1003   -2.07 0.039455

Regression formula: \(y = \beta_0 + \beta_1 \times x \implies\) rating = 0.22 + -0.21 * sexM
- For ♀ (sexM == 0) fitted rating of: 0.22 + -0.21 \(\times\) 0 = 0.22 (= Intercept)
- For ♂ (sexM == 1) fitted rating of: 0.22 + -0.21 \(\times\) 1 = 0.01

Releveling

lls$sex <- as.factor(lls$sex) # ensuring it is a factor (nominal) variable
contrasts(lls$sex) # shows the levels mapped to the single dummy variable (alphabetic)

  M
F 0
M 1

We now change the reference level again

lls$sex <- relevel(lls$sex, 'M') # set reference level to M
contrasts(lls$sex) # shows changed contrasts

  F
M 0
F 1

Model using releveled variable: identical predictions

m4 <- lm(rating ~ sex, data=lls) 
summary(m4)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0138     0.0757   0.183   0.8551
sexF          0.2080     0.1003   2.074   0.0395

Regression formula: \(y = \beta_0 + \beta_1 \times x \implies\) rating = 0.01 + 0.21 * sexF
- For ♀ (sexF == 1) fitted rating of: 0.01 + 0.21 \(\times\) 1 = 0.22
- For ♂ (sexF == 0) fitted rating of: 0.01 + 0.21 \(\times\) 0 = 0.01 (= Intercept)

Visualization and effect size

visreg(m4)

summary(m4)$r.squared # small effect size

[1] 0.0231

Third example: comparing educational levels

lls$edu = relevel(lls$edu, 'OTH') # set OTH as reference level
summary(m5 <- lm(rating ~ edu, data=lls)) # fit model and show summary in one line


Call:
lm(formula = rating ~ edu, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0038 -0.4183  0.0858  0.4763  1.4945 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.1221     0.0992   -1.23  0.21999    
eduHBO        0.1489     0.1395    1.07  0.28731    
eduWO         0.4227     0.1200    3.52  0.00054 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.658 on 181 degrees of freedom
Multiple R-squared:  0.0715,    Adjusted R-squared:  0.0613 
F-statistic: 6.97 on 2 and 181 DF,  p-value: 0.00121

Interpretation

summary(m5)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.12      0.099    -1.2  0.21999
eduHBO          0.15      0.140     1.1  0.28731
eduWO           0.42      0.120     3.5  0.00054

Regression formula: rating = -0.12 + 0.15 * eduHBO + 0.42 * eduWO
- rating for OTH: -0.12 + 0.15 \(\times\) 0 + 0.42 \(\times\) 0 = -0.12 (= Intercept)
- rating for HBO: -0.12 + 0.15 \(\times\) 1 + 0.42 \(\times\) 0 = 0.03
- rating for WO: -0.12 + 0.15 \(\times\) 0 + 0.42 \(\times\) 1 = 0.30
Note that WO has a significantly higher score than OTH, but what about WO vs. HBO?

Pairwise comparisons

If you are interested in statistically comparing all levels of a nominal variable, you effectively need to run multiple statistical tests (i.e. setting the reference level to each of the individual levels)
Using multiple tests/comparisons risks finding significance through sheer chance
Suppose we need to run three comparisons, always using \(\alpha\) = 0.05
- Chance of finding one or more significant values (family-wise error rate) is: \(1 - (1 - \alpha)^3\) = \(1 - 0.95^3 = 0.143\) (almost thrice as high as we’d like!)
To guarantee a family-wise error rate of 0.05, we should correct for the number of tests: the Bonferroni correction does this
- This correction effectively multiplies the original \(p\)-values by the number of tests (maximized by 1)

Correction for multiple comparisons in R

library(emmeans)
emmeans(m5, list(pairwise ~ edu), adjust='bonferroni')

$`emmeans of edu`
 edu emmean    SE  df lower.CL upper.CL
 OTH -0.122 0.099 181    -0.32     0.07
 HBO  0.027 0.098 181    -0.17     0.22
 WO   0.301 0.068 181     0.17     0.43

Confidence level used: 0.95 

$`pairwise differences of edu`
 1         estimate    SE  df t.ratio p.value
 OTH - HBO    -0.15 0.140 181  -1.100  0.8600
 OTH - WO     -0.42 0.120 181  -3.500  <.0001
 HBO - WO     -0.27 0.119 181  -2.300  0.0700

P value adjustment: bonferroni method for 3 tests

Question 4

Visualization and effect size

visreg(m5)

summary(m5)$r.squared # small to medium effect size

[1] 0.072

Part II: Multiple regression

Multiple regression captures a linear relationship between dependent variable (DV) and independent variables (IVs)
- Regression formula \(y = \beta_0 + \beta_1 x_1 + ... + \beta_n x_n + \epsilon\)
  - \(y\): dependent variable (= predicted or fitted values), \(x_i\): independent variables
  - \(\beta_0\): intercept (value of \(y\) when all \(x_i's\) are \(0\))
  - \(\beta_i\): influence (slope) of \(x_i\) on \(y\)
  - \(\epsilon\): errors (residuals)
In a multiple regression model, the effect (estimate) of each IV is determined while controlling for the effect of all other variables in the model
Unless there are interactions (covered later), each IV’s effect is assumed to be independent of the other IVs

Our first multiple regression model

summary(m6 <- lm(rating ~ lang + sex, data=lls)) # ref. levels lang=EN, sex=F


Call:
lm(formula = rating ~ lang + sex, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8657 -0.3810  0.0659  0.3684  1.6524 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.0764     0.0803    0.95   0.3424   
langNL        0.2936     0.0971    3.02   0.0029 **
sexM         -0.2075     0.0981   -2.12   0.0358 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.66 on 181 degrees of freedom
Multiple R-squared:  0.0701,    Adjusted R-squared:  0.0598 
F-statistic: 6.82 on 2 and 181 DF,  p-value: 0.0014

Whole model effect size: adjusted \(R^2\)

We have used \(R^2\) as a measure of effect size in simple regression
Adding additional independent variables will generally increase the \(R^2\)
Adjusted \(R^2\) corrects for the number of predictors in the model
A large difference between the \(R^2\) value and the adjusted \(R^2\) value indicates the model contains independent variables which are not useful
Adjusted \(R^2\) value is the best effect size measure (of a multiple regression model)

summary(m6)$adj.r.squared # model explains about 6% of the variation in the ratings

[1] 0.06

Model assumptions for multiple regression

Identical to linear regression
- Distribution of residuals is assumed to follow a normal distribution
- Variance of residuals should be equal along the regression line (homoscedastic)
Additional requirement: no collinearity among multiple independent variables

Collinearity

When including multiple variables, these may be correlated
high correlations are problematic as regr. estimates are harder to reliably determine
Collinearity can be measured via the variance inflation factor (VIF)
- \(VIF_i = \frac{1}{1-R^2_i}\), with \(R^2_i\) from the model with \(x_i\) as DV and the other \(x's\) as the IVs
- VIF \(\geq\) 10 is problematic

library(car)
vif(m6) # VIF OK

lang  sex 
   1    1

Model comparison: both predictors necessary?

anova(m, m6)

Analysis of Variance Table

Model 1: rating ~ lang
Model 2: rating ~ lang + sex
  Res.Df  RSS Df Sum of Sq    F Pr(>F)  
1    182 80.5                           
2    181 78.5  1      1.94 4.48  0.036 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The additional model complexity is supported by the improved fit to the data

Model comparison vs. summary

summary(m6)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.076      0.080    0.95   0.3424
langNL         0.294      0.097    3.02   0.0029
sexM          -0.208      0.098   -2.12   0.0358

anova(m, m6)

Analysis of Variance Table

Model 1: rating ~ lang
Model 2: rating ~ lang + sex
  Res.Df   RSS Df Sum of Sq     F Pr(>F)  
1    182 80.46                            
2    181 78.52  1     1.942 4.475 0.0358 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both \(p\)-values will be identical if Df equals 1

Comparing the strength of the effect of the two variables

summary(m6)$coef

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept)  0.076419  0.0802789  0.951919 0.3424074
langNL       0.293622  0.0971169  3.023386 0.0028626
sexM        -0.207524  0.0980954 -2.115528 0.0357529

Both predictors are nominal and have two levels, so estimates can be compared
lang is most important as the abs. value of langNL is larger than that of sexM

Question 5

Interpreting the regression model: intercept

summary(m6)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.076      0.080    0.95   0.3424
langNL         0.294      0.097    3.02   0.0029
sexM          -0.208      0.098   -2.12   0.0358

Females (sexM == 0) who speak English (langNL == 0) have an exp. rating of 0.076

Interpreting the regression model: slopes (1)

summary(m6)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.076      0.080    0.95   0.3424
langNL         0.294      0.097    3.02   0.0029
sexM          -0.208      0.098   -2.12   0.0358

Summary shows \(\beta_\textrm{langNL}\) = 0.29
- When the language changes from English to Dutch, the rating increases by 0.29
Summary shows \(\beta_\textrm{sexM}\) = -0.21
- Males have a rating that is 0.21 lower than females
Fitted (predicted) value of the model can be determined using regression formula
- \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \implies\) rating = 0.08 + 0.29*langNL + -0.21*sexM
- langNL equals 1 for NL and 0 for EN, and sexM equals 1 for M, and 0 for F

Interpreting the regression model: slopes (2)

summary(m6)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.076      0.080    0.95   0.3424
langNL         0.294      0.097    3.02   0.0029
sexM          -0.208      0.098   -2.12   0.0358

rating = 0.08 + 0.29*langNL + -0.21*sexM
- For lang EN and sex F: 0.08 + 0.29 \(\times\) 0 + -0.21 \(\times\) 0 = 0.08 (= Intercept)
- For lang EN and sex M: 0.08 + 0.29 \(\times\) 0 + -0.21 \(\times\) 1 = -0.13
- For lang NL and sex F: 0.08 + 0.29 \(\times\) 1 + -0.21 \(\times\) 0 = 0.37
- For lang NL and sex M: 0.08 + 0.29 \(\times\) 1 + -0.21 \(\times\) 1 = 0.16

Visualization

par(mfrow=c(1,2))
visreg(m6, 'lang')
visreg(m6, 'sex')

Multiple regression: combining nominal and numeric IVs

m7 <- lm(rating ~ lang + BAC.z, data=lls) # sex is excluded due to space restrictions on my slides 
summary(m7)


Call:
lm(formula = rating ~ lang + BAC.z, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8354 -0.3981  0.0604  0.4021  1.6615 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.1431     0.0825   -1.74   0.0843 . 
langNL        0.2966     0.0963    3.08   0.0024 **
BAC.z        -0.3505     0.1265   -2.77   0.0062 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.653 on 181 degrees of freedom
Multiple R-squared:  0.0858,    Adjusted R-squared:  0.0757 
F-statistic:  8.5 on 2 and 181 DF,  p-value: 0.000298

Model comparison: both IVs necessary?

anova(m, m7) # Yes!

Analysis of Variance Table

Model 1: rating ~ lang
Model 2: rating ~ lang + BAC.z
  Res.Df  RSS Df Sum of Sq    F Pr(>F)   
1    182 80.5                            
2    181 77.2  1      3.27 7.67 0.0062 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Checking assumptions

vif(m7) # VIF OK

 lang BAC.z 
    1     1

Collinearity is OK
Residuals are normally distributed and homoscedastic (not shown)

How good is the model: Adjusted \(R^2\)

summary(m7)$adj.r.squared

[1] 0.0757

The model explains about 7.5% of the variation in the ratings

Which variable is most important?

summary(m7)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.14      0.082    -1.7   0.0843
langNL          0.30      0.096     3.1   0.0024
BAC.z          -0.35      0.127    -2.8   0.0062

Effect of BAC.z is largest, as each increase of 1 results in a decrease of rating by 0.35
- Changing the language (lang) only changes the rating by 0.3

Question 6

Interpreting the regression model: intercept

summary(m7)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.14      0.082    -1.7   0.0843
langNL          0.30      0.096     3.1   0.0024
BAC.z          -0.35      0.127    -2.8   0.0062

A person speaking English (langNL == 0) with an average BAC (BAC.z == 0) has an expected rating of -0.14

Interpreting the regression model: slopes (1)

summary(m7)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.14      0.082    -1.7   0.0843
langNL          0.30      0.096     3.1   0.0024
BAC.z          -0.35      0.127    -2.8   0.0062

Summary shows \(\beta_\textrm{langNL}\) = 0.3
- When the language changes from English to Dutch, the rating increases by 0.3
Summary shows \(\beta_\textrm{BAC.z}\) = -0.35
- For every increase of BAC.z with 1 (standard deviation) the rating decreases by 0.35
Fitted (predicted) value of the model can be determined using regression formula
- \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \implies\) rating = -0.14 + 0.30*langNL + -0.35*BAC.z
- langNL equals 1 for NL and 0 for EN

Interpreting the regression model: slopes (2)

summary(m7)$coef

            Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.14      0.082    -1.7   0.0843
langNL          0.30      0.096     3.1   0.0024
BAC.z          -0.35      0.127    -2.8   0.0062

rating = -0.14 + 0.30*langNL + -0.35*BAC.z
- For lang EN and BAC.z equals 0: -0.14 + 0.30 \(\times\) 0 + -0.35 \(\times\) 0 = -0.14 (= Intercept)
- For lang NL and BAC.z equals 2: -0.14 + 0.30 \(\times\) 1 + -0.35 \(\times\) 2 = -0.54
Note that the effects are independent and do not influence each other!

Visualization shows independence

visreg(m7,'BAC.z', by='lang', overlay=T) # overlay=T visualizes the lines in one graph

Part III: Multiple regression with an interaction

Multiple regression captures a linear relationship between dependent variable (DV) and independent variables (IVs)
An interaction indicates that the effect of the interacting IVs on the DV is dependent on each other
- Regression formula \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1,2} x_1 x_2 + \epsilon\)
- \(y\): dependent variable (= predicted or fitted values), \(x_i\): independent variables, \(\epsilon\): resid.
  - \(\beta_0\): intercept (value of \(y\) when all \(x_i's\) are \(0\))
  - \(\beta_1\): influence (slope) of \(x_1\) on \(y\) when \(x_2\) equals 0
  - \(\beta_2\): influence (slope) of \(x_2\) on \(y\) when \(x_1\) equals 0
  - \(\beta_{1,2}\): interaction effect (slope) of \(x_1\) and \(x_2\)

Be careful when interpreting an interaction!

Many people interpret an interaction in regression incorrectly
In an interaction, the terms involved in the interaction cannot be interpreted by themselves anymore
- They have to be interpreted jointly (i.e. always in the context of the other variable)
Visualization helps interpretation

Our first interaction: assessing our RQ

summary(m8 <- lm(rating ~ BAC.z * lang, data=lls))


Call:
lm(formula = rating ~ BAC.z * lang, data = lls)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7972 -0.3765  0.0637  0.3826  1.6101 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -0.1029     0.0974   -1.06    0.292  
BAC.z         -0.2421     0.1881   -1.29    0.200  
langNL         0.2236     0.1345    1.66    0.098 .
BAC.z:langNL  -0.1982     0.2544   -0.78    0.437  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.654 on 180 degrees of freedom
Multiple R-squared:  0.0889,    Adjusted R-squared:  0.0737 
F-statistic: 5.85 on 3 and 180 DF,  p-value: 0.000776

Question 7

Better than simpler model?

anova(m7, m8) # note that the order of terms in each model is irrelevant (A * B == B * A)

Analysis of Variance Table

Model 1: rating ~ lang + BAC.z
Model 2: rating ~ BAC.z * lang
  Res.Df  RSS Df Sum of Sq    F Pr(>F)
1    181 77.2                         
2    180 76.9  1     0.259 0.61   0.44

Our hypothesis was one-tailed, but the \(p\)-values of anova and summary are two-tailed
- One-tailed \(p\)-value (as direction in line with hypothesis): \(\frac{0.44}{2} = 0.22\) (\(> \alpha\) of 0.05)
- Not enough support for a sign. difference between the negative BAC effect on L1 vs. L2
  - Despite our earlier finding that the effect was significant for Dutch, but not English
Next, we will learn how to interpret the (non-significant) interaction pattern

Interpreting the interaction numerically

             Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.10      0.097   -1.06    0.292
BAC.z           -0.24      0.188   -1.29    0.200
langNL           0.22      0.134    1.66    0.098
BAC.z:langNL    -0.20      0.254   -0.78    0.437

Regression formula \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1,2} x_1 x_2 \implies\)
rating = -0.10 + -0.24 * BAC.z + 0.22 * langNL + -0.20 * BAC.z * langNL
- For BAC.z of 0 and lang EN: -0.10 + -0.24\(\times\)0 + 0.22\(\times\)0 + -0.20\(\times\)0\(\times\)0 = -0.10
- For BAC.z of 0 and lang NL: -0.10 + -0.24\(\times\)0 + 0.22\(\times\)1 + -0.20\(\times\)0\(\times\)1 = 0.12
- For BAC.z of 2 and lang NL: -0.10 + -0.24\(\times\)2 + 0.22\(\times\)1 + -0.20\(\times\)2\(\times\)1 = -0.76

Interpreting the interaction logically

             Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.10      0.097   -1.06    0.292
BAC.z           -0.24      0.188   -1.29    0.200
langNL           0.22      0.134    1.66    0.098
BAC.z:langNL    -0.20      0.254   -0.78    0.437

For the EN lang, each unit increase of BAC.z changes rating by -0.24
For the NL lang, each unit increase of BAC.z changes rating by -0.24 + -0.20 = -0.44
The BAC.z slope is shifted downwards by 0.22 for the NL lang
So: the effect of BAC.z is more negative for L1 than L2 (but not significantly so)

Interpreting the interaction visually

             Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.10      0.097   -1.06    0.292
BAC.z           -0.24      0.188   -1.29    0.200
langNL           0.22      0.134    1.66    0.098
BAC.z:langNL    -0.20      0.254   -0.78    0.437

visreg(m8,"BAC.z",by="lang", overlay=TRUE) # visualization shows dependency of BAC en lang

Question 8

Recap

In this lecture, we’ve covered
- Linear regression with a single nominal independent variable
- Multiple linear regression with multiple independent variables
- Introduction to multiple linear regression with an interaction
Next lecture: Multiple linear regression with interactions, Cronbach’s alpha and recap

Please evaluate this lecture!

Exam question

Questions?

Thank you for your attention!

https://www.martijnwieling.nl

m.b.wieling@rug.nl