Martijn Wieling, http://www.martijnwieling.nl
This dataset is also discussed in Baayen (2008: Ch. 7.1)
## Generated on: December 25, 2014 - 11:09:20
## MAKE SURE YOU HAVE THE MOST RECENT 64-BIT VERSION OF R INSTALLED (2014-11-03: 3.1.2)
##
## FOR A GOOD INTRODUCTION TO R, YOU CAN TAKE THE FOLLOWING COURSERA COURSE:
## https://www.coursera.org/course/compdata
##
## SOME PEOPLE FIND RSTUDIO A BIT MORE USER-FRIENDLY. YOU CAN DOWNLOAD IT HERE:
## http://www.rstudio.com/ide/download/
# A0. Start R 3.1.2
# Start R and make sure the packages lme4 and car are installed
# Menu: Packages -> Install package(s)... -> Select mirror -> Select packages
# or:
# install.packages(c("lme4","car"),repos='http://cran.r-project.org')
# A1. These commands should work when the packages are installed
library(lme4)
library(car)
# version information
R.version.string
## [1] "R version 3.1.2 (2014-10-31)"
packageVersion('lme4')
## [1] '1.1.7'
packageVersion('car')
## [1] '2.0.22'
# A2. Now download and load the following file containing some functions for plotting:
download.file('http://www.let.rug.nl/wieling/statscourse/lecture1/lab/myFunctions.R', 'myFunctions.R')
download.file('http://www.let.rug.nl/wieling/statscourse/lecture1/lab/lexdec.rda', 'lexdec.rda')
source('myFunctions.R')
# A3. load the dataset lexdec
load('lexdec.rda')
# A4. data investigation
head(lexdec)
## Subject RT Trial Sex NativeLanguage Correct PrevType PrevCorrect Word Frequency FamilySize SynsetCount
## 1 A1 6.340359 23 F English correct word correct owl 4.859812 1.3862944 0.6931472
## 2 A1 6.308098 27 F English correct nonword correct mole 4.605170 1.0986123 1.9459101
## 3 A1 6.349139 29 F English correct nonword correct cherry 4.997212 0.6931472 1.6094379
## 4 A1 6.186209 30 F English correct word correct pear 4.727388 0.0000000 1.0986123
## 5 A1 6.025866 32 F English correct nonword correct dog 7.667626 3.1354942 2.0794415
## 6 A1 6.180017 33 F English correct word correct blackberry 4.060443 0.6931472 1.3862944
## Length Class FreqSingular FreqPlural DerivEntropy Complex rInfl meanRT SubjFreq meanSize meanWeight BNCw
## 1 3 animal 54 74 0.7912 simplex -0.3101549 6.3582 3.12 3.4758 3.1806 12.057065
## 2 4 animal 69 30 0.6968 simplex 0.8145080 6.4150 2.40 2.9999 2.6112 5.738806
## 3 6 plant 83 49 0.4754 simplex 0.5187938 6.3426 3.88 1.6278 1.2081 5.716520
## 4 4 plant 44 68 0.0000 simplex -0.4274440 6.3353 4.52 1.9908 1.6114 2.050370
## 5 3 animal 1233 828 1.2129 simplex 0.3977961 6.2956 6.04 4.6429 4.5167 74.838494
## 6 10 plant 26 31 0.3492 complex -0.1698990 6.3959 3.28 1.5831 1.1365 1.270338
## BNCc BNCd BNCcRatio BNCdRatio
## 1 0.000000 6.175602 0.000000 0.512198
## 2 4.062251 2.850278 0.707856 0.496667
## 3 3.249801 12.588727 0.568493 2.202166
## 4 1.462410 7.363218 0.713242 3.591166
## 5 50.859385 241.561040 0.679589 3.227765
## 6 0.162490 1.187616 0.127911 0.934882
str(lexdec)
## 'data.frame': 1659 obs. of 28 variables:
## $ Subject : Factor w/ 21 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ RT : num 6.34 6.31 6.35 6.19 6.03 ...
## $ Trial : int 23 27 29 30 32 33 34 38 41 42 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ NativeLanguage: Factor w/ 2 levels "English","Other": 1 1 1 1 1 1 1 1 1 1 ...
## $ Correct : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...
## $ PrevType : Factor w/ 2 levels "nonword","word": 2 1 1 2 1 2 2 1 1 2 ...
## $ PrevCorrect : Factor w/ 2 levels "correct","incorrect": 1 1 1 1 1 1 1 1 1 1 ...
## $ Word : Factor w/ 79 levels "almond","ant",..: 55 47 20 58 25 12 71 69 62 1 ...
## $ Frequency : num 4.86 4.61 5 4.73 7.67 ...
## $ FamilySize : num 1.386 1.099 0.693 0 3.135 ...
## $ SynsetCount : num 0.693 1.946 1.609 1.099 2.079 ...
## $ Length : int 3 4 6 4 3 10 10 8 6 6 ...
## $ Class : Factor w/ 2 levels "animal","plant": 1 1 2 2 1 2 2 1 2 2 ...
## $ FreqSingular : int 54 69 83 44 1233 26 50 63 11 24 ...
## $ FreqPlural : int 74 30 49 68 828 31 65 47 9 42 ...
## $ DerivEntropy : num 0.791 0.697 0.475 0 1.213 ...
## $ Complex : Factor w/ 2 levels "complex","simplex": 2 2 2 2 2 1 1 2 2 2 ...
## $ rInfl : num -0.31 0.815 0.519 -0.427 0.398 ...
## $ meanRT : num 6.36 6.42 6.34 6.34 6.3 ...
## $ SubjFreq : num 3.12 2.4 3.88 4.52 6.04 3.28 5.04 2.8 3.12 3.72 ...
## $ meanSize : num 3.48 3 1.63 1.99 4.64 ...
## $ meanWeight : num 3.18 2.61 1.21 1.61 4.52 ...
## $ BNCw : num 12.06 5.74 5.72 2.05 74.84 ...
## $ BNCc : num 0 4.06 3.25 1.46 50.86 ...
## $ BNCd : num 6.18 2.85 12.59 7.36 241.56 ...
## $ BNCcRatio : num 0 0.708 0.568 0.713 0.68 ...
## $ BNCdRatio : num 0.512 0.497 2.202 3.591 3.228 ...
summary(lexdec)
## Subject RT Trial Sex NativeLanguage Correct PrevType PrevCorrect
## A1 : 79 Min. :5.829 Min. : 23 F:1106 English:948 correct :1594 nonword:855 correct :1542
## A2 : 79 1st Qu.:6.215 1st Qu.: 64 M: 553 Other :711 incorrect: 65 word :804 incorrect: 117
## A3 : 79 Median :6.346 Median :106
## C : 79 Mean :6.385 Mean :105
## D : 79 3rd Qu.:6.502 3rd Qu.:146
## I : 79 Max. :7.587 Max. :185
## (Other):1185
## Word Frequency FamilySize SynsetCount Length Class FreqSingular
## almond : 21 Min. :1.792 Min. :0.0000 Min. :0.6931 Min. : 3.000 animal:924 Min. : 4.0
## ant : 21 1st Qu.:3.951 1st Qu.:0.0000 1st Qu.:1.0986 1st Qu.: 5.000 plant :735 1st Qu.: 23.0
## apple : 21 Median :4.754 Median :0.0000 Median :1.0986 Median : 6.000 Median : 69.0
## apricot : 21 Mean :4.751 Mean :0.7028 Mean :1.3154 Mean : 5.911 Mean : 132.1
## asparagus: 21 3rd Qu.:5.652 3rd Qu.:1.0986 3rd Qu.:1.6094 3rd Qu.: 7.000 3rd Qu.: 146.0
## avocado : 21 Max. :7.772 Max. :3.3322 Max. :2.3026 Max. :10.000 Max. :1518.0
## (Other) :1533
## FreqPlural DerivEntropy Complex rInfl meanRT SubjFreq meanSize
## Min. : 0.0 Min. :0.0000 complex: 210 Min. :-1.3437 Min. :6.245 Min. :2.000 Min. :1.323
## 1st Qu.: 19.0 1st Qu.:0.0000 simplex:1449 1st Qu.:-0.3023 1st Qu.:6.322 1st Qu.:3.160 1st Qu.:1.890
## Median : 49.0 Median :0.0370 Median : 0.1900 Median :6.364 Median :3.880 Median :3.099
## Mean :109.7 Mean :0.3856 Mean : 0.2845 Mean :6.379 Mean :3.911 Mean :2.891
## 3rd Qu.:132.0 3rd Qu.:0.6845 3rd Qu.: 0.6385 3rd Qu.:6.420 3rd Qu.:4.680 3rd Qu.:3.711
## Max. :854.0 Max. :2.2641 Max. : 4.4427 Max. :6.621 Max. :6.040 Max. :4.819
##
## meanWeight BNCw BNCc BNCd BNCcRatio BNCdRatio
## Min. :0.8244 Min. : 0.02229 Min. : 0.0000 Min. : 0.000 Min. :0.00000 Min. :0.0000
## 1st Qu.:1.4590 1st Qu.: 1.64921 1st Qu.: 0.1625 1st Qu.: 1.188 1st Qu.:0.09673 1st Qu.:0.5551
## Median :2.7558 Median : 3.32071 Median : 0.6500 Median : 3.800 Median :0.27341 Median :0.9349
## Mean :2.5516 Mean : 7.37800 Mean : 5.0351 Mean : 12.995 Mean :0.45834 Mean :1.5428
## 3rd Qu.:3.4178 3rd Qu.: 7.10943 3rd Qu.: 2.9248 3rd Qu.: 10.451 3rd Qu.:0.55550 3rd Qu.:2.1315
## Max. :4.7138 Max. :79.17324 Max. :83.1949 Max. :241.561 Max. :8.29545 Max. :6.3458
##
nrow(lexdec) # 1659 rows
## [1] 1659
plot(density(lexdec$Frequency)) # e.g., to investigate distributions
# frequency normally shows a skewed distribution, and was therefore log-transformed
# this was the original distribution:
plot(density(exp(lexdec$Frequency)))
# A5. data cleaning: remove implausible RTs (< 200 ms or > 1100 ms; note that
# RT is log-transformed, hence the exp() below) and investigate only correct responses
lexdec2 = lexdec[exp(lexdec$RT)>200 & exp(lexdec$RT) < 1100,]
nrow(lexdec2)/nrow(lexdec) # 97.6% retained
## [1] 0.9764919
lexdec3 = lexdec2[lexdec2$Correct == 'correct',]
# A6. We create our first linear regression model (no mixed-effects yet)
linearModel = lm(RT ~ Trial, data=lexdec3)
# A7. Look at the results of the model and note that Trial is not significant here
summary(linearModel)
##
## Call:
## lm(formula = RT ~ Trial, data = lexdec3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4725 -0.1486 -0.0282 0.1151 0.6304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.3881479 0.0127642 500.476 <2e-16 ***
## Trial -0.0002023 0.0001108 -1.825 0.0681 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.207 on 1556 degrees of freedom
## Multiple R-squared: 0.002137, Adjusted R-squared: 0.001496
## F-statistic: 3.332 on 1 and 1556 DF, p-value: 0.06812
###### MIXED-EFFECTS REGRESSION: RANDOM INTERCEPTS ######
# B1. Since we want our results to be generalizable across words and subjects,
# we add random intercepts for these factors.
# We start with a random intercept for Subject: (1|Subject)
mixedModel1 = lmer(RT ~ Trial + (1|Subject), data=lexdec3)
# B2. Look at the results; note that p-values are no longer shown,
# but for a large data set |t| > 1.65 corresponds to p < 0.05 (one-tailed)
# and |t| > 2 to p < 0.05 (two-tailed)
# Note that Trial is still not significant
summary(mixedModel1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + (1 | Subject)
## Data: lexdec3
##
## REML criterion at convergence: -1098.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5540 -0.6987 -0.1318 0.5377 3.9070
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 0.01778 0.1334
## Residual 0.02710 0.1646
## Number of obs: 1558, groups: Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.389e+00 3.083e-02 207.23
## Trial -1.734e-04 8.838e-05 -1.96
##
## Correlation of Fixed Effects:
## (Intr)
## Trial -0.301
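# As a sketch of where these thresholds come from: for a large data set the
# t-distribution approaches the standard normal, so the p-values can be
# approximated with pnorm():

```r
2 * (1 - pnorm(2))  # two-tailed p for |t| = 2: about 0.046
1 - pnorm(1.65)     # one-tailed p for |t| = 1.65: about 0.049
```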
# B3. Test if the random intercept is necessary (significant improvement)
AIC(linearModel) - AIC(mixedModel1) # a difference of at least 2 indicates the 2nd (mixed) model is better
## [1] 607.2465
# B4. To see how the intercepts for subjects vary, we can plot them
# (this custom function uses the random effects extracted with ranef())
# remove the ## to save it as a pdf instead (in your current working directory)
## pdf('plotname.pdf', width = 10, height = 10)
myInterceptPlot(lexdec,"RT","Trial","Subject",mixedModel1)
## dev.off()
# use: getwd() to see in which folder it was saved
# B5. We now add random intercepts for Word and display the results
mixedModel2 = lmer(RT ~ Trial + (1|Subject) + (1|Word), data=lexdec3)
summary(mixedModel2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + (1 | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1245.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3694 -0.6577 -0.0995 0.5112 4.4522
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.004741 0.06886
## Subject (Intercept) 0.018698 0.13674
## Residual 0.022657 0.15052
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.3947330 0.0322446 198.32
## Trial -0.0001874 0.0000821 -2.28
##
## Correlation of Fixed Effects:
## (Intr)
## Trial -0.268
# Note that Trial is now significant. So when the structural variability
# associated with Word is taken into account, Trial does play a significant role.
## EXERCISE 1: Plot how the intercepts vary per word
# Hint: Consider saving it as a pdf to show a bigger graph (increase width and height)
## ANSWER 1:
pdf('wordintercepts.pdf', width = 15, height = 15)
myInterceptPlot(lexdec,"RT","Trial","Word",mixedModel2)
dev.off()
## pdf
## 2
getwd() # see where it was stored
## [1] "C:/Users/Martijn/Documents/lecture1"
# B6. We test if the inclusion of this random intercept is an improvement
# Caveat: all these methods are somewhat problematic as it is unclear how
# to define the number of degrees of freedom in a mixed model...
# Note that the p-value of the LRT below is conservative
# (see http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf, page 46)
AIC(mixedModel1) - AIC(mixedModel2) # > 2, so mixedModel2 is much better
## [1] 145.0771
# alternative testing method: the Likelihood Ratio Test (LRT); note that refit
# needs to be FALSE, as we are comparing random effects, for which REML fitting
# is appropriate
anova(mixedModel1,mixedModel2,refit=F)
## Data: lexdec3
## Models:
## mixedModel1: RT ~ Trial + (1 | Subject)
## mixedModel2: RT ~ Trial + (1 | Subject) + (1 | Word)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## mixedModel1 4 -1090.1 -1068.7 549.03 -1098.1
## mixedModel2 5 -1235.1 -1208.4 622.57 -1245.1 147.08 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# B7. How well does the model do? (i.e. explained variance)
cor(fitted(mixedModel2),lexdec3$RT)^2 # 50% of the variance explained
## [1] 0.5019252
# this is almost entirely due to the random intercepts; compare:
cor(fitted(linearModel),lexdec3$RT)^2
## [1] 0.002137039
# B8. Now we can add another fixed-effect factor
mixedModel3 = lmer(RT ~ Trial + NativeLanguage + (1|Subject) + (1|Word), data=lexdec3)
summary(mixedModel3) # NativeLanguage is significant: non-native speakers of English are slower
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + (1 | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1246.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3619 -0.6603 -0.1045 0.5116 4.4583
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.004747 0.06889
## Subject (Intercept) 0.014724 0.12134
## Residual 0.022655 0.15052
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.3366318 0.0372423 170.15
## Trial -0.0001872 0.0000821 -2.28
## NativeLanguageOther 0.1356033 0.0540709 2.51
##
## Correlation of Fixed Effects:
## (Intr) Trial
## Trial -0.233
## NtvLnggOthr -0.622 0.001
# B9. Test if adding the fixed-effect parameter significantly improves the model
# REML needs to be set to FALSE when models are compared with respect to their
# fixed effects. For the final model, and when comparing random effects, REML
# should be TRUE (the default). While a significant parameter will generally
# also yield a significant improvement, this test is essential when including
# multi-level factors or interactions
mixedModel2b = lmer(RT ~ Trial + (1|Subject) + (1|Word), data=lexdec3, REML=F)
mixedModel3b = lmer(RT ~ Trial + NativeLanguage + (1|Subject) + (1|Word), data=lexdec3, REML=F)
AIC(mixedModel2b) - AIC(mixedModel3b) # 3b is better
## [1] 3.986497
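# The same comparison can also be carried out as a likelihood ratio test.
# Note that anova() refits REML-fitted models with ML automatically (refit=TRUE
# is the default), which is appropriate when models differ in their fixed effects:

```r
anova(mixedModel2b, mixedModel3b)  # LRT for the fixed effect of NativeLanguage
```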
# EXERCISE 2: Add Frequency as a fixed-effect predictor to the model and assess
# if the model improves using model comparison (hint: REML=F for
# model comparison). Store the model as mixedModel4.
## ANSWER 2:
mixedModel4 = lmer(RT ~ Trial + NativeLanguage + Frequency + (1|Subject) + (1|Word), data=lexdec3)
summary(mixedModel4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + Frequency + (1 | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1279.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3206 -0.6637 -0.1029 0.5361 4.4615
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.002357 0.04855
## Subject (Intercept) 0.014723 0.12134
## Residual 0.022659 0.15053
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.519e+00 4.455e-02 146.32
## Trial -1.880e-04 8.189e-05 -2.30
## NativeLanguageOther 1.358e-01 5.407e-02 2.51
## Frequency -3.831e-02 5.256e-03 -7.29
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.198
## NtvLnggOthr -0.518 0.001
## Frequency -0.563 0.007 -0.003
mixedModel4b = lmer(RT ~ Trial + NativeLanguage + Frequency + (1|Subject) + (1|Word), data=lexdec3, REML=F)
AIC(mixedModel3b) - AIC(mixedModel4b) # 4b is much better
## [1] 39.26802
###### MIXED-EFFECTS REGRESSION: RANDOM SLOPES ######
# C1. We now want to see if subjects become slower or faster during the course of
# the experiment: include a random slope for Trial
# Note that it is important to center Trial! Also note you will get warning messages,
# which you should read carefully.
lexdec3$cTrial = scale(lexdec3$Trial,scale=F) # centering
mixedModel4c = lmer(RT ~ cTrial + NativeLanguage + Frequency + (1|Subject) + (0+cTrial|Subject) + (1|Word),
data=lexdec3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?
# The warning indicates that the model is nearly unidentifiable and recommends
# rescaling the predictors. We follow this recommendation for Trial, the
# variable which caused the problem:
lexdec3$zTrial = scale(lexdec3$Trial) # z-transforming (centering, then dividing by standard deviation)
mixedModel4c = lmer(RT ~ zTrial + NativeLanguage + Frequency + (1|Subject) + (0+zTrial|Subject) + (1|Word),
data=lexdec3) # no warning
# C2. We now test if subjects show a different response depending on word frequency:
# include a random slope for Frequency per subject. For simplicity we continue building
# from mixedModel4.
lexdec3$cFrequency = scale(lexdec3$Frequency,scale=F) # centering
mixedModel5 = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1|Word),
data=lexdec3)
summary(mixedModel5)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1284.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2952 -0.6487 -0.1000 0.5262 4.4695
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.0023953 0.04894
## Subject cFrequency 0.0001858 0.01363
## Subject.1 (Intercept) 0.0147470 0.12144
## Residual 0.0223558 0.14952
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.335e+00 3.685e-02 171.92
## Trial -1.907e-04 8.157e-05 -2.34
## NativeLanguageOther 1.363e-01 5.411e-02 2.52
## cFrequency -3.852e-02 6.057e-03 -6.36
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.233
## NtvLnggOthr -0.629 0.001
## cFrequency 0.004 0.006 -0.003
# C3. Compare to the previous best model (but make sure Frequency is centered there as well)
mixedModel4 = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1|Subject) + (1|Word), data=lexdec3)
AIC(mixedModel4) - AIC(mixedModel5) # mixedModel5 is better
## [1] 3.552462
## EXERCISE 3. Test for a correlation parameter between the random slope for frequency and the random intercept
# (i.e. test if slower subjects respond in a different way to increasing frequency than faster subjects)
## ANSWER 3:
mixedModel5c = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1+cFrequency|Subject) + (1|Word), data=lexdec3)
summary(mixedModel5c) # does not look interesting: -0.06
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + cFrequency + (1 + cFrequency | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1285
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2959 -0.6480 -0.0974 0.5262 4.4710
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.002397 0.04896
## Subject (Intercept) 0.014759 0.12149
## cFrequency 0.000186 0.01364 -0.06
## Residual 0.022355 0.14952
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.337e+00 3.685e-02 171.97
## Trial -1.908e-04 8.157e-05 -2.34
## NativeLanguageOther 1.311e-01 5.408e-02 2.42
## cFrequency -3.853e-02 6.059e-03 -6.36
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.233
## NtvLnggOthr -0.629 0.001
## cFrequency -0.018 0.006 -0.002
AIC(mixedModel5) - AIC(mixedModel5c) # and does not improve the model
## [1] -1.975181
# Note that if you had not centered, you would have found an (uninformative) higher correlation
summary(lmer(RT ~ Trial + NativeLanguage + Frequency + (1+Frequency|Subject) + (1|Word), data=lexdec3))
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + Frequency + (1 + Frequency | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1285
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2959 -0.6480 -0.0974 0.5262 4.4710
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.002397 0.04896
## Subject (Intercept) 0.020010 0.14146
## Frequency 0.000186 0.01364 -0.51
## Residual 0.022355 0.14952
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.522e+00 4.736e-02 137.72
## Trial -1.908e-04 8.157e-05 -2.34
## NativeLanguageOther 1.311e-01 5.408e-02 2.42
## Frequency -3.853e-02 6.059e-03 -6.36
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.185
## NtvLnggOthr -0.488 0.001
## Frequency -0.628 0.006 -0.002
# IMPORTANT REMARK: a factorial/binary predictor used as random slope ALWAYS needs to be put in the same
# block as the random intercept (i.e. correlated with it), since its levels are contrasted w.r.t. the intercept.
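# The reason can be seen in the dummy coding of a factor: its "slope" is a 0/1
# offset relative to the intercept, not a predictor centered around zero:

```r
# inspect the dummy coding of NativeLanguage in the model matrix
head(model.matrix(~ NativeLanguage, data = lexdec3), 3)
```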
# C4. Plot the random slopes (based on code on p. 249 of Baayen, 2008)
myXYPlot(lexdec3,"RT","cFrequency","Subject",mixedModel5)
# C5. Plot the sorted coefficients - from strongly negative (subjects who are much faster for
# higher-frequency words) to less negative (subjects who speed up less for higher-frequency words)
par(mfrow=c(1,1))
myCoefPlot(mixedModel5,"Subject","cFrequency")
# C6. Compare both plots: the first shows the regression lines, the second only their slopes
myXYPlot(lexdec3,"RT","cFrequency","Subject",mixedModel5)
x11() # opens new plot window, use quartz() when on Mac OS
myCoefPlot(mixedModel5,"Subject","cFrequency")
# alternatively use: par(mfrow=c(1,2)) and then call both plot functions without x11()
# C7. Now we test if the effect of native language varies per word
# Since this is a factor variable, it ALWAYS needs to be put in the model together with the random intercept
# in one block (note this will really NOT converge)
mixedModel6 = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad|
## = 0.372856 (tol = 0.002, component 3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge: degenerate
## Hessian with 1 negative eigenvalues
# C8. It did not converge. We can now try again with different optimizers. The default optimizer is
# 'bobyqa'. Another is 'Nelder_Mead':
mixedModel6a1 = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3, control = lmerControl(optimizer="Nelder_Mead")) # this does converge
# And another option is 'nloptwrap', a wrapper around the nloptr optimizers:
mixedModel6a2 = lmer(RT ~ Trial + NativeLanguage + cFrequency+ (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3, control=lmerControl(optimizer="nloptwrap")) # also converges, the other warning can be ignored
## Warning in data("nloptr.default.options", package = "nloptr", envir = ee): data set 'nloptr.default.options' not found
# As both models converged, their summaries and AIC values are effectively identical
summary(mixedModel6a1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 +
## cFrequency | Subject) + (1 + NativeLanguage | Word)
## Data: lexdec3
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: -1291.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3662 -0.6569 -0.1077 0.5200 4.4829
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.0018629 0.04316
## NativeLanguageOther 0.0014744 0.03840 0.26
## Subject cFrequency 0.0001702 0.01304
## Subject.1 (Intercept) 0.0148734 0.12196
## Residual 0.0219844 0.14827
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.336e+00 3.689e-02 171.76
## Trial -1.980e-04 8.131e-05 -2.44
## NativeLanguageOther 1.375e-01 5.450e-02 2.52
## cFrequency -3.652e-02 5.966e-03 -6.12
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.232
## NtvLnggOthr -0.626 0.001
## cFrequency 0.004 0.006 -0.001
summary(mixedModel6a2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 +
## cFrequency | Subject) + (1 + NativeLanguage | Word)
## Data: lexdec3
## Control: lmerControl(optimizer = "nloptwrap")
##
## REML criterion at convergence: -1291.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3661 -0.6569 -0.1077 0.5200 4.4828
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.0018623 0.04315
## NativeLanguageOther 0.0014728 0.03838 0.26
## Subject cFrequency 0.0001702 0.01305
## Subject.1 (Intercept) 0.0148720 0.12195
## Residual 0.0219848 0.14827
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.336e+00 3.689e-02 171.76
## Trial -1.980e-04 8.131e-05 -2.43
## NativeLanguageOther 1.375e-01 5.449e-02 2.52
## cFrequency -3.652e-02 5.966e-03 -6.12
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO
## Trial -0.232
## NtvLnggOthr -0.626 0.001
## cFrequency 0.004 0.006 -0.001
AIC(mixedModel6a1) - AIC(mixedModel6a2)
## [1] -3.731718e-07
# If the model had not converged with any of the optimizers, the next step would have been to test the correlated
# random slope and intercept, and if that had not converged either, the final step would have been to drop the
# random slope completely (i.e. stick with mixedModel5)
# C9. Test if the random slope is necessary
AIC(mixedModel5) - AIC(mixedModel6a1) # it is indeed necessary
## [1] 2.376281
# visualize the random slopes: an interesting pattern - non-native speakers are slower for
# animal words than for plant words
myCoefPlot(mixedModel6a1,"Word","NativeLanguageOther")
###### TESTING INTERACTIONS ######
# D1. We saw that non-native speakers showed a specific pattern in their responses to words.
# Perhaps this is related to word class, so we add this interaction and see if it
# reaches significance and also influences the random slopes
mixedModel6b = lmer(RT ~ Trial + NativeLanguage + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3, control = lmerControl(optimizer="Nelder_Mead"), REML=F)
mixedModel7b = lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3, control = lmerControl(optimizer="Nelder_Mead"), REML=F)
summary(mixedModel7b) # interaction clearly significant
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) +
## (0 + cFrequency | Subject) + (1 + NativeLanguage | Word)
## Data: lexdec3
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## AIC BIC logLik deviance df.resid
## -1315.5 -1251.3 669.7 -1339.5 1546
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4428 -0.6518 -0.1064 0.5217 4.4803
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.0017919 0.04233
## NativeLanguageOther 0.0007533 0.02745 0.13
## Subject cFrequency 0.0002359 0.01536
## Subject.1 (Intercept) 0.0134466 0.11596
## Residual 0.0219494 0.14815
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.344e+00 3.577e-02 177.37
## Trial -1.929e-04 8.109e-05 -2.38
## NativeLanguageOther 1.588e-01 5.237e-02 3.03
## Classplant -2.058e-02 1.456e-02 -1.41
## cFrequency -4.279e-02 6.267e-03 -6.83
## NativeLanguageOther:Classplant -4.844e-02 1.711e-02 -2.83
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO Clsspl cFrqnc
## Trial -0.235
## NtvLnggOthr -0.620 0.000
## Classplant -0.174 -0.021 0.055
## cFrequency -0.041 0.000 -0.005 0.255
## NtvLnggOt:C 0.067 0.009 -0.147 -0.383 0.019
AIC(mixedModel6b) - AIC(mixedModel7b) # mixedModel7b is an improvement
## [1] 9.31148
# D2. perhaps the random slope of NativeLanguage per word can be dropped now?
mixedModel7 = lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1+NativeLanguage|Word),
data=lexdec3, control = lmerControl(optimizer="Nelder_Mead")) # REML=T instead of REML=F
mixedModel7simpler = lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1|Word),
data=lexdec3, control = lmerControl(optimizer="Nelder_Mead"))
AIC(mixedModel7simpler) - AIC(mixedModel7)
## [1] -2.139666
# the difference is smaller than 2, so the more complex model is not better; we keep the simpler model
# note that the default optimizer works fine for this model as well:
mixedModel7simpler2 = lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1|Word),
data=lexdec3)
AIC(mixedModel7simpler2) - AIC(mixedModel7simpler) # in principle identical, also compare summaries:
## [1] -1.518606e-08
summary(mixedModel7simpler2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) +
## (0 + cFrequency | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1289.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4224 -0.6628 -0.1053 0.5141 4.4612
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.0020978 0.04580
## Subject cFrequency 0.0002693 0.01641
## Subject.1 (Intercept) 0.0148163 0.12172
## Residual 0.0221549 0.14885
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.344e+00 3.743e-02 169.47
## Trial -1.887e-04 8.121e-05 -2.32
## NativeLanguageOther 1.588e-01 5.470e-02 2.90
## Classplant -2.109e-02 1.518e-02 -1.39
## cFrequency -4.405e-02 6.445e-03 -6.83
## NativeLanguageOther:Classplant -4.936e-02 1.599e-02 -3.09
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO Clsspl cFrqnc
## Trial -0.225
## NtvLnggOthr -0.625 0.000
## Classplant -0.174 -0.021 0.056
## cFrequency -0.040 0.000 -0.003 0.251
## NtvLnggOt:C 0.076 0.009 -0.132 -0.434 0.004
summary(mixedModel7simpler)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) +
## (0 + cFrequency | Subject) + (1 | Word)
## Data: lexdec3
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: -1289.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4224 -0.6628 -0.1053 0.5141 4.4612
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.0020978 0.04580
## Subject cFrequency 0.0002693 0.01641
## Subject.1 (Intercept) 0.0148167 0.12172
## Residual 0.0221549 0.14885
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.344e+00 3.743e-02 169.47
## Trial -1.887e-04 8.121e-05 -2.32
## NativeLanguageOther 1.588e-01 5.470e-02 2.90
## Classplant -2.109e-02 1.518e-02 -1.39
## cFrequency -4.405e-02 6.445e-03 -6.83
## NativeLanguageOther:Classplant -4.936e-02 1.599e-02 -3.09
##
## Correlation of Fixed Effects:
## (Intr) Trial NtvLnO Clsspl cFrqnc
## Trial -0.225
## NtvLnggOthr -0.625 0.000
## Classplant -0.174 -0.021 0.056
## cFrequency -0.040 0.000 -0.003 0.251
## NtvLnggOt:C 0.076 0.009 -0.132 -0.434 0.004
# D3. Model criticism
qqp(resid(mixedModel7simpler)) # not perfect at the tails, model criticism is needed
# check heteroscedasticity, no clear increase or decrease in variance should be visible,
# at least within solid lines
plot(scale(resid(mixedModel7simpler)) ~ fitted(mixedModel7simpler), pch='.',cex=2, ylim=c(-5,5))
abline(h = c(-2.5, 2.5))
# refit the model after removing the data points it fits poorly (|standardized residual| >= 2.5)
lexdec3b = lexdec3[ abs(scale(resid(mixedModel7simpler))) < 2.5 , ]
(1-(nrow(lexdec3b))/nrow(lexdec3))*100 # 2.37% removed
## [1] 2.37484
mixedModel7simpler.v2 = lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1|Subject) + (0+cFrequency|Subject) + (1|Word),
data=lexdec3b)
# much better
qqp(resid(mixedModel7simpler.v2))
# heteroscedasticity is also improved
plot(scale(resid(mixedModel7simpler.v2)) ~ fitted(mixedModel7simpler.v2), pch='.',cex=2, ylim=c(-5,5))
abline(h = c(-2.5, 2.5))
# D4. Bootstrapping
library(boot) # if not present: install.packages('boot')
##
## Attaching package: 'boot'
##
## The following object is masked from 'package:lattice':
##
## melanoma
##
## The following object is masked from 'package:car':
##
## logit
bs = confint(mixedModel7simpler.v2, method='boot', nsim = 500, level = 0.95)
## Computing bootstrap confidence intervals ...
bs
## 2.5 % 97.5 %
## sd_(Intercept)|Word 0.0369660649 5.827004e-02
## sd_cFrequency|Subject 0.0068649707 2.352829e-02
## sd_(Intercept)|Subject 0.0833314083 1.637717e-01
## sigma 0.1246953444 1.343079e-01
## (Intercept) 6.2543337166 6.400187e+00
## Trial -0.0003048184 -3.080615e-05
## NativeLanguageOther 0.0502063140 2.682469e-01
## Classplant -0.0495304280 9.170742e-03
## cFrequency -0.0551492650 -3.084523e-02
## NativeLanguageOther:Classplant -0.0746136588 -2.030631e-02
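Whether an effect is reliable can be read off these intervals: a coefficient whose 95% bootstrap CI excludes zero is significant at roughly the 0.05 level. A small hypothetical helper (the function name is illustrative, not part of lme4) makes this check explicit:

```r
# Hypothetical helper: given a matrix as returned by confint(model, method='boot'),
# flag the fixed effects whose confidence interval excludes zero
ciExcludesZero = function(ci) {
  fixedRows = !grepl('^(sd_|sigma)', rownames(ci))  # skip random-effect SD rows and sigma
  ci[fixedRows, 1] * ci[fixedRows, 2] > 0           # TRUE if both bounds share a sign
}
# ciExcludesZero(bs)  # here: all fixed effects except Classplant exclude zero
```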
###### CONCLUDING REMARKS ######
# REMARK 1: AIC comparisons are only valid when the models have been fitted on
# the same dataset. If you add a variable containing missing values,
# the observations with missing values are dropped, so the datasets
# no longer match. To test whether the addition of such a variable is
# necessary, you need to fit the simpler model on the dataset from
# which the observations with missing values for the new variable
# have been excluded. A simple test to see if no data points have
# been dropped when adding a new variable is:
(length(fitted(mixedModel1)) - length(fitted(mixedModel2))) == 0 # TRUE == OK
## [1] TRUE
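The pitfall can be illustrated with a self-contained sketch using lm() on R's built-in airquality dataset, whose Ozone column contains missing values; the same logic applies to lmer():

```r
# airquality$Ozone contains NAs, so a model including Ozone would silently
# drop those rows; restrict both models to the complete cases first
complete = airquality[!is.na(airquality$Ozone), ]
m1 = lm(Temp ~ Wind, data = complete)          # simpler model, same rows
m2 = lm(Temp ~ Wind + Ozone, data = complete)  # extended model
length(fitted(m1)) == length(fitted(m2))       # TRUE: comparison is valid
AIC(m1) - AIC(m2)                              # > 2 favours the extended model
```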
# REMARK 2: The explained variance of your model will be very high due
# to the random-effects structure, so optimizing your explained variance
# is not useful. Instead you should compare AIC values. A decrease of at
# least 2 AIC units indicates an improved model.
#
# REMARK 3: Look at the graphical results of your model: they may reveal patterns
# suggesting the inclusion of additional control variables
#
# REMARK 4: Pay close attention to Baayen (2008: Ch 6) on how to interpret interactions
#
# REMARK 5: If interested, take a look at myFunctions.R
#
# REMARK 6: to conduct multiple comparisons (comparing multiple factor levels),
# the library multcomp can be used as follows:
#
# install.packages('multcomp') # if not yet present
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: splines
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:boot':
##
## aml
##
## Loading required package: TH.data
lexdec3$Factor = interaction(lexdec3$Complex,lexdec3$Class)
model = lmer(RT ~ Trial + Factor + (1|Subject) + (1|Word), data=lexdec3)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ Trial + Factor + (1 | Subject) + (1 | Word)
## Data: lexdec3
##
## REML criterion at convergence: -1234
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3679 -0.6550 -0.1055 0.5096 4.4529
##
## Random effects:
## Groups Name Variance Std.Dev.
## Word (Intercept) 0.004627 0.06802
## Subject (Intercept) 0.018699 0.13674
## Residual 0.022657 0.15052
## Number of obs: 1558, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.486e+00 5.401e-02 120.11
## Trial -1.877e-04 8.211e-05 -2.29
## Factorsimplex.animal -9.640e-02 4.573e-02 -2.11
## Factorcomplex.plant -9.479e-02 5.268e-02 -1.80
## Factorsimplex.plant -9.380e-02 4.644e-02 -2.02
##
## Correlation of Fixed Effects:
## (Intr) Trial Fctrsmplx.n Fctrc.
## Trial -0.161
## Fctrsmplx.n -0.790 0.004
## Fctrcmplx.p -0.685 -0.004 0.809
## Fctrsmplx.p -0.778 -0.001 0.918 0.797
summary(glht(model,linfct=mcp(Factor = "Tukey"))) # no significant differences among the levels
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lmer(formula = RT ~ Trial + Factor + (1 | Subject) + (1 | Word),
## data = lexdec3)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## simplex.animal - complex.animal == 0 -0.0964002 0.0457348 -2.108 0.137
## complex.plant - complex.animal == 0 -0.0947860 0.0526771 -1.799 0.254
## simplex.plant - complex.animal == 0 -0.0937959 0.0464387 -2.020 0.165
## complex.plant - simplex.animal == 0 0.0016142 0.0310823 0.052 1.000
## simplex.plant - simplex.animal == 0 0.0026043 0.0186394 0.140 0.999
## simplex.plant - complex.plant == 0 0.0009902 0.0321054 0.031 1.000
## (Adjusted p values reported -- single-step method)
# REMARK 7: if you have a dependent variable Y which is binary (1: success, 0: failure)
# you can follow the same approach as before, but now using logistic regression.
# The commands are very similar: you use glmer instead of lmer and specify family:
# glmer(Y ~ ..., family='binomial')
# [note that if you need to change the optimizer, use control = glmerControl(...)]
# [There is also no REML=... option for binomial models, so you always fit the same type of model.]
# [The residuals of a logistic model are not expected to be normally distributed, so residual-based model criticism is not necessary.]
#
# Alternatively, for logistic regression, you may also specify your dependent
# variable as the number of successes and the number of failures (e.g., over
# a certain time span and stored in columns NrSuccesses and NrFailures) in the following way:
# glmer(cbind(NrSuccesses,NrFailures) ~ ..., family='binomial')
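For instance, response accuracy in lexdec3 could be analyzed this way (a sketch only: the Acc recoding and the choice of predictors are illustrative, assuming Correct is coded 'correct'/'incorrect' as in the data preview above):

```r
# sketch (assumes the lexdec3 coding described above):
# lexdec3$Acc = as.numeric(lexdec3$Correct == 'correct')  # 1: success, 0: failure
# accModel = glmer(Acc ~ Trial + NativeLanguage + (1|Subject) + (1|Word),
#                  data = lexdec3, family = 'binomial')
# summary(accModel)  # glmer reports z values and p-values directly
```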
Replication of the analysis
To run all analyses yourself, you first have to install pandoc, and then you can copy the following lines into the most recent version of R. You need the packages ‘lme4’, ‘car’, ‘boot’, ‘multcomp’ and ‘rmarkdown’. If any of these are not installed (the library commands will throw an error), you can uncomment (i.e. remove the hashtag from) the first five lines to install them.
#install.packages('lme4',repos='http://cran.us.r-project.org')
#install.packages('car',repos='http://cran.us.r-project.org')
#install.packages('boot',repos='http://cran.us.r-project.org')
#install.packages('multcomp',repos='http://cran.us.r-project.org')
#install.packages('rmarkdown',repos='http://cran.us.r-project.org')
download.file('http://www.let.rug.nl/wieling/statscourse/lecture1/lab/answers/lab-including-answers.Rmd', 'lab-including-answers.Rmd')
library(rmarkdown)
render('lab-including-answers.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'lab-including-answers.html'), sep='')) # shows result