Install or update the latest versions of the packages mgcv and itsadug:
install.packages(c("mgcv", "itsadug"))
Load libraries:
library(mgcv)
library(itsadug)
# switch off itsadug's info messages (use infoMessages("on") to print them to the command line)
infoMessages("off")
Check R version and package versions:
R.version.string
## [1] "R version 3.2.1 (2015-06-18)"
packageVersion("mgcv")
## [1] '1.8.6'
packageVersion("itsadug")
## [1] '1.0.2'
If these versions do not agree with the ones included in the Lab Assignment file, please let us know.
Download the source file nonlinear_0.1.tar.gz and install the package with the following commands in R:
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/packages/nonlinear_0.1.tar.gz", "nonlinear_0.1.tar.gz")
install.packages("nonlinear_0.1.tar.gz", type="source", repos=NULL)
If the package is installed successfully, you can delete the source file.
Run the demo about random effects as follows:
library(nonlinear)
demoRandom()
Download the data pupilsize_S1.rds and load the data in R:
download.file("http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/Data/pupilsize_S1.rds", "pupilsize_S1.rds", mode="wb")
dat <- readRDS('pupilsize_S1.rds')
Inspect the data frame:
head(dat)
str(dat)
The data set is a subset of the full data, with only 10 participants included, so that the more complex models below can be run in reasonable time.
The column Pupil contains the raw pupil dilation measure (blinks have been removed). The data were sampled at 500 Hz, but downsampled to 50 Hz (i.e., one measurement every 20 ms).
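A quick sanity check of the sampling rate (a sketch, using the first event in the loaded data):
# Within a trial, consecutive timestamps should be 20 ms apart (50 Hz):
head(diff(dat[dat$Event == unique(dat$Event)[1], ]$Time))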
Plot the grand average of the data. You can use your own code, or complete the code in the R block below.
# First we group the timestamps in bins of 100 ms:
dat$Timebin <- timeBins(dat$Time, 100)
# Calculate the grand averages with tapply or aggregate for the different timebins.
# Complete the code:
# - replace the dots in line A with the column of dat
# that contains the data to be averaged.
# - replace the dots in line B with the column(s) of dat
#   that group the data.
avg <- aggregate( dat$Pupil, # line A
list( Timebin=dat$Timebin ), # line B
mean, na.rm=TRUE)
# Plot the averages:
# Complete the code:
# - replace the dots in line A with the values to plot on the x-axis
# - replace the dots in line B with the values to plot on the y-axis
plot( avg$Timebin , # line A
avg$x , # line B
type='l', main="Grand average",
xlab='Time (ms)', ylab='Pupil size',
ylim=c(1000,1200),
lwd=2,
bty='n')
# Note: see help(par) for info about the different parameters
Alternative ways to use aggregate to plot the data:
avg <- aggregate( Pupil ~ Timebin, data=dat, mean, na.rm=TRUE)
# or:
avg <- aggregate( Pupil ~ Timebin + Context, data=dat, mean, na.rm=TRUE)
Example of plotting two lines, one for each condition:
# First we group the timestamps in bins of 100 ms:
dat$Timebin <- timeBins(dat$Time, 100)
# Calculate the grand averages with tapply or aggregate for the different timebins.
# Complete the code:
# - replace the dots in line A with the column of dat
# that contains the data to be averaged.
# - replace the dots in line B with the column(s) of dat
#   that group the data.
avg <- aggregate( dat$Pupil, # line A
list( Timebin=dat$Timebin, Context=dat$Context), # line B
mean, na.rm=TRUE)
# Plot the averages:
# Complete the code:
# - replace the dots in line A with the values to plot on the x-axis
# - replace the dots in line B with the values to plot on the y-axis
plot( avg[avg$Context=="AP", ]$Timebin , # line A
avg[avg$Context=="AP", ]$x , # line B
type='l', main="Grand averages",
xlab='Time (ms)', ylab='Pupil size',
ylim=c(1000,1200),
lwd=2,
bty='n')
lines(avg[avg$Context=="PA", ]$Timebin , # line A
avg[avg$Context=="PA", ]$x, lwd=2, lty=2 )
legend("bottomright",
legend=c("AP", "PA"),
lwd=2, lty=c(1,2))
# Note: see help(par) for info about the different parameters
What does the Timebin column represent? In case you want to see the code for calculating the time bins, type timeBins in the command line. Why would it sometimes be useful to plot Timebin rather than raw timestamps? The Timebin column groups the timestamps into larger time bins of 100 ms. This makes the line smoother and allows for inspection of the major effects.
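If you cannot easily check the timeBins code, here is a rough sketch of what such binning amounts to (an approximation for illustration; the actual implementation in itsadug may differ in details such as the position of the bin label):
binsize <- 100
# Assign each timestamp the midpoint of its 100 ms bin:
timebin.sketch <- (floor(dat$Time / binsize) + 0.5) * binsize
head(cbind(Time = dat$Time, Timebin = timebin.sketch))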
To see how far the time series run, type tapply(dat$Time, list(dat$Event), max) in the command line. Most time series run until 3000 ms, so the averages after 3000 ms are based on a limited number of data points. That is why these may become more extreme.
Check the number of rows in the data. With more than 10,000 data points you should use the function bam() rather than gam(). Also, when you are planning to include many random smooths and other complex model terms, bam() is more efficient.
nrow(dat)
## [1] 44093
Run the model with a nonlinear effect for Time. Replace the dots ... with a nonlinear smooth for Time.
m1 <- bam( Pupil ~ s(Time), data=dat )
Inspect the summary (summary(m1)):
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Pupil ~ s(Time)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1090.137 1.963 555.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 5.013 6.114 58.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.00808 Deviance explained = 0.819%
## fREML = 3.2806e+05 Scale est. = 1.6979e+05 n = 44093
What does the intercept represent? The intercept is a constant value added to the other model terms (here only s(Time)). Because we don't have any categorical predictors included, it is roughly the mean value of pupil dilation (1089.92).
What do the estimated degrees of freedom (edf) indicate? The edf is 5, which means that the smooth is not a straight line (in that case, the edf would be 1).
The smooth is at some point significantly different from a flat horizontal zero line: the confidence interval of the smooth does not everywhere contain the zero line.
Use the following code to visualize the smooth.
par(mfrow=c(1,2), cex=1.1)
# Partial effect plot:
plot(m1, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m1, view="Time")
The first plot visualizes the partial effect of the smooth of Time. So it selectively visualizes the first line of the smooth term summary; no other terms are included. The second plot visualizes the summed or fitted effects, which are the model's predictions: every predictor in the model is set to a certain value and all the resulting effects are summed.
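To make the relation between the two plots concrete, here is a small check (a sketch, not part of the assignment): for m1, the fitted values equal the intercept plus the partial effect of s(Time).
# Partial effects per model term (excluding the intercept):
pe <- predict(m1, type="terms")
# Adding the intercept gives the fitted values (identity link):
head(coef(m1)["(Intercept)"] + rowSums(pe))
head(fitted(m1))  # same values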
Participants differ considerably in pupil dilation size. Visualize the individual patterns. You could use the code below, or use another plotting method.
boxplot(Pupil ~ Subject, data=dat)
# or alternatively:
library(lattice)
xyplot(Pupil ~ Time | Subject, pch=16, cex=.5, data = dat)
To take into account differences between participants we could add a random intercept for each subject to the model. Replace the dots ... with random intercepts for Subject and random intercepts for Item.
m2 <- bam( Pupil ~ s(Time)
+ s(Subject, bs='re')
+ s(Item, bs='re'), data=dat )
Inspect the summary (summary(m2)).
What does the intercept represent? The intercept is a constant value added to the other model terms (here the smooth s(Time) and the random intercepts).
Use the following code to visualize the smooth.
# make sure to have messages on:
infoMessages("on")
# split plot window in two:
par(mfrow=c(1,2), cex=1.1)
# Partial effect plot:
plot(m2, select=1, shade=TRUE)
abline(h=0)
# Fitted values:
plot_smooth(m2, view="Time")
## Summary:
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 3243.000000.
## * Subject : factor; set to the value(s): s08.
## * Item : factor; set to the value(s): 30.
What do the plots represent? (Also pay attention to the summary that the plot_smooth function prints in the command window.) The first plot visualizes the partial effect of Time, i.e., the first line of the smooth term summary. The second plot visualizes the summed effects over time, i.e., the intercept and the values for Subject and Item are included. The message in the command line indicates that the plot shows the fitted effect of Time for subject "s08" and item "30".
Run the plot_smooth function again, but now add rm.ranef=TRUE as an argument: plot_smooth(m2, view="Time", rm.ranef=TRUE). Why did the plot change so much?
plot_smooth(m2, view="Time", rm.ranef=TRUE)
## Summary:
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 3243.000000.
## * Subject : factor; set to the value(s): s08. (Might be canceled as random effect, check below.)
## * Item : factor; set to the value(s): 30. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(Subject),s(Item)
##
Now the effects of subject and item (the intercept adjustments) have been canceled out. The confidence intervals have increased, because the plot now shows the uncertainty around the estimated average. In the previous plot the confidence bands were narrower, because the model was more certain about the intercept of that specific subject and item combination.
We now inspect the random effects. Run the following code.
par(mfrow=c(1,3), cex=1.1)
# Partial effect plot:
plot(m2, select=2)
# get the values for individual subjects:
rand <- get_random(m2)
# note that get_random returns a list with
# the random adjustments for each random
# intercept or slope:
rand
## $`s(Subject)`
## s01 s02 s03 s04 s05 s07
## -101.35582 -236.97476 39.30879 69.15500 -702.93196 -203.94801
## s08 s09 s10 s06
## 386.88710 -55.72687 849.04654 -43.46002
##
## $`s(Item)`
## 1 2 3 4 5 6
## -22.382871 -45.003660 26.324149 17.031028 14.617727 -39.481939
## 7 8 9 10 11 12
## -30.403224 -23.352711 27.480479 26.427325 24.066908 82.179005
## 13 14 15 16 17 18
## -79.583049 7.162514 53.285312 110.270764 -43.767303 -51.976279
## 19 20 21 22 23 24
## -62.005568 -48.999765 13.031326 18.448216 76.034841 -17.816977
## 25 26 27 28 29 30
## -37.125977 16.777733 -6.531431 -28.830664 23.932075 34.731534
## 31 32
## -42.589029 8.049511
# similar plot with subj names added:
xy <- qqnorm(rand[[1]], main='s(Subject)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[1]])
text( xy$x, rand[[1]], labels=names( rand[[1]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
# same for items:
xy <- qqnorm(rand[[2]], main='s(Item)',
col='steelblue', pch=16, bty='n') # <- some additional arguments that could be left out
qqline(rand[[2]])
text( xy$x, rand[[2]], labels=names( rand[[2]] ),
pos=3, col="steelblue", xpd=TRUE) # <- some additional arguments that could be left out
The plots show the random adjustments of the intercept for Subject and Item. A negative random intercept shifts the whole fitted effect down by that constant value.
The random intercept component s(Subject, bs="re") equals (1|Subject) in lmer.
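For readers who know the lme4 package, a hedged sketch of this correspondence (the nonlinear smooth has no direct lmer counterpart, so a linear Time term is used in the comparison; not part of the assignment):
# library(lme4)
# lmer(Pupil ~ Time + (1 | Subject) + (1 | Item), data=dat)
# corresponds, apart from the nonlinearity, to:
# bam(Pupil ~ Time + s(Subject, bs='re') + s(Item, bs='re'), data=dat)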
Inspect the random intercepts of s05, s10, and s06.
1. s05 has a very negative random intercept, which means a negative adjustment of the intercept. So this participant generally has a smaller pupil dilation size.
2. s10 has a very positive random intercept, which means a positive adjustment of the intercept. So this participant generally has a much larger pupil dilation size.
3. The value of s06 is very close to 0, which means only a small adjustment of the intercept. So this participant has a pupil dilation size close to the average.
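These values can be looked up directly in the rand object created above (compare with the printed output: s05 ≈ -703, s10 ≈ 849, s06 ≈ -43):
# Random intercept adjustments for the three subjects discussed:
rand[[1]][c("s05", "s10", "s06")]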
The following plotting function produces plots that compare the fitted effects with the data. Run the code. The most important line of the function is the first command dat$fit <- fitted(model)
: The function stores the fitted values (model predictions) as a separate column in the data.
# This is a simple function for comparing the
# fitted effect with the data:
plot.fitted <- function(model, subject, n=3, column="Pupil"){
dat$fit <- fitted(model)
# extract the items for this subject:
items <- unique( as.character(dat[dat$Subject==subject, "Item"] ))
# if n > length of items, then take length of items:
n <- min(n, length(items))
# select n random items
items <- sample(items,n)
# select these items from the data:
tmp <- droplevels( dat[dat$Subject==subject & dat$Item %in% items,
c("Time", "Item", column, "fit")])
# set up empty plot window
emptyPlot(range(tmp$Time), range(tmp[,column]), main=subject)
# plot the items and their fitted effects:
for(i in items){
lines(tmp[tmp$Item==i,]$Time, tmp[tmp$Item==i,column], col='darkgray', xpd=TRUE)
lines(tmp[tmp$Item==i,]$Time, tmp[tmp$Item==i,]$fit, col='red', xpd=TRUE)
}
}
## end of function
You can run the function with a subject as input. The optional argument n
tells the function how many items to plot of that particular subject (defaults to 3):
par(mfrow=c(1,2))
plot.fitted(m2, 's01')
plot.fitted(m2, 's10', n=15)
The y-limits are different for each plot. With a smaller y-axis range, the pattern in the curves is more pronounced. However, the trend over time is the same for all participants.
Same for all participants: the intercept, the smooth s(Time), and the intercept adjustments for items, s(Item). Different: the intercept adjustments for subjects, s(Subject); each subject has a different adjustment.
Run the following model (ignore the warning) with random smooths over Time for each participant, and include similar random smooths for items.
m3 <- bam( Pupil ~ s(Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1), data=dat )
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
Inspect the summary (summary(m3)).
summary(m3)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Pupil ~ s(Time) + s(Time, Subject, bs = "fs", m = 1) + s(Time,
## Item, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1086 128 8.483 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 4.701 5.099 4.998 0.000128 ***
## s(Time,Subject) 74.230 89.000 6244.853 < 2e-16 ***
## s(Time,Item) 201.938 287.000 30.866 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.928 Deviance explained = 92.9%
## fREML = 2.7052e+05 Scale est. = 12244 n = 44093
What happened to the smooth of Time? Is it still significant? The smooth has become less wiggly (around 4.7 edf), but it is still nonlinear, and it is still significantly different from 0.
Why do the random smooth terms have such high edf values? These terms fit 10 smooths, one for each participant, and 32 smooths, one for each item.
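These counts can be verified from the data (a quick check, assuming Subject and Item are stored as factors, as the random smooths require):
# Number of levels behind each random smooth term:
nlevels(dat$Subject)  # 10 participants
nlevels(dat$Item)     # 32 items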
Use the following code to visualize the smooth.
par(mfrow=c(1,3), cex=1.1)
# Model term 1
plot(m3, select=1, shade=TRUE, scale=0)
abline(h=0)
# Model term 2
plot(m3, select=2)
abline(h=0)
# Summed effect of time:
plot_smooth(m3, view="Time", plot_all="Subject", rug=FALSE)
## Summary:
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 3243.000000.
## * Subject : factor; set to the value(s): s01, s02, s03, s04, s05, s06, s07, s08, s09, s10.
## * Item : factor; set to the value(s): 30.
The first plot visualizes the partial effect of the main effect of Time (without the intercept or other model terms included). The second plot visualizes the partial effect of the random smooths for Subject over Time; these are the random adjustments of the general time smooth for each subject. The last plot shows the summed effects for each subject. For every line this includes the intercept, the main trend of Time, a random adjustment for that participant over Time, and a random adjustment for an arbitrary item (item 30, according to the command line summary).
Side step: flexible visualization functions (advanced)
The following three plots show the separate components that build up the third plot. Note that with the function inspect_random
the individual item plots could be visualized.
par(mfrow=c(1,3), cex=1.1)
# s(Time) + intercept
# no random effects
plot_smooth(m3, view="Time",
main="intercept + s(Time)",
ylim=c(500,2000),
rm.ranef=TRUE, rug=FALSE)
# random effect for item 30:
inspect_random(m3, select=3, cond=list(Item=30), lwd=2,
main="+ Item 30", ylim=c(-750,750))
# random effects for subjects:
inspect_random(m3, select=2, main="+ s(Time, Subject)", ylim=c(-750,750), xpd=TRUE)
How could we plot the effects for each subject without the random effects for items included? With the argument rm.ranef=TRUE both the Subject and the Item effects would be removed, so we need a more selective method.
The function get_predictions from the package itsadug (which in turn is a wrapper around mgcv's predict.gam) allows for retrieving model predictions with specific random effects canceled. The rm.ranef argument accepts a logical value, or a regular expression when specific effects need to be removed. Make sure that this expression only matches the random effect to remove, and does not by accident apply to a fixed effect.
par(mfrow=c(1,2), cex=1.1)
# same time values as previous plots:
times <- seq(0,3243, length=30)
emptyPlot(range(times), c(400,2000),
main="No item effects", xlab='Time (ms)',
ylab="Summed effect, excl. item effect")
subj <- unique(dat$Subject)
for(su in 1:length(subj)){
p <- get_predictions(m3,
cond=list(Time=times, Subject=subj[su]),
rm.ranef="Item")
plot_error(p$Time, p$fit, p$CI, shade=TRUE,
col=rainbow(length(subj))[su], lwd=2, xpd=TRUE)
}
# For comparison:
# plot with item effects included:
plot_smooth(m3, view="Time", plot_all="Subject", rug=FALSE,
main="Incl. Item 30", xlab='Time (ms)',
ylab="Summed effect, incl. item effect")
If you use predictions like this, make sure that your predictions make sense.
The random smooths capture differences in the pupil dilation average (intercept differences) and differences in trend over time.
Use the same function as before to inspect the fitted effects:
par(mfrow=c(1,1))
plot.fitted(m3, 's01', n=10)
Although the random smooths capture some of the variation between participants, they do not capture all the variation within subjects.
In addition to the random smooths over Time, we could include a random intercept for Event (each unique combination of Trial and Subject).
Run the following model (may take some time):
m4 <- bam( Pupil ~ s(Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
Use the following code to visualize the smooth terms:
par(mfrow=c(1,3), cex=1.1)
# Model term 1
plot(m4, select=1, shade=TRUE, scale=0)
abline(h=0)
# Model term 2
plot(m4, select=2)
abline(h=0)
# Model term 3
plot(m4, select=3)
abline(h=0)
Inspection of the fitted effects:
par(mfrow=c(1,1))
plot.fitted(m4, 's01', n=15)
Why do we include a random intercept for each Event rather than for each Item? Why would the random effects of Subject and Item in many time series data not be sufficient? Random smooths for Subject capture the average pattern for each participant, and random smooths for Item capture the average pattern for each item. However, a time series consists of multiple measurements which are not only determined by the averages of participants and items, but are also influenced by events happening in that particular time series (e.g., associations with stimuli, or still thinking about the previous trial). If these individual time series show a lot of variation, as in pupil dilation data, the individual time series are not fit very well by the model. This will result in autocorrelation, because the model fit deviates from the data in a systematic way. Random intercepts for individual time series reduce the distance between the data and the model fit, thereby reducing the size of the residuals.
Different answers are possible. Our opinion: the model fit has improved with the event intercept adjustments, but a random smooth for each Event, instead of the Subject and Item random smooths, might capture additional variation within each time series. However, that takes a long time to run and is for many data sets simply not feasible.
Assumptions of regression:
1. Residuals are normally distributed
2. There is no structure in the residuals (independent observations)
To test whether this model meets these assumptions, we inspect the residuals of the model.
Plot the distribution of the residuals in a QQ-norm plot:
qqnorm(resid(m4))
qqline(resid(m4))
Plot the ACF of the residuals:
par(mfrow=c(1,1))
acf(resid(m4))
On the x-axis: quantiles from a normal (Gaussian) distribution. On the y-axis: quantiles from the residuals of the model.
The straight line is where the residual quantiles would lie if they followed a normal distribution.
The residuals clearly do not follow a normal (Gaussian) distribution: the most negative residuals are more negative than expected under a normal distribution, and the extreme positives are more positive than expected. This looks more like a t-distribution, which is steeper in shape (with heavier tails) than the normal distribution, as illustrated in the two QQ-plots (and associated density plots) below.
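The referenced illustration can be reproduced with simulated data (a sketch; the choice of df=3 is arbitrary, just to make the heavy tails visible):
# Compare QQ-plots of a normal and a heavy-tailed (t-distributed) sample:
set.seed(123)
x.norm <- rnorm(5000)
x.t <- rt(5000, df=3)
par(mfrow=c(1,2))
qqnorm(x.norm, main="Normal sample"); qqline(x.norm)
qqnorm(x.t, main="t-distributed sample"); qqline(x.t)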
An ACF plot visualizes the autocorrelation in the residuals. On the x-axis are the lags, i.e., the distances between residuals. The y-axis shows the average correlation between data points at each lag. Lag 0 represents the correlation between the data points (i.e., the residuals of the model) and themselves, which is always equal to 1. Lag 1 is the average correlation between all the residuals and the residuals one measurement further; Lag 2 is the average correlation between all the residuals and the residuals two measurements further. So basically Lag \(n\) equals \(corr(residuals_{t}, residuals_{t+n})\). Here are some examples:
# lags 0, 1, 2, and 5:
tmp <- data.frame(t0 = resid(m4),
t1 = move_n_point(resid(m4), n=1),
t2 = move_n_point(resid(m4), n=2),
t5 = move_n_point(resid(m4), n=5))
# move_n_point shifts the values n points forward
# Print ACF values:
print(acf(resid(m4), plot=FALSE))
##
## Autocorrelations of series 'resid(m4)', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 1.000 0.977 0.953 0.925 0.893 0.859 0.822 0.784 0.744 0.704
## 10 11 12 13 14 15 16 17 18 19
## 0.663 0.623 0.583 0.543 0.505 0.468 0.432 0.398 0.365 0.333
## 20 21 22 23 24 25 26 27 28 29
## 0.303 0.275 0.248 0.222 0.198 0.175 0.152 0.132 0.112 0.092
## 30 31 32 33 34 35 36 37 38 39
## 0.073 0.056 0.038 0.022 0.006 -0.010 -0.025 -0.039 -0.053 -0.066
## 40 41 42 43 44 45 46
## -0.079 -0.090 -0.102 -0.113 -0.123 -0.133 -0.142
# Compare correlations
# (if necessary, remove NA's in first row(s))
# Lag 0:
cor(tmp$t0, tmp$t0)
## [1] 1
# Lag 1:
tmp1 <- tmp[2:nrow(tmp),]
cor(tmp1$t0, tmp1$t1)
## [1] 0.9774553
# Lag 2:
tmp2 <- tmp[3:nrow(tmp),]
cor(tmp2$t0, tmp2$t2)
## [1] 0.9530716
# Lag 5:
tmp5 <- tmp[6:nrow(tmp),]
cor(tmp5$t0, tmp5$t5)
## [1] 0.8588949
The residuals are clearly not independent: there is a strong correlation between the residuals and the residuals of the following data points (Lag 1 correlation of .977). So there is structure in the residuals that is not captured by the regression model.
We can clearly see this structure when plotting the residuals over Time: individual time series are recognizable in the residuals (which should not be the case if the residuals were randomly distributed).
plot(dat$Time, residuals(m4),
pch=16, col=alpha(1), cex=.75, # layout points
ylab = "Residuals", xlab="Time", # labels
bty='n')
abline(h=0)
To account for autocorrelation in the residuals, we could include an AR1 model in the GAMM.
First, we need to generate a ‘start event’ column:
dat <- start_event(dat, event="Event")
head(dat[, c("Subject", "Item", "Time", "start.event")])
## Subject Item Time start.event
## 1 s01 1 2 TRUE
## 2 s01 1 22 FALSE
## 3 s01 1 42 FALSE
## 4 s01 1 62 FALSE
## 5 s01 1 82 FALSE
## 6 s01 1 102 FALSE
# one true and rest false for each event:
head( table(dat$Event, dat$start.event) )
##
## FALSE TRUE
## s01.1 147 1
## s02.1 142 1
## s03.1 142 1
## s04.1 147 1
## s05.1 142 1
## s07.1 142 1
The first time point of each trial and subject combination (i.e., of each individual time series) is set to TRUE and all other data points to FALSE, marking the start of each individual time series.
The AR1 model needs this information, because without it the model would assume that all data points are connected and correlated. However, between two time series there may have been some time without data recording, so the data points of different time series are much less correlated. The AR.start column tells the model which values are correlated and where the breaks between time series are.
Second, we need a starting value for the parameter rho, which indicates the amount of autocorrelation in the residuals. A good starting point is the Lag 1 value:
rho <- acf(resid(m4), plot=FALSE)$acf[2]
rho
## [1] 0.9774546
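To build intuition for what the AR1 model assumes, here is a small simulation (a sketch with made-up numbers): each residual equals rho times the previous residual, plus independent Gaussian noise.
# Simulate an AR1 series with rho = 0.98 and inspect its ACF:
set.seed(42)
n <- 1000
rho.sim <- 0.98
e <- numeric(n)
e[1] <- rnorm(1)
for(i in 2:n){
  e[i] <- rho.sim * e[i-1] + rnorm(1)
}
acf(e)  # slowly decaying ACF, similar to the residuals of m4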
Run the model again with AR1 model included:
m4b <- bam( Pupil ~ s(Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re')
, data=dat, AR.start=dat$start.event, rho=rho )
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
Inspect the summary and the plots to see what has changed in the model.
summary(m4)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Pupil ~ s(Time) + s(Time, Subject, bs = "fs", m = 1) + s(Time,
## Item, bs = "fs", m = 1) + s(Event, bs = "re")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1070.3 127.3 8.407 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 1.00 1 1.102e+01 0.000901 ***
## s(Time,Subject) 83.04 89 4.944e+06 < 2e-16 ***
## s(Time,Item) 223.47 287 8.064e+03 2.89e-15 ***
## s(Event) 286.14 308 2.873e+02 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.976 Deviance explained = 97.7%
## fREML = 2.471e+05 Scale est. = 4053.1 n = 44093
summary(m4b)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Pupil ~ s(Time) + s(Time, Subject, bs = "fs", m = 1) + s(Time,
## Item, bs = "fs", m = 1) + s(Event, bs = "re")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1078.5 127.7 8.444 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 8.229 8.335 13.22 < 2e-16 ***
## s(Time,Subject) 80.082 89.000 18748.90 < 2e-16 ***
## s(Time,Item) 240.571 287.000 32.96 2.91e-11 ***
## s(Event) 268.872 308.000 15.40 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.972 Deviance explained = 97.2%
## fREML = 1.5454e+05 Scale est. = 1362.2 n = 44093
The most striking differences in the summaries are that including the AR1 model resulted in a more wiggly smooth for Time (edf 1 (linear) changed to edf 8.3), and a higher intercept (from 1070.3 to 1078.5).
par(mfrow=c(1,2))
plot(m4, select=1, scale=0, rug=FALSE,
main="Model m4", shade=TRUE)
abline(h=0)
plot(m4b, select=1, scale=0, rug=FALSE,
main="Model m4b", shade=TRUE)
abline(h=0)
The plots confirm the more wiggly pattern in the model that includes the AR1 model.
In addition, some of the random effects have changed. Below, the comparison for the Item smooths is plotted:
par(mfrow=c(1,2))
plot(m4, select=3, rug=FALSE,
main="Model m4", shade=TRUE, ylim=c(-100,100))
abline(h=0)
plot(m4b, select=3, rug=FALSE,
main="Model m4b", shade=TRUE)
abline(h=0)
Two plots of the new residuals:
par(mfrow=c(1,2))
acf(resid(m4b), main='Residuals')
acf_resid(m4b, main='Corrected residuals')
- Left plot: residuals of the model.
- Right plot: residuals of the model after correction for autocorrelation by the AR1 model. This plot shows that the AR1 model reduced the autocorrelation, but did not eliminate it completely.
Use compareML to compare the two models, m4 and m4b. Which one is preferred?
compareML(m4, m4b)
## m4: Pupil ~ s(Time) + s(Time, Subject, bs = "fs", m = 1) + s(Time,
## m4: Item, bs = "fs", m = 1) + s(Event, bs = "re")
##
## m4b: Pupil ~ s(Time) + s(Time, Subject, bs = "fs", m = 1) + s(Time,
## m4b: Item, bs = "fs", m = 1) + s(Event, bs = "re")
##
## Model m4b preferred: lower fREML score (92565.223), and equal df (0.000).
## -----
## Model Score Edf Difference Df
## 1 m4 247102.6 8
## 2 m4b 154537.4 8 -92565.223 0.000
## Warning in compareML(m4, m4b): AIC is not reliable, because an AR1 model
## is included (rho1 = 0.000000, rho2 = 0.977455).
Model m4b is clearly preferred, because it has a considerably lower fREML score, although the degrees of freedom are the same.
With the arguments split_pred and n, the function acf_resid compares the ACF (in particular the Lag 1 value) for different events:
acf_resid(m4b, split_pred=list(Event=dat$Event), n=9)
## Quantiles to be plotted:
## 0% 12.5% 25% 37.5% 50% 62.5%
## -0.31079713 -0.02105642 0.02197881 0.05849034 0.09767531 0.15852502
## 75% 87.5% 100%
## 0.23152098 0.32193121 0.79064821
Each of the panels is representative of a quantile of the data (a 1/9th part). These plots indicate that there are some extreme events with high autocorrelation, for which including the AR1 model did not work.
A different value of \(\rho\) for each subject, or maybe even for each time series event, would be better than just one value for all trials and subjects together.
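One way to inspect this is to compute the Lag 1 autocorrelation per subject (a rough sketch, not part of the assignment; it ignores event boundaries within a subject and assumes no rows were dropped during fitting, so the values are only indicative):
# Lag 1 autocorrelation of the m4 residuals, per subject:
lag1.per.subject <- sapply(split(resid(m4), dat$Subject), function(r){
  cor(r[-length(r)], r[-1])
})
round(sort(lag1.per.subject), 3)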
Hand in the R code, R output, the plots, and the answers on the questions in the section ‘d. Model criticism’.
Deadline: before next class.
Format: html document (R Markdown)
An easy way to include everything in a single document is by using R markdown.
Quick instructions for R Markdown in R Studio: www.sfs.uni-tuebingen.de/~jvanrij/Rinfo.html
When NOT using R Studio, try the following steps:
download.file('http://www.let.rug.nl/~wieling/statscourse/lecture4/lab/answers/lab-including-answers.Rmd', 'analysis.Rmd')
library(rmarkdown)
render('analysis.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysis.html'),
sep='')) # shows result
Alternatively, you could hand in a pdf that contains all the information.
The assignments can be sent by mail to vanrij.jacolien 'AT' gmail.com, with "LSA lab 1" in the subject line.
Note: nonlinear interactions and the use of the ti()
smooth will be explained in the next class.
Often we are interested in the change in pupil size after a stimulus. In that case it might be useful to work with the baselined pupil size. Instead of baselining, we could also include the baseline as a fixed-effect predictor, to account for the pupil size at the start of the analysis window:
m5 <- bam( Pupil ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
Note that the effects are highly similar to a model of baselined pupil dilation:
m5b <- bam( Pupil_base ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
The predictor baseline and the interaction between baseline and Time are included to account for differences in the trend of pupil dilation depending on the size of the pupil at the start of the trial.
To account for the effect of gaze position on pupil dilation, we include an interaction between the X position and Y position of the fixation. As both predictors are on the same scale (pixels on the screen), we can use the function s() for modeling the interaction.
m6 <- bam( Pupil ~ s(Time)
+ s(baseline) + ti(baseline, Time)
+ s(Xgaze, Ygaze)
+ s(Time, Subject, bs='fs', m=1)
+ s(Time, Item, bs='fs', m=1)
+ s(Event, bs='re') , data=dat )
Note that this is not included as a random effect by subject, but rather as a fixed effect to reduce the processing time for the model.
Inspect the interaction:
par(mfrow=c(1,2), cex=1.1)
plot(m6, select=4)  # term 4 is s(Xgaze, Ygaze)
# same plot but now with colors:
pvisgam(m6, select=4, view=c("Xgaze", "Ygaze"))
# make white the areas where no data was observed:
fadeRug(dat$Xgaze, dat$Ygaze)
# indicate the center of the screen:
abline(v=1024/2, lty=3)
abline(h=768/2, lty=3)