The data we will use in the first part is a hypothetical study on child language acquisition (created by a colleague of mine and slightly adapted by me). We want to investigate the effects of amount of time spend in front of TV to two-year-old children's language development. The response variable in this data set, cdi
, is a standard measure of children's language abilities based on parental reports. The predictor of interest is tv.hours
, which is the weekly hours of TV time for each child.
You will have to fill in most commands yourself, but this should be feasible given the slides of the lecture, which can be viewed here: https://www.let.rug.nl/wieling/Statistics/Regression. While you can just enter the commands in RStudio, it is also possible to modify the source of this so-called R-markdown file directly in RStudio and press the "Knit HTML" button to generate an html file which contains both the commands you've used and their output. You can download the file to your current working directory in R by pasting the following command: download.file('http://www.let.rug.nl/wieling/Statistics/Regression/lab/lab.Rmd', 'lab.Rmd')
. You can then open this file in RStudio. In this file, all R commands which are located within chunks (beginning and ending with three backticks) will be evaluated. Creating an R markdown file is very useful as your analysis becomes reproducible and easy to check for others. Note that chunks have options, with which you can customize the output. See for more information: https://raw.githubusercontent.com/rstudio/cheatsheets/master/rmarkdown-2.0.pdf
Note that the significance threshold \(\alpha\) is set to 0.05 as usual.
We will first download the data (which was saved as a dataframe in R: an .rda
file), and load it into R.
download.file("http://www.let.rug.nl/wieling/Statistics/Regression/lab/tv.rda", "tv.rda")
load("tv.rda") # an rda file can be loaded with the command load
Investigate the data with descriptive statistics and plots. Also calculate the correlations between the numerical variables.
# your code goes here
In this section, we will fit the best model for the data, predicting the cdi
language development score on the basis of various predictors.
Fit the best model without taking into account interactions.
# your code goes here
# your code goes here
# multicollinearity
# homoscedasticity
# distribution of residuals
Fit the best model while taking into account interactions. For simplicity and speed, we'll only investigate potential interactions with sex. Also visualize the results if an interaction is necessary.
# your code goes here
Obtain the effect size, both of the full model and the individual predictors
# your code goes here
# Please note that installing 'lmSupport' using install.packages('lmSupport') does not work anymore,
# instead follow the following steps
# (If using Windows: download and install RTools from:
# https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html)
# Then run the following commands from R:
# install.packages(c('psych','AICcmodavg','gplots','gvlma','pwr'))
# install.packages('https://cran.r-project.org/src/contrib/Archive/lmSupport/lmSupport_2.9.13.tar.gz', repos=NULL, type='source')
Apply model criticism by excluding observations with residuals outside 2.5 SD.
# your code goes here
Conduct bootstrap sampling with 1000 repetitions. Use set.seed(100)
to set the seed of the random number generator.
# your code goes here
The answers to the questions in this file can be viewed here: https://www.let.rug.nl/wieling/Statistics/Regression/lab/answers. The associated R markdown file can be downloaded here: https://www.let.rug.nl/wieling/Statistics/Regression/lab/answers/answers.Rmd
From within RStudio, you can simply download this file using the commands:
# download original file if not already exists (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
download.file("http://www.let.rug.nl/wieling/Statistics/Regression/lab/lab.Rmd", "lab.Rmd")
}
Subsequently, open it in the editor and use the Knit HMTL button to generate the html file.
If you use plain R, you first have to install Pandoc. Then copy the following lines to the most recent version of R.
# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
install.packages("rmarkdown")
}
library(rmarkdown) # load rmarkdown package
# download original file if not already exists (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
download.file("http://www.let.rug.nl/wieling/Statistics/Regression/lab/lab.Rmd", "lab.Rmd")
}
# generate output
render("lab.Rmd") # generates html file with results
# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab.html"), sep = "")) # shows result