Introduction

The data we will use in this part come from a hypothetical study on child language acquisition (created by a colleague of mine and slightly adapted by me). We want to investigate the effect of the amount of time spent in front of the TV on two-year-old children’s language development. The response variable in this data set, cdi, is a standard measure of children’s language abilities based on parental reports. The predictor of interest is tv.hours, the number of hours of TV time per week for each child.

As before, you will have to fill in most commands yourself, but this should be feasible given the slides of the lecture, which can be viewed here: http://www.let.rug.nl/wieling/BCN-Stats/lecture3. While you can just enter the commands in RStudio, it is also possible to modify the source of this so-called R markdown file directly in RStudio and press the “Knit HTML” button to generate an HTML file which contains both the commands you’ve used and their output. You can download the file to your current working directory in R by pasting the following command: download.file('http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/lab.Rmd', 'lab.Rmd'). You can then open this file in RStudio. In this file, all R commands located within chunks (beginning and ending with three backticks) will be evaluated. Creating an R markdown file is very useful, as it makes your analysis reproducible and easy for others to check. Note that chunks have options with which you can customize the output. For more information, see: http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf

Importing the data

We will first download the data (a data frame that was saved from R as an .rda file) and load it into R.

download.file("http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/tv.rda", "tv.rda")
load("tv.rda")  # an rda file can be loaded with the command load
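If you are unsure which objects an .rda file contains, note that load() invisibly returns the names of the objects it restores. The following is a minimal, self-contained sketch of that behaviour; it uses a small made-up data frame called demo and a temporary file rather than tv.rda, so the names and values are illustrative only.

```r
# Illustration only: 'demo' and its values are made up, not taken from tv.rda.
demo <- data.frame(cdi = c(310, 420, 385), tv.hours = c(12, 5, 8))

# Save the data frame to a temporary .rda file and remove it from the workspace
tmp <- tempfile(fileext = ".rda")
save(demo, file = tmp)
rm(demo)

# load() restores the object under its original name and
# invisibly returns a character vector with the restored names
loaded <- load(tmp)
print(loaded)

# Inspect the structure of the first restored object
str(get(loaded[1]))
```

The same pattern, for example loaded <- load("tv.rda"), tells you the name of the data frame to use in the remainder of the lab.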

Structure of the data

Investigate the data with descriptive statistics and plots.

# your code goes here
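As a starting point, the sketch below shows some base R commands that are commonly used for a first look at a data set. It runs on simulated data stored in a data frame called sim (an assumption made so that the example is self-contained); the column names cdi, tv.hours and gender follow this lab, but the values are randomly generated and say nothing about the real data.

```r
# Simulated stand-in data (NOT the lab data): names follow the lab, values are random
set.seed(1)
n <- 80
sim <- data.frame(
  tv.hours = round(runif(n, 0, 20), 1),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$cdi <- round(500 - 8 * sim$tv.hours + rnorm(n, sd = 40))

# Numerical summaries
str(sim)             # variable types and first values
summary(sim)         # five-number summaries and factor counts
table(sim$gender)    # group sizes
cor(sim$tv.hours, sim$cdi)  # linear association between two numeric variables

# Graphical summaries
hist(sim$cdi, main = "Distribution of cdi", xlab = "cdi")
plot(cdi ~ tv.hours, data = sim)   # scatter plot: numeric predictor
boxplot(cdi ~ gender, data = sim)  # box plot: categorical predictor
```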

Regression modeling

In this section, we will fit the best model for the data, predicting the cdi language development score on the basis of various predictors.

The best model without interactions

Fit the best model without taking into account interactions.

# your code goes here
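One possible way to arrive at a model without interactions is to add predictors one at a time and compare nested models. The sketch below illustrates this with lm(), anova() and AIC() on simulated data (the data frame sim and its values are assumptions for illustration); the lecture may use a different selection procedure.

```r
# Simulated stand-in data (NOT the lab data)
set.seed(2)
n <- 80
sim <- data.frame(
  tv.hours = round(runif(n, 0, 20), 1),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$cdi <- 480 - 8 * sim$tv.hours + 25 * (sim$gender == "female") + rnorm(n, sd = 40)

# Start with the predictor of interest
m0 <- lm(cdi ~ tv.hours, data = sim)
summary(m0)

# Add a second predictor as a main effect (no interaction)
m1 <- lm(cdi ~ tv.hours + gender, data = sim)
summary(m1)

# Compare the nested models: F-test and AIC (lower AIC indicates a better trade-off)
cmp <- anova(m0, m1)
print(cmp)
print(AIC(m0, m1))
```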

Assess the regression assumptions

# linearity

# homoscedasticity

# multicollinearity

# autocorrelation in residuals

# distribution of residuals
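The checks listed above can be sketched with base R only, as shown below. This is one possible set of diagnostics under stated assumptions, not necessarily the approach of the lecture (which may rely on additional packages). The data are simulated, and mother.edu is a hypothetical second numeric predictor introduced only so that multicollinearity can be illustrated.

```r
# Simulated stand-in data (NOT the lab data); 'mother.edu' is a hypothetical predictor
set.seed(3)
n <- 80
mother.edu <- round(rnorm(n, mean = 14, sd = 2))
tv.hours   <- pmax(0, 25 - mother.edu + rnorm(n, sd = 4))
cdi        <- 400 - 8 * tv.hours + 5 * mother.edu + rnorm(n, sd = 40)
sim <- data.frame(cdi, tv.hours, mother.edu)

m <- lm(cdi ~ tv.hours + mother.edu, data = sim)

# linearity: residuals vs. fitted values should show no systematic curve
plot(m, which = 1)

# homoscedasticity: the spread of the residuals should be roughly constant
plot(m, which = 3)  # scale-location plot

# multicollinearity: variance inflation factor computed by hand,
# VIF = 1 / (1 - R^2) of a predictor regressed on the other predictor(s)
r2.tv  <- summary(lm(tv.hours ~ mother.edu, data = sim))$r.squared
vif.tv <- 1 / (1 - r2.tv)
print(vif.tv)

# autocorrelation in residuals: only meaningful if the row order is meaningful
acf(resid(m))

# distribution of residuals: QQ-plot and Shapiro-Wilk normality test
qqnorm(resid(m))
qqline(resid(m))
sw <- shapiro.test(resid(m))
print(sw)
```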

Are interactions necessary?

Fit the best model while taking into account interactions. For simplicity and speed, we’ll only investigate potential interactions with gender.

# your code goes here
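Whether an interaction with gender is warranted can be assessed by comparing a model with only main effects to a model that also includes the interaction term. The sketch below shows this comparison on simulated data (the data frame sim and the simulated effect sizes are assumptions for illustration only).

```r
# Simulated stand-in data (NOT the lab data): the tv.hours slope differs by gender
set.seed(4)
n <- 80
sim <- data.frame(
  tv.hours = round(runif(n, 0, 20), 1),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
slope   <- ifelse(sim$gender == "female", -5, -10)
sim$cdi <- 480 + slope * sim$tv.hours + rnorm(n, sd = 40)

# Main effects only
m.main <- lm(cdi ~ tv.hours + gender, data = sim)

# tv.hours * gender expands to tv.hours + gender + tv.hours:gender
m.int <- lm(cdi ~ tv.hours * gender, data = sim)
summary(m.int)

# Does the interaction improve the fit significantly?
cmp <- anova(m.main, m.int)
print(cmp)
```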

Model criticism

Apply model criticism by excluding observations whose residuals lie more than 2.5 standard deviations from the mean, and refit the model on the remaining data.

# your code goes here
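The trimming step can be sketched as follows: standardize the residuals, keep only the observations within 2.5 SD, and refit the same model on the reduced data. The example uses simulated data with one artificially planted outlier (an assumption made purely for illustration).

```r
# Simulated stand-in data (NOT the lab data), with one planted outlier
set.seed(5)
n <- 80
sim <- data.frame(tv.hours = round(runif(n, 0, 20), 1))
sim$cdi <- 500 - 8 * sim$tv.hours + rnorm(n, sd = 40)
sim$cdi[1] <- sim$cdi[1] + 300  # artificial outlier

m <- lm(cdi ~ tv.hours, data = sim)

# Standardize the residuals (mean 0, SD 1) and flag those within 2.5 SD
z    <- as.numeric(scale(resid(m)))
keep <- abs(z) < 2.5

# Refit the model on the trimmed data
sim2   <- sim[keep, ]
m.trim <- lm(cdi ~ tv.hours, data = sim2)

# Compare: number of excluded observations and the coefficients of both models
print(nrow(sim) - nrow(sim2))
print(rbind(original = coef(m), trimmed = coef(m.trim)))
```

If the conclusions of the trimmed model match those of the original model, the results do not hinge on a few atypical observations.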

Overfitting

Check for overfitting using the validate() function from the rms package.

# your code goes here
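A minimal sketch of this check is given below, assuming the rms package is installed. validate() works on models fitted with functions from rms, so the model is refitted with ols(), and x = TRUE and y = TRUE are set so that the design matrix and response are stored for resampling. The data are simulated, and the number of bootstrap repetitions (B = 200) is an arbitrary choice for illustration.

```r
# Simulated stand-in data (NOT the lab data)
set.seed(6)
n <- 80
sim <- data.frame(
  tv.hours = round(runif(n, 0, 20), 1),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$cdi <- 480 - 8 * sim$tv.hours + 25 * (sim$gender == "female") + rnorm(n, sd = 40)

if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)

  # Refit the linear model with ols(), storing the design matrix and response
  fit <- ols(cdi ~ tv.hours + gender, data = sim, x = TRUE, y = TRUE)

  # Bootstrap validation: the 'optimism' column estimates how much each index
  # (e.g. R-square) is inflated by evaluating the model on its own training data
  val <- validate(fit, B = 200)
  print(val)
} else {
  message("Package 'rms' is not installed; run install.packages('rms') first.")
}
```

A small optimism for R-square, and a corrected slope close to 1, suggest that the model is not overfitting.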

Bootstrap sampling

Conduct bootstrap sampling with 1000 repetitions.

# your code goes here
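To illustrate the idea behind bootstrap sampling, the sketch below implements a simple case bootstrap by hand in base R: rows are resampled with replacement 1000 times, the model is refitted each time, and a percentile interval is computed for the slope. This is a hand-rolled version on simulated data, not necessarily the function used in the lecture.

```r
# Simulated stand-in data (NOT the lab data)
set.seed(7)
n <- 80
sim <- data.frame(tv.hours = round(runif(n, 0, 20), 1))
sim$cdi <- 500 - 8 * sim$tv.hours + rnorm(n, sd = 40)

B <- 1000  # number of bootstrap repetitions

# For each repetition: resample rows with replacement, refit, store the slope
boot.slopes <- replicate(B, {
  idx <- sample(nrow(sim), replace = TRUE)
  coef(lm(cdi ~ tv.hours, data = sim[idx, ]))[["tv.hours"]]
})

# 95% percentile confidence interval for the slope of tv.hours
ci <- quantile(boot.slopes, c(0.025, 0.975))
print(ci)

# Compare with the estimate from the original sample
print(coef(lm(cdi ~ tv.hours, data = sim))[["tv.hours"]])
```

If the bootstrap interval excludes zero, the effect is supported without relying on the distributional assumptions of the standard errors.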

Effect size

Obtain the effect size, both of the full model and of the individual predictors.

# your code goes here
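One way to quantify effect size with base R alone is to take R-squared for the full model and partial eta squared for each predictor, computed as SS_effect / (SS_effect + SS_residual). The sketch below does this on simulated data. Note two assumptions: the lecture may use a different measure or a dedicated package, and anova() on an lm object reports sequential (Type I) sums of squares, so the values for individual predictors depend on the order in which they are entered.

```r
# Simulated stand-in data (NOT the lab data)
set.seed(8)
n <- 80
sim <- data.frame(
  tv.hours = round(runif(n, 0, 20), 1),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$cdi <- 480 - 8 * sim$tv.hours + 25 * (sim$gender == "female") + rnorm(n, sd = 40)

m <- lm(cdi ~ tv.hours + gender, data = sim)

# Effect size of the full model: (adjusted) proportion of variance explained
r2     <- summary(m)$r.squared
adj.r2 <- summary(m)$adj.r.squared
print(c(r.squared = r2, adj.r.squared = adj.r2))

# Effect size per predictor: partial eta squared from the (sequential) ANOVA table
tab <- anova(m)
ss  <- tab[["Sum Sq"]]
names(ss) <- rownames(tab)
ss.resid  <- unname(ss["Residuals"])
ss.effect <- ss[names(ss) != "Residuals"]
partial.eta2 <- ss.effect / (ss.effect + ss.resid)
print(partial.eta2)
```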

Answers

The answers to the questions in this file can be viewed here: http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/answers. The associated R markdown file can be downloaded here: http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/answers/answers.Rmd

Replication

From within RStudio, you can simply download this file using the commands:

# download the original file only if it does not already exist (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/lab.Rmd", "lab.Rmd")
}

Subsequently, open it in the editor and use the Knit HTML button to generate the HTML file.

If you use plain R, you first have to install Pandoc. Then copy the following lines to the most recent version of R.

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown")
}
library(rmarkdown)  # load rmarkdown package

# download the original file only if it does not already exist (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/BCN-Stats/lecture3/lab/lab.Rmd", "lab.Rmd")
}

# generate output
render("lab.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab.html"), sep = ""))  # shows result

Correlation and regression

Martijn Wieling (http://www.martijnwieling.nl)

Generation date: jun 10, 2016 - 19:53:01