1 Introduction

In this lab session, we will experiment with the various tests you have encountered. You will have to fill in most commands yourself, but this should be feasible with the help of the lecture slides, which can be viewed here: http://www.let.rug.nl/wieling/statscourse/CrashCourseR.

While you can simply enter the commands in RStudio, you can also modify the source of this R Markdown file directly in RStudio and press the “Knit HTML” button to generate an HTML file which contains both the commands you have used and their output. You can download the file to your current working directory in R by pasting the following command: download.file('http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/lab.Rmd', 'lab.Rmd'). You can then open this file in RStudio.

In this file, all R commands located within chunks (beginning and ending with three backticks) will be evaluated. Creating an R Markdown file is very useful, as it makes your analysis reproducible and easy for others to check. Note that chunks have options with which you can customize the output. For more information, see: http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf

2 Importing the data

We will first download a csv file generated in Excel. It is your task to import this data into R.

download.file("http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/mtcars.csv", "mtcars.csv")

# now import the data yourself into an R data frame with the name dat using the function: read.csv2()
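As a hint, read.csv2() differs from read.csv() in its defaults: it expects a semicolon as the field separator and a comma as the decimal mark, which is presumably how this Excel export is formatted. The sketch below shows the effect on a small made-up file; the file contents and the names tmpfile and demo are hypothetical and are not part of the lab data.

```r
# Hypothetical example file (NOT the lab's mtcars.csv):
# semicolon-separated fields, decimal commas
tmpfile <- tempfile(fileext = ".csv")
writeLines(c("name;score", "a;1,5", "b;2,25"), tmpfile)

# read.csv2() uses sep = ";" and dec = "," by default
demo <- read.csv2(tmpfile)
str(demo)  # score is read as a numeric column
```

If the numbers in your own data frame show up as text rather than as numeric values, the separator or decimal mark probably does not match the import function you used.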

3 Structure of the data

Note that this dataset is similar to the mtcars dataset which is available by default in R, so the description of the columns can be obtained with ?mtcars. There is one additional column, ‘region’, which contains the region the car maker originates from. In the following, you will look at the structure of the data using various functions.

# Look at the structure of the data using the functions: str, summary and head
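To illustrate what these three functions report, here is a minimal sketch on the built-in iris dataset (used only as a stand-in, so the exercise itself remains open).

```r
# iris is a built-in dataset, used here purely for illustration
str(iris)       # column types and the first few values per column
summary(iris)   # quartiles and mean for numeric columns, counts for factors
first_rows <- head(iris, 3)  # first 3 rows (the default is 6)
first_rows
```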

4 Modifying the data

In this section, you will add two columns to the data.

# Add a column relHP to the data, which should contain the hp of the car divided by its weight
# (column wt)

# Next, add a column to the data named sportscar which is TRUE when the relHP > 42 and FALSE
# otherwise

# Look at the data using head
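The general pattern for derived columns can be sketched on a toy data frame. All names here (toy, power, mass, ratio, high) and the threshold of 45 are made up for illustration and do not correspond to the lab data.

```r
# Hypothetical toy data, not the lab dataset
toy <- data.frame(power = c(100, 250, 90), mass = c(2.0, 2.5, 3.0))

# New numeric column: element-wise division of two existing columns
toy$ratio <- toy$power / toy$mass

# New logical column: TRUE where the comparison holds, FALSE otherwise
toy$high <- toy$ratio > 45

head(toy)
```

Assigning to a column name that does not exist yet (via $) creates that column, and a comparison such as > directly yields the TRUE/FALSE values.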

5 Investigating the data

In this section, we will look at the variables in more detail. Specifically, we will look at measures of spread and central tendency, and frequency tables for individual variables. Furthermore, we will investigate the relationship between pairs of variables.

# How many sportscars are there (according to our definition)? Hint: use table()

# What is the mean weight of the cars?

# What is the standard deviation of the weight of the cars?

# How many cars have 6 cylinders?

# What is the correlation between weight and horsepower?

# How are being a sportscar and the number of gears related?
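The functions needed for these questions can be sketched on the built-in iris dataset, which serves only as a stand-in here. The cut-off of 6 used to create a logical variable is an arbitrary choice for illustration.

```r
# Frequency table of a categorical variable
species_counts <- table(iris$Species)
species_counts

# Central tendency and spread of a numeric variable
mean(iris$Sepal.Length)
sd(iris$Sepal.Length)

# Counting rows that satisfy a condition
n_setosa <- sum(iris$Species == "setosa")

# Pearson correlation between two numeric variables
r <- cor(iris$Sepal.Length, iris$Petal.Length)

# Cross-tabulation of a logical variable and a categorical variable
table(long_sepal = iris$Sepal.Length > 6, species = iris$Species)
```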

6 Visualizing the data

In this section, we will look at the variables in more detail through visualization.

# Create a boxplot with the weight for sportscars

# Create a boxplot with the weight, separately for the number of cylinders. Hint: boxplot can also
# be used with the formula interface: wt ~ cyl, data=dat

# Show the histogram for relHP

# Show the histogram for wt and hp next to each other. Set the color of the bars to 'red' for wt and
# 'blue' for hp.  Hint: use par() to place the graphs beside each other and use ?hist to see what
# parameter to use for the color

# Show the Q-Q plot of qsec (time for driving 1/4 mile)

# Create a new data frame named 'tmp' excluding the outlier visible in the Q-Q plot

# Create a barplot contrasting automatic vs. manual transmission (column 'am'). Give the plot a
# header: 'Transmission' and provide names below the bars: 'A' and 'M'

# Create a segmented barplot showing the relationship between being a sportscar and the type of
# transmission
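The plotting functions mentioned above can be sketched on the built-in ToothGrowth dataset (columns len, supp and dose), which is used only as a stand-in. The titles and bar labels are arbitrary choices for illustration.

```r
# Two histograms side by side: mfrow = c(1, 2) means 1 row, 2 columns
old_par <- par(mfrow = c(1, 2))
hist(ToothGrowth$len, col = "red", main = "len")
hist(ToothGrowth$dose, col = "blue", main = "dose")
par(old_par)  # restore the previous layout

# Boxplot per group using the formula interface
boxplot(len ~ supp, data = ToothGrowth)

# Q-Q plot against the normal distribution, with a reference line
qqnorm(ToothGrowth$len)
qqline(ToothGrowth$len)

# Barplot of a frequency table with a header and custom bar names
# (names.arg follows the order of the factor levels: OJ, VC)
counts <- table(ToothGrowth$supp)
barplot(counts, main = "Supplement", names.arg = c("Orange juice", "Vitamin C"))

# Segmented (stacked) barplot from a two-way table
barplot(table(ToothGrowth$dose, ToothGrowth$supp), legend.text = TRUE)
```

Subsetting with a logical condition, e.g. ToothGrowth[ToothGrowth$len < 30, ], is one way to create a data frame without certain observations.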

7 Basic statistical analyses

In this section, we will conduct basic statistical analyses of the data.

7.1 Comparing one or two means

# We will investigate if the weight in the sample significantly differs from 3 (x1000) lbs. However,
# before conducting a t-test we need to assess if the variable is normally distributed

# Assess if wt is normally distributed

# Run the one-sample t-test

# Assess if the weight of sportscars differs significantly from non-sportscars.  First assess if the
# distribution of wt of both groups is approximately normal.  If not, use an appropriate
# non-parametric alternative
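A minimal sketch of the functions involved, again on the built-in ToothGrowth dataset as a stand-in. The Shapiro-Wilk test is only one possible way to assess normality (a Q-Q plot is another), and the reference value of 20 is a made-up number for illustration.

```r
# Normality: Shapiro-Wilk test (a low p-value suggests non-normality)
sw <- shapiro.test(ToothGrowth$len)
sw

# One-sample t-test against a hypothetical reference value of 20
one <- t.test(ToothGrowth$len, mu = 20)
one

# Two independent groups: t.test() runs Welch's t-test by default
two <- t.test(len ~ supp, data = ToothGrowth)
two

# Non-parametric alternative for two groups: Wilcoxon rank-sum test
# (exact = FALSE requests the normal approximation, since the data contain ties)
np <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)
np
```

To check normality per group, the same test can be applied to a subset, for example shapiro.test(ToothGrowth$len[ToothGrowth$supp == "OJ"]).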

7.2 Categorical dependence

# Assess if there is a dependency between transmission and sportscar.
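For two categorical variables, a chi-square test on the cross-tabulation is the usual starting point, with Fisher's exact test as an option when expected counts are small. The sketch below uses made-up counts; the matrix m and its labels are hypothetical.

```r
# Hypothetical 2 x 2 table of counts (not taken from the lab data)
m <- matrix(c(12, 5, 7, 14), nrow = 2,
            dimnames = list(group = c("a", "b"), outcome = c("no", "yes")))
m

# Chi-square test of independence (Yates' correction is applied to 2 x 2 tables)
chi <- chisq.test(m)
chi
chi$expected  # expected counts under independence

# Fisher's exact test as an alternative for small expected counts
fish <- fisher.test(m)
fish
```

With a data frame, the table can be created directly with table(), e.g. chisq.test(table(x, y)) for two columns x and y.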

7.3 Comparing three or more means

# Assess if the region of the car maker influences the weight.  If the region influences the weight,
# also assess which regions differ.

# Investigate if the region of the car maker and the type of car (sportscar or not) influence the
# weight. As we will use factorial ANOVA, all predictors need to be factors. Currently sportscar is a
# logical predictor (TRUE or FALSE), so we need to convert it to a factor with: dat$sportscar <-
# as.factor(dat$sportscar).

# Finally, assess if the number of carburetors (carb) is a significant covariate.
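The three steps above (one-way ANOVA with a post-hoc test, factorial ANOVA, and adding a covariate) can be sketched with aov() on built-in datasets that serve only as stand-ins. Whether to include an interaction term in the factorial model is a modelling choice; the version below includes one as an assumption.

```r
# One-way ANOVA: does the group influence the weight? (PlantGrowth, 3 groups)
fit1 <- aov(weight ~ group, data = PlantGrowth)
summary(fit1)

# Post-hoc test: which groups differ? (Tukey's honest significant differences)
tuk <- TukeyHSD(fit1)
tuk

# Factorial ANOVA with two factors and their interaction (warpbreaks)
fit2 <- aov(breaks ~ wool * tension, data = warpbreaks)
summary(fit2)

# Adding a numeric covariate (dose) next to a factor (ToothGrowth)
fit3 <- aov(len ~ supp + dose, data = ToothGrowth)
summary(fit3)
```

The p-value of the covariate in the summary table indicates whether it explains a significant amount of additional variance.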

7.4 Multiple predictors

# Assess how well the weight of a sportscar can be predicted by the number of cylinders, the region,
# and the type of transmission.

# Is region necessary in the model?  Hint: use model comparison
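Model comparison can be sketched by fitting two nested linear models and comparing them with anova(). The example uses the built-in iris dataset as a stand-in; the names m_small and m_large are arbitrary.

```r
# Simpler model: one numeric predictor
m_small <- lm(Sepal.Length ~ Petal.Length, data = iris)

# More complex model: additionally includes a categorical predictor
m_large <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
summary(m_large)  # coefficients and R-squared

# F-test comparing the nested models: a significant result suggests that
# the additional predictor improves the fit
cmp <- anova(m_small, m_large)
cmp

# AIC as an alternative criterion (lower is better)
AIC(m_small, m_large)
```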

8 More advanced analyses

For more advanced analyses, such as when there are multiple values per subject and item, I recommend using mixed-effects regression analyses. How to use these is explained in the following lectures: http://www.let.rug.nl/wieling/statscourse/lecture1 and http://www.let.rug.nl/wieling/statscourse/lecture2, together with the lab sessions: http://www.let.rug.nl/wieling/statscourse/lecture1/lab and http://www.let.rug.nl/wieling/statscourse/lecture2/lab.

If you want to model non-linear relationships, please refer to http://www.let.rug.nl/wieling/statscourse/lecture3, http://www.let.rug.nl/wieling/statscourse/lecture4 and http://www.let.rug.nl/wieling/statscourse/lecture5 together with their lab sessions.

9 Answers

The answers to the questions in this file can be viewed here: http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/answers. The associated R markdown file can be downloaded here: http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/answers/answers.Rmd

10 Replication

From within RStudio, you can simply download this file using the commands:

# download the original file if it does not already exist (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/lab.Rmd", "lab.Rmd")
}

Subsequently, open it in the editor and use the Knit HTML button to generate the HTML file.

If you use plain R, you first have to install Pandoc. Then copy the following lines to the most recent version of R. (Or use the Knit HTML button when viewing the file lab.Rmd in RStudio.)

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown", repos="http://cran.us.r-project.org")
}
library(rmarkdown)  # load rmarkdown package

# download the original file if it does not already exist (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/statscourse/CrashCourseR/lab/lab.Rmd", "lab.Rmd")
}

# generate output
render("lab.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab.html"), sep = ""))  # shows result

Statistical tests in R

Martijn Wieling (http://www.martijnwieling.nl)

Generated on: Tue Dec 13 10:22:55 2016