1 Introduction

In this lab session, we will experiment with data exploration and visualization. You will have to fill in most commands yourself, but this should be feasible with the lecture slides, which can be viewed here: https://www.let.rug.nl/wieling/Statistics/Intro-R.

While you can simply enter the commands in RStudio, you can also edit the source of this so-called R Markdown file directly in RStudio (or in plain R, see below for the procedure) and press the “Knit HTML” button to generate an HTML file that contains both the commands you used and their output. You can download the file to your current working directory by pasting the following command into R: download.file('http://www.let.rug.nl/wieling/Statistics/Intro-R/lab/lab.Rmd', 'lab.Rmd'). You can then open this file in RStudio. All R commands located within chunks (which begin and end with three backticks) will be evaluated. Creating an R Markdown file is very useful because it makes your analysis reproducible and easy for others to check. Note that chunks have options with which you can customize the output. For more information, see: https://raw.githubusercontent.com/rstudio/cheatsheets/master/rmarkdown-2.0.pdf

2 Importing the data

We will first download a CSV file generated in Excel. Your task is to import this data into R. Note that if you would like to import data from SPSS, you can do this via library(foreign); dat <- read.spss('file.sav', to.data.frame = TRUE).

download.file("http://www.let.rug.nl/wieling/Statistics/Intro-R/lab/mtcars.csv",
    "mtcars.csv")

# now import the data yourself into an R data frame with the name dat
# using the function: read.csv2()
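read.csv2() differs from read.csv() only in its defaults: it expects a semicolon as the field separator and a comma as the decimal mark, the format Excel commonly writes in locales that use a decimal comma (such as Dutch). Below is a minimal sketch on a small inline example. The `text` argument stands in for a file, and the column names and values are hypothetical, not the contents of mtcars.csv:

```r
# Hypothetical semicolon-separated data with decimal commas
csv_text <- "car;mpg;wt
Car A;21,0;2,62
Car B;22,8;2,32"

# read.csv2() defaults to sep = ";" and dec = ","
toy <- read.csv2(text = csv_text)
str(toy)  # mpg and wt are read as numeric columns
```

If the whole file ends up in a single column, the separator probably does not match. In that case, read.csv() (comma-separated) is the variant to try.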

3 Exploring the structure of the data

Note that this dataset is similar to the mtcars dataset that comes with R, so a description of the columns can be obtained with ?mtcars. There is one additional column, region, which contains the region the car maker originates from. In the following, you will look at the structure of the data using various functions.

# Look at the structure of the data using the functions: str, summary and
# head
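To get a feel for what these three functions report before you apply them to dat, here is a sketch on a small hypothetical data frame. The column names and values are made up for illustration:

```r
# Hypothetical toy data frame standing in for dat
toy <- data.frame(
  model  = c("A", "B", "C", "D"),
  cyl    = c(4, 6, 8, 4),
  wt     = c(2.2, 2.9, 3.6, 1.9),
  region = c("Europe", "Asia", "USA", "Europe")
)

str(toy)      # number of rows, and the type and first values of each column
summary(toy)  # min, quartiles, mean and max for each numeric column
head(toy, 2)  # first 2 rows (the default is 6)
```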

4 Modifying the data

In this section, you will add two columns to the data.

# Add a column relHP to the data, which should contain the hp of the car
# divided by its weight (column wt)

# Next, add a column named sportscar to the data, which is TRUE when
# relHP > 42 and FALSE otherwise

# Look at the data using head
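Arithmetic on columns is vectorized in R, so a new column can be computed for all rows at once. A comparison such as `> 42` directly yields TRUE/FALSE values. Here is a minimal sketch on a hypothetical four-row data frame, with hp and wt values chosen only for illustration:

```r
# Hypothetical toy data with hp and wt columns
toy <- data.frame(hp = c(110, 93, 245, 66),
                  wt = c(2.620, 2.320, 3.570, 2.200))

# New column, computed element-wise for every row
toy$relHP <- toy$hp / toy$wt

# Logical column: TRUE where the condition holds, FALSE otherwise
toy$sportscar <- toy$relHP > 42

head(toy)
```

Because the comparison already returns a logical vector, ifelse() is not needed here.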

5 Investigating the data

In this section, we will look at the variables in more detail. Specifically, we will look at measures of spread and central tendency, and frequency tables for individual variables. Furthermore, we will investigate the relationship between pairs of variables.

# How many sportscars are there (according to our definition)? Hint: use
# table()
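table() counts how often each distinct value occurs. For a logical vector, sum() gives the number of TRUE values because TRUE is treated as 1. The same two ideas also answer counting questions such as how many cars have a given number of cylinders. A sketch with hypothetical vectors:

```r
# Hypothetical logical indicator
is_fast <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
table(is_fast)  # frequencies of FALSE and TRUE
sum(is_fast)    # number of TRUE values

# Hypothetical cylinder counts
cyl <- c(4, 6, 6, 8, 4)
table(cyl)      # frequency of each number of cylinders
sum(cyl == 6)   # number of elements equal to 6
```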

# What is the mean weight of the cars?

# What is the standard deviation of the weight of the cars?
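Both measures are single function calls. Here is a sketch on a hypothetical weight vector. Note that sd() computes the sample standard deviation, which divides by n - 1 rather than n:

```r
wt <- c(2.0, 2.5, 3.0, 3.5, 4.0)  # hypothetical weights

mean(wt)  # 3
sd(wt)    # about 0.79, i.e. sqrt(2.5 / 4)
```

If a column contains missing values (NA), both functions return NA unless na.rm = TRUE is passed.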

# How many cars have 6 cylinders?

# What is the correlation between weight and horsepower?
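By default, cor() computes the Pearson correlation coefficient. method = "spearman" gives a rank-based alternative that is less sensitive to outliers. Here is a sketch with hypothetical values (both vectors must have the same length):

```r
wt <- c(2.0, 2.5, 3.0, 3.5, 4.0)  # hypothetical weights
hp <- c(90, 110, 150, 160, 200)   # hypothetical horsepower values

cor(wt, hp)                       # Pearson correlation (the default)
cor(wt, hp, method = "spearman")  # 1 here, since both vectors have the same rank order
```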

# How are being a sportscar and the number of gears related?
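You can summarize the relationship between two categorical variables in a contingency table by passing both vectors to table(). prop.table() with margin = 1 then converts the counts to proportions within each row. Here is a sketch with a hypothetical indicator and hypothetical gear counts:

```r
sports <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)  # hypothetical indicator
gear   <- c(5, 3, 5, 4, 3, 4)                       # hypothetical gear counts

tab <- table(sports, gear)   # rows: sports (FALSE/TRUE), columns: gear
tab
prop.table(tab, margin = 1)  # proportions within each row; each row sums to 1
```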

6 Visualizing the data

In this section, we will look at the variables in more detail through visualization.

# Create a boxplot of the weight of the sportscars

# Create a boxplot of the weight, separately for each number of cylinders.
# Hint: boxplot() can also be used with the formula interface: wt ~ cyl,
# data = dat
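In the formula wt ~ cyl, the variable to the left of the tilde is plotted and the variable on the right defines the groups, so there is one box per distinct value. boxplot() also invisibly returns the statistics it drew. Here is a sketch on a hypothetical data frame:

```r
# Hypothetical toy data
toy <- data.frame(wt  = c(2.1, 2.4, 2.9, 3.1, 3.4, 3.8, 4.1),
                  cyl = c(4, 4, 6, 6, 8, 8, 8))

# One box per value of cyl
b <- boxplot(wt ~ cyl, data = toy, xlab = "Cylinders", ylab = "Weight")
b$n      # number of observations in each group
b$names  # group labels below the boxes
```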

# Show the histogram for relHP

# Show the histograms for wt and hp next to each other. Set the color of
# the bars to 'red' for wt and 'blue' for hp. Hint: use par() to place the
# graphs beside each other and use ?hist to find out which parameter sets
# the color
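par(mfrow = c(1, 2)) splits the plotting area into one row with two panels. The col argument of hist() sets the fill color of the bars. Here is a sketch with simulated (hypothetical) data. Saving and restoring the old par() settings keeps later plots in a single panel:

```r
set.seed(1)
x <- rnorm(100, mean = 3, sd = 1)     # hypothetical data
y <- rnorm(100, mean = 150, sd = 50)  # hypothetical data

old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns of plots; returns the old setting
h1 <- hist(x, col = "red", main = "x")
h2 <- hist(y, col = "blue", main = "y")
par(old_par)                     # restore the previous layout
```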

# Show the Q-Q plot of qsec (the time needed to drive 1/4 mile)
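qqnorm() plots the sample values against the quantiles expected under a normal distribution. qqline() adds a reference line through the first and third quartiles. Points far from the line indicate departures from normality, such as an outlier. Here is a sketch on simulated data with one extreme value added by hand:

```r
set.seed(2)
# Hypothetical times: 30 simulated values plus one extreme value
q <- c(rnorm(30, mean = 18, sd = 1.5), 23)

qq <- qqnorm(q, main = "Normal Q-Q plot")  # sample vs. theoretical normal quantiles
qqline(q)                                  # line through the 1st and 3rd quartiles
```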

# Create a new data frame named 'tmp' excluding the outlier in qsec
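You can drop rows with negative indexing or select them with subset(). Here is a sketch on a hypothetical data frame. It assumes the outlier is the observation with the largest qsec value, so check this against your own Q-Q plot rather than taking it for granted:

```r
# Hypothetical data frame with one unusually large qsec value
toy <- data.frame(car  = c("A", "B", "C", "D"),
                  qsec = c(17.0, 18.6, 22.9, 19.4))

# Drop the row holding the maximum qsec (assumed here to be the outlier)
no_outlier <- toy[-which.max(toy$qsec), ]

# Equivalent selection with subset(), using a cut-off chosen for this toy data
no_outlier2 <- subset(toy, qsec < 22)

nrow(no_outlier)  # 3
```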

# Create a barplot contrasting automatic vs. manual transmission (column
# 'am'). Give the plot the title 'Transmission' and label the bars 'A' and
# 'M'
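barplot() expects the heights of the bars, so raw 0/1 codes first need to be turned into counts with table(). The main argument sets the title, and names.arg sets the labels below the bars. Here is a sketch with hypothetical transmission codes and placeholder labels (in mtcars, am is 0 for automatic and 1 for manual):

```r
am <- c(0, 1, 1, 0, 0, 1, 0)  # hypothetical codes: 0 = automatic, 1 = manual
counts <- table(am)           # barplot() needs the counts, not the raw values

# barplot() returns the x positions of the bar midpoints
mids <- barplot(counts, main = "Example title",
                names.arg = c("Auto", "Manual"))
```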

# Create a segmented barplot showing the relationship between being a
# sportscar and the type of transmission
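When barplot() receives a two-way table, it draws one bar per column and stacks the row categories within each bar. beside = TRUE places them next to each other instead. Here is a sketch with hypothetical vectors:

```r
sports <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)  # hypothetical indicator
am     <- c(1, 0, 1, 0, 1, 0, 0)                           # hypothetical transmission codes

tab <- table(sports, am)  # rows: sports (FALSE/TRUE), columns: am (0/1)

# One stacked bar per transmission type; legend.text = TRUE uses the row names
mids <- barplot(tab, legend.text = TRUE,
                names.arg = c("automatic", "manual"), xlab = "Transmission")
```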

7 Answers

The answers to the questions in this file can be viewed here: https://www.let.rug.nl/wieling/Statistics/Intro-R/lab/answers. The associated R markdown file can be downloaded here: https://www.let.rug.nl/wieling/Statistics/Intro-R/lab/answers/answers.Rmd

8 Replication

From within RStudio, you can simply download this file using the commands:

# download the original file only if it does not exist yet (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/Statistics/Intro-R/lab/lab.Rmd",
        "lab.Rmd")
}

Subsequently, open it in the editor and use the Knit HTML button to generate the HTML file.

If you use plain R, you first have to install Pandoc. Then paste the following lines into the most recent version of R.

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown")
}
library(rmarkdown)  # load rmarkdown package

# download the original file only if it does not exist yet (to prevent overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/Statistics/Intro-R/lab/lab.Rmd",
        "lab.Rmd")
}

# generate output
render("lab.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab.html"), sep = ""))  # shows result