Exercise 7
----------
   
Linear regression

Jim Baumann and Leah Jones of the School of Education, Purdue University,
studied methods of teaching reading. The students who took part sat two
tests before the lessons and three tests after. For each student, the
variable "before" is the average of the tests taken before the lessons
and the variable "after" is the average of the tests taken after the
lessons. The results are given in the table below.
(Source: research done by Jim Baumann and Leah Jones from the School of
Education of Purdue University.)

case        before   after

 1           3.50    16.67
 2           5.50    18.33
 3           6.50    17.00
 4           9.00    19.67
 5          10.50    21.67
 6          14.00    20.67
 7          11.00    20.67
 8           9.50    14.00
 9           7.50    16.00
10           8.00    17.67
11          10.00    19.33
12           5.50    17.67
13           8.50    16.33
14           7.00    20.00
15           7.00    15.33
16          10.00    22.67
17           6.50    15.33
18           8.50    15.00
19           7.00    15.00
20           6.00    15.00
21           5.00    21.00
22           7.50    15.67

The data must be entered by hand. Define the data columns and choose 
suitable variable names.
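
As an illustration only, here is one way the two columns could be typed in,
assuming Python is used. The names `before`, `after` and `case` are
hypothetical choices; any statistics package with a data editor works just
as well.

```python
# Hypothetical column names; position i in each list belongs to case i + 1.
before = [3.50, 5.50, 6.50, 9.00, 10.50, 14.00, 11.00, 9.50, 7.50, 8.00, 10.00,
          5.50, 8.50, 7.00, 7.00, 10.00, 6.50, 8.50, 7.00, 6.00, 5.00, 7.50]
after = [16.67, 18.33, 17.00, 19.67, 21.67, 20.67, 20.67, 14.00, 16.00, 17.67, 19.33,
         17.67, 16.33, 20.00, 15.33, 22.67, 15.33, 15.00, 15.00, 15.00, 21.00, 15.67]

# Case numbers 1..22, in the same order as the table.
case = list(range(1, len(before) + 1))

# Simple checks that help to catch typing errors.
assert len(before) == len(after) == 22
print("range of before:", min(before), "to", max(before))
print("range of after: ", min(after), "to", max(after))
```

Comparing the printed ranges with the table is a quick way to spot a
misplaced decimal point.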

a. Draw a scatterplot of "after" (vertical axis) against "before"
   (horizontal axis), with the least-squares line. Does the relationship
   look linear? Is it reasonable to fit a least-squares line?
   Examine the residuals (the differences between the observed values
   and the values predicted by the least-squares line). Draw two
   scatterplots: residual against case number and residual against
   before-score. The mean of the residuals always equals 0. Draw the
   line residual = 0 in each of the two scatterplots. Do you see any
   suspicious patterns or unusual observations?
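
The claim that the residuals average to 0 can be checked numerically. The
following is a minimal sketch, assuming Python and the hypothetical names
`before` and `after`; it fits the line with the usual least-squares formulas
and lists the cases with the largest residuals. The plots themselves still
have to be drawn with a plotting tool of your choice.

```python
before = [3.50, 5.50, 6.50, 9.00, 10.50, 14.00, 11.00, 9.50, 7.50, 8.00, 10.00,
          5.50, 8.50, 7.00, 7.00, 10.00, 6.50, 8.50, 7.00, 6.00, 5.00, 7.50]
after = [16.67, 18.33, 17.00, 19.67, 21.67, 20.67, 20.67, 14.00, 16.00, 17.67, 19.33,
         17.67, 16.33, 20.00, 15.33, 22.67, 15.33, 15.00, 15.00, 15.00, 21.00, 15.67]

n = len(before)
mean_x = sum(before) / n
mean_y = sum(after) / n

# Least-squares slope and intercept.
s_xx = sum((x - mean_x) ** 2 for x in before)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(before, after))
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Residual = observed value minus value predicted by the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(before, after)]
mean_residual = sum(residuals) / n
print(f"mean residual: {mean_residual:.2e}")  # zero up to rounding error

# The three cases that lie furthest from the line.
order = sorted(range(n), key=lambda i: abs(residuals[i]), reverse=True)
for i in order[:3]:
    print(f"case {i + 1}: before = {before[i]}, residual = {residuals[i]:+.2f}")
```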

b. Determine b1 and b0 and give the equation of the least-squares line.
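
For reference, the standard least-squares formulas are given below, written
with x for the "before" score and y for the "after" score. This notation is
an assumption; the exercise itself only names b1 and b0.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
\qquad
\hat{y} = b_0 + b_1 x .
```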

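c. We will investigate the residuals further. Draw a normal quantile
   plot of the residuals. Do the points lie close to a straight line?
   Is it reasonable to assume that the residuals are normally
   distributed? Give s, the standard error of the regression (the
   estimated standard deviation of the residuals).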

d. Determine s_b1, the standard error for b1, and determine s_b0, the
   standard error for b0.
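
The standard errors follow from the residuals. The sketch below assumes
Python, the hypothetical names `before` and `after`, and the usual
simple-regression formulas with n - 2 degrees of freedom:
s = sqrt(SSE / (n - 2)), s_b1 = s / sqrt(Sxx) and
s_b0 = s * sqrt(1/n + xbar^2 / Sxx).

```python
import math

before = [3.50, 5.50, 6.50, 9.00, 10.50, 14.00, 11.00, 9.50, 7.50, 8.00, 10.00,
          5.50, 8.50, 7.00, 7.00, 10.00, 6.50, 8.50, 7.00, 6.00, 5.00, 7.50]
after = [16.67, 18.33, 17.00, 19.67, 21.67, 20.67, 20.67, 14.00, 16.00, 17.67, 19.33,
         17.67, 16.33, 20.00, 15.33, 22.67, 15.33, 15.00, 15.00, 15.00, 21.00, 15.67]

n = len(before)
mean_x = sum(before) / n
mean_y = sum(after) / n
s_xx = sum((x - mean_x) ** 2 for x in before)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(before, after))
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Sum of squared residuals, then s with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(before, after))
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and of the intercept.
s_b1 = s / math.sqrt(s_xx)
s_b0 = s * math.sqrt(1 / n + mean_x ** 2 / s_xx)

print(f"s = {s:.3f}, s_b1 = {s_b1:.3f}, s_b0 = {s_b0:.3f}")
```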

e. We want to check whether the students who scored relatively high
   before the lessons also score relatively high after the lessons.
   Give a 95% confidence interval for beta1. Formulate H_0 and H_a and
   test whether the scores after the lessons are positively
   associated with the scores before the lessons.
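
A possible way to carry out the computation is sketched below, assuming
Python and the hypothetical names `before` and `after`. The critical values
2.086 (two-sided 95%) and 1.725 (one-sided 5%) are read from a t table for
n - 2 = 20 degrees of freedom. The standard library has no t distribution,
so they are entered as constants.

```python
import math

before = [3.50, 5.50, 6.50, 9.00, 10.50, 14.00, 11.00, 9.50, 7.50, 8.00, 10.00,
          5.50, 8.50, 7.00, 7.00, 10.00, 6.50, 8.50, 7.00, 6.00, 5.00, 7.50]
after = [16.67, 18.33, 17.00, 19.67, 21.67, 20.67, 20.67, 14.00, 16.00, 17.67, 19.33,
         17.67, 16.33, 20.00, 15.33, 22.67, 15.33, 15.00, 15.00, 15.00, 21.00, 15.67]

# Table values for the t distribution with 20 degrees of freedom.
T_TWO_SIDED_95 = 2.086   # used for the 95% confidence interval
T_ONE_SIDED_05 = 1.725   # used for a one-sided test at the 5% level

n = len(before)
assert n - 2 == 20  # the table values above are only valid for 20 df

mean_x = sum(before) / n
mean_y = sum(after) / n
s_xx = sum((x - mean_x) ** 2 for x in before)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(before, after))
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(before, after))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(s_xx)

# 95% confidence interval for beta1: b1 +/- t* times s_b1.
lower = b1 - T_TWO_SIDED_95 * s_b1
upper = b1 + T_TWO_SIDED_95 * s_b1

# Test statistic for H_0: beta1 = 0.
t_stat = b1 / s_b1

print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
print(f"t = {t_stat:.3f}, one-sided 5% critical value = {T_ONE_SIDED_05}")
```

The interval is two-sided while a test for a positive slope is one-sided, so
the two need not lead to the same conclusion at the same nominal level.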

f. The constant beta0 represents the average score after the lessons
   for students with a score equal to 0 before the lessons.
   Give a 95% confidence interval for beta0. Formulate H_0 and H_a
   and test whether beta0 is positive.