Logistic Regression

Simple logistic regression. 


In 1972 William Labov, the founder of sociolinguistics, investigated
the pronunciation of /r/ at the ends of words in different social
classes in New York.  The /r/ could be pronounced as a full consonantal
[r], as in the American Midwest, but it could also be pronounced as a
"schwa" [@], or even omitted completely, much as it is in Boston or in
the standard speech of the UK.  Labov noticed that New York speech was
changing, and he asked whether the changes had anything to do with
social class.

He found a clever way to gather data quickly by going into three
different department stores, one very expensive (Saks), one
accessible to middle class incomes (Macy's), and one very inexpensive
(S.Klein).  In each of these Labov identified an article on sale on
the fourth floor, and then asked one of the salespeople where he could
find it.  The salesperson answered "on the fourth floor," or
something like that.  Labov sometimes elicited a second, emphatic
pronunciation by pretending not to have heard ("Pardon me?"), which
prompted a repeated "FOURTH FLOOR".
In this way pronunciations of /r/ could be collected quite quickly.

We provide the data Labov collected in the table below.  For each
department store we note how many people pronounced the /r/ completely
consonantally, how many completely vocalically (including no
realization at all), and how many mixed.  We then wish to analyze
whether there is a difference in the frequency distribution of /r/
variants which is due to social class (source: William Labov,
Sociolinguistic Patterns, 1972. University of Pennsylvania Press,
Philadelphia).

          conson.   mixed   vocalic
   Saks      30       32        6
 Macy's      20       31       74
S.Klein       4       17       50

Enter these data into SPSS by hand.  Define the variables, and recall
the technique of weighting cases by frequency introduced during the
chi-square lab.
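
The idea behind weighting cases can also be sketched outside SPSS.  The
Python fragment below is only an illustration and not part of the SPSS
procedure: it stores one row per table cell together with its frequency,
so that each row stands for that many speakers.  The names CELLS and
weighted_total are made up for this sketch.

```python
# One row per cell of the table: (store, variant, frequency).
# Frequencies are copied from the table above; all names are illustrative.
CELLS = [
    ("Saks", "consonantal", 30), ("Saks", "mixed", 32), ("Saks", "vocalic", 6),
    ("Macy's", "consonantal", 20), ("Macy's", "mixed", 31), ("Macy's", "vocalic", 74),
    ("S.Klein", "consonantal", 4), ("S.Klein", "mixed", 17), ("S.Klein", "vocalic", 50),
]


def weighted_total(cells, store=None):
    """Sum the frequency weights, optionally for a single store."""
    return sum(freq for s, _variant, freq in cells if store is None or s == store)


# Each row counts as `freq` speakers, just as a weighted case does.
totals = {store: weighted_total(CELLS, store) for store in ("Saks", "Macy's", "S.Klein")}
print(totals)
print(weighted_total(CELLS))
```

Nine rows with a weight column carry the same information as one row
per speaker, which is why the data can be entered by hand.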

a. In order to facilitate the analysis of the pronunciation
differences among the social classes, we will restrict our attention
to the first and last columns of the table, to the people who
pronounced the /r/ either purely consonantally or purely as a vowel.
To do this one needs to filter the data in the SPSS worksheet.  In this
way we obtain a dichotomous (two-valued) variable indicating whether
the /r/ was or was not pronounced consonantally.
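
As a rough illustration of what the filter does (again in Python rather
than SPSS, with made-up names), the sketch below drops the "mixed" rows
and recodes the remaining variants as 1 for consonantal and 0 for
vocalic.  The 1/0 coding is an assumption of this sketch; any two
distinct codes would serve.

```python
# (store, variant, frequency), copied from the table above.
CELLS = [
    ("Saks", "consonantal", 30), ("Saks", "mixed", 32), ("Saks", "vocalic", 6),
    ("Macy's", "consonantal", 20), ("Macy's", "mixed", 31), ("Macy's", "vocalic", 74),
    ("S.Klein", "consonantal", 4), ("S.Klein", "mixed", 17), ("S.Klein", "vocalic", 50),
]

# Keep only the pure variants and recode: 1 = consonantal, 0 = vocalic.
binary_cells = [
    (store, 1 if variant == "consonantal" else 0, freq)
    for store, variant, freq in CELLS
    if variant != "mixed"
]

# Number of speakers that remain after filtering.
n_kept = sum(freq for _store, _r, freq in binary_cells)
print(binary_cells)
print(n_kept)
```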

b. Create a suitable graph to illustrate the distribution of the 
two variants of /r/ in the three social groups, and include this 
in your report.
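
Before drawing the graph in SPSS it can help to know what it should
show.  The following Python sketch (illustrative only; COUNTS and share
are names chosen here) computes the proportion of consonantal /r/ per
store from the first and last columns of the table and prints a crude
text bar for each.

```python
# (consonantal, vocalic) counts per store, from the table above.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S.Klein": (4, 50)}

share = {}
for store, (cons, voc) in COUNTS.items():
    # Proportion of consonantal /r/ among the pure variants.
    share[store] = cons / (cons + voc)
    bar = "#" * round(40 * share[store])
    print(f"{store:>8} {bar:<40} {100 * share[store]:5.1f}% consonantal")
```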

c. Formulate a null hypothesis and an alternative hypothesis for the
logistic regression, identifying the independent and dependent
variables.  You then need to choose how to code the variable
representing social class before you execute the procedure
for logistic regression.
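
One common choice is indicator (dummy) coding with one store as the
reference category.  The Python sketch below assumes S.Klein as the
reference and consonantal /r/ as the modelled outcome; both are
assumptions made for illustration, and the function names are
hypothetical.  Because the only predictor is categorical and each store
gets its own parameter, the maximum-likelihood coefficients equal
differences of observed log odds, so no iterative fitting is needed
here.

```python
import math

# (consonantal, vocalic) counts per store, from the table above.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S.Klein": (4, 50)}
REFERENCE = "S.Klein"  # assumed reference category


def indicator_code(store):
    """Two 0/1 indicators; the reference store is coded (0, 0)."""
    return (int(store == "Saks"), int(store == "Macy's"))


def log_odds(cons, voc):
    """Observed log odds of a consonantal /r/."""
    return math.log(cons / voc)


# Intercept: log odds in the reference store.
intercept = log_odds(*COUNTS[REFERENCE])
# Slopes: log odds of each other store minus the reference log odds.
b_saks = log_odds(*COUNTS["Saks"]) - intercept
b_macys = log_odds(*COUNTS["Macy's"]) - intercept


def predicted_probability(store):
    """Probability of consonantal /r/ implied by the coefficients."""
    d_saks, d_macys = indicator_code(store)
    z = intercept + b_saks * d_saks + b_macys * d_macys
    return 1 / (1 + math.exp(-z))


for store in COUNTS:
    print(store, indicator_code(store), round(predicted_probability(store), 3))
```

With a different reference category the coefficients change, but the
predicted probabilities stay the same.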

d. What is your conclusion with respect to the dependence of /r/'s
pronunciation on social class?
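
One way to check the conclusion is a likelihood-ratio test that compares
the model with store as predictor against a model with an intercept
only.  The Python sketch below is an illustration under that assumption,
using the first and last columns of the table; the names are made up
here.  The closed-form p-value is valid only because the test has 2
degrees of freedom.

```python
import math

# (consonantal, vocalic) counts per store, from the table above.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S.Klein": (4, 50)}


def log_likelihood(groups):
    """Binomial log-likelihood when each group is fitted by its own proportion."""
    total = 0.0
    for cons, voc in groups:
        n = cons + voc
        total += cons * math.log(cons / n) + voc * math.log(voc / n)
    return total


# Intercept-only model: one pooled proportion for all stores.
all_cons = sum(cons for cons, _voc in COUNTS.values())
all_voc = sum(voc for _cons, voc in COUNTS.values())
ll_null = log_likelihood([(all_cons, all_voc)])

# Model with store as predictor: one proportion per store.
ll_model = log_likelihood(COUNTS.values())

g2 = 2 * (ll_model - ll_null)  # likelihood-ratio chi-square
df = len(COUNTS) - 1           # two indicator variables
p_value = math.exp(-g2 / 2)    # chi-square upper tail, exact for df = 2 only

print(round(g2, 2), df, p_value)
```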

e. Estimate the amount of explained variance.
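
Logistic regression has no exact counterpart to R-squared, so pseudo
R-squared measures are used instead.  The Python sketch below computes
two of them, Cox and Snell's and Nagelkerke's, from the log-likelihoods
of the intercept-only model and the model with store as predictor.  It
is an illustration with made-up names, based on the first and last
columns of the table.

```python
import math

# (consonantal, vocalic) counts per store, from the table above.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S.Klein": (4, 50)}


def log_likelihood(groups):
    """Binomial log-likelihood when each group is fitted by its own proportion."""
    return sum(
        cons * math.log(cons / (cons + voc)) + voc * math.log(voc / (cons + voc))
        for cons, voc in groups
    )


all_cons = sum(cons for cons, _voc in COUNTS.values())
all_voc = sum(voc for _cons, voc in COUNTS.values())
n = all_cons + all_voc

ll_null = log_likelihood([(all_cons, all_voc)])  # intercept only
ll_model = log_likelihood(COUNTS.values())       # store as predictor

# Cox and Snell: 1 - (L_null / L_model) ** (2 / n), written with log-likelihoods.
cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
# Nagelkerke rescales Cox and Snell by its maximum attainable value.
nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))

print(round(cox_snell, 3), round(nagelkerke, 3))
```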

f. Examine the estimated values of the odds ratios characterizing the
difference between S.Klein and Saks and the difference between S.Klein
and Macy's.  What do the estimates show?
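
For a single categorical predictor the odds ratios can be checked by
hand from the table.  The Python sketch below assumes that S.Klein is
the reference category and that the odds are those of a consonantal
/r/; with the opposite coding the reciprocal values would appear.  The
function name is hypothetical, and the interval is the usual Wald
approximation on the log scale.

```python
import math

# (consonantal, vocalic) counts per store, from the table above.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S.Klein": (4, 50)}


def odds_ratio_with_ci(store, reference="S.Klein", z=1.96):
    """Odds ratio of consonantal /r/ versus the reference store, with a Wald interval."""
    a, b = COUNTS[store]
    c, d = COUNTS[reference]
    ratio = (a / b) / (c / d)
    # Standard error of the log odds ratio in a 2 x 2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


for store in ("Saks", "Macy's"):
    ratio, lower, upper = odds_ratio_with_ci(store)
    print(f"{store} vs S.Klein: OR = {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 indicates that the odds in that store
differ from those in the reference store at the 5% level.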

For a detailed explanation of logistic regression, and especially for
the explanation of a special package for the application of logistic
regression in the study of linguistic variation, VARBRUL, see John
Paolillo. 2002. Analyzing Linguistic Variation: Statistical Models and
Methods. Stanford: CSLI.  I used this book in creating this exercise.
I kept the exercise in SPSS because the other course materials are in
SPSS.  VARBRUL was originally developed by David Sankoff in the 1970s.
John Nerbonne
Last modified: Sat May 7 15:14:30 CEST 2005