Logistic Regression

Simple logistic regression. In 1972 William Labov, the founder of sociolinguistics, investigated the pronunciation of /r/ at the ends of words in different social classes in New York. The /r/ could be pronounced as a real [r], as in the American Midwest, but it could also be pronounced as a "schwa" [@], or even omitted completely, much as it is in Boston or in the standard speech of the UK. Labov noticed that New York speech was changing, and he asked whether the changes had anything to do with social class.

He found a clever way to gather data quickly: he went into three different department stores, one very expensive (Saks), one accessible to middle class incomes (Macy's), and one very inexpensive (S. Klein). In each of these Labov identified an article on sale on the fourth floor, and then asked one of the salespeople where he could find it. The salesperson answered "on the fourth floor," or something like that. Labov sometimes elicited a second, emphatic pronunciation by feigning not to have heard: "Pardon me?" "FOURTH FLOOR". In this way pronunciations of /r/ could be collected quite quickly.

We provide the data Labov collected in the table below. For each department store we note how many people pronounced the /r/ completely consonantally, how many completely vocalically (including no realization at all), and how many mixed. We then wish to analyze whether there is a difference in the frequency distribution of /r/ variants which is due to social class (source: William Labov, Sociolinguistic Patterns, 1972. University of Pennsylvania Press, Philadelphia).

            conson.   mixed   vocalic
Saks           30       32        6
Macy's         20       31       74
S. Klein        4       17       50

Enter this data into SPSS by hand. Define the variables, and recall the technique of weighting cases by frequency introduced during the chi-square lab.

a. In order to facilitate the analysis of the pronunciation differences among the social classes, we will restrict our attention to the first and last columns of the table, that is, to the people who pronounced the /r/ either purely consonantally or purely as a vowel. To do this you need to filter the data in the SPSS worksheet. In this way we obtain a dichotomous (two-valued) variable indicating whether the /r/ was or was not pronounced consonantally.

b. Create a suitable graph to illustrate the distribution of the two variants of /r/ in the three social groups, and include this in your report.

c. Formulate a null hypothesis and an alternative hypothesis for the logistic regression, identifying the independent and dependent variables. You then need to choose how to code the variable representing social class before you execute the procedure for logistic regression.

d. What is your conclusion with respect to the dependence of /r/'s pronunciation on social class?

e. Estimate the amount of explained variance.

f. Examine the estimated values of the odds ratios characterizing the difference between S. Klein and Saks and the difference between S. Klein and Macy's. What do the estimates show?

For a detailed explanation of logistic regression, and especially for the explanation of a special package for the application of logistic regression in the study of linguistic variation, VARBRULE, see John Paolillo. 2002. Analyzing Linguistic Variation: Statistical Models and Methods. Stanford: CSLI. I used this in creating this exercise. I kept the exercise in SPSS because the other course materials are in SPSS. VARBRULE was originally developed by David Sankoff in the 1970s.
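
If you want to cross-check your SPSS output for parts d through f, the quantities involved can be sketched in a few lines of Python. This is only a rough sketch under several assumptions that are not part of the exercise itself: the outcome is coded 1 for a consonantal /r/, S. Klein is taken as the reference category for the store variable, and all names (counts, REFERENCE, and so on) are made up for illustration. Because the model has a single categorical predictor and no empty cells, its fitted probabilities are simply the observed proportions per store, so no iterative fitting is needed here. The Cox & Snell and Nagelkerke measures are used as one possible reading of "explained variance".

```python
import math

# Counts from the table, restricted to the first and last columns (part a).
counts = {
    "Saks":     {"conson": 30, "vocalic": 6},
    "Macy's":   {"conson": 20, "vocalic": 74},
    "S. Klein": {"conson": 4,  "vocalic": 50},
}
REFERENCE = "S. Klein"  # assumed reference category for the store variable


def odds(store):
    """Odds of a consonantal /r/ (versus a vocalic one) in one store."""
    c = counts[store]
    return c["conson"] / c["vocalic"]


def log_likelihood(p_by_store):
    """Binomial log-likelihood of the counts, given P(consonantal) per store."""
    ll = 0.0
    for store, c in counts.items():
        p = p_by_store[store]
        ll += c["conson"] * math.log(p) + c["vocalic"] * math.log(1 - p)
    return ll


n = sum(c["conson"] + c["vocalic"] for c in counts.values())
n_conson = sum(c["conson"] for c in counts.values())

# Null model: the same probability in every store.
p_null = {s: n_conson / n for s in counts}
# Model with store as predictor: one probability per store (observed proportion).
p_full = {s: c["conson"] / (c["conson"] + c["vocalic"]) for s, c in counts.items()}

ll_null = log_likelihood(p_null)
ll_full = log_likelihood(p_full)

# Part d: likelihood-ratio chi-square, 2 degrees of freedom (3 stores - 1).
lr_chi2 = 2 * (ll_full - ll_null)
# For exactly 2 df the chi-square upper tail probability is exp(-x / 2).
p_value = math.exp(-lr_chi2 / 2)

# Part e: pseudo R-squared measures.
cox_snell = 1 - math.exp(2 * (ll_null - ll_full) / n)
nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))

# Part f: odds ratios relative to the reference store.
odds_ratios = {s: odds(s) / odds(REFERENCE) for s in counts if s != REFERENCE}

print(f"LR chi-square (2 df): {lr_chi2:.2f}, p = {p_value:.3g}")
print(f"Cox & Snell R2: {cox_snell:.3f}, Nagelkerke R2: {nagelkerke:.3f}")
for store, ratio in odds_ratios.items():
    print(f"Odds ratio {store} vs. {REFERENCE}: {ratio:.2f}")
```

With these coding choices the odds ratios should correspond to the Exp(B) column of the SPSS logistic regression output. If you pick a different reference category, or code the vocalic variant as 1, the values will be reciprocals or otherwise rearranged, so compare with care.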

John Nerbonne
Last modified: Sat May 7 15:14:30 CEST 2005