Fieldworker effect
Fitting
This page describes an attempt to compensate for the
fieldworker effect.
data: raw phonetic strings, no lexical variants
subset: fieldworkers: Lowman, McDavid
worksheets: Middle Atlantic, South Atlantic
1030 informants
grouping: fieldworker + worksheet + community: 461 locations
(two locations merged because of identical coordinates)
method: Levenshtein
maps: stochastic clustering, group average + weighted average,
followed by classical MDS on cophenetic differences
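As an illustration of the method named above, here is a minimal sketch of Levenshtein distance between two transcription strings. It assumes the plain unit-cost variant (insertion, deletion and substitution each cost 1) applied to raw characters. The page does not say which weighting or normalization was actually used, so the function name and the costs are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between two transcription strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

The difference between two locations would then be some aggregate of these distances over the items both locations share, for example their mean.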
The plot below shows the average phonetic difference as a function of
geographic distance. There are three sets of differences: differences
within data collected by Lowman, differences
within data collected by McDavid, and differences between data from
Lowman and data from McDavid.
The phonetic differences between Lowman data and
McDavid data are consistently higher than the differences within each set of
data. This strongly suggests that Lowman and McDavid transcribed
identical sounds in different ways.
The plot below shows the standard deviation of the phonetic differences as a function of
geographic distance. Again, there is a striking differentiation.
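The curves in these two plots could be computed roughly as follows. This is only a sketch: it assumes each pair of locations is given as a hypothetical tuple (geographic distance, phonetic difference, fieldworker A, fieldworker B), and that distances are grouped into fixed-width bins. The bin width and the data layout are not taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pair_type(fw_a, fw_b):
    """Label a pair as within one fieldworker's data or between the two."""
    return fw_a if fw_a == fw_b else "between"


def binned_stats(pairs, bin_width):
    """Return {(pair type, bin index): (mean, standard deviation)}."""
    buckets = defaultdict(list)
    for geo, phon, fw_a, fw_b in pairs:
        key = (pair_type(fw_a, fw_b), int(geo // bin_width))
        buckets[key].append(phon)
    # pstdev is the population standard deviation; it is 0.0 for a single value
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}
```

Plotting the first element of each result against the bin index gives one mean curve per set, and the second element gives the corresponding standard deviation curve.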
We use polynomial fitting to model both mean and standard deviation of dialect
differences as a
function of geographic distance. We do this for all dialect differences, as
well as for each of the three subsets. Then we use these models to
correct each subset towards the overall mean and standard deviation.
(See
graphs of fitting, PostScript.)
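One way to read the correction step is as a rescaling: express each difference in standard deviations from its own subset's fitted mean, then map it onto the overall fitted mean and standard deviation at the same geographic distance. The sketch below follows that reading. The polynomial degree, the function names and the exact correction formula are assumptions, since the page does not spell them out.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j), b[i] = sum y*x^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial (lowest power first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def correct(value, distance, subset_mean, subset_sd, overall_mean, overall_sd):
    """Move one difference from its subset's fitted curves to the overall ones."""
    z = (value - polyval(subset_mean, distance)) / polyval(subset_sd, distance)
    return polyval(overall_mean, distance) + z * polyval(overall_sd, distance)
```

Here the four coefficient lists would come from calling polyfit on the mean and standard deviation curves of the subset and of all differences together. Solving the normal equations directly is adequate for low degrees; for higher degrees a numerically more stable routine would be preferable.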
The plots below show the corrected data.
Below are dialect maps of uncorrected and corrected dialect differences. Grey
areas are for data from fieldworkers other than Lowman and McDavid,
and/or from worksheets other than Middle Atlantic and South Atlantic.
Fitting seems to work quite well in the north, but has no visible
effect in the south. Note that Pennsylvania does not gain new dialect
borders through the correction. Those borders are already present in the
uncorrected data; they just do not show on the big map of uncorrected
differences, because that map shows only the most distinctive clusters.