Student Project: Understanding Numerical Graphics
This describes a project suitable for three students. Please let me
(nerbonne@let.rug.nl) know by Nov. 1 if you intend to undertake the
project. Students who undertake the project are exempted from the
final exam.
The assignment is focused on graphics depicting numerical relations,
but it could be acceptable to contrast alternative graphical
communication in another area. If you wish to do this, send a
proposal for approval to the course instructor, John Nerbonne, at
nerbonne@let.rug.nl.
Background: Understanding Graphics Depicting Numerical Relations
Tufte (The Visual Display of Quantitative Information, p.81) is
scathing about graphics that display relations in forms fit for single
variables. He describes these as "convoluted specimens" of "elliptical
and eccentric communication". But more than fifteen years later the
same sorts of graphical devices were used to describe what went wrong
in the American presidential elections in Palm Beach, where the form
of the ballot apparently led to significant miscasting of votes.
The Issue being Communicated
A voting ballot with an unusual form was used in Palm Beach, where a
very conservative candidate, Pat Buchanan, received an unusual number
of votes in the 2000 US Presidential Election. See (the beginning of)
Matt Ruben's informative web site on the problem. Since we'll focus on the
choice of graphic design used to illustrate the problem, it is not
necessary (and probably not possible even for me) to try to understand
all the various statistical analyses that have been published about
this question. Briefly, analyses have tried to show that Buchanan
received many more votes than one would expect based on the number
of votes which candidates of similar political leanings received.
The Graphics
We take this to be a case where two variables (quantities) are
compared, the number of votes for Buchanan on the one hand, and the
number of votes, e.g., for Bush on the other. It is a standard case
where a scatterplot, perhaps with a regression line, is normally used,
just as Tufte (above) insists. See Michael Shamos's use of a
scatterplot by Greg Adams on the last page of this PowerPoint
presentation on the Palm Beach predicament.
But not everyone has illustrated the issue using a scatterplot. For
example, Til Rosenband at MIT expresses the ratio between the two
votes as a single variable, and plots that in a bar chart.
The Communicative Effect
The task in this final project will be, not to determine how likely
it is that voters miscast their votes because they misread the ballot,
but rather to determine how well people (students) understand the
argument that Buchanan probably received too many votes on the basis
of the different numerical graphs.
Design
The idea is to present each of two similar groups of students with
one of two graphs, for example a single bar chart or a
scatterplot showing the variable values (vote counts or percentages).
The students should get a chance to examine the graphics, after which
they will be asked questions about the content. The purpose of the
questions will be to check on how well the graphics convey the
information that they are supposed to convey.
Material
Since you wish to see what information is available
from the graphics, you should not use the Palm Beach
issue directly. If you used the Palm Beach issue, you might just elicit
information about what people think about Palm Beach. A first task is
then to make up an issue involving two quantities (perhaps
student-staff ratio in different departments, or the thesis grade and
the number of months taken to complete it). The graphs reporting on
the issue should be significantly different, e.g., one a scatterplot
(such as Adams used) and the other either a representation of a single
ratio between the two variables (such as Rosenband's) or perhaps a
representation of the two variables in bar charts next to each other.
It is essential that the graphics be as comparable as possible except
for this design point. A second task is thus to redraw them so that
they match in everything else. Third, the graphics should be accompanied by
a small amount of prose (a caption), explaining the message the
graphic is meant to support. Ideally, the caption would be the same
for both forms of the graphic, but certainly as close as possible.
The captions and graphs should definitely all be in the same language,
and I would suggest that it be done in Dutch so that fluency in
English is not a confounding factor.
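One way to keep the two graphics comparable is to derive both from a single data set, so that only the encoding differs. The sketch below illustrates this under stated assumptions: the department names and counts are invented for illustration, and the variable names are hypothetical.

```python
# Hypothetical departments with invented (students, staff) counts.
departments = {
    "Dept A": (120, 10),
    "Dept B": (300, 20),
    "Dept C": (90, 15),
}

# Scatterplot version: one (staff, students) point per department,
# so both variables remain visible.
scatter_points = [(staff, students) for students, staff in departments.values()]

# Bar chart version: the two variables collapsed into a single
# student-staff ratio per department.
ratio_bars = {
    name: students / staff for name, (students, staff) in departments.items()
}
```

Either structure can then be handed to whatever drawing tool you use; the point is only that both graphics report exactly the same underlying numbers.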
Test
The test should focus on whether the subjects have
understood the issue as the graphic presents it. An important task
is therefore to design the test questions. Aim for ten questions, and
include some questions that are very easy (as a check that people are
filling in the test seriously). For example, if the test were on the
Palm Beach issue, suitable questions might be the following:
- For what counties are voting results reported in the graph?
Answer: Florida counties.
- For which candidates are the voting results reported?
Answer: Bush and Buchanan.
- Is it the opinion of the author-designer of the graphics that Buchanan
received too many votes in Palm Beach county?
Answer: Yes, too many.
- How many votes might have been miscast according to the
author-designer of the graphics?
Answer: About 3,000.
- What does the author-designer of the graphics identify as the cause
of the unusual number of votes?
Answer: A confusing ballot.
Naturally, your test questionnaire will not include the answers. You
might prepare the graphics either on the web or on paper, but it's
probably a good idea to have the questions on paper so that you'll
have a record.
"Pilot runs"
It is a good idea to have a couple of people try your test
before gathering data. This way you won't do a great deal of
work and then discover, only in the analysis, that subjects
misunderstood your questions, or that there was an error in the graph.
Select people outside your group to try out the graphics and the test.
Collecting Data
Aim to have 40 subjects in total read the graphics and answer the
questions, i.e., 20 for the one sort, and 20 for the other. Since you
won't have a great deal of time to complete the study, I'd suggest
that you approach students in the cafeteria. We have no reason to
suspect that this group differs from other students in its
sophistication in graphical understanding. Flip a coin to decide
which graphic they get so that they are assigned randomly to the one
or other group. Once you have 20 students in one group, just let all
the other subjects take the test for which you need more data. Do not
explain the test in great detail, only that it's a test of effective
communication, that you'll ask them to answer some questions about a
graphical presentation, and that you'll measure how long it takes them
to answer. You may refer them to this web site if they want further
information.
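The assignment rule above (a coin flip until one group is full, then filling the other) can be sketched as follows. This is only an illustration: the function name `assign_condition`, the condition labels, and the use of Python's `random` module in place of a physical coin are assumptions, not part of the assignment.

```python
import random

GROUP_SIZE = 20  # target number of subjects per graphic, as suggested above


def assign_condition(counts, rng=random):
    """Pick the graphic for the next subject and update the running counts.

    counts maps each (hypothetical) condition label to the number of
    subjects assigned to it so far.
    """
    open_conditions = [c for c, n in counts.items() if n < GROUP_SIZE]
    if not open_conditions:
        raise ValueError("both groups are already full")
    # With both groups open this is a fair coin flip; once one group is
    # full, the only remaining choice is the group that still needs data.
    choice = rng.choice(open_conditions)
    counts[choice] += 1
    return choice


counts = {"scatterplot": 0, "bar chart": 0}
rng = random.Random(2002)  # fixed seed only so that the sketch is repeatable
assignments = [assign_condition(counts, rng) for _ in range(2 * GROUP_SIZE)]
```

In practice a real coin works just as well; the sketch only makes the stopping rule explicit.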
Control the conditions under which people take the test. It is
best to have them work in conditions where distractions are not great
--- perhaps a quiet time of day. Let people look back at the graphic
in order to answer the questions. It is essential that people
work under roughly the same conditions. If the test is
interrupted, let it continue and record the data, but note that this
has happened. It is best to collect enough data so that you don't
need to include this questionable data in the analysis.
Let them take as long as they want (once they have been told that
the time is also interesting). Measure and record the time with
the test material. Thank the subjects when they're done. You may
explain the experiment in more detail once they're done, if they
ask for this.
Analysis
Since you'll be dealing with two non-overlapping groups of subjects,
you can analyse whether there is a difference in their average
accuracy or speed by performing a t-test for independent
samples on the two groups. This can be done easily in SPSS, and
instructions for doing so are available on the web site
for the course in Toetsende Statistiek. This will be easiest
for someone who has already taken that course. For someone who's
already taken the course, it should also be straightforward to check
on whether there's a relation between speed and accuracy by
performing a linear regression analysis.
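For readers who want to see what the two analyses compute, here is a minimal sketch in plain Python. It assumes the equal-variance (pooled) form of the independent-samples t-test; the function names `pooled_t` and `least_squares` and the example scores are invented for illustration and are not results of any experiment.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's t statistic for two independent samples.

    Assumes equal variances in the two groups. Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up accuracy scores (correct answers out of 10), for illustration only.
scatter_scores = [8, 9, 7, 10, 8]
bar_scores = [6, 7, 5, 8, 6]
t, df = pooled_t(scatter_scores, bar_scores)
```

The t value is then compared with the critical value for the given degrees of freedom (roughly 2.02 for 38 degrees of freedom at the two-sided 5% level), which is the comparison SPSS reports as a p-value. For the speed and accuracy question, `least_squares(times, scores)` gives the slope of accuracy on answering time.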
Report
In your (group) report, you should explain the issue, including
Tufte's ideas on this and perhaps the ideas of others you might find
through a literature search. Describe why your experiment
should have bearing on the issue, and in particular what you hope to
prove or disprove.
Describe the graphics you use and the test, and include all of the
material in the report. Describe the experiments, including any
unexpected developments that might have influenced the outcome.
Report on the statistical analysis, preferably in the standard way
(known to those who've already taken the statistics course). Finally,
discuss your results, and include suggestions for further experiments
in a similar vein.
Schedule
Nov. 4 | Groups formed. Commitment to project instead of exam. Two alternative graphical representations made.
Nov. 11 | Test materials developed. "Pilot" test run completed.
Nov. 18 | 50% of subjects completed.
Nov. 29 | Reports due (5 pm).
Turn in your report by Nov. 29, 2002 at 5 pm.
John Nerbonne
Last modified: Oct. 14, 2002