Vol 26 4

A Journal for Teachers of English to Speakers of Other Languages
and of Standard English as a Second Dialect
Editor
SANDRA SILBERSTEIN, University of Washington
Associate Editor
SANDRA McKAY, San Francisco State University
Review Editor
HEIDI RIGGENBACH, University of Washington
Brief Reports and Summaries Editors
GAIL WEINSTEIN-SHR, San Francisco State University
Teaching Issues Editor
SANDRA McKAY, San Francisco State University
Assistant Editors
DEBORAH GREEN, University of Washington
MARILYN KUPETZ, TESOL Central Office
Editorial Assistants
CHERYL MOEN and MAUREEN P. PHILLIPS, University of Washington
Editorial Advisory Board
Roberta G. Abraham, Iowa State University
Joan G. Carson, Georgia State University
Graham Crookes, University of Hawaii at Manoa
Jim Cummins, Ontario Institute for Studies in Education
Catherine Doughty, The University of Sydney
Miriam Eisenstein, New York University
Yehia El-Ezabi, United Arab Emirates University/The American University in Cairo
Susan Gass, Michigan State University
Jean Handscombe, North York Board of Education, Toronto
Thomas Huckin, University of Utah
Sarah Hudelson, Arizona State University
Thom Hudson, University of Hawaii at Manoa
Claire Kramsch, University of California, Berkeley
Anne Lazaraton, The Pennsylvania State University
Michael K. Legutke, Goethe Institute, Munich
David Nunan, Macquarie University
Teresa Pica, University of Pennsylvania
N. S. Prabhu, National University of Singapore
Thomas Ricento, Central Michigan University
Patricia L. Rounds, University of Oregon
Andrew F. Siegel, University of Washington
Additional Readers
H. Douglas Brown, Patricia A. Dunkel, Fred Genesee, Ann M. Johns, Liz Hamp-Lyons, Sharon Hilles, Braj Kachru, Ruth
Larimer, Ilona Leki, Leo van Lier, Michael H. Long, Peter Lowenberg, Peter Master, Mary McGroarty, Joan Morley,
Alastair Pennycook, Patricia A. Porter, Charlene J. Sato, Thomas Scovel, Sheila M. Shannon, Margaret S. Steffensen,
Merrill Swain, James W. Tollefson, Jessica Wilkins, Lise Winer, Vivian Zamel, Jane Zuengler
Credits
Advertising arranged by Maria Minor, TESOL Central Office, Alexandria, Virginia
Typesetting by World Composition Services, Inc.
Printing and binding by Pantagraph Printing, Bloomington, Illinois
TESOL QUARTERLY
CONTENTS
ARTICLES
Statistics as a Foreign Language—Part 2: More Things to
Consider in Reading Statistical Language Studies 629
James Dean Brown
Demystifying the TOEFL® Reading Test 665
Bonny Norton Peirce
Planning, Discourse Marking, and the Comprehensibility of
International Teaching Assistants 693
Jessica Williams
Discourse Structure and the Perception of Incoherence in
International Teaching Assistants’ Spoken Discourse 713
Andrea Tyler
The Role of Conjunctions in L2 Text Comprehension 731
Esther Geva
REVIEWS
Publications on Grammar Teaching 749
Grammar and Second Language Teaching:
A Book of Readings
William Rutherford and Michael Sharwood Smith
Second Language Grammar: Learning and Teaching
William E. Rutherford
Reviewed by Peter Master
BOOK NOTICES
Grammar Textbooks 753
Steven L. Shaw, Guest Editor
English Alive: Grammar, Function, Setting,
Gail Fingado and Mary Reinbold Jerome (Steven L. Shaw)
The English Connection: A Content-Based Grammar and Discussion Text,
Gail Fingado, Leslie J. Freeman, Mary Reinbold Jerome, and
Catherine Vaden Summers (Steven L. Shaw)
Building English Structures: A Communicative Course in English,
Chuck Seibel and Russ Hodge (Shelley Gibson)
121 Common Mistakes of Japanese Students of English,
James H. M. Webb (Ron Post)
Grammar with a Purpose: A Contextualized Approach,
Myrna Knepler (Douglas Collins)
English Structure in Focus, Polly Davis (Jean Jorgensen)
Grammar Work: English Exercises in Context (Vol. 4),
Pamela Breyer (Misaki Shimada)
Volume 26, Number 4 ❑ Winter 1992
Interactions I: A Communicative Grammar (2nd ed.),
Elaine Kirn and Darcy Jack (Kirk L. VanScoyoc)
Interactions II: A Communicative Grammar (2nd ed.),
Patricia Kay Werner, Mary Mitchell Church, and Lida R. Baker
(Kirk L. VanScoyoc)
Grammar Dialogues: An Interactive Approach,
Allan Kent Dart (Steven L. Shaw)
Grammar Troublespots: An Editing Guide (2nd ed.),
Ann Raimes (Wendy Asplin)
How English Works: A Grammar Handbook with Readings,
Ann Raimes (Wendy Asplin)
Grammar Plus, Judy DeFilippo and Daphne Mackey (Wendy Asplin)
BRIEF REPORTS AND SUMMARIES
The Effects of Syntactic Simplification and Repetition
on Listening Comprehension 767
Raoul Cervantes and Glenn Gainer
The Discourse of Pop Songs 770
Tim Murphey
THE FORUM
Comments on Yoshinori Sasaki’s “A Logical Difficulty
of the Parameter Setting Model” 775
Objection to a Logical Difficulty
Guy Modica
Dual Storage Formats of Linguistic Knowledge? An Untestable
“Explanation”
Yoshinori Sasaki
Teaching Issues
Reading Aloud 784
An Educator Comments. . .
Suzanne M. Griffin
Another Educator Comments. . .
Patricia L. Rounds
Information for Contributors 791
Editorial Policy
General Information for Authors
Publications Received 797
Publications Available from the TESOL Central Office 799
Cumulative Index for the TESOL Quarterly,
Volumes 25 and 26, 1991–1992 801
TESOL Membership Application 820
Editor’s Note
■ With this issue, we conclude our reliance on traditional typesetting and
embark upon an adventure in electronic editing. As noted in the last issue,
we will continue to process submitted manuscripts on paper, but authors
of accepted manuscripts will be asked, whenever possible, to provide an
additional copy on disk. This production transition has occasioned a mod-
est redesign of the journal format, intended to increase ease of reading.
With this change, the assistant editorial functions move to the TESOL
Central Office, into the capable hands of Marilyn Kupetz. I take this
opportunity to thank retiring Assistant Editor Deborah Green, University
of Washington, for 2 years of insightful and professional service. I warmly
wish her well in her new endeavors.
With regret I also announce that new responsibilities are taking Gail
Weinstein-Shr from her post as editor of the Brief Reports and Summaries
section of the Quarterly. I join our readers in thanking Gail for her valued
service. I am fortunate that Graham Crookes and Kathryn Davies of the
University of Hawaii at Manoa have agreed to coedit this section. I welcome
Anne Lazaraton, Graham Crookes’s able successor, as editor of the subsec-
tion Research Issues. These editors can be contacted at the addresses
provided in the Information for Contributors section of the Quarterly.
In this Issue
■ Articles in this issue of the TESOL Quarterly set out to “demystify” aspects
of our field. The lead article, the second of two parts, explicates advanced
statistics “as a foreign language.” The next article seeks to demystify the
TOEFL reading test. Two following articles investigate discourse-level
sources of the comprehensibility problems attributed to international
teaching assistants and provide suggestions to help nonnative-speaking
teachers achieve higher levels of comprehensibility. The last paper
explores the role of conjunctions in L2 text comprehension.
Part 2 of James Dean Brown’s article is once again addressed to those
practitioners who find themselves avoiding statistical reports. Here,
Brown explicates more advanced “statistics as a foreign language.”
Explanatory charts and sample tables illustrate the strategies sug-
gested and render his explanations both detailed and clear.
Bonny Peirce seeks to demystify the TOEFL reading test at both
descriptive and theoretical levels. Peirce’s paper first details the
assumptions and procedures that generate the test and raises questions
about its theoretical adequacy. Next she explores the implications of
unequal power relations (between those who make and take the test)
for the way the test is read and scores employed in an international
context. The author ends with a call to examine “whose interests the
TOEFL serves and how the TOEFL can best serve those interests.”
In particular, she hopes that current reexamination of the TOEFL
by its publisher, the Educational Testing Service, will lead to a test
that engenders positive “washback” effects, that is, positive effects on
the classroom pedagogy and on educational policy.
Jessica Williams’s study suggests that explicit marking of discourse
structure can contribute substantially to the comprehensibility of
speech by nonnative speakers. Her study of international teaching
assistants compared planned and unplanned speech. She found that
these differed more in terms of discourse marking than grammatical
accuracy. On the basis of these findings, Williams suggests that nonna-
tive-speaking teachers in English-speaking environments may need
to compensate for comprehensibility problems such as pronunciation
by using more discourse marking devices than do native speakers.
Williams argues, “This is an area of strategic competence that can be
taught.”
Andrea Tyler also investigates the speech of international teaching
assistants. Her qualitative analysis of differences between native and
nonnative-speaker discourse patterns reveals that native speakers use
more discourse structuring devices. Tyler suggests using teaching
materials and techniques that train nonnative speakers to recognize
and use these devices. Finally, the author suggests that general lan-
guage proficiency tests may well be inadequate to gauge the compre-
hensibility of nonnative-speaking teachers in an academic context.
Esther Geva tests the extent to which level of L2 proficiency affects
comprehension of conjunctions by adult readers. Her results show
that the ability to recognize the logical relationships signaled by con-
junctions in local contexts is a necessary but not sufficient requirement
for their comprehension in extended discourse. The author recommends
providing students with opportunities to read connected discourse
and to explore the linguistic markers that implicitly or explicitly
signal text relations.

Also in this Issue:
● Reviews: Peter Master reviews two publications on grammar teaching:
William Rutherford and Michael Sharwood Smith’s Grammar and Sec-
ond Language Teaching: A Book of Readings and William E. Rutherford’s
Second Language Grammar: Learning and Teaching.
● Book Notices: Grammar textbooks are the focus of this issue’s Book
Notices section, for which Steven Shaw has been the guest editor.
● Brief Reports and Summaries: Raoul Cervantes and Glenn Gainer
find that syntactic simplification and repetition aid listening compre-
hension, but they suggest that simplification may not be necessary if
other modifications, such as repetition, are employed; Tim Murphey
examines the discourse of pop songs, arguing that they can play a
facilitating role in stimulating language acquisition.
● The Forum: Guy Modica’s commentary on Yoshinori Sasaki’s TESOL
Quarterly contribution, “A Logical Difficulty of the Parameter Setting
Model,” is followed by a response by the author. In the subsection
Teaching Issues, Suzanne Griffin and Patricia Rounds comment on
reading aloud.
Sandra Silberstein

TESOL QUARTERLY Vol. 26, No. 4, Winter 1992
Statistics as a Foreign Language—
Part 2: More Things to Consider in
Reading Statistical Language Studies*
JAMES DEAN BROWN
University of Hawaii at Manoa
As was Part 1 of this article, Part 2 is addressed to those practicing
EFL/ESL teachers who currently avoid reading statistical studies.
Assuming that the first article has been read, the discussion here
continues by exploring more advanced strategies that will help in
understanding statistical studies on language learning and teaching.
To that end, five new strategies are proposed: (a) think about the
variables of focus, (b) examine whether the correct statistical tests
have been chosen, (c) check the assumptions underlying the statistical
tests, (d) consider why the statistical tests have been applied, and
(e) practice reading statistical tables. Along the way, necessary termi-
nology is explained so that each of these strategies can be clearly
understood. In addition, each of the strategies is discussed with ap-
propriate tables, figures, and examples drawn from recent issues of
the TESOL Quarterly, all of which are explained in turn. This article
attempts to cover a very complex subject area, statistical language
research, in a manner that will give readers the necessary means for
gaining access to such studies. Hopefully, these introductory articles
will also whet the appetite of a number of readers so that they will be
inspired to continue expanding their knowledge in this vital area of
language research.
*Part 1 of this article appeared in Volume 25, Number 4 of the TESOL Quarterly.
In Part 1 of this article, strategies were presented to help those EFL/
ESL teachers who currently avoid statistics to gain access to statistical
studies on language learning and teaching. In that article, five attack
strategies were recommended:
1. Use the abstract to decide if the study has value for you.
2. Let the conventional organization of the paper help you.
3. Examine the statistical reasoning involved in the study.
4. Think about what you have read in relation to your professional
experience.
5. Learn more about statistics and research design.
Each of these strategies was discussed, and examples were drawn
from another article in the same issue of the TESOL Quarterly (Brown,
1991).
In the present article, the goal will be to expand the number of
strategies that readers can use to decipher the meaning of statistical
articles to include the following five:
6. Think about the variables of focus.
7. Examine whether the correct statistical tests have been chosen.
8. Check the assumptions underlying the statistical tests.
9. Consider why the statistical tests have been applied.
10. Practice reading statistical tables and interpreting statistics.
The issues involved in advanced statistical studies are sometimes
very complex, requiring years of study to be adequately understood.
Nevertheless, this article seeks to familiarize readers with key terminol-
ogy and furnish a framework for considering advanced statistical pro-
cedures.
THINK ABOUT THE VARIABLES OF FOCUS IN
THE STUDY
By definition, a variable is any attribute or set of observations that
can change, or vary, in a study. Indeed, much of statistical research
could be characterized as the study of the effects of systematically
manipulating different combinations of variables under carefully ar-
ranged conditions. To genuinely understand variables, it is important
to grasp three ideas: (a) variables can take on different roles, (b) vari-
ables can be measured on different scales, and (c) variables can have
independent or repeated levels.
Variables Can Take on Different Roles
Variables can perform five different functions in a study, and they
are often labeled according to those functions as follows: dependent
variables, independent variables, moderator variables, control vari-
ables, and intervening variables.
A dependent variable is the variable which is of most interest in a study;
it is measured or observed primarily to determine what effect, if any,
other variables have on it. In contrast, an independent variable is a
variable that has been chosen by the researcher to determine its effect
on the dependent variable. Consider a study in which the research
question is, What is the effect of B on A? In this case, B is the indepen-
dent variable and A is the dependent variable. The most important
relationship in a study is usually that between the independent and
dependent variables.
However, the researcher may also want to include a moderator variable
in order to determine the effect, if any, that this moderator variable has
on the primary relationship between the dependent and independent
variables. Including a moderator variable in the example above, the
research question might be posed as follows: What is the effect of B
on A when C is present or absent? B is still the independent variable,
and A is the dependent variable, but C is a new moderator variable.
Because language research may involve many variables interacting at
the same time, variables other than the dependent, independent, and
moderator variables must sometimes be accounted for without actually
including them in the research. Such variables may be treated as control
variables, that is, variables that the researcher wants to inhibit from in-
terfering with the interpretation of the results. Control can be accom-
plished by (a) removing a variable from the study, (b) holding a variable
constant, (c) making a variable a covariate, or (d) using random selection.
Consider a study of the effect of teaching Method X on English
language proficiency. The researcher might decide to use TOEFL
scores as the dependent variable, English proficiency. The indepen-
dent variable, called method, might be the existence or absence of
Method X, which would be studied by comparing English proficiency
scores on a test given to two groups—one group, which had been
taught by Method X (called a treatment group) and another, which
had received no instruction (called a control group). The researcher
would be primarily interested in the relationship between the indepen-
dent variable (method) and dependent variable (English proficiency).
However, there are a number of variables which might have an
impact on such a study: gender, years of language study, whether the
students had lived with a native speaker, number of languages spoken
at home, language aptitude, motivation, and so forth. The list of poten-
tial control variables could become quite long. The researcher might
choose to control one variable by (a) removing it from the study. (For
example, living with a native speaker could be removed from the study
as a variable by using only subjects who have never lived with a native
speaker.) The researcher might also decide to control another variable
by (b) holding it constant. (For instance, years of language study could
be held constant by using only those students who had studied 6 years
of English.) Still another variable might be controlled by (c) making it
a covariate. A covariate is a variable that is measured so that it can be
controlled statistically. The effect of the covariate is removed statisti-
cally from the study during the analysis. For instance, the researcher
might choose to treat language aptitude as a covariate. Using scores on
a language aptitude test, this variable could be controlled by including
it as a covariate and statistically removing its effects.
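To make the idea of statistically removing a covariate's effect more concrete, the following is a minimal sketch rather than a full analysis of covariance. It assumes invented aptitude and TOEFL scores, and it adjusts each group mean by the pooled within-group slope of the outcome on the covariate, which is one common way of computing adjusted means. The names `groups`, `pooled_within_slope`, and `adjusted_means` are hypothetical and do not come from the article.

```python
from statistics import mean

# Hypothetical data: (aptitude score, TOEFL score) for each student.
groups = {
    "method_x": [(50, 520), (60, 540), (70, 560)],
    "control": [(40, 480), (50, 500), (60, 520)],
}

def pooled_within_slope(groups):
    """Slope relating the covariate to the outcome, pooled across groups."""
    numerator = 0.0
    denominator = 0.0
    for pairs in groups.values():
        mean_x = mean(x for x, _ in pairs)
        mean_y = mean(y for _, y in pairs)
        numerator += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        denominator += sum((x - mean_x) ** 2 for x, _ in pairs)
    return numerator / denominator

def adjusted_means(groups):
    """Group means on the outcome after removing the covariate's effect."""
    slope = pooled_within_slope(groups)
    grand_mean_x = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {}
    for name, pairs in groups.items():
        mean_x = mean(x for x, _ in pairs)
        mean_y = mean(y for _, y in pairs)
        # Shift each group mean to where it would be at the grand mean aptitude.
        adjusted[name] = mean_y - slope * (mean_x - grand_mean_x)
    return adjusted

raw = {name: mean(y for _, y in pairs) for name, pairs in groups.items()}
adjusted = adjusted_means(groups)
print(raw)       # unadjusted group means
print(adjusted)  # means with aptitude held statistically constant
```

With these invented numbers, the raw difference of 40 points between the groups shrinks to 20 points once aptitude is taken into account, because the Method X group happened to have higher aptitude scores.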
One efficient way to simultaneously control many variables is by (d)
using random selection. If individuals are randomly selected when the
researcher is forming the groups in a study and the number of people
in each group is sufficiently large, the groups can be considered equiva-
lent on all variables other than those purposely manipulated as inde-
pendent, dependent, and moderator variables. This attribute of ran-
dom selection explains why it is sometimes so prominently discussed
in statistical studies.
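As an illustration only, forming two groups by random selection might be sketched as follows. The student identifiers, the group size of 30, and the function name `randomly_assign` are assumptions made for the example; the fixed seed serves only to make the demonstration repeatable.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups."""
    rng = random.Random(seed)  # a fixed seed only makes the demo repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

students = [f"S{i:02d}" for i in range(1, 61)]  # 60 hypothetical student IDs
treatment, control = randomly_assign(students, seed=1992)
print(len(treatment), len(control))
```

Because chance alone decides who lands in which group, characteristics such as gender, motivation, or aptitude have no systematic way of piling up in one group, which is the sense in which sufficiently large groups can be treated as equivalent.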
Intervening variables are the last type to be covered here. This label
can cause confusion because it is used in two distinctly different ways
in the literature. In some situations, the intervening variable label may
be used to describe the theoretical relationship between the indepen-
dent and dependent variables. For instance, in the example study of
the effect of Method X on English language proficiency, the researcher
might label the theoretical relationship between the independent and
dependent variables as a “method effect” or “pedagogy” or “language
learning” or something else—depending on how the relationship is
conceptualized by the researcher.
In other situations, the intervening variable label may be used to
describe a variable that was unexpected in the study, yet surfaced
because it might explain the relationship between the independent and
dependent variable. For instance, it might turn out that any difference
discovered in the average TOEFL scores of the Method X and control
groups was caused by an unanticipated intervening variable rather
than by the different methods. The teacher of the Method X class
might have been more lively and interesting than the teacher of the
control group. Thus, any differences in student performance in En-
glish proficiency might have been due to differences in teaching style
(an intervening variable), which were not anticipated in the study, yet
could explain any relationship that was found. Researchers generally
prefer to avoid such surprises by anticipating and controlling potential
intervening variables.
Variables Can Be Measured on Different Scales
Once defined and labeled, the variables can be analyzed statistically
only after they have first been tallied, counted, or otherwise measured
in some way. Such quantification can come in a variety of forms called
scales. There are four types of scales that can be used to quantify
variables in language research, some of which are more precise than
others. Hence scales can be thought of as being hierarchically arranged
from least precise to most precise as follows: nominal, ordinal, interval,
and ratio scales.
Nominal scales are appropriate when data are categorized into groups.
These groupings might be natural or artificial. Naturally occurring
nominal scales in language research would include gender (female/
male), native language (Spanish/German, etc.), academic status (under-
graduate/graduate), and so forth. Other more artificial nominal scales
might include groupings like assignment by a researcher to an experi-
mental or control group, groups of elementary-, intermediate-, or
advanced-level students, and so forth. Nominal scales are also some-
times called categorical scales, or, if there are only two categories like
undergraduate/graduate, they may be called dichotomous scales. Regard-
less of what they are called, nominal scales identify and give names to
the categories or groups involved.
Ordinal scales order or rank the data. For example, a researcher might
want to use a scale that orders language learners from least to most
proficient in their overall English. To do this, the investigator could
use proficiency test scores to arrange the learners from low to high and
then assign each a rank using simple ordinal numbers. The highest
learner would be “first,” followed by the “second” student, and so forth.
Examples of potential ordinal scales in language research include the
rankings of students’ performances in a particular class, the ordering
of teachers’ abilities in a language program, and the achievement rank-
ings of various sections of a course. The important point is that ordinal
scales rank people, objects, or concepts, with each point on the scale
being “more than” and “less than” the other points on the scale.
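The conversion of test scores into ranks described above can be sketched in a few lines. The learners and scores are invented, the function name `rank_learners` is hypothetical, and tied scores are deliberately not handled.

```python
def rank_learners(scores):
    """Map each learner to an ordinal rank, 1 = highest score (ties not handled)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

scores = {"Ana": 520, "Ben": 610, "Chen": 575}  # hypothetical proficiency scores
ranks = rank_learners(scores)
print(ranks)
```

Note what is lost in the conversion: Ben and Chen are 35 points apart and Chen and Ana are 55 points apart, yet on the ordinal scale each pair is separated by exactly one rank.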
Interval scales also describe the order of the points on the scale but
indicate the intervals, or distances, between the points, as well. Exam-
ples of interval scales include virtually all language tests as well as other
scales used to measure things like attitude, aptitude, and learning
styles. In brief, interval scales (sometimes also called continuous scales)
indicate the relative order and distances between points along the scale.
Ratio scales have all of the characteristics of the interval scales, but in
addition, they have a true zero and the points on the scale are precise
multiples, or ratios, of other points on the scale. For example, ratio
scales include things like students’ ages, the number of pages per
book in a library, and the number of students enrolled in a language
program. These are ratio scales because it makes sense to refer to zero
(i.e., zero years, zero pages, or zero students) and because ratios on the
scales make sense (i.e., 20 years old is twice as old as 10; 300 pages is
three times as many as 100; and 400 students is two times as many as
200). In contrast, an interval scale like TOEFL scores has no true zero
(the score obtained by guessing is 200), and ratios between points on
the scale are not exact (Is a student with 600 on the TOEFL exactly
twice as proficient as one with 300?). In short, ratio scales show the
relative order and distances of points along the scale but also have a
true zero, and exact ratios between points on the scale make sense.
Variables Can Have Independent or Repeated Levels
There is one last concept that is crucial in thinking about the vari-
ables. A nominal variable, by definition, can include a number of
categories. These categories are also called levels, particularly in refer-
ring to the number of categories. For instance, for a variable like
gender, there are two levels, female and male. For a variable like
language course of enrollment, there might be three levels—elemen-
tary, intermediate, and advanced. Naturally, there are as many levels
as there are categories within a nominal variable.
This definition of levels will prove important in thinking about statis-
tical studies because of the concept of independence. If the levels of a
variable are made up of two or more different groups of people, they
are viewed as independent. For instance, in a study comparing the
means of an experimental group and a control group, the groups can
be independent only if they were created using different groups of
people. Many statistics can only be applied if the groups being com-
pared are independent.
However, there are studies in which it might be necessary or desirable
to make repeated observations on the same group of people. For
instance, a researcher might want to compare the means of a single
group of students before and after some type of language instruction
in what is called a pretest-posttest study. Such investigations are called
repeated measures studies, and the groups can be said to lack indepen-
dence because they are the same people. When independence cannot
be assumed, different choices of statistics must be made. Thus it is
important to understand the difference between independent levels of
a variable (which were created by using different groups of people)
and repeated levels of a variable (which were created through repeated
measures or observations of the same people).
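To show why the distinction matters for the choice of statistic, the sketch below computes two textbook versions of the t statistic from the same invented scores: one that treats them as repeated measures of the same four students and one that (wrongly, for these data) treats them as two independent groups. The scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def matched_pairs_t(before, after):
    """t for repeated levels: based on each person's own difference score."""
    differences = [post - pre for pre, post in zip(before, after)]
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error

def independent_t(group_a, group_b):
    """t for independent levels, using the pooled variance of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / standard_error

pretest = [10, 12, 14, 16]    # hypothetical scores before instruction
posttest = [12, 15, 15, 18]   # the same four students after instruction

paired = matched_pairs_t(pretest, posttest)
unpaired = independent_t(pretest, posttest)
print(round(paired, 2), round(unpaired, 2))
```

The same numbers produce very different t values depending on which formula is applied, which is why readers should check that the statistic reported matches the way the levels of the variable were actually created.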
EXAMINE WHETHER THE CORRECT STATISTICAL
TESTS HAVE BEEN CHOSEN
As discussed in Part 1 of this article, there are three families of
statistics that are used for testing hypotheses, which involve (a) comparing
means, (b) comparing frequencies, and (c) comparing correlation
coefficients to zero. These three families represent the three central
types of research questions tackled in most statistical studies. In Table
1, the first column gives the central research issue involved for each of
the statistics. The succeeding columns list the scale and number of the
dependent variables (DVs), the scale and number of the independent
variables (IVs), any other conditions that must exist, and then the
statistic that would be appropriately applied if all of the foregoing
scales and conditions existed in a particular study. For instance, in
a study that focuses on the issue of comparing means (i.e., group
differences), with one interval dependent variable and one nominal
independent variable, and other conditions including repeated mea-
sures of two levels, the appropriate statistic would be a matched-pairs
t test (see Table 1). However, if everything were the same except that
the groups were independent, the appropriate statistic would be the z
statistic (if the samples are large—as a rule of thumb, over 30) or the
t test (for any size sample).
At first glance, Table 1 may seem very complex because it contains
an extraordinary amount of information in a compact form. However,
with practice, this table can prove very useful either in determining
which statistic to use given the conditions of a particular study or in
looking up the conditions that must exist for a given statistic to be
appropriately applied.
Consider a study in which the frequency of visits to a language
laboratory for males and females is being studied as they vary for
graduate and undergraduate students. Here, there is one nominal
dependent variable, sex (with two levels, male and female), and one
nominal independent variable, the status of the students (with two
levels, graduate and undergraduate). Since the central research issue
is comparing frequencies and there is one nominal dependent variable
and one nominal independent variable (with independent groups), the
table indicates that a chi-square statistic would be appropriate.
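For readers curious about what the chi-square statistic does with such frequencies, the following sketch computes it for an invented two-by-two table of language laboratory visits. The counts and the function name `chi_square` are assumptions for illustration; the calculation is the standard sum of squared differences between observed and expected frequencies, each divided by the expected frequency.

```python
def chi_square(table):
    """Chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated.
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical visit counts; columns are female, male.
visits = [
    [30, 20],  # graduate students
    [20, 30],  # undergraduate students
]
result = chi_square(visits)
print(result)
```

For a two-by-two table there is one degree of freedom, and the conventional critical value at the .05 level is 3.84, so a value of 4.0 for these invented counts would be reported as significant at that level.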
The reverse logic can also be applied to find out what conditions are
required to correctly apply a certain statistic. For instance, in the case
of multivariate ANOVA, the statistic would only be appropriate for
comparing group differences when there are two or more interval
scale dependent variables, one nominal independent variable, and the
groups are independent.
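The two ways of reading Table 1 can be mimicked with a simple lookup. The sketch below encodes only the four cases spelled out in the surrounding text, not the full table, and the key layout and function names are hypothetical conveniences.

```python
# Keys: (research issue, DV scale, number of DVs, IV scale, number of IVs,
#        other conditions). Only cases described in the text are included.
CASES_FROM_TEXT = {
    ("comparing means", "interval", "1", "nominal", "1",
     "repeated measures, two levels"): "matched-pairs t test",
    ("comparing means", "interval", "1", "nominal", "1",
     "independent groups, two levels"):
        "z statistic (large samples) or t test (any sample size)",
    ("comparing frequencies", "nominal", "1", "nominal", "1",
     "independent groups"): "chi-square",
    ("comparing means", "interval", "2 or more", "nominal", "1",
     "independent groups"): "multivariate ANOVA",
}

def choose_statistic(*conditions):
    """Forward lookup: study conditions -> statistic (None if not listed)."""
    return CASES_FROM_TEXT.get(conditions)

def conditions_for(statistic):
    """Reverse lookup: statistic -> the conditions under which it applies."""
    return [key for key, value in CASES_FROM_TEXT.items() if value == statistic]

print(choose_statistic("comparing frequencies", "nominal", "1",
                       "nominal", "1", "independent groups"))
print(conditions_for("multivariate ANOVA"))
```

A combination that is not listed returns `None`, which here means only that the text does not describe it, not that no appropriate statistic exists.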
Turning now to Table 2, notice in the table title that it includes
exploratory statistics rather than hypothesis testing statistics, which
means that, unlike the statistics covered in Table 1, the ones in Table
2 do not rigorously test specific hypotheses (at specific probability
levels) as explained in Part 1 of this article. Instead, as shown in the
first column of Table 2, these statistical procedures are used to explore
relationships between variables by helping to identify (a) underlying
variables, (b) similar subject performance, (c) group membership,
(d) the existence of a scale, or (e) causal relationships. The other columns
in the table list the types of scales that are usually involved, any
other conditions required for the particular statistic, and the appropriate
statistic itself. Like Table 1, this table can be used either to
determine which statistic to use given the conditions of a particular
study or to look up the conditions that should exist for a given statistic
to be appropriately applied.
TABLE 2
Choosing From Among Other More Exploratory Statistics
CHECK THE ASSUMPTIONS UNDERLYING THE
STATISTICAL TEST
Assumptions are preconditions that are necessary for accurate application
of a particular statistical test. In some cases, these assumptions are
not optional; they must be met for the statistical test to be meaningful.
However, the assumptions will be of lesser or greater importance for
different statistical tests. In addition, arguments have been proposed
that certain statistics are robust to violations of certain assumptions (i.e.,
not greatly affected by violations of those assumptions). Nonetheless,
it is important for readers to verify that the researcher has thought
about and checked the assumptions underlying any statistical tests
which were used in a study and that those assumptions were met or
are discussed in terms of the potential effects of violations on the results
of the study.
Principal Assumptions
The principal assumptions that will be discussed here are those which
come up most frequently in language studies: independence of groups
and observations, normality of the distributions, equal variances, lin-
earity, nonmulticollinearity, and homoscedasticity.
The assumption of independence of groups implies that there must be
no association between the groups in a study. Put another way, knowing
the data in one group should give no information about the data in
another group. The most obvious violations of this assumption occur
when the same people appear in more than one group. Note that
independence of groups is far from a universal assumption. Some
statistical tests assume independence of groups, whereas others allow
for repeated measures of the same people (as discussed above). Indeed,
correlation coefficients require repeated measures if they are to be
calculated at all. When assumed for a particular statistic, independence
of groups can be checked by answering one question: Is there any
reason to believe that there is association between the observations
made on the groups in this study?
Another fairly common assumption is independence of observations,
often required for proper application of correlational and other statis-
tics. Here, the assumption is that there is no association between the
observations within a group. For instance, if students within a group
were copying answers from one another on a test, this assumption
would be violated because the scores on the tests would no longer be
independent of each other. This assumption is best met by carefully
planning and carrying out the procedures for gathering data (includ-
ing random selection, when possible) so that potential violations are
minimized.
Normality of the distributions is often required for proper application
of statistical tests in mean comparisons. Violations of this assumption
are less troublesome if the sample sizes are large, which is to say that
it is more important to worry about violations of this assumption if
sample sizes are small. Nonetheless, the researcher (and the reader)
can check for serious violations of this assumption by careful examina-
tion of the descriptive statistics (explained in Part 1) for all groupings
in a study. If the descriptive statistics for all levels of all variables
indicate that the distributions are normal, there is no need to worry
about violations of this assumption. Essentially, the distribution can be
taken as normal if there is room for two or three standard deviations
on either side of the mean and if there are no outliers (extremely large
or small values).
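The rule of thumb just described can be sketched in Python. This is only a rough screen, not a formal test of normality. It assumes that "room" means the mean plus or minus k standard deviations falls inside the range of possible scores. The function names, the 0-100 scale, and both sets of scores are hypothetical.

```python
from statistics import mean, stdev

def room_for_sds(scores, scale_min, scale_max, k=2):
    """True if mean - k*SD and mean + k*SD both fit inside the score scale."""
    m, sd = mean(scores), stdev(scores)
    return (m - k * sd) >= scale_min and (m + k * sd) <= scale_max

def outliers(scores, k=3):
    """Scores lying more than k SDs from the mean (a rough screen only)."""
    m, sd = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) > k * sd]

# Hypothetical scores on a hypothetical 0-100 test
roomy = [42, 48, 50, 51, 53, 55, 57, 60, 62, 49, 52, 58]
piled_up = [88, 92, 95, 97, 98, 99, 100, 100, 100, 100, 100, 96]

print(room_for_sds(roomy, 0, 100))     # scores centered well inside the scale
print(room_for_sds(piled_up, 0, 100))  # scores piled up against the ceiling
print(outliers(roomy))
```

A distribution that fails this screen, like the second set of scores, is probably skewed and deserves a closer look before a mean comparison is trusted.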
Obvious violations of the assumption of equal variances can also be
detected by examining the standard deviations in a study because the
variances are simply the standard deviations squared. If there are big
differences in these squared values, there are probably violations of
this assumption. More subtle differences are investigated by applying
statistics, like the F max test, designed to detect such differences. For
instance, in Brown (1991), it was stated that
F max = 2.2695, which was significant at p < .05 . . . this result indicates that
there is a probable violation of a restrictive assumption (i.e., homogeneous
variances) that accompanies repeated measures designs like the one re-
ported here. To address this issue . . . (p. 594)
Notice that the assumption was checked in Brown (1991), a violation
was found, and the issue was addressed. The researcher was forthright
about the issue.
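As a rough illustration, the F max statistic is the largest variance divided by the smallest. The sketch below applies that ratio to the four cell standard deviations reported in Table 4 of this article (from Brown, 1991, p. 593). That these four cells are the ones compared in Brown (1991) is my assumption. The variable names are hypothetical.

```python
# Standard deviations for the four rater-faculty-by-student-type cells,
# copied from Table 4 of this article (from Brown, 1991, p. 593).
cell_sds = [1.11, 0.77, 1.16, 0.96]

# Variances are simply the standard deviations squared.
variances = [sd ** 2 for sd in cell_sds]

# F max = largest variance / smallest variance
f_max = max(variances) / min(variances)
print(round(f_max, 4))
```

The result agrees with the F max value quoted above, which suggests (but does not prove) that this is how the figure was obtained. Judging whether a given F max is significant still requires the appropriate table of critical values.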
The assumption of linearity often applies in the correlational and
prediction family of statistics. Simply stated, it means that there is
a straight-line relationship between the two variables involved. This
assumption can be checked by examining a scatterplot of the two
variables. Figure 1 shows a plot of the scores on Test Y and those on
Test Z. Notice that the X in the lower left corner within the graph
represents the scores for a person who scored 2 on Test Y and 2 on
Test Z. Each X in the figure represents both scores for a single individ-
ual. Thus a quick count will indicate that at least 18 people were
involved. Notice also that the Xs as a group line up pretty much in an
angled straight line. Thus the variables Y and Z can be said to be
linearly related. Figure 2 represents another type of situation, called a
curvilinear relationship, because the Xs form a curve rather than a
straight line. There are other possible variations away from linearity,
but the point is that this assumption can be checked by examining a
scatterplot of the variables involved and deciding whether the relation-
ship displayed is linear or curvilinear.
The assumption of nonmulticollinearity is violated if the variables in
a study are too highly interrelated. This assumption, often applied in
statistical procedures which are based on correlation and prediction,
can easily be checked by examining a table of the correlation coefficients
for each pair of variables in a study. A problem of multicollinearity
may exist if any two variables are highly correlated. If this assumption
is required by a statistical procedure, the researcher should
provide a correlation table so that the reader can see that none of the
correlations is high.
FIGURE 1
Relationship Between Scores on Tests X and Y

FIGURE 2
Relationship Between Scores on Tests A and B
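The check for multicollinearity can be sketched as a scan of every pairwise correlation. The .90 cutoff below is an assumption of mine, since this article does not specify how high is too high. The variable names and scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def high_pairs(variables, cutoff=0.90):
    """Return (name, name, r) for each pair with |r| at or above the cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical scores for six students on three measures
scores = {
    "reading": [10, 12, 15, 18, 20, 23],
    "vocabulary": [11, 13, 15, 19, 21, 24],  # nearly duplicates reading
    "listening": [14, 9, 20, 12, 17, 11],
}
print(high_pairs(scores))
```

Only the reading and vocabulary pair is flagged here, which is the kind of pair a reader should look for in a published correlation table.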
The final assumption of concern here is that of homoscedasticity. This
assumption, which is often applied to statistical procedures based on
correlation and prediction, is that the variability of scores on one
variable is about the same at all values of the other variable. One way
to check this assumption is to examine a scatterplot of the two variables,
like that shown in Figure 1, and determine whether or not the points
that deviate away from the straight line are about the same distance
from the line all the way along it. For example, those data points that
do vary away from the straight line in Figure 1 do appear to be about
equal in distance from the line. Thus the assumption can be checked
roughly by examination of scatterplots.
Which Assumptions Underlie Each Statistical Test?
Table 3 provides a list of all of the statistics covered in this paper
along with a tally of the assumptions that apply to each statistic. It is
important to note that this table is a summary of information contained
in a number of different books (Conover, 1980; Guilford & Fruchter,
1973; Hatch & Lazaraton, 1991; Shavelson, 1981; Tabachnick & Fidell,
1989). In the process of summarizing, it was necessary to interpret and
make a number of judgments. In general, these decisions were made
to minimize the number of assumptions rather than to maximize them.
If an assumption was never specifically mentioned for a particular
statistic, it was not included here even though it might make sense.
Similarly, if an assumption that was listed was more closely related to
the design considerations already covered in Tables 1 and 2 of this
paper, it was not mentioned again in Table 3. Thus this table should
be interpreted as including only the minimum assumptions.
To read Table 3, first notice that the statistical families and statistics
are listed in the left-hand column, while the seven most commonly
required assumptions are labeled across the top. At the far right, there
is an additional column for other less commonly required assumptions.
Thus the rows in the table represent different statistics, and the col-
umns represent the different assumptions. If an assumption is required
for a particular statistic, an X appears in the applicable column and
row. For instance, the Pearson r has four assumptions: independence
of observations, normality, linearity, and homoscedasticity. These as-
sumptions are represented by the Xs in the second, third, fifth, and
seventh columns of the first row in Table 3.
CONSIDER WHY THE STATISTICAL TESTS HAVE BEEN APPLIED
There are obviously a large number of statistical tests listed in Tables
1 and 2, many of which may be unfamiliar to readers. Detailed explanations
of each statistic are beyond the scope of this article. (For more
information, see books like Brown, 1988; Butler, 1985; Hatch & Far-
hady, 1982; Hatch & Lazaraton, 1991; Seliger & Shohamy, 1989; or
Woods, Fletcher, & Hughes, 1986.) However, in this section, the three
families of statistics will once again be visited so that the reasons for
applying one statistic over another within these families can be ex-
amined.
Mean Comparisons
In order to understand the relationships between the various statistical
tests for mean comparisons, it may be useful to first realize that
there are three basic kinds, depending on whether the dependent
variable is one interval scale, includes two or more interval scales, or is
a single ordinal scale as shown in Figure 3.
FIGURE 3
Decision Tree for Group Comparisons (i.e., Nominal Scale IVs)
For interval scale comparisons, the essential comparisons being made
between groups are between the means. When there is one interval
scale dependent variable (DV) and one nominal independent variable
(IV), and only two independent means are being compared as levels
of the IV, it is appropriate to use the z statistic or the t test. As noted
in Figure 3, the essential difference is that the z statistic can only be
appropriately applied to large samples (as a rule of thumb, 30 or more),
whereas the t test can be applied to any sample size. If two or more
independent means are being compared as levels of a single IV, one-
way analysis of variance (ANOVA) can be used. However, in the same
situation, but with a covariate, one-way analysis of covariance
(ANCOVA) would be the statistical procedure of choice.
When there is one DV and one IV, but the means that are being
compared are not independent, that is, are repeated, there are statisti-
cal tests analogous to those in the previous paragraph for each set of
circumstances. If two nonindependent means are being compared as
levels of one IV, a matched-pairs t test would be appropriate. If two or
more nonindependent means are being compared for one IV, repeated
measures ANOVA can be used. However, in the same situation, but
with a covariate, repeated measures ANCOVA would be the statistical
procedure of choice.
If there is one DV, but two or more IVs, and the means being
compared are all independent, it is appropriate to use n-way ANOVA
(where n is the number of IVs, e.g., two-way, three-way, etc.). However,
if there is a covariate involved, n-way ANCOVA is the statistic of
choice. If some or all of the means are not independent, analogous
comparisons can be made by using n-way repeated measures ANOVA
or n-way repeated measures ANCOVA. Note that Figure 3 indicates
that, when there are two or more interval scale dependent variables,
exactly the same statistical tests as those just described should be used,
but in their multivariate versions.
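The choices described in the last three paragraphs can be summarized as a small decision function. This is a sketch of my reading of the prose, not a reproduction of Figure 3. The function name and parameters are hypothetical. Where the text allows either a t test or an ANOVA for exactly two means, the sketch returns the simpler t test.

```python
def choose_mean_comparison(n_ivs, n_levels, independent=True,
                           covariate=False, n_dvs=1, sample_size=None):
    """Suggest a mean-comparison test for interval DVs and nominal IVs."""
    if n_ivs == 1 and n_levels == 2 and not covariate:
        if not independent:
            name = "matched-pairs t test"
        elif sample_size is not None and sample_size >= 30:
            name = "z statistic or t test"  # z only for large samples
        else:
            name = "t test"
    else:
        if n_ivs > 1:
            way = f"{n_ivs}-way "
        else:
            way = "one-way " if independent else ""
        repeated = "" if independent else "repeated measures "
        kind = "ANCOVA" if covariate else "ANOVA"
        name = f"{way}{repeated}{kind}"
    # Two or more interval DVs call for the multivariate version.
    return name if n_dvs == 1 else f"multivariate {name}"

print(choose_mean_comparison(n_ivs=1, n_levels=3))
print(choose_mean_comparison(n_ivs=2, n_levels=2, independent=False))
```

A reader can run the same questions in the other direction: given the statistic an author reports, check whether the number of IVs, the independence of the means, and the presence of a covariate match.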
Notice that the purpose of all of the statistical procedures described
above is to determine whether any differences observed among the
means are significant (as described in Part 1 of this paper). A number
of problems in interpreting the probability values arise if more than
two means are compared using ordinary t tests (see Brown, 1990,
and Siegel, 1990, for more discussion of this issue). In brief, because
probability levels become difficult to interpret if used over and over,
other forms of analysis have been created to deal with various kinds of
multiple comparisons while holding probability levels steady. Thus in
situations where more than two means are to be compared, it is necessary
to employ the more complex statistical procedures described above. However,
as shown in Figure 3, the choices among these various analysis of
variance procedures are relatively simple if one considers all of the
necessary factors.
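One common way to see why repeated t tests make probability levels hard to interpret is to compute the chance of at least one spurious significant result across all the pairwise comparisons. The formula below is a standard approximation that I am adding for illustration. It is not taken from this article, and it assumes, simplistically, that the tests are independent of one another. The function name is hypothetical.

```python
from math import comb

def familywise_error(n_means, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false
    positive), assuming the tests are independent."""
    n_tests = comb(n_means, 2)  # every pair of means is compared once
    return n_tests, 1 - (1 - alpha) ** n_tests

for k in (2, 3, 4, 5):
    tests, rate = familywise_error(k)
    print(k, "means:", tests, "t tests, overall rate about", round(rate, 3))
```

With only two means the overall rate stays at the nominal .05, but it climbs quickly as means are added, which is the motivation for the analysis of variance procedures.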

Also note that all of these mean comparison procedures may be
followed up by more detailed comparisons like Scheffé, Tukey, Dunn,
or other planned or post hoc comparisons to determine exactly where
any significant differences may be located. However, in most cases, the
researcher is interested in whether differences in particular pairs of
means are statistically significant.
For ordinal scale comparisons, the group comparisons of interest will
typically be between medians rather than means. In most cases, these
statistical tests can also be applied if the data are on interval scales.
However, they are usually applied to interval scales only when those
scales do not meet the assumptions of the statistics listed in Figure 3
because the statistics in this subsection are less powerful, that is, they
are less likely to find statistically significant differences when they are
present in the population. The statistics typically used for making mean
or median comparisons for one ordinal scale DV between two levels of
a single IV are the Median test or U test (also called the Wilcoxon test).
If there are three or more levels in such a comparison, the Kruskal-
Wallis test would be used. When the comparisons are not independent,
two levels of a single IV would typically be compared using a Sign test,
while three or more levels could be compared using the Friedman one-
way ANOVA. There are other statistical methods that can be used for
making comparisons on ordinal scales, but the ones listed here are
those most commonly reported in language research.
Correlational Analyses
In order to understand the relationships between the various statistical
tests for correlational and prediction analysis, it will be useful to
first realize that there are four basic kinds depending on whether the
dependent variable is one interval scale, includes two or more interval
scales, is a single ordinal scale, or is a single nominal scale as shown in
Figure 4. If the DV is one interval scale, and there is a single interval
IV, Pearson r and/or simple regression are called for (particularly if
prediction of the DV from the IVs is of interest). Pearson r can be used
alone in situations where the researcher wants to examine a single
correlation coefficient or where the focus is on the correlations of each
of a number of variables with each other. However, each time r is
calculated, the researcher is treating it as though there is a single DV
and a single IV. When there are two or more IVs being examined in
relationship with a single DV, multiple regression and/or multiple R
would be appropriate (particularly for prediction of the DV from the
IVs and when multiple correlations are of interest).
If, on the other hand, there are two or more DVs and two or more
IVs and the researcher wants to study how the correlations among the
DVs are related to the correlations among the IVs, canonical correlation
analysis would be appropriate.
FIGURE 4
Decision Tree for Correlation/Prediction Statistics
When the DV is an ordinal scale and the IV is a single ordinal scale,
the appropriate form of analysis would be Spearman rho or Kendall
tau. If there are two or more levels in the IV, as in five sets of ranks that
are to be examined for the degree to which they are all simultaneously
related, then Kendall W would be the correct statistic.
Finally, if the DV is a nominal scale and there is one interval IV
which is a naturally occurring dichotomy, or “true” dichotomy (e.g.,
male/female), the point-biserial correlation should be used. If the di-
chotomy is an artificial one (e.g., old/young based on age groupings,
where young = 39 years or less, and old = 40 years or more), the
appropriate statistic would be the biserial correlation. If the DV is
nominal and there are two or more interval scale IVs, loglinear analysis
would be useful (especially when prediction of the DV from the IVs is
of interest). If the DV is a nominal scale and there is one nominal scale
IV which is a true dichotomy, the phi coefficient should be used. If the
dichotomy is an artificial one, the appropriate statistic would be the
tetrachoric correlation.
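The choices among correlational statistics described above can be sketched as a lookup function. It follows my reading of the prose rather than Figure 4 itself. The function name, parameter names, and scale labels are hypothetical.

```python
def choose_correlation(dv_scale, iv_scale, n_ivs=1, n_dvs=1,
                       true_dichotomy=True, several_rank_sets=False):
    """Suggest a correlational statistic from the scales involved."""
    if n_dvs >= 2 and n_ivs >= 2:
        return "canonical correlation analysis"
    if dv_scale == "interval" and iv_scale == "interval":
        if n_ivs == 1:
            return "Pearson r and/or simple regression"
        return "multiple regression and/or multiple R"
    if dv_scale == "ordinal" and iv_scale == "ordinal":
        # Several sets of ranks examined simultaneously call for Kendall W.
        return "Kendall W" if several_rank_sets else "Spearman rho or Kendall tau"
    if dv_scale == "nominal" and iv_scale == "interval":
        if n_ivs >= 2:
            return "loglinear analysis"
        return ("point-biserial correlation" if true_dichotomy
                else "biserial correlation")
    if dv_scale == "nominal" and iv_scale == "nominal" and n_ivs == 1:
        return ("phi coefficient" if true_dichotomy
                else "tetrachoric correlation")
    raise ValueError("combination not covered in this section")

print(choose_correlation("interval", "interval", n_ivs=3))
print(choose_correlation("nominal", "interval", true_dichotomy=False))
```

Combinations that the section does not discuss raise an error rather than guessing at a statistic.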
Frequency Comparisons
To better understand the relationships between the various statistical
tests for frequency comparisons, it is important to realize that all of the
analyses described in Figure 5 have one DV which is by definition a
nominal scale frequency count. The IVs are nominal and differ only
in their number in three basic ways, depending on whether there are
one, two, or two or more IVs. If there is only one IV with two or more
independent levels, the chi-square test would typically be used. If those
levels are not independent, the McNemar test would be appropriate.
In a situation where there are two nominal IVs and exactly two independent
levels of each (2 x 2), Fisher’s exact test would be used. In
studies that have two or more IVs, n-way chi-square or multiway frequency
analysis would normally be used.
FIGURE 5
Decision Tree for Frequency Statistics
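The frequency comparison choices can be sketched in the same way. The function and its parameters are hypothetical, and the rules reflect only the prose above. In particular, the sketch treats the 2 x 2 case as taking precedence over the general case for two or more IVs.

```python
def choose_frequency_test(n_ivs, levels_per_iv, independent=True):
    """Suggest a frequency statistic; levels_per_iv lists the number of
    levels for each nominal IV."""
    if n_ivs == 1:
        return "chi-square test" if independent else "McNemar test"
    if n_ivs == 2 and independent and all(k == 2 for k in levels_per_iv):
        return "Fisher's exact test"  # the 2 x 2 case
    return "n-way chi-square or multiway frequency analysis"

print(choose_frequency_test(1, [3]))
print(choose_frequency_test(2, [2, 2]))
print(choose_frequency_test(3, [2, 3, 2]))
```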
Exploratory Statistics
Another set of statistical analyses that was characterized earlier as
exploratory is shown in Figure 6. These analyses can be differentiated
by what it is they are designed to explore. As indicated on the left side
of the tree, there are five such purposes: finding underlying variables,
exploring similar subject performance, determining group membership,
exploring the existence of a scale, and investigating the existence
of causal relationships.
FIGURE 6
Decision Tree for Exploratory Statistics
When searching out underlying variables (mostly among interval
scales) that are linearly related, principal components analysis and
factor analysis would be appropriate. However, if the relationships are
multidimensional rather than linear, multidimensional scaling would
be more appropriate. If similar subject performance across a number of
interval scales is the issue and the relationships are linear in nature,
cluster analysis would often be applied. In situations where group mem-
bership is of interest on interval scales and there is one attribute of
interest, discriminant analysis would be applied. If two or more attri-
butes are involved, n-way discriminant analysis could be used. To
investigate the existence of a scale on a nominal variable, Guttman scaling
would be used. Finally, causal relationships can be explored for interval
scales by using path analysis, and for nominal scales by using loglinear
path analysis.
PRACTICE READING STATISTICAL TABLES AND INTERPRETING STATISTICS
Tables are a common way of summarizing a great deal of statistical
information in a small amount of space. Unfortunately, since some
readers are intimidated by tables, they may simply skip over them. In
my view, that strategy is unfortunate because readers miss the heart of
the statistical argument being presented and lose an opportunity to
read critically. The main point in facing tables is that the reader should
systematically set about the task of figuring out what the tables mean.
The following three basic steps might prove useful in systematically
interpreting numerical tables:
1. Examine the table.
   a. Figure out the column labels and sublabels.
   b. Figure out the row labels and sublabels.
   c. Figure out the statistical abbreviations.
   d. Figure out the variable abbreviations.
2. Identify the statistics of interest in the study (this will differ from
   statistic to statistic; see more specific discussions in the following
   four sections).
3. Interpret the results.
   a. Check if the statistics chosen were the appropriate ones given the
      number and types of variables, whether they were independently
      measured, and so forth (see Tables 1 and 2, and Figures 3–6).
   b. Look to see if the researcher has checked the assumptions underlying
      the statistical analysis chosen for the study (see Table 3).
   c. Analyze whether the statistically significant differences are
      meaningful in view of the descriptive statistics that are reported
      in the study.
These steps will be discussed below as they apply to tables used
to report descriptive statistics, mean comparisons, correlations, and
frequency analyses. The same general strategies will work for any type
of table including those for the exploratory statistics listed in Table 2
and Figure 6.
Descriptive Statistics
In any table, readers should begin by determining what the column
and row labels mean. To do so, they should work from the outside edges
of the table inward. For instance, consider Table 4. First, there is the
general class of labels given across the top of the table for the columns;
these labels delineate the columns for Student type. Second, under-
neath that overall Student type label, there are three subcategories:
ENG and ESL essay scores for the two types of students (i.e., from the
English and ESL departments) and another category that combines
the English and ESL types of students (Essay scores combined). Third,
under each of the three subcategories, there are three statistics labeled
in columns: the number of students, the mean, and the standard
deviation, or n, M, and SD. At the far left in Table 4, there is one other
label, Rater faculty, which identifies the general category for the three
subcategories represented in each row of the table: ENG, ESL, and
Faculties combined.
TABLE 4
Descriptive Statistics for Student Type and Rater Faculty

                              Student type
            ENG 100 essay       ESL 100 essay       Essay scores
               scores              scores             combined
Rater
faculty      n     M     SD      n     M     SD      n     M     SD
ENG        112   2.46  1.11    112   2.30   .77    224   2.38   .96
ESL        112   2.37  1.16    112   2.31   .96    224   2.34  1.06
Faculties
combined   224   2.42  1.14    224   2.30   .87    448   2.36  1.01

Note. From Brown, 1991, p. 593.
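One way to confirm that the rows and columns of Table 4 have been read correctly is to recompute a combined mean from the cell values. The sketch below does so for the two rater-faculty rows and for the grand mean. The function name is hypothetical, and small discrepancies can arise because the cell means in the table are already rounded.

```python
def combined_mean(groups):
    """Weighted mean of (n, M) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# (n, M) pairs copied from the first two column groups of Table 4
eng_raters = [(112, 2.46), (112, 2.30)]
esl_raters = [(112, 2.37), (112, 2.31)]

print(round(combined_mean(eng_raters), 2))               # compare with 2.38
print(round(combined_mean(esl_raters), 2))               # compare with 2.34
print(round(combined_mean(eng_raters + esl_raters), 2))  # compare with 2.36
```

The combined standard deviations cannot be checked this simply, because they depend on the spread of the individual scores as well as on the cell values.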
After figuring out the column labels and sublabels as well as the row
labels and sublabels, it is often necessary to decipher the abbreviations
that the author has used. Typically, there are two types of abbrevia-
tions: statistical abbreviations and abbreviations for variables.
Deciphering statistical abbreviations usually involves looking at the
table title, table footnotes, or accompanying prose description in order
to figure out what the abbreviations symbolize, but it is the author’s
responsibility to provide the reader with that information somewhere.
In the case of Table 4, n is for number of essays, M is for mean, and
SD is for standard deviation. These abbreviations are fairly common,
but may vary from journal to journal.
In deciphering abbreviations for variables, experience will not help
as much. Since the variables differ from study to study, there is no
standardization at all. Readers need to figure out what the author
intended with each abbreviation by referring to the table title, any table
footnotes, and the prose description associated with that table. Effective
authors will make all such abbreviations clear. For instance, the author
of the article from which Table 4 was drawn provided prose descrip-
tions of the differences between the ENG and ESL groupings of stu-
dents’ essays and faculty raters, both of which are referred to in Table
4. However, readers will have to decide for themselves if those prose
explanations were adequate. If readers are still confused by the abbre-
viations after checking the prose descriptions, the author has not been
sufficiently clear.
Only after having identified the rows and columns and the abbrevia-
tions associated with them is it possible to interpret the numbers that
are found in the table. If readers would like to compare their interpretations
of Table 4 with mine, they will find at least a portion of mine
in the section above on the assumption of normality of the distributions
and a fuller discussion in the original article (Brown, 1991).
Table 5 provides another slightly different approach to displaying
descriptive statistics. Starting with the column labels, readers will find
that there is only one level of labels and that the two to the right are
for two different groups: one made up of the students from a SLEP
sample and another from the original Norm sample. Notice that the
table is laid out so that the performance of the two groups can easily
be compared on the basis of statistics which are labeled in a Statistic
column on the left side of the table. In comparing Tables 4 and 5,
readers should notice four things: (a) that Table 5 is relatively easy to
read because it has comparatively few levels of labels and fewer num-
bers, (b) that Table 5 has the statistics arranged in rows rather than in
the columns that served the same purpose in Table 4, (c) that additional
statistics are given in Table 5, and (d) that the labels are slightly differ-
ent for those statistics.

TABLE 5
Summary Descriptive Statistics (HSTECa Scores) for Norm and
SLEP Samples for Ninth Grade Only
Correlational Analyses
Often when correlation coefficients are summarized in statistical
studies, they are presented in what are called correlation matrices. A
correlation matrix is just a grid made up of correlation coefficients.
Table 6 illustrates some of the typical characteristics of a correlation
matrix. For each correlation coefficient, the essential question that
readers must ask themselves is, What is being correlated with what?
For instance, consider the .36 correlation shown in the upper-right
corner of Table 6. This statistic represents the correlation between
what and what? To figure that out, readers must once again work from
the outside inward by using the column and row labels to decipher
each particular statistic. For instance, in the case of the .36, the column
label is ESL/B and the row label is ENG/A. Looking to the table title,
which includes the phrase Rater Groups and the Interrater correlation
label within the table, it appears that the .36 correlation coefficient
represents the relationship between the scores assigned by the B group
of ESL raters and the A group of English department raters. The same
process can be repeated for each of the statistics, and, with practice, it
should become relatively easy.
TABLE 6
Correlation Matrix for Rater Groups
There are several other things that readers should notice about the
correlation matrix shown in Table 6. First there is a series of 1.00
correlations that run diagonally across the table. These 1.00 correla-
tions result because they represent the correlation between each vari-
able and itself. Readers can use the diagonal to help find and keep
their place in a correlation matrix. Note also that often correlation
coefficients are given only below the diagonal (or only above, as in
Table 6) because some researchers feel that presenting them both
above and below the diagonal would be redundant: It would mean
presenting two sets of the same numbers.
Some authors will choose to put the correlation coefficients above
the diagonal and other types of coefficients below the diagonal. Au-
thors will also vary in whether they use the upper portion of the table
or the lower portion. The point is that readers will need to examine
the table, use the diagonal to help orient themselves, and figure out
the meaning of each statistic from the column and row labels.
Readers may also have noticed in Table 6 that there are asterisks
next to each of the coefficients. These asterisks refer to the note just
below the table which in turn indicates that each coefficient with an
asterisk was significant at p < .05.
Table 7, taken from Patkowski (1991), illustrates a way to present
correlation coefficients that is not a matrix. Systematic examination will
reveal that there are three row labels. Looking back to the text associ-
ated with the table, you will find that these three abbreviations (math,
CRAT, and WAT) are for three variables: math scores derived from
a basic skills test, scores on the CUNY Reading Assessment Test, and
scores on the Writing Assessment Test. The column labels are for two
statistics: r for correlation coefficients and p for probability. The next
step is to figure out what variables are involved in each of the correlation
coefficients. It is only by looking at the table title that it becomes
clear that each of these coefficients represents the relationship between
grade point averages and the variable label for the corresponding
row. In other words, the correlation of .344 represents the degree of
relationship between grade point average and students’ math scores,
the .255 represents the degree of relationship between grade point
average and CRAT scores, and so forth.
TABLE 7
Pearson Correlations Between Subjects’ Entering Scores and Grade Point Averages

Independent variables      r       p
Math a (n = 263)         .344    .000
CRAT a (n = 263)         .255    .000
WAT (n = 264)            .169    .006

Note. From Patkowski, 1991, p. 736.
CRAT = CUNY Reading Assessment Test; WAT = Writing Assessment Test.
The p values on the right side indicate the significance level. In the
case of .000, it can most easily be interpreted by adding a 5 on the end
of the number so that, for the .344 correlation coefficient, p can be
interpreted as being less than .0005. Since all of these p values are well
below the .05 level, the three coefficients can be taken to be statistically
significant. Naturally, interpretation of this statistical significance
should be tempered by interpretation of their meaningfulness as well.
As the author put it, “the best predictor of grade point average of ESL
students was performance on the math test; yet, even that correlation
was very modest (an r of .344 only explains 11.8% of the variance).”
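The figure of 11.8% is simply the correlation coefficient squared. The short sketch below repeats that arithmetic for the three coefficients reported in Table 7. The variable names are hypothetical.

```python
# Correlation coefficients copied from Table 7 (from Patkowski, 1991, p. 736)
correlations = {"Math": 0.344, "CRAT": 0.255, "WAT": 0.169}

# Squaring r gives the proportion of variance the two variables share.
shared = {name: r ** 2 for name, r in correlations.items()}

for name, proportion in shared.items():
    print(f"{name}: r = {correlations[name]:.3f}, "
          f"variance explained = {proportion:.1%}")
```

Seen this way, even the strongest of the three statistically significant correlations leaves nearly nine tenths of the variance in grade point average unaccounted for.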
Another more complex matrix of correlation coefficients is pre-
sented in Table 2 of Hudson (1991). Readers may wish to examine that
table in order to practice deciphering it. Readers will notice that (a)
Hudson is presenting two different levels of column and row headings,
(b) there are really nine little matrices in the table, (c) each matrix has
its own diagonal, and (d) the coefficients are below the diagonal in each
matrix.
Group Comparisons
Before discussing tables that present group comparisons, it is necessary
to define a number of terms. The first is main effect, which can be
defined as the overall effect of an independent variable on a dependent
variable. The main effects will normally be discussed in terms of the
statistical significance of a main effect within ANOVA-type designs
that have two or more independent variables. For instance, in an
imaginary study of the effect of motivation and aggressiveness (inde-
pendent variables) on ESL proficiency (dependent variable), the re-
searcher might break a group of 126 students into high and low motiva-
tion groups (on the basis of some measure) and high and low
aggressiveness groups (on the basis of another measure). The result
would be four groups as shown in Figure 7: one group of high motiva-
tion and high aggressiveness (n = 30), one group of high motivation
and low aggressiveness (n = 32), one group of low motivation and high
aggressiveness (n = 31), and one group of low motivation and low
aggressiveness (n = 33). The study might then proceed by giving all
of these students an overall ESL proficiency test and comparing the
proficiency means for the four groups. The researcher would be most
interested in the main effects due to motivation and aggressiveness.
FIGURE 7
Motivation and Aggressiveness Imaginary Study
Essentially, the main effect for motivation is arrived at by comparing
the overall mean of the two groups that are high motivation with the
overall mean of the two groups that are low motivation. In other words,
the combined groups with n = 62 and n = 64 (as shown at the bottom
of Figure 7) would be compared to determine the main effect due to
motivation. In addition, the main effect for aggressiveness is arrived
at by comparing the overall mean of the two groups that are high in
aggressiveness with the overall mean of the two groups that are low in
aggressiveness. In other words, the groups with n = 61 and n = 65 (as
shown to the right of Figure 7) would be compared to determine the
main effect due to aggressiveness.
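The way the four cells collapse into the groups compared for each main effect can be sketched as follows. The cell sizes come from the imaginary study described above. The cell means are hypothetical values that I have invented for illustration, as are the names in the code.

```python
# Cell sizes from the imaginary study, keyed by (motivation, aggressiveness)
cell_n = {("high", "high"): 30, ("high", "low"): 32,
          ("low", "high"): 31, ("low", "low"): 33}

# Hypothetical cell means on the proficiency test (not given in the article)
cell_mean = {("high", "high"): 75.0, ("high", "low"): 70.0,
             ("low", "high"): 60.0, ("low", "low"): 55.0}

def marginal(factor, level):
    """Return (n, weighted mean) for one level of one factor.
    factor 0 = motivation, factor 1 = aggressiveness."""
    keys = [k for k in cell_n if k[factor] == level]
    n = sum(cell_n[k] for k in keys)
    mean = sum(cell_n[k] * cell_mean[k] for k in keys) / n
    return n, mean

for factor, label in ((0, "motivation"), (1, "aggressiveness")):
    for level in ("high", "low"):
        n, m = marginal(factor, level)
        print(f"{label} {level}: n = {n}, mean = {m:.2f}")
```

The combined group sizes reproduce the 62 and 64 for motivation and the 61 and 65 for aggressiveness. Whether the difference between two such overall means is statistically significant is what the ANOVA main effect tests.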
An interaction effect is quite different from the main effects in a study,
yet it is important in interpreting the main effects. Essentially, if there
is a statistically significant interaction effect in a study, it means that the
main effects are not in a straightforward relationship. More precisely, a
significant interaction effect indicates that the effects of one indepen-
dent variable differ at various levels of the other independent variable.
For the example given in Figure 7, it turns out that there is an interac-
tion effect for motivation and aggressiveness, as shown in Figure 8.

FIGURE 8
Interaction of Motivation and Aggressiveness in Imaginary Study
The fact that the lines in Figure 8 cross indicates that the high-
aggressiveness students are low in proficiency if they are low in motiva-
tion, and high in proficiency if they are high in motivation, whereas
the opposite is true for low-aggressiveness students, that is, they are
low in proficiency if they are high in motivation, and high in proficiency
if they are low in motivation (note that in some significant interactions,
the lines will only approach each other, nearly crossing). In other
words, even if there were a significant main effect for either motivation
or aggressiveness, it would be necessary to explain the interaction effect
because the motivation and aggressiveness main effects interact and
the relationship is not simple and straightforward (as it would be if the
lines were parallel and there was no significant interaction effect).
Thus, if there are significant interaction effects in a study, the authors
should explain them using a figure like Figure 8 or a prose explanation
of what they think is going on.
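One informal way to see what an interaction means is to compare the effect of one variable at each level of the other. The sketch below does this with hypothetical cell means (invented for illustration, not taken from any figure) that follow the crossing pattern described above. It shows only the descriptive pattern; it is not a significance test.

```python
# HYPOTHETICAL cell means, keyed by (motivation level, aggressiveness level).
cell_means = {
    ("high", "high"): 75.0,
    ("low", "high"): 55.0,
    ("high", "low"): 60.0,
    ("low", "low"): 70.0,
}


def motivation_effect(aggressiveness):
    """High-motivation mean minus low-motivation mean at one aggressiveness level."""
    return cell_means[("high", aggressiveness)] - cell_means[("low", aggressiveness)]


effect_high_agg = motivation_effect("high")  # 75 - 55
effect_low_agg = motivation_effect("low")    # 60 - 70

# If the two simple effects were equal, the lines would be parallel
# and this difference of differences would be zero.
interaction_contrast = effect_high_agg - effect_low_agg

# Simple effects with opposite signs mean the plotted lines cross.
lines_cross = effect_high_agg * effect_low_agg < 0

print(effect_high_agg, effect_low_agg, interaction_contrast, lines_cross)
```

With these invented means, motivation helps the high-aggressiveness students and hurts the low-aggressiveness students, which is why neither main effect can be interpreted on its own.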
As will be explained below, in many mean comparison tables, statis-
tics will be reported for each main effect and for all possible interaction
effects. Statistics will also be reported for what is usually called the
residual (or error), which is all of the variance not accounted for by the
main effects and interactions.
In the process of reporting the main effects, interactions, and resid-
ual, there may be some differentiation between within-subjects and
between-subjects comparisons. Within-subjects comparisons are those
which are not independent or are repeated (in the terms discussed
above). In contrast, between-subjects comparisons are between groups
that are independent of one another. As discussed earlier, this distinc-
tion is particularly important in differentiating between ordinary mean
comparisons and repeated measures comparisons.
With all of these terms in hand, the following special substeps might
prove useful in identifying the statistics of interest in group comparison
studies: (a) find the main effect(s) and whether they are significant, (b)
find the interaction effect(s) and whether they are significant, and (c)
check for any follow-up statistics that were necessary to investigate
individual pairs of means.
Focusing for the moment on those three steps, consider the one-way
repeated measures ANOVA example presented in Tables 8–10 from
Oh (1992). The researcher wanted to know if there were any significant
differences among the means shown in Table 8. Because there was
only one independent variable, Measures, with four levels (Cloze,
Think-aloud, Comprehension/recall 1, and Comprehension/recall 2),
the researcher began by performing a one-way ANOVA for the four
sets of scores taken together, as shown in Table 9. Since the same
people were assessed on the four measures, the measures were not
independent. Hence a repeated measures design was used, and the
effect for between-measures comparisons was reported as a within-
subjects comparison.
TABLE 8
Descriptive Statistics of CIQ^a Scores

Measure                      M       SD     Range
Comprehension/recall 1     21.72    6.04    15–39
Comprehension/recall 2     24.78    6.80    16–39
Cloze                      33.33    9.87    17–48
Think-aloud                31.11   10.86    18–55

Note. From Oh, 1992, p. 174.
^a CIQ = Cognitive Interference Questionnaire.
The effect for between measures is easy to spot in Table 9, and it
was significant as indicated by the F and p values on the right side of
the table. Since this is a one-way ANOVA, in which there can be no
interaction effects, there is no need to think about interactions here.
The other statistics given in Table 9 were all part of the calculations
that led to the F and p values. To give you just a taste of the relationships
that can exist in such a table, notice that the mean squares (MS) values
are all derived by dividing the sums of squares (SS) values by the
degrees of freedom (df); for example, the MS for between measures is
525.83, or SS ÷ df = 1577.49 ÷ 3 = 525.83. Note also that the F value
is calculated by dividing its MS value by the MS value for the residual.
A full explanation of these relationships is beyond the scope of this
paper. However, it is important to note that, while the F and p values
are of most interest to most readers, some will be interested in the
underlying steps that lead to those values. Hence, all of these statistics
are usually reported.
TABLE 9
Results of Repeated Measures ANOVA for CIQ^a Scores
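The two relationships just described (MS = SS ÷ df, and F = MS for the effect ÷ MS for the residual) are simple enough to check by hand or in a short script. In the sketch below, the between-measures SS and df are the values quoted above; the residual SS and df are hypothetical placeholders chosen only to show the arithmetic, and are not the values reported by Oh (1992).

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / df."""
    return sum_of_squares / degrees_of_freedom


def f_ratio(ms_effect, ms_residual):
    """F = MS for the effect / MS for the residual."""
    return ms_effect / ms_residual


# Values quoted in the text for the between-measures effect.
ms_between = mean_square(1577.49, 3)
print(round(ms_between, 2))  # 525.83

# HYPOTHETICAL residual values, used only to illustrate the F calculation.
ms_residual = mean_square(2040.0, 51)
print(round(f_ratio(ms_between, ms_residual), 2))
```

Readers who have the full table in front of them can substitute the reported residual SS and df and confirm that the result matches the printed F value.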
Next, it is necessary to look for follow-up statistics. Since the ANOVA
reported in Table 9 only tells the researcher that there is at least
one significant difference somewhere among the four means being
compared, follow-up analyses were necessary to determine more pre-
cisely where the difference(s) may be. Oh (1992) chose to use Bonfer-
roni significant difference tests (see Table 10), which are a special type
of t test designed for making multiple comparisons after an overall
significant difference has been found in an ANOVA study. There are
other types of follow-up analyses from which the researcher could have
chosen, including Scheffé, Tukey, Dunn, and orthogonal comparisons. All
of these follow-up statistics are designed to find out which particular
pairs of means are significantly different. In this case, it turned out
that 5 of the 6 possible pairs of differences were statistically significant.
The exception was the cloze versus think-aloud comparison (i.e., CL
vs. TA in Table 10). Recall that all of this indicates that the significant
pairs of mean differences occurred for other than chance reasons with
at least 95% certainty (as shown in the footnote, p < .05). Following
through on Steps 3a–c above should then lead to thinking about
whether the correct statistical analysis was chosen, whether the assumptions
of that statistical analysis were met, and whether the results are
also meaningful in light of the descriptive statistics (i.e., the actual
means involved). Readers might refer directly to Oh (1992) and read
the whole article in order to think about those three issues.
TABLE 10
The Results of Bonferroni Significant Difference Tests
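To see where the figure of 6 possible pairs comes from, the four measures can simply be paired off. The sketch below also shows one common form of the Bonferroni adjustment, in which the overall alpha level is divided by the number of comparisons. That Oh (1992) applied the adjustment in exactly this form is an assumption made here for illustration.

```python
from itertools import combinations

measures = [
    "Cloze",
    "Think-aloud",
    "Comprehension/recall 1",
    "Comprehension/recall 2",
]

# Every unordered pair of the four means.
pairs = list(combinations(measures, 2))
for first, second in pairs:
    print(f"{first} vs. {second}")

# One common Bonferroni adjustment (an assumption about the procedure):
# split the overall alpha evenly across all pairwise comparisons.
family_alpha = 0.05
per_comparison_alpha = family_alpha / len(pairs)
print(len(pairs), round(per_comparison_alpha, 4))
```

Four means always yield six pairs, which is why follow-up tests multiply quickly as the number of levels grows.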
More complex designs are just more elaborate variations of the same
issues. With patience and the step-by-step approach advocated here,
any statistical table can be understood (as long as it makes sense to start
with). For example, Table 11 (from Chiang & Dunkel, 1992) is a fairly
complicated-looking four-way repeated measures ANOVA design (or
ANOVR in their table title) because it has four independent variables,
one of which is repeated. To begin, identify the main effects of interest.
In this case, Table 11 indicates that Prior knowledge (PK), Speech
modification (SM), and Listening proficiency (LP) are independently
sampled independent variables because they fall under the Between-subjects
heading, and Test-item type (TT) is a repeated measure because
it falls under the Within-subjects heading. It turns out that each
of these main effects was statistically significant at p < .001, as indicated
by the double asterisks and the note below the table.
TABLE 11
ANOVR Summary of Postlecture Comprehension Test Scores for Variables of Prior
Knowledge, Speech Modification, Listening Proficiency, and Test-Item Type
Source MS df F
The next step involves examining the interaction effects. Table 11
gives all possible combinations of the four variables (e.g., PK × SM, PK
× LP, etc.) and shows the tests for significant interactions. It turns out
that only two of these interactions, SM × LP and PK × TT, were
significant. The authors very properly discuss in detail those significant
interactions including several figures showing how the lines approach
each other, along with prose descriptions of their interpretation of
what was going on.
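Readers who want to check that a table really does list all possible interactions can enumerate them. The following sketch uses the four abbreviations from Table 11; the variable names in the code are my own.

```python
from itertools import combinations

factors = ["PK", "SM", "LP", "TT"]

# Every combination of two, three, or four factors is a possible interaction.
interactions = [
    " x ".join(combo)
    for size in range(2, len(factors) + 1)
    for combo in combinations(factors, size)
]

for term in interactions:
    print(term)
print(len(interactions))  # 11
```

A four-way design therefore has six two-way, four three-way, and one four-way interaction, in addition to its four main effects.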
The final step involves examining any follow-up procedures. In this
study, which had only two levels of each variable, the mean comparisons
for main effects give the only possible comparison within each independent
variable. Thus follow-up procedures were unnecessary.
Frequency Comparisons
The same general steps apply to reading tables of frequency comparisons
such as those shown in Table 12. At this point, readers should be
able to interpret Table 12 on their own by applying the same steps that
were used above.
My interpretation of the table is presented in Part 1 (p. 580) of this
article and in the source article (Brown, 1991), as well. Readers may
find it interesting to look at those interpretations after having consid-
ered their own. Remember, readers’ carefully formulated interpreta-
tions are just as valid as mine.
CONCLUSION
Learning to read statistical language studies requires a continuing
investment of time and energy. Remember, like ESL or applied linguistics,
research design and statistics are legitimate fields of study leading
to MA or PhD degrees. It may therefore have been a quixotic task that
I set myself in trying to cover so much ground in only two articles.
Nevertheless, these two articles have attempted to provide readers
with tools that will help them gain access to the many statistical studies
that appear in the TESOL Quarterly and other journals in our field. Ten
strategies were suggested that should help readers to become more
comfortable with statistical studies. I hope that some readers have
found that these articles have whetted their appetites to expand their
knowledge of research design and statistics. It is important for as many
people as possible within the field to have a critical understanding of
statistical studies so that a large number of critical readers help ensure
the quality of those studies.
TABLE 12
Overall Best Features Identified by Each Faculty
ACKNOWLEDGMENTS
I would like to thank Andrew F. Siegel and Ann Wennerstrom for their insightful
comments and suggestions on an earlier version of this paper.
THE AUTHOR
J. D. Brown is on the faculty of the Department of ESL at the University of Hawaii
at Manoa. He has published numerous articles on language testing and curriculum
development, and a book on critically reading statistical studies (Understanding
Research in Second Language Learning, 1988, Cambridge University Press).
REFERENCES
Brown, J. D. (1988). Understanding research in second language learning: A teacher’s
guide to statistics and research design. Cambridge: Cambridge University Press.
Brown, J. D. (1990). The use of multiple t tests in language research. TESOL
Quarterly, 24 (4), 770–773.
Brown, J. D. (1991). Do English and ESL faculty rate writing samples differently?
TESOL Quarterly, 25 (4), 587–603.
Brown, J. D. (1992). Identifying ESL students at linguistic risk on a state minimal
competency test. TESOL Quarterly, 26 (1), 167–172.
Butler, C. (1985). Statistics in linguistics. Oxford: Blackwell.
Chiang, C. S., & Dunkel, P. (1992). The effect of speech modification, prior
knowledge, and listening proficiency on EFL lecture learning. TESOL Quarterly,
26 (2), 345–374.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: Wiley.
Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education
(5th ed.). New York: McGraw-Hill.
Hatch, E., & Farhady, H. (1982). Research design and statistics for applied linguistics.
Rowley, MA: Newbury House.
Hatch, E., & Lazaraton, A. (1991). The research manual: Design and statistics for applied
linguistics. New York: Newbury House.
Hudson, T. (1991). A content comprehension approach to reading English for
science and technology. TESOL Quarterly, 25 (1), 77–121.
Oh, J. (1992). The effects of L2 reading assessment methods on anxiety level.
TESOL Quarterly, 26 (1), 172–176.
Patkowski, M. S. (1991). Basic skills tests and academic success of ESL college
students. TESOL Quarterly, 25 (4), 735–738.
Seliger, H. W., & Shohamy, E. (1989). Second language research methods. Oxford:
Oxford University Press.
Shavelson, R. (1981). Statistical reasoning for the behavioral sciences. Boston: Allyn
and Bacon.
Siegel, A. F. (1990). Multiple t tests: Some practical considerations. TESOL Quarterly,
24 (4), 773–775.
Tabachnick, B. G., & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New
York: HarperCollins.
Woods, A., Fletcher, P., & Hughes, A. (1986). Statistics in language studies. Cam-
bridge: Cambridge University Press.

Demystifying the TOEFL®
Reading Test
BONNY NORTON PEIRCE
Ontario Institute for Studies in Education
Despite the growing international influence of the TOEFL (Test of
English as a Foreign Language), no articles have been published on
how the test is actually developed by the Educational Testing Service
(ETS). In this article, the author, who worked in the Test Develop-
ment department at ETS from 1984 to 1987, seeks to demystify the
TOEFL reading test at both a descriptive and theoretical level. First,
the author draws on data from a case study of a reading test she
developed in 1986 to illustrate the technical rigor with which the test
is developed, and to raise questions about its theoretical adequacy.
Second, the author draws on the theory of genre proposed by Kress
(1989, 1991) to (a) illustrate how the unequal relationship between test
makers and test takers predisposes TOEFL candidates to a particular
reading of TOEFL texts; and (b) locate the TOEFL reading test
within the larger social context of the TOEFL internationally, where
competence in English means access to power. The author concludes
that the TOEFL–2000 test development team at ETS, which is currently
reviewing the test, needs to address the washback effect of the
test in consultation with both ESOL teachers and TOEFL candidates
internationally.
The global influence of the TOEFL (Test of English as a Foreign
Language) is increasing at a rapid rate. In the 1988–1989 administration
year, 566,000 candidates registered to take the TOEFL; in
1989–1990 this figure jumped to 675,000, climbing again in 1990–
1991 to 741,000 (Educational Testing Service [ETS], 1990, 1991a,
1992). To date, however, there are no published articles on the way
the TOEFL is actually created by test developers at the Educational
Testing Service (ETS) in Princeton, New Jersey, U.S.A. Current publi-
cations on the TOEFL describe its history (Spolsky, 1990); compare it
to other major ESOL tests (Bachman, Vanniarajan, & Lynch, 1988);
or present research on a wide variety of questions, including
the context bias of the TOEFL (Angoff, 1989), the TOEFL from
a communicative point of view (Duran, Canale, Penfield, Stansfield, &
Liskin-Gasparro, 1985), and TOEFL examinee characteristics (Wilson,
1982). However, as Raimes (1992) has suggested, research of this nature
has not succeeded in demystifying the TOEFL for many TOEFL
candidates and TESOL professionals, nor has it addressed more basic
assumptions about what the TOEFL actually tests, why, and how. Given
that the TOEFL is currently undergoing review as part of the
TOEFL–2000 project (Chyn, DeVincenzi, Ross, & Webster, 1992; ETS,
1991), a close examination of TOEFL test development procedures is
timely.
® TOEFL is a registered trademark of Educational Testing Service.
In this paper, I wish to demystify the TOEFL reading test at a
descriptive and theoretical level by drawing on my practical experience
in the Test Development department at ETS from 1984 to 1987 and by
drawing on theoretical insights from recent research in genre analysis. I
will begin the paper with a brief description of the TOEFL as a whole,
introduce some basic terminology used in psychometric testing, and
describe the procedures I followed in the development of the TOEFL
reading test. Thereafter, I will use a passage I assembled, reviewed,
and pretested for a particular TOEFL reading test to illustrate how
these test development procedures are put into practice and how the
statistical analysis of individual items is used in the test development
process. Next, I will critically examine some of the assumptions
I brought to the test development of the reading test, focusing on
notions of authenticity, background knowledge, and test validity. I will
then present the argument that a standardized reading test is best
understood as a specific genre which presupposes an unequal relation-
ship between test makers and test takers within the context of larger
and frequently inequitable social structures. In this view, such a rela-
tionship has a significant impact on the social meaning of texts consti-
tuted within this genre. I will use these insights from genre analysis to
help explain my case study data as well as to locate the TOEFL reading
test within the larger social context of the TOEFL internationally. I will
frame my concluding remarks with reference to possible innovations in
TOEFL test development.
THE TOEFL
The TOEFL, first developed in 1963, is used to assess the English
proficiency of candidates whose native language is not English, and
scores are used by more than 2,400 colleges and universities in the
United States and Canada to determine whether a candidate’s level of
proficiency in English is acceptable for the institution in question; the
TOEFL is also used by institutions in other countries where English is
the medium of instruction (ETS, 1992). ETS does not determine pass-
ing or failing scores; the decision on which students are accepted by a
particular institution is dependent on the policy makers of each individ-
ual institution. Policy varies from institution to institution, often de-
pending on the kind of program a student has applied for and whether
the institution offers supplementary courses in English. The test is
administered by ETS on a monthly basis in approximately 1,250 test
centers in 170 countries around the world (ETS, 1992). Any given
TOEFL form is used only once, and the Test Development staff at
ETS produces a new TOEFL form for each monthly administration.
The test itself has a multiple-choice format and is divided into three
sections: Section 1, Listening Comprehension; Section 2, Structure and
Written Expression; Section 3, Vocabulary and Reading Comprehen-
sion. The TOEFL Test of Written English, a short essay test, is included
in five TOEFL administrations a year. The TOEFL Policy Council
comprises a Committee of Examiners, a Research Committee, and a
Services Committee, each responsible for a different area of program
activity.
A short description of the pretesting process in the TOEFL helps to
explain how one form of the TOEFL is made equivalent to another
form. All TOEFL questions (items) are pretested on a sample TOEFL
population. The experimental or pretest items are inserted into what
is called the final form of a TOEFL. The final form contains all the
items that have already gone through the pretesting process and been
approved for use in a TOEFL administration. TOEFL candidates are
tested on the inserted pretest items in the same way that they are tested
on the final form items. (Candidates do not know which items are being
pretested.) The pretest items are scored alongside the final form items,
but the results on the pretest items are not calculated into the sample
population’s final TOEFL score. An item analysis is then conducted on
each of the pretested items. The purpose of the item analysis is to
determine the level of difficulty of each item as well as its discriminating
ability; it also helps to pinpoint potential problems with the item. When
the pretested items are ultimately used in a TOEFL final form, the
statistics from the pretested items help a test developer assemble a final
form with a level of difficulty equivalent to that of previous TOEFL
tests. Items that have little discriminating ability are revised or dis-
carded.
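The two properties examined in an item analysis can be illustrated with a small classical example. The sketch below treats difficulty as the proportion of candidates answering the item correctly and discrimination as the point-biserial correlation between the item score and the candidate's total score on the final form items. These are common textbook indices and are assumed here for illustration; this section does not specify which statistics ETS actually computes, and the data are invented.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of candidates who answered the item correctly (0/1 scores)."""
    return mean(item_scores)


def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation: Pearson r between a 0/1 item and a total score."""
    item_mean, total_mean = mean(item_scores), mean(total_scores)
    item_sd, total_sd = pstdev(item_scores), pstdev(total_scores)
    if item_sd == 0 or total_sd == 0:
        return 0.0  # everyone answered alike, so the item cannot discriminate
    covariance = mean(
        (item - item_mean) * (total - total_mean)
        for item, total in zip(item_scores, total_scores)
    )
    return covariance / (item_sd * total_sd)


# INVENTED data: one pretest item and final form totals for eight candidates.
pretest_item = [1, 1, 1, 0, 0, 1, 0, 0]
final_form_totals = [55, 52, 48, 40, 35, 50, 30, 28]

print(item_difficulty(pretest_item))
print(round(item_discrimination(pretest_item, final_form_totals), 2))
```

Because pretest items do not count toward the candidate's score, the total used as the criterion here excludes the pretest item itself. An item that nearly everyone answers the same way, or whose correlation with the total is near zero, would be a candidate for revision or removal.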
THE LANGUAGE OF MULTIPLE-CHOICE ITEMS
During the course of this paper, I will be using a number of psychometric
terms and wish to introduce this vocabulary at the outset of the
discussion. Consider the following question:
The answer to a multiple-choice question is referred to as
(A) an item
(B) a distracter
(C) an option
(D) a key
Choices A, B, C, and D are all referred to as options. The answer to
the question, D in this case, is referred to as the key. The incorrect
answers, A, B, and C in this case, are referred to as distracters. The
question itself (The answer to a multiple-choice question is referred to as) is
called the stem. The stem plus options are collectively referred to as the
item. A reading comprehension passage combined with a number of
items is referred to as a set.
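For readers who find it easier to see vocabulary as a structure, the terms just introduced can be arranged as follows. The class and field names are my own illustrative choices and do not correspond to any ETS system.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Item:
    """A stem plus its options; exactly one option is the key."""

    stem: str
    options: Dict[str, str]  # option letter -> option text
    key: str                 # letter of the correct option

    def __post_init__(self) -> None:
        if self.key not in self.options:
            raise ValueError("the key must be one of the options")

    @property
    def distracters(self) -> Dict[str, str]:
        """All options other than the key."""
        return {
            letter: text
            for letter, text in self.options.items()
            if letter != self.key
        }


@dataclass(frozen=True)
class ReadingSet:
    """A reading passage combined with a number of items."""

    passage: str
    items: Tuple[Item, ...]


example = Item(
    stem="The answer to a multiple-choice question is referred to as",
    options={"A": "an item", "B": "a distracter", "C": "an option", "D": "a key"},
    key="D",
)
print(sorted(example.distracters))  # ['A', 'B', 'C']
```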
THE TOEFL READING COMPREHENSION SECTION
Because the data on which I base my discussion is drawn from the
reading section of the TOEFL, it is necessary to describe this section
of the test in some detail. In Section 3 of the TOEFL, the Vocabulary
and Reading Comprehension section, there are 30 vocabulary items
and 30 reading comprehension items, and candidates are given 45 min
to complete the section. When pretested items are included in the test,
the number of items in the section increases to 90, and the time limits
are modified. While all the vocabulary questions are discrete (individ-
ual) items, the reading comprehension section is divided into about
five reading comprehension passages with approximately six items per
reading passage. The five passages span a variety of disciplines—from
passages with a focus in the humanities to passages with a more scien-
tific focus. The TOEFL Bulletin of Information for TOEFL/TWE and
TSE, 1992–1993 (ETS, 1992) states that the Vocabulary and Reading
Comprehension section of the test “measures ability to understand
nontechnical reading matter” (p. 3) in standard written English. Candi-
dates are required to answer the multiple-choice questions on the basis
of what is “stated” or “implied” in each of the passages (p. 21), and they
must choose what they consider the best of the four options provided in
each item. The construct of reading that is measured in the TOEFL
reading test is not made explicit in the ETS literature.
The passages that are chosen for the reading comprehension section
are expository texts that have been drawn from academic magazines,
books, newspapers, and encyclopedias; they are not written specifically
for the TOEFL. To preserve the original quality of the texts, test
developers are discouraged from changing the author’s words, al-
though deletions are permitted. The rationale for such a policy is
that TOEFL candidates should be exposed to what is called authentic
language used by a variety of writers and not a customized “TOEFL
English.” However, passages that are potentially offensive are excluded
from the test. A potentially offensive passage (called sensitive at ETS)
is difficult to define and, in fact, a subject of much debate amongst
members of the TOEFL test development team. At ETS, topics on
politics, religion, or sex were considered to be sensitive topics. One of
the main criteria whereby a passage was judged to be sensitive was its
potential to create unnecessary anxiety for some candidates which
would in turn compromise their performance on the test. A passage
on birth control or abortion, for example, might create emotional stress
for some candidates and lead to poor test performance. Sexist language
(such as the use of the generic he) might be both offensive and ambigu-
ous. In addition, topics that deal with a country other than a North
American country might be perceived as giving unfair advantage to
candidates who have background knowledge from the country in ques-
tion. For example, a topic on the coffee industry in Brazil might be
perceived by candidates from other parts of the world as giving unfair
advantage to South Americans. It might also cause confusion amongst
those South Americans who have a different understanding of the
coffee industry from that of the author of the given text.
THE TEST DEVELOPMENT PROCESS
Because a new TOEFL form has to be produced each month, ETS
trains private individuals outside ETS to perform the first step of the
test development process. These individuals, known as item writers,
are given assignments to find a variety of passages of appropriate
length and to develop approximately six or seven items based on each
passage. Item writers are given detailed test specifications to facilitate
this process. The completed assignments are forwarded to ETS where
a member of the test development team takes responsibility for con-
verting the passage and items into a publishable pretest set. This was
one of the functions I performed in the Test Development department
at ETS. In the following paragraphs I will describe the procedures I
followed in the development of items for the TOEFL reading test.
While many of these procedures are standard practice at ETS, they
are not unique to that institution (see Madsen, 1983).
When I was given an item writer’s submission to develop for pre-
testing purposes, I did not initially refer to the questions that had been
developed by the item writer because I wanted to explore my own
response to the text before being influenced by the questions that the
item writer had submitted. It was only after I had completed my own
analysis of the text and created my own questions that I would refer
to the item writer’s submissions and proceed with the revision process.
First, I adopted the position of a “reader” rather than a test developer
in the initial stages of test development. I asked myself whether the
text was interesting and held my attention. Where my concentration
lapsed or I found myself rereading a portion of the text for clarifica-
tion, I recorded my observations. Where there were stylistic shifts
in the text, interesting use of metaphor or analogy, inferences and
comparisons, contradictions or ambiguities, I made a note. At this
initial stage I did not refer to the test specifications. I found that if I
tried to rework the text and test items to suit the test specifications, I
lost my reader response to the text. It is perhaps gratuitous to state
that TOEFL candidates don’t know what the test specifications are.
Thus I tried to preserve, for as long as possible, my initial responses
to the text. This enabled me to detect interesting nuances in the text
as well as ambiguities and potential difficulties.
Second, once I had responded to the text as a reader, I began to
examine the text as a test developer. One of the assumptions I made
as I developed the items is that a reader’s understanding of the text
does not terminate once the reader has read the passage once or
twice. I assumed that the longer readers work with a text, the more
comprehensive their understanding of the text becomes. Indeed, a
candidate’s understanding of the test questions themselves also draws
on the candidate’s reading ability. Thus, with respect to the assessment
of reading ability, there is an artificial distinction between the reading
passage and the items, and I was sensitive to the fact that the test
questions are an integral part of the process of reading comprehension.
The main principles I followed when developing the items are summa-
rized below.
Use the Candidates’ Time Efficiently
Given the fact that the candidates have only 45 min for Section 3 of
the TOEFL, which includes a vocabulary and reading section, and
consequently less than 30 min for the reading section, one of my
primary concerns as a test developer was to ensure that I used the
candidates’ time efficiently. I tried to ensure that there were as many
items in a set as the passage could sustain; content that added little to
the coherence of the passage and was not used for testing purposes
was deleted from the text. I assumed it would be frustrating for candi-
dates to grapple with portions of text and then have little opportunity
to demonstrate their comprehension of this content. Furthermore, I
tried to use closed rather than open stems. If the stem is open, that is,
if there is no question mark at the end of the stem to consolidate the
thought expressed in the stem, candidates have to repeat the stem each
time an option is read in order to follow the grammatical logic of the
option. This is time-consuming and a burden on memory.
Help the Candidates Orient Themselves to the Text
Because all TOEFL reading passages have been removed from one
context and transplanted in another context, I tried to help the candi-
dates orient themselves to the content of the passages being tested.
Because TOEFL passages do not have titles, I tried to ensure that the
first item in a reading comprehension set addressed the main idea or
subject matter of the passage. I hoped this would give candidates an
organizing principle to help them in their attempts to develop a better
understanding of more detailed aspects of the text. I was aware, how-
ever, that many texts do not have a main idea at all—they may be
descriptive or narrative with little coherent argument as such. In such
cases, a stem like What does the passage mainly discuss? would be preferable
to What is the main idea of the passage? Furthermore, I assumed it would
be generally helpful to candidates if the order of items in the set
followed the order of information in the text itself. This would enable
candidates to locate tested information with relative ease and enable
them to build on their understanding of “old” or “given” information
in the text. I used line references as much as possible, provided that
they did not compromise the intent of the item (e.g., a scanning item).
I was aware, however, that items which address the prevalent tone of
the passage cannot always be directly associated with a particular line
or sentence in a passage and are best left to the end of the item set.
Make Sure the Items Are Defensible
There are two issues that pertain to the defensibility of items: the
items in combination and the items individually. With reference to the
items in combination, I tried to ensure that the items did justice to the
content and level of difficulty of the text. This is where the art of test
development was central to the test development process. I had to use
judgment and imagination to assess interesting (and uninteresting)
characteristics of the passage and develop items that gave the candi-
dates sufficient opportunity to demonstrate their understanding of
these characteristics. For example, if the text was detailed and complex,
I did not wish to underestimate the candidates’ reading ability by asking
trivial questions. In addition, I knew the same information from the
passage could not be tested twice—albeit in different forms. To do so
would place some candidates in double jeopardy. Conversely, the stem
in one item could not reveal the answer to a question in another item.
For example, if the key to a particular item was Gold is expensive, then
another item in the same set could not be worded, Gold is expensive
because . . . .
With reference to individual items, I had to ensure that the stem and
key of each item were unambiguous. Each stem had to contain as much
information as was necessary and sufficient to answer the question
posed, and each item had to have only one correct key. In addition,
the options in any one item could not logically overlap in meaning. For
example, if the key to an item were The region suffered severe drought and
one of the distracters in the item were The climatic conditions were harsh,
there would be logical overlap between the two options as the key would
be encompassed within the distracter. This would create ambiguity and
confusion for the candidates and present the candidates with two
potential keys. Nevertheless, the distracters in an item needed to have
some link to information in the text and they had to be plausible.
Implausible distracters would be eliminated by candidates and lead
them to choose the correct key by default. The distracters, however,
could not be keyable. By this I mean that the distracters could not,
potentially, be correct. This is an area that caused considerable debate
in the test development process. Was a distracter drawing candidates
because it was a good distracter or because it was ambiguous or poten-
tially keyable? Furthermore, the item could not be keyable without
reference to the passage; that is, it could not be keyable with reference
to general background knowledge. Finally, no one option could stand
out as being structurally or stylistically different from the other options.
Thus if I put a definite article the in front of key in the example given
earlier, Option D would stand out as different from the other options,
which are preceded by indefinite articles. This could attract undue
attention from candidates, who might (correctly) key it by default.
THE REVIEW PROCESS
Because it is not possible for one person to offer a definitive reading
of a text or avoid all the potential problems associated with test develop-
ment, a comprehensive review process has been developed at ETS.
There are two cornerstones of the review process: first, a series of test
reviews by approximately six different test development specialists;
second, a pretesting process as described earlier in this paper. After
the test developer is satisfied that the pretest has been adequately
prepared, the test goes for a test specialist review (TSR). The test
specialist reviewer (also TSR) is a member of the TOEFL test develop-
ment team; indeed, all test developers are reviewers and all reviewers
are test developers. The passage and items are systematically reviewed
by the TSR, who is simultaneously “taking” the test and reviewing it.
The reviewer notes comments in a memo and returns it to the test
developer. The test developer then works through these comments,
makes appropriate changes to the items, and then arranges a meeting
to discuss the review with the TSR. The test developer has to defend
the action taken with respect to the TSR’s suggestions.
After the test has gone through the TSR stage, it goes to the TOEFL
coordinator who examines all the items again, two editors who focus
on stylistic problems in the test, and a sensitivity reviewer who seeks to
eliminate any potentially offensive material in the test. At each of these
stages, the test is returned to the test developer for discussion and
revision. During this entire process, the history of each item can be
located by any one reviewer because all the reviews are kept in a folder
until the test is ready for publication. Once galleys of the test have been
made, it is then returned to the Test Development department for a
final review before it is published in a TOEFL test booklet.
Each member of the team had her or his own style of reviewing.
When I reviewed the test of another test developer, I had to make a
decision about those items that I thought were acceptable, those that
were definitely not acceptable, and those that had minor flaws. Thus
reviewing a test could be quite a delicate process. On the one hand, I
felt a responsibility to help create as defensible a test as possible;
on the other hand, I didn’t want to be unreasonable as this would
compromise my efforts to defend those comments I felt strongly about.
I was particularly concerned about items that I thought were ambigu-
ous, had more than one potential key, or perhaps no clear key at
all. I felt less strongly about items that were stylistically weak or had
implausible distracters. If a test developer and reviewer could not
resolve a problem that each felt strongly about, the issue was referred
to a more senior member of the team for arbitration.
A TOEFL READING COMPREHENSION CASE STUDY
In order to illustrate the debates that arose in the test development
process, and to contextualize the comments that I will make in the
latter part of this paper, I will draw on the experience I went through
as I developed one particular reading comprehension pretest for the
TOEFL in 1986. There is nothing special about the passage and the
items; they were chosen at random from a number of passages that I
had developed, in collaboration with my colleagues, and which had
gone through the pretesting and statistical analysis stages. If I had not
chosen to use the passage and items for case study purposes, they
would have been revised for the last stage of the test development
process: the final form. When I worked on the passage and the items,
I had not anticipated that they would be used for the purposes of this
exposition. Fortunately, however, I was able to locate the history of all
the reviews in the ETS archives.
The passage that I have chosen to illustrate the above discussion is
one that examines the farming of corn in the Middle West United
States. Space does not permit a full discussion of all the items in the
TSR and later reviews (though these can be found in Peirce, in press).
I have chosen two items for the purposes of illustration and discussion
because each raises a number of interesting theoretical issues about the
nature of standardized reading tests.
The Passage
Running a farm in the Middle West today is likely to be a very expensive
operation. This is particularly true in the Corn Belt, where the corn that
fattens the bulk of the country’s livestock is grown. The heart of the Corn
Belt is in Iowa, Illinois, and Indiana, and it spreads into the neighboring
(5) states as well. The soil is extremely fertile, the rainfall is abundant and well
distributed among the seasons, and there is a long, warm growing season.
All this makes the land extremely valuable, twice as valuable, in fact, as the
average farmland in the United States. When one adds to the cost of the
land the cost of livestock, seed, buildings, machinery, fuel, and fertilizer,
(10) farming becomes a very expensive operation. Therefore many farmers are
tenants and much of the land is owned by banks, insurance companies, or
wealthy business people. These owners rent the land out to farmers, who
generally provide machinery and labor. Some farms operate on contract to
milling companies or meat-packing houses. Some large farms are actually
(15) owned by these industries. The companies buy up farms, put in managers
to run them, provide the machinery to farm them, and take the produce
for their own use. Machinery is often equipped with electric lighting to
permit round-the-clock operation.
In general, all the reviewers found the passage to be acceptable for
the purposes of the TOEFL. The only minor change took place when
businessmen was changed to the current business people (Line 12) in
keeping with a policy that encourages nonsexist language. The TSR
did raise the following two issues, and then proceeded to resolve them:
1. Line 1—Do we need to say Middle West U. S.? Guess U.S. is in line 9 . . .
Really seems like there should be a paragraph cut-off somewhere, but I
guess that’s authentic language for you.
In the interests of efficiency, the comments that are made by the
various reviewers at ETS are written in an abbreviated style. Each test
developer soon develops a style that is accessible to other members of
the team. A common abbreviation, however, is S: This is short for I
suggest you do the following. In the first comment, the TSR is concerned
that the introduction to the text does not offer sufficient geographical
context for the reader, but is then satisfied that United States is men-
tioned elsewhere in the text. The second comment indicates that the
reviewer would like to have the paragraph divided up for easier read-
ing, but acknowledges TOEFL policy that discourages editorial
changes in the interests of authenticity.
The Items
Note that the items are numbered in the 70s because they have been
inserted into the final form of a TOEFL Vocabulary and Reading
Comprehension section (normally 60 items) and numbered accord-
ingly. The two items I am discussing are those that were presented to
the TSR and the TOEFL coordinator. The revised items that were
finally published in pretest form during an administration of the
TOEFL are given in the Statistical Analysis section below. The complete
set of pretested items is given in the Appendix.
Item 71 assesses a candidate’s understanding of information that is
not given explicitly in the passage but is strongly implied by the author
in Lines 2 and 3 of the text.
71. It can be inferred from the passage that in the United States corn is
(A) the least expensive food available
(B) used primarily as animal feed
(C) cut only at night
(D) used to treat certain illnesses
Phrases used to introduce this item type would include It can be
inferred from the passage that; The passage supports which of the following
conclusions?; The author implies that. Comment 2 below was written by
the TSR, Comment 3 by the coordinator.
2. 71. B—I think this only refers to Middle West corn. S: “. . . that Middle
West corn is”—(A) only option that doesn’t start w. verb. S: Sold at very
low prices (for (B) could say “grown” instead of “used”)
3. 71. C + D where do they come from?
The issues referred to above relate respectively to the accuracy of
the stem, the stylistic quality of the options, and the suitability of the
distracters. The first comment indicated that the use of United States
was too vague, and I accordingly changed the stem to Middle West. The
second comment indicated that Option A was not stylistically parallel
to the other options. As I examined this option more carefully, I
became conscious of two other problems with the option. The first was
that the word expensive had been used in the previous item—as the
key—and was repeated in Item 75—again as the key. This overlap was
undesirable as it might have attracted undue attention to these options.
In addition, the use of the least made the distracter somewhat implausi-
ble: While it is plausible that corn might be an inexpensive product, it
is far less likely to be the least expensive food available. I was therefore
happy to use the reviewer’s suggested revision. The TSR’s final com-
ment was written in parentheses to indicate that she did not feel
strongly about the comment. Nevertheless, I was happy to change used
to grown since the verb used was already present in Option D.
The coordinator’s query (C + D where do they come from?) was an
abbreviated way of asking how these particular distracters could be
seen as plausible. I could defend Option C on the grounds that ma-
chines were used round-the-clock. It did occur to me however, that the
key to another item (Item 78) was to be revised to at night and because
I wanted to avoid overlap with this item, I proceeded to revise Option
C to cut in the morning. I thought Option D was justified because of the
repeated references to the word operation (Lines 2, 10, 18) which has
a medical connotation. (In retrospect, however, I think the option is a
weak one.) It was only after I had checked the statistics that came back
from the pretesting of this item set that I realized there was a far more
serious flaw in this item than any of the reviews had picked up. This
will be discussed in the Statistical Analysis section below.
Item 75 calls for an understanding of information that is given
explicitly in the passage:
75. According to the passage, a plot of farmland in an area outside the
Corn Belt as compared to a plot of land inside the Corn Belt would
probably be
(A) less expensive
(B) smaller
(C) more fertile
(D) more profitable
The answer to the question is clearly indicated in Lines 7–8 of the
passage, which states that the land inside the Corn Belt is extremely
valuable, twice as valuable in fact, as the average farmland in the United
States. The first comment below was written by the TSR, the second by
the coordinator.
4. 75. A.—(D) inferable, since the land would presumably cost less, hence
less overhead S: easier (or “more mechanized”?) I wonder if (B) isn’t
inferable, too? S: less tiring (?)
5. 75. stem very wordy—any way to simplify?
Significantly, the TSR was as concerned with what could be reason-
ably inferred from the passage as with what was explicitly stated in the
passage. Thus she argued that the key could not be defended simply
on the basis of the opening phrase According to the passage. This is
generally considered a last line of defense, but is best avoided in the
interests of clarity and test quality. As I reflected on the TSR’s com-
ments, I could see why Option D might be construed as inferable and
hence confusing to candidates. I took the suggestion that I should
change the wording to more mechanized. In one of the later reviews,
however, one of the editors took exception to the use of more mechanized,
saying that it was “implausible” to call a plot of farmland mechanized.
I changed the wording again to more desirable. At the time, I did not
agree that Option B could be inferable. Logically, I believed that since
the land outside the Corn Belt was depicted as less valuable—and hence
less expensive—than that inside the Corn Belt, it was likely that the
plots of farmland outside the Corn Belt would be larger and not smaller
than those inside the Corn Belt. I took the position that the distracter
was a good one, rather than an unfair one. I decided, however, that if
another reviewer had a similar problem with Option B, I would change
it. In the final analysis, a statistical analysis would tell me if Option B
had presented problems to otherwise competent readers. In response
to the coordinator’s comment, I did simplify the stem without compro-
mising the clarity of the question. The item that was finally pretested
is presented in the Statistical Analysis section below.
Statistical Analysis
Once the passage and items had passed through all the reviews
and I had adjusted the items where I thought necessary, the test was
pretested in a TOEFL administration (see the Appendix). The results
of the pretests were forwarded to the Statistical Analysis department
at ETS, which completed an item analysis on each item and forwarded
the results to the Test Development department. The work of the test
developers at this stage was to assess the results of the item analyses,
decide which items worked, which needed to be revised, and which
needed to be discarded. How does a test developer know whether an
item has “worked”? In standardized reading tests a successful item is
one that discriminates successfully between “good” and “poor” candi-
dates. The level of difficulty of an item, on the other hand, is a function
of the percentage of candidates who choose the correct key. The latter
statistic is not difficult to compute. However, the test developer needs
to be assured that the relative difficulty of the item is a function of the
relative levels of proficiency of the candidates—as measured by the
test—and not a function of a poorly constructed or ambiguous item.
In order to determine whether an item discriminates successfully
between good and poor candidates, there needs to be a criterion (stan-
dard) by which to judge the item. The criterion that is used in the
TOEFL reading test is the candidates’ performance on Section 3 of the
TOEFL. Thus, for example, an item is considered to have “worked” if
most of the top candidates in the Vocabulary and Reading Comprehen-
sion section get the item right and if candidates who choose the correct
key are not randomly distributed through the sample. If the latter were
the case, the item would have no discriminating power. In order to
determine who the top candidates are, each candidate’s total score on
Section 3 is computed, and candidates are given percentile rankings.
On the basis of these percentile rankings, the total group is then divided
into 5 subgroups, ranging from the top 20% to the bottom 20%. Once
this information is tabulated, the performance of each individual item
is determined with respect to these 5 groups. The index of discrimina-
tion, the biserial correlation, is a correlation coefficient that measures
the extent to which candidates who score high on Section 3 as a whole
tend to get the item right, and those who score low tend to get it wrong.
The item is working successfully if the biserial correlation is above .5.
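The procedure described above can be sketched in a few lines of code. What follows is only an illustration under stated assumptions: it uses the standard textbook formula for the biserial correlation, not a documented ETS routine, and the function names (item_difficulty, quintile_groups, biserial) and the ten-candidate data set are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(item_correct):
    """Proportion of candidates who chose the key (1 = correct, 0 = not)."""
    return sum(item_correct) / len(item_correct)


def quintile_groups(section_scores):
    """Rank candidates by section score; 0 = bottom 20%, 4 = top 20%."""
    n = len(section_scores)
    order = sorted(range(n), key=lambda i: section_scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 5 // n
    return groups


def biserial(item_correct, section_scores):
    """Textbook biserial correlation between one item and the section score."""
    p = item_difficulty(item_correct)
    if p in (0.0, 1.0):
        raise ValueError("undefined when all or no candidates choose the key")
    q = 1.0 - p
    right = [s for s, c in zip(section_scores, item_correct) if c]
    wrong = [s for s, c in zip(section_scores, item_correct) if not c]
    sd = pstdev(section_scores)
    # Height of the standard normal curve at the point dividing it p : q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(right) - mean(wrong)) / sd * (p * q / y)


# Hypothetical data: ten candidates' section totals and their results on one item.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
correct = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

r = biserial(correct, scores)
print(f"difficulty: {item_difficulty(correct):.2f}")
print(f"groups:     {quintile_groups(scores)}")
print(f"biserial:   {r:.2f}  (above .5: {r > 0.5})")
```

Note that the criterion here is the section score itself, so the item being judged contributes to the standard against which it is judged.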
In the passage that I had pretested, all the items except one can be
considered to have discriminated successfully between the candidates.
The biserial correlations for all the items except Item 71 were above
.5, and the item set was judged to be of average difficulty for the
TOEFL population. What then was the problem with Item 71, which
had in fact been revised considerably (see below)? In the population
on which my reading comprehension passage was pretested there were
1,280 candidates, all of whom were divided into five different groups of
256 candidates based on percentile rankings. (See Table 1, a simplified
form of an ETS item analysis.) Note that by the time Item 71 was
pretested, 9 candidates in the two weakest groups had dropped out,
which explains the slight discrepancy in the Total row at the bottom
of Table 1. A candidate who has “dropped out” is no longer attempting
to answer any questions; a candidate who “omits” an item is nevertheless
still attempting to answer all questions and is therefore included in
the Total figures. The item analysis follows.
71. It can be inferred from the passage that Middle West corn is
(A) sold at very low prices
(B) grown primarily as animal feed
(C) cut in the morning
(D) used to treat certain illnesses
TABLE 1
Item 71
As a preliminary analysis of Item 71, compare the candidates who
chose Option A, a distracter, with those who chose Option B, the key.
A large number of candidates in the weakest group chose the distracter
A as the key (85 in all), while a smaller group (57) in the strongest
group chose the distracter A as the key. Significantly, the situation is
reversed for Option B, the key: While only 95 of the weakest candidates
correctly chose Option B as the key, 196 of the strongest candidates
correctly chose Option B as the key. A cursory glance indicates that
the item is working quite well: 52% of the candidates chose the correct
option—664 out of a total of 1,271 candidates—a moderately difficult
item. It is clear that Option A was the most attractive distracter as 459
of the total 1,271 candidates chose this option as the key; 52 candidates
chose Option C; 81 chose Option D.
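The totals quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and the 15 omissions are the figure reported in the following paragraph.

```python
# Response counts for Item 71 as quoted in the text.
responses = {"A": 459, "B": 664, "C": 52, "D": 81, "omit": 15}
key = "B"

pretest_population = 1280  # candidates in the pretest sample
dropped_out = 9            # candidates no longer answering by Item 71

# Omissions count toward the Total row; dropouts do not.
total = sum(responses.values())
print("total responses:", total)
print("matches 1,280 minus dropouts:", total == pretest_population - dropped_out)

# Item difficulty: the share of candidates who chose the key.
difficulty = responses[key] / total
print(f"chose the key ({key}): {difficulty:.1%}")

# Share of candidates drawn to each distracter.
for option in ("A", "C", "D"):
    print(f"chose {option}: {responses[option] / total:.1%}")
```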
Despite these apparently favorable results, there are some disturbing
issues that arise from the analysis: An uncomfortably high number of
candidates chose Option A as the key—160 of whom were in the top
two groups—and 15 candidates omitted this item—6 of whom were in
the top two groups. It was for these reasons that the biserial correlation
fell below .5, to .35, and I carefully scrutinized the item. As I reexamined
the key in Item 71 and the information in the passage from which it
was drawn, it became clear to me that the key, strictly speaking, was
inaccurate. The passage states that most of the livestock in the United
States is fed on corn that originates in the Corn Belt. This does not
imply, however, that Middle West corn is grown primarily as animal
feed. Although this may indeed be the case in the United States, such
an inference cannot be drawn from the passage per se. For example,
the corn could be grown primarily for export purposes, even though
it is the staple diet for livestock in the United States. In a TOEFL final
form, the item would have needed to be revised or excluded from the
item set.
By way of comparison, consider the statistical analysis of Item 75
(see Table 2). The item read as follows:
75. According to the passage, a plot of farmland in an area outside the Corn
Belt as compared to one inside the Corn Belt would probably be
(A) less expensive
(B) smaller
(C) more fertile
(D) more desirable
TABLE 2
Item 75
In Item 75, which had a biserial correlation of .55, only 66 candidates
in the weakest group correctly chose Option A as the key, whereas 222
in the strongest group correctly chose Option A as the key. In contrast,
62 of the weakest group incorrectly chose Option B as the key, whereas
only 21 in the strongest group incorrectly chose Option B as the key.
Similar comparisons can be drawn with Options C and D. In total, 9
candidates omitted the item, only 2 of whom were in the top two
groups. The percentage of candidates who chose the correct key was
58% (729 of a total 1,263)—thus the item was of average difficulty.
The remaining candidates were relatively evenly divided in their choice
of distracters. Nevertheless, it was still a little disturbing that 50 candi-
dates in the top two groups incorrectly chose Option B as the key. The
comment of my TSR reviewer had been validated. This distracter
would have needed revision before it reached the final form stage.
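The contrast between the two items can be made concrete by setting the quoted counts side by side. This is a rough sketch only: it compares raw counts in the weakest and strongest groups, which is not the biserial correlation itself, and the data structure and the gaps helper are hypothetical conveniences.

```python
# Counts quoted in the text for the weakest and strongest fifths of candidates.
# Each tuple is (option, weakest_group_count, strongest_group_count).
reported = {
    "Item 71": {"biserial": 0.35, "key": ("B", 95, 196), "distracter": ("A", 85, 57)},
    "Item 75": {"biserial": 0.55, "key": ("A", 66, 222), "distracter": ("B", 62, 21)},
}


def gaps(item):
    """Return (extra key choices in the strongest group,
    extra distracter choices in the weakest group)."""
    _, key_weak, key_strong = item["key"]
    _, dis_weak, dis_strong = item["distracter"]
    return key_strong - key_weak, dis_weak - dis_strong


for name, item in reported.items():
    key_gap, distracter_gap = gaps(item)
    print(f"{name}: biserial {item['biserial']:.2f}, "
          f"key gap {key_gap}, distracter gap {distracter_gap}")

# Difficulty of Item 75 from the totals quoted in the text.
difficulty_75 = 729 / 1263
print(f"Item 75 chose the key: {difficulty_75:.0%}")
```

On these raw counts the key separates the strongest from the weakest group more sharply in Item 75 than in Item 71, which is consistent with the higher biserial correlation reported for Item 75.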
DISCUSSION
I have demonstrated in the above discussion that TOEFL test
development procedures incorporate a complex set of checks and balances
which include both qualitative and quantitative feedback. With refer-
ence to qualitative feedback, I have demonstrated that the development
of the TOEFL reading test is a collaborative effort in which test devel-
opers work with colleagues to minimize ambiguity and confusion within
individual items. Such collaboration gives test developers the opportu-
nity to subject TOEFL texts and items to alternative readings and
interpretations. With reference to quantitative feedback, I have dem-
onstrated that the statistical analysis of pretested items provides a
different kind of feedback for the test developer. It may confirm the
reservations that the test developer has had about a particular item; it
may draw attention to aspects of the item that have been overlooked;
it may help to resolve disputes about the suitability of an item. However,
notwithstanding the technical rigor with which the TOEFL reading test
is developed, the above discussion raises a number of critical questions
about the assumptions I brought to the test development process.
The questions I wish to address concern the three related issues of
authenticity, background knowledge, and test validity.
Authenticity
I have suggested that TOEFL test developers strive to utilize
“authentic” reading passages: Passages used in the TOEFL are extracted
from “real” texts, and test developers are discouraged from tampering
with these extracts. As demonstrated in the discussion above, while the
TSR wished to put in a paragraph break in the text on the United
States corn industry, she resisted the desire to do so because of the
policy on authentic language. If there were no paragraph break in the
original text, she assumed the extract would have the same meaning
as the original only if the paragraph break were omitted. I concurred
with this observation.
In retrospect, however, it is clear that this approach to authenticity
is flawed, both at the level of textual meaning and at the level of social
meaning. First, at the level of textual meaning: If a passage is extracted
from a larger text, and readers have no access to this larger text—the
type of text, the title, the author, the intended audience, the date of
publication, the publisher—the extract has little resemblance to its
authentic textual origins. Furthermore, if parts of this extract are
deleted for test development purposes, the extract has even less claim
to authenticity. Second, at the level of social meaning: As argued by
educators who adopt a poststructuralist approach to text (Belsey, 1980;
Hill & Parry, 1992; Morgan, 1987; Peirce, 1991; Simon, 1992—to
name a few), the meaning of a text is not only derived from what an
author “demonstrates” in a text but also from the conditions under
which the text is received. In poststructuralist theory, “meaning” there-
fore refers not only to the sentence-level meaning of a text but also to
its social meaning. The social meaning of a text is constituted at the
intersection between the words in the text, the reader’s investment in
the text, and the particular space/time location in which the text is read.
In this view, using the author’s words in a standardized reading test
does not guarantee that the text will have the same meaning as the
original from which it was extracted, notwithstanding attempts at
authenticity; its meaning derives from the interaction between the text,
the test taker, and the testing situation in which the text is read.
By way of illustration, one need only examine the text I have used
in this article. Under what conditions might I, as a student in the “real”
world, be interested in the United States corn industry? If I were
waiting for the campus doctor and picked up the text to pass the time,
I would approach it from one perspective; if I needed to read the text
for an oral presentation in a business course, I would approach it from
another perspective; if I were taking the TOEFL exam, my approach,
yet again, would be radically different. On each of these occasions, the
value ascribed to the text—the social meaning of the text—would be
different because the social conditions under which I was reading it
and the purpose for which I was reading it would vary considerably:
If I were passing time in a doctor’s office, the points that I would find
salient in the text would be mediated by my own personal interest in
the topic and perhaps by a certain degree of anxiety about the condition
of my health at that point in time. If I were reading the text for an oral
presentation in a business course, the salient points in the text would
be mediated by the questions that I brought to the text and my percep-
tion of what my fellow students and instructor would find interesting
in the larger context of the business course. If I were reading the text
for a TOEFL exam, the points that I would find salient would be
mediated almost entirely by the questions that the test maker had
formulated. The extent to which this unequal test maker/test taker
relationship predisposes the TOEFL candidates to a particular reading
of TOEFL texts is discussed further in the Genre Analysis section
below.
Background Knowledge
The second question I wish to raise concerns the place of background
knowledge in standardized reading tests. I raise this question because,
although I have stated that TOEFL test developers strive not to test a
candidate’s background knowledge, a TESOL Quarterly reviewer
claimed that s/he could answer Question 75 without reference to the
text. While this issue deserves fuller attention (see, e.g., Clapham,
1991), I will address only two concerns that arise from the reviewer’s
observation. First, if a TOEFL candidate answers this question correctly
(and the key is not randomly chosen) one of the following assumptions
can be made: (a) The candidate (call him A) knows nothing about real
estate prices in the United States but has read and understood both the
passage and the question. (b) The candidate (call her B) has background
knowledge about real estate prices in the United States and has read
and understood the question alone. In other words, whether or not a
candidate has background knowledge about real estate prices in the
United States, the candidate still has to have sufficient command of
the English language to understand the question in order to answer it
correctly. I made a similar point earlier when I argued that a distinction
between the passage and the questions is an artificial one—both the
passage and the questions test a candidate’s reading ability. If the
language of the question is easier than the language of the text, Candi-
date B would have an advantage over Candidate A with respect to
background knowledge and time. However, if the language of the
question is no simpler than the language of the text, then the only
advantage that Candidate B would have over Candidate A would be a
time advantage. That is, she would not have to take up valuable time
to consult the text.
Second, while I do not wish to trivialize a time advantage in a test
situation, I think there is a more fundamental issue at stake here which
pertains to the nature of the testing situation. Hill and Parry (1992)
have argued convincingly that in standardized reading tests, “personal
knowledge must be continually suppressed for fear of making an inap-
propriate response” (p. 458). When candidates come to the test situa-
tion, they assume that the background knowledge they already have
must not interfere with the knowledge that is “demonstrated” in the
texts that they will be required to read: Candidate B’s source of infor-
mation about real estate prices might be different from that of the
author, or the text might be out of date. It would not be in Candidate
B’s best interests to trust her own judgment and knowledge in order
to answer Question 75. TOEFL test developers are well aware of this
dilemma, which is why all questions of this nature are prefaced by
phrases such as According to the passage. As Hill and Parry argue, it is
indeed ironic that readers “are encouraged to hold separate the very
knowledge which is crucial to their effective engagement with text”
(p. 458).
Test Validity
Simply put, a valid test is one that measures what it intends to
measure (Henning, 1987). The TOEFL claims to measure a candidate’s
ability to understand nontechnical reading matter. To what extent can
this claim be upheld? I have argued that attempts at authenticity are
flawed. I have argued that if a candidate has background knowledge
of reading matter that is actually tested in a TOEFL reading test,
it may not be in the candidate’s best interests to use it. I have also
demonstrated that the acceptability of an item is determined with
reference to performance of the candidates on Section 3 as a whole.
In other words, the quality of TOEFL reading test Item X is a function
of the quality of the sum total of all the items in the TOEFL section of
which Item X is one part. The only conclusion that can confidently be
drawn is that if a candidate performs well on the TOEFL reading test,
the TOEFL candidate is a good reader of TOEFL tests. Thus when I
judged the acceptability of the items I had pretested, and judged all of
them, apart from Item 71, to successfully discriminate between “good”
and “poor” TOEFL candidates, I did so on the basis of a self-referential
criterion, rather than an independent measure of reading ability. This
has implications for the validity of the TOEFL reading test. While the
TOEFL can confidently claim to measure a candidate’s ability to read
nontechnical reading matter in a TOEFL test, the extent to which these
measures apply to preparation for an undergraduate oral presentation,
the doctor’s waiting room, and the countless other occasions in which
the candidate reads nontechnical reading matter must be called into
question.
GENRE ANALYSIS AND THE TOEFL READING TEST
Having critically examined some of the assumptions I brought to the
development of the TOEFL reading test, I wish to argue that the
conception of a standardized reading test as a particular genre is a
theoretically useful lens through which to examine my case study data
as well as the location of the TOEFL reading test within the larger
social context of the TOEFL internationally. While genre analysis has
been utilized in a wide variety of fields such as literary studies, linguis-
tics, and rhetoric (see Swales, 1990), it has not as yet made a significant
impact on the field of language testing. Following Kress (1989, 1991),
who draws on the poststructuralist theory of Foucault (1977), the
concept of genre I wish to use in this paper is that of genre as “text”—
either oral or written—constituted within and by a specific social
occasion which has a conventionalized structure and which functions within
the context of larger institutional and social processes.
In Kress’s formulation, the social occasions which constitute a genre
may be formulaic and ritualized, such as a wedding or committee
meeting, or less ritualized, such as a casual conversation. The important
point is that the conventionalized forms of these occasions and the
organization, purpose, and intentions of social participants within the
occasion give rise to the meanings associated with the specific genre,
whether it be a tutorial, interview or—as I will argue—a standardized
reading test. Furthermore, Kress (1989) has demonstrated that increas-
ing difference in the power relations between participants in an interac-
tion has a particular effect on the social meaning of the texts within a
particular genre. In essence, in genres in which there is great power
difference between the social participants, the mechanism of interaction,
the conventionalized form of the genre, is most foregrounded, while
the substance of the interaction, the content, is least foregrounded. The
conception of genre that Kress is proposing, which foregrounds the
centrality of power within a particular social occasion in the context of
larger social processes, is a departure from more conventional ap-
proaches to genre analysis. The latter tend to present genres as uncon-
troversial forms of texts such as sonnets, term papers, and interviews,
with little reference to the larger and frequently inequitable social
structures in which these texts are constituted.
While test makers have generally assumed that a standardized read-
ing test is an aberration in the “real” world, I wish to argue that it is no
less authentic a social situation than an oral presentation or a visit to a
doctor. In a standardized reading test, the value ascribed to texts within
this genre and the meaning that is constructed are associated with a
ritualized social occasion in which participants share a common pur-
pose and set of expectations. The social occasion is characterized by
strict time limits in which test takers have little control over the rate of
flow of information in the activity—what Peirce, Swain, and Hart (in
press) refer to as the “locus of control” in the activity. The test takers
are expected to be silent at all times, respect rigorous proctoring proce-
dures, and read the text in solitude. As Hill and Parry (1992) argue,
social behavior in a testing situation is tantamount to cheating. Both
test makers and test takers recognize that the purpose of the test is
to discriminate between readers of varying levels of proficiency with
reference to a criterion established a priori by the test makers. The
expectations are that the background knowledge of the test takers has
little relevance to the items being tested and that the test makers decide
what an acceptable reading of the text should be. Thus the relationship
between the test makers and the test takers, a manifestly unequal one,
has a direct bearing on the social meaning ascribed to texts in the
standardized reading test. Furthermore, the standardized reading test
must be understood with reference to larger social processes in which
test takers have unequal access to material, educational, and linguistic
resources: While some test takers have comfortable homes where liter-
acy material is commonplace, superior educational opportunities, and
familiarity with the conventions of standardized reading tests, other
test takers have no electricity in their homes, limited access to literacy
material, and few educational opportunities.
The conception of the standardized reading test as genre helps
explain data from the case study described above. Consider for exam-
ple the statistical analysis of Item 71. Item 71 was a flawed item: There
was no key. Nevertheless, over 66% of the candidates in the top two
categories (339 of 512 candidates) chose the intended key. Apparently,

these candidates knew the conventions of standardized reading tests:
One of the options was intended to be correct; their task was to deter-
mine which one of the four options I, as the test maker, had in mind.
Although the words in the question were inappropriate, the test takers
sought to understand what I meant, not what I said: In other words,
how did I, as the test maker, intend the TOEFL candidates to “read”
the text? The unequal relationship between me as a test maker and the
candidates as test takers had a direct bearing on the social meaning of
the text. The test takers’ personal investment in the test militated
against their objecting to the poor quality of the item. In such a social
situation, they had to conform to the conventionalized rules of the test,
or resist at great personal cost. It is significant that of the 512 candidates
in the top two categories who examined this question, only 6 exercised
the dubious right to omit responding to the flawed question.
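As a quick arithmetic check on the figures reported above, the following Python sketch recomputes the two proportions from the counts given in the text (512 candidates in the top two categories, 339 choosing the intended key, 6 omitting the item). The variable and function names are illustrative assumptions and are not part of the original item analysis.

```python
# Counts reported in the text for Item 71 (top two score categories).
TOP_TWO_CANDIDATES = 512
CHOSE_INTENDED_KEY = 339
OMITTED_ITEM = 6


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


key_pct = percent(CHOSE_INTENDED_KEY, TOP_TWO_CANDIDATES)
omit_pct = percent(OMITTED_ITEM, TOP_TWO_CANDIDATES)

print(f"chose intended key: {key_pct:.1f}%")  # 66.2%
print(f"omitted the item:   {omit_pct:.1f}%")  # 1.2%
```

The first figure is consistent with the statement that over 66% of these candidates chose the intended key; the second shows that roughly 1 candidate in 85 declined to respond.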
Furthermore, the conception of the standardized reading test as
genre leads to an examination of the location of the TOEFL reading
test within the larger context of the TOEFL internationally. In this
spirit, the point that needs to be stressed is that the TOEFL is not just
any standardized test—it is the largest test of English in a world that
has adopted English as its lingua franca. For this reason, people who
are considered to have command of the English language not only have
linguistic versatility but educational, economic, and political power
(Peirce, 1989; Pennycook, 1992; Phillipson, 1992; Tollefson, 1991). A
test that determines who has command of the English language has
inordinate power to influence not only the educational future of indi-
viduals but the political future of nations. This is the larger social
context of the TOEFL, the link between TOEFL test makers, TOEFL
test takers, and larger social processes in which competence in English
means access to power. Thus the social meaning of texts used in the
TOEFL is constituted with reference to the test takers’ personal
investment in a uniquely powerful standardized test.
CONCLUSION: IMPLICATIONS FOR TOEFL–2000
Given the location of the TOEFL with respect to the increasing
power of the English language internationally, the challenge for the
TOEFL–2000 project at ETS is to determine whose interests the
TOEFL serves and how the TOEFL can best serve those interests. The
TOEFL–2000 committee has stated that the aim of the TOEFL is not
simply to serve the interests of admissions officers at United States and
Canadian universities. It has indicated that ETS is committed to serving
the ESL/EFL community, and that it wishes to “better reflect current
understanding of language and communication and second language
learning and testing” (ETS, 1991c). Given the expressed interest in
serving the ESL/EFL community, I would like to suggest that revisions
to the TOEFL be made with reference to a consideration of the “wash-
back” effect of the TOEFL.
The washback effect of a test, sometimes referred to as the systemic
validity of a test (Alderson & Wall, 1992), refers to the impact of a test
on classroom pedagogy, curriculum development, and educational
policy. Swain (1985) indicates that a concern with washback was a
guiding principle in the development of communicative language tests
for French immersion programs in Canada. Wesche (1987) states that
interest in positive washback was of primary concern in the develop-
ment of the Ontario Test of English as a Second Language. Recent
research by Shohamy (1992) has found that the introduction of three
national language tests in Israel has had a dramatic impact on classroom
pedagogy and educational policy in the country. With reference to the
TOEFL, however, reports on the washback effect of the test remain
anecdotal. Its effect can only be extrapolated from the vibrant industry
in TOEFL preparation books.
When the TOEFL is revised as part of the TOEFL–2000 project,
TOEFL test developers and TOEFL consultants should take the oppor-
tunity to consider what kind of impact the TOEFL has had on class-
room pedagogy and educational policy not only in North America but
in some of the 170 countries in which the TOEFL is administered.
ESL teachers internationally should be consulted to determine what
construct of reading should be assessed in the TOEFL reading test and
how the test can best serve the interests of their programs and the
needs of their students. TOEFL candidates should be consulted to
determine how preparation for the TOEFL could promote language
learning as well as improved test-taking strategies, and how the anxiety
associated with the TOEFL testing situation could be alleviated. The
outcome of such research has the potential to transform the current
relationship between TOEFL test makers and TOEFL test takers—
between those who “have” and “have not” command of the English
language internationally.
ACKNOWLEDGMENTS
I would like to acknowledge former ETS colleagues in the Languages Group of
the Test Development department, who contributed in many ways to the development
of this paper. I would also like to thank Merrill Swain, Kathleen Troy, Kate
Parry, David Mendelsohn, and Sandra Silberstein for their insightful comments
on an earlier draft of the paper; two TESOL Quarterly reviewers for their rigorous
critiques; ETS for giving me access to TOEFL archives; and the Social Sciences
and Humanities Research Council of Canada for its financial support.

THE AUTHOR
Bonny N. Peirce, a PhD candidate in the Modern Language Centre, Ontario
Institute for Studies in Education, Canada, is interested in the relationship between
social theory and the practical concerns raised by second language learners, teachers,
and testers internationally. She is a corecipient of the 1990 Malkemes Prize
for her 1989 TESOL Quarterly article, “Toward a Pedagogy of Possibility in the
Teaching of English Internationally: People’s English in South Africa.”
REFERENCES
Alderson, J. C., & Wall, D. (1992). Does washback exist? Paper presented at the 14th
Annual Language Testing Research Colloquium, Vancouver, Canada.
Angoff, W. (1989). Context bias in the Test of English as a Foreign Language (TOEFL
Research Rep. No. 29). Princeton, NJ: Educational Testing Service.
Bachman, L., Vanniarajan, A. K. S., & Lynch, B. (1988). Task and ability analysis
as a basis for examining content and construct comparability in two EFL profi-
ciency test batteries. Language Testing, 5, 128–159.
Belsey, C. (1980). Critical practice. London: Methuen.
Chyn, S., DeVincenzi, F., Ross, J., & Webster, R. (1992). TOEFL–2000: Update.
Paper presented at the 26th Annual TESOL Convention, Vancouver, Canada.
Clapham, C. (1991). The effect of academic discipline on reading test performance. Paper
presented at the 13th Annual Language Testing Research Colloquium,
Princeton, NJ.
Duran, R. P., Canale, M., Penfield, J., Stansfield, C., & Liskin-Gasparro, J. E.
(1985). TOEFL from a communicative viewpoint of language proficiency (TOEFL
Research Rep. No. 17). Princeton, NJ: Educational Testing Service.
Educational Testing Service. (1990). Bulletin of information for TOEFL and TSE,
1990–91. Princeton, NJ: Author.
Educational Testing Service. (1991a). Bulletin of information for TOEFL and TSE,
1991–92. Princeton, NJ: Author.
Educational Testing Service. (1991b, Spring). Newsline. Princeton, NJ: Author.
Educational Testing Service. (1991c). TOEFL–2000: Planning for change. Princeton,
NJ: Author.
Educational Testing Service. (1992). Bulletin of information for TOEFL/TWE and
TSE, 1992–93. Princeton, NJ: Author.
Foucault, M. (1977). What is an author? In D. Bouchard (Ed.), Language, counter-memory,
practice. Ithaca, NY: Cornell University Press.
Henning, G. (1987). A guide to language testing. Cambridge, MA: Newbury House.
Hill, C., & Parry, K. (1992). The test at the gate: Models of literacy. TESOL
Quarterly, 26 (3), 433–461.
Kress, G. R. (1989). Linguistic processes in sociocultural practice. Oxford: Oxford
University Press.
Kress, G. R. (1991). Critical discourse analysis. Annual Review of Applied Linguistics,
1990, 11, 84–99.
Madsen, H. (1983). Techniques in testing. New York: Oxford University Press.
Morgan, R. (1987). Three dreams of language. College English, 49, 449–458.
Peirce, B. N. (1989). Toward a pedagogy of possibility in the teaching of English
internationally: People’s English in South Africa. TESOL Quarterly, 23 (3),
401–420.
Peirce, B. N. (1991). Review of the TOEFL Test of Written English (TWE) Scoring
Guide. TESOL Quarterly, 25 (1), 159–163.
Peirce, B. N. (in press). The development of a TOEFL reading test. In C. Hill &
K. Parry (Eds.), Testing and assessment: International perspectives on English literacy.
Harlow, England: Longman.
Peirce, B. N., Swain, M., & Hart, D. (in press). Self-assessment, French immersion,
and locus of control. Applied Linguistics.
Pennycook, A. (1992). The cultural politics of teaching English in the world. Unpublished
doctoral dissertation, Ontario Institute for Studies in Education/University
of Toronto.
Phillipson, R. (1992). Linguistic imperialism. Oxford: Oxford University Press.
Raimes, A. (1992). Comments on “The TOEFL Test of Written English: Causes
for concern.” The author responds to Traugott, Dunkel, and Carrell. TESOL
Quarterly, 26 (1), 186–190.
Simon, R. I. (1992). Beyond the racist text. Teaching against the grain: Texts for a pedagogy
of possibility. Toronto: OISE Press.
Spolsky, B. (1990). The prehistory of TOEFL. Language Testing, 7, 98–118.
Shohamy, E. (1992). The power of tests: A study on the impact of language tests on teaching
and learning. Paper presented at the 14th Annual Language Testing Research
Colloquium, Vancouver, Canada.
Swain, M. (1985). Large-scale communicative language testing: A case study. In S.
Savignon & M. Burns (Eds.), Initiatives in communicative language teaching.
Reading, MA: Addison-Wesley.
Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge:
Cambridge University Press.
Tollefson, J. W. (1991). Planning language, planning inequality. New York: Longman.
Wesche, M. (1987). Second language performance testing: The Ontario Test of
ESL as an example. Language Testing, 4, 28–47.
Wilson, K. (1982). A comparative analysis of TOEFL examinee characteristics, 1977–1979
(TOEFL Research Rep. No. 11). Princeton, NJ: Educational Testing Service.

APPENDIX
Questions 70–78
Running a farm in the Middle West today is likely to be a very expensive
operation. This is particularly true in the Corn Belt, where the corn that fattens
the bulk of the country’s livestock is grown. The heart of the Corn Belt is in Iowa,
Illinois, and Indiana, and it spreads into the neighboring states as well. The soil is
(5) extremely fertile, the rainfall is abundant and well-distributed among the seasons,
and there is a long, warm growing season. All this makes the land extremely valuable,
twice as valuable, in fact, as the average farmland in the United States. When one adds
to the cost of the land the cost of livestock, seed, buildings, machinery, fuel, and
fertilizer, farming becomes a very expensive operation. Therefore many farmers are
(10) tenants and much of the land is owned by banks, insurance companies, or wealthy
business people. These owners rent the land out to farmers, who generally provide
machinery and labor. Some farms operate on contract to milling companies or
meat-packing houses. Some large farms are actually owned by these industries.
The companies buy up farms, put in managers to run them, provide the machinery
(15) to farm them, and take the produce for their own use. Machinery is often equipped
with electric lighting to permit round-the-clock operation.
70. What is the author’s main point?
(A) It is difficult to raise cattle.
(B) Machinery is essential to today’s farming.
(C) Corn can grow only in certain climates.
(D) It is expensive to farm in the Middle West.
71. It can be inferred from the passage that Middle West corn is
(A) sold at very low prices
(B) grown primarily as animal feed
(C) cut in the morning
(D) used to treat certain illnesses
72. In line 3, the word “heart” could best be replaced by which of the following?
(A) spirit
(B) courage
(C) cause
(D) center
73. It can be inferred from the passage that the region known as the Corn Belt
is so named because it
(A) is shaped like an ear of corn
(B) resembles a long yellow belt
(C) grows most of the nation’s corn
(D) provides the livestock hides for leather belts
74. The author mentions all of the following as features of the Corn Belt EXCEPT
(A) rich soil
(B) warm weather
(C) cheap labor
(D) plentiful rainfall
75. According to the passage, a plot of farmland in an area outside the Corn
Belt as compared to one inside the Corn Belt would probably be
(A) less expensive
(B) smaller
(C) more fertile
(D) more desirable
76. As described in the passage, which of the following is most clearly analogous
to the relationship between insurance company and tenant farmer?
(A) Doctor and patient
(B) Factory owner and worker
(C) Manufacturer
(D) Business executive and secretary
77. The word “their” in line 15 refers to
(A) companies
(B) farms
(C) managers
(D) machinery
78. According to the passage, some machinery is equipped with electric
lighting so that it can be used
(A) indoors
(B) in the fog
(C) at night
(D) while it rains
Note. From the Test of English as a Foreign Language (Form 3IATF1O), 1986. Princeton,
NJ: Educational Testing Service. Copyright 1986 by Educational Testing Service.
Reprinted by permission of Educational Testing Service.

Planning, Discourse Marking, and the
Comprehensibility of International
Teaching Assistants
JESSICA WILLIAMS
University of Illinois
An examination of the planned and unplanned production of 24
nonnative-speaking teaching assistants indicates that there is a greater
difference between the 2 conditions in the degree of discourse mark-
ing than in grammatical accuracy. In planned production, discourse
moves were more likely to be marked overtly and explicitly than in
unplanned production, whereas the level of syntactic and morpholog-
ical errors differed only slightly. This increased marking in the
planned condition appeared to contribute significantly to compre-
hensibility, suggesting that explicit marking of discourse structure
is a crucial element of the comprehensibility of nonnative-speaker
production.
An increasing number of universities have come to depend on
nonnative-speaking (NNS) graduate students to teach introductory
undergraduate courses. There has been a simultaneous rise in the
number of complaints from undergraduates, their parents, and other
members of the university community regarding the comprehensibility
of the speech of these international teaching assistants (ITAs). As a
result, more and more TESOL professionals have been asked to de-
velop programs that will improve the ITAs’ communicative skills and
teaching effectiveness. In order to develop successful programs, how-
ever, it is first necessary to establish what it is about ITA discourse that
often renders it incomprehensible to the undergraduates toward whom
it is directed.
The first thing that undergraduates often react to in their ITAs’
speech is nonnativelike pronunciation, and clearly this is an important
factor as previous research into ITA production has shown (Anderson-
Hsieh & Koehler, 1988; Carrier et al., 1991; Gallego, 1990). However,
it is not the only issue. Interviews with undergraduates who were asked
to rate ITA comprehensibility (Williams, 1990) revealed that initially
ITA pronunciation is often a problem but may diminish in impor-
tance over time. Many of the undergraduates who had had an ITA
over an entire term maintained that the ITA’s accent was an obstacle
in the beginning but that they eventually adjusted to it, making the
appropriate phonological substitutions and even reporting that they
became accustomed to systematic grammatical errors. This suggests
that there may be important aspects of the comprehensibility problem
other than pronunciation and grammar. This study will focus on
one such area: the contributions which discourse marking makes to
comprehensibility.
COMPREHENSIBILITY OF ITA DISCOURSE
The “ITA problem” is by now well known to TESOL professionals
and to undergraduates alike. There have been several notable attempts
to determine why some ITAs are so difficult to understand. Rounds
(1987) notes in particular that in comparison to native-speaker (NS)
TAs, ITAs frequently do not adequately elaborate the key points of
their presentations. They often do not name important steps, mark
junctures explicitly, or make cohesive links between ideas. Williams,
Barnes, and Finger (1987) came to similar conclusions, finding that
ITAs often do not repeat or rephrase important points; digress from
the main line of thought and move on to new topics without warning;
omit discourse marking to overtly frame illustrations, examples, and
axioms; and do not summarize material. It should not be surprising
that listeners have trouble comprehending when all of these aspects of
discourse structure are left unmarked. According to Tyler (1988),
ITAs either do not use or misuse various lexical, syntactic, and
prosodic cues on which NS listeners depend to interpret discourse.
Taken together, these omissions or misuses can seriously reduce com-
prehensibility. Tyler maintains that unsuccessful ITAs consistently do
not orient their listeners adequately to the relative importance of ideas
as well as to how they are linked to one another. Tyler (1989) tested
this notion, using undergraduate judges, and found that the increased
and accurate use of discourse markers greatly increased comprehensi-
bility scores. She compared undergraduate evaluations of the actual
production of ITAs with their evaluations of a version that contained
the same information but had been altered by inserting and changing
various macro- and microcues (see Chaudron & Richards, 1986).
Research on the effectiveness of NS explanations is reported by
Brown (1978). In a review of the relevant literature, he reports that
good explanations usually involved task orientation statements, such as
“Now, let’s look closely at . . .” [Furthermore,] successful explanations
contained signposts such as “There are three main areas. First . . .” They
also contain statements linking various elements of the explanation, such
as, “So far, we have looked at . . . Now.” (p. 11)
This kind of marking acts as an indicator of speaker advance or
overall planning (Faerch & Kasper, 1983). Such markings are also the
ones that were missing or misused in the ITA discourse in the studies
named above. Chaudron and Richards (1986), in a study of second
language learner (SLL) comprehension of university lectures, used the
term macromarkers to describe words or phrases which “are explicit
expressions of the planning of lecture information” (p. 123). They
found them to be an important factor in facilitating SLL listening
comprehension. However, because the listeners in the case of the pres-
ent study are NS undergraduates, the results of the Chaudron and
Richards research, which used SLL subjects, can only be generalized
with caution.
PLANNED VERSUS UNPLANNED PRODUCTION
In order to determine the effect of using discourse marking on
comprehensibility, it is necessary to examine the production of ITAs
with and without such marking. In an effort to address this issue in
context, that is, to examine the marked and unmarked discourse of
ITAs in naturally occurring production, this study compares the
planned and unplanned explanations of ITAs. It has already been
noted that the use of such marking may be related to the degree
of planning involved in production (Crookes, 1989). It is perhaps
belaboring the obvious to assert that planning has a significant effect
on oral production. A number of studies attest to this, in the production
of both NSs (Danielewicz, 1984; Givón, 1979; Ochs, 1979) and NNSs
(Crookes, 1989; Ellis, 1987; Tomlin, 1984). A comprehensive review
of research on the effect of planning on both NS and NNS production
appears in Crookes (1988).
Much of the work in this area of second language acquisition research
involves the construct attention to speech, the central idea being that
unplanned production requires less attention than planned production.
The validity of this construct has been debated in both sociolinguistics
and second language acquisition research (Bell, 1984; Preston,
1989; Rampton, 1987; Sato, 1985; Wolfson, 1976). One of the greatest
difficulties in using attention to speech as a variable is ascertaining what
sorts of tasks demand the most attention. Sato (1985) questions the
unitary nature of the notion attention, pointing out that certain tasks
“require a great deal of attention, but this attention must be paid,
not simply to language form but also to other demands of real-time
discourse production: recall and encoding of rhetorical structure, lexi-
cal items, clause sequencing, etc.” (p. 195). In other words, increased
attention need not necessarily lead to increased accuracy in the use of
grammatical forms.
Ellis (1987) maintains that the effects of increased attention to form
and of increased planning time are separable, citing the work of Huls-
tijn and Hulstijn (1984), who found that time pressure alone had no
effect on accuracy in the use of two Dutch word-order rules, whereas
focus on form increased accuracy significantly. Ellis (1989) examined
the effect of planning time on accuracy in grammatical morphology.
He found that morphological accuracy was generally the highest in
tasks in which speakers were given more time to plan. Tasks with
greater time pressure showed more variation. Crookes (1989), on the
other hand, found that in the planned condition, NNSs produced more
complex speech and a greater variety of lexis than in the unplanned
condition but that accuracy in the 2 conditions was not significantly
different. In investigating the organization of discourse, he found that
there was greater use of discourse markers in the planned condition
in one of his experimental tasks. The present study does not attempt
to separate these issues of planning opportunity and attention to form,
however. In the planned condition, NNS subjects were both given
extensive planning time and asked to concentrate on specific
aspects of their presentations.
Research in psycholinguistics and cognitive science suggests that
there are different kinds of planning. Within the field of second lan-
guage production, the work of Faerch and Kasper (1983) and Lennon
(1984), among others, points to differences between long-range macro-
planning, on the one hand, and more local microplanning, on the
other. The first affects overall semantic and syntactic organization of
discourse; it is more subject to planning. The second affects local
organization and links between propositions as well as lexical selection
and tends to be mapped out as the speaker goes along. This study
focuses on the former.
Planning is used as the independent variable in this study, in an
attempt to determine the effect of the use of discourse markers, which
have been associated with planning, on the comprehensibility of NNS
production. NSs are included in the study to ascertain whether such
differences are characteristic of the production of NSs and NNSs alike
or whether the effect of planning is of particular importance for the
comprehensibility of NNS production. It was hypothesized that
1a. planned ITA production would contain more overt marking of
discourse function than unplanned ITA production (cf. Crookes,
1989);
1b. unplanned ITA production would contain more unmarked key
statements than planned production (a key statement is one that
is central to the structure of the argument or explanation; Brown,
1978);
2a. NS production would contain more overt marking as to discourse
function than that of ITAs (cf. Tyler, 1988; Williams, Barnes, &
Finger, 1987).
In both 1a and 2a, “more overt marking” is taken to mean a greater
absolute number of markings as well as more explicit marking. In light
of this definition, it was hypothesized that
2b. ITA production would contain more unmarked key statements
than that of the NSs (cf. Tyler, 1988; Williams et al., 1987).
Next, the link between discourse marking and comprehensibility
needs to be established. To this end, it was hypothesized that
3. comprehensibility would increase with more overt marking of dis-
course function (cf. Chaudron & Richards, 1986; Tyler, 1989).
Finally, following the work of Crookes (1989), an aim of this study
was to determine whether other differences in the planned versus
unplanned condition, such as syntactic or morphological accuracy and
complexity, might account for any differences in comprehensibility.
THE STUDY
Subjects
The data in this study were collected over a 2-year period from 24
teaching assistants in various university departments at a major U.S.
university. Eight were native Korean speakers and 14 were native
Mandarin speakers. All had studied English formally for between 5
and 12 years. They had been in the United States for between 3 months
and 4 years. During the time of the study, they were all participating
in a preparation course for ITAs. Also included in this study were 5
native-speaking teaching assistants (NSTAs). These baseline data were
necessary in order to determine the effect of planning on the use of
discourse marking and comprehensibility in general before going on
to make claims about its effect on NNS production.

Task
Each of the TAs was videotaped on two separate occasions, 2 weeks
apart, as part of a 10-week ITA preparation course. On the first occasion,
the participants were permitted to choose their own topic. They
were asked to explain a concept or specific problem that would be
covered during a first-year introductory course in their field. They
were given a week to prepare their presentations. They were allowed
to bring note cards, but reading was not permitted. In the second
instance, the TAs submitted a list of 10 topics, also from introductory
courses in their fields. The instructor chose from among them, giving
each TA approximately 3 min to plan the presentation. Thus, in the
first task, planning was both possible and encouraged, whereas in the
second task, little planning was possible. In each case they were given
7 to 8 min to speak. The subjects also submitted what they considered
to be the main idea of their presentation, to be compared later with
that named by independent raters.
Instruction in the use of discourse markers and effective packaging
of information was a major focus in the ITA preparation course which
preceded the data collection. The ITAs were told that these tasks were
tests, and they were aware that a good performance would include
the accurate and explicit use of these markers. This usage had been
practiced previously in more abbreviated exercises and activities in
class by all the ITAs in this study, and participants were reminded of
its importance prior to their presentations. It is likely, then, that they
used discourse markers to the extent that they were able within the
constraints of the tasks.
The NS task was somewhat different; therefore, these participants
cannot be called a control group. The NS data consist of segments taken
from actual classes. This corpus also includes instances of relatively
planned and unplanned speech, but NSTAs’ tasks cannot be viewed as
comparable to the ITAs’ tasks. The planned speech consisted of NSTA
presentations of problem sets and reviews of lecture material. All of
the NSTAs spoke primarily from notes, though it is quite likely that
some portions of their presentations were also extemporaneous. The
unplanned speech for the NSTAs occurred when a student, by prear-
rangement with the researcher, asked the NSTA to go over problems
not assigned specifically for that day or to review material from a
previous unit. However, these segments were not as long as the un-
planned presentations of the ITAs. NS participants in the study re-
ceived no instruction of any kind but were interviewed after their
presentation regarding the extent of their planning. All reported gen-
eral planning of ideas and explanations of specific problems, and using
notes. All said they did not plan the actual language they would use.
698 TESOL QUARTERLY

The NSTA and ITA tasks are clearly different, but topics, length of
presentation, and planning opportunity make the comparison between
the 2 groups a reasonable one.
Data Analysis
Discourse Marking
The data analysis was carried out in several steps. Before turning to
the question of comprehensibility, it was first necessary to establish
the effect of planning on the presence and explicitness of discourse
marking. The use of Chaudron and Richards’ (1986) discourse cues,
specifically macrocues, was the focus of this investigation, in particular,
the level of explicit marking of key statements in ITA and NSTA
explanations. As noted above, a key statement is one that is central to
the structure of the argument or explanation. One way a key statement
may be marked is by indicating speaker intention, as in Example 1:
1. Today I want to spend a few minute to explain what trigonometric function
are.
Another form of marking is the identification of the actual function
of the statement within the explanation, as in Example 2:
2. The second element of physiology is study about transport system. For
example, our heart will transport blood to all the part of our body.
Some statements may be marked for both speaker intention and
function in the explanation, as in Example 3:
3. Now I’d like to give you the definition of molecule.
In contrast, some statements may go unmarked, as in Examples 4
and 5:
4. This cotangent involving adjacent and opposite.
5. This the change of the chromosome in cell division.

In fact, Example 4 was meant to be a definition or at least instructions
for using the trigonometric function. Example 5 was meant as a sum-
mary of the previous material.
There are various types of key statements contained in these presen-
tations. The following 6 types of statements were examined in this
study: definition, example/illustration, restatement/rephrasing, identi-
fication/naming, introduction/new topic, and summary/review. The
coding of statement type as well as the discourse cues was done by the
author and a graduate student. Disputed items, which represented
9% of the corpus, were removed from the analysis. Examples of key
statements included in this study are given below. Some are overtly
marked, containing reference to the discourse function itself, whereas
others show less explicit marking.
6. Definition: I give you the definition of instantaneous velocity. [a definition
follows]
7. Example: We know in the early 1976 Challenger falling down. [follows a
brief introduction on the topic of technological failure]
8. Restatement: That means between these times the car we think it’s the same
acceleration. [follows an example of a moving vehicle as an illustration
of the principle of constant acceleration]
9. Identification: This is called harmonic oscillator. [follows a description of
the piece of equipment]
10. Introduction/new topic: I want speak something about temperature. [the first
statement in the presentation]
11. Summary: That’s what it mean a binary operation. [follows a lengthy expla-
nation and examples of binary operations]
Comprehensibility
The comprehensibility of the various explanations was determined
in the following way. First, the videotapes of both the NSTAs and ITAs
were played to 25 undergraduates and 10 ESL specialists. The ITA
planned and unplanned presentations were approximately 7 to 8 min
long. Excerpts of the NSTA classes, including planned and unplanned
portions, ranged from 7 to 10 min in length. The tapes were played to
these 2 groups in batches of 8 to avoid fatiguing the raters. Speakers
were presented in random order. Raters were asked to evaluate various
components of the speakers’ language proficiency and ability to explain
on a scale of 0 to 3, similar to that used in the Speaking Proficiency
English Assessment Kit (SPEAK) with a total possible score of 18 (see
the Appendix). Clearly, on the language proficiency portions, the NSs
would be expected to receive the maximum score of 3. The scores of
the 2 rater groups were averaged to yield a mean for each group’s
evaluations of the ITA planned and unplanned presentations and the
NSTA presentations, giving a total of 6 scores.
The raters had not been informed of the difference between the
presentations; they were simply told that they would see each ITA
twice. Post-rating interviews with both sets of raters were conducted,
during which they were asked to rate which 2 ITA presentations in
each batch of 8 were the easiest to understand.
In order to verify the self-report of their comprehension level, the
raters were also asked to answer two questions for each presentation:
First, they were asked to name the topic and second, to name the main
idea. More specific questions were not asked since much of the material
was difficult for the undergraduates as well as the ESL specialists to
understand in detail. The self-reported comprehensibility scores alone
have high face validity because what undergraduates perceive at this
level may, in turn, determine whether they simply tune out in the first
place (see Carrier et al., 1991). The comprehension questions were
added simply to corroborate these results.
RESULTS
The first general research question concerns the relationship between
planning and discourse marking for ITAs and NSTAs. In order
to address this issue, analysis of the production data focuses on two
questions: first, whether certain moves are marked at all and second,
the degree of explicitness in marking. The number of marked state-
ments in the 6 categories under investigation for the 2 groups is shown
in Table 1. The totals for all 24 ITAs are combined. The first column
in each section shows the number of key statements made in each
category by each speaker group and in each condition. The second
column in each section shows the percentage of marking of any kind.
The NSTAs, of course, have lower numbers since there were only 5 of
them, compared to 24 ITAs, and since their unplanned segments
were much shorter than their planned segments. Chi-square tests were
performed on the ITA data in order to show whether the degree to
which they mark their key statements at all differed in their planned
and unplanned production. This does not reveal differences in how
they mark them, only whether they mark them. The 2 conditions were
shown to be significantly different, χ²(1, N = 24) = 16.83 (with Yates
correction factor). No statistical analysis was done on the NSTA data
because the planned and unplanned data sets were not comparable.
However, a comparison of the percentages of marked key statements
(planned, 65.33%; unplanned, 62.96%) suggests that the difference
between the 2 conditions for NSTAs is not significant.
TABLE 1
Marking of Key Statements
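The chi-square comparison of marked versus unmarked key statements can be illustrated with a short Python sketch. It is only an illustration of the standard Yates-corrected statistic for a 2 x 2 table, not a reconstruction of the analysis reported here: the function name and the cell counts are hypothetical, and the study's own frequencies are not used.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for a 2x2 table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Continuity correction: subtract n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts, NOT the study's data:
# rows = planned / unplanned, columns = marked / unmarked key statements.
chi2 = yates_chi_square(10, 20, 30, 40)
print(round(chi2, 4))  # 0.4464
```

The correction makes the statistic slightly more conservative than the uncorrected chi-square, which matters most when cell counts are small.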
The second issue to be addressed is the degree of explicitness. As
mentioned earlier, some marking contains reference to speaker inten-
tion or some sort of advance warning regarding the information that
is about to be given, as in Examples 1 and 3. First, the speaker an-
nounces what he or she is going to do, then does it. This presumably
would increase the salience of the point being made. In other instances,
there is no such announcement, but the utterance contains some sort
of identification of its function, with a lexical item actually naming the
function, such as in Examples 2 and 3. These are what are called
explicit markers in Tables 2 through 5. In other cases, the function of
the discourse move is signaled implicitly, with the clarity of the move’s
function depending more on other contextual factors in the presenta-
tion. In Example 12, we see a more implicitly signaled introduction or
topic shift. Up to this point, the ITA had been speaking in rather
general, theoretical terms. Algebraic calculations were only introduced
into the lecture after the following statement:
12. We talk a little bit algebra.
This marking of function is less explicit than in the previous exam-
ples (We talk a little bit algebra vs. Now I’m going to show you the algebraic
calculations behind these ideas) but still contains some indication of the
speaker’s intention. In contrast, in unmarked utterances, there is no
such identification, and the function of the utterance is relatively diffi-
cult to discern. Example 13 is actually a definition that was used to
introduce this topic. Prior to this utterance, the ITA had been dis-
cussing nonvector quantities and operations and was moving on to the
new topic of vectors.
13. Vector cannot stand for by only one number.
The lack of markings, either to introduce the topic or to identify this
utterance as a definition, along with some syntactic/lexical problems (A
vector cannot be represented by only one number) make this statement diffi-
cult to process.
Tables 2 through 5 show the degree of explicitness in the marking
of the 6 types of statements for the 2 speaker groups under the 2
conditions.
In each of these tables, the Total column represents only those
statements which were marked in some way and therefore a portion
of the total reported in Table 1. For instance, only 55.56% of the
definitions were marked at all in ITA planned presentations. There-
fore, the total appearing in Table 2 for this category is 45. This total
is broken down in each table in terms of kind of marking. Among
ITAs, both the number and the proportion of more overt forms of
marking, that is, those statements containing speaker intention or
explicit mention of function, increase in planned production. The same
cannot be clearly said of the NSTAs. Differences between the ITAs
and NSTAs are also not clear. In planned production, the absolute
presence of marking seems to differ little between ITAs and NSTAs,
although the kinds of marking that the 2 groups employ do differ
somewhat. In unplanned production, on the other hand, the absolute
use of marking differs considerably, whereas the kind of marking does
not.
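The percentages in Tables 2 through 5 are row percentages: each kind of marking is divided by the number of marked statements in that category. A minimal Python sketch of that arithmetic follows. The counts (13, 24, 8) are not reported in the article; they are back-calculated here from the Definitions row of Table 2 and should be read as an assumption, and the function name is hypothetical.

```python
def marking_percentages(counts):
    """Return (total, {kind: percent}) for one row of marked key statements."""
    total = sum(counts.values())
    if total == 0:
        # A category with no marked statements is reported as all zeros.
        return 0, {kind: 0.0 for kind in counts}
    return total, {kind: round(100 * n / total, 2) for kind, n in counts.items()}


# Assumed counts, inferred from the percentages in the Definitions row of Table 2.
definitions = {"speaker_intent": 13, "explicit": 24, "implicit": 8}
total, pct = marking_percentages(definitions)
print(total, pct)
# 45 {'speaker_intent': 28.89, 'explicit': 53.33, 'implicit': 17.78}
```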
TABLE 2
Kinds of Marking in Key Statements: ITA Unplanned

                  Total   % Speaker intent   % Explicit   % Implicit
Definitions          45        28.89            53.33        17.78
Illust/examples      38         7.89            68.42        23.68
Restatements         30        13.33            70.00        13.33
Identifications      37         0.00            67.57        32.43
Introductions        10        50.00            20.00        30.00
Summaries             6        16.67            33.33        50.00
Totals              166        15.66            59.64        23.49

TABLE 3
Kinds of Marking in Key Statements: ITA Planned

                  Total   % Speaker intent   % Explicit   % Implicit
Definitions          47        34.04            59.57         6.38
Illust/examples      60        21.67            73.33         5.00
Restatements         55        16.36            78.18         5.45
Introductions        16        75.00            25.00         0.00
Summaries            14        50.00            42.86         7.14
Totals              233        24.89            66.95         8.15

TABLE 4
Kinds of Marking in Key Statements: NSTA Unplanned

                  Total   % Speaker intent   % Explicit   % Implicit
Definitions           4         0.00            50.00        50.00
Illust/examples      12        33.33            50.00        16.67
Restatements         11        18.18            63.64        18.18
Introductions         0         0.00             0.00         0.00
Summaries             2         0.00           100.00         0.00
Totals               34        17.65            55.88        26.47

TABLE 5
Kinds of Marking in Key Statements: NSTA Planned

                  Total   % Speaker intent   % Explicit   % Implicit
Definitions          14        35.71            57.14         7.14
Illust/examples      23        21.74            65.22        13.04
Restatements         25        16.00            68.00        16.00
Introductions        14        71.43            28.57         0.00
Summaries             9        55.56            33.33        11.11
Totals               98        29.59            58.16        12.24

The second general research question concerns the link between
comprehensibility and discourse planning. By establishing a link between
planning and marking, on the one hand, and explicit marking
and comprehensibility, on the other, it is possible to establish an indirect
connection between planning and comprehensibility. Table 6 displays
combined scores of how the undergraduates and ESL specialists rated
the 3 sets of data. The NSTAs are not divided into planned and
unplanned because both were part of a single presentation.

TABLE 6
Combined Ratings Given to NSTAs and ITAs

                 Undergraduates     ESL specialists
                   M       SD         M       SD        t
ITA unplanned     9.56     .75      10.63     .70     8.89*
ITA planned      10.81     .86      12.23     .78     8.11*
t                 4.95*              7.70*
NS               17.73     .25      17.78     .18

*p < .01.

The NSs are indisputably rated by both groups as the more compre-
hensible and the more skilled at providing explanations. There is a less
drastic but still noticeable difference between the evaluation of the ITA
planned and unplanned presentations. Matched t tests show that this
difference is significant for both rater groups: undergraduates, t(24)
= 4.95; ESL specialists, t(9) = 7.70, p < .01. It is also interesting to note
that the scores of the 2 rater groups for the ITAs are somewhat
different, indicating that the ESL professionals, who are usually in
charge of ITA programs, may not always adequately reflect the views
of undergraduates. The ESL professionals consistently rate the ITAs
higher than the undergraduates for both planned and unplanned
presentations, as demonstrated again by matched t tests: planned, t(34)
= 8.11; unplanned, t(34) = 8.89, p < .01.
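For readers who wish to see how a matched t statistic of this kind is obtained, a minimal Python sketch follows. It assumes each rater contributes one mean score per condition, so that the degrees of freedom are the number of raters minus 1 (24 for 25 undergraduates, 9 for 10 ESL specialists). The function name and the ratings are hypothetical and are not the study's data.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a matched-pairs t test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = statistics.stdev(diffs)  # sample SD of the pairwise differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical ratings from four raters, NOT the study's data.
unplanned = [1, 2, 3, 4]
planned = [2, 4, 5, 4]
t, df = paired_t(unplanned, planned)
print(round(t, 3), df)  # 2.611 3
```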
In the post-rating interviews, raters generally judged the planned
production higher than the unplanned. In each batch of 8, raters
were asked to pick the 2 speakers who they thought were the most
effective. Planned presentations were chosen by 78%, 83%, and 67%
of the raters for the 3 batches. That the percentages are not higher
is probably because 2 of the ITAs had higher language proficiency,
which enhanced both of their
presentations. The responses to the comprehension questions, with
a few exceptions, demonstrate that the raters were at least able to
understand the main idea of the presentations, in both conditions.
Ninety percent of the ESL specialists and 92% of the undergraduates
were able to identify the main idea as stated, or nearly so, by the ITA
and NSTA speakers.
Finally, in order to determine the importance of other factors in
comprehensibility ratings, the degree of morphological and syntactic
accuracy and complexity was examined. These data are reported in
Table 7. A 2-min section from each of the ITA tapes was scored for
these features, following the method suggested by Bardovi-Harlig and
Boffman (1989). The measure of complexity is clauses/T unit. The
measure of accuracy is errors/clause. The 3 error types described by
Bardovi-Harlig and Boffman (syntactic, lexical-idiomatic, and
morphological) were combined for a general error count. Differences in
complexity were found to be significant, t(23) = 4.92, p < .01, in
contrast to differences in accuracy, which were not.

TABLE 7
Grammatical Accuracy and Complexity for ITAs

             M clauses/T unit    SD      t      M errors/clause    SD     t
Unplanned          1.20          .20   4.92*          .54          .13    ns
Planned            1.44          .14                  .62          .12

*p < .01.

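The two measures reduce to simple ratios over the coded counts for each 2-min segment. The sketch below, in Python, shows the arithmetic under the assumption that clauses, T units, and errors have already been counted by hand. The function name and the counts are hypothetical, not taken from the data.

```python
def complexity_and_accuracy(clauses, t_units, errors):
    """Return (clauses per T unit, errors per clause) for one coded segment."""
    if t_units <= 0 or clauses <= 0:
        raise ValueError("clause and T-unit counts must be positive")
    return clauses / t_units, errors / clauses


# Hypothetical counts for one 2-min segment, NOT the study's data.
complexity, accuracy = complexity_and_accuracy(clauses=36, t_units=25, errors=18)
print(round(complexity, 2), round(accuracy, 2))  # 1.44 0.5
```

A higher clauses/T unit value indicates more subordination, and a higher errors/clause value indicates lower accuracy.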
DISCUSSION
For the ITAs, the planned explanations were found to contain more
explicit marking and more of it than the unplanned explanations. They
also contained fewer unmarked key statements. Thus, Hypotheses 1a
and 1b were supported. The same difference was not found in the
planned and unplanned production of the NSTAs. There seems to be
minimal difference between the planned and unplanned conditions
for the NSTAs, at least insofar as the absolute use of marking is
concerned. There is a greater degree of explicitness used by the NSTAs
in the planned versus the unplanned condition, although it is not clear
how strong this trend is, given the small quantity of data, especially in
the unplanned condition. Unsupported was the idea, contained in
Hypotheses 2a and 2b, that NSTAs do considerably more marking
than ITAs, at least in the planned condition. This is contrary to earlier
findings by Williams et al. (1987) and Rounds (1987). Again, because
of the small amount of unplanned NSTA data, it is difficult to compare,
but it appears that the kind of marking which NSTAs and ITAs do in
the unplanned condition is also similar. In addition, Table 1 shows that
in the planned condition, the degree to which NSTAs mark their
discourse moves at all is very similar to that of the ITAs. However,
ITAs tend to be somewhat more explicit, as shown in Tables 3 and 5.
The biggest difference remains between the ITA planned and un-
planned conditions. Yet, in spite of the sometimes minimal difference
in marking and explicitness between the ITA planned and the NSTA
data and, in some cases, even the more explicit marking by ITAs,
undergraduate and ESL specialist raters understood the NSTAs far
more easily. This would indicate that the NSTAs do not need to mark
as much or as explicitly as the ITAs in order to be understood; the
NSTA presentations are easily understood without their doing so. For
the NSTAs, the lack of planning time seemed to make much less
difference in whether and how much they marked their key statements.
As NSs, they have other ways of making their presentations comprehensible.
Tyler's research (1988, 1989) certainly indicates that comprehensibility,
or lack thereof, has multiple sources. It is likely that NSTAs
choose to exploit other means of expressing themselves clearly, rather
than make extensive use of macromarkers. For ITAs, on the other
hand, the increased and more explicit use of marking appeared to
enhance comprehensibility considerably, judging by the evaluations of
both rater groups.
It is, of course, possible that there were other differences between
the planned and unplanned conditions which had little to do with
marking, namely, grammatical accuracy and complexity. As can be
seen from Table 7, it appears that differences in accuracy level cannot
explain the differences in ratings, since the 2 presentations do not
differ significantly in this respect. This is consistent with Crookes
(1989), who used error-free T units as a measure of accuracy and
found no significant differences between the 2 conditions. As regards
complexity, the planned production is indeed more complex than
the unplanned, although how this might affect comprehensibility is
unclear (but see Chaudron, 1983). Again, these results are similar
to those of Crookes, who found that on several different measures,
production in the planned condition was more complex but that the
differences did not reach significance.
Differences in phonological accuracy were not specifically measured
here and are an obvious area for further investigation. Speaking rate
has been shown to be an important factor in comprehensibility (Ander-
son-Hsieh & Koehler, 1988). The planned speech did appear to be
somewhat more rapid; T-unit counts for the 2-min coded segments
were slightly, though not significantly, higher. However, since T units
are not a measure of amount of speech, these figures only suggest a
difference. This again is an area that needs to be explored further.
However, even if these areas are shown to be of importance in compre-
hensibility, we are still left with the question of pedagogical implica-
tions. It has already been noted that modifying pronunciation is notori-
ously difficult, whereas teaching and learning the use of discourse
marking may prove far easier (see Mendelsohn, 1991–1992).
These findings suggest that ITAs need to use more explicit discourse
markers in order to overcome other comprehensibility difficulties that
may be the result of more local problems, such as pronunciation. This
also means, insofar as the use of discourse markers is concerned, that
ITAs should not necessarily be targeting NS behavior. In this instance,
they may need to go beyond it in order to achieve the same result as
the NSTAs in terms of comprehensibility. This is an area of strategic
competence that can be taught and may have an immediate effect on
undergraduates’ comprehension. ITAs, and perhaps other NNSs, can
compensate for skills that they lack by using strategies which do not
precisely mirror the behavior of NSs. This may be particularly impor-
tant for NNSs who appear to make little progress in areas such as
pronunciation, in spite of instruction. In sum, the explicit marking of
functions within explanations appears at once to have a direct impact on
comprehensibility and to be relatively easy to learn. Incorporating
instruction in the use of such marking may go some way toward alleviating
the “ITA problem” and may be usefully extended to the teaching of oral skills
to other NNSs as well.
ACKNOWLEDGMENTS
This research was supported by a grant from the Fund for the Improvement of
Post-Secondary Education. This is an expanded version of presentations made at
the American Association for Applied Linguistics (AAAL) Conference,
Washington, DC, and the Second Language Research Forum (SLRF), Eugene, OR. I would
like to thank Barbara Hoekje and Margie Berns as well as two anonymous TESOL
Quarterly reviewers for their helpful comments on earlier drafts of this paper.
THE AUTHOR
Jessica Williams is Assistant Professor of Linguistics and director of the
international teaching assistant program at the University of Illinois at Chicago.
REFERENCES
Anderson-Hsieh, J., & Koehler, K. (1988). The effect of foreign accent and speak-
ing rate on native speaker comprehension. Language Learning, 38, 561–613.
Bardovi-Harlig, K., & Boffman, T. (1989). Attainment of syntactic and
morphological accuracy by advanced language learners. Studies in Second
Language Acquisition, 11, 17–34.
Bell, A. (1984). Language style as audience design. Language in Society, 13, 145–
204.
Brown, G. (1978). Lecturing and explaining. London: Methuen.
Carrier, C., Dunham, T., Hendel, D., Smith, K., Smith, J., Solberg, J., & Tzenis,
C. (1991, April). Evaluation of teaching effectiveness of international teaching
assistants who participated in the teaching assistant program. In International
teaching assistant development at the crossroads: Interpreting evaluation data for change.
Symposium presented at the Annual Conference of the American Educational
Research Association, Chicago, IL.
Chaudron, C. (1983). Foreigner talk in the classroom—An aid to learning? In H.
Seliger & M. Long (Eds.), Classroom oriented research in second language acquisition
(pp. 127–143). Rowley, MA: Newbury House.
Chaudron, C., & Richards, J. C. (1986). The effect of discourse markers on the
comprehension of lectures. Applied Linguistics, 7, 113–127.
Crookes, G. (1988). Planning, monitoring and second language development: A review
(Tech. Rep. No. 6). Honolulu: University of Hawaii at Manoa, Center for
Second Language Classroom Research.
Crookes, G. (1989). Planning and interlanguage variation. Studies in Second Lan-
guage Acquisition, 11, 367–384.
Danielewicz, J. (1984). The interaction between text and context: A study of how
adults and children use spoken and written language in four contexts. In A.
Pellegrini & T. Yawkey (Eds.), The development of oral and written language in social
contexts (pp. 243–260). Norwood, NJ: Ablex.
Ellis, R. (1987). Interlanguage variability in narrative discourse: Style shifting in
the use of the past tense. Studies in Second Language Acquisition, 9, 1–20.
Ellis, R. (1989). Sources of intra-learner variability in language use and their
relationship to second language acquisition. In S. Gass, C. Madden, D. Preston,
& L. Selinker (Eds.), Variation in second language acquisition: Psycholinguistic issues
(pp. 22–45). Clevedon, England: Multilingual Matters.
Faerch, C., & Kasper, G. (1983). Plans and strategies in second language communi-
cation. In C. Faerch & G. Kasper (Eds.), Strategies in interlanguage communication
(pp. 20–60). London: Longman.
Gallego, J. C. (1990). The intelligibility of three non-native English speaking
teaching assistants. Issues in Applied Linguistics, 1, 219–237.
Givón, T. (1979). Understanding grammar. New York: Academic Press.
Hulstijn, J., & Hulstijn, W. (1984). Grammatical errors as a function of processing
constraints and explicit knowledge. Language Learning, 34, 23–43.
Lennon, P. (1984). Retelling a story in English as a second language. In H. Dechert,
D. Möhle, & M. Raupach (Eds.), Second language productions (pp. 50–68).
Tübingen: Gunter Narr Verlag.
Mendelsohn, D. (1991–1992). Instruments for feedback in oral communication.
TESOL Journal, 1 (2), 25–30.
Ochs, E. (1979). Planned and unplanned discourse. In T. Givón (Ed.), Syntax and
semantics: Vol. 12. Discourse and syntax (pp. 51–80). New York: Academic Press.
Preston, D. (1989). Sociolinguistics and second language acquisition. Oxford: Basil
Blackwell.
Rampton, B. (1987). Stylistic variability and not speaking “normal” English: Some
post-Labovian approaches and their implications for the study of interlanguage.
In R. Ellis (Ed.), Second language acquisition in context (pp. 47–58). Englewood
Cliffs, NJ: Prentice Hall.
Rounds, P. (1987). Characterizing successful classroom discourse for NNS teaching
assistant training. TESOL Quarterly, 21 (4), 643–672.
Sato, C. (1985). Task variation in interlanguage phonology. In S. Gass & C. Madden
(Eds.), Input in second language acquisition (pp. 181–196). Rowley, MA: Newbury
House.
Tomlin, R. (1984). The treatment of foreground and background information in
the on-line descriptive discourse of second language learners. Studies in Second
Language Acquisition, 6, 115–142.
Tyler, A. (1988). Discourse structure and coherence in international TAs’ spoken discourse.
Paper presented at the 22nd Annual TESOL Convention, Chicago, IL.
Tyler, A. (1989). Does order of ideas affect comprehensibility in nonnative discourse?
Paper presented at the 23rd Annual TESOL Convention, San Antonio, TX.
Williams, J. (1990). The training and assessment of foreign teaching assistants. Final
report submitted to the Fund for the Improvement of Post-Secondary Education
(Contract No. 116B 80057). Chicago: University of Illinois at Chicago.
Williams, J., Barnes, G., & Finger, A. (1987). FTAs: Report on a needs analysis. Paper
presented at the 21st Annual TESOL Convention, Miami Beach, FL.
Wolfson, N. (1976). Speech events in natural speech: Some implications for socio-
linguistic methodology. Language in Society, 5, 189–209.

Discourse Structure and the Perception
of Incoherence in International
Teaching Assistants’ Spoken Discourse
ANDREA TYLER
University of Florida
Work by discourse analysts shows that listeners’ interpretation of
discourse is determined not only by a speaker’s pronunciation and
grammar but also by discourse-level patterns of language use. To
date, relatively little is known about the discourse-level patterns typi-
cally found in the English of nonnative speakers, how they diverge
from discourse produced by native speakers, or how differences in
nonnative discourse patterns affect native English listeners’ under-
standing of the discourse. Using a qualitative discourse-analytic
framework, this paper compares the planned spoken English of a
native speaker of Chinese whose English discourse was perceived by
native speakers of English as difficult to follow with that of a native
speaker of U.S. English. The analyses reveal a variety of differences
in the use of discourse structuring devices, specifically in the areas of
lexical discourse markers, lexical specificity, and syntactic incorpora-
tion. It is argued that these differences in discourse-level patterns
interfere with the listeners’ ability to construct a coherent interpreta-
tion of the Chinese speaker’s discourse.
The past few years have witnessed a growing concern with the
English proficiency of international teaching assistants (ITAs) in
U.S. universities. These ITAs are advanced learners of English who
have scored high enough on the TOEFL and GRE to be admitted to
their respective universities but who nevertheless often have difficulty
communicating effectively with their English-speaking students. For
those working with such advanced learners, a persistent question is just
what aspects of the ITA’s oral production interfere with their listeners’
ability to understand the intended message. Work in discourse analysis
(e.g., Green, 1989; Gumperz, 1982a; Hatch, 1992) has established that
the listener’s understanding is affected not only by pronunciation and
grammar but also by discourse-level patterns of language use. It seems
clear that an important step in effectively addressing the “foreign TA
problem” is a more accurate understanding of how the presence and/
or absence of particular discourse-level patterns in ITA speech may
affect the native listener’s ongoing interpretation of the discourse.
Recent pedagogical work has recognized that use of discourse structur-
ing cues affects comprehensibility (Davies, Tyler, & Koran, 1989; Pica,
Barnes, & Finger, 1990). For instance, Pica et al. (1990) encourage
ITAs to attend to discourse structuring cues in order to improve the
comprehensibility of their spoken discourse. However, studies aimed
at identifying these discourse structuring cues and how they may affect
native listener understanding are few (Bardovi-Harlig & Hartford,
1989; Tyler, 1988; Tyler, Jefferies, & Davies, 1988). This paper reports
on a qualitative discourse analysis the purpose of which is to begin to
identify those elements.
PREVIOUS RESEARCH
Research on U.S. English speakers’ discourse in an academic setting
(Biber, 1988; Chafe, 1982; Danielewicz, 1984; Griffin & Mehan, 1979)
indicates that native-speaker speech contains a number of devices that
orient the listener to the relative importance of the ideas within a
discourse and simultaneously convey the interrelationships among
those ideas. Some of the discourse structuring cues which have been
identified are lexical discourse markers, patterns of repetition, pros-
ody, anaphora (e.g., patterns of ellipsis and pronominalization includ-
ing demonstrative pronouns), and use of syntactic incorporation
(hypotactic constructions) versus use of simple clauses conjoined by
coordinating conjunctions (paratactic constructions). These informa-
tion structuring devices provide native listeners with a set of cues
which facilitates their construction of a coherent interpretation of the
discourse. Among other things, discourse structuring cues often signal
logical and prominence relations among the ideas being expressed.
Work in contrastive discourse analysis demonstrates that languages
vary in their use of information structuring cues, distributing the sig-
naling load differently among various levels of linguistic structure
(Connor & Kaplan, 1987; Gumperz, 1982b; Li & Thompson, 1981;
Scollon & Scollon, 1981). Thus, a potential source of cross-linguistic/
cross-cultural miscommunication is the failure of the speaker to use
information structuring cues in ways that match the listener’s expecta-
tions. To date, we have little information about how ITAs use these
cues in their English speech.
Some of the information structuring cues identified in native English
speakers’ discourse have been investigated in an L2 context. Chaudron
(1983) examined effects of repetition, conditionals, and synonyms on
L2 listeners’ recall and recognition of topic restatements. Chaudron
and Richards (1986) examined the effect of manipulating macrolevel
organizational phrases (those which give information about the global
structure of the discourse) and microlevel discourse markers (those
which give information concerning more local, intersentential rela-
tions) on L2 listeners’ comprehension of English lectures. Both these
studies provide important information about the effects of repetition,
synonyms, and lexical discourse markers on L2 listeners’ comprehen-
sion. However, they do not provide information on L2 speakers’ pro-
duction of these cues. The present study investigates L2 speakers’
production of several discourse structuring cues in comparison to na-
tive speakers’ production and presents a qualitative analysis of the
potential effects of these discourse-level patterns on native English
listeners.
A range of problems in ITAs’ production of discourse has been
noted over the years (Bailey, 1983, 1984a, 1984b; Hinofotis & Bailey,
1980; Sadow & Maxwell, 1983; Tyler, 1988; Tyler, Jefferies, & Davies,
1988). For instance, Hinofotis and Bailey (1980) examined the effects
of pronunciation, vocabulary, grammar, and speech flow on under-
graduates’ ratings of videotaped samples of ITAs’ discourse. Bailey
(1983) found a significant correlation between ratings by undergraduates
and ITAs’ level of oral proficiency; ITAs who scored 1+ or lower
on the FSI (Foreign Service Institute) oral interview tended to receive
lower ratings for teaching. Of course, a rating of 1+ on the FSI oral
interview indicates a variety of problems with grammar, vocabulary,
and pronunciation. Tyler (1988) and Tyler et al. (1988) examined the
discourse of Chinese and Korean ITAs whose speech had been judged
as difficult to follow and found that these speakers used lexical dis-
course markers (both macro and micro) in nonnativelike ways, rarely
used relative clauses and other forms of syntactic incorporation, and
had nonnativelike prosodic patterns.
Although several of the problems identified in these studies overlap
with elements examined here, the orientation differs in important
ways. Most of the previous work on ITAs’ discourse has tended to
quantify grammatical errors and examine the relationship between
number of errors and ratings of teaching effectiveness. The present
study differs from earlier work in its attempt to clarify how the presence
(or absence) of particular linguistic patterns affects an ongoing inter-
pretation of the discourse. This analysis is not concerned with gram-
matical or pronunciation errors in general; rather, the analysis specifi-
cally focuses on aspects of the linguistic code which signal logical and
prominence relationships. For instance, the only pronunciation-related
aspect considered is the function of prosody as a discourse structuring
device. Many of the problems noted in this analysis are not technically
errors or would not be perceived as errors if the sentence were consid-
ered in isolation, for example, the employment of a series of indepen-
dent clauses conjoined by coordinating conjunctions versus employ-
ment of syntactic incorporation. Thus, the analysis examines the effects
of certain elements of the linguistic code overlooked in past analyses.
METHOD
Procedure
The current study presents a qualitative discourse analysis of the
planned spoken English discourse of one Chinese graduate student,
following the methodology developed by Gumperz, Jupp, and Roberts
(1979) and Gumperz (1982a, 1982b) in their studies of interethnic
miscommunication. The methodology uses data from (a) the text of
the discourse in question, (b) information concerning speaker intent
gathered through discussions with the speaker, and (c) information
gathered from native speakers of English acting as audience. This
discourse is then compared to the discourse produced under similar
circumstances by a native speaker of U.S. English.
The analysis takes as its basic framework the theory of conversational
inference articulated by Gumperz (1982a, 1982b): Participants in a
linguistic exchange depend on a complex constellation of cues from
all levels of linguistic structure to construct an ongoing interpretation
of the discourse. Within a given context, listeners expect a constrained
pattern of cues to help guide them through the discourse. When these
discourse expectations are violated, communication problems arise.
The Speakers
Both speakers involved in the study were members of training classes
for prospective TAs. As part of the regular course work, speakers were
asked to prepare and give a brief introductory lecture on a topic in
their area of specialization aimed at an undergraduate class of native
speakers of U.S. English. The presenters were instructed to keep in
mind that the listeners would not be experts in the presenter’s field.
Neither speaker had any previous university teaching experience. The
lectures were videotaped.
The Experimental Audience
In order to obtain judgments concerning the comprehensibility of
the two texts which were not influenced by nonnativelike pronunciation
or knowledge that the discourse was produced by a nonnative speaker,
it was decided to present the two texts to a new audience using a
method which removed nonnativelike pronunciation. Therefore, a
transcript of each of the two lectures was read by a native speaker of
English to 15 native speakers of U.S. English who were graduate stu-
dents in linguistics. These native speakers were asked to listen to the
texts as if they were class lectures and take notes on the material. In
addition, they were asked to describe their overall impression of the
lectures. They were not informed about the language background
of the original speakers. The listeners did not read any background
material pertaining to either of the lectures.
A persistent problem in ITA studies and training programs is lack
of access to audiences which match the background knowledge and
motivation of students in an ongoing class. Many ITA programs at-
tempt an approximation, as in the present study, by using two or three
native-English-speaking students as audience members. Although the
audience members do not provide a perfect match, the reactions of
these native speakers of English to the two texts can give us many
insights into which aspects of the nonnative speakers’ discourse cause
difficulty in establishing an interpretation. An important next step
will be to investigate the reactions of audiences whose background
knowledge more closely matches that of the intended audience.
RESULTS AND DISCUSSION
The Chinese Speaker’s Discourse
The speaker whose discourse is examined here is a native speaker
of Chinese originally from Taiwan. At the time the discourse was
produced, he was a graduate student in traffic engineering at a U.S.
university, enrolled in a training class for prospective ITAs. All stu-
dents in the class were asked to take a SPEAK (Speaking Proficiency
English Assessment Kit) test, which is a retired version of the Test of
Spoken English (TSE) developed by the Educational Testing Service.
The test is scored for pronunciation, grammar, fluency, and compre-
hensibility. Comprehensibility represents a more general assessment
of the test taker’s communicative ability and is based to an extent on
the other three components. Comprehensibility is scored on a 300-
point scale. This student scored 216 on the comprehensibility compo-
nent. A score of 220 has been adopted by the State of Florida as the
minimal score for international students to be employed as graduate
teaching assistants. A score of 216 indicates that the speaker will still
have a number of problems with grammar, pronunciation, and vocabulary.
A transcript of the initial portion of the lecture appears in
Figure 1.
FIGURE 1
The Chinese Speaker’s Transcript
A. 1. today our topic is introduction to the traffic signal 2. aaa we will talk about five
things 3. the first one is 4. when and where you should have traffic signal 5. and when
and where we should not have traffic signal 6. so that’s the first one on your handout the
warrants for traffic signal installation 7. and then we will see 8. what kind of traffic signal
available in present days 9. also we will see the main equipment for traffic signals 10.
after that we will talk about the major elements for a design a traffic signal and the tool
for design
B. 11. O.K. first of all let’s see the warrants for traffic signal installation 12. there is a
book 13. call called Manual on Uniform Traffic Control Devices 14. any traffic signal used in
the public roadway in the United States its color size shape lighting composition whatever
15. should conform with that manual 16. so aaa I put the first warrant on your handout
17. let’s see the first case of of this warrant 18. that’s for a volume
Note. The discourse is divided into two sections labeled A and B for ease of exposition and
does not indicate any theoretical claims in terms of the existence of the paragraph as
a unit in oral discourse. The native speaker who read the transcript aloud read from
a version of the transcript which used conventional punctuation; thus, the reader did
not attempt to imitate the stress, pausing, and intonation as originally produced by the
Chinese speaker. Numbers represent syntactically defined clauses.
The original audience for the Chinese speaker was composed of the
ESL instructor, the other international students in the ITA training
course, and 3 additional native speakers of U.S. English. The discourse
struck these listeners as circuitous and difficult to follow. Of the 15
additional English listeners who heard the transcript read by a native
speaker, not one was able to catch more than three of the five points
enumerated in the section labeled A. When asked in an open-ended
question for their general assessment of the discourse, they used such
descriptors as muddled, rambling, and speaker seemed unsure of his main
point. None of the 15 described the discourse in terms synonymous
with clear or easy to follow. The analysis which follows sheds some light
on this assessment. Although there are problems at many linguistic
levels which should be dealt with in a complete analysis, here I will
concentrate only on lexical discourse markers, lexical specificity, syntactic
incorporation, and prosodics (which will be addressed briefly in the
section on interactive effects).
Lexical Discourse Markers
When lexical discourse markers match native listener expectations,
they act as explicit announcements about how the listener should incorporate
information into the overall discourse (Keller, 1979). In this
discourse, the speaker uses lexical discourse markers in an unexpected,
nonparallel manner. As a preorganizer (in Clause 2), he announces
that there will be 5 subtopics and (in Clause 3) introduces the initial
subtopic with the phrase the first one. However, the speaker then
abandons the strategy, using instead the sequential marker and then
(in Clause 7) to introduce the second main point, which violates the
established expectation that the organizational schema will be a
numerical one. Perhaps more importantly, the speaker mixes the
sequential markers (and then in Clause 7 and after that in Clause 10)
with simple additive markers (also in Clause 9 and and in Clause 10).
In this context, the additive markers give ambiguous signals. It is
not clear if they are signaling the elaboration of an already established
topic or the introduction of a new, major point. This is particularly
apparent in the 10th clause where it is unclear if the phrase and the
tool for design should be interpreted as an elaboration of the major
elements for designing traffic signals or as a separate topic. (The
speaker’s intention was to indicate that the tool for traffic design was
the fifth major point.)
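The mixing of marker types described above can be made concrete with a small tally. The following is only an illustrative sketch, not part of the study's method: the marker list is transcribed from the analysis of Section A in Figure 1, while the type labels as data values and the helper names `strategy_runs` and `count_switches` are assumptions introduced here for illustration.

```python
from itertools import groupby

# Markers discussed in the analysis of Section A (Figure 1), recorded as
# (clause number, marker, marker type). The type labels follow the prose;
# encoding them this way is an illustrative assumption.
MARKERS = [
    (3, "the first one", "numerical"),
    (7, "and then", "sequential"),
    (9, "also", "additive"),
    (10, "after that", "sequential"),
    (10, "and", "additive"),
]


def strategy_runs(markers):
    """Collapse consecutive markers of the same type into runs."""
    return [
        (kind, [marker for _, marker, _ in group])
        for kind, group in groupby(markers, key=lambda item: item[2])
    ]


def count_switches(markers):
    """Count how often the marker type changes between adjacent markers."""
    return max(len(strategy_runs(markers)) - 1, 0)


runs = strategy_runs(MARKERS)
switches = count_switches(MARKERS)

for kind, items in runs:
    print(f"{kind}: {', '.join(items)}")
print(f"strategy switches: {switches}")
```

On this encoding, every marker after the first belongs to a different type from the one before it, which is one way of picturing why listeners could not tell a new major point from an elaboration.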
Lexical Specificity
The overarching notion of lexical specificity is that the referent in
the discourse should be sufficiently identified to avoid confusion for
the audience. A number of devices have been identified as helping
provide lexical specificity: pronominalization, certain patterns of adjec-
tival modification, repetition, and appropriate lexical choice (Halliday
& Hasan, 1976; Johnstone, 1987; for a fuller theoretical discussion of
the issues surrounding lexical specificity and lexical cohesion, see
Green, 1989, and Tyler, 1992). Moreover, because any one word can
have a wide range of meanings, within any particular exchange, inter-
locutors must establish a mutual interpretation of key lexical items.
This often includes establishing a set of context-specific synonyms
(Tyler, 1992). When introducing a synonym, a speaker can provide
explicit links to signal the listener that, in this particular exchange, this
new lexical item should be taken to have the same interpretation as the
established lexical item. One way of establishing such links is repetition
of modifying phrases which were previously linked to a lexical item
whose interpretation is taken to be established.
To see the effect of the lack of establishing lexical specification in the
Chinese speaker’s discourse, consider the general comprehensibility of
both paragraphs. The first seems relatively easy to understand, whereas
the second emerges as a series of non sequiturs. In part, the second
paragraph is more difficult to follow because of the expectations set by
the introductory paragraph in which the speaker announced that the
first topic would deal with where and when traffic signals should be
installed (in Clauses 4 and 5); when the speaker returns to an expansion
of the first topic, the listener expects a phrase signaling location (the
where and when) of traffic signals. Thus, the announcement that the
first topic will be warrants for traffic signal installation (in Clause 11)
is unexpected and confusing. This is followed by a declarative statement
that a Manual on Uniform Traffic Control Devices exists (Clauses 12
and 13). The speaker offers no overt link between the warrants and
this manual. Then come Clauses 14 and 15: Any traffic signal used in the
public roadways in the United States, its color, size, shape, lighting, composition,
whatever, should conform with that manual, which at best seem only tangen-
tially related to the preceding discourse.
One of the reasons there seems to be so little connection between
this sentence and the rest of the discourse is that the list of signal
characteristics given here deals exclusively with a physical description
of signals, whereas the previous discourse focuses on the regulations
concerning placement of traffic signals or the undefined warrants.
Without a phrase such as the where and when of traffic signal installation
which explicitly provides a link between the warrants, the contents
of the manual, and the surrounding discourse, the connections are
obscure at best. Finally, in Clause 16, warrant is again mentioned but
its relation to the manual and the physical description of traffic
signals remains obscure. Even if the listener inferred the initial link
between the where and when of traffic signal installation and the
warrants for traffic signal installation (which is weakly made in Clauses
4, 5, and 6) there is an important switch in referring phrase between
the mention of warrant in Clause 11, let’s see the warrants for traffic
signal installation, and the phrase in Clause 16, so I put the first warrant
on your handout. Normal interpretation is not that warrants and the
first warrant are coreferential. Subsequent conversations with the
speaker revealed that to him, the where and when of traffic signal
installation is synonymous with the first warrant and that the warrants
are synonymous with the information contained in the manual. But
to the listener who does not start with this presupposition, the change
in terminology is perceived as a change in reference, not cohesive
paraphrases. Moreover, the phrase warrants for traffic signal installation
appears to have two interpretations—both the general list of regula-
tions governing all aspects of traffic signals and the more specific
interpretation of regulations governing location for traffic signal
installation. The lack of lexical specification results in the impression
that much of the discourse consists of disconnected detail.
Syntax
Considering the syntactic relationships among clauses, note that of
the 6 sentences in the second paragraph, 4 are simple clauses. Studies
by Biber (1988), Chafe (1982), Danielewicz (1984), Lakoff (1984), and
Schachter (1973) have indicated that native English speakers use rela-
tive clauses, complements, and other subordinate structures as impor-
tant information organizing devices to provide cues about prominence,
focus, and logical relations.
Heavy reliance on coordinate conjunction and juxtaposition in lieu
of syntactic incorporation essentially strips the discourse of important
sources of information regarding prominence and logical relation-
ships. This is the case in the discourse under consideration. For exam-
ple, the relationship between the 11th clause, which announces that
the topic is warrants for traffic control, and the 12th clause, which
announces the existence of the manual, is particularly obscure. These
seemingly disjoint pieces of information are expressed in two separate
“sentences” which the listener tends to interpret as being equal in
prominence and centrality to the argument. The logical and hierarchi-
cal relationships between these clauses (which the speaker claimed he
intended) must be inferred on the basis of juxtaposition. In this case
the context is not sufficiently rich for the English listener to establish
a meaningful connection between the warrants and the existence of
the manual. However, when the clauses are linked through syntactic
incorporation, a more interpretable connection emerges: let’s see the
warrants for traffic signal installation which are found in the book called
Manual on Uniform Traffic Control Devices. The pattern of weak inter-
clausal connection is repeated throughout the discourse. The resulting
impression is that the speaker is wandering or continually digressing
from the main point.
Interactive Effects
Thus far, we have considered each discourse structuring device
in isolation. However, there are often multiple, interacting problems
within a single clause. The final clause illustrates interactive effects:1
[Figure: prosodic transcription of the final clauses, showing pauses, nuclear stress, and intonation contours]
1 / = pause; number of slashes represents relative duration (i.e., more slashes equals longer
duration). Nuclear stress is also marked. Numbers represent syntactically defined clauses.
Contour lines represent intonation curves.
Clause 18 is one of the most difficult to integrate into the discourse.
In part, this is because of the unspecified use of the word volume.
This is the first time the notion of volume has been presented. The
immediately preceding discourse (Clauses 14 and 15) deals with a list
of the physical properties of traffic signals. The expectation is that the
first warrant will deal with something like physical specifications for
traffic signals. The listener is thus surprised by the unexpected an-
nouncement that the first warrant deals with some kind of never-
before-mentioned volume. In terms of anaphora or pronominal refer-
ence, it is not entirely clear what the pronoun that is referring to. The
most plausible explanation would seem to be that the clause is giving
information which modifies or further defines the first warrant in Clause
17. Syntactically, the expected structure to signal this relationship is
the relative clause. The choice of the pronoun that precludes the rela-
tive clause interpretation because this clause is a nonrestrictive relative,
and hence the use of that is unacceptable.
Moreover, prosodic cues indicate that the clause is independent. The
falling intonation pattern used by the Chinese speaker on the final
word of the preceding clause (17), warrant, signals the listener that the
sentence has ended. This interpretation is reinforced by the long pause
preceding the final clause and the high pitch on that’s. According to
Hirst (1986), English uses high pitch to signal the beginning of a new
idea. The heavy stress placed on that’s suggests that the pronoun is
working as the subject of a new sentence. In sum, the interpretation
of this clause is made extremely difficult because of a number of
conflicting or ambiguous cues. The prosodic and syntactic cues point
to the interpretation that the clause is an independent sentence. How-
ever, in terms of semantics, a relative construction seems most appro-
priate. Finally, the lack of lexical specification leaves the lexical inter-
pretation murky.
The Native Speaker’s Discourse
We turn now to an analysis of the planned spoken discourse produced
by a native speaker of U.S. English. The transcript in Figure 2
is an excerpt from his first videotaped teaching demonstration.
The lesson is aimed at first-year university students in an introduc-
tory biology class. The speaker started the lecture with a brief demon-
stration concerning the diffusion of perfume molecules into the air.
He refers to the process by which the perfume molecules spread in his
first sentence with this process.
A transcript of the English speaker’s lecture was read to the same 15
subjects who heard the Chinese speaker’s discourse. All 15 listeners
were able to list the four major points enumerated by the native English
speaker. Terms such as well-organized and clear were used to describe
the discourse. Although two of the listeners complained about the
lecture being dry, none described it in terms synonymous with confusing
or incoherent.
FIGURE 2
The Native English Speaker’s Transcript
1. what I’d like to go over with you today 2. is this process 3. and how it relates to cell
structure and function alright uh 4. there are four ways 5. that that substances can pass
into and out of the cell 6. the first 7. which we’re going to discuss today is diffusion 8.
and that’s just the process of scattering or spreading of of small particles from an area of
greater concentration to an area of lesser concentration 9. in the case of the perfume you
have a concentrated perfume 10. so it’s going from greater concentration to an area of
lesser concentration of perfume 11. the second means of getting substances into and out
of cells ah is called osmosis 12. and we’ll discuss that later on probably this week 13. another
way is active transport 14. which is a process 15. which requires energy 16. and
and it usually goes against the gradient 17. in this situation (points to the perfume bottle)
everything is going from greater to lesser 18. in the the case of active transport you go
from an ah an area of lesser concentration against the ah concentration gradient 19. and
that requires ah energy 20. but we’ll discuss those this also later this week 21. and the ah
fourth is called phagocytosis and ah plandicytosis 22. and they are two processes of simply
engulfing 23. ahum if ah if if ah something if a drop or a particle were outside 24.
and the cell involved the a a the membrane to surround and engulf it 25. that’s what
these are two despicable processes of the cell
Note. Numbers represent syntactically defined clauses.
The overall discourse structure and organizational strategy used in
the English speaker’s discourse are strikingly similar to those used in
the Chinese speaker’s discourse. Notice that, like the Chinese speaker,
the native speaker begins by announcing the general topic, then an-
nounces the number of subpoints to be covered. In both cases, the
introductory announcement is followed by an enumeration of the
subpoints. Both speakers begin with a numerical strategy for alerting
the listeners to each of the subpoints. Despite the similarity in global
structure, the native-speaker discourse struck the English listeners as
being relatively easy to follow, whereas the discourse of the nonnative
speaker was perceived as being, as one native listener said, “rambling
and scrambling.” Close analysis reveals that the native speaker’s use of
discourse structuring cues is quite different from that of the Chinese
speaker.
Lexical Discourse Markers
The native speaker uses a consistent numerical strategy as seen in
Clauses 6 (first), 11 (second), and 21 (fourth). Notice that in Clause 13
the native speaker does switch from the numerical strategy, introducing
the new point with another way. However, this switch is not confusing
(as was the switch in strategy in the Chinese speaker’s discourse) because
of the speaker’s use of a parallel syntactic structure to introduce each
of the subpoints as well as his repetition of the noun way in the phrase
another way. This last point will be elaborated in the discussion of lexical
specificity. Moreover, the native speaker returns to the numerical strat-
egy when he introduces the fourth point (in Clause 21). Recall that
once the Chinese speaker abandoned the numerical strategy, he did
not resume it. In addition to the consistent numerical strategy, in
Clauses 17 and 18 the native speaker uses the phrases In this situation
and In the case of to focus attention on contrasting information. Finally,
the native speaker’s use of so in Clause 10 conveys the expected sense
of logical conclusion or result (Halliday & Hasan, 1976), linking the
information in the 9th clause with that in the 10th. This is in contrast
to the Chinese speaker’s nonnativelike use of so to link the information
presented in Clauses 14 and 15 with the information in 16.
Lexical Specificity
The discourse produced by the native speaker contains several repetitions
at both the word and the phrase levels which are lacking in the
Chinese speaker’s discourse. The native speaker’s repetitions work to
specify and establish discourse-situated interpretations of key lexical
items and give directions on how the information should be incorpo-
rated into the ongoing discourse.
Consider the specification and linking of the lexical items way, process,
and means in this discourse. The speaker begins (in Clause 4) by estab-
lishing a narrow, context-specific interpretation of way: ways that sub-
stances pass into and out of cells. In the following sentence (Clause 6), way
does not appear but is implied in the first [way] which we’re going to discuss
today is diffusion. The syntactic positioning of way in Clause 4 places it in
the controlling, focus position (Ehrlich, 1988; Fox, 1987) and therefore
makes it the priority candidate for anaphoric reference (either pro-
nominal or elliptical) in Clause 6. In the next sentence (that’s just the
process of. . . ), way is linked with process, establishing that way and process
should be interpreted as synonyms in this discourse. In Clause 11,
means is explicitly established as a synonym of way and process through
repetition of the phrase of getting substances into and out of cells. In Clauses
13 and 14, the speaker explicitly repeats the links between way and
process in another way is active transport which is a process.
In part, it is this explicit establishment of the context-situated inter-
pretation of way which allows the listener to easily interpret another way
in Clause 13 as introducing the third major point even though the
speaker does not continue with the established numerical strategy.
Compared with the native speaker’s discourse, the Chinese speaker’s discourse
is almost totally lacking in lexical specification and repetition as a
discourse structuring device. The
Chinese speaker does not effectively establish a mutual interpretation
of key terms (such as warrant) and fails to provide links between in-
tended synonyms (e.g., when and where you should have traffic signal
installation and warrants for traffic signal installation).
One place where phrasal repetition does occur is in Clauses 4 and 5,
where and when we should have traffic signal and where and when we should
not have traffic signal. However, this repetition does not serve to link
any new concepts or lexical items. This kind of immediate repetition
seems to signal special focus or contrast which is lacking in the content.
Syntax
The importance of syntactic incorporation in making logical and
prominence relations explicit has already been established here and in
previous work (Biber, 1988; Lakoff, 1984; Schachter, 1973; Tyler,
Jefferies, & Davies, 1988). A number of instances of syntactic incorpo-
ration occur in the native speaker’s discourse. To see how they signal
prominence and logical relations, we will consider Clauses 21 through
25 (see Figure 2).
If we convert these complex structures into simple clauses conjoined
by and or simply juxtaposed, the argument becomes much less explicit:
2. And the fourth is called phagocytosis and plandicytosis. They are two
processes and they simply engulf. Something, a drop or a particle is
outside and the cell involves the membrane and the membrane surrounds
and engulfs it. That’s these processes and they are two despicable pro-
cesses of the cell.
Table 1 shows the frequency of the differences in the use of discourse
structuring devices in the two speakers’ texts.2
2 The observed tendency of native English speakers to use more discourse structuring devices
is supported by a quantitative analysis of 40 additional texts produced by native speakers of
Korean and Chinese along with 6 additional texts produced by native speakers of U.S.
English. All the subjects were graduate students enrolled in training classes for potential
teachers. None had any previous teaching experience in U.S. universities. All were asked as
a part of their regular coursework to prepare a brief introductory lecture on a subject in
their major area. The teaching presentations were videotaped and subsequently transcribed.
Thus, the general task assigned the subjects, the level of their teaching experience and level
of expertise in their respective fields are highly parallel.
A Hotelling-Lawley Trace test was performed on the data. The results indicated that the
differences in the amounts of syntactic complexity used by the two groups were significant,
F(3, 40) = 47, p < .0001. However, in spite of the high level of significance, these statistical
results must be viewed with some caution as the number of native speakers is quite small.
The important point here is that the native speakers in this study used more relative clauses,
complements, and subordinations than did the nonnative speakers. This finding parallels
the finding of the qualitative analysis and lends empirical support to the hypothesis that the
discourse of the Chinese speaker did not signal the intended logical and prominence relations.

TABLE 1
Use of Discourse Structuring Devices

Device                  Native English speaker    Chinese speaker
Relative clauses                  5                      2
Complements                       7                      3
Other subordination               2                      0
Total                            14                      5
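As a quick arithmetic check on Table 1, the column totals and the overall ratio between the two speakers can be recomputed from the row counts. This is a minimal sketch only: the counts are copied from the table, and the dictionary layout and the names `TABLE_1`, `totals`, and `ratio` are assumptions made for illustration rather than anything used in the study.

```python
# Row counts copied from Table 1; the keys "native" and "chinese" are
# illustrative labels for the two speakers.
TABLE_1 = {
    "Relative clauses": {"native": 5, "chinese": 2},
    "Complements": {"native": 7, "chinese": 3},
    "Other subordination": {"native": 2, "chinese": 0},
}

# Sum each speaker's column across the three device types.
totals = {
    speaker: sum(row[speaker] for row in TABLE_1.values())
    for speaker in ("native", "chinese")
}

# Ratio of subordinating devices, native speaker to Chinese speaker.
ratio = totals["native"] / totals["chinese"]

print(totals)
print(f"native-to-Chinese ratio: {ratio:.1f}")
```

The ratio is a raw comparison of counts in two texts and is not normalized for text length, so it should be read as descriptive only.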
CONCLUSION
This study has presented a qualitative discourse analysis of two texts.
Using an integrative discourse framework, we have found several dis-
tinct patterns of difference between the English discourse produced
by a native and a nonnative speaker in the areas of lexical discourse
markers, lexical specificity, and syntactic incorporation.
At first glance, it may appear that some of the findings in this study
are at variance with earlier findings. For instance, Chaudron (1983)
found that synonyms were not a particularly helpful device in aiding
L2 listeners’ recall and recognition of topic reinstatements. The present
analysis argues that establishment of a synonym set is an important
discourse structuring device in native-speaker discourse. Chaudron
and Richards (1986) found that the addition of microlevel, intersenten-
tial connectors did not improve L2 listeners’ comprehension of English
lectures. Here it is argued that cuing of logical and prominence rela-
tions at the microlevel (through syntactic incorporation and lexical
discourse markers) does make an important contribution to compre-
hensibility. A crucial difference between the present study and the two
earlier studies is that the present study considers the potential effect
of these discourse structuring cues on native listeners, whereas the
earlier studies consider their effects on L2 listeners. The fact that L2
listeners seem not to use these cues in aiding their listening comprehen-
sion is arguably consistent with their failure to produce similar dis-
course structuring cues in their own spoken discourse.
In terms of pedagogy, the analysis suggests that discourse-based
differences contribute substantially to communication problems en-
countered by advanced language learners and points to the need to
develop teaching materials and techniques which train L2 speakers in
nativelike use of discourse structuring devices. One relatively straight-
forward application of the present findings is to ask students to com-
pare native- and nonnative-speaker discourse, such as the excerpts
presented here, with an eye towards specific discourse structuring cues.
For instance, students can be asked to identify sets of synonyms in the
native-speaker discourse in order to focus their attention on the uses
of paraphrase and repetition. Once they have been introduced to
this pattern in native-speaker discourse, they can be asked to add
repetitions and paraphrases to discourse which lacks this discourse
structuring cue. The next step is to analyze transcripts of their own
discourse for the occurrence of paraphrase and repetition and make
appropriate modifications. My students have used this exercise success-
fully both as individual homework and as a cooperative, small-group
activity in which three or four students analyze and modify each other’s
transcripts.
Finally, in spite of the difficulties listeners had in comprehending
the discourse of the nonnative speaker in this study, he obtained a
score on the SPEAK test which is very close to the level needed to be
allowed to teach. The present findings suggest that general language
proficiency tests, such as the SPEAK test, are likely to be inadequate as
the sole assessment for determining the readiness of a nonnative
speaker to provide comprehensible academic discourse. At the very
least, this study points to the need to include some additional measure
of appropriate discourse structuring cues as a relevant variable in
assessing the English skills of potential ITAs.
ACKNOWLEDGMENTS
A version of this paper was presented at the 22nd Annual TESOL Convention in
Chicago, 1988. I wish to thank Catherine Davies, Georgia Green, and Ann Jefferies
for their comments on earlier versions of this paper. I also thank two anonymous
reviewers whose careful reading and comments resulted in a more insightful final
analysis.
THE AUTHOR
Andrea Tyler is Assistant Professor of Linguistics and coordinator of the training
program for international teaching assistants at the University of Florida. Her
work on discourse analysis has appeared in English for Specific Purposes, Studies in
Second Language Acquisition, TEXT, and World Englishes; her work in psycholinguis-
tics has appeared in Memory and Language and Cognition.
REFERENCES
Bailey, K. (1983). Foreign teaching assistants at U.S. universities: Problems in
interaction and communication. TESOL Quarterly, 17 (2), 308–310.
Bailey, K. (1984a). The “foreign TA problem.” In K. Bailey, F. Pialorsi, & J.
Zukowski/Faust (Eds.), Foreign teaching assistants in U.S. universities (pp. 3–15).
Washington, DC: NAFSA.
Bailey, K. (1984b). A typology of teaching assistants. In K. Bailey, F. Pialorsi, &
J. Zukowski/Faust (Eds.), Foreign teaching assistants in U.S. universities (pp. 110–
125). Washington, DC: NAFSA.
Bardovi-Harlig, K., & Hartford, B. (1989). Speaking out of turn: Negotiating potentially
disruptive speech acts. Paper presented at the 23rd Annual TESOL Convention,
San Antonio, TX.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge
University Press.
Chafe, W. (1982). Integration and involvement in speaking, writing, and oral
literature. In D. Tannen (Ed.), Spoken and written language: Exploring orality and
literacy (pp. 35–53). Norwood, NJ: Ablex.
Chaudron, C. (1983). Simplification of input: Topic reinstatements and their
effects on L2 learners’ recognition and recall. TESOL Quarterly, 17 (3), 437–458.
Chaudron, C., & Richards, J. (1986). The effect of discourse markers on the
comprehension of lectures. Applied Linguistics, 7, 113–127.
Connor, U., & Kaplan, R. (1987). Writing across language. Reading, MA: Addison-
Wesley.
Danielewicz, J. (1984). The interaction between text and context: A study of how
adults and children use spoken and written language in four contexts. In A.
Pellegrini & T. Yawkey (Eds.), The development of oral and written language in social
contexts (pp. 243–260). Norwood, NJ: Ablex.
Davies, C., Tyler, A., & Koran, J. (1989). Face-to-face with native speakers: An
advanced training class for international teaching assistants. English for Specific
Purposes, 8, 139–153.
Ehrlich, S. (1988). Cohesive devices and discourse competence. World Englishes, 7,
111–118.
Fox, B. (1987). Discourse structure and anaphora. Cambridge: Cambridge University
Press.
Green, G. (1989). Pragmatics and natural language understanding. Hillsdale, NJ:
Lawrence Erlbaum.
Griffin, P., & Mehan, H. (1979). Sense and ritual in classroom discourse. In F.
Coulmas (Ed.), Conversational routine: Explorations in standardized communication
situations and prepatterned speech (pp. 187–214). The Hague: Mouton.
Gumperz, J. (Ed.). (1982a). Language and social identity. Cambridge: Cambridge
University Press.
Gumperz, J. (1982b). Discourse strategies. Cambridge: Cambridge University Press.
Gumperz, J., Jupp, R., & Roberts, C. (1979). Crosstalk: A study of cross-cultural
communication. London: National Center for Industrial Language Training in
association with the British Broadcasting Corporation.
Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hatch, E. (1992). Discourse and language education. Cambridge: Cambridge Univer-
sity Press.
Hinofotis, F., & Bailey, K. (1980). American undergraduates’ reactions to the
communication skills of foreign teaching assistants. In J. Fisher, M. A. Clarke,
& J. Schachter (Eds.), On TESOL ’80: Building bridges: Research and practice in
teaching English as a second language (pp. 120–133). Washington, DC: TESOL.
Hirst, D. (1986). Phonological and acoustic parameters of English intonation. In
A. Johns-Lewis (Ed.), Intonation in discourse (pp. 19–33). London: Croom Helm.
Johnstone, B. (Ed.). (1987). Repetition [Special issue]. Text, 7(3).
Keller, E. (1979). Gambits: Conversational strategy signals. In F. Coulmas (Ed.),
Conversational routine: Explorations in standardized communication situations and
prepatterned speech (pp. 93–114). The Hague: Mouton.
Lakoff, R. (1984). The pragmatics of subordination. In C. Brugman & M. Macauley
(Eds.), Proceedings of the 10th Annual Meeting of the Berkeley Linguistics Society (pp.
472–480). Berkeley, CA: Berkeley Linguistics Society.
Li, C., & Thompson, S. (1981). Mandarin Chinese: A functional reference grammar.
Berkeley, CA: University of California Press.
Pica, T., Barnes, G., & Finger, A. (1990). Teaching matters: Skills and strategies for
international teaching assistants. New York: Newbury House.
Sadow, S., & Maxwell, M. (1983). The foreign teaching assistant and the culture
of the American university class. In M. A. Clarke & J. Handscombe (Eds.), On
TESOL ’82: Pacific perspectives on language learning and teaching. Washington, DC:
TESOL.
Schachter, P. (1973). Focus and relativization. Language, 49, 19–46.
Scollon, R., & Scollon, S. (1981). Narrative, literacy, and face in interethnic
communication. Norwood, NJ: Ablex.
Tyler, A. (1988). Structure and coherence in foreign TAs’ spoken discourse: An integrated
discourse analysis. Paper presented at the 22nd Annual TESOL Convention,
Chicago, IL.
Tyler, A. (1992, January). Lexical cohesion and discourse structure: A re-examination.
Paper presented at the 66th Annual Meeting of the Linguistic Society of
America, Philadelphia, PA.
Tyler, A., Jefferies, A., & Davies, C. (1988). The effects of discourse structuring
devices on listener perceptions of coherence in non-native university teachers’
spoken discourse. World Englishes, 7, 101–110.

The Role of Conjunctions in
L2 Text Comprehension
ESTHER GEVA
Ontario Institute for Studies in Education
Conjunctions make explicit the logical relations between propositions
and signal text structure. There is evidence from L1 research literature
to show that skilled and less skilled readers differ in the degree
to which they utilize explicit logical relations markers (i.e., conjunc-
tions) in text and in the degree to which they infer implicit logical
relations. The purpose of the research reported here was to discover
whether and at what level of L2 proficiency the meaning of conjunc-
tions is comprehended by the adult literate L2 learner. University-
level L2 learners with English as L2 performed a number of tasks in
which their comprehension of logical relationships and the conjunc-
tions used to signal them was tested intrasententially, intersenten-
tially, and at discourse level. Results suggest that the ability to realize
the nature of logical relationships within local contexts is a necessary
but not sufficient component of comprehension of such relations in
extended discourse. With increased proficiency, L2 learners improve
their ability to utilize and infer logical relationships in extended dis-
course.
In written language, conjunctions are used to signal the logical connections
between ideas (Kintsch & van Dijk, 1978; van Dijk &
Kintsch, 1983). More specifically, conjunctions are used to mark dis-
course structure and the function of various text segments (Geva,
1983). Causal relations, for instance, are signaled by causal conjunc-
tions (e.g., since, because, due to). The description of a process often
includes temporal links such as first, next, and then.
COMPREHENDING LOGICAL RELATIONSHIPS IN L1
If conjunctions help to make text organization explicit (Meyer, 1977),
and if awareness of text organization is essential for text comprehen-
sion (Meyer, Brandt, & Bluth, 1981), it follows that the presence of
conjunctions in text should facilitate the instantiation of textual sche-
mata (Kieras, 1985), help to direct readers’ attention to important text
information (Lorch & Lorch, 1986), and help in checking information
in memory (Spyridakis & Standal, 1987). The question of whether
explicit text signaling facilitates comprehension has been the focus of
a number of studies. Typical research studies addressing this question
compare the effect on comprehension of reading intact texts with
texts from which conjunctions have been removed. Results have been
contradictory. Some studies lead to the conclusion that comprehension
is not affected, whereas others suggest that conjunctions facilitate com-
prehension under some reader and text conditions. Spyridakis and
Standal (1987) found that signaling facilitated comprehension of ex-
pository texts by college students when passages were “neither too easy,
nor too difficult” (p. 285). There is evidence in the L1 literature to
suggest that understanding conjunctions as marking the focus of topi-
cal relations between sentences is a gradual process that is mastered by
literate adults (Johnson & Pearson, 1982; McClure & Geva, 1983;
Zinar, 1990). Geva (1987, 1990) reports that texts in various academic
disciplines vary in terms of the incidence of explicit text structure
markers and the extent to which such markers focus on microlevel or
macrolevel propositions. This work suggests that readers in different
disciplines encounter different conjunctions, the function of which
varies within the text, and that understanding of conjunctions
may affect readers in different disciplines differently.
Skilled and less skilled readers have been shown to differ in the degree
to which they infer logical relations in text (Bridge & Winograd, 1982;
Evans & Ballance, 1980; Geva, 1986a; Geva & Ryan, 1985; Irwin, 1980).
For instance, Marshall and Glock (1978–1979) found that the presence
in text of connectives such as however and on the other hand facilitated
learning for junior college students but had a minimal effect on university
students. Meyer, Brandt, and Bluth (1981) showed that connectives
facilitated recall among ninth-grade students who were poor comprehenders
but not among skilled readers. Irwin (1980) examined the
effects of clause order and explicitness of causal relationships on comprehension
by fifth-grade and college-level students. Irwin found that fifth-grade students
achieved higher comprehension scores in the explicit causal relation-
ships group than did students in the implicit causal relationships group.
Results of Irwin’s college-level study suggest that when the causal rela-
tionships appear in the text in a reversed order, explicitness of causal
relationships facilitates comprehension even for adults.
Geva (1983) found that community college students who were less
skilled readers failed to comprehend cause-effect relations when the
content of the cause-effect relations appeared in expository texts in
reverse order. By succumbing to the “order of mention pitfall” (Geva,
1983, p. 399) they interpreted order of mention as corresponding to
order of occurrence of these events in the world. Geva and Ryan (1985)
found that when conjunctions were omitted from the texts, fifth- and
seventh-grade skilled readers automatically inferred the missing links,
whereas the less skilled readers did not. Furthermore, the comprehen-
sion of intersentential relations in expository texts by readers at all
levels was enhanced when their attention was directed explicitly to
conjunctions in text. This was achieved by highlighting conjunctions
in text. Geva and Ryan (1985) maintain that this effect was due to the
fact that highlighting the conjunctions reduced processing demands.
These studies demonstrate that skilled and more mature readers
have a better knowledge of conjunctions and their role in marking
logical relationships, and are more likely to use this knowledge to infer
logical relationships than are less skilled and younger readers, and that
they demonstrate better comprehension of texts. Text variables such
as text difficulty and the nature of the logical relations involved seem
to interact with reader characteristics such as developmental level,
reading comprehension ability, and memory (Geva, 1986b; Johnson,
Fabian, & Pascual-Leone, 1989).
COMPREHENDING LOGICAL RELATIONSHIPS IN L2
There are suggestions in the L2 literature that discourse comprehension
may be hampered by difficulties in processing logical relationships. For
instance, in a preliminary study, Cohen and Fine (1978) found that
nonnative adult speakers of English do not exploit cohesive textual
links sufficiently and fail therefore to comprehend expository texts
adequately. Among examples given by Alexander (1980) to show the
types of problems encountered by adult students reading scientific
texts is a case where a reader misinterpreted the conjunction since as
being temporal rather than causal. As a result, his translation of that
text segment from English to German was faulty. Sim and Bensoussan
(1979) conclude that the comprehension of expository texts by univer-
sity EFL students may be affected by an incomplete mastery of function
words as much as by incomplete mastery of content words. The authors
argue that tests of reading proficiency should, therefore, deal with
comprehension of “the cohesive relationship between the components
of a text, as well as understanding each component separately” (p. 40).
Lipson and Wixson (1986) claim that research on reading ability as
well as reading disability should adopt an interactive view. Such a view
takes into account the dynamic process of reading in which the reader,
text, task, process, and the setting conditions of the reading situation
interact in an active and flexible manner. This claim should be ex-
tended to reading in L2 as well. To understand how L2 learners
comprehend texts, we need to study the differential contribution of
text-based characteristics such as genre, text structure parameters, and
the use of cohesion markers. We need to specify the nature of the
reading tasks under study. We also need to consider the variety of
competencies, skills, and expectations brought by the L2 learner to the
situation and the extent to which L1 and L2 linguistic proficiency and
prior knowledge may interact in the reading process.
Swaffar (1985) points out that L2 readers need to “identify the
particular logic [of text] on the basis of intersentential relationships”
(p. 24). She further suggests that L2 students’ attention needs to be
drawn to intersentential connectives and “the relationship between
global meaning and language detail” (p. 24). Yet in the area of L2
research, not enough is known about the interaction of reader and
text. In particular, not much is known of the extent to which L2
learners with different levels of L2 proficiency can utilize the logical
signaling intended by conjunctions and the extent to which they can
infer logical relations when these are not explicitly marked in text (see
Lipson & Wixson, 1986). With regard to reading in L2, Carrell (1982)
has argued that “we must supplant or at least supplement textual
analysis theories such as cohesion theory (e.g., Halliday & Hasan, 1976)
with broader, more powerful theories which take the reader into ac-
count and which look at both reading and writing as interactive pro-
cesses involving the writer and the reader, as well as the text” (p. 487).
One factor to address when studying the effect of conjunction manip-
ulation on comprehension is the function of conjunctions in the text. As
the studies by Cohen and Fine (1978) and McClure and Geva (1983)
suggest, it may be easier to handle intrasentential cohesion than intersen-
tential and interparagraph cohesion. Such an explanation is congruent
with an information processing-limited capacity paradigm. The process
of reading comprehension involves relating new or incoming informa-
tion to information already stored in memory. According to van Dijk
and Kintsch (1983), it involves interpretation of phonemes, morphemes,
and clauses. At the same time, clauses must be connected in sentences,
coherent connections among sentences need to be established, and the
reader needs to derive global macrostructures to determine the topic
of the passage. These interpretations, in turn, depend on general and
episodic knowledge stored in memory. Thus, memory must be searched
and inferences have to be made to determine local and global coherence.
The reader needs also to activate knowledge of “schematic superstructures”:
knowledge of style, attitudes, goals, plans, and so on. Parallel
processing capacity, however, is limited and the effects of this limitation
become apparent when the subprocesses are not automatized (i.e., accu-
rate, fast, and efficient), as is the case with L2 learners (Segalowitz, 1986).
Clearly, when the L2 reading processes are not automatized, related
comprehension strategies must be applied consciously and require
special effort, thus overloading the system (Bialystok & Ryan, 1986;
van Dijk & Kintsch, 1983). If L2 readers allocate most of their resources
to processing basic functions such as decoding, lexical access, and syntactic
information, readers may not have sufficient resources for storage
and for higher level text processing such as elaboration of text informa-
tion into propositional macrostructures and the derivation of a topic
or a theme. Furthermore, readers will be less efficient in utilizing prior
knowledge, controlling the interpretation process, and attending to
global logical relations. It may be hypothesized that less proficient L2
readers may be able to demonstrate comprehension of logical relation-
ships when task demands require the assignment of meaning to senten-
ces or sets of sentences. Such learners will be successful when the task
demands involve local coherence, based on intrasentential or intersen-
tential information. At the same time, due to less efficient execution of
basic linguistic operations, they may be unable to deal with global
coherence, based on larger text chunks.
Accordingly, the study reported here addressed the nature of the
knowledge of conjunctions possessed
by adult ESL learners who are enrolled in academic institutions
and are expected to “read in order to learn” from academic textbooks.
The approach taken represents a recognition that we must look into
the interaction between L2 learners’ proficiency and their performance
on a variety of tasks, focusing on knowledge and utilization of conjunc-
tions (Bernhardt, 1986). The hypothesis examined was that adult L2
learners may demonstrate familiarity with the meaning of conjunc-
tions, yet fail to utilize them in extended discourse.
The following research questions were addressed in this study:
1. What is the relationship between levels of proficiency in English
and conjunction comprehension?
2. What is the effect of discourse level on conjunction comprehension?
3. Is there a unique contribution of information about intrasentential,
intersentential, and discourse-level knowledge of conjunctions to
the prediction of expository text comprehension?
METHOD
Subjects
L2 subjects were 100 immigrant or international students who attended
one of two Canadian universities and were enrolled in courses
designed to upgrade their English. Six classes for upgrading English
proficiency were involved. There were 15–20 students in each
class. All subjects were first-year students at one of these institutions.
Eighty to ninety percent of the students in each class agreed to partici-
pate in the study. There were 52 males and 48 females, the majority
ranging in age from 17 to 25.
Materials
Oral English Proficiency
Using a Foreign Service Institute (FSI)-type instrument (Shohami,
Reves, & Bejerano, 1986), ESL teachers were asked to rate their students’
oral proficiency on a 1–7 scale. Teachers were provided with a
detailed technical guide describing the nature of skills associated with
each level on the scale. This rating provided external information on
how proficient these students were in English. A rating of 7 is given to
a student whose oral English proficiency is nativelike, while a rating of
1 is assigned to a student whose oral proficiency in English is minimal.
In the study reported here, the L2 proficiency of students ranged from
2 to 5. The positive and highly significant correlations between the oral
proficiency ratings and scores on the discourse comprehension mea-
sures (see Table 2) provide an indication of the reliability of the proce-
dures employed in this study to assess L2 oral proficiency.
Comprehension of Logical Relationships
Intrasentential conjunction task. In order to determine basic comprehension
of conjunctions intrasententially, subjects completed the fill-in-the-blank
task (FBT). The FBT (Geva & Ryan, in press) is a revised
version of a similar task developed by Geva (1983). It consists of 30
multiple-choice items, 10 with because, 10 with although, and 10 with if.
In each sentence, the clause following the conjunction has been omit-
ted, and subjects have to choose the option that best completes the
sentence. The options have been designed in such a way that one is
grammatically appropriate but semantically inappropriate; another
is semantically appropriate but grammatically inappropriate; a third
option would have been correct had there been another conjunction
in the sentence (e.g., because instead of although); the fourth option
is correct both semantically and grammatically. The example below
illustrates this task.
1. We could not see the man, although
a. he could have missed the car ride.
b. we will have seen the car there.
c. he was hidden behind a tree.
d. we watched the old house.
Students’ scores can range from 0 to 30.
Intersentential conjunction task. Comprehension of the use of conjunctions
intersententially was measured in a forced-choice 30-item sentence
continuation task (SCT) (see McClure & Geva, 1983), illustrated
below.
2. It was cold outside, although it was sunny.
a. So it was a good day for skiing.
b. So Johnny’s mother made him wear a sweater.
3. It was sunny outside, but it was cold.
a. So Johnny’s mother made him wear a sweater.
b. So it was a good day for skiing.
The specific task used in this study focuses on the conjunctions but
and although. There are 10 but items, 10 items in which although appears
in an initial position (initial although), and 10 items in which although
appears between the two clauses (medial although). Subjects have to
decide whether Continuation a or b should follow the first 2-clause
sentence. Since but is a coordinating conjunction, the continuation
sentence should follow the second clause semantically. Sentences with
although are more complex to process because although is a subordinat-
ing conjunction. In this case the continuation sentence should follow
the main clause semantically. The main clause, however, may be first
or second, depending on whether although appears in an initial or
medial position in the sentence. McClure and Geva (1983) have shown
that L1 fourth-grade children do not choose a continuation on the
basis of semantic coherence for but items, nor do they attend to the
marking of focus by although. Eighth-grade children choose a continua-
tion sentence on the basis of semantic coherence but ignore the marking
of focus. Highly literate and proficient speakers of English use both
rules consistently and intuitively. This research investigates whether
adult L2 learners with different levels of proficiency have extracted
such a rule and whether they are sensitive to the more subtle implica-
tions of conjunctions.
Discourse-level conjunction task. Knowledge of conjunctions at discourse
level was measured by means of a multiple-choice rationale-deletion
cloze test (Geva, 1983). This test consists of two 1-page, college-level,
expository texts from which all conjunctions have been omitted.
Students have to choose the appropriate conjunctions out of sets of
four suggested alternatives provided for each omitted conjunction at
the end of each text. A partial text sample is provided below.
4. There is a growing research literature on the impact that continuous
versus variable schedules have on employee performance. The problem
is (1) that the results of job simulation studies conducted with
student subjects in laboratory settings have different results than those
studies using actual workers in a field setting . . . .
1. (a) although (b) however (c) in addition (d) thereupon.
Only one of the conjunctions in each set is congruent with the macro-
propositions of the text (van Dijk & Kintsch, 1983). Unlike standard
cloze tests, here subjects need to consider macropropositions and to
determine their choice of conjunctions on the basis of the preceding
and ensuing arguments and ideas. Student scores can range from 0 to
24.
Comprehension of Expository Academic Prose
Three 1-page expository texts were used, one dealing with Luther’s
basic philosophy, one dealing with basic principles in pump operation,
and one dealing with a comparison of liquid fuel and solid fuel rockets.
Each of these texts could appear in one of three versions: explicit
(intact), implicit (all conjunctions omitted), and highlighted (all conjunc-
tions printed in bold typeface). The completion of this academic text
comprehension (ATC) task by each subject involved reading one ex-
plicit text, one implicit text, and one highlighted text. Test booklets
consisted of three texts and the accompanying comprehension ques-
tions, but different students received different combinations of the
manipulated texts. Order of text presentation was counterbalanced
as well. The different combinations formed a Latin square design.
Following the rationale for conjunction highlighting offered by Geva
and Ryan (1985), each text was followed by four multiple-choice, high-
level comprehension questions focusing on logical relationships in the
texts. Thus, the ATC scores can range from 0 to 12. The total correct
score of the ATC is treated as an overall measure of comprehension
of academic discourse.
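The Latin square arrangement of texts and versions can be illustrated with a short sketch. The article does not report the exact combinations used, so the cyclic square below is an assumption, as are the shorthand text labels and the function name `latin_square_booklets`. Order of presentation, which was also counterbalanced, is not modeled here.

```python
TEXTS = ["Luther", "pumps", "rockets"]              # shorthand labels (assumed)
VERSIONS = ["explicit", "implicit", "highlighted"]  # the three manipulations


def latin_square_booklets(texts, versions):
    """Build one booklet per row of a cyclic Latin square.

    Booklet i pairs text j with version (i + j) mod n. Each booklet then
    contains every text once and every version once, and each text appears
    in each version exactly once across the set of booklets.
    """
    n = len(texts)
    if len(versions) != n:
        raise ValueError("need as many versions as texts")
    return [
        [(texts[j], versions[(i + j) % n]) for j in range(n)]
        for i in range(n)
    ]


booklets = latin_square_booklets(TEXTS, VERSIONS)
for number, booklet in enumerate(booklets, start=1):
    print(number, booklet)
```

Under this assumed scheme three booklet types suffice, and every subject reads one explicit, one implicit, and one highlighted text, as described above.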
Procedure
Initially, all students were told that this was a study of how L2
university students comprehend college-level texts. Data collection of
L2 students extended over three sessions and was completed in 3
weeks. In the first session, students filled out a Student Background
Questionnaire and completed the FBT. During the second session, L2
students completed the SCT and the cloze. The ATC was administered
during the third week. Data collection was done on a group basis by
the author and three graduate assistants during the regular spring
English course sessions.
RESULTS
Dual scaling analyses (Nishisato, 1980) revealed that there were no
systematic relationships between any of the background variables (e.g.,
students’ L1, age group, time in Canada) and student performance on
the linguistic and reading comprehension tasks. Therefore, in subse-
quent analyses, these variables were ignored. Before turning to a dis-
cussion of the research questions, one should note as well that all three
measures of comprehension of logical relationships had high reliability
scores: The Hoyt reliability coefficient of the FBT was .85; the Hoyt
coefficient for the SCT was .85. As for the cloze, the Hoyt coefficient
was .85 for the Child Language text and .82 for the Organizational
Behaviour text. The Hoyt reliability coefficient for the ATC was .63.
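For readers unfamiliar with the Hoyt coefficient, the sketch below shows one standard way of computing it from a persons-by-items score matrix: a two-way analysis of variance without replication, with reliability taken as 1 minus the ratio of the residual mean square to the between-persons mean square. The article does not describe how its coefficients were computed, so this is an illustration only; the function name and the toy data are hypothetical and are not the study's data.

```python
def hoyt_reliability(scores):
    """Hoyt's ANOVA-based reliability for a persons x items matrix.

    scores: list of rows, one row per person, one column per item.
    Returns 1 - MS_residual / MS_persons. Undefined (division by zero)
    if all persons have the same total score.
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_persons * n_items)
    person_means = [sum(row) / n_items for row in scores]
    item_means = [
        sum(row[j] for row in scores) / n_persons for j in range(n_items)
    ]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_persons = n_items * sum((m - grand_mean) ** 2 for m in person_means)
    ss_items = n_persons * sum((m - grand_mean) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))
    return 1 - ms_residual / ms_persons


# Hypothetical right/wrong (1/0) responses of five persons to four items.
demo_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(demo_scores), 2))
```

Computed this way, the coefficient is algebraically equivalent to Cronbach's alpha, so the values reported above can be read on the same scale as alpha.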
The Relationships Between L2 Oral Proficiency and Conjunction Comprehension
The first research question addressed in this study was whether
proficiency in English as L2 is related to conjunction comprehension
or, stated otherwise, whether it is the case that with an increase in L2
proficiency, learners can more accurately complete tasks which vary in
terms of the skills required for conjunction comprehension. On the
basis of the oral proficiency ratings, students were divided into three
groups: oral proficiency scores in the 2–3 range, oral proficiency scores
of 4, and oral proficiency scores of 5. Table 1 presents the means and
standard deviations associated with the intrasentential (FBT), intersen-
tential (SCT), and discourse-level (cloze) measures of knowledge of
conjunctions as well as the means on academic text comprehension
(ATC task). An examination of Table 1 indicates that the three profi-
ciency groups differ on comprehension of logical relationships at all
discourse levels. The ATC mean for L2 readers with low oral proficiency
(ratings of 2–3 on oral proficiency) is lower (40%) than that for L2
learners with higher proficiency ratings of 4–5, who answered correctly
an average of 53% on this task.
TABLE 1
Descriptive Statistics for Conjunction Tasks at Three Discourse Levels (FBT, SCT, and
Cloze) and Academic Text Comprehension (ATC) by Level of Proficiency

Table 2 presents the intercorrelations among the various discourse-level
measures of conjunction comprehension used in this study. One
can see that the oral proficiency ratings correlate positively and signifi-
cantly with all the scores. That is, the higher one’s oral proficiency
rating, the higher one’s scores tended to be on the other measures.
One notes especially the high correlation, r = .69, p < .001, between
oral proficiency ratings and the cloze. This high correlation suggests
that both the cloze and the oral proficiency ratings tap perhaps a
general L2 discourse proficiency factor. Additionally, as indicated in
Table 2, cloze and language proficiency correlate .49 and .43, respec-
tively, with the ATC total scores. Analyses of variance (ANOVA), with
level of L2 oral proficiency as an independent variable and FBT, SCT,
cloze, and ATC as the dependent variables, revealed that L2 students
with different oral proficiency ratings differed significantly from each
other on the FBT (F [2, 97] = 14.06, p < .001), the cloze (F [2, 97] =
4.05, p < .05), and the ATC task (F [2, 57] = 12.01, p < .001). However,
the main effect of oral proficiency for SCT was not significant (p < .10).
Although the group means certainly point to a gradual improvement
on the SCT with increase in oral proficiency, the differences between
the groups are not statistically significant, suggesting that perhaps
there was a ceiling effect on the SCT. That is, even though the more
proficient L2 learners had somewhat higher scores on the SCT than
the less proficient L2 learners (see means in Table 1), these differences
were minimal, so that even individuals in the latter group achieved a
mean of 72% on this task. However, as can be seen in Table 1, as L2
learners become more proficient in their oral language, their perfor-
mance on intrasentential, intersentential, and discourse-level tasks
gradually improves.
TABLE 2
Correlations Among Oral Proficiency Ratings,
Text Comprehension (ATC),
and Three Levels of Conjunction Comprehension Tasks
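As an illustration of the analyses of variance reported above, the following minimal Python sketch computes a one-way F ratio for three groups. The scores are hypothetical and are not the data of this study; the function name one_way_anova is likewise an assumption introduced here for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three oral proficiency groups (not study data)
low = [1, 2, 3]
middle = [2, 3, 4]
high = [5, 6, 7]

f_ratio, df_b, df_w = one_way_anova([low, middle, high])
print(f"F[{df_b}, {df_w}] = {f_ratio:.2f}")
```

With 100 subjects in three groups, the same computation would yield the degrees of freedom [2, 97] reported for the FBT and the cloze.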
Conjunction Comprehension at Three Discourse Levels
The next research question focused on the effect of discourse level
on conjunction comprehension. More specifically, the question was
whether it is the case that one can handle conjunctions intrasententially
and intersententially but still be unable to handle logical relationships
in extended discourse. Furthermore, it was asked whether the ability
to handle conjunctions at a discourse level is a prerequisite to being
able to comprehend logical relationships in authentic texts.
Once again, the correlations in Table 2 provide a preliminary answer
to these questions. We note that although the FBT and SCT correlate
fairly highly with each other (r = .43), the correlations of the intrasenten-
tial task with the discourse-level tasks (i.e., cloze and ATC) are more
modest (r = .26 and r = .27, respectively). The correlations of the intersen-
tential task with the cloze and the ATC are higher (r = .30, r = .40,
respectively). At the same time, the correlation between the cloze (a
discourse-level conjunction task) and the ATC is higher still (r = .49).
In other words, we see that a measure of knowledge of conjunctions
at the discourse level more accurately represents how L2 learners
comprehend text than measures at the sentential level.
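The coefficients compared above are Pearson product-moment correlations. The following minimal Python sketch shows how such a coefficient is computed; the score lists are hypothetical and are not the data of this study, and the function name pearson_r is an assumption introduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical cloze and ATC scores for five readers (not study data)
cloze_scores = [3, 5, 6, 8, 9]
atc_scores = [40, 45, 55, 50, 65]

r = pearson_r(cloze_scores, atc_scores)
print(f"r = {r:.2f}")
```

A coefficient nearer 1 indicates that readers who score higher on one measure tend to score higher on the other, which is the sense in which the cloze is said to represent text comprehension more accurately than the sentential measures.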
Predicting Text Comprehension
The last question asked in this study was whether each of the three
conjunction tasks, requiring L2 readers to consider logical relationships
in increasingly more demanding contexts, plays a unique role in pre-
dicting text comprehension. To answer this question, a multiple regres-
sion analysis (stepwise) was carried out. In this analysis, student scores
on the ATC were treated as the dependent variable, and student scores
on the intrasentential task (FBT), the intersentential task (SCT), and
the cloze were entered as predictor variables. As can be seen in Table
3, which summarizes the results of this analysis, the cloze is the most
important predictor, explaining 21% of the variance, and the intersen-
tential task is also significant, explaining an additional 9.5% of the
variance. The intrasentential task, which focuses on basic comprehens-
ion of the meaning of conjunctions and the ability to deal with local
coherence, is not a significant predictor. That is, the FBT provides no
further predictive information after the SCT and cloze are taken into
account.
TABLE 3
Predicting Academic Text Comprehension:
Multiple Regression Analysis (Stepwise) Summary Table

Predictors    F        df       p       R2 (adjusted)
Cloze         12.66    1, 42    .000    .213
SCT           10.62    2, 41    .001    .309
FBT           ns

Note. SCT = sentence completion task; FBT = fill-in-the-blank task.
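The logic of the stepwise analysis can be illustrated with the standard formula for the squared multiple correlation of two predictors, computed here from the correlations cited in the text for Table 2 (cloze with ATC, .49; SCT with ATC, .40; SCT with cloze, .30). This is a sketch only. The function and constant names are assumptions introduced here, and the output should not be expected to reproduce Table 3, which reports adjusted values and whose degrees of freedom suggest that the regression was run on fewer cases than the correlations.

```python
def r2_single(r_y1):
    """Proportion of criterion variance explained by one predictor."""
    return r_y1 ** 2


def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation for two (possibly correlated) predictors."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Correlations cited in the text for Table 2
R_CLOZE_ATC = 0.49
R_SCT_ATC = 0.40
R_SCT_CLOZE = 0.30

step1 = r2_single(R_CLOZE_ATC)  # cloze entered first
step2 = r2_two_predictors(R_CLOZE_ATC, R_SCT_ATC, R_SCT_CLOZE)  # SCT added
gain = step2 - step1  # additional variance attributable to the SCT

print(f"Step 1 (cloze): R2 = {step1:.3f}")
print(f"Step 2 (cloze + SCT): R2 = {step2:.3f}, gain = {gain:.3f}")
```

Because the SCT and the cloze are themselves correlated, the gain at the second step is smaller than the squared correlation of the SCT with the ATC taken alone; a predictor enters only to the extent that it contributes information beyond the predictors already in the equation.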
DISCUSSION
Results of the research reported in this paper indicate clearly that
L2 students with better oral language proficiency in the context of an
academic environment demonstrate a better ability to deal with the
logical implications of conjunctions in academic reading tasks at all
levels of discourse. Furthermore, a discourse-level measure such as
the conjunctions cloze, which focuses on the ability to notice logical
relationships among text segments, and which requires readers to
consider what occurred earlier in the text, to coordinate this informa-
tion with subsequent information, and to insert appropriate logical
markers accordingly, is a good predictor of comprehension of authen-
tic academic discourse. It is also a better predictor than other tasks
where student ability to deal with logical relationships is measured with
discrete items, intrasententially or intersententially.
Although this type of rationale-deletion cloze may be considered a
good predictor of comprehension of academic discourse, an intersen-
tential task is still a better predictor than is an intrasentential task. Such
results support the overall theoretical framework which motivated the
research reported here. Basically, it is argued that as the reader is
required to process and integrate a larger number of propositions in
order to determine a semantically appropriate logical framework, task
complexity increases. Therefore, an L2 learner who has difficulty pro-
cessing basic lexical and syntactic information should find it more
difficult to attend to text integration of larger chunks of discourse and
the realization of text structure, even when these chunks involve only
two adjoining sentences. Such a learner may experience even more
difficulty with authentic expository discourse. It is therefore also not
surprising that oral proficiency was found to be so highly correlated
with the comprehension of logical relationships in extended discourse.
Those L2 learners who are more proficient in the L2, in terms of their
lexicon and various aspects of syntactic knowledge, are better able to
process and integrate information at more global levels in reading
tasks.
Across the three L2 proficiency levels, the mean percent scores on
the intersentential task (SCT) were higher than the mean percent
scores on the intrasentential task (FBT). It could be argued that this
finding fails to support the theoretical continuum that underlies the
design of this study because scores on the more demanding SCT are
higher than scores on the FBT. However, the two tasks did not involve
an identical array of conjunctions, and the most frequent incorrect
choice made on the FBT was the semantically correct but grammatically
incorrect option. In other words, what may have been difficult for some
of the less proficient L2 learners was the coordination of grammatical
accuracy with coherence at the sentential level. The SCT, on the other
hand, focused on intersentential coherence and did not require the
subjects to make any judgments about grammatical accuracy. Addi-
tional research is needed to tease apart the comprehension of specific
conjunctions, grammatical judgment, and sensitivity to coherence at
different levels of L2 discourse proficiency. Nevertheless, the pattern
of correlations supports the general argument that the relationship
between comprehension and performance on the predictor tasks in-
creases as the predictor tasks become cognitively more demanding.

The ability to deal with logical relationships within intrasentential or
intersentential constraints is related positively to the ability to process
logical relationships in extended discourse. However, as measures of
knowledge of conjunctions become more subtle and more demanding,
they are more closely associated with comprehension of logical relation-
ships in authentic expository texts. These results may be conceptualized
as a developmental pyramid which describes the growing ability of L2
readers to use the logical meaning of conjunctions in text as their
L2 proficiency increases. The base of the pyramid consists of basic,
intrasentential comprehension of conjunctions. Less proficient learn-
ers may already have mastered this level. The next level consists of
comprehension of conjunctions intersententially, measured in this
study with a task that requires enough L2 proficiency to be able to
attend to subtle cues for coherence and sensitivity to focus markers.
The next level is that of discourse-level comprehension. Presumably,
as the adult L2 learner gains more proficiency and automaticity in
processing various components of the second language in general, and
in reading in particular, the ability to deal with larger chunks of text
and with the logical meaning of conjunctions connecting such chunks
develops.
At this point in the discussion, it is necessary to mention the issue of
content familiarity. In the research reported here, student ability to
integrate logical relationships within or across adjoining sentences was
measured with items based on familiar content. At the same time, the
content of the cloze task (a discourse-level predictor) and the expository
texts (the dependent variables) dealt with scientific information pre-
sumably unfamiliar to most subjects. This point is important because
it suggests that there was a confounding of content familiarity with
level of discourse at which ability to integrate logical information was
measured. Future research should tease apart the role of domain
familiarity and discourse level in the growing ability of L2 learners to
comprehend, utilize, and infer logical relationships in reading tasks.
The results reported here have indicated that the ability to handle
logical relationships is one of the parameters of the L2 learner’s grow-
ing proficiency in the language. Students whose L2 proficiency enables
them to attend to semantic congruence, to become aware of subtle cues
for signaling main and subordinate clauses, to determine the logical
relationships in connected discourse, and to select the appropriate
linguistic means for expressing these relationships are able to success-
fully integrate and comprehend textual information in academic dis-
course. From a practical perspective, these results suggest that a combi-
nation of a careful evaluation of L2 oral proficiency and performance
on intersentential and cloze conjunction tasks can assist in determining
L2 reading comprehension proficiency, in placing students in appro-
priate learning environments, and in adapting L2 instruction to specific
needs. Furthermore, these results suggest that L2 learners need to be
provided with ample opportunities to read connected discourse, to
consider the nature of linguistic markers which signal intersegment
text relations, and to infer those relations that are not explicitly marked
in the text.
ACKNOWLEDGMENT
The research reported here was funded by a Social Sciences and Humanities
Research Council of Canada grant (Contract No. 410–84–0108) to the author.
THE AUTHOR
Esther Geva is Assistant Professor at the Ontario Institute for Studies in Education.
She has published widely on comprehension and use of conjunctions. In recent
years, her research and teaching have focused on reading processes and reading
development in L1 and L2 among normally achieving and learning disabled learn-
ers and on assessment and instructional issues in multicultural contexts.
REFERENCES
Alexander, R. (1980). A learning to learn perspective on reading in a foreign
language. System, 8, 113–119.
Bernhardt, E. B. (1986). Reading in the foreign language. In B. H. Wing (Ed.),
Northeast conference on the teaching of foreign languages—Listening, reading, writing:
Analysis and application. Middlebury, VT: Northeast Conference.
Bialystok, E., & Ryan, E. B. (1986). Toward a definition of metalinguistics skills.
Merrill-Palmer Quarterly, 31, 229–264.
Bridge, C. A., & Winograd, P. I. (1982). Reader’s awareness of cohesive relation-
ships during cloze comprehension. Journal of Reading Behavior, 14, 299–312.
Carrell, P. L. (1982). Cohesion is not coherence. TESOL Quarterly, 16 (4), 479–488.
Cohen, A. D., & Fine, J. (1978). Reading history in English: Discourse analysis and the
experience of native and non-native readers (Working Papers on Bilingualism No.
16). Ontario Institute for Studies in Education, Modern Language Centre,
Toronto.
Evans, R., & Ballance, C. (1980). A comparison of sentence connective recall by
two populations of readers. Journal of Educational Research, 73, 324–329.
Geva, E. (1983). Facilitating reading comprehension through flowcharting. Reading
Research Quarterly, 17, 384–341.
Geva, E. (1986a, June). Knowledge of conjunctions and its role in comprehension. Final
Report submitted to the Social Sciences and Humanities Research Council of
Canada (Contract No. 410–83–1230). Toronto: Ontario Institute for Studies in
Education.
Geva, E. (1986b, June). The role of linguistic knowledge, domain familiarity and working
memory capacity in processing logical relationships by young school children. Paper
presented at the Annual Meeting of the Canadian Modern Language Associa-
tion, Toronto.
Geva, E. (1987). Conjunctions: Their role in facilitating reading expository texts by adult
second language learners. Final report submitted to the Social Sciences and Hu-
manities Research Council of Canada (Contract No. 410–84–0108). Toronto:
Ontario Institute for Studies in Education.
Geva, E. (1990, December). The use of conjunctions in L2 text comprehension across
academic disciplines. Paper presented at the National Reading Conference, Miami,
FL.
Geva, E., & Ryan, E. B. (1985). Use of conjunctions in expository texts by skilled
and less skilled readers. Journal of Reading Behavior, 17, 331–346.
Geva, E., & Ryan, E. B. (in press). Linguistic and cognitive correlates of academic
skills in first and second language. Language Learning.
Halliday, M. A. K., & Hasan, R. (1977). Cohesion in English. London: Longman.
Irwin, W. J. (1980). The effects of explicitness and clause order on the comprehen-
sion of reversible causal relationships. Reading Research Quarterly, 15, 477–488.
Johnson, P., & Pearson, D. P. (1982). Prior knowledge, connectivity, and the
assessment of reading comprehension (Tech. Rep. No. 245). Urbana-Cham-
paign: University of Illinois, Center for the Study of Reading. (ERIC Document
Reproduction Service No. ED 247–525)
Johnson, J., Fabien, V., & Pascual-Leone, J. (1989). Quantitative hardware stages
that constrain language development. Human Development, 32, 245–271.
Kieras, D. E. (1985). Thematic processes in the comprehension of technical prose.
In B. K. Britton & J. B. Black (Eds.), Understanding expository text (pp. 89–107).
Hillsdale, NJ: Lawrence Erlbaum.
Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and
production. Psychological Review, 85, 363–394.
Lipson, M. Y., & Wixson, K. K. (1986). Reading disability research: An interactionist
perspective. Review of Educational Research, 56, 111–136.
Lorch, R. A. F., & Lorch, E. P. (1986). On-line processing of summary and impor-
tance signals in reading. Discourse Processes, 9, 489–496.
Marshall, N., & Glock, M. D. (1978–1979). Comprehension of connected discourse:
A study into the relationship between the structure of text and information
recalled. Reading Research Quarterly, 14, 10–56.
McClure, E., & Geva, E. (1983). The development of the cohesive use of adversative
conjunctions in discourse. Discourse Processes, 6, 411–432.
Meyer, B. J. F. (1977). The structure of prose: Effects on learning and memory
and implications for educational practice. In R. C. Anderson, R. J. Spiro, &
W. E. Montague (Eds.), Schooling and the acquisition of knowledge (pp. 179–
200). New York: Wiley.
Meyer, B. J. F., Brandt, D. N., & Bluth, G. J. (1981). Use of author’s textual
schema: Key for ninth graders’ comprehension. Reading Research Quarterly, 15,
72–103.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications.
Toronto: University of Toronto Press.
Segalowitz, N. (1986). Skilled reading in a second language. In J. Vaid (Ed.),
Language processing in bilinguals: Psycholinguistic and neuropsychological perspectives.
Hillsdale, NJ: Lawrence Erlbaum.
Shohamy, E., Reves, T., & Bejerano, Y. (1986). Introducing a new comprehensive
test of oral proficiency. English Language Teaching Journal, 40, 212–220.
Sim, D., & Bensoussan, M. (1979). Control of contextualized function and content
words as it affects EFL reading comprehension test scores. In R. Mackay,
B. Barkman, & R. R. Jordan (Eds.), Reading in a second language: Hypotheses,
organization and practices. Rowley, MA: Newbury House.
Spyridakis, J. H., & Standal, T. C. (1987). Signals in expository prose: Effects on
reading. Reading Research Quarterly, 22, 285–298.
Swaffar, J. K. (1985). Reading authentic texts in a foreign language: A cognitive
model. The Modern Language Journal, 69, 15–33.
van Dijk, T., & Kintsch, W. (1983). Strategies of discourse comprehension. New York:
Academic Press.
Zinar, S. (1990). Fifth-graders’ recall of propositional content and causal relation-
ships from expository prose. Journal of Reading Behavior, 22, 181–199.

REVIEWS
The TESOL Quarterly welcomes evaluative reviews of publications relevant to
TESOL professionals. In addition to textbooks and reference materials, these
include computer and video software, testing instruments, and other forms of
nonprint materials.
Edited by HEIDI RIGGENBACH
University of Washington
Grammar and Second Language Teaching: A Book of Readings.
William Rutherford and Michael Sharwood Smith. New York:
Newbury House, 1988. Pp. 260.
Second Language Grammar: Learning and Teaching.
William E. Rutherford. London: Longman, 1987. Pp. 195.
The idea that the teaching of a second language can be divorced
from instruction in the grammar of that language is a relatively recent
one in the history of language teaching. This idea has only been around
for 100 years or so and has been brought sharply into focus only in the
last 2 decades with the advent of what has come to be known as the
communicative approach. This approach, influenced heavily by the
work of Krashen (e.g., 1982), has gained considerable support among
teachers of English as a second language, and the field of TESL/TEFL
as a whole has had to start looking closely at what is meant exactly by
communicative language teaching and by the notion of grammatical
instruction itself.
Opponents of grammatical instruction argue that attention to form
in the second language classroom serves more to hinder than to help
the process of language acquisition because grammatical instruction
teaches students about the language rather than giving them the oppor-
tunity to use it. This is an intuitively appealing claim as many ESL
instructors have had the experience of teaching a point of grammar in
the classroom only to have students misuse the same point in spontane-
ous speech. Krashen described grammatical instruction as serving only
to feed a mental editor (the monitor), which has no access to the
fundamental source of spontaneous utterances. The latter, Krashen
claimed, could only be fostered by comprehensible input, which would
allow students to make their own hypotheses about the forms of the
language, just as they did in acquiring their first languages.
The most serious challenge to the grammarless communicative ap-
proach is the suggestion that complete inattention to the grammar of
the second language may lead to the development of a pidginized or
fossilized interlanguage, what Higgs and Clifford (1982) describe as
“terminal 2/2+,” a score certain applicants receive on the 5-point For-
eign Service Institute (FSI) scale of language mastery which is charac-
terized by high vocabulary and low grammatical competence. Such
learners find it very difficult if not impossible to advance beyond this
stage and are, hence, terminal. In short, the role of grammar in the
ESOL classroom is once again a point of contention.
With full knowledge of this state of affairs, William Rutherford and
Michael Sharwood Smith have assembled a book of readings entitled
Grammar and Second Language Teaching (Grammar/Teaching) that is then
refined and clarified in Rutherford’s own view of this material in Second
Language Grammar: Learning and Teaching. The intention of the editors
of Grammar/Teaching is to “explore the dimensions of pedagogical
grammar (PG), or the means by which acquisition of second or foreign
language grammar may be expressly facilitated” and ultimately to
“work toward articulation of a coherent theory of PG” (p. 1). These
goals are realized through 15 published articles by well-known authori-
ties in applied linguistics, which are grouped into three major seg-
ments: theoretical considerations, definitions of pedagogical grammar,
and realizations of pedagogical grammar in the classroom. This volume
attempts to make explicit certain unanalyzed presumptions that lie
behind the “grammar-is-futile” point of view. Furthermore, it sets
the stage for the empirical analysis of Sharwood Smith’s pedagogical
grammar hypothesis, which claims that PG will accelerate second lan-
guage acquisition (SLA) to a greater extent than is possible in “natural”
(i.e., untutored) settings.
In Second Language Grammar, Rutherford has essentially revamped
his own treatise on grammar and consciousness-raising in Grammar/
Teaching, even though Second Language Grammar was published a year
before Grammar/Teaching. Unlike Grammar/Teaching, which requires consider-
able mental agility to unearth the information relevant to constructing
an overall philosophy of the teaching of pedagogical grammar, Second
Language Grammar unfolds like a story, gradually and intelligently
constructing an approach to pedagogical grammar that is only hinted
at in Grammar/Teaching. Rutherford begins by placing the nature of
second language learning (SLL) in its proper context. If SLL were
nothing but the accumulation of entities (i.e., grammatical constructs),
certain learner behaviors would occur (e.g., target structures would
appear “full-blown” one after the other) which have never been ob-
served. Rather, learners bring from their L1 experience an idea of what
the target language may look like and the ability to make good guesses
about what they don’t know, and they use these to communicate in a
rudimentary way in the target language by making a direct relationship
between meaning and form within the bounds of universal grammar. The
role of grammar, or grammatical consciousness-raising, is thus not to teach the
entities but rather to facilitate language learning. This is done by
providing data (including negative data) through which learners can
test hypotheses about the language.
Rutherford uses the metaphor of language-as-machine (i.e., the sum
of its parts) to characterize the “accumulation of entities” view of lan-
guage learning, and language-as-organism (i.e., the behavior of the
whole determines the behavior of the parts) to characterize his view
of grammatical consciousness-raising. The former is concerned with
product, the latter with process. Although grammaticization is lan-
guage specific, it is always subject to the universal discourse pressure
to maintain referential links across sentence boundaries by providing
given information prior to new. This sets up a tension between dis-
course requirements and the desire to keep meaning and form simple
and direct. The semantic price that must be paid to resolve this tension
is the disruption of semantic relations, and the best way to help the
learner recognize and respond to this situation is to “maintain a systems
view of language phenomena and its relation to consciousness-raising”
(p. 93). These systems include syntax, discourse, and semantics, and
their various interactions have several consequences (e.g., thematiza-
tion, cohesion), which are clearly described with specific examples.
Pedagogically, this requires making decisions as to what aspects of the
grammatical system will allow the learner to make the most powerful
generalizations, choosing language content to ensure the “timely ap-
pearance” of those aspects and maximizing the probabilities for learner
receptivity (although how this notion would be applied to more than
the handful of aspects suggested is far from clear).
Rutherford has presented such a masterful argument for grammati-
cal consciousness-raising in Second Language Grammar that it is difficult
to recommend Grammar/Teaching in its wake, at least to classroom
ESOL teachers. Most of the probing questions that were presented as
exercises in the latter book are effectively answered in Second Language
Grammar, which is made immensely more readable by the judicious use
of endnotes. The one area that neither volume really concentrates on
is grammatical consciousness-raising at the beginning levels, though
there are some hints and suggestions. Rutherford often states that
simpler examples of a particular exercise can be devised and suggests
that “an adequate stock of useful vocabulary but little grammatical
competence will initially serve some learner needs better than gram-
matical competence coupled with an impoverished lexicon” (p. 169).
Despite the disarming simplicity of Rutherford’s prose, the bulk of
the material in Second Language Grammar requires a breadth of
knowledge of linguistics that will be overwhelming to most readers.
Many will no doubt hope that Rutherford will write a textbook incorpo-
rating all of these ideas (he has to some extent in his 1977 text Modern
English), but this would directly contradict his notion that language-as-
organism requires learners to become managers of their own learning.
Since each learner is different, no set curriculum can be effective.
Facilitation at any level ultimately requires more work of the teacher,
at least until the teacher is comfortable with this new role. What Second
Language Grammar does is to provide a framework which may lead
to a unified notion of what constitutes language learning, language
teaching, and language acquisition.
REFERENCES
Higgs, T. V., & Clifford, R. (1982). The push toward communication. In
T. V. Higgs (Ed.), Curriculum, competence, and the foreign language teacher. Lin-
colnwood, IL: National Textbook Company.
Krashen, S. (1982). Principles and practice in second language acquisition. Oxford:
Pergamon Press.
Rutherford, W. (1977). Modern English. New York: Harcourt Brace Jovanovich.
PETER MASTER
California State University, Fresno

BOOK NOTICES
The TESOL Quarterly welcomes short evaluative reviews of print and nonprint
publications relevant to TESOL professionals. Book notices may not exceed 500
words and must contain some discussion of the significance of the work in the
context of current theory and practice in TESOL.
Grammar Textbooks
Grammar classes and textbooks, long neglected or relegated to the
periphery as necessary evils, have undergone changes in recent years
in order to become part of the communicative approach now used in
many ESL classes. Many teachers have come to realize that grammar
needs to be formally taught if students are not to fossilize at a beginning
interlanguage stage. As Master has written in his review of two peda-
gogical grammar theory books, which precedes this collection, it is
dubious whether the field of TESOL can separate the teaching of
grammar from other skills and functions and still produce successful
language learners. Publishers have recently produced a number of
grammar books which can be incorporated into integrated language
classes and which claim to be communicative. The teacher’s task of
selecting a textbook becomes quite difficult, though, when nearly every
grammar textbook published today bears a subtitle or banner including
the words communicating or contextualized. A close examination often
shows that the “communicative approach” goes no further than the
title. Sometimes, the purported communicative approach of a gram-
mar text includes nothing more than asking students to work in pairs
to practice grammar drills. Instructors should inspect textbooks, new
and old alike, for the underlying assumptions which drive the approach
and content being used.
Grammar textbooks have been flooding the TESOL market in recent
years. The textbooks reviewed below were chosen because they claim
communicative, integrated, or contextualized approaches. Not in-
cluded are some of the most widely known older texts. Most of the
texts have been published within the last 2 years, although several date
to the mid-1980s. It is hoped that the following brief reviews will assist
instructors in making wise selections.
STEVEN L. SHAW, Guest Editor
English Alive: Grammar, Function, and Setting (2nd ed.).
Gail Fingado and Mary Reinbold Jerome. New York: Newbury
House, 1991. Pp. xxi + 369.
Fingado and Jerome have updated and improved their popular gram-
mar textbook, making it more user friendly. The new edition has a cleaner
layout, updated topics, two new review chapters, and two new chapters on
the present unreal conditional and the passive voice.
As the authors state in the introduction, this textbook combines the
“three major elements needed to communicate in a language” (p. viii).
Those elements, as suggested by the title, are grammar and its various
functions in numerous settings. English Alive successfully incorporates
these elements into meaningful activities which encourage students to use
grammar and language in context.
The text is intended for beginning-level adult students who have had
minimal exposure to English grammar. The first 25 chapters, including
three review units, focus on specific grammar points. Most of the tenses
(continuous, future, past, and perfective) are covered in addition to mod-
als, infinitives, gerunds, comparative/superlative forms, conditionals, and
some passives. This is enough material for a semester or year-long course,
depending on the level of the students.
Each chapter uses an interesting content focus to introduce one or
two related grammatical structures along with numerous activities. These
topics range from crime and culture to manners and marriage. Most
chapters begin with a dialogue (tapes of these dialogues are available
from the publisher) followed by comprehension questions. This provides
students with a contextual framework in which to place the grammatical
structure being studied. Throughout the rest of the chapter, each gram-
matical point is carefully explained and numerous activities provided for
practice. Most of these activities are controlled fill-in-the-blank exercises
supplemented by a dialogue, role play, or short reading passage. Each
chapter concludes with synthesizing activities which range in type from
interviews and writing assignments to questions for open discussion.
The remaining 10 chapters are devoted to function and what Fingado
and Jerome call setting. Functions covered include invitations, giving di-
rections, making suggestions, and requests and favors. The last 6 chapters
integrate the previously studied grammar points into settings, which are
usually labeled situational frameworks. These include traveling, eating in
restaurants, visiting a friend, going to the doctor, using the telephone, and
shopping for clothes.
Fingado and Jerome have taken on an ambitious task and provided
teachers with a unique beginning-level grammar textbook. Instructors
looking for an integrated-approach text should be pleased with this new
edition. Teachers will appreciate the new layout and clear presentations;
students will enjoy learning contextualized grammar through the enter-
taining artwork and activities.
STEVEN L. SHAW
The English Connection: A Content-Based Grammar and
Discussion Text (2nd ed.). Gail Fingado, Leslie J. Freeman, Mary
Reinbold Jerome, and Catherine Vaden Summers. New York:
Newbury House, 1991. Pp. xxi + 490.
The English Connection is intended for the intermediate ESOL student.
This is truly one of the better “whole language” grammar textbooks on
the market. While it focuses primarily on grammar, the authors have
successfully incorporated all language skills into a content-based format.
The English Connection’s second edition improves the format, layout, and
activities of the first. The authors assume that students have a familiarity
with English grammar but need practice to achieve mastery and fluency
of use. Therefore, a greater amount of space is devoted to actual use of
the grammar points rather than to detailed explanations of those points.
Of the 27 chapters, 26 present specific grammar points with a thematic
focus; the last chapter provides review. The relevant structures are
presented in an opening dialogue (also available on tape) which helps students
focus on the structure being taught. Reading the dialogue aloud (or using
the tape) allows for listening comprehension practice. Each of the chapters
then moves through a series of activities which range from guided fill-in-
the-blank exercises to less guided sentence and dialogue completions.
The grammar points are clearly and concisely explained, and numerous
examples are presented with the structures in context.
The English Connection, like the lower-level English Alive, covers the basic
tenses. Additional attention is given to modals, continuous tenses,
perfectives, passives, conditionals, and noun and adjective clauses. The second
edition has clearer explanations of these structures in a clearer layout than
the first. Students will find this textbook to be a good reference book as
well as an excellent grammar text. The numerous exercises also provide
enough material for both in-class activities and homework.
Students most enjoy the integration activities at the end of each chapter.
These open-ended activities may ask students to write advice-column let-
ters, chair an imaginary meeting of the Nuclear Regulatory Commission,
interview classmates, or they may prompt provocative discussions of cur-
rent issues. Finding meaningful activities focusing on specific grammar
points should now be an easier task for busy ESL instructors; the authors
have provided numerous exciting and meaningful activities which students
will enjoy.
STEVEN L. SHAW
Building English Structures: A Communicative Course in
English. Chuck Seibel and Russ Hodge. Englewood Cliffs, NJ:
Prentice Hall, 1991. Pp. vi + 442.
Building English Structures consists of six units (32 chapters) covering
intermediate to advanced-level grammar lessons for ESOL students. Top-
ics ranging from tag questions to pronouns to noun clauses are covered
on a chapter-by-chapter basis.
The authors suggest using the book as a year-long or 9-month course,
in which case one chapter each week would set a moderate pace. The
authors also suggest ways in which to adapt their book to shorter time
periods, such as treating some chapters as review and then doing thorough
work in the second half of the book.
This text is presented in a lively, positive style; a wide variety of oral
and written activities use humor and group learning formats to reinforce
the skills taught. Games, amusing dialogues, and cartoon situations present
structures on which communicative practice is later based. Chapters begin
with a brief review, followed by an equally brief preview. Next, the student
is presented with a full-page Rule Builder, in which rules of the structure
being taught must be formulated using the examples as clues. This induc-
tive approach should prove effective, because students are actively in-
volved in the process of defining the grammatical rules they are learning.
Each chapter offers a plethora of (predominantly oral) grammar activi-
ties for the classroom, involving either large- or small-group interaction.
There are also workbook activities and story starters for written work.
Suggestions for using community resources are included in every chapter,
and exercises often involve the use of the telephone book, newspaper, TV,
or radio in order to encourage real-life application of the skills learned.
Each chapter includes a short segment of an ongoing detective story,
piquing student interest while demonstrating application of the lesson.
These original story segments feature Detective Stern in his search for the
missing husband of client Deanne Miller. The use of this device adds
excitement and interest to the book, keeping the learner motivated.
The authors have obviously spent a good deal of effort developing
meaningful contexts in which to present their grammar lessons. They have
succeeded in creating a text whose strength lies in its clear explanations
and its varied and interactive activities.
SHELLEY GIBSON
121 Common Mistakes of Japanese Students of English. James
H. M. Webb. Tokyo: Japan Times, 1988. Pp. x + 122.
This book is designed to help young adult Japanese EFL students to
recognize and eliminate 121 of their most frequent grammatical and se-
mantic mistakes. The book is intended for university-level students; how-
ever, it could also be useful to younger students because many of the
mistakes are attributed to misunderstandings which occur in beginning-
level grammar classes. 121 Common Mistakes might also be useful to ESL
instructors in U.S. classrooms: It is bilingual, focused on U.S. English, and
provides a substantial inventory of mistakes peculiar to Japanese students.
The author divides the book into seven chapters, each categorizing a
particular type of problem encountered by Japanese students. Chapter
topics include problems with plurals, articles, verbs, adjectives and adverbs,
nouns, prepositions and conjunctions, and miscellaneous problems. Each
is divided into several sections showing examples of incorrect sentences
for the specific structure being presented. Corrected sentences, with
translations and grammatical explanations in both Japanese and English, are
then juxtaposed with the incorrect examples. At the end of each chapter
are exercises in which students correct sentences which contain errors.
121 Common Mistakes identifies and clearly explains areas of English
grammar which challenge Japanese students. However, an ESL/EFL
teacher must be concerned with the format of the book. First, its focus on
errors and error correction makes it a book that may raise an affective
barrier for some learners. This heavy focus on error correction may facili-
tate excessive self-monitoring and lead to ineffective communication be-
haviors for some students. The book might be more useful and effective
as a tool for editing written English, rather than improving spoken English.
Second, the grammatical structures might be more effectively presented
in contextualized dialogues, or grouped together by communicative func-
tion, or even by frequency of occurrence in certain communicative con-
texts. This type of organization would perhaps make the book seem less
overwhelming and the task of producing mistake-free English less daunt-
ing. Third, students might be better equipped to eliminate mistakes if the
text had made clear the degree of importance of each mistake.
Perhaps 121 Common Mistakes would be most useful as a reference for
ESL and EFL teachers who teach grammar, writing, or conversation and
who can incorporate a few of these 121 corrected mistakes into other
learning materials. Additionally, upper level Japanese students of English
who already have communicative competence but want to perfect their
English may also find the book useful.
RON POST
Grammar With a Purpose: A Contextualized Approach. Myrna
Knepler. New York: Maxwell Macmillan, 1990. Pp. xvi + 447.
This textbook is aimed at advanced ESL students in academic English
programs. Students preparing for entry into universities or students cur-
rently matriculated but required to take ESL courses will find this text to
be a useful reference book. Instructors teaching these students will find
Knepler’s text a useful way to integrate grammar instruction into reading
and writing courses.
Most impressive is the liveliness of the exercises. Context is carefully
provided in most cases. Where present, even fill-in-the-blank exercises are
often meticulously constructed in story format. The variety of exercises is
also impressive. Exercise formats are recycled intermittently wherever
they are most useful. This is a well-crafted text which could maintain the
interest of advanced students.
The organization of the material is primarily tense based. The first of
the book’s 10 chapters begins with simple present usage, and the final
chapter deals with conditionals. Within each chapter, content is organized
in relation to the tense in question. For example, chapter 7 begins with
past progressive usage, and continues with structures that might often be
confused with that tense (e.g., the modal used to and present participles
used not as progressives but as adjectives). Such related grammar points
are often elegantly intertwined in the concluding exercises for each chap-
ter. However, the elegant organization may have resulted in a few gaps in
coverage. Relative clauses, for instance, receive less than two pages of
coverage in this rather long text.
The author takes great care to prevent grammar from becoming a
tiresome issue. Modals and other constructions are spread lightly and
evenly throughout the book. Terminology is refreshingly unpretentious
(one regular section is called Little Words and treats problems with deter-
miners, articles, pronouns, and prepositions). In addition to many essay-
writing exercises, speaking and listening practice is plentiful.
Although the material is engaging, the 447-page length of the book may
seem a drawback to some teachers. However, as the author points out, the
chapters need not be tackled in order but can be selectively presented
according to the needs of a specific class. There is no assumption of
progressive difficulty; the explanations assume a consistent, rather
advanced, reading level throughout.
Because of the book’s modular nature and because it is geared for higher
levels, it could be used well as a trouble-shooting text for international stu-
dents seeking to adjust their writing and speaking to the standards of aca-
demic English. Since there is a large amount of writing practice included in
the book, it would also make a fine text for a combination grammar/writing
course not requiring a great deal of emphasis on rhetorical patterns.
DOUGLAS COLLINS
English Structure in Focus, (Book 1, 2nd ed.). Polly Davis.
New York: Newbury House, 1987. Pp. ix + 379.
English Structure in Focus, (Book 2, 2nd ed.). Polly Davis.
New York: Newbury House, 1989. Pp. xi + 523.
Book 1 of the English Structure in Focus Series is directed toward the
intermediate-level ESL student. Using the same format with more exten-
sive coverage of the structures, Book 2 is for advanced-level students. Each
lesson considers different cultural aspects of life in the United States and
will likely be of interest to students of high school age and older. A lesson
generally is limited to the presentation of two grammatical structures
followed by practice exercises, transfer exercises, and discussion and com-
position topics.
Each lesson begins with a diagram of the featured grammar points.
Although these diagrams present concise, logical summaries, some stu-
dents may have difficulty decoding them. The explanations that follow
are helpful, though they could be strengthened by additional examples.
A variety of short, directed practice exercises of progressive difficulty
follow the structure presentation.
The Transfer Exercises, though still directed, encourage students to
use the structures in new contexts other than the chapter’s theme. For
instance, to practice used to + the base form along with negative statements
in Lesson 14 of Book 1 (“A Change in Lifestyle”), students are asked how
their lives and countries have changed. These exercises are well suited to
pair work. They are relatively brief, however, and would need to be
repeated if done with the entire class, in order for each student to have a
chance to respond.
The Discussion Topics elicit responses that personalize the cultural
information in the lesson. For example, in the chapter “A Change in
Lifestyle,” students are asked if they have ever camped out and, if so, to
describe what it was like. The Discussion Topics stimulate oral communica-
tion as students share their own experiences and are led to make compari-
sons between their own cultures and values, and those of the United States.
As the Introduction to the text warns, some of the topics contain structures
which have not been presented, but since these require only a receptive
understanding, they should not pose a problem.
The last section of each lesson contains composition topics that expand
on the cultural theme and allow students to practice the grammar points
covered. Since the text does not cover writing techniques, the teacher may
need to provide guidelines.
The exercises in these texts are varied and flexible, and allow the teacher
to play a secondary role. For those teaching grammar with a focus on the
communicative function of language, English Structure in Focus is a text to
seriously consider.
JEAN JORGENSEN
Grammar Work: English Exercises in Context (Vol. 4). Pamela
Breyer. Englewood Cliffs, NJ: Regents/Prentice Hall, 1984. Pp. 114.
This last volume in the Grammar Work Series is primarily a book of
grammar exercises aimed at intermediate-level ESL students. The book
consists of 17 chapters of similar formats. Each chapter, divided into
several lessons, focuses on one traditional grammatical category, such as
nouns, pronouns, and determiners, as well as categories by tense, aspect,
and voice. Like the other books in this series, each lesson comprises four
small sections. First, there are one or two sentences or phrases using the
target structure, along with the name of the structure, presented at the
beginning of the lesson. This is accompanied by a short explanation of
the target structure. Chapter and section numbers of a corresponding
reference book, Grammar Guide, are listed in case students wish to seek
more information about the target structure. The second section provides
examples of the target structure in a contextual setting. The main part of
the lesson is the third section, in which students practice the target struc-
ture. Activities such as fill-in-the-blank exercises or sentence completions
are often used. The author has carefully selected these functional exercises
to fit the context of students’ everyday lives. The fourth section is designed
for students to put the target structure into real contexts. By looking
at photographs and illustrations and answering questions about them,
students can express their own opinions using the target structure.
There are several benefits of using this book in the classroom. First, it
does not take a long time for students to complete a lesson, provided they
have enough vocabulary, because exercises are short and very specific;
they deal with only the target structure. Second, exercises are structured
so that students can progress from structured exercises to more open-
ended exercises as they master the target structure. Third, it is easy to
keep student interest high since exercises focus on real-life situations.
Fourth, since exercises are designed for students to express their opinions,
there are no right and wrong answers, which makes students feel less
threatened and thus promotes more truly communicative discussions in
the classroom.
Grammar Work would be an excellent book for either introducing new
grammar or for reviewing. Each lesson is designed to promote active
discussion in the classroom and also to help students stay focused and
master the target structure. Breyer’s text should help turn often boring
and teacher-centered grammar classes into more active and lively student-
centered lessons.
MISAKI SHIMADA
Miyazaki International College
Interactions I: A Communicative Grammar (2nd ed.). Elaine
Kirn and Darcy Jack. New York: McGraw-Hill, 1990. Pp. xvii +
295.
Interactions II: A Communicative Grammar (2nd ed.).
Patricia K. Werner, Mary Mitchell Church, and Lida R. Baker. New
York: McGraw-Hill, 1990. Pp. xii + 363.
Interactions I and Interactions II are grammar texts designed for high-
beginning through intermediate ESL students.
The books are organized similarly: Each has 12 chapters, each of which
uses a different theme as a vehicle for the introduction of new structures.
For example, in Interactions I, chapters 1 through 3 are on “School Life,”
“Nature,” and “Living to Eat or Eating to Live?” The structures introduced
in these chapters include be, simple present tense, possessives, present con-
tinuous tense, modals, and comparison. Structures are contextualized by
coordinating their introduction both according to a relevant theme and to a
sequencing which builds from simpler to more complex forms and patterns.
Each chapter is divided into four parts. In Interactions I, the first three
parts are organized in a traditional presentation/explanation/practice for-
mat. Each part begins with an illustration and a contextualized presentation
of the structure, usually in the form of a text with questions. Then the struc-
ture is formally explained in box format. This is followed by practice in
controlled exercises and communicative activities. The fourth part of each
chapter consists of a concise summary of the structures introduced in the
chapter, with more exercises and activities, and a selection of Useful Expres-
sions which introduce language functions nominally related to the theme of
the chapter. Every fourth chapter is a review chapter, summarizing and
contrasting the structures introduced in the previous three chapters.
In Interactions II, the organization is the same, except that Part 4 in each
chapter is similar to the other sections and not a summary. Four of the
chapters (1, 2, 3, and 8) also include a For Your Reference section, with
useful lists of irregular verbs, spelling rules, modals, comparative and
superlative forms, and proper nouns that take the. There are no periodic
review chapters.
Interactions I and Interactions II are highly structured, well-organized
texts which should allow for considerable flexibility according to the needs,
constraints, and interests of individual students and teachers. The books
are increasingly academic in tone from Interactions I to II, which makes the
second book more cramped in its layout due to longer source texts which
are necessarily more tightly spaced. The formal grammar explanations
are simple and clear, but information on pragmatics, why and when to use
certain structures, is perhaps not furnished consistently enough. The
individual chapter sections seem weighted more toward the controlled
exercises than toward the communicative activities, which makes the subti-
tle, A Communicative Grammar, problematic. However, the content itself is
excellent, and can certainly be supplemented with additional communica-
tive activities. The many illustrations are wonderful. Perhaps the best thing
about these books is their high level of organization, especially the self-
contained nature of each chapter, which allows teachers to easily pick and
choose what they want to use.
KIRK L. VANSCOYOC
Grammar Dialogues: An Interactive Approach. Allan Kent Dart.
New Jersey: Regents/Prentice Hall, 1992. Pp. xiv + 290.
This recent grammar textbook is geared toward the high-intermediate
to advanced ESL student in a college-prep or university-level class.
Based on typical grammatical structures, Grammar Dialogues’ 10 chapters
are further divided into numerous subsections of explanations, dia-
logues, and exercises. Each chapter opens with an exercise that can be
treated as a preview or pretest, followed by alternating explanations
and exercises. The various sections are arranged so that previous
material is recycled.
In addition to the usual lessons on the parts of speech, there are
chapters on articles, prepositions, and modal auxiliaries and related
idioms. Dart provides brief explanations and practice exercises in order
to help students gain crucial practice in these areas, which are often
relegated to appendices in other texts if covered at all. The modals and
related idioms should be of special interest since these are often
problematic areas.
The last five chapters deal with more complex grammatical points such
as coordinate and subordinate conjunctions, compound and complex
sentences, adjective and participial clauses, causative verbs and noun
clauses, and indirect speech. This is definitely not a textbook for
beginners.
Grammar Dialogues looks like a traditional grammar exercise book with
lots of fill-in-the-blank exercises. The explanations and examples are
brief and students needing or expecting thorough explanations will not
find them here. Students may also find the use of technical terms
daunting. Unless students already have a command of the English
language and grammar system, teachers should expect to do a lot of
explaining.
Similarly, some students may find the exercises difficult to complete
without teacher assistance. Based on a brief overview and example of the
point being covered, students are expected to be able to complete the fill-
in-the-blank exercises. Advanced-level and motivated students, though,
may find the exercises challenging and fun to complete or discuss in small
groups as the author suggests.
The communicative aspect of the book is somewhat questionable. Unless
the instructor encourages students to deviate from the printed dialogues,
the purported communicative nature of this text will remain artificial. Real
and meaningful communication will not be achieved simply by having
students complete and practice the numerous fill-in-the-blank dialogues.
However, a creative teacher using the author’s teaching suggestions could
certainly incorporate these materials into a communicative classroom.
Because of the jargon, very brief explanations, and advanced grammar
structures, Grammar Dialogues would probably be most appropriate for
an advanced-level grammar review class. Instructors expecting to find a
“stand-alone” text should be warned that this may not be an appropriate
textbook for the typical grammar class. However, teachers looking for a
supplemental or review grammar textbook with lots of exercises might
find Grammar Dialogues quite useful.
STEVEN L. SHAW
Grammar Troublespots: An Editing Guide (2nd ed.). Ann
Raimes. New York: St. Martin’s Press, 1992. Pp. v + 170.
In this slim volume, Raimes focuses on 21 kinds of problems that students
face in writing, both grammatical (sentence structure, tenses, punctuation)
and those related to academic composition (citing, paraphrasing). The
guide is designed for reviewing, not introducing, grammar points and
thus is better suited to academic rather than intensive English programs.
In her preface, Raimes addresses the instructor but she makes it clear that
this guide would also serve the student well as a handbook to be used in
self-editing. (The guide is the bare bones of the recently released second
edition of Exploring Through Writing: A Process Approach to ESL Composition
[Raimes, 1992], which would probably serve as a course text more effec-
tively than would this text.)
Each unit, while brief, is complete, useful, and carefully laid out to be
accessible to students working on their own. Each section begins with
explanatory notes related to the points being studied; key information for
the section is summarized in chart format. The clean graphics and succinct
presentation of the salient points of the unit make each chart handy for
quick review. Exercises follow.
The feature that makes this most useful for students as a handbook is
a flowchart at the end of each unit which gives specific editing advice. The
flowcharts comprise a series of questions that aid students in assessing
their own work. In answering yes, students are judging their work to be
well written. When they answer no, Raimes gives advice for (a) where
and how to make a particular kind of correction and (b) where to find
information in the text to review that point.
An Answer Key is provided in the Appendix, as are the forms of nearly
200 irregular verbs, making the text a complete resource for students
grappling with writing problems.
Because of its brevity and clarity, as well as the appropriateness of
the troublespots covered, Grammar Troublespots should be a part of every
student’s own reference library. For teachers, it would serve as a good
foundation upon which to build a course or as a supplement to a basal
text.
REFERENCE
Raimes, A. Exploring through writing: A process approach to ESL composition (2nd ed.).
New York: St. Martin’s Press, 1992.
WENDY ASPLIN
How English Works: A Grammar Handbook with Readings.
Ann Raimes. New York: St. Martin’s Press, 1990. Pp. xxiii + 389.
How English Works focuses on one aspect of grammar and applies it to
writing problems faced by intermediate and advanced students. The fea-
ture that sets this apart from most other grammar textbooks and reference
books is that authentic readings written by well-known authors are used
throughout the book; short excerpts from these readings are used for
analysis in relation to the specific grammar focus in that unit.
In Part 1, students are asked to analyze reading excerpts with the
help of tasks that guide the students in understanding the use of specific
elements of grammar in context. The grammar structure is explained and,
in a format similar to that of Grammar Troublespots, key information is
boxed into a chart for quick review. Each unit, focusing on a general topic,
consists of several subsections. For example, in the unit covering active
and passive voice, the subsections include forms of the passive; the use of
the passive; get; have passives; passives with direct and indirect objects;
been and being; and participial adjectives. Each of these points is covered
as a discrete topic, with exercises following each point of explanation. This
helps students to focus on one specific point at a time rather than having
them try to remember a half dozen points of explanation before applying
them. Those exercises which are more challenging are specially marked.
The types of exercise vary greatly, including cloze exercises for practicing
verb tenses and articles, exercises for correcting errors in sentences, tasks
for creating sentences based on prompts that focus on particular struc-
tures, and discussion prompts appropriate for pair work and grammar
analysis of sentences.
Each unit in Part 1 also includes writing samples (written by Raimes’s
own students) that provide editing practice for students. Clear, specific,
and interesting writing prompts are provided with each unit as well. Edit-
ing advice summarizing key points of the unit ends each chapter in the
form of a series of questions or reminders to help students to look critically
at their own work.
Part 2 includes the original readings from which the various excerpts
throughout the text have been taken, presented glossed and in their en-
tirety. Each reading is followed by two writing prompts relating to the essay.
In the 69-page Instructor’s Manual, Raimes provides advice not only on
using the text but also on more global concerns about using grammar in
teaching writing.
For students who have mastered a certain level of English and are trying
to use it as a native speaker might, this text is invaluable in helping them
to see ways that language is used in authentic texts. Further, as the title
suggests, seeing “how English works” is the key to helping students under-
stand and edit their own work. This thorough and well-designed textbook
is an effective tool to help students do just that.
WENDY ASPLIN
Grammar Plus: A Basic Skills Course. Judy DeFilippo and
Daphne Mackey. Reading, MA: Addison Wesley, 1987. Pp. v + 298.
In order for a student to be successful in all language skill areas, a good
foundation in grammar is critical. The corollary is that learning grammar
rules without applying that knowledge in meaningful contexts might be
an interesting academic exercise but useless in developing language profi-
ciency. This premise is apparent in Grammar Plus; the focus of this high
beginner/low intermediate textbook is grammar, but its scope includes
related practice in listening, reading, writing, and speaking. Although the
authors suggest that the text is aimed toward university-bound students,
teachers working with immigrants have reported success in using the text,
largely because of the universal themes that are used in the follow-up
activities.
Each of the 18 units focuses on a particular grammar point; basic
structures are clearly presented and then followed by an oral practice
exercise. The grammar points are expanded throughout the unit, with
more complex or challenging exercises introduced later in the chapter.
Each unit concludes with a practice listening and grammar quiz. A well-
developed student workbook is available to supplement the text.
The course has several strengths that distinguish it from other basic
grammar texts. The exercises in both the text and workbook are extensive
and varied. Rather than a collection of repetitive cloze exercises, the variety
of exercises constantly gives students new ways to use their grammar
skills. For the teacher who doesn’t have time to create materials or design
activities to augment a more standard bare-bones text, Grammar Plus is
ideal.
Another plus is the text’s visual quality. Both the text and workbook are
well designed, including simple yet attractive line drawings, appropriate
and clear photographs, uncluttered charts and graphs, and generous
spaces in boxes and blanks for written responses.
The helpful, well-written teacher’s guide begins each unit with a brief
overview, including a pedagogical basis, for each unit. It is followed by
teaching tips, including typical problems that students may encounter,
ways to abbreviate the chapter, and ideas for approaching the text. Concise
but complete suggestions about how to handle the material effectively on
an exercise-by-exercise basis are given. (Listening practice scripts and
answer keys are included.)
In sum, the workbook and textbook complement each other beautifully.
The teacher’s guide lends critical support to these two texts. Together,
this package makes the course user friendly, whether for students or for
teachers.
WENDY ASPLIN

BRIEF REPORTS AND SUMMARIES
The TESOL Quarterly invites readers to submit short reports and updates on
their work. These summaries may address any areas of interest to Quarterly
readers. Authors’ addresses are printed with these reports to enable interested
readers to contact the authors for more details.
Edited by GAIL WEINSTEIN-SHR
San Francisco State University
The Effects of Syntactic Simplification and
Repetition on Listening Comprehension
RAOUL CERVANTES
University of Illinois at Urbana-Champaign
GLENN GAINER
The Institute of Language and Business Communication
In recent years, considerable attention has been given to the role of
input in second language acquisition. Krashen (1982) has argued that
comprehensible input is a necessary factor for successful language acquisi-
tion. He has also stated that linguistic simplifications, including syntactic
simplification, “clearly help make input language more comprehensible”
(p. 65). Long (1985), however, has argued that conversational adjustments,
such as repetition and redundancy, play a greater role in making input
comprehensible.
A number of studies have shown that repetition is effective in facilitating
listening comprehension (Cervantes, 1983; Chaudron, 1983). In addition,
Fujimoto, Lubin, Sasaki, and Long (1987) reported that comprehension
scores for subjects who heard input which contained either linguistic modi-
fications or redundancy and elaboration were significantly higher than
scores for subjects who heard a native-speaker version. In another related
study, Pica, Young, and Doughty (1987) investigated the effects of interac-
tion versus linguistic modification on comprehension. They found that
allowing the subjects to interact with the native speakers in a direction-
giving task increased repetition and significantly improved comprehen-
sion. However, results indicated that input premodified by decreased com-
plexity and increased redundancy did not assist comprehension.
Although these studies clearly indicate that repetition aids comprehen-
sion, the absolute and relative effects of linguistic simplification require
further investigation. Also, there have been no studies to date that have
attempted to isolate the effects of syntactic simplification on listening
comprehension. The purpose of this study, which consists of two experi-
ments, is to explore the absolute and relative effectiveness of syntactic
simplification and repetition on listening comprehension.
In Experiment 1, the effects of syntactic simplification on listening
comprehension were explored. In Experiment 2, the relative effects of
syntactic simplification and repetition on listening comprehension were
investigated. In both experiments, the interaction between main effects
and proficiency level was tested.
EXPERIMENT 1
Method
Subjects. The subjects for this experiment were 76 native Japanese-speak-
ing English majors (54 first-year students and 22 seniors) at Fukuoka
University. Results of a 20-item pretest indicated that as a group, the
seniors were significantly more proficient in listening comprehension than
the first-year students.
Treatment. Two versions of a short lecture about the African American
civil rights leader Malcolm X were prepared. One version contained a low
degree of subordination (7.46 average words per T unit and 1.20 S nodes
per T unit). The second version contained a higher degree of subordina-
tion (14.19 words per T unit and 2.49 S nodes per T unit).
Procedures. Two 44-item cloze tests were prepared, each based on one
version of the short lecture. Content words which were difficult to guess
from the context were chosen for deletion. The subjects at both proficiency
levels were randomly assigned to one of two experimental conditions. One
group heard the syntactically simplified version of the short lecture, and
the other group heard the more complex version. After listening to each
section, the subjects were given 1 min to fill in the missing cloze items.
Results
Results of a 2 x 2 ANOVA (SAS GLM Type III) indicate that groups
hearing the syntactically simplified version scored significantly higher on
the recall cloze test than the groups hearing the more complex version:
F (1,72) = 21.28, p < .0001; K-R 21 reliability = .84. No interaction effect
was found between lecture version and proficiency level.
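The K-R 21 coefficient cited above can be estimated from total test
scores alone. The following is a minimal sketch in Python of the standard
Kuder-Richardson formula 21, KR21 = (k / (k - 1)) * (1 - M(k - M) / (k * var)),
where k is the number of items, M the mean total score, and var the
variance of total scores. The scores listed are hypothetical, and the use
of the population variance is an assumption; the report gives only the
resulting coefficient, not the raw data or the variance estimator.

```python
from statistics import mean, pvariance

def kr21(scores, n_items):
    """Estimate K-R 21 reliability from total scores on n_items dichotomous items."""
    m = mean(scores)
    var = pvariance(scores)  # population variance (an assumption of this sketch)
    if var == 0:
        raise ValueError("scores have zero variance; reliability is undefined")
    k = n_items
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Hypothetical total scores on a 44-item cloze test, invented for illustration.
scores = [10, 18, 25, 31, 36, 40]
print(round(kr21(scores, 44), 2))
```

K-R 21 assumes that all items are of equal difficulty, so it tends to
give a conservative (lower) estimate than item-level formulas.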
EXPERIMENT 2
In order to address the question of the relative effectiveness of syntactic
simplification in comparison with repetition and to reexamine the findings
of Experiment 1, Experiment 2 was conducted.
Method
Subjects. The subjects for this experiment were 82 English majors (54 first-
year students and 28 seniors) at Fukuoka University, none of whom had
participated in Experiment 1. The same 20-item pretest used in Experi-
ment 1 was administered and results indicated that the seniors were sig-
nificantly more proficient in listening than the first-year students.
Treatment. Three versions of a short lecture about a tidal wave were
recorded on tape. Version 1 contained a low degree of subordination (7.33
average words per T unit and 1.33 average S nodes per T unit). Version 2
contained a higher degree of subordination than Version 1 (18.33 average
words per T unit and 3 average S nodes per T unit). Version 3 was identical
to Version 2, except that each dictation segment was repeated once. At both
proficiency levels, subjects were randomly assigned to one of the three
experimental conditions.
Procedures. For this experiment, a partial dictation test was administered.
Three to seven words were deleted from 13 separate dictation segments.
The subjects listened to the entire segment and filled in the deleted por-
tions in the blanks on their test papers.
Results
The results of a 2 x 2 x 2 ANOVA indicate that comprehension scores
for the groups hearing the syntactically simplified version were higher than
the scores for the groups hearing the complex version with no repetition,
F (1,46) = 8.63, p < .01. Also, results indicate that scores for the groups
hearing the complex version with repetition were higher than the scores
for the groups hearing the complex version without repetition: F (1,52) =
27.60, p < .0001; K-R 21 reliability = .73. No significant difference was
found between scores for groups hearing the syntactically simplified ver-
sion and the complex version with repetition.
Our attempt to find interaction effects for the syntactically simplified
version versus the complex version with repetition was invalidated by a
sampling error. An analysis of individual group cells revealed no signifi-
cant difference between proficiency levels for first-year and senior groups
hearing the complex version with repetition.
DISCUSSION
The results of both experiments indicate that syntactic simplification is
an aid to comprehension. It is noteworthy that significance was obtained
in two different types of tests which measure different levels of listening
comprehension (Chaudron, 1985). The recall cloze test gives the listener
more time to process the input and more opportunity to make use of
contextual clues, whereas the partial dictation test measures comprehen-
sion at the first level of intake. No interaction effect was found between the
effect of syntactic simplification and proficiency level in either experiment.
The results of Experiment 2 indicate that repetition facilitates compre-
hension. This adds more evidence to previous studies addressing this
question. It is also important to notice that there was no significant differ-
ence between the groups hearing the syntactically simplified version and
the groups hearing the complex version with repetition. This finding has
important implications for teaching methodology and materials design.
Listening texts are often syntactically simplified to aid comprehension.
Although this may aid comprehension, this modification may not be neces-
sary if other modifications, such as repetition, are employed. This is espe-
cially interesting in light of the fact that repetition is not frequently used
in texts to facilitate comprehension; in fact, it seems to be intentionally
avoided in order to give materials an “authentic” quality. It is still too early
to make definitive statements, and further research is needed; however,
it is clear that designers of listening materials and instructors of listening
comprehension need to reevaluate their assumptions in light of the find-
ings of this study and other empirical evidence.
REFERENCES
Cervantes, R. (1983). Say it again Sam: The effect of exact repetition on listening compre-
hension. Unpublished manuscript. University of Hawaii at Manoa, Honolulu.
Chaudron, C. (1983). Simplification of input: Topic reinstatements and their
effects on L2 learners’ recognition and recall. TESOL Quarterly, 17 (3), 437–458.
Chaudron, C. (1985). Intake: On models and methods for discovering learners’
processing of input. Studies in Second Language Acquisition, 7, 1–14.
Fujimoto, D., Lubin, J., Sasaki, Y., & Long, M. (1987). The effect of linguistic and
conversational adjustments on the comprehensibility of spoken second language discourse.
Unpublished manuscript. University of Hawaii at Manoa, Honolulu.
Krashen, S. (1982). Principles and practice in second language acquisition. New York:
Pergamon Press.
Long, M. H. (1985). A role for instruction in second language acquisition: Task-
based language training. In K. Hyltenstam & M. Pienemann (Eds.), Modelling
and assessing second language acquisition. Clevedon, England: Multilingual Matters.
Pica, T., Young, R., & Doughty, C. (1987). The impact of interaction on compre-
hension. TESOL Quarterly, 21 (4), 737–758.
Authors’ Address: c/o Raoul Cervantes, Department of Educational Psychology,
Education Building, University of Illinois at Urbana-Champaign 61801
The Discourse of Pop Songs

TIM MURPHEY
Nanzan University
Murphey and Alber (1985) postulated a pop song (PS) register and
described it as the “motherese of adolescents” and as “affective foreigner
talk” because of the simple and affective language. The PS register was
further characterized as a “teddy-bear-in-the-ear” to capture its riskless
communicative qualities. More detailed analyses of a larger corpus (Murphey,
1989, 1990a) have since supported the earlier description and further
show PSs to be repetitive, conversationlike, and about half the speed of
spoken discourse. This simplicity, their highly affective and dialogic
features, and their vague references (ghost discourse) allow listeners
to use them in personally associative ways. These discourse features
and the song-stuck-in-my-head phenomenon (discussed below) make them
potentially rich learning materials in and out of the classroom.
THE STUDY
The Corpus
The top 50 songs in English were taken from the September 12, 1987
edition of Music & Media’s Hot 100 Chart. This date had been designated
4 months in advance in order to be nonbiased in the selection, following
Gerbner’s (1985) model of message systems analysis and Brooks’s (1982)
plea that we be “tasteless” in our research.
Word Count
A word-frequency count revealed a type-token ratio (TTR) of .09 with
a total of 13,161 words. The average TTR per song is .29, which implies
that each word is repeated about three times in an average song of 263
tokens. Actually 25% of the corpus is composed of just 10 different words:
4 pronouns (you, I, me, my), 4 function words (the, to, a, and), the future
auxiliary gonna, and the noun/verb love.
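The type-token ratio reported above can be computed in a few lines. The
following is a minimal sketch in Python, assuming that lyrics are
available as plain strings and that a token is any run of letters or
apostrophes after lowercasing. The tokenizer and the sample lyric are
illustrative assumptions, not the procedure or data used in the study.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes (an assumed tokenization)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens):
    """Return the number of distinct word types divided by the number of tokens."""
    if not tokens:
        raise ValueError("cannot compute a TTR for an empty token list")
    return len(set(tokens)) / len(tokens)

# Hypothetical lyric, invented for illustration only.
sample = "love me love me say that you love me"
tokens = tokenize(sample)

print(round(type_token_ratio(tokens), 2))   # 5 types / 9 tokens -> 0.56
print(Counter(tokens).most_common(2))       # the two most frequent types
```

The reciprocal of the TTR is the mean number of occurrences per type.
For the per-song average of .29 this is roughly 3.4, which agrees with
the statement that each word is repeated about three times.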
Content Analysis: You and Me

In the corpus, 86% of the songs contain unspecified you-referents. Al-
though our logic tells us that it is not possible that we are being addressed
directly, subconsciously (and perhaps illogically) we may receive the mes-
sages as directed toward us. This type of unspecified addressing may be
compared to the phenomenon of someone shouting hey you on the street
and everybody turning to look, thinking perhaps they are being addressed.
Whatever the ways we might choose to use songs, they are never chal-
lenged. Advertisers, of course, know this all too well (see Rotzoll, 1985).
As for I, 94% of the songs had unspecified first person referents. Songs
apparently say what some listeners want to say anyway, literally putting
the words into their mouths as they sing along. The fact that the I in the
song has no name makes it easier for the listener to appropriate the words.
The word count revealed that the total referents in first person (my, mine,
etc.) amounted to 10% of the total words, whereas second person referents
contributed 5%. Thus, 15% of the words in these songs referred to me and
you. Additionally, imperatives and questions totaled 25% of the sentences
in the corpus. These features make the discourse highly conversationlike.
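The proportions of first and second person referents can be counted in a
similar way. The sketch below, in Python, is an illustration under
stated assumptions: the pronoun lists, the handling of contractions, and
the sample lyric are hypothetical choices, since the text names only
"my, mine, etc." and does not describe the counting procedure.

```python
import re

# Assumed pronoun inventories; the source does not list the forms counted.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}

def person_shares(text):
    """Return (first-person share, second-person share) of all word tokens."""
    raw = re.findall(r"[a-z']+", text.lower())
    # Keep only the part before an apostrophe so that "I'm" counts as "i";
    # tokens that begin with an apostrophe are dropped by this simplification.
    tokens = [t.split("'")[0] for t in raw]
    tokens = [t for t in tokens if t]
    if not tokens:
        raise ValueError("no word tokens found")
    first = sum(t in FIRST_PERSON for t in tokens)
    second = sum(t in SECOND_PERSON for t in tokens)
    return first / len(tokens), second / len(tokens)

# Hypothetical lyric, invented for illustration only.
first, second = person_shares("I'm gonna love you, you know you're my love")
print(round(first, 2), round(second, 2))
```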
Time, Place, and Gender

Ninety-four percent of the songs have no time of enunciation whatso-
ever, and 80% have no place mentioned. In no song are precise dates or
hours given, and in only one is there a named place. It seems songs happen
when and where they are heard. A further indication of the vagueness of
PSs is the lack of gender referents in the lyrics. Lyrically, only 4 songs
designate both the sex of the enunciator and addressee. No gender reference
is given in 62% of the songs, which could thus be sung by either sex
without changing the words. Only 12% are definitely written to be sung
by one sex to another. Of course, the androgynous characteristics of many
voices and the “image” of many singers play upon this ambiguous
possibility.
Words per Minute

The words-per-minute mean speed, 75.49, is about half that of normal
speech. It is not so much that songs are slow, although some are, but rather
that they have frequent pauses. The pause structure would seem to invite
listeners to respond, if not with their own words, then at least with an
echo of what they just heard. The frequent calling to you also encourages
audience participation in the enunciation, contextualization, and meaning
making of the song. The pauses and a slow rate may allow listeners to
search for referents in their own contexts, internally or externally, an
activity that deepens appropriation.
Readability and Human Interest

Using Flesch’s (1974) readability formula and a similar one by Fry
(1977), the PS register finds itself at the level of the simplest graded
EFL readers (e.g., those published by Collins, Heinemann, Longman, and
Macmillan), having only 300 to 500 words, or at the reading level of a
native-speaker child after 5 years of schooling. Using Flesch’s human
interest formula, PS could be described as highly dramatic and of high
human interest.
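For readers unfamiliar with readability scoring, the commonly cited form
of Flesch's reading ease score is 206.835 - 1.015 x (words per sentence)
- 84.6 x (syllables per word), with higher scores indicating easier
text. The following Python sketch applies that formula. The syllable
counter is a rough vowel-group heuristic and the sample texts are
invented, so the output is only an approximation and does not reproduce
the analysis reported here.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Apply the Flesch reading ease formula to a text (higher means easier)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical texts, invented for illustration only.
lyric = "I love you. Do you love me?"
prose = "Syntactic simplification facilitates comprehension considerably."
print(flesch_reading_ease(lyric) > flesch_reading_ease(prose))
```

Short words and short sentences both raise the score, which is why
repetitive lyrics built from monosyllables rate as very easy text.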
To summarize: (a) The words of PSs are short, repetitive, and have a
low TTR. (b) The sentences are short. (c) Both the sentences and the
words contain many personal references. (d) These personal references
have practically no precise referents. (e) Gender, time, and place referents
are absent or, at most, vague. (f) The rate of speech of PSs is half that of
normal speech.
Typifying Pop Song Discourse Further

I applied an interactive typology approach to classifying texts, devel-
oped by Bronckart (1985), to the PS corpus. According to their extralin-
guistic parameters, PSs belong to his narration category, but when the
language of the PS corpus was computer analyzed using his 27-item grid,
it fell within the situational discourse (SD) category, that is, conversation.
Looking more closely at the definition of SD, we find that several psycho-
logically salient features of song are revealed which are not superficially
accessible. First, SD is “text produced in direct relation with the context
. . . with a precise moment and place of production, and which is organized
by constant reference to this context” (p. 63). As noted, any traces of
precise moments and places, and references to them, are remarkably
absent from the texts of PSs. It is precisely this lack of referents that allows
songs to happen whenever and wherever they are heard. For the listener,
the song text, if received as relevant, takes on meaning in and for that
context.
The definition of SD also stipulates that there are “identifiable interlocu-
tors.” As was noted, the identification of participants was not textually
traceable in 90% of the songs. However, one of the salient characteristics
of the songs is the large number of first and second person pronouns,
albeit with no precise referents. Again, to understand this phenomenon I
think one needs to look at the listener’s world. This hypothesized psycho-
logical processing of PS content and other characteristics suggests a certain
isomorphism with Vygotsky’s (1962) inner speech, which may help to explain
song’s attraction (Murphey, 1990a, 1990b).
The Din of Song and the LAD

Research on the din, the involuntary rehearsal of language in one’s
mind after a period of contact with a foreign language, has shown it to be
a phenomenon worthy of consideration, as it may be a manifestation of
Chomsky’s hypothesized language acquisition device (LAD; see de Guerrero,
1987; Krashen, 1983; Parr & Krashen, 1986). A very similar phenomenon
is what I call the song-stuck-in-my-head phenomenon (SSIMHP): the
repeating of a song in one’s head, also something commonly experienced,
usually occurring when audition is followed by relative quiet, as with the
last song you hear before leaving your home or car (Murphey, 1990a,
1990b). The SSIMHP might even be capable of tricking, or activating,
the LAD into involuntary rehearsal. Oliver Sacks writes “[concerning]
‘tricking’ the LAD into operation via music and song . . . one sees again
and again how Parkinsonians though unable to walk, may be able to dance;
and though unable to talk, may be able to sing” (personal communication,
March 30, 1988). The L2 research question is whether music and song can
trick the LAD into a din mode that would process more communicative
speech.
CONCLUSION, DISCUSSION, AND IMPLICATIONS

For TESOL, PSs offer short, affective, simple, native texts with a lot of
familiar vocabulary recycled, yet vague. They are dialogic and engaging
auditorily but, because of our narrative expectations, they are probably
not very interesting as reading material. Nevertheless, their written forms
can be used to reinforce what is heard auditorily and promote a deeper
activation of the SSIMHP. Their vague references allow learners to fill
them with their own content. They also allow teachers to use them in very
different methodologies for very different reasons (Murphey, 1990a).
If involuntary rehearsal is the humming of the efficient LAD, music
and song may initially play an associative facilitating role in engaging and
stimulating it. Studying the SSIMH phenomenon may allow us to use it
more advantageously for things we want to stick in students’ minds.
ACKNOWLEDGMENT
This article reports the results of the first half of my doctoral dissertation
(Université de Neuchâtel, Switzerland; Murphey, 1990a). The second half surveyed
the literature to discover the uses that teachers made of music and song in their
classes.
REFERENCES
Bronckart, J. P. (1985). La fonction du discours. Paris: Delachaux & Niestlé.
Brooks, W. (1982). On being tasteless. Popular Music, 2, 9–18.
de Guerrero, M. C. M. (1987). The din phenomenon: Mental rehearsal in the
second language. Foreign Language Annals, 20, 537–548.
Flesch, R. (1974). The art of readable writing. New York: Harper & Row.
Fry, E. (1977). Fry’s readability graph. Journal of Reading, 20, 242–252.
Gerbner, G. (1985). Mass media discourse: Message system analysis as a component
of cultural indicators. In T. A. van Dijk (Ed.), Discourse and communication: New
approaches to the analyses of mass media discourse and communication (pp. 13–25).
Berlin: de Gruyter.
Krashen, S. D. (1983). The din in the head, input, and the language acquisition
device. Foreign Language Annals, 16, 41–44.
Murphey, T. (1989). The where, when and who of pop song lyrics: The listener’s
prerogative. Popular Music, 8, 58–70.
Murphey, T. (1990a). Music and song in language learning: An analysis of pop song
lyrics and the use of music and song in teaching English to speakers of other languages.
Bern, Switzerland: Peter Lang Verlag.
Murphey, T. (1990b). The song stuck in my head phenomenon: A melodic din in
the LAD? System, 18, 53–64.
Murphey, T., & Alber, J. L. (1985). A pop song register: The motherese of
adolescents as affective foreigner talk. TESOL Quarterly, 19 (4), 793–795.
Parr, P. C., & Krashen, S. D. (1986). Involuntary rehearsal of second language in
beginning and advanced performers. System, 14, 275–278.
Rotzoll, K. B. (1985). Advertisements. In T. A. van Dijk (Ed.), Discourse and commu-
nication: New approaches to the analyses of mass media discourse and communication
(pp. 94–105). Berlin: de Gruyter.
Vygotsky, L. S. (1962). Thought and language. Cambridge, MA: MIT Press. (Original
work published 1934)
Author’s Address: Nanzan University, Faculty of Foreign Languages, 18
Yamazato-cho, Nagoya 466, Japan
THE FORUM
The TESOL Quarterly invites commentary on current trends or practices in the
TESOL profession. It also welcomes responses or rebuttals to any articles or
remarks published here in The Forum or elsewhere in the Quarterly.
Comments on Yoshinori Sasaki’s “A Logical
Difficulty of the Parameter Setting Model”
Objection to a Logical Difficulty
GUY MODICA
In his TESOL Quarterly article (Vol. 24, No. 4), Yoshinori Sasaki is
premature in drawing the conclusion that “[the Universal Grammar]
model is untenable as a scientific theory of language acquisition”
(p. 769). In constructing his logical argument that no marked syntactic
parameter could ever originate, Sasaki misconstrues the way in which
the notion of an innate Universal Grammar (UG) must be integrated
into a theory of language acquisition and change. He argues that
because no phonological, morphological, semantic, or pragmatic infor-
mation enters into syntactic rules, no logically possible way exists to
establish a marked parameter, and therefore no speaker’s parameters
could ever be “set” in a marked “position” by the experience of trig-
gering utterances. His argument primarily addresses the compatibility
of UG with an account of historical change; I will discuss this issue first
and then the “logical difficulty” of the UG model.
Much of David Lightfoot’s career (e.g., 1974, 1979, 1980, 1981a,
1981b, 1982, 1991) has been spent examining evidence of language
shift from a generative perspective. The evolution of language reaches
an apogee in what he terms reanalysis. Beginning at an arbitrary initial
state of the language, and with the assumption of an innate language
faculty consisting of core principles and parameters and a marked
periphery, he enumerates some factors that stimulate reanalysis: pro-
cessing problems, stylistic expressiveness, contact with foreign lan-
guages/pidgins/creoles, novelty, chance, and other nongrammatical
factors. These factors introduce irregularity into a language, and which
of these irregularities is perpetuated by grammaticization is unpredict-
able. These reanalyses are forced by the attempts of succeeding genera-
tions to acquire a language based on evidence that includes novel
constructions or usage. Using principles of UG and the linguistic evi-
dence encountered, the child sets parameters that provide a maximally
economical grammar which generates this “evidence.” According to
Lightfoot (1982),
grammatical reanalyses meet strict conditions: they must lead to a grammar
fulfilling the restrictive requirements imposed by the theory of grammar;
they must constitute the simplest attainable grammar for a child exposed
to the new linguistic environment; they must yield an output close to that
of the earlier grammars. For any reanalysis, these requirements impose
narrow restrictions on the available options. (p. 163)
Changes may be necessitated by the theory of grammar; such changes are
therefore explained by the theory of grammar. Conversely, noting the point
at which abstract reanalyses take place teaches something about the limits
to attainable grammars. (p. 159)
A UG explanation has been suggested by Lightfoot (1982) for the
move away from the subject-object-verb (SOV) order of Old English.
Remarks such as I four trees felled or You the next round buy were common-
place before the year 1000. This grammar also generated The bartender
Little John a halfpenny owes believes. Sentences with this structure, NP-
[NP-NP-V]-V, form nested constructions, which become progressively
more difficult to process as the nesting deepens, due to limitations of
short-term memory. A well-known example from Kuno (1973) is
The cheese the rat the cat John keeps killed ate was rotten, a perfectly grammat-
ical set of reduced relative clauses made impossibly confusing without
careful dissection of the structure.
Following Lightfoot, a possible response by speakers of Old English
to the need for embedded objects and the incomprehensibility of
nested constructions was a clever (at that time) turn of phrase: The
bartender believes Little John a halfpenny owes. Stimulated by a processing
problem engendered by sentences containing embedded sentences as
objects, speakers found a novel linguistic solution. An NP-V-[NP-NP-
V] construction was offered, thereby introducing an alternate SVO
structure into an otherwise SOV language. A growing corpus of SVO
sentences was eventually taken by some infants acquiring the language
as evidence of an underlying SVO order, and that parameter was set
accordingly, thus beginning an important evolutionary progression
towards modern English.
Above is a (necessarily) hypothetical example of language change:
The change occurs first in the linguistic environment as a result of a
processing problem, later in the parameter that determines the basic
hierarchical pattern of the language. All this is consistent with a UG
framework. This framework is a component of the approach employed
by many researchers investigating language change. Conversely, an
explication of diachronic change can be of crucial importance for a
synchronic UG theory by helping to clarify the gradation from irregu-
larity to peripheral rules to marked parameter settings in the core
grammar and how that gradation is modified from period to period
through its acquisition by successive generations.
With this discussion now in place, I specifically examine Sasaki’s
argument. Corollary 1—some of the switches within the speakers of
[some] languages are indeed in marked positions—is most certainly
uncontroversial. It is exactly this central claim of markedness theory
that accounts for language-particular differences such as the need for
overt subjects. While English requires a subject for every sentence,
other languages such as Spanish, Italian, Korean, and Japanese permit
subjectless sentences. Nongenerative linguistic theories, for example,
functionalism, account for such differences with difficulty, whereas in
a generative theory the overt subject requirement for English is han-
dled as a rule at the periphery of the grammar.
Corollary 2—it is virtually impossible to replace an unmarked value
(i.e., a preferred value) or parameter with a marked one without
supporting “evidence” or “input”—is also consistent with the theory.
With regard to setting a parameter at other than a default value, input
from the linguistic environment is required.
When Sasaki states “Nor can effects on syntax of such factors as
phonology, morphology, semantics, and pragmatics provide a coherent
solution” (p. 769), he is reiterating a synchronic constraint—one which
pertains to the issue of the autonomy of syntax within the grammar
contained in the mind/brain of the hearer/speaker. The principle of
autonomy of the syntax is stated, for example, in Radford (1988):
No syntactic rule can make reference to pragmatic, phonological, or
semantic information. This is a constraint on the system of grammar,
not a constraint on the mechanisms of language change. Grammars do
change over time. Any theory which could not incorporate phonologi-
cal/morphological/syntactic/semantic change would be a poor theory
indeed. It would fail to meet even descriptive adequacy. Generative
grammar is not impoverished by this restriction on its ability to proffer
mechanisms for language change. Sasaki looks to a synchronic theory
for a diachronic explanation; it is this error rather than a flaw in
his logic that diminishes the impact of his argument. The autonomy
hypothesis that he crucially uses to draw his conclusion is not meant
to hold for language change; his argument collapses through this
weakness.
Generative linguistics has been somewhat of a gadfly to a number of
disciplines. It is from the consternation of researchers outside linguis-
tics who must employ some grammar (in the sense of a theory of
knowledge of language) in their own work that antipathy manifested
toward generative theory often originates. Generative linguists are well
aware that a theory of language must be fully integrated with theories
of historical change (e.g., Lightfoot, 1991) and language acquisition
(e.g., Hyams, 1986). A review of the literature confirms that this type of
research program is well underway. ESOL theorists and practitioners
should be encouraged to become familiar with generative linguistic
theory and not be dissuaded by this argument, which misses a funda-
mental distinction between synchrony and diachrony.
ACKNOWLEDGMENTS
The contributions of Fritz Newmeyer and Sandra Silberstein to earlier drafts have
improved my thinking, which however remains my own.
REFERENCES
Chomsky, N. (1981). Lectures on government and binding. Dordrecht, Netherlands:
Foris.
Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht,
Netherlands: Reidel.
Kuno, S. (1973). The structure of the Japanese language. Cambridge, MA: MIT Press.
Lightfoot, D. (1974). The diachronic analysis of English modals. In J. Anderson &
C. Jones (Eds.), Proceedings of the first international conference on historical linguistics.
Dordrecht, Netherlands: Reidel.
Lightfoot, D. (1979). Principles of diachronic syntax. Cambridge: Cambridge
University Press.
Lightfoot, D. (1980). Trace theory and explanation. In E. Moravcsik & J. Wirth
(Eds.), Current approaches to syntax (Syntax and Semantics Vol. 13). New York:
Academic Press.
Lightfoot, D. (1981a). The history of NP movement. In C. Baker & J. McCarthy
(Eds.), The logical problem of language acquisition. Cambridge, MA: MIT Press.
Lightfoot, D. (1981b). Explaining syntactic change. In N. Hornstein &
D. Lightfoot (Eds.), Explanation in linguistics: The logical problem of language
acquisition. Cambridge, MA: MIT Press.
Lightfoot, D. (1982). The language lottery: Toward a biology of grammars. Cambridge,
MA: MIT Press.
Lightfoot, D. (1991). How to set parameters: Arguments from language change.
Cambridge, MA: MIT Press.
Radford, A. (1988). Transformational grammar. Cambridge: Cambridge University
Press.
Dual Storage Formats of Linguistic Knowledge?
An Untestable “Explanation”
YOSHINORI SASAKI
Rhodes College
First, I would like to thank Guy Modica for his thoughtful commen-
tary. In this response, I examine whether his proposal truly provides
a scientific solution to the problem. Since some of the crucial underlying
assumptions of his argument are not explicitly stated, I will clarify them
first. In particular, Modica’s argument is entirely based on a dichotomy
of dual storage formats of linguistic knowledge: One is encoded in
terms of parameters and the other is not. The following discussion will
center around the scientific validity of this assumption.
DUAL STORAGE FORMATS OF
LINGUISTIC KNOWLEDGE?
In my article, I pointed out that the initiator of a marked form, by
definition, had not had a chance to be exposed to the form before
inventing it. I consider this contradictory to the parameter setting
model (PSM) because the switch in the initiator’s brain must have been
set to “marked” to enable her/him to use the marked form, but the
PSM claims that the initial setting of parameters must be unmarked.
In response, Modica claims that the switches of the initiator of the
marked form (and speakers of that generation) were not set to marked.
The initiator’s parameters remained unmarked, but nevertheless s/he
invented and used the marked form to lessen the processing load. The
marked form which thus originated came to be used productively and
extensively in the speech community. (If its use were only formulaic
and occasional, it would be unable to alter the initial default state of
the next generation’s parameters.) Eventually the parameters of the
next generation, who were exposed to the marked form in their child-
hood, were set marked.
Obviously, Modica presupposes that knowledge about the marked
form stored in the first generation’s neural system was encoded in a
format which did not involve the setting of parameters. For brevity, I
term thus encoded linguistic knowledge nonparameterized. Speakers
who use a form contrary to their parameter setting will be termed
nonparameter users of that particular form. The rest (those whose linguis-
tic performance correctly reflects their parameter settings) are parame-
ter users of the form, and their linguistic knowledge is parameterized.
In other words, there are two possible formats for storing knowledge
about language in the brain, and both of these formats allow for a
productive use of target forms. The parameterized linguistic knowl-
edge within an individual is modular, but it is nevertheless subject
to the effects of cognitive-functional factors diachronically. Modica
explains this apparent paradox by assuming that the nonparameterized
linguistic knowledge of one generation conveys these cognitive-func-
tional effects to the settings of parameters of the next generation.
It is striking that Modica’s argumentation in terms of theory con-
struction is closely similar to Krashen’s (1982) monitor model.
(McLaughlin, 1987, points out some differences between Krashen’s
proposal and Chomsky’s in linguistic details, but those distinctions do
not affect the following argument.) Both essentially embrace a dual
linguistic knowledge storage format: One possible form of knowledge
corresponds to a biologically endowed language-specific acquisition
mechanism (Krashen: acquired knowledge; Modica: parameterized
knowledge), whereas the other does not (Krashen: learned knowledge;
Modica: nonparameterized knowledge).
The influence of the monitor model has steadily declined since the
1980s because of its lack of scientific testability. When it has been
pointed out that an observed performance of an L2 speaker did not
follow what Krashen’s characterization of an “acquirer” would predict,
Krashen’s position has remained that the particular performance came
from “learned,” not “acquired,” knowledge (see Gregg, 1984, for an
example of a critique and Krashen, 1981, for a statement of his po-
sition).
As McLaughlin (1978, 1987) correctly points out, such an ad hoc
explanation is entirely untestable in the absence of an objective measure
to decide whether a certain linguistic behavior comes from learned or
acquired knowledge. Krashen fails to supply such operational defini-
tions.
I hope it is already evident that Modica’s argument is in this respect
crucially parallel to Krashen’s. In response to my critique that some
people (i.e., initiators of marked forms) behave contrary to the predic-
tion of the PSM, Modica argues that such behavior does not come
from parameterized knowledge but from somewhere else. The crucial
weakness of his argumentation is its lack of an objective way to deter-
mine who is using a certain linguistic form according to their internal
parameter settings and who is not.
Modica’s proposal is untestable not simply because he is talking about
a historical event of centuries ago. Lack of operational definitions
makes his proposal conceptually untestable even if the process takes
place before our eyes.
For example, how is it possible to tell whether the pro-drop parame-
ter for my English use is set plus or minus? All my sentences in this
article have a subject (i.e., a pro-drop minus feature) but Modica’s
dualism makes it possible that the parameter in my brain is still set
plus. I cannot tell its setting by introspection. Indeed, it is impossible
to test the setting of a speaker’s internal parameter for L1 or L2 if such
an untestable dualism is accepted as an explanation.
It is important to note that the necessary operational definitions of
parameter users versus nonparameter users should not invoke notions
of the UG, in order to avoid a circular argument. The definitions must
be made in terms of some observable features. If, for example, the
distinction is made in terms of response time latency, this requirement
would be satisfied.
In the absence of such objective measures, any claim about language
acquisition can be defended by shuffling these two types of formats
arbitrarily: Whenever UG proponents confront data which contradict
the prediction of their theory, they can simply ascribe them to the
nonparameterized knowledge of learners and then ignore them. Only
when the data happen to fit UG theory, will they cite them as supportive
evidence. The explanation can never be falsified because there is no
empirical way to determine the source of a linguistic performance,
that is, whether it stems from parameterized or nonparameterized
knowledge. The explanation is indeed invincible, but such invincibility
is ultimately useless.
Let me add that this dualism in linguistic knowledge storage format
is not given serious consideration as a viable hypothesis among experts
of human memory (e.g., Zechmeister & Nyberg, 1982). This distinction
was invoked to patch a hole in the PSM, and there is no independent
evidence to support such a stipulation. In short, it is not a psychologically
plausible assumption. The burden of proof for such a foreign idea
rests clearly on the advocate’s shoulders.
FUNCTIONAL EFFECTS ON LANGUAGE ACQUISITION

Once we turn our focus away from his assumption of the dual
formats of linguistic knowledge, Modica’s example of English word
order shift simply reiterates the functionalists’ contention that forms of
human language are crucially constrained by communicative functions
(e.g., Slobin, 1977, 1979).
In his discussion, Modica conveniently isolates the process of lan-
guage acquisition from the dynamism of language change. His writing
gives the impression that some ingenious adult solved a communicative
difficulty by inventing a new word order, and then the form was
presented to the next generation who were acquiring language. In
reality, if a new form emerges in a speech community, its initiator is
more likely to be a child than an adult since older generations
are known to be more conservative in their language use. Moreover,
children are more susceptible to processing constraints because of
their smaller processing capacity. Indeed, Slobin (1977) convincingly
demonstrates that the processes of language acquisition and dia-
chronic language change are constrained by the same functional
dynamism.
It is puzzling why Modica emphasizes the functional influences on
diachronic changes but ignores functional effects on language acquisi-
tion. If the functional charges (processing constraints in Modica’s ex-
ample) are indeed influential enough to motivate a speaker to invent
a marked form in the absence of evidence, it is likely that those changes
can guide an L1 learner, who has access to abundant evidence, to
acquire the same form. This provides additional support for Givón’s
(1979) claim that “children do not first acquire ‘syntax’ in Chomsky’s
sense, but rather a communicative system of a much more rudimentary
sort; and only later they modify it, gradually, into ‘syntax’” (p. 22).
It is premature to conclude that parameters are the only possible
explanation of language acquisition. Without parameters, the human
brain still has the capacity to store and productively use linguistic forms,
as Modica implicitly admits.
For example, Slobin (1979) successfully explains in functional terms
why head directionalities tend to be uniform (either right or left, but
seldom both) within a language: Diversity of directionality makes a
language difficult to process and thus hard to acquire. Functional
constraints like this drastically reduce the number of humanly processible
patterns of language, without resorting to an untestable
notion such as a hardwired universal grammar.
Chomsky (1965), without proof, alleges that there is no possibility of
explaining language acquisition without accepting very strict innate
restrictions on possible grammars. The number of possible grammars
is too great, unless a universal grammar is genetically hardwired. In
reality, we have not yet fully explored the possibility of explaining
the acquisition process without recourse to such untestable notions.
Modica’s functional explanation of the origin of marked forms essen-
tially hints that functional forces, interacting with some constraints on
cognitive processes, may be influential enough to guide children to
develop internal linguistic schemata.
CONCLUDING REMARKS
Scientific propositions crucially distinguish themselves from meta-
physical statements by virtue of their testability: Scientists assume re-
sponsibility to make it clear how their claims can be tested, that is,
what empirical procedure can potentially falsify them. Since Modica’s
proposal does not satisfy this requirement, it cannot provide a viable
scientific solution to the problem.
I do not deny the existence of all innate language-specific endow-
ments. No responsible scientist would do so because their existence is
untestable (i.e., unfalsifiable) within the current limitations of neuropsychology.
Precisely because of this untestability, however, many researchers
in the field consider it methodologically prudent to pursue
the contribution of identifiable variables to their limits (Bohannon &
Warren-Leubecker, 1985). Innativist terms are kept only as a last re-
sort. This is the approach which I believe is scientifically sound.
ACKNOWLEDGMENTS
I appreciate encouraging remarks on this project from Susan Flynn, Talmy Givón,
Katsutoshi Ito, Lester Loschky, and Dan Slobin. I also acknowledge comments on
my previous article from Noam Chomsky and Michael Harrington. All remaining
errors are mine.
REFERENCES
Bohannon, J., & Warren-Leubecker, A. (1985). Theoretical approaches to
language acquisition. In J. B. Gleason (Ed.), The development of language.
Columbus, OH: Charles E. Merrill.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Givón, T. (1979). On understanding grammar. New York: Academic Press.
Gregg, K. R. (1984). Krashen’s monitor and Occam’s razor. Applied Linguistics, 5,
79–100.
Krashen, S. D. (1981). Second language acquisition and second language learning.
Oxford: Pergamon Press.
Krashen, S. (1982). Principles and practice in second language acquisition. Oxford:
Pergamon Press.
McLaughlin, B. (1978). The monitor model: Some methodological considerations.
Language Learning, 28, 309–332.
McLaughlin, B. (1987). Theories of second-language learning. London: Edward
Arnold.
Slobin, D. (1977). Language change in childhood and in history. In J. MacNamara
(Ed.), Language learning and thought. New York: Academic Press.
Slobin, D. (1979). Psycholinguistics (2nd ed.). London: Scott, Foresman.
Zechmeister, E., & Nyberg, S. (1982). Human memory. Monterey, CA: Brookes.
Teaching Issues
The TESOL Quarterly publishes brief commentaries on aspects of English
language teaching. For this issue, we asked two educators to address the
following question: Is there any value in students reading aloud in the second
language classroom?
Edited by SANDRA McKAY
San Francisco State University
Reading Aloud
An Educator Comments . . .
SUZANNE M. GRIFFIN
Washington State Board for Community and Technical Colleges
Until the early 20th century, reading aloud was the prevalent method
of teaching reading in Western cultures. Subsequent to the rise of
silent reading instruction in the early 1900s, three schools of thought
developed about oral reading instruction. The first group saw oral
reading as a means to more proficient silent reading; the second viewed
oral reading as a detriment to the development of proficient silent
reading; and the third approached oral reading as an art form whose
techniques were worth mastering. The debate about the relative value
of silent and oral reading diminished in the 1950s with the acknowledg-
ment by some that reading aloud and reading silently involve different
skills (Allington, 1984).
One important benefit of reading aloud as opposed to silent reading
is that reading aloud develops an awareness of sound-symbol relation-
ships. Because of this benefit, some applied linguists recommend read-
ing aloud for beginning-level ESL classes (Bright, Koch, Ruttenberg,
& Terdy, 1982; Rivers, 1968). Learners at the beginning levels have
limited experiences with the spoken language. As a result, they are not
successful in predicting the pronunciation of words they encounter in
printed texts. Reading aloud expands learners’ auditory experiences
with the target language by exposing them to words that they would
not ordinarily hear in spoken form.
Whereas there is evidence that oral reading may slow the reading
speed of advanced ESL learners and inhibit their comprehension
(Smith, 1971), the awareness that oral reading helps develop decoding
strategies has led some to argue for the inclusion of some oral reading
even in advanced-level classes. Carrell (1987) notes that learners must
have a good command of the so-called bottom-up (text-based) strate-
gies as well as top-down (knowledge-based) strategies in order to inter-
act effectively with texts.
Some of the strongest support for reading aloud comes from in-
service teachers themselves. Many ESL teachers regularly have stu-
dents in their classes read aloud and point to the benefits of this
practice. For example, in a Washington state survey of ESL teachers
of adults’ reading instruction practices (Griffin, 1992), 80%
(72 out of 90) of the teachers who responded indicated that they have
their students read aloud in class on a regular basis. Most feel that
ESL students at all levels benefit from reading aloud although they
emphasize that reading aloud is always a voluntary activity for individu-
als in their classes. Instructors in the survey gave high ratings to the
following benefits to students for reading aloud: expansion of oral
vocabulary, developing awareness of the sounds of the language, facili-
tation of chunking of words in meaningful groups, and development
of self-confidence.1
Reading aloud benefits teachers by providing them with the opportu-
nity to evaluate learners’ reading skills. Goodman (1970, 1972) and
others have encouraged practitioners to employ qualitative rather than
quantitative criteria in analyzing reading errors. The ESL teachers in
the Washington survey found value in the following kinds of informa-
tion gained from having learners read aloud: diagnosing knowledge
of sound-symbol relationships and knowledge of syntactic structure,
determining learners’ overall comprehension, and understanding stu-
dents’ cognitive processing of written information.
There is considerable controversy among practitioners over appro-
priate techniques for correcting learners’ errors when they read aloud.
Nearly 60% of the adult ESL teachers responding to the Washington
survey indicated that they allow students to complete a passage uninter-
rupted before noting errors. Another 36% are willing to interrupt
learners to correct their errors. More than half (52%) of the teachers
encourage students to self-correct when they are reading aloud. A
quarter of the respondents allow other students to correct reading
errors.
1 The mail survey sent to all adult ESL programs in June 1992 consisted of 12 questions, 10
of which required selection of multiple responses, and 2 of which were open-ended. Teachers
who answered yes to the question, Do you regularly have your students read aloud in
ESL classes?, were asked to check items they agreed with from lists of possible benefits and
procedures for reading aloud. Where appropriate, they ranked responses in priority order.
Teachers who answered no to the reading aloud question were invited to list reasons why.
The last survey question invited general comments from respondents on the topic of reading
aloud.
In addition to providing insight into learners’ reading skills, some
instructors in the survey felt that reading aloud helps keep all students
involved in the class. Some teachers also felt that reading aloud helps
to build students’ self-confidence. One teacher, for example, pointed
out that students who started as nonliterate learners 2 years earlier
expressed feelings of satisfaction and accomplishment in being able to
read aloud.
The introduction of Read Right! (Tadlock, 1986, 1991), a training
method for ESL and literacy tutors in Washington, has caused some
debate about appropriate materials for reading aloud in ESL classes.
The Read Right! method initially promoted children’s books as appro-
priate material for reading aloud because they are usually written in
simple, repetitive language. Some ESL teachers of adults have pointed
out that the content of children’s books often assumes a knowledge of
the culture in which the texts originate. Moreover, the subject matter
in children’s texts is not relevant to many adult learners. Most teachers
agree that beginning-level ESL learners, in particular, need to read
materials which are immediately relevant to their lives and which rein-
force topics introduced in listening and speaking practice.
Materials most frequently chosen for reading aloud practice by ESL
teachers who responded to the survey are published ESL texts. Nearly
65% of the respondents also use teacher-generated materials for read-
ing aloud practice, whereas student-generated texts and newspaper
articles are used by nearly half of those who completed the survey.
Some teachers in the survey who did not have learners read aloud
in their classes objected to this activity because of student discomfort
and nervousness. This objection can be alleviated by using reading
aloud only on a voluntary basis. Several reported that their classes
focus on conversational skills rather than literacy. Others cited large
classes and time constraints as reasons for not including reading aloud
in their ESL classes. Most of the teachers who cited theoretical reasons
for not including reading aloud in their instructional repertoire teach
advanced-level reading classes. They felt that reading aloud interferes
with students’ reading speed and comprehension.
The Washington survey results indicate that reading aloud is widely
used in TESOL classes. The benefits cited by teachers responding to
the survey suggest that this practice deserves serious consideration,
particularly by teachers of ESL students at beginning reading levels.
THE AUTHOR
Suzanne M. Griffin, State Director of Adult Education in Washington, chaired the
1991 TESOL Convention. She has been involved in the TESOL profession since
1968 as a teacher, textbook author, video producer, researcher, program adminis-
trator, and policy maker.
REFERENCES
Allington, R. (1984). Oral reading. In P. D. Pearson (Ed.), Handbook of reading
research. New York: Longman.
Bright, J. P., Koch, K., Ruttenberg, A., & Terdy, D. (1982). An ESL literacy resource
guide. Arlington Heights, IL: Northwest Educational Cooperative.
Carrell, P. L. (1987). A view of written text as communicative interaction:
Implications for reading in a second language. In J. Devine, P. L. Carrell, &
D. E. Eskey (Eds.), Research in reading in English as a second language. Washington,
DC: TESOL.
Goodman, Y. M. (1970). Using children’s reading miscues for new teaching
strategies. Reading Teacher, 23, 455–459.
Goodman, Y. M. (1972). Reading diagnosis—Qualitative or quantitative? Reading
Teacher, 26, 27–32.
Griffin, S. M. (1992). [Survey of ESL reading instruction practices in Washington
State]. Unpublished raw data.
Rivers, W. M. (1968). Teaching foreign language skills. Chicago: University of Chicago
Press.
Smith, F. (1971). Understanding reading: A psycholinguistic analysis of reading and
learning to read. New York: Holt, Rinehart and Winston.
Tadlock, D. (1986). A reading program based on psycholinguistics and Piaget: An
implementation manual. Seattle, WA: Adult Basic Skills and Literacy Educators
Network.
Tadlock, D. (1991). Read right! A videotape and trainer’s guide. Seattle, WA: Adult
Basic Skills and Literacy Educators Network.
Another Educator Comments . . .
PATRICIA L. ROUNDS
University of Oregon
Not long ago, as I traveled around Sri Lanka as an English for special
purposes (ESP) consultant, again and again the teachers asked me what
I thought about having students read aloud. Apparently a previous
teacher trainer had disparaged it, and they wanted to know my opinion.
I also condemned it. I gave them my reasons. But they were not all
satisfied. How could teachers help students improve their reading skills
if they didn’t have them read aloud? Wasn’t it important that students
know the correct pronunciation of the words they were reading? How
could the students read words they couldn’t pronounce? I countered
that this was certainly not true for proper names. I gave them an
example: I knew from my written schedule that I would visit Sri Jaya-
wardenapura University. I knew what this combination of symbols
meant, but I didn’t know how to pronounce it correctly. The teachers
countered: Didn’t they need to monitor their students’ pronunciation?
I replied that the taxi driver who took me to the opposite side of town
because he misunderstood my poor Sinhala pronunciation became an
excellent teacher.
But what remains in my mind from that experience is the depth of
feeling these teachers expressed in countering my arguments. For
them, reading aloud was a technique held in high esteem. I had an
emotional response to it as well. I recalled reading aloud from my
Catholic school days in New York City. We had large classes—never
fewer than 50 children in a classroom—and I remember how horribly
boring it was to listen to each of my classmates reading about Dick,
Jane, and Spot. I could and would have finished the whole book in one
of those sessions if I hadn’t feared being called on and not knowing
the place.
I do not mean to imply that reading aloud is never a useful activity.
Frodesen (1991) suggests having students read their papers aloud and
listen for errors since some students can hear their errors more easily
than they can see them. Reading scripts aloud can be useful in the
teaching of pronunciation because students are freed from the immedi-
ate pressure of simultaneously making meaning and expressing them-
selves. In no way do I discount the importance of reading stories,
poetry, or other literature aloud to children and adults. These activities
are beneficial in developing the child’s interest in and knowledge of
text and are plainly enjoyable for people of all ages.
What I am concerned about is reading aloud as a way of teaching
students to read. Reading aloud is certainly the most public of the
activities we associate with reading. But what role does it really have in
the reading process? In L1 reading research there are two research
groups, each emphasizing different aspects of the reading process
and each having different implications for teaching. One group, the
“bottom-up” group, supports instructional programs that emphasize
the role of lower level linguistic skills such as phonemic awareness in
the reading process. First language teachers in this camp teach learners
how to convert letter-to-sound correspondences into words and then
convert the meaning of strings of words into text. Students are never
asked to read words for which they have not learned the sounds. In
this approach, reading aloud would be appropriate classroom practice
because teachers need to closely monitor student progress in learning
the phoneme-grapheme correspondences.
The “top-down” group focuses on the extraction of meaning and the
role of context, inferential skills, and higher level linguistic processes in
learning to read. They emphasize activities that ask students to use the
information they read; they propose personalizing what the readers
are reading by relying on what they already know to help understand
new text. They believe that most children learn phonic skills without
direct instruction. Reading aloud has no role in this approach.
Armed with this understanding, modern second language teaching
literature recommends an interactive approach that recognizes the
importance of word recognition as well as higher level comprehension
skills. However, providing for the ability to control and monitor the
introduction of letter-sound correspondences is not usually part of
ESL/EFL materials design. Furthermore, as Hawkins (1991) points out,
the phonics approach is based on a presupposition that is faulty for
nonnative speakers, namely, that the learner already controls the
sounds of the language. But, if students’ pronunciation is incorrect, is
their understanding of the words also incorrect? Serpell (1968, as cited
in Hawkins, 1991) found that such misinterpretations slow reading
rate temporarily, but the context of the word is generally useful in
disambiguating meaning. For example, even if readers cannot hear or
pronounce the difference between the phonemes /I/ and /i/, they will
choose the correct word to complete the sentence, “The (ship/sheep)
docked at the harbor after a long and arduous cross-Atlantic journey,”
(Hawkins, 1991). Hence, nativelike phonological decoding is not neces-
sary for extracting accurate meaning of words in context.
It is also clear that by emphasizing linear decoding, reading aloud
implies that reading consists of knowing each word and adding up the
words to make meaning, a view prevalent in the grammar-translation
approach. However, research (Hart, 1983; Holdaway, 1985; Smith,
1983) indicates that reading cannot be taught as a process which pro-
ceeds word by word because the human brain is limited in the amount
of information it can absorb, process, and commit to memory at one
time. A building block approach to a text will lead to the meaning of
one word being forgotten before the next word can be sounded out;
no meaningful message will emerge. Haverson (1991) suggests that this
approach is particularly unsuited for adult language-minority readers
since it does not capitalize on their maturity and adult experience.
Implying that each word is equally important is also a problematic
approach for students who must read large amounts of information in
their second language. As teachers armed with an understanding of
top-down approaches, we go to great lengths to help students realize
that they can understand the overall message of a text without getting
slowed down by a word they don’t know. We feel justified in encourag-
ing the development of such strategies by research such as that of
Hosenfeld (1977), who found that successful readers skipped words
that weren’t important to the total meaning of a passage. Further, we
teach students to read selectively, focusing on more pertinent parts of
a text. In a reading aloud performance, it would not be acceptable to
skip a word, a sentence, or a paragraph; all are treated equally.
In the end, I firmly believe that all we learn from a reading aloud
performance is how well students can transform the printed symbol
into sounds approaching some native-speaker standard. We cannot
assume they understand what they say because of capacity constraints
and because second language students are often in the situation of
sounding out words they don’t know that are presented in syntax they
don’t control. When making pedagogical decisions, we must always
keep our students’ needs in mind. Recently there has been a growing
need to access the international academic, business, and cultural com-
munities through reading English language textbooks, journals, and
magazines. Often this is the prime motivation for learning the lan-
guage. Rather than asking students to trod lockstep through a reading
passage, precious class time might be better spent on activities which
promote comprehension of subject matter content and using the ideas
in the text.
THE AUTHOR
Patricia L. Rounds teaches and does research and teacher training at the University
of Oregon and occasionally teaches and consults elsewhere in the United States
and in other countries.
REFERENCES
Frodesen, J. (1991). Grammar in writing. In M. Celce-Murcia (Ed.), Teaching
English as a second or foreign language (pp. 264–276). New York: Newbury House.
Hart, L. (1983). Human brain and human learning. New York: Longman.
Haverson, W. (1991). Adult literacy training. In M. Celce-Murcia (Ed.), Teaching
English as a second or foreign language (pp. 185–194). New York: Newbury House.
Hawkins, B. (1991). Teaching children to read in a second language. In M. Celce-Murcia
(Ed.), Teaching English as a second or foreign language (pp. 169–184). New
York: Newbury House.
Holdaway, D. (1985). Stability and change in literacy learning. Exeter, NH:
Heinemann.
Hosenfeld, C. (1977). A preliminary investigation of the reading strategies of
successful and nonsuccessful second language learners. System, 5, 110–123.
Smith, F. (1983). Essays into literacy. Exeter, NH: Heinemann.