The Pygmalion Effect and Its Mediating Mechanisms - Rosenthal 2002
2
The Pygmalion Effect and Its
Mediating Mechanisms
ROBERT ROSENTHAL
Department of Psychology, University of California, Riverside
This chapter is based in part on an invited address given to the Teachers of Psychology in the
Secondary Schools (TOPSS) and subsequently published in Psychology Teacher Network (PTN), 1998,
Vol. 8, pp. 2-4, 9. It is an updated version of papers cited in the references of the PTN paper.
Correspondence concerning this chapter should be addressed to Robert Rosenthal, Department
of Psychology, University of California, Riverside, CA 92521-0426.
Improving Academic Achievement
Copyright 2002, Elsevier Science (USA). All rights reserved.
Human Subjects
In the first of our studies employing human subjects, 10 students of psychology,
both undergraduate and graduate, served as the experimenters (Rosenthal &
Fode, 1963b). All were enrolled in an advanced course in experimental psychology
and were already involved in conducting research. Each student-experimenter
was assigned as his or her research participants about 20
students of introductory psychology. The procedure was for the experimenters
to show a series of 10 photographs of people's faces to each of their participants
individually. Participants were to rate the degree of success or failure
shown in the face of each person pictured in the photos. Each face could be
rated at any value from -10 to +10, with -10 meaning extreme failure and +10
meaning extreme success. The 10 photos had been selected so that, on the
average, they would be seen as neither successful nor unsuccessful, but quite
neutral, with an average numerical score of zero.
All 10 experimenters were given identical instructions on how to administer
the task to their participants and were given identical instructions to read
to them. They were cautioned not to deviate from these instructions. The
purpose of their participation, it was explained to all experimenters, was to
see how well they could duplicate experimental results that were already well-established.
Half the experimenters were told that the "well-established"
finding was such that their participants should rate the photos as of successful
people (ratings of +5) and half the experimenters were told that their participants
should rate the photos as being of unsuccessful people (ratings of -5).
Results showed that experimenters expecting higher photo ratings obtained
higher photo ratings than did experimenters expecting lower photo ratings.
Subsequent studies tended to obtain generally similar results (Rosenthal, 1969;
Rosenthal & Rubin, 1978).
Animal Subjects
Pfungst's work with Clever Hans and Pavlov's work on the inheritance
of acquired characteristics had both suggested the possibility of experimenter
expectancy effects with animal subjects (Gruenberg, 1929; Pfungst, 1965).
In addition, Bertrand Russell (1927) had noted this possibility, adding that
animal subjects take on the national character of the experimenter. As he
put it: "Animals studied by Americans rush about frantically, with an incredible
display of hustle and pep, and at last achieve the desired result by chance.
Animals observed by Germans sit still and think, and at last evolve the solution
out of their inner consciousness" (pp. 29-30).
But it was not only the work of Pavlov, Pfungst, and Russell that made us test
the generality of experimenter expectancy effects by working with animal
subjects. It was also the reaction of my friends and colleagues who themselves
worked with animal subjects. That reaction was: "Well of course you'd find
expectancy effects and other artifacts when you work with humans; that's why
we work with rats."
A good beginning might have been to replicate with a larger sample size
Pfungst's research with Clever Hans; but with horses hard to come by, rats were
made to do (Rosenthal & Fode, 1963a).
A class in experimental psychology had been performing experiments with
human participants for most of a semester. Now they were asked to perform
one more experiment, the last in the course and the first employing animal
subjects. The experimenters were told of studies that had shown that maze-brightness
and maze-dullness could be developed in strains of rats by successive
inbreeding of the well and the poorly performing maze runners. Sixty
laboratory rats were equitably divided among the 12 experimenters. Half the
experimenters were told that their rats were maze-bright and the other half
were told their rats were maze-dull. The animal's task was to learn to run to the
darker of two arms of an elevated T-maze. The two arms of the maze, one white
and one gray, were interchangeable; and the "correct" or rewarded arm was
equally often on the right as on the left. Whenever animals ran to the correct
side they obtained a food reward. Each rat was given 10 trials each day for 5
days to learn that the darker side of the maze was the one that led to the food.
Beginning with the first day and continuing on through the experiment,
animals believed to be better performers became better performers. Animals
believed to be bright showed a daily improvement in their performance, while
those believed to be dull improved only to the third day and then showed a
worsening of performance. Sometimes an animal refused to budge from the
starting position. This happened 11% of the time among the allegedly bright
rats; but among the allegedly dull rats it happened 29% of the time. When
animals did respond and correctly so, those believed to be brighter ran faster
to the rewarded side of the maze than did even the correctly responding rats
believed to be dull.
When the experiment was complete, all experimenters rated their rats and
their own attitudes and behavior vis-a-vis their animals. Those experimenters
who had been led to expect better performance viewed their animals as brighter,
more pleasant, and more likable. These same experimenters felt more
relaxed in their contacts with the animals and described their behavior toward
them as more pleasant, friendly, enthusiastic, and less talkative. They also
stated that they handled their rats more and also more gently than did the
experimenters expecting poor performance.
The next experiment with animal subjects also employed rats, this time
using not mazes but Skinner boxes (Rosenthal & Lawson, 1964). Because
the experimenters (39) outnumbered the subjects (14), experimenters worked
in teams of two or three. Once again about half the experimenters were
led to believe that their subjects had been specially bred for excellence of
performance. The experimenters who had been assigned the remaining rats
were led to believe that their animals were genetically inferior.
The learning required of the animals in this experiment was more complex
than that required in the maze learning study. This time the rats had to learn in
sequence and over a period of a full academic quarter the following behaviors:
to run to the food dispenser whenever a clicking sound occurred; to press a bar
for a food reward; to learn that the feeder could be turned off and that
sometimes it did not pay to press the bar; to learn new responses with only
the clicking sound as a reinforcer (rather than the food); to bar-press only in the
presence of a light and not in the absence of the light; and, finally, to pull on a
loop that was followed by a light that informed the animal that a bar-press
would be followed by a bit of food.
At the end of the experiment the performance of the animals believed to be
superior was superior to that of the animals believed to be inferior, and the
difference in learning favored the allegedly brighter rats in all five of the laboratory
sections in which the experiment was conducted.
If rats became brighter when expected to, then it would not be farfetched to
think that children could become brighter when expected to by their teachers.
Indeed, Kenneth Clark (1963) had for years been saying that teachers' expectations
could be very important determinants of intellectual performance.
Clark's ideas and our research should have sent us right into the schools to
study teacher expectations; but that is not what happened.
What did happen was that after our laboratory had completed about a dozen
studies of experimenter expectancy effects (we no longer used the term unconscious
experimenter bias), I summarized our results in an article for the American
Scientist (Rosenthal, 1963). I concluded this article by wondering whether the
same interpersonal expectancy effects found in psychological experimenters
might not also be found in physicians, psychotherapists, employers, and
teachers: "When the master teacher tells his apprentice that a pupil appears
to be a slow learner, is this prophecy then self-fulfilled?" (p. 280).
Among the reprint requests for this paper there was one from Lenore
F. Jacobson, the principal of an elementary school in South San Francisco,
California. I also sent her a stack of unpublished papers and thought no more
about it. Soon after, Lenore Jacobson wrote me a letter telling of her interest in
the problem of teacher expectations. She ended her letter with the following
line: "If you ever 'graduate' to classroom children, please let me know whether I
can be of assistance." I gratefully accepted Lenore's offer of assistance and
asked whether she would consider collaborating on a project to investigate
teacher expectancy effects. A tentative experimental design was suggested in
this letter as well.
Lenore replied, mainly to discuss concerns over the ethical and organizational
implications of creating false expectations for superior performance in
teachers. If this problem could be solved, her school would be ideal, she felt,
with children from primarily lower-class backgrounds. Lenore also suggested
gently that I was "a bit naive" to think one could just tell teachers to expect
some of their pupils to be "diamonds in the rough." We would have to
administer some new test to the children, a test the teachers would not know.
Phone calls and letters followed, and in January of 1964, a trip to South San
Francisco to settle on a final design and to meet with the school district's
administrators to obtain their approval. This approval was forthcoming because
of the leadership of the school superintendent, Dr. Paul Nielsen. Approval
for this research had already been obtained from Robert L. Hall, Program
Director for Sociology and Social Psychology for the National Science Foundation,
which had been supporting much of the early work on experimenter
expectancy effects.
An Unexpected Finding
At the time the Pygmalion experiment was conducted there was already
considerable evidence that interpersonal self-fulfilling prophecies could
occur, at least in laboratory settings. It should not then have come as such a
great surprise that teachers' expectations might affect pupils' intellectual
development. For those well-acquainted with the prior research, the surprise
value was, in fact, not all so great. There was, however, a surprise in the
Pygmalion research. For this surprise there was no great prior probability, at
least not in terms of many formal research studies.
At the end of the school year of the Pygmalion study, all teachers were asked
to describe the classroom behavior of their pupils. Those children in whom
intellectual growth was expected were described as having a better chance of
becoming successful in the future, as more interesting, curious, and happy.
There was a tendency, too, for these children to be seen as more appealing,
adjusted, and affectionate, as less in need of social approval. In short, the
children in whom intellectual growth was expected became more intellectually
alive and autonomous, or at least were so perceived by their teachers.
But we already know that the children of the experimental group gained
more intellectually, so that perhaps it was the fact of such gaining that accounted
for the more favorable ratings of these children's behavior and aptitude.
But a great many of the control group children also gained in IQ during the
course of the year. We might expect that those who gained more intellectually
among these undesignated children would also be rated more favorably by
their teachers. Such was not the case. The more the control group children
gained in IQ the more they were regarded as less well-adjusted, as less interesting,
and as less affectionate.
From these results it would seem that when children who are expected to
grow intellectually do so, they are benefited in other ways as well. When
children who are not specifically expected to develop intellectually do so,
they seem either to show accompanying undesirable behavior or at least are
perceived by their teachers as showing such undesirable behavior. If children
are to show intellectual gain, it seems to be better for their real or perceived
intellectual vitality, and for their real or perceived mental health, if their teacher
has been expecting them to grow intellectually. It appears worthwhile to
investigate further the proposition that there may be hazards to unpredicted
intellectual growth (Rosenthal, 1974).
Reactions to Pygmalion
Reactions to Pygmalion were extreme. Many were very favorable, many were
very unfavorable. Elsewhere we have described the best known of these
negative criticisms in considerable detail and explained
why they were not compelling (Rosenthal, 1985, 1987, 1995; Rosenthal &
Rubin, 1971, 1978). For our present purposes, a very
brief overview of these criticisms and our responses to them will be enough.
1. The analyses of the data were criticized on the grounds that the analyses
should have focused on classrooms as a whole rather than on individual
children. In fact, we had analyzed the data both ways, that is, by
classrooms and by children, a fact made clear in our report, and both
ways gave essentially the same results.
2. A second criticism claimed that because the same IQ test had been
employed for both the pretest and the posttest, the study might suffer
from practice effects. It puzzles us how practice effects could bias the
results of a randomized experiment. If practice effects were so great as to
drive everyone's performance up to the limit, or ceiling, of the test, then
practice effects could operate to diminish the effects of the experimental
manipulation but they could not operate to increase those effects.
3. A third criticism was that teachers themselves administered the group
tests of IQ. As it turned out, however, when children were retested by
testers who were blind to the experimental conditions, indeed to the
existence of experimental conditions, the effects of teacher expectations
actually increased rather than decreased.
4. A fourth criticism was that we had employed group tests of IQ, tests that
were less reliable than individually administered tests of IQ. This criticism
suggested that the teacher expectancy effect might be due to this
greater unreliability of the test instrument. As it turns out, however,
lower reliability of a test instrument makes it harder, not easier, to obtain
statistically significant results.
5. A fifth criticism was that the IQ of the youngest children had been
measured with low validity. Actually, the validity of the measures of even
these youngest children (r = 0.65) was substantially higher than the
validity of many IQ tests, and much higher than the validity of psychological
tests in general (Cohen, 1988). Incidentally, even if we set
aside the results from these youngest children, ample evidence remains
for the operation of teacher expectancy effects.
6. A sixth criticism suggested that the data should have been transformed
mathematically before being analyzed. Critics then transformed the
original data of Pygmalion in eight different ways. Some of these transformations
were seriously biased (e.g., discarding data showing greater
teacher expectancy effects). Despite this, however, none of the transformations
gave results noticeably different from those reported in the
Pygmalion experiment. For total IQ, every transformation gave a significant
result when one had been reported in the Pygmalion experiment.
When verbal IQ and reasoning IQ were considered separately, the various
transformations yielded more significant teacher expectancy effects than
had been reported in Pygmalion.
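The reply to the fourth criticism above, that lower reliability makes significant results harder rather than easier to obtain, can be illustrated with the classical attenuation formula from test theory. This formula is not part of the original report and is offered only as a sketch: the correlation one can expect to observe is the true correlation multiplied by the square root of the product of the two measures' reliabilities. The function name and all numerical values below are hypothetical, chosen purely for illustration, and are not taken from the Pygmalion data.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y=1.0):
    """Expected observed correlation under the classical attenuation formula.

    observed r = true r * sqrt(reliability_x * reliability_y)
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie in [0, 1]")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical illustration (not Pygmalion data): a fixed true effect
# measured with outcome tests of decreasing reliability.
true_r = 0.30
for reliability in (0.9, 0.7, 0.5):
    observed = attenuated_correlation(true_r, reliability)
    print(f"reliability={reliability:.1f} -> expected observed r={observed:.3f}")
```

Under these assumptions the expected observed effect shrinks as reliability falls, so a less reliable test works against, not for, the detection of a teacher expectancy effect.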
A Heuristic Note
Before leaving the topic of negative criticisms of Pygmalion, one casual observation
should be offered for any possible heuristic value it may hold. The bulk
of the criticism of Pygmalion came neither from mathematical statisticians, nor
from experimental social psychologists, nor from educators (though the president
of a large teachers' union attacked Pygmalion bitterly as an affront to the
good name of the teacher or the teachers' union). The bulk of the negative
reactions came from workers in the field of educational psychology. Perhaps it
was only they who would have been interested enough to respond. That seems
unlikely, however, as many other kinds of psychologists regarded the Pygmalion
effect as of great interest. We leave the observation as just a curiosity, one
that might be clarified by future workers in the fields of the history, the
sociology, and the psychology of science.
Psychological researchers are, and should be, a skeptical lot. They demand that
claims to knowledge be based on credible empirical evidence. But that is not
enough. We demand also that phenomena claimed as knowledge be replicable.
A finding is not believed for long if it cannot be replicated by other workers in
the field.
For the research area of interpersonal expectations in general, there have
been nearly 500 replication studies, the vast majority of which have found that
what one person expects of another tends to elicit that behavior from that
other person. The average magnitude of the effect can be expressed as a correlation
of about 0.30 (a very substantial magnitude) between what has been
expected from research participants and what has been obtained from research
participants (Rosenthal, 1984, 1991a, 1991b, 1998).
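To give a feel for what a correlation of about 0.30 can mean in practice, the sketch below applies two standard conversions that are not computed in the text above and are included only as an illustration. The first is the binomial effect size display (BESD), which recasts r as the difference between two "success rates," 0.50 + r/2 and 0.50 - r/2. The second converts r to Cohen's d, assuming two groups of equal size. The function names are hypothetical.

```python
import math


def besd(r):
    """Return the two BESD 'success rates' (higher, lower) implied by r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return 0.5 + r / 2.0, 0.5 - r / 2.0


def r_to_d(r):
    """Convert r to Cohen's d, assuming two groups of equal size."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


high, low = besd(0.30)
print(f"BESD rates: {high:.2f} vs {low:.2f}")   # BESD rates: 0.65 vs 0.35
print(f"Cohen's d: {r_to_d(0.30):.2f}")         # Cohen's d: 0.63
```

Read this way, r = 0.30 corresponds to a display in which 65% of one group and 35% of the other fall above the overall median outcome. The BESD is a way of presenting the effect, not a claim about the actual rates in any particular study.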
Summaries of replications are also available for just the effects of interpersonal
expectations on pupils' IQ performance (Raudenbush, 1984, 1994; Smith,
1980). Both Raudenbush and Smith found significant overall effects of interpersonal
expectations on students' IQ. Raudenbush's analysis (1994) was
designed to investigate the relationship between the credibility of the expectancy
induction and the magnitude of the teacher expectancy effect on pupil IQ.
He reasoned that inductions of expectations in teachers would be credible only
to the extent that teachers did not already know the children and, thus, had not
already established expectations on the basis of their direct experience. Although
the effects of teacher expectations were significant for his full set of 19
studies, he found dramatic differences in effect sizes as a function of how long
teachers had known pupils before the induction of the expectation. Of the
studies in which teachers knew pupils only 2 weeks or less, 91% showed results
in the predicted direction, compared with only 12% of the studies in which
teachers knew pupils longer than 2 weeks. This was very strong evidence that
simply telling teachers that pupils will do well is not very effective if there are
strong prior bases for teachers having formed their own expectations.
It may be of interest to note the typical magnitude of the effect on IQ
of experimentally induced favorable teacher expectations. For the studies
References
Clark, K. B. (1963). Educational stimulation of racially disadvantaged children. In A. H. Passow (Ed.),
Education in depressed areas. New York: Bureau of Publications, Teachers College, Columbia
University.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum.
Gruenberg, B. C. (1929). The story of evolution. Princeton, NJ: Van Nostrand.
Harris, M. J., & Rosenthal, R. (1985). The mediation of interpersonal expectancy effects: 31 meta-
analyses. Psychological Bulletin, 97, 363-386.
Harris, M. J., & Rosenthal, R. (1986). Four factors in the mediation of teacher expectancy effects. In
R. S. Feldman (Ed.), The social psychology of education (pp. 91-114). New York: Cambridge University
Press.
Merton, R. K. (1948). The self-fulfilling prophecy. Antioch Review, 8, 193-210.
Merton, R. K. (1987). Three fragments from a sociologist's notebooks: Establishing the phenomenon,
specified ignorance, and strategic research materials. Annual Review of Sociology, 13, 1-28.
Pfungst, O. (1965). Clever Hans (C. L. Rahn, Trans.). New York: Holt, Rinehart & Winston. (Original
work published 1911)
Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the
credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of
Educational Psychology, 76, 85-97.
Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.), The handbook
of research synthesis. New York: Russell Sage Foundation.
Rosenthal, R. (1963). On the social psychology of the psychological experiment: The experimenter's
hypothesis as unintended determinant of experimental results. American Scientist, 51,
268-283.
Rosenthal, R. (1969). Interpersonal expectations. In R. Rosenthal & R. L. Rosnow (Eds.), Artifact in
behavioral research (pp. 181-277). New York: Academic Press.
Rosenthal, R. (1973). The mediation of Pygmalion effects: A four factor "theory." Papua New Guinea
Journal of Education, 9, 1-12.
Rosenthal, R. (1974). On the social psychology of the self-fulfilling prophecy: Further evidence for Pygmalion
effects and their mediating mechanisms (Module 53, pp. 1-28). New York: MSS Modular Pub.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Newbury Park, CA: Sage.
Rosenthal, R. (1985). From unconscious experimenter bias to teacher expectancy effects. In J. G.
Dusek, V. C. Hall, & W. J. Meyer (Eds.), Teacher expectancies (pp. 37-65). Hillsdale, NJ: Lawrence
Erlbaum.
Rosenthal, R. (1987). Pygmalion effects: Existence, magnitude, and social importance. Educational
Researcher, 16, 37-41.
Rosenthal, R. (1991a). Meta-analytic procedures for social research (rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1991b). Teacher expectancy effects: A brief update 25 years after the Pygmalion
experiment. Journal of Research in Education, 1, 3-12.
Rosenthal, R. (1995). Critiquing Pygmalion: A 25-year perspective. Current Directions in Psychological
Science, 4, 171-172.
Rosenthal, R. (1998). Interpersonal expectancy effects: A forty year perspective. Psychology Teacher
Network, 8, 2-4, 9.
Rosenthal, R., & Fode, K. L. (1963a). The effect of experimenter bias on the performance of the
albino rat. Behavioral Science, 8, 183-189.
Rosenthal, R., & Fode, K. L. (1963b). Three experiments in experimenter bias. Psychological Reports,
12, 491-511.
Rosenthal, R., & Jacobson, L. (1966). Teachers' expectancies: Determinants of pupils' IQ gains.
Psychological Reports, 19, 115-118.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart & Winston.
Rosenthal, R., & Jacobson, L. (1992). Pygmalion in the classroom (expanded ed.). New York: Irvington.
Rosenthal, R., & Lawson, R. (1964). A longitudinal study of the effects of experimenter bias on the
operant learning of laboratory rats. Journal of Psychiatric Research, 2, 61-72.
Rosenthal, R., & Rubin, D. B. (1971). Pygmalion reaffirmed. In J. D. Elashoff & R. E. Snow (Eds.),
Pygmalion reconsidered (pp. 139-155). Worthington, OH: C. A. Jones.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. The
Behavioral and Brain Sciences, 3, 377-386.
Russell, B. (1927). Philosophy. New York: Norton.
Smith, M. L. (1980). Teacher expectations. Evaluation in Education, 4, 53-55.