
The Flynn Effect on the RAPM for High Ability Groups

The generational increase in raw IQ test scores reported by Flynn has since prompted many studies, which
attribute the phenomenon to several causes such as improvements in nutrition, greater access to education,
test familiarity and exposure to modern living. Flynn (1987) argued that the score increases over time reflect
problems in how the tests measure intelligence. Many studies have been carried out on the Flynn Effect (FE)
for scores around the mean, but there has been a dearth of studies on the higher ranges of IQ. Two of the few
studies of the FE at the higher end of the scale are that of Herman H. Spitz (1989) on the Wechsler scales and,
more recently, that of Colom et al. (2005) using data from Pressey’s Graphic Test. Spitz demonstrated that the
score discrepancy between the WAIS and the WAIS-R (which were normed 25 years apart) showed an increase in
scores across the IQ scale, decreasing towards the left and right of the mean, with almost no increase at IQ 125
and above. Colom et al., on the other hand, used Pressey’s Graphic Test to demonstrate a difference in the
distribution of scores (between 1979 and 1999), with the mean shifting right but almost zero gain at the 99th
percentile. Taken together, these two studies seem to indicate that massive gains in raw scores are conditional
upon the level of IQ.

Studies Carried Out with the Raven’s Advanced Progressive Matrices

Several studies have been carried out on university samples with the Raven’s Advanced Progressive Matrices
(RAPM) for various research purposes, and I believe these offer some insight: if the Flynn Effect has indeed
affected the RAPM, the raw score gains should be quite obvious over a span of more than 35 years.
The RAPM is typically administered to the top 20% of the general population, usually within university
settings, which makes the test ideal for observing any gains in test scores in a high ability group. Table 1
below lists results from the 40-minute timed version of the RAPM (Set II, 36 items) obtained over a span of
more than 35 years on various university populations. The study carried out by Paul in 1985 is based on the
untimed version, but it is included in Table 1 for comparison.

Table 1 – RAPM Raw Score for University Samples

(M / F = male / female; a single value means only one figure is given for that study.)

No.  Reference              Year  University or Location          N (M / F)     Mean Raw Score (M / F)  SD (M / F)   Age
1    Yates & Forbes         1965  U. Western Australia            565 / 180     23.67 / 22.75           5.29 / 5.57
2    Francis Van Dam        1970  University of Louvain           288           22.25                                1st year students
3    Kanekar                1977  USA                             71 / 101      26.08 / 25.97           5.04 / 4.96
4    Paul, Steven M.        1985  UC Berkeley                     110 / 190     28.40 / 26.23           6.44 / 6.15
5    Pitariu                1986  Romania                         785 / 531     21.89 / 21.28           5.02 / 4.63
6    Jensen & Saccuzzo      1988  San Diego State U               261           21.69                   5.90         1st year students
7    Stough & Nettelbeck    1992  U. Adelaide                     136 / 311     24.40                   4.60         1st year students
8    Bors & Stokes          1998  U. Toronto Scarborough          180 / 326     23.00 / 21.68           5.6          17–30
9    Colom & Garcia-Lopez   2002  Spain                           303 / 301     23.90 / 22.40           4.80 / 5.30  18 (ave)
10   Lynn & Irwing          2004  USA                             1,807 / 415   25.78 / 24.22           4.80 / 5.30
11   Abad, Colom, et al     2004  Universidad Autónoma de Madrid  1,069 / 901   24.19 / 22.73           5.37 / 5.47  17–30
12   Colom, Escorial et al  2004  Spain                           120 / 119     24.57 / 23.32           4.13 / 4.52  18–24
13   Day & Arthur           2004  Texas A&M                       176           24.16                   6.43         19.6

Note: Raven’s Advanced Progressive Matrices is essentially the 1962 version in terms of test items (APM; 1962 revision). Subsequently the Raven’s
APM scores were converted to percentiles using 1993 smoothed detailed U.S. norms (J. Raven, Raven, & Court, 1998), for the 1998 version. Scores up
to 2004 are still listed even though Teasdale & Owen (2005) have shown that the FE had peaked in 1998.

The age data for the samples listed in the table above are incomplete. However, since the samples are primarily
from university populations, and many of the studies were carried out on 1st year students, it is reasonable to
assume that depression of scores due to age effects can mostly be ruled out. A cursory look at the table reveals
no appreciable increase in raw scores over a period of almost 4 decades. If we take the weighted mean of the
male and female raw scores for each study and plot it against the year in which the study was carried out, we
get the scatter plot shown in the graph below:

Note: S. M. Paul’s data are excluded because that study used the untimed version of the RAPM.
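
The weighted means behind the scatter plot can be recomputed from Table 1 with a short script. This is only a sketch under stated assumptions: the figures are my transcription of Table 1, studies that give a single mean are entered as one group, and the unweighted least-squares line is my own choice of trend-line, not necessarily the one used for the graph.

```python
# Weighted mean raw score per study, plus a plain least-squares trend over time.
# Figures are transcribed from Table 1; Paul (1985, untimed) is excluded.
# Each study lists (n, mean) pairs: male and female groups where both are
# reported, otherwise the single combined figure.
STUDIES = [
    (1965, [(565, 23.67), (180, 22.75)]),    # Yates & Forbes
    (1970, [(288, 22.25)]),                  # Van Dam
    (1977, [(71, 26.08), (101, 25.97)]),     # Kanekar
    (1986, [(785, 21.89), (531, 21.28)]),    # Pitariu
    (1988, [(261, 21.69)]),                  # Jensen & Saccuzzo
    (1992, [(447, 24.40)]),                  # Stough & Nettelbeck (136 + 311)
    (1998, [(180, 23.00), (326, 21.68)]),    # Bors & Stokes
    (2002, [(303, 23.90), (301, 22.40)]),    # Colom & Garcia-Lopez
    (2004, [(1807, 25.78), (415, 24.22)]),   # Lynn & Irwing
    (2004, [(1069, 24.19), (901, 22.73)]),   # Abad, Colom, et al
    (2004, [(120, 24.57), (119, 23.32)]),    # Colom, Escorial et al
    (2004, [(176, 24.16)]),                  # Day & Arthur
]


def weighted_mean(groups):
    """Mean of the group means, weighted by group size."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (each study counts once)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


points = [(year, weighted_mean(groups)) for year, groups in STUDIES]
slope = ols_slope([x for x, _ in points], [y for _, y in points])

for year, mean in points:
    print(year, round(mean, 2))
print("raw points per year:", round(slope, 3))
```

A slope close to zero corresponds to a flat trend-line; the printed per-study means are the values that would go on the vertical axis of the scatter plot.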

The raw scores over 37 years indicate that the RAPM is internally consistent and stable over time: the
trend-line is quite flat, with no evidence of any significant increase in raw scores since Yates and Forbes’
(1967) study. In fact, Stough & Nettelbeck (1993) arrived at the same conclusion by comparing the mean raw
score of 24.4 from their study at the University of Adelaide with the mean raw score of 23.17 from Yates and
Forbes’ study at the University of Western Australia in 1965.

Raw Scores for a University Sample Compared to the 1992/93 Standardization Normative Data

In 2002/03 Francis Van Dam & Raven (2008) carried out a study on a University of Louvain sample using the
RAPM. The same cohort was tested about 30 years apart, the aim being to investigate whether increasing age
lowers test scores. In the process, however, some curious findings emerged when the test scores were
compared against the 1992/93 UK/US standardized normative data.
Because the Louvain study used the 48-item version of the RAPM, we need to convert the raw data to make
comparisons with scores on the 36-item RAPM. Graph 4 in the editions of the (British) Guide to the Use of the
Advanced Progressive Matrices (Raven, J. C., 1965) identifies the items eliminated when Set II was reduced
from 48 to 36 items in 1962. Items 1-8 and 17 were eliminated because everyone got them right, item 11 was a
bad item, and items 44 and 46 were too difficult. This means that to convert a score on the 36-item test to the
48-item test, one can add 8 to scores of 1 or 2, 9 to scores from 3 to 7, 10 to scores from 8 to 33, and 12 to
scores above 33. Conversely, to convert 48-item scores to the present 36-item test, one subtracts the
corresponding amounts. The normative data from the UK standardization in 1992 (Raven J., 2000) give, at the
90th percentile, a raw score of 31 for 20 year-olds and 29 for 50 year-olds. The scores in 1962 are
re-tabulated alongside them in Table 2 below:

Table 2 – RAPM Raw Score (untimed) 1992 Standardization (Dumfries)

Age Group        |  20                        |  50
Year             |  1962  1970/71 est.  1992  |  1962 est.  1970/71 est.  1992  2002/03 est.
90th percentile  |  21    24            31    |  17         21            29    33

Notes: i) The normative data are from the UK Standardization in 1992, see tables 10 & 12 (Raven J 2000). ii) The raw score for 40 year olds is 17 in
1962 and, for simple comparison, is taken as 17 for 50 year olds. iii) The raw score for 50 year olds in 2002 is extrapolated based on the score increase
between 1962 and 1992. iv) The 1970/71 score is interpolated between the 1962 and 1992 normative scores.
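
The estimated entries in Table 2 can be checked with a straight-line estimate between the 1962 and 1992 norms. This is a minimal sketch, assuming simple linear interpolation and extrapolation and taking 1971 and 2002 as the representative years for "1970/71" and "2002/03"; the function name is mine.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# 90th percentile raw scores from Table 2 (untimed norms).
age20_1971 = linear_estimate(1962, 21, 1992, 31, 1971)  # interpolated
age50_1971 = linear_estimate(1962, 17, 1992, 29, 1971)  # interpolated
age50_2002 = linear_estimate(1962, 17, 1992, 29, 2002)  # extrapolated

print(round(age20_1971), round(age50_1971), round(age50_2002))  # 24 21 33
```

Under these assumptions the rounded values agree with the estimated columns of Table 2.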

If we look at the scores of the Louvain sample in 1970/71 and the re-test scores 30 years later in 2002/03,
the raw score for the same cohort of 99 respondents fell from 35.9 to 33.4. Converted to the 36-item test
equivalent, these become 25.9 (35.9 - 10) and 23.4 (33.4 - 10): a score of 25.9 as 20 year-olds in 1970 and a
score of 23.4 as 50 year-olds 30 years later, keeping in mind that both scores come from the same cohort.
Now look at Table 2 again. For the untimed version of the test, the normative data from the 1992
standardization indicate an increase (24 to 33) when the raw score for 20 year-olds in 1970/71 is compared
with that for 50 year-olds in 2002/03. The Louvain data, however, indicate a decline in scores between the 2
age groups, which are 30 years apart! Since the Louvain sample used the 40-minute timed version of the test
and the standardized data use the untimed version, direct score comparisons cannot be made. Significantly,
though, one sample shows a decline, whereas the standardized data show a significant increase for essentially
the same test. Van Dam & Raven (2008) make a similar comparison between the extrapolated untimed normative
data and the mean Louvain score, although raw scores from the timed and untimed versions of the test are not
really comparable.
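
The conversion between the 36-item and 48-item versions used above can be written out as a small function. This is a sketch of the rule exactly as stated in the text; the function and variable names are my own, and applying the flat 10-point offset to a group mean follows what was done for the Louvain scores rather than any published procedure.

```python
def to_48_item(score_36):
    """Convert a 36-item Set II raw score to its 48-item equivalent,
    using the band offsets given in the text."""
    if not 1 <= score_36 <= 36:
        raise ValueError("36-item raw score must lie between 1 and 36")
    if score_36 <= 2:
        return score_36 + 8
    if score_36 <= 7:
        return score_36 + 9
    if score_36 <= 33:
        return score_36 + 10
    return score_36 + 12


# The Louvain means fall in the band where the offset is 10, so the text
# subtracts 10 from each 48-item mean (an approximation for a group mean).
MID_BAND_OFFSET = 10
louvain_1970_36 = 35.9 - MID_BAND_OFFSET
louvain_2002_36 = 33.4 - MID_BAND_OFFSET

print(to_48_item(2), to_48_item(7), to_48_item(33), to_48_item(36))
print(round(louvain_1970_36, 1), round(louvain_2002_36, 1))
```

The second line of output gives the 36-item equivalents of the two Louvain means, 25.9 and 23.4, as quoted in the paragraph above.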

Let us go back to Table 1 and pick the scores closest to the period between 1962 and 1992 to make a close
comparison with the 1992 standardized data. The closest comparison is between the study carried out at the
University of Western Australia by Yates & Forbes and the study carried out 27 years later at the University of
Adelaide. The weighted average (of the male and female scores) mean raw score from Yates and Forbes
(1967) in 1965 is 23.5, which Stough & Nettelbeck (1993) estimate at IQ 125 or the 95th percentile.
The raw score in 1992 for the University of Adelaide is 24.4. Now let’s look at the data from the 1992
standardization again (see Table 3).

Table 3 – RAPM Raw Score (untimed) 1992 Standardization (Dumfries)

Age Group        |  20
Year             |  1962  1965 est.  1992
95th percentile  |  24    25         33

Notes: i) The normative data are from the UK Standardization in 1992, see tables 10 & 12 (Raven 2000). ii) The 1965 score is interpolated between the
1962 and 1992 normative data.

The normative data from the 1992 standardization are for the untimed version of the test, and the raw score
increase between 1965 and 1992 is quite large (8 raw points). However, for the Australian university samples,
which took the 40-minute timed version of the test, the raw score increase is only about 1 raw point. It seems
fair to say that the standard deviations for the timed and untimed versions of the test at the 95th percentile
cannot be that far apart, since we are looking at very similar cohorts in terms of ability. If there were indeed a
Flynn Effect, the raw score difference over 27 years should therefore be of similar magnitude for the timed and
untimed versions, but that is not what we get. The normative data from the 1992 standardization show a much
larger score increase than the raw scores at the 95th percentile of the 2 Australian university samples 27 years
apart.
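
The arithmetic behind this comparison is simple enough to lay out explicitly. The sketch below only restates figures already quoted (Table 3 and the two Australian samples); the per-year rates are my own convenience calculation.

```python
# 95th percentile, untimed norms (Table 3): 1965 estimate and 1992 value.
norm_1965, norm_1992 = 25, 33
# Timed university samples: Western Australia (1965) and Adelaide (1992).
sample_1965, sample_1992 = 23.5, 24.4

years = 1992 - 1965
norm_gain = norm_1992 - norm_1965        # gain in the normative data
sample_gain = sample_1992 - sample_1965  # gain in the university samples

print("normative gain:", norm_gain, "raw points over", years, "years")
print("sample gain:", round(sample_gain, 1), "raw points over", years, "years")
print("per year:", round(norm_gain / years, 3), "vs", round(sample_gain / years, 3))
```

This gives 8 raw points for the normative data against 0.9 for the university samples over the same 27 years.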

In sum, there appears to be no Flynn Effect on the raw scores of the timed version of the RAPM for university
population samples from 1965 through to 1992, or even up to 2004. Paradoxically, however, the 1992
standardized scores of the untimed version of the RAPM do not exhibit the same temporal stability over much
the same period, even at the 95th percentile.

References:
1. Abad, F. J., Colom, R., Rebollo, I. & Escorial, S. (2004). Sex differential item functioning in the Raven’s Advanced Progressive
Matrices: evidence for bias. Personality and Individual Differences, 36, 1459–1470.
2. Colom et al. (2005). The generational intelligence gains are caused by decreasing variance in the lower half of the
distribution: Supporting evidence for the nutrition hypothesis. Intelligence, 33, 83–91.
3. Day & Arthur (2004). Ability based pairing strategies in the team-based training of a complex skill.
4. Bors, D. A. & Stokes, T. L. (1998). Raven's Advanced Progressive Matrices: Norms for First-Year University
Students and the Development of a Short Form. Educational and Psychological Measurement, 58, 382.
5. Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
6. Irwing, P. & Lynn, R. (2005). Sex differences in means and variability on the progressive matrices in university students: A
meta-analysis. British Journal of Psychology, 96, 505–524.
7. Jensen, A. R., Saccuzzo, D. P. & Larson, G. E. (1988). Equating the Standard and Advanced Forms of the Raven
Progressive Matrices. Educational and Psychological Measurement, 48, 1091.
8. Raven, J. (2000). The Raven’s Progressive Matrices: Change and Stability over Culture and Time. Cognitive Psychology, 41,
1–48.
9. Stough, C. & Nettelbeck, T. (1993). Raven’s Advanced Progressive Matrices and increases in intelligence. Personality and
Individual Differences, 15(1), 103–104.
10. Spitz, H. H. (1989). Variations in Wechsler Interscale IQ Disparities at Different Levels of IQ. Intelligence, 13, 157–167.
11. Teasdale & Owen (2005). A long-term rise and recent decline in intelligence test performance: The Flynn Effect in reverse.
12. Van Dam, F. & Raven, J. (2008). Chapter 9: Does the “Flynn Effect” Invalidate the Interpretation Placed on Most of the Data
Previously Believed to Show a Decline in Intellectual Abilities with Age? In Uses and Abuses of Intelligence: Studies Advancing
Spearman and Raven’s Quest for Non-Arbitrary Metrics.
13. Yates, A. J. & Forbes, A. R. (1967). Raven’s Advanced Progressive Matrices (1962): Provisional Manual for Australia and New
Zealand. Hawthorn, Victoria: Australian Council for Educational Research.
Note: The various papers cited indicate that the RAPM items have not changed since 1962. Bors & Stokes compare the normative data of
their study directly with Raven’s 1962 data and S. M. Paul’s 1985 data. Stough & Nettelbeck (1993) use the 1962 version in studying
the Flynn effect over a period of 25 years. Colom & Abad (2004) make direct comparisons of their scores with those of Arthur & Woehr (1993)
and Bors & Stokes (1998). The data quoted for Van Dam’s study take into account the 48-item result minus 12 of Set I.

The abstract from the Teasdale and Owen paper:

http://www.iapsych.com/iqmr/fe/LinkedDocuments/teasdale2008.pdf

"Scores on cognitive tests have been very widely reported to have increased through the decades of the last
century, a generational phenomenon termed the ‘Flynn Effect’ since it was most comprehensively documented
by James Flynn in the 1980's.
There has, however, been very little evidence concerning any continuity of the effect specifically into the
present century. We here report data from a population, namely young adult males in Denmark, showing that
whereas there were modest increases between 1988 and 1998 in scores on a battery of four cognitive tests–these
constituting a diminishing continuation of a trend documented back to the late 1950's–scores on all four tests
declined between 1998 and 2003/2004. For two of the tests, levels fell to below those of 1988. Across all tests,
the decrease in the 5/6 year period corresponds to approximately 1.5 IQ points, very close to the net gain
between 1988 and 1998. The declines between 1998 and 2003/4 appeared amongst both men pursuing higher
academic education and those not doing so."

This one is by Michael Shayer* and Denise Ginsburg**:

* College, University of London, London, UK
** Consultant, Cambridge, UK

http://www.cognitiveacceleration.co.uk/documents/ca_approach/30yearson_II.pdf

"Conclusion. The negative Flynn-effect found on Volume & Heaviness for Y7 pupils
is paralleled by a similar negative effect on attainment of formal operations by Y8
and Y9, compared with 1976. Yet at the same time the proportion of pupils using
the top level of concrete operational thinking has increased on both tests. It seems
that there has been a change either in general societal pressures on the individual or
in the style of teaching in schools – or both – favouring a lower level of processing
of reality."

And this one is by James Flynn himself!

http://www.telegraph.co.uk/education/educationnews/4548943/British-teenagers-have-lower-IQs-than-their-counterparts-did-30-years-ago.html

excerpt : "Professor James Flynn, of the University of Otago in New Zealand, the discoverer of the Flynn effect
and the author of the latest study, believes the abnormal drop in British teenage IQ could be due to youth culture
having "stagnated" or even dumbed down."

This study is on the Raven's Matrices: http://www.iapsych.com/iqmr/fe/LinkedDocuments/brouwers2009.pdf


Variation in Raven's Progressive Matrices scores across time and place
Symen A. Brouwers, Fons J.R. Van de Vijver, Dianne A. Van Hemert

Abstract: The paper describes a cross-cultural and historical meta-analysis of Raven's Progressive Matrices.
Data were analyzed of 798 samples from 45 countries (N=244,316), which were published between 1944 and
2003. Country-level indicators of educational permeation (which involves a broad set of interrelated educational
input and output factors that are strongly related to economic development), the samples' educational age, and
publication year were all independently related to performance on Raven's matrices. Our data suggest that the
Flynn effect can be found in high as well as low GNP countries, although its size is moderated by education-
related sample and country characteristics and seems to be smaller in developed than in emerging
countries.

Table 5
Size of the Flynn Effect by country (β = standardized regression coefficients).

Country          Number of years  Number of samples  β      R²
Australia        7                35                 −.26   .33
Canada           8                20                 −.52   .68
Germany (West)   8                25                 −.05   .40
India            8                41                 .62    .44
Iran             2                22                 .64    .95
Poland           5                72                 .55    .60
United Kingdom   14               129                .53    .52
United States    17               99                 −.01   .20

What's interesting about Table 5 is that all the developed countries except for the UK show a negative regression
coefficient (β), i.e. a negative Flynn effect.

Also, the paper found that the size of the Flynn effect showed a significantly negative
correlation with the Gross National Product of the country, r(8) = −.74, p < .05.

I dug up this paper on the Flynn effect from my desktop. Its aim was to find out how the Flynn Effect
affects the entire distribution. The study was carried out in Barcelona using a culture-free test (Pressey’s
Graphic Test), with samples 29 years apart.

Please refer to table 2 of the paper; the largest gains in raw scores were in the lower half of the distribution,
with almost no gain at the 99th percentile. This has the same effect I described in my point i. above, where the
mean has shifted right (see fig 1) but the standard deviation has become smaller, resulting in no gains at the
higher end of the distribution.
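
To see how a rising mean combined with a shrinking standard deviation can leave the top of the distribution unchanged, here is a purely hypothetical illustration. The normal distributions, the mean of 100, the SD of 15 and the 5-point shift are all my own assumed numbers, not data from the paper.

```python
from statistics import NormalDist

# Hypothetical earlier cohort (assumed values, for illustration only).
before = NormalDist(mu=100, sigma=15)
p99_before = before.inv_cdf(0.99)

# Shift the mean right by 5 points, then pick the smaller SD that keeps
# the 99th percentile exactly where it was.
shift = 5
z99 = NormalDist().inv_cdf(0.99)
sigma_after = (p99_before - (before.mean + shift)) / z99
after = NormalDist(mu=before.mean + shift, sigma=sigma_after)

# Gain at each percentile: large at the bottom, vanishing at the top.
gains = {p: after.inv_cdf(p) - before.inv_cdf(p) for p in (0.10, 0.50, 0.90, 0.99)}
for p, gain in gains.items():
    print(f"percentile {int(p * 100):2d}: gain {gain:.2f}")
```

In this toy case the gain is largest in the lower half, equals the mean shift at the median, and falls to zero at the 99th percentile, which is the pattern described above.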

The URL for anyone interested: http://www.iapsych.com/iqmr/fe/LinkedDocuments/colom2005.pdf

This paper also touches on brain morphology, in that the reported average brain size increase from 1950-2000 is
about 1 SD. Although controversial, there are several studies showing a significant correlation between brain
size and IQ, hence partly supporting the nutritional hypothesis of the gains in raw scores.
