Correcting For Sample Problems in PISA and The Improvement in Portuguese Students' Performance
Pedro Freitas, Luís Catela Nunes, Ana Balcão Reis, Carmo Seabra & Adriana Ferro
To cite this article: Pedro Freitas, Luís Catela Nunes, Ana Balcão Reis, Carmo Seabra &
Adriana Ferro (2015): Correcting for sample problems in PISA and the improvement in
Portuguese students’ performance, Assessment in Education: Principles, Policy & Practice, DOI:
10.1080/0969594X.2015.1105784
Nova School of Business and Economics, Universidade Nova de Lisboa, Lisbon, Portugal
(Received 17 March 2015; accepted 5 October 2015)
Downloaded by [McMaster University] at 22:24 09 April 2016
1. Introduction
The Programme for International Student Assessment (PISA) was launched by the
OECD in 2000 to test the skills and knowledge of 15-year-old students in the fields
of Reading, Mathematics and Sciences. Together with the results on the level of pro-
ficiency in the above-mentioned fields, it includes surveys to the students, their fami-
lies and their schools about a considerable number of characteristics that are
expected to be related with the educational achievement of students.
The results from the PISA tests together with the microdata from the PISA
datasets have been extensively used in empirical research: Rangvid (2007) and
Schneeweis and Winter-Ebmer (2007) analysed the role of peer effects in
achievement in Denmark and Austria, respectively, using data from PISA 2000;
Vandenberghe and Robin (2004), also using data from PISA 2000, studied the
effects of attending private vs. public schools; Corak and Lauzon (2009) analysed
the impact of class size for Canada; Fertig and Wright (2005) looked at school
quality effects for the set of countries that participated in PISA 2000; Hanushek,
Link, and Woessmann (2013), using data from the four PISA data-sets 2000 to
*Corresponding author. Email: abr@novasbe.pt
Our study reveals that in the three PISA waves examined, there were consider-
able deviations between the population represented by these PISA samples and the
effective Portuguese population. We recalculate PISA scores with a revised set of
weights for each group of students, considering their grade and track of studies, and
type of school, and show that there is a sizable impact on average scores. Instead of
the stagnation between 2009 and 2012 reported by PISA, we find an increase in
both Reading and Mathematics during this period. In Sciences, instead of the fall
reported in PISA, we observe stability in the results. We decompose the evolution of
the scores into two effects: (i) change in the student population distribution by grade
and track of studies, and type of school; and (ii) evolution in the performance of
each type of student. We apply this decomposition to the whole system and also
separately to public and private schools. Our conclusion is that the improvement in
students’ scores was more important than the change in the population structure for
explaining the overall positive evolution of the Portuguese PISA results from 2006
to 2012.
The organisation of the paper is as follows. In Section 2, we describe the data.
In Section 3, we discuss the representativeness of the PISA samples and in Section 4,
the PISA scores are recalculated using a new set of weights adjusted to reflect the
actual Portuguese student population. In Section 5, we provide a decomposition of
the evolution of the recalculated PISA scores. Section 6 concludes.
2. Data
Our analysis of the representativeness of the PISA sample used for Portugal became
possible as a result of the disclosure by the Portuguese Government, in 2014, of a
rich administrative data-set with students’ population data since 2007.3 For 2006,
although the official statistical data are less detailed, the level of disaggregation was
still sufficient to assess the representativeness of the PISA sample. For 2000 and
2003, the available data are insufficient to carry out the exercise. Our analysis below
is therefore focused on the period from 2006 onwards.
We also use the PISA data-sets available online for the 2006, 2009 and 2012
cycles.4 The target population of PISA studies is 15-year-old students enrolled in or
above the 7th grade.5 The sampling process can be summarised in two steps. In the
first step, PISA samples schools according to previously defined stratification
criteria. In the second step, students are randomly selected within each of these schools.
detailed data-set. We obtained from these population data-sets information about 15-
year-old students’ grades, tracks of studies and types of school (public vs. private).10
Table 2 summarises the comparison between the populations represented by the
PISA samples, taking into account the weight of each observation and the actual
population. The weighted sample represents the student population enrolled in both
the private and public system in the year of the test.
It is clear that there are considerable differences between the population repre-
sented in PISA (weighted sample) and the 15-year-old Portuguese student popula-
tion in the years under analysis. There are also some differences in terms of the
distribution of students per private and public schools. Table 2 shows the number of
observations in each PISA sample and the represented population per type of
school.
Table 3 shows the scores attained in PISA, disaggregated by the grade and the
track of studies the student is enrolled in for students in public schools.11 As can be
seen, the lowest scores in the PISA tests are achieved by students in the lowest
grades of the academic track or in Lower Secondary Vocational courses, and the
differences are large.12 The strong dependence of Portuguese students' PISA
results on the grade in which they are enrolled has already been shown by Pereira
(2011) and by Pereira and Reis (2012). Their findings corroborate O'Leary's (2001)
remarks. The table also shows a strong dependence on the track of studies, a fact
that was found to have important implications in the case of Austria (Neuwirth,
2006). In that country, the PISA 2000 assessment did not adequately cover students
enrolled in combined school and work-based vocational programmes. As a conse-
quence, the Austrian PISA 2003 national report erroneously reported a fall in perfor-
mance in all three PISA domains. This justifies our focus on the variable ‘grade and
track of studies’ regarding the population representativeness of the sample.
Table 3. Scores attained in PISA by grade and track of studies – public schools (standard deviations in parentheses).

7th grade                       324.15 (72.92)  336.49 (61.26)  348.37 (55.52)  372.09 (53.13)  369.16 (56.24)  387.58 (45.59)  365.58 (59.74)  358.12 (48.64)  371.97 (53.25)
8th grade                       389.45 (60.82)  386.52 (52.13)  396.84 (51.62)  407.14 (56.98)  397.03 (55.73)  418.47 (52.61)  406.57 (59.58)  395.37 (52.02)  407.63 (55.16)
9th grade                       445.87 (68.56)  439.97 (59.21)  448.59 (61.91)  460.30 (63.39)  458.48 (65.21)  468.41 (58.86)  464.41 (66.43)  459.22 (69.88)  465.83 (65.76)
Lower Secondary Vocational      336.94 (76.21)  350.46 (64.18)  354.68 (54.08)  368.30 (63.02)  367.06 (58.71)  381.26 (58.11)  357.01 (70.16)  370.50 (59.24)  376.30 (61.96)
Upper Secondary Academic        539.58 (61.48)  524.49 (62.37)  533.75 (61.49)  539.32 (57.81)  536.05 (64.81)  538.39 (59.78)  541.05 (61.51)  540.48 (69.18)  538.48 (61.99)
Upper Secondary Technological   501.38 (61.27)  501.41 (61.81)  503.08 (61.65)  490.07 (49.60)  514.55 (46.94)  488.50 (35.85)  521.02 (69.62)  529.34 (85.44)  521.76 (78.49)
Upper Secondary Professional    410.27 (58.21)  423.32 (27.79)  487.48 (44.54)  466.93 (58.75)  479.61 (55.66)  469.21 (56.71)  479.37 (57.05)  483.28 (56.00)  479.98 (52.06)
Table 4. Distribution of students according to the grade and track of studies – public schools.

                                        2006                          2009                          2012
                              PISA    Actual    Diff.      PISA    Actual    Diff.      PISA    Actual    Diff.
Grade and track of studies     (1)      (2)      (3)        (4)      (5)      (6)        (7)      (8)      (9)
7th grade                      6.7%     8.3%    −1.6***     2.2%     5.0%    −2.8***     2.3%     3.8%    −1.5***
8th grade                     13.2%    14.1%    −0.9**      9.0%    10.3%    −1.3***     8.0%     8.6%    −0.6
9th grade                     29.0%    28.9%     0.1       27.5%    19.4%     8.1***    26.8%    19.5%     7.3***
Lower Secondary Vocational     2.3%     4.6%    −2.3***     7.0%    10.4%    −3.4***    10.4%    10.1%     0.3
Upper Secondary Academic      37.8%    37.3%     0.5       48.6%    47.1%     1.5***    45.2%    49.8%    −4.6***
Upper Secondary Technological 10.9%     6.4%     4.5***     0.6%     0.8%    −0.2        0.2%     0.1%     0.1
Upper Secondary Professional   0.1%     0.4%    −0.3        5.3%     6.6%    −1.3***     7.2%     8.0%    −0.8**
Total                        100%     100%                100%     100%                100%     100%

Note: z-test for difference in proportions (PISA vs. actual population).
*Statistically significant at the .10 level.
**Statistically significant at the .05 level.
***Statistically significant at the .01 level.
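The z-test behind the significance stars in Table 4 can be sketched as a one-sample test of the PISA-implied proportion against the known population proportion. This is a plain-random-sampling approximation; whether the paper additionally adjusts for the survey design is not stated here, and the sample size used below is an assumed round number, not the paper's actual n.

```python
from math import sqrt

def proportion_z(p_hat, p0, n):
    """z statistic for testing a sample proportion p_hat against a known
    population proportion p0, given sample size n (one-sample z-test)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Illustrative check of one Table 4 cell (2006, 7th grade: PISA 6.7% vs.
# actual 8.3%), with n = 5000 as a hypothetical sample size.
z = proportion_z(0.067, 0.083, 5000)
significant_at_1pct = abs(z) > 2.576  # two-sided 1% critical value
```

With these inputs |z| is roughly 4.1, well beyond the 1% critical value, consistent with the *** attached to that cell.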
under analysis. We can also see the near disappearance of the Upper Secondary
Technological courses and the increase of the Upper Secondary Professional and
Lower Secondary Vocational courses.
Table 4 also compares the distribution of grade and track of studies according to
the PISA weights (columns 1, 4 and 7) and the weights in the actual population
(columns 2, 5 and 8). For 2006 we see some large deviations in the Upper Second-
ary Technological courses, which are strongly overrepresented in the PISA data-set
and in the Lower Secondary Vocational courses, which are underrepresented.
Regarding 2009, we find overrepresentation of the 9th grade and underrepresentation
of the 7th grade and of the Lower Secondary Vocational courses. In 2012, we again
find overrepresentation of the 9th grade but now there is a strong underrepresenta-
tion of Upper Secondary Academic courses. These differences are important, as we
have seen that scores vary considerably according to the grade and track of studies
the student is enrolled in (Table 3).
As mentioned above, one possible source for these discrepancies is the occur-
rence of different rates of participation across students in different grades and tracks
of studies. Although this disaggregated information is not available, the participation
rates of Portuguese students are always below the OECD average, as can be
observed in Table 6.13
Table 5. Distribution of students according to the grade and track of studies – private schools.

                                             2006                               2009                               2012
                                  PISA         Actual     Diff.      PISA          Actual   Diff.      PISA       Actual   Diff.
Grade and track of studies         (1)          (2)        (3)        (4)           (5)      (6)        (7)        (8)      (9)
7th grade                          3.9%         3.6%       0.3        0.7%          1.8%    −1.1        0.4%       1.1%    −0.7
8th grade                         10.0%         8.4%       1.6        3.4%          5.7%    −2.3*       2.1%       3.6%    −1.5
9th grade                         27.6%        25.8%       1.8       17.4%         10.5%     6.9***    19.2%      12.1%     7.1***
Lower Secondary Vocational         0%           7.0%      −7.0***     8.5%         10.0%    −1.5        2.6%      12.0%    −9.4***
Upper Secondary Academic
  + Technological                 38.5% + 20.1% 33% + 8%  17.6***    43.9% + 13.4% 52.2%     5.1***    66% + 0%   50.2%    15.8***
Upper Secondary Professional       0%          14.4%     −14.4***    12.7%         19.9%    −7.2***     9.8%      21.1%   −11.3***
Total                            100%         100%                  100%          100%                100%       100%

Note: Null hypothesis of no difference in proportions (PISA vs. actual population) is checked with a z-test.
*Statistically significant at the .10 level.
**Statistically significant at the .05 level.
***Statistically significant at the .01 level.
points. This evolution is fairly close to the one that was reported by PISA. However,
between 2009 and 2012 the evolution is quite different from that reported: there is
an increase in both Reading and Mathematics, instead of a stagnation, and a
stabilisation of the results in Sciences, instead of the decrease reported (Figure 1).
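The recalculation that produces these revised averages replaces each cell's PISA-implied share with its share in the actual population and re-averages the cell means. A minimal sketch, restricted to three illustrative cells with rounded numbers loosely inspired by Tables 3 and 4 (they are not the paper's exact figures):

```python
# Mean score per (grade, track) cell, with the cell shares implied by the
# PISA weights vs. the shares observed in the actual population.
# All three cells and every number below are illustrative round values.
cell_means    = {"9th grade": 464.0, "Upper Sec. Academic": 541.0, "Lower Sec. Vocational": 357.0}
pisa_shares   = {"9th grade": 0.268, "Upper Sec. Academic": 0.452, "Lower Sec. Vocational": 0.104}
actual_shares = {"9th grade": 0.195, "Upper Sec. Academic": 0.498, "Lower Sec. Vocational": 0.101}

def weighted_score(means, shares):
    """Share-weighted average of cell means (shares renormalised to sum to 1)."""
    total = sum(shares.values())
    return sum(means[g] * shares[g] / total for g in means)

reported     = weighted_score(cell_means, pisa_shares)    # PISA-weighted average
recalculated = weighted_score(cell_means, actual_shares)  # population-weighted average
```

Because the lower-scoring 9th grade is overrepresented relative to the higher-scoring academic track in these illustrative shares, reweighting to the actual population raises the average, the direction of correction the paper reports for 2012.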
to PISA results. For instance, Barrera-Osorio et al. (2011) analysed the evolution of
Indonesian results and Ramos, Duque, and Nieto (2012) studied the rural–urban dif-
ferential in student achievement in Colombia. We use a similar although simpler
approach to decompose the evolution of results. The graphic representation presented
in Figure 2 illustrates our procedure.
In step 1, the PISA score for year x is reported considering the student distribu-
tion observed in the population for that same year. The same happens in step 3, for
year y. Between these two years, in step 2, an intermediate computation is per-
formed, recalculating the PISA scores at year y, but assuming the student distribu-
tion in terms of grade and track of studies observed in the previous year (x). Thus,
from step 1 to step 2 we account for the effect of the improvement in the scores of
each type of student, while from step 2 to step 3 the effect of the changes in the
structure of the population in terms of grade and track of studies is isolated. The
recalculated scores can be seen in Tables 10–12.
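The three-step procedure described above can be sketched directly in code. The cell labels and all numbers in the example call are hypothetical inputs, not the paper's figures; only the decomposition logic follows the text.

```python
def decompose(means_x, means_y, shares_x, shares_y):
    """Split the total score change between years x and y into
    (i) a score effect (step 1 -> step 2: year-y cell means at year-x shares)
    and (ii) a population effect (step 2 -> step 3: moving to year-y shares)."""
    avg = lambda m, s: sum(m[g] * s[g] for g in m) / sum(s.values())
    step1 = avg(means_x, shares_x)   # reported score, year x
    step2 = avg(means_y, shares_x)   # year-y cell means, year-x composition
    step3 = avg(means_y, shares_y)   # reported score, year y
    return {"score_effect": step2 - step1,
            "population_effect": step3 - step2,
            "total": step3 - step1}

# Two hypothetical (grade, track) cells with made-up means and shares.
result = decompose(
    means_x={"9th grade": 400.0, "Academic": 500.0},
    means_y={"9th grade": 410.0, "Academic": 505.0},
    shares_x={"9th grade": 0.5, "Academic": 0.5},
    shares_y={"9th grade": 0.4, "Academic": 0.6},
)
```

By construction the two effects sum exactly to the total change, which is what lets the paper attribute each year-on-year movement to improving scores versus a shifting population structure.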
This decomposition, presented in Table 10 for public schools, shows that in
Reading, from 2006 to 2009, around six of the almost 18 points of progression
are due to a population effect. From 2009 to 2012, the majority of the evolution,
4.4 points, is due to the population effect. In Mathematics, nearly six of the
21 points can be imputed to the evolution of the population structure in terms of
grade and track of studies from 2006 to 2009. From 2009 to 2012, the population
effect has an impact of around 5 points. A similar pattern is visible in Sciences:
from 2006 to 2009, only five of the total 19 points of progression are due to a
population effect, while from 2009 to 2012 the evolution is due to the population
effect. The results of the same decomposition applied to private schools are shown
in Table 11.
In the Reading test, around eight out of the 26 points overall increase regis-
tered between 2006 and 2009 are due to the change in the population distribution,
while in Mathematics this same factor explains nine out of the 24 points and in
Sciences seven of the 16 points improvement. From 2009 to 2012, all of the evo-
lution is explained by the improvement in students’ scores. The population effect
even has a negative effect on the scores between the two most recent PISA tests.
This result is due to the increase in the percentage of students enrolled in private
schools in the lower secondary vocational track, the worst performing group, as
shown in Table 5.
6. Conclusion
In this paper, we illustrate how problems of representativeness of the PISA samples
Acknowledgements
Support from Fundação para a Ciência e Tecnologia is gratefully acknowledged. We thank
DGEEC – Portuguese Ministry of Education for the data provided. Suggestions from two
anonymous referees are gratefully acknowledged.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
Fundação para a Ciência e Tecnologia, Ministry of Education and Science, Portugal
[grant number PTDC/EGE-ECO/122754/2010].
Notes
1. Some policy-makers' responses to the PISA results can be found at the OECD website:
http://www.oecd.org/education/focus-world-reaction-to-pisa.htm.
2. http://www.educare.pt/noticias/noticia/ver/?id=22522
3. Data already existed, but they were made available to the scientific community at this
date.
4. See OECD, the PISA International Database at http://www.oecd.org/pisa/.
5. More precisely, students' ages fall between 15 years and 3 months and 16 years
and 2 months.
6. If the rate of student participation falls short of 50% within a school, the whole school
is withdrawn from the sample. For the validity of the whole survey, PISA sets a thresh-
old of 85% for the school response and 80% for student response.
7. For further details on the design of PISA, see the PISA Technical Reports (OECD,
2012). A good description is provided in McGaw (2008).
8. MISI is the acronym for ‘Information System of the Ministry of Education’.
9. Azores and Madeira account for 5.8% of the total 15-year-old Portuguese population in
2012. For 2009 and 2012 students from Azores and Madeira were excluded from PISA,
too. For 2006 it was not possible to isolate this subsample. Excluding these students
from the PISA sample never affects the final PISA scores by more than one point.
10. For 2000 and 2003, there are no statistical data with this level of disaggregation for
15-year olds.
11. The mean scores reported are always the means of the five plausible values calculated
by PISA.
12. The same pattern is seen for students in the private system.
13. The effect of the response rates has been highlighted in the case of England by
Micklewright, Schnepf, and Skinner (2012).
Notes on contributors
Pedro Freitas holds a Master's in Economics and is a PhD candidate in Economics at
Nova School of Business and Economics. His research interests are Economics of
Education, Human Capital Theory and Public Economics.
Luis Catela Nunes is a full professor at Universidade Nova de Lisboa and the director of
the Nova School of Business and Economics Research Centre. His research interests cover
several areas of applied econometrics.
Ana Balcão Reis is Associate Professor of Economics at Nova School of Business and
Economics, Universidade Nova de Lisboa. Her research interests are Economic Growth and
Human Capital and more recently Economics of Education.
Carmo Seabra is Associate Professor of Economics at Nova School of Business and
Economics, Universidade Nova de Lisboa. Her research is focused on Microeconomic
Policy Analysis issues and over the last years specifically on the area of Economics of
Education.
Adriana Ferro holds a Master's in Economics from Nova School of Business and
Economics and is a consultant at the World Bank. Her research areas are education,
HIV and development economics.
ORCID
Pedro Freitas http://orcid.org/0000-0002-0629-6901
Ana Balcão Reis http://orcid.org/0000-0003-0962-4605
References
Baird, J. A., Isaacs, T., Johnson, S., Stobart, G., Yu, G., Sprague, T., & Daugherty, R. (2011).
Policy effects of PISA. Oxford: Oxford University Centre for Educational Assessment.
Barrera-Osorio, F., Garcia-Moreno, V., Patrinos, H. A., & Porta, E. E. (2011). Using the
Oaxaca-Blinder decomposition technique to analyze learning outcomes changes over time
(Policy Research Working Paper No. 5584). Washington, DC: World Bank. Retrieved
from http://dx.doi.org/10.1596/1813-9450-5584
Corak, M., & Lauzon, D. (2009). Differences in the distribution of high school achievement:
The role of class-size and time-in-term. Economics of Education Review, 28, 189–198.
Cosgrove, J., & Cartwright, F. (2014). Changes in achievement on PISA: The case of Ireland
and implications for international assessment practice. Large-scale Assessments in Education.
Appendix
Since, for PISA 2006, there are no observations for Lower Secondary Vocational
courses and Upper Secondary Professional courses in private schools, it was necessary
to define a method to assign a score to these students.
It was assumed that the proportional relationships between the scores of Upper
Secondary Professional courses and the 9th grade, and between the scores of Lower
Secondary Vocational courses and the 7th grade, observed in public schools in 2006
also held in private schools in 2006.
\[
\frac{\text{Score}(\text{Upper Secondary Professional} \mid \text{public}, 2006)}{\text{Score}(\text{9th grade} \mid \text{public}, 2006)}
=
\frac{\text{Score}(\text{Upper Secondary Professional} \mid \text{private}, 2006)}{\text{Score}(\text{9th grade} \mid \text{private}, 2006)}
\]
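Under that proportionality assumption, the missing private-school score follows directly from the three observed ones. A minimal sketch; all input scores below are illustrative, and the private 9th-grade value in particular is a made-up input:

```python
def impute_private_score(public_track, public_ref, private_ref):
    """Impute a missing 2006 private-school track score by assuming the
    public-school ratio (track score / reference-grade score) also holds
    in private schools: private_track = (public_track / public_ref) * private_ref."""
    return public_track / public_ref * private_ref

# Illustrative: Upper Secondary Professional imputed from 9th-grade scores.
imputed = impute_private_score(public_track=410.0, public_ref=446.0, private_ref=470.0)
```

The same function applies to the Lower Secondary Vocational imputation, with the 7th grade as the reference.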