Correcting For Sample Problems in PISA and The Improvement in Portuguese Students' Performance
Pedro Freitas, Luís Catela Nunes, Ana Balcão Reis, Carmo Seabra & Adriana Ferro
To cite this article: Pedro Freitas, Luís Catela Nunes, Ana Balcão Reis, Carmo Seabra &
Adriana Ferro (2015): Correcting for sample problems in PISA and the improvement in
Portuguese students’ performance, Assessment in Education: Principles, Policy & Practice, DOI:
10.1080/0969594X.2015.1105784
Nova School of Business and Economics, Universidade Nova de Lisboa, Lisbon, Portugal
(Received 17 March 2015; accepted 5 October 2015)
Downloaded by [McMaster University] at 22:24 09 April 2016
1. Introduction
The Programme for International Student Assessment (PISA) was launched by the
OECD in 2000 to test the skills and knowledge of 15-year-old students in the fields
of Reading, Mathematics and Sciences. Together with the results on the level of pro-
ficiency in the above-mentioned fields, it includes surveys to the students, their fami-
lies and their schools about a considerable number of characteristics that are
expected to be related with the educational achievement of students.
The results from the PISA tests together with the microdata from the PISA
datasets have been extensively used in empirical research: Rangvid (2007) and
Schneeweis and Winter-Ebmer (2007) analysed the role of peer effects in
achievement in Denmark and Austria, respectively, using data from PISA 2000;
Vandenberghe and Robin (2004), also using data from PISA 2000, studied the
effects of attending private vs. public schools; Corak and Lauzon (2009) analysed
the impact of class size for Canada; Fertig and Wright (2005) looked at school
quality effects for the set of countries that participated in PISA 2000; Hanushek,
Link, and Woessmann (2013), using data from the four PISA data-sets 2000 to
*Corresponding author. Email: abr@novasbe.pt
Our study reveals that in the three PISA waves examined, there were consider-
able deviations between the population represented by these PISA samples and the
effective Portuguese population. We recalculate PISA scores with a revised set of
weights for each group of students, considering their grade and track of studies, and
type of school, and show that there is a sizable impact on average scores. Instead of
the stagnation between 2009 and 2012 reported by PISA, we find an increase in
both Reading and Mathematics during this period. In Sciences, instead of the fall
reported in PISA, we observe stability in the results. We decompose the evolution of
the scores into two effects: (i) change in the student population distribution by grade
and track of studies, and type of school; and (ii) evolution in the performance of
each type of student. We apply this decomposition to the whole system and also
separately to public and private schools. Our conclusion is that the improvement in
students’ scores was more important than the change in the population structure for
explaining the overall positive evolution of the Portuguese PISA results from 2006
to 2012.
The organisation of the paper is as follows. In Section 2, we describe the data.
In Section 3, we discuss the representativeness of the PISA samples and in Section 4,
the PISA scores are recalculated using a new set of weights adjusted to reflect the
actual Portuguese student population. In Section 5, we provide a decomposition of
the evolution of the recalculated PISA scores. Section 6 concludes.
2. Data
Our analysis of the representativeness of the PISA sample used for Portugal became
possible as a result of the disclosure by the Portuguese Government, in 2014, of a
rich administrative data-set with students’ population data since 2007.3 For 2006,
although the official statistical data are less detailed, the level of disaggregation was
still sufficient to assess the representativeness of the PISA sample. For 2000 and
2003, the available data are insufficient to carry out the exercise. Our analysis below
is therefore focused on the period from 2006 onwards.
We also use the PISA data-sets available online for the 2006, 2009 and 2012
cycles.4 The target population of PISA studies is 15-year-old students enrolled in or
above the 7th grade.5 The sampling process can be summarised in two steps. In the
first step, PISA samples schools according to previously defined stratification
criteria. In the second step, students are randomly selected within each of these schools.
detailed data-set. We obtained from these population data-sets information about 15-
year-old students’ grades, tracks of studies and types of school (public vs. private).10
Table 2 summarises the comparison between the populations represented by the
PISA samples, taking into account the weight of each observation and the actual
population. The weighted sample represents the student population enrolled in both
the private and public system in the year of the test.
It is clear that there are considerable differences between the population repre-
sented in PISA (weighted sample) and the 15-year-old Portuguese student popula-
tion in the years under analysis. There are also some differences in terms of the
distribution of students per private and public schools. Table 2 shows the number of
observations in each PISA sample and the represented population per type of
school.
Table 3 shows the scores attained in PISA, disaggregated by the grade and the
track of studies the student is enrolled in for students in public schools.11 As can be
seen, the lowest scores in the PISA tests are achieved by students in the lowest
grades of the academic track or in Lower Secondary Vocational courses, and the
differences are large.12 The strong dependence of Portuguese students' PISA
results on the grade in which they are enrolled has already been shown by Pereira
(2011) and by Pereira and Reis (2012). Their findings corroborate O'Leary's (2001)
remarks. The table also shows a strong dependence on the track of studies, a fact
that was found to have important implications in the case of Austria (Neuwirth,
2006). In that country, the PISA 2000 assessment did not adequately cover students
enrolled in combined school and work-based vocational programmes. As a conse-
quence, the Austrian PISA 2003 national report erroneously reported a fall in perfor-
mance in all three PISA domains. This justifies our focus on the variable ‘grade and
track of studies’ regarding the population representativeness of the sample.
Table 3. Scores attained in PISA by grade and track of studies – public schools (standard deviations in parentheses).

7th grade                       324.15 (72.92)  336.49 (61.26)  348.37 (55.52)  372.09 (53.13)  369.16 (56.24)  387.58 (45.59)  365.58 (59.74)  358.12 (48.64)  371.97 (53.25)
8th grade                       389.45 (60.82)  386.52 (52.13)  396.84 (51.62)  407.14 (56.98)  397.03 (55.73)  418.47 (52.61)  406.57 (59.58)  395.37 (52.02)  407.63 (55.16)
9th grade                       445.87 (68.56)  439.97 (59.21)  448.59 (61.91)  460.30 (63.39)  458.48 (65.21)  468.41 (58.86)  464.41 (66.43)  459.22 (69.88)  465.83 (65.76)
Lower Secondary Vocational      336.94 (76.21)  350.46 (64.18)  354.68 (54.08)  368.30 (63.02)  367.06 (58.71)  381.26 (58.11)  357.01 (70.16)  370.50 (59.24)  376.30 (61.96)
Upper Secondary Academic        539.58 (61.48)  524.49 (62.37)  533.75 (61.49)  539.32 (57.81)  536.05 (64.81)  538.39 (59.78)  541.05 (61.51)  540.48 (69.18)  538.48 (61.99)
Upper Secondary Technological   501.38 (61.27)  501.41 (61.81)  503.08 (61.65)  490.07 (49.60)  514.55 (46.94)  488.50 (35.85)  521.02 (69.62)  529.34 (85.44)  521.76 (78.49)
Upper Secondary Professional    410.27 (58.21)  423.32 (27.79)  487.48 (44.54)  466.93 (58.75)  479.61 (55.66)  469.21 (56.71)  479.37 (57.05)  483.28 (56.00)  479.98 (52.06)
Table 4. Distribution of students according to the grade and track of studies – public schools.

                                        2006                          2009                          2012
                              PISA    Actual    Diff.      PISA    Actual    Diff.      PISA    Actual    Diff.
Grade and track of studies     (1)      (2)      (3)        (4)      (5)      (6)        (7)      (8)      (9)
7th grade                      6.7%     8.3%    −1.6***     2.2%     5.0%    −2.8***     2.3%     3.8%    −1.5***
8th grade                     13.2%    14.1%    −0.9**      9.0%    10.3%    −1.3***     8.0%     8.6%    −0.6
9th grade                     29.0%    28.9%     0.1       27.5%    19.4%     8.1***    26.8%    19.5%     7.3***
Lower Secondary Vocational     2.3%     4.6%    −2.3***     7.0%    10.4%    −3.4***    10.4%    10.1%     0.3
Upper Secondary Academic      37.8%    37.3%     0.5       48.6%    47.1%     1.5***    45.2%    49.8%    −4.6***
Upper Secondary Technological 10.9%     6.4%     4.5***     0.6%     0.8%    −0.2        0.2%     0.1%     0.1
Upper Secondary Professional   0.1%     0.4%    −0.3        5.3%     6.6%    −1.3***     7.2%     8.0%    −0.8**
Total                        100%     100%                100%     100%                100%     100%

Note: z-test for difference in proportions (PISA vs. actual population).
*Statistically significant at the .10 level.
**Statistically significant at the .05 level.
***Statistically significant at the .01 level.
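The z-test behind the significance stars in Table 4 can be sketched as a one-sample test of the PISA-implied proportion against the known population proportion. This is a plain-random-sampling approximation; whether the paper additionally adjusts for the survey design is not stated here, and the sample size used below is an assumed round number, not the paper's actual n.

```python
from math import sqrt

def proportion_z(p_hat, p0, n):
    """z statistic for testing a sample proportion p_hat against a known
    population proportion p0, given sample size n (one-sample z-test)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Illustrative check of one Table 4 cell (2006, 7th grade: PISA 6.7% vs.
# actual 8.3%), with n = 5000 as a hypothetical sample size.
z = proportion_z(0.067, 0.083, 5000)
significant_at_1pct = abs(z) > 2.576  # two-sided 1% critical value
```

With these inputs |z| is roughly 4.1, well beyond the 1% critical value, consistent with the *** attached to that cell.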
under analysis. We can also see the near disappearance of the Upper Secondary
Technological courses and the increase of the Upper Secondary Professional and
Lower Secondary Vocational courses.
Table 4 also compares the distribution of grade and track of studies according to
the PISA weights (columns 1, 4 and 7) and the weights in the actual population
(columns 2, 5 and 8). For 2006 we see some large deviations in the Upper Second-
ary Technological courses, which are strongly overrepresented in the PISA data-set
and in the Lower Secondary Vocational courses, which are underrepresented.
Regarding 2009, we find overrepresentation of the 9th grade and underrepresentation
of the 7th grade and of the Lower Secondary Vocational courses. In 2012, we again
find overrepresentation of the 9th grade but now there is a strong underrepresenta-
tion of Upper Secondary Academic courses. These differences are important, as we
have seen that scores vary considerably according to the grade and track of studies
the student is enrolled in (Table 3).
As mentioned above, one possible source for these discrepancies is the occur-
rence of different rates of participation across students in different grades and tracks
of studies. Although this disaggregated information is not available, the participation
rates of Portuguese students are always below the OECD average, as can be
observed in Table 6.13
Table 5. Distribution of students according to the grade and track of studies – private schools.

                                             2006                               2009                               2012
                                  PISA         Actual     Diff.      PISA          Actual   Diff.      PISA       Actual   Diff.
Grade and track of studies         (1)          (2)        (3)        (4)           (5)      (6)        (7)        (8)      (9)
7th grade                          3.9%         3.6%       0.3        0.7%          1.8%    −1.1        0.4%       1.1%    −0.7
8th grade                         10.0%         8.4%       1.6        3.4%          5.7%    −2.3*       2.1%       3.6%    −1.5
9th grade                         27.6%        25.8%       1.8       17.4%         10.5%     6.9***    19.2%      12.1%     7.1***
Lower Secondary Vocational         0%           7.0%      −7.0***     8.5%         10.0%    −1.5        2.6%      12.0%    −9.4***
Upper Secondary Academic
  + Technological                 38.5% + 20.1% 33% + 8%  17.6***    43.9% + 13.4% 52.2%     5.1***    66% + 0%   50.2%    15.8***
Upper Secondary Professional       0%          14.4%     −14.4***    12.7%         19.9%    −7.2***     9.8%      21.1%   −11.3***
Total                            100%         100%                  100%          100%                100%       100%

Note: Null hypothesis of no difference in proportions (PISA vs. actual population) is checked with a z-test.
*Statistically significant at the .10 level.
**Statistically significant at the .05 level.
***Statistically significant at the .01 level.
points. This evolution is fairly close to the one that was reported by PISA. However,
between 2009 and 2012 the evolution is quite different from that reported: there is
an increase in both Reading and Mathematics, instead of a stagnation, and a
stabilisation of the results in Sciences, instead of the decrease reported (Figure 1).
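The recalculation that produces these revised averages replaces each cell's PISA-implied share with its share in the actual population and re-averages the cell means. A minimal sketch, restricted to three illustrative cells with rounded numbers loosely inspired by Tables 3 and 4 (they are not the paper's exact figures):

```python
# Mean score per (grade, track) cell, with the cell shares implied by the
# PISA weights vs. the shares observed in the actual population.
# All three cells and every number below are illustrative round values.
cell_means    = {"9th grade": 464.0, "Upper Sec. Academic": 541.0, "Lower Sec. Vocational": 357.0}
pisa_shares   = {"9th grade": 0.268, "Upper Sec. Academic": 0.452, "Lower Sec. Vocational": 0.104}
actual_shares = {"9th grade": 0.195, "Upper Sec. Academic": 0.498, "Lower Sec. Vocational": 0.101}

def weighted_score(means, shares):
    """Share-weighted average of cell means (shares renormalised to sum to 1)."""
    total = sum(shares.values())
    return sum(means[g] * shares[g] / total for g in means)

reported     = weighted_score(cell_means, pisa_shares)    # PISA-weighted average
recalculated = weighted_score(cell_means, actual_shares)  # population-weighted average
```

Because the lower-scoring 9th grade is overrepresented relative to the higher-scoring academic track in these illustrative shares, reweighting to the actual population raises the average, the direction of correction the paper reports for 2012.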
to PISA results. For instance, Barrera-Osorio et al. (2011) analysed the evolution of
Indonesian results and Ramos, Duque, and Nieto (2012) studied the rural–urban dif-
ferential in student achievement in Colombia. We use a similar although simpler
approach to decompose the evolution of results. The graphic representation presented
in Figure 2 illustrates our procedure.
In step 1, the PISA score for year x is reported considering the student distribu-
tion observed in the population for that same year. The same happens in step 3, for
year y. Between these two years, in step 2, an intermediate computation is per-
formed, recalculating the PISA scores at year y, but assuming the student distribu-
tion in terms of grade and track of studies observed in the previous year (x). Thus,
from step 1 to step 2 we account for the effect of the improvement in the scores of
each type of student, while from step 2 to step 3 the effect of the changes in the
structure of the population in terms of grade and track of studies is isolated. The
recalculated scores can be seen in Tables 10–12.
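The three-step procedure described above can be sketched directly in code. The cell labels and all numbers in the example call are hypothetical inputs, not the paper's figures; only the decomposition logic follows the text.

```python
def decompose(means_x, means_y, shares_x, shares_y):
    """Split the total score change between years x and y into
    (i) a score effect (step 1 -> step 2: year-y cell means at year-x shares)
    and (ii) a population effect (step 2 -> step 3: moving to year-y shares)."""
    avg = lambda m, s: sum(m[g] * s[g] for g in m) / sum(s.values())
    step1 = avg(means_x, shares_x)   # reported score, year x
    step2 = avg(means_y, shares_x)   # year-y cell means, year-x composition
    step3 = avg(means_y, shares_y)   # reported score, year y
    return {"score_effect": step2 - step1,
            "population_effect": step3 - step2,
            "total": step3 - step1}

# Two hypothetical (grade, track) cells with made-up means and shares.
result = decompose(
    means_x={"9th grade": 400.0, "Academic": 500.0},
    means_y={"9th grade": 410.0, "Academic": 505.0},
    shares_x={"9th grade": 0.5, "Academic": 0.5},
    shares_y={"9th grade": 0.4, "Academic": 0.6},
)
```

By construction the two effects sum exactly to the total change, which is what lets the paper attribute each year-on-year movement to improving scores versus a shifting population structure.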
This decomposition, presented in Table 10 for public schools, shows that in
Reading, from 2006 to 2009, around six of the almost 18 points of progression
are due to a population effect. From 2009 to 2012, the majority of the evolution,
4.4 points, is due to the population effect. In Mathematics, nearly six of the
21 points can be imputed to the evolution of the population structure in terms of
grade and track of studies from 2006 to 2009. From 2009 to 2012, the population
effect has an impact of around 5 points. A similar pattern is visible in Sciences:
from 2006 to 2009, only five of the total 19 points of progression are due to a
population effect, while from 2009 to 2012 the evolution is due to the population
effect. The results of the same decomposition applied to private schools are shown
in Table 11.
In the Reading test, around eight out of the 26 points overall increase regis-
tered between 2006 and 2009 are due to the change in the population distribution,
while in Mathematics this same factor explains nine out of the 24 points and in
Sciences seven of the 16 points improvement. From 2009 to 2012, all of the evo-
lution is explained by the improvement in students’ scores. The population effect
even has a negative effect on the scores between the two most recent PISA tests.
This result is due to the increase in the percentage of students enrolled in private
schools in the lower secondary vocational track, the worst performing group, as
shown in Table 5.
6. Conclusion
In this paper, we illustrate how problems of representativeness of the PISA samples
Acknowledgements
Support from Fundação para a Ciência e Tecnologia is gratefully acknowledged. We thank
DGEEC – Portuguese Ministry of Education for the data provided. Suggestions from two
anonymous referees are gratefully acknowledged.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
Fundação para a Ciência e Tecnologia, Ministry of Education and Science, Portugal
[grant number PTDC/EGE-ECO/122754/2010].
Notes
1. Some policy-makers' responses to the PISA results can be found at the OECD website:
http://www.oecd.org/education/focus-world-reaction-to-pisa.htm.
2. http://www.educare.pt/noticias/noticia/ver/?id=22522
3. Data already existed, but they were made available to the scientific community at this
date.
4. See OECD, the PISA International Database at http://www.oecd.org/pisa/.
5. More precisely, students' ages fall between 15 years and 3 months and 16 years
and 2 months.
6. If the rate of student participation falls short of 50% within a school, the whole school
is withdrawn from the sample. For the validity of the whole survey, PISA sets a thresh-
old of 85% for the school response and 80% for student response.
7. For further details on the design of PISA, see the PISA Technical Reports (OECD,
2012). A good description is provided in McGaw (2008).
8. MISI is the acronym for ‘Information System of the Ministry of Education’.
9. Azores and Madeira account for 5.8% of the total 15-year-old Portuguese population in
2012. For 2009 and 2012 students from Azores and Madeira were excluded from PISA,
too. For 2006 it was not possible to isolate this subsample. Excluding these students
from the PISA sample never affects the final PISA scores by more than one point.
10. For 2000 and 2003, there are no statistical data with this level of disaggregation for
15-year olds.
11. The mean scores reported are always the means of the five plausible values calculated
by PISA.
12. The same pattern is seen for students in the private system.
13. The effect of the response rates has been highlighted in the case of England by
Micklewright, Schnepf, and Skinner (2012).
Notes on contributors
Pedro Freitas holds a Master's in Economics and is a PhD candidate in Economics at
Nova School of Business and Economics. His research interests are Economics of
Education, Human Capital Theory and Public Economics.
Luis Catela Nunes is a full professor at Universidade Nova de Lisboa and the director of
the Nova School of Business and Economics Research Centre. His research interests cover
several areas of applied econometrics.
Ana Balcão Reis is Associate Professor of Economics at Nova School of Business and
Economics, Universidade Nova de Lisboa. Her research interests are Economic Growth and
Human Capital and more recently Economics of Education.
Carmo Seabra is Associate Professor of Economics at Nova School of Business and
Economics, Universidade Nova de Lisboa. Her research is focused on Microeconomic
Policy Analysis issues and over the last years specifically on the area of Economics of
Education.
Adriana Ferro holds a Master's in Economics from Nova School of Business and
Economics and is a consultant at the World Bank. Her research areas are education,
HIV and development economics.
ORCID
Pedro Freitas http://orcid.org/0000-0002-0629-6901
Ana Balcão Reis http://orcid.org/0000-0003-0962-4605
References
Baird, J. A., Isaacs, T., Johnson, S., Stobart, G., Yu, G., Sprague, T., & Daugherty, R. (2011).
Policy effects of PISA. Oxford: Oxford University Centre for Educational Assessment.
Barrera-Osorio, F., Garcia-Moreno, V., Patrinos, H. A., & Porta, E. E. (2011). Using the
Oaxaca-Blinder decomposition technique to analyze learning outcomes changes over time
(Policy Research Working Paper No. 5584). Washington, DC: World Bank. Retrieved
from http://dx.doi.org/10.1596/1813-9450-5584
Corak, M., & Lauzon, D. (2009). Differences in the distribution of high school achievement:
The role of class-size and time-in-term. Economics of Education Review, 28, 189–198.
Cosgrove, J., & Cartwright, F. (2014). Changes in achievement on PISA: The case of Ireland
and implications for international assessment practice. Large-scale Assessments in Education.
Appendix
Since, for PISA 2006, there are no observations for Lower Secondary Vocational
courses and Upper Secondary Professional courses in private schools, it was necessary
to define a method to assign a score to these students.
It was assumed that the proportional relationships between the scores of Upper
Secondary Professional courses and the 9th grade, and between the scores of Lower
Secondary Vocational courses and the 7th grade, observed in public schools in 2006
also held in private schools in 2006.
\[
\frac{\text{Score}(\text{Upper Secondary Professional} \mid \text{public}, 2006)}{\text{Score}(\text{9th grade} \mid \text{public}, 2006)}
=
\frac{\text{Score}(\text{Upper Secondary Professional} \mid \text{private}, 2006)}{\text{Score}(\text{9th grade} \mid \text{private}, 2006)}
\]
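Under that proportionality assumption, the missing private-school score follows directly from the three observed ones. A minimal sketch; all input scores below are illustrative, and the private 9th-grade value in particular is a made-up input:

```python
def impute_private_score(public_track, public_ref, private_ref):
    """Impute a missing 2006 private-school track score by assuming the
    public-school ratio (track score / reference-grade score) also holds
    in private schools: private_track = (public_track / public_ref) * private_ref."""
    return public_track / public_ref * private_ref

# Illustrative: Upper Secondary Professional imputed from 9th-grade scores.
imputed = impute_private_score(public_track=410.0, public_ref=446.0, private_ref=470.0)
```

The same function applies to the Lower Secondary Vocational imputation, with the 7th grade as the reference.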