
ANDREA DÍAZ JORGE GRAU

MADRID ELECTIONS 2021


1. INTRODUCTION
The aim of this work is to link the concepts and practical exercises we have done in class with a real
application of confidence intervals, built from a newspaper poll about the Madrid elections of 2021.
We also carry out hypothesis tests on the predictions for the political parties in order to answer one
question: were the polls correct in their estimates? Let’s look at the news:

1.1. BEFORE THE ELECTIONS

Isabel Díaz Ayuso would win a clear victory on 4-M (May 4), although she would need Vox to govern,
according to the Metroscopia poll for EL PAÍS, with 3,000 interviews, 700 more than the CIS, conducted
between April 20 (one day before the Telemadrid debate) and April 26 (after the failed SER debate).
The PP candidate would take more than 50% of the votes that went to Ciudadanos in 2019; Ciudadanos
would disappear from the Assembly after co-governing the region for two years. The PSOE, which won
the last elections, falls below 20% of the vote, its worst result in the region, and Mónica García
is hot on its heels. Pablo Iglesias keeps his party from being left out of the regional Parliament, as
polls prior to his candidacy had predicted, although it would obtain 14 fewer seats than Más
Madrid.
The poll also makes it possible to analyze the situation of each party, the profile of its voters, loyalty
and defections, the influence of the pandemic and how the elections are affected by being held on a
working Tuesday. This is the snapshot of the mood of the Madrid electorate eight days before the
elections:
 Sum of blocs and probabilities. Isabel Díaz Ayuso falls 10 seats short of the absolute majority,
which stands at 69. The gap between blocs is eight seats in favor of the right. The PP candidate
alone would collect a share of the vote (41.3%) close to the combined share of PSOE, Más Madrid and
Unidas Podemos (45.1%).
Ciudadanos would be left out of the Assembly with 3% of the vote (5% is needed to enter) and Vox
withstands Ayuso's pull. The scenario is radically different from that of the last elections, when the
PSOE was the leading force. The probability that PP and Vox together reach an absolute majority is,
according to Metroscopia's calculation based on 50,000 simulations, 87%. The probability that the
left-wing parties obtain it is 9%, and that of a tie, 4%.
1.2. AFTER THE ELECTIONS

2. PROCESS OF CONFIDENCE INTERVALS


Firstly, we searched for polls and took the Metroscopia poll for ‘El País’ as the reference for this
work. Then, we selected the 4 political parties (PP, PSOE, Vox and Más Madrid) with the highest
percentage of voting intention to estimate the population parameter, in this case, the percentage of
voters for those parties (population proportion).
 To estimate that parameter, we have made a confidence interval with the information provided in the poll:
MÁS MADRID
Sample size (n) 3000
Level of confidence 95,5%
P (prob. of success) 0,176
Q (prob. of failure) 0,824
Z-score 2,005
Alpha (α) 0,045

CI= (0,1621;0,1899)
Upper bound 0,1899
Lower bound 0,1621
Margin of Error (MOE) 1,39%
Length of the CI (LCI) 0,028

The proportion of people intending to vote for Más Madrid on May 4th would be between 16,21% and
18,99% with a level of confidence of 95,5%. With these results, we can conclude that if we repeated the
sampling many times and built an interval each time, about 95.5 out of 100 of those intervals would
contain the real value of the proportion.

[Figure: normal curve with central area 1-α = 95,5% and tails of α/2 = 2,25% each; the interval runs
from 16,21% to 18,99%, centred on 17,6%, with MOE = ±1,39% and LCI = 0,028.]
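
The Más Madrid interval can be reproduced with a few lines of Python. This is only a sketch under the usual normal approximation for a proportion; the function name `proportion_ci` is ours, and the critical value is computed from the standard normal rather than read from a table, so the last decimals may differ slightly from the 2,005 used in the tables.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float) -> tuple[float, float, float]:
    """Return (lower, upper, margin_of_error) for a sample proportion."""
    alpha = 1.0 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Standard error of the sample proportion: sqrt(p * q / n).
    se = sqrt(p_hat * (1.0 - p_hat) / n)
    moe = z * se
    return p_hat - moe, p_hat + moe, moe


# Más Madrid figures taken from the table in this section.
lower, upper, moe = proportion_ci(p_hat=0.176, n=3000, confidence=0.955)
print(f"CI = ({lower:.4f}; {upper:.4f}), MOE = {moe:.2%}")
```

The same call with the P values of PSOE, PP and VOX gives the other three intervals.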

PSOE
Sample size (n) 3000
Level of confidence 95,5%
P (prob. of success) 0,197
Q (prob. of failure) 0,803
Z-score 2,005
Alpha (α) 0,045

CI= (0,1824;0,2116)
Upper bound 0,2116
Lower bound 0,1824
Margin of Error (MOE) 1,46%
Length of the CI (LCI) 0,029

The proportion of people intending to vote for PSOE would be between 18,24% and 21,16% with a level
of confidence of 95,5%. This means that, if we repeated the sampling under the same conditions and
built an interval each time, about 95.5 out of 100 of those intervals would contain the real value of
the proportion.

PP
Sample size (n) 3000
Level of confidence 95,5%
P (prob. of success) 0,413
Q (prob. of failure) 0,587
Z-score 2,005
Alpha (α) 0,045

CI= (0,395;0,431)
Upper bound 0,431
Lower bound 0,395
Margin of Error (MOE) 1,80%
Length of the CI (LCI) 0,036

The proportion of people intending to vote for PP would be between 39,5% and 43,1% with a level of
confidence of 95,5%. This means that, if we repeated the sampling under the same conditions and
built an interval each time, about 95.5 out of 100 of those intervals would contain the real value of
the proportion.

VOX
Sample size (n) 3000
Level of confidence 95,5%
P (prob. of success) 0,094
Q (prob. of failure) 0,906
Z-score 2,005
Alpha (α) 0,045

CI= (0,0833;0,1047)
Upper bound 0,1047
Lower bound 0,0833
Margin of Error (MOE) 1,07%
Length of the CI (LCI) 0,0214

The proportion of people intending to vote for VOX in the elections would be between 8,33% and
10,47% with a level of confidence of 95,5%.

 What would be the sample size if we want to get a MOE of 1% in each confidence interval?

For PSOE, 6359 people need to be interviewed in order to have the same level of confidence and a
MOE of 1% in its confidence interval.

In the case of Más Madrid, 5829 people need to be interviewed in order to have the same level of
confidence and a MOE of 1% in its CI.

For PP, 9746 people need to be interviewed in order to have the same level of confidence and a
MOE of 1% in its confidence interval.

In the case of VOX, 3438 people need to be interviewed in order to have the same level of
confidence and a MOE of 1% in its CI.
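
These sample sizes come from solving the margin-of-error formula MOE = z·sqrt(p·q/n) for n, which gives n = z²·p·q / MOE². The sketch below illustrates that calculation for PP. The function name `required_sample_size` is ours, z = 2,005 is taken from the tables, and rounding up to the next whole respondent is an assumption, so other parties' results may differ slightly from the figures quoted here depending on rounding and on the precision of z.

```python
from math import ceil


def required_sample_size(p_hat: float, moe: float, z: float = 2.005) -> int:
    """Smallest n for which z * sqrt(p*q/n) does not exceed the target margin of error."""
    # n = z^2 * p * q / MOE^2, rounded up because respondents are whole people.
    return ceil(z ** 2 * p_hat * (1.0 - p_hat) / moe ** 2)


# PP: P = 0,413 and a target MOE of 1%.
print(required_sample_size(p_hat=0.413, moe=0.01))
```

Because p·q is largest when p is close to 0,5, the party with the highest share (PP) needs the largest sample.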

3. PROCESS OF THE HYPOTHESIS TESTS


 For Más Madrid, we have set a null hypothesis of the population proportion being equal to 15%, while the
alternative hypothesis is the population proportion being a value different from 15%.

Firstly, we have compared the experimental value to the critical value, in this case the Z-score of
α/2 because it is a two-sided test. After comparing them, we can see that the experimental value is
within the critical region, so we reject H0 and accept H1. This means that the proportion would be
different from 15% (in this case higher) and the empirical evidence of the sample supports the
alternative hypothesis at a 5% significance level.

In this case, the p-value is the probability on the right because the experimental value is on the
right of H0, and it is multiplied by 2 because it’s a two-sided test. The p-value is 0.000%, which is
far smaller than the alpha of 5%. This means not only that the experimental value is in the critical
region, and therefore we reject H0, but also that with the same sample we could reject the null
hypothesis at an even lower significance level (accepting a smaller probability of type I
error).
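
The test for Más Madrid can be sketched in Python as follows. We assume here that the "experimental value" is the usual standardized statistic z = (p̂ − p0) / sqrt(p0·q0/n), with the standard error computed under H0; the function name `proportion_z_test` is ours and not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat: float, p0: float, n: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: p = p0."""
    # Under H0 the standard error uses the hypothesised proportion p0.
    se = sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # Tail probability beyond |z|, doubled because the test is two-sided.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Más Madrid: sample proportion 0,176 against H0: p = 0,15 with n = 3000.
z, p_value = proportion_z_test(p_hat=0.176, p0=0.15, n=3000)
critical = NormalDist().inv_cdf(1.0 - 0.05 / 2.0)  # about 1.96 for alpha = 5%
print(f"z = {z:.3f}, critical = ±{critical:.2f}, p-value = {p_value:.5f}")
print("reject H0" if abs(z) > critical else "cannot reject H0")
```

Calling the same function with the values for VOX (0,094 against 0,11), PSOE (0,197 against 0,21) and PP (0,413 against 0,39) covers the other three tests.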
 In the case of VOX, we have set the null hypothesis of the population proportion being equal to 11%, while
the alternative hypothesis is the population proportion being a value different from 11%.

Firstly, we have compared the experimental value to the critical value, in this case the -Z-score of
α/2 because it is a two-sided test. This value is already standardized, so we just go to the tables of
the N(0,1) and find the value, which is ±1.96.
After comparing them, the experimental value is in the critical region, so we reject H0 and accept
H1. This means that the proportion would be different from 11% (in this case smaller) and the
empirical evidence of the sample supports the alternative hypothesis at a 5% significance level.

In this case, the p-value is the probability on the left because the experimental value is on the left
of H0, and it is multiplied by 2 because it’s a two-sided test. The p-value is 0.52%, which is far
smaller than the alpha of 5%. This means not only that the experimental value is in the critical
region, and therefore we reject H0, but also that with the same sample we could reject the null
hypothesis at an even lower significance level.

 In the case of PSOE, we have set the null hypothesis of the population proportion being equal to 21%,
while the alternative hypothesis is the population proportion being a value different from 21%.

Firstly, we have compared the experimental value to the critical value, in this case the Z-score of
α/2 because it is a two-sided test. After comparing them, we can see that the experimental value is
within the acceptance region, so we cannot reject H0. This means that the sample does not provide
enough evidence, at a 5% significance level, to conclude that the proportion is different from 21%.

In this case, the p-value is the probability on the left because the experimental value is on the left
of H0, and it is multiplied by 2 because it’s a two-sided test. The p-value is 8,02%, which is higher
than the alpha of 5%. This means that the experimental value is in the acceptance region and
therefore there is not enough evidence to reject the null hypothesis.

 For PP, we have set a null hypothesis of the population proportion being equal to 39%, while the
alternative hypothesis is the population proportion being a value different from 39%.

Firstly, we have compared the experimental value to the critical value, in this case the Z-score of α/2
because it is a two-sided test.
After comparing them, the experimental value is in the critical region, so we reject H0 and accept H1,
which means that the proportion would be different from 39% (in this case higher) and the empirical
evidence of the sample supports the alternative hypothesis at a 5% significance level.

4. FINAL QUESTION: WERE THE POLLS RIGHT?

 In the case of PP, the polls were slightly wrong because the proportion of people who voted for PP
was 44.73%, which is not within the confidence interval we built (between 39.5% and 43.1% with a
level of confidence of 95.5%). However, the hypothesis test we did before was right, because it
rejected the null hypothesis (p=0,39) and accepted that the proportion would be different from that
value (higher).

 In the case of Más Madrid, the polls were absolutely right because the result for this political
party was 16,97% of the votes, which is within the confidence interval created for the population
proportion (between 16,21% and 18,99%). Also, we can conclude that the poll was very representative
of the population proportion and that the hypothesis test was right (accepting the alternative
hypothesis of the population proportion being different from 0,15).

 In the case of PSOE, the polls were wrong because the proportion of people who voted for PSOE was
16,85%, which is lower than the predicted percentage and is not within the confidence interval
created for the proportion of people intending to vote for PSOE (between 18,24% and 21,16%). The
hypothesis test we did before was not accurate either, because it did not reject the null hypothesis
that the proportion would be equal to 0,21.

 In the case of VOX, the polls were right because the result in the elections was 9,13%, which is
within the confidence interval created for the proportion of people intending to vote for VOX
(between 8,33% and 10,47% with a level of confidence of 95.5%). We can conclude that the poll was
very representative of the population proportion and that the hypothesis test was right (accepting
the alternative hypothesis of the population proportion being different from 0,11).
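
The four comparisons above boil down to checking whether each official result lies between the lower and upper bound of its interval. A minimal sketch of that check, using only the intervals and results quoted in this section (the dictionary and function names are ours):

```python
# Confidence intervals and official results quoted in this section (as fractions).
intervals = {
    "PP": (0.395, 0.431),
    "Más Madrid": (0.1621, 0.1899),
    "PSOE": (0.1824, 0.2116),
    "VOX": (0.0833, 0.1047),
}
results = {"PP": 0.4473, "Más Madrid": 0.1697, "PSOE": 0.1685, "VOX": 0.0913}


def inside(interval: tuple[float, float], value: float) -> bool:
    """True when the value lies between the lower and upper bound (inclusive)."""
    lower, upper = interval
    return lower <= value <= upper


for party, interval in intervals.items():
    verdict = "inside" if inside(interval, results[party]) else "outside"
    print(f"{party}: result {results[party]:.2%} is {verdict} the interval")
```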

After comparing the predictions of the poll to the real results of the elections, we can see that the
predictions were very close, so yes, we can say that they were right. The main goal of election polls is
not to predict the exact value we will obtain on the day of the election, although that is the ideal
situation, but to be close to it.

As we can observe, the only vote estimations that weren’t extremely close to the real value were those
for PP and PSOE. In the case of PP, the prediction underestimated the percentage of votes: it predicted
41,3% while the real result was 44,73%. In the case of PSOE, it overestimated it: it predicted 19,7%
while the real result was 16,85%. For the other four parties that appear in the poll, the predictions
were extremely close to the real value, with the difference between the real value and the prediction
always smaller than 1 percentage point.

WERE THE POLLS RIGHT? FURTHER RESEARCH AND CONCLUSIONS FOR THE E-PORTFOLIO

We reached the conclusion that the poll for El País was right, but what about the other polls? Here is
a comparison between the average result of all the election polls and the election result:

Source: Francisco Llaneras Estrada



As we can see, the vote estimation was very close in most cases; the only errors were in the vote
estimations for PP and PSOE. This indicates that the error that appears in the poll for El País, in
which the vote for PP was underestimated and the vote for PSOE was overestimated, was a common
tendency among most polls. Maybe there was a change in vote intention that the polls could
not detect.

Here is a comparison of all the polls' errors and successes and the real result:

Source: Electomania

We see that most polls have a low margin of error and that, in general terms, the average error has
been below 2.5 percentage points, which suggests that on average the polls were close to the real result.

In this table where the accuracy of the polls is judged, a distinction is made between those carried out
in the five-day period prior to the elections and those that are not, because in Spain it is forbidden to
publish and disseminate polls during the five days prior to an election. This prohibition only prevents
polls from being published, not from being conducted.

The most accurate polls were the ones conducted close to the day of the elections or on the same day
as the elections: GAD3 FORTA 4M, Demoscopia y Servicios Esdiario 4M and ChulaPanel Final 2M, with
errors of 0.76, 0.82 and 1.17 respectively. This doesn’t mean that the polls conducted at the end of
April weren’t accurate. They actually were, as we can see by looking at Sociometria for El Español,
Metroscopia for El País (the poll we analyzed before) and Demoscopia y Servicios for Esdiario.

As a matter of fact, this next graph shows that the vote estimations of the polls didn’t change much
during the month of April:

Source: El País

If the elections had taken place a month earlier, the end result would have been very similar, which
shows that the final month of the campaign didn’t convince many people to change their vote. The
biggest differences since the start of the campaign were a decrease in voters for PSOE and VOX and an
increase in voters for Más Madrid, Podemos and PP. The vote estimations for Ciudadanos were almost
exactly the same from the start of the campaign period until the end of it.

WHY IS IT ILLEGAL TO PUBLISH POLLS 5 DAYS BEFORE THE ELECTIONS? CAN POLLS
INFLUENCE THE POLITICAL VOTE? CIUDADANOS CASE

As we mentioned before, Spanish law establishes that “During the five days prior to the voting day, the
publication and dissemination or reproduction of electoral polls by any communication media is
forbidden.” (source: Ley Orgánica 5/1985, de 19 de junio, del Régimen Electoral General). This
measure was created to prevent polls from influencing voting intentions, in particular the vote for
less popular parties that may not pass the 5% vote threshold (in these elections, this is the case of
Ciudadanos, for example). As noted above, the prohibition only prevents polls from being published,
not from being conducted. Although the effectiveness of this law is debated, it is true that poll
results can influence the vote intention of the population.

There is an electoral barrier of 5%: parties that fall below this percentage of the vote in the
elections do not obtain representation in the Assembly of Madrid. The least popular parties are
sometimes negatively affected by polls. Potential voters might read the polls and, if the vote intention
is below or close to 5%, choose to vote for a similar party with a higher vote intention to avoid
“wasting their vote” on a party with no representation. In Spain this phenomenon is called “voto
útil” (tactical voting). This results in a self-fulfilling prophecy: since the surveys say the party
won’t pass the 5% barrier, the party ends up not passing it. The surveys are not the main reason for
that outcome, but they can influence the vote intention.

The orange line is Ciudadanos’ vote intention according to the polls. Despite very different
estimations around March 14, it remains constant for the most part and slightly decreases over
time, from an average of 4.8% to an average of 3.8%.

Source: El País
CAN THE POLLS HAVE A POLITICAL BIAS? CIS CASE

Source: Electomania

There is an alarming piece of data in the table we referenced before: the CIS survey. CIS is the
“Centro de Investigaciones Sociológicas” (Centre for Sociological Research), a Spanish public
research institute. It is concerning that a public institution financed with taxpayers’ money gave
such an inaccurate prediction, with an error of 3.27, one of the worst in the table, especially since
the CIS has access to more resources and larger samples than other pollsters. Is this just the result
of bad luck while conducting the survey, or can the CIS poll have a political bias?

Source: El País

As we can see by comparing the CIS poll with other polls, it overestimated Más Madrid and
underestimated VOX, giving more votes to the left overall than most polls. Unfortunately, this doesn’t
seem to be a one-time thing but a systematic issue ever since 2018, when José Félix Tezanos became
president of the CIS. The bias could be motivated by the fact that Tezanos has very close ties to the
PSOE party.

An analysis of the work of the CIS reveals that its forecasts have been wrong more often than usual,
that the errors have always leaned to the left and that this bias did not exist until the arrival of
Tezanos. The data show that the CIS overestimates the vote intention towards the left: the graphs
cover 17 election polls conducted by the CIS between 2018 and 2021, and in all of them the total
left-wing vote was overestimated.

Source: El País

The data clearly show that polls can have a political bias, but how can a poll have a political bias if
the sample is meant to be random? The source of that political bias is the methodology that the CIS
follows. By methodology we are referring to what is colloquially known among pollsters in Spanish as
"cocinar" (cooking).

What exactly does “cocinar” mean?

The term refers to the techniques a pollster employs to produce more accurate vote estimates from the
raw data. This may seem manipulative, but it is actually a valid and common practice: most of the
leading polls process the data to improve their estimates. They do so in order to anticipate the
voting intentions of the undecided, predict who will actually vote and correct for biases in the sample.

These are some examples of difficulties that can arise when conducting an election poll and that
“cocinar” helps us solve:

The first example of a difficulty is the people who do not declare their vote. Some people
choose not to declare who they are going to vote for because they are aware that their choice is socially
frowned upon. One effect of this is that the results of those parties are underestimated. But the “voto
oculto” (hidden vote) may also simply be due to the undecided population that, as election day
approaches, is still deciding its vote.

Can you predict the behavior of the undecided, those people who, if asked, will honestly say they do not
know which party they are going to vote for? In reality, yes. It is common to assume, for example, that
many undecideds will vote for whoever they voted for last time. But more sophisticated models also
determine their vote by taking into account age, ideology or where the voter lives.

Another example of a difficulty is finding out who will vote. Polls face a double problem: knowing
whether people prefer one candidate or another, and knowing whether each person is going to vote at
all. Most people say that they will, because it is seen as a good thing and because people are
optimistic, but on the day in question unexpected events can happen and sometimes people do not vote.
To anticipate this, pollsters assign each person a probability of actually voting. Some polling firms
ask respondents how confident they are that they will go to the polls and then reduce that
probability by half if they didn't vote last time. Other polls take into account gender, age or the
party each person supports to decide how likely they are to vote.
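
As an illustration of the "halve it if they did not vote last time" rule, here is a small sketch. The respondents, the party labels "A" and "B" and the function name `turnout_probability` are all hypothetical; real pollsters use more elaborate models than this.

```python
def turnout_probability(stated_confidence: float, voted_last_time: bool) -> float:
    """Stated confidence of voting (0 to 1), halved for people who did not vote last time."""
    return stated_confidence if voted_last_time else stated_confidence / 2.0


# Hypothetical respondents: (preferred party, stated confidence, voted last time).
respondents = [
    ("A", 0.9, True),
    ("A", 0.8, False),
    ("B", 1.0, True),
    ("B", 0.6, False),
]

# Each respondent counts in proportion to the probability that they actually vote.
totals: dict[str, float] = {}
for party, confidence, voted in respondents:
    totals[party] = totals.get(party, 0.0) + turnout_probability(confidence, voted)

expected_voters = sum(totals.values())
shares = {party: total / expected_voters for party, total in totals.items()}
print(shares)
```

The point of the design is that a respondent who is unlikely to vote still contributes to the estimate, only with less weight than a near-certain voter.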

A third example of problems with polls is that biases in a sample can arise in subtle ways. It may be
that certain people are more accessible to pollsters (such as the unemployed, who spend more time at
home) or that some people are more reluctant to respond. For whatever reason, if the sample lacks
people of one type and there are too many of another, the result may deviate from reality.

To avoid this, it is recommended to do more "cocina", meaning more processing of the data, by using
weightings, the most common technique to strengthen the representativeness of a survey. The idea is
to detect which people are in short supply in the sample and give them more weight. For example, if
women with university degrees make up 20% of the census but only represent 10% of the respondents,
one solution is to double the value of their responses.
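
The weighting idea can be sketched as follows: each group's weight is its share of the census divided by its share of the sample. The 20% and 10% shares come from the example in the text; the "other" group, the support figures and the function name `post_stratification_weights` are made up for illustration only.

```python
def post_stratification_weights(population_share: dict[str, float],
                                sample_share: dict[str, float]) -> dict[str, float]:
    """Weight for each group = its share of the census / its share of the sample."""
    return {g: population_share[g] / sample_share[g] for g in population_share}


# University women are 20% of the census but 10% of respondents;
# everyone else is lumped into a single "other" group.
population = {"university women": 0.20, "other": 0.80}
sample = {"university women": 0.10, "other": 0.90}
weights = post_stratification_weights(population, sample)

# Hypothetical support for some party within each group (made-up numbers).
support = {"university women": 0.30, "other": 0.20}

raw = sum(sample[g] * support[g] for g in sample)
weighted = sum(sample[g] * weights[g] * support[g] for g in sample)
print(weights)
print(f"raw estimate = {raw:.3f}, weighted estimate = {weighted:.3f}")
```

Note that doubling the weight of the under-represented group also means giving the remaining respondents a weight slightly below 1, so that the weighted sample still adds up to the whole population.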

This practice in itself is legitimate and usual, but it can be negative if the methodology is not clear, it is
not explained or if it is deficient.

Now that we know what we mean by methodology, what is the methodology that the CIS
follows?

The methodology used by the CIS to create vote estimates has changed multiple times since Tezanos’
arrival, but it has always favored the left. Two simple examples of how it does this are:

1) It gives too much importance to the 'sympathy' variable

The first formula the CIS used for creating the vote estimate was limited to a combination of direct vote
questions (what people say they would vote for if the elections were held today) and sympathy (the
respondent is asked which party he or she feels most sympathy for).

The 'sympathy' variable is traditionally good for the socialists and therefore improves PSOE’s vote
estimate. The CIS methodology states that its estimation model takes the variable “party sympathy”
and treats it as a possible “voting option" for undecided respondents and abstainers. This will, of
course, produce predictions that count undecided respondents and abstainers as left-wing voters.

Tezanos himself mentions how this favors PSOE in an interview with El País: “And we start from the
hypothesis that the greatest probability of voting is for the party with which citizens sympathize the
most. And in this case the PSOE has many more sympathizers, and of course, using the formula vote
intention plus sympathy this creates distances between PSOE and other parties.”

2) Misuse of the “vote recall” variable

The variable vote recall, or “recuerdo de voto”, refers to the party the person voted for the last
time they voted. This variable is part of the methodology of the CIS, but the CIS doesn’t seem to
process it properly.

In the 2021 Madrid elections, left-wing voters are over-represented among the CIS respondents. In this
survey, only 40% of respondents who "remember" their vote from the 2019 Madrid regional elections
claim to have voted then for PP, Vox or Ciudadanos, although those parties accounted for more than
50% of the votes. This means that there is a bias towards the left in the sample, and that bias does
not seem to be sufficiently corrected when producing the estimates.

So this is the answer to whether polls can have a political bias and how it is created: yes,
polls can have a political bias, and pollsters can make the results favor the left by changing
the way they process the raw data.