Final Econometrics Project

Daniel Segura Pérez
Universidad Autónoma de Madrid
May 2020
AN ECONOMETRIC ANALYSIS OF COVID-19
INDEX
1. INTRODUCTION
2. DATA ANALYSIS
·China·
·Italy·
·Spain·
·Similarities and differences between countries·
·Types of functions·
·Volatility·
·Weekly analysis·
·Chinese experience·
3. ALTERNATIVE MODELS
4. PREDICTION
·Identification·
·Estimation·
·Diagnosis·
5. CONTAGIONS
6. RECOVERED
7. DEATHS
8. CONCLUSION
9. BIBLIOGRAPHY
1. INTRODUCTION
What is COVID-19?
The COVID-19 coronavirus pandemic is the defining global health crisis of our time and
the greatest challenge we have faced since World War Two.
Since its emergence in Asia, the virus has spread to every continent except Antarctica.
But COVID-19 is much more than a health crisis: it has the potential to create
devastating social, economic, and political crises that will leave deep scars.
As I work in the Department of Health Economics, I have to prepare a set of consecutive
reports on the consequences of the international pandemic in the three countries
studied.
These reports analyze three variables in each country:
-The total confirmed cases
-The total confirmed deaths owing to COVID-19
-The total recovered people who tested positive for COVID-19.
Finally, I am going to compare econometric models to choose the best one for each sample
of the variables in Spain, and I am going to find the optimal forecast for the samples of
the three variables. I take an optimal forecast to be the prediction obtained by
minimizing the expected loss function.
With the predictions of the variables for each sample, I am going to analyze
the quality and precision of the prediction depending on the sample size and the
model used.
2. DATA ANALYSIS
·China·
I start with the country where the pandemic began, China. I have data for the province
of Hubei covering 1/22/2020 to 3/25/2020. China is the country about which I have the
most information, with a sample of 64 observations.
The first important feature of the data for China is that, in the initial observations,
the number of contagions increases slowly and roughly linearly. The first observation
is 444 contagions, which is a high figure for a first observation.
In the first two weeks, the number of contagions increased rapidly, from 3554 to 33366
by 2/11/2020, an increase of about 839%.
The number of contagions continued to increase during the first 4 weeks, with
exponential growth from week to week.
The hardest week in terms of contagions is the fourth, when 28665 new contagions were
recorded, an average of 4095 new contagions per day.
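The two figures above (the percentage increase and the daily average) can be checked with
a few lines of arithmetic. This is only a sketch: the variable names are mine, and I assume
that the fourth week has 7 daily observations.

```python
# Figures quoted in the text for Hubei.
cases_start = 3554        # contagions at the start of the two-week window
cases_end = 33366         # contagions on 2/11/2020
new_cases_week_4 = 28665  # new contagions during the fourth week

# Percentage increase relative to the starting value.
pct_increase = (cases_end - cases_start) / cases_start * 100

# Average number of new contagions per day, assuming a 7-day week.
daily_average = new_cases_week_4 / 7

print(f"{pct_increase:.1f}%")  # 838.8%
print(f"{daily_average:.0f}")  # 4095
```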
Over the following weeks the daily increase shows a smoother positive trend,
ending with a constant daily increase and more linear growth in contagions.
The evolution of the recovered cases is very similar to that of the contagions. This makes
sense because the virus has a low mortality-to-case ratio, so, in general, most
confirmed cases in Hubei were treated successfully and the patients recovered from the
virus. The similarity is visible in the recovered cases graph.
Graph 1.2 shows a positive trend in recovered cases very similar to that of the
contagions in Hubei. The contagions graph shows more pronounced growth, whereas the
recovered graph shows two different phases.
In the first phase, the number of recovered cases increases exponentially. Then the
trend smooths out and leads to a lower, more constant increase in recovered cases,
basically because there are no new confirmed cases.
The best day in terms of recovered cases is 2/21/2020, with a remarkable peak of 9181
recovered cases.
Hubei may be an ideal reference for other countries, with an average of 986 recovered
cases per day.
To finish with Hubei, I analyze the deaths during the sample period. China is one of the
countries with the most deaths in the international pandemic.
The behavior of the deaths variable in China is completely different from that of the
other 2 countries. Deaths start with a slow and constant increase from 1/22/2020 to
2/19/2020, followed by an exponential increase over the next few weeks. In the last week
of observations the daily deaths start to decrease, and on the last day of observations
there were only 3 daily deaths.
·Italy·
First, it is important to say that Italy was the epicenter of the virus in Europe, so the
fast expansion across Europe closely resembles the expansion within Italy.
The first observation for Italy has only 3 confirmed contagions. During the next 8
days the virus spread through the whole country, mostly in Lombardy, one of the most
affected regions of Italy.
After that initial week without increases in contagions, the number of confirmed cases
starts to increase very quickly. From the week starting on 3/4/2020 the contagions grow
exponentially. For Italy, as for Spain, it is very important to analyze the data week by
week, because the data change a great deal depending on which week is analyzed. The worst
week for contagions in Italy is the last week of observations, which suggests that the
exponential increase in the number of contagions in Italy is going to continue in the
following weeks of March and April.
Secondly, for the variable of recovered people who tested positive for COVID-19, the
graph shows that the variable starts at 0, because in the first weeks the sample has no
information about Italy, whose data start on 2/7/2020.
As with China, the graph of the recovered variable is very similar to that of confirmed
contagions. This is because the virus has a low death rate in general, so the recovered
graph closely follows the confirmed cases. It is even more exponential because some
people have only mild symptoms and recover easily.
The week with the most recovered cases is also the last week of observations, with an
average of 589 recovered people per day. The highest number of recovered people in a
single day was 1300, on 3/22/2020.
To finish with Italy, I analyze the deaths variable. Italy is one of the hardest-hit
countries, with a total of 10950 deaths in my observations. If I look at the total
today, 5/26/2020, the confirmed deaths in Italy exceed 30000, which places it among the
European countries with the highest number of deaths.
Naturally, in the first weeks of the sample the deaths are equal to 0. Later the variable
also starts to increase exponentially, although I think that both the recovered and
contagions variables have steeper curves than the deaths variable.
Italy does not reach its peak within the sample; that would occur at a later date.
·Spain·
Now I analyze Spain, our country and the country for which I am going to forecast the
three coronavirus variables.
First of all, Spain is the second most affected country in Europe, behind Italy, so the
behavior of the data in Italy and Spain is very similar and they have many things in
common.
The evolution of confirmed cases in Spain fits an exponential curve, even more
pronounced than in Italy, where it was already steep.
As in the case of Italy, the curves are very similar because the effect of the
coronavirus in both countries is very similar too. Both countries also share culture,
traditions, and geographical zone, which may contribute to the similarity of the graphs.
In the data, the first confirmed cases in Spain appear on February 27, when the
government registered 2 confirmed cases of coronavirus in Valencia. The evolution in
Spain is roughly linear during the first 4-5 weeks, because of the few confirmed cases,
but the contagions increase after this initial period. As in the case of Italy, it is
better to analyze the data weekly.
For example, it makes no sense to analyze the whole sample together because the function
is exponential: the first values are around 0 or very small, and within only 1 or 2
weeks the contagions increase massively.
The week with the highest number of confirmed cases is the week that starts on
3/18/2020, the last week of the sample. As in Italy, the new confirmed cases in Spain
will be higher in the following weeks.
For the recovered cases variable, since it is better to analyze the Spanish data weekly,
I focus on the last 2 weeks of observations. However, the observations of these last
weeks are not fully representative.
The number of recovered people rose significantly from the week of 3/11/2020 and
continues along an exponential path, with some positive and negative peaks, and an
average of 888 recovered cases per day over the last 2 weeks.
The recovered graphs for Spain and Italy are very similar. Looking closely, the graph
for Spain is slightly more exponential than the Italian one.
The deaths variable in Spain follows a trend very similar to that of recovered and
confirmed cases, but the curve is smoother than for the other 2 variables.
As the graph shows, it is again better to analyze the deaths variable weekly. In the
first 2 weeks of observation the deaths in Spain were not too high, but once the virus
spread the deaths increased sharply. For example, in week 3 the deaths increased by an
average of 75% from one day to the next, with a highest daily change of 168% from
3/7/2020 to 3/8/2020.
I am confident that the number of deaths will continue to increase in the following weeks.
·Similarities and differences between countries·

Analyzing the 3 variables together for the 3 countries, it is very clear that Spain and
Italy have many similarities, not only in the graphs but also in the data.
China stays out of the comparison because it behaves completely differently from Spain
and Italy.
Despite this different behavior, China does share a similarity with Spain and Italy,
but only in the first 20 observations of confirmed cases and contagions.
China starts both variables with a linear function that develops into an exponential
function, but then the curve in China stabilizes and becomes more constant, while in
Spain and Italy the exponential function keeps rising.
If I compare, for example, the recovered graphs of Spain and Italy, they show almost the
same trend. I can clearly identify 2 different phases within the graphs. In the first
phase the virus is settling in the population, so the confirmed and recovered cases are
very low and the deaths are almost 0. In the second phase the virus is everywhere and
the confirmed cases start to increase sharply, and 2 or 3 days later the recovered cases
follow the same behavior. Both countries therefore have a great deal in common in both
variables discussed. In China, however, I can identify 3 phases in the graphs of the
variables.
The first is the same as the first phase in Spain and Italy, with low confirmed cases
and low recovered ones. The second phase started earlier in China and is also equal to
the second phase in Spain and Italy: the confirmed and recovered cases start to increase
exponentially, so the total number rises sharply from one day to the next. Finally,
there is a third phase that is not yet present in the graphs for Spain and Italy. This
phase corresponds to the full recovery of China: the confirmed contagions remain small
and constant, tightly controlled by the medical authorities, and the deaths variable
starts to decrease to levels close to 0.
This third phase cannot yet be seen in Spain and Italy because these countries are at a
different stage of the pandemic: China is close to ending the pandemic, while Spain and
Italy are only starting it.
If I look at the empirical results of the main statistics, I can see that the average
contagions, average recovered cases, and average deaths are very similar in Spain and
Italy, while the results for China are very different.
                      SPAIN        ITALY        CHINA
AVERAGE CONTAGIONS    1767.9286    1678.8958    1059.3753
AVERAGE DEATHS        100.2857     126.6041     49.4218
AVERAGE RECOVERED     250.4642     228.1253     986.7656
So, Spain and Italy have similar averages but China does not, and this also suggests
that Spain and Italy have managed the pandemic worse than China up to this moment.
·Types of functions·
EXPONENTIAL FUNCTION
The exponential growth function applies to a variable whose variation grows faster and
faster over time. The variable grows at a constant rate per period, that is, a constant
base greater than 1 is raised to the power of time.
The exponential function suits our data series very well because this virus has shown a
very rapid growth trend: starting from a very small group of people, it has grown
exponentially over time.
LINEAR FUNCTION
A linear function does not represent our data.
A linear function should maintain constant growth from day to day, and as we can see in
the graphs for each of the countries, the data series have spikes and falls with
irregular peaks, which make it clear that the linear function is not the most suitable
for this series of data.
LOGARITHMIC FUNCTION
The logarithmic function could seem a good candidate for our data.
It is the inverse of the exponential function and can be described as a function in
which the data grow quickly at the beginning but, little by little as the weeks go by,
stabilize until they become almost constant.
The problem with the logarithmic function is that, in this case, the virus did not grow
as fast at the beginning as a logarithmic function does.
QUADRATIC FUNCTION
The quadratic function is a function whose graph is a parabola, so, from an analytical
point of view, we could represent our variables as a parabola. Of the 3 variables,
contagions and deaths are the ones that most resemble a quadratic function, and only in
China over the second and third phases taken together.
Therefore, the functions that best represent this crisis in general would be the
exponential function for the recovered variable, since it grows along with the number of
infected but continues to grow even when contagions are 0.
For the contagion and death variables I think the most suitable functions are both the
exponential and the quadratic: the exponential because it represents very clearly the
marked increase in cases, and the quadratic because it helps us understand how cases
have become much more constant and have decreased in a controlled way until there are 0
infections and 0 deaths.
With the sample we have, the exponential function would represent all the variables and
countries. However, since I know what happened in the following weeks, when government
measures stabilized the pandemic and the number of contagions and deaths became more
constant, a quadratic function is more likely.
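One simple way to tell these function types apart in a data series is to look at how the
series changes from one day to the next: a linear series has constant first differences,
a quadratic series has constant second differences, and an exponential series has a
constant ratio between consecutive days. The sketch below illustrates this on three
hypothetical series that I made up for the example; they are not the COVID-19 data, and
the helper names are my own.

```python
def first_differences(series):
    """Return y[t] - y[t-1] for every consecutive pair."""
    return [b - a for a, b in zip(series, series[1:])]

def growth_ratios(series):
    """Return y[t] / y[t-1] for every consecutive pair (values must be non-zero)."""
    return [b / a for a, b in zip(series, series[1:])]

days = range(8)
# Hypothetical illustrative series, NOT the COVID-19 data.
linear = [10 + 5 * t for t in days]          # constant first differences
quadratic = [10 + 2 * t * t for t in days]   # constant second differences
exponential = [10 * 1.5 ** t for t in days]  # constant ratio between days

print(first_differences(linear))
print(first_differences(first_differences(quadratic)))
print([round(r, 6) for r in growth_ratios(exponential)])
```

With real data none of these quantities will be exactly constant, so the check is only a
rough guide to which shape is the closest.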
·Volatility·
I now study the volatility and dispersion of the variables in our data. First I compute
the main statistics, namely the mean, median, and variance, for the 3 countries.
To check for volatility, I can also look at the standard deviation, which is the square
root of the variance, and at the range.
I analyze the data on a weekly basis, because in this way the data are much more
representative and reliable: our sample varies a lot from the beginning to the end of
the period.
The first variance, calculated with all the data together as a single group, is very
high. As we know, the variance is a measure of dispersion defined as the expectation of
the squared deviation of a variable from its mean. It represents the dispersion of the
values of the variable during the sample period, so if we compute the variance over the
whole period, it is much higher because the distance between the first value and the
last one is enormous.
So it is better to look at the data weekly. If we compare both variances, the difference
is very large. Even so, the weekly variance is still large, which means that our data
are widely dispersed and that the volatility of the series is high.
The volatility of the sample is going to keep increasing because the pandemic is just
arriving in Spain and Italy: as the values of the following weeks continue to increase,
the variance of the total period will become higher and higher.
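The difference between the whole-sample variance and the weekly variances can be
illustrated with a small sketch. The series below is hypothetical (a count growing by
about 15% per day over 28 days), not the Spanish data, and I assume the population
variance from Python's statistics module; EViews may report the sample variance instead.

```python
import statistics

# Hypothetical daily counts growing by about 15% per day over 4 weeks
# (28 observations); these are NOT the Spanish data.
daily = [round(10 * 1.15 ** t) for t in range(28)]

# Variance of the whole sample treated as one group.
total_variance = statistics.pvariance(daily)

# Variance computed separately for each block of 7 consecutive days.
weeks = [daily[i:i + 7] for i in range(0, len(daily), 7)]
weekly_variances = [statistics.pvariance(week) for week in weeks]

print(f"whole sample: {total_variance:.1f}")
for number, value in enumerate(weekly_variances, start=1):
    print(f"week {number}: {value:.1f}")
```

In this example the whole-sample variance is larger than any weekly variance because it
also captures the distance between the weekly means, and the weekly variances themselves
grow as the level of the series rises.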
Comparing the variance across countries, I see that, for example, for the contagions
variable Spain has the highest variance over the whole sample period. This is because
Spain suffered a massive increase in confirmed cases in a shorter period than Italy and
China. The variances of the other 2 variables are similar, except that the variance in
China is slightly lower than in the other 2 countries, because its data are more
constant in the last weeks.
I now turn to Spain, since it is the country that I am going to forecast, and analyze
the contagions variable.
Looking at the weekly variances, the highest value is for the week from 3/11/2020 to
3/18/2020, with a variance of 830000. This is very high, but it is because the
deviations of the data from the weekly mean are large.
The variance evolves through the weeks. In the first week it was not very high, with a
value of 222; as I commented before, this is because the values in the first weeks are
low while the virus was settling in the country.
Once the virus had settled, the variance became greater each week, although in the last
week I observe a variance slightly lower than in the previous week.
The case of Italy is very similar to that of Spain, while China has lower variance
values.
In conclusion, analyzing the variance over the whole sample period could be misleading,
because the dispersion is very high in variables like the ones I have, so it is much
better to analyze the variance week by week.
Variance measures volatility, so the country with the highest volatility is Spain in the
three variables, mainly because of the larger difference between the first and the last
value and the shorter sample period.
Of course, the results are more reliable when analyzed weekly, not only for the variance
but also for the mean.
The mean is more representative of the sample if it is calculated for each week;
otherwise it may not represent well the first and last values of the sample.
·Weekly analysis·
China
Analyzing the data for China weekly, I can divide the contagions variable into 2 stages.
The first stage covers the first 4 weeks, when the number of confirmed cases increased
sharply; in the second stage, the last 4 weeks, the confirmed cases decrease day by day
and remain constant near 0.
The recovered cases variable cannot be divided into 2 stages in the same way, because
when the number of contagions starts to decrease, the number of recovered people keeps
increasing: people infected in previous weeks recover one or two weeks after they caught
the virus. In general, during the first 4-5 weeks the number of recovered people
increases at the same pace as the confirmed cases, but in the last 2-3 weeks the
recovered cases decrease significantly because there are no new confirmed cases.
To finish with China, the deaths variable reaches its highest average deaths per day in
week 3 and then starts to decrease because of the good management of the pandemic by the
Chinese government.
Italy
The contagions variable in Italy shows a continuous increase in its variance over the
weeks up to the last week, when the variance remained high but decreased slightly.
Looking at the recovered cases, I can see the relation between the variances of the
contagions and recovered variables. Once the contagions become constant and close to 0,
the recovered cases continue to increase and their dispersion grows. So, as the
contagions variance increases continuously up to the last week, the recovered variable
increases steadily in every week.
The deaths variable follows the same trend as the recovered cases, increasing sharply in
the first weeks; in the last 2 weeks the deaths keep increasing, but not at the growth
rate of the previous weeks.
Spain
First, the contagions variable in Spain shows the highest increase among the 3
countries. I only have 4 weeks, so it is difficult to analyze, but the contagions
increased less in the first week than in the others, and the massive increase in the
last 2 weeks makes the variance much higher.
Secondly, the recovered cases in Spain follow the same trend and evolution as the
contagions variable, only with a short delay because of the minimum time that people
need to recover from the virus. The recovered cases increase week by week, and I expect
the variable to continue to increase in the following weeks.
To finish with Spain, the deaths variable is perhaps the variable with the highest
dispersion in Spain and in the three countries. The weekly data show that the average
deaths per week increase over the weeks and will keep increasing in the coming weeks if
the confirmed cases also increase.
·Chinese experience·
I think that the Chinese experience is not valuable for Spain and Italy, basically
because the sample period is different and because China managed the virus much better
than the Mediterranean countries.
My opinion would change if the question were whether Italy could be a good model for
Spain: my answer would be yes, mainly because Spain and Italy are two very similar
countries and because Spain saw what was happening in Italy before the virus arrived in
Spain. But Spain and Italy could not fully assess the situation in China, because of the
absence of information and reliable data.
Regarding the question about any unusual observation in the data for China, the only
observation that seems unusual to me is one value of the recovered cases variable: on
2/21/2020, 9181 people recovered from the coronavirus in Hubei. The odd feature of this
value is that if I exclude it when calculating the average recovered cases for that
week, the average is 2204 recovered cases per day, but if I include that extreme value,
the mean of the week is 3201.
In conclusion, this is why economists have to be careful with random variables and with
peculiar values in a series: the average recovered cases per day increases by about 1000
units when that value is included, so it is difficult to believe that this value, which
covers only 1 day, does not distort the whole weekly average.
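The effect of that single observation on the weekly mean can be reproduced from the
figures quoted above. This is a sketch under two assumptions of mine: the week has 7
daily observations, and 2204 is the mean of the other 6 days.

```python
# Figures taken from the text.
mean_without_peak = 2204   # average of the other 6 days of the week (assumed)
peak = 9181                # recovered cases reported on 2/21/2020

# Total for the week = six ordinary days plus the peak day.
week_total = 6 * mean_without_peak + peak
mean_with_peak = week_total / 7

print(round(mean_with_peak))                      # 3201
print(round(mean_with_peak - mean_without_peak))  # 997
```

One extreme day is therefore enough to move the weekly average by almost 1000 cases,
which is why it can help to report the mean both with and without such a value.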
3. ALTERNATIVE MODELS
Before starting the prediction of the variables analyzed in point 2, I have to introduce
the simplest and most widely used models in time series prediction and econometrics.
Econometric models are constructed from economic data with the aid of the techniques of
statistical inference. They are usually based on economic theories that assume
optimizing behavior on the part of economic agents.
-MA model: The MA representation is known as a moving average because it is a weighted
sum of shocks, and it is “moving” because the shocks are different in each period.
In theory, an MA(q) is a moving average model of order q, in which the dynamics of the
process are a linear function of the last q innovations.
The parameters of the MA model are called theta, and there are as many parameters as the
order q.
-AR model: The autoregressive model predicts future behavior based on past behavior.
It is used for forecasting when there is some correlation between values in a time
series and the values that precede and succeed them.
Only past data are used to model the behavior, hence the name autoregressive: the
process is basically a linear regression of the data in the current series against one
or more past values of the same series.
In theory, an AR(p) is an autoregressive model of order p, in which the dynamics of the
process are a linear function of the last p observations.
The parameters of the AR model are called phi, and there are as many parameters as the
order p.
-ARMA model: The autoregressive-moving-average ARMA(p,q) models provide a
parsimonious description of a stationary stochastic process in terms of two
polynomials, one for the autoregression AR and the second for the moving average
MA.
-White noise: A stochastic process characterized by a lack of autocorrelation at any
displacement.
Basically, it is a random signal with equal intensity at every frequency, and it is
often defined in statistics as a signal whose samples are a sequence of uncorrelated
random variables with zero mean and finite variance. In some cases, it may be required
that the samples are independent and identically distributed.
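In standard textbook notation, the four processes described above can be written as
follows. The symbols y_t (the series), c (a constant), ε_t (the innovation or shock) and
σ² (its variance) are notation that I assume here; the text itself only names the
parameters theta and phi. The MA terms are written with a plus sign, which is one common
convention.

```latex
\begin{align*}
\text{MA}(q):\quad & y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \\
\text{AR}(p):\quad & y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t \\
\text{ARMA}(p,q):\quad & y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{White noise}:\quad & E[\varepsilon_t] = 0,\quad \operatorname{Var}(\varepsilon_t) = \sigma^2 < \infty,\quad
\operatorname{Cov}(\varepsilon_t, \varepsilon_s) = 0 \ \text{for } t \neq s
\end{align*}
```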
4. PREDICTION
Before starting the prediction of the variables, I have to introduce the prediction
process theoretically and explain briefly how I choose the best model for each variable
and how I calculate and interpret all the results in EViews.
I split the whole prediction process into 4 substages, which I find clearer and more
practical.
·Identification·
First, I must identify the model that best fits the series, so the first task is to see
whether the series has only a regular part or also a seasonal part.
As I have learned in my training as an econometrician, with non-stationary data it would
in principle not be possible to work with the MA and AR models. However, the
consequences of using non-stationary models for short-run forecasts are of little
practical relevance in econometric terms: they become important only when the number of
observations tends to infinity (∞).
In this situation, with COVID-19, I have a relatively small sample to work with, so
non-stationarity is not going to have relevant consequences for the prediction process.
Within the identification stage there are also 4 steps.
-First step: I have to study whether the variable (series) is stationary in terms of
variance. This step is very simple: I just have to check the graph of the series and see
whether the variable follows a constant trend and whether it is homogeneous.
If the graph is homogeneous and follows a constant trend (upward or downward), the
variable is stationary in terms of variance.
Otherwise, if the graph shows a marked positive trend that is not homogeneous, for
example constant at first and then increasing in the last part of the graph, the series
is heterogeneous rather than homogeneous.
If the graph of the series is not homogeneous, I apply logarithms to the variable.
Why do I apply logarithms?

In econometrics, applying logarithms to the equation, or to the series in my case, has
the objective of stabilizing the regressors and parameters, reducing the influence of
atypical observations, and providing a different point of view for the estimation.
So, if my series is not stationary in terms of variance, I apply logarithms to it. The
effects of applying logarithms are shown in the practical predictions. The result is
that the graph of the series becomes more homogeneous and the trend more constant over
the sample period.
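The effect of the logarithm can be seen on a simple hypothetical series. The sketch
below uses an invented series that grows by 25% per day (not the COVID-19 data) and
compares the day-to-day change in levels with the day-to-day change in logarithms.

```python
import math

# Hypothetical series growing by 25% per day; NOT the COVID-19 data.
series = [20 * 1.25 ** t for t in range(21)]
log_series = [math.log(y) for y in series]

def daily_changes(values):
    """Return the day-to-day differences of a series."""
    return [b - a for a, b in zip(values, values[1:])]

raw_changes = daily_changes(series)
log_changes = daily_changes(log_series)

# In levels the daily change explodes; in logs it is (almost exactly) constant.
print(f"levels: first change {raw_changes[0]:.2f}, last change {raw_changes[-1]:.2f}")
print(f"logs:   first change {log_changes[0]:.4f}, last change {log_changes[-1]:.4f}")
```

In levels the changes become larger and larger, while in logarithms every change equals
the logarithm of the growth factor, so the transformed series has a constant trend and a
far more homogeneous graph.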
-Second step: Next, I study whether the series is stationary in terms of the mean. The
mean is the average of the values of the variable over the sample period.
As I have applied logarithms, I must look at the graph of the log series to check
whether the variable is stationary in terms of mean. This can often be judged with the
naked eye. The graph must stay close to a representative mean. It is irrelevant whether
values lie above or below the mean; the idea of a variable being stationary in terms of
mean is that its values fluctuate around the mean.
Generally, series like the ones I have are not stationary in terms of mean, but I must
look for more empirical results to answer the question.
I ask EViews for the correlograms of the Autocorrelation Function (ACF) and the Partial
Autocorrelation Function (PACF) and analyze especially their first values. A random
variable would not be stationary in terms of mean if it has an ACF that decays slowly
over many lags and only 1, but large, value of the PACF. This last condition is
sufficient but not necessary.
·Autocorrelation Function (ACF): Collection of correlation coefficients between any two
random variables in the stochastic process that are k periods apart, for k=1,2,…
·Partial Autocorrelation Function (PACF): The partial autocorrelation function gives the
partial correlation of a time series with its own lagged values, after controlling for
the values of the time series at all shorter lags.
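The ACF definition can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, the trending series is made up, and the band of approximately ±2/√T is my understanding of how the significance bands are usually drawn, not something taken from the EViews output of this work.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def significance_band(n):
    """Approximate two-standard-error band for a sample of size n."""
    return 2 / math.sqrt(n)

# Hypothetical trending series: its ACF starts high and decreases slowly.
trend = list(range(1, 11))
acf = sample_acf(trend, 3)
print(acf, significance_band(len(trend)))
```

A series with a clear tendency, like the one above, gives a first autocorrelation well above the band, which is the pattern I look for when deciding that a difference is needed.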
I have another alternative to know if the variable is stationary in terms of mean. I am
going to ask EViews for the Augmented Dickey-Fuller statistic.
·Augmented Dickey-Fuller statistic: Test statistic to assess the presence of unit roots by
also considering the autocorrelation of the data.
So, when EViews gives me the result of the Augmented Dickey-Fuller test, I am going to
carry out a hypothesis test where:
·Null Hypothesis: The variable has a unit root
·Alternative Hypothesis: The variable does not have a unit root
With a confidence level of 95%, and hence a significance level of 5%, I am going to
reject or not reject the Null Hypothesis by comparing the p-value of the test with 0.05.
If I cannot reject the Null Hypothesis (p-value greater than 0.05), the variable has a
unit root, which means that the variable is not stationary in terms of mean.
In that case, as I want the variable to be stationary with respect to the mean, I need
to transform the series.
To convert my series into one that is stationary with respect to the mean, I need to
take regular differences. Why regular differences?
As I have already taken logarithms of the series (which stabilize the variance), taking
one regular difference transforms the series into one with a more representative mean,
which is around 0.
·1st difference: Difference between the value of a random variable and its 1-period
lagged value.
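As an illustration, the logged first difference can be computed in Python as below. The function name `log_diff` is hypothetical; the five values are the real contagion figures for 3/21/2020 to 3/25/2020 reported later in this work. Each logged difference is approximately the daily growth rate.

```python
import math

def log_diff(series):
    """First regular difference of the logged series: log(x_t) - log(x_{t-1})."""
    logs = [math.log(x) for x in series]
    return [b - a for a, b in zip(logs, logs[1:])]

# Real contagion values for 3/21/2020 to 3/25/2020 (from the tables of this work).
contagions = [25374, 28768, 35136, 39885, 49515]
dlog = log_diff(contagions)
print([round(d, 3) for d in dlog])
```

The differenced series has one observation less than the original, and its values fluctuate around a small positive number instead of growing without limit.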
I am going to take that first difference and then check again if the series still has a
unit root. So, once I have taken logarithms and one regular difference, I ask
EViews for the Augmented Dickey-Fuller test and I test again:
·Null Hypothesis: The variable has a unit root
·Alternative Hypothesis: The variable does not have a unit root
If the Augmented Dickey-Fuller test shows that the variable still has a unit root, I
must take a second regular difference and repeat the test. Usually it is not
necessary to take more than 2 regular differences.
After this, my series is stationary with respect to both the variance and the mean.
-Third step: Identify the best model for my data series. To choose it, the following
theoretical comparative table is very useful. The A.C.F and P.A.C.F columns show the
number of significant values expected in each correlogram (∞ means values that
decrease gradually instead of cutting off).
              STATIONARITY   INVERTIBILITY   A.C.F   P.A.C.F
MA(1)         YES            CHECK           1       ∞
MA(2)         YES            CHECK           2       ∞
AR(1)         CHECK          YES             ∞       1
AR(2)         CHECK          YES             ∞       2
ARMA(p,q)     CHECK          CHECK           ∞       ∞
WHITE NOISE   YES            YES             0       0
I only have to ask EViews for the correlogram of the logged series with
the first or second (if necessary) regular difference.
The correlogram gives me the significant values of the ACF and PACF, which are the
ones that surpass the significance bands. So, I just have to look at the correlogram and,
with the table, choose the best model for my series.
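The way I read the table against a correlogram can be sketched as a rough heuristic in Python. Everything here is an assumption for illustration: the function name `suggest_models`, the band of approximately ±2/√T and the example autocorrelations are hypothetical, and the final choice is still made by looking at the correlogram and running the diagnosis.

```python
def suggest_models(acf, pacf, n):
    """Suggest candidate models from the lags that surpass the bands.

    acf, pacf: lists with the values at lags 1, 2, ...
    n: number of observations, used for the approximate band 2/sqrt(n).
    """
    band = 2 / n ** 0.5
    sig_acf = [k for k, r in enumerate(acf, start=1) if abs(r) > band]
    sig_pacf = [k for k, r in enumerate(pacf, start=1) if abs(r) > band]
    if not sig_acf and not sig_pacf:
        return ["white noise"]
    candidates = []
    if sig_acf:
        # ACF cutting off after lag q points to an MA(q).
        candidates.append(f"MA({max(sig_acf)})")
    if sig_pacf:
        # PACF cutting off after lag p points to an AR(p).
        candidates.append(f"AR({max(sig_pacf)})")
    return candidates

# Hypothetical correlogram values (not taken from the EViews output).
print(suggest_models([0.05, -0.10], [0.04, 0.02], 25))
print(suggest_models([0.6, 0.5, 0.1], [0.6, 0.45, 0.05], 100))
```

With few observations the bands are wide, so small samples tend to be classified as white noise, which is what happens in the first predictions of each variable.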
With this last step, I have finished the identification substage, so now I have the
2 or 3 candidate models that I think are going to fit the data best, and I can check them
through the diagnosis process.
·Estimation·
This substage is very simple and quick: I just ask EViews to estimate the equation
for the logged series with the necessary regular differences,
and I include in the equation the terms of the model I think best predicts the data.
Estimate: Specific value of the estimator based on sample information. Each
estimation gives me one estimate per parameter.
Estimation: Branch of statistical inference that aims to calculate the parameters of a
population model based on sample information.
For example: d(log(series),1) c AR(1)
I only include the constant in the equation if it is significant; if it is not, I can exclude it.
Once I have estimated all my alternative models for the data, I compare
them with each other in the diagnosis substage. This diagnosis is going to show me which
model is the best for my data series.
·Diagnosis·
I have different methods to check the model:
-Significant parameters: The MA, AR and ARMA models each have their own
parameters: for the Moving Average model (MA), the parameter theta (Ɵ);
for the Autoregressive model (AR), the parameter phi (ɸ); and for the
Autoregressive Moving Average model (ARMA), both.
These parameters are supposed to be significant. If they are, the model will predict
the data series well. If not, the prediction is not going to be as precise as I want it to be.
I carry out a hypothesis test for each parameter where:
·Null Hypothesis: The parameter is equal to zero (it is not significant)
·Alternative Hypothesis: The parameter is different from zero (it is significant)
The parameter is significant if its p-value is below 0.05.
-The residuals are white noise: Another important check in the diagnosis substage is
that the residuals of the model are white noise. I can check that assumption in two
ways. First, I look at the correlogram of the residuals and check that no value of the ACF
or PACF surpasses the significance bands. Second, I can check the Q-stat, whose p-values
have to be greater than 0.05 in order to confirm that the residuals are white noise.
Q-stat: Test to assess the joint statistical significance of several autocorrelation
coefficients.
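A minimal sketch of the Q-stat in Python, assuming the Ljung-Box form of the statistic, which is what I understand EViews reports. The function names and the autocorrelation values are hypothetical. To keep the sketch free of external libraries, the p-value helper only covers an even number of degrees of freedom, where the chi-square tail has a closed form; it also ignores the adjustment of the degrees of freedom that applies to residuals of an estimated ARMA model.

```python
import math

def ljung_box_q(autocorrelations, n):
    """Ljung-Box Q for lags 1..m, given r_1..r_m and the sample size n."""
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(autocorrelations, start=1)
    )

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical residual autocorrelations at lags 1 and 2, with 50 observations.
q = ljung_box_q([0.1, -0.2], 50)
p_value = chi2_sf_even_df(q, 2)
print(q, p_value)  # a p-value above 0.05 is compatible with white noise
```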
-Stationarity and Invertibility
·Stationarity: A stationary time series is one whose statistical properties, such as mean,
variance and autocorrelation, are all constant over time. It is the property to check in AR models.
·Invertibility: Property of an MA model that guarantees an equivalent AR
representation in which the present is a function of past information.
I must check if my MA models are invertible, because I know that an MA model is
always stationary.
And I must check if my AR models are stationary, because I know that an AR
model is always invertible.
So, to check if my models are stationary or invertible, I am going to look at the
“Inverted MA/AR Roots” and check whether the roots are lower than 1 in modulus:
·If all the inverted AR roots are lower than 1, the model is stationary
·If all the inverted MA roots are lower than 1, the model is invertible
So, I check the inverted roots and decide if the models are stationary or invertible.
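For an AR(2), the inverted roots can be sketched in Python as the solutions of z² − ɸ1·z − ɸ2 = 0. The function names and the coefficients below are hypothetical and are not the estimates of this work; the point is only to show that complex roots are compared with 1 through their modulus.

```python
import cmath

def inverted_ar2_roots(phi1, phi2):
    """Inverted roots of an AR(2): solutions of z**2 - phi1*z - phi2 = 0."""
    disc = cmath.sqrt(phi1 * phi1 + 4 * phi2)
    return ((phi1 + disc) / 2, (phi1 - disc) / 2)

def is_stationary(roots):
    """The model is stationary when every inverted root has modulus below 1."""
    return all(abs(r) < 1 for r in roots)

# Hypothetical coefficients giving a pair of complex roots 0.5 ± 0.5i.
roots = inverted_ar2_roots(1.0, -0.5)
print(roots, [abs(r) for r in roots], is_stationary(roots))

# Hypothetical coefficients giving one real root above 1.
print(is_stationary(inverted_ar2_roots(1.2, 0.1)))
```

The same check applies to the inverted MA roots to decide invertibility.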
-Residuals normality: The last step in the diagnosis process is to check if the residuals
are distributed as a normal random variable. For this I use the Jarque-Bera test.
·Jarque-Bera: The Jarque-Bera test is a goodness-of-fit test of whether sample data have
the skewness and kurtosis matching a normal distribution.
I ask EViews for the histogram of the residuals of the model and I obtain the Jarque-Bera
test. Once I have it, I carry out a hypothesis test:
·Null Hypothesis: The residuals follow a normal distribution
·Alternative Hypothesis: The residuals do not follow a normal distribution
I do not reject that the residuals follow a normal distribution when the p-value of the
Jarque-Bera test is greater than 0.05.
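A minimal sketch of the Jarque-Bera statistic in Python, under the usual textbook formula JB = n/6 · (S² + (K − 3)²/4), where S is the skewness and K the kurtosis. The function name and the residuals are hypothetical. The p-value uses the fact that the tail probability of a chi-square with 2 degrees of freedom is exp(−JB/2).

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, p-value) using the chi-square(2) approximation."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)

# Hypothetical residuals (illustration only).
jb, p_value = jarque_bera([-2, -1, 0, 1, 2])
print(jb, p_value)  # normality is rejected only when the p-value is below 0.05
```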
-Akaike and Schwarz criterion: If the diagnosis of the alternative models does not
make clear which is the best one to predict the data series, I can appeal to the Akaike
and Schwarz criteria.
·Information criteria (AIC, SIC): Measures to select the best time series models by
minimizing the residual variances but considering a penalty function to compensate for
irrelevant regressors.
I am going to choose the model with the lower information criteria.
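As a sketch, both criteria can be written in Python in the per-observation form that, as far as I know, EViews reports: AIC = −2ℓ/T + 2k/T and SIC = −2ℓ/T + k·ln(T)/T, where ℓ is the log likelihood, k the number of estimated parameters and T the number of observations. The function names and the numbers below are hypothetical.

```python
import math

def aic(loglik, k, n):
    """Akaike criterion per observation (assumed form)."""
    return -2 * loglik / n + 2 * k / n

def sic(loglik, k, n):
    """Schwarz criterion per observation (assumed form)."""
    return -2 * loglik / n + k * math.log(n) / n

# Hypothetical log likelihoods for two competing models with 40 observations.
model_a = (aic(-50.0, 3, 40), sic(-50.0, 3, 40))
model_b = (aic(-49.5, 4, 40), sic(-49.5, 4, 40))
print(model_a, model_b)  # the model with the lower values is preferred
```

The Schwarz criterion penalizes each extra parameter more heavily than the Akaike criterion as soon as T is above 7, so the two criteria can disagree when the models differ in size.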

·Prediction·
I am going to predict the three variables of Spain, regarding different sample periods,
so in total I am going to predict 6 times each variable.
5. ·CONTAGIONS·
Sample range from 2/26/2020 to 3/20/2020
Stationarity with respect to the variance
If I take logarithms of the series, the series is more stationary with respect to the
variance. I can also see whether the series needs logarithms by looking at the
correlogram of the series.
Stationarity with respect to the mean
As the Dickey-Fuller test shows that the variable has a unit root, I must take one regular
difference to make the series stationary with respect to the mean.
I reject the Null Hypothesis, so the differenced variable does not have more unit roots.
Now I can identify the model properly.
As I can see in the correlogram of the logged series with one regular difference,
no AR or MA model is suggested for the series. Why? Because the ACF and PACF values
do not surpass the significance bands.
Therefore, the differenced series is white noise, and it makes sense because the
sample is very small to try to model a random variable. I am going to show during the
research that, as the samples get bigger, the models become more reliable and the
predictions more precise.
Since the differenced series is white noise, the series is predicted as a Random Walk.
A Random Walk is a mathematical object, known as a stochastic or random process,
that consists of a succession of random steps. In a Random Walk, as the changes between
values are random, the predictions are based on the last available value
of the sample.
So, the predictions for the next 5 days with the sample from 2/26/2020 to 3/20/2020 are:
DATE        REAL VALUE   PREDICTION   ERROR
3/21/2020   25374        20410        24%
3/22/2020   28768        20410        41%
3/23/2020   35136        20410        72%
3/24/2020   39885        20410        95%
3/25/2020   49515        20410        143%
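The table above can be reproduced with a short Python sketch. The function name `naive_forecast` is hypothetical; the figures are the ones in the table. Note that the percentages in this table correspond to the error divided by the prediction, which is the convention the sketch follows.

```python
def naive_forecast(last_value, horizon):
    """Random Walk forecast: repeat the last available observation."""
    return [last_value] * horizon

# Real values for 3/21/2020 to 3/25/2020 and last observation of the sample.
real = [25374, 28768, 35136, 39885, 49515]
forecast = naive_forecast(20410, len(real))

# Percentage error relative to the prediction, rounded as in the table.
errors_pct = [round(100 * (r - f) / f) for r, f in zip(real, forecast)]
print(errors_pct)
```

Because the prediction stays fixed while the real series keeps growing, the error necessarily widens with the horizon.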

Stationarity with respect to the variance
I can see that once I take logarithms of the series, it becomes a more
homogeneous series with a constant tendency.
Stationarity with respect to the mean
Analyzing both the correlogram and the Dickey-Fuller test, I can confirm that the
variable has a unit root and that it needs a regular difference to be stationary with
respect to the mean.
I check now whether the logged series with one regular difference has a unit root or
not. I reject the Null Hypothesis because 0.0000<0.05, so the series does
not have more unit roots.
To identify the model, I check the correlogram of the logged series
with one regular difference.
As in the last sample period, the differenced series in this sample period is
white noise because neither the ACF nor the PACF surpasses the
significance bands in the correlogram.
White noise is predicted with a Random Walk, where the predictions are
equal to the last available value of the sample.
DATE        REAL VALUE   PREDICTION   ERROR
3/26/2020   57786        49515        17%
3/27/2020   65719        49515        33%
3/28/2020   73235        49515        48%
3/29/2020   80110        49515        62%
3/30/2020   87956        49515        78%
Obviously, the forecast error increases the longer the prediction horizon is.
Forecast Error: Difference between the realized value of the variable of interest and its
prediction.

Stationarity with respect to the variance
I can see that, after taking logarithms of the series, it is more homogeneous and has a
constant positive tendency.
Looking at the correlogram of the logged series and the Augmented
Dickey-Fuller test, I cannot reject the Null Hypothesis, which confirms that the variable
has a unit root, so I am going to take one regular difference and check the
Dickey-Fuller statistic again.
Now the series does not need any more regular differences because
the p-value of the Augmented Dickey-Fuller test is lower than 0.05.
After checking that the variable does not need more regular differences, I can go to the
identification of the models, looking at the correlogram of the logged series with one
regular difference.
This series is white noise because neither the ACF nor the PACF
values surpass the significance bands of the correlogram.
As with the last 2 sample periods, it is normal that the first predictions of each variable
are white noise because the samples contain few observations.
The prediction for white noise is the Random Walk, where the prediction for the next
5 days is the last observation available in the sample.
DATE        REAL VALUE   PREDICTION   ERROR
3/31/2020   95923        87956        9%
4/1/2020    104118       87956        18%
4/2/2020    112065       87956        27%
4/3/2020    119199       87956        36%
4/4/2020    126168       87956        43%
An interesting observation: if I compare the 3 predictions of the variable contagions
over the 3 different sample periods, I can see that the prediction error
decreases the longer the sample period is.

Stationarity with respect to the variance
As I can observe in the series graph, the series is not homogeneous, so I
applied logarithms to the series and now the tendency is more homogeneous and
constant.
Looking at the correlogram of the logged series, I can see that the series needs a regular
difference, because the correlogram has a slowly decreasing ACF and only one
very high value of the PACF. The Augmented Dickey-Fuller statistic confirms that the
series needs one regular difference because it has one unit root.
Checking the Augmented Dickey-Fuller statistic again after taking
1 regular difference, I can see that the variable does not need any
additional difference.
So, I can go to the identification of the model.
According to the correlogram, the most probable models are MA(2) or
AR(2). I am going to test both in the diagnosis process to choose
the model that best fits the series.
MA(2) DIAGNOSIS
Looking at the statistics of the estimated equation, the parameters Ɵ1
and Ɵ2 are not significant: in both cases I cannot reject the Null Hypothesis that the
parameter is zero because the p-values are greater than 0.05. So the model does not
pass the first part of the diagnosis, which is to have significant parameters. I am not
going to eliminate the non-significant parameters because, if I did, I would commit
the error of data mining.
Analyzing the stationarity and the invertibility, I know that MA models are always
stationary, so what I am going to check is whether this one is invertible.
Both inverted MA roots are lower than 1. They are complex numbers, but I
calculated their modulus and it is lower than 1, so the MA(2) is stationary and invertible.
Residuals
Checking the residuals, I must check if they are white noise and if they follow a
normal distribution.
The residuals are white noise because neither the ACF nor the PACF values surpass the
significance bands in the correlogram.
The residuals are white noise also because the p-values of the Q-stat are all greater than 0.05.
The residuals of the model do not follow a normal distribution because I have to reject
the Null Hypothesis, as the p-value of the Jarque-Bera test is lower than 0.05.
AR(2) DIAGNOSIS
The parameter ɸ1 is not significant while the parameter ɸ2
is. So, I am not going to eliminate the non-significant parameter.
Stationarity and Invertibility
An AR model is always invertible, so I check whether the model is stationary.
I look at the inverted AR roots and both roots are lower than 1, so the
AR(2) model is stationary and invertible.
Residuals
The residuals should be white noise and follow a normal distribution in order to pass
the diagnosis process.
The residuals are white noise because neither the ACF nor the PACF values surpass the
significance bands and also because the p-values of the Q-stat are all greater than 0.05.
However, the residuals do not follow a normal distribution because the Jarque-Bera
p-value is lower than 0.05.
DIAGNOSIS COMPARISON
                             MA(2)   AR(2)
SIGNIFICANT PARAMETERS       NO      YES
STATIONARY AND INVERTIBLE    YES     YES
RESIDUALS WHITE NOISE        YES     YES
NORMALITY                    NO      NO
As both models are very similar in the diagnosis, except that the AR(2) has significant
parameters, I am also going to use the information criteria.
The AR(2) has lower values in the Akaike Info Criterion and in the Schwarz Criterion, so I
am going to choose the AR(2) model to predict the sample.
Prediction
DATE       PREDICTION   REAL VALUE   ERROR       %
4/5/2020   147974.050   131848       16328.05    12.4%
4/6/2020   173158.486   136675       36483.48    26.7%
4/7/2020   212423.629   141942       70481.63    49.7%
4/8/2020   260322.592   148220       112102.59   75.6%
4/9/2020   326026.434   153222       172804.43   112.8%
The predictions are more or less precise in the first 2 days, but the further ahead the
day I want to predict, the greater the forecast error.
Interval forecast: Collection of forecasts enclosed between a lower and upper bound.
4/5/2020   (93589.88, 202358.21)
4/6/2020   (77106.93, 269210.03)
4/7/2020   (37582.31, 387264.94)
4/8/2020   (-13499.90, 534145.08)
4/9/2020   (-95763.98, 747816.85)
So, the model used to predict this sample series was an ARIMA(2,1,0).
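To make clear how a model of this kind produces a prediction in levels, here is a minimal Python sketch of the recursion for an AR(2) on the logged first difference. It is not the EViews procedure: the function name `forecast_levels`, the coefficients and the short series are hypothetical, and the sketch ignores the forecast interval.

```python
import math

def forecast_levels(levels, c, phi1, phi2, horizon):
    """Forecast levels from an AR(2) fitted to d(log(series)).

    c, phi1, phi2 are assumed to be already estimated elsewhere.
    """
    if len(levels) < 3:
        raise ValueError("at least 3 observations are needed for two lags")
    logs = [math.log(x) for x in levels]
    dlog = [b - a for a, b in zip(logs, logs[1:])]
    last_log = logs[-1]
    forecasts = []
    for _ in range(horizon):
        # One-step forecast of the differenced log series.
        next_d = c + phi1 * dlog[-1] + phi2 * dlog[-2]
        dlog.append(next_d)
        # Undo the difference and then the logarithm.
        last_log += next_d
        forecasts.append(math.exp(last_log))
    return forecasts

# Hypothetical series growing 10% per period, with hypothetical coefficients.
f = forecast_levels([100, 110, 121], 0.0, 1.0, 0.0, 2)
print([round(x, 2) for x in f])
```

With all coefficients set to zero the recursion collapses to the Random Walk prediction used for the first samples, where every forecast equals the last observation.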

I take logarithms of the series in order to make it stationary with respect to the variance.
As the correlogram of the logged series and the Augmented Dickey-Fuller
test confirm, the series has a unit root, so I do not reject the Null Hypothesis and I
take one regular difference of the series.
The Augmented Dickey-Fuller statistic of the logged series with
one regular difference tells me that I must reject the Null Hypothesis,
so the series does not have more unit roots.
Identification
I identify the appropriate model in the correlogram of the logged series with one
regular difference.
As the correlogram shows, the models that are
going to represent the series best are MA(3) or AR(2).
MA(3) DIAGNOSIS
The parameters Ɵ1, Ɵ2 and Ɵ3 are not significant, so the MA(3)
model does not pass what is maybe the crucial step of the diagnosis.
Stationarity and Invertibility
MA models are always stationary, so I have to check if this MA(3) is invertible.
Looking at the inverted MA roots, as they are all lower than 1, the MA(3) is stationary
and also invertible.
Residuals
The residuals are white noise because neither the ACF nor the PACF values surpass
the significance bands, and also because all the p-values of the Q-stat are greater than
0.05.
The residuals do not follow a normal distribution, because the histogram of the
residuals shows that the p-value of the Jarque-Bera test is lower than 0.05.
AR(2) DIAGNOSIS
The parameter ɸ1 is not significant but the parameter ɸ2
is, so I do not eliminate the non-significant parameter from the
estimation and I continue with the diagnosis.
AR models are always invertible, so I should check if this one is stationary. The inverted
AR roots confirm that the AR(2) model is stationary because their values are lower than
1.
Residuals
The residuals are white noise because neither the ACF nor the PACF values surpass the
significance bands of the correlogram, and also because the p-values of the Q-stat are all
greater than 0.05.
The normality check in the histogram of the residuals shows that the Jarque-Bera
p-value is lower than 0.05, so the residuals do not follow a normal distribution.
                        MA(3)   AR(2)
RESIDUALS WHITE NOISE   YES     YES
NORMALITY               NO      NO
As the MA(3) does not have any significant parameter, I am going to choose AR(3) to predict
the sample.
Prediction
DATE        PREDICTION   REAL VALUE   ERROR        %
4/10/2020   173566.028   158273       15293.028    9.7%
4/11/2020   195540.961   163027       32513.961    19.9%
4/12/2020   231475.830   166831       64644.830    38.7%
4/13/2020   273206.478   170099       103107.478   60.6%
4/14/2020   331221.798   172541       148670.798   92.0%
4/15/2020   400891.356   177644       223247.36    125.7%
As in the last prediction, the forecast error is larger the further ahead the prediction date is.
Forecast interval
4/10/2020   (112909.02, 234223.03)
4/11/2020   (92596.99, 2988484.93)
4/12/2020   (45345.15, 417606.50)
4/13/2020   (-10544.42, 556957.37)
4/14/2020   (-99868.66, 762292.25)
4/15/2020   (-217967.92, 1019750.63)
The model used to predict this sample series was an ARIMA(3,1,0).

Stationarity with respect to the variance
I am going to take logarithms of the series in order to reduce the variance and
obtain a more homogeneous tendency.
I ask EViews for the correlogram and the Augmented Dickey-Fuller test for the logged
series.
The Augmented Dickey-Fuller test and the correlogram confirm that the logged series has a
unit root and needs one regular difference in order to make
the mean more representative over the sample.
According to the Augmented Dickey-Fuller statistic, the logged series
with one regular difference does not have any additional unit root.
So, I reject the Null Hypothesis and the series does not need any
more regular differences.
Identification
I am going to test if the model could be an AR(2), an AR(3) or an
MA(5).
MA(5) DIAGNOSIS
The estimated equation of the MA(5) is clearly not going to be
a good model to predict my sample, because none of the
theta parameters is significant,
so I am not going to continue the diagnosis with the MA(5).
AR(2) DIAGNOSIS
The parameter ɸ1 is not significant but ɸ2 is, so the AR(2)
could be an adequate model for my prediction.
An AR model is always invertible, so I must check if this AR(2) is stationary. To do it, I
look at the inverted AR roots, which are both lower than 1, so the AR(2) model is
invertible and stationary.
Residuals
The residuals are white noise, and I have two ways to know it. First, the
correlogram of the residuals does not have any ACF or PACF value that surpasses the
significance bands. Second, the p-values of the Q-stat are almost all
greater than 0.05.
The histogram of the residuals shows me if the residuals follow a normal distribution.
In this case, the Jarque-Bera p-value is not greater than 0.05, so I have to reject the Null
Hypothesis and the residuals do not follow a normal distribution.
AR(3) DIAGNOSIS
The statistics confirm that the model has 2 significant
parameters (ɸ2, ɸ3) but the parameter ɸ1 is not significant. I am not
going to eliminate it, to avoid data mining.
AR models are always invertible, so I must check if this AR(3) is stationary, so I look
at the inverted AR roots. The values are all lower than 1, so the AR(3) is stationary and
invertible.
Residuals
The residuals are white noise, looking at the important values of the correlogram, which are the
first 4-5 values. The Q-stat values confirm the assumption.
The residuals do not follow a normal distribution, as the Jarque-Bera p-value is lower
than 0.05.
DIAGNOSIS COMPARISON
                         AR(2)   AR(3)
SIGNIFICANT PARAMETERS   YES     YES
RESIDUALS WHITE NOISE    YES     YES
NORMALITY                NO      NO
As both models pass the diagnosis with very similar results, I am going to appeal
to the information criteria.
The Akaike Info Criterion and the Schwarz Criterion are lower in the AR(3) estimation,
so I choose the AR(3) to predict the sample.

DATE        PREDICTION   REAL VALUE   ERROR       %
4/16/2020   186366.595   184948       1418.595    0.8%
4/17/2020   196523.722   190839       5684.722    3.0%
4/18/2020   210206.102   191726       18480.103   9.6%
4/19/2020   226926.629   198674       28252.629   14.2%
4/20/2020   247082.338   200210       46872.338   23.4%
In the last prediction for the variable contagions, I can observe that the forecast error
in % terms is remarkably small compared with the first prediction for
the same variable.
Forecast interval
4/16/2020   (126472.87, 246260.32)
4/17/2020   (99117.33, 293930.10)
4/18/2020   (50419.66, 36992.54)
4/19/2020   (-16885.22, 470738.47)
4/20/2020   (-99597.14, 593761.81)
The chosen model was an ARIMA(3,1,0).
6. RECOVERED

I take logarithms of the series to obtain a more constant and homogeneous tendency.

The correlogram, with the ACF and PACF, and the Augmented Dickey-Fuller test confirm
that the series has a unit root, so I must not reject the Null Hypothesis and the series
needs a regular difference.
The updated Augmented Dickey-Fuller test reveals that the series does
not have more unit roots, so I must reject the Null Hypothesis
and I can identify the alternative models.
Identification
The correlogram of the logged series with one regular difference shows
that the series is white noise, because the values of the ACF and PACF
do not surpass the significance bands in any case.
Prediction
The white noise is going to be predicted as a Random Walk, where the predictions
are equal to the last available observation of the sample.
DATE        REAL VALUE   PREDICTION   ERROR
3/21/2020   2125         1588         34%
3/22/2020   2575         1588         62%
3/23/2020   2575         1588         62%
3/24/2020   3794         1588         139%
3/25/2020   5367         1588         238%
The error increases the further ahead the prediction date is.

I take logarithms of the series to obtain a more constant tendency, because the series
graph is not homogeneous.
Both the correlogram and the Augmented Dickey-Fuller test reveal that the logged series
has a unit root, so the series needs a regular difference.
The Augmented Dickey-Fuller statistic reveals that the
logged series with one regular difference does not need
any more regular differences. So,
I reject the Null Hypothesis and I
can identify the alternative models for the series.
Identification
The correlogram of the logged series with one regular difference
reveals that the series is
white noise, so I am going to
predict it as such.
White noise is predicted with a process called Random Walk, where the
predictions are equal to the last available observation of
the sample.
Prediction
DATE        REAL VALUE   PREDICTION   ERROR
3/26/2020   7015         5367         31%
3/27/2020   9357         5367         74%
3/28/2020   12285        5367         129%
3/29/2020   14709        5367         174%
3/30/2020   16780        5367         213%

Stationarity with respect to the variance
I take logarithms of the series in order to make the tendency more constant
and homogeneous.
Both the correlogram and the Augmented Dickey-Fuller statistic reveal that the series
needs at least one regular difference.
I know this because, in the correlogram of the logged series, the ACF values
decrease slowly and the PACF has only one, very high, value.
After taking one regular difference, the Augmented
Dickey-Fuller statistic tells me that
the series does not need any more
regular differences because the
p-value of the statistic is lower than
0.05.
Identification
The correlogram of the logged
series with one regular difference
reveals that the series in this
sample period is white noise,
because neither the ACF nor the PACF
values surpass the significance
bands.
Prediction
White noise is predicted with a process called Random Walk, in which the
predictions are equal to the last available observation of the sample.
DATE        REAL VALUE   PREDICTION   ERROR
3/31/2020   19259        16780        15%
4/1/2020    22647        16780        35%
4/2/2020    26743        16780        59%
4/3/2020    30513        16780        82%
4/4/2020    34219        16780        104%

As the series graph has a non-constant tendency, I take logarithms of the series to
obtain a more homogeneous series with a constant positive tendency.
The correlogram of the logged series and the Augmented Dickey-Fuller test confirm that
the series has a unit root, hence I do not reject the Null Hypothesis and, in order to
make the mean more representative over the sample, I am going to take one regular
difference of the series.
The Augmented Dickey-Fuller test reveals that the series has
another unit root, so I do not
reject the Null Hypothesis and the
series needs a second regular
difference.
After the second difference, the Augmented Dickey-Fuller test reveals that the series does
not have more unit roots, so I
reject the Null Hypothesis
and I can identify the series.
Identification
The correlogram reveals that the most
probable models for this series are an
MA(1) and an AR(3), so I am going to try
both in the diagnosis and choose one of
them for the prediction.
DIAGNOSIS MA(1)
I can observe clearly that neither the parameter Ɵ1 nor the
constant is significant, so, a priori, this
model is not going to predict the sample well, but I am
going to continue with the diagnosis.
MA models are always stationary, so I must check if the MA(1) model is invertible in order
to fulfill the condition of the diagnosis.
I look at the inverted MA root and I confirm that this MA model is invertible because the
value is lower than 1.
Residuals
The residuals are not white noise because the values of the ACF and PACF surpass the
significance bands.
The residuals are not distributed as a normal variable because the Jarque-Bera p-value is lower than
0.05.
DIAGNOSIS AR(3)
The three parameters of the AR(3) model (ɸ1, ɸ2, ɸ3) are significant.
The AR(3) model is always invertible, hence I have to check if it is stationary. As the three
inverted AR roots are lower than 1, the AR(3) is invertible and stationary.
Residuals
The residuals are white noise in the AR(3) model, because neither the ACF nor the PACF values
surpass the significance bands. Also, I know that the residuals are white noise because the
p-values of the Q-stat are almost all greater than 0.05.
                                 MA(1)   AR(3)
STATIONARITY AND INVERTIBILITY   YES     YES
RESIDUALS WHITE NOISE            NO      YES
NORMALITY                        NO      NO
I am going to choose the AR(3) to predict my sample.
Prediction
DATE       PREDICTION   REAL VALUE   ERROR       %
4/5/2020   37806.619    38080        -273.380    -0.7%
4/6/2020   41260.285    40437        823.286     2.0%
4/7/2020   44569.033    43208        1361.033    3.1%
4/8/2020   47723.118    48021        -297.881    -0.6%
4/9/2020   50713.692    52165        -1451.307   -2.8%
As the model passes the diagnosis very well, the predictions in this case are very precise.
Forecast interval
4/5/2020   (36945.17, 38668.06)
4/6/2020   (39147.42, 43373.14)
4/7/2020   (40791.50, 48346.55)
4/8/2020   (41884.26, 53561.97)
4/9/2020   (42424.98, 59002.40)
The model chosen to predict this sample series was an ARIMA(3,2,0), since two regular differences were taken.

I take logarithms of the series in order to make the tendency more
constant and more homogeneous.
Both the correlogram of the logged series and the Augmented Dickey-Fuller statistic
reveal that the series has a unit root and hence needs one regular difference.
So, I cannot reject the Null Hypothesis yet.
After the first difference, the Augmented Dickey-Fuller test reveals that the
series needs another regular
difference.
The updated Augmented
Dickey-Fuller test reveals that the
logged series with two regular
differences does not need
any more. So, I reject the Null
Hypothesis.
Identification
The models that will
probably fit the series best are MA(1)
and AR(3).
I am going to test both models in the diagnosis and choose the best one for the
prediction.
DIAGNOSIS MA(1)
The results of the estimation with
the MA(1) are not very positive:
neither the constant (c) nor the parameter Ɵ1 is
significant.
The MA(1) is always stationary because it is an MA model, but I have to check if it is
invertible. The inverted MA root is greater than 1, so the MA(1) model is not
invertible.
Residuals
Although the p-values of the Q-stat are all greater than 0.05, one value of
the ACF and one of the PACF are significant enough not to consider the
residuals white noise.
The residuals do not follow a normal distribution because the p-value of the Jarque-Bera test
is lower than 0.05.
DIAGNOSIS AR(3)
The three parameters (ɸ1, ɸ2, ɸ3)
are significant, which is a very
good sign for predicting the sample.
The AR(3) model is always invertible but I have to check if it is stationary. The inverted
AR roots are all lower than 1, which means that the model is
stationary and invertible.
Residuals
The residuals are white noise. The AR(3) residuals do not follow a normal distribution
because the Jarque-Bera p-value is lower than 0.05.
                                 MA(1)   AR(3)
STATIONARITY AND INVERTIBILITY   NO      YES
RESIDUALS WHITE NOISE            NO      YES
NORMALITY                        NO      NO
I am going to choose the AR(3) model.
Prediction
DATE        PREDICTION   REAL VALUE   ERROR      %
4/10/2020   56176.761    55668        508.761    0.9%
4/11/2020   60019.832    59109        910.832    1.5%
4/12/2020   63688.932    62391        1297.932   2.1%
4/13/2020   67177.868    64727        2450.868   3.8%
4/14/2020   70481.237    67504        2977.237   4.4%
4/15/2020   73594.328    70853        2741.32    3.9%
Forecast interval
4/10/2020   (54943.59, 57409.92)
4/11/2020   (57253.66, 62786.00)
4/12/2020   (59002.71, 68375.14)
4/13/2020   (60215.84, 74139.89)
4/14/2020   (60906.61, 80055.85)
4/15/2020   (61083.11, 86105.54)
So, the model used to predict this sample series was an ARIMA(3,2,0), since two regular differences were taken.

I log the series of the sample in order to have a more homogeneous and constant
tendency.
Both images, one from the correlogram and the other form the Augmented Dickey-
Fuller statistic reveals that the series has one unit roots and needs one regular
difference.
51

The Augmented Dickey-Fuller statistic of the logged series with one regular difference
reveals that the series needs another regular difference in order to have a more
representative mean.
With the two regular differences, the Augmented Dickey-Fuller statistic reveals that the
series has no more unit roots, so I can reject the Null Hypothesis.
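The transformation described above (logarithms followed by regular differences) can be sketched as follows. The series is hypothetical: it grows at a constant 10% daily rate, so one difference of the logs is already constant. The sketch only builds the transformed series; it does not run the Augmented Dickey-Fuller test.

```python
import math

def difference(series):
    """Regular difference: x_t - x_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical cumulative counts growing 10% per day (not data from the paper)
series = [1000 * 1.1 ** t for t in range(8)]

logged = [math.log(x) for x in series]
first_diff = difference(logged)       # one regular difference of the logs
second_diff = difference(first_diff)  # two regular differences of the logs
```

Each regular difference shortens the series by one observation, which matters when the sample is as small as the ones used here.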
Identification
Looking at the correlogram, the best candidate models to predict this sample are
ARMA(1.1) and AR(3).
ARMA(1.1) DIAGNOSIS
The results of the estimation are very clear: neither the constant nor the parameters
are significant, so this model is not going to work well in the prediction.

The ARMA(1.1) model is stationary because the Inverted AR Root is lower than 1, but it is
not invertible because the Inverted MA Root is not lower than 1.
Residuals
The residuals are not white noise because one value of the ACF and one of the PACF
surpass the significance bands.
The residuals of the ARMA(1.1) model do not follow a normal distribution because the
p-value of the Jarque-Bera test is not higher than 0.05.
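As a hedged sketch of the white-noise check on the residuals, the code below computes the sample ACF, flags the lags that surpass the approximate 95% significance bands (plus or minus 1.96 divided by the square root of the sample size) and accumulates the Ljung-Box Q statistic. The residuals are hypothetical, and neither the PACF nor the p-value of the Q-stat is computed here.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    c0 = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% bands."""
    band = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > band]

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic accumulated up to max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(acf(series, max_lag), start=1)
    )

# Hypothetical residuals that alternate in sign, so they are clearly not white noise
residuals = [1.0, -1.0] * 10
flagged = significant_lags(residuals, max_lag=2)
q_stat = ljung_box_q(residuals, max_lag=2)
```

When no lag is flagged and the Q-stat is small, the residuals can be treated as white noise; a single value outside the bands is already a warning sign, as in the diagnosis above.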
DIAGNOSIS AR(3)
In the AR(3) model the parameters ɸ1 and ɸ2 are significant but the parameter ɸ3 is
not. However, I am not going to eliminate it, to avoid data mining.

The AR(3) model is always invertible, so I must check whether the model is stationary.
The Inverted AR Roots are all lower than 1, so the AR(3) model is invertible and
stationary.
Residuals
The residuals are not white noise because some values of the ACF and PACF surpass the
significance bands, and also because the p-values of the Q-stat are all lower than 0.05.
The residuals do not follow a normal distribution since the p-value of the Jarque-Bera
test is lower than 0.05.
                                 ARMA(1.1)   AR(3)
STATIONARITY AND INVERTIBILITY   NO          YES
RESIDUALS WHITE NOISE            NO          NO
NORMALITY                        NO          NO
I choose the AR(3) model to predict.
Prediction

DATE        PREDICTION   REAL VALUE   ERROR      %
4/16/2020   73991.688    74797        -805.311   -1.1%
4/17/2020   76941.587    74797        2144.587   2.9%
4/18/2020   79700.133    74797        4903.133   6.6%
4/19/2020   82266.080    77357        4909.080   6.3%
4/20/2020   84638.703    80587        4051.703   5.0%
Forecast interval
4/16/2020 (72828.57,75154.79)
4/17/2020 (74343.89,79539.28)
4/18/2020 (75313.00,84087.26)
4/19/2020 (75760.45,88771.70)
4/20/2020 (75702.03,93575.37)
The model used to predict was an ARIMA(3.1.0)
7. DEATHS

I take logarithms of the original series with the objective of making the tendency more
constant and more homogeneous.
Both the correlogram and the Augmented Dickey-Fuller statistic reveal that the series
has a unit root and needs at least one regular difference.

The Augmented Dickey-Fuller statistic shows that the regular difference works and that
the series does not need more differences. So, I reject the Null Hypothesis and move on
to identify the series.
Identification

The correlogram of the logged series with the regular difference is going to be my tool
to identify the model. As there are no significant values in either the ACF or the PACF,
the series is white noise.
Because the differenced series is white noise, the logged series follows a process called
Random Walk, in which the predictions are equal to the last available observation of the
sample.
DATE        REAL VALUE   PREDICTION   ERROR
3/21/2020   1045         830          26%
3/22/2020   1375         830          66%
3/23/2020   1772         830          113%
3/24/2020   2311         830          178%
3/25/2020   2808         830          238%

As I can see, after applying logarithms the series has a constant and positive tendency,
which is also more homogeneous.
Both the correlogram and the Augmented Dickey-Fuller statistic make clear that the series
has a unit root and needs at least one regular difference.

The Augmented Dickey-Fuller statistic reveals that, after one regular difference, the
series has a more representative mean over the sample and no more unit roots, so it does
not need more regular differences.
Identification
Based on the correlogram of the logged series with one regular difference, the series is
white noise. The values of the ACF and PACF do not surpass the significance bands in any
case.
I am going to predict it with a process called Random Walk, in which the prediction is
equal to the last available observation of the sample.
Prediction

DATE        REAL VALUE   PREDICTION   ERROR
3/26/2020   3647         2808         30%
3/27/2020   4365         2808         55%
3/28/2020   5138         2808         83%
3/29/2020   5982         2808         113%
3/30/2020   6803         2808         142%
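The Random Walk prediction above can be written as a short sketch. It assumes, as the table suggests, that the error is the real value minus the prediction, expressed as a percentage of the prediction. The function names are illustrative.

```python
def random_walk_forecast(sample, horizon):
    """Random Walk prediction: repeat the last observation for every future day."""
    return [sample[-1]] * horizon

def percentage_errors(real, predicted):
    """Error of each prediction as a rounded percentage of the predicted value."""
    return [round(100.0 * (r - p) / p) for r, p in zip(real, predicted)]

last_observation = 2808  # last value of the sample, as shown in the PREDICTION column
real_values = [3647, 4365, 5138, 5982, 6803]

predictions = random_walk_forecast([last_observation], horizon=len(real_values))
errors = percentage_errors(real_values, predictions)
```

Since the prediction stays fixed while the real series keeps growing, the error increases with every day of the horizon, which is what the table shows.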
I am going to obtain the rest of the predictions of the deaths variable in a different
way.
It is a different approach from the Box-Jenkins methodology, but the objective is exactly
the same: to obtain a precise prediction for a specific sample.
I am going to identify the tendency in the graph of my series, and then I am going to
work with the filtered series in order to isolate the tendency from the series.
Later, I am going to predict the variable without the tendency and I am also going to
predict the tendency alone.
So the global prediction is going to be the sum of both predictions, and it is going to
be interesting to look at the results.
Series graph and the filtered series graph
Both graphs come from the same series; the only difference is that the left graph shows
the original series and the right graph shows the filtered series.
So, I am going to estimate an equation for the tendency of the graph, and the residuals
of that equation are the filtered series.
Then, I am going to identify the best model for that filtered series and use it to
predict the filtered series without the tendency.
First, I am going to predict the tendency with the following estimated equation:
From this equation I am going to obtain the values of the predicted tendency for the
predicted days.
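The estimated equation itself is not reproduced in the text. As a sketch only, and assuming a quadratic time tendency (an assumption of this example), the tendency can be fitted by least squares, the residuals taken as the filtered series, and the tendency extrapolated for the predicted days. The data and the function names below are hypothetical.

```python
def fit_quadratic_trend(y):
    """Least-squares fit of y_t = a + b*t + c*t^2 for t = 0..n-1 (normal equations)."""
    t = list(range(len(y)))
    # Sums of powers of t and cross-products with y
    s = [sum(ti ** k for ti in t) for k in range(5)]
    r = [sum(yi * ti ** k for yi, ti in zip(y, t)) for k in range(3)]
    # Augmented matrix of the normal equations
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def trend_value(coeffs, t):
    a, b, c = coeffs
    return a + b * t + c * t * t

# Hypothetical series: quadratic tendency plus a small alternating component
observed = [5 + 2 * t + 3 * t * t + (0.5 if t % 2 else -0.5) for t in range(12)]

coeffs = fit_quadratic_trend(observed)
filtered = [y - trend_value(coeffs, t) for t, y in enumerate(observed)]
trend_forecast = [trend_value(coeffs, t) for t in range(12, 17)]
```

The filtered series is then modelled separately, and its prediction is added to the extrapolated tendency to obtain the total prediction.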
So, I review the correlogram of the filtered series to check whether the filtered series
needs any regular differences. Both the correlogram and the Augmented Dickey-Fuller
statistic confirm that the filtered series needs at least one regular difference.
After taking one regular difference, the filtered series needs at least another regular
difference.

After the second regular difference, the filtered series does not need any additional
regular difference, so I can identify the best model to predict it.
The best candidate models are probably MA(2) and AR(1).
I am just going to compare the diagnosis of both models, leaving the calculations out of
the paper, because otherwise the research would become enormous.
The AR(1) model
The AR(1) model has a non-significant parameter ɸ1, but it is almost significant, so it
is not a huge problem. The model is invertible (always) and stationary because the
Inverted AR Root is lower than 1.
The residuals are white noise, but they do not follow a normal distribution.
The MA(2) model
The MA(2) model has both parameters (Ɵ1 and Ɵ2) significant, which is good news for the
prediction. The model is stationary (always) and invertible because the Inverted MA Roots
are lower than 1.
The residuals are white noise, but they do not follow a normal distribution.
So I am going to choose the MA(2) model for the prediction of the filtered series.

And I am going to sum both predictions: the one for the tendency and the one for the
filtered series.
Tendency forecast
Filtered prediction
DATE        FILT.FORECAST   TEND.FORECAST   TOTAL FORECAST   REAL VALUE   ERROR     %
3/31/2020   1211.956        7353.292        8565.248         8464         101.248   1.20%
4/1/2020    1450.691        8027.612        9478.304         9387         91.304    0.97%
4/2/2020    1691.037        8729.568        10420.605        10348        72.605    0.70%
4/3/2020    1931.450        9459.160        11390.610        11198        192.610   1.72%
4/4/2020    2166.767        10216.388       12383.156        11947        436.156   3.65%
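The table above can be checked with a short sketch that adds the two predictions, computes the percentage error over the real value, and takes second differences of the tendency forecast. The totals may differ from the table in the last digit because the table shows rounded inputs. The constant second differences are consistent with a quadratic tendency equation, although the equation itself is not reproduced in the text.

```python
# (date, filtered forecast, tendency forecast, real value) copied from the table above
rows = [
    ("3/31/2020", 1211.956, 7353.292, 8464),
    ("4/1/2020", 1450.691, 8027.612, 9387),
    ("4/2/2020", 1691.037, 8729.568, 10348),
    ("4/3/2020", 1931.450, 9459.160, 11198),
    ("4/4/2020", 2166.767, 10216.388, 11947),
]

# Total prediction = filtered prediction + tendency prediction
totals = [round(filt + tend, 3) for _, filt, tend, _ in rows]
pct_errors = [
    round(100.0 * (total - real) / real, 2)
    for total, (_, _, _, real) in zip(totals, rows)
]

# Second differences of the tendency forecast
tendency = [row[2] for row in rows]
first_diff = [b - a for a, b in zip(tendency, tendency[1:])]
second_diff = [round(b - a, 3) for a, b in zip(first_diff, first_diff[1:])]
```

All percentage errors stay below 4% over the five predicted days.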
So, as both you and I can see, the error of the predictions is very low, which
demonstrates that this method of predicting the tendency first and the filtered series
later is also valid.

Series graph (left) and the filtered series graph (right).
Prediction of the tendency

I check the correlogram of the filtered series in order to know whether the filtered
series has a unit root and needs a regular difference.
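After taking one regular difference, the filtered series still has a unit root and needs
another regular difference.

After the second regular difference, the filtered series does not need additional regular
differences since it does not have any unit root.

The best candidate models are MA(1) and AR(1).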
So, now I am going to compare each model to choose the best to predict.
AR(1) MODEL
The AR(1) model has one significant parameter, ɸ1. The model is invertible (always) and
it is stationary because the Inverted AR Root is lower than 1.
The residuals are white noise, but they do not follow a normal distribution.
The MA(1) model
The MA(1) model has a non-significant parameter Ɵ1. The model is stationary (always) and
it is invertible because the Inverted MA Root is lower than 1.

The residuals do not follow a normal distribution.
So, as both models are similar in the diagnosis, I choose the AR(1) model based on the
Information Criteria: both the Akaike Info Criterion and the Schwarz Criterion are lower
for the AR(1) than for the MA(1).
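The selection rule above can be sketched as follows, using one common textbook form of the criteria: AIC = -2 logL + 2k and SIC = -2 logL + k ln(n), where k is the number of estimated parameters and n the number of observations. Some packages report these values divided by n, which does not change the ranking for a common sample. The log-likelihood values and the sample size below are hypothetical, not the ones of this paper.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, SIC) in their unscaled textbook form."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    sic = -2.0 * log_likelihood + n_params * math.log(n_obs)
    return aic, sic

def pick_model(candidates, n_obs):
    """Return the best model by AIC, the best by SIC, whether they agree, and the scores."""
    scores = {
        name: information_criteria(ll, k, n_obs)
        for name, (ll, k) in candidates.items()
    }
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_sic = min(scores, key=lambda name: scores[name][1])
    return best_aic, best_sic, best_aic == best_sic, scores

# Hypothetical (log-likelihood, number of parameters) for each candidate model
candidates = {"AR(1)": (-95.0, 2), "MA(1)": (-97.5, 2)}
best_aic, best_sic, agree, scores = pick_model(candidates, n_obs=38)
```

When both criteria point to the same model, as in this hypothetical case, the choice is straightforward; when they disagree, the Schwarz Criterion penalizes additional parameters more heavily.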
Tendency forecast
Filtered forecast

DATE       FILT.FORECAST   TEND.FORECAST   TOTAL FORECAST   REAL VALUE   ERROR     %
4/5/2020   340.929         12431.072       12772.001        12641        131.001   1.04%
4/6/2020   217.830         13376.303       13594.133        13341        253.133   1.90%
4/7/2020   94.180          14354.360       14448.541        14045        403.541   2.87%
4/8/2020   -45.680         15365.243       15319.563        14792        527.563   3.57%
4/9/2020   -194.774        16408.952       16214.177        15447        767.177   4.97%


So, I review the correlogram of the filtered series to check whether the filtered series
needs any regular differences.
I check the correlogram and the Augmented Dickey-Fuller statistic. The correlogram, with
a long run of slowly decreasing ACF values and a single high PACF value, reveals that the
filtered series needs one regular difference, contrary to what the Augmented
Dickey-Fuller statistic says.
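After taking one regular difference, the filtered series needs at least another regular
difference.

After the second regular difference, the filtered series does not need any additional
regular difference, so I can identify the best model to predict it.

The best candidate models are probably MA(1) and AR(1).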
The AR(1) model
The AR(1) model has one significant parameter, ɸ1. The AR(1) is invertible (always) and
it is stationary because the Inverted AR Root is lower than 1.

The residuals do not follow a normal distribution.
The MA(1) model
The MA(1) model has a non-significant parameter Ɵ1. The model is stationary (always) and
it is invertible since the value of the Inverted MA Root is lower than 1.
The residuals are not white noise and they do not follow a normal distribution.
So I am going to choose AR(1) for the prediction of the filtered series.
Tendency forecast
Filtered forecast

DATE        FILT.FORECAST   TEND.FORECAST   TOTAL FORECAST   REAL VALUE   ERROR      %
4/10/2020   -992.123        17153.266       16161.142        16081        80.142     0.50%
4/11/2020   -1358.483       18233.383       16874.900        16606        268.900    1.62%
4/12/2020   -1733.554       19345.273       17611.718        17209        402.718    2.34%
4/13/2020   -2126.571       20488.934       18362.362        17756        606.362    3.41%
4/14/2020   -2533.898       21664.367       19130.469        18056        1074.469   5.95%
4/15/2020   -2956.965       22871.572       19914.607        18708        1206.607   6.45%


So, I check the correlogram and the Augmented Dickey-Fuller statistic. The correlogram,
with a long run of slowly decreasing ACF values and a single high PACF value, reveals
that the filtered series needs one regular difference, contrary to what the Augmented
Dickey-Fuller statistic says.
After taking one regular difference, the filtered series needs at least another regular
difference.

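After the second regular difference, the filtered series does not need any additional
regular difference, so I can identify the best model to predict it.

The best candidate models are probably MA(1) and ARMA(1.1).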
The MA(1) model
The MA(1) model has a significant parameter Ɵ1. The MA(1) is stationary (always) and it
is invertible because the Inverted MA Root is lower than 1.

The residuals do not follow a normal distribution.
The ARMA(1.1) model
In the ARMA(1.1) model both parameters, Ɵ1 and ɸ1, are non-significant. The model is
stationary and invertible since the values of the Inverted AR and MA Roots are both lower
than 1.

The residuals do not follow a normal distribution.
So I am going to choose MA(1) for the prediction of the filtered series.
Tendency forecast
Filtered forecast

DATE        FILT.FORECAST   TEND.FORECAST   TOTAL FORECAST   REAL VALUE   ERROR      %
4/16/2020   -2442.035       21736.024       19293.988        19315        -21.011    -0.11%
4/17/2020   -2938.406       22830.451       19892.045        20002        -109.954   -0.55%
4/18/2020   -3448.986       23951.156       20502.169        20043        459.169    2.29%
4/19/2020   -3983.777       25098.138       21122.361        20453        671.361    3.28%
4/20/2020   -4512.777       26271.397       21758.620        20852        906.620    4.35%
8. CONCLUSION
To sum up the research, I would say that COVID-19 is going to have unpredictable
political, economic and social consequences that will affect the whole world.
Next, I review the data and the results of the calculations and predictions.
The contagions variable was first white noise because of the small size of the sample,
but later the bigger samples were predicted with AR(2), AR(2) and AR(3) models. The
Autoregressive model was the main character in the contagions scenario, and honestly it
is the most reliable and precise model to predict the data, because the Moving Average
models do not represent the series as well as the predictions require.
The recovered variable was also white noise at first, as was the deaths variable, because
there were not enough observations to fit a model.
The autoregressive model was again the protagonist here, with two AR(3) and one AR(2).
The Moving Average model performed better for this variable than in the contagions case,
but not well enough to predict the series.
As for deaths, it is curious that, while for contagions and recovered the first three
predictions were white noise, for deaths only the first two were white noise.
The contagions and recovered variables were predicted with the Box-Jenkins methodology.
However, I tried to implement a different methodology for the last 4 predictions of the
deaths variable.
This methodology splits the forecast into 2 different predictions that add up to a total
result: one prediction for the variable without the tendency and another prediction for
the tendency of the series alone.
The results reveal that both methodologies are valid if the process is carried out
correctly.
To conclude, the main lesson learned from this research is that it is very difficult to
analyze and predict series that have few observations.
During the research, the prediction was better the longer the range of the sample. The
last predictions of the variables are better than the first ones.
An important thing to highlight about the research is that it is very important to carry
out a good diagnosis of the alternative models, because sometimes a model looks suitable,
but when you test the significance of the parameters or the residuals you realize that
maybe the model is not the best one to predict the sample.
I hope you like my research.
