Answers to All Previous Questions in Econometrics
Question 1:
A. What are the general problems that you might run into?
Answer: The problems that can be faced are as follows:
i. Stationarity
A common assumption in many time series techniques is that the data are stationary, i.e.
the mean, variance and autocorrelation structure do not change over
time.
ii. Non-stationarity
The two most important types of non-stationarity in economic time series
data are trends and breaks, which are explained below:
iii. TRENDS:
A trend is a persistent long-term movement of a variable over time. A
time series fluctuates around its trend.
a- Deterministic trend
b- Stochastic trend
iv. BREAKS
A structural break is an unexpected shift in a time
series that can lead to huge forecasting errors and unreliability of the
model in general.
v. Heteroscedasticity
The error terms are assumed to have a constant variance across
observations and over time. If instead the variance depends upon the value
of X, heteroscedasticity exists.
vi. Autocorrelation
The error terms are assumed to be uncorrelated with one another. If this
assumption is violated and the error term observations are correlated,
autocorrelation is present: the errors follow a pattern,
showing that something is wrong with the regression model.
Autocorrelation is a common problem in time-series regressions.
Question B: How do you structure your analysis?
A non-stationary time series has a time-varying mean, a time-varying variance,
or both. We therefore need to find out whether the time series is stationary and, if not,
which transformation makes it stationary. Two ways of checking stationarity are as follows:
i. Graphical analysis: The graph is a time-series plot of the
exchange rate between the US dollar and GBP. The X-axis shows time
from 1980 to 2010 in 10-year intervals. The series
is never stable: sometimes it rises and
at other times it falls, which suggests that it is
non-stationary.
Non-stationarity: The mean, variance and autocorrelation
structure change over time.
Autocorrelation: If the error term observations are correlated, there is a
problem of autocorrelation.
Heteroskedasticity: The problem arises from unequal error variance.
ii. Correlogram: The data need to be plotted in a correlogram to
check for stationarity. Usually, the first difference of the data is
stationary. AC is the sample autocorrelation function, and the solid vertical line
represents the zero axis. Observations above the line are positive values
and those below the line are negative. For a purely white noise
process, the AC at various lags hovers around zero; if it does, the time
series is stationary.
1- The choice of lag length
Compute the ACF up to one third to one quarter of the length of the time
series.
2- Testing for a break
Chow breakpoint test and Quandt-Andrews breakpoint test; lag length
selection criterion.
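The idea behind the Chow breakpoint test can be sketched in a few lines. This is only an illustration under stated assumptions: the series `y`, the break position `break_index` and the simple regression on a time trend are all hypothetical, not taken from the exam data. The test compares the sum of squared residuals (SSR) of one regression over the whole sample with the combined SSR of two separate regressions before and after the suspected break.

```python
def ssr_linear(t, y):
    """Sum of squared residuals from an OLS fit of y on a constant and t."""
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sty / stt
    intercept = my - slope * mt
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


def chow_f(t, y, break_index, k=2):
    """Chow F-statistic for a break at break_index (k = parameters per regression)."""
    ssr_pooled = ssr_linear(t, y)
    ssr_split = (ssr_linear(t[:break_index], y[:break_index])
                 + ssr_linear(t[break_index:], y[break_index:]))
    n = len(y)
    return ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))


# Hypothetical series: the level jumps from 1 to 5 at observation 10.
t = list(range(20))
y = [(1.0 if ti < 10 else 5.0) + 0.1 * ti + 0.05 * (-1) ** ti for ti in t]
f_stat = chow_f(t, y, break_index=10)
print(round(f_stat, 1))
```

A large F-statistic, compared with the critical value of the F(k, n - 2k) distribution, speaks against stable coefficients across the two sub-samples.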
The graph shows a time series: the X-axis is time and the Y-axis
is the exchange rate between USD and GBP. Overall, the graph shows a
downward trend with many fluctuations: the USD-GBP exchange rate
started at around 2.3 and ended at approximately 1.3. The series is therefore
non-stationary. There also appear to be several structural breaks associated with
recessions and financial crises. For instance, the first structural break was in 1975, when the
rate was almost 1.7; afterwards it reached around 2.5 in 1980.
The exchange rate then dropped sharply to 1.5 in 1985. The
series also fluctuated considerably from 1985 to the end of the
period.
Thus, it can be said that the series shows heteroscedasticity and autocorrelation problems
due to time lags.
Question C: Do you suspect structural breaks in the time series? If so, approximately
when?
Answer: Yes. Breaks are suspected around 1975, 1985, 2001 and 2009.
Question D: What effect do your suspected structural breaks have on your analysis?
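Answer: The structural breaks indicate that recessions took place in the economy in these
specific time periods, when inflation may have fallen and financial crises occurred. For the
analysis, a break can lead to large forecasting errors and makes a model estimated over the
whole period unreliable.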
Question 2:
Question A: What kind of diagram is the above? Could you include additional
information within this graph? Which additional information could be of interest?
Answer: The above diagram is a scatter plot. It shows the regression findings for
the sales revenue of the Microsoft Company. We could add the standard error of the
regression and the t-values. This information would be of interest because it shows the
significance of the model.
Question B: Discuss the presentation and quality of the above relationship briefly.
Please support your answer with some arguments.
Answer: The above graph presents an overview of the sales revenue of the Microsoft
Company and its relation to time. Revenue keeps increasing with the passage of time.
The R-squared value shows how strong the relationship between sales and time is: an
R-squared of 0.87 means that 87% of the variation in sales is explained by the time
trend.
Question C: Might heteroscedasticity pose a problem in the above model? Can you also
describe it graphically? If the data is heteroscedastic, what problem do we run into with
OLS? What is the solution to this problem?
Heteroscedastic data tends to follow a cone shape on a scatter graph.
Why do we care about whether or not data is heteroscedastic? Most of the time in
statistics, we don’t care. But if you’re running any kind of regression analysis,
having data that shows heteroscedasticity can ruin your results (the coefficient estimates
remain unbiased, but their standard errors are biased). Therefore, you’ll want to check to make sure your
data doesn’t have this condition. One way to check is to make a scatter graph
(which is always a good idea when you’re running regression anyway). If your graph
has a rough cone shape (like the one above), you’re probably dealing with
heteroscedasticity. You can still run regression analysis, but the usual standard errors and
significance tests will not be reliable.
Answer: Yes, heteroscedasticity can create a problem in the above model.
Heteroscedasticity refers to the circumstance in which the variability of a variable is
unequal across the range of values of a second variable that predicts it.
It can cause the following problems:
1- Heteroscedasticity does not by itself bias the coefficient estimates, provided that no
independent variable is correlated with the error term.
2- Lack of bias does not guarantee accurate coefficient estimates, since
heteroscedasticity increases the variance of the estimates, even though the distribution of
the estimates is still centred around the true value.
3- It typically causes OLS to underestimate the variance and standard error of the
coefficients, so the standard errors and the t-statistics based on them are not reliable.
Heteroskedasticity makes your hypothesis tests unreliable because the standard errors are too large or
too small.
Solutions:
1- Weighted least squares:
If the errors are heteroskedastic, then OLS is no longer BLUE (best linear unbiased
estimator). An estimator with a smaller variance
than the OLS estimator can therefore be constructed. This method is called WLS.
2- Heteroskedasticity-corrected standard errors:
This approach focuses on estimating the standard errors without changing the estimates of the
slope coefficients. These standard errors can be used in t-tests and other
hypothesis tests without the errors of inference caused by heteroskedasticity.
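The second solution can be illustrated with a small sketch. It assumes a simple regression with one regressor and uses the basic White (HC0) formula; the data are hypothetical and constructed so that the error spread grows with x. The slope estimate is the same in both cases, and only the standard error changes.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope, residuals) for y = a + b*x fitted by OLS."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid


def slope_standard_errors(x, y):
    """Conventional and White (HC0) standard errors of the OLS slope."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    _, _, resid = ols_simple(x, y)
    # Conventional formula: assumes one common error variance.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # White HC0: each squared residual is weighted by its own (x - mean)^2.
    weighted = sum(((xi - mx) ** 2) * e ** 2 for xi, e in zip(x, resid))
    se_robust = math.sqrt(weighted / sxx ** 2)
    return se_conventional, se_robust


# Hypothetical data: the error spread is proportional to x (cone shape).
x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi + 0.3 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
a, b, _ = ols_simple(x, y)
se_conventional, se_robust = slope_standard_errors(x, y)
print(b, se_conventional, se_robust)
```

Statistical packages usually offer small-sample refinements of this formula; the HC0 version is shown only because it is the shortest to write down.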
Question D: What about the forecasting quality of the above model? Please take a look
at actual sales revenue for 2015.
Answer D: If we take a look at the actual sales of 2015, the forecasting quality of
the model is better than average. The
R-squared of 0.87 shows that the model explains up to 87% of the variation, and the model has
also met the significance level.
Question E: How could you improve this analysis?
Answer E: We can improve the model by working with more recent data: the
previous model used data only up to 2005, which is 11 years back from now.
It is always recommended in research to work with the most recent data so
that we get the latest results, which will help us in improving the analysis
and the model.
Question 3:
What are the various threats to the internal validity of a multiple regression model?
Answer:
Internal validity: A statistical analysis is internally valid if the statistical inferences
about causal effects are valid for the population being studied. The various threats to
internal validity of a multiple regression model are below:
1. Omitted Variable Bias: OVB arises when a variable that both determines Y and is
correlated with one or more of the included regressors is omitted from the
regression. The bias persists in large samples, so the OLS estimator is
inconsistent.
2. Functional Form Misspecification (FFM): FFM arises when the functional form of the
estimated regression function differs from the functional form of the population
regression function. If the functional form is misspecified, the estimator of the partial effect
of a change in one of the variables will be biased.
3. Errors-in-Variables: EIV bias in the OLS estimator arises when an independent variable is
measured imprecisely. The bias depends on the nature of the measurement error
and persists even if the sample size is large.
4. Sample Selection: SS bias arises when a selection process influences the availability
of data and that process is related to the dependent variable. Sample selection
induces correlation between one or more regressors and the error term, leading
to bias and inconsistency of the OLS estimator.
5. Simultaneous Causality Bias: SCB, also called simultaneous equations bias, arises in a
regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal
link from Y to X. This reverse causality makes X correlated with the error term in the
population regression of interest.
Lastly, incorrect calculation of the standard errors also poses a threat to internal validity.
Question 4:
Question A: What kind of regression models do the authors present?
Answer: The authors present multiple regression models for a survey on life
satisfaction and longevity. The survey, based on the German Socio-Economic Panel of 15,000 people,
showed that 17 per cent of the respondents surveyed in 1984 died between 1984 and 2007. The
models show how health indicators affect human life.
Question B: Which model do you prefer?
Answer: I prefer the fourth model, where the independent variables are life satisfaction and
annual doctor visits, because it has the highest R-squared and the lowest standard error.
This shows the importance of this model compared with the rest.
Question C: Chronic illness is a dummy variable. Please define this term?
Answer: A dummy variable is a qualitative variable that takes only the values 0 and 1.
Here chronic illness is coded this way, so people are categorized into two classes: those who have a chronic illness
and those who do not.
Question D: Explain these findings to a friend, who has no idea of econometrics.
Answer: A survey was conducted in 1984 in which 15,000 people responded. 17% of the
people who responded died between 1984 and 2007. To see the effect of people’s health, life
satisfaction and other health indicators on the risk of death, a method known as
regression is applied. The findings show that people who had poor life satisfaction, poor
health or a chronic illness were more likely to die during this period,
whereas people who visited a doctor regularly and had good life satisfaction
were less likely to die during this period (1984-2007).
Question 5:
Question A: How do you formally generate the time series CPI-Inflation (INFL)?
Answer:
Genr lCPIUS = log(CPIUS)
Genr INFL = 400*(lCPIUS - lCPIUS(-1))
Genr DINFL = INFL - INFL(-1)
Genr year DINFL = INFL - INFL(-1)
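The same transformation can be sketched outside EViews. The CPI values below are hypothetical and serve only to show the arithmetic; the factor 400 is the one used in the commands above and annualizes quarterly log changes (for monthly data the corresponding factor would be 1200, which is why `factor` is left as a parameter).

```python
import math


def annualized_inflation(cpi, factor=400):
    """Annualized inflation in percent: factor * (log CPI_t - log CPI_{t-1})."""
    log_cpi = [math.log(v) for v in cpi]
    return [factor * (log_cpi[t] - log_cpi[t - 1]) for t in range(1, len(log_cpi))]


def first_difference(series):
    """Change from one period to the next, e.g. DINFL = INFL - INFL(-1)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


cpi = [100.0, 101.0, 102.5, 103.0]  # hypothetical CPI levels
infl = annualized_inflation(cpi)
dinfl = first_difference(infl)
print(infl)
print(dinfl)
```

Each transformation shortens the series by one observation, which matches the loss of the first observation when a lag such as `INFL(-1)` is used.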
Question B: The above represents the correlogram of the rate of inflation. What is
your assessment based on this output?
Answer:
The table conveys how strong the serial correlation is. Autocorrelation, also
known as “serial correlation” or “lagged correlation”, describes the
relationship between observations of the same variable over specific periods of time.
If a variable’s serial correlation is measured to be zero, there is no
correlation and each of the observations is independent of one another. Conversely, if
a variable’s serial correlation is close to one, the observations are
serially correlated and future observations are affected by past values. Essentially,
a variable that is serially correlated has a pattern and is not random.
Here, the table has 808 observations from January 1948 to April 2016 with 10 lags. The series is
strongly serially correlated: its AC values are close to one and decline steadily
across the lags without fluctuation.
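The AC column of a correlogram can be reproduced with a short function. The sketch below uses the standard sample autocorrelation formula and two hypothetical series: a steadily trending one, whose autocorrelations start near one and decline slowly, and an alternating one, whose first autocorrelation is close to minus one.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


trending = [float(v) for v in range(1, 41)]  # hypothetical trending series
alternating = [1.0, -1.0] * 20               # hypothetical alternating series

acf_trend = sample_acf(trending, 10)
acf_alt = sample_acf(alternating, 10)
print([round(r, 3) for r in acf_trend])
print([round(r, 3) for r in acf_alt])
```

For a white noise series of length n, roughly 95% of the sample autocorrelations are expected to fall within plus or minus 1.96 divided by the square root of n, which is the band usually drawn on a correlogram.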
Question 6:
Question A: How do the above projections of the two monetary institutions –
concerning the inflation rate - differ from each other? (4 CP)
Answer A:
In the first graph, the central forecast is represented by the darkest line in the middle of the fan. The
gradually spreading fan depicts the growing risks around the central view, highlighting the
fact that the degree of uncertainty (forecast error) grows over time. Each pair of equally
coloured bands, below and above the central projection, extends the
interval in which the future inflation value is expected to lie by a size corresponding to an
increase in probability of 10% on the preceding interval (confidence intervals). The
outermost two bands raise the coverage to the final, 90% level. This
means that according to the forecast in Graph 1, inflation at the end of 2019 will, with
90% probability, lie within the range 0.5% to 4.9%.
The first graph uses the previous data given by the national financial institution, while
the second graph shows the central tendency, median and range of the projections. The Bank
of England has forecast up to 2019 and the US Federal Reserve up to 2018.
Question B: Which one of these presentations do you personally prefer? Why?
Answer B:
I would prefer the first presentation because it is based on the most recent data from the
national financial institution. It also covers a wider range of possible inflation outcomes, and
its projection looks more realistic than that of the second presentation.
Question C: In June 2016 - before the British referendum on exiting the EU – the
IMF published a country report on the macroeconomic risks for the UK. How do
the three different scenarios compare with the GDP projections of the Bank of
England?
Answer C:
Of these scenarios, the limited scenario corresponds to a referendum result in favour of
staying in the EU, in which case the GDP of the UK
would decrease by almost 5%, while the adverse scenario represents an exit
from the EU, in which case GDP and the economy would fall by almost 15%.
Question D: What are the underlying risks in general, when economists
undertake forecasts?
Answer D:
i. GDP
ii. Unemployment Rate
iii. Exchange rate
iv. Inflation rate
v. Economic recession
vi. Past data
vii. Interest rate
viii. Other Economic indicators.
Question 7:
Question: For the US states we wanted to determine the factors that influence
the traffic fatality rate during the 1980ies. Please analyze the following five
models. Let us know which one of the models you prefer. Support your answer
with valid arguments.
After analyzing all the models one by one, I compared the most important
terms, i.e. the value of adjusted R2, the standard
errors, the p-values and the number of
independent variables. In the end, I came to prefer model 5, because its adjusted R2
is 92%, which means the model explains 92% of the variation in the dependent variable (Y)
through the independent variables (X); 8% remains unexplained, but the model still has
a high explanatory power. Moreover, its p-values are the lowest,
and its standard errors are very low, which
means there is very little variation in the errors. The last reason is that it includes more
independent variables than the others. In conclusion, I would choose the model
with the highest adjusted R2 and the lowest standard
errors and p-values.
In addition:
· There is a negative relationship between the beer tax and the fatality rate, i.e. if the beer tax
increases, the fatality rate should drop.
· There is a negative relationship between the unemployment rate and the fatality rate, i.e. a higher
unemployment rate is associated with fewer fatalities; an increase in the
unemployment rate of 1 percentage point is estimated to change traffic fatalities by (-7.70)
deaths per 10,000.
· A positive relationship can be seen between per capita income and the fatality rate.
When income is high, the consumption of alcohol is high, and therefore the fatality rate remains
high.
The strength of this analysis is that including state and time fixed effects addresses the
threat of omitted variable bias arising from unobserved variables that either do not
change over time (e.g. cultural attitudes) or do not vary across states.
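Why fixed effects help can be seen in a toy example. The sketch below covers state fixed effects only (time effects are left out for brevity) and uses entirely hypothetical numbers for two states, A and B: within each state a higher beer tax goes with a lower fatality rate, but state B has both a higher tax and a higher baseline fatality rate. Removing each state's own mean (the "within" transformation) recovers the negative slope, while the pooled regression is misled by the difference between states.

```python
def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's own mean from its observations."""
    totals, counts = {}, {}
    for g, v in zip(groups, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical panel: two states observed for three years each.
state = ["A", "A", "A", "B", "B", "B"]
beer_tax = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fatality = [9.5, 9.0, 8.5, 12.0, 11.5, 11.0]

pooled = slope(beer_tax, fatality)
within = slope(demean_by_group(state, beer_tax), demean_by_group(state, fatality))
print(round(pooled, 3), round(within, 3))
```

In this constructed example the pooled slope is positive and the within-state slope is negative, which is the kind of sign reversal that omitted state-level factors can produce.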
Question 8:
After graduating, you are employed as an economist with the European central
bank. One of your first tasks is to analyze the inflation potential of the US economy.
First of all, you visualize the monthly rate of US inflation.
Question A: Is there an economic relationship between recessions and rates of
inflation?
Yes, there is a negative association between recessions and the inflation rate. A
recession is a period of economic contraction. It
decreases the demand for goods, so inflation tends to fall during a recession, and when the recession ends the
inflation rate tends to increase again. Rising inflation, in turn, goes along with increases in demand, interest rates and
prices of goods.
Question B: How do you assess the above information?
After analyzing all three models, I would recommend the third
model because it has the lowest p-value. Here we reject the null
hypothesis, and the model is considered the most significant because the other two models
have higher p-values than the third model. In the first model, we fail to reject
H0 because its p-value is 10.1%.
Question C: Based on the following tables- what is the correct lag structure of an
autoregressive model for the monthly rate of US- inflation? Give some brief
arguments?
After assessing all the lag structures, I would recommend the 3-lag
structure because of its lower values of the Schwarz criterion (BIC) and the Akaike information criterion (AIC) and
its higher adjusted R2 of 0.165, which indicates more explanatory power
than the others. The R2 is also higher, at 16.7%. My
answer is based on all of the above reasons.
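The lag choice by information criteria can be made explicit. The sketch below uses one common textbook form of the criteria, BIC(p) = ln(SSR/T) + (p + 1) ln(T)/T and AIC(p) = ln(SSR/T) + (p + 1) 2/T, where p is the number of lags; the SSR values and the sample size are hypothetical, and software output may be scaled differently. The preferred lag length is the one with the smallest criterion value.

```python
import math


def bic(ssr, t, p):
    """Schwarz (Bayesian) information criterion for an AR(p) with intercept."""
    return math.log(ssr / t) + (p + 1) * math.log(t) / t


def aic(ssr, t, p):
    """Akaike information criterion for an AR(p) with intercept."""
    return math.log(ssr / t) + (p + 1) * 2 / t


T = 200  # hypothetical number of observations
ssr_by_lag = {1: 120.0, 2: 110.0, 3: 100.0, 4: 99.5}  # hypothetical SSRs

best_bic = min(ssr_by_lag, key=lambda p: bic(ssr_by_lag[p], T, p))
best_aic = min(ssr_by_lag, key=lambda p: aic(ssr_by_lag[p], T, p))
print(best_bic, best_aic)
```

Adding a lag always lowers the SSR, so both criteria add a penalty per extra parameter; the BIC penalty is the heavier one whenever ln(T) exceeds 2, so it tends to pick shorter lag lengths than the AIC.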
Econometrics February 2nd, 2015
Question 1
Question A: What are the general problems that you might run into? (4 CP)
Answer: The problems that can be faced are as follows:
i. Stationarity
A common assumption in many time series techniques is that the data are stationary, i.e.
the mean, variance and autocorrelation structure do not change over
time.
ii. Non-stationarity
The two most important types of non-stationarity in economic time series
data are trends and breaks, which are explained below:
iii. TRENDS:
A trend is a persistent long-term movement of a variable over time. A
time series fluctuates around its trend.
a- Deterministic trend
b- Stochastic trend
iv. BREAKS
A structural break is an unexpected shift
in a time series that can lead to huge forecasting errors and
unreliability of the model in general.
v. Heteroscedasticity
The error terms are assumed to have a constant variance across
observations and over time. If instead the variance depends upon the value
of X, heteroscedasticity exists.
vi. Autocorrelation
The error terms are assumed to be uncorrelated with one another. If this
assumption is violated and the error term observations are correlated,
autocorrelation is present: the errors follow a pattern,
showing that something is wrong with the regression model.
Autocorrelation is a common problem in time-series regressions.
Question B: How do you structure your analysis?
A non-stationary time series has a time-varying mean, a time-varying
variance, or both. We therefore need to find out whether the time series is stationary and, if not,
which transformation makes it stationary. Two ways of checking stationarity are as follows:
i. Graphical analysis: Plot the time series, which gives a clue
about its nature. For example, GBP shows an upward trend, suggesting that the mean
of GBP is changing, which means it is non-stationary.
ii. Correlogram: The data need to be plotted in a correlogram
to check for stationarity. Usually, the first difference of the data is
stationary. AC is the sample autocorrelation function, and the solid vertical line
represents the zero axis. Observations above the line are positive values
and those below the line are negative. For a purely white noise
process, the AC at various lags hovers around zero; if it does, the time
series is stationary.
1- The choice of lag length
Compute the ACF up to one third to one quarter of the length of the time
series.
2- Testing for a break
Chow breakpoint test and Quandt-Andrews breakpoint test; lag length
selection criterion.
Question C: What kind of information is provided in Table 1? How could you make use of
the above information when building a forecasting model?
Answer:
Direction to the No of F. Value P. Value Decision
Causality lags
DIFFERUSEU does not 2 14.96 0.000000 Reject the Null
DIFFSTLFSI 4 hypothesis
Based on the probability values, the hypothesis that STLFSI does not Granger-cause
DEXUSEU cannot be rejected. In contrast, the hypothesis that DEXUSEU does not
Granger-cause STLFSI can be rejected. Therefore, it appears that Granger causality
runs one way, from DEXUSEU to STLFSI. In a forecasting model, past values of DEXUSEU may
therefore help to forecast STLFSI, but not the other way round.
Question D: Do you suspect structural breaks in the time series? If so, approximately
when?
Answer: Yes, I suspect that there are structural breaks in the time series around
2009 and 2011.
Question 2
Question A: List the most likely recommendations of Swabisch (2014) for improving
the representation of the data.
Answer A: Recommendations:
i. We can add R2 and adjusted R2
ii. In addition, we can add standard errors and T-value
iii. Regression model/Equation also would be effective in this case.
Question B: What is the relationship between education and a country’s comparative
advantage in office machines? How strong is the relationship - please, estimate the
strength of it
Answer B:
The relationship is positive: there is a positive
linear relationship between education and a country’s comparative advantage in office
machines, but the relationship is not perfect. The regression line slopes
upward only gently, so the relationship between the variables is best described as
moderate.
Question C: In the text, for additional countries are mentioned to have “larger
positive residuals”. What does this term actually mean?
Answer C:
The additional countries mentioned in the text (Thailand, Malaysia,
Costa Rica) have larger positive errors. This term means that these countries lie above
the value estimated by the model, i.e. they perform better than the
model predicts.
Question D: Assuming that you want to econometrically analyse this relationship,
how are you going to structure your work in general? Briefly mention the main tasks
you’d be willing to undertake.
Answer D:
I would analyse the relationship with a regression line. In addition, I would set up the regression
equation.
Question E: How does Argentina compare with China?
Answer E:
Argentina: Negative Error.
China: Positive Error
Question 3
Question A: What kind of regression models do the IMF economists present ?
Answer A: The IMF economists present multiple regression model.
Question B: Which model do you prefer ?
Answer B: I would prefer model 3 based on adjusted R-squared and standard error.
Model 3 has the highest adjusted R2, at 0.68, so 32% of the variation is still
unexplained. Model 3 and model 4 have almost the same
standard errors and the same number of observations, although model 4 has a smaller
standard error than model 3 in some cases. Moreover, model 3 does not include
variables like exports and GDP but still has high explanatory power in contrast to the
other regression models. Model 4 also deserves consideration as a close second:
its adjusted R2 is the second highest at 0.66, so
it explains 66% of the variation and
34% is still unexplained.
Question C: Do you find any of the results or model specifications surprising?
Answer C: Some specifications are surprising. Model
1 and model 3 do not include economic variables like exports and GDP, which are considered very
important economic indicators. In addition, model 1 and
model 2 have the same number of countries and observations, as do
model 3 and model 4. Moreover, the standard errors of models 1 and 2 are almost the same,
and the same holds for models 3 and 4.
Question 4
Question A: How do the above projections of the three monetary institutions –
concerning the inflation rate - differ from each other?
Question 5
Question A: How would you analyse the determinants of the pay of internal medicine
practitioners in 2014?
Answer A:
I think the following determinants have significant impact on the pay of internal
medicine practitioners.
1. Location
2. Speciality
3. Experience
4. Facilities offered
5. Instruments Cost
6. Appointment fees
7. Years of Education
8. Size of the clinic.
Question B: Do you have any suggestions for possible determinants?
Answer B: I would suggest the following possible determinants:
1. Location
2. Experience
3. Size of clinic
4. Facilities Offered
Question 6: Please explain heteroskedasticity to us. Can you also describe it graphically?
If the data is heteroscedastic, what problem do we run into with OLS? What is the
solution to this problem?
Answer 6:
In simple terms, heteroscedasticity describes any set of data that isn’t homoscedastic. More technically,
it refers to data with unequal variability (scatter) across the range of a second, predictor variable.
Heteroscedastic data tends to follow a cone shape on a scatter graph.
Why do we care about whether or not data is heteroscedastic? Most of the time in
statistics, we don’t care. But if you’re running any kind of regression analysis, having data
that shows heteroscedasticity can ruin your results (the coefficient estimates remain unbiased,
but their standard errors are biased). Therefore, you’ll want to check to make sure your data doesn’t have this
condition. One way to check is to make a scatter graph (which is always a good idea when
you’re running regression anyway). If your graph has a rough cone shape (like the one
above), you’re probably dealing with heteroscedasticity. You can still run regression
analysis, but the usual standard errors and significance tests will not be reliable.
Severe heteroscedasticity can give you a variety of problems:
● OLS will not give you the estimator with the smallest variance (i.e. your estimators
will not be efficient).
● Test statistics in significance tests will be either too high or too low.
● Standard errors will be biased, along with their corresponding test statistics and
confidence intervals.
If your data is heteroscedastic, it would be inadvisable to run regression on the data as is.
There are a couple of things you can try if you need to run regression:
1. Give data that produces a large scatter less weight.
2. Transform the Y variable to achieve homoscedasticity. For example, use the
Box-Cox normality plot to transform the data.
Question 7
For the US/EUR exchange rate of 1999-2014 (graph in Exercise 1) you still aim to build a
forecasting model. The first step is to estimate the best AR model for the exchange rate.
Question A: Which of the following AR specifications do you prefer? The variable
DEXUSEU is the original US/EUR exchange rate (no transformation). State your
arguments briefly.
ANSWER: I prefer the third model for the following reasons:
I- The p-value of this model is satisfactory, which means the model is significant.
II- Compared with the other models, its standard error is also lower.
Compared with the first model, this model is better in terms of the number of variables:
it includes more variables than the first model and gives almost the same
results.
III- The adjusted R-squared of this model is also the best among the models.
Question B: In addition, the following graphical representation of the data as well as
some test results are provided. What did we do, and how do you interpret these results?
Answer:
Genr lCPIUS = log(CPIUS)
Genr INFL = 400*(lCPIUS - lCPIUS(-1))
Genr DINFL = INFL - INFL(-1)
Genr year DINFL = INFL - INFL(-1)
Interpretation Results
The table conveys how strong the serial correlation is. Autocorrelation, also
known as “serial correlation” or “lagged correlation”, describes the
relationship between observations of the same variable over specific periods of time.
If a variable’s serial correlation is measured to be zero, there is no
correlation and each of the observations is independent of one another. Conversely, if
a variable’s serial correlation is close to one, the observations are
serially correlated and future observations are affected by past values. Essentially,
a variable that is serially correlated has a pattern and is not random.
Here, the table has 838 observations from January 1999 to January 2015 with 10 lags. The series
is strongly serially correlated: its AC values are close to one and decline steadily
across the lags without fluctuation.
Question C: Econometrically, what does “SIC” stand for? (4 CP)
Answer C:
SIC means Schwarz Information Criterion. It is a criterion for selecting the best model
among a finite set of models. The core idea is that the model with the lowest SIC should
be preferred. The SIC also offers a potentially useful forecast combination approach, and
further investigation is warranted. For example, in Monte Carlo experiments, combination forecasts from a
simple averaging approach dominate SIC combination
forecasts less than 25% of the time in most cases, while other ‘standard’
combination approaches fare even worse.
Question 8: For the 48 US states we wanted to determine the factors that influence the
traffic fatality rate during the 1980ies. Please analyze the following five models. Let us
know which one of the models you prefer. Support your best choice by valid arguments.
Answer 8:
After analyzing all the model one by one, in which I analyze the most important
term i.e value of Adjusted R2, Standard
error, P-value and the final numbers of the
independent variable. In the end, I came to prefer model 5. Because it adjusted R2
contains 92% it means model explains 92% variation in the dependent variable (Y)
due to the in the dependent variable (X), thus 18% remain unexplained but it still
has a high explanatory power. Moreover, it p-value has contained the minimum
numeric number and the valid argument is their standard errors are very low it
means it contains very low variation in errors and the last reason is it carries more
independent variables as compared to the other. In conclusion, I would the model
which contain maximum number adjusted R2, containing the low figure of standard
errors as well as P-Value.
In addition:
· There is a negative relationship between the beer tax and the fatality rate, i.e. an
increase in the beer tax should lead to a drop in the fatality rate.
· There is a negative relationship between the unemployment rate and the fatality rate,
i.e. a higher unemployment rate is associated with fewer fatalities; an increase in the
unemployment rate of 1 percentage point is estimated to change traffic fatalities by
-7.70 deaths per 10,000.
· A positive relationship can be seen between per capita income and the fatality rate.
When income is high, alcohol consumption is high, and therefore the fatality rate (FR)
remains high.
The strength of this analysis is that including state and time fixed effects addresses the
threat of omitted variable bias arising from unobserved variables that either do not
change over time (e.g. cultural attitudes) or do not vary across states.
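The fixed effects argument can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the variable names (beer_tax, fatality), the true slope of -0.5 and the balanced panel of 48 states over 7 years. It is not the data or the estimator behind the five models in the question.

```python
import numpy as np

def group_mean(v, groups):
    """Return, for every observation, the mean of v within its group."""
    _, idx = np.unique(groups, return_inverse=True)
    return (np.bincount(idx, weights=v) / np.bincount(idx))[idx]

def two_way_demean(v, entity, time):
    """Remove entity and time means (valid for a balanced panel)."""
    return v - group_mean(v, entity) - group_mean(v, time) + v.mean()

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

# Synthetic balanced panel (hypothetical numbers, not the real traffic data).
rng = np.random.default_rng(0)
n_states, n_years = 48, 7
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)
state_effect = rng.normal(size=n_states)[state]   # unobserved, constant over time
year_effect = rng.normal(size=n_years)[year]      # unobserved, common to all states
beer_tax = 0.8 * state_effect + rng.normal(size=state.size)
fatality = (-0.5 * beer_tax + state_effect + year_effect
            + rng.normal(scale=0.3, size=state.size))

# Pooled OLS ignores the state effect and is biased by it.
beta_pooled = slope(beer_tax, fatality)
# The within estimator removes state and year effects before regressing.
beta_fe = slope(two_way_demean(beer_tax, state, year),
                two_way_demean(fatality, state, year))
print(f"pooled: {beta_pooled:.3f}, fixed effects: {beta_fe:.3f}, true: -0.5")
```

In this setup the pooled slope is pulled towards zero because the omitted state effect is correlated with the regressor, while the two-way demeaned regression should recover a value close to the assumed slope.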
Question 9:
Question A:
Which of the following model specifications for forecasting the US/EUR exchange rate do
you prefer? (4 CP)
Answer A:
I prefer the first model because of its p-values. In the other models, the p-values of
most variables are above the significance benchmark, which makes those variables
insignificant. In addition, the difference in adjusted R-squared between the models is
not considerable, and the standard error of the first model is the lowest of all the
models. Hence, I would prefer the first model.
Paper 1 Feb 2013
Q1- A) What are the general problems that you might run into?
Answer: The problems that can be faced are as follows:
i. Stationarity
A common assumption in many time series techniques is that the data is stationary, i.e.
the mean, variance and autocorrelation structure do not change over time.
ii. Non-stationarity
The two most important types of non-stationarity in economic time series
data are trends and breaks, which are explained below.
iii. TRENDS:
A trend is a persistent long-term movement of a variable over time. A
time series fluctuates around its trend.
a- Deterministic trend
b- Stochastic trend
iv. BREAKS
A structural break is an unexpected shift in a time series that can lead to
huge forecasting errors and unreliability of the model in general.
v. Heteroscedasticity
The error terms are assumed to have a constant variance across observations
and over time; if the variance depends on the value of X, then
heteroscedasticity exists.
vi. Autocorrelation
The error terms are assumed to be uncorrelated with each other. If this
assumption is violated and the errors follow a pattern, autocorrelation is
present, which shows that something is wrong with the regression model.
Autocorrelation is a common problem in time-series regressions.
B) How do you structure your analysis?
Answer) A non-stationary time series has a time-varying mean, a time-varying
variance, or both. Therefore we need to find out whether the time series is
stationary and, if it is not, which transformation makes it stationary. Two tests of
stationarity are as follows:
i. Graphical analysis: Plot the time series, which gives a clue about its
nature. E.g. GBP shows an upward trend, suggesting that the mean of GBP is
changing, which means it is non-stationary.
ii. Correlogram: Plot the correlogram of the data to check for
stationarity. Usually, the first difference of the data is stationary.
AC is the sample autocorrelation function; the solid vertical line
represents the zero axis. Observations above the line are positive values
and those below the line are negative. For a purely white noise
process, the AC at various lags hovers around zero; if the AC of a series
behaves this way, the time series is stationary.
1- The choice of lag length
Compute the ACF up to one third to one quarter of the length of the time
series, or use a lag length selection criterion.
2- Testing for a break
Use the Chow breakpoint test and the Quandt-Andrews breakpoint test.
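The correlogram check can be sketched as follows. This is a minimal illustration on a simulated random walk, not the GBP series from the exam; the function name sample_acf and all numbers are assumptions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append((d[k:] @ d[:-k]) / denom)
    return np.array(acf)

rng = np.random.default_rng(1)
n = 500
level = np.cumsum(rng.normal(size=n))   # random walk: stochastic trend, non-stationary
diff = np.diff(level)                   # first difference: white noise by construction

acf_level = sample_acf(level, 10)
acf_diff = sample_acf(diff, 10)
band = 2.0 / np.sqrt(len(diff))         # rough 95% band for white noise

print("levels     :", np.round(acf_level[1:4], 2))
print("differences:", np.round(acf_diff[1:4], 2), "band: +/-", round(band, 2))
```

The autocorrelations of the level series start close to one and die out slowly, which is the typical picture for a non-stationary series, while those of the first difference stay near zero.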
c) What kind of information is provided in table 1? How could you make use of
the above information when building a forecasting model?
Direction of causality    No. of lags    F-value    P-value    Decision
Answer) It is a scatter plot with the natural rate of unemployment as the
independent variable (X) and the vacancy rate as the dependent variable (Y).
The regression line shows a moderate positive relationship, and the scatter
of the dots around the line represents the variation in the data.
B) What about the quality of the information given? Which kind of
information would you have provided the reader with?
Answer) The graph gives only the value of R-squared, whereas for a
quality analysis we also need the p-values, the standard errors and the
adjusted R-squared.
C) Are you expecting a change in the relationship between the vacancy rate
and the natural rate of unemployment prior to versus after the 2007 US
recession? Support your statement with some arguments.
Answer) If we look at these variables before the recession, we see
a weak relationship between them. After the recession, the
unemployment rate and the vacancy rate are both increasing. However, at
an unemployment rate of 6.5% we see no vacancy rate, which suggests
that there is no relationship between these variables during the recession period.
D) For the second quarter of 2012, the Congressional Budget Office estimated
the natural rate of unemployment to be 5.36%, the US job vacancy
rate being 2.6%. How good is the fit of the above-mentioned model?
Answer) According to the data, the model explains only 33% of the variation and
67% remains unexplained. So, according to the above model, many
other factors affect the vacancy rate and the natural rate of unemployment.
Q3- A) How do you assess the data presented in the graph? Is there a
difference in perception depending on the religious background of
each country?
Answer) According to the graph, the shares of believers in hell and in heaven
are almost equal in Pakistan, Turkey, Saudi Arabia and Egypt, whereas almost all
non-Catholics believe in heaven. Countries whose populations believe in
heaven have higher crime rates than the others.
C) By calculating the so-called “Rate of Belief” as dependent variable,
do the authors get rid of the problem of multicollinearity?
Answer) Ways to get rid of the multicollinearity problem:
1- Remove one of the independent variables that are highly correlated with
each other.
2- Collect more data, since multicollinearity can result from
insufficient data.
3- Use dummy variables correctly, since incorrectly used dummy variables
can cause multicollinearity.
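Remedy 1 can be illustrated with variance inflation factors (VIF), a standard diagnostic that the original answer does not mention. The sketch below uses synthetic data; the column names belief_heaven, belief_hell and income are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2 of that
    column regressed on all other columns plus an intercept)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centred = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
n = 200
belief_heaven = rng.normal(size=n)
belief_hell = belief_heaven + 0.1 * rng.normal(size=n)   # nearly collinear with heaven
income = rng.normal(size=n)

vif_before = vif(np.column_stack([belief_heaven, belief_hell, income]))
vif_after = vif(np.column_stack([belief_heaven, income]))  # one collinear regressor dropped
print("before:", np.round(vif_before, 1))
print("after :", np.round(vif_after, 1))
```

A common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity. Dropping one of the two nearly identical regressors brings the remaining values close to 1.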
Q5-A) How do you assess the quality of model 1?
Answer) The quality of this model is weak: with a value of 0.0224, i.e. only
2.24%, it has little explanatory power. In terms of the standard error of the
regression, however, it shows a good figure with only very minor errors.
D) What is the idea of estimating models 2-5? Explain briefly.
Answer) The idea behind estimating models 2-5 is to explain the EUR to USD
exchange rate by the least squares method with a multiple
regression model. Moreover, we are also interested in the
adjusted R-squared (explanatory power) and, last but not least, the standard
error of the regression.
E) Which one of the models would you use for solving the task
asked in D?
Answer) In my opinion, I would use model 2 for
solving the task, because it has the highest explanatory power, i.e. 18%,
in contrast to the other models. It explains 18% and 82% remains
unexplained, but this is better than nothing, and it contains the largest
number of independent variables.
Q6) Which is your best model? Describe the structure of the model
you chose.
Answer) I would prefer model 1 because it contains the largest number of
independent variables, almost 9. These variables cover almost every
area, from education to gender, which gives a good result and increases
the value of the McFadden R-squared. Furthermore, the most important
reason for this selection is the McFadden R-squared of 40%,
which is the highest of all the models: model 1 has the most
explanatory power, with 40% explained and 60% unexplained. Moreover, it
has the lowest standard error of regression, i.e. 22%. Model 1 contains
more data and information than the other models.
Q7) Which one of the models do you prefer? Support your choice with
valid arguments.
Answer) On the basis of the adjusted R-squared I would prefer
model 5, because it has the highest explanatory power: it explains 92% of
the variation and only 8% remains unexplained. The second reason to prefer
this model is that it contains more independent variables than the other
regression models. The third reason is that its p-values are highly significant.
I analyzed the models one by one, looking at the most important measures: adjusted R2,
the standard error, the p-values and the number of independent variables. In the end, I
prefer model 5. Its adjusted R2 is 92%, which means the model explains 92% of the
variation in the dependent variable (Y) through the independent variables (X); 8%
remains unexplained, so the model has high explanatory power. Moreover, its p-values
are the lowest, its standard errors are very low, meaning the coefficients are estimated
precisely, and it includes more independent variables than the other models. In
conclusion, I would prefer the model with the highest adjusted R2 and the lowest
standard errors and p-values.
In addition:
· There is a negative relationship between the beer tax and the fatality rate, i.e. an
increase in the beer tax should lead to a drop in the fatality rate.
· There is a negative relationship between the unemployment rate and the fatality rate,
i.e. a higher unemployment rate is associated with fewer fatalities; an increase in the
unemployment rate of 1 percentage point is estimated to change traffic fatalities by
-7.70 deaths per 10,000.
· A positive relationship can be seen between per capita income and the fatality rate.
When income is high, alcohol consumption is high, and therefore the fatality rate (FR)
remains high.
The strength of this analysis is that including state and time fixed effects addresses the
threat of omitted variable bias arising from unobserved variables that either do not
change over time (e.g. cultural attitudes) or do not vary across states.
Q8-B) How do these four test statistics differ from each other?
Answer) The four test statistics below differ in their adjusted R-squared
values, p-values and standard errors of regression. These three components
are the most important elements for distinguishing between regression models.
The number of independent variables in the regression model also matters a lot;
in most cases a larger number of independent variables
provides more accurate results. I think these four
elements are what make the four regression models differ from each other.
C) What is your interpretation of the analysis of the four tests below?
Answer) I would interpret all four models on the basis of the elements
that are considered most important in economics and finance. Test 1 has an
adjusted R-squared of only 0.0016%, which is very low, and 84% remains
unexplained by it. Test 2 has 0.0002%, which is considered very poor
explanatory power. Test 3 has 49%, which is considered moderate for
explaining the variability in the dependent variable. Test 4 explains the same
amount of variability as test 3, i.e. 49%. Furthermore, the standard error is
0.0079% in test 1, 0.0079% in test 2, 0.0079% in test 3 and 0.0079% in test 4.
I also considered the p-values when interpreting the tests below.
Question 2017
Question 1
A. What kind of diagram is this chart – anything special about it?
ANSWER: It is a scatter plot showing the relationship between the labour share
and income inequality (Gini coefficients). The special thing about it is that it
shows both gross and net results at the same time. There is a negative
relationship between the variables because the graph shows a downward-sloping
regression line. The adjusted R-squared values of 13% (gross) and 10% (net) show
that the relationship between them is very weak.
B. How would you measure the strength of the relationship between a country’s
labor share (in percent GDP) and Gini coefficient (49 countries)? (3 CP)
ANSWER: The relationship between these variables can be measured with the
R-squared value and the regression lines in the graph. The regression line exhibits
a negative relationship between the variables because it slopes downward
rather than upward.
C. Of what quality are the estimated relationships? (4 CP)
Answer: The strength of a relationship between two variables is commonly measured by
R-squared, which in a simple regression equals the squared correlation coefficient. So
for this graph we use it as the benchmark as well. Two measures of income inequality
are taken into consideration and related to the labour share: the gross level and
the net level. At the gross level the relationship between the variables is a bit
stronger than at the net level, but in both cases it is a negative and weak
relationship.
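The link between the correlation coefficient and R-squared in a regression with one explanatory variable can be checked numerically. The data below are synthetic and the names labour_share and gini are only illustrative; the numbers do not reproduce the chart in the question.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical cross-section of 49 countries (made-up numbers).
labour_share = rng.uniform(40.0, 70.0, 49)
gini = 50.0 - 0.25 * labour_share + rng.normal(0.0, 6.0, 49)

# Correlation coefficient: sign gives the direction, magnitude the strength.
r = np.corrcoef(labour_share, gini)[0, 1]

# Simple OLS regression of gini on labour_share.
slope, intercept = np.polyfit(labour_share, gini, 1)
fitted = intercept + slope * labour_share
ss_res = np.sum((gini - fitted) ** 2)
ss_tot = np.sum((gini - gini.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, R-squared = {r_squared:.3f}")
```

R-squared alone does not show the direction of the relationship; the sign comes from the correlation coefficient or the slope.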
D. Might heteroscedasticity pose a problem in the above model? Can you also
describe it graphically? If the data is heteroscedastic, what problem do we run
into with OLS? What is the solution to this problem?
Answer:
Yes, heteroscedasticity may be a problem, because the spread of the errors around the
regression line is not constant: it changes with the value of the explanatory variable.
If the errors do not have a constant spread around the regression line, there is
heteroscedasticity.
In simple terms, heteroscedasticity is any set of data that isn’t homoscedastic. More
technically, it refers to data with unequal variability (scatter) across the values of
a second, predictor variable.
Why do we care about whether or not data is heteroscedastic? Most of the time in
statistics, we don’t. But if you’re running any kind of regression analysis,
heteroscedasticity can ruin your inference: the OLS coefficients remain unbiased, but
the usual standard errors are biased. Therefore, you’ll want to check whether your
data has this condition. One way to check is to make a scatter graph
(which is always a good idea when you’re running a regression anyway). If your graph
has a rough cone shape (like the one above), you’re probably dealing with
heteroscedasticity. You can still run a regression analysis, but the usual standard
errors and significance tests will not be reliable.
Severely heteroscedastic data can give you a variety of problems:
● OLS will not give you the estimator with the smallest variance (i.e. OLS is
no longer efficient).
● Significance tests will run either too high or too low.
● Standard errors will be biased, along with their corresponding test statistics
and confidence intervals.
If your data is heteroscedastic, it is inadvisable to run a regression on the
data as is. There are a few things you can try if you need to run a regression:
1. Give data that produces a large scatter less weight (weighted least squares).
2. Transform the Y variable to achieve homoscedasticity. For example, use the
Box-Cox normality plot to transform the data.
3. Keep the OLS coefficients but use heteroscedasticity-robust standard errors.
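The weighting remedy can be sketched on simulated data with a cone-shaped scatter. The assumption that the error standard deviation is proportional to x, the true coefficients and the sample size are all choices made for this illustration. The split-sample variance ratio is only an informal check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(1.0, 10.0, n)
# Error standard deviation grows with x, which produces a cone-shaped scatter.
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)

X = np.column_stack([np.ones(n), x])

# Ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Informal check: residual variance for large x versus small x.
order = np.argsort(x)
low, high = resid[order[: n // 2]], resid[order[n // 2:]]
spread_ratio = high.var() / low.var()

# Weighted least squares, assuming Var(error_i) is proportional to x_i**2:
# dividing every row by x_i gives observations with a large scatter less weight.
w_sqrt = 1.0 / x
beta_wls, *_ = np.linalg.lstsq(X * w_sqrt[:, None], y * w_sqrt, rcond=None)

print("variance ratio (high x / low x):", round(spread_ratio, 2))
print("OLS slope:", round(beta_ols[1], 3), "WLS slope:", round(beta_wls[1], 3))
```

Both estimators target the same slope. The gain from weighting is precision, and it depends on the assumed variance pattern being roughly right.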
Question 2
What are the various threats to the internal validity of a multiple regression
model?
Internal validity: A statistical analysis is internally valid if the statistical inferences about
causal effects are valid for the population being studied. The various threats to the internal
validity of a multiple regression model are listed below:
1. Omitted Variable Bias: OVB arises when a variable that both determines Y and is
correlated with one or more of the included regressors is omitted from the regression.
The bias persists in large samples, so the OLS estimator is inconsistent.
2. (Wrong) Functional Form Misspecification (FFM): FFM arises when the functional form of
the estimated regression function differs from the functional form of the population
regression function. If the functional form is misspecified, then the estimator of the
partial effect of a change in one of the variables will be biased.
3. Errors-in-Variables: EIV bias in the OLS estimator arises when an independent variable is
measured imprecisely. The bias depends on the nature of the measurement error and
persists even if the sample size is large.
4. Sample Selection: SS bias arises when a selection process influences the availability of
data and that process is related to the dependent variable. Sample selection induces
correlation between one or more regressors and the error term, leading to bias and
inconsistency of the OLS estimator.
5. Simultaneous Causality Bias: SCB, also called simultaneous equations bias, arises in a
regression of Y on X when, in addition to the causal link of interest from X to Y, there is
a causal link from Y to X. This reverse causality makes X correlated with the error term in
the population regression of interest.
Lastly, incorrect calculation of the standard errors also poses a threat to internal validity.
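The first threat can be made concrete with a small simulation. The coefficients and variable names are hypothetical. The sketch checks the standard in-sample identity: the slope from the short regression equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # determines y and is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x2]), y)   # correctly specified regression
b_short = ols(np.column_stack([ones, x1]), y)      # x2 omitted
delta = ols(np.column_stack([ones, x1]), x2)       # auxiliary regression of x2 on x1

implied = b_long[1] + b_long[2] * delta[1]
print("long slope on x1 :", round(b_long[1], 3))
print("short slope on x1:", round(b_short[1], 3))
print("implied by OVB formula:", round(implied, 3))
```

Collecting more observations does not shrink the gap between the two slopes, which is why the omitted variable makes the estimator inconsistent rather than merely noisy.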
Question 3
Answer: The central forecast is represented by the darkest line in the middle of the graph.
The gradually spreading fan depicts the growth in risks around the central view,
highlighting the fact that the degree of uncertainty (forecast error) grows over time. Each
pair of equally coloured bands, below and above the central projection, extends the
interval in which the future inflation value is expected to lie by a size corresponding to
a 10% increase in probability over the preceding interval; the bands therefore act as
confidence intervals. The outermost two bands raise the coverage to the final 90% level.
This means that, according to the forecast in Graph 1, inflation at the end of 2020 will
lie within the range 1.7% to 5% with 90% probability.
B) In June 2016, before the British referendum on exiting the EU, the IMF
published a country report on the macroeconomic risks for the UK. How do the
three different scenarios compare with the GDP projections of the most recent
report of the Bank of England?
Answer) Among these scenarios, the limited scenario represents the
referendum outcome preferred by the people who wanted to stay in the EU; its result is
that the GDP of the UK decreases by almost 5%. The adverse scenario represents
the case in which the UK exits the EU, in which its GDP falls by almost 15%.
C) What are the underlying risks in general, when economists undertake
forecasts?
Answer)
i. GDP
ii. Unemployment Rate
iii. Exchange rate
iv. Inflation rate
v. Economic recession
vi. Past data
vii. Interest rate
viii. Other Economic indicators.
Q4-A) What kind of diagram is this chart? Anything special about it?
Answer) This is a line graph of time series. The figure shows the historical evolution
of the labor share and inequality in the United Kingdom, for which both series are
available for a long period. It indicates that labor shares were largely flat during the
first Industrial Revolution (usually referring to 1760–1820/1840), as early
19th-century mechanization was able to replace only a limited number of human
activities: it affected only some parts of the economy while increasing the demand
for labor complementary to the capital goods embodied in new technologies
(Mokyr 2002).
B) How would you measure the strength of the relationship between UK’S labor
share in percentage GDP and GINI coefficient?
Answer) Profit and capital shares (including net income) increased during the
1850s to 1870s at the expense of labor, as adoption of major labor-saving
technologies spread across the economy, including steam transportation, the large
scale manufacture of machine tools, and the use of machinery in steam-powered
factories. Labor shares initially increased during the Second Industrial Revolution
(1870–1914) as profits fell during the Long Depression (1873–96), in line with the
(countercyclical) behavior of labor shares during the recent global financial crisis.
In summary, technological progress during various episodes of industrialization
was associated with declines in labor shares during certain phases and for some
groups of workers—and with increases in inequality. Although the effects of
technology on these changes are difficult to quantify, the level of inequality at its
historical peak (typically around the late 19th to early 20th centuries in rich
countries) was considerably higher than it is today.
C) What challenges might you run into when trying to estimate the relationship?
Answer) Measures of inequality (based on social tables and housing wealth and tax
statistics) are more widely available for the earlier period and are likely to be
correlated with labor shares given that capital and land ownership were highly
concentrated. Moreover, there was likely less overlap at that time between capital
and labor income than there is today. Second, disentangling the relative importance of
various drivers is even more difficult for the historical episodes than for the more
recent period, as the evolution of labor shares may reflect not only technological
change, but also its interaction with other forces, such as increasing international
trade, the scarcity of labor, and policies and institutions.
D) What are your expectations for the quality of fit of your above relationship?
Answer) I expect the analysis to be good enough to reveal the factors that
were ignored in the calculation of the labour share and inequality. However, one
cannot explain everything with a single variable over such a long period of time.
E) How could you improve your analysis?
Answer) I can improve my analysis by adding some important elements. First of all, I
would use shorter intervals between observations. Then I would add relevant variables
that capture technological innovation and other developments in the industrial
world that can affect the labour share.
Question 5
Which is your best model? Describe the structure of the model you chose.
Answer) According to the given data, I would prefer model 1 because it contains
the largest number of independent variables, almost 9. These variables cover
almost every area, from education to gender, which gives a good result and
increases the value of the McFadden R-squared. Furthermore, the most important
reason for this selection is the McFadden R-squared of 40%,
which is the highest of all the models: model 1 has the most
explanatory power, with 40% explained whereas 60% remains unexplained.
This value also shows how strong the relationship between the
variables is. Moreover, it has the lowest standard error, i.e. 22%.
Model 1 contains more data and information than the other models.
Q6) Please describe the graph above and the general problem that the
professional forecasters ran into.
Answer) This figure shows the real-time median forecast of the log of nonfarm
employment recorded by the Survey of Professional Forecasters in the quarters
leading up to and through the financial crisis. Even well after the crisis began and
real-time information about the collapse of the economy was available, these
forecasters consistently predicted a mild recession. The diagram is a time series
chart of the kind commonly used for forecasting. From 2005 to almost 2008 the
number of jobs rises steadily; it then follows a consistent downward trend
from 2008 until 2010, followed by a very minor recovery.
This graph looks like the Mother of All Forecast Errors. A small part of these errors is
due to revisions between preliminary and final data, but most of these errors, we
believe, represent a failure of forecasting models. Another open challenge lies in the
big data sphere. There is some evidence that the parametric
restrictions (or priors) that make these methods work discard potentially important
information.
Q-7: How do you interpret the above results presented in the regression model?
What about their quality?
ANSWER: There is one dependent variable in the model, and its relationship with
different independent variables is examined: Democrat, Republican, daily
media time, social media, education, undecided and age. A positive coefficient on an
independent variable shows that it has a positive relationship with the dependent
variable, and a negative coefficient shows a negative relationship.
The stars next to the standard errors indicate the significance level of each
coefficient: more stars mean the coefficient is significant at a stricter level, one
star means it is comparatively less significant, and no star means the relationship
is not statistically significant.
Question 8
We would like to analyze the development of the US leading interest
rate (US Federal Funds Rate). For a first impression we ran the following three
tests. What did we (want to) find out? Please explain the nature of these tests.
Answer:
Direction of causality              No. of lags   F-value   P-value      Decision
UNRATE does not cause FED FUNDS     2             15.3146   0.0000003    Reject the null hypothesis
FED FUNDS does not cause UNRATE     2             10.2276   0.00004      Reject the null hypothesis
UNRATE does not cause DFED FUND     2             18.8987   0.00000001   Reject the null hypothesis
DINFL does not cause DFED FUND      2             1.40041   0.2471       Do not reject the null hypothesis
DFED FUND does not cause DINFL      2             10.2295   0.00004      Reject the null hypothesis
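Assuming the table reports Granger causality tests with two lags, the F-statistic behind each row can be sketched as below. The series are simulated, and the function names (lagmat, granger_f) and coefficients are hypothetical; this is not the code or data that produced the table.

```python
import numpy as np

def lagmat(v, p):
    """Matrix with columns v[t-1], ..., v[t-p] for t = p, ..., len(v)-1."""
    n = len(v)
    return np.column_stack([v[p - k: n - k] for k in range(1, p + 1)])

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_f(y, x, p=2):
    """F-statistic for H0: the p lags of x do not help to predict y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    restricted = np.column_stack([np.ones(len(target)), lagmat(y, p)])
    unrestricted = np.column_stack([restricted, lagmat(x, p)])
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    df_resid = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_resid)

# Simulated example in which x leads y, but y does not lead x.
rng = np.random.default_rng(7)
n = 500
x = np.zeros(n)
y = np.zeros(n)
ex, ey = rng.normal(size=n), rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + ex[t]
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + ey[t]

f_x_to_y = granger_f(y, x, p=2)
f_y_to_x = granger_f(x, y, p=2)
print("H0: x does not cause y, F =", round(f_x_to_y, 2))
print("H0: y does not cause x, F =", round(f_y_to_x, 2))
```

A large F-statistic (small p-value) rejects the null that the lags add no predictive power. For forecasting, such a variable is a candidate regressor in the model for the other series.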
Q9-A) Is there an economic relationship between recessions and the federal
funds rate?
Answer) The Federal Reserve has tools to control interest rates and the federal funds
rate. During a recession, the Fed usually tries to push rates downward to stimulate the
economy. When a recession is on, people become reluctant to borrow money and are more
inclined to save what they have; the low demand for credit means that the federal funds
rate and interest rates move downward.
B) What factors might influence the federal funds rate?
Answer) The factors that influence the federal funds rate include the
inflation rate, the unemployment rate, economic recessions, interest rates, the exchange
rate and, last but not least, the level of exports.
C) How do you assess the information below?
Answer) The table conveys how strong the serial correlation is. The term
autocorrelation is also known as “serial correlation” or “lagged correlation”, which is
used to describe the relationship between observations of the same variable over
specific periods of time.
If a variable’s serial correlation is measured to be zero, there is no
correlation and the observations are independent of one another. Conversely, if
a variable’s serial correlation is close to one, the observations are
serially correlated and future observations are affected by past values. Essentially,
a variable that is serially correlated has a pattern and is not random.
Here, the table has 808 observations from January 1948 to April 2016 with 10 lags. The
series is strongly serially correlated, as its AC value is close to 1 and declines
steadily across the lags without fluctuation.
Extension of C Part (second table)
Answer) After analyzing both models, I would recommend the second model
because it has the most suitable p-value (0.0000): here we reject the null
hypothesis, and the model is considered highly significant. The first model has a
highly insignificant p-value (0.1234), so for the first model we cannot reject H0.
D) Based on the following table, what is the correct lag structure of an autoregression
model for the monthly interest rate (Fed Fund)? Give some brief arguments.
Answer) After assessing all the lag structures, I would recommend 4 lags
because of the high adjusted R-squared of 0.170 (17%), which means more explanatory
power than the other lag structures, together with low values of the Akaike criterion
(AIC) and the Schwarz criterion (BIC). Lower values of AIC and BIC indicate a better
lag structure.
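Lag selection by information criteria can be sketched as follows. The series is a simulated AR(2) process, not the Fed Funds data, and the helper name fit_ar_ic is hypothetical. All candidate models are fitted on the same sample so that their criteria are comparable.

```python
import numpy as np

def fit_ar_ic(y, p, max_p):
    """Fit AR(p) by OLS on the common sample t = max_p, ..., n-1 and
    return (AIC, BIC) based on the Gaussian likelihood, up to a constant."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n_eff = len(target)
    lags = [y[max_p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n_eff)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = (resid @ resid) / n_eff
    k = p + 1                                   # lag coefficients plus intercept
    aic = n_eff * np.log(sigma2) + 2 * k
    bic = n_eff * np.log(sigma2) + k * np.log(n_eff)
    return aic, bic

# Simulated AR(2) process with hypothetical coefficients.
rng = np.random.default_rng(8)
n = 2000
y = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + e[t]

max_p = 6
results = {p: fit_ar_ic(y, p, max_p) for p in range(1, max_p + 1)}
best_aic = min(results, key=lambda p: results[p][0])
best_bic = min(results, key=lambda p: results[p][1])
for p, (aic, bic) in results.items():
    print(p, round(aic, 1), round(bic, 1))
print("lag chosen by AIC:", best_aic, " lag chosen by BIC:", best_bic)
```

BIC penalizes extra lags more heavily than AIC once the sample has more than about eight observations, so it never selects a longer lag than AIC on the same sample.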
E) Which one of the models below do you prefer to base inflation forecasts on?
The dependent variable of all four models is the Fed Fund rate and the method is
least squares. Additional variables: unemployment rate, CPI inflation.
Answer) After analysing all the regression models below, I would prefer
model 5 because of its high adjusted R-squared of 22%: it explains 22% of the
variability in the dependent variable through the independent variables and 78% remains
unexplained, but this is still better than the other models. It also has low p-values
and a larger number of independent variables.