Assignment no. 2
Question No. 1
Answer:
The Dummy Variable Trap occurs when two or more dummy variables
created by one-hot encoding are highly correlated (multicollinear). This means that
one variable can be predicted from the others, making it difficult to interpret
the estimated regression coefficients. In other words, the individual
effect of each dummy variable on the prediction cannot be interpreted reliably
because of multicollinearity.
With one-hot encoding, a new dummy variable is created for each category of
a categorical variable to represent the presence (1) or absence (0) of that
category. For example, if tree species is a categorical variable
taking the values pine or oak, then tree species can be represented as
dummy variables by converting each value to a one-hot vector. This means that
a separate column is obtained for each category, where the first column
represents whether the tree is pine and the second column represents whether the tree is oak.
Each column contains a 1 if the tree in question is of that column's species, and a 0 otherwise.
These two columns are multicollinear, since if a tree is pine, then we know it is
not oak, and vice versa. Because a 1 in the pine column implies a 0 in the oak
column, we can write x_pine = 1 - x_oak. This results in two multicollinear dummy
variables, so the dummy variable trap may occur in regression analysis.
To overcome the Dummy variable Trap, we drop one of the columns created
when the categorical variables were converted to dummy variables by one-hot
encoding. This can be done because the dummy variables include redundant
information.
Suppose the regression equation is y = β0 + β1·x_pine + β2·x_oak. Substituting
x_oak = 1 - x_pine gives y = (β0 + β2) + (β1 - β2)·x_pine. The regression
equation can thus be rewritten using only x_pine, where the new coefficients to
be estimated are (β0 + β2) and (β1 - β2). By dropping a dummy variable column,
we can avoid this trap.
This example shows two categories, but the approach extends to any
number of categories. In general, if a categorical variable has p
categories, we use p - 1 dummy variables, dropping one dummy variable to
guard against the dummy variable trap.
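The drop-one-column fix can be sketched in plain Python. This is a minimal sketch; the function name and the tree-species data are made up for illustration:

```python
def one_hot_drop_first(values, categories=None):
    """Encode category labels as p-1 dummy columns, dropping the
    first (baseline) category to avoid the dummy variable trap."""
    if categories is None:
        categories = sorted(set(values))
    kept = categories[1:]  # drop the first category; it becomes the baseline
    return [[1 if v == c else 0 for c in kept] for v in values]

# Two categories -> a single dummy column (here "pine"; "oak" is the baseline).
species = ["pine", "oak", "pine", "oak"]
encoded = one_hot_drop_first(species)
```

A row of all zeros then unambiguously means the baseline category, so no information is lost by dropping the column.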
Question No. 2
What is the use of the Chow test? Describe the steps applied in the Chow test.
If we used one regression line to summarize the pattern in the entire dataset,
it might mask a change in the underlying relationship partway through the data.
The Chow test allows us to test for whether or not the regression coefficients
of each regression line are equal. If the test determines that the coefficients
are not equal between the regression lines, this means there is significant
evidence that a structural break exists in the data. In other words, the pattern
in the data is significantly different before and after that structural break point.
Suppose we fit the following regression model to the entire dataset:

yt = a + b·x1t + c·x2t + ε
Then suppose we split our data into two groups based on some structural
break point and fit the following regression models to each group:
yt = a1 + b1·x1t + c1·x2t + ε
yt = a2 + b2·x1t + c2·x2t + ε
We would use the following null and alternative hypotheses for the Chow test:

H0: a1 = a2, b1 = b2, and c1 = c2 (the coefficients of the two regression lines are equal).
HA: At least one of the coefficients is not equal.
If we reject the null hypothesis, we have sufficient evidence to say that there
is a structural break point in the data and two regression lines can fit the data
better than one.
If we fail to reject the null hypothesis, we do not have sufficient evidence to
say that there is a structural break point in the data. In this case, we say
that the regression lines can be “pooled” into a single regression line that
represents the pattern in the data sufficiently well.
Step 2: Calculate the test statistic.
If we define SC as the residual sum of squares from the pooled regression, S1
and S2 as the residual sums of squares from the two group regressions, N1 and
N2 as the two group sample sizes, and k as the number of estimated parameters
per model, then the test statistic is:

F = [(SC - (S1 + S2)) / k] / [(S1 + S2) / (N1 + N2 - 2k)]

This test statistic follows the F-distribution with k and N1 + N2 - 2k degrees of
freedom.
If the p-value associated with this test statistic is less than a certain
significance level, we can reject the null hypothesis and conclude that there is
a structural break point in the data.
Fortunately, most statistical software is capable of performing a Chow test so
you will likely never have to perform the test by hand.
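As a sketch of the computation, here is the Chow statistic for a simple one-regressor model in plain Python. The data, variable names, and break point are hypothetical, and k = 2 (intercept plus slope per model):

```python
def rss_linear(xs, ys):
    """Residual sum of squares from a simple OLS fit y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for a candidate break between two groups."""
    s_c = rss_linear(x1 + x2, y1 + y2)               # pooled RSS
    s1, s2 = rss_linear(x1, y1), rss_linear(x2, y2)  # per-group RSS
    n = len(x1) + len(x2)
    return ((s_c - (s1 + s2)) / k) / ((s1 + s2) / (n - 2 * k))

# Hypothetical data with an obvious change in slope after x = 4.
x1, y1 = [1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9]
x2, y2 = [5, 6, 7, 8], [10.0, 21.0, 29.0, 41.0]
f_stat = chow_f(x1, y1, x2, y2)  # large F suggests a structural break
```

Comparing f_stat against the F critical value with k and N1 + N2 - 2k degrees of freedom completes the test.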
Question No. 3
Explain the concept of ANOVA. How is an ANOVA table constructed?
Answer:
We will next illustrate the ANOVA procedure using the five step approach.
Because the computation of the test statistic is involved, the computations are often
organized in an ANOVA table. The ANOVA table breaks down the components of
variation in the data into variation between treatments and error or residual variation.
Statistical computing packages also produce ANOVA tables as part of their standard
output for ANOVA, and the ANOVA table is set up as follows:
Source of Variation    Sums of Squares (SS)     Degrees of Freedom (df)   Mean Squares (MS)    F
Between Treatments     SSB = ∑ nj(X̄j - X̄)²      k-1                       MSB = SSB/(k-1)      F = MSB/MSE
Error (or Residual)    SSE = ∑∑ (X - X̄j)²       N-k                       MSE = SSE/(N-k)
Total                  SST = ∑∑ (X - X̄)²        N-1
where
X = individual observation,
X̄j = sample mean of the jth treatment group,
X̄ = overall sample mean,
k = number of treatment groups, and
N = total number of observations.
The fourth column contains the mean squares (MS), which are computed by
dividing sums of squares (SS) by degrees of freedom (df), row by row.
Specifically, MSB = SSB/(k-1) and MSE = SSE/(N-k). Dividing SST by (N-1)
produces the variance of the total sample. The F statistic is in the rightmost
column of the ANOVA table and is computed as the ratio MSB/MSE.
Example:
A clinical trial is run to compare weight loss programs and participants are randomly
assigned to one of the comparison programs and are counseled on the details of the
assigned program. Participants follow the assigned program for 8 weeks. The
outcome of interest is weight loss, defined as the difference in weight measured at
the start of the study (baseline) and weight measured at the end of the study (8
weeks), and measured in pounds. Three popular weight loss programs are
considered. The first is a low calorie diet. The second is a low fat diet and the third is
a low carbohydrate diet. For comparison purposes, a fourth group is considered as a
control group. Participants in the fourth group are told that they are participating in a
study of healthy behaviors with weight loss only one component of interest. The
control group is included here to assess the placebo effect (i.e., weight loss due to
simply participating in the study). A total of twenty patients agree to participate in the
study and are randomly assigned to one of the four diet groups. Weights are
measured at baseline and patients are counseled on the proper implementation of
the assigned diet (with the exception of the control group). After 8 weeks, each
patient's weight is again measured and the difference in weights is computed by
subtracting the 8 week weight from the baseline weight. Positive differences indicate
weight losses and negative differences indicate weight gains. For interpretation
purposes, we refer to the differences in weights as weight losses and the observed
weight losses are shown below.
Low Calorie   Low Fat   Low Carbohydrate   Control
8             2         3                  2
9             4         5                  2
6             3         4                  -1
7             5         2                  0
3             1         3                  3
Is there a statistically significant difference in the mean weight loss among the
four diets?
We will run the ANOVA using the five-step approach.
Step 1. Set up hypotheses and determine the level of significance.
H0: μ1 = μ2 = μ3 = μ4
H1: The means are not all equal.
α = 0.05

Step 2. Select the appropriate test statistic: F = MSB/MSE.

Step 3. Set up the decision rule.
The appropriate critical value can be found in a table of probabilities for the F
distribution (see "Other Resources"). In order to determine the critical value of F we
need degrees of freedom, df1 = k-1 and df2 = N-k. In this example, df1 = k-1 = 4-1 = 3 and
df2 = N-k = 20-4 = 16. The critical value is 3.24 and the decision rule is as follows:
Reject H0 if F > 3.24.
Step 4. Compute the test statistic.
             Low Calorie   Low Fat   Low Carbohydrate   Control
n            5             5         5                  5
Group mean   6.6           3.0       3.4                1.2

The overall mean is X̄ = (33 + 15 + 17 + 6)/20 = 3.55. Next we compute
SSB = ∑ nj(X̄j - X̄)² = 5(6.6 - 3.55)² + 5(3.0 - 3.55)² + 5(3.4 - 3.55)² + 5(1.2 - 3.55)² = 75.75.
SSE requires computing the squared differences between each observation and
its group mean. We will compute SSE in parts. For the participants in the low calorie
diet:
X        (X - X̄1)   (X - X̄1)²
8        1.4        2.0
9        2.4        5.8
6        -0.6       0.4
7        0.4        0.2
3        -3.6       13.0
Totals   0          21.4
Thus, ∑ (X - X̄1)² = 21.4.
For the participants in the low fat diet:
X        (X - X̄2)   (X - X̄2)²
2        -1.0       1.0
4        1.0        1.0
3        0.0        0.0
5        2.0        4.0
1        -2.0       4.0
Totals   0          10.0
Thus, ∑ (X - X̄2)² = 10.0.
For the participants in the low carbohydrate diet:

X        (X - X̄3)   (X - X̄3)²
3        -0.4       0.2
5        1.6        2.6
4        0.6        0.4
2        -1.4       2.0
3        -0.4       0.2
Totals   0          5.4

Thus, ∑ (X - X̄3)² = 5.4.
For the participants in the control group:
X        (X - X̄4)   (X - X̄4)²
2        0.8        0.6
2        0.8        0.6
-1       -2.2       4.8
0        -1.2       1.4
3        1.8        3.2
Totals   0          10.6
Thus, ∑ (X - X̄4)² = 10.6.
Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4. The mean squares are
MSB = SSB/(k-1) = 75.75/3 = 25.25 and MSE = SSE/(N-k) = 47.4/16 = 2.96, so
F = MSB/MSE = 25.25/2.96 = 8.53.

Step 5. Conclusion.
Since F = 8.53 > 3.24, we reject H0 and conclude that there is statistically
significant evidence of a difference in mean weight loss among the four diets.
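The ANOVA computations for this example can be sketched in plain Python. This is a minimal sketch using the example's data; exact arithmetic gives F ≈ 8.56, slightly different from the hand computation, which rounds the intermediate squared deviations:

```python
def one_way_anova(groups):
    """One-way ANOVA: return (SSB, SSE, F) for a list of sample groups."""
    all_obs = [x for g in groups for x in g]
    N, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments sum of squares: SSB = sum of n_j * (mean_j - grand_mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error sum of squares: squared deviations of each observation from its group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)   # mean square between
    mse = sse / (N - k)   # mean square error
    return ssb, sse, msb / mse

diets = [
    [8, 9, 6, 7, 3],    # low calorie
    [2, 4, 3, 5, 1],    # low fat
    [3, 5, 4, 2, 3],    # low carbohydrate
    [2, 2, -1, 0, 3],   # control
]
ssb, sse, f = one_way_anova(diets)
```

Since f exceeds the critical value 3.24 (df1 = 3, df2 = 16), the code reaches the same conclusion as the hand computation: reject H0.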
Question No. 4
Answer:
Monday 10% 5%
Tuesday 5% -2%
Friday -5%
Question No. 5
Answer: