Prob(19S²/σ² ≤ 32.85) = 0.975

so that Prob(σ² ≥ 19S²/32.85) = 0.975. If S² = 9, we get the 97.5% (right-sided) confidence interval for σ² as (5.205, ∞). We can construct a similar one-sided confidence interval for μ.
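For readers checking the arithmetic, here is a minimal sketch in Python using scipy (the values n = 20 and S² = 9 come from the running example; the cutoff 32.85 is the 97.5% point of the χ² distribution with 19 degrees of freedom):

```python
# Minimal check of the one-sided 97.5% confidence interval for sigma^2,
# based on (n - 1) * S^2 / sigma^2 having a chi-square distribution
# with n - 1 degrees of freedom.
from scipy import stats

n, S2 = 20, 9.0
cutoff = stats.chi2.ppf(0.975, df=n - 1)   # about 32.85 for 19 d.f.
lower = (n - 1) * S2 / cutoff              # about 5.205
print(f"Prob(chi2_19 <= {cutoff:.2f}) = 0.975 -> sigma^2 in ({lower:.3f}, infinity)")
```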
2.9 Testing of Hypotheses
We list here some essential points regarding hypothesis testing.
1. A statistical hypothesis is a statement about the values of some parameters in the hypothetical population from which the sample is drawn.
2. A hypothesis which says that a parameter has a specified value is called a point hypothesis. A hypothesis which says that a parameter lies in a specified interval is called an interval hypothesis. For instance, if μ is the population mean, then H: μ = 4 is a point hypothesis and H: 4 ≤ μ ≤ 7 is an interval hypothesis.
3. A hypothesis test is a procedure that answers the question of whether the observed difference between the sample value and the hypothesized population value is real or due to chance variation. For instance, if the hypothesis says that the population mean μ = 6 and the sample mean ȳ = 8, then we want to know whether this difference is real or due to chance variation.
4. The hypothesis we are testing is called the null hypothesis and is often denoted by H₀.² The alternative hypothesis is denoted by H₁. The probability of rejecting H₀ when, in fact, it is true is called the significance level. To test whether the observed difference between the data and what is expected under the null hypothesis H₀ is real or due to chance variation, we use a test statistic. A desirable criterion for the test statistic is that its sampling distribution be tractable, preferably with tabulated probabilities. Tables are already available for the normal, t, χ², and F distributions, and hence the test statistics chosen are often those that have these distributions.
5. The observed significance level or P-value is the probability of getting a value of the test statistic that is as extreme as or more extreme than the observed value of the test statistic. This probability is computed on the basis that the null hypothesis is correct. For instance, consider a sample of n independent observations from a normal population with mean μ and variance σ². We want to test

H₀: μ = 7  against  H₁: μ ≠ 7

The test statistic we use is

t = √n(ȳ − 7)/S

which has a t-distribution with (n − 1) degrees of freedom. Suppose that n = 25, ȳ = 10, and S = 5. Then under the assumption that H₀ is true, the observed value of t is t₀ = 3. Since high positive values of t are evidence against the null hypothesis H₀, the P-value is [since the degrees of freedom are (n − 1) = 24]

P = Prob(t₂₄ > 3)

This is the observed significance level.
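For readers working along on a computer, this P-value can be reproduced with a short sketch in Python using scipy (the numbers n = 25, ȳ = 10, S = 5, and the hypothesized mean 7 are all from the example above):

```python
# Sketch: observed t and one-sided P-value for H0: mu = 7,
# where high positive t values count as evidence against H0.
import numpy as np
from scipy import stats

n, ybar, S, mu0 = 25, 10.0, 5.0, 7.0
t0 = np.sqrt(n) * (ybar - mu0) / S       # observed t, here 3.0
p = stats.t.sf(t0, df=n - 1)             # upper-tail area: Prob(t_24 > 3)
print(f"t0 = {t0:.2f}, P-value = {p:.4f}")   # P is roughly 0.003
```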
6. It is common practice to say simply that the result of the test is (statistically) significant or not significant and not report the actual P-values. The meaning of the two terms is as follows:

Statistically significant. Sampling variation is an unlikely explanation of the discrepancy between the null hypothesis and the sample value.

Statistically insignificant. Sampling variation is a likely explanation of the discrepancy between the null hypothesis and the sample value.

²This terminology is unfortunate because “null” means “zero, void, insignificant, amounts to nothing, etc.” A hypothesis μ = 0 can be called a null hypothesis, but a hypothesis μ = 100 should not be called a “null” hypothesis. However, this is the standard terminology that was introduced in the 1930s by the statisticians Jerzy Neyman and E. S. Pearson.
Also, the terms significant and highly significant are customarily used to
denote “significant at the 0.05 level” and “significant at the 0.01 level” respec-
tively. However, consider two cases where the P-values are 0.055 and 0.045,
respectively. Then in the former case one would say that the results are “not
significant” and in the latter case one would say that the results are “signifi-
cant,” although the sample evidence is only marginally different in the two cases. Similarly, two tests with P-values of 0.80 and 0.055 will both be considered “not significant,” although there is a tremendous difference in the compatibility of the sample evidence with the null hypothesis in the two cases. That is why many computer programs print out P-values.³
7. Statistical significance and practical significance are not the same thing. A result that is highly significant statistically may be of no practical significance at all. For instance, suppose that we consider a shipment of cans of cashews with an expected mean weight of 450 g. If the actual sample mean of weights is 449.5 g, the difference may be practically insignificant but could be highly statistically significant if we have a large enough sample or a small enough sampling variance (note that the test statistic has √n in the numerator and S in the denominator). On the other hand, in the case of precision instruments, suppose that a part is expected to be 10 cm in length and a sample has a mean length of 9.9 cm. If n is small and S is large, the difference may not be statistically significant but could be practically very significant. The shipment could simply be useless.
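The cashew example is easy to make concrete. The sketch below (Python with scipy) holds the 0.5 g discrepancy fixed and lets n grow; the sampling standard deviation S = 5 g is an assumed value for illustration, not a number from the text:

```python
# Sketch: a practically trivial discrepancy (sample mean 449.5 g vs. a
# hypothesized 450 g) becomes statistically significant as n increases,
# because the test statistic carries sqrt(n) in the numerator.
import numpy as np
from scipy import stats

mu0, ybar, S = 450.0, 449.5, 5.0         # S = 5 g is an assumption
for n in (25, 400, 2500, 10000):
    t0 = np.sqrt(n) * (ybar - mu0) / S
    p = 2 * stats.t.sf(abs(t0), df=n - 1)    # two-sided P-value
    print(f"n = {n:6d}: t = {t0:7.2f}, P-value = {p:.4f}")
```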
8. It is customary to reject the null hypothesis H₀ when the test statistic is statistically significant at a chosen significance level and not to reject H₀ when the test statistic is not statistically significant at the chosen significance level. There is, however, some controversy on this issue, which we discuss later in item 9. In reality, H₀ may be either true or false. Corresponding to the two cases of reality and the two conclusions drawn, we have the following four possibilities:
                                     Reality
Result of the Test           H₀ Is True               H₀ Is False
Significant (reject H₀)      Type I error or α error  Correct conclusion
Not significant              Correct conclusion       Type II error or β error
  (do not reject H₀)
³A question of historical interest is: “How did the numbers 0.05 and 0.01 creep into all these textbooks?” The answer is that they were suggested by the famous statistician Sir R. A. Fisher (1890–1962), the “father” of modern statistics, and his prescription has been followed ever since.
There are two possible errors that we can make:
1. Rejecting H₀ when it is true. This is called the type I error or α error.
2. Not rejecting H₀ when it is not true. This is called the type II error or β error.

Thus

α = Prob(rejecting H₀ | H₀ is true)
β = Prob(not rejecting H₀ | H₀ is not true)

α is just the significance level, defined earlier. (1 − β) is called the power of the test. The power of the test cannot be computed unless the alternative hypothesis H₁ is specified; that is, H₀ is not true means that H₁ is true.
For example, consider the problem of testing the hypothesis

H₀: μ = 10  against  H₁: μ = 15

for a normal population with mean μ and variance σ². The test statistic we use is t = √n(ȳ − 10)/S. From the sample data we get the values of n, ȳ, and S. To calculate α we use μ = 10, and to calculate β we use μ = 15. The two errors are

α = Prob(t > t* | μ = 10)
β = Prob(t < t* | μ = 15)
where t* is the cutoff point of t that we use to reject or not reject H₀. The distributions of t under H₀ and H₁ are shown in Figure 2.1. In our example α is the right-hand tail area of the distribution of t under H₀ and β is the left-hand tail area of the distribution of t under H₁. If the alternative hypothesis H₁ says that μ < 10, then the distribution of t under H₁ would be to the left of the distribution of t under H₀. In this case α would be the left-hand tail area of the distribution of t under H₀ and β would be the right-hand tail area of the distribution of t under H₁.

[Figure 2.1 Type I and type II errors in testing a hypothesis: the sampling distributions of t under H₀ (centered at 10) and under H₁ (centered at 15), with the cutoff t* separating the “do not reject H₀” region from the “reject H₀” region.]
The usual procedure that is suggested (which is called the Neyman–Pearson approach) is to fix α at a certain level and minimize β; that is, choose the test statistic that has the most power. In practice, the tests we use, such as the t, χ², and F tests, have been shown to be the most powerful tests.
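As a concrete illustration of the fix-α procedure, here is a minimal sketch in Python using scipy for the testing problem above (H₀: μ = 10 against H₁: μ = 15). The sample size n = 25 and σ = 5 are assumed values chosen for illustration, since the text leaves n, ȳ, and S unspecified; the β computation uses scipy's noncentral t-distribution:

```python
# Sketch: fix alpha = 0.05 for H0: mu = 10 vs. H1: mu = 15, find the
# cutoff t*, then compute beta and the power 1 - beta under H1.
import numpy as np
from scipy import stats

n, sigma, mu0, mu1, alpha = 25, 5.0, 10.0, 15.0, 0.05   # n, sigma assumed
df = n - 1
t_star = stats.t.ppf(1 - alpha, df)      # Prob(t > t* | H0) = alpha
nc = np.sqrt(n) * (mu1 - mu0) / sigma    # noncentrality parameter under H1
beta = stats.nct.cdf(t_star, df, nc)     # Prob(t < t* | H1) = beta
print(f"t* = {t_star:.3f}, beta = {beta:.4f}, power = {1 - beta:.4f}")
```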
9. There are some statisticians who disagree with the ideas of the Neyman–Pearson theory. For instance, Kalbfleisch and Sprott⁴ argue that it is a gross simplification to regard a test of significance as a decision rule for accepting or rejecting a hypothesis. They argue that such decisions are made on more than just experimental evidence. Thus the purpose of a significance test is just to quantify the strength of evidence in the data against a hypothesis on a (0, 1) scale, not to suggest an accept-reject rule (see item 7). There are some statisticians who think that the significance level used should depend on the sample size.
The problem with a preassigned significance level is that if the sample size is
large enough, we can reject every null hypothesis. This is often the experience
of those who use large cross-sectional data sets with thousands of observa-
tions. Almost every coefficient is significant at the 5% level. Lindley⁵ argues that for large samples one should use lower significance levels and for smaller samples higher significance levels. Leamer⁶ derives significance levels in the
case of regression models for different sample sizes that show how significance
levels should be much higher than 5% for small sample sizes and much lower
than 5% for large sample sizes.
Very often the purpose of a test is to simplify an estimation procedure. This
is the “pretesting” problem, where the test is a prelude to further estimation.
In the case of such pretests it has been found that the significance levels to be
used should be much higher than the conventional 5% (sometimes 25 to 50%
and even 99%). The important thing to note is that tests of significance have
several purposes and one should not use a uniform 5% significance level.⁷
⁴J. G. Kalbfleisch and D. A. Sprott, “On Tests of Significance,” in W. L. Harper and C. A. Hooker (eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theory of Science, Vol. 2 (Boston: D. Reidel, 1976), pp. 259–272.

⁵D. V. Lindley, “A Statistical Paradox,” Biometrika, 1957, pp. 187–192.

⁶Leamer, Specification Searches.

⁷A good discussion of this problem is in the series of papers “For What Use Are Tests of Hypotheses and Tests of Significance” in Communications in Statistics, A: Theory and Methods, Vol. 5, No. 8, 1976.

2.10 Relationship Between Confidence Interval Procedures and Tests of Hypotheses

There is a close relationship between confidence intervals and tests of hypotheses. Suppose that we want to test a hypothesis at the 5% significance
level. Then we can construct a 95% confidence interval for the parameter under
consideration and see whether the hypothesized value is in the interval. If it is,
we do not reject the hypothesis. If it is not, we reject the hypothesis. This
relationship holds good for tests of parameter values. There are other tests,
such as goodness-of-fit tests, tests of independence in contingency tables, and
so on, for which there is no confidence interval counterpart.
As an example, consider a sample of 20 independent observations from a normal distribution with mean μ and variance σ². Suppose that the sample mean is ȳ = 5 and the sample variance S² = 9. We saw from equation (2.5) that the 95% confidence interval was (3.6, 6.4). Suppose that we consider the problem of testing the hypothesis

H₀: μ = 7  against  H₁: μ ≠ 7
If we use a 5% level of significance, we should reject H₀ if

|√n(ȳ − 7)/S| > 2.093

since 2.093 is the point from the t-tables for 19 degrees of freedom such that Prob(−2.093 < t < 2.093) = 0.95 or Prob(|t| > 2.093) = 0.05. In our example √n(ȳ − 7)/S = −3 and hence we reject H₀. Actually, we would be rejecting H₀ at the 5% significance level whenever H₀ specifies μ to be a value outside the interval (3.6, 6.4).
For a one-tailed test we have to consider the corresponding one-sided confidence interval. If H₀: μ = 7 and H₁: μ < 7, then we reject H₀ for low values of t = √n(ȳ − 7)/S. From the t-tables with 19 degrees of freedom, we find that Prob(t < −1.73) = 0.05. Hence we reject H₀ if the observed t < −1.73. In our example it is −3 and hence we reject H₀. The corresponding 95% one-sided confidence interval is given by

Prob(−1.73 < √n(ȳ − μ)/S) = 0.95

which gives (on substituting n = 20, ȳ = 5, S = 3) the confidence interval (−∞, 6.15).
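The agreement between the two procedures in this example is easy to verify directly. A minimal sketch in Python using scipy (with n = 20, ȳ = 5, S = 3 as in the example):

```python
# Sketch: the 5% two-sided test of H0: mu = 7 rejects exactly because
# 7 falls outside the 95% confidence interval (about (3.6, 6.4)).
import numpy as np
from scipy import stats

n, ybar, S, mu0 = 20, 5.0, 3.0, 7.0
t_crit = stats.t.ppf(0.975, df=n - 1)        # 2.093 for 19 d.f.
half = t_crit * S / np.sqrt(n)
print(f"95% CI = ({ybar - half:.2f}, {ybar + half:.2f})")   # (3.60, 6.40)
t0 = np.sqrt(n) * (ybar - mu0) / S           # roughly -3
print(f"t0 = {t0:.2f}, reject H0: {abs(t0) > t_crit}")
```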
Summary
1. If two random variables are uncorrelated, this does not necessarily imply
that they are independent. A simple example is given in Section 2.3.
2. The normal distribution and the related distributions t, χ², and F form the basis of all statistical inference in this book. Although many economic data do not necessarily satisfy the assumption of normality, we can make some transformations that would produce approximate normality. For instance, we consider log wages rather than wages.
3. The advantage of the normal distribution is that any linear function of normally distributed variables is normally distributed. For the χ²-distribution a weaker property holds: the sum of independent χ² variables has a χ²-distribution. These properties are very useful in deriving probability distributions of sample statistics. The t, χ², and F distributions are explained in Section 2.4.
4. A function of the sample observations is called a statistic (e.g., sample
mean, sample variance). The probability distribution of a statistic is called the
sampling distribution of the statistic.
5. Classical statistical inference is based entirely on sampling distributions.
By contrast, Bayesian inference makes use of sample information and prior
information. We do not discuss Bayesian inference in this book, but the basic
idea is explained in Section 2.5. Based on the prior distribution (which incor-
porates prior information) and the sample observations, we obtain what is
known as the posterior distribution and all our inferences are based on this
posterior distribution.
6. Classical statistical inference is usually discussed under three headings:
point estimation, interval estimation, and testing of hypotheses. Three desir-
able properties of point estimators—unbiasedness, efficiency, and consis-
tency—are discussed in Section 2.6.
7. There are three commonly used methods of deriving point estimators:
(a) The method of moments.
(b) The method of least squares.
(c) The method of maximum likelihood.
These are discussed in Chapter 3.
8. Section 2.8 presents an introduction to interval estimation and Section 2.9
gives an introduction to hypothesis testing. The interrelationship between the
two is explained in Section 2.10.
9. The main elements of hypothesis testing are discussed in detail in Section
2.9. Most important, arguments are presented as to why it is not desirable to
use the usual 5% significance level in all problems.
Exercises
1. The COMPACT computer company has 4 applicants, all equally qualified,
of whom 2 are male and 2 female. The company has to choose 2 candidates,
and it does not discriminate on the basis of sex. If it chooses the two can-
didates at random, what is the probability that the two candidates chosen
will be the same sex? A student answered this question as follows: There
are three possible outcomes: 2 female, 2 male, and 1 female and 1 male. The number of favorable outcomes is two. Hence the probability is 2/3. Is this correct?
2. Your friend John says: “Let’s toss coins. Each time I'll toss first and then
you. If either coin comes up heads, I win. If neither, you win.” You say,
“There are three possibilities. You may get heads first and the game ends
there. Or you may get tails and I get heads, or we may both get tails. In
two of the three possible cases, you win.” “Right,” says your friend, “I
have two chances to your one. So I'll put up $2 against $1.” Is this a fair
bet?
3. A major investment company wanted to know what proportions of inves-
tors bought stocks, bonds, both stocks and bonds, and neither stocks nor
bonds. The company entrusted this task to its statistician, who, in turn,
asked her assistant to conduct a telephone survey of 200 potential investors
chosen at random. The assistant, however, was a moonlighter with some
preconceptions of his own, and he cooked up the following results without
actually doing a survey. For a sample of 200, he reported the following
results:
Invested in stocks: 100
Invested in bonds: 75
Invested in both stocks and bonds: 45
Invested in neither:
The statistician saw these results and fired his moonlighting assistant.
Why?
4. (The birthday problem) (In this famous problem it is easier to calculate the
probability of a complementary event than the probability of the given
event.) Suppose your friend, a teacher, says that she will pick 30 students
at random from her school and offer you an even bet that there will be at
least two students who have the same birthdays. Should you accept this
bet?
5. (Getting the answer without being sure you have asked the question) This
is the randomized response technique due to Stanley L. Warner's paper in
Journal of the American Statistical Association, Vol. 60 (1965), pp. 63-69.
You want to know the proportion of college students who have used drugs.
A direct question will not give frank answers. Instead, you give the stu-
dents a box containing 4 blue, 3 red, and 4 white balls. Each student is
asked to draw a ball (you do not see the ball) and abide by the following
instructions, based on the color of the ball drawn:
Blue: Answer the question: “Have you used drugs?”
Red: Answer “yes.”
White: Answer “no.”
If 40% of the students answered “yes,” what is the percentage of students
who have used drugs?
6. A petroleum company intends to drill for oil in four different wells. Use the subscripts 1, 2, 3, 4 to denote the different wells. Each well is given an
equally likely chance of being successful. Consider the following events:
Event A: There are exactly two successful wells.
Event B: There are at least three successful wells.
Event C: Well number 3 is successful.
Event D: There are fewer than three successful wells.
Compute the following probabilities:
(a) P(A)     (b) P(B)        (c) P(C)     (d) P(D)
(e) P(AB)    (f) P(A + B)    (g) P(BC)    (h) P(B + C)
(i) P(BD)    (j) P(B + D)    (k) P(A|B)   (l) P(B|C)
(m) P(C|D)   (n) P(D|C)
7. On the TV show “Let’s Make a Deal,” one of the three boxes (A, B, C)
on the stage contains keys to a Lincoln Continental. The other two boxes
are empty. A contestant chooses box B. Boxes A and C remain on the
table. Monty Hall, the energetic host of the show, suggests that the con-
testant surrender her box for $500. The contestant refuses. Monty Hall
then opens one of the remaining boxes, box A, which turns out to be
empty. Monty now offers $1000 to the contestant to surrender her box. She
again refuses but asks whether she can trade her box B for box C on the
table. Monty exclaims, “That’s weird.” But is it really weird, or does the contestant know how to calculate probabilities? Hint: Suppose that there are n boxes. The probability that the key is in any of the (n − 1) boxes on the stage is (n − 1)/n. Monty opens p boxes, which turn out to be empty. The probability that the key is in any of the remaining (n − p − 1) boxes is (n − 1)/[n(n − p − 1)]. Without switching, the probability that the contestant wins remains at 1/n. Hence it is better to switch. In this case n = 3 and p = 1. The probability of winning without a switch is 1/3. The probability of winning by switching is 2/3.
8. You are given 10 white and 10 black balls and two boxes. You are told that
your instructor will draw a ball from one of the two boxes. If it is white, you pass the exam. If it is black, you fail. How should you arrange the balls in the boxes to maximize your chance of passing?
9. Suppose that the joint probability distribution of X and Y is given by the
following table.
          Y
  X       1      2
  1       0      0.2
  2       0.2    0
  3       0      0.2
(a) Are X and Y independent? Explain.
(b) Find the marginal distributions of X and Y.
(c) Find the conditional distribution of Y given X = 1 and hence E(Y | X = 1) and var(Y | X = 1).
(d) Repeat part (c) for X = 2 and X = 3 and hence verify the result that V(Y) = E[V(Y | X)] + V[E(Y | X)], that is, the variance of a random variable is equal to the expectation of its conditional variance plus the variance of its conditional expectation.
10. The density function of a continuous random variable X is given by

f(x) = kx(2 − x)   for 0 ≤ x ≤ 2
     = 0           otherwise
(a) Find k.
(b) Find E(X) and V(X).

11. Answer Exercise 10 when

f(x) = kx          for 0 ≤ x ≤ 1
     = k(2 − x)    for 1 ≤ x ≤ 2
     = 0           otherwise
12. Suppose that X and Y are continuous random variables with the joint probability density function

f(x, y) = k(x + y)   for 0