
Section 4

Mathematical statistics

Sourabh B Paul (IIT Delhi) Econometric Methods II Semester 2023-24 65


Review of Random Sampling I

Population: Well defined group of subjects


Statistical inference involves learning something about a population
given the availability of a sample from that population.
Broadly this involves Estimation and hypothesis testing
Random Sample: If Y1, Y2, ..., Yn are independent random variables
with a common probability density function f(y; θ), then
{Y1, Y2, ..., Yn} is said to be a random sample from f(y; θ) or a
random sample drawn from a population represented by f(y; θ).
Different outcomes are possible before the sampling is actually carried
out
Once a sample is obtained, we have a set of numbers, say,
{y1, y2, ..., yn}, which constitute the data that we work with.



Estimator and Estimate I

Given a random sample {Y1, Y2, ..., Yn} drawn from a population
distribution that depends on an unknown parameter θ, an estimator of
θ is a rule that assigns each possible outcome of the sample a value of
θ.
An estimator W of a parameter θ can be expressed as an abstract
mathematical formula:

W = h(Y1, Y2, ..., Yn)

Example: Let Y1, Y2, ..., Yn be a random sample from a distribution
with mean µ. An estimator of µ is the sample average,

$$\bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i$$

Note that an estimator W is also a random variable.



Estimator and Estimate II

We study various properties of the probability distribution of the
random variable W
The distribution of an estimator is called its sampling distribution
When a particular set of numbers y1, y2, ..., yn is plugged into the
function h(.), we obtain an estimate of θ

w = h(y1, y2, ..., yn)

W is called a point estimator and w is called a point estimate


There are many ways to combine data to estimate parameters (many
estimators for the same parameter)
We need some sensible criteria (properties) to choose among estimators
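
To make the distinction between the estimator W (a random variable) and an estimate w (a number) concrete, the following Python sketch applies the sample average as the rule h(.) to repeated simulated samples. The normal population, its mean and standard deviation, the sample size and the number of replications are illustrative assumptions, not values from the lecture.

```python
import random
import statistics

def h(sample):
    """Estimator rule: the sample average of the observations."""
    return sum(sample) / len(sample)

# Hypothetical population N(MU, SIGMA^2); all four constants are assumptions.
MU, SIGMA, N, REPS = 5.0, 2.0, 30, 2000
rng = random.Random(0)  # fixed seed so the run is reproducible

# Each simulated sample yields one estimate w = h(y1, ..., yn).
estimates = [h([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(REPS)]

# The collection of estimates approximates the sampling distribution of W.
mean_of_estimates = statistics.mean(estimates)
sd_of_estimates = statistics.stdev(estimates)
print("first three estimates:", [round(w, 3) for w in estimates[:3]])
print("mean of estimates:", round(mean_of_estimates, 3))
print("std. dev. of estimates:", round(sd_of_estimates, 3))
```

Every run of h on a new sample gives a different number, which is why properties of W are stated in terms of its distribution rather than any single estimate.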



General Approaches to Parameter Estimation

Are there general approaches to estimation that produce estimators
with good properties, such as unbiasedness, consistency, and efficiency?
- Method of Moments (the parameter θ is shown to be related to some
  expected value in the distribution of Y, and is estimated by the
  sample analogue, e.g. sample average, sample correlation coefficient)
- Maximum Likelihood (out of all the possible values for θ, the value that
  makes the likelihood of the observed data largest should be chosen)
- Least Squares (the estimator makes the sum of squared deviations as
  small as possible)
We will treat them in depth as required



Finite Sample Properties of Estimators

Unbiasedness: An estimator W of θ is an unbiased estimator if
E(W) = θ.
Example: The sample mean is an unbiased estimator of the population mean.
Example: The sample variance, defined as

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

is unbiased for σ², where the sample Yi is drawn from a population
represented by a distribution with mean E(Y) = µ and variance
Var(Y) = σ².
Note: If µ is known, we can use µ in place of Ȳ and divide by n instead
of n − 1. µ is rarely known in practice.
Bias is defined as Bias(W) = E(W) − θ
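
The role of the n − 1 divisor can be checked by simulation. The sketch below averages two versions of the sample variance over many simulated samples: one dividing by n − 1 and one dividing by n. The normal population, σ² = 4, n = 5 and the replication count are assumptions chosen for illustration.

```python
import random

def sample_variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(sample)
    ybar = sum(sample) / n
    return sum((y - ybar) ** 2 for y in sample) / (n - ddof)

# Assumed values: population variance, sample size, number of replications.
SIGMA2, N, REPS = 4.0, 5, 20000
rng = random.Random(1)

total_unbiased = 0.0
total_biased = 0.0
for _ in range(REPS):
    sample = [rng.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N)]
    total_unbiased += sample_variance(sample, ddof=1)  # divide by n - 1
    total_biased += sample_variance(sample, ddof=0)    # divide by n

mean_unbiased = total_unbiased / REPS
mean_biased = total_biased / REPS
print("average of S^2 with n - 1:", round(mean_unbiased, 3))
print("average of S^2 with n:    ", round(mean_biased, 3))
print("estimated bias with n:    ", round(mean_biased - SIGMA2, 3))
```

With the divisor n the average should settle near σ²(n − 1)/n, that is, a bias of about −σ²/n, while the n − 1 version should average close to σ².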



The Sampling Variance of Estimators I

How spread out is the distribution of an estimator?
The variance of an estimator is often called its sampling variance
because it is the variance associated with a sampling distribution.
The sampling variance is not a random variable; it is a constant, but it
might be unknown.
Example: Find Var(Ȳ)
Relative Efficiency: If W1 and W2 are two unbiased estimators of θ,
W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ, with
strict inequality for at least one value of θ.
Comparing variances is meaningless if we do not restrict our attention
to unbiased estimators.
One way to compare estimators that are not necessarily unbiased is to
compute the Mean Squared Error (MSE) of the estimators.
If W is an estimator of θ, then the MSE of W is defined as

MSE(W) = E[(W − θ)²]


The Sampling Variance of Estimators II

It can be shown that MSE(W) = Var(W) + [Bias(W)]²
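
One way to see the decomposition is to add and subtract E(W) inside the square. The steps below are a sketch of the standard argument, using only the definitions of Var(W) and Bias(W) given earlier:

```latex
\begin{aligned}
\mathrm{MSE}(W) &= E\big[(W - \theta)^2\big] \\
  &= E\big[(W - E(W) + E(W) - \theta)^2\big] \\
  &= E\big[(W - E(W))^2\big]
     + 2\,(E(W) - \theta)\,E\big[W - E(W)\big]
     + (E(W) - \theta)^2 \\
  &= \mathrm{Var}(W) + [\mathrm{Bias}(W)]^2
\end{aligned}
```

The cross term vanishes because E[W − E(W)] = 0, and E(W) − θ is a constant, so it can be taken outside the expectation.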



Relative efficiency

[Figure 3: Relative Efficiency. Densities f(w) of W1 and W2 plotted
against w, with θ marked on the w axis.]



Large sample or Asymptotic property of an estimator

Consistency: Let Wn be an estimator of θ based on a sample
{Y1, Y2, ..., Yn} of size n. Then, Wn is a consistent estimator of θ if for
every ε > 0,
P(|Wn − θ| > ε) → 0 as n → ∞
When Wn is consistent, we also say that θ is the probability limit of
Wn, written as plim(Wn) = θ.
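
The definition can be illustrated numerically by estimating P(|Wn − θ| > ε) for growing n, taking Wn to be the sample mean. This is a Monte Carlo sketch only; the normal population, ε = 0.5, the sample sizes and the replication count are assumptions.

```python
import random

def exceed_probability(n, eps, mu, sigma, reps, rng):
    """Monte Carlo estimate of P(|Ybar_n - mu| > eps) for a N(mu, sigma^2) population."""
    count = 0
    for _ in range(reps):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(ybar - mu) > eps:
            count += 1
    return count / reps

# Assumed population and tolerance; none of these values come from the lecture.
MU, SIGMA, EPS, REPS = 0.0, 2.0, 0.5, 2000
rng = random.Random(2)

sample_sizes = [4, 16, 40, 400]
probs = [exceed_probability(n, EPS, MU, SIGMA, REPS, rng) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(f"n = {n:3d}: estimated P(|Ybar - mu| > {EPS}) = {p:.3f}")
```

The estimated probabilities should fall towards zero as n grows, which is what consistency of the sample mean requires for this particular ε.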



Consistency
[Figure: Sampling densities f_Wn(w) of Wn for n = 4, n = 16 and n = 40
plotted against w, with θ marked on the w axis.]
Exercise

What is the Law of Large Numbers?
What is the Central Limit Theorem?



Properties of plim

plim(g(Wn)) = g(plim(Wn)) for any continuous function g(.)
Example: $S_n^2 = (n-1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y}_n)^2$ is unbiased
for σ². You can also prove that it is consistent.
A natural estimator of σ is $S_n = \sqrt{S_n^2}$. But this is not unbiased
because the expected value of the square root is not the square root of the
expected value. However, Sn is consistent.
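
The contrast between the bias of Sn and its consistency can be seen in a small simulation. The sketch below averages Sn over many samples for a small and a large n; the normal population with σ = 2, the two sample sizes and the replication counts are assumptions. It illustrates the claim rather than proving it.

```python
import math
import random

def sample_sd(sample):
    """S_n: square root of the sample variance with divisor n - 1."""
    n = len(sample)
    ybar = sum(sample) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))

def average_sd(n, reps, sigma, rng):
    """Average of S_n over `reps` simulated N(0, sigma^2) samples of size n."""
    total = 0.0
    for _ in range(reps):
        total += sample_sd([rng.gauss(0.0, sigma) for _ in range(n)])
    return total / reps

SIGMA = 2.0  # assumed population standard deviation
rng = random.Random(3)

mean_sd_small = average_sd(n=5, reps=4000, sigma=SIGMA, rng=rng)
mean_sd_large = average_sd(n=500, reps=400, sigma=SIGMA, rng=rng)
print("average S_n for n = 5:  ", round(mean_sd_small, 3))
print("average S_n for n = 500:", round(mean_sd_large, 3))
```

For small n the average of Sn sits visibly below σ, while for large n it is close to σ, in line with Sn being biased but consistent.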



Properties of plim

If plim(Tn) = α and plim(Un) = β, then

plim(Tn + Un) = α + β
plim(Tn Un) = αβ
plim(Tn / Un) = α/β provided β ≠ 0



Exercise

Let µm and µf be the population mean of annual earnings of male and
female IIT graduates respectively. You are interested in the percentage
difference in annual earnings γ ≡ 100(µm − µf)/µf.
Propose an estimator of γ. Is it unbiased? Is it consistent?



Shape of the sampling distribution

The three properties above (unbiasedness, efficiency and consistency) do
not tell us anything about the shape of the distribution of an estimator
We need to approximate it for constructing interval estimators or
hypothesis testing
Asymptotic normality results are very useful for this purpose
Let {Z1, Z2, ..., Zn} be a sequence of random variables, such that for
all numbers z
P(Zn ≤ z) → Φ(z) as n → ∞,
where Φ(z) is the standard normal distribution function. Then, Zn is
said to have an asymptotic standard normal distribution. In short
$Z_n \overset{a}{\sim} N(0, 1)$.



Central Limit Theorem

Let {Y1, Y2, ..., Yn} be a random sample with mean µ and variance
σ². Then,

$$Z_n = \frac{\bar{Y}_n - \mu}{\sigma/\sqrt{n}}$$

has an asymptotic standard normal distribution.
Exercise: If we replace σ by its sample counterpart Sn in the above
standardised Zn, what kind of distribution does it follow when n → ∞
and when n is small?
When two consistent estimators have asymptotic normal distributions,
we choose the estimator with the smallest asymptotic variance.
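
The Central Limit Theorem can be checked numerically on a clearly non-normal population. The sketch below draws samples from an exponential distribution with rate 1 (so µ = 1 and σ = 1), standardises the sample mean as Zn above, and reports the share of |Zn| values within 1.96. The choice of distribution, n = 50 and the replication count are assumptions.

```python
import math
import random

def standardized_mean(n, rng):
    """Z_n for an exponential(1) sample, whose mean and std. dev. both equal 1."""
    ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (ybar - 1.0) / (1.0 / math.sqrt(n))

N, REPS = 50, 5000  # assumed sample size and number of replications
rng = random.Random(4)

z_values = [standardized_mean(N, rng) for _ in range(REPS)]

# Under a standard normal distribution, P(|Z| <= 1.96) is about 0.95.
share_within = sum(1 for z in z_values if abs(z) <= 1.96) / REPS
print("share of |Z_n| <= 1.96:", round(share_within, 3))
```

A share close to 0.95 suggests that the standard normal approximation is already reasonable at this sample size, even though the population itself is skewed.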



Interval Estimation and Confidence Intervals

A point estimate obtained from a particular sample does not, by itself,
provide enough information for testing economic theories or for
informing policy discussions.
It provides no information about how close the estimate is “likely” to
be to the population parameter.
How are we to know whether crime rates in states with higher literacy
are close to those in states with lower literacy?
How do we know that increasing tax rates makes a big difference in
tobacco consumption?
Reporting the standard deviation of the estimator, along with the point
estimate, provides some information on the accuracy of our estimate.
However, that makes no direct statement about where the population
value is likely to lie in relation to the estimate



Confidence interval

Example: Suppose the population has a N(µ, σ²) distribution and let
{Y1, ..., Yn} be a random sample from this population. The variance of
the population is known.
The sample average, Ȳ, has a normal distribution with mean µ and
variance σ²/n: Ȳ ~ N(µ, σ²/n).
We can standardize Ȳ, and, because the standardized version of Ȳ has
a standard normal distribution, we have

$$P\left(-1.96 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$

This information allows us to construct an interval estimate of µ.
Probabilistic interpretation: the probability that the random interval
$\left[\bar{Y} - 1.96\,\sigma/\sqrt{n},\ \bar{Y} + 1.96\,\sigma/\sqrt{n}\right]$
contains the population mean µ is 0.95 or 95%
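
The interval for known σ translates directly into a few lines of code. In the Python sketch below, the function name z_interval and the example inputs (ȳ = 10, σ = 2, n = 25) are hypothetical and chosen only for illustration; the critical value is taken from the standard normal distribution in the standard library.

```python
import math
from statistics import NormalDist

def z_interval(ybar, sigma, n, level=0.95):
    """Confidence interval for mu when the population std. dev. sigma is known."""
    # Critical value: about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return ybar - half_width, ybar + half_width

# Hypothetical inputs, not taken from the lecture.
lower, upper = z_interval(ybar=10.0, sigma=2.0, n=25)
print(f"95% CI for mu: [{lower:.3f}, {upper:.3f}]")
```

The 95% statement refers to the procedure: across repeated samples the random interval covers µ about 95% of the time, whereas a single computed interval either contains µ or does not.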



When σ is unknown I

For unknown σ, we must use an estimate

$$s = \left( \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \right)^{1/2}$$

We obtain a confidence interval that depends entirely on the observed
data
Unfortunately, this does not preserve the 95% level of confidence
because s depends on the particular sample
The random interval $[\bar{Y} \pm 1.96\,(S/\sqrt{n})]$ no longer contains µ with
probability 0.95
How should we proceed?

$$\frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim\ ?$$



When σ is unknown I
It follows a t distribution with n − 1 degrees of freedom (why?)

$$\frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

Let c denote the 97.5th percentile in the $t_{n-1}$ distribution

$$P(-c < t_{n-1} < c) = 0.95$$

The value of c depends on the degrees of freedom parameter
Once c has been properly chosen, the random interval

$$\left[\bar{Y} - c\,S/\sqrt{n},\ \bar{Y} + c\,S/\sqrt{n}\right]$$

contains µ with probability 0.95.
The associated random variable S/√n is called the standard error of Ȳ
For a particular sample, the 95% confidence interval is calculated as

$$\left[\bar{y} - c\,s/\sqrt{n},\ \bar{y} + c\,s/\sqrt{n}\right]$$
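
As a hint for the "why?" above, the ratio can be rewritten as a standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom. The sketch below assumes, as in the confidence interval example, that the sample is drawn from a N(µ, σ²) population:

```latex
\frac{\bar{Y} - \mu}{S/\sqrt{n}}
  = \frac{(\bar{Y} - \mu)/(\sigma/\sqrt{n})}
         {\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}
  = \frac{Z}{\sqrt{\chi^2_{n-1}/(n-1)}}
  \sim t_{n-1}
```

Here Z ~ N(0, 1) and (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom; under normal sampling Ȳ and S² are independent, which is what the definition of the t distribution requires.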
When σ is unknown II

More generally, let $c_\alpha$ denote the 100(1 − α)th percentile in the
$t_{n-1}$ distribution. Then, a 100(1 − α)% confidence interval is obtained as

$$\left[\bar{y} - c_{\alpha/2}\,s/\sqrt{n},\ \bar{y} + c_{\alpha/2}\,s/\sqrt{n}\right]$$

Obtaining $c_{\alpha/2}$ requires choosing α and knowing the degrees of
freedom n − 1
A simple rule of thumb: [ȳ ± 2·se(ȳ)]



[Figure 5: The 97.5th percentile, c, in a t distribution. The area
between −c and c is 0.95; the area in each tail is 0.025.]


Exercise

Holzer, Block, Cheatham, and Knott (1993) studied the effects of job
training grants on worker productivity by collecting information on “scrap
rates” for a sample of Michigan manufacturing firms receiving job training
grants in 1988. There were no grants awarded in 1987. We are interested in
constructing a confidence interval (CI) for the change in scrap rate from
1987 to 1988 for the population of all manufacturing firms.
The data given below are for a sample of 20 firms that received job training
grants in 1988. Scrap rate is measured as the number of items per 100
produced that are not usable.



Scrap rate data I

Table 1: Scrap rate              Table 2: Scrap rate

Firm    1987    1988             Firm    1987    1988
  1    10.00    3.00              11     0.98    0.51
  2     1.00    1.00              12     1.00    0.50
  3     6.00    5.00              13     0.45    0.61
  4     0.45    0.50              14     5.03    6.70
  5     1.25    1.54              15     8.00    4.00
  6     1.30    1.50              16     9.00    7.00
  7     1.06    0.80              17    18.00   19.00
  8     3.00    2.00              18     0.28    0.20
  9     8.18    0.67              19     7.00    5.00
 10     1.67    1.17              20     3.97    3.83
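
One possible way to work the exercise is sketched below in Python. It takes the change in scrap rate (1988 minus 1987) for each firm in the tables above and builds a 95% interval from the t distribution with 19 degrees of freedom. The critical value 2.093 is taken from standard t tables rather than computed, and treating the 20 changes as a random sample from a normal population is an assumption of the sketch.

```python
import math

# Scrap rates for firms 1-20, copied from the tables above.
rate_1987 = [10.00, 1.00, 6.00, 0.45, 1.25, 1.30, 1.06, 3.00, 8.18, 1.67,
             0.98, 1.00, 0.45, 5.03, 8.00, 9.00, 18.00, 0.28, 7.00, 3.97]
rate_1988 = [3.00, 1.00, 5.00, 0.50, 1.54, 1.50, 0.80, 2.00, 0.67, 1.17,
             0.51, 0.50, 0.61, 6.70, 4.00, 7.00, 19.00, 0.20, 5.00, 3.83]

# Change in scrap rate for each firm (1988 minus 1987).
changes = [after - before for before, after in zip(rate_1987, rate_1988)]

n = len(changes)
ybar = sum(changes) / n
s = math.sqrt(sum((y - ybar) ** 2 for y in changes) / (n - 1))
se = s / math.sqrt(n)

C = 2.093  # 97.5th percentile of the t distribution with 19 df (from tables)
lower, upper = ybar - C * se, ybar + C * se

print(f"mean change = {ybar:.4f}, se = {se:.4f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Whether the interval excludes zero is the natural thing to look at when interpreting the result, keeping in mind that the sample contains only firms that received grants.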



Non-normal populations and asymptotics

Sometimes the population is clearly not normal


As the sample size gets larger, we can assume asymptotic normality and
construct the CI as [ȳ ± 1.96·se(ȳ)]
Example: Gender discrimination in job offer. Are females discriminated
against?



Gender discrimination in hiring: matched pairs analysis

Construct a pair of CVs consisting of a male and a female for several
job applications.
In the pair, one person is male and the other is female, having exactly
the same CV in terms of experience, qualification, etc., except for their
gender.
Each person in the pair was interviewed by an employer for the same
job, and the researchers recorded who got the job offer (both may get
the offer as well).
This is an example of a matched pairs analysis, where each trial
consists of data on two subjects that are thought to be similar in many
respects but different in one important characteristic.



Example contd.

Let θm be the probability that a male is offered a job and θf be that
for a female.
Our interest is in the difference θf − θm.
Let Mi denote a binary random variable equal to 1 if the male gets a
job offer from employer i and zero otherwise. We define Fi in a
similar way for the female.
We can construct several such pairs (with varying job
profiles/industries/qualifications) to have greater representation across
the whole spectrum of the labour market.
Our sample will consist of the pool of all such cases (trials), i.e. pairs of
interviews by employers



Example contd.

Let n = 241 (pairs of interviews)
Unbiased estimators of θm and θf are the sample proportions (M̄ and F̄) of
interviews for which males and females were offered jobs, respectively.
Define a random variable Yi = Fi − Mi. It takes three possible values
(−1, 0 and 1), so it is discrete and not normally distributed
Our interest is in the population parameter µ ≡ E(Yi) = E(Fi) − E(Mi).
Though Yi is not normally distributed, can we construct an approximate
confidence interval for µ as n is quite large?



Example contd.

Data gives: f̄ = .224 and m̄ = .357.
Then ȳ = .224 − .357 = −.133
To construct the CI, we need s. Data: s = .482
Approximate 95% CI for µ is −.133 ± 1.96(.482/√241)
This example demonstrates how to find a point estimate of a population
parameter and construct a confidence interval.
But can we answer whether females are discriminated against with a
definite “yes” or “no”?
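
The arithmetic behind the approximate interval can be reproduced with the summary statistics quoted above. This Python sketch uses only the reported values f̄, m̄, s and n, with 1.96 as the large-sample normal critical value; the variable names are illustrative.

```python
import math

# Summary statistics reported in the example.
f_bar, m_bar = 0.224, 0.357   # sample proportions of job offers
s, n = 0.482, 241             # sample std. dev. of Y_i and number of pairs

y_bar = f_bar - m_bar          # point estimate of mu = theta_f - theta_m
se = s / math.sqrt(n)          # standard error of y_bar

# Approximate 95% CI based on asymptotic normality.
lower, upper = y_bar - 1.96 * se, y_bar + 1.96 * se
contains_zero = lower <= 0.0 <= upper

print(f"y_bar = {y_bar:.3f}, se = {se:.4f}")
print(f"approximate 95% CI: [{lower:.3f}, {upper:.3f}]")
print("interval contains zero:", contains_zero)
```

Checking whether zero lies inside the interval gives a first, informal reading of the evidence; the formal yes or no decision is the subject of hypothesis testing.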



Hypothesis Testing I

Sometimes the question we are interested in has a definite yes or no
answer
Devising methods for answering such questions, using a sample of data,
is known as hypothesis testing.
Example: How strong is the sample evidence that crime rates in states
with lower literacy rates differ from those in states with higher
literacy rates?
We set up a hypothesis test
In order to test a hypothesis we specify a null (H0) and an alternative
(H1) hypothesis.
The null hypothesis is presumed to be true until the data strongly suggest
otherwise (just as a defendant is presumed to be innocent until proven
guilty)
We need to choose a test statistic (or statistic, for short) and a critical
value.
In hypothesis testing, we can make two kinds of mistakes



Hypothesis Testing II
- First, we can reject the null hypothesis when it is in fact true. This is
  called a Type I error
- Second, failing to reject the null when it is actually false is a Type II
  error
A test statistic, denoted T, is some function of the random sample.
Given a test statistic, we can define a rejection rule that determines
when H0 is rejected in favour of H1.
In order to conclude that H0 is false and that H1 is true, we must have
evidence “beyond reasonable doubt” against H0
How do we quantify “beyond reasonable doubt”?
In hypothesis testing, we can make two kinds of mistakes: Type I and
Type II errors
After deciding whether or not to reject H0, either we have decided
correctly or committed an error
We will never know with certainty whether an error was committed.
However, we can compute the probability of making either a Type I or
a Type II error



Hypothesis Testing III
Hypothesis testing rules are constructed to make the probability of
committing a Type I error fairly small (significance level)
We define the significance level

α = P(Reject H0 | H0 is true)

Hypothesis testing requires that we initially specify a significance level
for a test
Once we have chosen the significance level, we would then like to
minimize the probability of a Type II error (alternatively, we would like
to maximize the power of a test against all relevant alternatives)
Power of a test is

π(θ) = P(Reject H0 | θ) = 1 − P(Type II | θ)

where θ denotes the actual value of the parameter
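
The power function can be made concrete for one simple case. The Python sketch below assumes a one-sided test of H0: µ = µ0 against H1: µ > µ0 for a normal population with known σ, rejecting when the standardised sample mean exceeds the 100(1 − α)th percentile of the standard normal distribution. The function name and all numerical inputs are hypothetical; this specific test is an illustration, not the lecture's example.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0 | true mean is mu) for H1: mu > mu0, sigma known."""
    std_normal = NormalDist()
    c = std_normal.inv_cdf(1 - alpha)            # critical value
    shift = (mu - mu0) / (sigma / math.sqrt(n))  # mean of the statistic under mu
    return 1 - std_normal.cdf(c - shift)

# Hypothetical setting: mu0 = 0, sigma = 1, n = 25.
true_means = [0.0, 0.2, 0.5]
powers = [power_one_sided_z(mu, mu0=0.0, sigma=1.0, n=25) for mu in true_means]
for mu, p in zip(true_means, powers):
    print(f"true mu = {mu:.1f}: power = {p:.3f}")
```

When the true mean equals µ0 the rejection probability is exactly α, and it rises as the true mean moves further into the alternative, which is the trade-off between the two error types described above.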



Hypothesis Testing IV

We would like the power to equal unity whenever the null hypothesis is
false (this is impossible to achieve while keeping the significance level
small)
We choose our tests to maximize the power for a given significance
level
In order to test a null hypothesis against an alternative, we need to
choose a test statistic (or statistic, for short) and a critical value
Given a test statistic, we can define a rejection rule that determines
when H0 is rejected in favour of H1
Usually, rejection rules are based on comparing the value of a test
statistic, t, to a critical value, c.
To determine the critical value, we must first decide on a significance
level (α) of the test.
The values of t that result in rejection of the null hypothesis are
collectively known as the rejection region.



Hypothesis Testing V
Then, given α, the critical value associated with α is determined by the
distribution of T, assuming that H0 is true.
Example: Hypotheses about the mean µ from a N(µ, σ²) population

H0 : µ = µ0

where µ0 is a value that we specify.
The rejection rule we choose depends on the nature of the alternative
hypothesis
There are three possible alternative hypotheses.
- H1 : µ > µ0
- H1 : µ < µ0
- H1 : µ ≠ µ0
Intuitively, in the first case we should reject the null in favour of the
alternative when the sample average is “sufficiently” greater than µ0.
But how should we determine what is “sufficiently” greater? This requires
knowing the probability of rejecting the null hypothesis when it is true.
Hypothesis testing

Let us define a standardised version of the sample mean

$$t = \sqrt{n}\,(\bar{y} - \mu_0)/s = \frac{\bar{y} - \mu_0}{se(\bar{y})}$$

Given the sample of data, it is easy to obtain the above value.
Under the null hypothesis, the random variable

$$T = \sqrt{n}\,(\bar{Y} - \mu_0)/S$$

has a $t_{n-1}$ distribution
For a 5% significance level, the critical value c is chosen so that
P(T > c | H0) = .05 (one-tailed test)
Once we have found c (this is the 100(1 − α)th percentile in a $t_{n-1}$
distribution), the rejection rule is t > c.
The value t is often called the t-statistic.



Two-sided test

Similarly, the rejection rule for the second one-sided alternative is
t < −c.
For the two-sided alternative, H1 : µ ≠ µ0, we reject the null if the sample
mean is far from the hypothesised value µ0 in absolute terms. The
rejection rule is
|t| > c,
where the critical value is the 100(1 − α/2)th percentile in the
$t_{n-1}$ distribution.
Usually, we report the outcome in the form “we fail to reject H0 in favour
of H1 at the 5% significance level”
With large n, we can compare the t statistic with the critical values
from a standard normal distribution.
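
The three rejection rules can be applied side by side. The Python sketch below reuses the scrap rate changes from the earlier exercise (1988 minus 1987 for the 20 firms) and tests H0: µ = 0 at the 5% level. The critical values 1.729 and 2.093 are the 95th and 97.5th percentiles of the t distribution with 19 degrees of freedom, taken from standard tables; the choice of µ0 = 0 and the function name are assumptions of the sketch.

```python
import math

def t_statistic(sample, mu0):
    """t = sqrt(n) * (ybar - mu0) / s, with s using the n - 1 divisor."""
    n = len(sample)
    ybar = sum(sample) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
    return math.sqrt(n) * (ybar - mu0) / s

# Scrap rates for firms 1-20 from the exercise tables.
rate_1987 = [10.00, 1.00, 6.00, 0.45, 1.25, 1.30, 1.06, 3.00, 8.18, 1.67,
             0.98, 1.00, 0.45, 5.03, 8.00, 9.00, 18.00, 0.28, 7.00, 3.97]
rate_1988 = [3.00, 1.00, 5.00, 0.50, 1.54, 1.50, 0.80, 2.00, 0.67, 1.17,
             0.51, 0.50, 0.61, 6.70, 4.00, 7.00, 19.00, 0.20, 5.00, 3.83]
changes = [after - before for before, after in zip(rate_1987, rate_1988)]

t = t_statistic(changes, mu0=0.0)

C_ONE_SIDED = 1.729  # 95th percentile of t with 19 df (from tables)
C_TWO_SIDED = 2.093  # 97.5th percentile of t with 19 df (from tables)

reject_greater = t > C_ONE_SIDED         # H1: mu > mu0
reject_less = t < -C_ONE_SIDED           # H1: mu < mu0
reject_two_sided = abs(t) > C_TWO_SIDED  # H1: mu != mu0

print(f"t = {t:.3f}")
print("reject against mu > 0: ", reject_greater)
print("reject against mu < 0: ", reject_less)
print("reject against mu != 0:", reject_two_sided)
```

The same t value can lead to different decisions depending on the alternative, which is why H1 has to be fixed before looking at the data.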



p-values I
A test can lead to different conclusions for different values of α
What is the largest significance level at which we could carry out the
test and still fail to reject the null hypothesis? This value is known as
the p-value of a test.
Given a value of t, we can find the largest significance level at which
we would fail to reject H0
This is the significance level associated with using t as our critical value
For a one-sided test

p-value = P(T > t | H0)

For a two-sided test

p-value = P(|T| > |t| | H0) = 2P(T > |t| | H0)

Suppose the p-value is 0.065. Then the largest significance level at
which we can carry out this test and fail to reject is 6.5%.
p-values II

If we carry out the test at a level below 6.5% (such as at 5%), we fail
to reject H0
If we carry out the test at a level larger than 6.5% (such as 10%), we
reject H0
Generally, small p-values are evidence against H0
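
The p-value rules can be sketched in a few lines of Python. The example assumes n is large enough for the standard normal approximation to the distribution of T, so the standard library's NormalDist is used in place of the t distribution. The function names and the value t = 1.52 are hypothetical, chosen so that the one-sided p-value is close to the 6.5% used above.

```python
from statistics import NormalDist

def p_value(t, two_sided=False):
    """p-value for an observed t, using the standard normal approximation (large n)."""
    if two_sided:
        # P(|T| > |t|) = 2 P(T > |t|) by symmetry of the distribution.
        return 2 * (1 - NormalDist().cdf(abs(t)))
    return 1 - NormalDist().cdf(t)  # P(T > t) for H1: mu > mu0

def reject(p, alpha):
    """Reject H0 when the p-value is below the chosen significance level."""
    return p < alpha

t = 1.52  # hypothetical observed t statistic
p_one = p_value(t)
p_two = p_value(t, two_sided=True)

print(f"one-sided p-value = {p_one:.4f}")
print(f"two-sided p-value = {p_two:.4f}")
print("reject at 5%: ", reject(p_one, 0.05))
print("reject at 10%:", reject(p_one, 0.10))
```

Reporting the p-value lets readers apply their own significance level instead of relying on a single reject or fail-to-reject statement.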
