
Statistics and Point Estimation

Aria Nosratinia Probability and Statistics


Motivation
In all our studies so far, we have assumed that we know certain quantities, e.g., the mean and variance of a distribution. But HOW do we know them? Where do they come from? In practice we measure them using experiments. For example, we measure a voltage, which is a random variable, and obtain several samples: $V_1, V_2, \ldots, V_n$.

What are the mean and variance of this voltage? This chapter aims to answer questions of this sort.


Sample Mean
A sample is one outcome of a random variable. The sample mean is the empirical average computed from several i.i.d. samples:

$$M_n(X) = \frac{X_1 + \cdots + X_n}{n}$$

The sample mean is itself a random variable (why?).

Theorem: The sample mean has mean and variance

$$E[M_n(X)] = E[X], \qquad \mathrm{Var}(M_n(X)) = \frac{\mathrm{Var}(X)}{n}$$

This shows that the sample mean concentrates around $E[X]$ as $n$ increases. We will quantify this effect in the sequel.
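A minimal simulation sketch of these two properties (assuming Python with NumPy; the exponential distribution with rate 2 is just an arbitrary test case):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var = 0.5, 0.25                 # exponential with rate 2: E[X] = 1/2, Var(X) = 1/4

for n in [10, 100, 1000]:
    # 10,000 independent realizations of the sample mean M_n(X)
    M = rng.exponential(scale=mu, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}  E[M_n] ~ {M.mean():.4f} (target {mu})"
          f"  Var(M_n) ~ {M.var():.6f} (target {var/n:.6f})")
```

As $n$ grows, the empirical mean of $M_n$ stays at $E[X]$ while its empirical variance shrinks like $1/n$.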

Tail Probability
How much probability is in the tail of a random variable? We look at non-negative random variables.

Markov Inequality: If $P(X < 0) = 0$, then for any value $a > 0$,

$$P(X \ge a) \le \frac{E[X]}{a}$$

Proof:

$$E[X] = \int_0^a x f_X(x)\,dx + \int_a^\infty x f_X(x)\,dx \ge \int_a^\infty x f_X(x)\,dx \ge \int_a^\infty a f_X(x)\,dx = a\,P(X \ge a)$$


Example
Consider the exponential random variable with pdf $f_X(x) = 2e^{-2x}u(x)$. Calculate $P(X \ge a)$ directly and via the Markov inequality. Is the inequality tight in this case?
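One way to check this numerically (a sketch, assuming Python; the grid of $a$ values is arbitrary):

```python
import numpy as np

# Exponential with rate 2: E[X] = 1/2, exact tail P(X >= a) = exp(-2a).
EX = 0.5
for a in [0.5, 1.0, 2.0, 4.0]:
    exact = np.exp(-2 * a)
    markov = EX / a
    print(f"a={a:3.1f}  exact={exact:.5f}  Markov bound={markov:.5f}")
# The bound is loose here: the exact tail decays exponentially in a,
# while the Markov bound decays only like 1/a.
```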


Deviation from the Mean


We have learned that the spread of a random variable is characterized by its variance. Here we quantify that.

Chebychev Inequality: For any random variable $Y$ with mean $\mu_Y$ and variance $\sigma_Y^2$, and for any constant $c > 0$,

$$P(|Y - \mu_Y| \ge c) \le \frac{\sigma_Y^2}{c^2}$$

Proof: Just write the Markov inequality for the random variable $X = |Y - \mu_Y|^2$ with $a = c^2$. Unlike the Markov inequality, the Chebychev inequality is valid for all random variables.
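A quick numerical sanity check of the bound (a sketch, assuming Python with NumPy; the uniform distribution is an arbitrary test case):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.uniform(0, 1, size=1_000_000)   # mean 1/2, variance 1/12
mu, var = 0.5, 1 / 12

for c in [0.3, 0.4, 0.45]:
    empirical = np.mean(np.abs(Y - mu) >= c)
    print(f"c={c:4.2f}  P(|Y-mu|>=c) ~ {empirical:.4f}  bound={var / c**2:.4f}")
```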


Example
Consider the two-sided geometric distribution, defined thus:

$$P_X(n) = \frac{2^{-|n|}}{3}, \qquad n = \ldots, -2, -1, 0, 1, 2, \ldots$$

What is the mean of this distribution? What is the variance? Find the probability of a deviation from the mean both directly and via the Chebychev inequality.
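A sketch of the numerical side of this exercise (assuming Python; the sum is truncated at $|n| = 60$, where the terms are far below double precision, and the deviation threshold $c = 3$ is an arbitrary choice):

```python
import numpy as np

n = np.arange(-60, 61)
p = 2.0 ** (-np.abs(n)) / 3          # P_X(n) = 2^{-|n|} / 3

mean = np.sum(n * p)                 # 0 by symmetry
var = np.sum((n - mean) ** 2 * p)
print(f"mean={mean:.6f}  variance={var:.6f}")

c = 3
direct = np.sum(p[np.abs(n - mean) >= c])
print(f"P(|X - mean| >= {c}) = {direct:.6f}  vs Chebychev bound {var / c**2:.6f}")
```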


Point Estimates
A main problem in statistics is the estimation of probability models: the mean, the variance, and other parameters. A point estimate is a single number that is as close as possible to the parameter in question. A confidence interval is a range of numbers that contains the true value of the parameter with high probability (we will see it later). Point estimates and confidence intervals are computed from samples, and therefore are random variables.

Properties of Point Estimates


OBJECTIVE: find a parameter $r$ related to the random variable $X$. We observe a number of i.i.d. versions of $X$, namely $X_1, X_2, \ldots$. We form a point estimate $\hat{R}_1$, then another $\hat{R}_2$, etc. We ask: are these estimates getting any better? How much?

Consistent Estimator: an estimator for which, for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|\hat{R}_n - r| \ge \epsilon\right) = 0$$

Unbiased Estimator: an estimator $\hat{R}$ of $r$ is unbiased if $E[\hat{R}] = r$.


Point Estimate Properties (cont.)


Consistency is a property of a sequence of estimators, while unbiasedness is a property of a single estimator. We can also define asymptotically unbiased estimators, whose mean eventually converges to the true value. The Mean Square Error (MSE) of an estimator is

$$e = E[(\hat{R} - r)^2]$$

Theorem: If a sequence of unbiased estimators $\{\hat{R}_n\}$ has MSE $e_n$ with $\lim_{n \to \infty} e_n = 0$, then the estimators are consistent.

Proof: Use the Chebychev inequality: since each $\hat{R}_n$ is unbiased, $e_n$ is its variance, so $P(|\hat{R}_n - r| \ge \epsilon) \le e_n/\epsilon^2 \to 0$.


More about Sample Mean

The sample mean is an unbiased estimator of $E[X]$. The sample mean has mean square error

$$e_n = \frac{\mathrm{Var}(X)}{n}$$

The sample means $M_n$ are consistent estimators of $E[X]$ if $\mathrm{Var}(X)$ is finite. This is also expressed as follows:

Weak Law of Large Numbers: If $\mathrm{Var}(X)$ is finite, then for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|M_n(X) - \mu_X| \ge \epsilon\right) = 0$$

Proof: use the Chebychev inequality.


Weak Law vs. Strong Law


The weak law of large numbers (LLN) is an example of convergence in probability. We say $Z_n$ converges to $Z$ in probability, written $Z_n \xrightarrow{\text{i.p.}} Z$, if for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|Z_n - Z| \ge \epsilon\right) = 0$$

We say $Z_n$ converges to $Z$ almost surely, written $Z_n \xrightarrow{\text{a.s.}} Z$, if

$$P\left(\lim_{n \to \infty} Z_n = Z\right) = 1$$

$M_n(X)$ converges to $\mu_X$ almost surely too, but this is harder to prove. That result is known as the Strong Law of Large Numbers.


Point Estimate of Variance


Define $W = (X - \mu_X)^2$. Clearly $E[W] = \mathrm{Var}(X)$ (why?). For a point estimate of $\mathrm{Var}(X)$, we can use the sample mean of $W$:

$$M_n(W) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2$$

But what is $\mu_X$? We have to find that too! The Sample Variance is defined as:

$$V_n(X) = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - M_n(X)\right)^2$$

The sample variance is biased:

$$E[V_n(X)] = \frac{n-1}{n} \mathrm{Var}(X)$$

Proof: plug and chug.

Sample Variance

Unbiased Sample Variance:

$$V_n'(X) = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - M_n(X)\right)^2$$

For large $n$, the two estimates of variance are almost the same. For small $n$, they are different. $V_n'$ highlights the fundamental inability to calculate variance from one sample point: for $n = 1$ it is undefined.
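A simulation sketch of the bias (assuming Python with NumPy; `ddof=0` gives the $1/n$ estimator $V_n$ and `ddof=1` the $1/(n-1)$ estimator $V_n'$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 5, 200_000
true_var = 1.0                       # standard normal samples: Var(X) = 1

X = rng.standard_normal((trials, n))
v_biased = X.var(axis=1, ddof=0)     # (1/n)     * sum (X_i - M_n)^2
v_unbiased = X.var(axis=1, ddof=1)   # (1/(n-1)) * sum (X_i - M_n)^2

print(f"E[V_n]  ~ {v_biased.mean():.4f}   (theory: (n-1)/n = {(n - 1) / n:.4f})")
print(f"E[V_n'] ~ {v_unbiased.mean():.4f}   (theory: {true_var:.4f})")
```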


Condence Intervals
We know that as $n \to \infty$, the sample mean converges. But in practice we always have a finite number of samples. Practical question: how many samples do I need to take so that my point estimate is good enough? Usually, "good enough" has two components: a desired accuracy (interval), and the probability of falling within that interval. This probability can be bounded via Chebychev:

$$P\left(|M_n(X) - \mu_X| \ge c\right) \le \frac{\sigma_X^2}{nc^2}$$


Example 1
We have calibrated a machine to produce 20,000-ohm resistors. Due to natural process variations, the actual resistors vary in value, but we want the average to be 20,000 ohms. Assume that the process variation has a standard deviation of 100 ohms. How many resistors do I need to test from a batch to ensure that the average resistance of the batch is within 1% of the nominal resistance with a probability of 99%? With a probability of 99.999%?
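A sketch of the calculation using the Chebychev bound from the previous slide (assuming Python; $c$ is 1% of 20,000 ohms and $\alpha$ is the allowed failure probability):

```python
import math

sigma = 100.0                  # standard deviation of the process (ohms)
c = 0.01 * 20_000              # desired accuracy: 1% of nominal = 200 ohms

for alpha in [0.01, 1e-5]:     # allowed P(|M_n - mu| >= c)
    n = math.ceil(sigma**2 / (c**2 * alpha))
    print(f"P(fail) <= {alpha}:  n >= {n}")
# Chebychev gives n >= sigma^2 / (c^2 * alpha):
# 25 samples for 99% confidence, 25,000 for 99.999%.
```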



Example 2 Polling
We want to predict the outcome of the Democratic primary race in Pennsylvania between Senators Clinton and Obama. How many people do we need to call to be able to estimate the percentage of the population that supports each candidate? We want the poll to be within 3% with 90 percent probability.

We model the process as Bernoulli, where voters preferring Senator Obama are denoted with 1, which occurs with probability $p$. Then $E[X] = p$ and $\mathrm{Var}(X) = p(1-p)$. We wish to have $c = 0.03$ and

$$P\left(|M_n(X) - p| \le 0.03\right) \ge 1 - \frac{p(1-p)}{n(0.03)^2} \ge 0.9$$

Because we don't know $p$ ahead of time, the final inequality should hold for all $p$. The maximum of $p(1-p)$ occurs at $p = 1/2$, where $p(1-p) = 1/4$, which yields $n \ge 2778$.
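A minimal check of this worst-case calculation (a sketch, assuming Python):

```python
import math

c, conf = 0.03, 0.90
alpha = 1 - conf                     # allowed failure probability
p_worst = 0.5                        # maximizes p * (1 - p)
n = math.ceil(p_worst * (1 - p_worst) / (c**2 * alpha))
print(f"n >= {n}")                   # -> 2778
```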

Interval Estimates
Previously: the probability of falling within a deterministic interval $[\mu - c, \mu + c]$. How many samples are needed? That assumes we know $\mu$, which we may not. Now we turn the question around and ask: what is the chance that the unknown $\mu$ is within $c$ units of the sample mean?

$$P\left(M_n(X) - c < E[X] < M_n(X) + c\right) \ge 1 - \frac{\sigma_X^2}{nc^2}$$

We have used exactly the same Chebychev formula, just interpreted it differently. $[M_n(X) - c, M_n(X) + c]$ is a confidence interval estimate associated with confidence probability $1 - \frac{\sigma_X^2}{nc^2}$.


Gaussian Approximation

The Chebychev formula gives an upper bound. We can sometimes make a Gaussian approximation for the parameter estimate.

Theorem: If $X$ has mean $\mu$ and variance $\sigma^2$, and if $M_n(X)$ can be approximated with a Gaussian, then the interval estimate

$$M_n(X) - c < \mu < M_n(X) + c$$

has confidence probability $1 - 2Q\left(\frac{c\sqrt{n}}{\sigma}\right)$.
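A sketch comparing the Chebychev bound with the Gaussian approximation (assuming Python with SciPy; `norm.sf` is the $Q$ function, and the values of $\sigma$, $c$, and $n$ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

sigma, c = 1.0, 0.1
for n in [100, 400, 1600]:
    chebychev = 1 - sigma**2 / (n * c**2)            # lower bound on confidence
    gaussian = 1 - 2 * norm.sf(c * np.sqrt(n) / sigma)
    print(f"n={n:5d}  Chebychev >= {chebychev:.4f}  Gaussian approx {gaussian:.4f}")
```

The Gaussian approximation is typically much sharper than the Chebychev bound, which can even be vacuous (zero or negative) for small $n$.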


Example
$Z$ is a Gaussian random variable with unknown mean and variance 10. Find the confidence interval estimate with confidence 0.99. What is our interval estimate using 100 measurements?
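One way to work this out numerically (a sketch, assuming Python with SciPy; `norm.isf` inverts the $Q$ function):

```python
import numpy as np
from scipy.stats import norm

sigma = np.sqrt(10)                  # variance 10
n, confidence = 100, 0.99

# Solve 1 - 2*Q(c*sqrt(n)/sigma) = 0.99 for c.
z = norm.isf((1 - confidence) / 2)   # Q^{-1}(0.005), about 2.576
c = z * sigma / np.sqrt(n)
print(f"c ~ {c:.3f}")                # interval: [M_100(Z) - c, M_100(Z) + c]
```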

