Statistics and Point Estimation: Motivation
Motivation
In all our studies so far, we have assumed that we know certain things, e.g., the mean and variance of a distribution. But HOW do we know them? Where do they come from? In practice, we measure them using experiments. For example, we measure a voltage, which is a random variable, and obtain several samples: V_1, V_2, ..., V_n.
What are the mean and variance of this voltage? This chapter aims to answer questions of this sort.
Sample Mean
A sample is one outcome of a random variable. The sample mean is the empirical average computed from n i.i.d. samples:

M_n(X) = (X_1 + X_2 + ... + X_n) / n

The sample mean is itself a random variable (why?).

Theorem: The sample mean has mean and variance

E[M_n(X)] = E[X],    Var(M_n(X)) = Var(X) / n

This shows that the sample mean concentrates around E[X] as n increases. We will quantify this effect in the sequel.
Aria Nosratinia Probability and Statistics 8-3
Tail Probability
How much probability is in the tail of a random variable? We look at non-negative random variables.

Markov Inequality: If P(X < 0) = 0, then for any value a > 0,

P(X ≥ a) ≤ E[X] / a

Proof:

E[X] = ∫_0^a x f_X(x) dx + ∫_a^∞ x f_X(x) dx
     ≥ ∫_a^∞ x f_X(x) dx
     ≥ a ∫_a^∞ f_X(x) dx
     = a P(X ≥ a)
Example
Consider the exponential random variable: f_X(x) = 2 e^{-2x} u(x). Calculate P(X ≥ a) directly and via the Markov inequality. Is the inequality tight in this case?
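As a quick numerical check (a sketch, not part of the original slides), we can compare the exact tail P(X ≥ a) = e^{-2a} against the Markov bound E[X]/a = 1/(2a):

```python
import math

lam = 2.0            # rate of the exponential: f_X(x) = 2 e^{-2x} u(x)
mean = 1.0 / lam     # E[X] = 1/2

def exact_tail(a):
    """P(X >= a) = e^{-2a} for the exponential with rate 2."""
    return math.exp(-lam * a)

def markov_bound(a):
    """Markov's upper bound E[X]/a."""
    return mean / a

for a in (0.5, 1.0, 2.0):
    print(f"a={a}: exact={exact_tail(a):.4f}, Markov bound={markov_bound(a):.4f}")
```

The bound holds at every a, but it is loose: the exact tail decays exponentially in a while the bound decays only like 1/a, so the inequality is far from tight here.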
Chebychev Inequality: For any random variable Y with mean μ_Y and any c > 0,

P(|Y − μ_Y| ≥ c) ≤ Var(Y) / c²

Proof: Just write the Markov inequality for the random variable X = |Y − μ_Y|². Unlike the Markov inequality, the Chebychev inequality is valid for all random variables.
Example
Consider the two-sided geometric distribution, defined thus:

P_X(n) = (1/3) 2^{−|n|},    n = ..., −2, −1, 0, 1, 2, ...
What is the mean of this distribution? What is the variance? Find the probability of a deviation from the mean, both directly and via the Chebychev inequality.
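A small numeric check (a sketch; the truncation point N and the deviation c = 3 are choices made here, not in the slides) of the mean, the variance, and a Chebychev bound for this distribution:

```python
# Two-sided geometric: P_X(n) = (1/3) * 2**(-|n|), n = ..., -1, 0, 1, ...
N = 200                                    # truncation; the remaining mass is negligible
support = range(-N, N + 1)
pmf = {n: (1 / 3) * 2 ** (-abs(n)) for n in support}

mean = sum(n * pmf[n] for n in support)               # 0 by symmetry
var = sum((n - mean) ** 2 * pmf[n] for n in support)  # works out to 4

c = 3
tail = sum(pmf[n] for n in support if abs(n - mean) >= c)  # exact P(|X| >= 3) = 1/6
cheby = var / c ** 2                                       # Chebychev bound: 4/9
print(mean, var, tail, cheby)
```

The direct tail probability (1/6) is well below the Chebychev bound (4/9), again illustrating that the bound holds but need not be tight.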
Point Estimates
A main problem in statistics is the estimation of probability models: the mean, the variance, or other parameters. A point estimate is a single number that is as close as possible to the parameter in question. A confidence interval is a range of numbers that contains the true value of the parameter with high probability (we will see this later). Point estimates and confidence intervals are computed from samples; therefore they are random variables.
An estimate R̂_n of a parameter r is consistent if, for any ε > 0,

lim_{n→∞} P(|R̂_n − r| ≥ ε) = 0
The sample mean is an unbiased estimator of E[X]. The sample mean has mean square error

e_n = Var(X) / n

The sample mean M_n(X) is a consistent estimator of E[X] if Var(X) is finite. This is also expressed as follows:

Weak Law of Large Numbers: If Var(X) is finite, then for any ε > 0,

lim_{n→∞} P(|M_n(X) − μ_X| ≥ ε) = 0
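A small simulation (a sketch; the uniform distribution, seed, and trial counts are arbitrary choices made here) illustrating that Var(M_n(X)) = Var(X)/n shrinks as n grows:

```python
import random

random.seed(0)

# X ~ Uniform(0, 1), so E[X] = 1/2 and Var(X) = 1/12.
def sample_mean(n):
    return sum(random.uniform(0, 1) for _ in range(n)) / n

def empirical_var_of_mean(n, trials=2000):
    """Estimate Var(M_n(X)) by repeating the n-sample experiment many times."""
    means = [sample_mean(n) for _ in range(trials)]
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / trials

results = {}
for n in (10, 100, 1000):
    results[n] = empirical_var_of_mean(n)
    print(n, results[n], (1 / 12) / n)   # empirical vs theoretical Var(X)/n
```

Each tenfold increase in n cuts the empirical variance of the sample mean by roughly a factor of ten, matching the Var(X)/n prediction.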
We say Z_n converges to Z in probability if, for any ε > 0,

lim_{n→∞} P(|Z_n − Z| ≥ ε) = 0

We say Z_n converges to Z almost surely, and write Z_n →^{a.s.} Z, if

P( lim_{n→∞} Z_n = Z ) = 1

M_n(X) converges to μ_X almost surely too, but this is harder to prove. This is known as the Strong Law of Large Numbers.
Sample Variance

The natural estimate of the variance would be

(1/n) Σ_{i=1}^{n} (X_i − μ_X)²

But what is μ_X? We have to find that too! The Sample Variance is defined as:

V_n(X) = (1/n) Σ_{i=1}^{n} (X_i − M_n(X))²

The sample variance is biased!

E[V_n(X)] = ((n−1)/n) Var(X)

Proof: plug and chug.
Sample Variance

An unbiased estimate is obtained by normalizing with n − 1 instead of n:

V'_n(X) = (1/(n−1)) Σ_{i=1}^{n} (X_i − M_n(X))²

For large n, the two estimates of variance are almost the same. For small n, they are different. V'_n highlights the fundamental inability to calculate variance from one sample point (it is undefined for n = 1).
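A quick illustration (a sketch; the simulated data and seed are choices made here) of the relation between the two estimates, V'_n = (n/(n−1)) V_n:

```python
import random

random.seed(1)
xs = [random.gauss(0, 2) for _ in range(10)]   # simulated data, true variance 4
n = len(xs)

m = sum(xs) / n                                     # sample mean M_n(X)
vn = sum((x - m) ** 2 for x in xs) / n              # biased:   E[V_n] = (n-1)/n * Var(X)
vn_prime = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased: divide by n - 1
print(vn, vn_prime)
```

For n = 10 the two values differ by the factor 10/9; as n grows that factor tends to 1, matching the claim that the two estimates nearly coincide for large n.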
Confidence Intervals
We know that as n → ∞, the sample mean converges. But in practice we always have a finite number of samples. Practical question: how many samples do I need to take so that my point estimate is good enough? Usually, "good enough" has two components: a desired accuracy (an interval), and the probability of falling within that interval. This probability can be bounded via Chebychev:

P(|M_n(X) − μ_X| ≥ c) ≤ σ_X² / (n c²)
Example 1
We have calibrated a machine to produce 20,000-ohm resistors. Due to natural process variations, the actual resistors vary in value, but we want the average to be 20,000 ohms. Assume that the process variation has a standard deviation of 100 ohms. How many resistors do I need to test from a batch to ensure that the average resistance of the batch is within 1% of the nominal resistance with probability 99%? With probability 99.999%?
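One way to work this example (a sketch using the Chebychev bound from the previous slide; `samples_needed` is a helper introduced here, not from the slides):

```python
import math

def samples_needed(sigma, c, alpha):
    """Smallest n with sigma**2 / (n * c**2) <= alpha (Chebychev bound)."""
    return math.ceil(sigma ** 2 / (c ** 2 * alpha))

sigma = 100.0           # process standard deviation, in ohms
c = 0.01 * 20_000       # 1% of the 20,000-ohm nominal value = 200 ohms

print(samples_needed(sigma, c, 0.01))   # 99% confidence -> 25
print(samples_needed(sigma, c, 1e-5))   # 99.999% confidence -> 25000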
Example 2: Polling
We want to predict the outcome of the Democratic primary race in Pennsylvania between Senators Clinton and Obama. How many people do we need to call to be able to give an estimate of the percentage of the population that supports each candidate? We want the poll to be within 3% with 90% probability. We model the process as Bernoulli: voters preferring Senator Obama are denoted by 1, with probability p. Then E[X] = p and Var(X) = p(1 − p). We wish to have c = 0.03:

P(|M_n(X) − p| < 0.03) ≥ 1 − p(1 − p) / (n (0.03)²) ≥ 0.9

Because we don't know p ahead of time, the final inequality should hold for all p. The maximum of p(1 − p) is 1/4, attained at p = 1/2, which yields n ≥ 2778.
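The arithmetic of the worst-case bound can be checked directly (a sketch):

```python
import math

c = 0.03            # desired accuracy: within 3%
alpha = 0.10        # allowed failure probability: 90% confidence
worst_var = 0.25    # max of p * (1 - p), attained at p = 1/2

n = math.ceil(worst_var / (c ** 2 * alpha))
print(n)            # -> 2778
```

Using the worst-case variance 1/4 means the sample size is valid whatever the true p turns out to be; a poll designer who could assume, say, p near 0.2 would need fewer calls.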
Interval Estimates
Previously: the probability of falling within a deterministic interval [μ − c, μ + c], and how many samples are needed for that. But this assumes we know μ, which we may not. Now we turn the question around and ask the following: what is the chance that the unknown μ is within c units of the sample mean?

P( M_n(X) − c < E[X] < M_n(X) + c ) ≥ 1 − σ_X² / (n c²)

We have used exactly the same Chebychev formula, just interpreted it differently. [M_n(X) − c, M_n(X) + c] is a confidence interval estimate associated with confidence probability 1 − σ_X²/(n c²).
Gaussian Approximation
The Chebychev formula gives an upper bound. We can sometimes make a Gaussian approximation for the parameter estimate.

Theorem: If X has mean μ and variance σ², and if M_n(X) can be approximated with a Gaussian, then the interval estimate

M_n(X) − c < μ < M_n(X) + c

has confidence probability 1 − 2Q(c√n / σ).
Example
Z is a Gaussian random variable with unknown mean and variance 10. Find the confidence interval estimate with confidence 0.99. What is our interval estimate using 100 measurements?
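A sketch of the computation, using Python's `statistics.NormalDist` for the inverse of Φ (so Q⁻¹(x) = Φ⁻¹(1 − x)):

```python
import math
from statistics import NormalDist

sigma = math.sqrt(10)   # standard deviation of Z
n = 100                 # number of measurements
conf = 0.99             # desired confidence 1 - 2Q(c*sqrt(n)/sigma)

# Solve 1 - 2Q(c*sqrt(n)/sigma) = conf  =>  c = Q^{-1}((1-conf)/2) * sigma / sqrt(n)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Q^{-1}(0.005), about 2.576
c = z * sigma / math.sqrt(n)
print(c)   # half-width of [M_n(Z) - c, M_n(Z) + c], about 0.815
```

So with 100 measurements, the 99% confidence interval is roughly M_100(Z) ± 0.815, far tighter than what the Chebychev bound alone would give for the same confidence.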