
Statistics and Point Estimation

Aria Nosratinia Probability and Statistics


Motivation
In all our studies so far, we have assumed that we know certain quantities, e.g., the mean and variance of a distribution. But HOW do we know them? Where do they come from? In practice we measure them using experiments. For example, we measure a voltage, which is a random variable, and obtain several samples: $V_1, V_2, \ldots, V_n$.

What are the mean and variance of this voltage? This chapter aims to answer questions of this sort.


Sample Mean
A sample is one outcome of a random variable. The sample mean is the empirical average computed from several i.i.d. samples:

$$M_n(X) = \frac{X_1 + \cdots + X_n}{n}$$

The sample mean is itself a random variable (why?).

Theorem: The sample mean has mean and variance

$$E[M_n(X)] = E[X], \qquad \mathrm{Var}(M_n(X)) = \frac{\mathrm{Var}(X)}{n}$$

This shows that the sample mean concentrates around $E[X]$ as $n$ increases. We will quantify this effect in the sequel.
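A minimal simulation sketch of these two properties (assuming Python with NumPy; the exponential distribution with rate 2 is just an arbitrary test case):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var = 0.5, 0.25                 # exponential with rate 2: E[X] = 1/2, Var(X) = 1/4

for n in [10, 100, 1000]:
    # 10,000 independent realizations of the sample mean M_n(X)
    M = rng.exponential(scale=mu, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}  E[M_n] ~ {M.mean():.4f} (target {mu})"
          f"  Var(M_n) ~ {M.var():.6f} (target {var/n:.6f})")
```

As $n$ grows, the empirical mean of $M_n$ stays at $E[X]$ while its empirical variance shrinks like $1/n$.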

Tail Probability
How much probability is in the tail of a random variable? We look at non-negative random variables.

Markov Inequality: If $P(X < 0) = 0$, then for any value $a > 0$,

$$P(X \ge a) \le \frac{E[X]}{a}$$

Proof:

$$E[X] = \int_0^a x f_X(x)\,dx + \int_a^\infty x f_X(x)\,dx \ge \int_a^\infty x f_X(x)\,dx \ge \int_a^\infty a f_X(x)\,dx = a\,P(X \ge a)$$


Example
Consider the exponential random variable with pdf $f_X(x) = 2e^{-2x}u(x)$. Calculate $P(X \ge a)$ directly and via the Markov inequality. Is the inequality tight in this case?
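One way to check this numerically (a sketch, assuming Python; the grid of $a$ values is arbitrary):

```python
import numpy as np

# Exponential with rate 2: E[X] = 1/2, exact tail P(X >= a) = exp(-2a).
EX = 0.5
for a in [0.5, 1.0, 2.0, 4.0]:
    exact = np.exp(-2 * a)
    markov = EX / a
    print(f"a={a:3.1f}  exact={exact:.5f}  Markov bound={markov:.5f}")
# The bound is loose here: the exact tail decays exponentially in a,
# while the Markov bound decays only like 1/a.
```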


Deviation from the Mean


We have learned that the spread of a random variable is characterized by its variance. Here we quantify that.

Chebychev Inequality: For any random variable $Y$ with mean $\mu_Y$ and variance $\sigma_Y^2$, and for any constant $c > 0$,

$$P(|Y - \mu_Y| \ge c) \le \frac{\sigma_Y^2}{c^2}$$

Proof: Just write the Markov inequality for the random variable $X = |Y - \mu_Y|^2$ with $a = c^2$. Unlike the Markov inequality, the Chebychev inequality is valid for all random variables.
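A quick numerical sanity check of the bound (a sketch, assuming Python with NumPy; the uniform distribution is an arbitrary test case):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.uniform(0, 1, size=1_000_000)   # mean 1/2, variance 1/12
mu, var = 0.5, 1 / 12

for c in [0.3, 0.4, 0.45]:
    empirical = np.mean(np.abs(Y - mu) >= c)
    print(f"c={c:4.2f}  P(|Y-mu|>=c) ~ {empirical:.4f}  bound={var / c**2:.4f}")
```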


Example
Consider the two-sided geometric distribution, defined thus:

$$P_X(n) = \frac{2^{-|n|}}{3}, \qquad n = \ldots, -2, -1, 0, 1, 2, \ldots$$

What is the mean of this distribution? What is the variance? Find the probability of a deviation from the mean both directly and via the Chebychev inequality.
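A sketch of the numerical side of this exercise (assuming Python; the sum is truncated at $|n| = 60$, where the terms are far below double precision, and the deviation threshold $c = 3$ is an arbitrary choice):

```python
import numpy as np

n = np.arange(-60, 61)
p = 2.0 ** (-np.abs(n)) / 3          # P_X(n) = 2^{-|n|} / 3

mean = np.sum(n * p)                 # 0 by symmetry
var = np.sum((n - mean) ** 2 * p)
print(f"mean={mean:.6f}  variance={var:.6f}")

c = 3
direct = np.sum(p[np.abs(n - mean) >= c])
print(f"P(|X - mean| >= {c}) = {direct:.6f}  vs Chebychev bound {var / c**2:.6f}")
```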


Point Estimates
A main problem in statistics is the estimation of probability models: the mean, the variance, and other parameters. A point estimate is a single number that is as close as possible to the parameter in question. A confidence interval is a range of numbers that contains the true value of the parameter with high probability (we will see it later). Point estimates and confidence intervals are computed from samples, and therefore are random variables.

Properties of Point Estimates


OBJECTIVE: find a parameter $r$ related to the random variable $X$. We observe a number of i.i.d. versions of $X$, namely $X_1, X_2, \ldots$. We form a point estimate $\hat{R}_1$, then another $\hat{R}_2$, etc. We ask: are these estimates getting any better? How much?

Consistent Estimator: an estimator for which, for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|\hat{R}_n - r| \ge \epsilon\right) = 0$$

Unbiased Estimator: an estimator $\hat{R}$ of $r$ is unbiased if $E[\hat{R}] = r$.


Point Estimate Properties (cont.)


Consistency is a property of a sequence of estimators, while unbiasedness is a property of a single estimator. We can also define asymptotically unbiased estimators, whose mean eventually converges to the true value. The Mean Square Error (MSE) of an estimator is

$$e = E[(\hat{R} - r)^2]$$

Theorem: If a sequence of unbiased estimators $\{\hat{R}_n\}$ has MSE $e_n$ with $\lim_{n \to \infty} e_n = 0$, then the estimators are consistent.

Proof: Use the Chebychev inequality: since each $\hat{R}_n$ is unbiased, $e_n$ is its variance, so $P(|\hat{R}_n - r| \ge \epsilon) \le e_n/\epsilon^2 \to 0$.


More about Sample Mean

The sample mean is an unbiased estimator of $E[X]$. The sample mean has mean square error

$$e_n = \frac{\mathrm{Var}(X)}{n}$$

The sample means $M_n$ are consistent estimators of $E[X]$ if $\mathrm{Var}(X)$ is finite. This is also expressed as follows:

Weak Law of Large Numbers: If $\mathrm{Var}(X)$ is finite, then for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|M_n(X) - \mu_X| \ge \epsilon\right) = 0$$

Proof: use the Chebychev inequality.


Weak Law vs. Strong Law


The weak law of large numbers (LLN) is an example of convergence in probability. We say $Z_n$ converges to $Z$ in probability, written $Z_n \xrightarrow{\text{i.p.}} Z$, if for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(|Z_n - Z| \ge \epsilon\right) = 0$$

We say $Z_n$ converges to $Z$ almost surely, written $Z_n \xrightarrow{\text{a.s.}} Z$, if

$$P\left(\lim_{n \to \infty} Z_n = Z\right) = 1$$

$M_n(X)$ converges to $\mu_X$ almost surely too, but this is harder to prove. That result is known as the Strong Law of Large Numbers.


Point Estimate of Variance


Define $W = (X - \mu_X)^2$. Clearly $E[W] = \mathrm{Var}(X)$ (why?). For a point estimate of $\mathrm{Var}(X)$, we can use the sample mean of $W$:

$$M_n(W) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2$$

But what is $\mu_X$? We have to find that too! The Sample Variance is defined as:

$$V_n(X) = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - M_n(X)\right)^2$$

The sample variance is biased:

$$E[V_n(X)] = \frac{n-1}{n} \mathrm{Var}(X)$$

Proof: plug and chug.

Sample Variance

Unbiased Sample Variance:

$$V_n'(X) = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - M_n(X)\right)^2$$

For large $n$, the two estimates of variance are almost the same. For small $n$, they are different. $V_n'$ highlights the fundamental inability to calculate variance from one sample point: for $n = 1$ it is undefined.
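A simulation sketch of the bias (assuming Python with NumPy; `ddof=0` gives the $1/n$ estimator $V_n$ and `ddof=1` the $1/(n-1)$ estimator $V_n'$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 5, 200_000
true_var = 1.0                       # standard normal samples: Var(X) = 1

X = rng.standard_normal((trials, n))
v_biased = X.var(axis=1, ddof=0)     # (1/n)     * sum (X_i - M_n)^2
v_unbiased = X.var(axis=1, ddof=1)   # (1/(n-1)) * sum (X_i - M_n)^2

print(f"E[V_n]  ~ {v_biased.mean():.4f}   (theory: (n-1)/n = {(n - 1) / n:.4f})")
print(f"E[V_n'] ~ {v_unbiased.mean():.4f}   (theory: {true_var:.4f})")
```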


Condence Intervals
We know that as $n \to \infty$, the sample mean converges. But in practice we always have a finite number of samples. Practical question: how many samples do I need to take so that my point estimate is good enough? Usually, "good enough" has two components: a desired accuracy (interval), and the probability of falling within that interval. This probability can be bounded via Chebychev:

$$P\left(|M_n(X) - \mu_X| \ge c\right) \le \frac{\sigma_X^2}{nc^2}$$


Example 1
We have calibrated a machine to produce 20,000-ohm resistors. Due to natural process variations, the actual resistors vary in value, but we want the average to be 20,000 ohms. Assume that the process variation has a standard deviation of 100 ohms. How many resistors do I need to test from a batch to ensure that the average resistance of the batch is within 1% of the nominal resistance with a probability of 99%? With a probability of 99.999%?
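A sketch of the calculation using the Chebychev bound from the previous slide (assuming Python; $c$ is 1% of 20,000 ohms and $\alpha$ is the allowed failure probability):

```python
import math

sigma = 100.0                  # standard deviation of the process (ohms)
c = 0.01 * 20_000              # desired accuracy: 1% of nominal = 200 ohms

for alpha in [0.01, 1e-5]:     # allowed P(|M_n - mu| >= c)
    n = math.ceil(sigma**2 / (c**2 * alpha))
    print(f"P(fail) <= {alpha}:  n >= {n}")
# Chebychev gives n >= sigma^2 / (c^2 * alpha):
# 25 samples for 99% confidence, 25,000 for 99.999%.
```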



Example 2 Polling
We want to predict the outcome of the Democratic primary race in Pennsylvania between Senators Clinton and Obama. How many people do we need to call to be able to estimate the percentage of the population that supports each candidate? We want the poll to be within 3% with 90 percent probability.

We model the process as Bernoulli, where voters preferring Senator Obama are denoted with 1, which occurs with probability $p$. Then $E[X] = p$ and $\mathrm{Var}(X) = p(1-p)$. We wish to have $c = 0.03$ and

$$P\left(|M_n(X) - p| \le 0.03\right) \ge 1 - \frac{p(1-p)}{n(0.03)^2} \ge 0.9$$

Because we don't know $p$ ahead of time, the final inequality should hold for all $p$. The maximum of $p(1-p)$ occurs at $p = 1/2$, where $p(1-p) = 1/4$, which yields $n \ge 2778$.
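A minimal check of this worst-case calculation (a sketch, assuming Python):

```python
import math

c, conf = 0.03, 0.90
alpha = 1 - conf                     # allowed failure probability
p_worst = 0.5                        # maximizes p * (1 - p)
n = math.ceil(p_worst * (1 - p_worst) / (c**2 * alpha))
print(f"n >= {n}")                   # -> 2778
```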

Interval Estimates
Previously: the probability of falling within a deterministic interval $[\mu - c, \mu + c]$. How many samples are needed? That assumes we know $\mu$, which we may not. Now we turn the question around and ask: what is the chance that the unknown $\mu$ is within $c$ units of the sample mean?

$$P\left(M_n(X) - c < E[X] < M_n(X) + c\right) \ge 1 - \frac{\sigma_X^2}{nc^2}$$

We have used exactly the same Chebychev formula, just interpreted it differently. $[M_n(X) - c, M_n(X) + c]$ is a confidence interval estimate associated with confidence probability $1 - \frac{\sigma_X^2}{nc^2}$.


Gaussian Approximation

The Chebychev formula gives an upper bound. We can sometimes make a Gaussian approximation for the parameter estimate.

Theorem: If $X$ has mean $\mu$ and variance $\sigma^2$, and if $M_n(X)$ can be approximated with a Gaussian, then the interval estimate

$$M_n(X) - c < \mu < M_n(X) + c$$

has confidence probability $1 - 2Q\left(\frac{c\sqrt{n}}{\sigma}\right)$.
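A sketch comparing the Chebychev bound with the Gaussian approximation (assuming Python with SciPy; `norm.sf` is the $Q$ function, and the values of $\sigma$, $c$, and $n$ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

sigma, c = 1.0, 0.1
for n in [100, 400, 1600]:
    chebychev = 1 - sigma**2 / (n * c**2)            # lower bound on confidence
    gaussian = 1 - 2 * norm.sf(c * np.sqrt(n) / sigma)
    print(f"n={n:5d}  Chebychev >= {chebychev:.4f}  Gaussian approx {gaussian:.4f}")
```

The Gaussian approximation is typically much sharper than the Chebychev bound, which can even be vacuous (zero or negative) for small $n$.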


Example
$Z$ is a Gaussian random variable with unknown mean and variance 10. Find the confidence interval estimate with confidence 0.99. What is our interval estimate using 100 measurements?
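One way to work this out numerically (a sketch, assuming Python with SciPy; `norm.isf` inverts the $Q$ function):

```python
import numpy as np
from scipy.stats import norm

sigma = np.sqrt(10)                  # variance 10
n, confidence = 100, 0.99

# Solve 1 - 2*Q(c*sqrt(n)/sigma) = 0.99 for c.
z = norm.isf((1 - confidence) / 2)   # Q^{-1}(0.005), about 2.576
c = z * sigma / np.sqrt(n)
print(f"c ~ {c:.3f}")                # interval: [M_100(Z) - c, M_100(Z) + c]
```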

