Upper Bounds on the Tail Probability

In evaluating the performance of a digital communication system, it is often necessary to determine the area under the tail of the pdf. We refer to this area as the tail probability. Two upper bounds on the tail probability are available. The first upper bound, obtained from the Chebyshev inequality, is rather loose. The second upper bound, known as the Chernoff bound, is much tighter.

Chebyshev Inequality:

Suppose that X is an arbitrary random variable with finite mean $m_x$ and finite variance $\sigma_x^2$. For any positive number $\delta$,

$$P(|X - m_x| \ge \delta) \le \frac{\sigma_x^2}{\delta^2}$$

This relation is called the Chebyshev inequality. The proof follows from the definition of the variance:

$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2\, p(x)\, dx \ge \int_{|x - m_x| \ge \delta} (x - m_x)^2\, p(x)\, dx \ge \delta^2 \int_{|x - m_x| \ge \delta} p(x)\, dx = \delta^2\, P(|X - m_x| \ge \delta)$$

Hence

$$P(|X - m_x| \ge \delta) \le \frac{\sigma_x^2}{\delta^2}$$

The Chebyshev inequality is simply an upper bound on the area under the tails of the pdf $p(y)$, where $Y = X - m_x$, i.e. the area of $p(y)$ in the intervals $(-\infty, -\delta)$ and $(\delta, \infty)$.

The Chebyshev inequality may be expressed as:

$$1 - \left[ F_Y(\delta) - F_Y(-\delta) \right] \le \frac{\sigma_x^2}{\delta^2}$$

or, equivalently,

$$1 - \left[ F_X(m_x + \delta) - F_X(m_x - \delta) \right] \le \frac{\sigma_x^2}{\delta^2}$$
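As a quick numerical sanity check of this inequality, the sketch below (our own illustration; the exponential distribution, sample size, and variable names are assumptions, not part of the original text) estimates the two-sided tail probability empirically and compares it with the bound $\sigma_x^2/\delta^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative distribution: exponential with mean 1.
x = rng.exponential(scale=1.0, size=1_000_000)
m_x, var_x = x.mean(), x.var()

for delta in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(x - m_x) >= delta)   # empirical P(|X - m_x| >= delta)
    bound = var_x / delta**2                   # Chebyshev upper bound
    print(f"delta={delta}: tail ~ {tail:.4f} <= bound {bound:.4f}")
```

The bound is loose, as noted above, but it always lies above the empirical tail probability.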

2.1 Notes on the Chebyshev Inequality

Let X be a random variable with mean $E(X) = \mu$ and variance $\sigma^2 = \mathrm{var}(X)$. Then the Chebyshev inequality states that

$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}$$

for any $t > 0$. Other equivalent forms can be written for this inequality by simple manipulation:

$$P(|X - \mu| < t) > 1 - \frac{\sigma^2}{t^2}, \qquad P(|X - \mu| \ge n\sigma) \le \frac{1}{n^2}$$

We can also bound probabilities that involve an interval of X that is not centered on the mean $\mu$:

$$P(|X - c| \ge t) \le \frac{E\left[(X - c)^2\right]}{t^2} = \frac{\sigma^2 + (\mu - c)^2}{t^2}, \qquad P(|X - c| < t) > 1 - \frac{E\left[(X - c)^2\right]}{t^2}$$

The above inequality is the most general form of the two-sided Chebyshev: putting $c = \mu$ yields the standard form. Note that the statement $|X - c| < t$ is the same as $-t + c < X < t + c$. Thus

$$P(a < X < b) = P\left( \left| X - \tfrac{a+b}{2} \right| < \tfrac{b-a}{2} \right) > 1 - \frac{\sigma^2 + \left(\mu - \tfrac{a+b}{2}\right)^2}{\left(\tfrac{b-a}{2}\right)^2}$$

where we have substituted $a = -t + c$ and $b = t + c$.
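As a hedged illustration of this interval form, the following sketch applies the lower bound on $P(a < X < b)$ to a normal distribution; the distribution, interval, and variable names are our own assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.5
x = rng.normal(mu, sigma, size=1_000_000)

a, b = 0.0, 5.0
c, t = (a + b) / 2, (b - a) / 2                # the substitution used in the text
bound = 1 - (sigma**2 + (mu - c)**2) / t**2    # lower bound on P(a < X < b)
empirical = np.mean((x > a) & (x < b))
print(f"P({a} < X < {b}) ~ {empirical:.4f} >= bound {bound:.4f}")
```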

Example 1 Roll a single fair die and let X be the outcome. Then $E(X) = 3.5$ and $\mathrm{var}(X) = 35/12$ (you can check this). Suppose we want to compute $p = P(X \ge 6)$.

(a) Exact: We easily see that $p = P(X \ge 6) = P(X = 6) = \frac{1}{6} \approx 0.167$.

(b) By the Markov inequality we get: $P(X \ge 6) \le \frac{21/6}{6} = \frac{7}{12} \approx 0.583$.

(c) By the usual (two-sided) Chebyshev inequality, we can obtain a stronger bound on p:

$$P(X \ge 6) \le P(X \ge 6 \text{ OR } X \le 1) = P(|X - 3.5| \ge 2.5) \le \frac{35/12}{(2.5)^2} = \frac{7}{15} \approx 0.467$$

(d) By the one-sided Chebyshev inequality, we can obtain an even stronger bound on p:

$$P(X \ge 6) = P(X \ge 3.5 + 2.5) \le \frac{35/12}{(35/12) + (2.5)^2} = \frac{7}{22} \approx 0.318$$
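The arithmetic of Example 1 can be reproduced with a short sketch (exact fractions, with variable names of our own choosing):

```python
from fractions import Fraction

mean, var = Fraction(21, 6), Fraction(35, 12)   # E(X) = 21/6 = 3.5, var(X) = 35/12
t = Fraction(5, 2)                              # 6 = E(X) + t

exact = Fraction(1, 6)                          # P(X >= 6) = P(X = 6)
markov = mean / 6                               # (b) Markov:              E(X)/6
two_sided = var / t**2                          # (c) two-sided Chebyshev: var/t^2
one_sided = var / (var + t**2)                  # (d) one-sided Chebyshev: var/(var + t^2)

for name, p in [("exact", exact), ("Markov", markov),
                ("two-sided Chebyshev", two_sided), ("one-sided Chebyshev", one_sided)]:
    print(f"{name}: {p} = {float(p):.3f}")
```

Each successive bound is tighter, as the example shows.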

Chernoff Bound:

In some applications we are interested only in the area under one tail, either in the interval $(\delta, \infty)$ or $(-\infty, -\delta)$. In such a case we can obtain an extremely tight upper bound by overbounding the function $g(Y)$ by an exponential having a parameter that can be optimized to yield as tight an upper bound as possible. Let us consider the tail probability in the interval $(\delta, \infty)$, where $g(Y)$ is defined as

$$g(Y) = \begin{cases} 1 & (Y \ge \delta) \\ 0 & (Y < \delta) \end{cases}$$

The function $g(Y)$ is overbounded as

$$g(Y) \le e^{\nu(Y - \delta)}$$

where $\nu \ge 0$ is the parameter to be optimized. The graph of $g(Y)$ and the exponential upper bound are shown in the figure below:

[Figure: An exponential upper bound $e^{\nu(Y - \delta)}$ on the step function $g(Y)$, used in obtaining the tail probability.]

The expected value of $g(Y)$ is

$$E[g(Y)] = P(Y \ge \delta) \le E\left[e^{\nu(Y - \delta)}\right]$$

This bound is valid for any $\nu \ge 0$. The tightest upper bound is obtained by selecting the value of $\nu$ that minimizes $E\left[e^{\nu(Y - \delta)}\right]$. A necessary condition for a minimum is

$$\frac{d}{d\nu}\, E\left[e^{\nu(Y - \delta)}\right] = 0$$

If we change the order of differentiation and expectation we can write

$$\frac{d}{d\nu}\, E\left[e^{\nu(Y - \delta)}\right] = E\left[\frac{d}{d\nu}\, e^{\nu(Y - \delta)}\right] = E\left[(Y - \delta)\, e^{\nu(Y - \delta)}\right] = e^{-\nu\delta}\left( E\left[Y e^{\nu Y}\right] - \delta\, E\left[e^{\nu Y}\right] \right) = 0$$

Therefore the value of $\nu$ that gives the tightest upper bound is the solution to the equation

$$E\left[Y e^{\nu Y}\right] - \delta\, E\left[e^{\nu Y}\right] = 0$$

If the solution is $\hat{\nu}$, then the upper bound on the one-sided tail probability is

$$P(Y \ge \delta) \le e^{-\hat{\nu}\delta}\, E\left[e^{\hat{\nu} Y}\right]$$

Example 2 Consider the (Laplace) pdf

$$p(y) = \frac{1}{2}\, e^{-|y|}$$

[Figure: The Laplace pdf $p(y) = \frac{1}{2}\, e^{-|y|}$.]

Let us evaluate the upper tail probability from the Chernoff bound and compare it with the true tail probability, which is given by

$$P(Y \ge \delta) = \int_{\delta}^{\infty} \frac{1}{2}\, e^{-y}\, dy = \frac{1}{2}\, e^{-\delta}$$

To find $\hat{\nu}$, we must determine the moments $E\left[Y e^{\nu Y}\right]$ and $E\left[e^{\nu Y}\right]$. For the given pdf the expected values are:

$$E\left[Y e^{\nu Y}\right] = \frac{(\nu + 1)^2 - (\nu - 1)^2}{2(1 + \nu)^2 (1 - \nu)^2}, \qquad E\left[e^{\nu Y}\right] = \frac{1}{(1 + \nu)(1 - \nu)}$$

Then the condition $E\left[Y e^{\nu Y}\right] - \delta\, E\left[e^{\nu Y}\right] = 0$ becomes

$$\frac{(\nu + 1)^2 - (\nu - 1)^2}{2(1 + \nu)^2 (1 - \nu)^2} - \frac{\delta}{(1 + \nu)(1 - \nu)} = 0$$

Simplifying, we obtain the quadratic

$$\delta \nu^2 + 2\nu - \delta = 0$$

whose roots are

$$\nu = \frac{-1 \pm \sqrt{1 + \delta^2}}{\delta}$$

Since $\hat{\nu}$ must be positive, we ignore the negative root and take

$$\hat{\nu} = \frac{-1 + \sqrt{1 + \delta^2}}{\delta}$$

The upper bound on the one-sided tail probability is:

$$P(Y \ge \delta) \le \frac{\delta^2}{2\left(-1 + \sqrt{1 + \delta^2}\right)}\, e^{1 - \sqrt{1 + \delta^2}}$$

For $\delta \gg 1$, the above bound behaves approximately as $\frac{\delta}{2}\, e^{1 - \delta}$. The Chernoff bound therefore decreases exponentially as $\delta$ increases. Consequently, it approximates closely the exact tail probability given by

$$P(Y \ge \delta) = \int_{\delta}^{\infty} \frac{1}{2}\, e^{-y}\, dy = \frac{1}{2}\, e^{-\delta}$$
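As a check on the derivation above, the following sketch evaluates the Chernoff bound for the Laplace pdf and compares it with the exact tail probability $\frac{1}{2}e^{-\delta}$; the function name and the sample values of $\delta$ are our own choices:

```python
import numpy as np

def chernoff_bound(delta):
    """Chernoff bound on P(Y >= delta) for the Laplace pdf p(y) = 0.5*exp(-|y|)."""
    nu = (-1 + np.sqrt(1 + delta**2)) / delta   # positive root of delta*nu^2 + 2*nu - delta = 0
    mgf = 1.0 / ((1 + nu) * (1 - nu))           # E[e^{nu*Y}] for the Laplace pdf, |nu| < 1
    return np.exp(-nu * delta) * mgf            # e^{-nu*delta} * E[e^{nu*Y}]

for delta in (2.0, 5.0, 10.0):
    exact = 0.5 * np.exp(-delta)                # true tail probability
    print(f"delta={delta}: Chernoff bound {chernoff_bound(delta):.3e}, exact {exact:.3e}")
```

Both quantities decay at the same exponential rate, which is the point made above.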

Weak Law of Large Numbers

A result in probability theory, also known as Bernoulli's theorem or the weak law of large numbers (in contrast to the strong law of large numbers). Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having mean $\langle X_i \rangle = \mu$ and standard deviation $\sigma$. If we define a new variable

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$

then, as $n \to \infty$, the sample mean $\langle \bar{X} \rangle$ equals the population mean $\mu$ of each variable:

$$\langle \bar{X} \rangle = \left\langle \frac{X_1 + \cdots + X_n}{n} \right\rangle = \frac{1}{n}\left( \langle X_1 \rangle + \cdots + \langle X_n \rangle \right) = \frac{n\mu}{n} = \mu$$

In addition,

$$\mathrm{var}(\bar{X}) = \mathrm{var}\left( \frac{X_1 + \cdots + X_n}{n} \right) = \mathrm{var}\left( \frac{X_1}{n} \right) + \cdots + \mathrm{var}\left( \frac{X_n}{n} \right) = \frac{\sigma^2}{n^2} + \cdots + \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Therefore, by the Chebyshev inequality, for all $\epsilon > 0$,

$$P\left( \left| \bar{X} - \mu \right| \ge \epsilon \right) \le \frac{\mathrm{var}(\bar{X})}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2}$$

As $n \to \infty$, it then follows that

$$\lim_{n \to \infty} P\left( \left| \bar{X} - \mu \right| \ge \epsilon \right) = 0$$

for an arbitrary positive $\epsilon$. Stated another way, the probability that the average satisfies $\left| (X_1 + \cdots + X_n)/n - \mu \right| < \epsilon$ approaches 1 as $n \to \infty$ (Feller 1968, pp. 228-229).
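A minimal simulation sketch of this argument (the normal distribution, number of trials, sample sizes, and names are illustrative assumptions): it estimates $P(|\bar{X} - \mu| \ge \epsilon)$ empirically and compares it with the bound $\sigma^2/(n\epsilon^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps = 0.0, 1.0, 0.1

for n in (100, 1_000, 10_000):
    # 1000 independent sample means, each over n IID N(mu, sigma^2) variables
    sample_means = rng.normal(mu, sigma, size=(1_000, n)).mean(axis=1)
    empirical = np.mean(np.abs(sample_means - mu) >= eps)
    bound = sigma**2 / (n * eps**2)             # Chebyshev bound from the argument above
    print(f"n={n}: P(|X_bar - mu| >= {eps}) ~ {empirical:.4f} <= {min(bound, 1.0):.4f}")
```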

Strong Law of Large Numbers

The strong law of large numbers states that if $X_1, X_2, X_3, \ldots$ is an infinite sequence of random variables that are independent and identically distributed and have a common expected value $\mu$, then

$$P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1$$

where $\bar{X}_n = (X_1 + \cdots + X_n)/n$,

i.e. the sample average converges almost surely to $\mu$.

This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term" average when sampling repeatedly.
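A brief simulation sketch of this almost-sure convergence (the Bernoulli sequence and the checkpoints are illustrative assumptions): the running average of a single IID sequence settles near $\mu$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.5                                    # common expected value (fair coin flips)
x = rng.integers(0, 2, size=1_000_000)      # one long IID Bernoulli(0.5) sequence

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n={n}: running sample average = {running_mean[n - 1]:.5f} (mu = {mu})")
```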

Kolmogorov's Strong Law of Large Numbers

Let $X_1, X_2, X_3, \ldots$ be a sequence of independent random variables with finite expectations. The strong law of large numbers holds if at least one of the following conditions is satisfied:

1. The random variables are identically distributed, or
2. for each $n$, the variance of $X_n$ is finite, and

$$\sum_{n=1}^{\infty} \frac{\mathrm{Var}(X_n)}{n^2} < \infty$$

For example, if every $X_n$ has the same finite variance $\sigma^2$, then $\sum_{n=1}^{\infty} \sigma^2/n^2 = \sigma^2 \pi^2/6 < \infty$, so condition 2 is satisfied.

Information theory deals with characterizing the limits of communication. Abstractly, we work with messages or sequences of symbols from a discrete alphabet that are generated by some source. We consider the problem of encoding the sequence of symbols for storage or (noiseless) transmission. In the literature of information theory, this general problem is referred to as source coding. How compactly can we represent messages emanating from a discrete source? In his original paper in 1948, Shannon assumed sources generated messages one symbol at a time according to a probability distribution; each new symbol might depend on the preceding symbols as well as the alphabet. Therefore, Shannon defined a source to be a discrete stochastic process.
