1 Markov's Inequality

- Hello, and welcome back.
In this set of lectures,
we're going to talk about inequalities
that are related to probability distributions.
We'll start with what you might think of
as the mother of all such inequalities.
It's named after Andre Markov, shown here,
who was a Russian mathematician
active at the beginning of the 20th century.
We're going to motivate his inequality,
provide a little bit of intuition,
then formalize the inequality, prove it,
show an example, and then discuss possible extensions.
Let's start.
Why do we care about inequalities?
Often, we want to bound probabilities of events.
Typically they'll be bad events.
For example, we want to guarantee or assure
that the probability of excessive rain is small
or the probability of heavy traffic is not that high
or the probability that a company
will incur a large loss is contained
or the probability of disease outbreak is small.
We want to say that these bad events
have small probabilities.
Now we're going to describe Markov's inequality,
which, as I just mentioned, is the foundation
of many of the inequalities we may encounter later.
This inequality is not very strong,
and later we'll describe stronger bounds.
But we're going to start with Markov's inequality,
and, specifically, we'll give
a little bit of intuition about it.
Let's consider Markov's meerkats, shown here.
Each of them has a height, which, of course,
is a nonnegative number.
Let's assume that the average meerkat height is 10 inches.
So I have a question.
Can half the meerkats have a height
which is at least 40 inches?
Can half of them be at least 40 inches tall?
Clearly the answer is no,
because if half the meerkats were at least 40 inches tall,
then that half alone would contribute
at least one half times 40, or 20 inches, to the average.
Even if all the other meerkats were zero inches tall,
the average would be at least 20,
and if the other meerkats are taller than zero,
the average would be more than 20.
But we're told the average is, in fact, only 10 inches.
So this is impossible.
More generally, suppose we want to find
the largest possible fraction of meerkats
that are at least 40 inches tall.
Let's call it F(40), the fraction of meerkats 40 inches or taller.
If F(40) times 40 were bigger than 10,
then the average would be bigger than 10 as well,
just like we saw for one half.
Therefore F(40), the fraction of meerkats
that are at least 40 inches tall, times 40,
must be at most 10,
so F(40) must be at most 10 over 40,
which is one quarter.
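As a quick numeric check of the meerkat argument, here is a small Python sketch. The function name `max_fraction_at_least` and the 8-meerkat mob are hypothetical, made up for illustration. The mob puts a quarter of the meerkats at exactly 40 inches and the rest at the extreme value of 0 inches, which shows that the one-quarter bound can actually be reached.

```python
from fractions import Fraction

def max_fraction_at_least(mean, threshold):
    """Largest possible fraction of a nonnegative population whose values are
    >= threshold, given the population mean (assumes threshold > 0)."""
    # F * threshold <= mean, so F <= mean / threshold; a fraction never exceeds 1.
    return min(Fraction(1), Fraction(mean) / Fraction(threshold))

# Half the meerkats at 40 inches would force an average of at least 1/2 * 40 = 20.
print(Fraction(1, 2) * 40)
print(max_fraction_at_least(10, 40))

# Hypothetical extreme mob of 8: a quarter (2 meerkats) are exactly 40 inches,
# the rest are 0 inches, so the mean is exactly 10.
mob = [40, 40, 0, 0, 0, 0, 0, 0]
mean = Fraction(sum(mob), len(mob))
fraction_tall = Fraction(sum(h >= 40 for h in mob), len(mob))
print(mean, fraction_tall)
```

Using `Fraction` keeps the arithmetic exact, so the comparison with one quarter is not blurred by floating-point rounding.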
We can do the same for a general mean, mu.
Here the mean was 10,
but for a general mean mu, consider F(4 mu),
the fraction of meerkats
that are at least four times mu inches tall.
F(4 mu) times 4 mu has to be at most mu,
because if it were bigger than mu,
the average would be larger than mu.
Dividing both sides by 4 mu,
we get that F(4 mu) is at most one quarter,
just like before.
This holds in general:
F(7 mu) is at most 1/7, and so on.
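The general claim, that the fraction of a nonnegative population at least k times its mean is at most 1/k, can be spot-checked on random populations. This is only a sanity check, not a proof; the helper `fraction_at_least` and the choice of exponentially distributed heights with mean 10 are assumptions made for illustration.

```python
import random

def fraction_at_least(heights, threshold):
    """Fraction of the population whose height is >= threshold."""
    return sum(h >= threshold for h in heights) / len(heights)

random.seed(0)
for k in (2, 4, 7, 10):
    for _ in range(1000):
        # Hypothetical population: 50 nonnegative heights, drawn from an
        # exponential distribution with mean 10 (an arbitrary choice).
        heights = [random.expovariate(1 / 10) for _ in range(50)]
        mu = sum(heights) / len(heights)
        # Markov: the fraction at least k * mu never exceeds 1/k.
        assert fraction_at_least(heights, k * mu) <= 1 / k
print("F(k mu) <= 1/k held in every trial")
```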
If you understand this meerkat calculation,
then you understand Markov's inequality.
In other words, this is Markov's inequality in a nutshell.
Now, to describe it a little more formally,
there are two forms.
The first is intuitive and more memorable,
and it's based on what we have just said.
The second is more direct,
is more easily applied,
and is therefore a little more common.
The first formulation, which we just described, says
that if X is a nonnegative random variable,
discrete or continuous, with finite mean mu,
then for all alpha greater than or equal to one,
the probability that X is at least alpha mu
is at most one over alpha.
As we said before,
the probability that X is at least 4 mu
is at most 1/4,
the probability that X is at least 10 mu
is at most 1/10, and so on.
To me, that's the easier one to remember.
It says that the probability
that a nonnegative random variable is
at least alpha times its mean is at most one over alpha.
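To see the first formulation on a concrete distribution, the sketch below evaluates both sides exactly with Python's `fractions` module. The fair six-sided die is an assumption chosen only because its probabilities are easy to check by hand; any nonnegative distribution would do. The helper names `tail_prob` and `mean` are our own.

```python
from fractions import Fraction

def tail_prob(pmf, t):
    """P(X >= t) for a discrete distribution given as {value: probability}."""
    return sum(p for x, p in pmf.items() if x >= t)

def mean(pmf):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Example distribution (an assumption for illustration): a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu = mean(die)  # 7/2

for alpha in (Fraction(1), Fraction(3, 2), Fraction(12, 7)):
    lhs = tail_prob(die, alpha * mu)
    print(alpha, lhs, 1 / alpha)
    assert lhs <= 1 / alpha  # first formulation: P(X >= alpha*mu) <= 1/alpha
```

For example, with alpha = 3/2 the threshold is 21/4, so only the face 6 counts, giving 1/6, comfortably below 2/3.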
Now, the second formulation.
Both formulations apply to a nonnegative random variable
with finite mean mu.
But the second formulation allows a more direct proof,
which we'll do in a second, and is easier to apply,
so it's a little more common.
Let's give the quantity alpha mu a new name, a.
Because a equals alpha mu
and alpha is at least one,
a is at least mu.
So for all a greater than or equal to mu,
the probability that X is at least a,
which is the same as the probability that X is at least alpha mu,
is at most one over alpha.
But since a equals alpha mu, one over alpha is mu over a.
So for every a that is at least the mean,
the probability that X is at least a
is at most mu divided by a.
For example, if the mean is 10 and a is 20,
the probability that X is at least 20
is at most 10 over 20, which is 1/2.
If, again, the mean is 10 and a is 40,
the probability that X is at least 40
is at most 10 divided by 40, which is one quarter.
You can see why this form is a little easier to apply:
typically someone will ask us,
"What is the probability that X is at least some number?"
Then we just plug in mu and a and get the answer.
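The second formulation is literally a one-line calculation. Here is a minimal sketch; the name `markov_bound` is our own, and capping the result at 1 is an addition for convenience (when a is at most mu, mu over a is at least 1 and tells us nothing).

```python
def markov_bound(mu, a):
    """Markov upper bound on P(X >= a) for a nonnegative X with mean mu.

    Assumes mu >= 0 and a > 0. When a <= mu the bound mu / a is >= 1 and
    therefore uninformative, so it is capped at 1.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    return min(1.0, mu / a)

print(markov_bound(10, 20))  # 0.5
print(markov_bound(10, 40))  # 0.25
```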
Let's see how we would prove it.
We want to show that the probability
that X is at least a is at most mu over a.
Notice that we are proving the second formulation,
which is equivalent to the first.
We'll prove it for a discrete random variable,
but the same proof works for a continuous one:
just replace the summation by an integral.
The mean, as we know, is the summation
over all x of x times p(x).
The distribution in the figure is drawn as continuous,
but you can think of it as discrete.
So mu is the sum of x times p(x) over all values of x.
This is greater than or equal to the same sum
taken over only some of the values:
recall that we are given a value a,
and we sum only over the x's that are at least a.
So starting from a, we sum x times p(x)
only from that point onward to the right.
Since every term we dropped is nonnegative,
the whole sum is greater than or equal to this partial sum.
Now, in this partial sum, every x is at least a.
So if we replace x by a,
the sum can only get smaller.
That's what we did here:
instead of the summation of x times p(x),
we look at the summation of a times p(x).
Because x is at least a in every term we are summing,
replacing x by a
makes this summation smaller or leaves it the same.
Now we can take a out of the sum,
which leaves a times the summation
over all x at least a of p(x).
And this summation is just the probability
that X is at least a.
So this is a times the probability
that X is at least a.
What we get is that mu is at least a
times the probability that X is at least a.
Therefore, the probability
that X is at least a
is at most mu over a,
which is what's written here.
So that's it: a very simple proof of Markov's inequality.
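For reference, the chain of steps just described can be written compactly for a discrete nonnegative X with mean mu and any a > 0 (for continuous X, replace the sums with integrals against the density):

```latex
\mu = \sum_{x} x\,p(x)
    \;\ge\; \sum_{x \ge a} x\,p(x)
    \;\ge\; \sum_{x \ge a} a\,p(x)
    \;=\; a \sum_{x \ge a} p(x)
    \;=\; a\,P(X \ge a),
\qquad\text{hence}\qquad
P(X \ge a) \le \frac{\mu}{a}.
```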
It's deceptively simple because, as we said,
we're going to use it
to prove much stronger results later on.
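The individual steps of the proof can also be traced numerically. The sketch below computes each quantity in the chain for a small discrete distribution; the four-point distribution and the helper name `proof_chain` are hypothetical, chosen only so the numbers are easy to follow.

```python
from fractions import Fraction

def proof_chain(pmf, a):
    """Return the quantities in the proof of Markov's inequality for a discrete
    nonnegative distribution {value: probability}:
    (mu, sum over x >= a of x p(x), sum over x >= a of a p(x), a * P(X >= a))."""
    full_sum = sum(x * p for x, p in pmf.items())               # mu
    partial_sum = sum(x * p for x, p in pmf.items() if x >= a)   # drop x < a
    replaced = sum(a * p for x, p in pmf.items() if x >= a)      # replace x by a
    tail = sum(p for x, p in pmf.items() if x >= a)              # P(X >= a)
    return full_sum, partial_sum, replaced, a * tail

# Hypothetical distribution, chosen only for illustration.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 4), 8: Fraction(1, 4)}
mu, partial, replaced, a_tail = proof_chain(pmf, a=2)
print(mu, partial, replaced, a_tail)
assert mu >= partial >= replaced == a_tail
```

Here mu is 3, the partial sum is 11/4, and both of the last two quantities equal 1, so P(X ≥ 2) = 1/2 is indeed at most mu over a = 3/2.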
Let's see an example: citation counts.
A journal paper is cited eight times on average.
This number is actually roughly right:
that's the average number of citations
that a journal paper gets.
But some papers get significantly more citations than this.
For example, this here is a popular paper
on hypothesis testing, which we're going to discuss very soon,
this paper on Controlling the False Discovery Rate.
This particular paper has over 40,000 citations,
many more than the average of eight.
We want to bound the probability
that a paper gets cited at least 40,000 times,
like this paper.
Let X be the number of citations that a paper gets.
Notice that X is nonnegative,
so we can apply Markov's inequality.
We're told that the mean, the expectation mu, is eight.
So by Markov's inequality,
the probability that X is at least a
is at most mu over a.
Notice that we're using the second formulation,
because we just want the probability
that X is at least 40,000,
so it's convenient to have a appear directly.
Then we just need to plug in mu and a.
So the probability
that X is at least 40,000
is at most mu divided by 40,000.
But mu, we're told, is eight,
so the bound is 8 over 40,000, which is 0.02%.
So Markov's inequality tells us that the probability
that a paper gets cited so many times
is at most 1/50th of 1%.
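The citation arithmetic is easy to double-check. This short sketch only reproduces the numbers stated in the lecture (mean 8, threshold 40,000); it does not use any real citation data.

```python
from fractions import Fraction

mu = 8        # average number of citations per paper, as stated in the lecture
a = 40_000    # citation count of the highly cited paper

bound = Fraction(mu, a)       # Markov: P(X >= a) <= mu / a
print(bound)                  # 1/5000, i.e. 1/50th of 1%
print(f"{float(bound):.2%}")  # 0.02%
```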
Even though this looks strong,
we'll see later that if we apply another inequality,
one that makes more assumptions,
we can get even better results.
Now, once we see such a simple inequality,
a couple of questions come up.
Can Markov's inequality be generalized?
That is, can we relax its conditions?
And can it be strengthened?
By generalizing, we mean relaxing the conditions.
For example, we can ask
whether we can remove the nonnegativity assumption.
We're assuming that X is greater than or equal to zero.
Can we remove that and still have the inequality hold?
And the answer is no.
If X can be negative, then the probability
that X is at least a can be close to one, for any a.
The reason is simple.
Here, let's say, is zero, and here is a, any number.
If X can be negative,
we can put probability close to one on the value a,
and yet put a small probability here,
at a very, very large negative number,
almost minus infinity,
and in this way we can make the mean be anything.
That way, we can make the probability
be as close to one as we want.
Here is how you do it specifically.
At x equal to a, we put probability one minus epsilon.
And at the point $\frac{\mu - (1-\epsilon)a}{\epsilon}$,
we put the remaining probability, epsilon.
Notice that as epsilon approaches zero,
the probability of a approaches one,
and this other point can become very negative,
because its numerator, $\mu - (1-\epsilon)a$, is negative when a is larger than mu and epsilon is small,
and we then divide it by epsilon, which is very small.
Let's calculate the mean.
The point a contributes one minus epsilon times a.
For the other point, the epsilon cancels with the epsilon in the denominator,
so it contributes $\mu - (1-\epsilon)a$.
The two $(1-\epsilon)a$ terms cancel:
$(1-\epsilon)a + \epsilon \cdot \frac{\mu - (1-\epsilon)a}{\epsilon} = \mu$,
so the expected value is mu.
And yet the probability that X is at least a
is the probability of a, which, as we said,
is very close to one.
So this works for any a we want.
We just need to make the other value
sufficiently negative.
Then the probability of a will be close to one
and the mean will still be mu.
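The counterexample can be built explicitly. The sketch below follows the two-point construction just described; the function name `negative_counterexample` and the values mu = 1, a = 10 are hypothetical choices for illustration.

```python
from fractions import Fraction

def negative_counterexample(mu, a, eps):
    """Two-point distribution (allowing negative values) with mean mu and
    P(X >= a) = 1 - eps: probability 1 - eps at a, probability eps at
    (mu - (1 - eps) * a) / eps, which is very negative when eps is small and a > mu."""
    low = (mu - (1 - eps) * a) / eps
    return {a: 1 - eps, low: eps}

mu, a = Fraction(1), Fraction(10)
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    pmf = negative_counterexample(mu, a, eps)
    mean = sum(x * p for x, p in pmf.items())
    tail = sum(p for x, p in pmf.items() if x >= a)
    # The mean stays at mu, but P(X >= a) approaches 1, far above mu / a = 1/10.
    print(eps, min(pmf), mean, tail)
```

With eps = 1/10 the negative point is -80; with eps = 1/100 it is already -890, while the mean stays exactly 1.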
So we cannot remove the nonnegativity assumption.
That won't work.
The other question is: can we strengthen it?
Markov's inequality says
the probability that X is at least a
is at most mu over a.
You may ask: maybe we can prove
that the probability that X is at least a
is at most half of that, or a third of that?
Again, the answer is no.
To see that,
we're going to show that this inequality
can hold with equality.
If we can find a distribution
such that the probability that X is at least a
is exactly mu over a,
then we will not be able to show
that it is at most mu over 2a,
and so on.
To show that, we'll take advantage
of the very simple proof that we had,
which is rewritten here.
Notice that it has two inequalities, here and here,
and we can show that both of them
can hold with equality.
Let's look at the first inequality.
Here we have the sum over all x,
and here we have the sum over only the x's
that are at least a.
The second quantity can be smaller
because it leaves out the values of x
between zero and a.
But if the probability of x is zero
for every x strictly between zero and a,
then the two sums are the same,
because the terms we omitted add up to zero.
So if p(x) is zero for all x strictly between zero and a,
the first inequality holds with equality.
Now, can we get equality in the second one?
The difference between these two terms
is that both sum over the x's
that are at least a,
but here we multiply by x and here we multiply by a.
If p(x) is zero for every x strictly bigger than a,
then replacing x by a doesn't reduce any term,
because every term with x bigger than a
has p(x) equal to zero.
So this also holds with equality.
In other words, if X takes only the values zero and a,
then both inequalities hold with equality,
and the probability
that X is at least a is exactly mu over a.
Therefore, Markov's inequality can hold with equality,
and we cannot strengthen it in general.
So there are no sweeping improvements.
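Here is a minimal sketch of such an extremal distribution: X takes only the values 0 and a, with P(X = a) = mu / a so that the mean is mu. The function name `extremal_distribution` is our own. With mu = 10 and a = 40 this is exactly the meerkat mob in which a quarter are 40 inches tall and the rest are 0 inches.

```python
from fractions import Fraction

def extremal_distribution(mu, a):
    """Nonnegative X taking only the values 0 and a, with mean mu
    (assumes 0 <= mu <= a and a > 0): P(X = a) = mu / a, P(X = 0) = 1 - mu / a."""
    p_a = Fraction(mu) / Fraction(a)
    return {0: 1 - p_a, a: p_a}

pmf = extremal_distribution(10, 40)
mean = sum(x * p for x, p in pmf.items())
tail = sum(p for x, p in pmf.items() if x >= 40)
print(mean, tail, Fraction(10, 40))  # 10 1/4 1/4
```

The tail probability equals mu over a exactly, so no bound of the form "at most c times mu over a" with c < 1 can hold for every nonnegative distribution.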
So we're going to keep looking.
And how will we look?
Well, let's look at the strengths and weaknesses
of Markov's inequality.
Let's start with the strengths.
First, it applies to all nonnegative random variables.
It doesn't matter what the distribution is;
the inequality applies.
Second, and as a consequence,
it can always be used.
We don't need to check
whether some specific property
of the distribution holds.
Third, it can be used
to derive other inequalities,
for example the Chebyshev and Chernoff inequalities,
which we're going to talk about
in the next couple of lectures.
On the other hand, there are weaknesses.
The first weakness is actually the same as the first strength:
it applies to all nonnegative random variables.
That means it's limited to bounds
that hold for every distribution,
so it can be no stronger
than the weakest bound that holds
for all distributions.
So, if we want to strengthen it in the future,
we need to add assumptions:
we assume that the distribution satisfies
certain properties, and then we can prove stronger results.
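To get a feel for how loose a bound that must hold for every distribution can be, the sketch below compares Markov's bound with the exact tail of one particular distribution. The exponential distribution with mean 1 is our own choice for illustration and is not discussed in the lecture; for it, P(X ≥ a) = e^(-a).

```python
import math

# Assumption for illustration: X ~ Exponential with mean mu = 1,
# whose exact tail is P(X >= a) = exp(-a / mu).
mu = 1.0
for a in (2, 5, 10, 20):
    markov = mu / a              # Markov's bound, valid for any nonnegative X
    exact = math.exp(-a / mu)    # exact tail for this specific distribution
    print(f"a={a:>2}  Markov bound={markov:.4f}  exact tail={exact:.2e}")
    assert exact <= markov
```

The bound is always valid, but for this distribution it overstates the tail by orders of magnitude as a grows, which is why adding assumptions about the distribution can pay off.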
To summarize, we talked about Markov's inequality.
We motivated it and gave some intuition using Markov's meerkats.
Then we formulated it;
in fact, we gave two formulations.
We proved it, showed an example,
and talked about possible extensions:
those we cannot have, as we showed,
and those we might be able to have.
Next time,
we'll talk about Chebyshev's inequality.
See you then.
POLL
If a mob of 30 meerkats has an average height of 10”, at most how many meerkats can be 30” tall?
1 meerkat
6 meerkats
10 meerkats
None, this isn’t possible