Student: Togrul Asgerli

Group: 692.20E
Instructor: N. Emirbeyova
Subject: Mathematical Statistics and Probability
Topic: Chebyshev's Inequality
Chebyshev’s Inequality

Chebyshev's inequality is a theorem describing the maximum proportion of
extreme values in a probability distribution. It states that no more than a
certain percentage of values (1/k²) will be beyond a given distance
(k standard deviations) from the distribution’s average. The theorem is
particularly useful because it can be applied to any probability distribution,
even ones that violate normality, so long as the mean and variance are
known. The equation is:

Pr(|X − μ| ≥ k×σ) ≤ 1/k²

In plain English, the formula says that the probability of a value (X) being
more than k standard deviations (σ) away from the mean (μ) is less than or
equal to 1/k². In that formula, k represents a distance away from the mean
expressed in standard deviation units, where k can take any above-zero
value. Equivalently, if the distance k is measured in the variable's original
units rather than in standard deviations, the inequality reads:

Pr(|X − μ| ≥ k) ≤ σ²/k²

Although the inequality is valid for any real number k > 0, only the
case k > 1 is practically useful: for k ≤ 1 the bound 1/k² is at least 1,
which every probability satisfies trivially.

As an example, let's calculate the probability for k=2:

Pr(|X − μ| ≥ 2×σ) ≤ 1/2² = 1/4
The answer is that no more than ¼ of values will be more than 2 standard
deviations away from the distribution’s mean. Note that this is a loose
upper bound, but it is valid for any distribution.
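To see how loose the bound can be in practice, here is a minimal Python sketch (an illustrative addition, not part of the original text) comparing the 1/k² bound with the observed fraction of extreme values in a deliberately non-normal sample; the exponential distribution, sample size, and seed are arbitrary choices:

    # Compare Chebyshev's 1/k^2 bound with the observed fraction of values
    # lying at least k standard deviations from the mean, for skewed data.
    import random

    random.seed(42)
    data = [random.expovariate(1.0) for _ in range(100_000)]  # mean 1, sd 1

    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

    for k in (1.5, 2, 3):
        outside = sum(abs(x - mean) >= k * sd for x in data) / n
        print(f"k={k}: bound {1 / k**2:.3f}, observed {outside:.3f}")

For the exponential sample the observed tail fractions fall well below the bounds, which is the point: Chebyshev trades tightness for universality.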

Finding the reverse: Number of values within a range
Chebyshev’s inequality can also be used to find the reverse information.
Knowing the maximum percentage of values outside a given range also, by
definition, bounds the percentage of values inside that range:
Percentage inside = 1 − Percentage outside. In other words, the minimum
proportion of values within k standard deviations of the mean will be

1 − 1/k²

In the previous example, we calculated that no more than 1/4 of values will be
more than 2 standard deviations away from the distribution’s mean. To find
the reverse, we calculate 1 − 1/k² = 1 − 1/4 = 3/4. In other words, at least 3/4 of
values will be within 2 standard deviations of the distribution’s mean.
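As a quick check of the "inside" form, the following lines (an illustrative sketch, not from the original text) tabulate both bounds for a few values of k:

    # Outside bound 1/k^2 and the complementary inside bound 1 - 1/k^2.
    for k in (1.5, 2, 3, 4):
        outside = 1 / k**2
        print(f"k={k}: at most {outside:.4f} outside, at least {1 - outside:.4f} inside")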

Applications
Chebyshev's inequality is one of several results (e.g., Markov’s inequality)
that help describe the characteristics of probability distributions.
Such theorems are useful for detecting outliers and for clustering data into
groups.
A Numerical Example
Suppose a fair coin is tossed 50 times. The bound on the probability that
the number of heads will be greater than 35 or less than 15 can be found
using Chebyshev's Inequality.

Let X be the number of heads. Because X has a binomial distribution:

 The expected value will be: μ = n×p = 50 × 0.5 = 25
 The variance will be: σ² = n×p×(1−p) = 50 × 0.5 × 0.5 = 12.5
 The standard deviation will be: σ = √12.5 ≈ 3.5355
 The values 35 and 15 are each 10 away from the average, which is
10/√12.5 ≈ 2.8284 standard deviations
Using Chebyshev's inequality we can write the following probability:

Pr(|X − 25| ≥ 10) ≤ σ²/10² = 12.5/100 = 0.125

 In other words, the chances of a fair coin coming up heads outside the
range of 15 to 35 times are at most 0.125.
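For comparison, this short sketch (an illustrative addition, not in the original text) computes the exact probability of the event |X − 25| ≥ 10 and sets it against the 0.125 bound:

    # Exact P(|X - 25| >= 10) for X ~ binomial(50, 0.5) vs. the Chebyshev bound.
    from math import comb

    n, p = 50, 0.5
    pmf = lambda k: comb(n, k) * p**k * (1 - p) ** (n - k)
    exact = sum(pmf(k) for k in range(0, 16)) + sum(pmf(k) for k in range(35, n + 1))

    print(f"Chebyshev bound: {12.5 / 10**2:.3f}")  # 0.125
    print(f"Exact probability: {exact:.4f}")       # roughly 0.007

The exact probability is far smaller than the bound, again illustrating how conservative Chebyshev's inequality is.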

Bernoulli’s Law of Large Numbers


• When we say the probability of flipping a coin heads is 50%, we understand
that in a small number of flips the percentage of heads may be quite
different from 50%. For instance, in one flip the percentage of heads will be
either 0% or 100%. In two flips, the probability of 0% heads is ¼, as is the
probability of 100% heads. Even in ten flips the probability of getting 80%
or more heads is 56/1024 (you can calculate this easily now), which is about
5.5%, and similarly the probability of getting 20% or fewer heads is
56/1024. Intuitively, however, as the number of flips grows large we
imagine that the percentage of heads ought to grow close to 50%.
• Bernoulli’s Law of Large Numbers justifies this intuition. Let X be a binomial
random variable with parameters n and p, and let d be a (small) positive
number. Bernoulli’s Law of Large Numbers states that by making n large
enough, we can make the probability that the percentage of successes X/n
is within d of p as close to 1 as we like. For instance, there is some fixed
number N such that if you flip a coin N times or more, the probability of
getting between 49% and 51% heads is at least 0.9999 (in this case p = 0.5
and d = 0.01).
• Background: If X ~ binomial(n, p), then X is the number of successes in n
trials. Define a new random variable X̄ = X/n. Then X̄ is the
fraction of successes in n trials. By previous theorems

E(X̄) = E(X/n) = E(X)/n = np/n = p

That is, the mean fraction of successes in n trials is precisely the
probability of success on each individual trial, an intuitively reasonable
result.

• Under the same circumstances

SD(X̄) = SD(X/n) = SD(X)/n = √(npq)/n = √(npq/n²) = √(pq/n)

where q = 1 − p is the probability of failure on a single trial, as usual. Note that
this standard deviation shrinks to zero as n grows large. We will sometimes
denote this quantity by σ_X̄. Similarly

Var(X̄) = σ_X̄² = pq/n

Both identities are checked numerically in the sketch below.
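The following simulation (an illustrative sketch, not from the original text; the parameters n = 400, p = 0.3, and the number of trials are arbitrary) checks both identities empirically:

    # Empirical check that E(X/n) = p and SD(X/n) = sqrt(pq/n).
    import random

    random.seed(1)
    n, p, trials = 400, 0.3, 20_000
    fracs = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]

    mean = sum(fracs) / trials
    sd = (sum((f - mean) ** 2 for f in fracs) / trials) ** 0.5
    print(f"mean of X/n: {mean:.4f}   (theory: {p})")
    print(f"SD of X/n:   {sd:.4f}   (theory: {(p * (1 - p) / n) ** 0.5:.4f})")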

Theorem 3.16: Let X ~ binomial(n, p) and X̄ = X/n. For fixed d > 0,

P(|X̄ − p| ≥ d) ≤ pq/(nd²) = σ_X̄²/d²

and, equivalently,

P(|X̄ − p| < d) ≥ 1 − pq/(nd²) = 1 − σ_X̄²/d²

Note that the first probability approaches 0 as n increases without limit and
the second quantity approaches 1 under the same circumstances.

Example: Suppose you flip a coin 10,000 times. Find an upper bound on the
probability that the fraction of heads differs from p = 0.50 by more than 0.02. In
this case n = 10,000, p = q = 0.50, and d = 0.02. Applying the previous theorem

P(|X̄ − 0.50| ≥ 0.02) ≤ (0.5)(0.5)/((10,000)(0.02)²) = 0.0625 = 6.25%

The second statement in the theorem says there is a probability of at least 93.75%
(= 1 − 6.25%) that the fraction of heads in 10,000 flips will lie in the interval [0.48, 0.52].
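A Monte Carlo check of this example (an illustrative sketch, not in the original text; NumPy and the number of simulated experiments are assumptions) shows how loose the 6.25% bound is:

    # Simulate many 10,000-flip experiments and measure how often the
    # fraction of heads deviates from 0.5 by at least 0.02.
    import numpy as np

    n, p, d = 10_000, 0.5, 0.02
    bound = p * (1 - p) / (n * d**2)           # Theorem 3.16 bound = 0.0625

    rng = np.random.default_rng(0)
    heads = rng.binomial(n, p, size=200_000)   # 200,000 simulated experiments
    empirical = np.mean(np.abs(heads / n - p) >= d)

    print(f"Chebyshev bound: {bound:.4f}")          # 0.0625
    print(f"Empirical frequency: {empirical:.5f}")  # typically around 0.00006

The true probability (roughly 6 × 10⁻⁵ by the normal approximation) is about three orders of magnitude below the bound.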

Bernoulli’s Law of Large Numbers (Theorem 3.17): Let X ~ binomial(n, p) and
X̄ = X/n. For fixed d > 0,

lim (n→∞) P(|X̄ − p| < d) = 1

From the second inequality in Theorem 3.16,

lim (n→∞) P(|X̄ − p| < d) ≥ lim (n→∞) [1 − pq/(nd²)] = 1.

(Note that > is replaced by ≥ when taking limits.) We know, however, that
all probabilities are less than or equal to 1. Therefore (by the Squeeze Theorem)

lim (n→∞) P(|X̄ − p| < d) = 1
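To see the convergence numerically, here is a tiny sketch (illustrative, not from the original text; the n values are arbitrary) of the lower bound 1 − pq/(nd²) for p = 0.5 and d = 0.01:

    # Lower bound on P(|X/n - p| < d) from Theorem 3.16, approaching 1 with n.
    p, d = 0.5, 0.01
    q = 1 - p
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        print(f"n={n:>10,}: P >= {1 - p * q / (n * d**2):.6f}")

Solving 1 − pq/(nd²) ≥ 0.9999 gives n ≥ 2.5 × 10⁷, consistent with the earlier 49%–51% illustration.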

THE IMPORTANCE OF CHEBYSHEV’S THEOREM IN B2B MARKET RESEARCH
We are all aware of the “bell-shaped curve” and the distribution of data within the standard
deviations. But there is one particularly applicable law for B2B market research,
called Chebyshev's Theorem. Essentially, "it helps you determine where most of
your data fall within a distribution of values. This theorem provides beneficial
results when you have only the mean and standard deviation, and you do not need
to know the distribution of your data."

"Amazingly, even if it is inappropriate to use the mean and the standard deviation
as the measures of center and spread, there is an algebraic relationship between
them that can be exploited in any distribution."

In fact, for any numerical data set, at least 93.75% of the data lie within four
standard deviations of the mean. Clear Seas uses this law to account for severe
outliers in our data, even when we do not know our data's underlying center and
spread for any given survey. Here, we're using Chebyshev's as an instance of
"nonparametric" statistics because we do not need to specify the distribution.

Because nonparametric statistics is independent of the particular distribution, it is
often more robust.

A good real-world example of this can be found in our Water Heater CLEAReport, where we
can see a severe outlier well beyond four standard deviations from the mean. In the
2018 "Annual average number of Water Heaters installed" data, one individual
suggested they had over 10,000 water heater installations annually, more than six times the
next-highest number of annual installations. Here's a screenshot of a portion of the
data, sorted from highest to lowest, with the outlier in the white background.

And here, we have plotted the data before any elimination of extreme outliers
(defined as data lying more than four standard deviations from the mean):

By removing these outliers, we can still accurately assess a more realistic annual average
number for the industry, notwithstanding the underlying data distribution structure.

In this second image, we can see that just two individuals had pushed the average up
significantly and likely represented exaggerated and inaccurate data. Even without a
forensic accounting of these individuals' responses, we can feel relatively confident that
this data is inaccurate and should not be included in the overall calculation of the averages.

If we consider the extreme unlikelihood of such an exceptionally high number, then when
we calculate the expected value, we multiply the number by its probability.
That probability is so small (whether it's due to inaccuracy or not) that it makes almost
no contribution to the rest of the expected-value calculation. So, we might as well omit it.
We love Chebyshev's because it "enables us to derive bounds on probabilities
when only the mean, or both the mean and the variance, of the probability distribution,
are known."
References
https://www.learndatasci.com/glossary/chebyshevs-inequality/
https://en.wikipedia.org/wiki/Chebyshev%27s_inequality
https://clearseasresearch.com/blog/marketing-research/the-importance-of-chebyshevs-theorem-in-b2b-market-research/
