Binomial and Normal Distributions: Business Statistics 41000
Fall 2015
Topics
1. Sums of random variables
2. Binomial distribution
3. Normal distribution
4. Vignettes
Topic: sums of random variables
One can either work out the distribution of a sum directly or use general
rules for its mean and variance. This second point is the topic of the next
lecture. For now, we focus on the direct case.
A sum of two random variables
Suppose X is a random variable denoting the profit from one wager and
Y is a random variable denoting the profit from another wager.
A sum of two random variables
The joint distribution of the two profits:

            X = -$200   X = $100   X = $200
Y = $0          0          1/9        3/9
Y = $100       1/9         2/9        2/9
A sum of two random variables
X + Y                        Probability
-$200 + $0                   0
-$200 + $100                 1/9
$100 + $0                    1/9
$100 + $100 or $200 + $0     2/9 + 3/9 = 5/9
$200 + $100                  2/9
Topic: binomial distribution
Sums of Bernoulli RVs
When rolling two dice, what is the probability of rolling two ones?
Now with three dice, what is the probability of rolling exactly two 1’s?
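Both answers can be checked by brute-force enumeration of the equally likely outcomes; a minimal sketch:

```python
from itertools import product

# With two fair dice, both showing 1 has probability (1/6)^2 = 1/36.
# With three dice, count outcomes with exactly two 1's by enumeration.
outcomes = list(product(range(1, 7), repeat=3))
n_two_ones = sum(1 for roll in outcomes if roll.count(1) == 2)

p_three_dice = n_two_ones / len(outcomes)
print(n_two_ones, len(outcomes))  # 15 216
print(p_three_dice)               # 15/216 = 5/72 ≈ 0.0694
```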
Sums of Bernoulli random variables (cont’d)
Event            y   P(Y = y)
000              0   (1 − p)^3
100, 010, 001    1   3p(1 − p)^2
110, 101, 011    2   3p^2(1 − p)
111              3   p^3
Sums of Bernoulli random variables (cont’d)
Determining the probability of a certain number of successes requires
knowing 1) the probability of each individual success and 2) the number
of ways that number of successes can arise.
Event            y   P(Y = y)
000              0   (1 − p)^3
100, 010, 001    1   3p(1 − p)^2
110, 101, 011    2   3p^2(1 − p)
111              3   p^3

We find that P(Y = 2) = 3p^2(1 − p) = 3(1/36)(5/6) = 5/72.
Sums of Bernoulli random variables (cont’d)
What if we had four rolls, and the probability of success was 1/3?

0000    1000    0100    1100
0010    1010    0110    1110
0001    1001    0101    1101
0011    1011    0111    1111
Sums of Bernoulli random variables (cont’d)
y   P(Y = y)
0   (1 − p)^4
1   4(1 − p)^3 p
2   6(1 − p)^2 p^2
3   4(1 − p) p^3
4   p^4

Substituting p = 1/3, we can now find P(Y = y) for any y = 0, 1, 2, 3, 4.
Definition: N choose y

N choose y

The notation

    (N choose y) = N! / ((N − y)! y!)
designates the number of ways that y items can be assigned to N
possible positions.
This notation can be used to summarize the entries in the previous tables
for various values of N and y .
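In Python, `math.comb` computes exactly this quantity and reproduces the coefficients from the earlier tables:

```python
from math import comb

# comb(N, y) is "N choose y" = N! / ((N - y)! * y!), the number of
# 0/1 sequences of length N containing exactly y ones.
for N in (3, 4):
    print(N, [comb(N, y) for y in range(N + 1)])
# 3 [1, 3, 3, 1]
# 4 [1, 4, 6, 4, 1]
```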
Definition: Binomial distribution
Binomial distribution

A random variable Y counting the number of successes in N independent
Bernoulli trials, each with success probability p, follows the binomial
distribution, written Y ∼ Bin(N, p), with

    P(Y = y) = (N choose y) p^y (1 − p)^(N−y)   for y = 0, 1, . . . , N.
Example: drunk batter
What is the probability that our alcoholic major-leaguer gets more than 2
hits in a game in which he has 5 at bats?
x   P(X = x)
0   (1 − p)^5
1   5(1 − p)^4 p
2   10(1 − p)^3 p^2
3   10(1 − p)^2 p^3
4   5(1 − p) p^4
5   p^5
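The table turns into a number once a hit probability is fixed. The slide leaves p unspecified, so p = 0.25 below is purely an assumed value for illustration:

```python
from math import comb

def binom_pmf(N, y, p):
    """P(Y = y) for Y ~ Bin(N, p)."""
    return comb(N, y) * p**y * (1 - p)**(N - y)

p = 0.25   # assumed per-at-bat hit probability (not given on the slide)
N = 5      # at bats

# "More than 2 hits" means 3, 4, or 5 hits.
p_more_than_2 = sum(binom_pmf(N, y, p) for y in (3, 4, 5))
print(round(p_more_than_2, 4))  # 0.1035
```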
Example: winning a best-of-seven play-off

Assume that the Chicago Bulls have probability 0.4 of beating the Miami
Heat in any given game and that the outcomes of individual games are
independent.
What is the probability that the Bulls win a seven game series against the
Heat?
Example: winning a best-of-seven play-off (cont’d)
Consider the number of games won by the Bulls over a full seven games
against the Heat. We model this as a binomial random variable Y with
parameters N = 7 and p = 0.4, which we express with the notation
Y ∼ Bin(7, 0.4).
The symbol “∼” is read “distributed as”. “Bin” is short for “binomial”.
The numbers which follow are the values of the two binomial parameters,
the number of independent Bernoulli trials (N) and the probability of
success at each trial (p).
Example: winning a best-of-seven play-off (cont’d)
Although we never see all seven games played (because the series stops as
soon as one team wins four games), in this expanded event space the Bulls
win the series exactly when Y ≥ 4.
Example: winning a best-of-seven play-off (cont’d)
We may conclude that the probability of a series win for the Bulls is
P(Bulls series win) = p^4 + (4 choose 3) p^4 (1 − p) + (5 choose 3) p^4 (1 − p)^2 + (6 choose 3) p^4 (1 − p)^3
                    = p^4 + 4 p^4 (1 − p) + 10 p^4 (1 − p)^2 + 20 p^4 (1 − p)^3
                    = 0.29.
This calculation explicitly accounts for the fact that Bulls series wins
necessarily conclude with a Bulls game win.
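Both routes to the answer can be checked numerically; a short sketch:

```python
from math import comb

p = 0.4  # per-game win probability for the Bulls

# Direct route: the Bulls clinch in game k (k = 4..7) when they have won
# 3 of the first k-1 games and then win game k.
direct = sum(comb(k - 1, 3) * p**4 * (1 - p)**(k - 4) for k in range(4, 8))

# Expanded-event-space route: pretend all 7 games are always played; the
# Bulls win the series iff Y >= 4 with Y ~ Bin(7, 0.4).
expanded = sum(comb(7, y) * p**y * (1 - p)**(7 - y) for y in range(4, 8))

print(round(direct, 4), round(expanded, 4))  # 0.2898 0.2898
```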
Example: double lottery winners
In 1971, Jane Adams won the lottery twice in one year! If you read of a
double winner in your daily newspaper, how surprised should you be?
Given these conditions, what is the probability that in one calendar year
there is at least one double winner?
Example: double lottery winners (cont’d)
Let Xi be the random variable denoting how many winning tickets person
i has:
Now let Yi be the dummy variable for the event Xi > 1, which is the
event that person i is a double (or more) winner:

    Yi ∼ Bernoulli(q),   where q = P(Xi > 1).
Example: double lottery winners (cont’d)
Not so rare!
Example: rural vs. urban hospitals
About as many boys as girls are born in hospitals. At Country Hospital, a
small rural hospital, only a few babies are born every week. At City
General, in the urban center, many babies are born every week. Say that a
normal week is one where between 45% and 55% of the babies are female. An
unusual week is one where more than 55% are girls or more than 55% are boys.

▶ Unusual weeks are less common at Country Hospital than at City General.
Example: rural vs. urban hospital (cont’d)
We can model the births in the two hospitals as two independent random
variables. Let X = “number of baby girls born at Country Hospital” and
Y =“number of baby girls born at City General”.
X ∼ Binomial(N1 , p)
Y ∼ Binomial(N2 , p)
Assume that p = 0.5. The key difference is that N1 is much smaller than
N2 . To illustrate, assume that N1 = 20 and N2 = 500.
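With these numbers, the exact probability of an unusual week at each hospital can be computed by summing binomial probabilities:

```python
from math import comb

def binom_pmf(N, k, p=0.5):
    return comb(N, k) * p**k * (1 - p)**(N - k)

def p_unusual(N):
    """P(the weekly proportion of girls falls outside [0.45, 0.55])."""
    return sum(binom_pmf(N, k) for k in range(N + 1)
               if not 0.45 <= k / N <= 0.55)

print(round(p_unusual(20), 3))   # Country Hospital: ≈ 0.50
print(round(p_unusual(500), 3))  # City General: ≈ 0.02
```

Unusual weeks are far more common at the small hospital, contrary to the bulleted claim above.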
Example: rural vs. urban hospital (cont’d)
Note: satisfying the condition X < 9 is the same as not satisfying the
condition X ≥ 9; strict versus non-strict inequalities make a difference.
Example: rural vs. urban hospital (cont’d)
[Figure: Country Hospital: pmf of weekly girl births from 0 to 20; probability axis 0 to 0.20.]
Example: rural vs. urban hospital (cont’d)
[Figure: City General: pmf of weekly girl births from 200 to 290; probability axis 0 to 0.030.]
Variance of a sum of independent random variables
A useful fact:

If X1, . . . , Xm are independent, a weighted sum/difference of random
variables Y = Σ_{i=1}^m ai Xi has variance

    V(Y) = Σ_{i=1}^m ai^2 V(Xi).

How can this be used to derive the expression for the variance of a
binomial random variable?
Variance of binomial random variable

A binomial random variable Y ∼ Bin(N, p) is a sum of N independent
Bernoulli(p) random variables, each with variance p(1 − p), so

    V(Y) = N p(1 − p).
Variance of a proportion
By dividing through by the total number of babies born each week we
can consider the proportion of girl babies. Define the random variables
    P1 = X / N1   and   P2 = Y / N2.

Then it follows that

    V(P1) = V(X) / N1^2 = N1 p(1 − p) / N1^2 = p(1 − p) / N1

and

    V(P2) = V(Y) / N2^2 = N2 p(1 − p) / N2^2 = p(1 − p) / N2.
Law of Large Numbers
As more and more individual random variables are averaged up, the
variance decreases but the mean stays the same.
Law of Large Numbers

[Figures: a sequence of histograms of the sample average as more and more
draws are averaged; each is centered at 0.7 on the interval from 0 to 1,
and the spread shrinks from panel to panel.]
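The effect in the figures can be reproduced with a short simulation (Bernoulli(0.7) draws and a fixed seed, both assumed for illustration):

```python
import random

random.seed(0)

# Average more and more Bernoulli(0.7) draws.  The mean of the average
# stays at 0.7 while its variability shrinks like 0.7 * 0.3 / n.
for n in (10, 100, 1000, 100_000):
    avg = sum(random.random() < 0.7 for _ in range(n)) / n
    print(n, avg)
```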
Example: Schlitz Super Bowl taste test
Bell curve approximation to binomial
The binomial distributions can be approximated by a smooth density
function for large N.
[Figure: pmf of a binomial distribution on 0 to 20 (probability axis 0 to 0.15) with a smooth bell-curve overlay.]
Bell curve approximation to binomial
[Figure: pmf of a binomial distribution on 0 to 16 (probability axis 0 to 0.10) with its bell-curve overlay.]
Bell curve approximation to binomial
[Figure: pmf of a binomial distribution on 340 to 460 (probability axis 0 to 0.03) with its bell-curve overlay.]
What are some reasons that very small p or small N lead to bad
approximations?
Central limit theorem
The CLT can be stated more precisely, but the practical impact is just
this: random variables which arise as sums of many other random
variables (not necessarily normally distributed) tend to be normally
distributed.
Normal distributions
The normal family of densities has two parameters, typically denoted µ
and σ 2 , which govern the location and scale, respectively.
[Figure: several normal densities with different locations and scales, plotted from -4 to 4.]
Normal distributions (cont’d)
I will use the terms normal distribution, normal density and normal
random variable more or less interchangeably.
[Figure: a normal density curve from -4 to 4.]

X ∼ N(µ, σ^2).
Normal approximation to binomial
Notice that this just “matches” the mean and variance of the two
distributions.
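Matching mean and variance amounts to approximating Bin(N, p) by N(Np, Np(1 − p)). A quick sketch comparing Bin(20, 0.5) with N(10, 5):

```python
from math import comb, exp, pi, sqrt

N, p = 20, 0.5
mu, var = N * p, N * p * (1 - p)   # matched mean 10 and variance 5

def binom_pmf(k):
    return comb(N, k) * p**k * (1 - p)**(N - k)

def normal_pdf(x):
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Near the mean the binomial pmf and the normal density almost agree.
for k in (6, 8, 10, 12, 14):
    print(k, round(binom_pmf(k), 4), round(normal_pdf(k), 4))
```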
Linear transformation of normal RVs

If X ∼ N(µ, σ^2) and Y = a + bX for constants a and b, then Y is also
normal:

    Y ∼ N(a + bµ, b^2 σ^2).
Standard normal RV

Standard normal

A standard normal random variable has mean 0 and variance 1:

    Z ∼ N(0, 1).

Any normal random variable can be written as a linear transformation of Z:

    X = µ + σZ.
The “empirical rule”
It is convenient to characterize where the “bulk” of the probability mass
of a normal distribution resides by providing an interval, in terms of
standard deviations, about the mean.
[Figure: N(µ, σ) density; 68% of the probability lies between µ − σ and µ + σ.]
The “empirical rule” (cont’d)
The widespread application of the normal distribution has led this to be
dubbed the empirical rule.
[Figure: N(µ, σ) density; 95% of the probability lies between µ − 2σ and µ + 2σ.]
The “empirical rule” (cont’d)
[Figure: N(µ, σ) density; 99.7% of the probability lies between µ − 3σ and µ + 3σ.]
The “empirical rule” (cont’d)
▶ 68% of Chicago daily highs in the winter season are between 19 and 48 degrees.
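Taking the Chicago example at face value (the interval from 19 to 48 implies µ = 33.5 and σ = 14.5), the 68/95/99.7 percentages can be recovered from the normal CDF via the error function:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Implied by the Chicago example: midpoint 33.5, half-width 14.5 degrees.
mu, sigma = 33.5, 14.5

for k in (1, 2, 3):
    inside = normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)
    print(k, round(inside, 3))  # 0.683, 0.954, 0.997
```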
Sums of normal random variables
For example, if X1 ∼ N(5, 20) and X2 ∼ N(1, 0.5) are independent and
Y = 0.1 X1 + 0.9 X2, then

    Y ∼ N(m, v)

where m = 0.1(5) + 0.9(1) = 1.4 and v = 0.1^2 (20) + 0.9^2 (0.5) = 0.605.
Linear combinations of normal RVs
Linear combinations of independent normal random variables
For i = 1, . . . , n, let

    Xi ∼ N(µi, σi^2),   independently.

Define Y = Σ_{i=1}^n ai Xi for weights a1, a2, . . . , an. Then

    Y ∼ N(m, v)

where

    m = Σ_{i=1}^n ai µi   and   v = Σ_{i=1}^n ai^2 σi^2.
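As a quick check, the earlier weighted-sum example (weights 0.1 and 0.9, means 5 and 1, variances 20 and 0.5) gives m = 1.4 and v = 0.605:

```python
# Apply m = sum(a_i * mu_i) and v = sum(a_i^2 * sigma_i^2) to the
# earlier example with weights 0.1 and 0.9.
a = [0.1, 0.9]
mu = [5.0, 1.0]
var = [20.0, 0.5]

m = sum(ai * mi for ai, mi in zip(a, mu))
v = sum(ai ** 2 * vi for ai, vi in zip(a, var))
print(round(m, 3), round(v, 3))  # 1.4 0.605
```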
Example: two-stock portfolio
What fraction of our investment should we put into stock A, with the
remainder put in stock B?
Example: two-stock portfolio (cont’d)
Y = αXA + (1 − α)XB
with distribution
Y ∼ N(m, v ).
Example: two-stock portfolio (cont’d)
Suppose we want to find α so that P(Y ≤ 0) is as small as possible.
Two-stock portfolio

[Figure: return densities of Stock A and Stock B, density (0 to 0.6) against percent return (-5 to 20).]

Probability of a loss

[Figure: the probability of a loss, P(Y ≤ 0), plotted against the weight α from 0 to 1; vertical axis roughly 0.04 to 0.12.]
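The loss curve can be sketched under assumed parameters. The stock distributions are not shown on this slide, so the values below (XA ∼ N(5, 20) and XB ∼ N(1, 0.5), independent, echoing the earlier weighted-sum example) are assumptions for illustration only:

```python
from math import erf, sqrt

# Hypothetical percent-return distributions (assumed, not from the slides):
# XA ~ N(5, 20), XB ~ N(1, 0.5), independent.
mu_a, var_a = 5.0, 20.0
mu_b, var_b = 1.0, 0.5

def p_loss(alpha):
    """P(Y <= 0) for Y = alpha*XA + (1 - alpha)*XB."""
    m = alpha * mu_a + (1 - alpha) * mu_b
    v = alpha ** 2 * var_a + (1 - alpha) ** 2 * var_b
    return 0.5 * (1 + erf(-m / sqrt(2 * v)))

# Grid-search the weight alpha that makes a loss least likely.
alphas = [i / 100 for i in range(101)]
best = min(alphas, key=p_loss)
print(best, round(p_loss(best), 4))
```

Under these assumed numbers, the minimizing weight puts only a small fraction in the high-variance stock, even though that stock has the higher mean return.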
Vignettes
1. Differential dispersion
2. Statistical "null" hypotheses
3. Mean reversion
Vignette: a difference in dispersion
For more background, read the article “Sex Ed” from the February 2005
issue of the New Republic (available at the course home page).
A difference in dispersion
Consider two groups of college graduates with “employee fitness scores”
following the distributions shown below.
[Figures: two score distributions on integer scores from -5 to 5. The
first has central probability 0.256 with substantial probability far from
the center (0.128, 0.085, 0.064); the second has central probability 0.464
with probabilities falling off quickly (0.171, 0.063).]
These distributions have the same mean, the same median, and the same
mode. But they differ in their dispersion, or variability.
A difference in dispersion (cont’d)
Let X denote the random variables recording the scores and let A and B
denote membership in the respective groups.
[Figures: the two score distributions repeated, with the more dispersed one
titled "Distribution of Capabilities, Group A" and the more concentrated
one belonging to Group B.]
    P(A | X ≥ 4) = P(X ≥ 4 | A) P(A) / [P(X ≥ 4 | A) P(A) + P(X ≥ 4 | B) P(B)]
                 = 0.094(0.5) / [0.094(0.5) + 0.012(0.5)]
                 = 0.89.
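The Bayes' rule arithmetic, with the slide's numbers (equal-sized groups, P(X ≥ 4 | A) = 0.094 and P(X ≥ 4 | B) = 0.012):

```python
# Posterior probability of Group A membership given a score of 4 or more.
p_top_a, p_top_b = 0.094, 0.012
prior_a = prior_b = 0.5

posterior_a = p_top_a * prior_a / (p_top_a * prior_a + p_top_b * prior_b)
print(round(posterior_a, 2))  # 0.89
```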
Larry Summers and women-in-science
“Summers’s critics have repeatedly mangled his suggestion that
innate differences might be one cause of gender disparities ... into
the claim that they must be the only cause. And they have
converted his suggestion that the statistical distributions of men’s
and women’s abilities are not identical to the claim that all men are
talented and all women are not, as if someone heard that women
typically live longer than men and concluded that every woman lives
longer than every man. . . .
In many traits, men show greater variance than women, and are
disproportionately found at both the low and high ends of the
distribution. Boys are more likely to be learning disabled or retarded
but also more likely to reach the top percentiles in assessments of
mathematical ability, even though boys and girls are similar in the
bulk of the bell curve. . . .”
Example: gender and aptitudes revisited

[Figure: "Aptitude distribution": normal densities of scores from -6 to 6
for women and men; the men's curve is more spread out, and vertical dashed
lines mark a central range.]

For women, 93.7% of the scores are between the vertical dashed lines,
whereas only 68.6% of the men's scores fall in this range.
Example: gender and aptitudes revisited (cont’d)
The corresponding CDFs reveal the same difference.
[Figure: the two CDFs plotted over scores from -6 to 6.]
A sex-partners statistical model
A sex-partners random variable
The quantity of interest is the number of sex partners. In our model, this
will be a number between 0 and 3.
Sally’s sex-partner distribution
Sally's distribution Xs is built from three prospects, J, L and R:

Event              x   P(Xs = x)
none of J, L, R    0   (1 − 0.07)(1 − 0.06)(1 − 0.05)
...
all of J, L, R     3   (0.07)(0.06)(0.05)

Here is what it looks like after the calculation (rounded a bit):

Event              x   P(Xs = x)
none of J, L, R    0   0.83
...
all of J, L, R     3   0.0002

We can do similarly for each individual.
Sally’s sex-partners distribution
Here is a picture of Sally’s sex partner distribution.
[Figure: bar chart of Sally's distribution: P(Xs = 0) = 0.8305,
P(Xs = 1) = 0.1592, P(Xs = 2) = 0.0101, P(Xs = 3) = 0.0002.]
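Sally's distribution can be reproduced by enumerating the 2^3 outcomes, using the per-prospect probabilities 0.07, 0.06 and 0.05 from the table:

```python
from itertools import product

probs = [0.07, 0.06, 0.05]  # per-prospect probabilities from the table

# Enumerate every combination of which prospects become partners and
# accumulate probability by the total count.
pmf = [0.0] * (len(probs) + 1)
for outcome in product([0, 1], repeat=len(probs)):
    p = 1.0
    for happened, q in zip(outcome, probs):
        p *= q if happened else 1 - q
    pmf[sum(outcome)] += p

print([round(x, 4) for x in pmf])  # [0.8305, 0.1592, 0.0101, 0.0002]
```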
To get the distribution for all females, we sum over the individual women.
We apply the law of total probability using all three conditional
distributions:

    P(X = x) = P(x | Maude) P(Maude) + P(x | Chastity) P(Chastity) + P(x | Sally) P(Sally).
We assume that the women are selected at random with equal probability
P(Maude) = P(Chastity) = P(Sally) = 1/3.
Female sex-partner distribution
At the end we get a distribution like this.
[Figure: the female sex-partner distribution: probabilities 0.5951, 0.2315,
0.1315, 0.0418 for 0 through 3 partners.]

[Figure: a second distribution (apparently the corresponding male
distribution): probabilities 0.4983, 0.4417, 0.0583, 0.0017 for 0 through 3
partners.]
The more general lesson is that using probability models and a little bit
of algebra can help us see a situation more clearly.
Idea: statistical “null” hypotheses
The hypothesis that events are independent often makes a nice contrast
to other explanations, namely that random events are somehow related.
This vantage point allows us to judge if those other explanations fit the
facts any better than the uninteresting “null” explanation that events are
independent.
Vignette: making better pilots
Flight instructors have a policy of berating pilots who make bad landings.
They notice that good landings met with praise mostly result in
subsequently less-good landings, while bad landings met with harsh
criticism mostly result in subsequently improved landings.
Example: making better pilots (cont’d)
Assume that landings can be classified into three types: bad, adequate,
or good. Further assume the following probabilities:

Event      Probability
bad        pb
adequate   pa
good       pg

Remember that pb + pa + pg = 1.
Example: making better pilots (cont’d)
Assume that the policy of criticism is judged to work when a poor landing
is followed by a not-poor landing. Then

    P(not-poor landing next | poor landing, criticism) = pa + pg = 1 − pb

by independence.
The previous example shows that the evidence can appear to favor
criticism over praise even if criticism and praise are totally irrelevant.
Does this mean criticism actually improves landings? No, it just means
that the observed facts are not compelling evidence that criticism works,
because they are entirely consistent with the null hypothesis that landing
quality is independent of previous landings and feedback.
In cases like this we say we “fail to reject the null hypothesis”. We’ll
revisit this terminology a couple weeks from now.
Example: making better pilots (continuous version)
We will model this situation using normal random variables and see if the
same conclusions (that praise appears to hurt performance and criticism
seems to boost it) could arise by chance.
Example: making better pilots (continuous version, cont’d)
Assume that each pilot has a certain ability level, call it A. Each
individual landing score arises as a combination of this ability and certain
random fluctuations, call them ε. The landing score at time t can be
expressed as

    St = A + εt.

Assuming that the εt are iid N(0, σ^2), then

    St ∼ N(A, σ^2).
Example: making better pilots (continuous version, cont’d)
Denote the average landing score by M and consider a pilot with A > M.
When he makes an exceptional landing, because ε1 > 2σ, he is unlikely to
best it on his next landing.

[Figure: density of the next score S2, with M, A and A + ε1 marked.]

For this reason, praise is unlikely to "work" even though landings are
independent of one another.
Example: making better pilots (continuous version, cont’d)
For a poor pilot with A < M a similar argument holds. When he makes a very
poor landing, because ε1 < −2σ, he is unlikely to do worse on his next
landing.

[Figure: density of the next score S2, with A + ε1, A and M marked.]

For this reason, criticism is likely to "work" even though landings are
independent.
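The same point can be made by simulation: with fully independent landings, an exceptional landing is almost always followed by a worse one. The parameters below (A = 0, σ = 1, and the seed) are arbitrary choices for illustration:

```python
import random

random.seed(1)
A, sigma = 0.0, 1.0   # one pilot's ability and noise scale (assumed)

# Landings are independent draws S = A + eps, eps ~ N(0, sigma^2).
pairs = [(A + random.gauss(0, sigma), A + random.gauss(0, sigma))
         for _ in range(200_000)]

# Conditional on an exceptional first landing (eps > 2*sigma),
# how often is the very next landing worse?
great = [(s1, s2) for s1, s2 in pairs if s1 > A + 2 * sigma]
frac_worse = sum(s2 < s1 for s1, s2 in great) / len(great)
print(len(great), round(frac_worse, 3))  # the fraction is near 0.98
```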
Idea: mean reversion
What might the flight instructors have done (as an experiment) to really
get to the bottom of their question?