Probability Cheatsheet Midterm

Edited by Ethan Tan. Original compiled by William Chen (http://wzchen.com) and Joe Blitzstein, with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang. Based on Joe Blitzstein's (@stat110) lectures (http://stat110.net) and Blitzstein/Hwang's Introduction to Probability textbook (http://bit.ly/introprobability). Licensed under CC BY-NC-SA 4.0. Please share comments, suggestions, and errors at http://github.com/wzchen/probability_cheatsheet.

Counting

Multiplication Rule
Let's say we have a compound experiment (an experiment with multiple components). If the 1st component has n1 possible outcomes, the 2nd component has n2 possible outcomes, ..., and the rth component has nr possible outcomes, then overall there are n1 · n2 · · · nr possibilities for the whole experiment.

Sampling Table
The sampling table gives the number of possible samples of size k out of a population of size n, under various assumptions about how the sample is collected. Here C(n, k) = n!/(k!(n − k)!) denotes the binomial coefficient "n choose k".

                         Order Matters       Order Doesn't Matter
With Replacement         n^k                 C(n + k − 1, k)
Without Replacement      n!/(n − k)!         C(n, k)
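As a quick check of the sampling table, the four counts can be computed directly with Python's standard library. This is a minimal sketch; the population size n = 5 and sample size k = 3 are illustrative values, not from the cheatsheet.

```python
from math import comb, perm

def sampling_table(n, k):
    """Number of possible samples of size k from a population of size n."""
    return {
        "ordered, with replacement": n ** k,
        "ordered, without replacement": perm(n, k),       # n!/(n-k)!
        "unordered, with replacement": comb(n + k - 1, k),
        "unordered, without replacement": comb(n, k),
    }

if __name__ == "__main__":
    # Example: samples of size k = 3 from a population of size n = 5.
    for scheme, count in sampling_table(5, 3).items():
        print(f"{scheme}: {count}")
```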
Naive Definition of Probability
If all outcomes are equally likely, the probability of an event A happening is:

P_naive(A) = (number of outcomes favorable to A) / (number of outcomes)

Thinking Conditionally

Independence
Independent Events A and B are independent if knowing whether A occurred gives no information about whether B occurred. More formally, A and B (which have nonzero probability) are independent if and only if one of the following equivalent statements holds:

P(A ∩ B) = P(A)P(B)
P(A|B) = P(A)
P(B|A) = P(B)

Conditional Independence A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C). Conditional independence does not imply independence, and independence does not imply conditional independence.

Unions, Intersections, and Complements
De Morgan's Laws A useful identity that can make calculating probabilities of unions easier by relating them to intersections, and vice versa. Analogous results hold with more than two sets.

(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c

Joint, Marginal, and Conditional
Joint Probability P(A ∩ B) or P(A, B) – Probability of A and B.
Marginal (Unconditional) Probability P(A) – Probability of A.
Conditional Probability P(A|B) = P(A, B)/P(B) – Probability of A, given that B occurred.
Conditional Probability is Probability P(A|B) is a probability function for any fixed B. Any theorem that holds for probability also holds for conditional probability.

Probability of an Intersection or Union
Intersections via Conditioning

P(A, B) = P(A)P(B|A)
P(A, B, C) = P(A)P(B|A)P(C|A, B)

Unions via Inclusion-Exclusion

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
               − P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
               + P(A ∩ B ∩ C).
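A small enumeration makes the naive definition and inclusion-exclusion concrete. The two-dice events A and B below are illustrative choices of my own, not from the cheatsheet; this is a sketch, not part of the original material.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Naive definition: favorable outcomes over total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[0] + o[1] == 7   # the sum is 7

lhs = prob(lambda o: A(o) or B(o))                       # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(lambda o: A(o) and B(o))  # inclusion-exclusion
print(lhs, rhs, lhs == rhs)      # 11/36 on both sides
```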
Law of Total Probability (LOTP)
Let B1, B2, B3, ..., Bn be a partition of the sample space (i.e., they are disjoint and their union is the entire sample space).

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + · · · + P(A|Bn)P(Bn)
P(A) = P(A ∩ B1) + P(A ∩ B2) + · · · + P(A ∩ Bn)

For LOTP with extra conditioning, just add in another event C!

P(A|C) = P(A|B1, C)P(B1|C) + · · · + P(A|Bn, C)P(Bn|C)
P(A|C) = P(A ∩ B1|C) + P(A ∩ B2|C) + · · · + P(A ∩ Bn|C)

Special case of LOTP with B and B^c as partition:

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
P(A) = P(A ∩ B) + P(A ∩ B^c)

Bayes' Rule
Bayes' Rule, and with extra conditioning (just add in C!)

P(A|B) = P(B|A)P(A) / P(B)

P(A|B, C) = P(B|A, C)P(A|C) / P(B|C)

We can also write

P(A|B, C) = P(A, B, C)/P(B, C) = P(B, C|A)P(A)/P(B, C)

Odds Form of Bayes' Rule

P(A|B)/P(A^c|B) = [P(B|A)/P(B|A^c)] · [P(A)/P(A^c)]

The posterior odds of A are the likelihood ratio times the prior odds.
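Here is a hedged numerical illustration of LOTP, Bayes' rule, and the odds form on a made-up testing scenario; the prior P(A) and the two conditional probabilities are arbitrary assumed values, not from the cheatsheet.

```python
from fractions import Fraction

# Hypothetical numbers for illustration only:
# A = "has the condition", B = "test is positive".
p_A = Fraction(1, 100)           # prior P(A)
p_B_given_A = Fraction(95, 100)  # P(B|A)
p_B_given_Ac = Fraction(2, 100)  # P(B|A^c)

# LOTP with A, A^c as the partition.
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' rule.
p_A_given_B = p_B_given_A * p_A / p_B
print(p_B, p_A_given_B)          # P(B) and the posterior P(A|B)

# Odds form: posterior odds = likelihood ratio * prior odds.
posterior_odds = (p_B_given_A / p_B_given_Ac) * (p_A / (1 - p_A))
assert p_A_given_B / (1 - p_A_given_B) == posterior_odds
```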
Random Variables and their Distributions

PMF, CDF, and Independence
Probability Mass Function (PMF) Gives the probability that a discrete random variable takes on the value x.

p_X(x) = P(X = x)

The PMF satisfies

p_X(x) ≥ 0 and Σ_x p_X(x) = 1

Cumulative Distribution Function (CDF) Gives the probability that a random variable is less than or equal to x.

F_X(x) = P(X ≤ x)

The CDF is an increasing, right-continuous function with

F_X(x) → 0 as x → −∞ and F_X(x) → 1 as x → ∞

Independence Intuitively, two random variables are independent if knowing the value of one gives no information about the other. Discrete r.v.s X and Y are independent if for all values of x and y

P(X = x, Y = y) = P(X = x)P(Y = y)
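A fair six-sided die gives a minimal concrete PMF and CDF. This sketch (the die example and the helper cdf function are my own, not the cheatsheet's) just checks the listed properties.

```python
from fractions import Fraction

# PMF of a fair six-sided die: p_X(x) = 1/6 for x in {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def cdf(x):
    """F_X(x) = P(X <= x), an increasing, right-continuous step function."""
    return sum(p for value, p in pmf.items() if value <= x)

print([cdf(x) for x in range(0, 8)])  # 0 at the far left, 1 at the far right
```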
Expected Value and Indicators

Expected Value and Linearity
Expected Value (a.k.a. mean, expectation, or average) is a weighted average of the possible outcomes of our random variable. Mathematically, if x1, x2, x3, ... are all of the distinct possible values that X can take, the expected value of X is

E(X) = Σ_i x_i P(X = x_i)

Linearity For any r.v.s X and Y, and constants a, b, c,

E(aX + bY + c) = aE(X) + bE(Y) + c

Same distribution implies same mean If X and Y have the same distribution, then E(X) = E(Y) and, more generally,

E(g(X)) = E(g(Y))

Indicator Random Variables
Indicator Random Variable is a random variable that takes on the value 1 or 0. It is always an indicator of some event: if the event occurs, the indicator is 1; otherwise it is 0. They are useful for many problems about counting how many events of some kind occur. Write

I_A = 1 if A occurs, 0 if A does not occur.

Note that I_A^2 = I_A, I_A I_B = I_{A ∩ B}, and I_{A ∪ B} = I_A + I_B − I_A I_B.

Distribution I_A ∼ Bern(p) where p = P(A).

Fundamental Bridge The expectation of the indicator for event A is the probability of event A: E(I_A) = P(A).

Variance and Standard Deviation

Var(X) = E(X − E(X))^2 = E(X^2) − (E(X))^2
SD(X) = sqrt(Var(X))
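The indicator ideas are easy to sanity-check by simulation. In this sketch, X is the number of sixes in 10 rolls of a fair die, written as a sum of indicators; the roll count and trial count are arbitrary illustrative choices. It also checks the two expressions for Var(X) against each other.

```python
import random

random.seed(0)
n_rolls, trials = 10, 100_000

# X = number of sixes in n_rolls rolls, a sum of indicators I_1 + ... + I_n
# with E(I_j) = P(roll j is a six) = 1/6 (fundamental bridge).
xs = [sum(1 for _ in range(n_rolls) if random.randint(1, 6) == 6)
      for _ in range(trials)]

mean = sum(xs) / trials
print(mean, n_rolls / 6)                        # linearity: E(X) = n * 1/6

# Variance two ways: E[(X - EX)^2] and E(X^2) - (E(X))^2 agree.
var1 = sum((x - mean) ** 2 for x in xs) / trials
var2 = sum(x * x for x in xs) / trials - mean ** 2
print(var1, var2)
```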
Discrete Distributions

Distributions for four sampling schemes

                          Replace              No Replace
Fixed # trials (n)        Binomial             HGeom
                          (Bern if n = 1)
Draw until r successes    NBin                 NHGeom
                          (Geom if r = 1)

Bernoulli Distribution
The Bernoulli distribution is the simplest case of the Binomial distribution, where we only have one trial (n = 1). Let us say that X is distributed Bern(p). We know the following:

Story A trial is performed with probability p of "success", and X is the indicator of success: 1 means success, 0 means failure.

Example Let X be the indicator of Heads for a fair coin toss. Then X ∼ Bern(1/2). Also, 1 − X ∼ Bern(1/2) is the indicator of Tails.

Binomial Distribution
Let us say that X is distributed Bin(n, p). We know the following:

Story X is the number of "successes" that we will achieve in n independent trials, where each trial is either a success or a failure, each with the same probability p of success. We can also write X as a sum of multiple independent Bern(p) random variables. Let X ∼ Bin(n, p) and X_j ∼ Bern(p), where all of the Bernoullis are independent. Then

X = X_1 + X_2 + X_3 + · · · + X_n

Example If Jeremy Lin takes 10 free throws and each one independently has a 3/4 chance of getting in, then the number of free throws he makes is distributed Bin(10, 3/4).

Properties Let X ∼ Bin(n, p), Y ∼ Bin(m, p) with X ⊥ Y.

• Redefine success n − X ∼ Bin(n, 1 − p)
• Sum X + Y ∼ Bin(n + m, p)
• Conditional X|(X + Y = r) ∼ HGeom(n, m, r)
• Binomial-Poisson Relationship Bin(n, p) is approximately Pois(np) if n is large and p is small.
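A short sketch of the Binomial PMF using the free-throw example above; the helper binom_pmf and the simulation settings are assumptions of this illustration, not part of the cheatsheet.

```python
import random
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.75
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12

# Compare the exact P(X = 8) with a quick simulation of n Bernoulli(p) trials.
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for _ in range(n)) == 8)
print(binom_pmf(8, n, p), hits / trials)
```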
Geometric Distribution
Let us say that X is distributed Geom(p). We know the following:

Story X is the number of "failures" that we will achieve before we achieve our first success. Our successes have probability p.

Example If each pokeball we throw has probability 1/10 to catch Mew, the number of failed pokeballs will be distributed Geom(1/10).

First Success Distribution
Equivalent to the Geometric distribution, except that it includes the first success in the count. This is 1 more than the number of failures. If X ∼ FS(p) then E(X) = 1/p.

Negative Binomial Distribution
Let us say that X is distributed NBin(r, p). We know the following:

Story X is the number of "failures" that we will have before we achieve our rth success. Our successes have probability p.

Example Thundershock has 60% accuracy and can faint a wild Raticate in 3 hits. The number of misses before Pikachu faints Raticate with Thundershock is distributed NBin(3, 0.6).
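The stories above are easy to simulate. This sketch samples failures before the rth success; E(X) = r(1 − p)/p for NBin(r, p) is a standard formula not stated on this sheet, while E(X) = 1/p for First Success is quoted above. The parameters reuse the Thundershock and Mew examples.

```python
import random

random.seed(0)

def failures_before_rth_success(r, p):
    """Sample X ~ NBin(r, p): failures before the rth success (Geom if r = 1)."""
    failures = successes = 0
    while successes < r:
        if random.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

trials = 100_000
# Thundershock example: NBin(3, 0.6); E(X) = r(1-p)/p = 2 (standard formula).
print(sum(failures_before_rth_success(3, 0.6) for _ in range(trials)) / trials)
# Mew example: Geom(1/10) failures, so the First Success count X + 1 has E = 1/p = 10.
geo = [failures_before_rth_success(1, 0.1) for _ in range(trials)]
print(sum(geo) / trials, sum(g + 1 for g in geo) / trials)
```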
Hypergeometric Distribution
Let us say that X is distributed HGeom(w, b, n). We know the following:

Story In a population of w desired objects and b undesired objects, X is the number of "successes" we will have in a draw of n objects, without replacement. The draw of n objects is assumed to be a simple random sample (all sets of n objects are equally likely).

Examples Here are some HGeom examples.

• Let's say that we have only b Weedles (failure) and w Pikachus (success) in Viridian Forest. We encounter n Pokemon in the forest, and X is the number of Pikachus in our encounters.
• The number of Aces in a 5 card hand.
• You have w white balls and b black balls, and you draw n balls without replacement. The number of white balls in your sample is HGeom(w, b, n); the number of black balls is HGeom(b, w, n).
• Capture-recapture A forest has N elk, you capture n of them, tag them, and release them. Then you recapture a new sample of size m. How many tagged elk are now in the new sample? HGeom(n, N − n, m)
• Connections between Binomial and HGeom. Independent vs. Dependent. Trials in Binomial are independent but trials in HGeom are dependent. Link between Binomial and HGeom. If we have two independent binomial distributions X ∼ Bin(n, p) and Y ∼ Bin(m, p), then the conditional distribution of X given X + Y = r is HGeom(n, m, r). HGeom with large population. If HGeom has a large population then the PMF converges to Binomial as the population approaches infinity, because sampling without replacement from an effectively infinite pool is the same as sampling with replacement.
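A hedged sketch of the HGeom PMF, applied to the Aces-in-a-5-card-hand example above; the helper hgeom_pmf and the exact-fraction check are my own illustration.

```python
from fractions import Fraction
from math import comb

def hgeom_pmf(k, w, b, n):
    """P(X = k) for X ~ HGeom(w, b, n): draw n without replacement from
    w successes and b failures."""
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

# Number of Aces in a 5-card hand: HGeom(4, 48, 5).
pmf = {k: hgeom_pmf(k, 4, 48, 5) for k in range(0, 5)}
assert sum(pmf.values()) == 1
print(pmf[1])   # P(exactly one Ace)
```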
Negative Hypergeometric Distribution
Let us say that X is distributed NHGeom(w, b, r), where w = number of white balls, b = number of black balls, and r = number of whites (successes) required. We know the following:

Story An urn contains w white balls and b black balls, which are randomly drawn one by one without replacement, until r white balls have been obtained. The number of black balls drawn before drawing the rth white ball has a Negative Hypergeometric distribution with parameters w, b, r. We denote this distribution by NHGeom(w, b, r). Of course, we assume that r ≤ w.

PMF

P(X = k) = [C(w, r−1) C(b, k) / C(w+b, r+k−1)] · (w−r+1)/(w+b−r−k+1)

or equivalently

P(X = k) = C(r+k−1, r−1) C(w+b−r−k, w−r) / C(w+b, w)

Expectation

E(X) = rb/(w + 1)

First Success version If X instead also counts the draw of the rth white ball (so X is 1 more than the number of failures, analogous to First Success vs. Geometric), then

P(X = k) = C(r+k−2, r−1) C(w+b−r−k+1, w−r) / C(w+b, w)

and

E(X) = 1 + rb/(w + 1)

Examples Here are some NHGeom examples.

• If we shuffle a deck of cards and deal them one at a time, the number of cards dealt before uncovering the first ace is NHGeom(4, 48, 1).
• Suppose a college offers g good courses and b bad courses, and a student wants to find 4 good courses to take. Not having any idea which of the courses are good, the student randomly tries out courses one at a time, stopping when they have obtained 4 good courses. Then the number of bad courses the student tries out is NHGeom(g, b, 4).
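A simulation sketch of the NHGeom story: shuffle w white and b black balls and count blacks before the rth white. The parameter values and trial count are illustrative assumptions; the comparison is against E(X) = rb/(w + 1) from above.

```python
import random

random.seed(0)
w, b, r, trials = 4, 48, 2, 20_000

def blacks_before_rth_white(w, b, r):
    """Sample X ~ NHGeom(w, b, r) by shuffling w white and b black balls."""
    balls = ["W"] * w + ["B"] * b
    random.shuffle(balls)
    whites = blacks = 0
    for ball in balls:
        if ball == "W":
            whites += 1
            if whites == r:
                return blacks
        else:
            blacks += 1

samples = [blacks_before_rth_white(w, b, r) for _ in range(trials)]
print(sum(samples) / trials, r * b / (w + 1))  # simulation vs r*b/(w+1)
```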
Poisson Distribution
Let us say that X is distributed Pois(λ). We know the following:

Story There are rare events (low probability events) that occur many different ways (high possibilities of occurrences) at an average rate of λ occurrences per unit space or time. The number of events that occur in that unit of space or time is X.

Example A certain busy intersection has an average of 2 accidents per month. Since an accident is a low probability event that can happen many different ways, it is reasonable to model the number of accidents in a month at that intersection as Pois(2). Then the number of accidents that happen in two months at that intersection is distributed Pois(4).

Properties Let X ∼ Pois(λ1) and Y ∼ Pois(λ2), with X ⊥ Y.

1. Sum X + Y ∼ Pois(λ1 + λ2)
2. Conditional X|(X + Y = n) ∼ Bin(n, λ1/(λ1 + λ2))
3. Chicken-egg If there are Z ∼ Pois(λ) items and we randomly and independently "accept" each item with probability p, then the number of accepted items Z1 ∼ Pois(λp), the number of rejected items Z2 ∼ Pois(λ(1 − p)), and Z1 ⊥ Z2.
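The sum property can be checked numerically by convolving two Poisson PMFs; the helper pois_pmf and the rates λ1 = 2, λ2 = 3 are assumptions of this sketch, not from the cheatsheet.

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 2.0, 3.0
# P(X + Y = k) by conditioning on X (a finite convolution of the PMFs) should
# match the Pois(lam1 + lam2) PMF.
for k in range(6):
    conv = sum(pois_pmf(j, lam1) * pois_pmf(k - j, lam2) for j in range(k + 1))
    print(k, round(conv, 6), round(pois_pmf(k, lam1 + lam2), 6))
```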
Formulas

Geometric Series

1 + r + r^2 + · · · + r^(n−1) = Σ_{k=0}^{n−1} r^k = (1 − r^n)/(1 − r)

1 + r + r^2 + · · · = 1/(1 − r)   if |r| < 1

Exponential Function (e^x)

e^x = Σ_{n=0}^{∞} x^n/n! = 1 + x + x^2/2! + x^3/3! + · · · = lim_{n→∞} (1 + x/n)^n

LOTUS
Expected value of a function of an r.v.

E(g(X)) = Σ_x g(x)P(X = x)   (for discrete X)

E(g(X)) = ∫_{−∞}^{∞} g(x)f(x) dx   (for continuous X)

What's a function of a random variable? A function of a random variable is also a random variable. For example, if X is the number of bikes you see in an hour, then g(X) = 2X is the number of bike wheels you see in that hour and h(X) = C(X, 2) = X(X − 1)/2 is the number of pairs of bikes such that you see both of those bikes in that hour.

What's the point? You don't need to know the PMF/PDF of g(X) to find its expected value. All you need is the PMF/PDF of X.
X + Y . Answer: We want P (X = i, Y = j). Consider N eggs as
independent Bernoulli trials. By the story of the binomial, the Gambler’s Ruin Linearity and Indicators (1)
conditional distributions are X|N = n ∼ Bin(n, p) and Two gamblers, A and B, make a sequence of $1 bets. In each bet, In a group of n people, what is the expected number of distinct
Y |N = n ∼ Bin(n, q). If only we knew the total number of eggs...let’s gambler A has probability p of winning, and gambler B has probability birthdays (month and day)? What is the expected number of birthday
use wishful thinking: condition on N and apply LOTP. q = 1 − p of winning. Gambler A starts with i dollars and gambler B matches? Answer: Let X be the number of distinct birthdays and Ij
starts with N − i dollars; the total wealth between the two remains be the indicator for the jth day being represented.

X constant since every time A loses a dollar, the dollar goes to B, and n
P (X = i, Y = j) = P (X = i, Y = j|N = n)P (N = n) vice versa. The probability of A winning with a starting wealth of i is E(Ij ) = 1 − P (no one born on day j) = 1 − (364/365)
n=0
n
By linearity, E(X) = 365 (1 − (364/365) ) . Now let Y be the
 q i
 1−( p ) , if p ̸= 1

P (X = i, Y = j|N = n) = 0 unless n = i + j (we can drop all other q N 2 number of birthday matches and Ji be the indicator that the ith pair
pi = 1−(
p
)
terms in the sum). Thus,  i ,
 1 of people have the same birthday. The probability that any two
N if p = 2 n
P (X = i, Y = j) = P (X = i, Y = j|N = i + j)P (N = i + j) specific people share a birthday is 1/365, so E(Y ) = /365 .
Random Coin Variation/Cond. PMF 2
Conditional on N = i + j,events X = i and Y = j are exactly the same
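A simulation sketch of the chicken-egg result: with λ = 10 and p = 0.3 (arbitrary assumed values), the number of hatched eggs should behave like Pois(λp), whose mean and variance are both λp (a standard Poisson fact). The hand-rolled Poisson sampler is part of the sketch, not the cheatsheet.

```python
import math
import random

random.seed(0)
lam, p, trials = 10.0, 0.3, 50_000

def sample_poisson(lam):
    """Sample N ~ Pois(lam) by inverting the CDF (fine for moderate lam)."""
    u, k = random.random(), 0
    prob = math.exp(-lam)
    cdf = prob
    while u > cdf:
        k += 1
        prob *= lam / k
        cdf += prob
    return k

# Chicken-egg: N ~ Pois(lam) eggs, each hatches independently with prob p.
hatched = []
for _ in range(trials):
    n = sample_poisson(lam)
    hatched.append(sum(random.random() < p for _ in range(n)))

mean = sum(hatched) / trials
var = sum((x - mean) ** 2 for x in hatched) / trials
print(mean, var, lam * p)  # Pois(lam*p) has mean and variance lam*p
```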
Story Proof (Vandermonde)

C(m + n, k) = Σ_{j=0}^{k} C(m, j) C(n, k − j)

Story: m people, n people, choose k people. This can be done in 2 ways: 1) choose k people from a group of m + n people; 2) choose j people from the group of m people, then choose k − j people from the group of n people, then sum up all possibilities of this scenario.

St. Petersburg Paradox
Flip a fair coin until it lands Heads for the first time, and you will receive $2^n if the game lasts for n rounds. What is the fair value of this game (the expected payoff)? Answer: Let X be your winnings from playing the game. X = 2^N where N is the number of rounds that the game lasts. E(X) = (1/2)·2 + (1/4)·4 + (1/8)·8 + · · · = 1 + 1 + 1 + · · · = ∞. On the other hand, N ∼ FS(1/2), so E(N) = 1/p = 2. The ∞ in the St. Petersburg paradox is driven by an infinite "tail" of extremely rare events where you get extremely large payoffs. Cutting off this tail at some point, which makes sense in the real world, dramatically reduces the expected value of the game.

Random Coin Variation / Conditional PMF
There are two coins, one with probability p1 of Heads and the other with probability p2 of Heads. One of the coins is randomly chosen (with equal probabilities for the two coins). It is then flipped n ≥ 2 times. Let X be the number of times it lands Heads. To find the PMF, condition on which coin is chosen. OR A new treatment for a disease is being tested, to see whether it is better than the standard treatment. The standard treatment is effective on 50% of patients. It is believed initially that there is a 2/3 chance that the new treatment is effective on 60% of patients, and a 1/3 chance that the new treatment is effective on 50% of patients. In a pilot study done with 20 randomly selected patients, the new treatment is effective for 15 of the patients. What is the probability that the new treatment is better than the standard treatment? Answer: Let B be the event that the new treatment is better than the standard treatment and let X be the number of people in the study for whom the new treatment is effective.

P(B|X = 15) = P(X = 15|B)P(B) / [P(X = 15|B)P(B) + P(X = 15|B^c)P(B^c)]
            = C(20, 15)(0.6)^15 (0.4)^5 (2/3) / [C(20, 15)(0.6)^15 (0.4)^5 (2/3) + C(20, 15)(0.5)^20 (1/3)]

PMF (mixed binomial), where p = P(B) = 2/3 is the prior probability:

P(X = k) = C(20, k)(0.6)^k (0.4)^(20−k) p + C(20, k)(0.5)^20 (1 − p)

Gambler's Ruin
Two gamblers, A and B, make a sequence of $1 bets. In each bet, gambler A has probability p of winning, and gambler B has probability q = 1 − p of winning. Gambler A starts with i dollars and gambler B starts with N − i dollars; the total wealth between the two remains constant since every time A loses a dollar, the dollar goes to B, and vice versa. The probability of A winning with a starting wealth of i is

p_i = (1 − (q/p)^i) / (1 − (q/p)^N)   if p ≠ 1/2
p_i = i/N                             if p = 1/2
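A sketch comparing the gambler's ruin formula above with a direct simulation; the starting wealth i, total N, win probability p, and trial count are illustrative assumptions.

```python
import random

def prob_A_wins(i, N, p):
    """P(A ends with all N dollars | A starts with i), from the formula above."""
    if p == 0.5:
        return i / N
    q = 1 - p
    return (1 - (q / p) ** i) / (1 - (q / p) ** N)

def simulate(i, N, p, trials=20_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        wealth = i
        while 0 < wealth < N:
            wealth += 1 if rng.random() < p else -1
        wins += wealth == N
    return wins / trials

i, N, p = 3, 10, 0.55
print(prob_A_wins(i, N, p), simulate(i, N, p))
```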
Conditional Distribution / Fisher Exact Test
A book has N typos. Prue and Frida each catch typos with probability p. Find the conditional distribution of X1 given that X1 + X2 = t. Note that X1 + X2 ∼ Bin(2N, p).

P(X1 = k|X1 + X2 = t) = P(X1 + X2 = t|X1 = k)P(X1 = k) / P(X1 + X2 = t)
                      = P(X2 = t − k)P(X1 = k) / P(X1 + X2 = t)

Use binomial PMFs and simplify to

P(X1 = k|X1 + X2 = t) = C(N, k) C(N, t − k) / C(2N, t)   for k ∈ {0, 1, ..., t}

Theorem: If X ∼ Bin(n, p) and Y ∼ Bin(m, p) with X independent of Y, then the conditional distribution of X given X + Y = r is HGeom(n, m, r).

Calculating Probability
A textbook has n typos, which are randomly scattered amongst its n pages, independently. You pick a random page. What is the probability that it has no typos? Answer: There is a (1 − 1/n) probability that any specific typo isn't on your page, and thus a (1 − 1/n)^n probability that there are no typos on your page. For n large, this is approximately e^(−1) = 1/e.

De Montmort's Matching Problem
A stack of chips is labelled 1 to n randomly. I flip over a chip and say the numbers 1 to n, one per flip. I win if the number I say is the same as the number I flip over. What is the probability of winning? If I keep playing, what's the expected number of wins I get?
The probability of winning is calculated using inclusion-exclusion. Let Ai be the event that the ith chip has the number i. We want to find P(A1 ∪ ... ∪ An). This is 1 − 1/2! + 1/3! − 1/4! + · · · ± 1/n!. Compare this to the Taylor series for e^x: it is close to 1 − e^(−1) for big n. The expected number of wins is simply 1 by linearity of expectation (the probability of any chip being a win is 1/n; multiply this by n).

Linearity and Indicators (1)
In a group of n people, what is the expected number of distinct birthdays (month and day)? What is the expected number of birthday matches? Answer: Let X be the number of distinct birthdays and I_j be the indicator for the jth day being represented.

E(I_j) = 1 − P(no one born on day j) = 1 − (364/365)^n

By linearity, E(X) = 365 (1 − (364/365)^n). Now let Y be the number of birthday matches and J_i be the indicator that the ith pair of people have the same birthday. The probability that any two specific people share a birthday is 1/365, so E(Y) = C(n, 2)/365.

Linearity and Indicators (2)
There are n people at a party, each with a hat. At the end of the party, they each leave with a random hat. What is the expected number of people who leave with the right hat? Answer: Each hat has a 1/n chance of going to the right person. By linearity, the average number of hats that go to their owners is n(1/n) = 1.

Orderings of i.i.d. random variables
(i.i.d. means independent and identically distributed: all variables are independent and the probability distribution of each variable is the same.) I call 2 UberX's and 3 Lyfts at the same time. If the times it takes for the rides to reach me are i.i.d., what is the probability that all the Lyfts will arrive first? Answer: Since the arrival times of the five cars are i.i.d., all 5! orderings of the arrivals are equally likely. There are 3!2! orderings that involve the Lyfts arriving first, so the probability that the Lyfts arrive first is 3!2!/5! = 1/10. Alternatively, there are C(5, 3) ways to choose 3 of the 5 slots for the Lyfts to occupy, where each of the choices is equally likely. One of these choices has all 3 of the Lyfts arriving first, so the probability is 1/C(5, 3) = 1/10.

Linearity and First Success
This problem is known as the coupon collector problem. There are n coupon types. At each draw, you get a uniformly random coupon type. What is the expected number of coupons needed until you have a complete set? Answer: Let N be the number of coupons needed; we want E(N). Let N = N1 + · · · + Nn, where N1 is the number of draws to get our first new coupon, N2 is the additional draws needed to draw our second new coupon, and so on. By the story of the First Success, N2 ∼ FS((n − 1)/n) (after collecting the first coupon type, there's a (n − 1)/n chance you'll get something new). Similarly, N3 ∼ FS((n − 2)/n), and Nj ∼ FS((n − j + 1)/n). By linearity,

E(N) = E(N1) + · · · + E(Nn) = n/n + n/(n − 1) + · · · + n/1 = n Σ_{j=1}^{n} 1/j

This is approximately n(log(n) + 0.577) by Euler's approximation.
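The coupon collector answer n · Σ 1/j is easy to check by simulation; n = 20 and the trial count are arbitrary choices for this sketch.

```python
import random

def draws_to_complete_set(n, rng):
    """Draw uniform coupon types until all n types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n, trials = 20, 20_000
rng = random.Random(0)
simulated = sum(draws_to_complete_set(n, rng) for _ in range(trials)) / trials
exact = n * sum(1 / j for j in range(1, n + 1))   # n * (1 + 1/2 + ... + 1/n)
print(simulated, exact)
```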
Card Matching - PIE/Indicators
n cards, numbered 1 to n. Let A_i be the event that card i is in position i.

P(⋃_{i=1}^{n} A_i) = nP(A_1) − C(n, 2)P(A_1 ∩ A_2) + · · · + (−1)^(n+1) P(A_1 ∩ · · · ∩ A_n)

P(⋃_{i=1}^{n} A_i) ≈ 1 − 1/e, since P(A_i) = 1/n. OR Let X be the number of matches, and I_j equal 1 if the jth card is a match, 0 otherwise. In other words, I_j is the indicator for A_j, the event that the jth card in the deck is a match. We can imagine that each I_j "raises its hand" to be counted if its card is a match; adding up the raised hands, we get the total number of matches, X. X = I_1 + · · · + I_n and E(X) = E(I_1) + · · · + E(I_n) = nE(I_1) = n(1/n) = 1 by symmetry.

Expectation of Negative Hypergeometric
What is the expected number of cards that you draw before you pick your first Ace in a shuffled deck (not counting the Ace)? Answer: Consider a non-Ace. Denote this to be card j. Let I_j be the indicator that card j will be drawn before the first Ace. Note that I_j = 1 says that j is before all 4 of the Aces in the deck. The probability that this occurs is 1/5 by symmetry. Let X be the number of cards drawn before the first Ace. Then X = I_1 + I_2 + · · · + I_48, where each indicator corresponds to one of the 48 non-Aces. Thus,

E(X) = E(I_1) + E(I_2) + · · · + E(I_48) = 48/5 = 9.6
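A quick simulation of the indicator answer above: shuffle a 52-card deck and count the cards before the first Ace; the trial count is an arbitrary choice in this sketch.

```python
import random

deck = ["A"] * 4 + ["x"] * 48     # 4 Aces and 48 non-Aces
rng = random.Random(0)
trials = 50_000

total = 0
for _ in range(trials):
    rng.shuffle(deck)
    total += deck.index("A")      # number of cards drawn before the first Ace

print(total / trials, 48 / 5)     # simulation vs the indicator answer 9.6
```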
Pattern-matching with e^x Taylor series
For X ∼ Pois(λ), find E(1/(X + 1)). Answer: By LOTUS,

E(1/(X + 1)) = Σ_{k=0}^{∞} [1/(k + 1)] e^(−λ) λ^k / k! = (e^(−λ)/λ) Σ_{k=0}^{∞} λ^(k+1)/(k + 1)! = (e^(−λ)/λ)(e^λ − 1)
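Numerically, the LOTUS sum can be truncated and compared with the closed form, which simplifies to (1 − e^(−λ))/λ. The helper, λ = 3, and the truncation point are assumptions of this sketch.

```python
from math import exp

def lotus_e_inv(lam, terms=100):
    """Truncated LOTUS sum for E(1/(X+1)) with X ~ Pois(lam)."""
    total, pmf = 0.0, exp(-lam)        # pmf = P(X = 0)
    for k in range(terms):
        total += pmf / (k + 1)
        pmf *= lam / (k + 1)           # P(X = k+1) from P(X = k)
    return total

lam = 3.0
print(lotus_e_inv(lam), (1 - exp(-lam)) / lam)   # both ~ 0.3167
```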
Bose-Einstein
n boxes, k indistinguishable particles: choose where the bars go, C(n + k − 1, n − 1), or choose where the particles go, C(n + k − 1, k).

Couples/pairs in random sample
100 senators (2 from each state), random sample of 20 people: find the expected number of states with both senators in the sample. Let I_1 be the indicator that both senators from state 1 are in the sample. Then

E(I_1) = (2/100)(1/99) C(20, 2) = (20/100)(19/99)

and by linearity the expected number of states is 50 · (20/100)(19/99). OR 1000 people in a town (500 married couples): find the expected number of couples in a random sample of 20 people. P(Bob in sample)P(Rob in sample|Bob in sample) = (20/1000)(19/999), so the expected number of couples is 500 · (20/1000)(19/999) = (20 · 19)/(2 · 999).

Splitting into teams
(a) Split 360 people into 120 teams of 3. Line the people up and say every 3 people is a team. This overcounts by a factor of 120! because the order of the teams doesn't matter, and a factor of 3!^120 because the order within the teams doesn't matter. Ans: 360!/(6^120 · 120!)
(b) The 360 people consist of 180 married couples; find the expected number of teams containing a married couple. Any particular pair in a team has probability 1/359 of being married to each other, so since there are 3 disjoint possibilities (pairs within a team) and a total of 120 teams, the expected number is 3 · 120/359.
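The team-splitting count can be brute-forced for a tiny case as a sanity check: 6 people into 2 teams of 3 should give 6!/(3!^2 · 2!) = 10. The helper below is a slow illustrative sketch, not a method from the cheatsheet.

```python
from itertools import permutations
from math import factorial

def count_splits_brute(people, team_size):
    """Count distinct ways to split `people` into unordered teams of `team_size`
    by deduplicating all line-ups (slow, for tiny cases only)."""
    k = len(people) // team_size
    seen = set()
    for order in permutations(people):
        teams = frozenset(
            frozenset(order[i * team_size:(i + 1) * team_size]) for i in range(k)
        )
        seen.add(teams)
    return len(seen)

n, size = 6, 3
k = n // size
formula = factorial(n) // (factorial(size) ** k * factorial(k))
print(count_splits_brute(list(range(n)), size), formula)   # both 10
```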
Finding Aces in a Deck (Indicator Cond. PMF)
Let X be the position of the first ace in a standard, well-shuffled deck of 52 cards.

P(X = k) = [C(48, k − 1) / C(52, k − 1)] · 4/(52 − k + 1)

Let I be the indicator of the (X + 1)st card in the deck also being an ace. The conditional PMF of I given X = k is P(I = 1|X = k) = 3/(52 − k) and P(I = 0|X = k) = (49 − k)/(52 − k), since we know that there are 52 − 1 − (k − 1) = 52 − k cards left in the deck, of which 3 are aces and 49 − k are non-aces.

Problem-Solving Strategies

1. Using Linearity, Fundamental Bridge and Symmetry together. How to find the expected value of a difficult random variable X:
   (a) Define n indicator r.v.s I_j such that all n variables sum up to X.
   (b) By the fundamental bridge, E(I_1) = P(I_1 = 1).
   (c) By symmetry, which means every outcome is equally likely (a fair coin, or drawing a card from a deck), E(I_j) = E(I_1) for all j.
   (d) By linearity, E(X) = E(I_1) + · · · + E(I_n) = nE(I_1).
2. Wishful thinking. When we encounter a problem that would be easier if we knew whether event E happened, condition on E and then on E^c, then use LOTP to combine both. Try simple and extreme cases. To make an abstract experiment more concrete, try drawing a picture or making up numbers that could have happened. Pattern recognition: does the structure of the problem resemble something we've seen before?
3. Calculating probability of an event. Use counting principles if the naive definition of probability applies. Is the probability of the complement easier to find? Look for symmetries. Look for something to condition on, then apply Bayes' Rule or the Law of Total Probability.
4. Finding the distribution of a random variable. First make sure you need the full distribution, not just the mean (see next item). Check the support of the random variable: what values can it take on? Use this to rule out distributions that don't fit. Is there a story for one of the named distributions that fits the problem at hand? Can you write the random variable as a function of an r.v. with a known distribution, say Y = g(X)?
5. Calculating expectation. If it has a named distribution, check out the table of distributions. If it's a function of an r.v. with a named distribution, try LOTUS. If it's a count of something, try breaking it up into indicator r.v.s.
6. Symmetry. If X_1, . . . , X_n are i.i.d., consider using symmetry.
7. Before moving on. Check some simple and extreme cases, check whether the answer seems plausible, check for biohazards.

Biohazards

1. Don't misuse the naive definition of probability. When answering "What is the probability that in a group of 3 people, no two have the same birth month?", it is not correct to treat the people as indistinguishable balls being placed into 12 boxes, since that assumes the list of birth months {January, January, January} is just as likely as the list {January, April, June}, even though the latter is six times more likely.
2. Don't confuse unconditional, conditional, and joint probabilities. In applying P(A|B) = P(B|A)P(A)/P(B), it is not correct to say "P(B) = 1 because we know B happened"; P(B) is the prior probability of B. Don't confuse P(A|B) with P(A, B).
3. Don't assume independence without justification. In the matching problem, the probability that card 1 is a match and card 2 is a match is not 1/n^2. Binomial and Hypergeometric are often confused; the trials are independent in the Binomial story and dependent in the Hypergeometric story.
4. Don't forget to do sanity checks. Probabilities must be between 0 and 1. Variances must be ≥ 0. Supports must make sense. PMFs must sum to 1. PDFs must integrate to 1.
5. Don't confuse random variables, numbers, and events. Let X be an r.v. Then g(X) is an r.v. for any function g. In particular, X^2, |X|, F(X), and I_{X>3} are r.v.s. P(X^2 < X|X ≥ 0), E(X), Var(X), and g(E(X)) are numbers. X = 2R and F(X) ≥ −1 are events. It does not make sense to write ∫_{−∞}^{∞} F(X) dx, because F(X) is a random variable. It does not make sense to write P(X), because X is not an event.
6. Don't confuse a random variable with its distribution. To get the PDF of X^2, you can't just square the PDF of X. To get the PDF of X + Y, you can't just add the PDF of X and the PDF of Y.
7. Don't pull non-linear functions out of expectations. E(g(X)) does not equal g(E(X)) in general. The St. Petersburg paradox is an extreme example.