
Lecture 2 – Probability theory

Nguyễn Phương Thái


Computer Science Department
http://coltech.vnu.edu.vn/~thainp/

Outline

- Probability space
- Conditional probability and independence
- Bayes theorem
- Random variables
- Expectation and variance
- Conditional distribution and joint distribution
- Probability estimation
- Standard distributions

Probability

- Nothing in life is certain. In everything we do, we gauge the chances of successful outcomes, from business to medicine to the weather.
- Probability is the formal study of the laws of chance.

Probability (cont)

- Basic concepts:
o Trial: experiment or observation
o Elementary outcomes: all possible results of an experiment
o Sample space: set of elementary outcomes
o Event: a subset of the sample space

Examples
- Tossing a coin once, the elementary outcomes are heads (H) and tails (T); the sample space is the set Ω = {H, T}.
- Tossing a coin twice, the sample space is the set Ω = {TT, TH, HT, HH}.
- Rolling a die once, the sample space is a little bigger: Ω = {1, 2, …, 6}.

Events

The beauty of using events rather than elementary outcomes is that we can combine events to make other events using set operations:
- The number of possible events is 2^n (n is the size of Ω)
- Ω is the certain event, ∅ is the null (impossible) event
- A ∪ B = {w: w ∈ A or w ∈ B}: event A or event B occurs
- A ∩ B = {w: w ∈ A and w ∈ B}: events A and B both occur
- A\B = {w: w ∈ A and w ∉ B}: event A occurs but event B does not
- Ā = {w: w ∉ A}: event A does not occur

Events (cont)

Tossing a pair of dice (one white, one black), the sample space consists of the 36 ordered pairs Ω = {(1, 1), (1, 2), …, (6, 6)}.

Events (cont)

Event description and elementary outcomes:
- A: dice add to 3 → {(1, 2), (2, 1)}
- B: dice add to 6 → {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
- C: white die shows 1 → {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)}
- D: black die shows 1 → {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)}

Events (cont)

C ∪ D = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)}: at least one die shows 1
C ∩ D = {(1, 1)}: both dice show 1
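To make the set operations concrete, here is a minimal Python sketch (the language and variable names are our choice, not the lecture's) that builds Ω for the two dice and computes C ∪ D and C ∩ D:

```python
# A minimal sketch: events as sets of (white, black) outcomes,
# combined with ordinary set operations.
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # all 36 ordered pairs
C = {(w, b) for (w, b) in omega if w == 1}    # white die shows 1
D = {(w, b) for (w, b) in omega if b == 1}    # black die shows 1

print(sorted(C | D))   # union: at least one die shows 1 (11 outcomes)
print(sorted(C & D))   # intersection: both dice show 1 -> [(1, 1)]
```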

Probability of events

Suppose that A is an event of some trial:
- Each event A is assigned a number P(A) expressing the chance that A occurs. This number equals 1 if A is the certain event and 0 if A is the null event, and if A and B are disjoint then P(A ∪ B) = P(A) + P(B).
- Suppose Ω = {w₁, w₂, …, wₖ, …}, and every elementary outcome wₖ is assigned a "weight" pₖ = p(wₖ) satisfying:
  pₖ ≥ 0 for every k ≥ 1
  ∑_k pₖ = 1
- Then:
  P(A) = ∑_{k: wₖ ∈ A} pₖ

A classical definition of probability

When all elementary outcomes have the same probability, P(wᵢ) = 1/N for every i, the formula on the previous slide reduces to counting:
P(A) = |A| / |Ω| = n/N
where n = |A| and N = |Ω|.
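A minimal Python sketch of the counting definition, applied to the two-dice sample space above (our illustration):

```python
# A minimal sketch: classical probability by counting equally likely
# outcomes, P(A) = |A| / |Omega|.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # two fair dice
A = [w for w in omega if sum(w) == 3]          # dice add to 3

print(Fraction(len(A), len(omega)))            # 2/36 = 1/18
```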

Some properties of probability

P(∅) = 0, P(Ω) = 1, 0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(AB)
If A and B are disjoint (mutually exclusive), then P(A ∪ B) = P(A) + P(B)
P(Ā) = 1 − P(A)

Example

Event description, elementary outcomes, and probability:
- A: dice add to 3 → {(1, 2), (2, 1)} → 2/36
- B: dice add to 6 → {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} → 5/36
- C: white die shows 1 → {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)} → 6/36
- D: black die shows 1 → {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)} → 6/36

Example (cont)

A box contains N balls numbered from 1 to N. Randomly pick a ball, note its number, return it, and repeat until n balls have been drawn. Compute the probability of the event:
A = {the n drawn numbers are pairwise distinct}
Sample space: Ω = {w = (a₁, …, aₙ): 1 ≤ aᵢ ≤ N}, so |Ω| = N^n
|A| = N(N − 1)…(N − n + 1), the number of n-permutations of N elements
Therefore:
P(A) = |A| / |Ω| = N(N − 1)…(N − n + 1) / N^n
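A small Python sketch (our illustration; the function names are made up) that computes this probability exactly and checks it by simulation:

```python
# A minimal sketch: P(all n draws distinct) = N(N-1)...(N-n+1) / N^n,
# checked against a quick Monte Carlo simulation.
import random

def p_all_distinct(N, n):
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

def simulate(N, n, trials=100_000):
    hits = sum(
        len({random.randrange(N) for _ in range(n)}) == n
        for _ in range(trials)
    )
    return hits / trials

print(p_all_distinct(10, 3))   # exact: 10*9*8 / 10^3 = 0.72
print(simulate(10, 3))         # close to 0.72
```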

Conditional probability

The conditional probability of event A given event B can be computed by:
P(A|B) = P(AB) / P(B), provided P(B) > 0
Joint probability:
P(AB) = P(B)P(A|B) = P(A)P(B|A), provided P(A)P(B) ≠ 0
By induction, this extends to the chain rule for the joint probability of n events:
P(A₁A₂…Aₙ) = P(A₁)P(A₂|A₁)P(A₃|A₁A₂)…P(Aₙ|A₁…Aₙ₋₁)
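A minimal Python sketch of the definition P(A|B) = P(AB)/P(B), computed by counting on the two-dice sample space (our illustration):

```python
# A minimal sketch: conditional probability by counting outcomes.
from itertools import product
from fractions import Fraction

omega = set(product(range(1, 7), repeat=2))
B = {w for w in omega if sum(w) == 6}         # dice add to 6
C = {(w, b) for (w, b) in omega if w == 1}    # white die shows 1

p = lambda E: Fraction(len(E), len(omega))
print(p(C & B) / p(B))   # P(C|B) = (1/36) / (5/36) = 1/5
```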

Example

A box contains a white balls and b black balls. Two balls are randomly picked one after another, without replacement. Compute the probability of the event: only the second ball drawn is white.
Let Aₖ be the event "the k-th ball drawn is white", k = 1, 2. The event of interest is Ā₁A₂.
Using the joint probability formula:
P(Ā₁A₂) = P(Ā₁)P(A₂|Ā₁) = b/(a + b) × a/(a + b − 1)
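A quick Python sketch (our illustration) that checks this answer by simulating draws without replacement:

```python
# A minimal sketch: estimate P(first black, second white) by simulation
# and compare it with b/(a+b) * a/(a+b-1).
import random

def simulate(a, b, trials=200_000):
    balls = ["W"] * a + ["B"] * b
    hits = 0
    for _ in range(trials):
        first, second = random.sample(balls, 2)   # draw without replacement
        hits += first == "B" and second == "W"
    return hits / trials

a, b = 3, 5
print(b / (a + b) * a / (a + b - 1))   # exact: 15/56, about 0.268
print(simulate(a, b))                  # close to the exact value
```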

Bayes formula

P(B|A) = P(BA)/P(A) = P(A|B)P(B)/P(A)
If P(A) > 0 and {B₁, B₂, …, Bₙ} is a complete system of events (⋃ᵢ Bᵢ = Ω, P(Bᵢ) > 0 for all i, and Bᵢ ∩ Bⱼ = ∅ for i ≠ j), then:
P(Bₖ|A) = P(Bₖ)P(A|Bₖ)/P(A) = P(Bₖ)P(A|Bₖ) / ∑_{i=1}^{n} P(Bᵢ)P(A|Bᵢ)
Example: a box contains a white balls and b black balls. Two balls are randomly picked one after another, without replacement. Compute the probability that the first ball is white given that the second ball is white.

Bayes formula (cont)

Let Aₖ be the event "the k-th ball drawn is white", k = 1, 2.
We need to compute P(A₁|A₂). Using Bayes' formula, with P(A₂) expanded over the complete system {A₁, Ā₁}:
P(A₁|A₂) = P(A₁)P(A₂|A₁) / P(A₂) = a(a − 1) / (a(a − 1) + ab) = (a − 1)/(a + b − 1)
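A quick Python sketch (our illustration) that checks this answer by simulating draws and conditioning on the second ball being white:

```python
# A minimal sketch: estimate P(A1|A2) ~ #(both white) / #(second white)
# and compare it with (a-1)/(a+b-1).
import random

def estimate(a, b, trials=300_000):
    balls = ["W"] * a + ["B"] * b
    both = second_white = 0
    for _ in range(trials):
        x, y = random.sample(balls, 2)
        if y == "W":
            second_white += 1
            both += x == "W"
    return both / second_white

a, b = 4, 6
print((a - 1) / (a + b - 1))   # exact: 3/9, about 0.333
print(estimate(a, b))          # close to the exact value
```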

Independence

Two events A and B are independent if:
P(AB) = P(A)P(B)
If P(B) > 0, then A and B are independent if and only if:
P(A|B) = P(A)
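A minimal Python sketch (our illustration) verifying that the two-dice events C and D from the earlier slides are independent:

```python
# A minimal sketch: checking independence via P(CD) == P(C) P(D).
from itertools import product
from fractions import Fraction

omega = set(product(range(1, 7), repeat=2))
C = {(w, b) for (w, b) in omega if w == 1}   # white die shows 1
D = {(w, b) for (w, b) in omega if b == 1}   # black die shows 1

p = lambda E: Fraction(len(E), len(omega))
print(p(C & D) == p(C) * p(D))   # True: 1/36 == (1/6) * (1/6)
```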

Random variables

- A random variable is defined as the numerical outcome of an experiment.
- More generally, a random variable is a function X: Ω → ℝⁿ (here n = 1).
- Probability mass function (pmf), or probability distribution:
  p(x) = P(X = x) = P(Aₓ), where Aₓ = {w ∈ Ω: X(w) = x}
  ∑_i p(x_i) = ∑_i P(A_{x_i}) = P(Ω) = 1

Random variables (cont)

Toss two coins and record the number of heads: 0, 1, or 2.

Note the notation! The random variable is written with a capital X; the lower-case x represents a single value of X. For example, x = 2 if heads comes up twice.
Probability mass function (assuming two fair coins):
x:     0    1    2
p(x): 1/4  1/2  1/4
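A minimal Python sketch (our illustration) that derives this pmf by enumerating the sample space:

```python
# A minimal sketch: pmf of X = number of heads in two fair coin tosses.
from itertools import product
from fractions import Fraction
from collections import Counter

omega = list(product("HT", repeat=2))                    # HH, HT, TH, TT
counts = Counter(w.count("H") for w in map("".join, omega))

pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf)   # {0: 1/4, 1: 1/2, 2: 1/4}
```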

Mean and variance

The mean (or expected value) of a random variable X is defined as:
E(X) = ∑_x x·p(x)
The variance of a random variable X is the expected squared distance from the mean:
Var(X) = E((X − E(X))²) = E(X²) − E²(X)

Mean and variance (cont)

Roll a die and record the number of dots:
E(X) = ∑_x x·p(x) = (1/6) ∑_{i=1}^{6} i = 21/6 = 7/2
Var(X) = E(X²) − E²(X) = 91/6 − 49/4 = 35/12
Note: to compute E(X²), it is useful to know ∑_{i=1}^{n} i² = n(n + 1)(2n + 1)/6.
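A minimal Python sketch (our illustration) of the same computation with exact fractions:

```python
# A minimal sketch: E(X) and Var(X) for a fair die,
# using Var(X) = E(X^2) - E(X)^2.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean**2
print(mean, var)   # 7/2 35/12
```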

Conditional distribution and joint distribution

The joint pmf of two discrete random variables X and Y is:
p(x, y) = P(X = x, Y = y)
Conditional pmf:
p_{X|Y}(x|y) = p(x, y) / p_Y(y), for y satisfying p_Y(y) > 0
Product rule:
p(w, x, y, z) = p(w) p(x|w) p(y|w, x) p(z|w, x, y)

Marginal distribution

Given a joint pmf p(x, y) = P(X = x, Y = y), the marginal pmfs are obtained by summing out the other variable:
p(x) = ∑_y p(x, y)
p(y) = ∑_x p(x, y)
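A minimal Python sketch (our illustration; the joint table is made-up data) computing the marginals and a conditional pmf from a joint pmf:

```python
# A minimal sketch: marginal and conditional pmfs from a joint pmf
# stored as a dict keyed by (x, y).
from collections import defaultdict
from fractions import Fraction

joint = {  # p(x, y); the values sum to 1
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    p_x[x] += p   # p(x) = sum over y
    p_y[y] += p   # p(y) = sum over x

# conditional pmf p(x | y = 1) = p(x, 1) / p_y(1)
print({x: joint[(x, 1)] / p_y[1] for x in p_x})   # {0: 2/5, 1: 3/5}
```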

The binomial distribution

- The binomial distribution results from a series of trials with only two outcomes, each trial being independent of all the others and having the same success probability p.
- Repeatedly tossing a (possibly unfair) coin is the prototypical example of something with a binomial distribution.
- Family of binomial distributions (see the sketch after this list):
  b(r; n, p) = C(n, r) p^r (1 − p)^(n−r)
  where r is the number of successful trials among the n trials and C(n, r) is the binomial coefficient.
- Applications: n-gram models, hypothesis testing, etc.
- Generalization: the multinomial distribution.
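A minimal Python sketch of the binomial pmf (our illustration; binom_pmf is a made-up helper name):

```python
# A minimal sketch: b(r; n, p) = C(n, r) p^r (1 - p)^(n - r).
from math import comb

def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

# P(exactly 3 heads in 10 tosses of a fair coin)
print(binom_pmf(3, 10, 0.5))                            # about 0.1172
print(sum(binom_pmf(r, 10, 0.5) for r in range(11)))    # 1.0: pmf sums to 1
```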

The normal distribution

- This is a continuous distribution with density (see the sketch after this list):
  n(x; μ, σ) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
  where μ is the mean and σ is the standard deviation.
- Applications: modeling human height, IQ scores, machine learning models, etc.
- Another name: the Gaussian distribution.
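A minimal Python sketch of this density (our illustration; normal_pdf is a made-up helper name):

```python
# A minimal sketch: the normal density
# n(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma).
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

print(normal_pdf(0.0))   # peak of the standard normal, about 0.3989
print(normal_pdf(1.0))   # about 0.2420
```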

The normal distribution (cont)

Note: the discrete (binomial) and continuous (normal) curves are quite similar; for large n the binomial distribution is well approximated by a normal.

