
Lecture 2 – Probability theory

Nguyễn Phương Thái


Computer Science Department
http://coltech.vnu.edu.vn/~thainp/

Outline

- Probability space
- Conditional probability and independence
- Bayes theorem
- Random variables
- Expectation and variance
- Conditional distribution and joint distribution
- Probability estimation
- Standard distributions

Probability

- Nothing in life is certain. In everything we do, we gauge the chances of successful outcomes, from business to medicine to the weather.
- Probability is the formal study of the laws of chance.

Probability (cont)

- Basic concepts:
o Trial: experiment or observation
o Elementary outcomes: all possible results of an experiment
o Sample space: set of elementary outcomes
o Event: a subset of the sample space

Examples
- Tossing a coin once, the elementary outcomes are heads (H) and tails (T); the sample space is the set Ω = {H, T}.
- Tossing a coin twice, the sample space is the set Ω = {TT, TH, HT, HH}.
- Rolling a die once, the sample space is a little bigger: Ω = {1, 2, …, 6}.

Events

The beauty of using events rather than elementary outcomes is that we can combine events to make other events using set operations:
- The number of possible events is 2^n (n is the size of Ω)
- Ω is the certain event, ∅ is the null (impossible) event
- A ∪ B = {w: w ∈ A or w ∈ B}: event A or event B occurs
- A ∩ B = {w: w ∈ A and w ∈ B}: events A and B both occur
- A\B = {w: w ∈ A and w ∉ B}: event A occurs but event B does not
- Ā = {w: w ∉ A}: event A does not occur

Events (cont)

Tossing a pair of dice (one white, one black), the sample space consists of the 36 ordered pairs Ω = {(1, 1), (1, 2), …, (6, 6)}.

Events (cont)

Event description and elementary outcomes:
- A: dice add to 3 → {(1, 2), (2, 1)}
- B: dice add to 6 → {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
- C: white die shows 1 → {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)}
- D: black die shows 1 → {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)}

Events (cont)

C ∪ D = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)}: at least one die shows 1
C ∩ D = {(1, 1)}: both dice show 1
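To make the set operations concrete, here is a minimal Python sketch (the language and variable names are our choice, not the lecture's) that builds Ω for the two dice and computes C ∪ D and C ∩ D:

```python
# A minimal sketch: events as sets of (white, black) outcomes,
# combined with ordinary set operations.
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # all 36 ordered pairs
C = {(w, b) for (w, b) in omega if w == 1}    # white die shows 1
D = {(w, b) for (w, b) in omega if b == 1}    # black die shows 1

print(sorted(C | D))   # union: at least one die shows 1 (11 outcomes)
print(sorted(C & D))   # intersection: both dice show 1 -> [(1, 1)]
```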

Probability of events

Suppose that A is an event of some trial:
- Each event A is assigned a number P(A) expressing the chance that A occurs. This number equals 1 if A is the certain event and 0 if A is the null event, and if A and B are disjoint then P(A ∪ B) = P(A) + P(B).
- Suppose Ω = {w₁, w₂, …, wₖ, …}, and every elementary outcome wₖ is assigned a "weight" pₖ = p(wₖ) satisfying:
  pₖ ≥ 0 for every k ≥ 1
  ∑_k pₖ = 1
- Then:
  P(A) = ∑_{k: wₖ ∈ A} pₖ

A classical definition of probability

When all elementary outcomes have the same probability, P(wᵢ) = 1/N for every i, the formula on the previous slide reduces to counting:
P(A) = |A| / |Ω| = n/N
where n = |A| and N = |Ω|.
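A minimal Python sketch of the counting definition, applied to the two-dice sample space above (our illustration):

```python
# A minimal sketch: classical probability by counting equally likely
# outcomes, P(A) = |A| / |Omega|.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))   # two fair dice
A = [w for w in omega if sum(w) == 3]          # dice add to 3

print(Fraction(len(A), len(omega)))            # 2/36 = 1/18
```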

Some properties of probability

P(∅) = 0, P(Ω) = 1, 0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(AB)
If A and B are disjoint (mutually exclusive), then P(A ∪ B) = P(A) + P(B)
P(Ā) = 1 − P(A)

Example

Event description, elementary outcomes, and probability:
- A: dice add to 3 → {(1, 2), (2, 1)} → 2/36
- B: dice add to 6 → {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} → 5/36
- C: white die shows 1 → {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)} → 6/36
- D: black die shows 1 → {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)} → 6/36

Example (cont)

A box contains N balls numbered from 1 to N. Randomly pick a ball, note its number, return it, and repeat until n balls have been drawn. Compute the probability of the event:
A = {the n drawn numbers are pairwise distinct}
Sample space: Ω = {w = (a₁, …, aₙ): 1 ≤ aᵢ ≤ N}, so |Ω| = N^n
|A| = N(N − 1)…(N − n + 1), the number of n-permutations of N elements
Therefore:
P(A) = |A| / |Ω| = N(N − 1)…(N − n + 1) / N^n
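A small Python sketch (our illustration; the function names are made up) that computes this probability exactly and checks it by simulation:

```python
# A minimal sketch: P(all n draws distinct) = N(N-1)...(N-n+1) / N^n,
# checked against a quick Monte Carlo simulation.
import random

def p_all_distinct(N, n):
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

def simulate(N, n, trials=100_000):
    hits = sum(
        len({random.randrange(N) for _ in range(n)}) == n
        for _ in range(trials)
    )
    return hits / trials

print(p_all_distinct(10, 3))   # exact: 10*9*8 / 10^3 = 0.72
print(simulate(10, 3))         # close to 0.72
```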

Conditional probability

The conditional probability of event A given event B can be computed by:
P(A|B) = P(AB) / P(B), provided P(B) > 0
Joint probability:
P(AB) = P(B)P(A|B) = P(A)P(B|A), provided P(A)P(B) ≠ 0
By induction, this extends to the chain rule for the joint probability of n events:
P(A₁A₂…Aₙ) = P(A₁)P(A₂|A₁)P(A₃|A₁A₂)…P(Aₙ|A₁…Aₙ₋₁)
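A minimal Python sketch of the definition P(A|B) = P(AB)/P(B), computed by counting on the two-dice sample space (our illustration):

```python
# A minimal sketch: conditional probability by counting outcomes.
from itertools import product
from fractions import Fraction

omega = set(product(range(1, 7), repeat=2))
B = {w for w in omega if sum(w) == 6}         # dice add to 6
C = {(w, b) for (w, b) in omega if w == 1}    # white die shows 1

p = lambda E: Fraction(len(E), len(omega))
print(p(C & B) / p(B))   # P(C|B) = (1/36) / (5/36) = 1/5
```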

Example

A box contains a white balls and b black balls. Two balls are randomly picked one after another, without replacement. Compute the probability of the event: only the second ball drawn is white.
Let Aₖ be the event "the k-th ball drawn is white", k = 1, 2. The event of interest is Ā₁A₂.
Using the joint probability formula:
P(Ā₁A₂) = P(Ā₁)P(A₂|Ā₁) = b/(a + b) × a/(a + b − 1)
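A quick Python sketch (our illustration) that checks this answer by simulating draws without replacement:

```python
# A minimal sketch: estimate P(first black, second white) by simulation
# and compare it with b/(a+b) * a/(a+b-1).
import random

def simulate(a, b, trials=200_000):
    balls = ["W"] * a + ["B"] * b
    hits = 0
    for _ in range(trials):
        first, second = random.sample(balls, 2)   # draw without replacement
        hits += first == "B" and second == "W"
    return hits / trials

a, b = 3, 5
print(b / (a + b) * a / (a + b - 1))   # exact: 15/56, about 0.268
print(simulate(a, b))                  # close to the exact value
```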

Bayes formula

P(B|A) = P(BA)/P(A) = P(A|B)P(B)/P(A)
If P(A) > 0 and {B₁, B₂, …, Bₙ} is a complete system of events (⋃ᵢ Bᵢ = Ω, P(Bᵢ) > 0 for all i, and Bᵢ ∩ Bⱼ = ∅ for i ≠ j), then:
P(Bₖ|A) = P(Bₖ)P(A|Bₖ)/P(A) = P(Bₖ)P(A|Bₖ) / ∑_{i=1}^{n} P(Bᵢ)P(A|Bᵢ)
Example: a box contains a white balls and b black balls. Two balls are randomly picked one after another, without replacement. Compute the probability that the first ball is white given that the second ball is white.

Bayes formula (cont)

Let Aₖ be the event "the k-th ball drawn is white", k = 1, 2.
We need to compute P(A₁|A₂). Using Bayes' formula, with P(A₂) expanded over the complete system {A₁, Ā₁}:
P(A₁|A₂) = P(A₁)P(A₂|A₁) / P(A₂) = a(a − 1) / (a(a − 1) + ab) = (a − 1)/(a + b − 1)
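A quick Python sketch (our illustration) that checks this answer by simulating draws and conditioning on the second ball being white:

```python
# A minimal sketch: estimate P(A1|A2) ~ #(both white) / #(second white)
# and compare it with (a-1)/(a+b-1).
import random

def estimate(a, b, trials=300_000):
    balls = ["W"] * a + ["B"] * b
    both = second_white = 0
    for _ in range(trials):
        x, y = random.sample(balls, 2)
        if y == "W":
            second_white += 1
            both += x == "W"
    return both / second_white

a, b = 4, 6
print((a - 1) / (a + b - 1))   # exact: 3/9, about 0.333
print(estimate(a, b))          # close to the exact value
```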

Independence

Two events A and B are independent if:
P(AB) = P(A)P(B)
If P(B) > 0, then A and B are independent if and only if:
P(A|B) = P(A)
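A minimal Python sketch (our illustration) verifying that the two-dice events C and D from the earlier slides are independent:

```python
# A minimal sketch: checking independence via P(CD) == P(C) P(D).
from itertools import product
from fractions import Fraction

omega = set(product(range(1, 7), repeat=2))
C = {(w, b) for (w, b) in omega if w == 1}   # white die shows 1
D = {(w, b) for (w, b) in omega if b == 1}   # black die shows 1

p = lambda E: Fraction(len(E), len(omega))
print(p(C & D) == p(C) * p(D))   # True: 1/36 == (1/6) * (1/6)
```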

Random variables

- A random variable is defined as the numerical outcome of an experiment.
- More generally, a random variable is a function X: Ω → ℝⁿ (here n = 1).
- Probability mass function (pmf), or probability distribution:
  p(x) = P(X = x) = P(Aₓ), where Aₓ = {w ∈ Ω: X(w) = x}
  ∑_i p(x_i) = ∑_i P(A_{x_i}) = P(Ω) = 1

Random variables (cont)

Toss two coins and record the number of heads: 0, 1, or 2.

Note the notation! The random variable is written with a capital X; the lower-case x represents a single value of X. For example, x = 2 if heads comes up twice.
Probability mass function (assuming two fair coins):
x:     0    1    2
p(x): 1/4  1/2  1/4
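A minimal Python sketch (our illustration) that derives this pmf by enumerating the sample space:

```python
# A minimal sketch: pmf of X = number of heads in two fair coin tosses.
from itertools import product
from fractions import Fraction
from collections import Counter

omega = list(product("HT", repeat=2))                    # HH, HT, TH, TT
counts = Counter(w.count("H") for w in map("".join, omega))

pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf)   # {0: 1/4, 1: 1/2, 2: 1/4}
```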

Mean and variance

The mean (or expected value) of a random variable X is defined as:
E(X) = ∑_x x·p(x)
The variance of a random variable X is the expected squared distance from the mean:
Var(X) = E((X − E(X))²) = E(X²) − E²(X)

Mean and variance (cont)

Roll a die and record the number of dots:
E(X) = ∑_x x·p(x) = (1/6) ∑_{i=1}^{6} i = 21/6 = 7/2
Var(X) = E(X²) − E²(X) = 91/6 − 49/4 = 35/12
Note: to compute E(X²), it is useful to know ∑_{i=1}^{n} i² = n(n + 1)(2n + 1)/6.
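A minimal Python sketch (our illustration) of the same computation with exact fractions:

```python
# A minimal sketch: E(X) and Var(X) for a fair die,
# using Var(X) = E(X^2) - E(X)^2.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean**2
print(mean, var)   # 7/2 35/12
```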

Conditional distribution and joint distribution

The joint pmf of two discrete random variables X and Y is:
p(x, y) = P(X = x, Y = y)
Conditional pmf:
p_{X|Y}(x|y) = p(x, y) / p_Y(y), for y satisfying p_Y(y) > 0
Product rule:
p(w, x, y, z) = p(w) p(x|w) p(y|w, x) p(z|w, x, y)

Marginal distribution

Given a joint pmf p(x, y) = P(X = x, Y = y), the marginal pmfs are obtained by summing out the other variable:
p(x) = ∑_y p(x, y)
p(y) = ∑_x p(x, y)
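A minimal Python sketch (our illustration; the joint table is made-up data) computing the marginals and a conditional pmf from a joint pmf:

```python
# A minimal sketch: marginal and conditional pmfs from a joint pmf
# stored as a dict keyed by (x, y).
from collections import defaultdict
from fractions import Fraction

joint = {  # p(x, y); the values sum to 1
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    p_x[x] += p   # p(x) = sum over y
    p_y[y] += p   # p(y) = sum over x

# conditional pmf p(x | y = 1) = p(x, 1) / p_y(1)
print({x: joint[(x, 1)] / p_y[1] for x in p_x})   # {0: 2/5, 1: 3/5}
```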

The binomial distribution

- The binomial distribution results from a series of trials with only two outcomes, each trial being independent of all the others and having the same success probability p.
- Repeatedly tossing a (possibly unfair) coin is the prototypical example of something with a binomial distribution.
- Family of binomial distributions (see the sketch after this list):
  b(r; n, p) = C(n, r) p^r (1 − p)^(n−r)
  where r is the number of successful trials among the n trials and C(n, r) is the binomial coefficient.
- Applications: n-gram models, hypothesis testing, etc.
- Generalization: the multinomial distribution.
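A minimal Python sketch of the binomial pmf (our illustration; binom_pmf is a made-up helper name):

```python
# A minimal sketch: b(r; n, p) = C(n, r) p^r (1 - p)^(n - r).
from math import comb

def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

# P(exactly 3 heads in 10 tosses of a fair coin)
print(binom_pmf(3, 10, 0.5))                            # about 0.1172
print(sum(binom_pmf(r, 10, 0.5) for r in range(11)))    # 1.0: pmf sums to 1
```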

The normal distribution

- This is a continuous distribution with density (see the sketch after this list):
  n(x; μ, σ) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
  where μ is the mean and σ is the standard deviation.
- Applications: modeling human height, IQ scores, machine learning models, etc.
- Another name: the Gaussian distribution.
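A minimal Python sketch of this density (our illustration; normal_pdf is a made-up helper name):

```python
# A minimal sketch: the normal density
# n(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma).
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

print(normal_pdf(0.0))   # peak of the standard normal, about 0.3989
print(normal_pdf(1.0))   # about 0.2420
```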

The normal distribution (cont)

Note: the discrete (binomial) and continuous (normal) curves are quite similar; for large n the binomial distribution is well approximated by a normal.

