2023 Fall IBA6102 Lecture 2
Qiyuan DENG
Statistics
Probability
Estimation
Linear Algebra
Useful for compact representation of data
Dimension reduction techniques
Optimization theory
Probabilities
Dependence, Independence, and Conditional Independence
Parameter Estimation
Maximum Likelihood Estimation (MLE)
Maximum A Posteriori Estimation (MAP)
Python Review
Hands-on
a∈A
|B|
||v||
P
R
x, y, z
A, B
y = f(x)
8 possible outcomes:
O ∈ {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Possible events:
Exactly two flips are heads up: E = (O ∈ {HHT, HTH, THH})
First and third flips are tails up: E = (O ∈ {THT, TTT})
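The outcome space and the two events above can be enumerated directly. The sketch below assumes a fair coin, so that all 8 outcomes are equally likely; the slides do not state this, and the helper name `probability` is purely illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of three flips, written as strings such as "HHT".
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

def probability(event):
    """P(E) under the fair-coin assumption: favourable outcomes / all outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

# Event 1: exactly two flips are heads up.
p_two_heads = probability(lambda o: o.count("H") == 2)
# Event 2: first and third flips are tails up.
p_first_third_tails = probability(lambda o: o[0] == "T" and o[2] == "T")

print(outcomes)
print(p_two_heads, p_first_third_tails)
```

Using `Fraction` keeps the probabilities exact instead of introducing floating-point rounding.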
Non-negativity
∀E ∈ F, P(E) ≥ 0
Scenario
Several random processes occur
Probabilities for each possible combination
Marginal probability
Probability distribution
P of a single variable in a joint distribution
$P(X = x) = \sum_{b \,\in\, \text{all values of } Y} P(X = x, Y = b)$
Conditional probability
Probability distribution of one variable given that another variable
takes a certain value
$P(Y = y \mid X = x) = \dfrac{P(X = x, Y = y)}{P(X = x)}$
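As a small worked example of the marginal and conditional formulas, the sketch below uses a made-up joint distribution over X ∈ {0, 1} and Y ∈ {"a", "b", "c"}. The numbers and the function names are assumptions chosen for illustration only.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X, Y); the six entries sum to 1.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(2, 10), (0, "c"): F(1, 10),
    (1, "a"): F(3, 10), (1, "b"): F(1, 10), (1, "c"): F(2, 10),
}

def marginal_x(x):
    """P(X = x): sum the joint over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return joint[(x, y)] / marginal_x(x)

print(marginal_x(0))                   # 2/5
print(conditional_y_given_x("b", 0))   # 1/2
```

For any fixed x, the conditional probabilities over y sum to 1, which is a quick sanity check on the normalisation by P(X = x).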
Examples:
Independent: Winning on roulette this week and next week.
Dependent: Russian roulette
[Figure: two panels, "Independent X, Y" and "Dependent X, Y"]
Examples
Dependent: shoe size and reading skills
Conditionally independent: shoe size and reading skills given age
Qiyuan (CUHK, Shenzhen) Basics in ML September 12, 2023 23 / 51
Conditionally Independent
Finally another study pointed out that people wear coats when it
rains...
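The shoe-size example can be checked numerically. The sketch below builds a joint distribution in which shoe size S and reading skill R each depend only on age A; all probabilities are hypothetical, chosen so that the effect is easy to see, and the helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, for illustration only.
p_age = {"child": F(1, 2), "adult": F(1, 2)}
p_large_shoe = {"child": F(1, 10), "adult": F(9, 10)}   # P(S = large | A)
p_good_reader = {"child": F(1, 5), "adult": F(4, 5)}    # P(R = good | A)

def bern(p, value):
    """Probability of a binary value given its success probability p."""
    return p if value else 1 - p

# Joint P(A, S, R) = P(A) P(S | A) P(R | A): S and R depend only on A.
joint = {
    (a, s, r): p_age[a] * bern(p_large_shoe[a], s) * bern(p_good_reader[a], r)
    for a, s, r in product(p_age, [True, False], [True, False])
}

def prob(pred):
    """Sum the joint over all (a, s, r) satisfying the predicate."""
    return sum(p for key, p in joint.items() if pred(*key))

def cond(pred, age):
    """P(pred(S, R) | A = age)."""
    p_a = prob(lambda a, s, r: a == age)
    return prob(lambda a, s, r: a == age and pred(s, r)) / p_a

# Without conditioning, P(S, R) differs from P(S) P(R): dependent.
p_s = prob(lambda a, s, r: s)
p_r = prob(lambda a, s, r: r)
p_sr = prob(lambda a, s, r: s and r)
print(p_sr, p_s * p_r)   # 37/100 versus 1/4

# Given age, P(S, R | A) equals P(S | A) P(R | A): conditionally independent.
factorises = {
    age: cond(lambda s, r: s and r, age)
    == cond(lambda s, r: s, age) * cond(lambda s, r: r, age)
    for age in p_age
}
print(factorises)
```

The dependence between shoe size and reading skill here comes entirely from the shared cause, age; once age is fixed, knowing one tells you nothing more about the other.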
Parameter Estimation
Maximum Likelihood Estimation (MLE)
What’s the probability that the coin will land heads up?
Let’s flip it a few times to estimate the probability:
The estimated probability is 3/5 (“frequency of heads”).
Why frequency of heads?
How good is this estimate?
Why is this a machine learning problem?
Data:
αH : Number of Heads
αT : Number of Tails
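Set the derivative of the likelihood $J(\theta) = \theta^{\alpha_H} (1 - \theta)^{\alpha_T}$ to zero at $\theta = \hat{\theta}_{MLE}$:
$\dfrac{\partial J(\theta)}{\partial \theta} = \alpha_H \theta^{\alpha_H - 1} (1 - \theta)^{\alpha_T} - \alpha_T \theta^{\alpha_H} (1 - \theta)^{\alpha_T - 1} \Big|_{\theta = \hat{\theta}_{MLE}} = 0$
$\Rightarrow \alpha_H (1 - \theta) - \alpha_T \theta \Big|_{\theta = \hat{\theta}_{MLE}} = 0$
$\Rightarrow \hat{\theta}_{MLE} = \dfrac{\alpha_H}{\alpha_H + \alpha_T}$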
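The closed-form result can be sanity-checked numerically. The sketch below assumes the likelihood θ^αH (1 − θ)^αT implied by the derivative above and reuses the 3-heads-in-5-flips example; the function names and the grid resolution are illustrative choices.

```python
def bernoulli_mle(n_heads, n_tails):
    """Closed-form MLE: theta_hat = alpha_H / (alpha_H + alpha_T)."""
    return n_heads / (n_heads + n_tails)

def likelihood(theta, n_heads, n_tails):
    """J(theta) = theta^alpha_H * (1 - theta)^alpha_T."""
    return theta ** n_heads * (1 - theta) ** n_tails

alpha_h, alpha_t = 3, 2   # 3 heads and 2 tails in 5 flips
theta_hat = bernoulli_mle(alpha_h, alpha_t)

# Brute-force check: find the grid point with the highest likelihood.
grid = [i / 1000 for i in range(1001)]
best_on_grid = max(grid, key=lambda t: likelihood(t, alpha_h, alpha_t))

print(theta_hat, best_on_grid)
```

Both values agree at 0.6, the frequency of heads, which is why the frequency is the natural estimate.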
Great for large samples, but can be heavily biased for small samples
Interpretation
Computational efficiency
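$\hat{\mu}_{MLE} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$
$\hat{\sigma}^2_{MLE} = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
$\hat{\sigma}^2_{MLE}$ is biased: $E[\hat{\sigma}^2_{MLE}] \neq \sigma^2$
Unbiased estimator: $\hat{\sigma}^2_{unbiased} = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$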
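The two variance estimators differ only in the divisor. The sketch below computes both on a small made-up sample and compares them with the standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
n = len(data)

mu_hat = sum(data) / n                               # MLE of the mean
squared_dev = sum((x - mu_hat) ** 2 for x in data)   # sum of squared deviations

var_mle = squared_dev / n              # MLE: divides by n (biased)
var_unbiased = squared_dev / (n - 1)   # divides by n - 1 (unbiased)

# Cross-check against the standard library implementations.
matches_stdlib = (
    abs(var_mle - statistics.pvariance(data)) < 1e-12
    and abs(var_unbiased - statistics.variance(data)) < 1e-12
)

print(mu_hat, var_mle, var_unbiased, matches_stdlib)
```

The MLE is always smaller by the factor (n − 1)/n, so the gap matters for small samples and vanishes as n grows.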
Maximum A Posteriori Estimation (MAP)
Bayes' Theorem:
$P(\theta \mid D) = \dfrac{P(D \mid \theta) P(\theta)}{P(D)}$
Equivalently: $P(\theta \mid D) \propto P(D \mid \theta) P(\theta)$
Choose the most probable hypothesis given the observed data and
prior belief by maximizing the posterior probability
Computationally intensive
The choice of P(θ) reflects our prior knowledge about the learning task
$P(\theta) = \dfrac{\theta^{\beta_H - 1} (1 - \theta)^{\beta_T - 1}}{B(\beta_H, \beta_T)} \sim \mathrm{Beta}(\beta_H, \beta_T)$
Beta function: $B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1} \, dt$
Posterior is also Beta distribution:
$P(\theta \mid x_1, \dots, x_n) \sim \mathrm{Beta}(\beta_H + \alpha_H, \beta_T + \alpha_T)$
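Because the posterior is again a Beta distribution, the MAP estimate is simply its mode. The sketch below uses the standard mode formula (a − 1)/(a + b − 2) for Beta(a, b), which is valid only when a > 1 and b > 1; the prior Beta(5, 5) is a hypothetical choice encoding a belief that the coin is roughly fair.

```python
def beta_posterior(alpha_h, alpha_t, beta_h, beta_t):
    """Posterior Beta parameters after alpha_h heads and alpha_t tails."""
    return beta_h + alpha_h, beta_t + alpha_t

def beta_mode(a, b):
    """Mode of Beta(a, b); this formula holds only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

alpha_h, alpha_t = 3, 2   # observed flips: 3 heads, 2 tails
beta_h, beta_t = 5, 5     # hypothetical prior centred on a fair coin

a_post, b_post = beta_posterior(alpha_h, alpha_t, beta_h, beta_t)
theta_map = beta_mode(a_post, b_post)
theta_mle = alpha_h / (alpha_h + alpha_t)

print((a_post, b_post), theta_map, theta_mle)
```

With only five flips, the prior pulls the MAP estimate (7/13, about 0.54) away from the MLE (0.6) toward 0.5; as more data arrive, the two estimates move closer together.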
MLE: not good when the sample is small
MAP: different answers for different priors
Python Review
Installation
Download: https://www.python.org/downloads/
IDE
Jupyter notebook
PyCharm
Conda
Hands-on
Maths
Python