Week 6 v1.61 (Hidden) - Revision, CW1, and Probabilistic Graphical Models
(WEEK 6, 2024)
REVISION &
INTRODUCTION TO PROBABILISTIC
GRAPHICAL MODELS
DR ANTHONY CONSTANTINOU 1
SCHOOL OF ELECTRONIC ENGINEERING AND COMPUTER SCIENCE
TIMETABLE
2
LECTURE OVERVIEW
Week 6
3
CATEGORICAL VS CONTINUOUS
Question:
Which attributes in the list should be represented
as continuous quantities?
4
HIDDEN SLIDE
5
CLASSIFICATION VS REGRESSION
Question:
There are two main types of supervised learning
algorithms: classification and regression. Which
type is appropriate to predict the outcomes of the
following problems?
6
HIDDEN SLIDE
7
REGRESSION / ERROR MEASURE
Question:
Consider an application of regression to predict
house prices. We want to know how much money
we expect to gain or lose for particular
investments, based on the recommendation
provided by the regression model. Which
evaluation metric is the most natural to use?
▪ A: R-squared.
▪ B: Adjusted R-Squared.
▪ C: Sum Squared Error (SSE).
▪ D: Mean Absolute Error (MAE).
8
HIDDEN SLIDE
9
HAMMING DISTANCE
Question:
In K-NN classification and K-Means clustering, we
need to compute distances between data points. For
data 𝒙𝟏 = [𝟎, 𝟎, 𝟏, 𝟐, 𝟏] and 𝒙𝟐 = [𝟎, 𝟏, 𝟏, 𝟎, 𝟏], what is
the Hamming distance between them?
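The Hamming distance simply counts the positions at which two equal-length vectors differ. A minimal sketch:

```python
def hamming_distance(x1, x2):
    """Count the positions at which two equal-length sequences differ."""
    if len(x1) != len(x2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x1, x2))

x1 = [0, 0, 1, 2, 1]
x2 = [0, 1, 1, 0, 1]
print(hamming_distance(x1, x2))  # the vectors differ at positions 1 and 3
```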
10
HIDDEN SLIDE
11
OVERFITTING
Question:
Which is the most likely symptom of overfitting?
13
OVERFITTING
Question:
Select all factors that could contribute to
overfitting:
15
BIG-𝑶 COMPLEXITY
Question:
You have 𝑵 training samples, 𝑴 testing samples,
and 𝑫 dimensions of data. What is the test time
computational complexity for K-NN and Logistic
Regression respectively?
17
DIMENSIONALITY REDUCTION
Question:
When using PCA to perform dimensionality reduction,
it is often suggested to preserve enough dimensions
so that they can explain 99% of the variance of the data
set. What does this mean?
19
ALGORITHMS AND OBJECTIVES
Question:
Match each algorithm with the corresponding objective
function.
Algorithms:
A. K-Means.
B. Mixture of Gaussians.
C. PCA.
D. Statistical Regression.
Objective functions:
A. Sum squared error between original and reconstructed data.
B. Sum squared error between each data point and its assigned cluster.
C. Classification accuracy, information gain.
D. Sum squared error between prediction and target.
E. Likelihood of the data under Gaussian distributions.
F. Computational cost.
G. Memory utilisation.
20
HIDDEN SLIDE
21
PROBLEM 1
Question:
You are working for a bank that uses people’s social
network profile data to make lending decisions. You
have a historical data set of 10,000 users, capturing
both those who repaid and defaulted on their loan. For
each user you have extracted 1,000 features from their
social media profiles. Your manager suggests using
this data set to classify whether a new person will
repay or default. What data problems may arise, and
how would you address them?
22
HIDDEN SLIDE
23
PROBLEM 2
Question:
You are working for a biotech company. Your task is to
improve disease prediction related to 10,000
sequenced genes of each client. You are provided with
a data set of 10,000 individuals that captures their
genes and disease status. A colleague suggests
assessing every possible combination of features for
maximum accuracy. What do you do?
24
HIDDEN SLIDE
25
COURSEWORK SUBMISSIONS: EXAMPLE 1
DIAGNOSING CARDIOVASCULAR DISEASE BASED ON
RANDOM FORESTS AND LOGISTIC REGRESSION
26
COURSEWORK SUBMISSIONS: EXAMPLE 2
PREDICTING HIT SONGS USING SPOTIFY DATA
27
COURSEWORK SUBMISSIONS: EXAMPLE 3
PREDICTION OF NITROGEN DIOXIDE LEVELS BASED ON
MULTIVARIATE LINEAR REGRESSION AND LINEAR SUPPORT
VECTOR MACHINES
28
10 MINUTES BREAK 29
PROBABILISTIC GRAPHICAL MODELS
▪ A Probabilistic Graphical Model (PGM) is a model that
relies on probability theory to express dependencies
via a graph structure.
▪ Random variables, also known as stochastic or
uncertain variables, are variables whose possible
values represent uncertain outcomes.
▪ We will focus on Bayesian Networks (BNs), a class of
graphical models that can also be used for causal
representation.
▪ Note that, in the literature, when a BN model is
assumed to capture causal relationships, it is often
referred to as a ‘Causal’ Bayesian Network (CBN).
30
BAYESIAN NETWORKS
A ▪ Nodes represent random variables.
▪ Arcs represent dependency:
𝐵 has parent 𝐴 and child 𝐷
𝐷 has parents 𝐵 and 𝐶, and ancestor 𝐴
B C
𝐴 has children 𝐵 and 𝐶, and descendant 𝐷
▪ Directed networks.
D
▪ Do not allow cycles.
We can say that A and D are independent if we
know the values of B and C.
Example: intervening on Smoking manipulates both
Yellow teeth and Lung cancer.
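The parent/child structure of the four-node network above can be sketched as a dictionary of parent sets, from which ancestors follow by walking parent links (a sketch; node names follow the slide):

```python
# Each node maps to its set of parents (the DAG from the slide:
# A -> B, A -> C, B -> D, C -> D).
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

def ancestors(node):
    """All nodes reachable by repeatedly following parent links.
    Terminates because the graph contains no cycles."""
    result = set()
    frontier = list(parents[node])
    while frontier:
        p = frontier.pop()
        if p not in result:
            result.add(p)
            frontier.extend(parents[p])
    return result

print(sorted(ancestors("D")))  # D's ancestors are A, B and C
```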
32
CHARACTERISTICS OF
BAYESIAN NETWORKS
BNs (and causal BNs) are based on four fundamental areas:
▪ Bayesian/Inverse probability theory.
▪ Statistics: in addition to categorical variables, BNs can contain
variables represented by statistical distributions such as
Gaussian, Beta, Binomial, Poisson, Gamma distributions etc.
▪ Machine Learning: different algorithms to learn the graphical
structure and the parameters of the conditional distributions.
▪ Causality: the science of cause-and-effect; i.e., an effect is partly
dependent on a cause.
▪ Assumes effects can have multiple causes, and causes must
lie in the past of effects.
▪ An effect can in turn be a cause of many other effects which all
lie in its future.
33
THOMAS BAYES
▪ English statistician and philosopher.
▪ Born in 1701 and died in 1761.
▪ Studied Logic and Theology at the
University of Edinburgh.
▪ After Bayes’ death, his works were
passed to his friend Richard Price,
who was a philosopher and a
mathematician.
▪ Bayes’ notes were later edited and
published by Richard Price.
▪ In the decades after its posthumous
publication in 1763, Bayes’ theorem
was used to solve many problems of
inverse probability. 34
INVERSE PROBABILITY
▪ A method of assigning a
probability distribution to an
unobserved variable.
COIN DIE
42
BAYES THEOREM:
ILLUSTRATION OF INVERSE THINKING
▪ 𝐻 = “Have headache”.
▪ 𝐶 = “Have COVID-19”.
▪ Let’s assume that, at any point in time, around 1 in
10 people have a headache and 1 in 40 people have
COVID-19.
▪ Assume COVID-19 causes headache 50% of the time.
▪ 𝑃(𝐻) = 1/10
▪ 𝑃(𝐶) = 1/40
▪ 𝑃(𝐻|𝐶) = 1/2
▪ Suppose that a friend of yours has a headache and
states that, given that 50% of COVID-19 infections
are associated with headaches, the chance of
being infected with COVID-19 must be 50-50.
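The friend's reasoning confuses 𝑃(𝐻|𝐶) with 𝑃(𝐶|𝐻). A quick numerical check using the figures above:

```python
# Figures from the slide.
p_h = 1 / 10         # P(H): headache prevalence
p_c = 1 / 40         # P(C): COVID-19 prevalence
p_h_given_c = 1 / 2  # P(H|C): COVID-19 causes headache half the time

# Bayes' theorem: P(C|H) = P(H|C) * P(C) / P(H)
p_c_given_h = p_h_given_c * p_c / p_h
print(p_c_given_h)  # 0.125, i.e. 12.5% -- far from 50-50
```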
43
HIDDEN SLIDE
44
BAYES’ THEOREM
𝑃(𝐻|𝐸) = 𝑃(𝐸|𝐻) 𝑃(𝐻) / 𝑃(𝐸)
45
BAYES’ THEOREM
Bayes’ theorem gives us a method to calculate the probability of a hypothesis 𝐻
(e.g. red wine is healthy) given evidence 𝐸 (e.g. data) as follows:
𝑃(𝐻|𝐸) = 𝑃(𝐸|𝐻) 𝑃(𝐻) / 𝑃(𝐸)
The posterior probability of 𝑯: the probability the hypothesis 𝐻 is true given
evidence 𝐸.
The prior probability of 𝑯: the probability the hypothesis 𝐻 is true before
observing any evidence 𝐸.
50
CONDITIONAL PROBABILITY TABLE
▪ A Conditional Probability Table (CPT) captures the parameters of conditional
distributions.
▪ Suppose that we introduce the child node WIN (W) for both COIN TOSS (C)
and DIE ROLL (D), and we set 𝑊 = 𝑇 (i.e., true) when 𝐶 = 𝐻 and 𝐷 = 6.
These are (unconditional)
probability tables.
CPT of COIN TOSS
51
CONDITIONAL PROBABILITY TABLE
▪ The values in a CPT can be based on data, knowledge, or a
combination of the two. The parameters shown below are
based on knowledge. How do we know this?
52
PRIOR DISTRIBUTIONS
A prior distribution of a variable represents the expected outcomes of that
variable before any ‘observations’ are entered into the model.
𝑃 𝐷
𝑃 𝐶
𝑃 𝑊
What will happen to 𝑷(𝑾)
if we set 𝑪 = "𝐇"?
Setting 𝐶 = H raises the probability of H from 50% to 100%,
so 𝑃(𝑊 = 𝑇) increases from 1/12 ≈ 8.33% to 1/6 ≈ 16.67%,
and 𝑃(𝑊 = 𝐹) falls to 83.33%.
53
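The prior 𝑃(𝑊) and the effect of setting 𝐶 = H can be checked by enumerating the joint distribution (a sketch; 𝑊 = 𝑇 only when the coin shows H and the die shows 6, as defined earlier):

```python
from itertools import product
from fractions import Fraction

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {d: Fraction(1, 6) for d in range(1, 7)}

def p_win(coin_dist):
    """P(W = T) under a given coin distribution: W = T iff C = H and D = 6."""
    return sum(pc * pd
               for (c, pc), (d, pd) in product(coin_dist.items(), die.items())
               if c == "H" and d == 6)

print(p_win(coin))                # prior: 1/12, about 8.33%
print(p_win({"H": Fraction(1)}))  # after setting C = H: 1/6, about 16.67%
```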
HIDDEN SLIDE
54
HIDDEN SLIDE
55
HIDDEN SLIDE
56
HIDDEN SLIDE
57
HIDDEN SLIDE
58
EXPLAINING AWAY
The notion of explaining away is a pattern of reasoning where the revised
beliefs of one cause influence the beliefs of some other independent cause,
when both causes share an effect (AKA ‘common-effect’). This way of
reasoning is very powerful and unique to Bayesian probability.
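A minimal numerical illustration with made-up numbers: two independent causes A and B share a common effect E, which occurs iff at least one cause is present. Observing E raises belief in A; additionally observing B "explains away" that belief, pushing it back to the prior.

```python
from itertools import product

p_a, p_b = 0.1, 0.1  # independent prior probabilities of the two causes

def p_e(a, b):
    """Effect occurs iff at least one cause is present (deterministic OR)."""
    return 1.0 if (a or b) else 0.0

def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) under the model above."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pe = p_e(a, b) if e else 1 - p_e(a, b)
    return pa * pb * pe

# P(A=1 | E=1): observing the effect raises belief in cause A.
num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
p_a_given_e = num / den

# P(A=1 | E=1, B=1): B already accounts for E, so belief in A drops back.
p_a_given_eb = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))

print(round(p_a_given_e, 3), round(p_a_given_eb, 3))
```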
59
HIDDEN SLIDE
60
HIDDEN SLIDE
61
HIDDEN SLIDE
62
HIDDEN SLIDE
63
READING
Introductory book chapters in Bayesian AI:
64
READING
Introductory book chapters on Bayesian
probability and Bayesian networks:
▪ Book Title: Risk Assessment and Decision
Analysis with Bayesian Networks, by
Norman Fenton and Martin Neil.
▪ A PDF of chapters (4, 5, and 6) is available
for you to download on QM+.
▪ Chapter 4 covers the basics of probability.
You should skip subsection 4.5.
▪ Chapter 5 covers Bayes’ theorem and
conditional probability. You should skip
subsection 5.5.
▪ Chapter 6 covers how we move from
Bayes’ theorem to Bayesian networks. You
should skip subsections 6.7, 6.8 and 6.9. 65