Logistic Regression
Probability theory as basis for the construction of classifiers:
• Multivariate probabilistic models
• Independence assumptions

• Random (= statistical = stochastic) variable: upper-case letter, e.g. V or X, or upper-case string, e.g. RAIN
• Binary variables: take one of two values, X = true (abbreviated x) and X = false (abbreviated ¬x)
• Conjunctions: (X = x) ∧ (Y = y) as X = x, Y = y

Marginalisation:
$$P(x_3) = \sum_{X_1, X_2} P(X_1, X_2, x_3) = 0.55$$

Chain rule:
$$P\Big(\bigwedge_{i=1}^{n} X_i\Big) = \prod_{i=1}^{n} P\Big(X_i \,\Big|\, \bigwedge_{k=i+1}^{n} X_k\Big)$$
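As a minimal sketch of marginalisation and the chain rule, the Python below stores a joint distribution over three binary variables as a table. The table `JOINT` and the helper names `marginal`, `conditional` and `chain_rule` are hypothetical; the numbers are invented and chosen only so that P(x3) comes out at 0.55.

```python
from itertools import product

# Hypothetical joint distribution P(X1, X2, X3); keys are (x1, x2, x3).
# The numbers are invented, chosen only so that P(x3) = 0.55.
JOINT = {
    (True, True, True): 0.20, (True, False, True): 0.10,
    (False, True, True): 0.15, (False, False, True): 0.10,
    (True, True, False): 0.05, (True, False, False): 0.10,
    (False, True, False): 0.10, (False, False, False): 0.20,
}


def marginal(joint, fixed):
    """Sum the joint over every variable whose index is not in `fixed`."""
    return sum(p for values, p in joint.items()
               if all(values[i] == v for i, v in fixed.items()))


def conditional(joint, target, given):
    """P(target | given); both arguments map variable index -> value."""
    return marginal(joint, {**target, **given}) / marginal(joint, given)


def chain_rule(joint, x1, x2, x3):
    """P(x1 | x2, x3) * P(x2 | x3) * P(x3)."""
    return (conditional(joint, {0: x1}, {1: x2, 2: x3})
            * conditional(joint, {1: x2}, {2: x3})
            * marginal(joint, {2: x3}))


# Marginalise X1 and X2 out to obtain P(x3).
p_x3 = marginal(JOINT, {2: True})

# The chain-rule product should reproduce every entry of the joint table.
max_error = max(abs(chain_rule(JOINT, *v) - JOINT[v])
                for v in product([True, False], repeat=3))
```

The check on `max_error` only confirms that the factorisation is an identity; it does not reduce the number of probabilities that have to be specified.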
Chain rule - digraph

[Figure: two digraphs over X1, X2, X3, labelled (1) and (2)]

Factorisation (1):
$$P(X_1, X_2, X_3) = P(X_1 \mid X_2, X_3) \cdot P(X_2 \mid X_3)\, P(X_3)$$

Other factorisation (2):
$$P(X_1, X_2, X_3) = P(X_2 \mid X_1, X_3) \cdot P(X_1 \mid X_3)\, P(X_3)$$

⇒ different factorisations possible

Does the chain rule help?

$$P(X_1, X_2, X_3) = P(X_1 \mid X_2, X_3) \cdot P(X_2 \mid X_3)\, P(X_3)$$

i.e. we need:
P(x1 | x2, x3)
P(x1 | ¬x2, x3)
P(x1 | x2, ¬x3)
P(x1 | ¬x2, ¬x3)
P(x2 | x3)
P(x2 | ¬x3)
P(x3)

Note P(¬x1 | x2, x3) = 1 − P(x1 | x2, x3), etc.
⇒ 7 probabilities required (as for P(X1, X2, X3))

Notation: X2 ⊥⊥ X3 | X1, X3 ⊥⊥ X2 | X1
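The count of 7 probabilities can be checked mechanically. The sketch below assumes binary variables and counts one probability per configuration of each variable's parents; the function name `num_parameters` and the two parent dictionaries are hypothetical. The second structure encodes the independence X2 ⊥⊥ X3 | X1 as P(X1) P(X2 | X1) P(X3 | X1).

```python
def num_parameters(parents):
    """Probabilities needed for binary variables: one per parent configuration."""
    return sum(2 ** len(ps) for ps in parents.values())


# Factorisation (1): P(X1 | X2, X3) P(X2 | X3) P(X3), no independence assumed.
full_chain = {"X1": ["X2", "X3"], "X2": ["X3"], "X3": []}

# Assuming X2 and X3 are conditionally independent given X1:
# P(X1) P(X2 | X1) P(X3 | X1).
cond_indep = {"X1": [], "X2": ["X1"], "X3": ["X1"]}

n_full = num_parameters(full_chain)    # 4 + 2 + 1
n_indep = num_parameters(cond_indep)   # 1 + 2 + 2
```

Under these assumptions the chain rule alone saves nothing (7 probabilities, the same as the full joint), while the independence assumption brings the count down to 5.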
[Figure: Bayesian network with vertices FEVER (NO, YES), PNEUMONIA (NO, YES) and TEMP (<=37.5, >37.5), annotated with the probabilities below]
P(MY = y | FL = n) = 0.20
P(TEMP ≤ 37.5 | FE = y) = 0.1
P(TEMP ≤ 37.5 | FE = n) = 0.99
Solution:
use simple probabilistic models for classification:
• naive (independent) form BN
• Tree-Augmented Bayesian Network (TAN)
• Forest-Augmented Bayesian Network (FAN)

Spectrum
[Figure: spectrum running from restricted structure learning (naive Bayesian network, tree-augmented Bayesian network (TAN)) to unrestricted structure learning (general Bayesian networks)]
Naive (independent) form BN

Example of naive Bayes

where
• n is the size of the dataset D
• n′ is the estimated size of the (virtual) ‘dataset’ on which the prior knowledge is based (equivalence sample size)

Compute mutual information between variables E, E′ conditioned on the class variable C:
$$I_P(E, E' \mid C) = \sum_{E, E', C} P(E, E', C) \cdot \log \frac{P(E, E' \mid C)}{P(E \mid C)\, P(E' \mid C)}$$
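The conditional mutual information can be computed directly from a joint table P(E, E′, C). The sketch below is illustrative only: the function name, the table layout (keys `(e, e2, c)`) and the two example distributions are assumptions, and the natural logarithm is used because the slide does not fix a base.

```python
import math


def conditional_mutual_information(joint):
    """I_P(E, E' | C) for a table mapping (e, e2, c) -> P(e, e2, c)."""
    p_c, p_ec, p_e2c = {}, {}, {}
    for (e, e2, c), p in joint.items():
        p_c[c] = p_c.get(c, 0.0) + p
        p_ec[(e, c)] = p_ec.get((e, c), 0.0) + p
        p_e2c[(e2, c)] = p_e2c.get((e2, c), 0.0) + p
    total = 0.0
    for (e, e2, c), p in joint.items():
        if p > 0.0:  # terms with P = 0 contribute nothing
            # P(e,e2|c) / (P(e|c) P(e2|c)) = P(e,e2,c) P(c) / (P(e,c) P(e2,c))
            ratio = p * p_c[c] / (p_ec[(e, c)] * p_e2c[(e2, c)])
            total += p * math.log(ratio)
    return total


# Hypothetical example 1: E and E' independent given C (built as a product).
p_e = {True: 0.8, False: 0.3}    # P(E = true | C = c)
p_e2 = {True: 0.6, False: 0.1}   # P(E' = true | C = c)
independent = {}
for c in (True, False):
    for e in (True, False):
        for e2 in (True, False):
            pe = p_e[c] if e else 1.0 - p_e[c]
            pe2 = p_e2[c] if e2 else 1.0 - p_e2[c]
            independent[(e, e2, c)] = 0.5 * pe * pe2

# Hypothetical example 2: E' is an exact copy of E.
copied = {(True, True, True): 0.40, (False, False, True): 0.10,
          (True, True, False): 0.15, (False, False, False): 0.35}

mi_independent = conditional_mutual_information(independent)
mi_copied = conditional_mutual_information(copied)
```

In the first example the value is zero up to rounding, as expected when E and E′ are conditionally independent given C; in the second it is strictly positive.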
FAN algorithm

Choose k ≥ 0. Given evidence variables Ei, a class variable C, and a dataset D:
1. Compute mutual information −I_P(Ei, Ej | C) ∀(Ei, Ej), i ≠ j, in a complete undirected graph
2. Construct a minimum-cost spanning forest containing exactly k edges
3. Change each tree in the forest into a directed tree
4. Add an arc from the class vertex C to every evidence vertex Ei in the forest
5. Learn conditional probability distributions from D using Dirichlet distributions

Performance evaluation

• Success rate σ based on:
$$c_{\max} = \operatorname{argmax}_c \{P(c \mid x_i)\}$$
for x_i ∈ D, i = 1, . . . , n = |D|

• Total entropy (penalty):
$$E = -\sum_{i=1}^{n} \ln P(c \mid x_i)$$
– if P(c | x_i) = 1, then ln P(c | x_i) = 0
– if P(c | x_i) ↓ 0 then ln P(c | x_i) → −∞
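Steps 2 to 4 of the FAN algorithm can be sketched as follows, assuming the conditional mutual information values from step 1 are already available. The sketch uses a Kruskal-style greedy selection with a union-find structure, which is one standard way to obtain a minimum-cost forest with k edges; the slides do not say which method is intended. All function names and the example values are hypothetical, k is assumed to be at most the number of evidence variables minus one, and step 5 (learning the distributions) is left out.

```python
def spanning_forest(n_vars, mutual_info, k):
    """Select k acyclic edges of lowest cost -I, i.e. highest mutual information.

    `mutual_info` maps pairs (i, j) with i < j to I_P(E_i, E_j | C).
    """
    parent = list(range(n_vars))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for (i, j), _ in sorted(mutual_info.items(), key=lambda kv: -kv[1]):
        if len(edges) == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different trees
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


def direct_forest(n_vars, edges):
    """Orient every tree away from an arbitrarily chosen root."""
    neighbours = {v: [] for v in range(n_vars)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    arcs, seen = [], set()
    for root in range(n_vars):
        if root in seen:
            continue
        seen.add(root)
        stack = [root]
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    arcs.append((v, w))
                    stack.append(w)
    return arcs


def fan_structure(n_vars, mutual_info, k):
    """Arcs of the FAN: class arcs C -> E_i plus the directed forest."""
    forest_arcs = direct_forest(n_vars, spanning_forest(n_vars, mutual_info, k))
    class_arcs = [("C", i) for i in range(n_vars)]
    return class_arcs + forest_arcs


# Hypothetical mutual information values for four evidence variables.
mi = {(0, 1): 0.30, (0, 2): 0.05, (0, 3): 0.02,
      (1, 2): 0.25, (1, 3): 0.01, (2, 3): 0.10}
arcs = fan_structure(4, mi, k=2)
```

With k = 0 the result is the naive Bayesian network (class arcs only), and with k equal to the number of evidence variables minus one the forest becomes a single tree, which gives a TAN structure.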
[Figure: Bayesian network for COMIK with vertices including CLOTTING-FACTORS, FEVER, GALL-BLADDER, BILIRUBIN, ASCITES, LEUKAEMIA-LYMPHOMA, ASAT, INTERMITTENT-JAUNDICE, UPPER-ABDOMINAL-PAIN, LIVER-SURFACE, GI-CANCER, ALKALINE-PHOSPHATASE, LDH, AGE, CONGESTIVE-HEART-FAILURE, BILIARY-COLICS-GALLSTONES, ALCOHOL, HISTORY-GE-2-WEEKS, SPIDERS, WEIGHT-LOSS, JAUNDICE-DUE-TO-CIRRHOSIS]
[Figure: plot of correct conclusion (%) against number of added arcs (0 to 20); plot of entropy]
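The success rate σ and the total entropy E from the performance evaluation can be computed from the posterior distributions a classifier returns. The sketch below makes two assumptions that the slides leave implicit: a case counts as a success when c_max equals the class recorded for that case, and the c in the entropy formula is that recorded class. The function name and the example posteriors are hypothetical.

```python
import math


def evaluate(posteriors, recorded_classes):
    """Return (success rate, total entropy) over a dataset.

    `posteriors[i]` maps each class to P(class | x_i);
    `recorded_classes[i]` is the class stored with case x_i.
    """
    correct = 0
    entropy = 0.0
    for posterior, c in zip(posteriors, recorded_classes):
        c_max = max(posterior, key=posterior.get)  # argmax_c P(c | x_i)
        if c_max == c:
            correct += 1
        p = posterior[c]
        # ln P(c | x_i) tends to minus infinity as P tends to 0.
        entropy += -math.log(p) if p > 0.0 else math.inf
    return correct / len(recorded_classes), entropy


# Hypothetical posteriors for three cases.
posteriors = [{"yes": 0.9, "no": 0.1},
              {"yes": 0.2, "no": 0.8},
              {"yes": 0.7, "no": 0.3}]
recorded = ["yes", "no", "no"]
sigma, total_entropy = evaluate(posteriors, recorded)
```

The entropy penalises confident mistakes more heavily than the success rate does, which is why the two measures can rank models differently.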
Based on COMIK dataset:
• Dataset with 1002 patient cases with liver
[Figure: Bayesian networks over the variables CT&RT-SCHEDULE, GENERAL-HEALTH-STATUS, PERFORATION, HEMORRHAGE, THERAPY-ADJUSTMENT, AGE, CLINICAL-STAGE, EARLY-RESULT, POST-CT&RT-SURVIVAL, 5-YEAR-RESULT, HISTOLOGICAL-CLASSIFICATION, SURGERY, CLINICAL-PRESENTATION, BULKY-DISEASE]

$$P(C \mid \mathcal{E}) = \frac{\prod_{E \in \mathcal{E}} P(E \mid C)\, P(C)}{P(\mathcal{E})}$$

For C = false:
$$P(\neg c \mid \mathcal{E}) = \frac{\prod_{E \in \mathcal{E}} P(E \mid \neg c)\, P(\neg c)}{P(\mathcal{E})}$$

$$\Rightarrow \frac{P(c \mid \mathcal{E})}{P(\neg c \mid \mathcal{E})} = \frac{\prod_{E \in \mathcal{E}} P(E \mid c)}{\prod_{E \in \mathcal{E}} P(E \mid \neg c)} \cdot \frac{P(c)}{P(\neg c)} = \prod_{i=1}^{m} \lambda_i \cdot O(c) = O(c \mid \mathcal{E})$$

Here O(c | E) is the conditional odds, and λi = P(Ei | c)/P(Ei | ¬c) is a likelihood ratio
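The odds form of naive Bayes can be checked numerically: multiplying the prior odds O(c) by the likelihood ratios λi should give the same value as dividing the two posteriors. The sketch below uses invented numbers, and the function names are hypothetical.

```python
def posterior_odds(prior_c, likelihoods):
    """O(c | E) = O(c) * product of lambda_i.

    `likelihoods` holds one pair (P(E_i | c), P(E_i | not c)) per observed
    evidence variable.
    """
    odds = prior_c / (1.0 - prior_c)  # prior odds O(c)
    for p_given_c, p_given_not_c in likelihoods:
        odds *= p_given_c / p_given_not_c  # likelihood ratio lambda_i
    return odds


def posterior_ratio_direct(prior_c, likelihoods):
    """P(c | E) / P(not c | E) from the two naive Bayes numerators.

    The common denominator P(E) cancels in the ratio.
    """
    numerator_c = prior_c
    numerator_not_c = 1.0 - prior_c
    for p_given_c, p_given_not_c in likelihoods:
        numerator_c *= p_given_c
        numerator_not_c *= p_given_not_c
    return numerator_c / numerator_not_c


# Invented example: P(c) = 0.2 and two observed evidence variables.
evidence = [(0.9, 0.3), (0.6, 0.2)]
odds = posterior_odds(0.2, evidence)
direct = posterior_ratio_direct(0.2, evidence)
```

Here the prior odds are 0.25 and both likelihood ratios are 3, so both routes give posterior odds of 2.25.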
Back to probabilities:
$$P(c \mid \mathcal{E}) = \frac{O(c \mid \mathcal{E})}{1 + O(c \mid \mathcal{E})} = \frac{\exp\left(\sum_{i=0}^{m} \omega_i\right)}{1 + \exp\left(\sum_{i=0}^{m} \omega_i\right)}$$
with ω0 = ln O(c) and ωi = ln λi for i = 1, . . . , m, so that the sum of the ωi equals ln O(c | E).

[Figure: plot of odds O against x; illustration of the decision hyperplane]

Adjust ωi with weights βi based on existing interactions between variables:
$$\ln O(c \mid \mathcal{E}) = \sum_{i=0}^{m} \beta_i \omega_i = \beta^T \omega$$

Hyperplane: {ω | β^T ω = 0} where
• c = β0ω0 is the intercept (recall that ω0 = ln O(c), which is independent of any evidence E)
• ωi, i = 1, . . . , m correspond to the probabilities we want to find
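The step from weighted log-odds back to a probability can be sketched in a few lines. The sketch assumes ω0 = ln O(c) and ωi = ln λi for i ≥ 1, so that setting every βi to 1 recovers the naive Bayes posterior; the function names and all numbers are hypothetical.

```python
import math


def log_odds(beta, omega):
    """ln O(c | E) as the inner product of beta and omega."""
    return sum(b * w for b, w in zip(beta, omega))


def probability(beta, omega):
    """P(c | E) = O / (1 + O) with O = exp(beta^T omega)."""
    z = log_odds(beta, omega)
    # exp(z) / (1 + exp(z)) rewritten as 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


# Invented example: prior odds 0.25 and two likelihood ratios of 3.
omega = [math.log(0.25), math.log(3.0), math.log(3.0)]

p_naive = probability([1.0, 1.0, 1.0], omega)      # all weights equal to 1
p_weighted = probability([1.0, 0.5, 0.5], omega)   # damped evidence weights
p_boundary = probability([0.0, 1.0, -1.0], omega)  # beta^T omega = 0
```

Points with β^T ω = 0 lie on the decision hyperplane and receive probability 0.5; the damped weights in the second call pull the posterior back towards the prior.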
Coefficients...

Variable     Coeff.
1             34.9227
2            -48.1161
3              7.8472
4             17.3933
5            -33.0445
6             22.2601
7            -82.415
8            -54.6671
Intercept     66.1064