
Classification using thresholds in a hierarchical fashion

- Exhaustive information often is not immediately available; it may not even be needed.
- The classifier may do better choosing the attributes one at a time, according to the demands of the situation.
- The most popular tool targeting this scenario is the decision tree.

ML for EO © TL&RS Lab 2021
- A decision tree is a supervised learning technique that can be used for solving classification problems.
- Tree-structured classifier:
  - internal nodes: features of the dataset;
  - branches: decision rules;
  - leaf nodes: classification outcome.

Figure from: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm
- With respect to the classifiers introduced earlier, DTs have one striking advantage: interpretability (the neural network, especially, is a real black box).
- A decision tree may be considered as a set of rules, hence it can be used to implement "expert knowledge" (if it is possible to express it via if-then rules).
- DTs built from examples may not cover all possible situations ("missing edges"). Choosing the class randomly or preferring the most frequent class are the most obvious possibilities.
- Example of a decision tree applied to Earth observation data for land cover classification.

- In this case bands are evaluated one at a time.

Sharma, R. et al. "Decision tree approach for classification of remotely sensed satellite data using open source support." Journal of Earth System Science 122 (2013): 1237–1247.
- Divide and conquer: start from a random attribute as root; at each level the training set is divided into disjoint subsets, and these into further subsets, and so on until each subset is "pure" in the sense that all its examples belong to the same class.
- Starting from different roots leads to different trees: is there a criterion to select one of them?
- One could be the size of the DT …

- Number of nodes vs. number of tests applied on average to an element to classify it.
- Small DTs are preferable because of their:
  - easier interpretability;
  - ability to dispose of irrelevant or redundant information;
  - smaller probability of overfitting the training examples (recall linear vs. polynomial discriminative functions?).
- To proceed we need to know what is the information content of a subset extracted using a specific node.
- The amount of information contributed by an attribute at is the difference between the entropy before at has been considered and the entropy after this attribute has been considered.

- This difference cannot be negative; information can only be gained, never lost, by considering at. It may be zero, however, which means that no information has been gained.
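As an illustration (not part of the slides), this entropy difference can be computed directly; the representation of examples as dictionaries with a "class" key is an assumption made for this sketch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy before testing `attribute` minus the weighted entropy
    of the subsets it induces. The result is never negative."""
    labels = [ex["class"] for ex in examples]
    before = entropy(labels)
    subsets = {}
    for ex in examples:
        subsets.setdefault(ex[attribute], []).append(ex["class"])
    after = sum(len(s) / len(examples) * entropy(s)
                for s in subsets.values())
    return before - after
```

For an attribute that splits the classes perfectly, the gain equals the full entropy of the labels; for an uninformative attribute it is zero.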
- The trick is to choose a threshold, θ, and then decide that if x < θ, then the value of a newly created boolean attribute is true, and otherwise it is false (or vice versa). But which θ is it better to choose?
- One may order the possible values of x, x_1, …, x_i, …, x_n, and select among the possible threshold values, (x_i + x_{i+1})/2, the one which maximizes the information.
- This is, computationally speaking, very demanding!

- The best candidate threshold never finds itself between values that are labeled with the same class.
- This means that it is enough to investigate the contributed information only for locations between values with opposite class labels.
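A minimal sketch of this pruned threshold search (the helper names are invented for the example): midpoints between consecutive sorted values are tried only where the class label changes.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values, but only where
    the class label changes; return (threshold, information gain)."""
    pairs = sorted(zip(values, labels))
    base = _entropy([l for _, l in pairs])
    best, best_gain = None, -1.0
    for i in range(len(pairs) - 1):
        if pairs[i][1] == pairs[i + 1][1]:
            continue  # both neighbours share a class: never the optimum
        theta = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [l for v, l in pairs if v < theta]
        right = [l for v, l in pairs if v >= theta]
        gain = base - (len(left) / len(pairs)) * _entropy(left) \
                    - (len(right) / len(pairs)) * _entropy(right)
        if gain > best_gain:
            best, best_gain = theta, gain
    return best, best_gain
```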

- Pros:
  - very intuitive and easy to explain;
  - robustness to missing data, lack of correction, and other data issues;
  - needs no assumption on the data distribution.

- Cons:
  - tend to overfit → random forests as a countermeasure.

- Pruning is the process of reducing the size of a DT without reducing its performance too much.
- It is typically carried out in a sequence of steps: first replace one subtree with a leaf, then another, and so on, as long as the replacements appear to be beneficial according to some reasonable criterion.
- To this aim, one needs to consider an error estimate.

- To a first approximation, probability is identified with the relative frequency; to be fair, though, such an estimate can be trusted only when supported by many observations.

- Let m_i and m_{i+1} be the numbers of training examples reaching two subtrees t_i and t_{i+1}, respectively; and let E_i and E_{i+1} be the error estimates of the two subtrees.
- For the total of m_i + m_{i+1} training examples, the error rate of the whole subtree is estimated as the weighted average of the two subtrees:

  E = (m_i · E_i + m_{i+1} · E_{i+1}) / (m_i + m_{i+1})
- Pruning is likely to change the classifier's performance. One way to assess this change is to compare the error estimates of the decision tree after and before the pruning.

- From the available alternatives, the one with the smallest value is selected, but only if this value is smaller than a threshold set to control the performance degradation vs. the tree's compactness.
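The weighted error estimate and the pruning criterion above can be sketched as follows; the function names and the default degradation threshold are assumptions for this example, not the slides' notation:

```python
def subtree_error(m_i, e_i, m_j, e_j):
    """Error of a subtree with two branches, estimated as the
    weighted average of the branches' error estimates."""
    return (m_i * e_i + m_j * e_j) / (m_i + m_j)

def should_prune(leaf_error, m_i, e_i, m_j, e_j, max_degradation=0.05):
    """Replace the subtree with a leaf only if the increase in the
    estimated error stays below a user-set degradation threshold."""
    return leaf_error - subtree_error(m_i, e_i, m_j, e_j) < max_degradation
```

Among several candidate subtrees, the one with the smallest error increase would be pruned first.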
- The testing-set curve's shape shows that the unpruned tree usually scores poorly on testing data, because of overfitting.
- Excessive pruning, however, removes attribute tests that do carry useful information, with a detrimental effect on classification performance.

- In the divide-and-conquer procedure, each subsequent attribute divides the set of training examples into smaller and smaller subsets.
- The number of samples supporting the choice of the tests at lower tree levels will be smaller.
- Therefore on-line pruning may be considered to make sure this situation is prevented.
- If the training subset is smaller than a user-set minimum, any further expansion of the tree is stopped.
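The divide-and-conquer growth with the on-line pruning stop can be sketched as below; the tree's dict representation and the `min_samples` default are assumptions for this illustration:

```python
import math
from collections import Counter

def _entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, attributes, min_samples=5):
    """Divide and conquer with on-line pruning: stop expanding when a
    subset is pure, attributes run out, or the subset is smaller than
    the user-set `min_samples`."""
    labels = [ex["class"] for ex in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes or len(examples) < min_samples:
        return {"leaf": majority}  # too few samples to support further tests

    def split_entropy(at):
        # weighted entropy of the subsets induced by attribute `at`
        groups = {}
        for ex in examples:
            groups.setdefault(ex[at], []).append(ex["class"])
        return sum(len(g) / len(examples) * _entropy(g)
                   for g in groups.values())

    best = min(attributes, key=split_entropy)  # max information gain
    node = {"attribute": best, "branches": {}}
    subsets = {}
    for ex in examples:
        subsets.setdefault(ex[best], []).append(ex)
    remaining = [a for a in attributes if a != best]
    for value, subset in subsets.items():
        node["branches"][value] = build_tree(subset, remaining, min_samples)
    return node
```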
- Each leaf is associated with a concrete conjunction of test results, which can easily be transformed into a rule.
- Considering all leaves, the complete set of rules can be extracted.
- Note: there is a default class, hence in a domain with K classes, only the rules for K − 1 classes are needed, the last class serving as the default.
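Rule extraction is a simple walk from root to leaves; the nested-dict tree layout used here is an assumption of this sketch:

```python
def extract_rules(tree, path=()):
    """Walk a tree ({'attribute': …, 'branches': …} internal nodes,
    {'leaf': class} leaves) and emit one if-then rule per leaf."""
    if "leaf" in tree:
        conds = " AND ".join(f"{a} = {v}" for a, v in path)
        return [f"IF {conds or 'TRUE'} THEN class = {tree['leaf']}"]
    rules = []
    for value, child in tree["branches"].items():
        rules += extract_rules(child, path + ((tree["attribute"], value),))
    return rules
```

Dropping the rules of one class and declaring it the default implements the K − 1 observation above.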

- Once the tree has been converted to rules, however, pruning gains in flexibility: any test in the if-part of any rule is a potential candidate for removal.

- The idea behind decision trees was first put forward in the late 1950s. An important result of the research was reached by Breiman et al. with the so-called CART (Classification and Regression Trees) system.
- The idea was then imported into the machine-learning world. The most famous implementation in ML is the algorithm called C4.5, designed by Quinlan.

Breiman, L., Friedman, J., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth International Group.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo: Morgan Kaufmann.

- Multi-scale Hierarchical (Binary) Decision Tree (MsHBDT): binary disaggregation of the task according to a global functional.

[Figure: a binary tree over scales s_k, s_{k−1}, …, s_1, s_0; at each scale, features f_i^(d_j) are combined to extract one kind of information at a time (Information A, B, C, D).]

- Features f and combination operators need to be efficiently selected at each scale s.
- (Binary) multi-scale combination uses different operators to extract one kind of information at a time.
- An HBDT is built following these steps:
  - A "node" is simply one of all the possible processing chains.
  - Each node is more or less "able" to extract a specific type of information.
  - Processing chains are selected according to the best "match" between the extracted and the wanted information.
  - Each selected processing chain is assigned to a node and the task to be completed is updated by excluding the extracted information type.
  - The previous three steps are iterated until all the nodes are set (i.e., for each information type the "best match" processing chain is selected).
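The iteration above is a greedy assignment; a minimal sketch follows, where `match_score` stands in for any match criterion (for instance a per-class accuracy index) and the function names are invented for this example:

```python
def assign_chains(chains, info_types, match_score):
    """Greedy HBDT design sketch: repeatedly pick the (chain, info)
    pair with the best match, assign it to a node, then drop that
    information type from the remaining task."""
    nodes = []
    remaining = list(info_types)
    available = list(chains)
    while remaining and available:
        chain, info = max(
            ((c, i) for c in available for i in remaining),
            key=lambda pair: match_score(pair[0], pair[1]),
        )
        nodes.append((chain, info))
        remaining.remove(info)   # this information type is now covered
        available.remove(chain)  # each chain serves one node
    return nodes
```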
Design step

[Figure: processing chains #1–#4 are matched to a tree of information types (Inf. A, B, C, D), one chain per node. Each node pairs spectral/spatial features extracted from the RS data set(s) (GLCM, NDSV, morphological filters, radiometric values) with a list of operators (OP1–OP4).]

K. L. Bakos and P. Gamba, "Hierarchical Hybrid Decision Tree Fusion of Multiple Hyperspectral Data Processing Chains," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 1, pp. 388–394, Jan. 2011.
[Figure: example HBDT applied to a VHR VIS-NIR image. Each node pairs a feature set with a classifier (GLCM + SVM, DMP + RF, NDSV + KNN, GLCM + MLH) to extract one class at a time (Classes 1–5), producing the final classification map.]
Remote Sensing Group

Using the ranking index accuracy calculation:

  A_e = (w_1 · A_p² + w_2 · A_u²) / 2

where:
- A_e is the ranking index, or "efficient accuracy", value calculated for a given class in the given confusion matrix;
- A_p is the producer accuracy of the class within the confusion matrix;
- A_u is the user accuracy of the class in the confusion matrix;
- w_1 and w_2 are weights to emphasise either of the above two.
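The ranking index is a one-line computation; the function name is an assumption of this sketch:

```python
def efficient_accuracy(a_p, a_u, w1=1.0, w2=1.0):
    """Ranking index A_e = (w1 * A_p**2 + w2 * A_u**2) / 2 for one
    class, from its producer (A_p) and user (A_u) accuracies."""
    return (w1 * a_p ** 2 + w2 * a_u ** 2) / 2
```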


- Add a segmentation by using mathematical morphology filters.
- At each node of the HBDTC, a filter applied within each segment swaps the labels for the majority label whenever that label covers more than an 85% threshold of the segment's shape:

  l_ij ← majority label of the segment, if its share in the segment exceeds 0.85; otherwise l_ij is kept unchanged.
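The per-segment majority relabeling can be sketched as below; the flat `labels`/`segments` list representation is an assumption made for this example:

```python
from collections import Counter

def majority_relabel(labels, segments, threshold=0.85):
    """For each segment, if one class label covers more than `threshold`
    of its pixels, relabel the whole segment with that majority class."""
    out = list(labels)
    by_segment = {}
    for idx, seg in enumerate(segments):
        by_segment.setdefault(seg, []).append(idx)
    for seg, indices in by_segment.items():
        label, count = Counter(labels[i] for i in indices).most_common(1)[0]
        if count / len(indices) > threshold:
            for i in indices:  # swap minority labels for the majority one
                out[i] = label
    return out
```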

