Ensembles
Machine Learning
Andrey Filchenkov
17.06.2016
Lecture plan
• Composition of algorithms
• Algorithm diversity
• Random forest
• Gradient boosting
• AdaBoost and its theoretical properties
Python: VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB

# previously
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

eclf1 = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
    voting='hard')  # majority voting over the three classifiers
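A possible usage sketch (the dataset and evaluation here are illustrative, not from the slides):

# Illustrative usage: evaluate the voting ensemble by cross-validation
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
print(cross_val_score(eclf1, X, y, cv=5).mean())  # mean accuracy over 5 folds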
WEKA: Vote
// previously
Classifier[] classifiers = {
    new J48(),
    new NaiveBayes(),
    new RandomForest()
};

Vote voter = new Vote();
voter.setClassifiers(classifiers);  // combine the three classifiers by voting
Python: BaggingClassifier
BaggingRegressor
WEKA: Bagging
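A minimal sketch of the listed scikit-learn API (parameter values are illustrative):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base estimator trained on bootstrap samples
    n_estimators=50,           # number of bootstrap replicates
    max_samples=0.8,           # share of objects drawn for each replicate
    random_state=1)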
Python: RandomForestClassifier
RandomForestRegressor
ExtraTreesClassifier
ExtraTreesRegressor
WEKA: RandomForest
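A minimal sketch of the listed scikit-learn API (hyperparameter values are illustrative):

from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features='sqrt',   # random feature subset tried at each split
    random_state=1)
et = ExtraTreesClassifier(n_estimators=100, random_state=1)  # extremely randomized trees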
The ensemble is a weighted sum of base algorithms:
$$a(x) = \sum_{t=1}^{T} \alpha_t h(x, \theta_t),$$
where $\alpha_t \in \mathbb{R}$ are the coefficients minimizing the empirical risk
$$Q(\alpha, \theta) = \sum_{i=1}^{\ell} L\bigl(a(x_i), y_i\bigr) \to \min_{\alpha, \theta}$$
for a loss function $L(a, y)$.
It is hard to find an exact solution $\{\alpha_t, \theta_t\}_{t=1}^{T}$ directly, so we build the function step by step:
$$a_t(x) = a_{t-1}(x) + \alpha_t h(x, \theta_t).$$
Gradient:
$$g_i = \frac{\partial Q}{\partial a_{t-1}(x_i)} = \frac{\partial L\bigl(a_{t-1}(x_i), y_i\bigr)}{\partial a_{t-1}(x_i)}, \quad i = 1, \dots, \ell.$$
A gradient descent step in the space of model outputs would be $a_t(x_i) = a_{t-1}(x_i) - \alpha g_i$. We approximate this step by a base learner fitted to the negative gradient:
$$\theta_t = \arg\min_{\theta} \sum_{i=1}^{\ell} \bigl( h(x_i, \theta) + g_i \bigr)^2.$$
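For example, for the squared loss $L(a, y) = \frac{1}{2}(a - y)^2$ the negative gradient is simply the residual:
$$-g_i = y_i - a_{t-1}(x_i),$$
so each base learner is fitted to the residuals of the current ensemble.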
Input: training set $(x_i, y_i)_{i=1}^{\ell}$, number of iterations $T$.
$a_0 = \mathrm{LEARN}\bigl((x_i)_{i=1}^{\ell},\, (y_i)_{i=1}^{\ell}\bigr)$
1. for $t = 1$ to $T$ do
2. $\quad g_i = \dfrac{\partial L\bigl(a_{t-1}(x_i), y_i\bigr)}{\partial a_{t-1}(x_i)},\ i = 1, \dots, \ell$
3. $\quad \theta_t = \mathrm{LEARN}\bigl((x_i)_{i=1}^{\ell},\, (-g_i)_{i=1}^{\ell}\bigr)$
4. $\quad \alpha_t = \arg\min_{\alpha} \sum_{i=1}^{\ell} L\bigl(a_{t-1}(x_i) + \alpha\, h(x_i, \theta_t),\, y_i\bigr)$
5. $\quad a_t = a_{t-1} + \alpha_t h(\cdot, \theta_t)$
6. return $a_T$
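A minimal Python sketch of this loop for the squared loss, with a fixed step $\alpha$ instead of the line search in step 4 (all names and parameter values are illustrative, not from the slides):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, T=100, alpha=0.1):
    # a_0: constant initial model (mean of the targets)
    a = np.full(len(y), y.mean())
    models = []
    for t in range(T):
        residuals = y - a                    # negative gradient of 1/2*(a - y)^2
        h = DecisionTreeRegressor(max_depth=3).fit(X, residuals)  # step 3: LEARN
        a = a + alpha * h.predict(X)         # step 5: a_t = a_{t-1} + alpha*h
        models.append(h)
    return y.mean(), models

def boost_predict(X, a0, models, alpha=0.1):
    a = np.full(X.shape[0], a0)
    for h in models:
        a = a + alpha * h.predict(X)
    return a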
The number of classification errors is bounded by the exponential loss:
$$\sum_{i=1}^{\ell} \Bigl[\, y_i \sum_{t=1}^{T} \alpha_t h(x_i, \theta_t) < 0 \,\Bigr] \le \sum_{i=1}^{\ell} \exp\Bigl( -y_i \sum_{t=1}^{T} \alpha_t h(x_i, \theta_t) \Bigr).$$
• AdaBoost
• AnyBoost
• LogitBoost
• BrownBoost
• ComBoost
• Stochastic gradient boosting
Python: GradientBoostingClassifier
GradientBoostingRegressor
AdaBoostClassifier
AdaBoostRegressor
WEKA: AdaBoostM1
LogitBoost
AdditiveRegression
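A minimal sketch of the listed scikit-learn API (hyperparameter values are illustrative):

from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier

gbc = GradientBoostingClassifier(
    n_estimators=100,    # number of boosting iterations T
    learning_rate=0.1,   # shrinkage applied to each step
    max_depth=3,         # depth of the base trees
    random_state=1)
ada = AdaBoostClassifier(n_estimators=50, random_state=1)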
$$a(x) = \sum_{t=1}^{T} \alpha_t h(x, \theta_t),$$
It is classification, therefore $y_i \in \{-1, +1\}$ and $h(x, \theta) \in \{-1, +1\}$.
The loss function is $L(a, y) = \exp(-a y)$.
Historically, the term “weights” predates the term “gradient”.
For $L(a, y) = \exp(-a y)$:
$$g_i = \frac{\partial L\bigl(a_{t-1}(x_i), y_i\bigr)}{\partial a_{t-1}(x_i)} = -y_i \exp\bigl(-a_{t-1}(x_i)\, y_i\bigr) = -y_i w_i, \quad i = 1, \dots, \ell,$$
where $w_i = \exp\bigl(-a_{t-1}(x_i)\, y_i\bigr)$ is the weight of object $x_i$.
Then the base learner fitting step $\theta_t = \mathrm{LEARN}\bigl((x_i)_{i=1}^{\ell},\, (-g_i)_{i=1}^{\ell}\bigr)$ becomes
$$h(\cdot, \theta_t) = \arg\min_{h \in H} \sum_{i=1}^{\ell} \bigl( h(x_i, \theta) + g_i \bigr)^2 = \arg\max_{h \in H} \sum_{i=1}^{\ell} w_i\, y_i\, h(x_i, \theta),$$
since $h^2(x_i, \theta) = y_i^2 = 1$; maximizing the weighted margin is the same as minimizing the weighted number of errors.
1. for $i = 1$ to $\ell$ do
2. $\quad w_i = \frac{1}{\ell}$
3. for $t = 1$ to $T$ do
4. $\quad \theta_t = \arg\min_{\theta} \sum_{i=1}^{\ell} w_i \bigl[ y_i h(x_i, \theta) < 0 \bigr]$
5. $\quad \varepsilon_t = \sum_{i=1}^{\ell} w_i \bigl[ y_i h(x_i, \theta_t) < 0 \bigr]$
6. $\quad \alpha_t = \frac{1}{2} \ln \dfrac{1 - \varepsilon_t}{\varepsilon_t}$
7. $\quad$ for $i = 1$ to $\ell$ do
8. $\quad\quad w_i = w_i \exp\bigl( -\alpha_t\, y_i\, h(x_i, \theta_t) \bigr)$
9. $\quad$ NORMALIZE$\bigl( (w_i)_{i=1}^{\ell} \bigr)$
10. return $a(x) = \mathrm{sign}\Bigl( \sum_{t=1}^{T} \alpha_t h(x, \theta_t) \Bigr)$
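A minimal NumPy sketch of this pseudocode, using depth-1 trees (stumps) as weighted base learners (all names and parameter values are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    # y must take values in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)                    # steps 1-2: uniform weights
    alphas, stumps = [], []
    for t in range(T):                         # step 3
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=w)           # step 4: weighted base learner
        pred = h.predict(X)
        eps = max(w[pred != y].sum(), 1e-10)   # step 5: weighted error (guarded)
        alpha = 0.5 * np.log((1 - eps) / eps)  # step 6
        w = w * np.exp(-alpha * y * pred)      # steps 7-8: reweight objects
        w = w / w.sum()                        # step 9: NORMALIZE
        alphas.append(alpha)
        stumps.append(h)
    return alphas, stumps

def ada_predict(alphas, stumps, X):
    # step 10: sign of the weighted vote
    votes = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(votes)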
Main boosting theorem
Define the margin error rate
$$\nu_\theta(a, X^\ell) = \frac{1}{\ell} \sum_{i=1}^{\ell} \bigl[\, y_i\, a(x_i) \le \theta \,\bigr].$$
Then, with probability at least $1 - \delta$, for any $\theta > 0$,
$$P\bigl( y\, a(x) < 0 \bigr) \le \nu_\theta(a, X^\ell) + O\!\left( \sqrt{\frac{\ln \ell\, \ln |H|}{\ell\, \theta^2}} + \sqrt{\frac{1}{\ell} \ln \frac{1}{\delta}} \right).$$
The bound does not depend on the number of iterations $T$, which is why boosting is hard to overfit.
Advantages:
• hard to overfit
• can be applied with different loss functions
Disadvantages:
• sensitive to noisy data (no noise handling)
• does not work well with strong (already accurate) base algorithms
• the resulting ensemble is hard to interpret