Proposal Defense v6
• Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In ICML (Vol. 96, pp. 148–156).
• Eberhart, R. C., & Kennedy, J. (1995). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (pp. 39–43).
Research Background
• There are various top-down decision tree induction algorithms, such as ID3, C4.5, and CART.
• C4.5 and CART consist of two conceptual phases: Growing and
Pruning.
• C4.5 algorithm
Developed by Ross Quinlan as an extension of ID3 for building decision tree classifiers/models.
Uses information gain/gain ratio to select the best attribute to use as a node in the tree for splitting the dataset, constructing the decision tree in a top-down, recursive, divide-and-conquer manner.
• Ensemble techniques construct multiple classifiers from the original data set and aggregate their predictions when classifying unknown instances.
Research Background
• Top-down decision tree induction algorithms: ID3 and C4.5.
ID3: handles discrete values; does not handle missing or continuous values; does not perform pruning; uses information gain as the splitting criterion.
C4.5: handles continuous and discrete values; handles missing values; performs pruning; uses gain ratio as the splitting criterion.
Research Background
C4.5 Decision Tree Induction Algorithm
Input: Training set S of n examples
Output: decision tree with root R;
1. If the instances in S all belong to the same class, or the number of instances in S is too small, set R as a leaf node and label it with the most frequent class in S;
2. Otherwise, choose a test attribute X with two or more values (outcomes) based on a selection criterion, and label the node R with X;
3. Partition S into subsets S1, S2, …, Sm, one per outcome of attribute X; generate m child nodes R1, R2, …, Rm;
4. For every pair (Si, Ri), recursively build a subtree with root Ri.
5. Recursive partitioning completes only when all instances belong to the same class, no more attributes remain, or no instances are left (a minimal code sketch follows).
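For illustration, a minimal Python sketch of this recursive procedure, assuming discrete-valued attributes and a majority-class leaf rule; C4.5's pruning and its handling of continuous and missing values are omitted:

# Minimal sketch of top-down decision tree induction with gain ratio.
# Assumptions: rows are dicts of attribute -> discrete value; no pruning,
# continuous values, or missing values (unlike full C4.5).
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    gain = entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    split_info = entropy([row[attr] for row in rows])   # entropy of the split itself
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs, min_size=2):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: stop if pure, too few instances, or no attributes remain -> leaf.
    if len(set(labels)) == 1 or len(rows) < min_size or not attrs:
        return majority
    # Step 2: choose the test attribute X with the highest gain ratio.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    node = {"attr": best, "children": {}, "default": majority}
    # Step 3: partition S into one subset per outcome of the chosen attribute.
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[best], ([], []))
        parts[row[best]][0].append(row)
        parts[row[best]][1].append(y)
    # Step 4: recursively build a subtree for each (Si, Ri) pair.
    rest = [a for a in attrs if a != best]
    for value, (sub_rows, sub_labels) in parts.items():
        node["children"][value] = build_tree(sub_rows, sub_labels, rest, min_size)
    return node

def predict(tree, row):
    while isinstance(tree, dict):                       # descend until a leaf label
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree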
Research Background
• Ensemble learning is defined as learning multiple
classifiers using different training data or different
learning algorithms
• Combine decisions of multiple classifiers, e.g. using
majority voting, weighted voting or Bayesian voting.
• Applied to improve classifiers and their predictive accuracy.
• Ensemble methods
Boosting
Gradient Boosting/Stochastic Gradient Boosting
Bagging
Stacking
Research Background
• Ensemble Methods – Generic Approach:
Three main steps exist in an ensemble model are:
1. training set generation,
2. learning, and
3. integration.
Step 1 begins with the original training set S. From this training set, t data subsets (S1, S2, …, St) are created. Bagging and boosting are common ways to accomplish this step.
In Step 2, t base classifiers (I1, I2, …, It) are generated using different training data or different learning algorithms. These classifiers may all be the same, all different, or any combination of the same and different classifiers. Each classifier Ii is trained on the subset Si.
In Step 3, the predictions of the classifiers are combined in a predetermined way to produce the resulting classification (a minimal sketch follows).
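A minimal sketch of these three steps, assuming bootstrap-sampled subsets, decision-tree base classifiers, and simple majority voting (X and y are assumed to be NumPy arrays, with non-negative integer class labels):

# Sketch of the generic three-step ensemble scheme (assumptions: X, y are
# NumPy arrays, y holds non-negative integer class labels).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_predict(X, y, X_test, t=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(t):
        # Step 1: create subset S_i (here: a bootstrap sample of the data).
        idx = rng.integers(0, n, size=n)
        # Step 2: train base classifier I_i on subset S_i.
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    # Step 3: integrate the t predictions with simple majority voting.
    votes = np.array([clf.predict(X_test) for clf in classifiers])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)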
Research Background
Ensemble Methods – Generic Approach:
Two primary approaches exist to the integration phase:
combination and selection.
1. In the combination approach, the base classifiers produce their class predictions and the final outcome is composed from those predictions.
2. In the selection approach, one of the classifiers is selected and the final prediction is the one it produces.
The most commonly used integration techniques are
1. voting,
2. simple and weighted averaging, and
3. a posteriori (Bayesian).
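A small sketch contrasting the two integration approaches, assuming a list of already-fitted scikit-learn-style classifiers (with predict/predict_proba) and a held-out validation set:

# Combination vs. selection (assumed inputs: fitted classifiers, weights,
# and a validation set for the selection approach).
import numpy as np

def combine_weighted_average(classifiers, weights, X):
    # Combination: weighted average of the per-class probability outputs.
    probs = np.average([c.predict_proba(X) for c in classifiers],
                       axis=0, weights=weights)
    return probs.argmax(axis=1)

def select_best(classifiers, X_val, y_val, X):
    # Selection: use only the classifier with the best validation accuracy.
    best = max(classifiers, key=lambda c: (c.predict(X_val) == y_val).mean())
    return best.predict(X)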
Research Background
• Use a single, arbitrary learning algorithm but
manipulate training data to make it learn multiple
models.
– Data1, Data2, …, Data m (different training sets)
– Learner1 = Learner2 = … = Learner m (the same base learning algorithm)
• Different methods for changing training data:
– Bagging: Resample training data (Bootstrap sampling)
– Boosting: Reweight training data
– DECORATE: Add additional artificial training data
• In WEKA, these are called meta-learners; they take a learning algorithm as an argument (the base learner) and create a new learning algorithm (an analogous scikit-learn example is sketched below).
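As a hedged analogue (the WEKA API itself is not reproduced here), scikit-learn's ensemble classes follow the same pattern and take a base learner as an argument; the parameter is named estimator in recent scikit-learn versions and base_estimator in older ones:

# Meta-learners that wrap a base learner, scikit-learn style.
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

base = DecisionTreeClassifier(max_depth=1)                      # base learner
bagged = BaggingClassifier(estimator=base, n_estimators=25)     # resampling
boosted = AdaBoostClassifier(estimator=base, n_estimators=25)   # reweighting
# bagged.fit(X_train, y_train); boosted.fit(X_train, y_train)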
Research Background
• Ensemble Method -Bagging
Create ensembles by repeatedly and randomly resampling the training data (Breiman, 1996).
Given a training set of size n, create m samples of size n by drawing n examples from the original data with replacement.
− Each bootstrap sample will on average contain 63.2% of
the unique training examples, the rest are replicates.
Combine the m resulting models using simple majority voting.
Decreases error by decreasing the variance in the results due to unstable learners, i.e., algorithms (like decision trees) whose output can change dramatically when the training data is slightly changed.
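A short numerical illustration of bootstrap sampling and the ≈63.2% figure (the sample size and number of repetitions below are arbitrary choices):

# Each bootstrap sample of size n, drawn with replacement, contains about
# 1 - 1/e ≈ 63.2% of the unique training examples on average.
import numpy as np

rng = np.random.default_rng(0)
n, repeats = 10_000, 50
fractions = []
for _ in range(repeats):
    sample = rng.integers(0, n, size=n)               # one bootstrap sample
    fractions.append(len(np.unique(sample)) / n)      # fraction of unique examples
print(f"mean unique fraction: {np.mean(fractions):.3f}")   # ~0.632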
Research Background
• Ensemble Methods - Boosting
Train multiple classifiers using the training data in the following way:
‾ Look at errors from previous classifiers to decide what to
focus on in the next training iteration.
‾ Each new classifier depends on its predecessors’ errors.
Result: more weight on ‘hard’ samples (the ones where
classifiers committed mistakes in the previous iterations).
• Predict outcome for a previously unseen sample by
aggregating predictions made by the multiple models
(ensembles).
• "Hard" and "difficult" samples are used as synonyms here.
Research Background
• Ensemble Method: AdaBoost learning algorithm.
1. Assume that the learning algorithm accepts
weighted examples
2. At each step, AdaBoost increases the weights of
examples from the learning sample misclassified
by the previous model
3. Thus, the algorithm focuses on the hard/difficult
samples from the learning samples
4. In the weighted majority vote, AdaBoost gives
higher influence to the more accurate models
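A sketch of the reweighting rule in step 2, using the AdaBoost.M1 update β = ε/(1−ε): the weights of correctly classified examples are scaled down by β and then renormalized, which raises the relative weight of the misclassified ("hard") examples. A full AdaBoost.M1 training loop is sketched later, under the AdaBoost.M1 slide.

# One AdaBoost.M1 reweighting step (w: current weights, misclassified: bool
# mask from the current model, eps: its weighted error, assumed < 0.5).
import numpy as np

def reweight(w, misclassified, eps):
    beta = eps / (1.0 - eps)
    w = np.where(misclassified, w, w * beta)   # shrink weights of correct examples
    return w / w.sum()                         # renormalize to a distribution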
Research Background
• Ensemble Methods: AdaBoost (adaptive boosting):
1. AdaBoost (two-class).
2. AdaBoost.M1 and AdaBoost.M2 (multi-class).
3. AdaBoost.R (regression).
• AdaBoost.M1 is the most popular (classification with more than two classes).
• Other families (what changes is the weight/loss
function and the voting function):
LogitBoost
L2Boost
Research Background
Two-class weighted vote:
$$H(x) = \operatorname{sign}\Bigl(\sum_t \alpha_t h_t(x)\Bigr)$$
Multi-class weighted vote (AdaBoost.M1):
$$H(x) = \arg\max_y \sum_{t:\, h_t(x)=y} \ln\frac{1}{\beta_t}$$
Research Background
AdaBoost.M1 Algorithm
• Freund & Schapire (1996)
• Each classifier is generated with different training
set obtained from the original dataset using
resampling or reweighting techniques.
• Creates an ensemble of classifiers, each of which casts a weighted vote
• Is used to boost decision trees
Research Background
AdaBoost.M1 Algorithm for Multiclass
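The embedded pseudocode is not reproduced here; below is a minimal sketch of AdaBoost.M1 with reweighting and the weighted vote H(x) = argmax_y Σ_{t: h_t(x)=y} ln(1/β_t), assuming integer class labels and a base learner that accepts per-example weights (a depth-1 scikit-learn decision tree is used purely as an example):

# Minimal AdaBoost.M1 sketch (reweighting variant).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, T=25):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start from a uniform distribution
    models, betas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        wrong = h.predict(X) != y
        eps = max(w[wrong].sum(), 1e-12)         # weighted error (avoid log(0) later)
        if eps >= 0.5:                           # M1 stopping condition
            break
        beta = eps / (1.0 - eps)
        w = np.where(wrong, w, w * beta)         # downweight correctly classified examples
        w /= w.sum()
        models.append(h)
        betas.append(beta)
    return models, betas

def adaboost_m1_predict(models, betas, X, n_classes):
    # Weighted vote: each model h_t adds ln(1/beta_t) to the class it predicts.
    scores = np.zeros((len(X), n_classes))
    for h, beta in zip(models, betas):
        scores[np.arange(len(X)), h.predict(X)] += np.log(1.0 / beta)
    return scores.argmax(axis=1)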
Research Background
AdaBoost.M2 Algorithm:
• Freund et al. (1999)
• Extension of AdaBoost.M1 to multiclass problems that makes use of the base classifiers' confidence rates
Research Background
AdaBoost.M2 Algorithm for Multiclass
Research Background
PSO – Particle Swarm Optimization
• Developed in 1995 by James Kennedy and Russell Eberhart
• Basically, PSO works as follows:
Each particle is searching for the optimum
Each particle is moving and hence has a velocity.
Each particle remembers the position it was in where it had its
best result so far (its personal best)
A particle has a neighborhood associated with it.
A particle knows the fitness of the particles in its neighborhood, and uses the position of the one with the best fitness.
This position is simply used to adjust the particle's velocity.
Research Background
PSO - Pseudo code
For each particle
Initialize particle
END
Do
For each particle
Calculate fitness value
If the fitness value is better than its personal best
set current value as the new pBest
End
Choose the particle with the best fitness value of all as gBest
For each particle
Calculate particle velocity
Update particle position
End
While the maximum number of iterations or the minimum error criterion is not attained
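A runnable Python version of the pseudocode above, for a real-valued minimization problem with the standard global-best velocity update; the inertia weight w, acceleration coefficients c1 and c2, swarm size, and bounds are illustrative choices, not prescribed values:

# Runnable sketch of the global-best PSO pseudocode above (minimization).
import numpy as np

def pso(fitness, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))       # initialize positions
    v = np.zeros((n_particles, dim))                  # initialize velocities
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()          # best particle overall
    for _ in range(iters):
        for i in range(n_particles):
            val = fitness(x[i])
            if val < pbest_val[i]:                    # update personal best
                pbest_val[i], pbest[i] = val, x[i].copy()
        gbest = pbest[pbest_val.argmin()].copy()      # update global best
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)                    # update positions
    return gbest, pbest_val.min()

# Example: minimize the sphere function f(x) = sum(x_i^2).
best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=5)
print(best_x, best_f)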
Literature Review on
Sampling Techniques
• Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.
Literature Review on
Assessment Metrics and Wrapper
• Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2), 273–324.
Confusion matrix:
                 Predicted Positive   Predicted Negative
Positive Class          TP                   FN
Negative Class          FP                   TN
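For illustration, the common assessment metrics derived from these four counts (the metrics actually adopted in this work may differ):

# Metrics computed from the confusion-matrix counts.
def metrics(tp, fn, fp, tn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    recall    = tp / (tp + fn)                  # true positive rate / sensitivity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_measure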
Data
• Undersampling: Random Undersampling, Condensed Nearest Neighbour Rule, Tomek Links, Evolutionary Prototype Selection
• Oversampling: Random Oversampling, SMOTE, Borderline-SMOTE1, Borderline-SMOTE2
Algorithm
• Cost-Sensitive Learning
• One-Class Learning
Methodology
• A hybrid ensemble decision tree is proposed for solving binary and multiclass imbalanced classification problems.