Outlier and Class Imbalance: Dr. Manjubala Bisi
Assistant Professor
Department of Computer Science and Engineering
National Institute of Technology Warangal
manjubalabisi@nitw.ac.in
02/05/2023
Outlier
Class Imbalance Problem
Case Study
Visualization Technique
Box plot
Histogram
Scatter plot
Mathematical Function
Z Score
IQR (Inter Quartile Range) Score
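The two mathematical functions listed above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function names, the Z-score cutoff of 3, the IQR multiplier of 1.5, and the sample data are all assumptions chosen because they are common defaults.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z score exceeds `threshold` (assumed cutoff)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return [False] * len(values)  # all values identical: nothing to flag
    return [abs((v - mean) / std) > threshold for v in values]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical sample: eleven similar readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]
print([v for v, flag in zip(data, zscore_outliers(data)) if flag])
print([v for v, flag in zip(data, iqr_outliers(data)) if flag])
```

On small samples the Z score is bounded by the sample size, so the IQR rule is often the more sensitive of the two.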
Tomek links are pairs of very close instances that belong to opposite classes
Removing the majority-class instance of each pair increases the
space between the two classes, which makes classification easier
A Tomek link exists between two samples of opposite classes if they are
each other's nearest neighbors
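A brute-force sketch of this idea is shown below. It assumes Euclidean distance and small data sets; the function names and the toy points are hypothetical, and ties between equally near neighbors are broken by taking the lowest index.

```python
import math

def nearest_neighbor(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    others = (j for j in range(len(X)) if j != i)
    return min(others, key=lambda j: math.dist(X[i], X[j]))

def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbors with different labels."""
    links = []
    for i in range(len(X)):
        j = nearest_neighbor(i, X)
        # i < j avoids reporting the same pair twice.
        if i < j and y[i] != y[j] and nearest_neighbor(j, X) == i:
            links.append((i, j))
    return links

def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair
            if y[idx] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: class 0 is the majority, class 1 the minority.
X = [(0, 0), (0, 1), (1, 0), (3.0, 3.0), (3.2, 3.0), (6, 6)]
y = [0, 0, 0, 0, 1, 1]
print(tomek_links(X, y))
X_res, y_res = remove_majority_in_links(X, y, majority_label=0)
print(y_res)
```

Here the points at indices 3 and 4 form the only link, so only the majority point at index 3 is removed.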
Step 1: The method first computes the distances between every instance of
the majority class and every instance of the minority class. Here, the
majority class is the one to be under-sampled
Step 2: Then, the n majority-class instances that have the smallest
distances to the minority-class instances are selected
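The two steps can be sketched as below. The slide does not say how a majority instance's distances to the minority class are aggregated, so this sketch assumes the minimum distance to any minority instance; NearMiss-style variants instead average the distances to several nearest minority instances. The function name and data are hypothetical.

```python
import math

def undersample_nearest_majority(X, y, majority_label, n):
    """Keep all minority instances plus the n majority instances closest to them."""
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]

    # Step 1: distance from each majority instance to the minority class
    # (assumption: the smallest distance to any minority instance).
    def closeness(i):
        return min(math.dist(X[i], X[j]) for j in minority)

    # Step 2: select the n majority instances with the smallest distances.
    selected = sorted(majority, key=closeness)[:n]

    keep = sorted(selected + minority)
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data on a line: five majority points (0) and one minority point (1).
X = [(0, 0), (1, 0), (2, 0), (5, 0), (9, 0), (6, 0)]
y = [0, 0, 0, 0, 0, 1]
X_res, y_res = undersample_nearest_majority(X, y, majority_label=0, n=2)
print(X_res, y_res)
```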
Input : Training data set Dtr with m samples (xi, yi), i = 1, ..., m,
where xi is an instance in the n dimensional feature space X and yi is
the class identity label associated with xi
Define ms and ml as the number of minority class examples and the
number of majority class examples, respectively. Therefore, ms ≤ ml
and ms + ml = m
dth is a preset threshold for the maximum tolerated degree of class
imbalance ratio
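As a small illustration of these definitions, the sketch below counts ms and ml and compares a degree of imbalance against dth. The formula d = ms / ml is an assumption (it is the convention used by ADASYN-style methods and is not stated in the lines above), and the function names and threshold value are hypothetical.

```python
from collections import Counter

def imbalance_degree(labels):
    """Return (m_s, m_l, d) for a two-class label list, with d = m_s / m_l (assumed)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    m_s, m_l = sorted(counts.values())  # minority count first, so m_s <= m_l
    return m_s, m_l, m_s / m_l

def needs_resampling(labels, d_th):
    """True when the degree of imbalance falls below the preset threshold d_th."""
    _, _, d = imbalance_degree(labels)
    return d < d_th

# Hypothetical training labels: 2 minority and 8 majority examples (m = 10).
labels = [1] * 2 + [0] * 8
print(imbalance_degree(labels))
print(needs_resampling(labels, d_th=0.75))
```

Under this convention d lies in (0, 1], and a value of 1 means the two classes are perfectly balanced.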
TPR = TP/(TP+FN)
TNR = TN / (TN+FP)
Accuracy = (TP+TN)/(TP+FP+TN+FN)
Recall = TP /(TP+FN)
Precision = TP / (TP+FP)
F-measure = 2*Precision*Recall/ (Precision + Recall)
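These measures can be computed directly from the four confusion-matrix counts, as in the sketch below. Precision is taken in its standard form, TP / (TP + FP). The function name, the zero-denominator handling, and the example counts are assumptions made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation measures from confusion-matrix counts."""
    def ratio(num, den):
        # Assumption: report 0.0 when a denominator is zero.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # same quantity as TPR
    precision = ratio(tp, tp + fp)
    return {
        "TPR": recall,
        "TNR": ratio(tn, tn + fp),
        "Accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "Recall": recall,
        "Precision": precision,
        "F-measure": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical counts for an imbalanced test set.
for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data, accuracy alone can look high even when the minority class is poorly recognized, which is why TPR, TNR, and the F-measure are reported alongside it.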