Professional Documents
Culture Documents
ADII10 Analisa Outlier
ADII10 Analisa Outlier
Analisa Outlier
2
Outlier Detection Method
3
-
4
-
Note:
This slides are based on the additional material provided with the textbook that we use: J. Han, M. Kamber and J. Pei, “Data Mining:
Concepts and Techniques” and P. Tan, M. Steinbach, and V. Kumar "Introduction to Data Mining“.
Outlier and Outlier Analysis
What Are Outliers?
5
What Are Outliers?
● Outlier detection vs. novelty detection: early stage, outlier; but later
merged into the model
○ In "novelty detection", you have a data set that contains only
want to identify
6
Application of Outlier Detection
Distribution of
normal data
Types of Outliers (I)
12
Global Outlier
● Definition:
○ Global outliers are data points that deviate significantly from
set
● Example:
○ Intrusion detection in computer networks
● Causes:
○ Errors in data collection, measurement errors, or truly unusual
● Impact:
○ Global outliers can distort data analysis results and affect
● Detection:
○ Techniques include statistical methods (e.g., z-score,
● Handling:
○ Options may include removing or correcting outliers,
● Considerations:
○ Carefully considering the impact of global outliers is crucial
● Definition:
○ Contextual outliers are data points that deviate significantly from the
expected behavior within a specific context or subgroup.
● Characteristics:
○ Contextual outliers may not be outliers when considered in the entire
dataset, but they exhibit unusual behavior within a specific context or
subgroup.
● Detection:
○ Techniques for detecting contextual outliers include contextual
clustering, contextual anomaly detection, and context-aware machine
learning approaches.
Contextual Outlier
● Detection:
○ Techniques for detecting contextual outliers include
● Handling:
○ Handling contextual outliers may involve considering the
● Definition:
○ A subset of data objects collectively deviate significantly from
the whole data set, even if the individual data objects may not
be outliers
● Applications:
○ e.g., intrusion detection: When a number of computers keep
of groups of objects
○ Need to have the background knowledge on the relationship
● Definition:
○ Collective outliers are groups of data points that collectively
● Impact:
○ Collective outliers can represent interesting patterns or
● Considerations:
○ Detecting and interpreting collective outliers can be more
application
○ The border between normal and outlier objects is often a gray
area
● Application-specific outlier detection
○ Choice of distance measure among objects and the model of
outlier detection
● Understandability
○ Understand why these are outliers: Justification of the
detection
○ Specify the degree of an outlier: the unlikelihood of the object
obtained:
■ Supervised, semi-supervised vs. unsupervised methods
methods
Outlier Detection I: Supervised Methods
● Challenges
○ Imbalanced classes, i.e., outliers are rare: Boost the outlier
class and make up some artificial outliers
○ Catch as many outliers as possible, i.e., recall is more
important than accuracy (i.e., not mislabeling normal objects
as outliers)
Outlier Detection II: Unsupervised Methods