Decision Tree Algorithm

Guided By: Mrs Vani V Prakash, Assistant Professor, Dept of CSE
Presented By: Anoop N M, S7 CSE, CMA20CS011
Contents
1. Introduction
2. Phishing
3. Decision Tree
4. How Decision Trees can be used to detect phishing attacks
5. Architecture of Decision Tree
6. Features of Decision Tree
7. Advantages and disadvantages of Decision Tree
8. How the algorithm works
9. Conclusion
10. References
Introduction
● Phishing attacks, a prevalent form of cyber threat, continue to pose
significant risks to individuals and organizations.
● In this context, leveraging advanced technologies like the Decision Tree
algorithm becomes crucial for enhancing security measures.
Phishing
Phishing attacks are a type of social engineering attack that attempts to trick
users into revealing sensitive information, such as passwords, credit card
numbers, or other personal details. These attacks can be carried out through a
variety of channels, including email, SMS, social media, and websites.
Decision Tree
● A decision tree is a type of supervised learning algorithm that is commonly
used in machine learning to model and predict outcomes based on input
data. It is a tree-like structure where each internal node tests an attribute,
each branch corresponds to an attribute value, and each leaf node represents
the final decision or prediction.
● Decision tree classifiers have been used in a wide variety of applications,
including medical diagnosis, fraud detection, and customer segmentation.
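The structure described above can be sketched in a few lines of Python. This is only an illustration: the attribute names (`has_ip_address`, `uses_https`) and the hand-built tree are hypothetical and are not taken from any trained model.

```python
# Hypothetical hand-built tree: internal nodes are dicts, leaves are label strings.
TREE = {
    "attribute": "has_ip_address",          # internal node: tests one attribute
    "branches": {                            # one branch per attribute value
        True: "phishing",                    # leaf: final prediction
        False: {
            "attribute": "uses_https",
            "branches": {True: "legitimate", False: "phishing"},
        },
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


sample = {"has_ip_address": False, "uses_https": True}
print(predict(TREE, sample))
```

Prediction is a single walk from the root to one leaf, so its cost grows with the depth of the tree rather than with the size of the training set.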
WHAT IS MACHINE LEARNING?
● Machine learning is a subset of artificial intelligence.
● The term "machine learning" was first introduced by Arthur Samuel in
1959.
● It is mainly concerned with developing algorithms that allow a
computer to learn on its own from data and past experience.
How Decision Trees can be used to detect Phishing attacks
● Decision trees effectively identify phishing attacks by analyzing URL
features and website contents.
● They classify websites as legitimate or phishing based on specific
rules derived from training data.
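As a rough illustration of what "URL features" can mean in practice, the sketch below turns a URL into a small feature dictionary using only the Python standard library. The feature set and the function name `extract_url_features` are assumptions made for this example; the presentation does not specify which features were used.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Turn a URL into a small dict of features (an illustrative, assumed feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
        "num_dots_in_host": host.count("."),
        # True when the host looks like a dotted-quad IPv4 address instead of a domain name.
        "host_is_ip": re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,
    }


print(extract_url_features("http://192.168.0.1/login@secure"))
```

A dictionary like this is the kind of input a decision tree would test, one feature per internal node.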
❖ Step 1: Collect data
❖ Step 2: Preprocess the data
❖ Step 3: Train the decision tree
❖ Step 4: Evaluate the decision tree
❖ Step 5: Deploy the decision tree
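The five steps above can be sketched end to end in plain Python. Everything here is an assumption for illustration: the toy dataset is invented, the three binary features (`host_is_ip`, `uses_https`, `has_at_symbol`) are hypothetical, and the small Gini-based tree builder is a simplified stand-in rather than the method used in any of the cited studies.

```python
from collections import Counter

# Step 1: collect data. Invented toy samples: ((host_is_ip, uses_https, has_at_symbol), label).
DATA = [
    ((1, 0, 0), "phishing"), ((1, 1, 0), "phishing"),
    ((0, 0, 1), "phishing"), ((0, 1, 1), "phishing"),
    ((0, 1, 0), "legitimate"), ((0, 0, 0), "legitimate"),
    ((1, 1, 1), "phishing"), ((0, 1, 0), "legitimate"),
    ((1, 0, 1), "phishing"), ((0, 0, 0), "legitimate"),
]

# Step 2: preprocess. The features are already 0/1, so we only hold out a test set.
train, test = DATA[:8], DATA[8:]


def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def build_tree(rows, n_features):
    """Recursively split on the binary feature with the lowest weighted Gini impurity."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # pure node becomes a leaf
    best = None
    for i in range(n_features):
        zeros = [r for r in rows if r[0][i] == 0]
        ones = [r for r in rows if r[0][i] == 1]
        if not zeros or not ones:
            continue  # this feature does not separate the rows
        score = (len(zeros) * gini(zeros) + len(ones) * gini(ones)) / len(rows)
        if best is None or score < best[0]:
            best = (score, i, zeros, ones)
    if best is None:
        return majority  # no useful split left
    _, i, zeros, ones = best
    return {"feature": i, 0: build_tree(zeros, n_features), 1: build_tree(ones, n_features)}


def predict(tree, features):
    """Follow the branches matching the feature values until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree[features[tree["feature"]]]
    return tree


# Step 3: train the decision tree.
tree = build_tree(train, n_features=3)

# Step 4: evaluate on the held-out samples.
accuracy = sum(predict(tree, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")


# Step 5: deploy. Here "deployment" is just a function other code can call.
def classify(features):
    return predict(tree, features)
```

A real system would use far more data and a proper validation scheme; the point of the sketch is only to show where each step sits in the workflow.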
Architecture of Decision Tree
Features of Decision Tree
● Decision trees are a popular machine learning model known for their
simplicity and interpretability.
● They consist of nodes representing decisions or tests based on input
features, branches representing the outcomes of those decisions, and
leaves representing the final predictions.
● Decision trees are graphical, easy to interpret, and can be easily explained
to non-experts.
Advantages
❖ Interpretability
Decision trees are easy to understand and interpret, making it simple for
security professionals and analysts to grasp the logic behind the
decision-making process.
❖ Feature Importance
Decision trees can provide insights into the importance of different
features in making a decision. Features such as URL structure, domain
age, and presence of HTTPS can be weighed in terms of their significance
in determining whether a website is potentially malicious.
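One common way to quantify how much a feature contributes is information gain, the reduction in label entropy obtained by splitting on that feature. The sketch below ranks two features this way. The samples are invented, and `young_domain` is a hypothetical yes/no encoding of domain age chosen for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature):
    """Entropy reduction from splitting rows on one feature."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {features[feature] for features, _ in rows}:
        subset = [label for features, label in rows if features[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Invented samples: ({feature: value}, label)
ROWS = [
    ({"uses_https": True, "young_domain": True}, "phishing"),
    ({"uses_https": False, "young_domain": True}, "phishing"),
    ({"uses_https": True, "young_domain": False}, "legitimate"),
    ({"uses_https": False, "young_domain": False}, "legitimate"),
]

# Rank features from most to least informative on this toy data.
ranking = sorted(ROWS[0][0], key=lambda f: information_gain(ROWS, f), reverse=True)
print(ranking)
```

On this toy data `young_domain` separates the classes perfectly while `uses_https` carries no information, so the ranking reflects only the invented samples, not real phishing behaviour.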
❖ Efficiency
Decision trees can be efficient in terms of both training and prediction
times. They are relatively quick to build, and once trained, the
decision-making process involves traversing the tree structure, which is
computationally efficient.
❖ Robustness
Decision trees are robust to irrelevant features, meaning that the
algorithm can still perform well even if some of the input features are
not particularly informative.
Disadvantages
❖ Overfitting
Decision trees are prone to overfitting, especially when the tree is deep and
captures noise or specific details of the training data that do not generalize
well to new, unseen data.
❖ Instability
Decision trees can be sensitive to small variations in the training data. A
small change in the input data can lead to a completely different tree
structure, making the model less stable compared to some other machine
learning algorithms.
❖ Limited Handling of Missing Data
Decision trees can be sensitive to missing data. While there are
techniques to handle missing values, such as imputation, the inherent
structure of decision trees may not handle missing data as gracefully as
some other algorithms.
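As one example of the imputation mentioned above, the sketch below fills missing values of a single feature with that feature's most common observed value (mode imputation). The helper name `impute_mode` and the sample rows are hypothetical, and other strategies may suit a given dataset better.

```python
from collections import Counter


def impute_mode(rows, feature):
    """Return copies of rows where None values of `feature` are replaced by its mode.

    Assumes at least one row has an observed (non-None) value for the feature.
    """
    observed = [row[feature] for row in rows if row[feature] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, feature: mode if row[feature] is None else row[feature]}
        for row in rows
    ]


# Invented rows; None marks a missing value.
rows = [
    {"uses_https": True},
    {"uses_https": True},
    {"uses_https": None},
    {"uses_https": False},
]
filled = impute_mode(rows, "uses_https")
print(filled)
```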
DECISION TREE ALGORITHMS
Decision tree algorithms include:
1. ID3 (Iterative Dichotomiser 3)
2. C4.5 (Successor of ID3)
3. CART (Classification and Regression Trees)
4. CHAID (Chi-squared Automatic Interaction Detector)
5. QUEST (Quick, Unbiased, Efficient Statistical Tree)
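The first three algorithms differ mainly in how they score a candidate split. As a brief reminder, with S the set of samples at a node, p_k the fraction of S in class k, and S_v the subset of S where attribute A takes value v, the standard criteria are: ID3 uses information gain, C4.5 uses gain ratio, and CART (for classification) uses Gini impurity.

```latex
% Entropy of the class labels in S
H(S) = -\sum_{k} p_k \log_2 p_k

% Information gain (ID3)
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

% Gain ratio (C4.5): information gain normalised by the split information
\mathrm{GainRatio}(S, A) =
  \frac{\mathrm{Gain}(S, A)}
       {-\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}}

% Gini impurity (CART)
\mathrm{Gini}(S) = 1 - \sum_{k} p_k^{2}
```

CHAID and QUEST instead choose splits with statistical tests, as their names suggest.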
Literature Survey
● A Systematic Literature Review on Phishing Website Detection Techniques
This literature survey, published on ScienceDirect in 2022, provides a
comprehensive overview of phishing website detection techniques,
including decision tree algorithms. The authors surveyed 53 studies
that used decision trees for phishing detection and found that they
achieved an average accuracy of 95%.
● Phishing Classification Techniques: A Systematic Literature Review
This literature survey, published in IEEE Xplore in 2022, focuses on
phishing classification techniques, including decision trees. The
authors surveyed 43 studies that used decision trees for phishing
classification and found that they achieved an average accuracy of
96%. The authors also found that decision trees are more effective than
other classification techniques, such as support vector machines and
multilayer perceptrons, for phishing classification.
● Phishing Sites Detection Based on C4.5 Decision Tree Algorithm
This paper, published in IEEE Xplore in 2015, focuses on the use of
the C4.5 decision tree algorithm for phishing site detection. The
authors proposed a method that uses the C4.5 algorithm to classify
websites as phishing or legitimate based on URL features, and
reported that it achieved an accuracy of 94.26%.
Accuracy
For phishing detection, the paper reported that the Decision Tree algorithm
achieved a detection accuracy of 96.59%, together with the lowest false
positive rate.
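For reference, accuracy and false positive rate are both computed from the counts of a confusion matrix. The sketch below shows the arithmetic on a handful of invented labels; it does not reproduce the 96.59% figure, and the function name `accuracy_and_fpr` is hypothetical.

```python
def accuracy_and_fpr(y_true, y_pred, positive="phishing"):
    """Return (accuracy, false positive rate), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # legitimate site wrongly flagged as phishing
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / len(y_true)
    # FPR = FP / (FP + TN): the share of legitimate sites that were flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Invented labels for illustration only.
y_true = ["phishing", "phishing", "legitimate", "legitimate", "legitimate"]
y_pred = ["phishing", "legitimate", "legitimate", "phishing", "legitimate"]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={acc:.2f}, false positive rate={fpr:.2f}")
```

A low false positive rate matters in this setting because every false positive is a legitimate website that users would be warned away from.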
Conclusion
The Decision Tree algorithm serves as an essential asset in the proactive defense
against phishing attacks. Its simplicity, interpretability, and ability to handle both
numerical and categorical data make it a noteworthy choice for those seeking a
comprehensive and understandable solution in the ongoing battle to secure
digital environments.
References
● Authors, F. (2016). A hybrid firefly and support vector machine
classifier for phishing email detection.
● Gadge, L. M. J. (2017). Phishing sites detection based on C4.5.
● Mahajan, R. (2018). Phishing Website Detection using Machine
Learning Algorithms.
● Sonowal, G. (2020). Phishing Email Detection Based on Binary
Search Feature Selection.
● Shahrivari, V. (2020). Phishing Detection Using Machine Learning
Techniques.
THANK YOU
