
TECHNICAL REPORT WRITING FOR CA2 EXAMINATION

Topic: Decision Tree

NAME: AISHEE DUTTA


UNIVERSITY ROLL NO.: 18730522017
REGISTRATION NO.: 221870110320
PAPER NAME: Economics For Engineers
PAPER CODE: HSMC 301
STREAM: CSE (DATA SCIENCE)
BATCH: 2022-2026
INTRODUCTION

Decision trees are a popular and powerful machine learning algorithm used for both
classification and regression tasks. They are also the building blocks of ensemble
methods such as Random Forests and Gradient Boosting. Decision trees are widely applied
across domains including finance, healthcare, marketing, and natural language processing.
This technical report provides an in-depth look at decision trees: their construction,
advantages, disadvantages, and practical considerations.

Decision Tree Basics


1.1. Tree Structure
A decision tree is a hierarchical structure resembling an inverted tree, where each node
represents a decision or test on a specific feature, and each branch represents the outcome of
that decision. The leaves of the tree contain the final prediction or decision.
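The structure described above can be sketched in a few lines of Python. This is an illustrative model of a tree, not a learner: the `Node` class and its field names are assumptions made for this example, where internal nodes test one feature against a threshold and leaves hold the final prediction.

```python
# A minimal sketch of a decision tree's structure (illustrative, not a full learner).
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, prediction=None):
        self.feature = feature        # index of the feature tested at this node
        self.threshold = threshold    # split point: go left if x[feature] <= threshold
        self.left = left              # subtree for the "yes" branch
        self.right = right            # subtree for the "no" branch
        self.prediction = prediction  # class label, set only on leaf nodes

def predict(node, x):
    """Walk from the root to a leaf, following the test at each internal node."""
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# Example: a hand-built tree that classifies points by their first coordinate.
tree = Node(feature=0, threshold=5.0,
            left=Node(prediction="small"),
            right=Node(prediction="large"))
print(predict(tree, [3.2]))  # small
```

Prediction is thus a single root-to-leaf traversal, which is why inference on a fitted tree is fast even when training was expensive.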

1.2. Decision Criteria


At each internal node, a decision tree uses a criterion to split the data into subsets. Common
criteria include Gini impurity for classification tasks and mean squared error for regression
tasks. These criteria help the tree find the most informative features and optimal split points.
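The two criteria named above can be computed directly. The following is a plain-Python sketch of both, with no tree machinery attached; the function names are chosen for this example.

```python
from collections import Counter

def gini_impurity(labels):
    """Probability of misclassifying a random element if it were labeled
    according to the class distribution of this node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def mse(values):
    """Variance of the targets around their mean: the regression analogue of impurity."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(gini_impurity(["a", "a", "b", "b"]))  # 0.5 -- maximally impure for two classes
print(mse([1.0, 2.0, 3.0]))                 # 0.666...
```

A candidate split is scored by the weighted average of these values over the two child subsets; the split that lowers the score the most wins.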

1.3. Recursive Partitioning


The construction of a decision tree is a recursive process. It starts with the entire dataset and
repeatedly splits it into smaller subsets until a stopping condition is met, such as a maximum
tree depth or a minimum number of samples per leaf node.
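The recursion and its stopping conditions can be made concrete with a deliberately simplified builder: it works on a single numeric feature, scores candidate thresholds by weighted Gini impurity, and stops on a pure node, a depth limit, or too few samples. The dict-based tree format and the parameter names `max_depth` and `min_samples` are assumptions for this sketch.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build(points, labels, depth=0, max_depth=3, min_samples=2):
    """Recursively partition 1-D points until a stopping condition is met."""
    # Stopping conditions: pure node, depth limit, or too few samples to split.
    if len(set(labels)) == 1 or depth >= max_depth or len(points) < min_samples:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Try each candidate threshold; keep the split with the lowest weighted Gini.
    best = None
    for t in sorted(set(points))[:-1]:
        left = [y for x, y in zip(points, labels) if x <= t]
        right = [y for x, y in zip(points, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t)
    t = best[1]
    return {
        "threshold": t,
        "left": build([x for x in points if x <= t],
                      [y for x, y in zip(points, labels) if x <= t],
                      depth + 1, max_depth, min_samples),
        "right": build([x for x in points if x > t],
                       [y for x, y in zip(points, labels) if x > t],
                       depth + 1, max_depth, min_samples),
    }

tree = build([1.0, 2.0, 8.0, 9.0], ["low", "low", "high", "high"])
print(tree["threshold"])  # 2.0 -- the split separating the two clusters
```

Real implementations differ in scale and bookkeeping, but the shape is the same: choose a split, partition, recurse, stop.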

Decision Tree Construction


2.1. Splitting Criteria
Decision trees aim to maximize information gain or minimize impurity at each split. Common
splitting criteria include:

• Gini Impurity: Measures the probability of misclassifying a randomly chosen element
if it were labeled according to the node's class distribution.


• Entropy: Measures the level of disorder or uncertainty in the data.
• Information Gain: Measures the reduction in entropy or impurity after a split.
• Mean Squared Error (MSE): Measures the variance of the target variable within a node,
used for regression splits.
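Entropy and information gain, the second and third criteria above, are closely related: information gain is simply the drop in entropy from parent to children. A minimal sketch (function names chosen for this example):

```python
import math
from collections import Counter

def entropy(labels):
    """Disorder in a label set: 0 for a pure node, 1 bit for a 50/50 binary split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy after splitting `parent` into the `children` subsets."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0 -- a perfect split
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0 -- nothing learned
```

The tree greedily picks, at each node, the feature and threshold with the highest information gain (or lowest impurity).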

2.2. Pruning
Pruning is a technique to prevent overfitting by removing branches of the tree that do not
significantly improve predictive accuracy. Pre-pruning stops growth early, for example by
setting a maximum depth or a minimum node size; post-pruning grows the tree fully and then
removes branches that do not help on held-out data.
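One concrete pruning strategy is reduced-error pruning: collapse a subtree to its majority leaf whenever the leaf predicts a validation set at least as well. The dict-based tree format and the hand-built example below are illustrative assumptions, not a standard API.

```python
def predict(node, x):
    """Route a 1-D sample through a dict-based tree (format is illustrative)."""
    if "leaf" in node:
        return node["leaf"]
    return predict(node["left"] if x <= node["threshold"] else node["right"], x)

def accuracy(node, data):
    return sum(predict(node, x) == y for x, y in data) / len(data)

def prune(node, data):
    """Reduced-error pruning: replace a subtree with its majority leaf
    whenever the leaf is at least as accurate on held-out validation data."""
    if "leaf" in node:
        return node
    node["left"] = prune(node["left"], [(x, y) for x, y in data if x <= node["threshold"]])
    node["right"] = prune(node["right"], [(x, y) for x, y in data if x > node["threshold"]])
    labels = [y for _, y in data]
    leaf = {"leaf": max(set(labels), key=labels.count)} if labels else None
    if leaf and accuracy(leaf, data) >= accuracy(node, data):
        return leaf
    return node

# A tree whose deep right branch memorized noise in the training data.
tree = {"threshold": 5.0,
        "left": {"leaf": "A"},
        "right": {"threshold": 7.0, "left": {"leaf": "B"}, "right": {"leaf": "A"}}}
validation = [(2.0, "A"), (6.0, "B"), (8.0, "B")]
pruned = prune(tree, validation)
print(pruned["right"])  # {'leaf': 'B'} -- the noisy sub-split was collapsed
```

The useful root split survives because it still beats a single leaf on the validation data; only the branch that memorized noise is removed.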

Advantages of Decision Trees


• Interpretability: Decision trees are easy to visualize and interpret, making them a
valuable tool for explaining model predictions.
• Non-linearity: They can capture non-linear relationships in the data.
• Feature Importance: Decision trees can rank features by their importance in making
decisions.
• Robustness: They perform well with both categorical and numerical data.

Disadvantages of Decision Trees


• Overfitting: Decision trees can easily overfit the training data if not properly pruned or
regularized.
• Instability: Small changes in the data can result in significantly different tree structures.
• Bias: Decision trees tend to favor features with more levels or categories.
• Limited Expressiveness: They may struggle with complex relationships that require
deep trees.

Practical Considerations
3.1. Handling Missing Values
Decision trees can handle missing values by either imputing them or using surrogate splits.
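As a minimal sketch of the imputation option (surrogate splits, used by CART, instead route a sample via a correlated feature and are harder to show briefly), missing entries can be filled with the mean of the observed values; the function name is chosen for this example.

```python
def impute(column):
    """Fill missing entries (None) with the mean of the observed values --
    a simple imputation strategy a tree's input pipeline might apply."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

print(impute([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```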

3.2. Categorical Data


Categorical data can be handled by methods like one-hot encoding or label encoding, depending
on the algorithm and problem.
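One-hot encoding can be written in a few lines without any library; this sketch maps each category to its own 0/1 column (the function name is chosen for this example).

```python
def one_hot(values):
    """One-hot encode a list of categories: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

rows, cols = one_hot(["red", "green", "red", "blue"])
print(cols)     # ['blue', 'green', 'red']
print(rows[0])  # [0, 0, 1] -- "red" maps to the third column
```

Label encoding (mapping categories to integers) is more compact but imposes an artificial ordering, which a threshold-based split may exploit incorrectly; one-hot encoding avoids that at the cost of more columns.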
3.3. Feature Scaling
Unlike some algorithms (e.g., k-means), decision trees are not sensitive to feature scaling.

3.4. Ensemble Methods


Ensemble methods like Random Forests and Gradient Boosting are often used to improve
decision tree performance and reduce overfitting.
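The core idea behind bagging ensembles like Random Forests can be sketched with depth-1 trees ("stumps"): fit each on a bootstrap resample, then take a majority vote. Everything here is a simplified illustration on 1-D data, with names invented for the example.

```python
import random
from collections import Counter

def fit_stump(data):
    """Fit a depth-1 tree on (x, label) pairs: pick the threshold with fewest errors."""
    best = None
    for t in sorted({x for x, _ in data}):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        pl = Counter(left).most_common(1)[0][0]
        pr = Counter(right).most_common(1)[0][0] if right else pl
        errs = sum(y != pl for y in left) + sum(y != pr for y in right)
        if best is None or errs < best[0]:
            best = (errs, t, pl, pr)
    _, t, pl, pr = best
    return lambda x: pl if x <= t else pr

def bagged_predict(stumps, x):
    """Majority vote across stumps, each fit on a bootstrap resample."""
    return Counter(s(x) for s in stumps).most_common(1)[0][0]

rng = random.Random(0)
data = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (7.0, "B"), (8.0, "B"), (9.0, "B")]
stumps = [fit_stump([rng.choice(data) for _ in data]) for _ in range(25)]
print(bagged_predict(stumps, 2.0), bagged_predict(stumps, 8.5))
```

Individual stumps vary because each sees a different resample, which is exactly the instability noted earlier; averaging their votes smooths it out, which is why bagging stabilizes decision trees.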
CONCLUSION

Decision trees are versatile and interpretable machine learning models that have found
applications in various domains. Understanding their construction, strengths, and weaknesses
is crucial for effectively using them in practice. Careful pruning and the use of ensemble
methods can mitigate their limitations and enhance their predictive power. In the ever-evolving
field of machine learning, decision trees remain a valuable tool for both beginners and
experienced practitioners.

