“DECISION TREE ANALYSIS USING MACHINE LEARNING”

An internship report submitted in partial fulfilment of the requirements for the award of the degree of

MASTER OF COMPUTER APPLICATIONS


of

Visvesvaraya Technological University

By
SANJAY D V
4UB22MC087
Under the Guidance of
Mr. CHETAN KUMAR G S
Asst. Professor (Ad-hoc)
Department of MCA

Department of Master of Computer Applications,

UNIVERSITY B.D.T COLLEGE OF ENGINEERING,

Dental College Road, Opposite Shanthala Shop,

DAVANGERE -577 004.

2023-2024
UNIVERSITY B.D.T COLLEGE OF ENGINEERING
DAVANGERE - 577 004
(A Constituent College of Visvesvaraya Technological University, Belagavi, Karnataka)

DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS

CERTIFICATE
This is to certify that Mr. SANJAY D V, bearing USN: 4UB22MC087, has completed his 3rd semester
internship project work entitled "DECISION TREE ANALYSIS" in partial fulfilment of the requirements
for the award of the Master of Computer Applications degree, during the academic year 2023-24.

Signature of the Guide: Mr. Chetan Kumar G S, Asst. Professor (Ad-hoc), Department of MCA, UBDTCE Davangere.

Signature of the HOD: Dr. Harish B G, Asst. Professor and HOD, Department of MCA, UBDTCE Davangere.
DECLARATION

I, Mr. SANJAY D V, a student of 3rd semester MCA at UNIVERSITY B.D.T COLLEGE OF
ENGINEERING, bearing USN: 4UB22MC087, hereby declare that the project entitled "DECISION
TREE ANALYSIS" has been carried out by me under the supervision of my guide, Mr. Chetan Kumar G S,
Dept. of MCA, and submitted in partial fulfilment of the requirements for the award of the degree of
Master of Computer Applications by Visvesvaraya Technological University during the academic year
2023-24. This report has not been submitted to any other university for the award of any degree or certificate.

Name: SANJAY D V

Signature:
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the progress and completion of any task would be
incomplete without the mention of the people who made it possible, whose constant guidance and
encouragement crowned my efforts with success.

I consider it a privilege to express my gratitude and respect to all those who guided me in the progress of
my project report.

I express my sincere words of gratitude to the honourable principal Dr. D.P. Nagarajappa, for his constant
and dedicated support.

It is a great privilege to place on record my deep sense of gratitude to the coordinator, Dr. Harish B G,
Department of Master of Computer Applications, for his patronage and for the facilities provided for
this work.

I express my deep sense of gratitude to my guide, Mr. Chetan Kumar G S, Department of Master of
Computer Applications, for supporting and encouraging me at each stage of my project work and
guiding me to do my best.

I also sincerely express my gratitude to all the teaching and non-teaching staff of the MCA Department,
UBDTCE Davangere, who directly or indirectly helped me in completing the project.

Last but not least, I thank my parents for their moral support and my beloved friends for their help and
suggestions.

NAME: SANJAY D V

USN: 4UB22MC087
ABSTRACT

Decision tree analysis is a versatile and interpretable method for predictive modeling and decision-
making. Rooted in the fields of data mining and machine learning, decision trees provide a
structured framework for representing and analyzing complex decision scenarios. The algorithm
recursively partitions data based on input features, creating a tree-like structure of decision nodes
and leaves. Each decision node corresponds to a feature test, leading to subsequent branches and
ultimately resulting in outcome predictions at the leaves.

This abstract explores the key aspects of decision tree analysis, including the construction process,
feature selection criteria, and methods for handling categorical and continuous variables. Decision
trees are known for their transparency and ease of interpretation, making them valuable tools for
both experts and non-experts seeking insights from data. The interpretability of decision trees
facilitates the extraction of actionable knowledge and the identification of significant patterns within
datasets.

The abstract also delves into various applications of decision tree analysis across domains such as
finance, healthcare, and marketing. Decision trees excel in classification and regression tasks,
enabling accurate predictions and informed decision-making. Furthermore, the abstract touches
upon ensemble methods like Random Forests and Gradient Boosting, which leverage multiple
decision trees for enhanced predictive performance.

Challenges associated with decision trees, such as overfitting and sensitivity to small changes in
data, are addressed, along with techniques for mitigating these issues. The abstract concludes by
highlighting the ongoing research and advancements in decision tree analysis, showcasing its
continual evolution and relevance in the ever-expanding landscape of data science and
artificial intelligence.
CONTENTS

Chapter 1: ABOUT MACHINE LEARNING
    OBJECTIVES
    METHODOLOGY
Chapter 2: INTRODUCTION
Chapter 3: OBJECTIVES
Chapter 4: METHODOLOGY
Chapter 5: PROJECT CODE
Chapter 6: SOFTWARE AND HARDWARE REQUIREMENTS
Chapter 7: USE CASE DIAGRAMS
Chapter 8: RESULTS
Chapter 9: CONCLUSION
Chapter 10: REFERENCES
CHAPTER 1

MACHINE LEARNING
Machine learning, a subfield of artificial intelligence, has emerged as a transformative
technology with applications spanning various domains. This chapter provides a concise
overview of the intersection between machine learning and Python, a versatile programming
language that has become a staple in the field. Python's extensive libraries, such as NumPy,
Pandas, and scikit-learn, have played a pivotal role in democratizing machine learning,
making it accessible to a broad audience. This chapter explores the key components of
Python's ecosystem that facilitate the implementation and deployment of machine learning
models.

It delves into the fundamental concepts of supervised and unsupervised learning,
highlighting Python's role in the development of predictive models. Additionally, it
addresses the significance of deep learning and neural networks, showcasing popular
frameworks like TensorFlow and PyTorch, both seamlessly integrated into Python
workflows.

Furthermore, it discusses the importance of data preprocessing and feature engineering,
illustrating Python's prowess through practical examples. The interplay of data
visualization libraries like Matplotlib and Seaborn is explored, emphasizing their role in
understanding and interpreting machine learning results. The chapter concludes by
examining the growing trend of deploying machine learning models in real-world
applications, facilitated by Python's compatibility with web frameworks and cloud services.
As machine learning continues to evolve, Python remains a linchpin, empowering
researchers, developers, and data scientists to push the boundaries of what is achievable in
the realm of artificial intelligence.


PYTHON MACHINE LEARNING


➢ Linear Regression: Predicting a continuous outcome by finding the best-fitting linear
relationship between input features and the target variable.

➢ Multivariate Regression: Extending linear regression to multiple predictor variables for
predicting a continuous outcome.

➢ Polynomial Regression: A form of regression analysis where the relationship between the
independent variable and the dependent variable is modeled as an nth-degree polynomial.

➢ Confusion Matrix: A table used in classification to evaluate the performance of a
predictive model, showing the true positive, true negative, false positive, and false
negative counts.

➢ Heat Map: A graphical representation of data where values in a matrix are represented as
colors. In machine learning, often used to visualize relationships or correlations in a
dataset.

➢ Decision Tree: A tree-like model where internal nodes represent features, branches
represent decisions, and leaves represent outcomes, used for both classification and
regression tasks.

➢ Hierarchical Clustering: A method of cluster analysis which builds a hierarchy of clusters,
often represented as a dendrogram, by either bottom-up (agglomerative) or top-down
(divisive) approaches.

➢ Agglomerative Clustering: A bottom-up hierarchical clustering approach where each data
point starts in its own cluster and pairs of clusters are merged as one moves up the
hierarchy.

➢ Random Forest: An ensemble learning method that constructs a multitude of decision
trees at training time and outputs the class that is the mode of the classes (classification)
or the mean prediction (regression) of the individual trees.

➢ K-Means: A clustering algorithm that partitions n data points into k clusters, assigning
each point to the cluster with the nearest mean (centroid).

➢ K-Nearest Neighbors (KNN): A classification algorithm that classifies a data point based
on how its neighbors are classified, with 'k' being the number of nearest neighbors to
consider.
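To make a couple of these terms concrete, here is a minimal sketch (using scikit-learn's bundled Iris
dataset purely as a stand-in for any labelled data) that trains a decision tree and evaluates it with a
confusion matrix:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Split the labelled data, fit a tree, and tabulate its test-set errors
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test)))  # rows: true class, columns: predicted class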

OBJECTIVES

Machine Learning Objectives:

1. Develop Predictive Models:

• Train and evaluate machine learning models for predictive tasks using datasets relevant to
the application domain.

2. Explore Anomaly Detection:

• Investigate and implement anomaly detection techniques using machine learning to identify
irregular patterns and potential security breaches.

3. Evaluate Model Robustness:

• Assess the robustness and resilience of machine learning models against adversarial
attacks, ensuring effectiveness in real-world cybersecurity scenarios.


METHODOLOGY

Methodology for Machine Learning:

1. Define Problem Statement:

• Clearly articulate the problem you aim to solve or the task you want to accomplish using
machine learning.

2. Data Collection:

• Gather relevant datasets that align with the problem statement, ensuring data quality and
diversity.

3. Data Preprocessing:

• Clean and preprocess the data by handling missing values, scaling, encoding
categorical variables, and addressing outliers.

4. Feature Engineering:

• Extract meaningful features from the data or create new features to enhance the
performance of the machine learning models.

5. Model Selection:

• Choose appropriate machine learning algorithms based on the nature of the problem
(classification, regression, clustering) and the characteristics of the data.

6. Model Training:

• Train the selected models using the training dataset, tuning hyperparameters to optimize
performance.
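As a minimal sketch of steps 2-6, assuming the data is a pandas DataFrame with hypothetical numeric
columns 'age' and 'income', a categorical column 'city', and a target column 'label' (all names
illustrative only):

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Impute and scale numeric features; one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Chain preprocessing and the model so both are fit together
model = Pipeline([("prep", preprocess),
                  ("tree", DecisionTreeClassifier(max_depth=4))])
# With the DataFrame loaded as df, training would then be:
# X, y = df[["age", "income", "city"]], df["label"]
# model.fit(X, y)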


CHAPTER 2

INTRODUCTION
Decision Trees (DTs) are a non-parametric supervised learning method used
for classification and regression. The goal is to create a model that predicts the value of a target
variable by learning simple decision rules inferred from the data features. A tree can be seen as a
piecewise constant approximation.

For instance, in the example below, decision trees learn from data to approximate a sine curve with a
set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the
fitter the model.
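The example referred to here is the standard scikit-learn demonstration of regression trees fitting a
noisy sine curve; a minimal sketch of it follows (plotting omitted, constants as in the library's demo):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Build a noisy sine-curve dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))  # perturb every fifth sample

# A shallow tree yields coarse if-then-else rules; a deeper tree
# follows the curve (and the noise) more closely.
shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
deep = DecisionTreeRegressor(max_depth=5).fit(X, y)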


Some advantages of decision trees are:

• Simple to understand and to interpret. Trees can be visualized.

• Requires little data preparation. Other techniques often require data normalization, dummy
variables to be created, and blank values to be removed. Some tree and algorithm
combinations support missing values.

• The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points
used to train the tree.

• Able to handle both numerical and categorical data. However, the scikit-learn implementation
does not support categorical variables for now. Other techniques are usually specialized in
analyzing datasets that have only one type of variable. See algorithms for more information.

• Able to handle multi-output problems.

• Uses a white box model. If a given situation is observable in a model, the explanation for the
condition is easily explained by boolean logic. By contrast, in a black box model (e.g., in an
artificial neural network), results may be more difficult to interpret.

• Possible to validate a model using statistical tests. That makes it possible to account for the
reliability of the model.

• Performs well even if its assumptions are somewhat violated by the true model from which the
data were generated.

The disadvantages of decision trees include:

• Decision-tree learners can create over-complex trees that do not generalize the data well. This
is called overfitting. Mechanisms such as pruning, setting the minimum number of samples
required at a leaf node or setting the maximum depth of the tree are necessary to avoid this
problem.


• Decision trees can be unstable because small variations in the data might result in a
completely different tree being generated. This problem is mitigated by using decision trees
within an ensemble.

• Predictions of decision trees are neither smooth nor continuous, but piecewise constant
approximations, as in the sine-curve example above. Therefore, they are not good at extrapolation.

• The problem of learning an optimal decision tree is known to be NP-complete under several
aspects of optimality and even for simple concepts. Consequently, practical decision-tree
learning algorithms are based on heuristic algorithms such as the greedy algorithm where
locally optimal decisions are made at each node. Such algorithms cannot guarantee to return
the globally optimal decision tree. This can be mitigated by training multiple trees in an
ensemble learner, where the features and samples are randomly sampled with replacement.

• There are concepts that are hard to learn because decision trees do not express them easily,
such as XOR, parity or multiplexer problems.

• Decision tree learners create biased trees if some classes dominate. It is therefore
recommended to balance the dataset prior to fitting with the decision tree.
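A minimal sketch of the two mitigations mentioned above, pruning a single tree and averaging many
trees in an ensemble (the hyperparameter values are illustrative, not tuned):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Depth, leaf-size, and cost-complexity limits curb over-complex trees
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                ccp_alpha=0.01).fit(X, y)

# An ensemble of bootstrap-sampled trees reduces instability
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)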

Classification

DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.

As with other classifiers, DecisionTreeClassifier takes as input two arrays: an array X, sparse or
dense, of shape (n_samples, n_features) holding the training samples, and an array Y of integer
values, shape (n_samples,), holding the class labels for the training samples:

>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
>>> Y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, Y)

After being fitted, the model can then be used to predict the class of samples:

>>> clf.predict([[2., 2.]])
array([1])

In case there are multiple classes with the same and highest probability, the classifier will predict
the class with the lowest index amongst those classes.

As an alternative to outputting a specific class, the probability of each class can be predicted, which
is the fraction of training samples of the class in a leaf:

>>> clf.predict_proba([[2., 2.]])
array([[0., 1.]])

DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and
multiclass (where the labels are [0, …, K-1]) classification.

Using the Iris dataset, we can construct a tree as follows:

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(X, y)

Once trained, you can plot the tree with the plot_tree function:

>>> tree.plot_tree(clf)

Alternative ways to export trees
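Two such alternatives in scikit-learn are a plain-text rule listing and a Graphviz DOT export; a
minimal sketch continuing from the fitted Iris classifier above:

>>> print(tree.export_text(clf, feature_names=iris.feature_names))  # plain-text decision rules
>>> tree.export_graphviz(clf, out_file="tree.dot")  # DOT file for rendering with Graphviz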


CHAPTER 3

OBJECTIVES
1. Pattern Recognition:

• Objective: Identify and recognize patterns within datasets.


• Rationale: Decision trees excel at partitioning data based on input features, allowing the
identification of patterns and relationships that contribute to the decision-making process.
2. Predictive Modelling:

• Objective: Develop models for predicting outcomes based on input variables.


• Rationale: Decision trees can be used for both classification and regression tasks, making
them valuable for predicting categorical outcomes or estimating numerical values.
3. Interpretability:

• Objective: Create models that are transparent and easy to interpret.


• Rationale: Decision trees provide a visually intuitive representation of decision-making
processes, enabling non-experts to understand and trust the model's outcomes.
4. Feature Importance:

• Objective: Assess the importance of different features in influencing outcomes.


• Rationale: Decision trees highlight the most discriminative features early in the tree, helping
practitioners prioritize variables that contribute significantly to the decision process.
5. Decision Support:

• Objective: Provide a structured framework for decision support systems.


• Rationale: Decision trees assist in making decisions by guiding through a series of logical
choices based on input features, facilitating a systematic and transparent decision-making
process.
6. Handling Complex Decision Scenarios:

• Objective: Address complex decision scenarios with multiple influencing factors.


• Rationale: Decision trees handle intricate decision scenarios by breaking them down into a
series of simpler, more manageable decisions, making it easier to understand and analyze
complex decision pathways.
7. Ensemble Methods and Model Improvement:

• Objective: Improve predictive performance through ensemble methods.


• Rationale: Techniques such as Random Forests and Gradient Boosting utilize multiple
decision trees to enhance predictive accuracy and robustness.
8. Identifying Anomalies:

• Objective: Detect anomalies or outliers within the data.


• Rationale: Decision trees can reveal unusual patterns or instances that deviate from the norm,
aiding in the identification of anomalies or potential errors in the dataset.
9. Handling Missing Values:

• Objective: Address missing values in the dataset.


• Rationale: Decision trees can accommodate missing data during the decision-making process,
making them robust in situations where other methods might struggle with incomplete
information.
10. Continuous Improvement:

• Objective: Continuously refine and improve the decision tree model.


• Rationale: Regularly update and optimize decision trees based on new data or changing
conditions to ensure the model remains accurate and relevant over time.
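As a minimal sketch of objective 4 above (feature importance), scikit-learn exposes the
impurity-based importances of a fitted tree directly (the Iris dataset is used only as a stand-in):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Scores sum to 1; larger means the feature drives more influential splits
for name, score in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")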


CHAPTER 4

METHODOLOGY
Decision tree analysis is a popular technique in data analysis and machine learning for making
decisions or predictions based on input data. Here's a general methodology for conducting decision
tree analysis:

1. Define the Problem:


• Clearly articulate the problem or decision you're trying to address with the decision tree
analysis.
2. Collect Data:
• Gather relevant data that can be used to train and test the decision tree model. Ensure
the data is representative of the problem domain.
3. Data Preprocessing:
• Handle missing values: Decide on a strategy to deal with missing data (e.g., imputation
or removal).
• Encode categorical variables: Convert categorical variables into a format suitable for
analysis (e.g., one-hot encoding).
• Feature scaling: Normalize or standardize numerical features if necessary.
4. Split the Dataset:
• Divide the dataset into two parts: one for training the decision tree model and another
for testing its performance. Common splits include 70-30 or 80-20 for training and
testing, respectively.
5. Build the Decision Tree:
• Choose an algorithm: Decide on the type of decision tree algorithm to use (e.g., ID3,
C4.5, CART, or random forests).
• Train the model: Use the training dataset to build the decision tree by recursively
splitting the data based on the features that provide the best information gain or Gini
impurity reduction.
6. Tune Hyperparameters:
• Adjust hyperparameters, such as the maximum depth of the tree, minimum samples per
leaf, or the splitting criterion, to optimize the model's performance.
7. Evaluate the Model:


• Use the testing dataset to evaluate the decision tree model's performance. Common
evaluation metrics include accuracy, precision, recall, F1 score, and the area under the
receiver operating characteristic (ROC) curve.
8. Iterate and Refine:
• If the model's performance is not satisfactory, consider revisiting the preprocessing
steps, adjusting hyperparameters, or trying a different algorithm. Iteratively refine the
model until you achieve the desired results.
9. Interpret the Tree:
• Interpret the decision tree to gain insights into the decision-making process. Understand
which features are most influential in the model's decisions.
10. Visualize the Tree:
• Create visual representations of the decision tree for better understanding and
communication of the results.
11. Deploy the Model:
• If the decision tree meets your requirements, deploy it for making predictions on new,
unseen data.
12. Monitor and Maintain:
• Regularly monitor the performance of the deployed model and update it as needed,
especially if there are changes in the underlying data distribution.
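A minimal sketch of steps 4-7 (split, build, tune, evaluate), using a bundled binary-classification
dataset as a stand-in and illustrative, hand-set hyperparameter values:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 4: 80-20 train-test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 5-6: build the tree with chosen algorithm settings
clf = DecisionTreeClassifier(criterion="gini", max_depth=4, min_samples_leaf=5)
clf.fit(X_train, y_train)

# Step 7: evaluate on the held-out data
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))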


CHAPTER 5

PROJECT CODE
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from tkinter import *

# Load the dataset
df = pd.read_csv('show.csv')

# Map categorical variables to numerical values
df['Nationality'] = df['Nationality'].map({'UK': 0, 'USA': 1, 'N': 2})
df['Go'] = df['Go'].map({'NO': 0, 'YES': 1})

# Define features and target variable
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the decision tree classifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)

# GUI callback: read the four inputs, predict, and display the result
def predict_show():
    age = int(entry_age.get())
    experience = int(entry_experience.get())
    rank = int(entry_rank.get())
    nationality = int(entry_nationality.get())
    prediction = dtree.predict([[age, experience, rank, nationality]])
    result_label.config(text="Don't go for the show." if prediction[0] == 0 else "Yes, go for the show.")

# Create main window
root = Tk()
root.title("Show Decision Predictor")

# Create input fields
Label(root, text="Age:").grid(row=0, column=0)
entry_age = Entry(root)
entry_age.grid(row=0, column=1)

Label(root, text="Experience:").grid(row=1, column=0)
entry_experience = Entry(root)
entry_experience.grid(row=1, column=1)

Label(root, text="Rank:").grid(row=2, column=0)
entry_rank = Entry(root)
entry_rank.grid(row=2, column=1)

Label(root, text="Nationality (0 for UK, 1 for USA, 2 for N):").grid(row=3, column=0)
entry_nationality = Entry(root)
entry_nationality.grid(row=3, column=1)

# Button to trigger the prediction
predict_button = Button(root, text="Predict", command=predict_show)
predict_button.grid(row=4, column=0, columnspan=2)

# Result label
result_label = Label(root, text="")
result_label.grid(row=5, column=0, columnspan=2)

# Run the GUI
root.mainloop()
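The script imports matplotlib but never uses it; since the conclusion mentions that the tree was
visualised, a minimal sketch of what that step could look like, placed before root.mainloop() and
reusing the names defined above:

from sklearn.tree import plot_tree

# Draw the fitted tree with readable feature and class labels
plt.figure(figsize=(12, 6))
plot_tree(dtree, feature_names=features, class_names=['NO', 'YES'], filled=True)
plt.show()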

CHAPTER 6

SOFTWARE AND HARDWARE REQUIREMENTS


PyCharm is available in two editions:

• Community (free and open-source): for smart and intelligent Python development, including code
assistance, refactorings, visual debugging, and version control integration.

• Professional (paid): for professional Python, web, and data science development, including
code assistance, refactorings, visual debugging, version control integration, remote
configurations, deployment, support for popular web frameworks such as Django and Flask,
database support, scientific tools (including Jupyter notebook support), and big data tools.

For more information, refer to the editions comparison matrix.

Supported languages

To start developing in Python with PyCharm you need to download and install Python
from python.org depending on your platform.

PyCharm supports the following versions of Python:

• Python 2: version 2.7

• Python 3: from version 3.6 up to version 3.12

Besides, in the Professional edition, one can develop Django, Flask, and Pyramid applications. Also,
it fully supports HTML (including HTML5), CSS, JavaScript, and XML: these languages are bundled
in the IDE via plugins and are switched on for you by default. Support for other languages and
frameworks can also be added via plugins (go to Settings | Plugins, or PyCharm | Settings | Plugins
for macOS users, to find out more or to set them up during the first IDE launch).

Supported platforms

PyCharm is a cross-platform IDE that works on Windows, macOS, and Linux. Check the system
requirements:

RAM: minimum 4 GB of free RAM; recommended 8 GB of total system RAM.

CPU: minimum any modern CPU; recommended a multi-core CPU. PyCharm supports multithreading
for different operations and processes, making it faster the more CPU cores it can use.

Disk space: minimum 3.5 GB; recommended an SSD drive with at least 5 GB of free space.

Monitor resolution: minimum 1024×768; recommended 1920×1080.

Operating system: minimum an officially released 64-bit version of the following: Microsoft
Windows 10 1809 or later; recommended the latest 64-bit version of Windows, macOS, or Linux
(for example, Debian, Ubuntu, or RHEL).

You can install PyCharm using Toolbox or standalone installations. If you need assistance installing
PyCharm, see the installation instructions: Install PyCharm

Start with a project in PyCharm

Everything you do in PyCharm, you do within the context of a project. It serves as a basis for coding
assistance, bulk refactoring, coding style consistency, and so on. You have three options to start
working on a project inside the IDE:

• Open an existing project

• Check out a project from version control


• Create a new project

Open an existing project

Begin by opening one of your existing projects stored on your computer. You can select one in the
list of the recent projects on the Welcome screen or click Open:

Otherwise, you can create a project for your existing source files. Select the command Open on
the File menu, and specify the directory where the sources exist. PyCharm will then create a project
from your sources for you. For more information, refer to Create a project from existing sources .

Check out an existing project from Version Control

You can also download sources from a VCS storage or repository. On the Welcome screen, click Get
from VCS, and then choose Git (GitHub), Mercurial, Subversion, or Perforce (supported in PyCharm
Professional only).


Then, enter a path to the sources and clone the repository to the local host:

For more information, refer to Version control.

Create a new project

To create a project, do one of the following:

• Go to File | New Project

• On the Welcome screen, click New Project

In PyCharm Community, you can create only Python projects, whereas, with PyCharm Professional,
you have a variety of options to create a web framework project.

• Community

• Professional

When creating a new project, you need to specify a Python interpreter to execute Python code in your
project. You need at least one Python installation to be available on your machine. For a new project,
PyCharm creates an isolated virtual environment: venv, pipenv, poetry, or Conda. As you work, you
can change it or create new interpreters. You can also quickly preview packages installed for your
interpreters and add new packages in the Python Package tool window.


For more information, refer to Configure a Python interpreter. When you launch PyCharm for the
very first time, or when there are no open projects, you see the Welcome screen. It gives you the
main entry points into the IDE: creating or opening a project, checking out a project from version
control, viewing documentation, and configuring the IDE.

When a project is opened, you see the main window divided into several logical areas. Let’s take a
moment to see the key UI elements here:

• New UI
• Classic UI


1. Window header contains a set of widgets which provide quick access to the most popular
actions: project widget, VCS widget, and run widget. It also allows you to open Code With
Me, Search Everywhere, and Settings.

2. Project tool window on the left side displays your project files.

3. Editor on the right side, where you actually write your code. It has tabs for easy navigation
between open files.

4. Context menus open when you right-click an element of the interface or a code fragment and
show the actions available.

5. Navigation bar allows you to quickly navigate the project folders and files.


6. Gutter, the vertical stripe next to the editor, shows the breakpoints you have, and provides a
convenient way to navigate through the code hierarchy like going to definition/declaration. It
also shows line numbers and per-line VCS history.

7. Scrollbar, on the right side of the editor. PyCharm constantly monitors the quality of your
code by running code inspections. The indicator in the top right-hand corner shows the overall
status of code inspections for the entire file.

8. Tool windows are specialized windows attached to the bottom and the sides of the workspace.
They provide access to typical tasks such as project management, source code search
navigation, integration with version control systems, running, testing, debugging, and so on.

9. The status bar indicates the status of your project and the entire IDE, and shows various
warnings and information messages like file encoding, line separator, inspection profile, and
so on. It also provides quick access to the Python interpreter settings.

Code with smart assistance

When you have created a new project or opened an existing one, it is time to start coding.

Create a Python file

1. In the Project tool window, select the project root (typically, it is the root node in the project
tree), right-click it, and select File | New ....


2. Select the option Python File from the context menu, and then type the new filename.

PyCharm creates a new Python file and opens it for editing.


PyCharm takes care of the routine so that you can focus on the important. Use the following coding
capabilities to create error-free applications without wasting precious time.


CHAPTER 7

USE CASE DIAGRAMS


CHAPTER 8

RESULTS


CHAPTER 9

CONCLUSION
In this project, a decision tree classifier was implemented to determine whether an individual should
attend a show based on features such as age, experience, rank, and nationality. The dataset was
loaded from a CSV file, and categorical variables were mapped to numerical values for model
training. The scikit-learn library was utilized to split the dataset into training and testing sets, and a
Decision Tree classifier was trained on the training set. To aid understanding, the decision tree was
visualized using matplotlib. Furthermore, an interactive aspect was introduced, allowing users to
input values for prediction. Notably, a graphical user interface (GUI) was incorporated using the
tkinter library, providing a user-friendly way to input values and receive predictions. The GUI
includes entry fields for each feature, a prediction button, and a result label displaying the model's
prediction. This integration of machine learning with a simple GUI enhances the accessibility and
usability of the application, offering a practical tool for users to make decisions about attending a
show.


CHAPTER 10

REFERENCES
[1] Morgan J, Sonquist J. Problems in the analysis of survey data, and a proposal. Journal of the
American Statistical Association. 1963;58(2):415-435.

[2] Morgan J, Messenger R. THAID: A Sequential Analysis Program for the Analysis of Nominal
Scale Dependent Variables. Ann Arbor: Survey Research Center, Institute for Social Research,
University of Michigan; 1973.

[3] Kass G. An exploratory technique for investigating large quantities of categorical data. Applied
Statistics. 1980;29(2):119-127.

[4] Breiman L, Friedman J, Stone C, Olshen R. Classification and Regression Trees. Taylor &
Francis; 1984. Available from: https://books.google.fr/books?id=JwQx-WOmSyQC

[5] Hunt E, Marin J, Stone P. Experiments in Induction. New York, NY, USA: Academic Press;
1966. Available from: http://www.univtebessa.dz/fichiers/mosta/544f77fe0cf29473161c8f87.pdf

[6] Quinlan JR. Discovering rules by induction from large collections of examples. In: Michie D,
editor. Expert Systems in the Micro Electronic Age. Vol. 1. Edinburgh University Press; 1979.
pp. 168-201.

[7] Paterson A, Niblett T. ACLS Manual. Technical Report. Edinburgh: Intelligent Terminals, Ltd;
1982.

[8] Kononenko I, Bratko I, Roskar E. Experiments in Automatic Learning of Medical Diagnostic
Rules. Technical report.
