Chapter 3

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 5

CHAPTER-3 EXPERIMENTAL OR MATERIAL AND METHOD

ALGORITHMS USED

The given data set is in the form of Classification algorithm. So,we used classification
types to predict the accuracy.

3.1 TYPE OF CLASSIFICATION ALGORITHMS USED

Decision tree algorithm

3.2 INTRODUCTION

Classification is a two-step process, learning step and prediction


step, in machine learning. In the learning step, the model is
developed based on given training data. In the prediction step, the
model is used to predict the response for given data. Decision Tree is
one of the easiest and popular classification algorithms to understand
and interpret.

Decision Tree Algorithm

Decision Tree algorithm belongs to the family of supervised learning


algorithms. Unlike other supervised learning algorithms, the decision
tree algorithm can be used for solving regression and classification
problems too.

The goal of using a Decision Tree is to create a training model that


can use to predict the class or value of the target variable
by learning simple decision rules inferred from prior data(training
data).

In Decision Trees, for predicting a class label for a record we start


from the root of the tree. We compare the values of the root attribute
with the record’s attribute. On the basis of comparison, we follow the
branch corresponding to that value and jump to the next node.

3.3 Types of Decision Trees

Types of decision trees are based on the type of target variable we


have. It can be of two types:

3.3.1 Categorical Variable Decision Tree: Decision Tree which has

a categorical target variable then it called a Categorical variable

decision tree.
1. 3.3.2 Continuous Variable Decision Tree: Decision Tree has a

continuous target variable then it is called Continuous Variable

Decision Tree.

3.4 How do Decision Trees work?

The decision of making strategic splits heavily affects a tree’s


accuracy. The decision criteria are different for classification and
regression trees.

Decision trees use multiple algorithms to decide to split a node into


two or more sub-nodes. The creation of sub-nodes increases the
homogeneity of resultant sub-nodes. In other words, we can say that
the purity of the node increases with respect to the target variable.
The decision tree splits the nodes on all available variables and then
selects the split which results in most homogeneous sub-nodes.

The algorithm selection is also based on the type of target variables.


Let us look at some algorithms used in Decision Trees:

ID3 → (extension of D3)


C4.5 → (successor of ID3)
CART → (Classification And Regression Tree)
CHAID → (Chi-square automatic interaction detection Performs
multi-level splits when computing classification trees)
MARS → (multivariate adaptive regression splines)

3.5 Attribute Selection Measures

Entropy,
Information gain,
Gini index,
Gain Ratio,
Reduction in Variance
Chi-Square

3.6 IMPORTED LIBRARIES

Libraries are collections of prewritten code that users can use to optimize
tasks. In project as python is used for implementation tool, it has the most
libraries as compared to other programming languages. More than of 60%
machine learning developers use and goes for python as it is easy to learn.
As python has comparatively large collection of libraries let’s l0ook at the
libraries that came in handy for mammographic datase

1 LIBRARIES USED

1. Pandas is a widely-used data analysis and manipulation library for python.


It provides a lot of functions and methods that expedite the data analysis and
preprocessing steps. IT also provides fast, flexible and expressive data
structures working with relational or labeled or both easy and intuitive.
Considered as nidamental high-level building block in performing practical,
real-world data analysis in python. Has powerful tools like Data Frame and
Series for analyzing

data = pd.read_csv('/Users/ML/DecisionTree/Social.csv')
data.head()

2. NumPy stands for Numerical Python, is a library consisting


ofmultidimensional array objects and a collection of countless of routines for
processing those arrays. Using this mathematical and logical operations on
arrays can be performed. The difference in using NumPy from pandas is, it
works on numerical data whereas pandas on tabular data

3 3. Sklearn stands for Scikit-learn, a machine learning library. It is imported


for various classification, regression and clustering algorithms including k-
means, random forest, support vector machines, gradient boosting and
DBSCAN. It is designed using libraries NumPy and SciPy. From the sklearn
library and from the tree inside the library Decision Tree Classifier. It is a class
capable of performing multi-class classifier on a dataset. When compared with
other classifiers, Decision Tree Classifier takes input as two arrays: an array
X, a parse or dense, of shape (n_samples,n_features) holding training
samples and an array Y of integer values, shape (n_samples), holding class
labels for training sample

Example of using decision tree


Accurency of decission tree

Graph of accurency of algorithms

You might also like