
CLASSIFICATION

In [1]:
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

df = pandas.read_csv("E:/Limkokwing/Degree Trimester 2/Datasets/data.csv")
print(df)

    Age  Experience  Rank Nationality   Go
0    36          10     9          UK   NO
1    42          12     4         USA   NO
2    23           4     6           N   NO
3    52           4     4         USA   NO
4    43          21     8         USA  YES
5    44          14     5          UK   NO
6    66           3     7           N  YES
7    35          14     9          UK  YES
8    52          13     7           N  YES
9    35           5     9           N  YES
10   24           3     5         USA   NO
11   18           3     7          UK  YES
12   45           9     9          UK  YES

To make a decision tree, all data has to be numerical. We have to convert the non-numerical columns 'Nationality' and 'Go' into numerical values. Pandas has a map() method that takes a dictionary with information on how to convert the values. {'UK': 0, 'USA': 1, 'N': 2} means: convert the values 'UK' to 0, 'USA' to 1, and 'N' to 2. In the case of a bigger dataset, using a label encoder is much easier, as it will assign a unique value to every distinct label in the dataset.

In [2]:
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
print(df)

    Age  Experience  Rank  Nationality  Go
0    36          10     9            0   0
1    42          12     4            1   0
2    23           4     6            2   0
3    52           4     4            1   0
4    43          21     8            1   1
5    44          14     5            0   0
6    66           3     7            2   1
7    35          14     9            0   1
8    52          13     7            2   1
9    35           5     9            2   1
10   24           3     5            1   0
11   18           3     7            0   1
12   45           9     9            0   1

Then we have to separate the feature columns from the target column. The feature columns are the columns that we try to predict from, and the target column is the column with the values we try to predict.
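One behaviour of map() worth knowing before relying on it: any value that is missing from the dictionary becomes NaN rather than raising an error. A minimal sketch (the value 'France' below is a hypothetical unmapped label, not part of the lesson's dataset):

```python
import math
import pandas as pd

# Values present in the dictionary are replaced; anything else
# silently becomes NaN (which also promotes the dtype to float).
s = pd.Series(['UK', 'USA', 'N', 'France'])
d = {'UK': 0, 'USA': 1, 'N': 2}
out = s.map(d).tolist()
print(out)  # [0.0, 1.0, 2.0, nan]
```

Checking the result for NaN after mapping is a cheap way to catch typos or unexpected categories in a larger dataset.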

In [3]:
# X is the feature columns, y is the target column:
features = ['Age', 'Experience', 'Rank', 'Nationality']

X = df[features]
y = df['Go']
print("\n")
print("Features\n")
print(X)
print("\n")
print("Target \n")
print(y)

Features

    Age  Experience  Rank  Nationality
0    36          10     9            0
1    42          12     4            1
2    23           4     6            2
3    52           4     4            1
4    43          21     8            1
5    44          14     5            0
6    66           3     7            2
7    35          14     9            0
8    52          13     7            2
9    35           5     9            2
10   24           3     5            1
11   18           3     7            0
12   45           9     9            0

Target

0     0
1     0
2     0
3     0
4     1
5     0
6     1
7     1
8     1
9     1
10    0
11    1
12    1
Name: Go, dtype: int64

class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None,


min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
max_features=None, random_state=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0)
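All of the constructor parameters listed above have defaults, so the lesson can call DecisionTreeClassifier() with no arguments. A short sketch of overriding a few of them (the values here are illustrative choices, not taken from the lesson):

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative settings for some of the parameters in the signature:
clf = DecisionTreeClassifier(
    criterion='entropy',   # split on information gain instead of Gini impurity
    max_depth=3,           # cap the tree depth to reduce overfitting
    min_samples_leaf=2,    # each leaf must cover at least 2 samples
    random_state=0,        # make tie-breaking reproducible
)
print(clf.get_params()['max_depth'])  # 3
```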
In [4]:
dtree = DecisionTreeClassifier()
dtree.fit(X, y)

In [5]:
dtree.feature_importances_

Out[5]: array([0.        , 0.23214286, 0.72916667, 0.03869048])


In [6]:
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score

In [7]:
def sortSecond(val):
    return val[1]

values = dtree.feature_importances_
features = list(X)
importances = [(features[i], values[i]) for i in range(len(features))]
importances.sort(reverse=True, key=sortSecond)
importances

Out[7]: [('Rank', 0.7291666666666666),
         ('Experience', 0.23214285714285712),
         ('Nationality', 0.03869047619047619),
         ('Age', 0.0)]
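As a style note (it does not change the lesson's result), the sortSecond helper can be written inline as a lambda key, which is the more common idiom for one-off sort keys:

```python
# Equivalent to sorting with key=sortSecond; the lambda pulls out
# the second element of each (name, importance) pair.
importances = [('Rank', 0.73), ('Experience', 0.23), ('Age', 0.0)]
importances.sort(reverse=True, key=lambda pair: pair[1])
print(importances[0][0])  # 'Rank'
```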
In [8]:
cscore = cross_val_score(dtree, X, y, cv=6)
cscore

Out[8]: array([0.66666667, 1.        , 1.        , 1.        , 1.        , 0.5       ])

Regression

In [9]:
regressor = DecisionTreeRegressor()

In [10]:
regressor.fit(X, y)
regressor

Out[10]: DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                               max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None,
                               min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
                               presort=False, random_state=None, splitter='best')

In [11]:
regressor.feature_importances_

Out[11]: array([0.1547619 , 0.07738095, 0.76785714, 0.        ])


In [12]:
def sortSecond(valp):
    return valp[1]

valuesp = regressor.feature_importances_
featuresp = list(X)
importancesp = [(featuresp[i], valuesp[i]) for i in range(len(featuresp))]
importancesp.sort(reverse=True, key=sortSecond)
importancesp

Out[12]: [('Rank', 0.7678571428571429),
          ('Age', 0.15476190476190477),
          ('Experience', 0.07738095238095238),
          ('Nationality', 0.0)]

In [13]:
Score = cross_val_score(regressor, X, y, cv=6)
Score

Out[13]: array([0., 1., 1., 0., 1., 0.])
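Note that for a regressor, cross_val_score defaults to the R² score rather than accuracy, which is why these fold scores differ from the classifier's; with a 0/1 target, a fold predicted perfectly gives exactly 1.0. A sketch of R² itself on a perfectly predicted fold:

```python
from sklearn.metrics import r2_score

# R^2 is 1.0 when predictions match the true values exactly.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 1, 0]
print(r2_score(y_true, y_pred))  # 1.0
```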

Using Label Encoder

fit(y): Fit label encoder.

fit_transform(y): Fit label encoder and return encoded labels.

inverse_transform(y): Transform labels back to the original encoding.

In [19]:
from sklearn import preprocessing

In [22]:
le = preprocessing.LabelEncoder()
encoder = le.fit_transform(features)
encoder

Out[22]: array([0, 1, 3, 2], dtype=int64)
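The cell above encodes the list of column names; in practice a label encoder is applied to the values of a column, which is what makes it convenient for larger datasets. A sketch on sample Nationality values (the codes follow alphabetical order of the distinct labels, so they differ from the hand-built dictionary used earlier):

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
codes = le.fit_transform(['UK', 'USA', 'N', 'UK'])
print(list(codes))                        # [1, 2, 0, 1]  (alphabetical: N=0, UK=1, USA=2)
print(list(le.inverse_transform(codes)))  # ['UK', 'USA', 'N', 'UK']
```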

References

1. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
