
CLASSIFICATION

In [1]:
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

df = pandas.read_csv("E:/Limkokwing/Degree Trimester 2/Datasets/data.csv")
print(df)

    Age  Experience  Rank Nationality   Go
0    36          10     9          UK   NO
1    42          12     4         USA   NO
2    23           4     6           N   NO
3    52           4     4         USA   NO
4    43          21     8         USA  YES
5    44          14     5          UK   NO
6    66           3     7           N  YES
7    35          14     9          UK  YES
8    52          13     7           N  YES
9    35           5     9           N  YES
10   24           3     5         USA   NO
11   18           3     7          UK  YES
12   45           9     9          UK  YES

To make a decision tree, all data has to be numerical. We have to convert the non-numerical columns 'Nationality' and 'Go' into numerical values. Pandas has a map() method that takes a dictionary with information on how to convert the values. {'UK': 0, 'USA': 1, 'N': 2} means: convert the values 'UK' to 0, 'USA' to 1, and 'N' to 2. In the case of a bigger dataset, using a label encoder is much easier, as it will assign a unique value to every distinct label in the dataset.

In [2]:
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
print(df)

    Age  Experience  Rank  Nationality  Go
0    36          10     9            0   0
1    42          12     4            1   0
2    23           4     6            2   0
3    52           4     4            1   0
4    43          21     8            1   1
5    44          14     5            0   0
6    66           3     7            2   1
7    35          14     9            0   1
8    52          13     7            2   1
9    35           5     9            2   1
10   24           3     5            1   0
11   18           3     7            0   1
12   45           9     9            0   1

Then we have to separate the feature columns from the target column. The feature columns are the columns that we try to predict from, and the target column is the column with the values we try to predict.
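One behaviour of map() worth knowing before relying on it: any value that is missing from the dictionary becomes NaN rather than raising an error. A minimal sketch (the value 'France' below is a hypothetical unmapped label, not part of the lesson's dataset):

```python
import math
import pandas as pd

# Values present in the dictionary are replaced; anything else
# silently becomes NaN (which also promotes the dtype to float).
s = pd.Series(['UK', 'USA', 'N', 'France'])
d = {'UK': 0, 'USA': 1, 'N': 2}
out = s.map(d).tolist()
print(out)  # [0.0, 1.0, 2.0, nan]
```

Checking the result for NaN after mapping is a cheap way to catch typos or unexpected categories in a larger dataset.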

In [3]:
# X is the feature columns, y is the target column:
features = ['Age', 'Experience', 'Rank', 'Nationality']

X = df[features]
y = df['Go']
print("\n")
print("Features\n")
print(X)
print("\n")
print("Target \n")
print(y)

Features

    Age  Experience  Rank  Nationality
0    36          10     9            0
1    42          12     4            1
2    23           4     6            2
3    52           4     4            1
4    43          21     8            1
5    44          14     5            0
6    66           3     7            2
7    35          14     9            0
8    52          13     7            2
9    35           5     9            2
10   24           3     5            1
11   18           3     7            0
12   45           9     9            0

Target

0     0
1     0
2     0
3     0
4     1
5     0
6     1
7     1
8     1
9     1
10    0
11    1
12    1
Name: Go, dtype: int64

class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None,


min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
max_features=None, random_state=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0)
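All of the constructor parameters listed above have defaults, so the lesson can call DecisionTreeClassifier() with no arguments. A short sketch of overriding a few of them (the values here are illustrative choices, not taken from the lesson):

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative settings for some of the parameters in the signature:
clf = DecisionTreeClassifier(
    criterion='entropy',   # split on information gain instead of Gini impurity
    max_depth=3,           # cap the tree depth to reduce overfitting
    min_samples_leaf=2,    # each leaf must cover at least 2 samples
    random_state=0,        # make tie-breaking reproducible
)
print(clf.get_params()['max_depth'])  # 3
```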
In [4]:
dtree = DecisionTreeClassifier()
dtree.fit(X, y)

In [5]:
dtree.feature_importances_

Out[5]: array([0.        , 0.23214286, 0.72916667, 0.03869048])


In [6]:
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score

In [7]:
def sortSecond(val):
    return val[1]

values = dtree.feature_importances_
features = list(X)
importances = [(features[i], values[i]) for i in range(len(features))]
importances.sort(reverse=True, key=sortSecond)
importances

Out[7]: [('Rank', 0.7291666666666666),
         ('Experience', 0.23214285714285712),
         ('Nationality', 0.03869047619047619),
         ('Age', 0.0)]
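As a style note (it does not change the lesson's result), the sortSecond helper can be written inline as a lambda key, which is the more common idiom for one-off sort keys:

```python
# Equivalent to sorting with key=sortSecond; the lambda pulls out
# the second element of each (name, importance) pair.
importances = [('Rank', 0.73), ('Experience', 0.23), ('Age', 0.0)]
importances.sort(reverse=True, key=lambda pair: pair[1])
print(importances[0][0])  # 'Rank'
```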
In [8]:
cscore = cross_val_score(dtree, X, y, cv=6)
cscore

Out[8]: array([0.66666667, 1.        , 1.        , 1.        , 1.        , 0.5       ])

Regression

In [9]:
regressor = DecisionTreeRegressor()

In [10]:
regressor.fit(X, y)
regressor

Out[10]: DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                               max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None,
                               min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
                               presort=False, random_state=None, splitter='best')

In [11]:
regressor.feature_importances_

Out[11]: array([0.1547619 , 0.07738095, 0.76785714, 0.        ])


In [12]:
def sortSecond(valp):
    return valp[1]

valuesp = regressor.feature_importances_
featuresp = list(X)
importancesp = [(featuresp[i], valuesp[i]) for i in range(len(featuresp))]
importancesp.sort(reverse=True, key=sortSecond)
importancesp

Out[12]: [('Rank', 0.7678571428571429),
          ('Age', 0.15476190476190477),
          ('Experience', 0.07738095238095238),
          ('Nationality', 0.0)]

In [13]:
Score = cross_val_score(regressor, X, y, cv=6)
Score

Out[13]: array([0., 1., 1., 0., 1., 0.])
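Note that for a regressor, cross_val_score defaults to the R² score rather than accuracy, which is why these fold scores differ from the classifier's; with a 0/1 target, a fold predicted perfectly gives exactly 1.0. A sketch of R² itself on a perfectly predicted fold:

```python
from sklearn.metrics import r2_score

# R^2 is 1.0 when predictions match the true values exactly.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 1, 0]
print(r2_score(y_true, y_pred))  # 1.0
```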

Using Label Encoder

fit(y): Fit label encoder.

fit_transform(y): Fit label encoder and return encoded labels.

inverse_transform(y): Transform labels back to the original encoding.

In [19]:
from sklearn import preprocessing

In [22]:
le = preprocessing.LabelEncoder()
encoder = le.fit_transform(features)
encoder

Out[22]: array([0, 1, 3, 2], dtype=int64)
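The cell above encodes the list of column names; in practice a label encoder is applied to the values of a column, which is what makes it convenient for larger datasets. A sketch on sample Nationality values (the codes follow alphabetical order of the distinct labels, so they differ from the hand-built dictionary used earlier):

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
codes = le.fit_transform(['UK', 'USA', 'N', 'UK'])
print(list(codes))                        # [1, 2, 0, 1]  (alphabetical: N=0, UK=1, USA=2)
print(list(le.inverse_transform(codes)))  # ['UK', 'USA', 'N', 'UK']
```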

References

1. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
