4.5. Label Encoding

Label Encoding: converting the labels into numeric form

# importing the dependencies
import pandas as pd
from sklearn.preprocessing import LabelEncoder

Label Encoding of Breast Cancer Dataset

# loading the data from csv file to pandas DataFrame
cancer_data = pd.read_csv('/content/data.csv')

# first 5 rows of the DataFrame
cancer_data.head()

         id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  poi
0    842302          M        17.99         10.38          122.80     1001.0          0.11840           0.27760          0.3001
1    842517          M        20.57         17.77          132.90     1326.0          0.08474           0.07864          0.0869
2  84300903          M        19.69         21.25          130.00     1203.0          0.10960           0.15990          0.1974
3  84348301          M        11.42         20.38           77.58      386.1          0.14250           0.28390          0.2414
4  84358402          M        20.29         14.34          135.10     1297.0          0.10030           0.13280          0.1980

# finding the count of different labels
cancer_data['diagnosis'].value_counts()

B    357
M    212
Name: diagnosis, dtype: int64

# load the LabelEncoder
label_encode = LabelEncoder()

labels = label_encode.fit_transform(cancer_data.diagnosis)

# appending the labels to the DataFrame
cancer_data['target'] = labels

cancer_data.head()

         id  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  poin
0    842302          M        17.99         10.38          122.80     1001.0          0.11840           0.27760          0.3001
1    842517          M        20.57         17.77          132.90     1326.0          0.08474           0.07864          0.0869
2  84300903          M        19.69         21.25          130.00     1203.0          0.10960           0.15990          0.1974
3  84348301          M        11.42         20.38           77.58      386.1          0.14250           0.28390          0.2414
4  84358402          M        20.29         14.34          135.10     1297.0          0.10030           0.13280          0.1980

0 --> Benign

1 --> Malignant

cancer_data['target'].value_counts()

0    357
1    212
Name: target, dtype: int64

Label Encoding of Iris Data

# loading the data from csv file to pandas DataFrame
iris_data = pd.read_csv('/content/iris_data.csv')

iris_data.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

iris_data['Species'].value_counts()

Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: Species, dtype: int64

# loading the label encoder
label_encoder_1 = LabelEncoder()

iris_labels = label_encoder_1.fit_transform(iris_data.Species)

iris_data['target'] = iris_labels

iris_data.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species  target
0   1            5.1           3.5            1.4           0.2  Iris-setosa       0
1   2            4.9           3.0            1.4           0.2  Iris-setosa       0
2   3            4.7           3.2            1.3           0.2  Iris-setosa       0
3   4            4.6           3.1            1.5           0.2  Iris-setosa       0
4   5            5.0           3.6            1.4           0.2  Iris-setosa       0

iris_data['target'].value_counts()

2    50
1    50
0    50
Name: target, dtype: int64

Iris-setosa --> 0

Iris-versicolor --> 1

Iris-virginica --> 2
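
The label-to-integer mappings listed in this notebook (B and M to 0 and 1, the three iris species to 0, 1 and 2) do not have to be worked out by hand. LabelEncoder sorts the unique labels and stores them in its classes_ attribute, so each label's position in that array is its code. The cell below is a minimal sketch of reading the mapping back from a fitted encoder. It assumes a small hand-made diagnosis column (toy values, not rows from data.csv), and the names toy_diagnosis, encoder, mapping and decoded are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the 'diagnosis' column; these five values are
# hypothetical and are not taken from data.csv.
toy_diagnosis = pd.Series(['M', 'B', 'B', 'M', 'B'], name='diagnosis')

encoder = LabelEncoder()
encoded = encoder.fit_transform(toy_diagnosis)

# classes_ holds the sorted unique labels; a label's index is its integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts integer codes back to the original labels.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

The same two calls should work on the fitted label_encode and label_encoder_1 objects from the notebook, which is a quick way to confirm the mapping before relying on the target column.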
