
Department of AIML

PAI Practical File

NAME: Shivam
BRANCH: CSE(AI-ML)
SEM: 6TH
ROLL NO: 23242

Shivam (23242)
Department of CSE AIML
Certificate
Certified that this practical file entitled “Big Data Lab”, submitted by Shivam (23242), a student
of the Computer Science & Engineering Department, Dronacharya College of
Engineering, Gurgaon, in partial fulfillment of the requirements for the award of the
Bachelor of Technology (Branch) degree of MDU, Rohtak, is a record of the student's own
study carried out under my supervision and guidance.

Sr. No.   Practical Name                                                                              Signature
1.        Introduction of various Python libraries used for machine learning.
2.        Write a program to perform data pre-processing techniques for effective machine learning.
3.        Write a program to apply different feature encoding schemes on the given dataset.
4.        Write a program to apply filter feature selection techniques.
5.
6.
7.
8.
9.
10.

PROGRAM 1: Introduction of various Python libraries used for machine learning.

Code:

[1]: import pandas as pd
import numpy as np

[2]: # reading data
data = pd.read_csv("data.csv")

[3]: data

[3]:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

[4]: # build a DataFrame from a dictionary of columns
student_data = {"Name": ['Prateek', 'Ronak', 'Geetanshu', 'Naman', 'Ankit'],
                "exam_no": [18, 25, 45, 34, 36],
                "Result": ['pass', 'fail', 'pass', 'pass', 'fail']}
df = pd.DataFrame(student_data)
df

[4]:
        Name  exam_no Result
0    Prateek       18   pass
1      Ronak       25   fail
2  Geetanshu       45   pass
3      Naman       34   pass
4      Ankit       36   fail

[6]: # access data with the help of label
df.loc[2, ['Name']]

[6]: Name    Geetanshu
Name: 2, dtype: object

[7]: df.iloc[2,0]

[7] : 'Geetanshu'
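
The difference between label-based access (loc) and position-based access (iloc) is easier to see when row labels and row positions are not the same. The following is a minimal sketch that reuses the student data from cell [4]; the re-indexed frame `df_labeled` and the variable names `by_label` and `by_position` are assumptions introduced only for illustration.

```python
import pandas as pd

# Same illustrative data as in cell [4].
student_data = {
    "Name": ["Prateek", "Ronak", "Geetanshu", "Naman", "Ankit"],
    "exam_no": [18, 25, 45, 34, 36],
    "Result": ["pass", "fail", "pass", "pass", "fail"],
}
df = pd.DataFrame(student_data)

# Use the names as row labels so that labels and positions differ
# (hypothetical step, not part of the original notebook).
df_labeled = df.set_index("Name")

# loc looks rows and columns up by label.
by_label = df_labeled.loc["Geetanshu", "exam_no"]

# iloc looks rows and columns up by integer position (row 2, column 0).
by_position = df_labeled.iloc[2, 0]

print(by_label, by_position)
```

Both lookups reach the same cell here, one through the row and column names and the other through their positions.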


PROGRAM 2: Write a program to perform data pre-processing techniques for effective machine learning.

[1]:# import pandas
import pandas as pd

[47]:# read csv file
df = pd.read_csv('data.csv')

[30]:# print first 5 rows
df.head()

[30]:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes

[6]:# import numpy
import numpy as np

[7]:# import StringIO
from io import StringIO

[31]:# check for null values
df.isnull()

[31]:
   Country    Age  Salary  Purchased
0    False  False   False      False
1    False  False   False      False
2    False  False   False      False
3    False  False   False      False
4    False  False    True      False
5    False  False   False      False
6    False   True   False      False
7    False  False   False      False
8    False  False   False      False
9    False  False   False      False

[59]: # assign 10 in place of null values
# (assigning the result back avoids relying on in-place fillna on a single column)
df["Age"] = df["Age"].fillna(10)
df["Salary"] = df["Salary"].fillna(10)

[60]: # print updated dataset
df

[60]:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0     10.0       Yes
5   France  35.0  58000.0       Yes
6    Spain  10.0  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

[34]: # check for null values after the update
df.isnull().sum()

[34]: Country 0
Age 0
Salary 0
Purchased 0
dtype: int64
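
The null check and constant fill above can be reproduced without the data.csv file, which is not included in this practical file. The following is a minimal sketch that assumes the CSV has exactly the rows printed earlier; the `csv_text` string and the variable names are assumptions for illustration. It also puts the `StringIO` import from cell [7] to use.

```python
from io import StringIO

import pandas as pd

# Assumed contents of data.csv, reconstructed from the table printed above.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

# StringIO lets read_csv treat an in-memory string like a file.
df = pd.read_csv(StringIO(csv_text))

# Count missing values per column before filling.
missing_before = df.isnull().sum()

# Fill each numeric column with the constant 10, returning a new frame.
filled = df.fillna({"Age": 10, "Salary": 10})
missing_after = filled.isnull().sum()

print(missing_before)
print(missing_after)
```

Passing a dictionary to `fillna` keeps the fill value explicit per column and leaves the original frame unchanged.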

[35]: # import SimpleImputer from sklearn
from sklearn.impute import SimpleImputer

[36]: # set imputer attributes: replace missing values with the constant 10
imr = SimpleImputer(strategy="constant", fill_value=10)

[37]: # fit the imputer on the data
imr = imr.fit(df.values)

[54]: # replace the missing values
imputed_data = imr.transform(df.values)

[55]: # print data after transformation
imputed_data

[55]: array([['France', 44.0, 72000.0, 'No'],
       ['Spain', 27.0, 48000.0, 'Yes'],
       ['Germany', 30.0, 54000.0, 'No'],
       ['Spain', 38.0, 61000.0, 'No'],
       ['Germany', 40.0, 10, 'Yes'],
       ['France', 35.0, 58000.0, 'Yes'],
       ['Spain', 10, 52000.0, 'No'],
       ['France', 48.0, 79000.0, 'Yes'],
       ['Germany', 50.0, 83000.0, 'No'],
       ['France', 37.0, 67000.0, 'Yes']], dtype=object)
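
A constant such as 10 is far from the typical Age or Salary value, so a different strategy may suit this data better. The following is a minimal sketch of mean imputation, which is a swapped-in alternative and not the strategy used in the cells above. It assumes only the two numeric columns are imputed and rebuilds them from the table printed earlier; the names `imputer` and `imputed` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Numeric columns only, with the two missing entries shown in the table above.
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# strategy="mean" replaces each missing value with its column mean.
imputer = SimpleImputer(strategy="mean")

# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(imputed)
```

Restricting the imputer to numeric columns avoids mixing strings and numbers in one object array, as happens when `df.values` of the full frame is passed.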

PROGRAM 3: Write a program to apply different feature encoding schemes on the given dataset.

[57]: # summary statistics of the numeric columns
df.describe()

[57]:
             Age        Salary
count   9.000000      9.000000
mean   38.777778  63777.777778
std     7.693793  12265.579662
min    27.000000  48000.000000
25%    35.000000  54000.000000
50%    38.000000  61000.000000
75%    44.000000  72000.000000
max    50.000000  83000.000000

[42]: # import and apply LabelEncoder to the data
from sklearn.preprocessing import LabelEncoder
# work on a copy so that the original df keeps its country names
df_le = df.copy()
class_le = LabelEncoder()
df_le['Country'] = class_le.fit_transform(df_le['Country'].values)
df_le

[42]:
   Country   Age   Salary Purchased
0        0  44.0  72000.0        No
1        2  27.0  48000.0       Yes
2        1  30.0  54000.0        No
3        2  38.0  61000.0        No
4        1  40.0     10.0       Yes
5        0  35.0  58000.0       Yes
6        2  10.0  52000.0        No
7        0  48.0  79000.0       Yes
8        1  50.0  83000.0        No
9        0  37.0  67000.0       Yes
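
The integer codes in the output above follow the alphabetical order of the country names. The following is a minimal sketch that makes the mapping explicit and reverses it; the `countries` series is rebuilt from the table, and the names `mapping` and `decoded` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Country column as printed in the dataset above.
countries = pd.Series(
    ["France", "Spain", "Germany", "Spain", "Germany",
     "France", "Spain", "France", "Germany", "France"],
    name="Country",
)

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the sorted unique labels; the index of each label is its code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}

# inverse_transform recovers the original labels from the codes.
decoded = encoder.inverse_transform(codes)

print(mapping)
print(codes)
```

Because the codes imply an order (France < Germany < Spain) that the countries do not really have, label encoding is usually reserved for targets or ordinal features.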

[48]: df

[48]:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

[61]: df_new=pd.get_dummies(df)

[62]: df_new

[62]:
    Age   Salary  Country_France  Country_Germany  Country_Spain  Purchased_No  Purchased_Yes
0  44.0  72000.0               1                0              0             1              0
1  27.0  48000.0               0                0              1             0              1
2  30.0  54000.0               0                1              0             1              0
3  38.0  61000.0               0                0              1             1              0
4  40.0     10.0               0                1              0             0              1
5  35.0  58000.0               1                0              0             0              1
6  10.0  52000.0               0                0              1             1              0
7  48.0  79000.0               1                0              0             0              1
8  50.0  83000.0               0                1              0             1              0
9  37.0  67000.0               1                0              0             0              1
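
Calling get_dummies on the whole frame also splits the target column Purchased into two columns. The following is a minimal sketch of one-hot encoding only the Country feature; the four-row frame is a shortened stand-in for the dataset, and the `dtype=int` and `drop_first=True` choices are assumptions for illustration.

```python
import pandas as pd

# Shortened stand-in for the dataset (first four rows of the table above).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, 30.0, 38.0],
})

# Encode only the listed columns; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(df, columns=["Country"], dtype=int)

# drop_first=True removes the first category to avoid a redundant column.
reduced = pd.get_dummies(df, columns=["Country"], dtype=int, drop_first=True)

print(one_hot)
print(reduced)
```

With `drop_first=True`, a row with zeros in both remaining columns stands for the dropped category, France.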

[63]: df_le['Country']

[63]:
0    0
1    2
2    1
3    2
4    1
5    0
6    2
7    0
8    1
9    0

PROGRAM 4: Write a program to apply filter feature selection techniques.
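
Filter methods score each feature on its own, without training a model, and keep the best-scoring ones. The following is a minimal sketch of two common filters, a variance threshold and a univariate ANOVA F-test. The feature names, values and the choice of `k=1` are assumptions made up for illustration and are not taken from this file.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

# Hypothetical features: one constant, one informative, one uninformative.
X = pd.DataFrame({
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],
    "signal": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],
    "noise": [3.0, 1.0, 4.0, 1.5, 2.0, 3.5, 1.2, 3.8],
})
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical binary target

# Filter 1: drop features whose variance is zero (they carry no information).
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
kept = X.columns[vt.get_support()].tolist()

# Filter 2: keep the k features with the highest ANOVA F-score against y.
skb = SelectKBest(score_func=f_classif, k=1)
skb.fit(X[kept], y)
best = [name for name, keep in zip(kept, skb.get_support()) if keep]

print("after variance filter:", kept)
print("after F-test filter:", best)
```

Here the constant column is removed by the variance filter, and the F-test then prefers the feature whose values separate the two classes.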
