Data preprocessing transforms raw data into a format suitable for modeling. The key steps are handling missing data (imputation), encoding categorical variables (e.g. one-hot encoding), splitting the dataset into training and test sets, and scaling the features of the training data.
The set of operations to perform on the dataset is:
• Handling missing data
• Managing categorical data
• Dataset distribution (training and test split)
• Scaling the features

Preprocessing Steps in Python

• How to import libraries?

import pandas as pd
import numpy as np

• How to load datasets?
• If the dataset is in Excel format:

df = pd.read_excel("dataset.xlsx")

• If the dataset is in CSV format:

df = pd.read_csv("dataset.csv")

Dataset

Index  Student Name  Subject            Marks  Grade
0      Ramu          Maths              70     A
1      Somu          Maths              55     B
2      Lilly         Technical English  NaN    O
3      Rose          Python             80     A+
4      Nisha         Java               NaN    O
5      Seetha        Compiler           50     B
6      Patrick       Big Data           40     Fail
7      Peter         E-Commerce         75     A

• In the dataset above you can see NaN values, so the missing values must be handled. Missing values can be imputed with the:
• Mean
• Median
• Mode
• A constant value

Classifying Dependent and Independent Variables

• In our dataset, Grade is the dependent variable; Student Name, Subject and Marks are the independent variables.

X = df.iloc[:, [0, 1, 2]].values   # Student Name, Subject, Marks
y = df.iloc[:, 3].values           # Grade
print(X)
print(y)

Handling Missing Values

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(X[:, 2:3])             # Marks is the only numeric column
X[:, 2:3] = imputer.transform(X[:, 2:3])
print(X)

Feature Encoding

• In our dataset the Subject feature has string values. String-based features cannot be fed directly into a training model, so a method is required to convert the strings to numeric values. This is called one-hot encoding.

How to encode?

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[("encoder", OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))
print(X)
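The imputation and encoding steps above can be combined into one runnable sketch. The small DataFrame below is an assumption for illustration (a four-row subset of the slide's table), and the column indices follow its layout (Subject at position 1, Marks at position 2):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Toy DataFrame mirroring the first rows of the slide's dataset
df = pd.DataFrame({
    "Student Name": ["Ramu", "Somu", "Lilly", "Rose"],
    "Subject": ["Maths", "Maths", "Technical English", "Python"],
    "Marks": [70, 55, np.nan, 80],
    "Grade": ["A", "B", "O", "A+"],
})

# Independent variables (Name, Subject, Marks) and dependent variable (Grade)
X = df.iloc[:, [0, 1, 2]].values
y = df.iloc[:, 3].values

# Replace NaN in the numeric Marks column with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 2:3] = imputer.fit_transform(X[:, 2:3])

# One-hot encode the string-valued Subject column (index 1);
# the untouched columns (Name, Marks) are appended after the encoded ones
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
)
X = np.array(ct.fit_transform(X))
print(X)
```

With three distinct subjects, the result has three one-hot columns followed by the passthrough columns, and Lilly's missing mark becomes the mean of the other marks.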
Dataset Distribution

# split the data into training and test sets
from sklearn.model_selection import train_test_split
# (in older scikit-learn versions this lived in sklearn.cross_validation)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print('X_train.shape: ', X_train.shape)
print('X_test.shape: ', X_test.shape)
print('y_train.shape: ', y_train.shape)
print('y_test.shape: ', y_test.shape)

Scaling the Features

# feature scaling of the training dataset
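The slide ends before the scaling code itself. A common choice (an assumption here, since the original code is cut off) is standardization with scikit-learn's StandardScaler, fitted on the training data only and then reused on the test data so that no test-set information leaks into preprocessing. The toy numeric matrix below stands in for the preprocessed X:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy numeric feature matrix (e.g. the Marks column after imputation)
X = np.array([[70.0], [55.0], [68.3], [80.0], [68.3], [50.0], [40.0], [75.0]])
y = np.array(["A", "B", "O", "A+", "O", "B", "Fail", "A"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training data only, then apply the same
# training mean and standard deviation to the test data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print(X_test)
```

After scaling, the training features have zero mean and unit variance; the test features are shifted and scaled by the training statistics, so their mean and variance are generally not exactly 0 and 1.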