Data Handling Using Python Pandas - 1

After completing this lesson, you should be able to do
the following:
• Use Python Libraries
• Define Pandas
• Create Series
• Define DataFrame
• Import and Export Data between
– CSV Files
– DataFrames
• Differentiate between Pandas Series and NumPy ndarray

Introduction to Python Libraries
• Python libraries contain a collection of built-in modules.
• NumPy, Pandas and Matplotlib are three well-
established Python libraries for scientific and
analytical use.

Introduction to Python Libraries
• NumPy stands for ‘Numerical Python’. It is a
package that can be used for numerical data analysis
and scientific computing.
• NumPy uses a multidimensional array object and has
functions and tools for working with these arrays.
• Elements of an array are stored contiguously in memory,
so they can be accessed quickly.

Introduction to Python Libraries
• PANDAS (PANel DAta) is a high-level data
manipulation tool used for analysing data.
• It is very easy to import and export data using the Pandas
library, which has a rich set of functions.
• Pandas has three important data structures, namely
Series, DataFrame and Panel, which make the process of
analysing data organised, effective and efficient.
(Panel has been removed from recent Pandas releases.)

Introduction to Python Libraries
• Matplotlib library in Python is used for plotting
graphs and visualisation.
• Using Matplotlib, with just a few lines of code we can
generate publication quality plots, histograms, bar
charts, scatterplots, etc.
• It is also built on NumPy, and is designed to work
well with NumPy and Pandas.

NumPy v/s Pandas

Data Structure in Pandas
• Pandas is an open-source, BSD-licensed (Berkeley
Software Distribution) library built for the Python
programming language.
• A data structure is a collection of data values and
the operations that can be applied to that data.
• The Pandas library needs to be imported into the Python
environment before it is used.

Data Structure in Pandas
• Two commonly used data structures in Pandas are
covered in this lesson:
– Series
– DataFrame

• A Series is a one-dimensional array containing a
sequence of values of any data type (int, float, list,
string, etc) which by default have numeric data labels
starting from zero.
• Example:
Index   Value
0       Arnab
1       Samridhi
2       Ramit
3       Divyam
4       Kritika
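The example above can be reproduced with a short sketch. The variable name seriesNames is illustrative and does not come from the lesson; the values are the names listed in the example.

```python
import pandas as pd

# Build a Series from a list; Pandas assigns the default labels 0, 1, 2, ...
seriesNames = pd.Series(["Arnab", "Samridhi", "Ramit", "Divyam", "Kritika"])

print(seriesNames)
print(list(seriesNames.index))  # default numeric labels, starting from zero
```

Because no index was supplied, the labels are the integers 0 to 4, so seriesNames[2] refers to the value labelled 2.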

Creation of Series
• There are different ways in which a series can be
created in Pandas:
‒ Creation of Series from scalar values
‒ Creation of Series from NumPy Arrays
‒ Creation of Series from Dictionary

Creation of Series
• Creation of series from scalar values:
import pandas as pd
series1 = pd.Series([10,20,30])
series2 = pd.Series(["Kavi","Shyam","Ravi"],
series2 =
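A Series can also be given user-defined labels through the index argument of pd.Series. The sketch below is a minimal illustration; the label values and the name seriesLabelled are assumptions chosen for the example, not values taken from the lesson.

```python
import pandas as pd

# Default labels 0, 1, 2 are assigned when no index is given
series1 = pd.Series([10, 20, 30])

# User-defined labels (illustrative values); one label is needed per data value
seriesLabelled = pd.Series(["Kavi", "Shyam", "Ravi"], index=[3, 5, 1])

print(series1)
print(seriesLabelled)
```

The labels do not have to be consecutive or sorted; they only have to match the data in number.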

Creation of Series
• Creation of Series from NumPy Arrays:
import numpy as np
import pandas as pd
array1 = np.array([1, 2, 3, 4])
series3 = pd.Series(array1)
series4 = pd.Series(array1, index=["Jan", "Feb", "Mar", "Apr"])
# Raises ValueError: the index has 3 labels but array1 has 4 values
series5 = pd.Series(array1, index=["Jan", "Feb", "Mar"])

Creation of Series
• Creation of Series from Dictionary
dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
series8 = pd.Series(dict1)
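When a Series is built from a dictionary, the keys become the index labels and the values become the data. A minimal sketch, with illustrative names dictCapitals and seriesCapitals:

```python
import pandas as pd

dictCapitals = {"India": "NewDelhi", "UK": "London", "Japan": "Tokyo"}

# Keys -> index labels, values -> data
seriesCapitals = pd.Series(dictCapitals)

print(seriesCapitals)
print(seriesCapitals["UK"])  # look up a value by its key
```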

Accessing Elements
• There are two common ways for accessing the
elements of a series:
– Indexing
– Slicing

Accessing Elements
• Indexing
seriesNum = pd.Series([10,20,30])
seriesMnths =
seriesCapCntry = pd.Series(['NewDelhi','WashingtonDC',
'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
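The following sketch shows how individual elements of these series might be accessed. It assumes label-based access with square brackets and positional access with the .iloc accessor; the access expressions are added for illustration.

```python
import pandas as pd

seriesNum = pd.Series([10, 20, 30])
seriesCapCntry = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

print(seriesNum[2])                   # label 2 of the default index
print(seriesCapCntry["India"])        # access by label
print(seriesCapCntry.iloc[1])         # access by position (second element)
print(seriesCapCntry[["UK", "USA"]])  # a list of labels returns a Series
```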

Accessing Elements
• Slicing
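seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London',
'Paris'], index=['India', 'USA', 'UK', 'France'])
seriesCapCntry['USA' : 'France']   # label slice: both end labels are included
seriesCapCntry[ : : -1]            # all elements in reverse order
import numpy as np
seriesAlph = pd.Series(np.arange(10,16,1),index = ['a',
'b', 'c', 'd', 'e', 'f'])
seriesAlph[1:3] = 50               # positional slice: updates positions 1 and 2
seriesAlph['c':'e'] = 500          # label slice: updates 'c', 'd' and 'e'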
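Positional and label slices treat the end of the range differently, which is easy to overlook. The sketch below contrasts the two; it uses .iloc for the positional slice, and the names byPosition, byLabel and reversedSeries are illustrative.

```python
import numpy as np
import pandas as pd

seriesAlph = pd.Series(np.arange(10, 16, 1),
                       index=["a", "b", "c", "d", "e", "f"])

byPosition = seriesAlph.iloc[1:3]   # positions 1 and 2; the stop position is excluded
byLabel = seriesAlph["c":"e"]       # labels c, d and e; the stop label is included
reversedSeries = seriesAlph[::-1]   # every element, in reverse order

print(byPosition)
print(byLabel)
print(reversedSeries)
```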

Attributes of Pandas Series
• The properties of a series, called attributes, can be accessed by
writing the attribute after the series name:
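A minimal sketch of some commonly used Series attributes (name, index.name, values, size and empty), assuming the seriesCapCntry series from the indexing example. The strings "Capitals" and "Country" are illustrative.

```python
import pandas as pd

seriesCapCntry = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

seriesCapCntry.name = "Capitals"        # assigns a name to the Series
seriesCapCntry.index.name = "Country"   # assigns a name to the index

print(seriesCapCntry.values)  # the data as a NumPy array
print(seriesCapCntry.size)    # number of values in the Series
print(seriesCapCntry.empty)   # True only when the Series holds no values
```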

Methods of Series
• Pandas Series supports methods for series
manipulation. Consider the following series:
seriesAlph = pd.Series(np.arange(10,20,1))
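As an illustration, three commonly used Series methods are head(), tail() and count(). The sketch below applies them to the series defined above; the choice of these three methods is an assumption made for the example.

```python
import numpy as np
import pandas as pd

seriesAlph = pd.Series(np.arange(10, 20, 1))

print(seriesAlph.head(2))   # first 2 values; head() with no argument returns 5
print(seriesAlph.tail(2))   # last 2 values; tail() with no argument returns 5
print(seriesAlph.count())   # number of non-NaN values
```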

Mathematical Operations on Series

• Mathematical operations can also be performed on
two series in Pandas.
• Values are matched by index label during the operation,
and all missing values are filled in with NaN by default.
seriesA = pd.Series([1,2,3,4,5], index =
['a', 'b', 'c', 'd', 'e'])

seriesB = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])

Addition of two Series
• There are two ways to add series:
– The two Series are simply added together with the + operator.
seriesA + seriesB

Addition of two Series
• The second method, add(), is used when NaN values are not
wanted in the output; fill_value replaces each missing value.
seriesA.add(seriesB, fill_value=0)
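The two ways of adding can be compared side by side. This sketch uses the seriesA and seriesB defined earlier; the names plainSum and filledSum are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

# Labels present in only one series produce NaN
plainSum = seriesA + seriesB

# A missing value is treated as 0 before adding, so no NaN appears
filledSum = seriesA.add(seriesB, fill_value=0)

print(plainSum)
print(filledSum)
```

Only the labels a, c and e exist in both series, so only those three have a numeric result in plainSum.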

Subtraction of two Series
• Again, there are two ways for subtracting series:
– Two Series are simply subtracted from each other.
seriesA - seriesB

– Output
a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64

Subtraction of two Series
• Now replace the missing values with 1000 before
subtracting seriesB from seriesA using explicit
subtraction method sub().
seriesA.sub(seriesB, fill_value=1000)

• Output
a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64

Multiplication of two Series
• Again, there are two ways to multiply series:
– The two Series are simply multiplied together.
seriesA * seriesB

– Output
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64

Multiplication of two Series
• Now replace the missing values with 0 before
multiplication of seriesB with seriesA using explicit
multiplication method mul().
seriesA.mul(seriesB, fill_value=0)

• Output
a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64

Division of two Series
• Again, there are two ways to divide series:
– The first Series is simply divided by the second.
seriesA / seriesB

– Output
a -0.10
b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64

Division of two Series
• Now replace the missing values with 0 before
dividing seriesA by seriesB using explicit division
method div().
seriesA.div(seriesB, fill_value=0)

• Output
a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64

• A DataFrame is a two-dimensional labelled data
structure, like a table in MySQL.
• Each column can have a different type of value such
as numeric, string, boolean, etc., as in tables of a
database.
• It contains rows and columns, and has both a
row and column index.

• <DataFrameObject> = pandas.DataFrame(<a 2D data structure>)

Creation of DataFrame
• There are different ways in which a DataFrame can
be created in Pandas:
‒ Creation of an empty DataFrame
‒ Creation of DataFrame from NumPy Arrays
‒ Creation of DataFrame from list of Dictionaries
‒ Creation of DataFrame from Dictionary of Lists
‒ Creation of DataFrame from Series
‒ Creation of DataFrame from Dictionary of Series

Creation of DataFrame
• Creation of an empty DataFrame
import pandas as pd
dFrameEmt = pd.DataFrame()

• Creation of DataFrame from NumPy Arrays

import numpy as np
array1 = np.array([10,20,30])
array2 = np.array([100,200,300])
array3 = np.array([-10,-20,-30, -40])
dFrame4 = pd.DataFrame(array1)
dFrame5 = pd.DataFrame([array1, array3, array2],
columns=[ 'A', 'B', 'C', 'D'])

Creation of DataFrame
• Creation of DataFrame from list of Dictionaries
listDict = [{'a':10, 'b':20}, {'a':5, 'b':10,
dFrameListDict = pd.DataFrame(listDict)
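In a DataFrame built from a list of dictionaries, each dictionary becomes one row and the keys become column labels. The sketch below is a hedged illustration: the third key 'c' and its value are assumptions, and the names listDictDemo and dFrameDemo are illustrative.

```python
import pandas as pd

# The key 'c' and its value are assumed for illustration
listDictDemo = [{"a": 10, "b": 20}, {"a": 5, "b": 10, "c": 20}]

# One row per dictionary; a key missing from a row is filled with NaN
dFrameDemo = pd.DataFrame(listDictDemo)

print(dFrameDemo)
```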

• Creation of DataFrame from Dictionary of Lists

dictForest = {'State': ['Assam', 'Delhi',
'Kerala'], 'GArea': [78438, 1483, 38852] , 'VDF' :
[2797, 6.72,1663]}
dFrameForest= pd.DataFrame(dictForest)
dFrameForest1 = pd.DataFrame(dictForest,
columns = ['State','VDF', 'GArea'])

Creation of DataFrame
• Creation of DataFrame from Series
seriesA = pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series ([1000,2000,-1000,-5000,1000],
index = ['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])
dFrame6 = pd.DataFrame(seriesA)
dFrame7 = pd.DataFrame([seriesA, seriesB])
dFrame8 = pd.DataFrame([seriesA, seriesC])

Creation of DataFrame
• Creation of DataFrame from Dictionary of Series
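ResultSheet = {
'Arnab': pd.Series([90, 91, 97], index=['Maths', 'Science', 'Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths', 'Science', 'Hindi']),
'Samridhi': pd.Series([89, 91, 88], index=['Maths', 'Science', 'Hindi']),
'Riya': pd.Series([81, 71, 67], index=['Maths', 'Science', 'Hindi']),
'Mallika': pd.Series([94, 95, 99], index=['Maths', 'Science', 'Hindi'])}
ResultDF = pd.DataFrame(ResultSheet)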

Creation of DataFrame
• The union of all series indexes is used to create the DataFrame:
dictForUnion = { 'Series1' :
index = ['a', 'b', 'c', 'd', 'e']) ,
'Series2' :
index = ['z', 'y', 'a', 'c', 'e']),
'Series3' :
index = ['z', 'y', 'a', 'c', 'e']) }
dFrameUnion = pd.DataFrame(dictForUnion)
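The effect of the index union can be seen in a small sketch. The index labels follow the slide, but the data values are assumptions (borrowed from the series used earlier in the lesson), and the names dictForUnionDemo and dFrameUnionDemo are illustrative.

```python
import pandas as pd

# Data values are assumed for illustration; the index labels follow the slide
dictForUnionDemo = {
    "Series1": pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"]),
    "Series2": pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"]),
}

# The row index is the union of both indexes; cells with no value become NaN
dFrameUnionDemo = pd.DataFrame(dictForUnionDemo)

print(dFrameUnionDemo)
```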

Operations in DataFrames
• Basic operations can be performed on rows and
columns of a DataFrame like
– Selection
– Deletion
– Addition
– Renaming

Selecting/Accessing a Subset
• To access row(s) and/or a combination of rows and
columns from a DataFrame object, you can use the
following syntax:
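One widely used form, assumed here, is the label-based accessor DataFrame.loc[<rows>, <columns>], where each selector can be a single label, a list of labels or a label slice. The sketch uses a three-student subset of the result sheet; the names on the left-hand side are illustrative.

```python
import pandas as pd

subjects = ["Maths", "Science", "Hindi"]
ResultDF = pd.DataFrame({
    "Arnab": pd.Series([90, 91, 97], index=subjects),
    "Ramit": pd.Series([92, 81, 96], index=subjects),
    "Samridhi": pd.Series([89, 91, 88], index=subjects),
})

oneRow = ResultDF.loc["Science"]              # single row label -> Series
someRows = ResultDF.loc[["Maths", "Hindi"]]   # list of row labels -> DataFrame
oneColumn = ResultDF.loc[:, "Ramit"]          # all rows, one column -> Series
block = ResultDF.loc["Maths":"Science", "Arnab":"Ramit"]  # both slice ends included

print(oneRow)
print(someRows)
print(oneColumn)
print(block)
```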

Accessing a Row
• To access a row, just give the row name/label:
• Example:
ResultDF.loc['Maths']

• Output:
Arnab 90
Ramit 92
Samridhi 89
Riya 81
Mallika 94
Name: Maths, dtype: int64

Accessing Multiple Rows
• To access multiple rows, use:
<DataFrameObject>.loc[<start row>:<end row>, :]
• Example:
ResultDF.loc['Maths':'Hindi', :]

• Output:
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99

Accessing Selective Columns
• To access selective columns, use:
<DataFrameObject>.loc[:, <start column>:<end column>]
• Example:
ResultDF.loc[:, 'Ramit':'Riya']

• Output:
         Ramit  Samridhi  Riya
Maths       92        89    81
Science     81        91    71
Hindi       96        88    67

Accessing Range Columns/Rows

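• To access a range of columns from a range of rows, use:
<DataFrameObject>.loc[<start row>:<end row>, <start column>:<end column>]
• Example:
ResultDF.loc['Maths':'Hindi', 'Ramit':'Riya']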

• Output
         Ramit  Samridhi  Riya
Maths       92        89    81
Science     81        91    71
Hindi       96        88    67

Add Column to DataFrame
• A new column can be added to a DataFrame by
assigning a list of values to a new column label.
ResultDF['Preeti'] = [89, 78, 76]

• Output
         Arnab  Ramit  Samridhi  Riya  Mallika  Preeti
Maths       90     92        89    81       94      89
Science     91     81        91    71       95      78
Hindi       97     96        88    67       99      76

Add New Row to DataFrame
• A new row can be added to a DataFrame using
DataFrame.loc[].
ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]

• Output
         Arnab  Ramit  Samridhi  Riya  Mallika  Preeti
Maths       90     92        89    81       94      89
Science     91     81        91    71       95      78
Hindi       97     96        88    67       99      76
English     85     86        83    80       90      89

Deleting Rows & Columns from a DataFrame

• Rows and columns can be removed from a DataFrame
using the DataFrame.drop() method.
ResultDF = ResultDF.drop('Science', axis=0)

• Output
         Arnab  Ramit  Samridhi  Riya  Mallika  Preeti
Maths       90     92        89    81       94      89
Hindi       97     96        88    67       99      76
English     85     86        83    80       90      89

Deleting Rows & Columns from a DataFrame

• Columns are deleted from a DataFrame using the
DataFrame.drop() method with axis=1.
ResultDF = ResultDF.drop(['Samridhi', 'Ramit',
'Riya'], axis=1)

• Output
         Arnab  Mallika  Preeti
Maths       90       94      89
Hindi       97       99      76
English     85       90      89

Rename Rows Labels of a DataFrame

• The row labels of a DataFrame can be changed using
the DataFrame.rename() method.
• Output
      Arnab  Ramit  Samridhi  Riya  Mallika
Sub1     90     92        89    81       94
Sub2     91     81        91    71       95
Sub3     97     96        88    67       99
Sub4     97     89        78    60       45
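A minimal sketch of renaming row labels with rename(), assuming a dictionary that maps old labels to new ones and axis='index' to target the rows. The small DataFrame marks and the mapping are illustrative, not the data behind the output above.

```python
import pandas as pd

marks = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

# Map old row labels to new ones; axis='index' applies the mapping to rows
renamedRows = marks.rename(
    {"Maths": "Sub1", "Science": "Sub2", "Hindi": "Sub3"}, axis="index"
)

print(renamedRows)
```

rename() returns a new DataFrame, so the result has to be assigned to a variable if it is to be kept; marks itself is unchanged here.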

Rename Column Labels of a DataFrame

• The column labels of a DataFrame can be changed
using the DataFrame.rename() method.
• Output
      Student1  Student2  Student3  Student4  Student5
Sub1        90        92        89        81        94
Sub2        91        81        91        71        95
Sub3        97        96        88        67        99
Sub4        97        89        78        60        45
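Column labels can be renamed in the same way by passing the mapping through the columns parameter of rename(). The DataFrame scores and the mapping below are illustrative assumptions.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Arnab": [90, 91], "Ramit": [92, 81], "Riya": [81, 71]},
    index=["Sub1", "Sub2"],
)

# Only the labels named in the mapping change; 'Riya' is left as it is
renamedCols = scores.rename(columns={"Arnab": "Student1", "Ramit": "Student2"})

print(renamedCols)
```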

Creating DataFrame
• Q.1) Write the Python Code to create the DataFrame
that contains the following:
         2020  2019  2018  2017
IP        100    99   100    96
CS        100   100    98   NaN
Maths      98    97   100   NaN
English    98    90   NaN   NaN

• Also, Print the DataFrame.

Creating DataFrame
• There are two ways to create DataFrame:
– With the help of list
– With the help of Dictionary

Creating DataFrame
• With the help of list
import pandas as pd
L1 = [100, 99, 100, 96]
L2 = [100, 100, 98]
L3 = [98, 97, 100]
L4 = [98, 90]
# Shorter lists are padded with NaN to match the four columns
data = pd.DataFrame([L1, L2, L3, L4], index=['IP', 'CS', 'Maths', 'English'],
columns=[2020, 2019, 2018, 2017])
print(data)

Creating DataFrame
• With the help of dictionary
import pandas as pd
# One dictionary per year; a subject missing from a year becomes NaN
D1 = {'IP': 100, 'CS': 100, 'Maths': 98, 'English': 98}
D2 = {'IP': 99, 'CS': 100, 'Maths': 97, 'English': 90}
D3 = {'IP': 100, 'CS': 98, 'Maths': 100}
D4 = {'IP': 96}
D = pd.DataFrame({2020: D1, 2019: D2,
2018: D3, 2017: D4})
print(D)

