Python Pandas Programming Lesson 1

Data Handling Using Python Pandas - 1
© KIIT 2014
Objectives
After completing this lesson, you should be able to do
the following:
• Use Python Libraries
• Define Pandas
• Create Series
• Define DataFrame
• Import and Export Data between
– CSV Files
– DataFrames
• Differentiate between Pandas Series and NumPy ndarray
Introduction to Python Libraries
• Python libraries contain a collection of built-in
modules.
• NumPy, Pandas and Matplotlib are three well-
established Python libraries for scientific and
analytical use.
• NumPy, which stands for ‘Numerical Python’, is a
package that can be used for numerical data analysis
and scientific computing.
• NumPy uses a multidimensional array object and has
functions and tools for working with these arrays.
• The elements of an array are stored together in one
contiguous block of memory, so they can be accessed quickly.
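A minimal sketch of the NumPy array object, assuming NumPy is installed; the variable names marks and grid are illustrative and not part of the lesson. It shows that arithmetic on an array applies to every element at once.

```python
import numpy as np

# A one-dimensional ndarray: every element has the same data type
marks = np.array([10, 20, 30, 40])
print(marks.ndim, marks.shape)  # 1 (4,)

# Arithmetic applies to every element at once, without an explicit loop
doubled = marks * 2
print(doubled)  # [20 40 60 80]

# A two-dimensional ndarray with 2 rows and 3 columns
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)        # (2, 3)
print(grid.sum(axis=0))  # column totals: [5 7 9]
```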
• PANDAS (PANel DAta) is a high-level data
manipulation tool used for analysing data.
• The Pandas library has a very rich set of functions,
which make it easy to import and export data.
• Pandas has three important data structures, namely
Series, DataFrame and Panel, which make the process of
analysing data organised, effective and efficient.
Panel was deprecated in pandas 0.20 and removed in
pandas 0.25, so of these three, current versions
provide only Series and DataFrame.
• Matplotlib library in Python is used for plotting
graphs and visualisation.
• Using Matplotlib, with just a few lines of code we can
generate publication quality plots, histograms, bar
charts, scatterplots, etc.
• It is also built on NumPy, and is designed to work
well with NumPy and Pandas.
NumPy v/s Pandas
Data Structure in Pandas
• Pandas is an open-source library for the Python
programming language, released under the Berkeley
Software Distribution (BSD) licence.
• A data structure is a collection of data values and
operations that can be applied to that data.
• The Pandas library needs to be imported into the
Python environment before it can be used.
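A minimal sketch of the import step; the alias pd is the usual convention and is the one used by every example in this lesson. The names s and df are illustrative.

```python
# Import Pandas under its conventional alias pd
import pandas as pd

# The installed version is available as a string
print(pd.__version__)

# The data structures are then reached through the alias
s = pd.Series([1, 2, 3])
df = pd.DataFrame({'x': [1, 2, 3]})
print(type(s).__name__, type(df).__name__)  # Series DataFrame
```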
Data Structure in Pandas
• This lesson covers the two most commonly used data
structures in Pandas:
– Series
– DataFrame
Series
• A Series is a one-dimensional array containing a
sequence of values of any data type (int, float, list,
string, etc.). By default, the values have numeric data
labels, called the index, starting from zero.
• Example:
Index Value
0 Arnab
1 Samridhi
2 Ramit
3 Divyam
4 Kritika
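The example Series above could be created as follows; a minimal sketch in which the variable name names is illustrative.

```python
import pandas as pd

# No index is given, so Pandas assigns the labels 0, 1, 2, ... automatically
names = pd.Series(["Arnab", "Samridhi", "Ramit", "Divyam", "Kritika"])
print(names)

print(list(names.index))  # [0, 1, 2, 3, 4]
print(names[3])           # Divyam
```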
Creation of Series
• There are different ways in which a series can be
created in Pandas:
‒ Creation of Series from a list of scalar values
‒ Creation of Series from NumPy Arrays
‒ Creation of Series from Dictionary
Creation of Series
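• Creation of Series from a list of scalar values:
import pandas as pd
# Default index: 0, 1, 2
series1 = pd.Series([10, 20, 30])
print(series1)
# A user-defined index is passed with the index parameter
series2 = pd.Series(["Kavi", "Shyam", "Ravi"], index=[3, 5, 1])
print(series2)
# Index labels can also be strings
series2 = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
print(series2)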
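A Series can also be built from a single scalar value, provided an index is given: Pandas repeats the value once for each label. A minimal sketch; the name bonus is illustrative.

```python
import pandas as pd

# The scalar 5 is repeated for every label in the index
bonus = pd.Series(5, index=["Feb", "Mar", "Apr"])
print(bonus)
print(bonus.tolist())  # [5, 5, 5]
```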
Creation of Series
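• Creation of Series from NumPy Arrays:
import numpy as np
import pandas as pd
array1 = np.array([1, 2, 3, 4])
series3 = pd.Series(array1)
print(series3)
series4 = pd.Series(array1, index=["Jan", "Feb", "Mar", "Apr"])
print(series4)
# The index must have as many labels as the array has elements. The line
# below raises a ValueError (4 values, only 3 labels), so it is commented out:
# series5 = pd.Series(array1, index=["Jan", "Feb", "Mar"])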
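The series5 example fails because the index is shorter than the array. A sketch that catches the error and shows one possible fix; slicing the array to match the labels is an illustrative choice, not the only remedy.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

# Four values but only three labels: Pandas raises a ValueError
raised = False
try:
    pd.Series(array1, index=["Jan", "Feb", "Mar"])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# One possible fix: pass exactly one value per label
series5 = pd.Series(array1[:3], index=["Jan", "Feb", "Mar"])
print(series5)
```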
Creation of Series
• Creation of Series from Dictionary
import pandas as pd
dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
print(dict1)
# The dictionary keys become the index labels of the Series
series8 = pd.Series(dict1)
print(series8)
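When a Series is built from a dictionary, its keys become the index labels and its values become the data. If an index is also passed, only those keys are used, in that order, and a label with no matching key gets NaN. A sketch; series9 and the 'France' label are illustrative.

```python
import pandas as pd

dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
series8 = pd.Series(dict1)

# Keys become labels, values become data
print(list(series8.index))  # ['India', 'UK', 'Japan']
print(series8['UK'])        # London

# An explicit index selects and orders the keys; 'France' is not in dict1
series9 = pd.Series(dict1, index=['Japan', 'India', 'France'])
print(series9)  # France maps to NaN
```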
Accessing Elements
• There are two common ways for accessing the
elements of a series:
– Indexing
– Slicing
Accessing Elements
• Indexing
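import pandas as pd
seriesNum = pd.Series([10, 20, 30])
seriesNum[2]
seriesMnths = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
seriesMnths["Mar"]
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])
seriesCapCntry['India']
# Positional access on a labelled index: use .iloc
# (plain seriesCapCntry[1] is deprecated in recent pandas versions)
seriesCapCntry.iloc[1]
seriesCapCntry.iloc[[3, 2]]
seriesCapCntry[['UK', 'USA']]
# Replace the index labels with new ones
seriesCapCntry.index = [10, 20, 30, 40]
seriesCapCntry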
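With plain square brackets, Pandas decides between label and position from the type of the index, which becomes ambiguous once the index itself holds integers, as after seriesCapCntry.index = [10, 20, 30, 40]. A sketch using the explicit accessors .loc (always labels) and .iloc (always positions):

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

# Label-based access
print(seriesCapCntry.loc['India'])                 # NewDelhi
print(seriesCapCntry.loc[['UK', 'USA']].tolist())  # ['London', 'WashingtonDC']

# Position-based access
print(seriesCapCntry.iloc[1])                 # WashingtonDC
print(seriesCapCntry.iloc[[3, 2]].tolist())   # ['Paris', 'London']

# After replacing the index with integers, .loc uses the new labels
seriesCapCntry.index = [10, 20, 30, 40]
print(seriesCapCntry.loc[20])   # WashingtonDC
print(seriesCapCntry.iloc[0])   # NewDelhi
```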
Accessing Elements
• Slicing
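import numpy as np
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])
# Slicing by position excludes the end position: USA and UK
seriesCapCntry[1:3]
# Slicing by label includes the end label: USA, UK and France
seriesCapCntry['USA':'France']
# A step of -1 returns the Series in reverse order
seriesCapCntry[::-1]
seriesAlph = pd.Series(np.arange(10, 16, 1), index=['a', 'b', 'c', 'd', 'e', 'f'])
seriesAlph
# Slices can also be used to assign values
seriesAlph[1:3] = 50
seriesAlph
seriesAlph['c':'e'] = 500
seriesAlph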
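A sketch that checks the two slicing rules with the explicit accessors: a positional slice stops before its end position, while a label slice includes its end label. The final values of seriesAlph follow from the two assignments above.

```python
import numpy as np
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

# Positional slice: the end position is excluded
print(seriesCapCntry.iloc[1:3].tolist())            # ['WashingtonDC', 'London']
# Label slice: the end label is included
print(seriesCapCntry.loc['USA':'France'].tolist())  # ['WashingtonDC', 'London', 'Paris']

seriesAlph = pd.Series(np.arange(10, 16, 1), index=['a', 'b', 'c', 'd', 'e', 'f'])
seriesAlph.iloc[1:3] = 50      # changes positions 1 and 2 ('b' and 'c')
seriesAlph.loc['c':'e'] = 500  # changes 'c', 'd' and 'e'
print(seriesAlph.tolist())     # [10, 50, 500, 500, 500, 15]
```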
Attributes of Pandas Series
• Properties of a series, called attributes, can be accessed by writing the
attribute name after the series name, separated by a dot: <Series>.<attribute>
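A minimal sketch of several commonly used Series attributes; the values 'Capitals' and 'Countries' assigned to name and index.name are illustrative.

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])
seriesCapCntry.name = 'Capitals'         # name of the Series
seriesCapCntry.index.name = 'Countries'  # name of the index

print(seriesCapCntry.name)        # Capitals
print(seriesCapCntry.index.name)  # Countries
print(seriesCapCntry.values)      # the data values as an array
print(seriesCapCntry.size)        # 4, the number of values
print(seriesCapCntry.shape)       # (4,)
print(seriesCapCntry.empty)       # False

# empty is True only when the Series holds no values
emptySeries = pd.Series([], dtype=float)
print(emptySeries.empty)          # True
```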
Methods of Series
• Pandas Series supports methods for series
manipulation. Consider the following series:
import numpy as np
import pandas as pd
seriesTenTwenty = pd.Series(np.arange(10, 20, 1))
print(seriesTenTwenty)
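A sketch of three commonly used Series methods, head(), tail() and count(), applied to seriesTenTwenty:

```python
import numpy as np
import pandas as pd

seriesTenTwenty = pd.Series(np.arange(10, 20, 1))

print(seriesTenTwenty.head(2))  # first 2 values; head() alone returns 5
print(seriesTenTwenty.tail(2))  # last 2 values; tail() alone returns 5
print(seriesTenTwenty.count())  # 10, the number of non-NaN values
```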
Mathematical Operations on Series
• Mathematical operations can also be performed on
two series in Pandas.
• The series are matched on their index labels: values
with the same label are combined, and a label present
in only one series gives NaN in the result by default.
seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])
Addition of two Series
• There are two ways of adding series:
– The two Series are simply added together with the + operator.
seriesA + seriesB
Addition of two Series
• The second method, add(), is used when NaN values are not
wanted in the output; fill_value replaces a value that is
missing from either series before the addition.
seriesA.add(seriesB, fill_value=0)
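Because seriesA and seriesB share only the labels a, c and e, the plain + gives NaN for b, d, y and z, while add() with fill_value=0 treats the missing side as 0. A runnable sketch that recreates both series; the names total and total_filled are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# Operator form: labels present in only one Series give NaN
total = seriesA + seriesB
print(total)         # a -9.0, b NaN, c -47.0, d NaN, e 105.0, y NaN, z NaN

# Method form: a missing value on either side is replaced by 0 first
total_filled = seriesA.add(seriesB, fill_value=0)
print(total_filled)  # a -9.0, b 2.0, c -47.0, d 4.0, e 105.0, y 20.0, z 10.0
```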
Subtraction of two Series
• Again, there are two ways of subtracting series:
– The two Series are simply subtracted from each other.
seriesA - seriesB
– Output
a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64
Subtraction of two Series
• Now replace the missing values with 1000 before
subtracting seriesB from seriesA using the explicit
subtraction method sub().
seriesA.sub(seriesB, fill_value=1000)
• Output
a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64
Multiplication of two Series
• Again, there are two ways of multiplying series:
– The two Series are simply multiplied together.
seriesA * seriesB
– Output
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64
Multiplication of two Series
• Now replace the missing values with 0 before multiplying
seriesA by seriesB using the explicit multiplication method mul().
seriesA.mul(seriesB, fill_value=0)
• Output
a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64
Division of two Series
• Again, there are two ways of dividing series:
– The first Series is simply divided by the second.
seriesA / seriesB
– Output
a -0.10
b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64
Division of two Series
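• Now replace the missing values with 0 before dividing
seriesA by seriesB using the explicit division method div();
a non-zero value divided by a filled-in 0 gives inf.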
seriesA.div(seriesB, fill_value=0)
• Output
a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64
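Each operator has an equivalent method (+ add(), - sub(), * mul(), / div()); without fill_value the method gives exactly the same result as the operator. A sketch that checks this for all four; the dictionary name pairs is illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

pairs = {
    'add': (seriesA + seriesB, seriesA.add(seriesB)),
    'sub': (seriesA - seriesB, seriesA.sub(seriesB)),
    'mul': (seriesA * seriesB, seriesA.mul(seriesB)),
    'div': (seriesA / seriesB, seriesA.div(seriesB)),
}
for name, (op_result, method_result) in pairs.items():
    # equals() treats NaN values in the same position as equal
    print(name, op_result.equals(method_result))  # True for all four

# fill_value only affects labels missing from one side: b becomes 2 / 0
print(seriesA.div(seriesB, fill_value=0)['b'])  # inf
```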
DataFrame
• A DataFrame is a two-dimensional labelled data
structure, like a table in MySQL.
• Each column can have a different type of value such
as numeric, string, boolean, etc., as in tables of a
database.
DataFrame
• It contains rows and columns, and has both a
row and column index.
DataFrame
• Syntax:
<DataFrameObject> = pandas.DataFrame(<a 2D data structure>,
[columns=<column list>], [index=<index list>])
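A minimal sketch of the constructor with both optional arguments; the marks are taken from the ResultDF example later in this lesson, and the name dFrameMarks is illustrative.

```python
import pandas as pd

# A 2D structure: a list of rows, each row a list of values
rows = [[90, 92], [91, 81]]

# columns= gives the column labels, index= gives the row labels
dFrameMarks = pd.DataFrame(rows, columns=['Arnab', 'Ramit'], index=['Maths', 'Science'])
print(dFrameMarks)
print(dFrameMarks.loc['Science', 'Ramit'])  # 81
```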
Creation of DataFrame
• There are different ways in which a DataFrame can
be created in Pandas:
‒ Creation of an empty DataFrame
‒ Creation of DataFrame from NumPy Arrays
‒ Creation of DataFrame from list of Dictionaries
‒ Creation of DataFrame from Dictionary of Lists
‒ Creation of DataFrame from Series
‒ Creation of DataFrame from Dictionary of Series
• Creation of an empty DataFrame
import pandas as pd
dFrameEmt = pd.DataFrame()
dFrameEmt
• Creation of DataFrame from NumPy Arrays
import numpy as np
array1 = np.array([10, 20, 30])
array2 = np.array([100, 200, 300])
array3 = np.array([-10, -20, -30, -40])
dFrame4 = pd.DataFrame(array1)
dFrame4
# Each array becomes one row; rows shorter than the column list are padded with NaN
dFrame5 = pd.DataFrame([array1, array3, array2], columns=['A', 'B', 'C', 'D'])
dFrame5
• Creation of DataFrame from list of Dictionaries
listDict = [{'a':10, 'b':20}, {'a':5, 'b':10,
'c':20}]
dFrameListDict = pd.DataFrame(listDict)
dFrameListDict
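Each dictionary in the list becomes one row and its keys become column labels; because the first dictionary has no key 'c', that cell is NaN. A sketch that checks this:

```python
import pandas as pd

listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
dFrameListDict = pd.DataFrame(listDict)

print(list(dFrameListDict.columns))  # ['a', 'b', 'c']
print(dFrameListDict.shape)          # (2, 3)
print(dFrameListDict.isna())         # True only for row 0, column 'c'
```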
• Creation of DataFrame from Dictionary of Lists
dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}
dFrameForest = pd.DataFrame(dictForest)
dFrameForest
# The columns parameter sets the order of the columns
dFrameForest1 = pd.DataFrame(dictForest, columns=['State', 'VDF', 'GArea'])
dFrameForest1
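With a dictionary of lists, each key becomes a column label and each list becomes one column, so all lists must have the same length. A sketch that checks the column order with and without the columns parameter:

```python
import pandas as pd

dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}

# Without columns=, the order follows the dictionary keys
dFrameForest = pd.DataFrame(dictForest)
print(list(dFrameForest.columns))   # ['State', 'GArea', 'VDF']

# columns= chooses the order of the columns
dFrameForest1 = pd.DataFrame(dictForest, columns=['State', 'VDF', 'GArea'])
print(list(dFrameForest1.columns))  # ['State', 'VDF', 'GArea']

# VDF holds 6.72, so the whole column is stored as floating point
print(dFrameForest1['VDF'].dtype)   # float64
```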
• Creation of DataFrame from Series
seriesA = pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series ([1000,2000,-1000,-5000,1000],
index = ['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])
dFrame6 = pd.DataFrame(seriesA)
dFrame6
dFrame7 = pd.DataFrame([seriesA, seriesB])
dFrame7
dFrame8 = pd.DataFrame([seriesA, seriesC])
dFrame8
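When a list of Series is passed, each Series becomes one row and the column labels are the union of their index labels; a cell whose label is missing from that Series is NaN. A sketch that checks this for dFrame8; the columns are sorted before printing so the check does not depend on the column order Pandas chooses.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# Each Series becomes one row; their index labels become the columns
dFrame8 = pd.DataFrame([seriesA, seriesC])
print(dFrame8.shape)            # (2, 7): 7 distinct labels across both Series
print(sorted(dFrame8.columns))  # ['a', 'b', 'c', 'd', 'e', 'y', 'z']
print(dFrame8.loc[0, 'z'])      # nan, because seriesA has no label 'z'
```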
• Creation of DataFrame from Dictionary of Series
import pandas as pd
ResultSheet = {
    'Arnab': pd.Series([90, 91, 97], index=['Maths', 'Science', 'Hindi']),
    'Ramit': pd.Series([92, 81, 96], index=['Maths', 'Science', 'Hindi']),
    'Samridhi': pd.Series([89, 91, 88], index=['Maths', 'Science', 'Hindi']),
    'Riya': pd.Series([81, 71, 67], index=['Maths', 'Science', 'Hindi']),
    'Mallika': pd.Series([94, 95, 99], index=['Maths', 'Science', 'Hindi'])}
ResultDF = pd.DataFrame(ResultSheet)
ResultDF
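• When a DataFrame is created from a dictionary of Series, its row
index is the union of the indexes of all the Series; a label missing
from a Series gives NaN in that column: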
dictForUnion = { 'Series1' :
pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e']) ,
'Series2' :
pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']),
'Series3' :
pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']) }
dFrameUnion = pd.DataFrame(dictForUnion)
dFrameUnion
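A sketch checking the union on a two-series version of dictForUnion: the row labels are the union of both indexes, the columns are the dictionary keys, and each column has NaN where its Series lacks a label.

```python
import pandas as pd

dictForUnion = {
    'Series1': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
    'Series2': pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e']),
}
dFrameUnion = pd.DataFrame(dictForUnion)

print(sorted(dFrameUnion.index))   # ['a', 'b', 'c', 'd', 'e', 'y', 'z']
print(list(dFrameUnion.columns))   # ['Series1', 'Series2']
print(dFrameUnion['Series1'].isna().sum())  # 2 (labels y and z)
print(dFrameUnion['Series2'].isna().sum())  # 2 (labels b and d)
```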
Operations in DataFrames
• Basic operations can be performed on rows and
columns of a DataFrame like
– Selection
– Deletion
– Addition
– Renaming
Selecting/Accessing a Subset
• To access rows, columns or a combination of both
from a DataFrame object, use the following syntax;
with .loc, both the start and the end labels are included:
<DataFrameObject>.loc[<startrow>:<endrow>,
<startcolumn>:<endcolumn>]
Accessing a Row
• To access a row, give its row label:
<DataFrameObject>.loc[<rowlabel>, :]
• Example:
ResultDF.loc['Maths',:]
• Output:
Arnab 90
Ramit 92
Samridhi 89
Riya 81
Mallika 94
Name: Maths, dtype: int64
Accessing Multiple Rows
• To access multiple rows, use:
<DataFrameObject>.loc[<startrow>:<endrow>,:]
• Example:
ResultDF.loc['Maths':'Hindi', :]
• Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
Accessing Selective Columns
• To access selected columns, use:
<DataFrameObject>.loc[:,<startcol>:<endcol>]
• Examples
ResultDF.loc[:,'Ramit':'Riya']
• Output:
Ramit Samridhi Riya
Maths 92 89 81
Science 81 91 71
Hindi 96 88 67
Accessing Range Columns/Rows
• To access a range of columns from a range of rows, use:
<DataFrameObject>.loc[<startrow>:<endrow>, <startcol>:<endcol>]
• Example
ResultDF.loc['Maths':'Hindi', 'Ramit':'Riya']
• Output
Ramit Samridhi Riya
Maths 92 89 81
Science 81 91 71
Hindi 96 88 67
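A sketch contrasting .loc with the position-based .iloc, which excludes the end position like ordinary Python slicing. The DataFrame is rebuilt from the ResultDF marks so the block runs on its own; the names by_label and by_position are illustrative.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96],
                         'Samridhi': [89, 91, 88], 'Riya': [81, 71, 67],
                         'Mallika': [94, 95, 99]},
                        index=['Maths', 'Science', 'Hindi'])

# .loc slices by label and includes both end labels
by_label = ResultDF.loc['Maths':'Science', 'Ramit':'Riya']
# .iloc slices by position and excludes the end position: rows 0-1, columns 1-3
by_position = ResultDF.iloc[0:2, 1:4]

print(by_label.shape, by_position.shape)  # (2, 3) (2, 3)
print(by_label.equals(by_position))       # True
```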
Add Column to DataFrame
• A new column can be added to a DataFrame by assigning a list of values to a new column label:
ResultDF['Preeti']=[89,78,76]
ResultDF
• Output
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
Add New Row to DataFrame
• A new row can be added to a DataFrame by assigning a
list of values to a new row label with DataFrame.loc[]:
ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
ResultDF
• Output
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
English 85 86 83 80 90 89
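Assigning to a label that already exists overwrites that row or column instead of adding a new one. A sketch on a smaller, illustrative version of ResultDF with only two students:

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

ResultDF['Preeti'] = [89, 78, 76]       # new column label: a column is added
ResultDF.loc['English'] = [85, 86, 89]  # new row label: a row is added
print(ResultDF.shape)                   # (4, 3)

ResultDF.loc['Maths'] = [0, 0, 0]       # existing row label: values are overwritten
print(ResultDF.shape)                   # still (4, 3)
print(ResultDF.loc['Maths'].tolist())   # [0, 0, 0]
```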
Deleting Rows & Columns from a DataFrame
• Rows and columns can be removed from a DataFrame
using the DataFrame.drop() method; axis=0 removes rows.
ResultDF = ResultDF.drop('Science', axis=0)
ResultDF
• Output
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Hindi 97 96 88 67 99 76
English 85 86 83 80 90 89
Deleting Rows & Columns from a DataFrame
• Columns are deleted with the same DataFrame.drop()
method, using axis=1:
ResultDF = ResultDF.drop(['Samridhi', 'Ramit', 'Riya'], axis=1)
ResultDF
• Output
Arnab Mallika Preeti
Maths 90 94 89
Hindi 97 99 76
English 85 90 89
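drop() returns a new DataFrame and leaves the original unchanged, which is why the examples above assign the result back to ResultDF. The index= and columns= keywords are an alternative to axis=. A sketch on a smaller illustrative frame; the name withoutScience is hypothetical.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

# drop() returns a new DataFrame; ResultDF itself is not modified
withoutScience = ResultDF.drop('Science', axis=0)
print(list(withoutScience.index))  # ['Maths', 'Hindi']
print(list(ResultDF.index))        # ['Maths', 'Science', 'Hindi']

# columns= is equivalent to passing the labels with axis=1
print(list(ResultDF.drop(columns=['Ramit']).columns))  # ['Arnab']
```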
Rename Rows Labels of a DataFrame
• Row labels of a DataFrame can be changed using the
DataFrame.rename() method.
ResultDF = ResultDF.rename({'Maths': 'Sub1', 'Science': 'Sub2',
                            'English': 'Sub3', 'Hindi': 'Sub4'}, axis='index')
ResultDF
• Output
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub3 97 96 88 67 99
Sub4 97 89 78 60 45
Rename Column Labels of a DataFrame
• Column labels of a DataFrame can be changed using the
same DataFrame.rename() method with axis='columns'.
ResultDF = ResultDF.rename({'Arnab': 'Student1', 'Ramit': 'Student2',
                            'Samridhi': 'Student3', 'Riya': 'Student4',
                            'Mallika': 'Student5'}, axis='columns')
ResultDF
• Output
Student1 Student2 Student3 Student4 Student5
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub3 97 96 88 67 99
Sub4 97 89 78 60 45
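By default, rename() silently skips keys that match no label, so a misspelled key such as 'Malika' for 'Mallika' leaves that column unchanged without any error. Passing errors='raise' makes Pandas raise a KeyError instead. A sketch on a small illustrative frame; the names renamed and typo_detected are hypothetical.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91], 'Mallika': [94, 95]}, index=['Sub1', 'Sub2'])

# Default errors='ignore': the misspelled key 'Malika' is skipped silently
renamed = ResultDF.rename({'Arnab': 'Student1', 'Malika': 'Student5'}, axis='columns')
print(list(renamed.columns))  # ['Student1', 'Mallika']

# errors='raise' reports labels that cannot be found
try:
    ResultDF.rename({'Malika': 'Student5'}, axis='columns', errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)  # True
```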
Creating DataFrame
• Q.1) Write the Python Code to create the DataFrame
that contains the following:
2020 2019 2018 2017
IP 100 99 100 96
CS 100 100 98 NaN
Maths 98 97 100 NaN
English 98 90 NaN NaN
• Also, print the DataFrame.
Creating DataFrame
• There are two ways to create this DataFrame:
– With the help of lists
– With the help of dictionaries
Creating DataFrame
• With the help of list
import pandas as pd
# One list per row; shorter rows are padded with NaN
L1 = [100, 99, 100, 96]
L2 = [100, 100, 98]
L3 = [98, 97, 100]
L4 = [98, 90]
data = pd.DataFrame([L1, L2, L3, L4], index=['IP', 'CS', 'Maths', 'English'],
                    columns=[2020, 2019, 2018, 2017])
print(data)
Creating DataFrame
• With the help of dictionary
import pandas as pd
# One dictionary per column (year); missing keys become NaN
D1 = {'IP': 100, 'CS': 100, 'Maths': 98, 'English': 98}
D2 = {'IP': 99, 'CS': 100, 'Maths': 97, 'English': 90}
D3 = {'IP': 100, 'CS': 98, 'Maths': 100}
D4 = {'IP': 96}
D = pd.DataFrame({2020: D1, 2019: D2, 2018: D3, 2017: D4})
print(D)
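Both approaches should produce the same table. A sketch that builds the two DataFrames and compares them; the values are converted to float and the dictionary version is reindexed to the row order of the list version first, so the comparison depends neither on integer versus float dtypes nor on the row order Pandas chooses.

```python
import pandas as pd

# List approach: one list per row
L1 = [100, 99, 100, 96]
L2 = [100, 100, 98]
L3 = [98, 97, 100]
L4 = [98, 90]
data = pd.DataFrame([L1, L2, L3, L4], index=['IP', 'CS', 'Maths', 'English'],
                    columns=[2020, 2019, 2018, 2017])

# Dictionary approach: one dictionary per column
D1 = {'IP': 100, 'CS': 100, 'Maths': 98, 'English': 98}
D2 = {'IP': 99, 'CS': 100, 'Maths': 97, 'English': 90}
D3 = {'IP': 100, 'CS': 98, 'Maths': 100}
D4 = {'IP': 96}
D = pd.DataFrame({2020: D1, 2019: D2, 2018: D3, 2017: D4})

# Shorter lists and missing keys both become NaN: 4 empty cells each
print(data.isna().sum().sum(), D.isna().sum().sum())  # 4 4
print(data.astype(float).equals(D.reindex(data.index).astype(float)))  # True
```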