Python 4

Pdf – PYTHON PANDAS
CLASS : XII 2022 - 23

PDF - 4
DATAFRAMES - continuation
DATAFRAME OPERATIONS on ROW and COLS :
# creating dataframe from Dictionary of Series

import pandas as pd
clas = pd.Series(["XII A", "XII B","XII C","XII D","XII E"])
name = pd.Series(["vikrant", "Kevin","Nitisha","Manoj","Artha"])
age = pd.Series([16,15,13,15,15])
oldschool = pd.Series(["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"])
dic = {"Class" : clas , "Name" : name , "Age" : age , "2018" : oldschool}
df = pd.DataFrame(dic , columns = ["Class","Name","Age","2018"])
print (df)
#1. Selecting /Accessing a single column / Slicing single column :

The square bracket notation, df["Class"] or df["2018"], works for every column name,
including names that begin with a digit or contain spaces.
The dot notation (df.Class) works only when the column name is a valid Python identifier.
The name "2018" is a string, but it is not a valid identifier, so df.2018 fails. Hence
we use the square bracket notation in general for all cases.
EXAMPLE - PROGRAM OUTPUT
print (df["Class"])
print (df["2018"])
print (df.Class)
print (df.Name)
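print (df.2018)
SyntaxError

Reason : the dot ( . ) notation works only when the column name is a valid Python
identifier; "2018" begins with a digit, so it must be accessed as df["2018"].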
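As a quick check of the two notations, the sketch below rebuilds a shortened version of the DataFrame (three rows only, chosen for illustration) and compares bracket access with dot access. The variable names by_bracket, by_dot and old_school are illustrative and not part of the notes.

```python
import pandas as pd

# Shortened version of the DataFrame used above (three rows only)
df = pd.DataFrame({
    "Class": ["XII A", "XII B", "XII C"],
    "Age": [16, 15, 13],
    "2018": ["APS BLR", "KV MEG", "APS ASC"],
})

by_bracket = df["Class"]     # bracket notation works for any column name
by_dot = df.Class            # dot notation works because "Class" is a valid identifier
old_school = df["2018"]      # "2018" needs brackets; df.2018 would be a SyntaxError

print(by_bracket.equals(by_dot))   # both notations return the same Series
print("2018".isidentifier())       # False, which is why the dot notation fails
```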
#2. Selecting / Accessing Multiple columns / Slicing multiple column :

To access multiple columns , you can give a list having multiple column names
inside the square brackets with dataframe object.

print(df[["Class","Age"]])

   Class  Age
0  XII A   16
1  XII B   15
2  XII C   13
3  XII D   15
4  XII E   15

print(df[["Class","Age","Address"]])
KeyError: "['Address'] not in index"
Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:
Write program to display the aid for

(i) Books and Uniform only
(ii) Shoes only
import pandas as pd
Andhra = {"Toys":7916,"Books":6189,"Uniform":610,"Shoes":8810} # dict 1
Odisha = {"Toys":8508,"Books":8208,"Uniform":508,"Shoes":6798} # dict 2
MP ={"Toys":7226,"Books":6149,"Uniform":611,"Shoes":9611} # dict 3
UP = {"Toys":7617,"Books":6157,"Uniform":457,"Shoes":6457} # dict 4
states = [Andhra, Odisha, MP, UP] # List of dictionaries
aid = pd.DataFrame(states , index = ['Andhra', 'Odisha', 'MP', 'UP'] )
print(aid)
print("Aid for books and uniform:")
print(aid[['Books','Uniform']])
print("Aid for shoes")
print(aid.Shoes)
Output
#3. Selecting / Accessing Multiple ROWS :
For accessing multiple rows of a dataframe , [ ] notation can be used.
# creating dataframe from Dictionary of lists

import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
dic = {"Class" : clas , "Name" : name , "Age" : age }
df = pd.DataFrame(dic , columns = ["Class","Name","Age"] , index=['S1','S2','S3','S4','S5'])
print (df)
print(df['S1':'S3'])
note : rows from label S1 to label S3 will be sliced and generated; with row labels,
the row at the end label (S3) is included.
print(df[1:3])
note : rows from position 1 to 3 will be sliced and generated; the row at position 3
will not be generated.
print(df[:3])
note : rows from the starting position to 3 will be sliced and generated; the row at
position 3 will not be generated.
print(df[2:])
note : rows from position 2 till the end of the dataframe will be sliced and generated.
print(df[0:4:2])
note : rows from position 0 to 4 with a step value of 2 (positions 0 and 2); position 4
will not be considered.
Note : don't use [ ] for accessing individual rows. df['S1'] is treated as a column
name and raises a KeyError.
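The difference between label slices and position slices can be checked with a small sketch. It uses a reduced two-column DataFrame for brevity; the names by_label, by_position and stepped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

by_label = df["S1":"S3"]    # label slice: the end label S3 is included
by_position = df[1:3]       # position slice: position 3 is excluded
stepped = df[0:4:2]         # positions 0 and 2

print(list(by_label.index))      # ['S1', 'S2', 'S3']
print(list(by_position.index))   # ['S2', 'S3']
print(list(stepped.index))       # ['S1', 'S3']
```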
#4. Selecting/Accessing a subset from a DataFrame using ROW/COLUMN NAMES :
To access a single row, or a combination of rows and columns, you can use the
following syntax to select/access from a DataFrame object.
<df>.loc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
a) To Access a single row :
<df>.loc[<row label> , :]
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic = {"Class" : clas , "Name" : name , "Age" : age , "2018" : oldschool}
df = pd.DataFrame(dic , index=['S1','S2','S3','S4','S5'] )
print (df)

print("The first row is \n" ,df.loc['S1',:])

The first row is
Class      XII A
Name     vikrant
Age           16
2018     APS BLR
Name: S1, dtype: object

Note : loc[] works with row and column labels.
( Writing only a colon after the comma retrieves all column values. Make sure not to
miss the COLON AFTER THE COMMA. )
OR
print("The first row is \n" , df.loc['S1'])
Retrieving a single row in this way returns the output in the form of a Series object.

print("The THIRD row is \n" ,df.loc['S3',:])
( Writing only a colon after the comma retrieves all column values. )

The THIRD row is
Class      XII C
Name     Nitisha
Age           13
2018     APS ASC
b) To Access multiple rows :

<df>.loc[<startrow> : <endrow> , :]
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic = {"Class" : clas , "Name" : name , "Age" : age , "2018" : oldschool}
df = pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)

print("The first three rows are \n" ,df.loc['S1' : 'S3',:])

The first three rows are
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG
S3  XII C  Nitisha   13  APS ASC

Note : loc[] works with labels; both the start label and the end label are included.
# Accessing Random Rows
print(df.loc[['S1','S3']])
c) To Access selective columns :

<df>.loc[ : , <startcolumn> : <endcolumn>]
print("The first two columns are \n" ,df.loc[: , 'Class' : 'Name'] )

The first two columns are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin
S3  XII C  Nitisha
S4  XII D    Manoj
S5  XII E    Artha

Note : loc[] works with labels.
( Writing only a colon before the comma retrieves all records. )
d) To Access Range of rows and Range of columns :

<df>.loc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
print("The first two columns and rows are \n" ,df.loc['S1':'S2' , 'Class' : 'Name'] )

The first two columns and rows are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin

Note : loc[] works with labels.
Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:
Write a program to display the aid for states “Andhra” and “Odisha” for
Books and Uniform only.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
aid = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
print(aid.loc['Andhra' : 'Odisha', 'Books' : 'Uniform'])
Output-
NOTE:- You may also specify distinct row index and column names as lists with
loc.
E.g.
aid.loc[ ['Andhra' , 'Odisha'] , ['Books' , 'Uniform'] ]
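A short sketch, using the same aid data as the solution above, suggests that the slice form and the list form of loc return the same subset when the labels are adjacent, and that the list form can also pick labels that are not adjacent. The names by_slice, by_list and non_adjacent are illustrative.

```python
import pandas as pd

aid = pd.DataFrame(
    [{"Toys": 7916, "Books": 6189, "Uniform": 610, "Shoes": 8810},
     {"Toys": 8508, "Books": 8208, "Uniform": 508, "Shoes": 6798},
     {"Toys": 7226, "Books": 6149, "Uniform": 611, "Shoes": 9611},
     {"Toys": 7617, "Books": 6157, "Uniform": 457, "Shoes": 6457}],
    index=["Andhra", "Odisha", "MP", "UP"],
)

# Label slices include both end points
by_slice = aid.loc["Andhra":"Odisha", "Books":"Uniform"]
# Lists name each row and column explicitly
by_list = aid.loc[["Andhra", "Odisha"], ["Books", "Uniform"]]
# Lists also allow labels that are not next to each other
non_adjacent = aid.loc[["Andhra", "UP"], ["Toys", "Shoes"]]

print(by_slice.equals(by_list))
print(non_adjacent)
```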
Selecting ROWS/COLUMNS from a DataFrame :

Sometimes your dataframe object does not contain row or column labels, or
you may not remember them. In such cases, you can extract a subset from the
dataframe using the numeric row and column positions, but this time
you will use iloc instead of loc. iloc means integer location.
<df>.iloc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
a) To Access a single row :
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic = {"Class" : clas , "Name" : name , "Age" : age , "2018" : oldschool}
df = pd.DataFrame(dic , index=['S1','S2','S3','S4','S5'])
print (df)

print("The SECOND row is \n" ,df.iloc[1,:])

The SECOND row is
Class     XII B
Name      Kevin
Age          15
2018     KV MEG

Note : iloc[] works with integer positions.
( Writing only a colon after the comma retrieves all column values. )
b) To Access multiple rows :

<df>.iloc[<startrow> : <endrow> , :]
( df is the same DataFrame as in (a) above )
print (df)

print("The first three rows are \n" ,df.iloc[0:3 ,:])

The first three rows are
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG
S3  XII C  Nitisha   13  APS ASC

Note : iloc[] works with integer positions; the row at the ending position will
not be retrieved.
# Accessing Random Rows
print(df.iloc[[0,3]])
c) To Access selective columns :

<df>.iloc[: , <startcolumn> : <endcolumn>]
print("The first two columns are \n" ,df.iloc[: , 0 : 2] )

The first two columns are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin
S3  XII C  Nitisha
S4  XII D    Manoj
S5  XII E    Artha

Note : iloc[] works with integer positions; the column at the ending position
will not be retrieved.
d) To Access Range of rows and Range of columns :

<df>.iloc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
print("The first two columns and rows are \n" ,df.iloc[0 : 2 , 0 : 2] )

The first two columns and rows are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin

Note : iloc[] works with integer positions; the row and the column at the ending
position will not be retrieved.

print("The columns and rows values are \n" ,df.iloc[0:3:2 , 0:3:2] )

The columns and rows values are
    Class  Age
S1  XII A   16
S3  XII C   13

Note : 0 is the starting position, 3 is the ending position (not inclusive) and 2 is
the step.
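The iloc rules above can be verified with a minimal sketch. It assumes a three-column version of the student DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

first_three = df.iloc[0:3, :]        # positions 0, 1, 2 (position 3 excluded)
corner = df.iloc[0:2, 0:2]           # first two rows and first two columns
stepped = df.iloc[0:3:2, 0:3:2]      # rows 0 and 2, columns 0 and 2
picked = df.iloc[[0, 3]]             # selected rows given as a list of positions

print(list(first_three.index))       # ['S1', 'S2', 'S3']
print(list(stepped.columns))         # ['Class', 'Age']
print(list(picked.index))            # ['S1', 'S4']
```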
#5. Selecting/Accessing individual value using column name and row name
To access an individual data value from a dataframe, we have 2 methods
(i) Providing Row label or row index in square brackets:
<df>.<column name>[<row label> or <row index>]
print("The value at the row number 2 of Age Column is :\t",df.Age[1])
The value at the row number 2 of Age Column is : 15
print("The value at the row number 2 of Age Column is :\t",df.Age["S2"])
The value at the row number 2 of Age Column is : 15
Note : when the rows have labels such as 'S1', df.Age[1] relies on pandas treating the
integer as a position. Newer pandas versions deprecate this behaviour; df.Age.iloc[1]
is the explicit form.
(ii) using 'at' or 'iat' :
<df>.at[<row label>,<column label>] # a form of [x,y]
<df>.iat[<row index>,<column index>]
print("The value at the S2 - Age is :\t",df.at["S2","Age"])
The value at the S2 - Age is : 15
print("The value at the 2,2 is :\t",df.iat[2,2])
The value at the 2,2 is : 13
Difference between at, iat, loc, iloc:

at –used to access a single element of a DataFrame using row index name and column
label name
iat - used to access a single element of a DataFrame using row index number and
column index number
loc – used to access a group of rows and columns using row index name and column
label name
iloc - used to access a group of rows and columns using row index number and
column index number
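The four accessors can be compared side by side in a small sketch. It assumes a three-row version of the student DataFrame; the names value_at, value_iat, block_loc and block_iloc are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C"],
     "Name": ["vikrant", "Kevin", "Nitisha"],
     "Age": [16, 15, 13]},
    index=["S1", "S2", "S3"],
)

value_at = df.at["S2", "Age"]                   # single value by labels
value_iat = df.iat[1, 2]                        # same cell by positions (row 1, column 2)
block_loc = df.loc["S1":"S2", "Class":"Name"]   # group by labels, end points included
block_iloc = df.iloc[0:2, 0:2]                  # same group by positions, end excluded

print(value_at, value_iat)
print(block_loc.equals(block_iloc))
```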
#6. Modifying a single data value :
All the four methods described previously to access individual values of a DataFrame
can also be used to change an individual value of a DataFrame.
print (df)
df.at["S1", "Class"] = "XII F"
print(df)
( uses row label and column name )

    Class     Name  Age      2018
S1  XII F  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

df.iat[0,0] = "XII A"
print(df)
( uses row index and column index; row index and column index start at zero by default )
df.Age["S1"] = 20
print(df)
df.Age[2] = 16
print(df)
( Note : df.Age["S1"] = 20 is a chained assignment. Recent pandas versions may show a
warning or leave df unchanged, so df.at["S1","Age"] = 20 is the safer form. )
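A minimal sketch of changing single values with at and iat follows. It deliberately avoids the df.Age["S1"] = 20 form, because chained assignment of that kind may not update the DataFrame in recent pandas versions; this is a suggested alternative, not the method shown in the notes.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C"],
     "Name": ["vikrant", "Kevin", "Nitisha"],
     "Age": [16, 15, 13]},
    index=["S1", "S2", "S3"],
)

df.at["S1", "Class"] = "XII F"   # row label and column name
df.iat[0, 0] = "XII A"           # row position and column position
df.at["S1", "Age"] = 20          # label-based replacement for df.Age["S1"] = 20
df.iat[2, 2] = 16                # position-based replacement for df.Age[2] = 16

print(df)
```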
#7. Adding / Changing a column (same value in all rows)
Assigning a value to a column adds a new column (if it doesn't exist) or modifies
the values of the column (if it exists).
<df>['column name'] = <value>
# creating dataframe from Dictionary of lists
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
eng = [76,75,73,85,95]
phy = [86,85,53,95,65]
maths = [66,95,63,75,65]
dic = {'Class':clas,'Name':name ,'Eng':eng,'Phy':phy,'Maths':maths}
df = pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)
df['Phy'] = 70
print(df)
( Note : since the 'Phy' column already exists in the dataframe, the value of that
column is updated to 70 for all rows. )
df['Phy'] = [51,52]
print(df)
ValueError: Length of values (2) does not match length of index (5)
df['Chem'] = 70
print(df)
( Note : creates a new column 'Chem' and fills the value 70 for all rows of the
dataframe )
#8 . Adding / Changing a column (different values in all rows)
<df>['column name'] = [ list of elements ]

df['Chem'] = [70 , 80 ,90,95, 56]
print(df)
( Note : creates the column 'Chem', or replaces it if it already exists, and fills in
the values as per the list of elements )
df['Chem'] = [70 , 80 ,90,95, 56,82]
print(df)
ValueError: Length of values (6) does not match length of index (5)
( note : giving fewer / more values than the number of rows raises an error )
df['Total'] = df['Eng']+df['Phy']+df['Maths']+df['Chem']
print(df)
( Note : creates a new column 'Total' and fills it by adding Eng, Phy, Maths and Chem )
df.loc[:,'Grade'] = ['a1','a2','a1','b1','b2']
print(df)
[ Alternate method : adding a new column with loc ]
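The column rules above can be tried out in a small sketch. It assumes a three-row marks table (the first three students) and an illustrative variable length_error that captures the error message.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [76, 75, 73], "Phy": [86, 85, 53], "Maths": [66, 95, 63]},
    index=["S1", "S2", "S3"],
)

df["Chem"] = 70                                   # one value is repeated for every row
df["Total"] = df["Eng"] + df["Phy"] + df["Maths"] + df["Chem"]

length_error = None
try:
    df["Phy"] = [51, 52]                          # 2 values for 3 rows
except ValueError as err:
    length_error = str(err)                       # df is left unchanged

print(df)
print(length_error)
```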
#9 . Adding / Changing a row (same values in all columns)
Like columns, we can add / change rows of a DataFrame using the loc attribute (older
pandas versions also accepted the at attribute in the same form)
<df>.at[<row label> , :] = <new value >
<df>.loc[<row label> , :] = <new value >
Note :
If a row with the mentioned row label exists, the values of that row are replaced
with the specified value; otherwise a new row is created with that label and filled
with that value.
df.at['S6',:] = 'XII F'
print(df)
( note : a new row with row label 'S6' will be created with all column values as XII F )
df.loc['S6',:] = 'XII G'
print(df)
( note : the existing row with row label 'S6' is replaced with XII G in all columns )
Note : at is designed to access a single value ( one row label and one column label ),
whereas loc handles whole rows and columns. Recent pandas versions may raise an error
for df.at['S6',:] , so prefer loc when adding or changing a row.
#10 . Adding / Changing a row (different values in different columns)
df.loc['S6',:] = [ 'XII G', 'anju', 55.0, 89.0, 100.0, 23.0, 279.0, 'b2']
print(df)
( note : the existing row with row label 'S6' is replaced with the list of elements,
one element per column )
( note : giving fewer / more values than the number of columns raises an error )
( note : older pandas versions also allowed at here; loc is the safer choice )
Note :
If a row with the mentioned row label exists, the values of that row are replaced
with the specified values; otherwise a new row is created with that label and filled
with those values.
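The add-or-replace behaviour of loc on rows can be sketched as follows, assuming a two-row version of the saleDf table used in the next question; the row values are taken from that table.

```python
import pandas as pd

saleDf = pd.DataFrame(
    {"Target": [56000, 70000], "Sales": [58000, 68000]},
    index=["zoneA", "zoneB"],
)

saleDf.loc["zoneC"] = [75000, 78000]   # label does not exist: a new row is added
saleDf.loc["zoneA"] = [60000, 61000]   # label exists: the row is overwritten

print(saleDf)
```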
Q- Consider the following dataframe saleDf:

       Target  Sales
zoneA   56000  58000
zoneB   70000  68000
zoneC   75000  78000
zoneD   60000  61000
Write a program to add a column namely Orders having values 6000, 6700,
6200 and 6000 respectively for the zones A,B,C and D. The program should
also add a new row for a new zone ZoneE.
Solution-
import pandas as pd
zoneA={'Target':56000, 'Sales':58000}
zoneB={'Target':70000, 'Sales':68000}
zoneC={'Target':75000, 'Sales':78000}
zoneD={'Target':60000, 'Sales':61000}
zones=[ zoneA , zoneB , zoneC , zoneD ]
saleDf = pd.DataFrame(zones , index=['zoneA' , 'zoneB' , 'zoneC' ,'zoneD'] ,
columns=[ 'Target' , 'Sales' ])
saleDf['Orders'] = [6000, 6700, 6200, 6000]
saleDf.loc['zoneE', :] = [ 50000 , 45000, 5000]
print(saleDf)
Output:-
#11. Deleting an existing column / row from a DataFrame
There are three ways of deleting a column from a DataFrame:
a) using the Python del statement as: del <df>[columnname]

The del statement is used to delete a single column from a DataFrame.
b) using the dataframe drop() method :
<df>.drop( [row/col labels] , axis = 0 or 1 , inplace = True / False )
The drop method can be used to delete rows (axis=0) or columns (axis=1).
The first parameter is a list containing either the row labels or the column labels.
The parameter inplace=True is used to modify the existing dataframe df itself.
If this parameter is not specified or is False, then the dataframe df is not
modified; instead drop() returns a new dataframe with the modifications.
c) Using the pop('columnname') method
The pop() method is used to delete a single column from a DataFrame. In addition,
the column that was deleted is returned as a Series object.
Sample DataFrame ( df ) =
    Class     Name Age      2018 Marks
S1  XII A  vikrant  16   APS BLR    70
S2  XII B    Kevin  15    KV MEG    80
S3  XII C  Nitisha  13   APS ASC    90
S4  XII D    Manoj  15  APS PRTC    89
S5  XII E    Artha  15  APS PUNE    67
S6  XII F    ANOOP  20    APS KK    78

del df['Marks']
print(df)
( deletes the column named Marks from the DataFrame df )

    Class     Name Age      2018
S1  XII A  vikrant  16   APS BLR
S2  XII B    Kevin  15    KV MEG
S3  XII C  Nitisha  13   APS ASC
S4  XII D    Manoj  15  APS PRTC
S5  XII E    Artha  15  APS PUNE
S6  XII F    ANOOP  20    APS KK

s = df.pop('Age')
print(df)
( deletes the column named Age from the DataFrame df and stores it in a Series object
named 's' )

    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

print(s)
( s is a Series that contains the column removed by the pop( ) method in the previous
command )

S1    16
S2    15
S3    13
S4    15
S5    15
S6    20
Name: Age, dtype: object
Sample DataFrame ( df ) =
    Class     Name Age      2018
S1  XII A  vikrant  16   APS BLR
S2  XII B    Kevin  15    KV MEG
S3  XII C  Nitisha  13   APS ASC
S4  XII D    Manoj  15  APS PRTC
S5  XII E    Artha  15  APS PUNE
S6  XII F    ANOOP  20    APS KK

df.drop(['Age'], axis=1, inplace=True)
print(df)
( deletes the column Age from axis 1 ( column axis ) and modifies the existing
dataframe itself because inplace=True )

    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

df2 = df.drop( ['Age'], axis=1)
( applied to the sample DataFrame above : deletes the column Age from axis 1 ( column
axis ) and does not modify the existing dataframe because inplace is False by
default; the modified dataframe is stored in df2 )
print("The original dataframe is: \n",df)
The original dataframe is:
( same as the sample DataFrame, with the columns Class Name Age 2018 )
print("The modified dataframe is: \n", df2)
( df2 is created from the dataframe df by the drop( ) method )
The modified dataframe is:
    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

df.drop(['S6'], axis=0, inplace=True)
print("The modified dataframe is \n:",df)
( the row with label S6 will be removed from this dataframe df. )
The modified dataframe is
    Class     Name Age      2018
S1  XII A  vikrant  16   APS BLR
S2  XII B    Kevin  15    KV MEG
S3  XII C  Nitisha  13   APS ASC
S4  XII D    Manoj  15  APS PRTC
S5  XII E    Artha  15  APS PUNE
Note :
The drop( ) method of the DataFrame is a common method for removing
columns ( axis = 1) and rows ( axis = 0 ) , use the axis parameter as per
requirement .
If multiple rows / columns are to be deleted then the first parameter must contain the
list of row names / column names to be deleted .
Ex : df.drop(['S6', 'S5', 'S1'], axis=0, inplace=True)
df.drop(['Age', 'Class'], axis=1, inplace=True)
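The three deletion methods can be compared in one short sketch. It assumes a three-row DataFrame with illustrative Marks values; df2 and marks are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C"],
     "Age": [16, 15, 13],
     "Marks": [70, 80, 90]},
    index=["S1", "S2", "S3"],
)

df2 = df.drop(["Age"], axis=1)           # inplace not given: df stays unchanged
marks = df.pop("Marks")                  # removes the column and returns it as a Series
df.drop(["S3"], axis=0, inplace=True)    # removes the row S3 from df itself
del df["Age"]                            # removes a single column from df

print(df2)
print(marks)
print(df)
```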
Q- Given a DataFrame df namely aid that stores the aid by NGOs for
different states:
Modify the DataFrame so that it must not contain the column 'Uniform' and
row 'Odisha'.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
df = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
del df['Uniform']
df.drop(['Odisha'],inplace = True)
print(df)
Output-
#12 . head( ) and tail( ) functions
The head() function is used to retrieve the top rows of a DataFrame whereas the tail()
function is used to retrieve the bottom rows of a DataFrame. If no parameter is passed,
then it retrieves the top 5 or bottom 5 rows.
If a positive value, n, is passed to the head function then it retrieves the top n rows. If a
negative n is passed to the head function, then it returns all the rows except the last n
rows.
Similarly, if a positive value, n, is passed to the tail function then it retrieves the bottom
n rows of the DataFrame. If a negative n is passed to the tail function then all the rows
except the first n rows are retrieved.
These functions are useful for quickly verifying the data for example after sorting or
adding rows.
DataFrame , df =
    Class     Name Age      2018
S1  XII A  vikrant  16   APS BLR
S2  XII B    Kevin  15    KV MEG
S3  XII C  Nitisha  13   APS ASC
S4  XII D    Manoj  15  APS PRTC
S5  XII E    Artha  15  APS PUNE

print(df.head(2))
( will retrieve first 2 rows : S1 and S2 )
print(df.tail(2))
( will retrieve last 2 rows : S4 and S5 )
print(df.head(-2))
( will retrieve all rows except last 2 rows : S1 , S2 and S3 )
print(df.tail(-2))
( will retrieve all rows except first 2 rows : S3 , S4 and S5 )
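The head( ) and tail( ) rules can be confirmed with a minimal sketch on a five-row, single-column DataFrame (an illustrative reduction of the student data).

```python
import pandas as pd

df = pd.DataFrame({"Age": [16, 15, 13, 15, 15]},
                  index=["S1", "S2", "S3", "S4", "S5"])

top_two = list(df.head(2).index)          # first 2 rows
bottom_two = list(df.tail(2).index)       # last 2 rows
all_but_last_two = list(df.head(-2).index)    # everything except the last 2 rows
all_but_first_two = list(df.tail(-2).index)   # everything except the first 2 rows

print(top_two, bottom_two)
print(all_but_last_two, all_but_first_two)
print(len(df.head()))                     # no argument: up to 5 rows
```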
#13 . Renaming Rows / Columns in DataFrame :
To change the name of any row / column individually, you can use the
rename() function of DataFrame with the syntax given below.
<df>.rename(index={<names dictionary>}, columns={<names dictionary>} ,
inplace = True / False )
OR
<df>.rename({<names dictionary>}, axis='index' or 'columns', inplace = True / False )
The index parameter is used to change row labels.
The columns parameter is used to change column labels.
inplace = True renames the specified column / row in the existing dataframe.
inplace = False (or not provided) leaves the existing dataframe unchanged and
returns a new DataFrame with the changes, which you can assign to a name of your
choice.
axis can be either 'index' or 'columns'. If row labels are to be renamed, use
'index'; if column labels are to be renamed, use 'columns'.
import pandas as pd
dic={'Name':["Anoop","Priya","Santosh"],'Age':[15,16,17]}
df= pd.DataFrame(dic)
print("The original dataframe df: \n",df)

The original dataframe df:
      Name  Age
0    Anoop   15
1    Priya   16
2  Santosh   17

df.rename(index={0:'A',1:'B',2:'C'},columns={'Name':'SName'}, inplace=True)
print("The dataframe df with renamed column and row is : \n",df)
( providing inplace = True will make changes in 'df' itself )

The dataframe df with renamed column and row is :
     SName  Age
A    Anoop   15
B    Priya   16
C  Santosh   17

df1=df.rename(index={'A':'A1','B':'B1','C':'C1'}, columns = {'SName':'Stud Name','Age':'SAge'})
print("The dataframe df1 with renamed column and row is : \n",df1)
( Not providing inplace will make the changes in a new dataframe 'df1' as assigned by
the user )

The dataframe df1 with renamed column and row is :
   Stud Name  SAge
A1     Anoop    15
B1     Priya    16
C1   Santosh    17

print("The dataframe df is : \n",df )
( the renaming in the previous step is reflected only in the dataframe named 'df1' ;
df remains intact )

The dataframe df is :
     SName  Age
A    Anoop   15
B    Priya   16
C  Santosh   17
import pandas as pd
dic={'Name':["Anoop","Priya","Santosh"],'Age':[15,16,17]}
df= pd.DataFrame(dic)
print(df)
print()
df.rename({'Name' :"NAM"}, axis ='columns',inplace=True)

print(df)
print()
df.rename({0 :"A1"}, axis ='index',inplace=True)

print(df)
Note : inplace = True will rename the specified column / row in

the existing dataframe itself.
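The effect of inplace can be checked with a short sketch, assuming a two-row version of the DataFrame above; renamed is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anoop", "Priya"], "Age": [15, 16]})

# Without inplace: a new DataFrame is returned and df is left untouched
renamed = df.rename(index={0: "A"}, columns={"Name": "SName"})

# With inplace=True: df itself is changed (and nothing is returned)
df.rename({"Age": "SAge"}, axis="columns", inplace=True)

print(renamed)
print(df)
```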
Q – Consider the saleDf shown below:-

       Target  Sales
zoneA   56000  58000
zoneB   70000  68000
zoneC   75000  78000
zoneD   60000  61000
Modify the saleDf to rename indexes of 'zoneC' and 'zoneD' as 'Central' and
'Dakshin' respectively and the column names 'Target' and 'Sales' as 'Targeted'
and 'Achieved' respectively.
Solution-
import pandas as pd
zoneA={'Target':56000, 'Sales':58000}
zoneB={'Target':70000, 'Sales':68000}
zoneC={'Target':75000, 'Sales':78000}
zoneD={'Target':60000, 'Sales':61000}
zones=[ zoneA , zoneB , zoneC , zoneD ]
saleDf = pd.DataFrame(zones , index=['zoneA' , 'zoneB' , 'zoneC' ,'zoneD'] ,
columns=[ 'Target' , 'Sales' ])
saleDf.rename(index={'zoneC' : 'Central' , 'zoneD' : 'Dakshin'}, columns={
'Target' : 'Targeted' , 'Sales' : 'Achieved' } , inplace=True)
print(saleDf)
The above topics of accessing data worked with the concept of label based indexing.
Apart from accessing the data of a DataFrame by using a label index, data can also be
accessed by boolean index, which is discussed below.
Boolean Indexing in Data Frame
Instead of selecting data on the basis of row or column labels (labelled indexing,
discussed above), we can also select data based on the values present in the
dataframe.
Boolean indexing helps us to select data from a DataFrame using a boolean vector.
Here we create a DataFrame with a boolean index to use boolean indexing.
The Boolean values True and False (and, in some pandas versions, 1 and 0) can be used
as index labels in a pandas DataFrame. They can help us filter out the required
records.
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic = {"Class" : clas , "Name" : name , "Age" : age , "2018" : oldschool}
df = pd.DataFrame (dic , columns = ["Class","Name","Age","2018"] ,
index=[True,False,True,False,True])
print (df)
print (df.loc[0])
Extracts rows with index 'False' ( relies on 0 being equal to False in Python; this
may raise a KeyError in recent pandas versions )
print (df.loc[1])
Extracts rows with index 'True' ( relies on 1 being equal to True in Python; this
may raise a KeyError in recent pandas versions )
print (df.loc[True])
Extracts rows with index 'True'
print (df.loc[False])
Extracts rows with index 'False'
print (df.iloc[0])
Extracts the row at integer position 0. Not suitable in this topic of boolean
indexing.
print (df.iloc[1])
Extracts the row at integer position 1. Not suitable in this topic of boolean
indexing.
print (df.iloc[True])
TypeError: Cannot index by location index with a non-integer key
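Selecting data based on the values present in the DataFrame is usually done with a boolean vector (a mask) built from a condition. The sketch below is one common way to do this and is not taken from the notes; the condition Age >= 15 and the names mask, older and names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

mask = df["Age"] >= 15          # boolean Series: True where the condition holds
older = df[mask]                # keeps only the rows where mask is True
names = df.loc[mask, "Name"]    # the same mask combined with a column label

print(list(mask))
print(older)
print(list(names))
```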
APPENDING DATAFRAME:
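The dataframe.append() function is used to append rows of another dataframe to the end
of the given dataframe, returning a new dataframe object. Columns not in the original
dataframes are added as new columns and the new cells are populated with NaN values.
Note : DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. In
newer versions, pd.concat([df1, df2]) gives the same result.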
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic = { "Class" : clas , "Name" : name , "Age" : age , "Old_ School" : oldschool }
df1 = pd.DataFrame (dic )
print (df1)
print()
df2 = pd.DataFrame (dic )
print (df2)
print()
df3 = df1.append(df2)
print(df3)
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic1 = {"Class" : clas , "Name" : name , "Age" : age }
dic2 = {"Class" : clas , "Name" : name ,"Old_ School" : oldschool}
df1 = pd.DataFrame (dic1 )
print (df1)
print()
df2 = pd.DataFrame (dic2 )
print (df2)
print()
df3 = df1.append(df2)
print(df3)
df3 = df1.append(df2, ignore_index = True)
print(df3)
Note : with ignore_index = True, a continuous index value is maintained across the
rows of the new appended data frame.
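Since append() is not available in pandas 2.0 and later, the sketch below shows the same idea with pd.concat, which is a substitute for the method used in the notes. The two small DataFrames and the column name Old_School are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Class": ["XII A", "XII B"], "Age": [16, 15]})
df2 = pd.DataFrame({"Class": ["XII C"], "Old_School": ["APS ASC"]})

stacked = pd.concat([df1, df2])                        # keeps each frame's own index
renumbered = pd.concat([df1, df2], ignore_index=True)  # continuous index 0, 1, 2

print(stacked)
print(renumbered)   # cells with no source value are filled with NaN
```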
ITERATING OVER A DATAFRAME
Generally, if some columns of a DataFrame need to be worked on, the columns
are extracted using df[column_name] or an equivalent method. If some
processing on rows needs to be performed, df.loc or df.iloc is used.
Sometimes we need to process all the data values of a dataframe. Writing individual
statements to access / select individual values makes the program lengthy. To
avoid writing a huge program, we apply the concept of iteration /
looping over a dataframe. The most popular methods used in iteration are
df.items() ( called df.iteritems() in older pandas versions ) and df.iterrows().
The df.iterrows() method views a dataframe in the form of horizontal subsets (row wise)
and the df.items() method views a dataframe in the form of vertical subsets (column wise).
Each horizontal subset is of the form ( row index , series ) , where the series contains
all column values of that row index.
Each vertical subset is of the form ( column index , series ) , where the series contains
all row values of that column index.
Methods :
1. Iterate directly over a DataFrame
2. Use the df.items() method ( df.iteritems() in older pandas versions )
3. Use the df.iterrows() method
4. Use the df.itertuples() method
a) Iterating directly over a DataFrame

Iterating directly over a DataFrame returns the column names.
import pandas as pd
d={ 'name': ['abc','def','ghi'], 'age': [19,20,21] ,
'hobby':['reading' , 'playing', 'gardening']}
df=pd.DataFrame(d , index=['s1','s2','s3'])
print(df)

   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

print('Iterating directly over a DataFrame')
for i in df:
    print(i)
( the for loop retrieves only the column names of the dataframe )

Iterating directly over a DataFrame
name
age
hobby
b) Using the df.items() method
The df.items() method ( named df.iteritems() in older pandas versions; iteritems() was
removed in pandas 2.0 ) returns two objects at each iteration: the first one is the
column name and the second one is a Series object having all the values of that
particular column.

d={ 'name': ['abc','def','ghi'] ,'age': [19,20,21] ,
'hobby':['reading', 'playing', 'gardening'] }
df=pd.DataFrame(d, index=['s1','s2','s3'])
print(df)

   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

for cname, cseries in df.items():
    print('cname:',cname)
    print('cseries:\n',cseries)
( cname and cseries are user defined variables. As said in the above definition,
cname receives the column name one at a time and cseries holds the values under that
column name in the form of a series. )

cname: name
cseries:
s1    abc
s2    def
s3    ghi
Name: name, dtype: object
cname: age
cseries:
s1    19
s2    20
s3    21
Name: age, dtype: int64
cname: hobby
cseries:
s1      reading
s2      playing
s3    gardening
Name: hobby, dtype: object
c) Using the df.iterrows() method
Using the df.iterrows() method we get back two objects at each iteration: the first
object is the row label or index and the second object is a Series object containing
the elements of one particular row.
The Series object has the column names as its index, and its values are the values
under those columns for that particular row.

d={ 'name': ['abc','def','ghi'],'age': [19,20,21],
'hobby':['reading', 'playing', 'gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)

   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

print('Using iterrows')
for rname, rseries in df.iterrows():
    print('rname:',rname)
    print('rseries:\n',rseries)
( rname and rseries are user defined variables. As said in the above definition,
rname receives the row name one at a time and rseries holds the elements of that row
in the form of a series. )

Using iterrows
rname: s1
rseries:
name         abc
age           19
hobby    reading
Name: s1, dtype: object
rname: s2
rseries:
name         def
age           20
hobby    playing
Name: s2, dtype: object
rname: s3
rseries:
name           ghi
age             21
hobby    gardening
Name: s3, dtype: object
d) Using the df.itertuples() method
The df.itertuples() method returns a named tuple for each row of the DataFrame.
The first element of the named tuple is the row label and the remaining elements are the
values under different columns for that particular row.

d={ 'name': ['abc','def','ghi'],'age': [19,20,21] ,
'hobby': ['reading', 'playing', 'gardening' ]}
df=pd.DataFrame(d, index=['s1','s2','s3'])
print(df)

   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

for r in df.itertuples():
    print(r)

Pandas(Index='s1', name='abc', age=19, hobby='reading')
Pandas(Index='s2', name='def', age=20, hobby='playing')
Pandas(Index='s3', name='ghi', age=21, hobby='gardening')
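To show the iteration methods doing some actual processing, the sketch below collects the column names, adds up the ages row by row, and builds a small lookup of hobbies. The variables column_names, total_age, hobbies and counts are illustrative and not part of the notes.

```python
import pandas as pd

d = {"name": ["abc", "def", "ghi"], "age": [19, 20, 21],
     "hobby": ["reading", "playing", "gardening"]}
df = pd.DataFrame(d, index=["s1", "s2", "s3"])

# Direct iteration yields the column names
column_names = [c for c in df]

# iterrows(): ( row label , Series of that row )
total_age = 0
for rname, rseries in df.iterrows():
    total_age += rseries["age"]

# itertuples(): one named tuple per row, fields accessed by attribute
hobbies = {r.Index: r.hobby for r in df.itertuples()}

# items(): ( column name , Series of that column )
counts = {cname: len(cseries) for cname, cseries in df.items()}

print(column_names, total_age)
print(hobbies)
print(counts)
```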
************* COMPLETED THE ABOVE TOPIC *****************
Note :
1. The above mentioned should be written in the Informatics Note book as a continuity
of the previous notes. ( No Printout Allowed )
2. The entire topic will be discussed in the class.
3. Mistakes / corrections ( if any ) will be rectified during class room discussion.
**********************************************************************
