
Pdf – PYTHON PANDAS

CLASS : XII 2022 - 23


PDF - 4
DATAFRAMES - continuation
DATAFRAME OPERATIONS on ROW and COLS :

# creating dataframe from Dictionary of Series


import pandas as pd
clas = pd.Series(["XII A", "XII B", "XII C", "XII D", "XII E"])
name = pd.Series(["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"])
age = pd.Series([16, 15, 13, 15, 15])
oldschool = pd.Series(["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"])
dic = {"Class": clas, "Name": name, "Age": age, "2018": oldschool}
df = pd.DataFrame(dic, columns=["Class", "Name", "Age", "2018"])
print(df)

#1. Selecting /Accessing a single column / Slicing single column :


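The square bracket notation, e.g. df["Class"] or df["2018"], works for every column name, whether it is a word such as "Class" or a name made of digits such as "2018".
The dot notation, e.g. df.Class, works only when the column name is a valid Python identifier (it must not start with a digit or contain spaces) and does not clash with an existing DataFrame attribute or method. Hence we use the square bracket notation in general for all cases.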

EXAMPLE - PROGRAM OUTPUT

print (df["Class"])

print (df["2018"])

print (df.Class)

print (df.Name)

print (df.2018)
SyntaxError

Reason : dot ( . ) notation works only when the column name is a valid Python identifier. "2018" begins with a digit, so use df["2018"] instead.
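The rule above can be checked with a small sketch. The two-row DataFrame and the variable names here are illustrative assumptions, not part of the original program; str.isidentifier() is a standard Python string method.

```python
import pandas as pd

# Small frame with one identifier-like column name and one digit-only name.
df = pd.DataFrame({"Class": ["XII A", "XII B"], "2018": ["APS BLR", "KV MEG"]})

# Bracket notation works for both column names.
by_bracket = df["2018"]

# Dot notation works for "Class" because it is a valid Python identifier.
by_dot = df.Class

# "2018" is not a valid identifier, so df.2018 cannot even be parsed.
# str.isidentifier() shows which column names are safe for dot notation.
dot_friendly = [c for c in df.columns if c.isidentifier()]
print(dot_friendly)
```

Only the names listed in dot_friendly can be written after the dot; every name can be used inside square brackets.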

#2. Selecting / Accessing Multiple columns / Slicing multiple column :


To access multiple columns , you can give a list having multiple column names
inside the square brackets with dataframe object.

EXAMPLE - PROGRAM OUTPUT


print(df[["Class","Age"]])
Output :
   Class  Age
0  XII A   16
1  XII B   15
2  XII C   13
3  XII D   15
4  XII E   15

print(df[["Class","Age","Address"]])
KeyError: "['Address'] not in index"

Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:

Write program to display the aid for


(i) Books and Uniform only
(ii) Shoes only
import pandas as pd
Andhra = {"Toys":7916,"Books":6189,"Uniform":610,"Shoes":8810} # dict 1
Odisha = {"Toys":8508,"Books":8208,"Uniform":508,"Shoes":6798} # dict 2
MP ={"Toys":7226,"Books":6149,"Uniform":611,"Shoes":9611} # dict 3
UP = {"Toys":7617,"Books":6157,"Uniform":457,"Shoes":6457} # dict 4
states = [Andhra, Odisha, MP, UP] # List of dictionaries
aid = pd.DataFrame(states , index = ['Andhra', 'Odisha', 'MP', 'UP'] )
print(aid)
print("Aid for books and uniform:")
print(aid[['Books','Uniform']])
print("Aid for shoes")
print(aid.Shoes)
Output

#3. Selecting / Accessing Multiple ROWS :

To access multiple rows of a dataframe, a slice can be given inside the [ ] notation, as in df[start : stop].

# creating dataframe from Dictionary of lists


import pandas as pd
clas = ["XII A", "XII B", "XII C", "XII D", "XII E"]
name = ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"]
age = [16, 15, 13, 15, 15]
dic = {"Class": clas, "Name": name, "Age": age}
df = pd.DataFrame(dic, columns=["Class", "Name", "Age"],
                  index=['S1', 'S2', 'S3', 'S4', 'S5'])
print(df)
EXAMPLE - PROGRAM OUTPUT
print(df['S1':'S3'])
note : rows with labels S1 to S3 are sliced and returned; with a label slice the end label (S3) is included.

print(df[1:3])
note : rows at positions 1 and 2 are sliced and returned; the row at position 3 is not included.

print(df[:3])
note : rows from the first position up to position 2 are sliced and returned; the row at position 3 is not included.

print(df[2:])
note : rows from position 2 till the end of the dataframe are sliced and returned.

print(df[0:4:2])
note : rows from position 0 up to (but not including) position 4 are taken with a step value of 2, i.e. the rows at positions 0 and 2.

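Note : don't use [ ] with a single row label or number to access an individual row. df['S1'] is treated as a column name and raises a KeyError; use loc or iloc instead.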
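The difference between label slices and position slices is easy to miss, so here is a minimal sketch. It assumes a cut-down copy of the DataFrame above (two columns only); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# Label slice: both end points are included.
by_label = df["S1":"S3"]

# Position slice: the stop position is excluded, as with Python lists.
by_position = df[1:3]

# A step can be given as the third part of the slice.
every_second = df[0:4:2]

print(list(by_label.index))
print(list(by_position.index))
print(list(every_second.index))
```

The three printed lists show which row labels each slice keeps.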

#4. Selecting/Accessing a subset from a DataFrame using ROW/COLUMN NAMES :
To access a single row, multiple rows, or a combination of rows and columns, you can use the following syntax on a DataFrame object.
<df>.loc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
a) To Access a single row :

<df>.loc[<row label> , :]

EXAMPLE - PROGRAM OUTPUT
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]

age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC",
"APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" : age ,
"2018" : oldschool}
df=pd.DataFrame(dic , index=['S1','S2','S3','S4','S5'] )
print (df)

print("The first row is \n" ,df.loc['S1',:])
Note : loc[] works with label-based indexing. (Writing only : after the comma retrieves the values of all columns.) Make sure not to miss the COLON AFTER COMMA.
Output :
The first row is
Class      XII A
Name     vikrant
Age           16
2018     APS BLR
Name: S1, dtype: object

OR

print("The first row is \n" , df.loc['S1'])

Retrieving a single row in this way returns the output in the form of a Series object.

print("The THIRD row is \n" ,df.loc['S3',:])
( Writing only : after the comma retrieves the values of all columns )
Output :
The THIRD row is
Class      XII C
Name     Nitisha
Age           13
2018     APS ASC
Name: S3, dtype: object

b) To Access multiple rows :


<df>.loc[<startrow> : <endrow> , :]

import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic = {"Class": clas, "Name": name, "Age": age, "2018": oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)

print("The first three rows are \n" ,df.loc['S1' : 'S3',:])
Note : loc[] works with label-based indexing; both the start label and the end label are included.
Output :
The first three rows are
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG
S3  XII C  Nitisha   13  APS ASC

# Accessing Random Rows

print(df.loc[['S1','S3']])

c) To Access selective columns :


<df>.loc[ : , <startcolumn> : <endcolumn>]

print("The first two columns are \n" ,df.loc[: , 'Class' : 'Name'] )
Note : loc[] works with label-based indexing. (Writing only : before the comma retrieves all records.)
Output :
The first two columns are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin
S3  XII C  Nitisha
S4  XII D    Manoj
S5  XII E    Artha

d) To Access Range of rows and Range of columns :


<df>.loc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]

print("The first two columns and rows are \n" ,df.loc['S1':'S2' , 'Class' : 'Name'] )
Note : loc[] works with label-based indexing.
Output :
The first two columns and rows are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin

Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:

Write a program to display the aid for states “Andhra” and “Odisha” for
Books and Uniform only.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
aid = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
print(aid.loc['Andhra' : 'Odisha', 'Books' : 'Uniform'])
Output-

NOTE:- You may also specify distinct row index and column names as lists with
loc.
E.g.
aid.loc[ ['Andhra' , 'Odisha'] , ['Books' , 'Uniform'] ]

Selecting ROWS/COLUMNS from a DataFrame :


Sometimes your dataframe object does not contain row or column labels, or you may not remember them. In such cases, you can extract a subset from the dataframe using the numeric row and column index / position, but this time you will use iloc instead of loc. iloc means integer location.
<df>.iloc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]
a) To Access a single row :

EXAMPLE - PROGRAM OUTPUT

import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]

age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC",
"APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" : age ,
"2018" : oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])

print (df)

print("The SECOND row is \n" ,df.iloc[1,:])
Note : iloc[] works with integer-based index numbers. (Writing only : after the comma retrieves the values of all columns.)
Output :
The SECOND row is
Class     XII B
Name      Kevin
Age          15
2018     KV MEG
Name: S2, dtype: object

b) To Access multiple rows :


<df>.iloc[<startrow> : <endrow> , :]

import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic = {"Class": clas, "Name": name, "Age": age, "2018": oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])

print (df)

print("The first three rows are \n" ,df.iloc[0:3 ,:])
Note : iloc[] works with integer-based index numbers; the row at the ending index number will not be retrieved.
Output :
The first three rows are
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG
S3  XII C  Nitisha   13  APS ASC

# Accessing Random Rows

print(df.iloc[[0,3]])

c) To Access selective columns :


<df>.iloc[: , <startcolumn> : <endcolumn>]

print("The first two columns are \n" ,df.iloc[: , 0 : 2] )
Note : iloc[] works with integer-based index numbers; the column at the ending index number will not be retrieved.
Output :
The first two columns are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin
S3  XII C  Nitisha
S4  XII D    Manoj
S5  XII E    Artha

d) To Access Range of rows and Range of columns :


<df>.iloc[<startrow> : <endrow> , <startcolumn> : <endcolumn>]

print("The first two columns and rows are \n" ,df.iloc[0 : 2 , 0 : 2] )
Note : iloc[] works with integer-based index numbers; the row and the column at the ending index numbers will not be retrieved.
Output :
The first two columns and rows are
    Class     Name
S1  XII A  vikrant
S2  XII B    Kevin

print("The columns and rows values are \n" ,df.iloc[0:3:2 , 0:3:2] )
Note : 0 is the starting index, 3 is the ending index (not inclusive) and 2 is the step.
Output :
The columns and rows values are
    Class  Age
S1  XII A   16
S3  XII C   13
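To see loc and iloc side by side, the following sketch selects the same block once by labels and once by positions. It assumes a three-column copy of the DataFrame used above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# Label-based: the end row 'S3' and the end column 'Name' are included.
by_label = df.loc["S1":"S3", "Class":"Name"]

# Position-based: the stop positions 3 and 2 are excluded.
by_position = df.iloc[0:3, 0:2]

# Both selections cover rows S1..S3 and columns Class, Name.
print(by_label.equals(by_position))
```

The end point is included with loc and excluded with iloc, so the two slices above use different stop values to reach the same rows and columns.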

#5. Selecting/Accessing individual value using column name and row name
To access an individual data value from a dataframe, we have 2 methods

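(i) Providing Row label or row index in square brackets:

<df>.<column name>[<row label> or <row index>]

print("The value at the row number 2 of Age Column is :\t",df.Age[1])
Output :
The value at the row number 2 of Age Column is : 15

print("The value at the row number 2 of Age Column is :\t",df.Age["S2"])
Output :
The value at the row number 2 of Age Column is : 15

Note : when the row labels are not integers, giving a row position inside [ ] (as in df.Age[1]) is deprecated in recent pandas versions; df.Age.iloc[1] is the position-based form that works in all versions.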

(ii) using 'at' or 'iat' :

<df>.at[<row label>,<column label>] # a form of [x,y]

<df>.iat[<row index>,<column index>]

print("The value at the S2 - Age is :\t",df.at["S2","Age"])
Output :
The value at the S2 - Age is : 15

print("The value at the 2,2 is :\t",df.iat[2,2])
Output :
The value at the 2,2 is : 13

Difference between at, iat, loc, iloc:


at –used to access a single element of a DataFrame using row index name and column
label name
iat - used to access a single element of a DataFrame using row index number and
column index number
loc – used to access a group of rows and columns using row index name and column
label name

iloc - used to access a group of rows and columns using row index number and
column index number
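The four accessors can be compared on one small DataFrame. This is a minimal sketch; the two-column DataFrame and the variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["vikrant", "Kevin", "Nitisha"], "Age": [16, 15, 13]},
    index=["S1", "S2", "S3"],
)

# Single value: at uses labels, iat uses positions.
value_at = df.at["S2", "Age"]
value_iat = df.iat[1, 1]

# Group of rows and columns: loc uses labels, iloc uses positions.
block_loc = df.loc[["S1", "S3"], ["Name", "Age"]]
block_iloc = df.iloc[[0, 2], [0, 1]]

print(value_at, value_iat)
print(block_loc.equals(block_iloc))
```

at and iat return one value each, while loc and iloc return a smaller DataFrame.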

#6. Modifying a single data value :

All the four methods described previously to access individual values of a DataFrame
can be used to also change an individual value of a DataFrame.
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic = {"Class": clas, "Name": name, "Age": age, "2018": oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)

df.at["S1", "Class"] = "XII F"
print(df)
( uses row label and column name )
Output :
    Class     Name  Age      2018
S1  XII F  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

df.iat[0,0] = "XII A"
print(df)
( uses row index and column index; row index and column index start with zero by default )
Output :
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

df.Age["S1"]= 20
print(df)
( Note : this chained form may give a warning, or leave df unchanged, in recent pandas versions; df.at["S1", "Age"] = 20 is the dependable form )
Output :
    Class     Name  Age      2018
S1  XII A  vikrant   20   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

df.Age[2]= 16
print(df)
( Note : the same caution applies; df.iat[2, 2] = 16 is the dependable form )
Output :
    Class     Name  Age      2018
S1  XII A  vikrant   20   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   16   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE
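Writing through at, iat or loc changes the DataFrame directly, which avoids the warnings that a chained form such as df.Age["S1"] = 20 can produce in recent pandas versions. A minimal sketch, assuming a cut-down two-column version of the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C"], "Age": [16, 15, 13]},
    index=["S1", "S2", "S3"],
)

# One cell addressed by row label and column label.
df.at["S1", "Age"] = 20

# One cell addressed by row position and column position (both start at 0).
df.iat[2, 1] = 16

# loc also accepts a single row label and a single column label.
df.loc["S2", "Class"] = "XII F"

print(df)
```

Each statement names the row and the column in one step, so pandas never has to work on an intermediate copy of the column.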

#7. Adding / Changing a column (same value in all rows)

Assigning a value to a column adds a new column (if it doesn't exist) or modifies the values of the column (if it exists).

<df>['column name'] = <value>

# creating dataframe from Dictionary of lists
import pandas as pd
clas = ["XII A", "XII B", "XII C", "XII D", "XII E"]
name = ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"]
eng = [76, 75, 73, 85, 95]
phy = [86, 85, 53, 95, 65]
maths = [66, 95, 63, 75, 65]
dic = {'Class': clas, 'Name': name, 'Eng': eng, 'Phy': phy, 'Maths': maths}
df = pd.DataFrame(dic, index=['S1', 'S2', 'S3', 'S4', 'S5'])
print(df)

df['Phy'] = 70
print(df)
( Note : since ‘Phy’ column was
already existing in the dataframe ,
the value of that column gets updated
with the value 70 for all rows. )

df['Phy']=[51,52]
print(df)
ValueError: Length of values (2) does not match length of index (5)

df['Chem'] = 70
print(df)
( Note : creates a new column
‘Chem’ and fills the value 70 for all
rows of the dataframe )

#8 . Adding / Changing a column (different values in all rows)

<df>['column name'] = [ list of elements ]

df['Chem'] = [70 , 80 ,90,95, 56]
print(df)
( Note : creates a new column 'Chem' and fills the values as per the list of elements )

df['Chem'] = [70 , 80 ,90,95, 56,82]
print(df)
ValueError: Length of values does not match length of index
( note : giving fewer / more values than the number of rows will create an error )

df['Total'] = df['Eng']+df['Phy']+df['Maths']+df['Chem']
print(df)
( Note : creates a new column 'Total' and fills each value by adding Eng, Phy, Maths and Chem )

df.loc[:,'Grade']=['a1','a2','a1','b1','b2']
print(df)
[ Alternate method : adding a new column with the loc method ]
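The column rules above (one value for all rows, one value per row, and a computed column) can be tried together in a short sketch. The three-row marks table, the 'Bio' column and the use of sum(axis=1) in place of repeated + are illustrative choices, not part of the original program.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [76, 75, 73], "Phy": [86, 85, 53], "Maths": [66, 95, 63]},
    index=["S1", "S2", "S3"],
)

# A single value is repeated in every row of the new column.
df["Chem"] = 70

# Row-wise total of the four subject columns (same result as adding them with +).
df["Total"] = df[["Eng", "Phy", "Maths", "Chem"]].sum(axis=1)

# A list must have exactly one element per row, otherwise pandas raises ValueError.
try:
    df["Bio"] = [1, 2]
except ValueError as err:
    print("ValueError:", err)

print(df)
```

Because the failed assignment raises an error, the 'Bio' column is never created.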

#9 . Adding / Changing a row (same values in all columns)

Like columns, we can add / change rows of a DataFrame using the loc attribute
<df>.loc[<row label> , :] = <new value>
Note :
If there exists a row with the mentioned row label, then the values of that row get modified with the specified value; else a new row is created with that label and filled with that value.

df.loc['S6',:] = 'XII F'
print(df)
( note : a new row with row label 'S6' will be created with all column values as XII F )

df.loc['S6',:] = 'XII G'
print(df)
( note : the existing row with row label 'S6' is replaced with XII G in all columns )
Note : older pandas versions also accepted <df>.at[<row label> , :] = <new value> with the same effect as loc. However, at is meant for a single value (one row label and one column label), and recent pandas versions may raise an error for this form, so loc is preferred for whole rows.
#10 . Adding / Changing a row (different values in different columns)

df.loc['S6',:] = [ 'XII G', 'anju', 55.0, 89.0, 100.0, 23.0, 279.0, 'b2']
print(df)
( note : the existing row with row label 'S6' is replaced with the list of elements, one element per column )
( note : giving fewer / more values than the number of columns will create an error )
Older pandas versions also accepted 'at' here in place of loc; recent versions accept only a single value with at.

Note :
If there exists a row with the mentioned row label, then the values of that row get modified with the specified values; else a new row is created with that label and filled with those values.
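The add-or-replace behaviour of loc can be seen in a minimal sketch. The two-column DataFrame here is a hypothetical cut-down example, not the marks table used above.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["vikrant", "Kevin"], "Eng": [76, 75]},
    index=["S1", "S2"],
)

# 'S3' is not an existing row label, so a new row is added.
df.loc["S3", :] = ["Nitisha", 73]

# 'S1' already exists, so its values are replaced.
df.loc["S1", :] = ["vikrant", 80]

print(df)
```

The list on the right-hand side must have one element for each column, in column order.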

Q- Consider the following dataframe saleDf:


       Target  Sales
zoneA   56000  58000
zoneB   70000  68000
zoneC   75000  78000
zoneD   60000  61000

Write a program to add a column namely Orders having values 6000, 6700,
6200 and 6000 respectively for the zones A,B,C and D. The program should
also add a new row for a new zone ZoneE.
Solution-
import pandas as pd
zoneA={'Target':56000, 'Sales':58000}
zoneB={'Target':70000, 'Sales':68000}
zoneC={'Target':75000, 'Sales':78000}
zoneD={'Target':60000, 'Sales':61000}
zones=[ zoneA , zoneB , zoneC , zoneD ]
saleDf = pd.DataFrame(zones , index=['zoneA' , 'zoneB' , 'zoneC' ,'zoneD'] ,
columns=[ 'Target' , 'Sales' ])
saleDf['Orders'] = [6000, 6700, 6200, 6000]
saleDf.loc['zoneE', :] = [ 50000 , 45000, 5000]
print(saleDf)
Output:-

#11. Deleting an existing column / row from a DataFrame

There are three ways of deleting a column from a DataFrame:

a) using the Python del statement as: del <df>[columnname]

The del statement is used to delete a single column from a DataFrame.
b) using the dataframe drop() method :
<df>.drop( [row/col labels] , axis = 0 or 1 , inplace = True / False )
The drop method can be used to delete rows (axis=0) or columns (axis=1).
The first parameter is a list containing either the row index names or the column index names.
The parameter inplace=True is used to modify the existing dataframe df itself.
If this parameter is not specified or is given as False, then the dataframe df is not modified; instead, drop() returns a new dataframe with the modifications.
c) Using the pop('columnname') method
The pop() method is used to delete a single column from a DataFrame. In addition, the column that was deleted is returned as a Series object.

Sample DataFrame (df ) =
    Class     Name  Age      2018  Marks
S1  XII A  vikrant   16   APS BLR     70
S2  XII B    Kevin   15    KV MEG     80
S3  XII C  Nitisha   13   APS ASC     90
S4  XII D    Manoj   15  APS PRTC     89
S5  XII E    Artha   15  APS PUNE     67
S6  XII F    ANOOP   20    APS KK     78

del df['Marks']
print(df)
( deletes the column named Marks from the DataFrame df )
Output :
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE
S6  XII F    ANOOP   20    APS KK

s = df.pop('Age')
print(df)
( deletes the column named Age from the DataFrame df and stores that column in a Series object named 's' )
Output :
    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

print(s)
( s is a Series that contains the column deleted by the pop( ) method in the previous command )
Output :
S1    16
S2    15
S3    13
S4    15
S5    15
S6    20
Name: Age, dtype: object

Sample DataFrame (df ) =
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE
S6  XII F    ANOOP   20    APS KK

df.drop(['Age'], axis=1, inplace=True)
print(df)
( deletes the column Age from axis 1 (column axis) and modifies the existing dataframe itself because inplace=True )
Output :
    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

df2 = df.drop( ['Age'], axis=1)
( applied to the sample DataFrame: deletes the column Age from axis 1 (column axis) but does not modify the existing dataframe, because inplace is not given and its default is False; the modified dataframe is stored in df2 )

print("The original dataframe is: \n",df)
Output :
The original dataframe is:
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE
S6  XII F    ANOOP   20    APS KK

print("The modified dataframe is: \n", df2)
( df2 is created from the dataframe df by the drop( ) method )
Output :
The modified dataframe is:
    Class     Name      2018
S1  XII A  vikrant   APS BLR
S2  XII B    Kevin    KV MEG
S3  XII C  Nitisha   APS ASC
S4  XII D    Manoj  APS PRTC
S5  XII E    Artha  APS PUNE
S6  XII F    ANOOP    APS KK

df.drop(['S6'], axis=0, inplace=True)
print("The modified dataframe is : \n",df)
( the row with label S6 will be removed from this dataframe df )
Output :
The modified dataframe is :
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

Note :
The drop( ) method of the DataFrame is a common method for removing
columns ( axis = 1) and rows ( axis = 0 ) , use the axis parameter as per
requirement .
If multiple rows / columns are to be deleted then the first parameter must contain the
list of row names / column names to be deleted .
Ex : df.drop(['S6', 'S5', 'S1'], axis=0, inplace=True)
df.drop(['Age', 'Class'], axis=1, inplace=True)
Q- Given a DataFrame df namely aid that stores the aid by NGOs for
different states:

Modify the DataFrame so that it does not contain the column 'Uniform' and the row 'Odisha'.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
df = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
del df['Uniform']
df.drop(['Odisha'],inplace = True)
print(df)
Output-

#12 . head( ) and tail( ) functions
The head() function is used to retrieve the top rows of a DataFrame whereas the tail()
function is used to retrieve the bottom rows of a DataFrame. If no parameter is passed,
then it retrieves the top 5 or bottom 5 rows.
If a positive value, n, is passed to the head function then it retrieves the top n rows. If a
negative n is passed to the head function, then it returns all the rows except the last n
rows.
Similarly, if a positive value, n, is passed to the tail function then it retrieves the bottom n rows of the DataFrame. If a negative n is passed to the tail function, then all the rows except the first n rows are retrieved (for example, tail(-2) skips the first 2 rows).
These functions are useful for quickly verifying the data for example after sorting or
adding rows.
DataFrame , df =
    Class     Name  Age      2018
S1  XII A  vikrant   16   APS BLR
S2  XII B    Kevin   15    KV MEG
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

print(df.head(2))
( will retrieve the first 2 rows )
Output :
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG

print(df.tail(2))
( will retrieve the last 2 rows )
Output :
    Class   Name  Age      2018
S4  XII D  Manoj   15  APS PRTC
S5  XII E  Artha   15  APS PUNE

print(df.head(-2))
( will retrieve all rows except the last 2 rows )
Output :
    Class     Name  Age     2018
S1  XII A  vikrant   16  APS BLR
S2  XII B    Kevin   15   KV MEG
S3  XII C  Nitisha   13  APS ASC

print(df.tail(-2))
( will retrieve all rows except the first 2 rows )
Output :
    Class     Name  Age      2018
S3  XII C  Nitisha   13   APS ASC
S4  XII D    Manoj   15  APS PRTC
S5  XII E    Artha   15  APS PUNE

#13 . Renaming Rows / Columns in DataFrame :
To change the name of any row / column individually, you can use the rename() function of DataFrame as per the syntax given below.
<df>.rename(index={<names dictionary>}, columns={<names dictionary>} , inplace = True / False )
OR
<df>.rename({<names dictionary>}, axis='index' / 'columns', inplace = True / False )

The index parameter is used to change row labels.

The columns parameter is used to change column labels.

inplace = True will rename the specified column / row in the existing dataframe. inplace = False (or not provided) leaves the existing dataframe unchanged and returns a new DataFrame with the changes, which can be stored under any name.

axis='value' : the 'value' can be either 'index' or 'columns'. If row labels are to be renamed, use 'index'; if column labels need to be renamed, use 'columns'.

import pandas as pd
dic={'Name':["Anoop","Priya","Santosh"],'Age':[15,16,17]}
df= pd.DataFrame(dic)
print("The original dataframe df: \n",df)
Output :
The original dataframe df:
      Name  Age
0    Anoop   15
1    Priya   16
2  Santosh   17

df.rename(index={0:'A',1:'B',2:'C'},columns={'Name':'SName'}, inplace=True)
print("The dataframe df with renamed column and row is : \n",df)
( providing inplace = True will make changes in 'df' itself )
Output :
The dataframe df with renamed column and row is :
     SName  Age
A    Anoop   15
B    Priya   16
C  Santosh   17

df1=df.rename(index={'A':'A1','B':'B1','C':'C1'}, columns = {'SName':'Stud Name','Age':'SAge'})
print("The dataframe df1 with renamed column and row is : \n",df1)
( Not providing inplace will make the changes in a new dataframe, stored here in 'df1' as assigned by the user )
Output :
The dataframe df1 with renamed column and row is :
    Stud Name  SAge
A1      Anoop    15
B1      Priya    16
C1    Santosh    17

print("The dataframe df is : \n",df )
( the renaming that happened in the previous step was reflected only in the dataframe named 'df1'; df remains intact )
Output :
The dataframe df is :
     SName  Age
A    Anoop   15
B    Priya   16
C  Santosh   17

import pandas as pd
dic={'Name':["Anoop","Priya","Santosh"],'Age':[15,16,17]}
df= pd.DataFrame(dic)
print(df)
print()

df.rename({'Name' :"NAM"}, axis ='columns',inplace=True)


print(df)
print()

df.rename({0 :"A1"}, axis ='index',inplace=True)


print(df)

Note : inplace = True will rename the specified column / row in the existing dataframe itself.

Q – Consider the saleDf shown below:-


       Target  Sales
zoneA   56000  58000
zoneB   70000  68000
zoneC   75000  78000
zoneD   60000  61000
Modify the saleDf to rename the indexes 'zoneC' and 'zoneD' as 'Central' and 'Dakshin' respectively and the column names 'Target' and 'Sales' as 'Targeted' and 'Achieved' respectively.
Solution-
import pandas as pd
zoneA={'Target':56000, 'Sales':58000}
zoneB={'Target':70000, 'Sales':68000}
zoneC={'Target':75000, 'Sales':78000}
zoneD={'Target':60000, 'Sales':61000}
zones=[ zoneA , zoneB , zoneC , zoneD ]
saleDf = pd.DataFrame(zones , index=['zoneA' , 'zoneB' , 'zoneC' ,'zoneD'] ,
columns=[ 'Target' , 'Sales' ])
saleDf.rename(index={'zoneC' : 'Central' , 'zoneD' : 'Dakshin'}, columns={
'Target' : 'Targeted' , 'Sales' : 'Achieved' } , inplace=True)
print(saleDf)

The above topics of accessing data worked with the concept of label-based indexing.

Apart from accessing the data of a DataFrame by using a label index, data can also be accessed by a boolean index, which is discussed below.

Boolean Indexing in Data Frame

Instead of selecting data on the basis of row or column labels (label-based indexing, discussed above), we can also select the data based on the values present in the dataframe.

Boolean indexing helps us to select the data from a DataFrame using a boolean vector. One way to use it is to create a DataFrame whose row index itself consists of boolean values.

The Boolean values True and False can be used as row labels in a pandas DataFrame. They can help us filter out the required records. Older pandas versions also accepted 1 and 0 in place of True and False; recent versions may raise a KeyError for them.

import pandas as pd
clas = ["XII A", "XII B", "XII C", "XII D", "XII E"]
name = ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"]
age = [16, 15, 13, 15, 15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic = {"Class": clas, "Name": name, "Age": age, "2018": oldschool}
df = pd.DataFrame(dic, columns=["Class", "Name", "Age", "2018"],
                  index=[True, False, True, False, True])

print (df)

print (df.loc[0])
Extracts rows with index 'False' in older pandas versions; recent versions may raise a KeyError, so prefer df.loc[False]

print (df.loc[1])
Extracts rows with index 'True' in older pandas versions; recent versions may raise a KeyError, so prefer df.loc[True]

print (df.loc[True])
Extracts rows with index 'True'

print (df.loc[False])
Extracts rows with index 'False'

print (df.iloc[0])
Extracts the row at integer position 0; this is positional access, not boolean indexing

print (df.iloc[1])
Extracts the row at integer position 1; this is positional access, not boolean indexing

print (df.iloc[True])
TypeError: Cannot index by location index with a non-integer key
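Selecting rows by the values they hold is usually done with a boolean vector built from a condition on a column. The following is a minimal sketch of that idea; the condition Age >= 15 and the variable names mask and older are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# The comparison gives a boolean Series: one True / False per row.
mask = df["Age"] >= 15

# Passing the boolean Series inside [ ] keeps only the rows marked True.
older = df[mask]

print(list(mask))
print(older)
```

The same mask can also be passed to loc, as in df.loc[mask], to combine row filtering with column selection.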

APPENDING DATAFRAME:

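dataframe.append() function is used to append the rows of another dataframe to the end of the given dataframe, returning a new dataframe object. Columns that are not present in both dataframes are added as new columns and the new cells are populated with NaN value.
Note : DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. The programs in this topic run as written on older versions; on pandas 2.0 or later, use pd.concat([df1, df2]) in place of df1.append(df2).
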
import pandas as pd
clas = ["XII A", "XII B", "XII C", "XII D", "XII E"]
name = ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"]
age = [16, 15, 13, 15, 15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic = {"Class": clas, "Name": name, "Age": age, "Old_School": oldschool}
df1 = pd.DataFrame (dic )
print (df1)
print()
df2 = pd.DataFrame (dic )
print (df2)
print()
df3 = df1.append(df2)
print(df3)

import pandas as pd
clas = ["XII A", "XII B", "XII C", "XII D", "XII E"]
name = ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"]
age = [16, 15, 13, 15, 15]
oldschool = ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]
dic1 = {"Class": clas, "Name": name, "Age": age}
dic2 = {"Class": clas, "Name": name, "Old_School": oldschool}
df1 = pd.DataFrame(dic1)
print(df1)
print()
df2 = pd.DataFrame(dic2)
print(df2)
print()
df3 = df1.append(df2)
print(df3)

df3 = df1.append(df2, ignore_index=True)
print(df3)
Note : with ignore_index = True, a continuous index value will be maintained across the rows in the new appended data frame.
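On pandas 2.0 and later, where DataFrame.append() is no longer available, the same result can be obtained with pd.concat(). A minimal sketch, using two small hypothetical DataFrames in place of the ones above:

```python
import pandas as pd

df1 = pd.DataFrame({"Class": ["XII A", "XII B"], "Age": [16, 15]})
df2 = pd.DataFrame({"Class": ["XII C"], "Old_School": ["APS ASC"]})

# Rows of df2 are placed after the rows of df1; the original row labels are kept.
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0, 1, 2, ...
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```

Cells for which one of the DataFrames has no column ('Old_School' for df1, 'Age' for df2) are filled with NaN, as described for append().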

ITERATING OVER A DATAFRAME
Generally , In a DataFrame if some columns need to be worked on then the columns
are extracted using df[column_name] or any other equivalent method. And if some
processing on rows need to be performed, then the df.loc or df.iloc commands are used.

Sometimes we need to process all the data values of a dataframe. Writing individual statements to access / select individual values makes the program lengthy; to avoid writing a huge program, we apply the concept of iteration / looping over a dataframe. The most popular methods used in iteration are df.items() and df.iterrows(). (df.iteritems() is an older name for df.items(); it was deprecated in pandas 1.5 and removed in pandas 2.0.)
The df.iterrows() method views a dataframe in the form of horizontal subsets (row wise) and the df.items() method views a dataframe in the form of vertical subsets (column wise).
Each horizontal subset is in the form of ( row index , series ), where the series contains all column values of that row index.
Each vertical subset is in the form of ( column index , series ), where the series contains all row values of that column index.
Methods :
1. Iterate directly over a DataFrame
2. Use the df.items() method (df.iteritems() in older pandas versions)
3. Use the df.iterrows() method
4. Use the df.itertuples() method

a) Iterating directly over a DataFrame


Iterating directly over a DataFrame returns the column names.

import pandas as pd
d={ 'name': ['abc','def','ghi'], 'age': [19,20,21] ,
'hobby':['reading' , 'playing', 'gardening']}
df=pd.DataFrame(d , index=['s1','s2','s3'])
print(df)
Output :
   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

print('Iterating directly over a DataFrame')
for i in df:
    print(i)
( the for loop retrieves only the column names of the dataframe )
Output :
Iterating directly over a DataFrame
name
age
hobby

b) Using the df.items() method (df.iteritems() in older pandas versions)

In older pandas versions, df.iteritems() and df.items() have the same effect; from pandas 2.0 onwards only df.items() is available. It returns two objects at each iteration: the first one is the column name and the second one is a Series object having all the values of that particular column.

import pandas as pd
d={ 'name': ['abc','def','ghi'] ,'age': [19,20,21] ,
'hobby':['reading', 'playing', 'gardening'] }
df=pd.DataFrame(d, index=['s1','s2','s3'])
print(df)
Output :
   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

for cname, cseries in df.items():
    print('cname:',cname)
    print('cseries:\n',cseries)
( cname and cseries are user defined variables; as said in the above definition, cname retrieves the column name one at a time and cseries holds the values under that column name in the form of a series )
Output :
cname: name
cseries:
s1    abc
s2    def
s3    ghi
Name: name, dtype: object
cname: age
cseries:
s1    19
s2    20
s3    21
Name: age, dtype: int64
cname: hobby
cseries:
s1      reading
s2      playing
s3    gardening
Name: hobby, dtype: object

c) Using the df.iterrows() method

Using the df.iterrows() method we get back two objects at each iteration: the first object is the row label or index and the second object is a Series object containing the elements of that particular row.

The Series object has the column names as its index, and each value of the Series object is the value under that particular column for that particular row.

import pandas as pd
d={ 'name': ['abc','def','ghi'],'age': [19,20,21],
'hobby':['reading', 'playing', 'gardening']}
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
Output :
   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

print('Using iterrows')
for rname, rseries in df.iterrows():
    print('rname:',rname)
    print('rseries:\n',rseries)
( rname and rseries are user defined variables; as said in the above definition, rname retrieves the row name one at a time and rseries holds the elements under each row in the form of a series )
Output :
Using iterrows
rname: s1
rseries:
name         abc
age           19
hobby    reading
Name: s1, dtype: object
rname: s2
rseries:
name         def
age           20
hobby    playing
Name: s2, dtype: object
rname: s3
rseries:
name           ghi
age             21
hobby    gardening
Name: s3, dtype: object

d) Using the df.itertuples() method

The df.itertuples() method returns a named tuple for each row of the DataFrame.

The first element of the named tuple is the row label and the remaining elements are the
values under different columns for that particular row.
import pandas as pd
d={ 'name': ['abc','def','ghi'],'age': [19,20,21] ,
'hobby': ['reading', 'playing', 'gardening' ]}
df=pd.DataFrame(d, index=['s1','s2','s3'])
print(df)
Output :
   name  age      hobby
s1  abc   19    reading
s2  def   20    playing
s3  ghi   21  gardening

for r in df.itertuples():
    print(r)
Output :
Pandas(Index='s1', name='abc', age=19, hobby='reading')
Pandas(Index='s2', name='def', age=20, hobby='playing')
Pandas(Index='s3', name='ghi', age=21, hobby='gardening')
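The values produced by the iteration methods can be used in calculations instead of only being printed. The following sketch, which assumes a two-column version of the DataFrame above, collects the row labels and adds up the ages with itertuples(), then collects the names with iterrows(); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["abc", "def", "ghi"], "age": [19, 20, 21]},
    index=["s1", "s2", "s3"],
)

labels = []
total_age = 0
for row in df.itertuples():
    # Each row is a named tuple: row.Index is the row label,
    # and every column is available as an attribute.
    labels.append(row.Index)
    total_age += row.age

# With iterrows(), each row arrives as (row label, Series),
# and values are looked up by column name.
names = [rseries["name"] for rname, rseries in df.iterrows()]

print(labels, total_age, names)
```

Attribute access such as row.age works only for column names that are valid Python identifiers.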

************* COMPLETED THE ABOVE TOPIC *****************

Note :

1. The above notes should be written in the Informatics notebook in continuation of the previous notes. ( No Printout Allowed )

2. The entire topic will be discussed in the class.

3. Mistakes / corrections ( if any ) will be rectified during class room discussion.

**********************************************************************
