Python 4
Page 1 of 32
print (df["Class"])
print (df["2018"])
print (df.Class)
print (df.Name)
print(df[["Class","Age","Address"]])
Output: KeyError: "['Address'] not in index"  ( the DataFrame has no column named Address )
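A minimal runnable sketch of the column-selection statements above. It rebuilds the sample DataFrame used throughout these notes; the variable names names, names_attr, schools and subset are only for illustration.

```python
import pandas as pd

# Sample data used throughout these notes
df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# One column name inside [] -> a Series indexed by the row labels
names = df["Name"]
# Attribute access gives the same Series, but only for column names that are
# valid Python identifiers ("2018" is not, so df.2018 would be a SyntaxError)
names_attr = df.Name
schools = df["2018"]
# A list of column names -> a DataFrame containing only those columns
subset = df[["Class", "Age"]]

print(type(names).__name__, type(subset).__name__)  # Series DataFrame
```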
Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:
#3. Selecting / Accessing Multiple ROWS :
print(df[1:3])
print(df[:3])
print(df[2:])
print(df[0:4:2])
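The row slices above can be checked with this small sketch (a trimmed copy of the sample data; only two columns are kept for brevity). Slicing with df[start:stop:step] works on row positions and, as in ordinary Python slicing, excludes the stop position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

print(df[1:3])    # positions 1 and 2 -> rows S2, S3
print(df[:3])     # first three rows -> S1, S2, S3
print(df[2:])     # position 2 to the end -> S3, S4, S5
print(df[0:4:2])  # positions 0 and 2 (step 2) -> S1, S3
```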
EXAMPLE - PROGRAM OUTPUT
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC",
"APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" : age ,
"2018" : oldschool}
df=pd.DataFrame(dic , index=['S1','S2','S3','S4','S5'] )
print (df)
( Not writing anything after the , : will retrieve all column values ) Make sure not to miss the COLON AFTER THE COMMA.
Output (partial):
2018 APS BLR
Name: S1, dtype: object
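The statement that produced the partial output above is not fully visible. Assuming it was df.loc['S1', :], this sketch selects one row by its label and shows that leaving out the column part gives the same Series:

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

row_all_cols = df.loc["S1", :]  # colon after the comma -> every column of row S1
row_short = df.loc["S1"]        # omitting the column part gives the same result
print(row_all_cols)             # a Series named S1 with dtype object (mixed values)
```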
OR
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" :
age , "2018" : oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)
print("The first three rows are \n" ,df.loc['S1' : 'S3'])
( with loc[], the row at the ending label is also retrieved, so S1, S2 and S3 are shown )
print(df.loc[['S1','S3']])
print("The first two columns are \n" ,df.loc[: , 'Class' : 'Name'] )
Note : loc[] works with label-based indexing. ( Not writing anything before the , : will retrieve all records )
Output:
The first two columns are
Class Name
S1 XII A vikrant
S2 XII B Kevin
S3 XII C Nitisha
S4 XII D Manoj
S5 XII E Artha
print("The first two columns and rows are \n"
Output: The first two columns and rows are
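A sketch of label-based slicing with loc on the same sample data. Unlike Python slicing, a loc slice includes its end label. The last statement is only one possible way of picking the first two rows and columns; the original statement for that output is cut off.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

first_three = df.loc["S1":"S3"]            # S1, S2 and S3 (end label included)
picked = df.loc[["S1", "S3"]]              # only the listed rows
two_cols = df.loc[:, "Class":"Name"]       # ':' alone before the comma -> all rows
block = df.loc["S1":"S2", "Class":"Name"]  # first two rows and first two columns
print(block)
```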
Q- Given a DataFrame namely aid that stores the aid by NGOs for different
states:
Write a program to display the aid for states “Andhra” and “Odisha” for
Books and Uniform only.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
aid = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
print(aid.loc['Andhra' : 'Odisha', 'Books' : 'Uniform'])
Output-
NOTE:- You may also specify distinct row index and column names as lists with
loc.
E.g.
aid.loc[ ['Andhra' , 'Odisha'] , ['Books' , 'Uniform'] ]
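A quick check, on the aid data from the question, that the slice form and the list form select the same block. Lists are also useful when the required rows or columns are not next to each other (the Andhra/UP and Toys/Shoes choice below is just an example):

```python
import pandas as pd

Andhra = {"Toys": 7916, "Books": 6189, "Uniform": 610, "Shoes": 8810}
Odisha = {"Toys": 8508, "Books": 8208, "Uniform": 508, "Shoes": 6798}
MP = {"Toys": 7226, "Books": 6149, "Uniform": 611, "Shoes": 9611}
UP = {"Toys": 7617, "Books": 6157, "Uniform": 457, "Shoes": 6457}
aid = pd.DataFrame([Andhra, Odisha, MP, UP], index=["Andhra", "Odisha", "MP", "UP"])

# Slice form: labels from Andhra to Odisha, columns from Books to Uniform
by_slice = aid.loc["Andhra":"Odisha", "Books":"Uniform"]
# List form: exactly the labels that are listed
by_list = aid.loc[["Andhra", "Odisha"], ["Books", "Uniform"]]
# Lists can also pick rows / columns that are not adjacent
scattered = aid.loc[["Andhra", "UP"], ["Toys", "Shoes"]]
print(by_list)
```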
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC",
"APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" : age ,
"2018" : oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)
Note : iloc[] works with integer-based positions. ( Not writing anything after the , : will retrieve all column values )
Output (partial):
Name Kevin
Age 15
2018 KV MEG
Name: S2, dtype: object
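The statement behind the partial output above is not visible; assuming it was df.iloc[1, :], this sketch shows that position 1 is the second row (label S2):

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

second_row = df.iloc[1, :]  # position 1 = second row; ':' after the comma -> all columns
same_row = df.iloc[1]       # leaving out the column part gives the same Series
print(second_row)
```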
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" :
age , "2018" : oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)
print("The first three rows are \n" ,df.iloc[0:3 ,:])
Note : iloc[] works with integer-based positions; the row at the ending position number will not be retrieved.
Output:
The first three rows are
Class Name Age 2018
S1 XII A vikrant 16 APS BLR
S2 XII B Kevin 15 KV MEG
S3 XII C Nitisha 13 APS ASC
print(df.iloc[[0,3]])
print("The first two columns are \n" ,df.iloc[: , 0 : 2] )
Note : iloc[] works with integer-based positions; the column at the ending position number will not be retrieved.
Output:
The first two columns are
Class Name
S1 XII A vikrant
S2 XII B Kevin
S3 XII C Nitisha
S4 XII D Manoj
S5 XII E Artha
print("The first two columns and rows are \n"
Output: The first two columns and rows are
print("The columns and rows values are \n" ,df.iloc[0:3:2 , 0:3:2] )
Note : 0 is the starting index, 3 is the ending index (not inclusive) and 2 is the step.
Output:
The columns and rows values are
Class Age
S1 XII A 16
S3 XII C 13
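A sketch of the iloc slices above on the sample data, with a loc slice for comparison: iloc excludes the ending position, while loc includes the ending label.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

first_three = df.iloc[0:3, :]     # positions 0, 1, 2 (position 3 is excluded)
first_two_cols = df.iloc[:, 0:2]  # column positions 0 and 1 -> Class, Name
stepped = df.iloc[0:3:2, 0:3:2]   # rows 0 and 2, columns 0 and 2
same_rows = df.loc["S1":"S3", :]  # loc: the end label S3 is included
print(stepped)
```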
print("The value at the row number 2 of Age
Output: The value at the row number 2
print("The value at the row number 2 of Age :\t",df.at["S2","Age"])
Output: The value at the row number 2 of Age : 15
iloc - used to access a group of rows and columns using row index number and
column index number
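Assuming the four methods referred to in these notes are at, iat, loc and iloc, this sketch reads the same single value (Age of S2) in all four ways. at and iat accept only a single label or position per axis, whereas loc and iloc also accept slices and lists.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# Row S2 is at position 1; column Age is at position 2
v1 = df.at["S2", "Age"]   # row label, column label
v2 = df.iat[1, 2]         # row position, column position
v3 = df.loc["S2", "Age"]  # labels
v4 = df.iloc[1, 2]        # positions
print(v1, v2, v3, v4)     # 15 15 15 15
```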
All four methods described previously for accessing individual values of a DataFrame can also be used to change an individual value of a DataFrame.
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool = ["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age" :
age , "2018" : oldschool}
df=pd.DataFrame(dic ,index=['S1','S2','S3','S4','S5'])
print (df)
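A sketch of changing single values with each of the four methods, on the sample data built by the program above. The new values (16, 14, 'Manoj K', 'XII F') are hypothetical and only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

df.at["S2", "Age"] = 16            # by labels
df.iat[2, 2] = 14                  # row position 2 (S3), column position 2 (Age)
df.loc["S4", "Name"] = "Manoj K"   # by labels
df.iloc[4, 0] = "XII F"            # row position 4 (S5), column position 0 (Class)
print(df)
```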
Assigning a value to a column adds a new column (if it does not exist) or modifies the values of the column (if it exists).
# creating dataframe from a dictionary of lists
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant",
"Kevin","Nitisha","Manoj","Artha"]
eng = [76,75,73,85,95]
phy = [86,85,53,95,65]
maths = [66,95,63,75,65]
dic ={'Class':clas,'Name':name
,'Eng':eng,'Phy':phy,'Maths':maths}
df=pd.DataFrame(dic
,index=['S1','S2','S3','S4','S5'])
print (df)
df['Phy'] = 70
print(df)
( Note : since ‘Phy’ column was
already existing in the dataframe ,
the value of that column gets updated
with the value 70 for all rows. )
df['Phy']=[51,52]
Output: ValueError: Length of values does not match length of index ( only 2 values are given for 5 rows )
df['Chem'] = 70
print(df)
( Note : creates a new column
‘Chem’ and fills the value 70 for all
rows of the dataframe )
df['Chem'] = [70 , 80 ,90,95, 56,82]
Output: ValueError: Length of values does not match length of index ( 6 values are given for 5 rows )
print(df)
( Note : giving fewer or more values than the number of rows raises this error )
df['Total'] = df['Eng']+df['Phy']+df['Maths']+df['Chem']
print(df)
( Note : creates a new column ‘Total’ and fills each row with the sum of Eng, Phy, Maths and Chem )
df.loc[:,'Grade']=['a1','a2','a1','b1','b2']
print(df)
[ Alternate method: adding a new column with the loc attribute ]
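The column operations above, gathered into one runnable sketch (marks copied from the program above; only the mark columns are kept). The try/except block shows the length error from a wrong-sized list without stopping the program.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [76, 75, 73, 85, 95],
     "Phy": [86, 85, 53, 95, 65],
     "Maths": [66, 95, 63, 75, 65]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

df["Phy"] = 70    # existing column: every row becomes 70
df["Chem"] = 70   # new column: created and filled with 70
df["Total"] = df["Eng"] + df["Phy"] + df["Maths"] + df["Chem"]  # row-wise sum
df.loc[:, "Grade"] = ["a1", "a2", "a1", "b1", "b2"]            # one value per row

length_error = None
try:
    df["Chem"] = [70, 80, 90, 95, 56, 82]  # 6 values for 5 rows
except ValueError as err:
    length_error = str(err)
    print("ValueError:", err)

print(df)
```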
Like columns, rows can be added to or changed in a DataFrame using the at or loc attributes:
<df>.at[<row label> , :] = <new value >
<df>.loc[<row label> , :] = <new value >
Note :
If there exists a row with the mentioned row label , then the value of the row
gets modified with the specified value else a new row will be created with that
label and gets filled with that value.
df.at['S6',:] = 'XII F'
print(df)
( note : a new row with row label ‘S6’ will be
created with all column values as XII F)
Note :
If there exists a row with the mentioned row label, then the values of that row
are modified with the specified values; otherwise a new row is created with that
label and filled with those values.
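A sketch of adding and changing a row with loc, assuming the sample DataFrame from these notes. The S6 values are taken from the later outputs in these notes (ANOOP, age 20, APS KK); the new age 16 for S2 is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

# New label S6: a new row is added, one value per column
df.loc["S6", :] = ["XII F", "ANOOP", 20, "APS KK"]
# Existing label S2: that row is overwritten with the given values
df.loc["S2", :] = ["XII B", "Kevin", 16, "KV MEG"]
print(df)
```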
Write a program to add a column namely Orders having values 6000, 6700,
6200 and 6000 respectively for the zones A,B,C and D. The program should
also add a new row for a new zone ZoneE.
Solution-
import pandas as pd
zoneA={'Target':56000, 'Sales':58000}
zoneB={'Target':70000, 'Sales':68000}
zoneC={'Target':75000, 'Sales':78000}
zoneD={'Target':60000, 'Sales':61000}
zones=[ zoneA , zoneB , zoneC , zoneD ]
saleDf = pd.DataFrame(zones , index=['zoneA' , 'zoneB' , 'zoneC' ,'zoneD'] ,
columns=[ 'Target' , 'Sales' ])
saleDf['Orders'] = [6000, 6700, 6200, 6000]
saleDf.loc['zoneE', :] = [ 50000 , 45000, 5000]
print(saleDf)
Output:-
print(s)
( s is a Series that contains the column that was deleted using the pop( ) method in the previous command )
Output:
S1 16
S2 15
S3 13
S4 15
S5 15
S6 20
Name: Age, dtype: object
print("The modified dataframe is: \n", df2)
( df2 is created from the dataframe df using the drop( ) method )
Output:
The modified dataframe is:
Class Name 2018
S1 XII A vikrant APS BLR
S2 XII B Kevin KV MEG
S3 XII C Nitisha APS ASC
S4 XII D Manoj APS PRTC
S5 XII E Artha APS PUNE
S6 XII F ANOOP APS KK
Note :
The drop( ) method of the DataFrame is commonly used to remove columns ( axis = 1 ) and rows ( axis = 0 ); set the axis parameter as required.
If multiple rows / columns are to be deleted, then the first parameter must contain the
list of row labels / column names to be deleted.
Ex : df.drop(['S6', 'S5', 'S1'], axis=0, inplace=True)
df.drop(['Age', 'Class'], axis=1, inplace=True)
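A sketch of the three ways of deleting used in these notes (pop( ), del and drop( )), on the sample data. drop( ) without inplace returns a new DataFrame and leaves the original unchanged; with inplace=True it changes the original and returns None.

```python
import pandas as pd

df = pd.DataFrame(
    {"Class": ["XII A", "XII B", "XII C", "XII D", "XII E"],
     "Name": ["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"],
     "Age": [16, 15, 13, 15, 15],
     "2018": ["APS BLR", "KV MEG", "APS ASC", "APS PRTC", "APS PUNE"]},
    index=["S1", "S2", "S3", "S4", "S5"],
)

s = df.pop("Age")                    # removes Age from df and returns it as a Series
del df["2018"]                       # removes the 2018 column from df
df2 = df.drop(["S5", "S1"], axis=0)  # new DataFrame without rows S5 and S1
df.drop("Class", axis=1, inplace=True)  # changes df itself
print(s)
print(df2)
print(df)
```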
Q- Given a DataFrame df namely aid that stores the aid by NGOs for
different states:
Modify the DataFrame so that it must not contain the column 'Uniform' and
row 'Odisha'.
Solution-
import pandas as pd
Andhra = {"Toys":7916 , "Books":6189 , "Uniform":610 , "Shoes":8810}
Odisha = {"Toys":8508 , "Books":8208 , "Uniform":508 , "Shoes":6798}
MP = {"Toys":7226 , "Books":6149 , "Uniform":611 , "Shoes":9611}
UP = {"Toys":7617 , "Books":6157 , "Uniform":457 , "Shoes":6457}
states = [Andhra, Odisha, MP, UP]
df = pd.DataFrame(states, index = ['Andhra', 'Odisha', 'MP', 'UP'])
del df['Uniform']
df.drop(['Odisha'],inplace = True)
print(df)
Output-
#12 . head( ) and tail( ) functions
The head() function is used to retrieve the top rows of a DataFrame whereas the tail()
function is used to retrieve the bottom rows of a DataFrame. If no parameter is passed,
then it retrieves the top 5 or bottom 5 rows.
If a positive value, n, is passed to the head function then it retrieves the top n rows. If a
negative n is passed to the head function, then it returns all the rows except the last n
rows.
Similarly, if a positive value, n, is passed to the tail function then it retrieves the bottom
n rows of the DataFrame. If a negative n is passed to the tail function, then all the rows
except the first n rows are retrieved.
These functions are useful for quickly verifying the data, for example after sorting or
adding rows.
DataFrame , df =
The modified dataframe is :
Class Name Age 2018
S1 XII A vikrant 16 APS BLR
S2 XII B Kevin 15 KV MEG
S3 XII C Nitisha 13 APS ASC
S4 XII D Manoj 15 APS PRTC
S5 XII E Artha 15 APS PUNE
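A small sketch of head( ) and tail( ) with positive and negative n, on a hypothetical one-column DataFrame holding the numbers 1 to 8:

```python
import pandas as pd

df = pd.DataFrame({"n": list(range(1, 9))})  # 8 rows: 1..8 (hypothetical data)

first5 = df.head()            # no argument -> first 5 rows: 1..5
top3 = df.head(3)             # first 3 rows: 1, 2, 3
all_but_last2 = df.head(-2)   # every row except the last 2: 1..6
bottom3 = df.tail(3)          # last 3 rows: 6, 7, 8
all_but_first2 = df.tail(-2)  # every row except the first 2: 3..8
print(all_but_first2)
```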
#12 . Renaming Rows / Columns in DataFrame :
To change the name of any row / column individually, you can use the
rename( ) function of DataFrame as per the syntax given below.
<df>.rename(index={<names dictionary>}, columns={<names dictionary>}, inplace = True / False )
OR
<df>.rename({<names dictionary>}, axis='index', inplace = True / False )
inplace = True renames the specified columns / rows in the existing DataFrame, whereas
inplace = False (the default, used when it is not provided) leaves the existing DataFrame
unchanged and returns a new DataFrame containing the changes, which can be stored
under any name.
import pandas as pd
dic={'Name':["Anoop","Priya","Santosh"],'Age':[15,16,17]}
df= pd.DataFrame(dic)
print(df)
print()
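A sketch of rename( ) on the DataFrame built above; the new labels R1, R2, R3, First, Student and Years are hypothetical.

```python
import pandas as pd

dic = {'Name': ["Anoop", "Priya", "Santosh"], 'Age': [15, 16, 17]}
df = pd.DataFrame(dic)

# inplace=False (default): a NEW DataFrame is returned, df is unchanged
df2 = df.rename(index={0: "R1", 1: "R2", 2: "R3"}, columns={"Name": "Student"})
# axis form: the dictionary is applied to the row labels
df3 = df.rename({0: "First"}, axis="index")
# inplace=True: df itself is changed (the call returns None)
df.rename(columns={"Age": "Years"}, inplace=True)
print(df2)
print(df)
```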
Boolean indexing helps us to select the data from the DataFrames using a
boolean vector. We create a DataFrame with a boolean index to use the boolean
indexing.
The Boolean values True & False and 1 & 0 can be used as indexes in pandas
DataFrame. They can help us filter out the required records.
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool =["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic= {"Class" : clas , "Name" : name , "Age"
: age , "2018" : oldschool}
df=pd.DataFrame (dic , columns =
["Class","Name","Age","2018"] ,
index=[True,False,True,False,True])
print (df)
print (df.loc[0])
Extracts rows with index ‘False’
print (df.loc[1])
Extracts rows with index ‘True’
print (df.loc[True])
Extracts rows with index ‘True’
print (df.loc[False])
Extracts rows with index ‘False’
print (df.iloc[0])
Extracts the row at integer position 0; this is position-based access, not boolean indexing.
print (df.iloc[1])
Extracts the row at integer position 1; this is position-based access, not boolean indexing.
APPENDING DATAFRAME:
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool =["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic= { "Class" : clas , "Name" : name , "Age" : age , "Old_School" : oldschool }
df1 = pd.DataFrame (dic )
print (df1)
print()
df2 = pd.DataFrame (dic )
print (df2)
print()
df3 = df1.append(df2)
print(df3)
import pandas as pd
clas = ["XII A", "XII B","XII C","XII D","XII E"]
name = ["vikrant", "Kevin","Nitisha","Manoj","Artha"]
age = [16,15,13,15,15]
oldschool =["APS BLR","KV MEG","APS ASC", "APS PRTC","APS PUNE"]
dic1= {"Class" : clas , "Name" : name , "Age" : age }
dic2= {"Class" : clas , "Name" : name , "Old_School" : oldschool}
df1=pd.DataFrame (dic1 )
print (df1)
print()
df2=pd.DataFrame (dic2 )
print (df2)
print()
df3=df1.append(df2)
print(df3)
df3=df1.append(df2, ignore_index =
True)
print(df3)
Note : A continuous index will be maintained across the rows of the new
appended DataFrame.
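Note that DataFrame.append( ) was deprecated in pandas 1.4 and removed in pandas 2.0, so the programs above run only on older versions. On newer versions the same results can be obtained with pd.concat( ), as in this sketch (trimmed to two rows of the sample data):

```python
import pandas as pd

dic1 = {"Class": ["XII A", "XII B"], "Name": ["vikrant", "Kevin"], "Age": [16, 15]}
dic2 = {"Class": ["XII A", "XII B"], "Name": ["vikrant", "Kevin"],
        "Old_School": ["APS BLR", "KV MEG"]}
df1 = pd.DataFrame(dic1)
df2 = pd.DataFrame(dic2)

# Like df1.append(df2): df2's rows go below df1's, original index labels are kept
df3 = pd.concat([df1, df2])
# Like df1.append(df2, ignore_index=True): a continuous index 0..3
df4 = pd.concat([df1, df2], ignore_index=True)
# Columns missing in one DataFrame are filled with NaN
print(df4)
```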
ITERATING OVER A DATAFRAME
Generally, if some columns of a DataFrame need to be worked on, they are extracted
using df[column_name] or an equivalent method; if some processing needs to be
performed on rows, the df.loc or df.iloc attributes are used.
Sometimes we need to process all the data values of a DataFrame. Writing individual
statements to access / select each value makes the program lengthy, so to avoid writing
a huge program we apply iteration (looping) over the DataFrame. The most popular
methods used for iteration are “df.iteritems() or df.items()” and the “df.iterrows()” method.
The df.iterrows() method views a dataframe in form of horizontal subsets (row wise )
and df.items() method views a dataframe in form of vertical subsets (column wise ).
Each horizontal subset is in the form ( row index , Series ), where the Series contains all
column values of that row.
Each vertical subset is in the form ( column index , Series ), where the Series contains all
row values of that column.
Methods :
1. Iterate directly over a DataFrame
2. Use the df.iteritems() or df.items() method
3. Use the df.iterrows() method
4. Use the df.itertuples() method
b) Using the df.iteritems() or df.items() method
Using the df.iteritems() or the df.items() method has the same effect. It returns back two
objects - the first one is the column name and the second one is a Series object having
all the values of that particular column.
Using df.iterrows() method we get back two objects - the first object is the row label or
index and the second object is a Series object containing the elements of one particular
row at each iteration.
The Series object has index as the column name and the value of Series object is the
value under that particular column for that particular row.
df=pd.DataFrame(d,index=['s1','s2','s3'])
print(df)
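A sketch of items( ) and iterrows( ) using the name and age columns of the example in these notes. df.iteritems( ) was removed in pandas 2.0, so items( ) is the form to use on newer versions.

```python
import pandas as pd

df = pd.DataFrame({"name": ["abc", "def", "ghi"], "age": [19, 20, 21]},
                  index=["s1", "s2", "s3"])

# items(): yields (column label, Series holding that column) for each column
col_labels = []
for col_label, col_series in df.items():
    col_labels.append(col_label)
    print(col_label, col_series.tolist())

# iterrows(): yields (row label, Series holding that row) for each row;
# the row Series is indexed by the column names
ages = {}
for row_label, row_series in df.iterrows():
    ages[row_label] = row_series["age"]
    print(row_label, row_series["name"], row_series["age"])
```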
The df.itertuples() method returns a named tuple for each row of the DataFrame.
The first element of the named tuple is the row label and the remaining elements are the
values under different columns for that particular row.
import pandas as pd
d={ 'name': ['abc','def','ghi'],'age': [19,20,21] ,
df=pd.DataFrame(d, index=['s1','s2','s3'])
print(df)
Output (partial):
name age hobby
s1 abc 19 reading
s2 def 20 playing
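A sketch of itertuples( ) on the name and age columns of the example above. Each named tuple has an Index field for the row label, followed by one field per column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["abc", "def", "ghi"], "age": [19, 20, 21]},
                  index=["s1", "s2", "s3"])

rows = list(df.itertuples())
for row in rows:
    # row.Index is the row label; the other fields are named after the columns
    print(row.Index, row.name, row.age)
```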
Note :
1. The above-mentioned programs should be written in the Informatics notebook for continuity.
**********************************************************************