
Prepared by Shankar Wagh

Linkedin Page (https://www.linkedin.com/in/shankar-wagh)


Pandas Full Tutorial

1. Pandas is built on top of NumPy.
2. Its performance-critical internals are implemented in C/Cython.
3. It can represent 2D, mixed-type (heterogeneous) data, unlike a NumPy array, which holds values of a single type.
4. We can give user-defined names (labels) to rows and columns (see the short sketch below).
5. A DataFrame represents data in a tabular format.
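For instance, here is a minimal sketch of points 3-5 (the names and numbers are made up for illustration):

    import pandas as pd

    # a small table with user-defined row and column labels and mixed column types
    scores = pd.DataFrame(
        {'name': ['Asha', 'Ravi'], 'marks': [81, 92]},   # hypothetical data
        index=['student1', 'student2']                   # user-defined row labels
    )
    print(scores)
    print(scores.loc['student2', 'marks'])   # label-based access -> 92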

In [2]: import pandas as pd


import numpy as np

Series Creation
In [3]: # Creating series from List
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
days_ser = pd.Series(days)
print(days_ser, type(days_ser))

0 Sunday

1 Monday

2 Tuesday

3 Wednesday

4 Thursday

5 Friday

6 Saturday

dtype: object <class 'pandas.core.series.Series'>


In [515]: print(days_ser[0])

Sunday

In [516]: print(days_ser[len(days_ser)-1])
print(days_ser.shape)
print(days_ser.size)

Saturday

(7,)

7

In [517]: # Negative indexing Not possible


# print(days_ser[-1]) # ValueError: -1 is not in range
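Since plain [] indexing on this Series does not accept negative positions, .iloc gives position-based access where negative positions do work (a small sketch using the same series):

    print(days_ser.iloc[-1])    # 'Saturday' - position-based access
    print(days_ser.iloc[-2:])   # last two entries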

In [518]: # Slicing
print(days_ser[1:4])

1 Monday

2 Tuesday

3 Wednesday

dtype: object

In [519]: # Explicit indexing


days_ser.index = ['day1', 'day2', 'day3', 'day4', 'day5', 'day6', 'day7']

In [520]: days_ser

Out[520]: day1 Sunday

day2 Monday

day3 Tuesday

day4 Wednesday

day5 Thursday

day6 Friday

day7 Saturday

dtype: object

In [521]: print(days_ser['day1'])
print(days_ser['day6'])

Sunday

Friday

In [522]: days_ser['day3':'day5': ]

Out[522]: day3 Tuesday

day4 Wednesday

day5 Thursday

dtype: object


In [523]: # Reversing series


days_ser[::-1]

Out[523]: day7 Saturday

day6 Friday

day5 Thursday

day4 Wednesday

day3 Tuesday

day2 Monday

day1 Sunday

dtype: object

In [524]: # Passing index parameter


states = ['MH', 'UP', 'MP', 'AP', 'KA', 'TN', 'WB', 'RJ', "DL"]
state_ser = pd.Series(states, index = ['st' + str(i) for i in range(1, len(states) + 1)])
state_ser

Out[524]: st1 MH

st2 UP

st3 MP

st4 AP

st5 KA

st6 TN

st7 WB

st8 RJ

st9 DL

dtype: object

In [525]: state_ser[0]

Out[525]: 'MH'

In [527]: # Slicing
state_ser[3:6:2]

Out[527]: st4 AP

st6 TN

dtype: object


In [528]: # Passing index parameter with duplicate index


capitals = [ 'MUM','LKO', 'BHP', 'AMT', 'BLR', 'CHN', 'KOL','JP', 'DL']
capitals_ser = pd.Series(capitals, index = list('abcdaedfg'))
capitals_ser

Out[528]: a MUM

b LKO

c BHP

d AMT

a BLR

e CHN

d KOL

f JP

g DL

dtype: object

In [529]: capitals_ser['a']

Out[529]: a MUM

a BLR

dtype: object

In [530]: # capitals_ser['a':'a'] # KeyError: "Cannot get left slice bound for non-unique label: 'a'"
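Label slicing fails here because the index is non-unique and unsorted. One workaround (a sketch, assuming reordering the rows is acceptable) is to sort the index first, which makes it monotonic and therefore sliceable:

    sorted_caps = capitals_ser.sort_index()
    print(sorted_caps.loc['a':'b'])   # both 'a' entries followed by 'b'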

In [531]: # Creating a series from a dictionary


d = dict(zip(states, capitals))
pd.Series(d)

Out[531]: MH MUM

UP LKO

MP BHP

AP AMT

KA BLR

TN CHN

WB KOL

RJ JP

DL DL

dtype: object


In [532]: d = dict(zip(states, capitals))


# none of the labels 'a'..'j' match the dictionary keys, so every value is NaN
pd.Series(d, index = list('abcdefghij'))

Out[532]: a NaN

b NaN

c NaN

d NaN

e NaN

f NaN

g NaN

h NaN

i NaN

j NaN

dtype: object

In [533]: d = dict(zip(states, capitals))


state_cap = pd.Series(d, index = ['MH', 'UP', 'MP', 'AP', 'KA', 'TN', 'WB', 'RJ', 'DL', 'PN'])
state_cap

Out[533]: MH MUM

UP LKO

MP BHP

AP AMT

KA BLR

TN CHN

WB KOL

RJ JP

DL DL

PN NaN

dtype: object

In [534]: # Adding data to Series


state_cap['PN'] = 'Chandigarh'
state_cap['JK'] = 'Kashmir'
state_cap['ZA'] = 'Zarkhand'

In [535]: state_cap

Out[535]: MH MUM

UP LKO

MP BHP

AP AMT

KA BLR

TN CHN

WB KOL

RJ JP

DL DL

PN Chandigarh

JK Kashmir

ZA Zarkhand

dtype: object


DataFrame
In [536]: # Creating dataframe from Dictionary
df_dict = {'Year' : [1990, 1994, 1998, 2002],
'Country' : ['Italy', 'USA', 'France', 'Japan'],
'Winner' : ['Germany', 'Brazil', 'France', 'Brazil'],
'GoalScored' : [115, 141, 171, 161]
}
df_dict = pd.DataFrame(df_dict)
df_dict

Out[536]:
Year Country Winner GoalScored

0 1990 Italy Germany 115

1 1994 USA Brazil 141

2 1998 France France 171

3 2002 Japan Brazil 161

In [537]: print(type(df_dict))

<class 'pandas.core.frame.DataFrame'>

In [538]: # Creating dataframe from List of tuples


df_lotuples = [(2002, 'Japan', 'Brazil', 161),
(2006, 'Germany', 'Italy', 147),
(2010, 'South Africa', 'Spain', 145),
(2014, 'Brazil', 'Germany', 171)
]
pd.DataFrame(df_lotuples, columns = ['Year', 'Country','Winner','GoalScored'])

Out[538]:
Year Country Winner GoalScored

0 2002 Japan Brazil 161

1 2006 Germany Italy 147

2 2010 South Africa Spain 145

3 2014 Brazil Germany 171


In [539]: # creating dataframe from list of list


df_listoflist = [[2002, 'Japan', 'Brazil', 161],
[2006, 'Germany', 'Italy', 147],
[2010, 'South Africa', 'Spain', 145],
[2014, 'Brazil', 'Germany', 171]
]
pd.DataFrame(df_listoflist, columns = ['Year', 'Country','Winner','GoalScored'])

Out[539]:
Year Country Winner GoalScored

0 2002 Japan Brazil 161

1 2006 Germany Italy 147

2 2010 South Africa Spain 145

3 2014 Brazil Germany 171

In [540]: # Creating dataframe using list of dictionary


df_lodict = [
{'year' : 2002, 'HostCountry' : 'Japan', 'Winner' : 'Brazil'},
{'year' : 2006, 'HostCountry' : 'Germany', 'Winner' : 'Italy'},
{'year' : 2010, 'HostCountry' : 'South Africa', 'Winner' : 'Spain'},
{'year' : 2014, 'HostCountry' : 'Brazil', 'Winner' : 'Germany'},
]
pd.DataFrame(df_lodict)

Out[540]:
year HostCountry Winner

0 2002 Japan Brazil

1 2006 Germany Italy

2 2010 South Africa Spain

3 2014 Brazil Germany

Pandas-Level Functions

pd.read_csv
Read a comma-separated values (csv) file into DataFrame.

pd.read_csv(
    filepath_or_buffer: 'FilePathOrBuffer',
    sep=NoDefault.no_default,
    delimiter=None,
    header='infer',
    names=NoDefault.no_default,
    index_col=None,
    usecols=None,
    squeeze=False,
    prefix=NoDefault.no_default,
    mangle_dupe_cols=True,
    dtype: 'DtypeArg | None' = None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=False,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression='infer',
    thousands=None,
    decimal: 'str' = '.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    encoding_errors: 'str | None' = 'strict',
    dialect=None,
    error_bad_lines=None,
    warn_bad_lines=None,
    on_bad_lines=None,
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
    storage_options: 'StorageOptions' = None,
)
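A short sketch of a few commonly used options (the file and column names are the ones used later in this notebook):

    df = pd.read_csv(
        'avocado.csv',
        usecols=['Region', 'Type', 'AveragePrice', 'Date'],  # read only selected columns
        dtype={'AveragePrice': 'float64'},                    # force a column's dtype
        parse_dates=['Date'],                                 # parse strings into datetime64
        nrows=100,                                            # read at most 100 rows
    )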

In [541]: avocado_data = pd.read_csv('avocado.csv')


avocado_data.head()

Out[541]:
       Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0     Atlanta       organic    89424.11      207.08    89631.19     190257.38          1.70  2018-03-25
1     Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18
2      Boston       organic   120465.39       18.83   120484.22     236822.98          1.58  2018-03-11
3      Boston  conventional   136877.43       60.60   136938.03     239135.67          1.57  2018-03-04
4  California       organic    66273.89       46.58    66320.47     179041.72          1.82  2018-02-25

In [542]: avocado_data = pd.read_csv('avocado.csv', usecols=['Region','Type','AveragePrice'])


avocado_data.head()

Out[542]:
Region Type AveragePrice

0 Atlanta organic 1.70

1 Atlanta conventional 1.75

2 Boston organic 1.58

3 Boston conventional 1.57

4 California organic 1.82

In [544]: avocado_data = pd.read_csv('avocado.csv', usecols=[0,1,6])


avocado_data.head()

Out[544]:
Region Type AveragePrice

0 Atlanta organic 1.70

1 Atlanta conventional 1.75

2 Boston organic 1.58

3 Boston conventional 1.57

4 California organic 1.82

pd.read_excel

Read an Excel file into a pandas DataFrame.

In [3]: # read excel data


pd.read_excel('football_worldcup.xlsx')

Out[3]:
Year Country Winner Runners-Up GoalsScored MatchesPlayed

0 1990 Italy Germany Argentina 115 52

1 1994 USA Brazil Italy 141 52

2 1998 France France Brazil 171 64

3 2002 Japan Brazil Germany 161 64

4 2006 Germany Italy France 147 64

5 2010 South Africa Spain Netherlands 145 64

6 2014 Brazil Germany Argentina 171 64

pd.read_clipboard
Read text from clipboard and pass to read_csv.

In [546]: # copy above data using mouse cursor


pd.read_clipboard(header=None)

Out[546]:
0 1 2 3 4 5 6

0 0 1990 Italy Germany Argentina 115 52

1 1 1994 USA Brazil Italy 141 52

2 2 1998 France France Brazil 171 64

3 3 2002 Japan Brazil Germany 161 64

4 4 2006 Germany Italy France 147 64

5 5 2010 South Africa Spain Netherlands 145 64

6 6 2014 Brazil Germany Argentina 171 64

pd.get_dummies
Convert categorical variable into dummy/indicator variables.


In [547]: avocado_data.head()

Out[547]:
Region Type AveragePrice

0 Atlanta organic 1.70

1 Atlanta conventional 1.75

2 Boston organic 1.58

3 Boston conventional 1.57

4 California organic 1.82

In [548]: # One Hot Encoding using get_dummies


pd.get_dummies(avocado_data)

Out[548]:
   AveragePrice  Region_Atlanta  Region_Boston  Region_California  Region_NewYork  Region_SanFrancisco  Type_conventional  Type_organic
0          1.70               1              0                  0               0                    0                  0             1
1          1.75               1              0                  0               0                    0                  1             0
2          1.58               0              1                  0               0                    0                  0             1
3          1.57               0              1                  0               0                    0                  1             0
4          1.82               0              0                  1               0                    0                  0             1
5          1.01               0              0                  1               0                    0                  1             0
6          1.38               0              0                  0               1                    0                  0             1
7          1.29               0              0                  0               1                    0                  1             0
8          1.16               0              0                  0               0                    1                  0             1
9          1.17               0              0                  0               0                    1                  1             0


In [550]: # One Hot Encoding using get_dummies


# for removing dummy variable trap use drop_first=True
pd.get_dummies(avocado_data, drop_first=True)

Out[550]:
   AveragePrice  Region_Boston  Region_California  Region_NewYork  Region_SanFrancisco  Type_organic
0          1.70              0                  0               0                    0             1
1          1.75              0                  0               0                    0             0
2          1.58              1                  0               0                    0             1
3          1.57              1                  0               0                    0             0
4          1.82              0                  1               0                    0             1
5          1.01              0                  1               0                    0             0
6          1.38              0                  0               1                    0             1
7          1.29              0                  0               1                    0             0
8          1.16              0                  0               0                    1             1
9          1.17              0                  0               0                    1             0

pd.to_datetime
Convert argument to datetime.

In [552]: daywise= pd.read_csv('daywise.csv', usecols=[0,1,2,3])


daywise.head()

Out[552]:
Date Confirmed Deaths Recovered

0 1/22/20 555 17 28

1 1/23/20 654 18 30

2 1/24/20 941 26 36

3 1/25/20 1434 42 39

4 1/26/20 2118 56 52


In [554]: # for checking datatype of Date column


daywise.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 264 entries, 0 to 263

Data columns (total 4 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Date 264 non-null object

1 Confirmed 264 non-null int64

2 Deaths 264 non-null int64

3 Recovered 264 non-null int64

dtypes: int64(3), object(1)

memory usage: 8.4+ KB

In [555]: # Converting Object to datetime using to_datetime


pd.to_datetime(daywise['Date'])

Out[555]: 0 2020-01-22

1 2020-01-23

2 2020-01-24

3 2020-01-25

4 2020-01-26

...

259 2020-09-05

260 2020-09-06

261 2020-09-07

262 2020-09-08

263 2020-09-09

Name: Date, Length: 264, dtype: datetime64[ns]
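Note that to_datetime returns a new Series; it is not stored back automatically. A short sketch (the format string is an assumption based on the dates shown above) that assigns the result and uses the .dt accessor:

    daywise['Date'] = pd.to_datetime(daywise['Date'], format='%m/%d/%y')
    print(daywise['Date'].dt.month.head())   # month number of each date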

pd.to_numeric
Convert argument to a numeric type.

In [556]: pd.to_numeric(daywise['Deaths'])

Out[556]: 0 17

1 18

2 26

3 42

4 56

...

259 879645

260 883414

261 892726

262 897463

263 903759

Name: Deaths, Length: 264, dtype: int64
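to_numeric is most useful when a column arrives as strings; with errors='coerce', entries that cannot be parsed become NaN. A small sketch with hypothetical values:

    s = pd.Series(['1', '2', 'three', '4'])
    print(pd.to_numeric(s, errors='coerce'))   # 1.0, 2.0, NaN, 4.0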



pd.unique
Uniques are returned in order
of appearance. This does NOT sort.

In [567]: avocado_data.head()

Out[567]:
Region Type AveragePrice

0 Atlanta organic 1.70

1 Atlanta conventional 1.75

2 Boston organic 1.58

3 Boston conventional 1.57

4 California organic 1.82

In [565]: pd.unique(avocado_data['Region'])

Out[565]: array(['Atlanta', 'Boston', 'California', 'NewYork', 'SanFrancisco'],

dtype=object)

pd.value_counts
Return a Series containing counts of unique values.

In [566]: pd.value_counts(avocado_data['Region'])

Out[566]: Atlanta 2

Boston 2

California 2

NewYork 2

SanFrancisco 2

Name: Region, dtype: int64
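The same counts are usually obtained from the Series method, which newer pandas versions recommend over the top-level function and which also supports relative frequencies:

    print(avocado_data['Region'].value_counts())
    print(avocado_data['Region'].value_counts(normalize=True))   # fractions instead of counts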

pd.factorize
Encode the object as an enumerated type or categorical variable.

In [570]: codes, uniques = pd.factorize(avocado_data['Region'])

In [571]: codes

Out[571]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype=int64)


In [572]: uniques

Out[572]: Index(['Atlanta', 'Boston', 'California', 'NewYork', 'SanFrancisco'], dtype='object')
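The codes index into uniques, so the original column can be rebuilt from the two pieces (a quick sketch):

    rebuilt = uniques[codes]
    print((rebuilt == avocado_data['Region'].values).all())   # True - round-trips the column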

DataFrame-Level Functions

df.abs
Return a Series/DataFrame with absolute numeric value of each element.

In [573]: avocado_data = pd.read_csv('avocado.csv')


avocado_data.head()

Out[573]:
       Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0     Atlanta       organic    89424.11      207.08    89631.19     190257.38          1.70  2018-03-25
1     Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18
2      Boston       organic   120465.39       18.83   120484.22     236822.98          1.58  2018-03-11
3      Boston  conventional   136877.43       60.60   136938.03     239135.67          1.57  2018-03-04
4  California       organic    66273.89       46.58    66320.47     179041.72          1.82  2018-02-25

In [574]: avocado_data.at[0, 'Large Bags'] = -340.8

In [575]: avocado_data.head()

Out[575]:
       Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0     Atlanta       organic    89424.11     -340.80    89631.19     190257.38          1.70  2018-03-25
1     Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18
2      Boston       organic   120465.39       18.83   120484.22     236822.98          1.58  2018-03-11
3      Boston  conventional   136877.43       60.60   136938.03     239135.67          1.57  2018-03-04
4  California       organic    66273.89       46.58    66320.47     179041.72          1.82  2018-02-25


In [576]: avocado_data['Large Bags'] = avocado_data['Large Bags'].abs()

In [578]: avocado_data.head()

Out[578]:
       Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0     Atlanta       organic    89424.11      340.80    89631.19     190257.38          1.70  2018-03-25
1     Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18
2      Boston       organic   120465.39       18.83   120484.22     236822.98          1.58  2018-03-11
3      Boston  conventional   136877.43       60.60   136938.03     239135.67          1.57  2018-03-04
4  California       organic    66273.89       46.58    66320.47     179041.72          1.82  2018-02-25

df.add
Get Addition of dataframe and other, element-wise (binary operator add ).

In [579]: avocado_data[['Large Bags','Total Bags','AveragePrice']].add(1)

Out[579]:
Large Bags Total Bags AveragePrice

0 341.80 89632.19 2.70

1 154.00 102871.50 2.75

2 19.83 120485.22 2.58

3 61.60 136939.03 2.57

4 47.58 66321.47 2.82

5 187.20 106985.89 2.01

6 93.29 124215.59 2.38

7 197.57 197282.89 2.29

8 1287.43 236418.93 2.16

9 610.20 166837.16 2.17

Among flexible wrappers ( add , sub , mul , div , mod , pow ) to


arithmetic operators: + , - , * , / , // , % , ** .

df.add_prefix

For DataFrame, the column labels are prefixed.

In [581]: avocado_data.add_prefix('New_')

Out[581]:
New_Small New_Large New_Total New_Total
New_Region New_Type New_AveragePrice N
Bags Bags Bags Volume

0 Atlanta organic 89424.11 340.80 89631.19 190257.38 1.70

1 Atlanta conventional 102717.50 153.00 102870.50 202790.74 1.75

2 Boston organic 120465.39 18.83 120484.22 236822.98 1.58

3 Boston conventional 136877.43 60.60 136938.03 239135.67 1.57

4 California organic 66273.89 46.58 66320.47 179041.72 1.82

5 California conventional 103033.73 186.20 106984.89 1203274.11 1.01

6 NewYork organic 119694.95 92.29 124214.59 777300.99 1.38

7 NewYork conventional 193813.92 196.57 197281.89 904333.98 1.29

8 SanFrancisco organic 231913.11 1286.43 236417.93 1051308.50 1.16

9 SanFrancisco conventional 162913.33 609.20 166836.16 984000.13 1.17

df.add_suffix
For DataFrame, the column labels are suffixed.


In [582]: avocado_data.add_suffix('_New')

Out[582]:
Small Large Total Total
Region_New Type_New AveragePrice_New D
Bags_New Bags_New Bags_New Volume_New

0 Atlanta organic 89424.11 340.80 89631.19 190257.38 1.70

1 Atlanta conventional 102717.50 153.00 102870.50 202790.74 1.75

2 Boston organic 120465.39 18.83 120484.22 236822.98 1.58

3 Boston conventional 136877.43 60.60 136938.03 239135.67 1.57

4 California organic 66273.89 46.58 66320.47 179041.72 1.82

5 California conventional 103033.73 186.20 106984.89 1203274.11 1.01

6 NewYork organic 119694.95 92.29 124214.59 777300.99 1.38

7 NewYork conventional 193813.92 196.57 197281.89 904333.98 1.29

8 SanFrancisco organic 231913.11 1286.43 236417.93 1051308.50 1.16

9 SanFrancisco conventional 162913.33 609.20 166836.16 984000.13 1.17

df.agg
Aggregate using one or more operations over the specified axis.

In [583]: avocado_data_num = avocado_data[['Small Bags','Large Bags','Total Bags','Total Volume','AveragePrice']]


avocado_data_num.agg(['sum','max', 'min'])

Out[583]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

sum 1327127.36 2990.50 1347979.87 5968266.20 14.43

max 231913.11 1286.43 236417.93 1203274.11 1.82

min 66273.89 18.83 66320.47 179041.72 1.01
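agg also accepts a dict mapping column names to one or more functions, so different columns can be summarized differently (a sketch):

    avocado_data_num.agg({'Large Bags': 'mean', 'AveragePrice': ['min', 'max']})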

df.aggregate

Aggregate using one or more operations over the specified axis.

In [587]: avocado_data_num.aggregate(['sum','max', 'min'])

Out[587]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

sum 1327127.36 2990.50 1347979.87 5968266.20 14.43

max 231913.11 1286.43 236417.93 1203274.11 1.82

min 66273.89 18.83 66320.47 179041.72 1.01

df.all
Return whether all elements are True, potentially over an axis.

In [588]: avocado_data_num.all()

Out[588]: Small Bags True

Large Bags True

Total Bags True

Total Volume True

AveragePrice True

dtype: bool

df.any
Return whether any element is True, potentially over an axis.

In [589]: avocado_data_num.any()

Out[589]: Small Bags True

Large Bags True

Total Bags True

Total Volume True

AveragePrice True

dtype: bool

df.append


In [590]: avocado_data_num

Out[590]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 340.80 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

In [591]: new_data = pd.DataFrame([[10,11,12,13,14]],
                                  columns=['Small Bags','Large Bags','Total Bags','Total Volume','AveragePrice'])

In [592]: new_data

Out[592]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 10 11 12 13 14


In [593]: avocado_data_num.append(new_data)

Out[593]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 340.80 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

0 10.00 11.00 12.00 13.00 14.00

In [594]: avocado_data_num.append(new_data, ignore_index=True)

Out[594]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 340.80 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

10 10.00 11.00 12.00 13.00 14.00
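Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is the replacement and produces the same result here:

    pd.concat([avocado_data_num, new_data], ignore_index=True)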

df.apply
Apply a function along an axis of the DataFrame.

In [595]: import numpy as np


In [596]: avocado_data_num.apply(func = np.sqrt)

Out[596]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 299.038643 18.460769 299.384686 436.185030 1.303840

1 320.495710 12.369317 320.734314 450.322929 1.322876

2 347.081244 4.339355 347.108369 486.644614 1.256981

3 369.969499 7.784600 370.051388 489.015000 1.252996

4 257.437157 6.824954 257.527610 423.133218 1.349074

5 320.988676 13.645512 327.085448 1096.938517 1.004988

6 345.969580 9.606768 352.440903 881.646749 1.174734

7 440.243024 14.020342 444.164260 950.964763 1.135782

8 481.573577 35.866837 486.228269 1025.333360 1.077033

9 403.625235 24.681977 408.455824 991.967807 1.081665

In [597]: avocado_data_num.apply(func = lambda x:x*2)

Out[597]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 178848.22 681.60 179262.38 380514.76 3.40

1 205435.00 306.00 205741.00 405581.48 3.50

2 240930.78 37.66 240968.44 473645.96 3.16

3 273754.86 121.20 273876.06 478271.34 3.14

4 132547.78 93.16 132640.94 358083.44 3.64

5 206067.46 372.40 213969.78 2406548.22 2.02

6 239389.90 184.58 248429.18 1554601.98 2.76

7 387627.84 393.14 394563.78 1808667.96 2.58

8 463826.22 2572.86 472835.86 2102617.00 2.32

9 325826.66 1218.40 333672.32 1968000.26 2.34
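apply can also work row-wise with axis=1; for example, a sketch computing the share of small bags in the total for each row:

    avocado_data_num.apply(lambda row: row['Small Bags'] / row['Total Bags'], axis=1)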

df.applymap
Apply a function to a Dataframe elementwise.


In [598]: avocado_data_num.applymap(func = np.sqrt)

Out[598]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 299.038643 18.460769 299.384686 436.185030 1.303840

1 320.495710 12.369317 320.734314 450.322929 1.322876

2 347.081244 4.339355 347.108369 486.644614 1.256981

3 369.969499 7.784600 370.051388 489.015000 1.252996

4 257.437157 6.824954 257.527610 423.133218 1.349074

5 320.988676 13.645512 327.085448 1096.938517 1.004988

6 345.969580 9.606768 352.440903 881.646749 1.174734

7 440.243024 14.020342 444.164260 950.964763 1.135782

8 481.573577 35.866837 486.228269 1025.333360 1.077033

9 403.625235 24.681977 408.455824 991.967807 1.081665

df.astype
Cast a pandas object to a specified dtype dtype .

In [599]: avocado_data_num.astype('int64')

Out[599]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424 340 89631 190257 1

1 102717 153 102870 202790 1

2 120465 18 120484 236822 1

3 136877 60 136938 239135 1

4 66273 46 66320 179041 1

5 103033 186 106984 1203274 1

6 119694 92 124214 777300 1

7 193813 196 197281 904333 1

8 231913 1286 236417 1051308 1

9 162913 609 166836 984000 1
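astype also accepts a dict, so only selected columns are cast (a sketch):

    avocado_data_num.astype({'Total Volume': 'int64', 'AveragePrice': 'float32'}).dtypes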

df.at
Access a single value for a row/column label pair.


In [600]: avocado_data_num = avocado_data[['Small Bags','Large Bags','Total Bags','Total Volume','AveragePrice']]

In [601]: avocado_data_num

Out[601]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 340.80 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

In [602]: avocado_data_num.at[0, 'Large Bags']

Out[602]: 340.8

In [603]: avocado_data_num.at[0, 'Large Bags'] = 100

In [604]: avocado_data_num.head(3)

Out[604]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

df.iat
Access a single value for a row/column pair by integer position.

In [605]: avocado_data_num.iat[0, 1]

Out[605]: 100.0

df.boxplot

In [606]: avocado_data_num.boxplot()

Out[606]: <matplotlib.axes._subplots.AxesSubplot at 0x1db6dcc7390>

In [607]: # AttributeError: 'Series' object has no attribute 'boxplot'


# avocado_data_num['Small Bags'].boxplot()

df.columns
Gives columns of Dataframe

In [610]: avocado_data_num.columns

Out[610]: Index(['Small Bags', 'Large Bags', 'Total Bags', 'Total Volume',

'AveragePrice'],

dtype='object')

df.corr
Compute pairwise correlation of columns, excluding NA/null values.

In [611]: avocado_data_num.corr()

Out[611]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

Small Bags 1.000000 0.780559 0.999473 0.601784 -0.617340

Large Bags 0.780559 1.000000 0.784400 0.576798 -0.550959

Total Bags 0.999473 0.784400 1.000000 0.625409 -0.637999

Total Volume 0.601784 0.576798 0.625409 1.000000 -0.970196

AveragePrice -0.617340 -0.550959 -0.637999 -0.970196 1.000000


In [612]: import seaborn as sns


sns.heatmap(avocado_data_num.corr(), annot=True)

Out[612]: <matplotlib.axes._subplots.AxesSubplot at 0x1db6de18d30>

df.count
Count non-NA cells for each column or row.
If axis is 0 or 'index', counts are generated for each column.
If axis is 1 or 'columns', counts are generated for each row.

In [613]: avocado_data_num

Out[613]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17


In [614]: # Count non-NA cells for each column --> axis = 0


avocado_data_num.count(axis = 0)

Out[614]: Small Bags 10

Large Bags 10

Total Bags 10

Total Volume 10

AveragePrice 10

dtype: int64

In [615]: # Count non-NA cells for each row --> axis = 1


avocado_data_num.count(axis = 1)

Out[615]: 0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 5

dtype: int64

df.cov
Compute pairwise covariance of columns, excluding NA/null values.

In [616]: avocado_data_num.cov()

Out[616]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

Small Bags 2.543504e+09 1.547827e+07 2.608450e+09 1.281268e+10 -8725.105398

Large Bags 1.547827e+07 1.545972e+05 1.596005e+07 9.574319e+07 -60.708644

Total Bags 2.608450e+09 1.596005e+07 2.677879e+09 1.366291e+10 -9252.211901

Total Volume 1.281268e+10 9.574319e+07 1.366291e+10 1.782244e+11 -114781.831256

AveragePrice -8.725105e+03 -6.070864e+01 -9.252212e+03 -1.147818e+05 0.078534

df.cumsum
Return cumulative sum over a DataFrame or Series axis.

To iterate over columns and find the sum in each row, use axis=1.
To iterate over rows and find the sum in each column, use axis=0.


In [618]: avocado_data_num

Out[618]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

In [619]: # Calculate cumulative sum wrt column


avocado_data_num.cumsum(axis=0)

Out[619]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 192141.61 253.00 192501.69 393048.12 3.45

2 312607.00 271.83 312985.91 629871.10 5.03

3 449484.43 332.43 449923.94 869006.77 6.60

4 515758.32 379.01 516244.41 1048048.49 8.42

5 618792.05 565.21 623229.30 2251322.60 9.43

6 738487.00 657.50 747443.89 3028623.59 10.81

7 932300.92 854.07 944725.78 3932957.57 12.10

8 1164214.03 2140.50 1181143.71 4984266.07 13.26

9 1327127.36 2749.70 1347979.87 5968266.20 14.43


In [620]: # Calculate cumulative sum wrt row


avocado_data_num.cumsum(axis=1)

Out[620]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 89524.11 179155.30 369412.68 369414.38

1 102717.50 102870.50 205741.00 408531.74 408533.49

2 120465.39 120484.22 240968.44 477791.42 477793.00

3 136877.43 136938.03 273876.06 513011.73 513013.30

4 66273.89 66320.47 132640.94 311682.66 311684.48

5 103033.73 103219.93 210204.82 1413478.93 1413479.94

6 119694.95 119787.24 244001.83 1021302.82 1021304.20

7 193813.92 194010.49 391292.38 1295626.36 1295627.65

8 231913.11 233199.54 469617.47 1520925.97 1520927.13

9 162913.33 163522.53 330358.69 1314358.82 1314359.99

df.cummin
Return cumulative minimum over a DataFrame or Series axis.

In [621]: avocado_data_num.cummin()

Out[621]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 89424.11 100.00 89631.19 190257.38 1.70

2 89424.11 18.83 89631.19 190257.38 1.58

3 89424.11 18.83 89631.19 190257.38 1.57

4 66273.89 18.83 66320.47 179041.72 1.57

5 66273.89 18.83 66320.47 179041.72 1.01

6 66273.89 18.83 66320.47 179041.72 1.01

7 66273.89 18.83 66320.47 179041.72 1.01

8 66273.89 18.83 66320.47 179041.72 1.01

9 66273.89 18.83 66320.47 179041.72 1.01

df.describe
Generate descriptive statistics.


Descriptive statistics include those that summarize the central tendency, dispersion and shape of a
dataset's distribution, excluding NaN values.

In [622]: avocado_data_num.describe()

Out[622]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

count 10.00000 10.00000 10.000000 1.000000e+01 10.00000

mean 132712.73600 274.97000 134797.987000 5.968266e+05 1.44300

std 50433.16001 393.18854 51748.226118 4.221664e+05 0.28024

min 66273.89000 18.83000 66320.470000 1.790417e+05 1.01000

25% 102796.55750 68.52250 103899.097500 2.112988e+05 1.20000

50% 120080.17000 126.50000 122349.405000 5.082183e+05 1.47500

75% 156404.35500 193.97750 159361.627500 9.640836e+05 1.67000

max 231913.11000 1286.43000 236417.930000 1.203274e+06 1.82000

In [626]: avocado_data_num.describe(percentiles=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9])

Out[626]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

count 10.00000 10.00000 10.000000 1.000000e+01 10.00000

mean 132712.73600 274.97000 134797.987000 5.968266e+05 1.44300

std 50433.16001 393.18854 51748.226118 4.221664e+05 0.28024

min 66273.89000 18.83000 66320.470000 1.790417e+05 1.01000

10% 87109.08800 43.80500 87300.118000 1.891358e+05 1.14500

20% 100058.82200 57.79600 100222.638000 2.002841e+05 1.16800

30% 102938.86100 82.78300 105750.573000 2.266133e+05 1.25400

40% 113030.46200 96.91600 115084.488000 2.382106e+05 1.34400

50% 120080.17000 126.50000 122349.405000 5.082183e+05 1.47500

60% 127030.20600 166.28000 129303.966000 8.281142e+05 1.57400

70% 144688.20000 189.31100 145907.469000 9.282338e+05 1.61600

80% 169093.44800 279.09600 172925.306000 9.974618e+05 1.71000

90% 197623.83900 676.92300 201195.494000 1.066505e+06 1.75700

max 231913.11000 1286.43000 236417.930000 1.203274e+06 1.82000

df.drop
Drop specified labels from rows or columns.


In [627]: # For simplicity, read only 5 rows using the nrows parameter


citibike_tripdata = pd.read_csv('citibike_tripdata.csv', nrows=5)

In [628]: citibike_tripdata

Out[628]:
start start end end
tripduration starttime stoptime station station station station bikeid name_localizedValu
id name id name

2018-05- 2018-05-
Newport
0 338 01 01 3639 Harborside 3199 33558 Annual Membersh
Pkwy
00:04:47 00:10:25

2018-05- 2018-05-
1 1482 01 01 3681 Grand St 3185 City Hall 33593 24 Ho
01:31:10 01:55:53

2018-05- 2018-05- FREE Bonus Mon


McGinley Lincoln
2 232 01 01 3194 3193 29217 with Annu
Square Park
01:31:29 01:35:22 Membersh

2018-05- 2018-05-
Grove
3 190 01 01 3185 City Hall 3186 29662 24 Ho
St PATH
02:03:29 02:06:40

2018-05- 2018-05-
Oakland
4 303 01 01 3207 3195 Sip Ave 15271 Annual Membersh
Ave
04:27:12 04:32:16

In [629]: citibike_tripdata.drop('usertype', axis=1)

Out[629]:
start start end end
tripduration starttime stoptime station station station station bikeid name_localizedValu
id name id name

2018-05- 2018-05-
Newport
0 338 01 01 3639 Harborside 3199 33558 Annual Membersh
Pkwy
00:04:47 00:10:25

2018-05- 2018-05-
1 1482 01 01 3681 Grand St 3185 City Hall 33593 24 Ho
01:31:10 01:55:53

2018-05- 2018-05- FREE Bonus Mon


McGinley Lincoln
2 232 01 01 3194 3193 29217 with Annu
Square Park
01:31:29 01:35:22 Membersh

2018-05- 2018-05-
Grove
3 190 01 01 3185 City Hall 3186 29662 24 Ho
St PATH
02:03:29 02:06:40

2018-05- 2018-05-
Oakland
4 303 01 01 3207 3195 Sip Ave 15271 Annual Membersh
Ave
04:27:12 04:32:16


In [630]: # TypeError: drop() got multiple values for argument 'axis'


# citibike_tripdata.drop('usertype','name_localizedValue','bikeid', axis=1)

In [631]: # If we have to drop multiple columns then pass inside list


citibike_tripdata.drop(['usertype','name_localizedValue','bikeid'], axis=1)

Out[631]:
start start station end end station
tripduration starttime stoptime
station id name station id name

2018-05-01 2018-05-01 Newport


0 338 3639 Harborside 3199
00:04:47 00:10:25 Pkwy

2018-05-01 2018-05-01
1 1482 3681 Grand St 3185 City Hall
01:31:10 01:55:53

2018-05-01 2018-05-01 McGinley


2 232 3194 3193 Lincoln Park
01:31:29 01:35:22 Square

2018-05-01 2018-05-01 Grove St


3 190 3185 City Hall 3186
02:03:29 02:06:40 PATH

2018-05-01 2018-05-01
4 303 3207 Oakland Ave 3195 Sip Ave
04:27:12 04:32:16

In [632]: # drop rows using axis=0


citibike_tripdata.drop([1,3], axis=0)

Out[632]:
start start end end
tripduration starttime stoptime station station station station bikeid name_localizedValu
id name id name

2018-05- 2018-05-
Newport
0 338 01 01 3639 Harborside 3199 33558 Annual Membersh
Pkwy
00:04:47 00:10:25

2018-05- 2018-05- FREE Bonus Mon


McGinley Lincoln
2 232 01 01 3194 3193 29217 with Annu
Square Park
01:31:29 01:35:22 Membersh

2018-05- 2018-05-
Oakland
4 303 01 01 3207 3195 Sip Ave 15271 Annual Membersh
Ave
04:27:12 04:32:16

df.drop_duplicates
Return DataFrame with duplicate rows removed.

In [633]: citibike_tripdata = citibike_tripdata.drop_duplicates()


In [634]: citibike_tripdata

Out[634]:
start start end end
tripduration starttime stoptime station station station station bikeid name_localizedValu
id name id name

2018-05- 2018-05-
Newport
0 338 01 01 3639 Harborside 3199 33558 Annual Membersh
Pkwy
00:04:47 00:10:25

2018-05- 2018-05-
1 1482 01 01 3681 Grand St 3185 City Hall 33593 24 Ho
01:31:10 01:55:53

2018-05- 2018-05- FREE Bonus Mon


McGinley Lincoln
2 232 01 01 3194 3193 29217 with Annu
Square Park
01:31:29 01:35:22 Membersh

2018-05- 2018-05-
Grove
3 190 01 01 3185 City Hall 3186 29662 24 Ho
St PATH
02:03:29 02:06:40

2018-05- 2018-05-
Oakland
4 303 01 01 3207 3195 Sip Ave 15271 Annual Membersh
Ave
04:27:12 04:32:16

df.dropna
Remove missing values.

In [635]: weatherHistory = pd.read_csv('weatherHistory.csv')

In [636]: # Precip Type having 517 null(missing) values


weatherHistory.isnull().sum()

Out[636]: Formatted Date 0

Summary 0

Precip Type 517

Temperature (C) 0

Apparent Temperature (C) 0

Humidity 0

Wind Speed (km/h) 0

Wind Bearing (degrees) 0

Visibility (km) 0

Loud Cover 0

Pressure (millibars) 0

Daily Summary 0

dtype: int64

In [637]: weatherHistory = weatherHistory.dropna()


In [638]: weatherHistory.isnull().sum()

Out[638]: Formatted Date 0

Summary 0

Precip Type 0

Temperature (C) 0

Apparent Temperature (C) 0

Humidity 0

Wind Speed (km/h) 0

Wind Bearing (degrees) 0

Visibility (km) 0

Loud Cover 0

Pressure (millibars) 0

Daily Summary 0

dtype: int64

df.dtypes
Display the data type of each column.

In [639]: weatherHistory.dtypes

Out[639]: Formatted Date object

Summary object

Precip Type object

Temperature (C) float64

Apparent Temperature (C) float64

Humidity float64

Wind Speed (km/h) float64

Wind Bearing (degrees) float64

Visibility (km) float64

Loud Cover float64

Pressure (millibars) float64

Daily Summary object

dtype: object

df.duplicated
Return boolean Series denoting duplicate rows.

In [643]: df = pd.DataFrame({
              'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie', 'Indomie'],
              'style': ['cup', 'cup', 'cup', 'pack', 'pack', 'pack'],
              'rating': [4, 4, 3.5, 15, 5, 5]})


In [644]: df

Out[644]:
brand style rating

0 Yum Yum cup 4.0

1 Yum Yum cup 4.0

2 Indomie cup 3.5

3 Indomie pack 15.0

4 Indomie pack 5.0

5 Indomie pack 5.0

In [645]: # True means it is duplicate in dataframe


df.duplicated()

Out[645]: 0 False

1 True

2 False

3 False

4 False

5 True

dtype: bool

In [646]: # Find duplicate using below code


df[df.duplicated()==True]

Out[646]:
brand style rating

1 Yum Yum cup 4.0

5 Indomie pack 5.0

df.explode
Transform each element of a list-like to a row, replicating index values.


In [647]: data = pd.DataFrame({"city": ['P', 'Q', 'R'],


"day1": [22, 25, 21],
'day2':[31, 12, 67],
'day3': [27, 20, [41, 45, 67, 90, 21]],
'day4': [64, 47, 24],
'day5': [23, 54, 16]})
data

Out[647]:
city day1 day2 day3 day4 day5

0 P 22 31 27 64 23

1 Q 25 12 20 47 54

2 R 21 67 [41, 45, 67, 90, 21] 24 16

In [648]: data.explode(column = 'day3', ignore_index=True)

Out[648]:
city day1 day2 day3 day4 day5

0 P 22 31 27 64 23

1 Q 25 12 20 47 54

2 R 21 67 41 24 16

3 R 21 67 45 24 16

4 R 21 67 67 24 16

5 R 21 67 90 24 16

6 R 21 67 21 24 16

df.fillna
Fill NA/NaN values using the specified method.


In [652]: # creating dataframe using Dictionary


data = {'Year' : [1990, 1994, 1998, 2002],
'Country' : ['Italy', np.nan, 'France', 'Japan'],
'Winner' : ['Germany', 'Brazil', 'France', np.nan],
'GoalScored' : [115, np.nan, 171, 161]
}
data = pd.DataFrame(data)
data

Out[652]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 NaN Brazil NaN

2 1998 France France 171.0

3 2002 Japan NaN 161.0

In [653]: data.isnull().sum()

Out[653]: Year 0

Country 1

Winner 1

GoalScored 1

dtype: int64

In [654]: data.fillna(0)

Out[654]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 0 Brazil 0.0

2 1998 France France 171.0

3 2002 Japan 0 161.0

In [655]: data.fillna("Missing")

Out[655]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 Missing Brazil Missing

2 1998 France France 171.0

3 2002 Japan Missing 161.0
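fillna also accepts a per-column dict of fill values, and values can be propagated from the previous row (a sketch using the same frame):

    data.fillna({'Country': 'Unknown', 'GoalScored': data['GoalScored'].mean()})
    data.fillna(method='ffill')   # forward-fill; newer pandas prefers data.ffill()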

df.groupby
Group DataFrame using a mapper or by a Series of columns.


A groupby operation involves some combination of splitting the object, applying a function, and
combining the results. This can be used to group large amounts of data and compute operations
on these groups.

In [656]: avocado_data = pd.read_csv('avocado.csv')


avocado_data.head()

Out[656]:
       Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0     Atlanta       organic    89424.11      207.08    89631.19     190257.38          1.70  2018-03-25
1     Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18
2      Boston       organic   120465.39       18.83   120484.22     236822.98          1.58  2018-03-11
3      Boston  conventional   136877.43       60.60   136938.03     239135.67          1.57  2018-03-04
4  California       organic    66273.89       46.58    66320.47     179041.72          1.82  2018-02-25

In [657]: g = avocado_data.groupby(by='Type')

The resulting GroupBy object exposes, among others, the following methods and attributes:
'agg', 'aggregate', 'all', 'any', 'apply', 'backfill', 'bfill', 'boxplot', 'corr', 'corrwith', 'count', 'cov',
'cumcount', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'dtypes', 'ewm', 'expanding',
'ffill', 'fillna', 'filter', 'first', 'get_group', 'groups', 'head', 'hist', 'idxmax', 'idxmin', 'indices', 'last', 'mad',
'max', 'mean', 'median', 'min', 'ndim', 'ngroup', 'ngroups', 'nth', 'nunique', 'ohlc', 'pad', 'pct_change',
'pipe', 'plot', 'prod', 'quantile', 'rank', 'resample', 'rolling', 'sample', 'sem', 'shift', 'size', 'skew', 'std',
'sum', 'tail', 'take', 'transform', 'tshift', 'var'

In [658]: g.mean()

Out[658]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

Type

conventional 139871.182 241.114 142182.294 706706.926 1.358

organic 125554.290 330.242 127413.680 486946.314 1.528


In [659]: g.min()

Out[659]:
                Region  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
Type
conventional   Atlanta   102717.50       60.60   102870.50     202790.74          1.01  2018-02-25
organic        Atlanta    66273.89       18.83    66320.47     179041.72          1.16  2018-02-25

In [660]: g.max()

Out[660]:
                     Region  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
Type
conventional   SanFrancisco   193813.92      609.20   197281.89    1203274.11          1.75  2018-03-25
organic        SanFrancisco   231913.11     1286.43   236417.93    1051308.50          1.82  2018-03-25

In [661]: g.corr()

Out[661]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

Type

conventional Small Bags 1.000000 0.356233 0.998932 0.262657 -0.220026

Large Bags 0.356233 1.000000 0.379422 0.501285 -0.470016

Total Bags 0.998932 0.379422 1.000000 0.306574 -0.262626

Total Volume 0.262657 0.501285 0.306574 1.000000 -0.976248

AveragePrice -0.220026 -0.470016 -0.262626 -0.976248 1.000000

organic Small Bags 1.000000 0.917566 0.999667 0.864519 -0.930256

Large Bags 0.917566 1.000000 0.915467 0.774162 -0.772484

Total Bags 0.999667 0.915467 1.000000 0.877099 -0.938423

Total Volume 0.864519 0.774162 0.877099 1.000000 -0.958522

AveragePrice -0.930256 -0.772484 -0.938423 -0.958522 1.000000


In [662]: g.describe()

Out[662]:
Small Bags

count mean std min 25% 50% 75% max

Type

conventional 5.0 139871.182 39329.111592 102717.50 103033.73 136877.43 162913.33 193813

organic 5.0 125554.290 63623.861662 66273.89 89424.11 119694.95 120465.39 231913

2 rows × 40 columns
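The grouped object also supports per-column aggregation with agg, which is often more readable than a full describe (a sketch using the same groups):

    g.agg({'Total Volume': 'sum', 'AveragePrice': ['mean', 'max']})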

df.head
Return the first n rows (5 by default).

In [663]: avocado_data = pd.read_csv('avocado.csv')


avocado_data.head(2)

Out[663]:
    Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
0  Atlanta       organic    89424.11      207.08    89631.19     190257.38          1.70  2018-03-25
1  Atlanta  conventional   102717.50      153.00   102870.50     202790.74          1.75  2018-03-18

df.tail
Return the last n rows (5 by default).


In [664]: avocado_data.tail()

Out[664]:
         Region          Type  Small Bags  Large Bags  Total Bags  Total Volume  AveragePrice        Date
5    California  conventional   103033.73      186.20   106984.89    1203274.11          1.01  2018-03-25
6       NewYork       organic   119694.95       92.29   124214.59     777300.99          1.38  2018-03-18
7       NewYork  conventional   193813.92      196.57   197281.89     904333.98          1.29  2018-03-11
8  SanFrancisco       organic   231913.11     1286.43   236417.93    1051308.50          1.16  2018-03-04
9  SanFrancisco  conventional   162913.33      609.20   166836.16     984000.13          1.17  2018-02-25

df.hist
Make a histogram of the DataFrame's columns.

In [665]: avocado_data.hist()

Out[665]: array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB6DEEA748>,

<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB6D882400>],

[<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB6BB25F60>,

<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB6D6D4A58>],

[<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB689A8C18>,

<matplotlib.axes._subplots.AxesSubplot object at 0x000001DB6BC06C50>]],

dtype=object)

df.idxmax

Return index of first occurrence of maximum over requested axis.

The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.

In [666]: avocado_data_num

Out[666]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

In [667]: # axis=0 means row wise(in each column)


avocado_data_num.idxmax(axis=0)

Out[667]: Small Bags 8

Large Bags 8

Total Bags 8

Total Volume 5

AveragePrice 4

dtype: int64

In [668]: # axis=1 means column wise(in each row)


avocado_data_num.idxmax(axis=1)

Out[668]: 0 Total Volume

1 Total Volume

2 Total Volume

3 Total Volume

4 Total Volume

5 Total Volume

6 Total Volume

7 Total Volume

8 Total Volume

9 Total Volume

dtype: object

df.idxmin
Return index of first occurrence of minimum over requested axis.

In [669]: # axis=0 means row wise(in each column)


avocado_data_num.idxmin(axis=0)

Out[669]: Small Bags 4

Large Bags 2

Total Bags 4

Total Volume 4

AveragePrice 5

dtype: int64

In [670]: # axis=1 means column wise(in each row)


avocado_data_num.idxmin(axis=1)

Out[670]: 0 AveragePrice

1 AveragePrice

2 AveragePrice

3 AveragePrice

4 AveragePrice

5 AveragePrice

6 AveragePrice

7 AveragePrice

8 AveragePrice

9 AveragePrice

dtype: object

df.iloc
Purely integer-location based indexing for selection by position.

In [671]: avocado_data_num.iloc[2:4,1:4]

Out[671]:
Large Bags Total Bags Total Volume

2 18.83 120484.22 236822.98

3 60.60 136938.03 239135.67

df.loc
Access a group of rows and columns by label(s).


In [672]: avocado_data_num

Out[672]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

0 89424.11 100.00 89631.19 190257.38 1.70

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

5 103033.73 186.20 106984.89 1203274.11 1.01

6 119694.95 92.29 124214.59 777300.99 1.38

7 193813.92 196.57 197281.89 904333.98 1.29

8 231913.11 1286.43 236417.93 1051308.50 1.16

9 162913.33 609.20 166836.16 984000.13 1.17

In [673]: avocado_data_num.loc[1:4, ['Small Bags','Large Bags','Total Bags','Total Volume','AveragePrice']]

Out[673]:
Small Bags Large Bags Total Bags Total Volume AveragePrice

1 102717.50 153.00 102870.50 202790.74 1.75

2 120465.39 18.83 120484.22 236822.98 1.58

3 136877.43 60.60 136938.03 239135.67 1.57

4 66273.89 46.58 66320.47 179041.72 1.82

In [674]: avocado_data_num.loc[1:6, 'Small Bags':'Total Bags']

Out[674]:
Small Bags Large Bags Total Bags

1 102717.50 153.00 102870.50

2 120465.39 18.83 120484.22

3 136877.43 60.60 136938.03

4 66273.89 46.58 66320.47

5 103033.73 186.20 106984.89

6 119694.95 92.29 124214.59

df.index


In [675]: avocado_data_num.index

Out[675]: RangeIndex(start=0, stop=10, step=1)

In [676]: list(avocado_data_num.index)

Out[676]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

df.info
This method prints information about a DataFrame, including the index dtype and columns, non-null
values and memory usage.

In [677]: avocado_data_num.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 10 entries, 0 to 9

Data columns (total 5 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Small Bags 10 non-null float64

1 Large Bags 10 non-null float64

2 Total Bags 10 non-null float64

3 Total Volume 10 non-null float64

4 AveragePrice 10 non-null float64

dtypes: float64(5)

memory usage: 528.0 bytes

df.insert
Insert column into DataFrame at specified location.

In [678]: avocado_data_num.insert(loc = 0, column = 'new_column', value= [1,2,3,4,5,6,7,8,9,10])


In [679]: avocado_data_num

Out[679]:
new_column Small Bags Large Bags Total Bags Total Volume AveragePrice

0 1 89424.11 100.00 89631.19 190257.38 1.70

1 2 102717.50 153.00 102870.50 202790.74 1.75

2 3 120465.39 18.83 120484.22 236822.98 1.58

3 4 136877.43 60.60 136938.03 239135.67 1.57

4 5 66273.89 46.58 66320.47 179041.72 1.82

5 6 103033.73 186.20 106984.89 1203274.11 1.01

6 7 119694.95 92.29 124214.59 777300.99 1.38

7 8 193813.92 196.57 197281.89 904333.98 1.29

8 9 231913.11 1286.43 236417.93 1051308.50 1.16

9 10 162913.33 609.20 166836.16 984000.13 1.17

df.interpolate
Fill NaN values using an interpolation method.

method : str, default 'linear'


Interpolation technique to use. One of:

* 'linear': Ignore the index and treat the values as equally

spaced. This is the only method supported on MultiIndexes.

* 'time': Works on daily and higher resolution data to interpolate

given length of interval.

* 'index', 'values': use the actual numerical values of the index.

* 'pad': Fill in NaNs using existing values.

* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',

'barycentric', 'polynomial': Passed to

`scipy.interpolate.interp1d`. These methods use the numerical

values of the index. Both 'polynomial' and 'spline' require that

you also specify an `order` (int), e.g.

``df.interpolate(method='polynomial', order=5)``.

* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima',

'cubicspline': Wrappers around the SciPy interpolation methods of

similar names. See `Notes`.

* 'from_derivatives': Refers to

`scipy.interpolate.BPoly.from_derivatives` which

replaces 'piecewise_polynomial' interpolation method in

scipy 0.18.


In [680]: df_dict = {'Year' : [1990, 1994, 1998, 2002],


'Country' : ['Italy', 'USA', 'France', 'Japan'],
'Winner' : ['Germany', 'Brazil', 'France', 'Brazil'],
'GoalScored' : [115, 141, np.nan, 161]
}
df_dict = pd.DataFrame(df_dict)
df_dict

Out[680]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

In [681]: # filling missing value using interpolation


df_dict.interpolate(method='linear')

Out[681]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France 151.0

3 2002 Japan Brazil 161.0

In [683]: # filling specified column missing value using interpolation


df_dict['GoalScored'].interpolate(method='linear')

Out[683]: 0 115.0

1 141.0

2 151.0

3 161.0

Name: GoalScored, dtype: float64

df.isin
Whether each element in the DataFrame is contained in values.


In [684]: df_dict

Out[684]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

In [685]: # shows True wherever a value from the list is present


df_dict.isin(['1994','Japan','Germany',141.0])

Out[685]:
Year Country Winner GoalScored

0 False False True False

1 False False False True

2 False False False False

3 False True False False

df.isna
Detect missing values; returns True where a value is missing.

In [686]: df_dict.isna()

Out[686]:
Year Country Winner GoalScored

0 False False False False

1 False False False False

2 False False False True

3 False False False False

df.isnull
Detect missing values (alias of isna).

Returns True where a value is missing.


In [688]: df_dict.isnull()

Out[688]:
Year Country Winner GoalScored

0 False False False False

1 False False False False

2 False False False True

3 False False False False

df.items
Iterate over (column name, Series) pairs.

In [689]: df_dict

Out[689]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

In [690]: items = df_dict.items()


In [691]: for label, content in items:
              print(label)
              print(content)
              print()

Year

0 1990

1 1994

2 1998

3 2002

Name: Year, dtype: int64

Country

0 Italy

1 USA

2 France

3 Japan

Name: Country, dtype: object

Winner

0 Germany

1 Brazil

2 France

3 Brazil

Name: Winner, dtype: object

GoalScored

0 115.0

1 141.0

2 NaN

3 161.0

Name: GoalScored, dtype: float64

df.iteritems
Iterate over (column name, Series) pairs; an older alias of items (deprecated in newer pandas versions).

In [692]: iteritems = df_dict.iteritems()


In [693]: for label, content in iteritems:
              print(label)
              print(content)
              print('---------')

Year

0 1990

1 1994

2 1998

3 2002

Name: Year, dtype: int64

---------

Country

0 Italy

1 USA

2 France

3 Japan

Name: Country, dtype: object

---------

Winner

0 Germany

1 Brazil

2 France

3 Brazil

Name: Winner, dtype: object

---------

GoalScored

0 115.0

1 141.0

2 NaN

3 161.0

Name: GoalScored, dtype: float64

---------

df.iterrows
Iterate over DataFrame rows as (index, Series) pairs.

In [694]: df_dict

Out[694]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

In [695]: iterrows = df_dict.iterrows()


In [696]: for label, content in iterrows:
              print(label)
              print(content)
              print('---------')

Year 1990

Country Italy

Winner Germany

GoalScored 115.0

Name: 0, dtype: object

---------

Year 1994

Country USA

Winner Brazil

GoalScored 141.0

Name: 1, dtype: object

---------

Year 1998

Country France

Winner France

GoalScored NaN

Name: 2, dtype: object

---------

Year 2002

Country Japan

Winner Brazil

GoalScored 161.0

Name: 3, dtype: object

---------

df.itertuples
Iterate over DataFrame rows as namedtuples.

In [697]: itertuples = df_dict.itertuples()

In [698]: list(itertuples)

Out[698]: [Pandas(Index=0, Year=1990, Country='Italy', Winner='Germany', GoalScored=115.0),
          Pandas(Index=1, Year=1994, Country='USA', Winner='Brazil', GoalScored=141.0),
          Pandas(Index=2, Year=1998, Country='France', Winner='France', GoalScored=nan),
          Pandas(Index=3, Year=2002, Country='Japan', Winner='Brazil', GoalScored=161.0)]
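Because itertuples yields namedtuples, the columns can also be read as attributes. A quick sketch:

for row in df_dict.itertuples():
    print(row.Index, row.Year, row.Winner)   # attribute-style access to each row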

df.keys()
Return the 'info axis': the index for a Series and the columns for a DataFrame.

In [700]: df_dict

Out[700]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

In [701]: df_dict.keys()

Out[701]: Index(['Year', 'Country', 'Winner', 'GoalScored'], dtype='object')

df.values
Return a NumPy representation of the DataFrame. Only the values in the DataFrame will be returned; the axes labels will be removed.

In [702]: df_dict.values

Out[702]: array([[1990, 'Italy', 'Germany', 115.0],

[1994, 'USA', 'Brazil', 141.0],

[1998, 'France', 'France', nan],

[2002, 'Japan', 'Brazil', 161.0]], dtype=object)

df.kurt
Return unbiased kurtosis over requested axis.

In [703]: avocado_data_num.kurt()

Out[703]: new_column -1.200000

Small Bags 0.257027

Large Bags 5.437476

Total Bags 0.249708

Total Volume -2.102999

AveragePrice -1.446656

dtype: float64


In [704]: avocado_data_num.kurtosis()

Out[704]: new_column -1.200000

Small Bags 0.257027

Large Bags 5.437476

Total Bags 0.249708

Total Volume -2.102999

AveragePrice -1.446656

dtype: float64

df.skew
Return unbiased skew over requested axis.
axis : {index (0), columns (1)}

In [705]: avocado_data_num.skew()

Out[705]: new_column 0.000000

Small Bags 0.866584

Large Bags 2.328516

Total Bags 0.856746

Total Volume 0.214461

AveragePrice -0.150918

dtype: float64

df.max
Return the maximum of the values over the requested axis.

In [706]: avocado_data_num.max()

Out[706]: new_column 10.00

Small Bags 231913.11

Large Bags 1286.43

Total Bags 236417.93

Total Volume 1203274.11

AveragePrice 1.82

dtype: float64

df.min
Return the minimum of the values over the requested axis.


In [707]: avocado_data_num.min()

Out[707]: new_column 1.00

Small Bags 66273.89

Large Bags 18.83

Total Bags 66320.47

Total Volume 179041.72

AveragePrice 1.01

dtype: float64

df.median
Return the median of the values over the requested axis.

In [708]: avocado_data_num.median()

Out[708]: new_column 5.500

Small Bags 120080.170

Large Bags 126.500

Total Bags 122349.405

Total Volume 508218.330

AveragePrice 1.475

dtype: float64

df.std
Return sample standard deviation over requested axis.
{index (0), columns (1)}

In [709]: avocado_data_num.std()

Out[709]: new_column 3.027650

Small Bags 50433.160010

Large Bags 393.188540

Total Bags 51748.226118

Total Volume 422166.360013

AveragePrice 0.280240

dtype: float64

df.var
Return unbiased variance over requested axis.
{index (0), columns (1)}


In [710]: avocado_data_num.var()

Out[710]: new_column 9.166667e+00

Small Bags 2.543504e+09

Large Bags 1.545972e+05

Total Bags 2.677879e+09

Total Volume 1.782244e+11

AveragePrice 7.853444e-02

dtype: float64

df.melt
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

In [711]: avocado_data_num

Out[711]:
new_column Small Bags Large Bags Total Bags Total Volume AveragePrice

0 1 89424.11 100.00 89631.19 190257.38 1.70

1 2 102717.50 153.00 102870.50 202790.74 1.75

2 3 120465.39 18.83 120484.22 236822.98 1.58

3 4 136877.43 60.60 136938.03 239135.67 1.57

4 5 66273.89 46.58 66320.47 179041.72 1.82

5 6 103033.73 186.20 106984.89 1203274.11 1.01

6 7 119694.95 92.29 124214.59 777300.99 1.38

7 8 193813.92 196.57 197281.89 904333.98 1.29

8 9 231913.11 1286.43 236417.93 1051308.50 1.16

9 10 162913.33 609.20 166836.16 984000.13 1.17


In [712]: avocado_data_num.melt()

Out[712]:
variable value

0 new_column 1.00

1 new_column 2.00

2 new_column 3.00

3 new_column 4.00

4 new_column 5.00

5 new_column 6.00

6 new_column 7.00

7 new_column 8.00

8 new_column 9.00

9 new_column 10.00

10 Small Bags 89424.11

11 Small Bags 102717.50

12 Small Bags 120465.39

13 Small Bags 136877.43

14 Small Bags 66273.89

15 Small Bags 103033.73

16 Small Bags 119694.95

17 Small Bags 193813.92

18 Small Bags 231913.11

19 Small Bags 162913.33

20 Large Bags 100.00

21 Large Bags 153.00

22 Large Bags 18.83

23 Large Bags 60.60

24 Large Bags 46.58

25 Large Bags 186.20

26 Large Bags 92.29

27 Large Bags 196.57

28 Large Bags 1286.43

29 Large Bags 609.20

30 Total Bags 89631.19

31 Total Bags 102870.50

32 Total Bags 120484.22

33 Total Bags 136938.03


34 Total Bags 66320.47

35 Total Bags 106984.89

36 Total Bags 124214.59

37 Total Bags 197281.89

38 Total Bags 236417.93

39 Total Bags 166836.16

40 Total Volume 190257.38

41 Total Volume 202790.74

42 Total Volume 236822.98

43 Total Volume 239135.67

44 Total Volume 179041.72

45 Total Volume 1203274.11

46 Total Volume 777300.99

47 Total Volume 904333.98

48 Total Volume 1051308.50

49 Total Volume 984000.13

50 AveragePrice 1.70

51 AveragePrice 1.75

52 AveragePrice 1.58

53 AveragePrice 1.57

54 AveragePrice 1.82

55 AveragePrice 1.01

56 AveragePrice 1.38

57 AveragePrice 1.29

58 AveragePrice 1.16

59 AveragePrice 1.17
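melt can also keep identifier columns fixed and unpivot only selected value columns. A sketch using the columns above (var_name and value_name simply rename the output columns):

avocado_data_num.melt(id_vars='new_column',
                      value_vars=['Small Bags', 'Large Bags'],
                      var_name='bag_type', value_name='bags')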

df.memory_usage
Return the memory usage of each column in bytes.


In [713]: df_dict.memory_usage()

Out[713]: Index 128

Year 32

Country 32

Winner 32

GoalScored 32

dtype: int64

In [714]: avocado_data_num.memory_usage()

Out[714]: Index 128

new_column 80

Small Bags 80

Large Bags 80

Total Bags 80

Total Volume 80

AveragePrice 80

dtype: int64
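For object columns the default numbers only count the references; deep=True measures the string contents as well. A small sketch:

df_dict.memory_usage(deep=True)    # include the memory used by the string values
df_dict.memory_usage(index=False)  # leave the Index row out of the result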

df.multiply
Get element-wise multiplication of the DataFrame and other (binary operator mul); for string columns the strings are repeated, as in the example below.

In [715]: df_dict.multiply(2)

Out[715]:
Year Country Winner GoalScored

0 3980 ItalyItaly GermanyGermany 230.0

1 3988 USAUSA BrazilBrazil 282.0

2 3996 FranceFrance FranceFrance NaN

3 4004 JapanJapan BrazilBrazil 322.0

df.nunique
Count number of distinct elements in specified axis.

In [717]: df_dict.nunique()

Out[717]: Year 4

Country 4

Winner 3

GoalScored 3

dtype: int64


In [718]: df_dict

Out[718]:
Year Country Winner GoalScored

0 1990 Italy Germany 115.0

1 1994 USA Brazil 141.0

2 1998 France France NaN

3 2002 Japan Brazil 161.0

df.pivot_table
Create a spreadsheet-style pivot table as a DataFrame.

In [721]: df_dict.pivot_table(values=['GoalScored','Year'],index='Winner',aggfunc='mean')

Out[721]:
GoalScored Year

Winner

Brazil 151.0 1998

France NaN 1998

Germany 115.0 1990

In [722]: df_dict.pivot_table(values=['GoalScored','Year'],index='Country',aggfunc='mean')

Out[722]:
GoalScored Year

Country

France NaN 1998

Italy 115.0 1990

Japan 161.0 2002

USA 141.0 1994
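aggfunc can also be a list of aggregations, producing one block of columns per function. A sketch:

df_dict.pivot_table(values='GoalScored', index='Winner',
                    aggfunc=['mean', 'max', 'count'])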

df.pop
Return item and drop from frame. Raise KeyError if not found.


In [723]: df_dict.pop('Year')

Out[723]: 0 1990

1 1994

2 1998

3 2002

Name: Year, dtype: int64

In [724]: df_dict

Out[724]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0

df.rename
Rename column or index labels.

In [725]: df_dict.rename(columns = {'Country':'country','Winner':'winner_country','GoalScored':'goal_scored'})

Out[725]:
country winner_country goal_scored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0

df.replace
Replace values given in to_replace with value.

In [726]: df_dict.replace(to_replace = 'USA', value= 'usa')

Out[726]:
Country Winner GoalScored

0 Italy Germany 115.0

1 usa Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0


In [727]: df_dict.replace(to_replace = {'USA': 'usa', 'Brazil': 'brazil'})

Out[727]:
Country Winner GoalScored

0 Italy Germany 115.0

1 usa brazil 141.0

2 France France NaN

3 Japan brazil 161.0
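replace also understands regular expressions when regex=True. A small sketch (the pattern is only illustrative):

# any value starting with 'Fra' (here France) becomes 'france'
df_dict.replace(to_replace=r'^Fra.*', value='france', regex=True)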

df.reset_index
Reset the index of the DataFrame, and use the default one instead.

In [728]: df_dict.reset_index()

Out[728]:
index Country Winner GoalScored

0 0 Italy Germany 115.0

1 1 USA Brazil 141.0

2 2 France France NaN

3 3 Japan Brazil 161.0
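Passing drop=True discards the old index instead of keeping it as a new column, which is handy after sorting or filtering. A sketch:

df_dict.reset_index(drop=True)                              # old index is thrown away
df_dict.sort_values(by='Country').reset_index(drop=True)    # clean 0..n-1 index after a sort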

df.sample
Return a random sample of items from an axis of object.

In [729]: df_dict.sample(n=4)

Out[729]:
Country Winner GoalScored

2 France France NaN

1 USA Brazil 141.0

0 Italy Germany 115.0

3 Japan Brazil 161.0
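sample can also draw a fraction of the rows, sample with replacement, and be made repeatable with random_state. A sketch:

df_dict.sample(frac=0.5, random_state=42)   # half of the rows, same rows every run
df_dict.sample(n=6, replace=True)           # more rows than exist, so sampling with replacement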

df.shape
Return a tuple representing the dimensionality of the DataFrame.


In [730]: df_dict

Out[730]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0

In [731]: df_dict.shape

Out[731]: (4, 3)

df.size
Return an int representing the number of elements in this object.

In [732]: df_dict.size

Out[732]: 12

df.sort_index
Sort object by labels (along an axis).

In [733]: df_dict.sort_index(axis=0)

Out[733]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0


In [734]: df_dict.sort_index(axis=1)

Out[734]:
Country GoalScored Winner

0 Italy 115.0 Germany

1 USA 141.0 Brazil

2 France NaN France

3 Japan 161.0 Brazil

df.sort_values
Sort by the values along either axis.

In [735]: df_dict.sort_values(by='Country')

Out[735]:
Country Winner GoalScored

2 France France NaN

0 Italy Germany 115.0

3 Japan Brazil 161.0

1 USA Brazil 141.0

In [736]: df_dict.sort_values(by='GoalScored')

Out[736]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

3 Japan Brazil 161.0

2 France France NaN

In [737]: df_dict.sort_values(by=['Winner','GoalScored'])

Out[737]:
Country Winner GoalScored

1 USA Brazil 141.0

3 Japan Brazil 161.0

2 France France NaN

0 Italy Germany 115.0
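ascending and na_position control the sort direction and where missing values land. A sketch:

df_dict.sort_values(by='GoalScored', ascending=False)      # largest first, NaN still last
df_dict.sort_values(by='GoalScored', na_position='first')  # put the NaN row on top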

df.to_clipboard

Copy object to the system clipboard.

In [738]: df_dict.to_clipboard()

In [739]: avocado_data_num.to_clipboard()

df.to_csv
Write object to a comma-separated values (csv) file.

In [458]: df_dict.to_csv('new_dict.csv')
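The row index is often dropped when writing out; a small sketch (the file name is just an example):

df_dict.to_csv('new_dict_no_index.csv', index=False)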

df.to_dict
Convert the DataFrame to a dictionary.

In [457]: df_dict.to_dict()

Out[457]: {'Country': {0: 'Italy', 1: 'USA', 2: 'France', 3: 'Japan'},

'Winner': {0: 'Germany', 1: 'Brazil', 2: 'France', 3: 'Brazil'},

'GoalScored': {0: 115.0, 1: 141.0, 2: nan, 3: 161.0}}
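The orient parameter controls the shape of the resulting dictionary. A sketch:

df_dict.to_dict(orient='records')   # list of {column: value} dicts, one per row
df_dict.to_dict(orient='list')      # {column: [values]} with one list per column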

df.to_excel
Write object to an Excel sheet.

In [459]: df_dict.to_excel('new_excel.xlsx')

df.to_html
Render a DataFrame as an HTML table.

In [462]: df_dict.to_html('dict_html.html')

df.to_json
Convert the object to a JSON string.

In [463]: df_dict.to_json('dict_json.json')


df.to_numpy
Convert the DataFrame to a NumPy array.

In [464]: df_dict.to_numpy()

Out[464]: array([['Italy', 'Germany', 115.0],

['USA', 'Brazil', 141.0],

['France', 'France', nan],

['Japan', 'Brazil', 161.0]], dtype=object)

df.to_parquet
Write a DataFrame to the binary parquet format.

In [466]: df_dict.to_parquet('parquet_file')

df.to_pickle
Pickle (serialize) object to file.

In [467]: df_dict.to_pickle('file.pkl')

df.transform
Call func on self producing a DataFrame with transformed values.

In [740]: df_dict.transform(func = lambda x:x*2)

Out[740]:
Country Winner GoalScored

0 ItalyItaly GermanyGermany 230.0

1 USAUSA BrazilBrazil 282.0

2 FranceFrance FranceFrance NaN

3 JapanJapan BrazilBrazil 322.0

df.transpose()
Transpose index and columns.


In [741]: df_dict.transpose()

Out[741]:
0 1 2 3

Country Italy USA France Japan

Winner Germany Brazil France Brazil

GoalScored 115.0 141.0 NaN 161.0

In [742]: df_dict.T

Out[742]:
0 1 2 3

Country Italy USA France Japan

Winner Germany Brazil France Brazil

GoalScored 115.0 141.0 NaN 161.0

In [743]: df_dict

Out[743]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0
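df.truncate
Truncate a Series or DataFrame before and after some index value (row labels by default; column labels when axis=1).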

In [744]: df_dict.truncate(before=2, after=3,axis=0)

Out[744]:
Country Winner GoalScored

2 France France NaN

3 Japan Brazil 161.0


In [745]: avocado_data_num

Out[745]:
new_column Small Bags Large Bags Total Bags Total Volume AveragePrice

0 1 89424.11 100.00 89631.19 190257.38 1.70

1 2 102717.50 153.00 102870.50 202790.74 1.75

2 3 120465.39 18.83 120484.22 236822.98 1.58

3 4 136877.43 60.60 136938.03 239135.67 1.57

4 5 66273.89 46.58 66320.47 179041.72 1.82

5 6 103033.73 186.20 106984.89 1203274.11 1.01

6 7 119694.95 92.29 124214.59 777300.99 1.38

7 8 193813.92 196.57 197281.89 904333.98 1.29

8 9 231913.11 1286.43 236417.93 1051308.50 1.16

9 10 162913.33 609.20 166836.16 984000.13 1.17

In [746]: avocado_data_num.sort_index(axis=1).truncate(before='Small Bags', after='Total Volume', axis=1)

Out[746]:
Small Bags Total Bags Total Volume

0 89424.11 89631.19 190257.38

1 102717.50 102870.50 202790.74

2 120465.39 120484.22 236822.98

3 136877.43 136938.03 239135.67

4 66273.89 66320.47 179041.72

5 103033.73 106984.89 1203274.11

6 119694.95 124214.59 777300.99

7 193813.92 197281.89 904333.98

8 231913.11 236417.93 1051308.50

9 162913.33 166836.16 984000.13

df.update
Modify in place using non-NA values from another DataFrame.


In [747]: df = pd.DataFrame({'A': [1, 2, 3],
                             'B': [400, 500, 600]})
          df

Out[747]:
A B

0 1 400

1 2 500

2 3 600

In [748]: new_df = pd.DataFrame({'B': [4, 5, 6],
                                 'C': [7, 8, 9]})
          new_df

Out[748]:
B C

0 4 7

1 5 8

2 6 9

In [749]: df.update(new_df)

In [750]: df

Out[750]:
A B

0 1 4

1 2 5

2 3 6
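update only takes non-NA values from the other frame, so a NaN in the source leaves the target value untouched. A small sketch:

df2 = pd.DataFrame({'B': [40, np.nan, 60]})
df.update(df2)   # row 1 keeps its existing B value because the source is NaN
df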

df.value_counts
Return a Series containing counts of unique rows in the DataFrame.

In [753]: df_dict.value_counts()

Out[753]: Country Winner GoalScored

Italy Germany 115.0 1

Japan Brazil 161.0 1

USA Brazil 141.0 1

dtype: int64


In [754]: df_dict['Winner'].value_counts()

Out[754]: Brazil 2

Germany 1

France 1

Name: Winner, dtype: int64
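Rows containing NaN are left out of DataFrame.value_counts by default; normalize turns counts into proportions. A sketch (dropna on the DataFrame version needs a reasonably recent pandas):

df_dict['Winner'].value_counts(normalize=True)   # relative frequencies instead of counts
df_dict.value_counts(dropna=False)               # keep the row whose GoalScored is NaN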

df.where
Replace values where the condition is False, keeping the original values where it is True.
In [755]: df_dict

Out[755]:
Country Winner GoalScored

0 Italy Germany 115.0

1 USA Brazil 141.0

2 France France NaN

3 Japan Brazil 161.0

In [756]: avocado_data_num.where(avocado_data_num > 100)

Out[756]:
new_column Small Bags Large Bags Total Bags Total Volume AveragePrice

0 NaN 89424.11 NaN 89631.19 190257.38 NaN

1 NaN 102717.50 153.00 102870.50 202790.74 NaN

2 NaN 120465.39 NaN 120484.22 236822.98 NaN

3 NaN 136877.43 NaN 136938.03 239135.67 NaN

4 NaN 66273.89 NaN 66320.47 179041.72 NaN

5 NaN 103033.73 186.20 106984.89 1203274.11 NaN

6 NaN 119694.95 NaN 124214.59 777300.99 NaN

7 NaN 193813.92 196.57 197281.89 904333.98 NaN

8 NaN 231913.11 1286.43 236417.93 1051308.50 NaN

9 NaN 162913.33 609.20 166836.16 984000.13 NaN


In [757]: df_dict.where(df_dict == 'USA')

Out[757]:
Country Winner GoalScored

0 NaN NaN NaN

1 USA NaN NaN

2 NaN NaN NaN

3 NaN NaN NaN
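Cells where the condition fails can be filled with a replacement instead of NaN via other. A sketch:

avocado_data_num.where(avocado_data_num > 100, other=0)   # failing cells become 0 instead of NaN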

Happy Learning
