Pandas GroupBy Stack Unstack

Pandas GroupBy
Groupby is a simple concept: we split the data into groups by category and apply a function to each group. Despite its simplicity, it is an extremely
valuable technique that is widely used in data science. Real data science projects involve large amounts of data and repeated experimentation, so the
ability of groupby to aggregate data efficiently, both in run time and in the amount of code required, matters a great deal. A groupby operation
involves one or more of the following steps:
Splitting : The data is split into groups by applying some condition to the dataset.
Applying : A function is applied to each group independently.
Combining : The results of applying the function to each group are combined into a single data structure.
The following steps illustrate the process involved in a groupby operation:
1. Group the unique values from the Team column
2. Now there’s a bucket for each group
3. Toss the other data into the buckets
4. Apply a function on the weight column of each bucket.
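The four steps above can be reproduced by hand on a small table. The sketch below is only an illustration: the players DataFrame, its Team and Weight columns and all of its values are made up. It performs the split, apply and combine phases explicitly and then compares the result with what groupby() returns.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration.
players = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'C'],
    'Weight': [180.0, 200.0, 220.0, 210.0, 190.0],
})

# Steps 1-3 (split): one bucket of rows per unique Team value.
buckets = {team: players[players['Team'] == team]
           for team in players['Team'].unique()}

# Step 4 (apply): compute the mean Weight inside each bucket.
means = {team: bucket['Weight'].mean() for team, bucket in buckets.items()}

# Combine: collect the per-bucket results into a single Series.
manual = pd.Series(means, name='Weight').sort_index()
manual.index.name = 'Team'

# groupby() performs the same three phases in one call.
grouped = players.groupby('Team')['Weight'].mean()

print(manual)
print(grouped)
```

The two printed Series should hold the same mean weight per team, which is the point of the comparison: groupby() is a compact way of writing the split, apply and combine phases.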
Splitting Data into Groups

Splitting divides the data into groups according to some criteria. In pandas this is done with the groupby() function. Pandas objects can be
split along any of their axes, and the abstract definition of grouping is to provide a mapping of labels to group names. There are multiple ways to
split data, for example:
obj.groupby(key)
obj.groupby(key, axis=1)
obj.groupby([key1, key2])
Note : Here we refer to the grouping objects as the keys.
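A key does not have to be a column name. Since grouping is defined as a mapping of labels to group names, a dictionary or a function that maps index labels to group names can also serve as the key. The following is a minimal sketch: the sales Series, its values and the region_of mapping are all hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical Series indexed by city; the numbers are illustrative only.
sales = pd.Series([10, 20, 30, 40],
                  index=['Nagpur', 'Kanpur', 'Aligarh', 'Jaunpur'])

# A mapping from index label to group name (an assumed grouping).
region_of = {'Nagpur': 'West', 'Kanpur': 'North',
             'Aligarh': 'North', 'Jaunpur': 'North'}

# Key given as a dictionary: each index label is looked up to find its group.
by_region = sales.groupby(region_of).sum()

# Key given as a function: it is called once per index label.
by_initial = sales.groupby(lambda city: city[0]).sum()

print(by_region)
print(by_initial)
```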

Grouping data with one key:
To group data by one key, we pass a single key as the argument to the groupby() function.
Python3
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi',
                  'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32,
                 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                     'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd',
                           'B.Tech', 'B.com', 'Msc', 'MA']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1)
print(df)
Now we group the data by Name using the groupby() function.
Python3
# using groupby function with one key
df.groupby('Name')

# .groups maps each group key to the row labels in that group
print(df.groupby('Name').groups)
Output :
Now we print the first entry of each group formed.
Python3
# applying groupby() function to group the data on Name value
gk = df.groupby('Name')

# print the first entry of each group
gk.first()
Output :
Grouping data with multiple keys :

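To group data by multiple keys, we pass a list of keys to the groupby() function.
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we group the data by “Name” and “Qualification” together, using multiple keys in the groupby() function.
Python3
# Using multiple keys in groupby() function
df.groupby(['Name', 'Qualification'])

# each group key is now a (Name, Qualification) tuple
print(df.groupby(['Name', 'Qualification']).groups)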
Output :
Grouping data by sorting keys :

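Group keys are sorted by default in a groupby operation. Passing sort=False skips this sorting and can give a speedup.
Python3
import pandas as pd

# Name and Age columns only, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32]}
df = pd.DataFrame(data1)
print(df)
Now we apply groupby() with the default sort=True, so the group keys are sorted
Python3
# default behaviour: group keys are sorted
df.groupby(['Name']).sum()
Output :
Now we apply groupby() with sort=False in order to attain potential speedups
Python3
# sort=False: group keys keep their order of first appearance
df.groupby(['Name'], sort = False).sum()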
Output :
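The effect of the sort parameter is easiest to see in the order of the group keys. The sketch below uses a small made-up table (the people DataFrame is an assumption for illustration): with sort=False the groups come out in order of first appearance instead of sorted order.

```python
import pandas as pd

# Small illustrative table (values are made up).
people = pd.DataFrame({'Name': ['Jai', 'Anuj', 'Jai', 'Abhi'],
                       'Age': [27, 24, 22, 32]})

# Default (sort=True): group keys are returned in sorted order.
sorted_keys = list(people.groupby('Name')['Age'].sum().index)

# sort=False: group keys keep the order in which they first appear.
unsorted_keys = list(people.groupby('Name', sort=False)['Age'].sum().index)

print(sorted_keys)    # ['Abhi', 'Anuj', 'Jai']
print(unsorted_keys)  # ['Jai', 'Anuj', 'Abhi']
```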
Grouping data with object attributes :

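The groups attribute is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group.
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we look at the groups in the same way as a dictionary, by key.
Python3
# group keys mapped to the row labels of each group
df.groupby('Name').groups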
Output :
Iterating through groups

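To iterate over the groups, we loop over the GroupBy object, much as we would with itertools.groupby. Each iteration yields the group name and the rows of that group.
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we iterate over the groups in the same way as with itertools.groupby.
Python3
# iterating over the groups
grp = df.groupby('Name')
for name, group in grp:
    print(name)
    print(group)
    print()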
Output :
Now we iterate over groups formed with multiple keys
Python3
# iterating over groups formed with multiple keys
grp = df.groupby(['Name', 'Qualification'])

for name, group in grp:
    print(name)
    print(group)
    print()
Output :
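As the output shows, with multiple keys the group name is a tuple.
Selecting a group
To select a single group, we use GroupBy.get_group(), which returns the rows belonging to the given group key.
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we select a single group using GroupBy.get_group().
Python3
# selecting a single group
grp = df.groupby('Name')
grp.get_group('Jai')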
Output :
Now we select a group from an object grouped on multiple columns
Python3
# selecting a group from an object grouped on multiple columns;
# the key is a tuple with one value per grouping column
grp = df.groupby(['Name', 'Qualification'])

grp.get_group(('Jai', 'Msc'))
Output :
Applying function to group

After splitting the data into groups, we apply a function to each group. This usually takes one of the following forms:
Aggregation : Computing a summary statistic (or statistics) for each group. For example, computing group sums or means.
Transformation : Performing some group-specific computation and returning a like-indexed object. For example, filling NAs within groups
with a value derived from each group.
Filtration : Discarding some groups according to a group-wise computation that evaluates to True or False. For example, filtering
out data based on the group sum or mean.
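The three kinds of operation differ mainly in the shape of what they return. The sketch below contrasts them on a made-up table (the df_demo DataFrame and its key and val columns are assumptions for illustration): aggregation returns one value per group, transformation one value per original row, and filtration a subset of the original rows.

```python
import pandas as pd

# Illustrative data (made up): group 'a' has three rows, group 'b' has one.
df_demo = pd.DataFrame({'key': ['a', 'a', 'b', 'a'],
                        'val': [1.0, 2.0, 10.0, 3.0]})
g = df_demo.groupby('key')['val']

# Aggregation: one value per group.
agg_result = g.sum()

# Transformation: one value per original row, aligned to the original index.
centered = g.transform(lambda s: s - s.mean())

# Filtration: rows of groups that fail the test are dropped.
kept = df_demo.groupby('key').filter(lambda grp: grp['val'].sum() < 8)

print(len(agg_result), len(centered), len(kept))
```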
Aggregation :
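Aggregation is a process in which we compute a summary statistic about each group. An aggregation function returns a single aggregated value for each group.
After splitting the data into groups using the groupby() function, several aggregation operations can be performed on the grouped data.
Code #1: Using aggregation via the aggregate method
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we perform aggregation using the aggregate method
Python3
# performing aggregation using the aggregate method;
# the string 'sum' selects the built-in group sum
grp1 = df.groupby('Name')
grp1.aggregate('sum')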
Output :
Now we perform aggregation on a group containing multiple keys
Python3
# performing aggregation on a group containing multiple keys
grp1 = df.groupby(['Name', 'Qualification'])
grp1.aggregate('sum')
Output :
Applying multiple functions at once :

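We can apply multiple functions at once by passing a list or dictionary of functions to the aggregation, which outputs a DataFrame.
Python3
import pandas as pd

# Employee data, with the same values as in the first example
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)
Now we apply multiple functions by passing a list of functions.
Python3
# applying several functions to the Age column of each Name group
# by passing a list of function names
grp = df.groupby('Name')
grp['Age'].agg(['sum', 'mean', 'std'])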
Output :
Applying different functions to DataFrame columns :

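To apply a different aggregation to each column of a DataFrame, we can pass a dictionary to aggregate.
Python3
import pandas as pd

# Employee data as in the first example, plus a Score column
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
         'Score': [23, 34, 35, 45, 47, 50, 52, 53]}
df = pd.DataFrame(data1)
print(df)
Now we apply a different aggregation to each column of the DataFrame.
Python3
# using a different aggregation function per column
# by passing a dictionary to aggregate
grp = df.groupby('Name')
grp.agg({'Age' : 'sum', 'Score' : 'std'})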
Output :
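As an alternative to the dictionary form, pandas also supports named aggregation, where each keyword argument has the form output_column=(input_column, function) and so controls the name of the resulting column. The sketch below reuses the Name, Age and Score values from the example above; the emp variable and the output column names (total_age, mean_score, n_rows) are chosen here purely for illustration.

```python
import pandas as pd

# Name, Age and Score values from the example above.
emp = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Score': [23, 34, 35, 45, 47, 50, 52, 53],
})

# Named aggregation: output_column=(input_column, function).
summary = emp.groupby('Name').agg(
    total_age=('Age', 'sum'),
    mean_score=('Score', 'mean'),
    n_rows=('Score', 'count'),
)
print(summary)
```

The result has one row per Name and exactly the three columns named in the call, which avoids renaming columns afterwards.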
Transformation :
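Transformation is a process in which we perform some group-specific computation and return a like-indexed object. The transform method returns an object that is
indexed the same (same size) as the one being grouped. The transform function must:
Return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk (for example, a scalar)
Operate column-by-column on the group chunk
Not perform in-place operations on the group chunk.
Python3
import pandas as pd

# Employee data as in the first example, plus a Score column
# (the Address and Qualification columns are assumed to be present here too)
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
         'Score': [23, 34, 35, 45, 47, 50, 52, 53]}
df = pd.DataFrame(data1)
print(df)
Now we perform a group-specific computation and return a like-indexed object.
Python3
# using transform function: standardize within each group, scaled by 10
sc = lambda x: (x - x.mean()) / x.std()*10

# groups with a single row (Gaurav, Abhi) have an undefined standard
# deviation, so their standardized values are NaN
df['Standardized_Score'] = df.groupby('Name')['Score'].transform(sc)
df[['Standardized_Score', 'Standardized_Age']] = df.groupby('Name')[['Score', 'Age']].transform(sc)
print(df)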
Output :
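Filling NAs within groups with a value derived from each group, mentioned earlier as a typical transformation, can be sketched as follows. The scores DataFrame and its values are made up for illustration; transform('mean') broadcasts each group's mean back to every row of that group, so the result can be passed directly to fillna().

```python
import numpy as np
import pandas as pd

# Illustrative data (made up) with one missing Score per group.
scores = pd.DataFrame({'Name': ['Jai', 'Jai', 'Jai', 'Anuj', 'Anuj'],
                       'Score': [20.0, np.nan, 40.0, 50.0, np.nan]})

# transform('mean') returns the group mean repeated for every row of the
# group, so the result lines up with the original index.
group_mean = scores.groupby('Name')['Score'].transform('mean')

# Fill each missing value with the mean of its own group.
scores['Score_filled'] = scores['Score'].fillna(group_mean)
print(scores)
```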
Filtration :
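Filtration is a process in which we discard some groups according to a group-wise computation that evaluates to True or False. To filter groups, we use the
filter method and pass it a condition that decides which groups are kept.
Python3
import pandas as pd

# Employee data as in the first example, plus a Score column
# (the Address and Qualification columns are assumed to be present here too)
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32, 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj', 'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
         'Score': [23, 34, 35, 45, 47, 50, 52, 53]}
df = pd.DataFrame(data1)
print(df)
Now we filter the data to keep only the rows whose Name appears two or more times.
Python3
# group on Name, then keep only the groups with at least two rows
grp = df.groupby('Name')
grp.filter(lambda x: len(x) >= 2)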
Output :
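The filter condition can use any group-wise statistic, not just the group size. The following sketch reuses the Name and Score values from the example above (the emp variable and the threshold of 40 are arbitrary choices for illustration) and keeps only the groups whose mean Score exceeds the threshold.

```python
import pandas as pd

# Name and Score values from the example above.
emp = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Score': [23, 34, 35, 45, 47, 50, 52, 53],
})

# Keep only the rows of groups whose mean Score is above 40.
# Jai's mean is (23 + 35) / 2 = 29, so both Jai rows are dropped.
high = emp.groupby('Name').filter(lambda g: g['Score'].mean() > 40)
print(high)
```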
Reshape a Pandas DataFrame using stack, unstack and melt method

Pandas provides various methods to reshape DataFrames and Series. Reshaping a Pandas DataFrame is a common operation that transforms the data structure for
better analysis and visualization. The stack method pivots columns into rows, creating a multi-level index Series. Conversely, the unstack method reverses this
process by pivoting inner index levels into columns. The melt method, on the other hand, transforms wide-format data into long format, making it
suitable for various analytical tasks. Let’s look at each of these reshaping methods.
Importing the Dataset
Python3
import pandas as pd

# making dataframe
df = pd.read_csv("https://raw.githubusercontent.com/sivabalanb/Data-Analysis-with-Pandas-and-Python/master/nba.csv")

# show the first 5 rows
df.head()
Output:
            Name            Team  Number  Position   Age  Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
Reshape DataFrame in Pandas

Below are the three methods that we will use to reshape the layout of tables in Pandas:
Using stack() method
Using unstack() method
Using melt() method
Reshape the Layout of Tables in Pandas Using stack() method
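The stack() method pivots the column labels into a new inner-most level of the row index. For a DataFrame with ordinary single-level columns, such as this
one, the result is a Series with a MultiIndex; for a DataFrame with MultiIndex columns it returns a DataFrame. It changes a wide table into a long table.
Python3
# import pandas module
import pandas as pd

# making dataframe (the same NBA dataset as loaded above)
df = pd.read_csv("https://raw.githubusercontent.com/sivabalanb/Data-Analysis-with-Pandas-and-Python/master/nba.csv")

# reshape the dataframe using stack() method
df_stacked = df.stack()
print(df_stacked.head(26))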
Output:
0  Name        Avery Bradley
   Team        Boston Celtics
   Number      0.0
   Position    PG
   Age         25.0
   Height      6-2
   Weight      180.0
   College     Texas
   Salary      7730337.0
1  Name        Jae Crowder
   Team        Boston Celtics
   Number      99.0
   Position    SF
   Age         25.0
   Height      6-6
   Weight      235.0
   College     Marquette
   Salary      6796117.0
2  Name        John Holland
   Team        Boston Celtics
   Number      30.0
   Position    SG
   Age         27.0
   Height      6-5
   Weight      205.0
   College     Boston University
dtype: object
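The structure of the stacked result is easier to inspect on a tiny table. In the sketch below, the wide DataFrame is an illustrative stand-in (two ages and weights taken from the head() output above, with shortened row labels chosen here); it shows that stack() returns a Series whose index has two levels, the original row label and the former column name.

```python
import pandas as pd

# Tiny illustrative table; the row labels are chosen for this sketch.
wide = pd.DataFrame({'Age': [25, 27], 'Weight': [180, 205]},
                    index=['Avery', 'John'])

# stack() moves the column labels into a new inner-most index level.
long = wide.stack()

print(type(long).__name__)           # Series
print(long.index.nlevels)            # 2
print(long.loc[('John', 'Weight')])  # 205
```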
Reshape a Pandas DataFrame Using unstack() method
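The unstack() method is the inverse of stack(). It also works with multi-index objects, pivoting the inner-most level of the row index into a new inner-most
level of column labels and producing a reshaped DataFrame. Applied to the stacked Series above, it restores the original table layout.
Python3
import pandas as pd

# df_stacked is the stacked Series created in the previous example
# unstack() method
df_unstacked = df_stacked.unstack()
print(df_unstacked.head(10))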
Output:
            Name            Team  Number  Position   Age  Height  Weight            College      Salary
0  Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas   7730337.0
1    Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette   6796117.0
2   John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University         NaN
3    R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State   1148640.0
4  Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN   5000000.0
5   Amir Johnson  Boston Celtics    90.0        PF  29.0     6-9   240.0                NaN  12000000.0
6  Jordan Mickey  Boston Celtics    55.0        PF  21.0     6-8   235.0                LSU   1170960.0
7   Kelly Olynyk  Boston Celtics    41.0         C  25.0     7-0   238.0            Gonzaga   2165160.0
8   Terry Rozier  Boston Celtics    12.0        PG  22.0     6-2   190.0         Louisville   1824360.0
9   Marcus Smart  Boston Celtics    36.0        PG  22.0     6-4   220.0     Oklahoma State   3431040.0
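unstack() pairs naturally with groupby(): grouping by two keys yields a result with a two-level index, and unstack() turns the inner level into columns to give a compact cross-table. The sketch below uses a made-up roster table (team names, positions and ages are illustrative, not taken from the NBA data).

```python
import pandas as pd

# Illustrative rows (made up, not the real NBA data).
roster = pd.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y'],
    'Position': ['PG', 'SG', 'PG', 'PG', 'SG'],
    'Age': [25, 27, 23, 30, 22],
})

# Grouping by two keys gives a Series with a two-level index.
mean_age = roster.groupby(['Team', 'Position'])['Age'].mean()

# unstack() pivots the inner level (Position) into columns.
table = mean_age.unstack()
print(table)
```

Each row of table is a team and each column a position, with the mean age in the cells.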
Reshape the Layout of Tables in Pandas Using melt() method
The melt() method reshapes a DataFrame from wide format to long format. The columns listed in id_vars are kept as identifier columns, and every other column
is unpivoted into two columns: variable (the former column name) and value.
Python3
import pandas as pd

# df is the NBA DataFrame loaded above
# keep "Name" and "Team" as identifier columns and melt the rest
df_melt = df.melt(id_vars=['Name', 'Team'])
print(df_melt.head(10))
Output:
            Name            Team  variable  value
0  Avery Bradley  Boston Celtics    Number    0.0
1    Jae Crowder  Boston Celtics    Number   99.0
2   John Holland  Boston Celtics    Number   30.0
3    R.J. Hunter  Boston Celtics    Number   28.0
4  Jonas Jerebko  Boston Celtics    Number    8.0
5   Amir Johnson  Boston Celtics    Number   90.0
6  Jordan Mickey  Boston Celtics    Number   55.0
7   Kelly Olynyk  Boston Celtics    Number   41.0
8   Terry Rozier  Boston Celtics    Number   12.0
9   Marcus Smart  Boston Celtics    Number   36.0
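melt() also accepts value_vars to restrict which columns are unpivoted, and var_name and value_name to rename the two generated columns. The sketch below applies these parameters to two rows copied from the head() output shown earlier; the small variable and the new column names Attribute and Value are chosen here for illustration.

```python
import pandas as pd

# Two rows copied from the head() output shown earlier.
small = pd.DataFrame({
    'Name': ['Avery Bradley', 'Jae Crowder'],
    'Team': ['Boston Celtics', 'Boston Celtics'],
    'Age': [25.0, 25.0],
    'Weight': [180.0, 235.0],
})

# Melt only Age and Weight, and give the two new columns descriptive names.
melted = small.melt(id_vars=['Name', 'Team'],
                    value_vars=['Age', 'Weight'],
                    var_name='Attribute',
                    value_name='Value')
print(melted)
```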
