Pandas GroupBy Stack Unstack
GroupBy is a simple concept: we group rows by category and apply a function to each group. Simple as it is, the technique is extremely valuable and widely used
in data science. Real data science projects involve large amounts of data and repeated experimentation, and GroupBy lets us aggregate that data efficiently,
both in runtime performance and in the amount of code required. GroupBy refers to a process involving one or more of the following steps:
Splitting : The data is split into groups by applying some condition to the dataset.
Applying : A function is applied to each group independently.
Combining : The results of applying the function to each group are combined into a single data structure.
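The three steps can be spelled out by hand and compared with a single groupby call. This is a minimal sketch: the "Team" and "Points" columns and their values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "Team": ["A", "B", "A", "B", "C"],
    "Points": [10, 20, 30, 40, 50],
})

# Split: one sub-frame per unique Team value.
pieces = {team: chunk for team, chunk in df.groupby("Team")}

# Apply: compute a summary for each piece independently.
sums = {team: int(chunk["Points"].sum()) for team, chunk in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(sums, name="Points")

# A single groupby call performs the same three steps.
direct = df.groupby("Team")["Points"].sum()

print(manual)
print(direct)
```

Both Series hold the same per-team totals; the groupby version simply hides the bookkeeping.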
The following image illustrates the process involved in GroupBy.
1. Group the unique values from the Team column
obj.groupby(key)
obj.groupby(key, axis=1)
obj.groupby([key1, key2])
Python3
print(df)
Python3
df.groupby('Name')
print(df.groupby('Name').groups)
Output :
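Since the DataFrame used in these snippets is not shown here, the following sketch builds a small stand-in frame and inspects the groups attribute. The rows are hypothetical; only the column names "Name", "Qualification" and "Score" and the name "Jai" come from the snippets in this article.

```python
import pandas as pd

# Hypothetical stand-in for the article's DataFrame (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Qualification": ["MSc", "MA", "MCA", "PhD", "BTech", "MA"],
    "Score": [23, 34, 35, 45, 47, 50],
})

# .groups maps each group key to the row labels that belong to it.
groups = df.groupby("Name").groups
for name, labels in groups.items():
    print(name, list(labels))
```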
Python3
Output :
Python3
print(df)
Now we group the data by “Name” and “Qualification” together, passing multiple keys to the groupby function.
Python3
print(df.groupby(['Name', 'Qualification']).groups)
Output :
Python3
print(df)
Python3
df.groupby(['Name']).sum()
Output :
Python3
Output :
Python3
print(df)
Python3
df.groupby('Name').groups
Output :
Python3
print(df)
Python3
# iterating over the
# groups one at a time
grp = df.groupby('Name')
for name, group in grp:
    print(name)
    print(group)
    print()
Output :
Python3
# iterating an element
# of group containing
# multiple keys
Output :
As the output shows, the group name is a tuple when we group by multiple keys.
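A minimal sketch of iterating over groups built from multiple keys, assuming a hypothetical frame with "Name" and "Qualification" columns (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in data (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Qualification": ["MSc", "MA", "MCA", "PhD", "BTech", "MA"],
    "Score": [23, 34, 35, 45, 47, 50],
})

seen = []
for name, group in df.groupby(["Name", "Qualification"]):
    # With a list of keys, `name` is a tuple such as ('Jai', 'MSc').
    seen.append(name)
    print(name)
    print(group)
    print()
```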
Selecting a group
To select a single group, we use GroupBy.get_group(), which returns the rows that belong to the given group.
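As a sketch of get_group with one key and with several keys, assuming a hypothetical frame (the values are invented; only the column names and the name "Jai" appear in this article):

```python
import pandas as pd

# Hypothetical stand-in data (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Qualification": ["MSc", "MA", "MCA", "PhD", "BTech", "MA"],
    "Score": [23, 34, 35, 45, 47, 50],
})

# Single key: pass the group label itself.
jai = df.groupby("Name").get_group("Jai")
print(jai)

# Multiple keys: pass a tuple with one label per key.
jai_msc = df.groupby(["Name", "Qualification"]).get_group(("Jai", "MSc"))
print(jai_msc)
```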
Python3
print(df)
Python3
grp = df.groupby('Name')
grp.get_group('Jai')
Output :
Python3
Output :
Aggregation : We compute a summary statistic (or statistics) about each group. For example, compute group sums or means.
Transformation : We perform some group-specific computation and return a like-indexed object. For example, filling NAs within groups with a value derived
from each group.
Filtration : We discard some groups according to a group-wise computation that evaluates to True or False. For example, filtering out data based on the
group sum or mean.
Aggregation :
Aggregation is a process in which we compute a summary statistic about each group. An aggregation function returns a single aggregated value for each group.
After splitting the data into groups with the groupby function, several aggregation operations can be performed on the grouped data.
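The sketch below shows a few ways of aggregating, using a hypothetical frame with invented values. It passes function names as strings such as "sum", which is an alternative to passing NumPy functions; the output column names "total" and "best" are also made up for this example.

```python
import pandas as pd

# Hypothetical stand-in data (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Score": [23, 34, 35, 45, 47, 50],
})

scores = df.groupby("Name")["Score"]

# One aggregation: a single value per group.
total = scores.agg("sum")

# Several aggregations at once: one column per function.
summary = scores.agg(["sum", "mean", "max"])

# Named aggregation: choose the output column names explicitly.
named = df.groupby("Name").agg(total=("Score", "sum"), best=("Score", "max"))

print(total)
print(summary)
print(named)
```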
Code #1: Using aggregation via the aggregate method
Python3
# importing numpy as np
import numpy as np
print(df)
Python3
grp1 = df.groupby('Name')
grp1.aggregate(np.sum)
Output :
Python3
# performing aggregation on
# group containing multiple
# keys
grp1 = df.groupby(['Name', 'Qualification'])
grp1.aggregate(np.sum)
Output :
Python3
# importing numpy as np
import numpy as np
print(df)
Python3
grp = df.groupby('Name')
Output :
Python3
# importing numpy as np
import numpy as np
print(df)
Python3
Output :
Transformation :
Transformation is a process in which we perform some group-specific computation and return a like-indexed object. The transform method returns an object that
is indexed the same (same size) as the one being grouped. The transform function must:
Return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk
Operate column-by-column on the group chunk
Not perform in-place operations on the group chunk.
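A minimal sketch of both kinds of result, assuming a hypothetical frame with invented values. The new column names and the centering lambda are illustrative choices, not the function used in the snippet that follows.

```python
import pandas as pd

# Hypothetical stand-in data (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Score": [23, 34, 35, 45, 47, 50],
})

by_name = df.groupby("Name")["Score"]

# A scalar per group is broadcast back to every row of that group.
df["Group_Mean"] = by_name.transform("mean")

# A same-size result: each score minus its own group's mean.
df["Centered"] = by_name.transform(lambda x: x - x.mean())

print(df)
```

The result keeps one row per original row, which is what distinguishes transform from aggregate.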
Python3
# importing numpy as np
import numpy as np
print(df)
Python3
df['Standardized_Score'] = df.groupby('Name')['Score'].transform(sc)
Output :
Filtration :
Filtration is a process in which we discard some groups according to a group-wise computation that evaluates to True or False. To filter groups, we call the
filter method with a function that receives each group and returns True for the groups to keep.
Python3
# importing numpy as np
import numpy as np
print(df)
Now we filter the data to return only the rows whose Name appears two or more times.
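One way to express this condition, sketched on a hypothetical frame with invented values:

```python
import pandas as pd

# Hypothetical stand-in data (values are invented).
df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Gaurav", "Anuj"],
    "Score": [23, 34, 35, 45, 47, 50],
})

# Keep every row of each group that has at least two rows;
# groups for which the function returns False are dropped entirely.
kept = df.groupby("Name").filter(lambda g: len(g) >= 2)
print(kept)
```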
Python3
Output :
Python3
import pandas as pd
# making dataframe
df = pd.read_csv("https://raw.githubusercontent.com/sivabalanb/Data-Analysis-with-Pandas-and-Python/master/nba.csv")
df.head()
Output:
The stack() method moves the column labels into a new inner-most level of row labels, so the result has a MultiIndex on its rows. It returns a reshaped
DataFrame or Series and changes a wide table into a long table.
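A minimal sketch of stack() on a tiny hypothetical frame (the player names and values are invented, not taken from nba.csv). Because this frame has a single level of column labels, the stacked result is a Series.

```python
import pandas as pd

# Hypothetical wide table: one row per player, one column per attribute.
wide = pd.DataFrame(
    {"Age": [25, 30], "Weight": [180, 200]},
    index=pd.Index(["Player A", "Player B"], name="Name"),
)

# Column labels become the inner-most level of the row index.
stacked = wide.stack()
print(stacked)
print(stacked.index.nlevels)
```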
Python3
# making dataframe
df = pd.read_csv("nba.csv")
print(df_stacked.head(26))
Output:
The unstack() method is the inverse of stack(). It also works with MultiIndex objects, pivoting a level of row labels (the inner-most one by default) into a
new inner-most level of column labels and returning the reshaped DataFrame.
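As a sketch of the round trip, the example below stacks a small hypothetical frame (invented values) and then unstacks it again, first with the default level and then with the outer level.

```python
import pandas as pd

# Hypothetical wide table (values are invented).
wide = pd.DataFrame(
    {"Age": [25, 30], "Weight": [180, 200]},
    index=pd.Index(["Player A", "Player B"], name="Name"),
)
stacked = wide.stack()

# Default: the inner-most row level goes back to the columns.
restored = stacked.unstack()

# level=0 pivots the outer row level (the names) into columns instead.
by_name = stacked.unstack(level=0)

print(restored)
print(by_name)
```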
Python3
# making dataframe
df = pd.read_csv("nba.csv")
# unstack() method
df_unstacked = df_stacked.unstack()
print(df_unstacked.head(10))
Output:
The melt() function in Pandas reshapes a DataFrame from wide format to long format. The columns named in id_vars are kept as identifier columns, and the
remaining columns are melted into rows.
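A minimal sketch of melt() on a small hypothetical frame. The data and the output column names "Attribute" and "Value" are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table (values are invented).
wide = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Age": [25, 30],
    "Weight": [180, 200],
})

# Keep "Name" as the identifier; turn the Age and Weight columns into rows.
long_df = pd.melt(
    wide,
    id_vars=["Name"],
    value_vars=["Age", "Weight"],
    var_name="Attribute",
    value_name="Value",
)
print(long_df)
```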
Python3
# making dataframe
df = pd.read_csv("nba.csv")
Output: