
Chapter 186: Pandas Transform: Perform operations on groups and concatenate the results
Section 186.1: Simple transform
First, let's create a dummy dataframe.

We assume that a customer can have n orders, an order can have m items, and items can be ordered
multiple times.

import pandas as pd

orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']

# And this is what the dataframe looks like:


print(orders_df)
#     customer_id  order_id        item
# 0             1         1      apples
# 1             1         1   chocolate
# 2             1         1   chocolate
# 3             1         2      coffee
# 4             1         2      coffee
# 5             2         3      apples
# 6             2         3     bananas
# 7             3         4      coffee
# 8             3         5   milkshake
# 9             3         6   chocolate
# 10            3         6  strawberry
# 11            3         6  strawberry
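
The assumptions about the data can be checked directly on the frame. The following is a minimal sketch, not part of the original example: it rebuilds the same data from a dict so that it runs on its own, and the names repeated_items and items_per_order are made up for illustration.

```python
import pandas as pd

# Same data as above, built in a single call so this snippet is self-contained
orders_df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'order_id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
    'item': ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
             'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry'],
})

# Rows that repeat an earlier (customer_id, order_id, item) combination,
# i.e. an item that shows up more than once within the same order
repeated_items = orders_df[orders_df.duplicated()]
print(repeated_items)

# Number of rows (line items) recorded for each order
items_per_order = orders_df.groupby('order_id').size()
print(items_per_order)
```

Note that size() returns one value per order, so its result is shorter than the original dataframe.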


Now we will use the pandas transform function to count the number of orders per customer.
# First, we define the function that will be applied per customer_id
count_number_of_orders = lambda x: len(x.unique())

# And now, we can transform each group using the logic defined above.
# The results go into a new column called 'number_of_orders_per_cient'
orders_df['number_of_orders_per_cient'] = (
    orders_df                               # Take the original dataframe
    .groupby(['customer_id'])['order_id']   # Create a separate group for each customer_id & select the order_id
    .transform(count_number_of_orders))     # Apply the function to each group separately

# Inspecting the results ...


print(orders_df)
#     customer_id  order_id        item  number_of_orders_per_cient
# 0             1         1      apples                           2
# 1             1         1   chocolate                           2
# 2             1         1   chocolate                           2
# 3             1         2      coffee                           2
# 4             1         2      coffee                           2
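
As a hedged alternative to the lambda above, the same count can be sketched with the built-in 'nunique' aggregation name passed to transform. The snippet below rebuilds the relevant columns so that it runs on its own; the variable names orders_per_customer and orders_per_customer_agg are illustrative and do not come from the original example. It also calls agg on the same groups to show what distinguishes transform: the result keeps one value per original row instead of one value per group.

```python
import pandas as pd

# Same customer_id / order_id data as in the example above
orders_df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'order_id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
})

grouped = orders_df.groupby('customer_id')['order_id']

# transform computes the distinct order count per customer and repeats it
# on every row of that customer, keeping the original index
orders_per_customer = grouped.transform('nunique')

# agg computes the same count but returns a single row per customer
orders_per_customer_agg = grouped.agg('nunique')

print(orders_per_customer.tolist())
# [2, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3, 3]
print(orders_per_customer_agg.to_dict())
# {1: 2, 2: 1, 3: 3}
```

Because the transformed Series is aligned with the original index, it can be assigned straight back as a new column, which is what the example above does.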

GoalKicker.com – Python® Notes for Professionals 713
