
BUS 383 eBusiness Management

Project Dataset Documentation

About the Datasets


The provided dataset Project1922.zip consists of the following four files:
1. AssoDectBau1922.csv     Association Detection Dataset of the Bauhaus brand    1.11M records
2. AssoDectSal1922.csv     Association Detection Dataset of the Salad brand      444k records
3. AssoDectSuD1922.csv     Association Detection Dataset of the SuperDry brand   628k records
4. CustomerCluster1922.csv Cluster Analysis Dataset of Anonymous Customers       236k records
These are real sales and customer records for the three financial years from 1/Apr/2019 to 31/Mar/2022. Sensitive
business and private data have been scrambled or anonymised. Some transactions, such as B2B sales, have been
removed; therefore, these datasets cannot be compared directly with the company’s financial reports.
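
The financial-year boundaries above (1 April to 31 March) can be expressed as a small helper. This is only a sketch: the function name financial_year_label is hypothetical, and the label format (for example 1920) is assumed to follow the Sum1920 ~ 2122 column names in the Cluster Analysis Dataset.

```python
from datetime import date


def financial_year_label(d: date) -> str:
    """Return a label such as '1920' for the financial year containing d.

    Assumes each financial year runs from 1 April to 31 March,
    as in the period stated in this documentation.
    """
    # Dates from January to March belong to the financial year
    # that started in the previous calendar year.
    start_year = d.year if d.month >= 4 else d.year - 1
    return f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"


print(financial_year_label(date(2019, 4, 1)))   # 1920
print(financial_year_label(date(2022, 3, 31)))  # 2122
```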

1. Association Detection Datasets


This year, the association detection datasets are split by the company's three major brands to give a more specific
view of customer buying behaviour. The following columns are relevant to association detection:
Column             Description
RearrangedOrderID  A rearranged order number that masks the real receipt number; used as the ID in association detection
Class              A general product category; there are 10 classes in total:
                   1 = Tops; 2 = Pants; 3 = Shoes; 4 = Others; 5 = Skirts;
                   6 = Bag; 7 = Wallet; 8 = Watch; 9 = Belt; 12 = Small Goods
                   This list also applies to the classes in the Cluster Analysis Dataset
ArticleEng         A specific product category in English; there are 93 article types in total
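
As a minimal sketch of how these columns fit together, the Python snippet below groups rows into one basket per RearrangedOrderID, which is the usual input shape for association detection. It assumes the CSV files are comma-delimited with a header row that uses the column names above. The sample rows and article names are made up for illustration and are not taken from the real files, and build_baskets is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Class codes as listed in the column description above.
CLASS_NAMES = {
    1: "Tops", 2: "Pants", 3: "Shoes", 4: "Others", 5: "Skirts",
    6: "Bag", 7: "Wallet", 8: "Watch", 9: "Belt", 12: "Small Goods",
}

# Made-up sample rows for illustration only; not real data.
SAMPLE_CSV = """RearrangedOrderID,Class,ArticleEng
1001,1,T-Shirt
1001,2,Jeans
1002,3,Sneakers
1001,1,T-Shirt
"""


def build_baskets(rows, item_column="ArticleEng"):
    """Group rows into one set of distinct items per RearrangedOrderID."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["RearrangedOrderID"]].add(row[item_column])
    return dict(baskets)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Baskets at article level.
baskets = build_baskets(rows)

# Baskets at class level, with the numeric codes translated to names.
class_baskets = {
    order: {CLASS_NAMES[int(code)] for code in codes}
    for order, codes in build_baskets(rows, item_column="Class").items()
}

print(baskets)
print(class_baskets)
```

To run this on a real file, the StringIO sample would be replaced by an opened file such as AssoDectBau1922.csv, provided its header matches the assumption above.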

2. Cluster Analysis Dataset


In total there are 37 columns in the cluster analysis dataset for this year:
Column                Description
CustomerID            An anonymous ID of the customer
Gender                Gender of the customer: M = Male; F = Female; U = Unspecified
BirthMonth            The birth month of the customer, blank = Undisclosed
SumBauhaus            Total amount spent under the Bauhaus brand (3 Years Total: 19-22)
CountBauhaus          Total items bought under the Bauhaus brand (3 Years Total: 19-22)
SumSalad              Total amount spent under the Salad brand (3 Years Total: 19-22)
CountSalad            Total items bought under the Salad brand (3 Years Total: 19-22)
SumSuperDry           Total amount spent under the SuperDry brand (3 Years Total: 19-22)
CountSuperDry         Total items bought under the SuperDry brand (3 Years Total: 19-22)
SumClass01 ~ 12       Total amount spent for a specific class of product (3 Years Total: 19-22)
                      1 = Tops; 2 = Pants; 3 = Shoes; 4 = Others; 5 = Skirts;
                      6 = Bag; 7 = Wallet; 8 = Watch; 9 = Belt; 12 = Small Goods
CountClass01 ~ 12     Total items bought for a specific class of product (3 Years Total: 19-22)
                      1 = Tops; 2 = Pants; 3 = Shoes; 4 = Others; 5 = Skirts;
                      6 = Bag; 7 = Wallet; 8 = Watch; 9 = Belt; 12 = Small Goods
CountNoDiscountItems  Total items bought without discount (3 Years Total: 19-22)
CountDiscountedItems  Total items bought with discount (3 Years Total: 19-22)
Sum1920 ~ 2122        Total amount spent in a specific financial year (regardless of brand): 19-20; 20-21; 21-22
Count1920 ~ 2122      Total items bought in a specific financial year (regardless of brand): 19-20; 20-21; 21-22
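
The ranged entries (SumClass01 ~ 12, Sum1920 ~ 2122) each stand for several columns. The sketch below expands them and confirms that the total comes to 37. The exact spellings are assumptions inferred from the names above: two-digit zero-padded class codes (SumClass01, ..., SumClass12) and the middle financial year written as 2021. They should be checked against the actual CSV header. The function name expected_columns is hypothetical.

```python
# Class codes in use; note that 10 and 11 are not defined.
CLASS_CODES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
BRANDS = ["Bauhaus", "Salad", "SuperDry"]
# Assumed labels for the financial years 19-20, 20-21 and 21-22.
FINANCIAL_YEARS = ["1920", "2021", "2122"]


def expected_columns():
    """Build the list of column names implied by the table above."""
    cols = ["CustomerID", "Gender", "BirthMonth"]
    for brand in BRANDS:
        cols += [f"Sum{brand}", f"Count{brand}"]
    cols += [f"SumClass{code:02d}" for code in CLASS_CODES]
    cols += [f"CountClass{code:02d}" for code in CLASS_CODES]
    cols += ["CountNoDiscountItems", "CountDiscountedItems"]
    cols += [f"Sum{fy}" for fy in FINANCIAL_YEARS]
    cols += [f"Count{fy}" for fy in FINANCIAL_YEARS]
    return cols


columns = expected_columns()
print(len(columns))  # 37
```

The count breaks down as 3 customer columns, 6 brand columns, 20 class columns, 2 discount columns and 6 financial-year columns.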
