Nanyang Business School BC3406 Business Analytics Consulting Data Hackathon

Nanyang Business School
BC3406 Business Analytics Consulting

Data Hackathon
Project Report
Group 02
Alvon Chua Kang Jin U1410705J

He RenYi Jonathan U1410458F
Tan Jun Lek, Jerry U1410759J
Table of Contents
Executive Summary
1. Background Information
2. Bundling Analysis and Marketing Strategy (4Ps)
2.1. Product – Proposed Bundles
2.2. Promotions
2.3. Pricing
2.4. Place
3. Flagship Store Positioning Analysis
4. Forecasting Analysis
4.1. Forecasting Models
4.2. SKUs Recommended for Discontinuation
4.3. Sales Forecast for February 2017 to January 2018
4.4. Channel Replenishment and HQ Ordering Models
4.5. Top 10 Selling Products
Appendix A – Bundling Analysis and Marketing Strategies
A1. Additional Notes
A2. Dataset
A2.1. Data Cleaning for Physical Stores: BC and FP
A2.2. Mapping Explanation
A2.3. Data Cleaning for Online Store
A2.4. Data Consolidation
A3. Methodology
A3.1. Market Basket Analysis
A4. Analysis of Results
A4.1. Insights
A4.2. Justifications for Proposed Bundles
A4.2.1 Bundle 1a
A4.2.2 Bundle 1b
A4.2.3 Bundle 2
A4.2.4 Bundle 3
A4.2.5 Bundle 4
A4.3. Justifications for “Price”
A4.4. Justifications for “Place”
Appendix B – Flagship Store Positioning Analysis
B1. Dataset
B1.1. Data Reorganisation
B1.2. Data Cleaning
B2. Methodology
B3. Analysis of Results
B3.1. Option 1 – District 19
B4. Recommendation – Punggol
Appendix C – Forecasting Analysis
C1. Methodology
C2. Limitations
C2.1. Clustering Analysis
C2.2. Multiple Linear Regression (MLR)
C3. Data Cleaning
C4. Cluster Analysis
C4.1. Clustering Results – Online Store
C4.2. Clustering Results – Physical Stores
C5. Sales Forecasting
C5.1. Model Building
C5.2. Multiple Linear Regression
C5.3. MLR Models for Online Store
C5.4. MLR Models for Physical Stores
C5.5. Model Evaluation and Selection
C5.6. Model Results – Online Store
C5.7. Model Results – Physical Stores
C5.8. Summary of Selected Models
C5.9. Identification of SKUs for Discontinuation
C5.10. Final Sales Forecast
C6. Channel Replenishment Model
C7. HQ Stock Ordering Model
C8. Top 10 Products
C9. Sample Codes used for Analysis
Appendix D – Glossary
Appendix E – References
Executive Summary
1. Background Information
Paula’s Choice Singapore is a skincare company that aims to provide the best skincare and
makeup products to consumers. Its main target audience is millennials: 50% of customers are
aged 25-35 years old and 22% are aged 18-24 years old. Paula’s Choice believes that its brand
exemplifies the essence of ‘masstige’ (mass prestige), where high-quality products are offered
to consumers at affordable prices.
The three key requirements are:
1. Explore possible product bundles that meet customers’ needs, supported by a suitable
marketing strategy.
2. Identify a suitable location for Paula’s Choice’s flagship store in anticipation of the future
closure of Beauty Collective (BC) and Front Porch (FP) outlets.
3. Analyse Paula’s Choice sales and stock keeping units (SKUs) to allow better
management of inventory, minimising operating costs and maximising revenue.
2. Bundling Analysis and Marketing Strategy (4Ps)
2.1. Product – Proposed Bundles
The following table shows the recommended bundles that Paula’s Choice should consider. For
detailed justifications, please refer to Appendix A4.
Bundle
  SKU   Item                                   Category
1a. [Skin Balancing + Skin Perfect (SP) Bundle]
  2010  SP 2% BHA Liquid - Regular             SP Exfoliants
  1150  Skin Balancing Cleanser - Regular      Skin Balancing
  1350  Skin Balancing Toner - Regular         Skin Balancing
1b. [Resist + SP Bundle]
  2010  SP 2% BHA Liquid - Regular             SP Exfoliants
  7780  Resist Oily Toner - Regular            Resist Oily
  7830  Resist Oily Cleanser - Regular         Resist Oily
2. [Best-selling items Set]
  7770  C15 Super Booster - Regular            Resist Treatments
  5700  Resist Body 2% BHA - Regular           Body, Lip & Hair
  2010  SP 2% BHA Liquid - Regular             SP Exfoliants
3. [High-Value Deal]
  7790  Resist Weekly 4% BHA - Regular         Resist Oily
  8010  Clinical 1% Retinol - Regular          Clinical
  7820  Resist Pore Refining 2% BHA - Regular  Resist Oily
4. [Sample Set]
  1159  Skin Balancing Cleanser - Sample       Skin Balancing
  1359  Skin Balancing Toner - Sample          Skin Balancing
  3409  Skin Balancing Moisturizer - Sample    Skin Balancing
2.2. Promotions
The five bundles can be promoted throughout the year depending on the season, with a focus on
November, December and January, since these three months have the most seasonal events, such
as Thanksgiving, Black Friday, Christmas, New Year and Chinese New Year. The sample-sized
Bundle 4 should be promoted even more heavily in this period, since this is also when more
people are travelling. Bundles 1a and 1b should not be promoted together because they may
cannibalize each other. In terms of communication, the message should emphasize the
complementary nature of the SKUs in each bundle.
2.3. Pricing
The recommended price for each bundle should lie between the minimum price (50% margin
of all SKUs in the bundle) and the maximum price (sum of the original prices of all SKUs) to
optimize profits. Refer to Appendix A4.3. for a detailed explanation.
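One possible reading of this price band is sketched below. The sketch assumes that the "50% margin" floor means the bundle price at which gross margin (price minus cost, divided by price) equals 50%; this definition, the function name and all prices and unit costs are hypothetical placeholders, not figures from the client data.

```python
def bundle_price_band(items, min_margin=0.5):
    """Return (min_price, max_price) for a bundle.

    items: list of (list_price, unit_cost) tuples, one per SKU (hypothetical values).
    min_price: price at which gross margin equals min_margin (assumed definition).
    max_price: sum of the original list prices, i.e. no bundle discount.
    """
    total_cost = sum(cost for _, cost in items)
    max_price = sum(price for price, _ in items)
    min_price = total_cost / (1.0 - min_margin)
    return min_price, max_price


# Hypothetical three-SKU bundle: (list price, unit cost) in S$
example_bundle = [(40.0, 12.0), (30.0, 9.0), (35.0, 10.0)]
low, high = bundle_price_band(example_bundle)
print(f"Price the bundle between S${low:.2f} and S${high:.2f}")
```

Any price inside the band keeps the assumed margin floor while still offering a discount against buying the SKUs separately.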
2.4. Place
This table summarizes the recommended channels for the bundles. Refer to Appendix A4.4.
for the justifications.
Bundle SKU Proposed Channel
1a. [Skin Balancing + Skin Perfect (SP) Bundle] 2010, 1150, 1350 BC, FP
1b. [Resist + SP Bundle] 2010, 7780, 7830 Online, FP
2. [Best-selling items Set] 7770, 5700, 2010 Online, FP
3. [High-Value Deal] 7790, 8010, 7820 BC, Online
4. [Sample Set] 1159, 1359, 3409 Online
3. Flagship Store Positioning Analysis
Figure 3-1. Heatmap based on Total Sales
Figure 3-1. is a heatmap of Paula’s Choice sales performance across Singapore.
The top five performing districts are 19, 23, 18, 22 and 15. District 19 has the highest zip code
count, number of transactions and total sales. Total sales in the district account for 13.39%
of all online sales. This amount is almost double that of District 23 and substantially
higher than that of all other districts.
Based on the analysis (refer to Appendix B. for detailed analysis), the recommended location
for Paula’s Choice’s new flagship store is District 19 – Punggol. The main deciding
factors are 1) a large existing customer base, 2) a population profile that fits Paula’s Choice’s
target audience, 3) high sales performance in the data, indicating strong revenue potential, and
4) extensive development plans for the Punggol region.
To fit the brand image of mass prestige, it would be suitable for Paula’s Choice to rent a
retail space in a shopping mall. This would allow the company to expose its brand name and
products to more consumers. Based on location, traffic flow and mall image, the
recommendation is to open the flagship store in Waterway Point.
Conveniently located just above Punggol MRT/LRT Station and near the bus interchange,
Waterway Point is a popular shopping destination in the northeast. Despite being in the
heartlands, the mall houses established brands such as H&M and Uniqlo. Hence, the
atmosphere of the mall fits the affordable prestige image of Paula’s Choice.
Naturally, given the mall’s high footfall (Toh, 2016), rental in Waterway Point is slightly higher
than in other malls in District 19. However, although the rental (~$16.00 - $40.00 psf) is
higher, the revenue-generating potential for Paula’s Choice would also be higher. Hence,
Waterway Point would be a suitable and feasible location for the new flagship store.
4. Forecasting Analysis
4.1. Forecasting Models
To develop forecasting models for different SKUs, products in each channel (online and
physical stores) were clustered based on their characteristics (refer to Appendix C4. for
elaboration), and forecasting models were generated and evaluated for each cluster (refer to
Appendix C5.). The table below summarizes the forecasting model selected for each cluster.
Channel / Cluster   Model Selected   Channel / Cluster      Model Selected
Online / 1          5MLR             Brick-and-mortar / 1   1-Year SMA
Online / 2          4MLR             Brick-and-mortar / 2   5MLR
Online / 3          2SMA             Brick-and-mortar / 3   1-Year SMA
Online / 4          1-Year SMA       Brick-and-mortar / 4   5MLR
Online / 7          2SMA             Brick-and-mortar / 7   2SMA
                                     Brick-and-mortar / 8   3SMA
Figure 4-1. – Selected Forecasting Models
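As a rough illustration of the moving-average models in the table, the sketch below forecasts the next period as the mean of the most recent observations. It assumes that "2SMA" and "3SMA" denote simple moving averages over the last two and three periods (the models themselves are discussed in Appendix C5.), and the monthly sales series is invented for the example.

```python
def sma_forecast(history, window):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly unit sales for one SKU (not client data)
monthly_units = [120, 135, 128, 140, 150, 146]
forecast_2sma = sma_forecast(monthly_units, 2)  # mean of 150 and 146
forecast_3sma = sma_forecast(monthly_units, 3)  # mean of 140, 150 and 146
print(forecast_2sma, forecast_3sma)
```

A shorter window reacts faster to recent changes, while a longer window smooths out month-to-month noise.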
4.2. SKUs Recommended for Discontinuation
79 SKUs were recommended for discontinuation due to having zero sales in 2016 (Appendix
C3.), and an additional 55 SKUs were recommended for discontinuation based on low
forecasted demand (Appendix C5.9), making a total of 134 SKUs recommended for
discontinuation.
4.3. Sales Forecast for February 2017 to January 2018
Using the models selected for each cluster, and taking into account SKUs recommended for
discontinuation, a forecast of the following year’s sales was generated as per the figure below
(further discussion in Appendix C5.10).
Forecasted Feb 2017 to Jan 2018 Sales
Channel            2016       2017 (Forecasted)   Change
Online             $778,402   $1,833,615          +135.6%
Brick-and-Mortar   $424,334   $282,121            -33.5%
Figure 4-1. – Final Sales Forecast

With strong support, online sales are expected to surge, while the forecasted poor performance
of brick-and-mortar channels provides strong impetus for the client to consider rebooting its
brick-and-mortar operations.
4.4. Channel Replenishment and HQ Ordering Models
Taking into account the constraints provided by the client, stock replenishment and ordering
models and guidelines were proposed, based on the concepts of lead time, demand uncertainty,
safety stock and reorder point. Refer to Appendix C6. and C7. for further explanation and
discussion.
4.5. Top 10 Selling Products
The top 10 selling products based on forecasted annual unit sales were identified. These are
products that will contribute heavily to revenue, and inventory for them should be managed by
the client with special attention. The following table summarizes the forecasted annual unit
sales, the safety stocks required, and reorder points and periods for each SKU.
SKU    Forecasted Annual Sales (Units)   Safety Stock   Reorder Point / Order Quantity   Reorder Period (Months)
2010   4646                              153            1314                             3
7770   1593                              78             476                              3
6000   1558                              89             478                              3
2017   1540                              105            490                              3
7820   1430                              60             417                              3
6007   1305                              87             413                              3
7980   1262                              91             406                              3
1350   1193                              65             363                              3
7780   1111                              51             329                              3
7760   1044                              49             310                              3
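The tabulated reorder points agree, to within one unit, with the textbook relationship in which the reorder point equals expected demand over the reorder period plus safety stock. The sketch below checks this against the table. The formula is an inference from the numbers rather than the stated model (see Appendix C6. and C7.), and the small differences are presumably due to rounding of the underlying forecasts.

```python
# (SKU, forecasted annual units, safety stock, reorder point) copied from the table above
TOP_10 = [
    ("2010", 4646, 153, 1314),
    ("7770", 1593, 78, 476),
    ("6000", 1558, 89, 478),
    ("2017", 1540, 105, 490),
    ("7820", 1430, 60, 417),
    ("6007", 1305, 87, 413),
    ("7980", 1262, 91, 406),
    ("1350", 1193, 65, 363),
    ("7780", 1111, 51, 329),
    ("7760", 1044, 49, 310),
]
REORDER_PERIOD_MONTHS = 3


def reorder_point(annual_units, safety_stock, period_months=REORDER_PERIOD_MONTHS):
    """Expected demand over the reorder period plus safety stock (assumed formula)."""
    return annual_units * period_months / 12 + safety_stock


for sku, annual, safety, tabulated in TOP_10:
    estimate = reorder_point(annual, safety)
    print(f"SKU {sku}: estimated {estimate:.2f}, tabulated {tabulated}")
```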
Appendix
Appendix A – Bundling Analysis and Marketing Strategies
A1. Additional Notes
1. Refer to glossary in Appendix D for explanation of the technical terms used in this section
onwards. The metrics used to determine the proposed SKU sales performance in this report
are:
a. ‘Yearly’ gross sales & quantity sold
i. Even though both gross sales and quantity sold are highly correlated, these
two metrics are used in our analysis to give our client a better picture of the
SKU sales performance
b. ‘Yearly’ gross sales contribution and quantity sold contribution
c. Average monthly gross sales & quantity sold are not considered because the relative
sales performance of each SKU will be exactly the same as that of ‘Yearly’
2. The dataset “December Orders - Online _ Store.csv” was not used because:
a. Firstly, it does not provide gross sales data. As such, a fair comparison cannot be
drawn when looking at both gross sales and quantity sold metrics.
b. Secondly, there is no information on transaction ID, so market basket analysis cannot
be conducted with this dataset.
c. Thirdly, there are no significant changes to the quantity performance of the SKUs
in the proposed bundles even after including the additional quantity data from the
dataset (See Appendix for more details)
d. As such, the timeframe of ‘yearly’ in this analysis does not include the additional
December data from the dataset, “December Orders - Online _ Store.csv”
3. The proposed bundles 1a and 1b are part of the currently offered Advanced Kit. However,
these bundles have a stronger focus on the complementary functions of the three SKUs as
compared to an Advanced Kit, which contains about five SKUs.
4. The support for the market basket analysis of all the proposed bundles is very low in terms
of transaction count. However, given the extremely granular nature of the SKU dataset,
a transaction count of even 5 is arguably significant. Still, to compensate for the low
support, most of the proposed SKUs have very high demand, as indicated by their
respective sales performance.
5. The key metrics for the proposed SKUs were compared against the scenario in which the
additional dataset "December Orders - Online _ Store.csv" was used. This is to confirm
that the additional dataset does not have any significant effect on the relative
performance of the SKUs.
A2. Dataset
A2.1. Data Cleaning for Physical Stores: BC and FP
1. Dataset used: “FULL YEAR ITEMS SALE BC.csv” and “FULL YEAR ITEMS SALE
FP.csv”
2. Data Cleaning:
a. Standardize “Date” column due to inconsistent format from raw dataset
i. Convert date and time from PT to SGT
b. Delete “Time Zone” column due to identical values for all rows
c. Delete “Tax” column due to identical values for all rows
d. Remove “S$” from “Gross Sales”, “Discounts” and “Net Sales” columns
i. To transform text to numeric (accounting) format
e. Isolate refunded transactions
i. Move into new sheet, “Refund”
ii. Refunded transactions are not applicable in bundling analysis
f. For items without assigned SKU
i. Assign item = “Shine Stopper” with price point name = “Sample” to have
SKU = “3601”
ii. Isolate item = "Custom Amount"
1. Move into new sheet, “Blanks”
2. Items are not assigned a new SKU because customized transactions are
not useful or relevant in bundling analysis
iii. For FULL YEAR ITEMS SALE BC data, additional steps are taken:
1. 1 row for item, “Calm Cleanser” is isolated to “Blanks” sheet
a. This item is also not assigned any SKU because it is not
possible to map out the SKU and 1 item is not going to be
significant in affecting the subsequent analysis
2. 9 rows for item, “Skin Balancing Simple Kit (4 items)” are mapped
and assigned SKU = “4090”
3. 4 rows for item, “Skin Balancing Super Kit (7 items)” are mapped
and assigned SKU = “4600”
4. Assignment of SKUs for both items, “Skin Balancing Super Kit (7
items)” and “Skin Balancing Simple Kit (4 items)”, is done by
mapping the corresponding items from the dataset “Line Item Orders
– Online.csv”. The mapping is based on three factors:
a. Price
b. Item/Product Name
c. Number of items in the kit
Refer to A2.2 below for a detailed, step-by-step illustration of how the
mapping is done.
3. Resulting dataset is renamed as “FULL YEAR ITEMS SALE BC_Cleaned” and “FULL
YEAR ITEMS SALE FP_Cleaned” respectively.
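Step 2d above can be sketched in Python as follows. The tool actually used for this step is not stated (the references to sheets suggest a spreadsheet), so this is only an illustration. The helper names, the sample row, and the assumption that amounts may carry thousands separators or a leading minus sign are hypothetical.

```python
MONEY_COLUMNS = ("Gross Sales", "Discounts", "Net Sales")


def parse_sgd(text):
    """Convert a string such as 'S$1,234.50' or '-S$5.00' to a float."""
    cleaned = text.strip().replace("S$", "").replace(",", "")
    return float(cleaned)


def clean_row(row):
    """Return a copy of a CSV row (dict) with the money columns made numeric."""
    cleaned = dict(row)
    for column in MONEY_COLUMNS:
        cleaned[column] = parse_sgd(row[column])
    return cleaned


# Hypothetical raw row, shaped like one line of the physical-store export
sample = {
    "Item": "Shine Stopper",
    "Gross Sales": "S$42.00",
    "Discounts": "-S$4.20",
    "Net Sales": "S$37.80",
}
print(clean_row(sample))
```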
A2.2. Mapping Explanation
Mapping “Skin Balancing Super Kit (7 items)” and “Skin Balancing Simple Kit (4 items)”
1. Originally, no SKU is assigned for the items, “Skin Balancing Simple Kit (4 items)” and
“Skin Balancing Super Kit (7 items)” in the Dataset: “FULL YEAR ITEMS SALE BC” as
shown:
2. Using the dataset, “Line Item Orders – Online.csv” (see following screenshot) as a
reference, the items can be mapped easily due to the resemblance between the names and
the price.
3. In addition, based on the information from Paula’s Choice online store, it is shown that
“Skin Balancing Essential Kit” has 4 items and “Skin Balancing Advanced Kit” has 7 items.
Hence, “Skin Balancing Essential Kit” is mapped to “Skin Balancing Simple Kit (4 items)”
and “Skin Balancing Advanced Kit” is mapped to “Skin Balancing Super Kit (7 items)”.
4. Combining this information with the information on item/product name and price, the item,
“Skin Balancing Simple Kit (4 items)” is mapped and assigned SKU = “4090” and “Skin
Balancing Super Kit (7 items)” is mapped and assigned SKU = “4600”.
A2.3. Data Cleaning for Online Store
1. Dataset used: “Line Item Orders – Online.csv”
2. Data cleaning:
a. Delete redundant columns:
i. “Order Status” due to identical values for all rows
ii. “Region” since it is not needed for analysis
iii. “City” since it is not needed for analysis
iv. “Manufacturer” since this column is empty
v. “Qty. Invoiced” is similar to “Qty. Ordered”
vi. “Qty. Shipped” is ignored as stated by client
vii. “Qty refunded” is not needed
viii. “Tax”, “Tax Invoice”, “Refunded to Total Margin” columns are all empty
ix. “Total Incl. Tax” is similar to “Total”
x. “Invoiced Incl. Tax” is similar to “Invoiced”
b. Split “Order Date” into “Order Date” and “Order Time” for easier data analysis
3. Resulting dataset is renamed as “Line Item Orders - Online_Cleaned.csv”
A2.4. Data Consolidation
1. Consolidate the three cleaned datasets, “FULL YEAR ITEMS SALE BC_Cleaned”,
“FULL YEAR ITEMS SALE FP_Cleaned” and “Line Item Orders - Online_Cleaned.csv”
into a centralized dataset or database, named “BC + FP + Online.csv”
2. Due to the different number of variables used and the naming used between the datasets,
further data processing has to be done. (Refer to the figure below)
a. ‘Channel’ column was created to differentiate the channels of the transactions:
online, BC and FP stores.
b. To standardize newly mapped items from dataset, “FULL YEAR ITEMS SALE
BC_Cleaned”, the names of the items are changed as follows:
i. “Skin Balancing Simple Kit (4 items)” with assigned SKU = “4090” is
renamed as “Skin Balancing Essential Kit”
ii. “Skin Balancing Super Kit (7 items)” with assigned SKU = “4600” is
renamed as “Skin Balancing Advanced Kit”
A3. Methodology
A3.1. Market Basket Analysis
1. Further data processing is needed to ensure that the consolidated dataset is compatible with
SAS Enterprise Miner for Market Basket Analysis.
a. Standardize and recode “Transaction ID” to purely numeric code, “Transaction
ID_Coded” as follows:
b. Remove rows with 0 gross sales, 0 discounts and 0 net sales as they are not needed
in market basket analysis
c. Remove rows with ‘Discontinued’ category as they are not needed in market basket
analysis
d. Remove rows that have SKUs considered to be a bundle itself, (i.e. at least two
SKUs within a SKU) because they should not be considered for association and it
is not fair to compare a single SKU with a ‘bundled’ SKU. This can be done by
removing rows containing:
i. “kit” or “set”
ii. SKU 4980 – Power Couple: Clinical 1% Retinol + Resist Oil Booster
iii. SKU 4930 – Power Couple: Resist C15 + Skin Balancing Serum
iv. SKU 4920 – Power Couple: Resist C15 + Skin Recovery Serum
v. SKU 4910 – Power Couple: C15 + SA Serum
vi. SKU 4890 – Power Couple: Resist C15 + Ultra-light Serum
vii. SKU 4830 – Power Couple: Resist C15 + Pure Radiance
viii. SKU 4820 – Power Couple: Resist C15 + Wrinkle Repair Retinol
2. This dataset is then saved as “For SAS_3.csv”
3. To run the dataset in SAS Enterprise Miner, the dataset “For SAS_3.csv” has to be
converted into SAS format, “A3.sas7bdat” using base SAS with the following code:
4. After conversion, the dataset, “A3.sas7bdat” is then created in SAS Enterprise Miner.
5. The “Market Basket” node is then added and linked to the dataset node “A3” in order to
run the analysis.
6. Before running the analysis, the following constraints for the market basket node were
set:
Refer to Figure A3-1. in the next page for the explanation of the constraints
Constraint: Maximum Items
  What is it for? To specify the maximum number of items (SKUs) to be considered in an
  association (bundle).
  What is set? 3
  Why? Client requirement states that a bundle is made up of three SKUs.
Constraint: Minimum Confidence Level
  What is it for? To specify the minimum probability required to generate a rule.
  What is set? 25%
  Why? Given the extreme granular nature of the SKU dataset, having a probability of 25% is
  arguably considered reasonable.
Constraint: Minimum Lift
  What is it for? To specify the minimum usefulness of a rule.
  What is set? 1
  Why? A minimum lift of 1 indicates that the rules provided by the analysis will at least be
  neutral or more useful than a random guess.
Constraint: Support Type
  What is it for? Specify if the type of support should be in terms of count or percentage.
  For example, if the rule X → Y occurs 5 times in all 1000 transactions, then the support
  count is 5 and the support percent is 0.5%.
  What is set? Count
  Why? Given the extreme granular nature of the SKU dataset, it makes more sense to view
  support as count of transactions rather than percentage of transactions. (This is because
  percentage of transactions will be extremely low across all the generated rules, mostly <1%)
Constraint: Support Count
  What is it for? To specify the number of transactions in which a particular rule occurs out
  of all the transactions.
  What is set? 5
  Why? Given the extreme granular nature of the SKU dataset, having a minimum of 5 support
  would arguably be reasonable.
Figure A3-1. – Explanation of Constraints
7. The Market Basket Analysis was run and the subsequent result is shown in the rule window.
Sample of the result is shown below.
8. The results from this analysis are neither conclusive nor sufficient for recommending which
SKUs to place in a bundle. As such, in addition to using SAS Enterprise Miner, Microsoft Excel
was also used to descriptively analyze the sales performance of the SKUs using the dataset,
“For SAS_3.csv”.
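The row filters in step 1 (b to d) could be expressed as a simple predicate. This is a hedged sketch only: the column names (`SKU`, `Item`, `Category`, `Gross Sales`, `Discounts`, `Net Sales`) are assumed rather than confirmed field names of the consolidated dataset, the whole-word match on "kit" or "set" is an assumption made to avoid matching unrelated words, and the sample rows are invented.

```python
import re

# Power Couple SKUs listed in step 1d
BUNDLED_SKUS = {"4980", "4930", "4920", "4910", "4890", "4830", "4820"}
# Whole-word match for "kit"/"set" (optionally plural), case-insensitive
KIT_OR_SET = re.compile(r"\b(kit|set)s?\b", re.IGNORECASE)


def keep_for_basket_analysis(row):
    """Return True if a transaction row should stay in the market basket input."""
    if row["Gross Sales"] == 0 and row["Discounts"] == 0 and row["Net Sales"] == 0:
        return False  # step 1b: rows with no monetary value
    if row["Category"] == "Discontinued":
        return False  # step 1c: discontinued products
    if row["SKU"] in BUNDLED_SKUS or KIT_OR_SET.search(row["Item"]):
        return False  # step 1d: SKUs that are themselves bundles
    return True


# Hypothetical rows (prices and categories are placeholders)
rows = [
    {"SKU": "2010", "Item": "SP 2% BHA Liquid", "Category": "SP Exfoliants",
     "Gross Sales": 42.0, "Discounts": 0.0, "Net Sales": 42.0},
    {"SKU": "4090", "Item": "Skin Balancing Essential Kit", "Category": "Kits",
     "Gross Sales": 99.0, "Discounts": 0.0, "Net Sales": 99.0},
    {"SKU": "4980", "Item": "Power Couple: Clinical 1% Retinol + Resist Oil Booster",
     "Category": "Kits", "Gross Sales": 120.0, "Discounts": 0.0, "Net Sales": 120.0},
    {"SKU": "1150", "Item": "Skin Balancing Cleanser", "Category": "Skin Balancing",
     "Gross Sales": 0.0, "Discounts": 0.0, "Net Sales": 0.0},
]
kept_skus = [row["SKU"] for row in rows if keep_for_basket_analysis(row)]
print(kept_skus)
```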
A4. Analysis of Results
Before going into the results and discussion of the proposed bundles in depth, it is
helpful to refer to the following figure (Figure A4-1.), which summarizes the results and
discussion for the proposed bundles.
Note: For the rules discussed below, the rule SKU A & SKU B → SKU C means that a customer
who buys SKU A and SKU B together is likely to also buy SKU C.
Bundle / SKUs / Analysis and Justification
Bundle 1a (SKUs 2010, 1150, 1350)
- Since 2010 is the top-selling SKU and 1350 is also selling well while 1150 is performing
  poorly, both 2010 and 1350 can help to increase the sales of 1150.
- Confidence of 2010 & 1150 → 1350 is reasonably high at 30%.
- In fact, the three SKUs are highly complementary to each other as they are used in the
  order: Cleanser→ Toner→ Exfoliants.
Bundle 1b (SKUs 2010, 7780, 7830)
- Similar to 1a, 2010 might help to boost 7780 and 7830 as they are not selling enough.
  Still, both 7780 and 7830 have potential to drive sales as they are actually key drivers
  of sales in their category, Resist Oily.
- Resist Oily is also the second best-selling category.
- Confidence of 2010 & 7780 → 7830 is reasonably satisfactory at 25%.
- Highly complementary to each other: Cleanser→ Toner→ Exfoliants.
Bundle 2 (SKUs 7770, 5700, 2010)
- SKU 7770 and 2010 ranked first and second respectively in sales.
- 5700 is the best-selling body exfoliant lotion.
- The rule 7770 & 5700 → 2010 has the highest confidence of 45% among all the association
  rules related to SKU 2010.
- Complementary as 5700 is used for body while 2010 is used for face.
Bundle 3 (SKUs 7790, 8010, 7820)
- SKU 7820 and 8010 have good sales performance.
- 7820 and 8010 can help SKU 7790, which is not selling well.
- The rule 7790 & 8010 → 7820 has the highest confidence of 83.33% among all the
  association rules.
- 8010 is a premium product, which helps to increase sales revenue.
Bundle 4 (SKUs 1159, 1359, 3409)
- Sample size bundle to promote the less popular Skin Balancing category.
- These SKUs are samples of the original pack sized SKUs, which are actually key drivers of
  the category. This means that they have higher potential to sell.
- The rule 1159 & 1359 → 3409 has the highest confidence of 62.5% among all the
  association rules related to sample sized SKUs.
- Complementary between 1159 and 1359: cleanser and toner.
- Promote bundle 1a since both bundles belong to the same category.
Figure A4-1. – Summary of Results
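For reference, the support count, confidence and lift of a rule such as A & B → C follow the standard association-rule definitions, sketched below on a made-up set of baskets. This only illustrates the metrics. It is not the SAS Enterprise Miner implementation, and the baskets are hypothetical rather than drawn from the client data.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support_count, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    n_ante = sum(1 for b in baskets if antecedent <= b)
    n_cons = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    # Confidence: share of antecedent baskets that also contain the consequent
    confidence = n_both / n_ante if n_ante else 0.0
    # Lift: confidence relative to the consequent's overall purchase rate
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return n_both, confidence, lift


# Hypothetical baskets of SKUs, one set per transaction
baskets = [
    {"2010", "1150", "1350"},
    {"2010", "1150"},
    {"2010", "1350"},
    {"1150", "1350"},
    {"2010", "1150", "1350", "7770"},
    {"7770"},
    {"2010"},
    {"7770", "5700"},
]
support, confidence, lift = rule_metrics(baskets, {"2010", "1150"}, {"1350"})
print(support, round(confidence, 3), round(lift, 3))
```

In this toy data the rule would pass the minimum confidence (25%) and minimum lift (1) constraints but not the minimum support count of 5.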
A4.1. Insights
In addition to the justifications mentioned, the proposed bundles also apply an interesting
insight drawn from the analysis, which is: customers mostly purchased items of the same
pack size when purchasing multiple items. As such, note that all the proposed SKUs in each
of the bundles have the same size, i.e., they are all either regular in size or sample in size. In
fact, from our analysis, 99% of all baskets with at least 3 SKUs, contain SKUs with the same
pack size.
[Pie chart: No. of Rules. SKU all same size: 99%; Not all SKU same size: 1%]
Figure A4-2. – No. of Rules
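The 99% figure is a share of multi-item baskets. A minimal sketch of that calculation, assuming each basket is a list of (SKU, pack size) pairs and using invented baskets, might look like this:

```python
def share_same_pack_size(baskets, min_skus=3):
    """Share of baskets with at least `min_skus` distinct SKUs whose items all share one pack size."""
    eligible = [b for b in baskets if len({sku for sku, _ in b}) >= min_skus]
    if not eligible:
        return 0.0
    uniform = sum(1 for b in eligible if len({size for _, size in b}) == 1)
    return uniform / len(eligible)


# Hypothetical baskets: lists of (SKU, pack size)
example_baskets = [
    [("2010", "Regular"), ("1150", "Regular"), ("1350", "Regular")],
    [("1159", "Sample"), ("1359", "Sample"), ("3409", "Sample")],
    [("2010", "Regular"), ("1159", "Sample"), ("7770", "Regular")],
    [("2010", "Regular"), ("7770", "Regular")],  # ignored: fewer than 3 SKUs
]
share = share_same_pack_size(example_baskets)
print(f"{share:.1%} of eligible baskets contain a single pack size")
```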

A4.2. Justifications for Proposed Bundles
A4.2.1 Bundle 1a
Note: The percentile used in this section is inclusive of the ‘100th’ percentile.
SKU Bundle 1a - [Skin Balancing + Skin Perfect Bundle]
SKU Item Category
2010 SP 2% BHA Liquid - Regular Skin Perfecting Exfoliants
1150 Skin Balancing Cleanser - Regular Skin Balancing
1350 Skin Balancing Toner - Regular Skin Balancing
Justifications
1. SKU 2010 is the top selling SKU
SKU 2010 ranked 1st, ‘100th’ percentile for the year 2016 in terms of both gross
revenue and quantity. This makes it more likely for customers to purchase the
bundle.
2. SKU 1350 is also contributing significantly
SKU 1350 also contributes significantly, 2.12% to total gross sales and 2.26% to total
quantity for year 2016 (excl. Dec).
3. SKU 1150 has much lower contribution to total gross sales and quantity sold
as compared to SKU 2010 and 1350
SKU 1150 contributes only 0.87% to total gross sales and 0.93% to total quantity
sold for year 2016. Still, it has potential to increase since it is within the top 80th
percentile. Thus, bundling SKU 1150 with SKU 2010 and SKU 1350 will leverage
the good sales performance of SKU 2010 and 1350 as a halo effect to drive more
demand for SKU 1150.
4. The rule: SKU 2010 & 1150 → 1350 has a reasonable confidence of about 30%
Given that this analysis is done at the most granular SKU level, a confidence of
30% is reasonable. It also has the highest support count of 12 among all the rules
involving SKU 2010 with min. 3 items.
5. All three SKUs are highly complementary to each other.

SKU 2010 needs to be used after a cleanser and a toner. Based on the usage instructions
for SKU 2010 on the Paula’s Choice online store, the products should be used in the
order: Cleanser → Toner → Exfoliant. None of the existing kits/sets focuses on this
complementary combination. If complementary effects were fully taken into account,
the true confidence would likely be higher than the 30% observed.
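As an illustration of how the support count and confidence quoted for a rule relate to each other, the following is a minimal sketch rather than the tool used in the analysis. The function name and the basket list are hypothetical; the baskets are constructed so that the rule 2010 & 1150 → 1350 has a support count of 12 and a confidence of 30%, which implies 40 baskets containing the antecedent (a figure not stated in the report).

```python
def rule_support_and_confidence(baskets, antecedent, consequent):
    """Return (support count, confidence) for the rule antecedent -> consequent."""
    antecedent = set(antecedent)
    full_rule = antecedent | set(consequent)
    # Baskets that contain every antecedent SKU.
    n_antecedent = sum(1 for basket in baskets if antecedent <= set(basket))
    # Baskets that contain every SKU in the rule (the support count).
    n_rule = sum(1 for basket in baskets if full_rule <= set(basket))
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return n_rule, confidence


# Hypothetical baskets: 12 contain all three SKUs, 28 contain only the antecedent.
baskets = [{"2010", "1150", "1350"}] * 12 + [{"2010", "1150"}] * 28

support_count, confidence = rule_support_and_confidence(
    baskets, antecedent={"2010", "1150"}, consequent={"1350"}
)
print(support_count, round(confidence, 2))  # prints: 12 0.3
```

Confidence here is the share of antecedent baskets that also contain the consequent, so a rule can have a modest confidence even when its support count is the highest among comparable rules.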
A4.2.2 Bundle 1b
SKU Bundle 1b - [Resist + Skin Perfect Bundle]
SKU   Item                           Category
7780  Resist Oily Toner - Regular    Resist Oily
7830  Resist Oily Cleanser - Regular Resist Oily
Justifications
1. SKU 2010 is the top selling SKU
Similarly, SKU 2010, being the top selling SKU for the year 2016 in terms of both
gross revenue and quantity sold, will help to improve sales performance for the
other two SKUs, 7780 and 7830, which sell significantly less than SKU 2010.
2. SKU 7780 and 7830 contribute less than SKU 2010 but have high
potential to contribute more.
Both SKUs are within the top 90th percentile and have high potential to contribute to
total gross sales and quantity.
- SKU 7780 contributes 1.88% to total gross sales and 1.81% to total quantity
- SKU 7830 contributes 1.47% to total gross sales and 1.48% to total quantity
At the category level, SKU 7780 and 7830 belong to Resist Oily, the 2nd largest
category contributor in terms of both gross sales and quantity sold. The Resist Oily
category contributes 16.04% of total category gross sales and 15.25% of total
category quantity sold. Within the Resist Oily category, SKU 7780 and 7830 are also
among the top few key drivers of the category’s gross sales and quantity sold.
3. The association between SKU 2010, 7780 and 7830 has a reasonable confidence
level of about 25%, given that this analysis is done at the most granular SKU
level.
If complementary effects were taken into account, the true confidence would likely
be higher than the 25% observed.
4. Just like bundle 1a, all three SKUs are highly complementary to each
other. SKU 2010 needs to be used after a cleanser and a toner. They should be used
in the order: Cleanser → Toner → Exfoliant.
A4.2.3 Bundle 2
SKU Bundle 2 - [Best-selling items Set]
SKU   Item                           Category
7770  C15 Super Booster - Regular    Resist Treatments
5700  Resist Body 2% BHA - Regular   Body, Lip & Hair
Justifications
1. Amongst all SKUs, SKU 2010 and 7770 ranked first and second respectively in
terms of both gross sales and quantity sold
SKU 2010 and 7770 can provide a halo effect to further drive sales performance for
the weaker-performing SKU 5700.
2. SKU 5700 is actually one of the best-selling body exfoliant lotions
Since SKU 5700 is one of the best-selling body lotions, within the top 90th percentile,
all three SKUs in the bundle can be considered best-selling products.
3. 2nd highest confidence of about 45% amongst all association rules related to
SKU 2010
a. Note: The rule with the highest confidence (46%) is not selected because the SKUs
in the rule make no business sense. That rule, 1150 → 1560 → 2010, suggests
selling a bundle of cleanser, moisturizer and exfoliant. However, based on
the usage instructions for SKU 2010, which is an exfoliant, SKU 2010 must be
used after a cleanser and a toner. As such, for the bundle to be effective, both
a cleanser and a toner must be present to complement SKU 2010.
In this case, SKU 1150 is a cleanser but SKU 1560 is a moisturizer instead of a
toner. Hence, complementary effects will be sub-optimal for this bundle.
Possible explanation of the high confidence of 45%:
Bundling the 2nd best-selling SKU 7770 (in terms of sales & qty) with 5700 first will make
customers more likely to purchase this bundle. Thereafter, due to the complementary effect
between SKU 5700 and SKU 2010, coupled with the fact that SKU 2010 is also the
best-selling SKU, it is very likely that a given customer will purchase SKU 2010 after
purchasing SKU 7770 and 5700.
4. SKU 5700 also complements 2010 well since 5700 is used for the body
while 2010 is used for the face.
This probably also explains why the rule says that 7770 & 5700 will likely lead to the
purchase of 2010.
A4.2.4 Bundle 3
SKU Bundle 3 - [High-Value Deal]
SKU   Item                                     Category     Price (SGD)
7790  Resist Weekly 4% BHA - Regular           Resist Oily  $55
8010  Clinical 1% Retinol - Regular            Clinical     $88
7820  Resist Pore Refining 2% BHA - Regular    Resist Oily  $48
Justifications
1. Highest confidence of 83.33% amongst all the SKU associations
Note: Though SKU 7790 and 7820 might conflict with each other, as suggested by
the FAQ in the online store, the conflict is addressed by the different frequency of
use of the two SKUs. Based on the SKU descriptions, SKU 7790 is used weekly while
SKU 7820 is used daily. Therefore, SKU 7790 has a large complementary role to play
in leading to the purchase of SKU 7820, even though SKU 7790 performs similar
functions to SKU 7820. Coupled with the fact that SKU 8010 is performing well
(see below), the high confidence of 83.33% is well-justified.
2. SKU 7820 and 8010 are both performing well in terms of gross sales and,
reasonably, quantity sold
Both SKUs performed well. However, SKU 8010 falls short of SKU 7820 in terms
of quantity sold. This is probably due to SKU 8010 having a more premium price of
$88, compared with $48 for SKU 7820. As such, for a more premium SKU, a
contribution of 1.04% of total quantity sold can reasonably be considered very
significant.
It is also interesting to note that SKU 8010, Clinical 1% Retinol - Regular, is
suggested by the model even though there is another very similar SKU, 7870 - Resist 1%
Retinol Booster, which is also part of the Resist Collection. Contrary to our
intuition that Resist Collection SKUs are best bundled together, it seems that
factors other than being in the same collection are more significant in
influencing customers’ decisions to purchase a bundle. Therefore, with the help
of the online store, a comparison between these two SKUs was made as
follows:
From this comparison, we can see that SKU 8010 (Left) has a slightly higher rating than
SKU 7870 (Right) but, more importantly, under ‘Options’, the SKU 8010 size of 1 oz is
double the SKU 7870 size of 0.5 oz. Therefore, despite being pricier, SKU 8010 will
be perceived as having more value by customers browsing the products. This probably
explains why the data shows that SKU 8010 is recommended instead of SKU 7870.
3. Helping SKU 7790
Unlike the two SKUs above, SKU 7790 does not contribute as much to total sales and
quantity: it contributes only 1.06% of total gross sales and only 0.71% of total
quantity sold. Still, just like in the bundles proposed earlier, SKU 7790 has high
potential to contribute more since it is within the top 80th percentile. The presence
of SKU 8010 and SKU 7820, being better performing SKUs, might provide a halo effect
to help drive sales for SKU 7790.
4. High-value transaction
SKU 8010 is a premium product, so including it will drive total sales revenue to a
larger extent.
A4.2.5 Bundle 4
SKU Bundle 4 - [Sample Kit]
SKU   Item                                Category
1159  Skin Balancing Cleanser - Sample    Skin Balancing
1359  Skin Balancing Toner - Sample       Skin Balancing
3409  Skin Balancing Moisturizer - Sample Skin Balancing
Justifications
1. Sample bundle to help promote the less popular Skin Balancing category
At the category level, the Skin Balancing category is the fifth-largest contributor to
total gross sales and the fourth-largest to quantity sold, suggesting that this category
is not very popular among customers but has great potential to increase sales. Sample
sized SKUs allow customers to try this new category first and, if they like it, they
will likely purchase this collection in future.
2. Original pack size SKU 1150, 1151, 1350 and 3400 are the key driver SKUs for
the Skin Balancing category. The respective sample SKUs have higher potential to
sell.
SKU 1159, 1359 and 3409 are samples of the original SKU 1150, 1151, 1350 and 3400,
which are already contributing significantly within the Skin Balancing category.
This suggests that the three sample SKUs stand a higher chance of being sold
compared to the other samples in the same category.
3. High confidence of 62.5% among all sample sized rules.
Though there are other rules with higher or similar confidence, the SKUs in those
rules do not have as good a complementary effect as the proposed SKUs.
4. Highly complementary SKUs
Similar to bundle 1a, SKU 1159 and 1359 are highly complementary to each other.
5. Further promote Bundle 1a
Bundle 1a contains SKU 1150 and 1350, the original pack sizes of SKU 1159 and
1359 respectively. Therefore, this bundle will help to promote bundle 1a by
letting customers try the sample version first.
A4.3. Justifications for “Price”
1. The following chart, Figure A4-3., shows the price range of the proposed bundles, i.e. the
minimum price, the maximum price and the proposed discounted prices at 15%, 17% and
20% off for each bundle.
Figure A4-3. – Proposed Bundle Price Range
2. The maximum price is the sum of the prices of the three SKUs in the bundle.
3. The minimum price is 50% of the maximum price, as we assume a 50% margin.
4. The 17% discount is the simple average of all the discounts given in past transactions.
a. A discount range is then proposed around this 17% benchmark, with 15% as the lower
bound and 20% as the upper bound. These bounds are chosen arbitrarily.
5. Therefore, the recommended price of each bundle should:
a. First, lie between the minimum price and the maximum price in order to optimize
sales profit, and
b. Second, follow the proposed discount range of 15 – 20% at Paula’s Choice’s
discretion, depending on its willingness to give discounts and the amount of
consumer surplus.
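The pricing rules above can be sketched as follows for Bundle 3, using the SKU prices listed in A4.2.4 ($55, $88 and $48). This is a minimal illustration, not the spreadsheet behind Figure A4-3.; the function name is hypothetical, and it assumes that each discount is applied to the maximum (undiscounted) bundle price.

```python
def bundle_price_range(sku_prices, margin=0.50, discounts=(0.15, 0.17, 0.20)):
    """Price range for one bundle under the rules described in A4.3."""
    max_price = sum(sku_prices)               # sum of the individual SKU prices
    min_price = max_price * (1 - margin)      # floor implied by the assumed margin
    # Assumption: each discount is taken off the maximum price.
    discounted = {d: round(max_price * (1 - d), 2) for d in discounts}
    return {"max": max_price, "min": min_price, "discounted": discounted}


# Bundle 3 [High-Value Deal]: SKU 7790 ($55), SKU 8010 ($88), SKU 7820 ($48)
bundle_3 = bundle_price_range([55, 88, 48])
print(bundle_3["max"], bundle_3["min"])  # prints: 191 95.5
print(bundle_3["discounted"])            # prints: {0.15: 162.35, 0.17: 158.53, 0.2: 152.8}
```

All three discounted prices sit well above the minimum price, which is consistent with the recommendation that the final price lie between the minimum and the maximum.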
A4.4. Justifications for “Place”
1. The dataset “For SAS_3.csv” is used.
2. Rows with month = December are removed due to incomplete data.
3. Since gross sales is highly correlated with quantity, only gross sales is used in the
subsequent analysis.
4. The stacked bar chart in Figure A4-4. shows the gross sales breakdown by channel for each
proposed SKU, supported by Figure A4-5., which shows the absolute figures.
Figure A4-4. – Gross Sales Breakdown by Channel (stacked bar chart: share of each proposed SKU's gross sales by channel, BC, FP and Online; the values are tabulated in Figure A4-6.)
Figure A4-5. – Absolute figures for Gross Sales Breakdown by Channel
5. This information is then translated at the bundle level as shown:
                                          SKU gross sales breakdown by channel
Bundle                              SKU   BC       FP       Online   Dominant Channel  Dominant Bundle Channel
                                                                     (+/- 2%)          (At least 2 Dominant Channel)
1a. [Skin Balancing + Skin Perfect  2010  41.70%   13.28%   45.01%   Online            BC
    (SP) Bundle]                    1150  57.47%   12.28%   30.25%   BC
                                    1350  46.33%    9.91%   43.75%   BC
1b. [Resist + SP Bundle]            2010  41.70%   13.28%   45.01%   Online            Online
                                    7780  47.76%    6.58%   45.66%   BC/Online
                                    7830  39.93%   10.75%   49.32%   Online
2. [Best-selling items Set]         7770  47.65%   11.96%   40.39%   BC                Online
                                    5700  39.85%   16.73%   43.42%   Online
                                    2010  41.70%   13.28%   45.01%   Online
3. [High-Value Deal]                7790  39.43%   11.78%   48.79%   Online            BC, Online
                                    8010  47.51%    5.99%   46.49%   BC/Online
                                    7820  45.98%    7.18%   46.84%   BC/Online
4. [Sample Set]                     1159  39.18%    9.28%   51.55%   Online            Online
                                    1359  27.06%   18.82%   54.12%   Online
                                    3409  37.93%   12.32%   49.75%   Online
Figure A4-6. – Bundle Gross Sales Breakdown by Channel
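The dominant-channel logic in Figure A4-6. can be sketched as below. This is one possible reading of the rule, not the exact procedure used: it assumes a channel counts as dominant for a SKU when its share is within 2 percentage points of that SKU's highest channel share, and that a channel is dominant for a bundle when it is dominant for at least two of the bundle's SKUs. The function names are hypothetical; the shares are taken from Figure A4-6.

```python
def dominant_channels(shares, tolerance=2.0):
    """Channels whose share is within `tolerance` points of the SKU's top share."""
    top = max(shares.values())
    return [channel for channel, share in shares.items() if top - share <= tolerance]


def bundle_dominant_channels(sku_shares, min_skus=2, tolerance=2.0):
    """Channels that are dominant for at least `min_skus` SKUs in the bundle."""
    counts = {}
    for shares in sku_shares.values():
        for channel in dominant_channels(shares, tolerance):
            counts[channel] = counts.get(channel, 0) + 1
    return sorted(channel for channel, n in counts.items() if n >= min_skus)


# Gross sales shares (%) by channel, from Figure A4-6.
bundle_1a = {
    "2010": {"BC": 41.70, "FP": 13.28, "Online": 45.01},
    "1150": {"BC": 57.47, "FP": 12.28, "Online": 30.25},
    "1350": {"BC": 46.33, "FP": 9.91, "Online": 43.75},
}
bundle_3 = {
    "7790": {"BC": 39.43, "FP": 11.78, "Online": 48.79},
    "8010": {"BC": 47.51, "FP": 5.99, "Online": 46.49},
    "7820": {"BC": 45.98, "FP": 7.18, "Online": 46.84},
}

print(bundle_dominant_channels(bundle_1a))  # prints: ['BC']
print(bundle_dominant_channels(bundle_3))   # prints: ['BC', 'Online']
```

Under these assumptions the sketch reproduces the bundle-level results for Bundle 1a and Bundle 3; the tolerance is a parameter because the exact handling of near-ties is not specified in the report.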

6. From this channel analysis at the bundle level, we recommend the dominant bundle
channel as the appropriate channel for each bundle. Due to the poor performance
of the FP store, it is arguably important to also promote that store with bundle
promotions. As such, analysis at the channel level is done as follows:
Figure A4-7. – Channel Gross Sales Breakdown by SKU (stacked bar chart: share of each channel's gross sales, BC, FP and Online, contributed by each proposed SKU; the values are tabulated in Figure A4-9.)
Channel gross sales breakdown by SKU (S$)
SKU BC FP Online
8010 9768 1232 9557.8
7830 4680 1260 5780.8
7820 12288 1920 12517.8
7790 3315 990 4102.32
7780 7128 982 6813.8
7770 20582 5165 17445.22
5700 3645 1530 3971
3409 115.5 37.5 151.5
2010 22274 7095 24043.22
1359 34.5 24 69
1350 7786 1666 7352.8
1159 57 13.5 75
1150 3978 850 2093.7
Figure A4-8. – Absolute figures for Gross Sales Breakdown by SKU
7. Based on Figure A4-9., the most appropriate bundles for the FP store are bundles 1a, 1b
and 2 since, on average, they contribute significantly more than bundles 3 and 4.
                                          SKU % gross sales           Average % bundle gross sales
                                          contribution to channel     contribution to channel
Bundle                              SKU   BC       FP       Online    BC       FP       Online
1a. [Skin Balancing + Skin Perfect  2010  23.29%   31.17%   25.58%    11.86%   14.07%   11.88%
    (SP) Bundle]                    1150   4.16%    3.73%    2.23%
                                    1350   8.14%    7.32%    7.82%
1b. [Resist + SP Bundle]            2010  23.29%   31.17%   25.58%    11.88%   13.67%   13.00%
                                    7780   7.45%    4.31%    7.25%
                                    7830   4.89%    5.53%    6.15%
2. [Best-selling items Set]         7770  21.52%   22.69%   18.56%    16.21%   20.19%   16.12%
                                    5700   3.81%    6.72%    4.23%
                                    2010  23.29%   31.17%   25.58%
3. [High-Value Deal]                7790   3.47%    4.35%    4.37%     8.84%    6.06%    9.29%
                                    8010  10.21%    5.41%   10.17%
                                    7820  12.85%    8.43%   13.32%
4. [Sample Set]                     1159   0.06%    0.06%    0.08%     0.07%    0.11%    0.10%
                                    1359   0.04%    0.11%    0.07%
                                    3409   0.12%    0.16%    0.16%
Figure A4-9. - SKU Contribution to Channel
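The bundle-level averages in Figure A4-9. appear to be simple means of the three SKU contributions in each channel. A minimal sketch under that assumption, using the Bundle 2 and Bundle 3 figures from the table (the function name is hypothetical):

```python
def average_bundle_contribution(sku_contributions):
    """Mean % contribution to each channel across the SKUs in one bundle."""
    channels = ["BC", "FP", "Online"]
    n = len(sku_contributions)
    return {
        channel: round(sum(sku[channel] for sku in sku_contributions.values()) / n, 2)
        for channel in channels
    }


# SKU % gross sales contribution to channel, from Figure A4-9.
bundle_2 = {
    "7770": {"BC": 21.52, "FP": 22.69, "Online": 18.56},
    "5700": {"BC": 3.81, "FP": 6.72, "Online": 4.23},
    "2010": {"BC": 23.29, "FP": 31.17, "Online": 25.58},
}
bundle_3 = {
    "7790": {"BC": 3.47, "FP": 4.35, "Online": 4.37},
    "8010": {"BC": 10.21, "FP": 5.41, "Online": 10.17},
    "7820": {"BC": 12.85, "FP": 8.43, "Online": 13.32},
}

print(average_bundle_contribution(bundle_2))  # prints: {'BC': 16.21, 'FP': 20.19, 'Online': 16.12}
print(average_bundle_contribution(bundle_3))  # prints: {'BC': 8.84, 'FP': 6.06, 'Online': 9.29}
```

For the FP channel, Bundle 2 averages 20.19% against 6.06% for Bundle 3, which is the gap the recommendation for the FP store relies on.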
8. To summarize the appropriate channel for the proposed bundles, refer to Figure A4-10.
Bundle                                            SKU               Proposed Channel
1a. [Skin Balancing + Skin Perfect (SP) Bundle]   2010, 1150, 1350  BC, FP
1b. [Resist + SP Bundle]                          2010, 7780, 7830  FP, Online
2. [Best-selling items Set]                       7770, 5700, 2010  FP, Online
3. [High-Value Deal]                              7790, 8010, 7820  BC, Online
4. [Sample Set]                                   1159, 1359, 3409  Online
Figure A4-10. – Proposed Channels for Bundles
- End of Appendix A -
Appendix B – Flagship Store Positioning Analysis
Currently, Paula’s Choice has two physical retail stores in Singapore, Beauty Collective in
Novena Square 2 and Front Porch in Tanjong Pagar Plaza. With the progress and growth of the
company, Paula’s Choice would like to consolidate their current retail operations and open a
brand new flagship store in a new location.
B1. Dataset
The dataset used in the analysis is “Line Item Orders - Online”, the sales transaction
data for the Paula’s Choice online store. The online store data is used because it contains
each customer’s address and zip code, which allows customers’ locations to be analysed.
Note: Due to the limitations of the data provided, we assume that the behaviour of
online customers is similar to that of retail store customers, i.e. customers who purchase
Paula’s Choice products online would also buy them in person at a retail store if one were
located near them.
In order to proceed with the analysis, the dataset has to be further reorganised and cleaned.
Refer to Appendix A2.3. for the first round of data cleaning for “Line Item Orders – Online”.
B1.1. Data Reorganisation
As the dataset is organised by line item orders (Figure B1-1.), it would not be meaningful
to use it as it is. Hence, the number of transactions and the total sales transacted are
summed up and aggregated at the zip code level. Figure B1-2. shows an example of the
reorganised dataset. This allows comparison and analysis of the sales data by location.
Upon consolidation, the total number of unique zip codes (instances) is 1,699.
Figure B1-1. – Original Cleaned Dataset
Figure B1-2. – Reorganised Dataset
Variable Definition/Explanation
Zip Code An assigned number to indicate a specific location in Singapore
Sector Postal sector as defined by the Urban Redevelopment Authority (URA).
District Postal district as defined by the URA. (Based on postal sector)
No. of Txn Number of transactions (orders) that occurred in that zip code
Total Sales Total sales amount transacted based in that zip code
Figure B1-3. – Data Definitions
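The reorganisation from line item orders to one row per zip code can be sketched as follows. This is an illustrative sketch only: the field names (`order_id`, `zip_code`, `amount`) and the sample rows are hypothetical, and it assumes that a transaction is a distinct order, so an order with several line items counts once.

```python
from collections import defaultdict


def aggregate_by_zip(line_items):
    """Collapse line items into number of transactions and total sales per zip code."""
    orders = defaultdict(set)    # zip code -> distinct order ids
    sales = defaultdict(float)   # zip code -> summed line amounts
    for item in line_items:
        orders[item["zip_code"]].add(item["order_id"])
        sales[item["zip_code"]] += item["amount"]
    return {
        zip_code: {
            "no_of_txn": len(orders[zip_code]),
            "total_sales": round(sales[zip_code], 2),
        }
        for zip_code in orders
    }


# Hypothetical line items: order A1 has two lines, both shipped to the same zip code.
line_items = [
    {"order_id": "A1", "zip_code": "820123", "amount": 48.00},
    {"order_id": "A1", "zip_code": "820123", "amount": 55.00},
    {"order_id": "B7", "zip_code": "820123", "amount": 88.00},
    {"order_id": "C3", "zip_code": "600456", "amount": 29.50},
]

by_zip = aggregate_by_zip(line_items)
print(by_zip["820123"])  # prints: {'no_of_txn': 2, 'total_sales': 191.0}
```

Counting distinct order ids rather than rows keeps the transaction count from being inflated by multi-item orders.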
B1.2. Data Cleaning

Upon closer observation, a small number of errors were found in the dataset that had gone
undetected in the previous round of data cleaning.
1. There were some discrepancies between zip codes and addresses. Some zip codes or
addresses were either keyed in incorrectly by customers or do not exist. To rectify these
errors, the affected entries were investigated and subsequently corrected or deleted.
2. Some zip codes are currently not reflected in Google Maps, which caused difficulties
with software that bases its location functions on Google Maps. To work around this
issue, the affected zip codes were replaced with zip codes in the same postal sector and
district. Hence, the accuracy and reliability of the analysis were not affected by the
change.
After cleaning, the total number of instances was reduced from 1,699 to 1,688.
B2. Methodology
The quantitative analysis is based on an evaluation of the sales performance of the 28
postal districts. Figure B2-1. lists the 28 districts of Singapore. Sales performance is
analysed using three metrics: 1) number of transactions, 2) total sales and 3) average
sales. Figure B2-2. shows the sales performance of all twenty-eight districts. A strong
sales performance in a particular district implies that Paula’s Choice has a strong
customer base there. Given the higher existing demand, it would therefore be suitable
for Paula’s Choice to open a flagship store in that district.
Figure B2-3. displays the correlation coefficients of the variables. The coefficients
support the intuitive relationship between the number of zip codes, transactions and
total sales, i.e. a higher number of zip codes is associated with a higher number of
transactions and, in turn, higher sales.
Figure B2-4. shows the statistical summary of the metrics used to evaluate district
performance. Only twenty-seven districts were analysed: District 24 (Lim Chu Kang, Tengah)
has no sales records and is currently an unsuitable location for the new store, so it was
omitted.
Postal District   Postal Sector   General Location
1 01, 02, 03, 04, 05, 06 Raffles Place, Cecil, Marina, People's Park
2 07, 08 Anson, Tanjong Pagar
3 14, 15, 16 Queenstown, Tiong Bahru
4 09, 10 Telok Blangah, Harbourfront
5 11, 12, 13 Pasir Panjang, Hong Leong Garden, Clementi New Town
6 17 High Street, Beach Road (part)
7 18, 19 Middle Road, Golden Mile
8 20, 21 Little India
9 22, 23 Orchard, Cairnhill, River Valley
10 24, 25, 26, 27 Ardmore, Bukit Timah, Holland Road, Tanglin
11 28, 29, 30 Watten Estate, Novena, Thomson
12 31, 32, 33 Balestier, Toa Payoh, Serangoon
13 34, 35, 36, 37 Macpherson, Braddell
14 38, 39, 40, 41 Geylang, Eunos
15 42, 43, 44, 45 Katong, Joo Chiat, Amber Road
16 46, 47, 48 Bedok, Upper East Coast, Eastwood, Kew Drive
17 49, 50, 81 Loyang, Changi
18 51, 52 Simei, Tampines, Pasir Ris
19 53, 54, 55, 82 Serangoon Garden, Hougang, Punggol
20 56, 57 Bishan, Ang Mo Kio
21 58, 59 Upper Bukit Timah, Clementi Park, Ulu Pandan
22 60, 61, 62, 63, 64 Jurong
23 65, 66, 67, 68 Hillview, Dairy Farm, Bukit Panjang, Choa Chu Kang
24 69, 70, 71 Lim Chu Kang, Tengah
25 72, 73 Kranji, Woodgrove, Woodlands
26 77, 78 Upper Thomson, Springleaf
27 75, 76 Yishun, Sembawang
28 79, 80 Seletar
Figure B2-1. – Postal Districts of Singapore (URA)
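The lookup from zip code to postal district can be sketched as below, using the sector lists in Figure B2-1. It assumes, as an addition to what the report states, that the postal sector is the first two digits of a six-digit Singapore zip code. The function name and the example zip codes are hypothetical.

```python
# Postal sectors per district, transcribed from Figure B2-1.
SECTORS_BY_DISTRICT = {
    1: ["01", "02", "03", "04", "05", "06"], 2: ["07", "08"],
    3: ["14", "15", "16"], 4: ["09", "10"], 5: ["11", "12", "13"],
    6: ["17"], 7: ["18", "19"], 8: ["20", "21"], 9: ["22", "23"],
    10: ["24", "25", "26", "27"], 11: ["28", "29", "30"],
    12: ["31", "32", "33"], 13: ["34", "35", "36", "37"],
    14: ["38", "39", "40", "41"], 15: ["42", "43", "44", "45"],
    16: ["46", "47", "48"], 17: ["49", "50", "81"], 18: ["51", "52"],
    19: ["53", "54", "55", "82"], 20: ["56", "57"], 21: ["58", "59"],
    22: ["60", "61", "62", "63", "64"], 23: ["65", "66", "67", "68"],
    24: ["69", "70", "71"], 25: ["72", "73"], 26: ["77", "78"],
    27: ["75", "76"], 28: ["79", "80"],
}

# Invert the table so that each sector points to its district.
DISTRICT_BY_SECTOR = {
    sector: district
    for district, sectors in SECTORS_BY_DISTRICT.items()
    for sector in sectors
}


def district_for_zip(zip_code):
    """Return the postal district for a zip code, or None if the sector is unknown."""
    sector = str(zip_code).zfill(6)[:2]  # assumption: first two digits are the sector
    return DISTRICT_BY_SECTOR.get(sector)


print(district_for_zip("820123"))  # hypothetical zip code in sector 82; prints: 19
print(district_for_zip("600456"))  # hypothetical zip code in sector 60; prints: 22
```

Returning `None` for an unknown sector makes mistyped zip codes easy to flag during cleaning instead of silently assigning them to a district.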
District   Zip Code Count   No. of Txn   Total Sales ($)   Average Sales ($)
1 54 133 15709.41 118.12
2 11 24 2035.75 84.82
3 51 99 13542.55 136.79
4 31 70 8915.23 127.36
5 89 161 18116.78 112.53
6 4 4 693.33 173.33
7 16 25 2810.46 112.42
8 12 27 2773.68 102.73
9 56 99 11833.39 119.53
10 80 134 17835.90 133.10
11 27 45 6545.18 145.45
12 62 106 12208.52 115.17
13 26 37 3821.39 103.28
14 60 117 11714.36 100.12
15 75 162 20150.32 124.38
16 74 163 17804.96 109.23
17 12 21 2713.76 129.23
18 142 258 26279.21 101.86
19 223 451 47247.82 104.76
20 80 139 15108.05 108.69
21 44 96 11398.91 118.74
22 135 263 25184.55 95.76
23 158 279 27370.62 98.10
24 0 0 0 0
25 50 115 10112.53 87.94
26 13 28 2530.80 90.39
27 76 131 13829.49 105.57
28 27 42 4665.92 111.09
Figure B2-2. – District Sales Performance
(Columns: Zip Code Count, No. of Txn, Total Sales, Average Sales)
Zip Code Count 1
No. of Txn 0.990665386 1
Total Sales 0.9795191 0.990396968 1
Average Sales -0.007439569 -0.014677583 0.050189827 1
Figure B2-3. – Correlation Coefficient of Variables
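As a rough check on the relationship shown in Figure B2-3., the Pearson correlation between zip code count and number of transactions can be recomputed from the district figures in Figure B2-2. The sketch below is a plain-Python illustration, not the tool used for the report; it assumes that District 24 is excluded, as in Figure B2-4., and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Districts 1 to 28 excluding District 24, in the order of Figure B2-2.
zip_counts = [54, 11, 51, 31, 89, 4, 16, 12, 56, 80, 27, 62, 26, 60,
              75, 74, 12, 142, 223, 80, 44, 135, 158, 50, 13, 76, 27]
no_of_txn = [133, 24, 99, 70, 161, 4, 25, 27, 99, 134, 45, 106, 37, 117,
             162, 163, 21, 258, 451, 139, 96, 263, 279, 115, 28, 131, 42]

r = pearson(zip_counts, no_of_txn)
print(f"Correlation between zip code count and no. of txn: {r:.3f}")
```

The two lists sum to 1,688 zip codes and 3,229 transactions, matching the totals in Figure B2-4., which gives some assurance that the inputs were transcribed correctly.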
Statistic   Zip Code Count   No. of Txn   Total Sales   Average Sales
Mean 62.52 119.59 13072.33 113.72
Median 54 106 11833.39 111.09
Mode 12 99 #N/A #N/A
Standard Deviation 51.74892817 100.3586872 10308.16668 19.0722347
Skewness 1.475975039 1.598893007 1.447734681 1.194203885
Range 219 447 46554.49 88.51
Minimum 4 4 693.33 84.82
Maximum 223 451 47247.82 173.33
Sum 1688 3229 352952.87 3070.50
Count 27 27 27 27
Figure B2-4. – Statistical Summary of Variables
B3. Analysis of Results
Based on total sales, the top five performing districts are 19, 23, 18, 22 and 15 (Figure B3-1. –
Group A). District 19 has the highest zip code count, number of transactions and total sales.
Its total sales account for 13.39% of all online sales. The amount is about 1.7 times that of
District 23 and substantially higher than that of all other districts.
Figure B3-1. – Group A: Top Performers based on Total Sales
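The Group A ranking and District 19's share of sales can be reproduced from the total sales column of Figure B2-2. The sketch below is illustrative; the variable names are hypothetical, and District 24 is left out because it has no sales.

```python
# Total sales ($) by district, from Figure B2-2. (District 24 excluded: no sales).
total_sales = {
    1: 15709.41, 2: 2035.75, 3: 13542.55, 4: 8915.23, 5: 18116.78,
    6: 693.33, 7: 2810.46, 8: 2773.68, 9: 11833.39, 10: 17835.90,
    11: 6545.18, 12: 12208.52, 13: 3821.39, 14: 11714.36, 15: 20150.32,
    16: 17804.96, 17: 2713.76, 18: 26279.21, 19: 47247.82, 20: 15108.05,
    21: 11398.91, 22: 25184.55, 23: 27370.62, 25: 10112.53, 26: 2530.80,
    27: 13829.49, 28: 4665.92,
}

# Rank districts by total sales, highest first, and keep the top five.
top_five = sorted(total_sales, key=total_sales.get, reverse=True)[:5]

# District 19's share of all online sales, and its size relative to District 23.
grand_total = sum(total_sales.values())
share_19 = round(100 * total_sales[19] / grand_total, 2)
ratio_19_to_23 = round(total_sales[19] / total_sales[23], 2)

print(top_five)        # prints: [19, 23, 18, 22, 15]
print(share_19)        # prints: 13.39
print(ratio_19_to_23)  # prints: 1.73
```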
Judging by average sales, the top five performers are Districts 6, 11, 3, 10 and 17 (Figure B3-2.
– Group B). However, upon further observation, it seems that average sales might not be the
best metric to evaluate sales performance.
Figure B3-2. – Group B: Top Performers based on Average Sales
Firstly, a high average sales figure could be misleading, as in the case of District 6. Although
it has the highest average sales of $173.33, its total number of transactions is only 4. It would
not be feasible or profitable to open a store there when total sales are very low.
Secondly, total sales could be a more important metric as it shows the revenue generating
potential and attractiveness of the district. This information could be used to determine
whether the new store could break even and earn a profit.
Lastly, Districts 19, 23, 18, 22 and 15 have higher zip code counts and numbers of transactions.
This implies that the customer base in those districts is larger than in others. As the positive
correlation between the number of customers and sales has been established earlier, having a
store in those districts might mean achieving higher sales.
The districts in Group A are mostly heartlands or residential estates in Singapore. This is
strategically in line with Paula’s Choice’s mass prestige strategy, whereby the firm aims to
deliver quality products to the masses. Hence, by locating the store in a Group A district, the
firm would be able to reach more consumers and further grow its customer base.
Furthermore, comparing average sales between the two groups of districts, the difference is
around the average price of a regular product. For example, the difference between the
average sales of District 11 and District 19 is $40.69.
Figure B3-3. visualises the findings on a heatmap. The heatmap indicates the top 5 performing
districts by total sales. The hotspot in District 1 appears because its sales transactions are
concentrated in a small number of areas. Its total sales, however, are not higher than those of
the top 5 districts. Hence, it is not considered for the flagship store.
Figure B3-3. – Heatmap based on Total Sales
Therefore, by using total sales as the deciding factor, Districts 19, 23 and 22 are shortlisted for
further analysis.
B3.1. Option 1 – District 19
The first option is District 19. It is the general location of Serangoon Garden, Hougang and
Punggol, which are situated in the northeast of Singapore.
Pros
• District 19 has the highest total sales and the largest customer base among all districts.
Hence, there is a higher probability of achieving high sales and success if the store is
located there.
• It has the largest number of Singapore residents between the ages of 15-34 years old
among the three shortlisted districts. (Figure B3-4. to B3-6.)
  - Total: 192,520
  - Males: 93,780
  - Females: 98,750
• The district has a relatively young population with only 5-10% of residents aged 65 years
and over (Figure B3-7.). Hence, the population there fits the target audience (18-35 years
old) that Paula’s Choice is focusing on. This could also explain the high sales figure in the
district.
• The district also continually attracts a younger population with its developments. For
example, the new Safra Club in Punggol (Chin, 2014) and Compass One mall in Seng
Kang (Baker, 2016) are designed to cater to young families. Furthermore, more public
housing that is attractive to young couples is being planned and built in the district
(CNA, 2016).
• Punggol is a rising and developing estate with increasing infrastructure and amenities. It
has been announced that Punggol North is to be developed into Singapore’s first “Enterprise
District” (Ng, 2017). This development would help create more offices and working
spaces in the area. Thus, it might generate more retail traffic to the district.
• The district is serviced mainly by the North East MRT Line. Serangoon is also an
interchange connecting the North East and Circle Lines. Furthermore, Punggol might be a
terminus for the future Cross Island Line, increasing accessibility to the area (CNA, 2016).
Cons
• It is not a centralized location and consumers from other regions of Singapore might find
it inconvenient to travel to the northeast.
• Punggol is a relatively young town. Though there are many future developments planned,
it will still require some time for them to be completed.
B3.2. Option 2 – District 23
The second option is District 23. It is the general location of Hillview, Dairy Farm, Bukit
Panjang and Choa Chu Kang, which are situated in the northwest of Singapore.
Pros
• The Integrated Transport Hub in Bukit Panjang is planned to open by this year
(CNA, 2015). With an improved transport hub providing accessibility and convenience,
more people might be willing to visit the area. Hence, this could potentially bring new
retail traffic to Bukit Panjang.
• District 23 has the second largest number of Singapore residents between the ages of
15-34 years old among the three shortlisted districts. (Figure B3-4. to B3-6.)
  - Total: 136,290
  - Males: 68,340
  - Females: 67,950
• Development plans for Tengah District, located beside Choa Chu Kang, have been
announced. Touted to be as big as Bishan and expected to have 55,000 homes (Yeo,
2016), the new estate could provide a large number of new customers.
Thus, it might be lucrative to establish a flagship store in this district.
• The district is serviced by the Downtown Line and North South Line.
Cons
• It is not a centralized location and consumers from other regions of Singapore might find
it inconvenient to travel to the northwest.
• The retail scene is not comparable in size to that of Districts 19 and 22
  - District 19: Nex, Waterway Point, CompassOne
  - District 22: JEM, Westgate, JCube, Jurong Point 1&2, Big Box
  - District 23: Lot 1, Bukit Panjang Plaza, Hillion Mall, West Mall
• Compared to Districts 19 and 22, there are few major future developments planned in
the district.
B3.3. Option 3 – District 22
The third option is District 22. It is the general location of Jurong, which is located in the
western part of Singapore. Although District 22 is ranked 4th in terms of total sales, it is selected
over District 18 (Simei, Tampines, Pasir Ris) due to the future development plans of the district.
Pros
• The Jurong Lake District is planned to be transformed into the second Central
Business District of Singapore (Ng and Chua, 2016). Furthermore, plans for
infrastructure and amenities developments in the district have also been announced
(Lim, 2016). Thus, the business prospects these future developments could bring
might be tremendous.
• Jurong East is the future terminus for the Singapore-Malaysia High Speed Rail (Heng,
2016). Hence, there might be increased retail traffic from both local and foreign
commuters. As Paula’s Choice does have customers in neighboring countries, there could
be potential benefits in setting up the flagship store in Jurong.
• The development of Tengah District, which is situated above Jurong District, could also
be a potential source of customers for the store.
• District 22 is mainly serviced by the East West Line. Jurong East is also an interchange
connecting the East West and North South Lines.
Cons
• It is not a centralized location and consumers from other regions of Singapore might find
it inconvenient to travel to the west.
• District 22 has the lowest number of Singapore residents between the ages of 15-34 years
old among the three shortlisted districts. (Figure B3-4. to B3-6.)
  - Total: 100,480
  - Males: 50,240
  - Females: 50,240
Figure B3-4. – Total Singapore Residents aged 15-34 years old (SingStat, 2016)
Figure B3-5. – Male Singapore Residents aged 15-34 years old (SingStat, 2016)
Figure B3-6. – Female Singapore Residents aged 15-34 years old (SingStat, 2016)
Figure B3-7. – Proportion of Resident Population Aged 65 Years. June 2016 (SingStat, 2016)
B4. Recommendation – Punggol
After evaluating the three options, the recommended location for Paula’s Choice’s new
flagship store is District 19, due to the strong potential of the area (Figure B4-1.).
Figure B4-1. – Deciding Factors for the Location (diagram of four factors: Large Existing Customer Base; Potential for High Revenue; Population Profile fits Target Audience; Extensive Future Developments)

To fit the brand image of mass prestige, it would be suitable for Paula’s Choice to rent a
retail space in a shopping mall. This would allow the company to expose its brand name and
products to more consumers. Based on the location, traffic flow and mall image, the
recommendation is to open the flagship store in Waterway Point, Punggol.
Conveniently located just above Punggol MRT/LRT Station and near the bus interchange,
Waterway Point is a popular shopping destination in the northeast. Within three months of
its opening, the mall had already attracted more than six million visitors (Lim, 2016). Despite
being in the heartlands, the mall contains established brands such as H&M and Uniqlo. It also
has the biggest cinema among all heartland malls (Yip, 2016).
Naturally, with the mall’s high footfall (Toh, 2016), rental in Waterway Point is higher
than in other malls in District 19 (Figure B4-2.). However, though the rental might be
higher, the revenue generating potential for Paula’s Choice would be higher too due to the
factors mentioned earlier. Hence, Waterway Point would be a suitable and feasible place for
the new flagship store to be located.
Shopping Mall Rental
Waterway Point ~$18.00 - $40.00 psf
Compass One ~$16.00 - $18.00 psf
Rivervale Mall ~$16.00 - $18.00 psf
Hougang Mall ~$7.00 psf
Figure B4-2. – Rental Costs as at 24 March 2017 (CommercialGuru.com, 2017)
- End of Appendix B -
Appendix C – Forecasting Analysis
With a portfolio comprising over 200 SKUs, Paula’s Choice needs to be able to forecast future
sales based on historical data, in order to make good decisions regarding inventory
management. Having clear projections will allow the company to know when to replenish
stocks at different distribution channels, when to make orders for products, as well as which
products to continue or discontinue selling.
Since the online and brick-and-mortar channels perform differently and are managed separately,
each channel was analysed separately. Furthermore, given the small contribution of the FP
outlet to total brick-and-mortar sales (~10%), the sales of both outlets were combined into a
single dataset for analysis.
C1. Methodology
Figure C1-1. – 6 Step Approach

A data-centric 6-step approach was taken to identify the most suitable forecasting methods for
different products in the portfolio. The steps are as follows:
1. Initial observation and cleaning of data, in order to create the datasets that would be used
for the main analysis
2. Due to the sheer number of SKUs carried by Paula’s Choice, it was impractical to analyse
each product separately. Instead, clustering analysis was conducted on the cleaned datasets
in order to group products into clusters with distinct characteristics
3. Thereafter, for each cluster, we built and assessed the suitability of various forecasting
methods. Specifically, each cluster was analysed using six forecasting models, namely 2-
month simple moving average (2SMA), 3-month simple moving average (3SMA),
exponential smoothing (ES), and multiple linear regression based on the past 3, 4 and 5
months sales (3MLR, 4MLR, 5MLR)
4. A Python script was used to calculate the root-mean-square error (RMSE) of each model,
which was used to identify the most suitable model for each cluster
5. Sales for each SKU were projected for the next 12 months (i.e. February 2017 to January
2018) using the models selected in the previous step
6. Lastly, based on the sales forecasts, the following tasks were completed –
a. Determine SKUs to discontinue
b. Calculate replenishment frequencies, order quantities, safety stock requirements
based on constraints given by the client
C2. Limitations
C2.1. Clustering Analysis
Due to the sheer number of SKUs carried by Paula’s Choice, it was prohibitively time- and
resource-consuming to analyse each product separately. Hence, clustering analysis was done
to group products into clusters with similar characteristics for forecasting model generation.
This may reduce the accuracy of the final forecasting models when applied to individual
products.
C2.2. Multiple Linear Regression (MLR)
For each product, our group was only provided with 12 to 13 months of sales data. It was
decided that the dataset was too small to be split into training and testing datasets for the
purposes of MLR. Hence, all data points were used for MLR model training, and the
performance of MLR models was evaluated based on the RMSE of the model when applied to
the training dataset. The reported MLR errors are therefore in-sample and may understate the
error on future months.
C3. Data Cleaning
The datasets given by the client were processed to generate datasets summarizing the sales
performance (in units) of the 275 SKUs over the past 12 (for retail) to 13 months (for online).
No missing data was found.
The total sales, average monthly sales, standard deviation (SD), coefficient of variation (CV),
and simple linear regression statistics (INTERCEPT, LINEST) were generated for each
product, in order to provide some general measures of the performance of each SKU:
1. Total sales and average monthly sales were indicators of general popularity
2. SD and CV were measures of sales volatility
3. Simple linear regression intercept was a possible measure of general demand, while
gradient was a possible measure of the rate of sales growth
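The per-SKU measures listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sku_summary` is hypothetical, the sample standard deviation is assumed (the report does not say whether sample or population SD was used), and the regression is taken against the month index 1, 2, 3, and so on.

```python
from statistics import mean, stdev


def sku_summary(monthly_sales):
    """Summary measures for one SKU's monthly unit sales (hypothetical helper)."""
    n = len(monthly_sales)
    avg = mean(monthly_sales)
    sd = stdev(monthly_sales)  # sample SD; an assumption, see lead-in
    cv = sd / avg if avg else float("nan")  # coefficient of variation

    # Least-squares line of sales against month index 1..n
    months = list(range(1, n + 1))
    m_bar = mean(months)
    sxx = sum((m - m_bar) ** 2 for m in months)
    sxy = sum((m - m_bar) * (y - avg) for m, y in zip(months, monthly_sales))
    slope = sxy / sxx                 # possible measure of sales growth
    intercept = avg - slope * m_bar   # possible measure of general demand

    return {
        "total": sum(monthly_sales),
        "average": avg,
        "sd": sd,
        "cv": cv,
        "intercept": intercept,
        "slope": slope,
    }


summary = sku_summary([2, 4, 6, 8])
```

For the toy series above, sales rise by two units every month, so the slope is 2 and the intercept is 0.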
There were significant numbers of SKUs that had no (0) sales within the given dataset. These
SKUs were hence removed from the dataset – it is also recommended that these SKUs be
discontinued, if they have not already been. A total of 79 SKUs with no sales were removed
at this point.
SKU  Jan 2016 to Jan 2017 Sales  SKU  Jan 2016 to Jan 2017 Sales
2051 0 7970 0
2060 0 7979 0
2069 0 8020 0
2100 0 8029 0
2107 0 9107 0
2109 0 9117 0
2110 0 9127 0
2117 0 9137 0
2119 0 9147 0
2120 0 9157 0
2129 0 9167 0
2130 0 9177 0
2137 0 9187 0
2139 0 9908 0
2140 0 9940 0
2147 0 91501 0
2149 0 91502 0
2150 0 91511 0
2155 0 91512 0
2159 0 91521 0
2160 0 91522 0
2167 0 91531 0
2169 0 91532 0
2769 0 91541 0
3110 0 91542 0
3119 0 91551 0
3140 0 91552 0
3149 0 91561 0
3707 0 91562 0
6002 0 91571 0
6009 0 91572 0
7677 0 91580 0
7799 0 91587 0
7920 0 91589 0
7927 0 91641 0
7929 0 91651 0
7930 0 91661 0
7937 0 91671 0
7939 0 92063 0
7967 0
Figure C3-1. – List of SKUs removed due to zero sales in 2016
C4. Cluster Analysis
Cluster analysis was conducted using SAS through a four-step approach:
1. Variable clustering (proc varclus) was conducted to identify key variables to be used for
clustering, which were then standardized to account for the different scales of the variables
2. Hierarchical clustering (proc cluster) was conducted to determine the number of clusters,
based on the dendrogram, as well as the CCC, pseudo-F and pseudo-t² statistics
3. K-means clustering (proc fastclus) was then conducted based on the number of clusters
identified in the previous step
4. Lastly, each cluster was qualitatively analysed based on cluster characteristics
A summary of the clustering results is shown below:
                                                  Online         Retail
Variables used for clustering (based on step 1)   Average, CV    Average, LINEST, CV
Number of clusters (based on step 2)              7              8
Figure C4-1. – Summary of Clustering Results
C4.1. Clustering Results – Online Store
Variable Clustering
Hierarchical Clustering
7 Clusters
Summary of Clusters
Cluster Count of SKU Avg Mnthly Sales Avg CV Sales Volatility
1 1 99.23 0.57 High Low
2 29 3.47 2.08 Low High
3 61 6.65 1.30 Low High
4 6 2.18 3.26 Very Low Very High
5 23 23.03 0.69 Medium Low
6 5 42.18 0.67 Medium Low
7 71 7.75 0.66 Low Low
Overall 196 9.74 1.15
C4.2. Clustering Results – Physical Stores
Variable Clustering
Hierarchical Clustering
8 Clusters
Summary of Clusters
Cluster  Count of SKU  Avg Mnthly Sales  Avg CV  Avg LINEST  Sales     Volatility  Growth
1        48            1.18              1.33    0.05        Low       High        Low
2        1             62.18             0.19    1.99        High      Low         High
3        18            0.42              2.38    -0.01       Very Low  Very High   Very Low
4        18            8.29              0.62    -0.63       Medium    Low         Negative
5        1             10.83             0.85    2.21        Medium    Medium      High
6        6             23.35             0.34    -0.50       High      Very Low    Negative
7        15            8.54              0.52    0.48        Medium    Low         Low
8        79            4.57              0.59    -0.06       Low       Low         Very Low
Overall  186           4.92              0.94    -0.03
C5. Sales Forecasting

C5.1. Model Building
Beyond the typical forecasting methods such as 2SMA, 3SMA and ES, our group also used
MLR as a forecasting tool. Specifically, for each cluster, we built MLR models that predicted
the sales in any particular month based on the past 3, 4 and 5 months sales.
Figure C5-1. – Forecasting Models
C5.2. Multiple Linear Regression
To build the MLR models, we generated suitable training datasets from the original data using
VBA script and generated the models using proc reg in SAS.
Figure C5-2. – Sample MLR Training Dataset
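The training-set construction described above amounts to sliding a fixed-length window over each SKU's monthly series. The sketch below illustrates the idea in Python rather than VBA; the function name `build_lagged_rows` and the column order (oldest month first) are assumptions, since the original script and the meaning of m1 to m5 are not reproduced in this appendix.

```python
def build_lagged_rows(monthly_sales, n_lags):
    """Turn one SKU's monthly series into (past n_lags months, target month) rows.

    Hypothetical helper: lags are ordered oldest first, which may differ from
    the ordering used in the original VBA script.
    """
    rows = []
    for t in range(n_lags, len(monthly_sales)):
        lags = tuple(monthly_sales[t - n_lags:t])  # the n_lags months before month t
        rows.append((lags, monthly_sales[t]))      # month t is the regression target
    return rows


# A 5-month toy series yields two rows for a 3-lag model
rows_3mlr = build_lagged_rows([1, 2, 3, 4, 5], 3)
```

With 12 months of history, each SKU contributes 12 - 5 = 7 rows to a 5MLR dataset and 9 rows to a 3MLR dataset, so pooling the SKUs in a cluster is what makes the regression feasible.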

C5.3. MLR Models for Online Store
Coefficients
Cluster Model Intercept m1 m2 m3 m4 m5 R^2 Adj R^2
3MLR 19.58385 0.66666 -0.76213 1.09402 0.8099 0.7149
1 4MLR 31.08823 0.00815 0.75115 -0.93848 1.09101 0.8105 0.6209
5MLR 49.76952 0.5159 -0.02174 0.29874 -0.69366 0.84688 0.8946 0.6312
3MLR 3.78635 0.02451 -0.03767 0.27658 0.0182 0.0079
2 4MLR 2.15003 1.33043 -0.16889 -0.06605 0.25221 0.3574 0.3474
5MLR 2.29972 0.93038 1.21554 -0.20938 -0.11117 0.21428 0.3662 0.3522
3MLR 2.82864 0.16834 0.51185 0.32853 0.2019 0.1979
3 4MLR 2.27089 0.58894 0.02598 0.39712 0.29359 0.2542 0.2487
5MLR 2.20838 0.5354 0.48601 -0.05239 0.38556 0.25823 0.2541 0.2463
3MLR 3.07176 -0.1756 -0.10971 -0.03396 0.0024 -0.0511
4 4MLR 3.52825 -0.15677 -0.19443 -0.12478 -0.06043 0.0044 -0.0769
5MLR 1.31604 27.83916 0.05014 -2.0954 -0.16564 0.09528 0.8272 0.8066
3MLR 8.82733 -0.05576 0.53157 0.36594 0.2854 0.2759
5 4MLR 7.7207 0.70147 -0.38977 0.41867 0.30306 0.3968 0.3848
5MLR 9.78585 0.33409 0.52556 -0.40355 0.43694 0.17284 0.3771 0.3596
3MLR 16.94745 0.10306 0.25895 0.46696 0.2347 0.1848
6 4MLR 17.15713 0.89002 -0.35981 0.15046 0.28902 0.4133 0.3547
5MLR 20.68515 0.64236 0.5724 -0.43256 0.24059 0.04884 0.4336 0.3503
3MLR 2.31325 0.09215 0.31917 0.47227 0.3643 0.3616
7 4MLR 1.73266 0.44873 -0.05222 0.2087 0.42784 0.4249 0.4213
5MLR 1.67713 0.36797 0.34954 -0.14243 0.21626 0.36373 0.454 0.4492
C5.4. MLR Models for Physical Stores
Coefficients
Cluster Model Intercept m1 m2 m3 m4 m5 R^2 Adj R^2
3MLR 0.9119 0.04217 0.11034 0.13539 0.0414 0.0347
1 4MLR 0.83977 0.07518 0.03663 0.08565 0.14647 0.0489 0.0389
5MLR 0.82491 -0.0137 0.09429 0.03736 0.06166 0.15216 0.0504 0.036
3MLR 141.691 0.25547 -0.0271 -1.43 0.6903 0.5045
2 4MLR 119.247 -0.6618 0.4112 0.42417 -1.0643 0.8246 0.5908
5MLR 102.566 -0.3438 -0.4509 0.94383 0.46224 -1.2709 0.8605 0.1631
3MLR 0.45378 -0.0278 -0.0312 -0.0464 0.0037 -0.0152
3 4MLR 0.47776 -0.0036 -0.0345 -0.0174 -0.0668 0.0054 -0.0232
5MLR 0.5098 0.05056 0.00285 -0.0353 -0.0416 -0.0712 0.01 -0.0312
3MLR 1.05392 0.36863 0.18199 0.1373 0.3831 0.3714
4 4MLR 0.19253 0.15944 0.35502 0.19597 0.06158 0.4379 0.4217
5MLR -0.1653 0.19213 0.07833 0.27365 0.10177 0.11956 0.4787 0.457
3MLR 6.83998 -0.2668 -0.1857 0.88777 0.5658 0.3052
5 4MLR 10.2918 -0.3562 0.03784 -0.1732 0.67378 0.5324 -0.091
5MLR 16.7356 -2.4536 1.62043 0.15607 -1.7231 1.40069 0.9641 0.7845
3MLR 1.4608 0.1749 0.44065 0.2063 0.3257 0.2852
6 4MLR -0.137 -0.1679 0.26919 0.45087 0.32427 0.3636 0.3044
5MLR -1.1899 0.30047 -0.1229 0.12705 0.29859 0.29051 0.3361 0.2439
3MLR 2.44354 0.14476 0.34843 0.2184 0.4158 0.4024
7 4MLR 2.46195 0.00569 0.12928 0.37117 0.216 0.3962 0.3752
5MLR 2.77609 0.07898 -0.0321 0.11686 0.33436 0.20338 0.3573 0.3248
3MLR 0.65406 0.25169 0.28268 0.23739 0.451 0.4487
8 4MLR 0.35684 0.17987 0.19596 0.25806 0.17864 0.4745 0.4712
5MLR 0.37506 0.12628 0.10958 0.21547 0.22589 0.11985 0.4729 0.4681
C5.5. Model Evaluation and Selection

To evaluate the performance of the models, a Python script (example code included in the final
section of this appendix) was used to generate the RMSE of each model when applied to the
historical dataset.
Figure C5-3. – Sample Python Output
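As an illustration of the evaluation step, the sketch below computes the RMSE of an n-month simple moving average on one historical series. It is a minimal example, not the script used in the project: the function names are hypothetical, and how errors were pooled across the SKUs of a cluster is not specified here.

```python
from math import sqrt


def sma_forecasts(sales, window):
    """Forecast each month as the mean of the preceding `window` months."""
    return [sum(sales[t - window:t]) / window for t in range(window, len(sales))]


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def sma_rmse(sales, window):
    """RMSE of the SMA forecasts against the months that have a forecast."""
    return rmse(sales[window:], sma_forecasts(sales, window))


error_2sma = sma_rmse([10, 12, 14, 16], 2)
```

In the toy series the 2SMA forecasts for the last two months are 11 and 13 against actual sales of 14 and 16, so each error is 3 and the RMSE is 3.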
For ES, the simple moving average of January and February was taken as the forecast for March,
and the ES model was applied to April and the following months. Alpha for ES models was
determined by the script – the alpha that produced the lowest RMSE for each cluster was
selected.
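The ES procedure described above can be sketched as follows. The seeding rule (the forecast for the third month is the mean of the first two) follows the text; the candidate grid of alphas in steps of 0.05 and the choice to score from the third month onward are assumptions, as are the function names.

```python
from math import sqrt


def es_forecasts(sales, alpha):
    """Forecasts for the third month onward using simple exponential smoothing."""
    forecast = (sales[0] + sales[1]) / 2  # seed: mean of the first two months
    forecasts = [forecast]
    for t in range(2, len(sales) - 1):
        # Next forecast blends the latest actual with the previous forecast
        forecast = alpha * sales[t] + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


def es_rmse(sales, alpha):
    """RMSE of the ES forecasts against actual sales from the third month on."""
    actual = sales[2:]
    predicted = es_forecasts(sales, alpha)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def best_alpha(sales, grid=None):
    """Pick the alpha with the lowest RMSE from an assumed grid 0.05, 0.10, ..., 1.00."""
    if grid is None:
        grid = [i / 20 for i in range(1, 21)]
    return min(grid, key=lambda a: es_rmse(sales, a))


alpha_for_trend = best_alpha([1, 2, 3, 4, 5, 6])
```

For a steadily rising series the search settles on an alpha of 1, because the most recent month is always the best available guide. This is consistent with the alpha of 1 reported for the single-SKU clusters with strong trends.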
The summary of model results, as well as the models selected for each cluster are discussed
below.
C5.6. Model Results – Online Store
Characteristics RMSE
Cluster Sales Volatility 2SMA 3SMA ES Alpha ES 3MLR 4MLR 5MLR
1 High Low 39.51 43.13 1 33.02 21.98 21.70 13.84
2 Low High 11.80 12.26 0.25 11.78 11.71 9.92 10.36
3 Low High 11.81 12.44 0.45 12.39 11.98 12.03 12.53
4 Very Low Very High 14.07 14.71 0.15 14.33 14.59 15.06 6.64
5 Medium Low 15.87 17.20 0.55 16.70 15.45 14.16 14.33
6 Medium Low 27.40 29.45 0.6 28.19 26.36 23.19 22.72
7 Low Low 5.65 5.91 0.45 5.68 5.55 5.41 5.37
Legend: Yellow Box = Best performing / selected model
Red Box = Best performing but model not selected
Figure C5-4. – Model Results – Online Store
For online sales, we found that MLR produced lower forecasting error than the traditional
forecasting methods when applied to clusters that had relatively low sales volatility,
as indicated by CV.
Cluster 4 proved difficult to forecast, as SKUs in the segment generally had very low sales and
very high volatility. Since none of the models were applicable, we decided to forecast the
cluster based on the 1-year simple moving average, in order to achieve realistic projections.
For Cluster 7 where MLR provided no significant advantage over SMA, we decided to select
the simpler method, which in this case was SMA.
C5.7. Model Results – Physical Stores
Characteristics RMSE
Cluster  Sales     Volatility  Growth     2SMA    3SMA    ES Alpha  ES      3MLR    4MLR    5MLR
1        Very Low  High        Low        2.21    2.15    0.15      2.02    1.93    1.95    1.96
2        High      Low         High       20.23   19.69   0.2       20.58   9.68    7.61    7.22
3        Very Low  Very High   Very High  1.33    1.28    0.25      1.22    1.08    1.12    1.18
4        Medium    Low         Low        5.35    4.86    0.35      4.9352  4.319   4.19    3.64
5        Medium    Medium      Medium     8.82    10.40   1         7.8174  5.9843  5.75    1.31
6        High      Very Low    Very Low   8.26    8.19    0.45      8.26    7.44    7.43    7.54
7        Medium    Low         Low        4.95    4.95    0.25      4.81    4.67    4.88    5.06
8        Low       Low         Low        3.1261  2.9727  0.2       2.8505  2.8358  2.7919  2.7229
Legend: Yellow Box = Best performing / selected model
Red Box = Best performing but model not selected
Figure C5-5. – Model Results – Physical Stores
For brick-and-mortar sales, we similarly found that MLR generally performed well for clusters
that had low volatility.
Once again, for clusters such as 1 and 3 that had very low sales, none of the models could
produce realistic forecasts. Hence, for both clusters, we decided to forecast sales based on the
past 1-year simple moving average.
For clusters 7 and 8, MLR provided no significant advantage over SMA, therefore SMA was
selected.
C5.8. Summary of Selected Models
Below is a summary of all clusters and the models selected for each cluster.
                                                Coefficients
Cluster               Model Selected  Intercept  m1        m2        m3        m4        m5
Online / 1            5MLR            49.76952   0.5159    -0.02174  0.29874   -0.69366  0.84688
Online / 2            4MLR            2.15003    1.33043   -0.16889  -0.06605  0.25221
Online / 3            2SMA
Online / 4            1-Year SMA
Online / 5            4MLR            7.7207     0.70147   -0.38977  0.41867   0.30306
Online / 6            5MLR            20.68515   0.64236   0.5724    -0.43256  0.24059   0.04884
Online / 7            2SMA
Brick-and-mortar / 1  1-Year SMA
Brick-and-mortar / 2  5MLR            102.566    -0.34383  -0.45088  0.94383   0.46224   -1.2709
Brick-and-mortar / 3  1-Year SMA
Brick-and-mortar / 4  5MLR            -0.16532   0.19213   0.07833   0.27365   0.10177   0.11956
Brick-and-mortar / 5  5MLR            16.73564   -2.45361  1.62043   0.15607   -1.72305  1.40069
Brick-and-mortar / 6  4MLR            -0.13701   -0.16794  0.26919   0.45087   0.32427
Brick-and-mortar / 7  2SMA
Brick-and-mortar / 8  3SMA
C5.9. Identification of SKUs for Discontinuation
To identify further SKUs to be discontinued, we compared the total forecasted annual sales
against the minimum order quantity (MOQ) provided by the client. We recommend that
SKUs whose MOQ is greater than the total annual sales forecast be discontinued, since the
minimal sales generated by these products do not justify the risk of holding the large amounts
of inventory imposed by the MOQ.
A total of 55 SKUs fell into this category and are identified in the table below.
SKU  Forecasted Feb 2017 to Jan 2018 Sales  MOQ  SKU  Forecasted Feb 2017 to Jan 2018 Sales  MOQ
1259 36 250 7687 69 150
1359 136 250 7689 47 250
1469 140 250 7717 145 150
1569 197 250 7719 105 250
1720 41 48 7729 54 250
1869 105 250 7769 218 250
3109 124 250 7789 230 250
3257 16 150 7809 100 250
3259 73 250 7819 56 250
3709 61 250 7847 61 150
5009 236 250 7867 143 150
5579 17 250 7880 35 48
5709 209 250 8509 239 250
7607 37 150 8707 98 150
7609 56 250 8709 204 250
7617 58 200 8719 196 250
7619 28 250 8729 31 250
7629 152 250 8749 130 250
7639 48 250 9129 54 250
7647 56 150 9139 232 250
7649 162 250 3357 1 150
7659 118 250 3609 104 250
7667 85 150 5569 83 250
7669 21 250 5809 87 250
7679 34 250 7969 15 250
7680 48 48 8717 6 150
8737 101 130 8727 95 150
8747 3 130
Figure C5-6. – SKUs Identified for Discontinuation due to high MOQ
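The screening rule behind Figure C5-6 can be expressed as a simple filter. The sketch below is illustrative only: the function name is hypothetical, the first two SKUs use the forecast and MOQ values from the figure, and the third SKU ("9999") is made up to show a product that passes the screen. A strict comparison is assumed, so an SKU whose forecast exactly equals its MOQ is not flagged.

```python
def flag_for_discontinuation(forecast_by_sku, moq_by_sku):
    """Return SKUs whose MOQ exceeds their forecasted annual unit sales."""
    return sorted(
        sku
        for sku, forecast in forecast_by_sku.items()
        if moq_by_sku[sku] > forecast  # strict comparison; an assumption
    )


annual_forecast = {"1259": 36, "1720": 41, "9999": 500}  # "9999" is hypothetical
moq = {"1259": 250, "1720": 48, "9999": 250}
flagged = flag_for_discontinuation(annual_forecast, moq)
```

Here SKUs 1259 and 1720 are flagged because a single minimum order would cover more than a year of forecasted sales.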
C5.10. Final Sales Forecast
Using the models selected in the earlier sections, we generated a 12-month sales forecast for
all applicable SKUs (excluding SKUs recommended for discontinuation), from February 2017
to January 2018, as shown in the figure below.
Forecasted Feb 2017 to Jan 2018 Sales
Channel           2016      2017 (Forecasted)  Change
Online            $778,402  $1,833,615         +135.6%
Brick-and-Mortar  $424,334  $282,121           -33.5%
Figure C5-7. – Final Sales Forecast

Overall, we expect sales from the online channel to surge in 2017, growing by 135.6% to
reach approximately $1,830,000. This is in alignment with the strong growth of the channel
observed in 2016, and is well-supported by other factors such as the renewed and relaunched
Paula’s Choice online webstore, the regional reach of the online store, as well as the strong
growth of e-commerce in the region.
On the other hand, brick-and-mortar sales are projected to decline by 33.5% to reach
approximately $282,000. This is also in alignment with the declining sales experienced by the
retail outlets throughout 2016, and provides a strong impetus for Paula’s Choice to reboot its
brick-and-mortar operations by closing down both existing outlets and opening a new flagship
store in the area proposed earlier in the report.
C6. Channel Replenishment Model
Our proposed channel replenishment model is as follows. Since deliveries are made on a monthly
basis (i.e. lead time is 1 month), the stock delivered each month should be able to meet the
forecasted demand for that particular month. However, to account for the uncertain nature of
product sales, a certain level of safety stock should also be held at the channel. Therefore,
the minimum monthly inventory level for each SKU at each channel is as follows:
Figure C6-1. – Formula for Monthly Minimum Inventory Level

Safety stock is derived from two main factors, namely the historical standard deviation of
monthly sales for each product, as well as the service level required.
Service level refers to the client’s commitment to meeting customer demand. For example, a
90% service level means that the client will be able to meet customer demand 90% of the time.
For the purposes of this project, we seek to achieve a service level of 90% which corresponds
to a z-score of 1.28.
The formula for safety stock is as follows:
Figure C6-2. – Formula for Safety Stock
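The formula figures are images that are not reproduced in this text, so the sketch below uses a common textbook form as an assumption: safety stock = z × (SD of monthly sales) × √(lead time in months), with the minimum monthly inventory level taken as forecasted demand plus safety stock. With z = 1.28, a 3-month lead time and rounding up, this form reproduces the Safety Stock column of Figure C8-1 (for example, an SD of 69 gives 153). Applying it with a 1-month lead time at channel level is an assumption, and the function names are hypothetical.

```python
from math import ceil, sqrt

Z_90 = 1.28  # z-score for the 90% service level stated in the report


def safety_stock(sd_monthly, lead_time_months=1, z=Z_90):
    """Assumed form: z * SD * sqrt(lead time), rounded up to whole units."""
    return ceil(z * sd_monthly * sqrt(lead_time_months))


def min_monthly_inventory(forecast_demand, sd_monthly, z=Z_90):
    """Forecasted demand for the month plus channel safety stock (1-month lead time)."""
    return forecast_demand + safety_stock(sd_monthly, 1, z)


hq_safety_stock_2010 = safety_stock(69, lead_time_months=3)  # SKU 2010 in Figure C8-1
channel_minimum = min_monthly_inventory(100, 10)             # illustrative inputs
```

The illustrative channel example holds 13 units of safety stock on top of a 100-unit monthly forecast. Note that the exact 90th-percentile z-score is slightly above 1.28; the rounded value is used here because it is the one quoted in the report.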

Safety stock requirements for each SKU by channel are summarized in the table below:
Safety Stock Required Safety Stock Required

SKU Brick-and-Mortar Online SKU Brick-and-Mortar Online
1000 3 4 7697 3 9
1001 4 4 7699 3 17
1050 4 7 7710 4 13
1051 3 4 7720 6 9
1059 4 17 7730 9 16
1100 3 4 7739 3 9
1101 4 6 7740 11 26
1150 7 16 7747 3 8
1151 6 11 7749 3 16
1159 6 27 7760 6 25
1250 6 8 7770 13 34
1350 12 26 7779 4 26
1460 4 6 7780 11 20
1560 6 12 7790 4 13
1860 7 9 7800 7 16
1900 7 11 7807 3 12
1907 3 9 7810 6 9
1909 6 16 7820 11 25
2010 17 73 7827 4 27
2017 7 56 7829 4 15
2019 9 32 7830 8 21
2040 9 15 7837 3 8
2049 8 29 7839 3 45
2050 6 9 7840 3 6
2059 4 12 7850 4 7
2320 6 8 7857 2 7
2330 2 3 7860 7 13
2560 4 7 7870 4 12
2750 3 7 7900 7 13
2759 7 22 7910 4 7
2760 4 11 7980 13 41
2800 4 9 8010 7 15
2809 4 22 8017 3 11
3100 6 7 8500 4 6
3250 4 7 8510 3 4
3350 4 11 8519 2 8
3359 2 17 8520 4 3
3400 9 21 8529 3 8
3409 8 24 8700 4 8
3500 6 8 8710 4 4
3600 4 9 8720 4 4
3700 4 8 8730 3 6
5000 4 6 8740 6 7
5200 3 3 8750 2 3
5209 2 11 8760 2 2
5500 3 6 9100 4 7
5560 4 4 9109 2 22
5570 3 3 9110 6 3
5700 6 16 9119 3 17
5800 3 9 9120 3 3
5900 6 12 9130 3 4
5909 2 22 9140 3 6
6000 11 43 9149 4 12
6007 3 49 9150 3 8
6100 6 21 9159 3 22
6107 3 29 9160 3 6
6110 6 27 9169 3 15
6117 4 22 9170 3 7
6130 7 11 9179 3 22
6137 3 15 9180 4 7
6200 7 25 9189 3 16
6207 4 32 9945 2 3
6210 9 24 90530 2 3
6217 4 22 92062 2 2
6240 6 13 92075 2 2
7600 6 9 92076 3 3
7610 6 17 7960 N/A 17
7620 3 6 7660 4 11
7630 3 6 7670 6 11
7640 6 9 7680 4 4
7650 4 11 7690 8 20
Figure C6-3. – Safety Stock Requirements for each SKU
All in all, replenishment amount for each SKU will be as follows, and will have to be calculated
by the client on a monthly basis depending on actual product sales:
Figure C6-4. – Formula for Monthly Minimum Inventory Level

C7. HQ Stock Ordering Model
Our HQ stock reordering model is based on the combined forecasts of both online and brick-
and-mortar stores, and the constraints provided by the client (i.e. order lead time = 3 months,
value of monthly orders cannot exceed 50% of trailing 3-months sales, minimum order
quantities MOQ).
Firstly, reorder points for each SKU were determined. The reorder point is the level of
inventory at which a new order should be initiated. It takes into account the sales expected
during the lead time, as well as the safety stock required.
The formula for reorder point is as follows:
Figure C7-1. – Formula for Reorder Point

To minimize inventory costs, we recommend that Paula’s Choice operate on a constant
replenishment model (i.e. avoid holding more stock than necessary). In this case, the reorder
point will be equivalent to the order quantity.
However, because MOQ constraints exist, for SKUs where the reorder point is smaller than the
MOQ, there will be no choice but to take the MOQ as the effective reorder point:
Figure C7-2. – Formula for Effective Reorder Point

Based on the effective reorder point, we derive the reorder period (period between orders)
based on the total forecasted annual sales according to the formula below:
Figure C7-3. – Formula for Reorder Period

All in all, for each SKU, effective reorder point and reorder periods were generated based on
forecasted sales and client constraints. These figures will help the client to decide (for each
SKU) how much to order, and how often orders should be made.
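The three formulas referenced above are images that are not reproduced in this text. The sketch below shows one reading that is consistent with the description and with several rows of Figure C8-1: reorder point = forecasted sales over the 3-month lead time plus safety stock, effective reorder point = the larger of the reorder point and the MOQ, and reorder period = effective reorder point divided by average monthly forecasted sales. The function names, the safety stock form z × SD × √(lead time) and the rounding are assumptions; the 50% order-value constraint is not modelled.

```python
from math import sqrt


def reorder_point(annual_forecast, sd_monthly, lead_time_months=3, z=1.28):
    """Expected sales during the lead time plus an assumed safety stock."""
    lead_time_demand = annual_forecast / 12 * lead_time_months
    safety = z * sd_monthly * sqrt(lead_time_months)
    return round(lead_time_demand + safety)


def effective_reorder_point(rop, moq):
    """The order cannot be smaller than the minimum order quantity."""
    return max(rop, moq)


def reorder_period_months(effective_rop, annual_forecast):
    """Months of forecasted sales covered by one order."""
    return effective_rop / (annual_forecast / 12)


rop_2010 = reorder_point(4646, 69)                  # SKU 2010 inputs from Figure C8-1
period_2010 = reorder_period_months(rop_2010, 4646)
```

For SKU 2010 this gives a reorder point of 1314 units and a reorder period of about 3.4 months, in line with the 1314 and 3 shown in Figure C8-1. Other rows may differ by a unit because the figure lists rounded standard deviations.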
C8. Top 10 Products
Based on our forecasts, the top 10 selling products (in terms of units) were identified. The
following table summarizes the forecasted annual unit sales, the safety stock required, and
the reorder point and reorder period for each SKU.
SKU  Forecasted Annual Sales (Units)  Historical Standard Deviation  Safety Stock  Reorder Point / Order Quantity  Reorder Period (Months)
2010 4646 69 153 1314 3
7770 1593 35 78 476 3
6000 1558 40 89 478 3
2017 1540 47 105 490 3
7820 1430 27 60 417 3
6007 1305 39 87 413 3
7980 1262 41 91 406 3
1350 1193 29 65 363 3
7780 1111 23 51 329 3
7760 1044 22 49 310 3
Figure C8-1. – Top 10 Products
C9. Sample Codes used for Analysis
SAS – Variable Clustering
SAS – Hierarchical Clustering
SAS – K-Means Clustering
VBA – Regression Dataset Building
SAS – Multiple Linear Regression
Python – Model RMSE Calculation
- End of Appendix C -
Appendix D – Glossary
Term: Meaning
SAS Enterprise Miner: A solution for creating accurate predictive and descriptive models on
large volumes of data across different sources in the organization. In this case, it is used
for Market Basket Analysis.
Market Basket Analysis: A modelling technique based on the theory that when a consumer
purchases a certain group of items, he or she is likely to purchase another group of items.
Confidence: The likelihood or probability of purchasing B given that A is already purchased,
i.e. P(purchase B | purchase A).
Lift: How useful the generated rule A → B is compared to a random guess.
Lift > 1 means the generated rule is more useful than a random guess.
Lift = 1 means the generated rule is the same as a random guess.
Lift < 1 means the generated rule is less useful than a random guess.
Support: The frequency with which a rule occurs across all the transactions.
Root mean squared error (RMSE): A measure of error, whereby errors for individual data points
are squared, averaged and square-rooted. This prevents negative and positive errors from
cancelling each other out, and more accurately reflects the scale of errors.
Multiple-Linear Regression (MLR): Used in predictive modelling to explain the relationship
between one continuous dependent variable and two or more independent variables.
Cluster Analysis: An explorative analysis that divides a multivariate dataset into “natural”
clusters (groups).
Variable clustering: A procedure used to group redundant variables, in order to identify key
variables to be used for hierarchical, k-means or other types of clustering of the actual
dataset.
Hierarchical clustering: A type of cluster analysis that builds a hierarchy of clusters for
more distinct clustering.
K-means clustering: A type of cluster analysis that divides n observations into k clusters
(the user can specify k as the number of clusters), in which each observation belongs to the
cluster with the nearest mean, which serves as a prototype of the cluster.
- End of Glossary -
Appendix E – References
81817777.com. (2016) District Map. 81817777.com. Available at: http://81817777.com/new-
launch-condos/directory-SgCondo-apartment/singapore-district-map.jpg [Accessed: 22
March, 2017].
Baker, J, A. (2016) Compass One opens - with strong focus on young families. The Straits
Times. Available at: http://www.straitstimes.com/singapore/compass-one-opens-with-strong-
focus-on-young-families [Accessed: 22 March, 2017].
Cheng, K. (2016) A peek into Tengah, the next new HDB town the size of Bishan. Today.
Available at: http://www.todayonline.com/singapore/peek-tengah-next-new-hdb-town-size-
bishan [Accessed: 22 March, 2017].
Chin, D. (2014) New Safra Club at Punggol which will cater to young families. The Straits
Times. Available at: http://www.straitstimes.com/singapore/new-safra-club-at-punggol-
which-will-cater-to-young-families [Accessed: 22 March, 2017].
CNA. (2015) Bukit Panjang Integrated Transport Hub to open in 2017. Channel NewsAsia.
Available at: http://www.channelnewsasia.com/news/singapore/bukit-panjang-
integrated/1943524.html [Accessed: 22 March, 2017].
CNA. (2016) More than 10,000 flats launched in largest HDB sales exercise this year.
Channel NewsAsia. Available at: http://www.channelnewsasia.com/news/singapore/more-
than-10-000-flats-launched-in-largest-hdb-sales-exercise/3308458.html [Accessed: 22 March,
2017].
CNA. (2016) Two more years before decision on Cross Island Line: Khaw Boon Wan.
Channel NewsAsia. Available at: http://www.channelnewsasia.com/news/singapore/two-
more-years-before/2558414.html [Accessed: 22 March, 2017].
Heng, J. (2016) Singapore-KL High Speed Rail targeted to start running by around 2026;
journey will take 90 minutes. The Straits Times. Available at:
http://www.straitstimes.com/singapore/singapore-kl-high-speed-rail-targeted-to-start-
running-by-around-2026-journey-will-take-90 [Accessed: 22 March, 2017].
Lim, A. (2016) Plans to develop Jurong Lake Gardens Central and East unveiled. The Straits
Times. Available at: http://www.straitstimes.com/singapore/environment/plans-to-develop-
jurong-lake-gardens-central-and-east-unveiled [Accessed: 22 March, 2017].
Lim, P. J. (2016) Waterway Point officially opens. Channel NewsAsia. Available at:
http://www.channelnewsasia.com/news/business/singapore/waterway-point-
officially/2709726.html [Accessed: 24 March, 2017].
Ng, J, S. (2017) Parliament: Punggol North to become Singapore's first 'enterprise district',
home to digital and cyber-security industries. The Straits Times. Available at:
http://www.straitstimes.com/singapore/housing/punggol-north-to-become-spores-first-
enterprise-district-home-to-digital-and-cyber [Accessed: 22 March, 2017].
Ng, K. and Chua, A. (2016) Jurong Lake District to be second CBD, call for plans issued.
Today. Available at: http://www.todayonline.com/singapore/ura-seeks-proposals-develop-
jurong-lake-district-spores-2nd-cbd [Accessed: 22 March, 2017].
SingStat. (2016) Population Trends 2016. Department of Statistics Singapore. Available at:
http://www.singstat.gov.sg/publications/publications-and-papers/population-and-population-
structure/population-trends [Accessed: 24 March, 2017].
SingStat. (2016) Singapore Residents by Planning Area/Subzone, Age Group and Sex, June
2000 - 2016. Department of Statistics Singapore. Available at:
http://www.singstat.gov.sg/docs/default-source/default-document-
library/statistics/browse_by_theme/population/statistical_tables/tablea12-2000-2016.xls
[Accessed: 24 March, 2017].
URA. (2016) List of Postal Districts. Urban Redevelopment Authority. Available at:
https://www.ura.gov.sg/realEstateIIWeb/resources/misc/list_of_postal_districts.htm
Yeo, S, J. (2016) Tengah to be developed into a 'Forest Town'. The Straits Times. Available
at: http://www.straitstimes.com/singapore/housing/tengah-to-be-developed-into-a-forest-town
Yip, W. Y. (2016) Shaw's Waterway Point cineplex has most screens in the heartlands. The
Straits Times. Available at: http://www.straitstimes.com/lifestyle/entertainment/shaws-
waterway-point-cineplex-has-most-screens-in-the-heartlands [Accessed: 22 March, 2017].
- End of References -
BC3406
Business Analytics Consulting
Data Hackathon
Group Two | Alvon Chua Kang Jin | He RenYi Jonathan | Tan Jun Lek, Jerry
Overview of Paula’s Choice Data Hackathon Challenge
Background
Paula’s Choice Singapore is a skincare company that aims to provide the best skincare and
makeup products to consumers. Its main target audience is customers aged 25-35 (50%) and
18-24 (22%). Paula’s Choice believes that its brand exemplifies the essence of ‘masstige’,
where high quality products are offered to consumers at affordable prices.
Provides Dataset
Analyzed using tools such as eSpatial, MS Excel, Python, SAS, SAS Enterprise Miner and VBA
1. Bundling Analysis and Marketing Strategies
2. Flagship Store Positioning Analysis
3. Forecasting Analysis
‘4Ps’ Marketing Mix Strategy for 3-SKU Bundles
[Figure: Product, Price, Place, Promotion; SKU and Complementary SKU]
Justifications for the Five Proposed Bundles
Key Technique Used:
• Market basket analysis
• Descriptive analysis at:
• SKU Level
• Bundle Level
• Channel Level
• Category Level
• Combination of the above
Key Ideas for Bundling:

• High demand SKUs to help increase sales for
low demand SKUs via halo effect
• High confidence of at least 25% in the
association rule
• High demand SKUs to compensate for low
support in the generated rule
• High complementary relationship between SKUs
Interesting Insight
Customers mostly purchased items of the same pack
size when purchasing multiple items.
Flagship Store – Punggol
• Large existing customer base,
• Population profile fits target audience
• Data shows high sales performance;
great potential for high revenue
• Extensive development plans
Waterway Point
• ~$18.00 - $40.00 psf (24 Mar 17)
Forecasting Analysis
Methodology
Clustering: Variable clustering + hierarchical clustering + k-means clustering to group SKUs
for forecasting
Forecasting Model Selection: Comparison between simple moving average models, exponential
smoothing and multiple linear regression models for different clusters
Final Forecast: Top 10 Products; Forecasted 12-Months Revenue
Models and guidelines for stock replenishment and ordering proposed based on generated
forecast, constraints, and inventory management concepts such as lead time, safety stock,
reorder points
