Business Data Mining
Classification Trees
Classification trees are used:
1. When the target variable is categorical
2. When the goal is to generate understandable and explainable rules
3. To pick a good set of variables to be used as inputs to another modeling technique, such as artificial neural networks (ANN)
4. In the early phase of most data mining projects, as they reveal a great deal about the data
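To make item 2 concrete, here is a minimal Python sketch of how a tree turns data into readable rules. It assumes a small, hypothetical customer dataset with categorical inputs (`income`, `member`) and a categorical target (`responded`), and grows a shallow tree by choosing, at each node, the multiway split with the lowest weighted Gini impurity. Production algorithms such as CART or C4.5 add pruning and numeric-attribute handling that this sketch omits.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the categorical feature whose multiway split most reduces Gini impurity, or None."""
    best_feature, best_score = None, gini(labels)
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        if len(groups) < 2:
            continue  # feature takes a single value here, so it cannot split
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best_feature, best_score = f, score
    return best_feature

def build_rules(rows, labels, features, conditions=(), max_depth=2):
    """Grow a small tree and return each leaf as a (conditions, predicted class) rule."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(conditions) >= max_depth:
        return [(conditions, majority)]
    f = best_split(rows, labels, features)
    if f is None:
        return [(conditions, majority)]
    rules = []
    for value in sorted({row[f] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        rules += build_rules(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [g for g in features if g != f], conditions + ((f, value),), max_depth,
        )
    return rules

# Hypothetical toy data: did a customer respond to a mailing?
rows = [
    {"income": "high", "member": "yes"}, {"income": "high", "member": "no"},
    {"income": "low", "member": "yes"}, {"income": "low", "member": "no"},
    {"income": "low", "member": "no"}, {"income": "high", "member": "no"},
]
labels = ["yes", "yes", "yes", "no", "no", "yes"]

rules = build_rules(rows, labels, ["income", "member"])
for conds, cls in rules:
    print("IF " + " AND ".join(f"{f} = {v}" for f, v in conds) + f" THEN responded = {cls}")
```

On this toy data the tree splits first on `income` and prints three rules, for example "IF income = low AND member = no THEN responded = no", which is the kind of explainable output item 2 refers to.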
Requirement:
1) Data mining plan
2) Recommend the data elements the retailer needs and possible means of capturing them
By addressing the customer satisfaction issues, the e-book retailer can greatly improve the
Mining Objectives:
1. Build a recommendation engine that finds clusters of books that usually sell together
2. Text Mining: Gain insights into customers' views and opinions about books and
our website
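One common way to find books that sell together is pairwise association rules: count how often two books appear in the same order (support) and how often buying one book comes with buying the other (confidence). The Python sketch below assumes a hypothetical order history in which each order is a set of titles, and hypothetical thresholds `min_support=0.2` and `min_confidence=0.5`. It produces item-to-item recommendations rather than full clusters; algorithms such as Apriori or FP-Growth extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of book titles bought together.
orders = [
    {"Dune", "Foundation", "Hyperion"},
    {"Dune", "Foundation"},
    {"Dune", "Hyperion"},
    {"Emma", "Persuasion"},
    {"Dune", "Foundation", "Emma"},
]

# How many orders contain each book, and each unordered pair of books.
item_counts = Counter(book for order in orders for book in order)
pair_counts = Counter(pair for order in orders for pair in combinations(sorted(order), 2))

def recommend(book, min_support=0.2, min_confidence=0.5):
    """Books often bought with `book`, ranked by confidence P(other in order | book in order)."""
    n = len(orders)
    recs = []
    for (a, b), count in pair_counts.items():
        if book not in (a, b) or count / n < min_support:
            continue  # pair does not involve this book, or is too rare to trust
        other = b if a == book else a
        confidence = count / item_counts[book]
        if confidence >= min_confidence:
            recs.append((other, round(confidence, 2)))
    return sorted(recs, key=lambda r: (-r[1], r[0]))

print(recommend("Dune"))  # [('Foundation', 0.75), ('Hyperion', 0.5)]
```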
Special considerations:
Host a customer discussion forum on the e-book retailer's website (user-driven content).
The forum lets customers discuss the books they have bought, liked or disliked, as well as what they like and dislike about the website and what could be improved.
a. Perform text mining to capture positive and negative opinions about the books and the retailer's website
b. Review and consolidate the feedback, then make changes accordingly
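Step (a) can be sketched with a simple lexicon-based approach in Python. The word lists and forum posts below are hypothetical; real text mining would typically use a curated sentiment lexicon or a trained classifier, and would handle negation ("not great"), which this sketch ignores. Each post is labelled by counting positive minus negative words, and the most frequent negative words across negative posts serve as candidate issues for step (b).

```python
import re
from collections import Counter

# Hypothetical sentiment lexicon; a real project would use a curated or trained one.
POSITIVE = {"love", "great", "excellent", "easy", "fast", "enjoyed"}
NEGATIVE = {"hate", "slow", "boring", "confusing", "broken", "expensive"}

def tokenize(text):
    """Lower-case a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def polarity(post):
    """Label a post by (#positive words - #negative words)."""
    words = tokenize(post)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def complaint_terms(posts, top_n=3):
    """Most frequent negative words in negative posts: candidate issues to review."""
    counts = Counter(
        w for p in posts if polarity(p) == "negative"
        for w in tokenize(p) if w in NEGATIVE
    )
    return counts.most_common(top_n)

# Hypothetical forum posts.
posts = [
    "I love this book, the plot is excellent",
    "Checkout page is slow and confusing",
    "The search is slow",
    "Delivered the e-book on Monday",
]
print([polarity(p) for p in posts])  # ['positive', 'negative', 'negative', 'neutral']
print(complaint_terms(posts))        # [('slow', 2), ('confusing', 1)]
```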
Business Objective:
Develop a targeted marketing plan
Explore whether any niche marketing opportunities are available
Assumptions:
Besides groceries, the supermarket has several departments such as apparel, cosmetics, home furnishings, furniture and electronic home appliances.
Examples: Big Bazaar, SPAR, etc.
Mining Objectives:
1. Classify customers into brand loyalists, brand switchers and alternators for every brand sold in the store
2. Identify clusters of products that tend to be purchased by the same person over time
3. Link customers' demographic details with their share of wallet for each category sold in the supermarket (apparel, home appliances, etc.)
Methodology
Product/brand-wise analysis:
Use data mining to analyze customers' previous transactions and classify them into:
1. Loyalists: customers who do not switch brands
2. Switchers: customers who switch brands in response to the promotions offered by the respective brands
3. Alternators: customers who alternate between brands, moving to a different brand from one purchase to the next (anti-incumbency effect)
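The three-way classification above can be sketched in Python as rules over each customer's time-ordered purchases in one product category. This assumes a hypothetical record of `(brand, bought_on_promotion)` per purchase and hypothetical decision rules: no brand change means Loyalist, every change landing on a promoted brand means Switcher, and any change made without a promotion means Alternator.

```python
def classify_buyer(purchases):
    """Label one customer's brand behaviour within a product category.

    purchases: time-ordered list of (brand, bought_on_promotion) tuples.
    Hypothetical rules: no brand change -> Loyalist; every change is to a brand
    bought on promotion -> Switcher; any change without a promotion -> Alternator.
    """
    # Keep only the purchases where the brand differs from the previous purchase.
    switches = [curr for prev, curr in zip(purchases, purchases[1:]) if curr[0] != prev[0]]
    if not switches:
        return "Loyalist"
    if all(on_promo for _, on_promo in switches):
        return "Switcher"
    return "Alternator"

# Hypothetical purchase histories (e.g. detergent) for three customers.
histories = {
    "C1": [("BrandA", False), ("BrandA", True), ("BrandA", False)],
    "C2": [("BrandA", False), ("BrandB", True), ("BrandC", True)],
    "C3": [("BrandA", False), ("BrandB", False), ("BrandA", False), ("BrandB", False)],
}
labels = {cid: classify_buyer(h) for cid, h in histories.items()}
print(labels)  # {'C1': 'Loyalist', 'C2': 'Switcher', 'C3': 'Alternator'}
```

The all-or-nothing rules keep the sketch short; with real data one would more likely use proportions (for example, the share of switches that coincide with promotions) and tune the cut-offs.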
Example clusters:
1) Bachelors with less than 1 year of work experience: offer discounts on electric rice cookers, sandwich makers, utensils, etc.
2) Newly married: offer discounts on home furnishings; home appliances such as LCD TVs, refrigerators, washing machines and microwave ovens; and furniture such as sofas and dining tables.
3) Married with kids: offer discounts on kids' clothes during the festival season, and bundle kids' school uniforms with school stationery, shoes, etc.
4) Housewives: offer special discounts on weekdays to reduce long billing queues at weekends. Only people who stay at home on weekdays can make full use of this offer.
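The example clusters are described by simple demographic conditions, so a minimal rule-based Python sketch can assign customers to them and look up the matching offer. The field names (`marital_status`, `years_experience`, `kids`, `years_married`, `at_home_weekdays`), the one-year cut-off for "newly married" and the rule order (first match wins) are all hypothetical. In practice the segments would more likely come out of a clustering step on demographic and transaction data than from hand-written rules.

```python
# Hypothetical offer catalogue, one entry per example segment.
OFFERS = {
    "Bachelor, <1 year experience": "Discounts on electric rice cookers, sandwich makers, utensils",
    "Newly married": "Discounts on home furnishings, home appliances and furniture",
    "Married with kids": "Festival discounts on kids' clothes; school uniform and stationery bundles",
    "Housewives": "Weekday-only special discounts",
}

def assign_segment(c):
    """Map a customer record to one example segment (first matching rule wins), or None."""
    if c.get("at_home_weekdays"):
        return "Housewives"
    if c["marital_status"] == "single" and c["years_experience"] < 1:
        return "Bachelor, <1 year experience"
    if c["marital_status"] == "married" and c.get("kids", 0) > 0:
        return "Married with kids"
    if c["marital_status"] == "married" and c.get("years_married", 99) < 1:
        return "Newly married"
    return None

# Hypothetical customer records.
customers = [
    {"id": "C1", "marital_status": "single", "years_experience": 0.5},
    {"id": "C2", "marital_status": "married", "years_experience": 3, "years_married": 0.5},
    {"id": "C3", "marital_status": "married", "years_experience": 8, "kids": 2},
    {"id": "C4", "marital_status": "married", "years_experience": 0, "at_home_weekdays": True},
    {"id": "C5", "marital_status": "single", "years_experience": 5},
]

segments = {c["id"]: assign_segment(c) for c in customers}
for cid, seg in segments.items():
    print(cid, seg, "->", OFFERS.get(seg, "no targeted offer"))
```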
Identify loyalty at store level (increase the share of wallet):
Classify customers into categories based on their income, profession, location, purchase details across several months, etc., and predict what percentage of their income each type of customer typically spends on each product category.
Example:
Suppose data mining shows that a typical married IT professional with 2 kids who lives within a 3 km radius of the store spends approximately 30% of his income on apparel.
If a customer in this cluster spends only 5% of his income on apparel in the store, he should be offered incentives to spend his full quota of approximately 30% of his income in the store rather than elsewhere.
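The share-of-wallet comparison can be sketched in Python, assuming hypothetical per-customer records of (cluster, annual income, annual apparel spend in the store). Each cluster's typical share is taken as the median share of its members, so under-spenders do not drag the benchmark down, and a customer is flagged when his share is below half the benchmark. Both the median and the 0.5 threshold are illustrative choices, not part of the original method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: customer -> (cluster, annual income, annual apparel spend in the store).
customers = {
    "C1": ("IT, married, 2 kids, within 3 km", 1_000_000, 300_000),
    "C2": ("IT, married, 2 kids, within 3 km", 1_200_000, 372_000),
    "C3": ("IT, married, 2 kids, within 3 km", 900_000, 45_000),
}

def wallet_gaps(customers, min_ratio=0.5):
    """Return each cluster's typical apparel share and the extra spend to target per under-spender."""
    share = {cid: spend / income for cid, (_, income, spend) in customers.items()}
    by_cluster = defaultdict(list)
    for cid, (cluster, _, _) in customers.items():
        by_cluster[cluster].append(share[cid])
    benchmark = {cluster: median(s) for cluster, s in by_cluster.items()}
    gaps = {}
    for cid, (cluster, income, _) in customers.items():
        typical = benchmark[cluster]
        if share[cid] < min_ratio * typical:
            # Spend needed to bring this customer up to the cluster's typical share.
            gaps[cid] = round((typical - share[cid]) * income)
    return benchmark, gaps

benchmark, gaps = wallet_gaps(customers)
print(gaps)  # {'C3': 225000}
```

Here the cluster's typical share comes out at 30%, and C3, at 5%, is the customer to target, mirroring the example above.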
Special Considerations:
Since the supermarket is a chain, some customers may buy from several outlets across the city. To get a comprehensive view of their purchases, the databases of all outlets must be integrated. It would also be interesting to find out what kind of people buy from multiple outlets.
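A minimal Python sketch of the integration step, assuming hypothetical per-outlet transaction lists and, crucially, a customer ID (for example a loyalty-card number) that is shared across outlets; without such a common key the outlet databases cannot be linked at customer level. It produces one combined view per customer and lists the customers who shop at more than one outlet.

```python
from collections import defaultdict

# Hypothetical per-outlet transaction tables; each row is (customer_id, amount).
outlet_sales = {
    "Outlet North": [("C1", 1200), ("C2", 300)],
    "Outlet South": [("C1", 800), ("C3", 450)],
    "Outlet East": [("C1", 150), ("C2", 700)],
}

def integrate(outlet_sales):
    """Combine outlet tables into one view: total spend and outlets visited per customer."""
    total = defaultdict(int)
    outlets = defaultdict(set)
    for outlet, rows in outlet_sales.items():
        for cid, amount in rows:
            total[cid] += amount
            outlets[cid].add(outlet)
    return {cid: {"total_spend": total[cid], "outlets": sorted(outlets[cid])} for cid in total}

view = integrate(outlet_sales)
# Customers seen at more than one outlet: the group whose profile is worth studying.
multi_outlet = sorted(cid for cid, v in view.items() if len(v["outlets"]) > 1)
print(multi_outlet)  # ['C1', 'C2']
```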
Gains Chart: Hits (#) against Quintile (1 to 5) for No model, Model 1 and Model 2 (y-axis 0 to 2500)
Model 2: The maximum benefit of the lift curve is seen near quintile 1.5, with 1,250 hits against 750 with no model (a lift of about 1.67). The relative gain is greater when fewer customers are targeted.
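Beyond quintile 3.5, both models converge to the gains achieved with no model. Hence, when the number of target customers is very large (more than 70% of all customers, i.e. beyond 3.5 of the 5 quintiles), a model adds no hits over random selection, and it is better not to use any model.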
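The gains chart compares cumulative hits when customers are contacted in order of model score against the straight "No model" line, where hits accumulate in proportion to the customers contacted. The Python sketch below computes both lines per quintile and their ratio (the lift). The scores and responses are a small hypothetical dataset, not the data behind the chart; the final line simply reproduces the lift arithmetic for Model 2 near quintile 1.5 (1,250 hits against 750).

```python
def cumulative_gains(scores, actual, n_bins=5):
    """Cumulative hits after contacting the top 1..n_bins quantiles of customers by score."""
    # Rank customers from highest to lowest model score, keeping their outcome (1 = hit).
    ranked = [y for _, y in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    return [sum(ranked[: round(n * k / n_bins)]) for k in range(1, n_bins + 1)]

def no_model_gains(actual, n_bins=5):
    """Expected cumulative hits when customers are contacted in random order."""
    total = sum(actual)
    return [total * k / n_bins for k in range(1, n_bins + 1)]

# Hypothetical scored customers: model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, actual)
base = no_model_gains(actual)
lift = [g / b for g, b in zip(gains, base)]
print(gains)                 # [2, 3, 3, 4, 4]
print(round(1250 / 750, 2))  # 1.67
```

As in the chart, the lift is highest in the first quintile and falls to 1.0 once every customer is contacted, which is why a model matters most when only a small fraction of customers is targeted.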