Marketing (professor Youjung Jun)

Companion to “Marketing Data Miner”


The Marketing Data Miner is an Excel-based tool that allows the user to cluster data (using K-means) to
uncover segments in the market.

Entering Data

The tool comes with a default dataset. If you would like to input your own data, go to the “Input data”
worksheet, press “clean data” and enter your own data. Your data should contain scores (e.g.,
partworths, importance ratings) on several dimensions (e.g., attribute levels, evaluation dimensions) for
several units (e.g., consumers).

Suppose you wanted to analyze data on 10 consumers who rated the importance of 5 dimensions
(Durability, Service, Design, Prestige, and Affordability) on a 1 to 7 scale, with 1 being not at all important
and 7 being extremely important. You also have some data about the demographics of these customers:
their age in years, years of education completed, and gender. The input would look like this:

K-means

K-means is a method for clustering observations. Simply press the “K-MEANS” button in the “Input
Data” tab. This will run K-means clustering with 1 cluster, 2 clusters, etc., all the way to 10 clusters. The
output is contained in the tab “K-Means” and looks like this:

“Percentage of explained variance” captures the proportion of the total variance in the data that is
retained when we approximate each point with its cluster center. Ideally, we would like each point to be close
to its cluster center, that is, we would like the variance under this approximation to be close to the true
variance in the data (large proportion of explained variance). The graph shows you the impact of adding
more clusters on the percentage of explained variance. The more clusters we allow, the finer the
resolution and the better we are able to explain the variance in the data. In the above case, there are 10
customers being clustered. If we allow 10 clusters, then each customer is its own cluster and the clusters
match the data perfectly. In that case the percentage of explained variance is 100%. If we allow only 2
clusters, only 51% of the variance would be explained, etc. This graph allows you to identify a good
number of clusters. Intuitively, we would like to select a number of clusters that is not too large, while at
the same time explaining a lot of the variance in the data. In the above case 3 seems like the right
number of clusters: there is a big jump in the proportion of variance explained between 2 and 3 clusters,
but then it gets pretty flat as we add more clusters.
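
As a rough illustration of how such a curve could be computed, the sketch below runs a basic K-means for 1 to 10 clusters and reports the share of variance explained as 1 minus (within-cluster sum of squares / total sum of squares). The ratings are hypothetical values invented for this example, and the initialization, number of restarts, and function names are assumptions; the tool's internal implementation may differ.

```python
import random

# Hypothetical 1-7 importance ratings for 10 consumers on
# Durability, Service, Design, Prestige, Affordability.
# Illustrative values only, not the tool's default dataset.
ratings = [
    [6, 2, 2, 1, 7],
    [7, 3, 1, 2, 6],
    [6, 2, 3, 2, 6],
    [2, 7, 3, 2, 3],
    [3, 6, 2, 3, 2],
    [2, 6, 3, 2, 4],
    [2, 3, 7, 6, 2],
    [1, 2, 6, 7, 3],
    [2, 2, 7, 7, 1],
    [3, 2, 6, 6, 2],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(data, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in data
    ]


def kmeans(data, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; initial centers are k randomly chosen points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(n_iter):
        labels = assign(data, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)


def explained_variance(data, centers, labels):
    """Share of total variance retained by the cluster-center approximation."""
    grand_mean = mean_point(data)
    total = sum(squared_distance(p, grand_mean) for p in data)
    within = sum(squared_distance(p, centers[lab]) for p, lab in zip(data, labels))
    return 1.0 - within / total


def best_explained_variance(data, k, restarts=10):
    """Best result over several random starts (an assumed safeguard)."""
    return max(
        explained_variance(data, *kmeans(data, k, seed=s)) for s in range(restarts)
    )


curve = {k: best_explained_variance(ratings, k) for k in range(1, len(ratings) + 1)}
for k, share in curve.items():
    print(f"{k:2d} clusters: {share:6.1%} of variance explained")
```

With 1 cluster every point is approximated by the overall mean, so nothing is explained; with 10 clusters each consumer is its own cluster and everything is explained. The number of clusters to keep is read off the values in between, where the curve flattens.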

“Cluster Centers” gives the center of each cluster, i.e., the average observation in each cluster. In our
case, with 3 clusters, it seems that the first cluster captures customers who care about Service; the
second cluster cares primarily about Design and Prestige; the third cluster cares about Durability and
Affordability. The sizes of the clusters are fairly similar.

“Cluster Membership” tells us the cluster to which each observation belongs. With 3 clusters, we look in
the “3 clusters” column and see that respondent 1 belongs to Cluster #3, respondent 4 to Cluster #1,
respondent 7 to Cluster #2, etc.
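
To make the link between cluster membership, cluster centers, and cluster sizes concrete, here is a minimal sketch. The ratings and the membership list are hypothetical, chosen only to be consistent with the examples in the text (respondent 1 in Cluster #3, respondent 4 in Cluster #1, respondent 7 in Cluster #2); they are not the tool's actual data, and the function name is an assumption.

```python
from collections import Counter, defaultdict

DIMENSIONS = ["Durability", "Service", "Design", "Prestige", "Affordability"]

# Hypothetical 1-7 ratings for respondents 1..10 (illustrative only).
ratings = [
    [6, 2, 2, 1, 7],
    [7, 3, 1, 2, 6],
    [6, 2, 3, 2, 6],
    [2, 7, 3, 2, 3],
    [3, 6, 2, 3, 2],
    [2, 6, 3, 2, 4],
    [2, 3, 7, 6, 2],
    [1, 2, 6, 7, 3],
    [2, 2, 7, 7, 1],
    [3, 2, 6, 6, 2],
]

# Hypothetical "3 clusters" membership column: entry i is the cluster
# number of respondent i + 1.
membership = [3, 3, 3, 1, 1, 1, 2, 2, 2, 2]


def cluster_centers(rows, labels):
    """Average observation of each cluster, keyed by cluster number."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in sorted(groups.items())
    }


centers = cluster_centers(ratings, membership)
sizes = Counter(membership)

for label, center in centers.items():
    # Report the dimension with the highest average rating in each cluster.
    top = DIMENSIONS[center.index(max(center))]
    rounded = [round(value, 2) for value in center]
    print(f"Cluster {label} (size {sizes[label]}): {rounded}, highest on {top}")
```

Reading the centers row by row is what supports an interpretation such as "this cluster cares about Service": the label comes from the dimensions on which that cluster's average rating is highest.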

“Demographics”: when running the cluster analysis you can, if you wish, compare the clusters on other
variables such as demographics. To add these comparisons, click “yes” in the “Input data” worksheet
under “Include Demographics.” The output then presents the comparison of the clusters on
demographics in the right-most table of the “K-Means” worksheet. For
example, we see in the 3-cluster solution that on average cluster 3 is a bit older (average age = 49.33)
than cluster 2 (average age = 46.25), which in turn is older than cluster 1 (average age = 40.33).
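
The demographic comparison amounts to averaging each demographic variable within each cluster. The sketch below shows this for age. The individual ages and the membership list are hypothetical, picked so that the cluster averages match the figures quoted above; the tool's underlying data is not reproduced here, and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical "3 clusters" membership for respondents 1..10.
membership = [3, 3, 3, 1, 1, 1, 2, 2, 2, 2]

# Hypothetical ages in years, chosen to reproduce the averages in the text.
ages = [47, 49, 52, 38, 40, 43, 44, 45, 47, 49]


def average_by_cluster(values, labels):
    """Mean of a numeric variable within each cluster, keyed by cluster number."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in sorted(groups.items())}


average_age = average_by_cluster(ages, membership)
for label, avg in average_age.items():
    print(f"Cluster {label}: average age = {avg:.2f}")
```

The same function could be applied to years of education; a categorical variable such as gender would need a share per category rather than a mean.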
