Cluster Analysis - Nadya Fionalita - 19016041

Problem Identification
Marketing managers want to create consumer groups based on consumer profiles, namely:
1. Age
2. Number of children
3. Income
4. Newspaper reading each week
5. TV watching each week
6. Number of motorcycles the consumer owns
7. Number of cars the consumer owns
8. Number of credit cards
9. Level of goods purchased each week
10. Level of spending
11. Number of working hours per week
12. Number of shopping activities per week

To find groups of consumers with similar profiles, the researcher uses K-Means cluster
analysis. The number of clusters is set to four. The data consist of 60 responses. The table
below identifies the variables to ease the next steps.

Variable Name          Label

Usia                   Age
Number of Children     People
Pendapatan             Income
Menonton TV            Watching Hour
Membaca Koran          Reading Hour
Motorcycle Owned       Motorcycle
Car Owned              Car
Credit Card Owned      ATM
Tingkat Pembelian      Purchase Level
Tingkat Pengeluaran    Spending
Jam Kerja              Working Hour
Kegiatan Belanja       Shopping

Descriptive Statistics

                                            N    Minimum    Maximum   Mean       Std. Deviation

Zscore: Age                                 60   -1.74613   2.14071   .0000000   1.00000000
Zscore: Number of children                  60   -.63104    2.97489   .0000000   1.00000000
Zscore: Average monthly income              60   -.91197    3.08493   .0000000   1.00000000
Zscore: Newspaper reading hours per week    60   -1.60876   2.26950   .0000000   1.00000000
Zscore: TV watching hours per week          60   -1.88693   2.01707   .0000000   1.00000000
Zscore: Motorcycles owned                   60   -1.47158   1.68180   .0000000   1.00000000
Zscore: Cars owned                          60   -.87521    2.21377   .0000000   1.00000000
Zscore: Credit/ATM cards owned              60   -1.67616   2.51425   .0000000   1.00000000
Zscore: Goods purchase level per week       60   -1.22890   1.89212   .0000000   1.00000000
Zscore: Monthly spending level              60   -.88103    3.11566   .0000000   1.00000000
Zscore: Working hours per week              60   -1.18493   2.69195   .0000000   1.00000000
Zscore: Shopping hours per week             60   -1.38277   3.00381   .0000000   1.00000000
Valid N (listwise)                          60
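
The Zscore variables above are the raw variables rescaled to mean 0 and standard deviation 1, which is why every Mean is .0000000 and every Std. Deviation is 1.00000000. A minimal sketch of that step follows, assuming the sample standard deviation (n - 1 divisor) is used; the ages are made-up illustration values, not the survey data, and the function name is hypothetical.

```python
import statistics

def zscores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 divisor); an assumption here
    return [(v - mean) / sd for v in values]

# Hypothetical ages, for illustration only (not the survey data).
ages = [23, 31, 35, 42, 58]
z = zscores(ages)
print([round(v, 3) for v in z])
```

Standardizing first keeps variables measured on large scales, such as income, from dominating the distance calculation used by K-Means.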

Initial Cluster Centers

                                            Cluster
                                            1          2          3          4

Zscore: Age                                 -.50941    .37396     1.25733    2.14071
Zscore: Number of children                  -.63104    -.63104    -.63104    2.97489
Zscore: Average monthly income              3.08493    .72057     -.68679    1.95904
Zscore: Newspaper reading hours per week    .97675     .97675     -1.17784   -.31601
Zscore: TV watching hours per week          -1.32922   .90164     .06507     -.49265
Zscore: Motorcycles owned                   .10511     1.68180    -1.47158   1.68180
Zscore: Cars owned                          2.21377    -.87521    -.87521    .66928
Zscore: Credit/ATM cards owned              2.51425    1.67616    -1.67616   .83808
Zscore: Goods purchase level per week       1.50199    1.89212    -.83877    1.50199
Zscore: Monthly spending level              3.11566    -.05153    -.67366    1.98452
Zscore: Working hours per week              2.69195    2.09550    -1.12529   .72368
Zscore: Shopping hours per week             3.00381    1.90717    -.78460    .81052

Iteration History (a)

            Change in Cluster Centers
Iteration   1       2       3       4

1           2.587   2.360   2.451   2.648
2           .000    .256    .075    .000
3           .000    .000    .000    .000

a. Convergence achieved due to no or small change in cluster centers. The maximum
absolute coordinate change for any center is .000. The current iteration is 3. The
minimum distance between initial centers is 5.634.

Final Cluster Centers

                                            Cluster
                                            1          2          3          4

Zscore: Age                                 -.68609    .38755     -.18698    .76265
Zscore: Number of children                  -.63104    -.42300    -.02254    1.53252
Zscore: Average monthly income              3.08493    .54735     -.54605    1.71135
Zscore: Newspaper reading hours per week    1.40767    .81101     -.44528    .89056
Zscore: TV watching hours per week          -.21379    .38682     -.06739    -.38110
Zscore: Motorcycles owned                   .10511     .83282     -.36789    .73579
Zscore: Cars owned                          1.44152    .43166     -.37325    1.28707
Zscore: Credit/ATM cards owned              1.67616    1.22489    -.56571    .67047
Zscore: Goods purchase level per week       .72174     .99182     -.51692    1.26791
Zscore: Monthly spending level              3.11566    .49664     -.53321    1.72813
Zscore: Working hours per week              1.64817    1.23296    -.56762    .67597
Zscore: Shopping hours per week             1.50839    1.23998    -.54583    .53935


ANOVA

                                            Cluster             Error
                                            Mean Square   df    Mean Square   df    F        Sig.

Zscore: Age                                 2.400         3     .925          56    2.595    .061
Zscore: Number of children                  4.962         3     .788          56    6.299    .001
Zscore: Average monthly income              16.500        3     .170          56    97.249   .000
Zscore: Newspaper reading hours per week    8.137         3     .618          56    13.173   .000
Zscore: TV watching hours per week          .982          3     1.001         56    .981     .409
Zscore: Motorcycles owned                   5.720         3     .747          56    7.655    .000
Zscore: Cars owned                          6.811         3     .689          56    9.890    .000
Zscore: Credit/ATM cards owned              13.391        3     .336          56    39.828   .000
Zscore: Goods purchase level per week       10.852        3     .472          56    22.982   .000
Zscore: Monthly spending level              16.309        3     .180          56    90.653   .000
Zscore: Working hours per week              13.456        3     .333          56    40.441   .000
Zscore: Shopping hours per week             12.637        3     .377          56    33.555   .000

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the
differences among cases in different clusters. The observed significance levels are not corrected for this and thus
cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Number of Cases in each Cluster

Cluster   1    2.000
          2    13.000
          3    40.000
          4    5.000
Valid          60.000
Missing        .000

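K-Means forms the clusters and reassigns cases to them through an iterative process. The
Iteration History table shows that all four centers moved in the first iteration, that only
the centers of clusters 2 and 3 moved in the second, and that no center moved in the third,
so the procedure converged after three iterations. The minimum distance between the initial
centers is 5.634.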
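
The iteration process can be sketched as follows. This is a plain K-Means loop under stated assumptions, not a reproduction of the SPSS procedure: the starting centers are simply passed in, and the two-dimensional points are made up for illustration. The `history` list plays the role of the Change in Cluster Centers columns.

```python
import math

def kmeans(points, centers, max_iter=10):
    """Plain K-Means: assign each point to its nearest center, then recompute centers."""
    history = []  # per iteration: how far each center moved
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # leave the center of an empty cluster unchanged
        change = [math.dist(a, b) for a, b in zip(centers, new_centers)]
        history.append(change)
        centers = new_centers
        if max(change) == 0:  # convergence: no center moved
            break
    return centers, labels, history

# Hypothetical 2-D points and starting centers, for illustration only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, labels, history = kmeans(points, [(0.0, 0.0), (6.0, 5.0)])
for i, change in enumerate(history, start=1):
    print(i, [round(c, 3) for c in change])
```

As in the Iteration History table, the loop stops at the first iteration in which every center's change is zero.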

The interpretation of the four clusters begins by analyzing the variables that distinguish
the four clusters from one another.
This analysis uses a 5% significance level. The null hypothesis is that a variable does not
differ between clusters: a significance value > 0.05 means there is no significant difference,
and a significance value < 0.05 means there is one. The ANOVA table presents the variables
that distinguish the segments.
1. Income, reading hour, motorcycle, car, ATM, purchasing level, spending, working hour,
and shopping have a significance value of .000, which means they differ significantly
between clusters. All four clusters are distinguished by these variables.
2. Watching hour has a significance value of .409, which is above 0.05, so it does not
differ significantly between clusters. It is still reported in the cluster profiles
below, but its differences should be read with caution.
3. Children has a significance value of .001, which is below 0.05, so it differs
significantly between clusters, like the variables in number 1.
4. Age has a significance value of .061, which is above 0.05, so it does not differ
significantly between clusters. Hence, it is not considered in the analysis.

Based on the table, the highest F value is 97.249, for income. This shows that income is
the variable that differs the most between clusters.
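
Each F value is the cluster mean square divided by the error mean square. The short check below recomputes that ratio for four rows of the ANOVA table, using the mean squares exactly as printed; because the printed values are rounded to three decimals, the ratios only approximately match the reported F values.

```python
# (cluster mean square, error mean square, reported F) as printed in the ANOVA table
anova = {
    "Age": (2.400, 0.925, 2.595),
    "Income": (16.500, 0.170, 97.249),
    "Watching Hour": (0.982, 1.001, 0.981),
    "Spending": (16.309, 0.180, 90.653),
}

for name, (ms_cluster, ms_error, reported_f) in anova.items():
    f_ratio = ms_cluster / ms_error  # between-cluster variance over within-cluster variance
    print(f"{name}: F = {f_ratio:.3f} (reported {reported_f})")
```

A large ratio means the cluster means are far apart relative to the spread inside each cluster, which is the sense in which income separates the clusters most strongly.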

Next, the cities in each cluster can be analyzed based on the Final Cluster Centers table.
From the table, we can see that the eleven variables (age is excluded after the ANOVA
result) form four groups. The numbers in the Final Cluster Centers table are still
standardized, as we used Z-score values: a negative value (-) means the cluster is below
the overall average and a positive value (+) means it is above. Hence we can see that:
1. Age, number of children, and watching hour are below the overall average in cluster 1
2. Number of children is the only variable below the overall average in cluster 2
3. All variables are negative in cluster 3
4. Watching hour is the only variable with a negative value in cluster 4
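
The sign check can be reproduced directly from the Final Cluster Centers table. The sketch below copies the table values and lists, for each cluster, the variables whose center is negative; the short variable names follow the labels in the variable identification table, and the helper name is hypothetical.

```python
# Final cluster centers (z-scores) copied from the Final Cluster Centers table.
# Each tuple holds the values for clusters 1, 2, 3, 4.
final_centers = {
    "Age":              (-0.68609,  0.38755, -0.18698,  0.76265),
    "Children":         (-0.63104, -0.42300, -0.02254,  1.53252),
    "Income":           ( 3.08493,  0.54735, -0.54605,  1.71135),
    "Reading Hour":     ( 1.40767,  0.81101, -0.44528,  0.89056),
    "Watching Hour":    (-0.21379,  0.38682, -0.06739, -0.38110),
    "Motorcycle":       ( 0.10511,  0.83282, -0.36789,  0.73579),
    "Car":              ( 1.44152,  0.43166, -0.37325,  1.28707),
    "ATM":              ( 1.67616,  1.22489, -0.56571,  0.67047),
    "Purchasing Level": ( 0.72174,  0.99182, -0.51692,  1.26791),
    "Spending":         ( 3.11566,  0.49664, -0.53321,  1.72813),
    "Working Hour":     ( 1.64817,  1.23296, -0.56762,  0.67597),
    "Shopping":         ( 1.50839,  1.23998, -0.54583,  0.53935),
}

def below_average(cluster):
    """Names of the variables whose center is negative for a 1-based cluster number."""
    return [name for name, row in final_centers.items() if row[cluster - 1] < 0]

for cluster in range(1, 5):
    print(cluster, below_average(cluster))
```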

The following is the analysis of each variable shown in the Final Cluster Centers table:

1. Children: the average is cluster 4 (1.53252) > cluster 3 (-.02254) > cluster 2 (-.42300) >
cluster 1 (-.63104). Consumers in cluster 4 have the most children, so the number of
children is most likely to shape behavior there, while people who live in the cities
included in cluster 1 are the least influenced by it.
2. Income: cluster 1 (3.08493) > cluster 4 (1.71135) > cluster 2 (.54735) >
cluster 3 (-.54605). Income is highest in cluster 1, so consumer behavior is most
likely to depend on income there, and income has the least influence in cluster 3.
3. Reading Hour: cluster 1 (1.40767) > cluster 4 (.89056) > cluster 2 (.81101)
> cluster 3 (-.44528). People who live in Jakarta Utara and Jakarta Timur (cluster 1)
spend the most time reading, and people who live in the cities included in cluster 3
spend the least.
4. Watching Hour: cluster 2 (.38682) > cluster 3 (-.06739) > cluster 1
(-.21379) > cluster 4 (-.38110). People who live in the cities categorized in cluster
2 (Bandung, Semarang, Jakarta Selatan, Surabaya, Jakarta Barat, Yogya, Solo,
Malang) watch the most, while cluster 4 has the fewest watching hours.
5. Motorcycle: cluster 2 (.83282) > cluster 4 (.73579) > cluster 1 (.10511) >
cluster 3 (-.36789). Cluster 2 owns the most motorcycles and cluster 3 the fewest.
6. Car: ownership is highest in cluster 1 (1.44152) > cluster 4 (1.28707) > cluster 2
(.43166) > cluster 3 (-.37325).
7. ATM: ownership is highest in cluster 1 (1.67616) > cluster 2 (1.22489) > cluster 4
(.67047) > cluster 3 (-.56571).
8. Purchasing Level: from highest to lowest, cluster 4 (1.26791) > cluster 2
(.99182) > cluster 1 (.72174) > cluster 3 (-.51692).
9. Spending: is highest in cluster 1 (3.11566) > cluster 4 (1.72813) > cluster 2
(.49664) > cluster 3 (-.53321).
10. Working Hour: is highest in cluster 1 (1.64817) > cluster 2 (1.23296) >
cluster 4 (.67597) > cluster 3 (-.56762).
11. Shopping: takes the longest time in cluster 1 (1.50839) > cluster 2 (1.23998) >
cluster 4 (.53935) > cluster 3 (-.54583).
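
The orderings in this list come from sorting each row of the Final Cluster Centers table from highest to lowest. A small sketch for three of the rows, using the values from that table (the function name is hypothetical):

```python
# Rows copied from the Final Cluster Centers table (clusters 1, 2, 3, 4).
rows = {
    "Children":         (-0.63104, -0.42300, -0.02254,  1.53252),
    "Watching Hour":    (-0.21379,  0.38682, -0.06739, -0.38110),
    "Purchasing Level": ( 0.72174,  0.99182, -0.51692,  1.26791),
}

def rank_clusters(row):
    """Cluster numbers (1-based) sorted from the highest to the lowest center."""
    return sorted(range(1, len(row) + 1), key=lambda c: row[c - 1], reverse=True)

for name, row in rows.items():
    order = rank_clusters(row)
    print(name + ": " + " > ".join(f"cluster {c} ({row[c - 1]:.5f})" for c in order))
```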

Based on the data above, we can conclude that the manager can divide consumers, based on
their behavior, into four groups: 3.33% in cluster 1 (Jakarta Utara and Jakarta Timur),
21.67% in cluster 2 (Surabaya, Jakarta Selatan, Jakarta Timur, Jakarta Barat, and Bandung),
66.67% in cluster 3 (Tegal, Yogya, Solo, Banjarnegara, Madiun,
Pekalongan, Jepara, Blora, Karawang, Magelang, Parakan, Tuban, Ciamis, Pati, Cepu,
Wonogiri, Pacitan, Malang, Yogya, Solo, Caruban, Mojokerto, Jombang, Kebumen, Kutoarjo,
Purworejo, Cirebon, Tasikmalaya, Bogor, Jakarta Barat, Jakarta Timur, Bekasi, Ambarawa,
Purwodadi, Tangerang, Sidoarjo), and 8.33% in cluster 4 (Bandung, Jakarta Barat, Jakarta
Timur, Jakarta Selatan, Surabaya).
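
The cluster shares follow from the Number of Cases in each Cluster table: each count is divided by the 60 valid cases. A minimal check of that arithmetic:

```python
# Case counts per cluster, from the Number of Cases in each Cluster table.
cases = {1: 2, 2: 13, 3: 40, 4: 5}

total = sum(cases.values())
shares = {cluster: round(100 * n / total, 2) for cluster, n in cases.items()}

for cluster, share in shares.items():
    print(f"cluster {cluster}: {cases[cluster]} of {total} cases = {share}%")
```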

Therefore, we can sum up that cluster 1 has the highest values for income, reading hour,
car, ATM, spending, working hour, and shopping. Cluster 2 has the highest values for
watching hour and motorcycle. Cluster 3 has no highest value on any variable, and it is in
the last place for income, reading hour, motorcycle, car, ATM, purchasing level, spending,
working hour, and shopping. Last, cluster 4 is highest in children and purchasing level,
and second highest in spending.

It is reasonable that cluster 1 has the highest values for almost all of the things generally
needed for living in a city, while cluster 3 consists mostly of districts whose residents do
not value these things as much as people in clusters 1, 2, and 4, owing to their
socio-demographic conditions. People in cluster 4, on the other hand, can be considered to
live in provincial capitals, which offer high salaries and many amenities such as shopping
malls and restaurants. This leads to high spending and purchasing in those areas, supported
by the place of residence and other factors such as salary. Overall, groups 1, 2, and 4 fall
into a higher economic class than group 3.

Therefore, the researcher suggests that marketing managers give more exposure to consumer
groups 1, 2, and 4, as they have the highest values on the variables vital to shopping
activity. Managers can target these three groups for different purposes. They should focus
on cluster 1 if they want to achieve higher profit, keeping in mind that this group reads
the most. They can focus on cluster 2 if they want to promote practicality through
advertising media. They can focus on cluster 4 if mother-care supplies are the main purpose
(new mothers are likely to shop more for baby needs). Although group 3 has the lowest
results, managers can still take advantage of this by focusing on product function, as
these consumers don’t pay much attention to promotions. Number of children and watching
hour are the variables on which this group comes closest to the overall average, so TV
advertising may be the best alternative for brand exposure and for selling more children's
products.

