
Knowledge-Based Systems 70 (2014) 128–136

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

ClustOfVar and the segmentation of cruise passengers from mixed data: Some managerial implications

Juan Gabriel Brida a, Vincenzo Fasone b, Raffaele Scuderi b,*, Sandra Zapata-Aguirre b
a School of Economics and Management and TOMTE, Competence Center in Tourism Management and Tourism Economics, Free University of Bozen-Bolzano, Piazza Università 1, I-39100 Bolzano, Italy
b University of Enna “Kore”, Cittadella Universitaria, 94100 Enna, Italy

Article info

Article history:
Received 23 September 2013
Received in revised form 5 May 2014
Accepted 11 June 2014
Available online 27 June 2014

Keywords:
ClustOfVar
Cruise tourism
Mixed data
Market segmentation
Uruguay

Abstract

Market segmentation comprises a variety of measurement methodologies that are used to support management, marketing and promotional policies in tourism destinations. This study applies ClustOfVar, a relatively recent algorithm for the cluster analysis of mixed variables. The technique finds groups of variables by using a homogeneity criterion based on the sum of correlation ratios for qualitative variables and squared correlations for quantitative variables. Principal components are then extracted from each cluster of variables in order to segment cruise passengers. Finally, CART analysis is used to find the variables that drove the formation of the clusters. The whole analysis is based on an official survey of tourists who disembarked in Uruguayan ports. The analysis identified five clusters, both of variables and of cruise passengers. Findings highlight the importance of enjoying contact with local people for the economic impact, as well as the important role of age- and gender-related variables. Managerial implications are also discussed.
© 2014 Elsevier B.V. All rights reserved.
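The homogeneity criterion summarized in the abstract, together with the aggregation rule detailed in Section 2.2, can be illustrated with a small sketch. The Python code below is not the ClustOfVar R package and does not reproduce its implementation. It is a dependency-free illustration under stated assumptions: quantitative variables are standardized with the 1/n convention, and each qualitative variable is coded as centered indicator columns scaled by 1/sqrt(p_s), where p_s is the share of category s (a coding in the spirit of PCAmix). Under this coding, the first eigenvalue of a cluster's covariance matrix equals the sum of its squared correlations and correlation ratios with the first principal component, i.e. H(C). All function names are hypothetical, and power iteration is used for the eigenvalue only for brevity.

```python
import math
import random


def standardize(x):
    """Center a quantitative column and scale it to unit variance (1/n convention; assumes a non-constant column)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


def indicator_columns(z):
    """Centered indicators of a qualitative column, each scaled by 1/sqrt(p_s).

    With this coding, the squared covariances between the block and a
    standardized y add up to the correlation ratio eta^2(z, y).
    """
    n = len(z)
    cols = []
    for level in sorted(set(z)):
        p = sum(1 for v in z if v == level) / n
        cols.append([((1.0 if v == level else 0.0) - p) / math.sqrt(p) for v in z])
    return cols


def top_eigenvalue(m, iters=500, seed=0):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]  # random start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def homogeneity(quant, qual):
    """H(C) = sum r^2(x_i, y) + sum eta^2(z_j, y), i.e. the first eigenvalue of
    the covariance matrix of the mixed coding of cluster C."""
    cols = [standardize(x) for x in quant]
    for z in qual:
        cols.extend(indicator_columns(z))
    n = len(cols[0])
    cov = [[sum(a[i] * b[i] for i in range(n)) / n for b in cols] for a in cols]
    return top_eigenvalue(cov)


def dissimilarity(a, b):
    """d(A, B) = H(A) + H(B) - H(A u B); a cluster is a (quant_list, qual_list) pair."""
    merged = (a[0] + b[0], a[1] + b[1])
    return homogeneity(*a) + homogeneity(*b) - homogeneity(*merged)


def agglomerate(variables):
    """Repeatedly merge the two clusters with the smallest d until one remains.

    `variables` maps a name to ("quant", values) or ("qual", labels).
    Returns the merge history as (cluster_a, cluster_b, d) tuples.
    """
    def as_pair(cluster):
        names = sorted(cluster)
        quant = [variables[v][1] for v in names if variables[v][0] == "quant"]
        qual = [variables[v][1] for v in names if variables[v][0] == "qual"]
        return quant, qual

    clusters = [frozenset([name]) for name in variables]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dissimilarity(as_pair(clusters[i]), as_pair(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] | clusters[j]]
    return merges


# Toy data: x1 and z split the four units identically; x2 is orthogonal to both.
data = {
    "x1": ("quant", [1.0, 1.0, -1.0, -1.0]),
    "z": ("qual", ["a", "a", "b", "b"]),
    "x2": ("quant", [1.0, -1.0, 1.0, -1.0]),
}
for a, b, d in agglomerate(data):
    print(sorted(a), "+", sorted(b), "d =", round(d, 3))
```

On the toy data, x1 and z carry the same partition of the units, so their union has H = 2 and merging them costs d close to 0. Adding the orthogonal x2 afterwards costs d close to 1, which is the loss of homogeneity that the agglomerative rule tries to keep small at each step.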

* Corresponding author. Tel. +39 (0)935 536380; fax: +39 (0)935 536971.
E-mail addresses: juangabriel.brida@unibz.it (J.G. Brida), vincenzo.fasone@unikore.it (V. Fasone), raffaele.scuderi@unikore.it (R. Scuderi), sandra.zapataaguirre@unikore.it (S. Zapata-Aguirre).

http://dx.doi.org/10.1016/j.knosys.2014.06.016
0950-7051/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

With remarkable growth over the last decades, the cruise industry has become one of the most dynamic and fastest-growing segments of the travel and tourism industry [28]. Since 1990 the number of passengers has grown at an average annual rate of 7.4% [20]. The cruise industry has responded to this worldwide growth with a rapid expansion of the world's cruise fleet, the creation of appealing itineraries that reflect the interests of today's travellers, and the ability of cruise ships to visit new and exotic locations [17]. Recent figures reported by the FCCA [20], an organization representing member lines that operate more than 100 vessels in Floridian, Caribbean and Latin American waters, indicate that 17 million passengers chose to cruise in 2012. Thus, the industry appears to be well on the way to maturity [36,43].
In this context, South American markets, together with Mediterranean areas (with an annual growth rate of about 10%), are also increasing their shares, and they were among the top world destinations in 2012 [17]. Uruguay is in line with this trend. It is a small country in southern South America, bordering the South Atlantic Ocean between Argentina and Brazil. With its two main ports, Montevideo and Punta del Este, the country has experienced a significant increase both in the number of vessels and in the number of disembarking cruise passengers. According to data from the Ministry of Tourism and Sports of Uruguay [34], during the 2011–2012 season 225 cruise ships arrived in the country at the two ports of Montevideo (53%) and Punta del Este (47%). This corresponds to an increase of 32% with respect to the previous season, and it is the best result for both ports since 2004 [8]. Against this background, a deeper understanding of the main characteristics of the cruisers disembarking at the two ports could help to improve the positive impact generated by this kind of activity.
The main focus of this paper is to find and analyse homogeneous groups of cruise passengers who disembarked in Uruguayan ports. This makes it possible to identify market segments and to suggest managerial recommendations to destination managers and port authorities. Indeed, market segmentation techniques comprise a set of tools that are appropriate not only for segmenting markets and identifying target markets, but also for helping a business analyst to understand the relationship between destination and tourists [3]. Segmentation tools have been widely used in both industry and academic research to gain more insight into a wide range of markets [40,19,21]. Their number has grown considerably [31,18], with each method presenting a particular view of the data
[30]. Hence, scholars have refined them to make them less prone to error and misinterpretation [19]. The wide set of techniques ranges from elementary percentiles and quartiles [35] to more complex techniques such as factor analysis, principal components and cluster analysis [3]. Nowadays, segmentation techniques and methodologies have evolved to overcome a series of limitations [3]. For instance, artificial neural networks (ANN) have been shown to be a better method for market segmentation [26,33]. In many market segmentation studies, ANN-based clustering has been dominated by Self-Organizing Feature Maps (SOFMs, or SOMs) [26]. These neural networks apparently have primary advantages over cluster analysis [3,6,24]. On the other hand, other techniques have been developed in order to 'refine' classic ones [7]. However, as far as studies of cruise passengers are concerned, only a few papers have applied these techniques to detect and analyse the characteristics, behaviors and onshore experience of cruise passengers while visiting a port of call (see [2,9,12,10,42]).
This paper adopts a three-step multivariate market segmentation analysis, in which clustering techniques are applied to both variables and units. The first step consists in reducing the dimensionality of the original set of variables, a classic problem in statistical analysis [32]. Reducing the variables to a few latent dimensions has the twofold purpose of condensing highly linearly dependent variables that might carry overly similar information, and of finding and interpreting the latent dimensions common to such variables. Techniques such as Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), for quantitative and qualitative variables respectively, are usually applied to detect latent traits. However, these techniques cannot handle mixed datasets. In addition, they are built on the hypothesis that all variables share a common 'aspect', i.e. come from a common 'semantic' group, and 'only' need to be condensed via a technique that extracts latent variables. These components are constructed so that they are uncorrelated with each other; therefore they rule out the possible presence of links within the set of latent dimensions. This paper addresses these methodological issues through ClustOfVar [16], a recent algorithm for clustering datasets in which mixed variables are present. The methodology is based on PCAmix, a generalization of PCA and MCA.
After finding groups of variables, the second step is to segment the cruisers via a classic clustering algorithm. Specifically, Ward's method [44] and a comparison of different stopping rules were adopted to determine groups of units. Finally, the third step uses CART analysis [5] to estimate which variables significantly affect the partition of the original set into clusters.
The paper is organized as follows. Section 2 summarizes the adopted methodology. Section 3 presents the dataset. Section 4 shows and discusses the results. Section 5 concludes.

2. Methodology

2.1. ClustOfVar

Redundancy of information is one of the problems that often arise when clustering units. Strongly related variables can in fact coexist, thus duplicating linear information, which may lead to misleading classifications of the units. Accordingly, the dimensionality of a dataset is usually reduced by techniques based on the extraction of latent variables. The most commonly used techniques are PCA for quantitative variables and MCA for qualitative variables. Nevertheless, these approaches consider all variables as being part of a common set. For this reason, a commonly used practice is to first cluster variables into groups, so that each group carries the elements with the most similar information.
Among the several methods that have been proposed for dimensionality reduction, little attention has been paid to mixed qualitative-quantitative datasets. To this end, ClustOfVar [16] is a relatively recent algorithm for the statistical software R [37] that handles mixed datasets. It finds groups of variables by using a homogeneity criterion based on the sum of correlation ratios for qualitative variables and squared correlations for quantitative variables. The number of clusters is identified from the analysis of aggregation levels and of the stability of the partitions, measured via a bootstrapped mean adjusted Rand index. Then a synthetic representative variable is extracted from each cluster through PCAmix [15,23], a generalization of PCA and MCA that can handle mixed datasets. The extracted new variable is the first principal component of PCAmix applied within each group of variables.

2.2. The algorithm

Consider two data matrices of $v_1$ standardized quantitative and $v_2$ qualitative variables on $n$ units, named $X = (x_1, x_2, \ldots, x_i, \ldots, x_{v_1})$ and $Z = (z_1, z_2, \ldots, z_j, \ldots, z_{v_2})$, respectively. ClustOfVar follows the usual steps of hierarchical agglomerative methods: it starts by considering each of the $v = v_1 + v_2$ variables as a cluster on its own.

1. Start with the partition into $v$ clusters, each containing a single variable.
2. Aggregate the two clusters $A$ and $B$ with the smallest dissimilarity, given by $d(A, B) = H(A) + H(B) - H(A \cup B)$. The homogeneity of the $k$-th cluster of variables $C_k \in \{C_1, \ldots, C_K\}$ is calculated from
$$H(C_k) = \sum_{x_i \in C_k} r^2(x_i, y_k) + \sum_{z_j \in C_k} \eta^2(z_j, y_k) = \lambda_1^k,$$
where $y_k$ is the first principal component of PCAmix applied to the standardized variables of $C_k$. The values $r^2$ and $\eta^2$ are, respectively, the squared correlation between $x_i$ and $y_k$ and the correlation ratio between $z_j$ and $y_k$. The element $\lambda_1^k$ is the first eigenvalue of PCAmix applied to $C_k$. Consider a partition $P_K$ of the initial set into $K$ clusters, and define $H(P_K) = \sum_{k=1}^{K} H(C_k) = \lambda_1^1 + \lambda_1^2 + \cdots + \lambda_1^K$. The algorithm operates in such a way that, at step $l$, the partition into $v - l$ clusters maximizes $H(P_K)$ among all the partitions that can be obtained by aggregating two of the clusters of the previous partition into $v - l + 1$ clusters.
3. Stop when all the variables are aggregated into a single cluster.

In order to choose the number $K$ of clusters into which the variables are divided, both the inspection of aggregation levels and a bootstrap method evaluating the stability of the partitions are used. The latter is based on generating $b$ bootstrap samples of the $n$ observations, and the steps of ClustOfVar are repeated on each bootstrapped sample. For each $K$, the partition obtained on each bootstrap sample is compared with the partition of the original data through the adjusted Rand index [22], and the mean of these $b$ indices measures the stability of the partition.
Though relatively recent, the algorithm has been applied in different fields, such as gene expression data [14], cardiac diseases [4], car diagnosis [4], chemistry [45], detection of morphometric features [29], and farming and environment [25]. Chavent et al. [14] also showed that the algorithm can outperform PCA as far as dimension reduction is concerned.

2.3. Clustering of units and CART

The subsequent clustering of units into groups is based on the first components extracted from each cluster of variables. It is performed via Ward's hierarchical agglomerative criterion [44]. Different stopping rules were used to determine the optimal number of clusters of cruisers. Specifically we
compared the trace of the total-sample sum-of-squares and cross-products matrix, and the indices of Scott, Marriott, Friedman and Rubin, Calinski and Harabasz, and Krzanowski and Lai (see [13]).
The analysis of the key variables driving the aggregation of units into clusters was conducted via Classification And Regression Trees (CART [5]). CART is one of the most important and popular nonparametric techniques in machine learning and data mining [39]. Given a partition of units into groups and a set of variables, it estimates hierarchical rules by which the variables predict the groups. Results are plotted as a tree structure, which displays graphically how the variables rule the formation of clusters of units. A main advantage is the easy-to-interpret representation of the groups' drivers, which is useful for marketing and policy indications.
The method operates through binary recursive partitioning. The initial measurement space X, called the 'root node', is split into two descendant subsets, each one having a label that indicates which variable and modality determined the split. Each new subset can either itself be a 'parent node' of other 'nodes' (the 'recursive' nature of the technique) or a 'terminal node' (i.e., 'leaf') that corresponds to a cluster of units. At every stage the subsets are mutually exclusive (hence 'partitioning'). To build a tree, the main tasks involved in this method are the following.

– Select a splitting criterion.
– Decide whether a node should be split further.
– Assign a cluster of units to each terminal node, according to the majority vote criterion.

At each node, the split that maximizes the decrease in impurity (i.e., in the heterogeneity of the nodes obtained by the split) is chosen. Impurity can be measured by the Gini index, entropy or misclassification error. The final tree is obtained by 'pruning': after a nested sequence of subtrees is determined, the least important splits are recursively snipped off based on the complexity parameter, which is chosen via classic cross-validation criteria. Among the various available estimation methods, we used the ones implemented in the rpart package of the software R [41].

3. Dataset and variables

This study is based on a survey conducted by the Uruguayan Ministry of Tourism (MINTURD) during the 2011–2012 cruise season. The original dataset consisted of 3537 records, which were reduced to 2585 due to missing data. Data were collected at the two ports of call of Montevideo and Punta del Este. The sampling strategy used a two-stage stratified approach. In the first stage, cruise vessels were selected randomly, through systematic sampling, from a list of the ships expected to arrive during the season. In the second stage, cruise passengers from travel groups were chosen so as to ensure equal selection probability. Trained interviewers approached cruise passengers before they returned to the ship.
Table 1 reports the list of variables used in this paper, together with the labels used in our analysis. They include the following items.

– Socio-demographic. This category includes classic variables such as gender, age and place of residence. Occupation was also considered as a proxy of wealth levels, since information about income was not available. In addition, we considered the number of past visits to Uruguay. Repeat visits are a widespread phenomenon in the country, as many tourists from neighboring countries tend to return because they own second homes, particularly in Punta del Este [11]. In addition, notable flows of repeat visitors come from Argentina, because of the closeness of its capital city to Montevideo and the related high frequency of land and maritime transport connections [1]. However, as the descriptive statistics of Table 1 also point out, the percentage of cruise passengers who had already visited the country, including through other forms of tourism, is low.
– Trip-related. This category comprises travel party information (size and composition by age and gender), the port of call where the ship stopped, and per capita expenditure in USD (total, and for separate items such as food & beverage, shopping, tour, transportation, other).
– Psychographic. The surveyed psychographic variables reported whether the tourist enjoyed particular aspects of the visit, namely buildings, landscape, hygiene, people, tranquillity, food and beverage, other.

Table 1
Variables: description of labels and descriptive statistics.

Qualitative variables
Label          Description                  %
Gender
gender.m       Male                         37.10
gender.f       Female                       62.90
Age
age.1935       19–35                         5.69
age.3664       36–64                        78.26
age.65m        >64                          16.05
Country of residence
res.bra        Brazil                       41.97
res.arg        Argentina                    36.63
res.oam        America, other               11.80
res.eur        Europe                        7.74
res.oth        Other                         1.86
Loyalty
times.1        First                        65.57
times.23       Second or third              20.08
times.m3       Fourth or more               14.35
Occupation
occ.rethou     Retired, housewife           26.00
occ.own        Entrepreneur or similar      13.97
occ.pro        Professional or similar      33.66
occ.emp        Employee or similar          12.34
occ.cre        Ship's crew                   7.00
occ.oth        Other                         7.04
Port of call
port.mon       Montevideo                   43.60
port.pde       Punta del Este               56.40
Satisfaction
like.bui       Liked buildings              25.79
like.lan       Liked landscape               7.30
like.hyg       Liked hygiene                 9.01
like.peo       Liked people                 31.26
like.tra       Liked tranquillity            4.87
like.foo       Liked food                    4.37

Quantitative variables
Label          Description                  Mean
Expenditure, per capita (USD)
exp.tot.pc     Total                        139.75
exp.fb.pc      Food & beverage               31.45
exp.shop.pc    Shopping                      95.93
exp.tour.pc    Tour                           4.75
exp.trans.pc   Transportations                1.08
exp.oth.pc     Other                          4.76
Travel party
group.tot      Size                           2.43
Nm.less15.p    % males < 15 yrs               2.38
Nf.less15.p    % females < 15 yrs             2.42
Nm.1529.p      % males 15–29 yrs              3.31
Nf.1529.p      % females 15–29 yrs            4.89
Nm.3064.p      % males 30–64 yrs             27.79
Nf.3064.p      % females 30–64 yrs           42.94
Nm.65more.p    % males > 64 yrs               5.57
Nf.65more.p    % females > 64 yrs            10.69

Descriptive statistics from Table 1 highlight that the sample was composed mainly of women (62.9%), and that a large percentage of respondents were middle-aged (36–64, 78.26%). Most passengers were resident in the two neighboring countries (Brazil, 41.97%, and Argentina, 36.63%), and they were mainly first-time visitors (65.57%). A somewhat lower percentage disembarked in Montevideo (43.6%) than in Punta del Este (56.4%). The largest occupational groups were professionals (33.66%), retired persons or housewives (26%) and entrepreneurs (13.97%). The main satisfaction-related aspect was local people (31.26%), followed by the buildings and architecture of the visited places (25.79%) and hygiene (9.01%). The average visitor spent USD 139.75, with shopping (USD 95.93) and food and beverage (USD 31.45) being the main items in terms of average expenditure. Travel parties were composed on average of more than two passengers (2.43) and, consistently with the information about age, the highest average shares were of women aged 30–64 (42.94%), followed by men (27.79%) in the same age bracket.

4. Results and discussion

4.1. Clusters of variables

Fig. 1 reports the dendrogram for the clustering of variables. Aggregation distances (Fig. 2) can be examined in order to choose the cutting threshold of the dendrogram and thus determine the groups. Three and five emerge as two candidate numbers of clusters. This is confirmed by the bootstrapped mean adjusted Rand criterion reported in Fig. 3, which however shows a slightly higher value for K = 5. The higher stability of the five-cluster solution also emerges from the boxplots of Fig. 4, which show a lower dispersion and a higher median value for that K.

Fig. 1. ClustOfVar: dendrogram.
Fig. 2. ClustOfVar: aggregation levels.
Fig. 3. ClustOfVar: stability of the partitions, mean adjusted Rand index from bootstrap.
Fig. 4. ClustOfVar: boxplots of mean adjusted Rand index from bootstrap.

Table 2 displays the composition of each cluster of variables (VClust), together with the factor loadings. The latter correspond to the correlation ratio for qualitative variables and to the squared correlation for quantitative ones. In addition, we also report the correlation between the first principal component of each VClust and the original quantitative and dummy variables.

Table 2
ClustOfVar: composition of the clusters of variables, squared loadings, and correlation between the first principal component from PCAmix and each quantitative or dummy variable.

Variable       Squared loading   Corr.
VClust 1: Economic impact
exp.tot.pc     0.952529442       0.976
exp.shop.pc    0.756128525       0.870
exp.fb.pc      0.208906669       0.457
exp.tour.pc    0.030133165       0.174
exp.trans.pc   0.021897322       0.148
exp.oth.pc     0.006675104       0.082
like.peo       0.000185593       0.014
VClust 2: Loyal Argentineans at Punta del Este
port.mon       0.739120566       0.860
port.pde       0.739120566       0.860
res.arg        0.527357386       0.726
times.1        0.13029228        0.361
res.oam        0.111062122       0.333
res.eur        0.094914868       0.308
occ.cre        0.076499885       0.277
res.bra        0.075708253       0.275
like.foo       0.072251661       0.269
times.23       0.067257714       0.259
group.tot      0.054516384       0.233
like.bui       0.042374137       0.206
times.m3       0.037197127       0.193
res.oth        0.032328529       0.180
Nm.less15.p    0.023459383       0.153
Nf.less15.p    0.019858367       0.141
like.hyg       0.018630443       0.136
like.lan       0.013853278       0.118
like.tra       0.002315793       0.048
VClust 3: Young travellers, also in groups
age.1935       0.8424915         0.918
Nf.1529.p      0.5552205         0.745
Nm.1529.p      0.4332971         0.658
occ.oth        0.1498959         0.387
VClust 4: Women, not travelling with middle-aged men
gender.f       0.9343701         0.967
gender.m       0.9343701         0.967
Nm.3064.p      0.4313739         0.657
VClust 5: Elderly travellers, also in groups
age.65m        0.88127067        0.939
age.3664       0.76363258        0.874
Nf.65more.p    0.67947608        0.824
Nm.65more.p    0.52230948        0.723
occ.rethou     0.44769825        0.669
Nf.3064.p      0.40892876        0.639
occ.pro        0.10415314        0.323
occ.own        0.03329502        0.182
occ.emp        0.02862077        0.169

The examination of the clusters of variables suggests that VClust 1 refers to economic impact. It aggregates all expenditure items, as well as the enjoyment of local people. All reported variables are positively related to the extracted component. This is an interesting indication about the behavior of disembarking cruise tourists, inasmuch as their economic impact in all the items appears to have something in common with their enjoyment of the personal characteristics of locals. This also suggests that economic impact is not correlated with a segment of tourists with other specific characteristics, which is quite a strong indication for market operators.
Age is the other important element for defining cruisers' characteristics, in two VClusts. VClust 3 collects variables of age and group composition that are related to teenagers and young people. In addition, 'other' occupations are also present. The latter reinforces this age-related VClust, since students make up about 30% of this category. Another age-related set of variables is VClust 5, which can be attributed to the segment of elderly travellers (65 years or more). This is confirmed by group composition, by the negative correlation with the middle-aged segment (36–64), and by the positive one with the occupation items of retired and household. Consistently, the cluster reports other occupations in a negative relationship with the extracted first component (professionals, entrepreneurs, employees).
VClust 4 instead reports gender-related characteristics, with a direct correlation with females, together with an inverse correlation with the percentage of males aged 30–64 in the travel party.
VClust 2 is the group that collects the highest number of variables. It is made up of people from Argentina disembarking at Punta del Este. They were not visiting Uruguay for the first time and are not part of the ship's crew. Group size also has a positive
J.G. Brida et al. / Knowledge-Based Systems 70 (2014) 128–136 133

Fig. 5. Clusters of cruise passengers from Ward algorithm: dendrogram.

Fig. 6. CART analysis: decision tree for the five clusters of cruise passengers, original variables. The root reports the number of units belonging to each cluster. The second row
of rectangular boxes report the percentage of units belonging to each cluster that enter a given step, respectively for clusters 1, 2, 3, 4 and 5.

correlation, as well as the percentage of the youngest group mem- 4.2. Clusters of units analysis via CART
bers (<15 years) of both genders, which may indicate also the pres-
ence of families. The first principal component within the cluster is The five principal components of each VClust were used to
positively related to the enjoyment of hygiene and tranquillity. obtain the clusters of units of Fig. 5. Five out of eight of the stop-
Consistently with some of these characteristics, negative correla- ping rules mentioned in Section 2.3 suggested a number of five
tion occurs with disembarking in Montevideo, first time visiting partitions of units. The description of their main discriminating
and living in other places than Argentina. The first principal com- variables via CART adopts two points of view. The first one illus-
ponent shows also negative correlation with those who indicated trates how all ‘raw’ variables used for the ClustOfVar analysis
satisfaction with more material elements such as food, buildings (Table 1) affected the formation of the five clusters of units
and landscape. (Fig. 6). The second one (Fig. 7) shows instead how the synthesis
134 J.G. Brida et al. / Knowledge-Based Systems 70 (2014) 128–136

Fig. 7. CART analysis: decision tree for the five clusters of cruise passengers, first extracted components from clusters of variables. The root reports the number of units
belonging to each cluster. The second row of rectangular boxes report the percentage of units belonging to each cluster that enter a given step, respectively for clusters 1, 2, 3,
4 and 5.

of ClustOfVar, i.e. the first principal components of each VClust, that reports a women-related dimension. Then, male cruisers are
ruled the aggregation in the same five clusters of units. further distinguished according to their economic impact (VClust
Fig. 6 shows that, within the broader set of variables used for 1) into a group entering 2.3% of Cluster 1 (high economic impact),
forming the clusters (see Table 1), only age, gender, port of disem- and another one of non-high impact being part of 93.5% of Cluster
barking and total expenditure concur in discriminating the groups. 1. Women instead are split for their age (VClust 5), with elder ones
Age is the first and most important variable that enters the binary being divided again into younger (part of 51.9% of Cluster 3) and
splitting process of the tree. It divides Cluster 4 (right branch) from not (54.5% of Cluster 4). Younger women are split according to
the remainder. This group reports the age of cruisers of 65 and their economic impact (VClust1) into Cluster 5 (only 5.9% of it)
more as most distinctive feature. The complementary subset (left and 97.9 of Cluster 2.
branch of the tree) is characterized by cruisers of younger age. Overall, results of both the CART analyses highlighted the pri-
For these units, the port of call is a very important further discrim- mary importance of variables such as age and port of disembark-
inating variable. The right sub-branch comprises those who disem- ing, and to a second extent of gender and expenditure. The latter
barked in Montevideo, and it splits further in two parts where age deserves peculiar attention. In both CART diagrams, the segment
acts again as a discriminating factor. Middle-aged cruisers (36–64) of very high spenders is isolated. But if on one side the raw variable
are part of almost the totality of Cluster 5 (91.7%), whereas tourists in Fig. 6 regarded only 4.7% of Cluster 5, on the other side such ele-
of other ages enter 48.9% of Cluster 3. For those who instead disem- ment percentage increased when dealing with a broader concept of
barked in Montevideo, gender concurs to discriminate further ‘economic impact’ as measured by VClust 1 (2.3% plus 5.9%). It is
between men, who enter 89.3% of Cluster 1, and women, who also interesting to note that the model does not apply any other
are further discriminated according to their age. In this further splitting rule within the subsample of the spenders with relatively
step, again Cluster 3 results to be composed by non-middle aged lower levels. This might confirm that, after isolating for very high
tourists, this time for its 35.1%, whereas cruisers aged between spenders, the economic impact of cruisers might be quite wide-
36 and 64 take another step of splitting. The latter involves total spread and somehow similar across the subsets.
expenditure levels and generates 91.3% of Cluster 2 and 4.7% of
Cluster 5, where high spenders affect such small portion of the
latter. 5. Conclusions
The interpretation of clusters’ main discriminating variables is
less direct if we take into account the latent dimensions that syn- Despite the remarkable growth of the cruise industry and its
thesize each VClust – Fig. 7. Of course, the higher complexity of this great changes, studies on cruise passenger’s behavior while at port
second tree structure might be due to the fact that the latent are not common in the academic research. In this sense, the pres-
dimensions of VClust express concepts that are themselves more ent paper contributed both to the cruise industry literature, and to
complex than the single indicators of Fig. 6. Here VClust 2, that is provide some insights to destination stakeholders, who can use the
the highly related variable to Argentineans disembarking at Punta results as an input to design suitable managerial strategies and to
del Este, is the most important discriminating feature. Those that offer a pleasant and satisfactory visitation during the time ship
present the lowest values of this variable are part of the right side passengers are onshore.
branch, which is split further by age-related variables (VClust 5 for In this work we adopted a three-step multivariate market seg-
elderly, and then VClust 3 for young travellers). Then what mentation analysis. Given the classic problem in reducing dimen-
reported for Fig. 6 appears again: elderly travellers enter 43.9% of sionality we used ClustOfVar, a relatively recent algorithm for
Cluster 4, whereas young ones are part of 42% of Cluster 3, and the statistical software R that is able to handle qualitative-quanti-
middle aged enter 89.3% of Cluster 5. Cruisers in the left-side tative datasets. There are few studies designed to handle mixed
branch generated by VClust 2, which is those that show a high datasets in the tourism field and, particularly to best of our knowl-
score for this variable, are further divided according to VClust 4 edge, in the cruise tourism analysis. Key findings reported the
J.G. Brida et al. / Knowledge-Based Systems 70 (2014) 128–136 135

importance of cruisers' onshore expenditure in general terms, suggesting that cruise tourists find good shopping opportunities at Uruguayan ports. In addition, economic impact in all the items appears to be associated with the enjoyment of the personal characteristics of local people. Thus, it would be desirable to create a good atmosphere for cruisers while at port, through positive experiences derived from interaction with locals. In this regard, synergic cooperation with local partners may also lead to effects that go beyond the single cruise trip, such as increasing customer loyalty [38] and the development of local firms [27].

As to the managerial implications of the other findings, we suggest launching targeted market strategies and campaigns focused on attracting passengers of nationalities other than Argentinean. As the results indicate, residence in Argentina falls in the same group as, and is positively correlated with, the loyalty indicators. Promoting cruise tourism can therefore be a good strategy to increase the volume of the segment of travellers who have never visited the country.

In addition, variables such as age and port of disembarkation, and to a lesser extent gender and economic impact, proved to be the main drivers of cluster formation. This strongly suggests organizing specific on-site activities for cruisers of different ages. Moreover, the specific features of each place of disembarkation should be highlighted when promoting activities at the port of call, and should at the same time be an important part of the tourist's experience. Once more, all this reinforces the idea of creating or strengthening relationships with local partners. As for economic impact, at Punta del Este a small number of tourists, mainly women, exhibited very high levels of spending. After isolating these, however, no further thresholds of expenditure or economic impact were useful for identifying segments of tourists. This might suggest that profitability is spread quite widely across the great majority of disembarking tourists.

The dataset comes from an official survey by the Uruguayan Ministry of Tourism (MINTURD) covering a large sample of cruisers, which is clearly a strong point. However, the main limitation of this study lies in the dataset itself. Important variables such as income, and behavior-related variables while at the port, are absent. In addition, few psychographic variables are available. Future research should include these key variables, which would allow a better understanding of the attitudes and behaviors that are crucial for market operators and promoters of activities related to cruise ship visits. Future research could also apply the technique to surveys of cruisers visiting other places and at different times, in order to better define the profile of this growing segment of the tourism industry. More generally, combining information from clusters of variables and clusters of units can support market segmentation and analysis in a wide variety of business contexts.

Acknowledgements

This paper was supported by the Free University of Bozen-Bolzano project “Tourism and Economic Growth: the Role of Transportations and Spatial Contiguity”.

References

[1] A. Abbruzzo, J.G. Brida, R. Scuderi, Determinants of tourist expenditure as a network: empirical findings from Uruguay, Tourism Manage. 43 (2014) 36–45.
[2] K. Andriotis, G. Agiomirgianakis, Cruise visitors’ experience in a Mediterranean port of call, Int. J. Tourism Res. 12 (2010) 390–404.
[3] J. Bloom, Tourist market segmentation with linear and non-linear techniques, Tourism Manage. 25 (2004) 723–733.
[4] H. Bouhamed, T. Lecroq, A. Rebaï, New filter method for categorical variables’ selection, Int. J. Comput. Sci. Issues 9 (2012) 1–10.
[5] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth, Belmont, Calif, 1984.
[6] J.G. Brida, M. Disegna, L. Osti, Segmenting visitors of cultural events by motivation: a sequential non-linear clustering analysis of Italian Christmas Market visitors, Expert Syst. Appl. 39 (2012) 11349–11356.
[7] J.G. Brida, M. Disegna, R. Scuderi, Visitors of two types of museums: a segmentation study, Expert Syst. Appl. 40 (2013) 2224–2232.
[8] J.G. Brida, V. Fasone, R. Scuderi, S. Zapata-Aguirre, Exploring the determinants of cruise passengers’ expenditure while at ports of call in Uruguay, Tourism Econ. (2013), http://dx.doi.org/10.5367/te.2013.0322 (online Fast Track article).
[9] J.G. Brida, M. Pulina, E. Riaño, S. Zapata-Aguirre, Cruise passenger’s experience embarking in a Caribbean Homeport, Ocean Coast. Manag. 55 (2012) 135–145.
[10] J.G. Brida, M. Pulina, E. Riaño, S. Zapata-Aguirre, Cruise passengers in a homeport: a market analysis, Tourism Geographies: Int. J. Tourism Space Place Environ. 15 (2013) 68–87.
[11] J.G. Brida, J. Pereyra, R. Scuderi, Repeat tourism in Uruguay: modelling truncated distributions of count data, Qual. Quant. 48 (2014) 475–491.
[12] J.G. Brida, R. Scuderi, M.N. Seijas, Segmenting cruise passenger visiting Uruguay: a Factor-Cluster Analysis, Int. J. Tourism Res. 16 (2014) 209–222.
[13] M. Charrad, N. Ghazzali, V. Boiteau, A. Niknafs, NbClust: An Examination of Indices for Determining the Number of Clusters: NbClust Package. R package version 1.4, 2013. <http://CRAN.R-project.org/package=NbClust>.
[14] M. Chavent, R. Genuer, V. Kuentz-Simonet, B. Liquet, J. Saracco, ClustOfVar: an R package for dimension reduction via clustering of variables. Application in Supervised Classification and Variable Selection in Gene Expressions Data. Poster Presented at the Conference ‘Statistical Methods for (post)-Genomics Data (SMPGD 2013)’, 2013. <http://www.math.u-bordeaux1.fr/~machaven/wordpress/wp-content/uploads/2012/12/poster-SMPGD.pdf>.
[15] M. Chavent, V. Kuentz, B. Liquet, J. Saracco, ClustOfVar: an R package for the clustering of variables, J. Stat. Softw. 50 (2012) 1–16.
[16] M. Chavent, V. Kuentz, J. Saracco, Orthogonal rotation in PCAMIX, Adv. Data Anal. Classif. 6 (2012) 131–146.
[17] CLIA-Cruise Lines International Association, 2012. Cruise Industry Update. Retrieved from <http://www.cruising.org/sites/default/files/pressroom/2012CruiseIndustryUpdateFinal.pdf> (accessed 27.03.13).
[18] S. Dolnicar, F. Leisch, Segmenting markets by bagged clustering, Australasian Market. J. 12 (2004) 51–65.
[19] S. Dolnicar, S. Kaiser, K. Lazarevski, F. Leisch, Biclustering: overcoming data dimensionality problems in market segmentation, J. Travel Res. 51 (2012) 41–49.
[20] FCCA-Florida-Caribbean Cruise Association, 2012. Cruise Industry Overview. Retrieved from <http://www.f-cca.com/downloads/2013-cruise-industry-overview.pdf> (accessed 20.04.13).
[21] Y. Hayashi, M.H. Hsieh, R. Setiono, Understanding consumer heterogeneity: a business intelligence application of neural networks, Knowl.-Based Syst. 23 (2010) 856–863.
[22] L. Hubert, P. Arabie, Comparing partitions, J. Classif. (1985) 193–208.
[23] H.A.L. Kiers, Simple structure in Component Analysis Techniques for mixtures of qualitative and quantitative variables, Psychometrika 56 (1991) 197–212.
[24] J. Kim, S. Wei, H. Ruys, Segmenting the market of Western Australia senior tourists using artificial neural networks, Tourism Manage. 24 (2003) 25–34.
[25] V. Kuentz-Simonet, S. Lyser, J. Candau, P. Deuffic, M. Chavent, J. Saracco, Une approche par classification de variables pour la typologie d’observations: le cas d’une enquête agriculture et environnement [A variable clustering approach to the typology of observations: the case of an agriculture and environment survey], Journal de la Société Française de Statistique 154 (2013) 37–63.
[26] R.J. Kuo, K. Akbaria, B. Subroto, Application of particle swarm optimization and perceptual map to tourist market segmentation, Expert Syst. Appl. 39 (2012) 8726–8735.
[27] C. Lechner, M. Dowling, I. Welpe, Firm networks and firm development: the role of the relational mix, J. Bus. Ventur. 21 (2006) 514–540.
[28] S. Lee, C. Ramdeen, Cruise ship itineraries and occupancy rates, Tourism Manage. 34 (2013) 236–237.
[29] D. Legland, J. Beaugrand, Automated clustering of lignocellulosic fibres based on morphometric features and using clustering of variables, Ind. Crops Prod. 45 (2013) 253–261.
[30] F. Leisch, A toolbox for K-centroids cluster analysis, Comput. Stat. Data Anal. 51 (2006) 526–544.
[31] S.H. Liao, P.H. Chu, P.Y. Hsiao, Data mining techniques and applications – a decade review from 2000 to 2011, Expert Syst. Appl. 39 (2012) 11303–11311.
[32] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press Inc., London, 1979.
[33] J.A. Mazanec, Classifying tourists into market segments: a neural network approach, J. Travel Tourism Mark. 1 (1992) 39–59.
[34] MINTURD-Ministerio de Turismo y Deporte, 2012. Database Cruise Survey. <http://www.turismo.gub.uy/estadisticas/item/385-bases-de-datos> (accessed 15.04.13).
[35] C. Mok, T.J. Iverson, Expenditure-based segmentation: Taiwanese tourists to Guam, Tourism Manage. 21 (2000) 299–305.
[36] T. Peisley, The cruise ship industry to the 21st century, Travel Tourism Analyst 2 (1995) 4–25, http://dx.doi.org/10.1057/rpm.2009.55.
[37] R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. <http://www.R-project.org/> (accessed 15.04.13).
[38] M. Rodríguez-Díaz, T.F. Espino-Rodríguez, A model of strategic evaluation of a tourism destination based on internal and relational capabilities, J. Travel Res. 46 (2008) 368–380.
[39] L. Rutkowski, M. Jaworski, L. Pietruczuk, P. Duda, The CART decision tree for mining data streams, Inf. Sci. 266 (2014) 1–15.
[40] G. Sánchez-Hernández, F. Chiclana, N. Agell, J.C. Aguado, Ranking and selection of unsupervised learning marketing segmentation, Knowl.-Based Syst. 44 (2013) 20–33.
[41] T. Therneau, B. Atkinson, B. Ripley, rpart: Recursive Partitioning and Regression Trees. R package version 4.1-5, 2014. <http://CRAN.R-project.org/package=rpart>.
[42] B. Thurau, A. Carver, J. Mangun, C. Basman, G. Bauer, A market segmentation analysis of cruise ship tourists visiting the Panama Canal Watershed: opportunities for ecotourism development, J. Ecotourism 6 (2007) 1–18.
[43] UNWTO, Cruise Tourism: Current Situation and Trends, first ed., Madrid, 2010.
[44] J.H. Ward, Hierarchical grouping to optimize an objective function, J. Am. Stat. Assoc. 58 (1963) 236–244.
[45] M. Xie, F. Deng, X. Zhang, Y. Tian, P. Li, H. Zhai, MLRMPA: an R package of multiple linear regression model population analysis based on a cluster sampling technique for variable selection of high dimensional data, Chemom. Intell. Lab. Syst. 132 (2014) 124–132.

You might also like