Using Correspondence Analysis To Monitor The Persona Segmentation Process


Lieve Laporte, Karin Slegers, Dirk De Grooff
K.U.Leuven – CUO – IBBT
Parkstraat 45 / 3605, 3000 Leuven, Belgium
lieve.laporte@soc.kuleuven.be, karin.slegers@soc.kuleuven.be, dirk.degrooff@soc.kuleuven.be

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NordiCHI '12, October 14-17, 2012 Copenhagen, Denmark
Copyright © 2012 ACM 978-1-4503-1482-4/12/10... $15.00

ABSTRACT
Persona segmentation is the first phase of the persona method. It can be defined as the process of creating representative groups of similar users. Since the origin of the persona technique, both qualitative and quantitative methods have been used to create persona segments. While the qualitative approach has been criticized because of its lack of accuracy in creating persona segments, application of quantitative methods seems to be suffering from the same problem, due to inconsiderate application of statistical techniques. In this paper, we present Correspondence Analysis, an exploratory data technique, as an alternative quantitative persona segmentation method. We demonstrate that this method is appropriate to create useful persona profiles, and, additionally, it can aid in carefully monitoring the segmentation process.

Author Keywords
Personas; persona segmentation; correspondence analysis.

ACM Classification Keywords
H.5.2. Information interfaces and presentation: User Interface: Theory and Methods.

INTRODUCTION
As part of his goal-directed design methodology, Cooper developed the persona-technique [9]. The notion persona refers to the description of a hypothetical, archetypical user [3]. It is a fictional character, made up of a demographic profile, ‘psychographics’ such as goals, attitudes and behaviors, and a set of needs in relation to the technological design at hand. These characteristics are generally based upon real user data, to make the persona a representation of a unique group of people who share common goals, attitudes and behaviours. Personas focus attention on aspects of technology design and use that other methods do not [26], and, like that, they can guide designers and developers during product development.

Two separate phases in the persona creation process are always included: making persona segmentations (i.e. creating groups of similar users), and writing the persona narratives (i.e. writing a detailed description of the persona’s behaviours, attitudes, and personal data). Qualitative and/or quantitative methods can be applied to the implementation of both phases. Traditionally, and according to Cooper’s goal-directed design method [9], personas are created based on rich, qualitative data coming from interviews and observations. Recently, however, this approach has been criticized because of its lack of rigor: developing personas based on a set of interviews with a (very) limited number of users can hardly represent the breadth of real users [25], with the ultimate risk of diminishing the people that should be at the centre of the design process [11]. As a reaction to this criticism, a trend towards the use of quantitative data in combination with statistical techniques to create persona segmentations is now observable. Several multivariate techniques, such as factor analysis and cluster analysis have been applied to the domain of persona segmentation (e.g. [22][4]). This approach, however, has equally been subject to criticism, mainly related to the use of surveys to gather quantitative user data, and to the overconfident use of statistical techniques to analyze these data [28].

Consensus has clearly not been reached yet about this matter. This paper introduces an exploratory data technique, Correspondence Analysis (CA), as an alternative quantitative method to create persona segmentations. In the context of a case study, we will demonstrate and illustrate the appropriateness of such a quantitative approach to analyze representative sets of rich categorical data and thus create useful persona segmentations. In addition, we will show that CA has a set of properties that makes it especially useful to closely monitor the segmentation process, and, like that, allows avoiding overconfident or thoughtless use of statistical techniques.

In this paper, we will explicitly focus on the segmentation process; the second phase, interpreting the segments and writing the persona narratives, will not be considered. In the next section of this paper, we will present an overview of traditional persona creation methods. Next, we will describe CA as a multivariate, exploratory statistical technique, and demonstrate its use in a case study creating persona segmentations for the design of an intelligent adaptive
learning system for children with reading comprehension difficulties. The paper concludes with a qualitative analysis of the technique.

PERSONA CREATION METHODS
Qualitative personas
Originally, purely qualitative methods were applied during the entire process of creating personas: making the segmentations as well as writing the narrative part of the persona. A typical and recommended approach within qualitative persona creation would be to conduct one-to-one interviews, or ethnographic user studies, combining interviews with observation. After data collection, qualitative analysis techniques are applied to make segmentations in the data. As a general procedure, the most important behavioral variables or themes, according to the project objectives, are determined [27]. Next, each interviewee is mapped against the appropriate set of variables. In that way, a segmentation of the data into separate groupings based on these behavioural patterns is created, each presenting a different persona profile. This segmentation process is mostly done with the use of manual techniques, such as affinity diagrams, card sorting exercises and expert panels (e.g. [5] [21]). Besides these commonly used and typical methods, a few very specific methods have been suggested: Miaskiewicz uses Latent Semantic Analysis, a qualitative semi-automatic method for extracting and representing the contextual-usage meaning of words in a large corpus of text to make persona clusters [24]. Faily & Fléchais make use of a qualitative data analysis technique based on Grounded Theory to create personas whose characteristics are, according to the authors, both grounded in and traceable to their originating source of empirical data [14]. Usually, the persona segmentations created with these qualitative methods are considered as the end result. But if enough time and resources are available, the segmentations can be viewed as hypotheses, and a quantitative method, such as a statistical analysis based on survey data, can be used to validate them. In each case, after the segmentation process, the persona description is assembled around the segments. For each of these, details based on the data are added, and personas are described in narrative form [15][1].

Quantitative personas
Data collection in a quantitative research design is usually done with user surveys, site traffic analysis or CRM data analysis [25]; data analysis typically makes use of statistics to make segmentations. Several statistical techniques to segment persona data have been studied. In an empirical study using data from an online survey and server queries of system usage [4], Brickey compared three multivariate methods – Factor Analysis (FA), Principal Component Analysis (PCA) and Multivariate Cluster Analysis (MVCA) – and a manual persona clustering technique. The degree of agreement between each of the methods and the manual clustering method was found to be moderate for FA/PCA and poor for MVCA [4]. Greaney et al. used PCA to uncover patterns of responses from a mobile user questionnaire, leading to personas that would, according to the authors, certainly not have emerged from manually sifting through the data [16].

Persona segmentations that have been created based on quantitative data and with the use of quantitative (often statistical and automatic) techniques often lack some ‘body’ and they are therefore often enriched with the use of additional qualitative data. Khalayi et al., for example, describe a framework to create personas based on a pool of quantitative data coming from market research and a customer data warehouse, at first, and then supplement and enrich them with additional qualitative data according to the product concept at hand [20]. In another study, Tu et al. used a 45-dimension survey as the basis for Multivariate Cluster Analysis [32]. Using MVCA to segment and profile the data and qualitative methods to enrich the persona profiles, they found that resulting personas were more representative and less ambiguous. Similarly, Sinha made use of PCA to identify underlying factors in the results of a survey questioning 32 dimensions of the restaurant experience. Results of the data analysis were given to two information architects, who used the factor loadings as the core of the personas. Other information helped them to flesh out the demographic characteristics of the personas. In addition, they felt the need to do more user research, such as interviews, to finish the creation of the personas [29]. According to the author, the proposed technique enhances the accuracy of the persona creation, while working in a complementary way with other qualitative methods. McGinn et al. created a survey based on stakeholder input, containing 18 questions that gathered both demographic and behavioral data. They conducted exploratory FA to extract a set of factor-driven groups, and resulting personas. Afterwards, people from each of the groups were interviewed in order to collect information that could not easily be analyzed by statistical methods (i.e. questions concerning people’s motivations, attitudes, goals etc.). Besides adding this type of ‘richer’ data to the factor groups (and personas), these interviews turned out to be a useful validation technique for the survey design [22].

Criticism
Both qualitative and quantitative methods have been subject to a range of criticisms in persona literature. To start with, the validity of the persona method in itself has not been researched extensively [7]. Faily describes the persona concept as a paradox: “A behavioural specification of sorts, their most endearing characteristic is that they do not look like specifications: they have names, jobs, feelings, and goals they want to fulfill, sometimes irrespective of the purpose a system was designed for” [14].

Opponents of the explicit focus on qualitative data claim that personas are often created based on insights on a few users at the expense of a broader understanding of users through the use of quantitative data (e.g. [29]). Moreover,
as the amount of (textual) data coming from interviews grows, it becomes difficult for human experts to make objective judgments and trace their findings back to user data [27]. Different creators might end up making totally different and too subjective personas [32]. Another criticism of the qualitative approach states that existing assumptions about users don’t tend to be questioned during the qualitative analysis process: “people find what they are looking for” [25]. Finally, interviews are often judged as an inefficient and time-consuming way of finding representative users [30].

Criticism of quantitative methods is often related to the fact that most of them originally come from market research. Although marketing and user experience research effectively have shared interests (e.g. understanding the psychology of the user/purchaser in order to deliver a valued user experience), important differences in practice traditions remain. In particular, market research places much greater emphasis on surveys and tends to have more confidence in the idea that self-report predicts behavior [28]. Siegel describes a case study in which the profiles of 8 clusters of respondents based on survey data were used as initial personas [28]. New participants were then recruited based on a screening tool that was developed using a slightly adapted version of the original classification tool. When gathering additional rich qualitative data from these new users to support the persona development for each segment, it was found that people could easily be grouped into 2-4 subgroups within each segment, and many of these subgroups together across segments produced more coherent groups than grouping them by their assigned segments. Based on these results, the author warns against accepting statistical segmentations of survey data as ‘the truth’ instead of viewing them as preliminary hypotheses to be tested with behavioural research. The essence of his criticism is that, when doing quantitative research, basic properties of statistical methods (e.g. the differences between treatment of statistical groupings and individual measurements) and principles such as the need to evaluate reliability and the need for external validation need to be respected [28]; otherwise segmentations will not reflect the reality of the users. Other criticisms relate to the design and use of marketing surveys to create persona segmentations, placing a lot of emphasis on demographic variables to form clusters, with questions focusing on the like/dislike of a product concept instead of the specifics of the interaction of the user with the product [30].

From the criticisms of the qualitative approach to persona segmentation, it becomes clear that persona creation could benefit from a more objective approach, based on a representative, sufficiently large sample of users. Quantitative methods are needed to accomplish this goal. The criticisms of the quantitative methods, however, show that, at least so far, these segmentation methods have not been applied in a well-considered way. Indeed, all studies described here [4, 16, 20, 22, 29, 32] use multivariate techniques on survey data to create persona segmentations, without monitoring the statistical analysis process. Both survey and statistics seem to be treated as a black box, putting in data, and taking the output for granted. Hardly any analysis of the segmentation process is done, and the result is taken ‘as is’. Nevertheless, whether using qualitative or quantitative methods to create personas, segmentation is the critical starting point of the persona creation process. If the core way the users are defined is not clear, useful and correct, the personas will not be either. And if design decisions are based on these wrong segments or user representations, the product to be developed might end up for the wrong people. Enough attention should therefore be given to the accuracy and validity of persona segmentation.

This paper argues that Correspondence Analysis (CA), an exploratory data technique to analyze categorical data, can offer added value to persona segmentation. It will be demonstrated as a useful method to make hypothetical user profiles by identifying patterns and critical attributes in the user data. Moreover, we will show that CA has several properties that will aid in carefully monitoring and validating the process of creating persona segmentations.

CORRESPONDENCE ANALYSIS
CA is an exploratory data technique, which means that it explores data for which no specific hypotheses have been formed [10]. Its main goal is to reveal structure and patterns in a complex data matrix, by replacing the raw data with a simpler data matrix without losing any essential information [8]. More specifically, CA analyzes two-way tables containing some measure of correspondence between the rows and columns (e.g. frequency counts) [23]. In addition, CA makes it possible to represent the relationship between the variables and categories of variables visually, as points within a (typically two-dimensional) space, which makes interpretation easier [8]. For example, categories that are similar to each other appear close to each other in the plots, and row and column points are shown in the same graphical display. In this way, it is easy to see which categories of a variable are similar to each other or which categories of different variables are associated with each other [23]. In other words, CA tries to explain the most variance in the model with the least number of dimensions (dimensions in the context of CA are somewhat comparable to principal components in principal component analysis).

CA has been a popular method in some areas of the social sciences, such as marketing, ecology and sociology, where it is used to transform complex data tables into straightforward graphical displays of data patterns [18]. To the best of our knowledge, it has not been used in HCI research yet, although it has been suggested as a suitable method to analyze patterns of player behavior for the creation of play-personas [6]. Similarly, we argue that CA can be suitably applied to create persona segmentations. First, CA can transform a large set of complex data into a
simpler display of variables, while at the same time preserving all of the valuable information in the data set. This is also the case with other multivariate techniques such as factor analysis, principal component analysis and cluster analysis, which have been used to make persona segmentations. Second, while these other techniques assume their input data to be at least at the ordinal level, CA has very flexible data requirements [18]. For example, when a Likert scale is used to collect data, the spaces between descriptors (i.e. ‘almost never’, ‘sometimes’, and ‘often’) are not necessarily equivalent (i.e. the distance between ‘almost never’ and ‘sometimes’ is not necessarily the same as the distance between ‘sometimes’ and ‘often’). In this type of data set, CA is a useful technique because it focuses mainly on how the variables correspond to one another and not whether there is a significant difference between these variables [13]. And indeed, categorical data are often easier and less time consuming to collect in HCI research. For example, it is less complex to ask children whom they like to play computer games with compared to asking them to describe the degree to which they like to play games with a particular person. In addition, no distributional assumptions are necessary, unlike classical techniques involving inference to population parameters. The only assumption required of the data is that the values not be negative [8]. Third, CA demonstrates how variables are associated, and not simply that they are associated (as is the case in cluster analysis) [10] or that they explain a certain amount of variance in the data set (as factor analysis and principal component analysis do) [13]. That way, relationships that cannot be identified using other multivariate techniques can be revealed.

Since CA typically analyzes the associations between two variables, an extension of the technique, Multiple Correspondence Analysis (MCA), will be needed here to analyze the associations between more than two variables. MCA is a simple CA carried out on a matrix with cases as rows and categories of variables as columns [10]. As in CA, the analysis will be most successful when the variables partition the cases into clusters with the same or similar categories [23].

CASE STUDY: CREATING CHILD-BASED PERSONAS FOR AN ADAPTIVE READING COMPREHENSION TOOL
CA was used for the creation of persona segmentations in the context of a project to design and develop an intelligent adaptive learning system for hearing and deaf children with reading comprehension difficulties. The system will consist of smart games, asking children to draw inferences about temporal events of stories [31]. During the user requirements collection phase of this project, two field studies were conducted. The objective of the first field study was to get a general idea of the target group. Therefore, children were observed when performing their tasks in the classroom, and their teachers were interviewed about their pupils. The information gathered during this first field study was used as an input for drawing up the research materials for the second study. One of the main objectives of the second field study was to create child-personas, to be used throughout the entire design process in the project.

Data collection
The study was conducted in two British schools, both of which have a deaf children’s unit and a mainstream unit. A total of 114 children from four classes (one per year) of each unit were visited in their classroom. Several assignments were done with all children in all classes. Table 1 presents an overview of participating (hearing and deaf) children, per age and gender.

Class    Hearing/deaf   Nb of boys   Nb of girls   Total
Year 3   Hearing        25           16            41
         Deaf           6            7             13
Year 4   Hearing        22           21            43
         Deaf           7            4             11
Year 5   Hearing        15           29            44
         Deaf           7            5             12
Year 6   Hearing        25           24            49
         Deaf           5            8             13
Table 1. Children participating in assignments.

The purpose of the study was to collect data on children’s attitudes, behaviour and goals towards their technology use (i.e. computer, games) and towards reading. To address these domain-specific questions, and taking into account our methodological requirements, namely the collection of categorical data from a relatively large sample of children, a number of specific paper assignments were presented to each child in the classroom. These assignments were centered on the following themes: mobile phone use, computer use, TV use, leisure activities, school activities, games, reading, homework, and interaction with parents. About each theme, the assignments included a number of simple open-ended questions, referring to the context of use of the device or activity at hand. Examples of questions related to the use of a computer were: why do you use a computer – with whom do you use a computer – what do you use the computer for, etc. Assignments were designed in a child-friendly way, showing thought bubbles grouped around a coloured picture of the device at hand (shown in Figure 1) (according to the guidelines from [2]). The idea was to ask the children survey-like questions, but in a playful way. A total of 9 assignments were presented to each child. Each assignment took about 5 minutes to complete, enabling us to complete all assignments within one class period. Besides the assignments, a few demographic/psychographic characteristics from the children were recorded: age, gender, and whether a child was deaf or hearing. The method described above allowed us to collect quantitative data from a reasonable number of participants, while at the same time creating a feel for the targeted user group by observing the children in the classroom and talking with them and their teachers while they were doing the assignments.
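As a rough sketch of the MCA described in the Correspondence Analysis section (a simple CA carried out on a matrix with cases as rows and categories of variables as columns), the following Python code one-hot codes a small, made-up set of answers and applies the standard SVD formulation of CA. The toy answers and the function names `indicator_matrix` and `mca` are assumptions introduced for illustration only; the analysis reported in this paper was run in SPSS, not with this code.

```python
import numpy as np

# Hypothetical toy answers: one row per child, one column per variable.
variables = ["booksWith", "tvWith", "computerWhy"]
cases = [
    ["alone", "parents", "fun"],
    ["alone", "parents", "info"],
    ["NA", "alone", "fun"],
    ["NA", "alone", "NA"],
    ["brotherSister", "many", "fun"],
    ["brotherSister", "many", "info"],
]


def indicator_matrix(cases, variables):
    """Return a cases x categories 0/1 matrix and its (variable, category) labels."""
    labels = sorted({(v, row[q]) for row in cases for q, v in enumerate(variables)})
    Z = np.array([[1.0 if row[variables.index(v)] == cat else 0.0
                   for v, cat in labels] for row in cases])
    return Z, labels


def mca(Z, n_dims=2):
    """Simple CA of the indicator matrix via SVD of the standardized residuals."""
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row (case) masses
    c = P.sum(axis=0)                    # column (category) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, singular_values, _ = np.linalg.svd(S, full_matrices=False)
    inertia = singular_values[:n_dims] ** 2          # principal inertia per dimension
    scores = U[:, :n_dims] / np.sqrt(r)[:, None]     # standardized object scores
    return scores, inertia


Z, labels = indicator_matrix(cases, variables)
scores, inertia = mca(Z)
print(np.round(inertia, 3))   # inertia of the first two dimensions
print(np.round(scores, 2))    # one row of object scores per case
```

The object scores produced this way have mean 0 and variance 1 on each dimension, so a case far from the origin deviates strongly from the average response pattern. The sign of each dimension is arbitrary, as in any SVD-based method.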
Figure 1. Example of an assignment: children were asked 6 open-ended questions about their computer use.

Data analysis
Before starting the data analysis, the raw data coming from the assignments were prepared in order to meet the input requirements for an MCA. The assignment questions were open-ended, and the answers were in a short, textual format. All these answers were put into categories (similar answers were placed into the same category), resulting in the data set structure described in Table 2. The SPSS input data file consisted of 114 rows, one for each case (representing one child’s data on all assignments), and with 20 columns representing the variables. Each cell in the data matrix thus contained the category score for a particular variable.

Variable: categories
BooksWith: alone, aloneParents, parents, brotherSister, friends, many
BooksWhere: livingroom, bedroom, other
BooksWhy: bored, fun, haveTo, learning, other
BooksWhen: morning, afternoon, evening, weekend
BooksHowOften: everyday, sometimes, rarely
PhoneWhat: calling, texting, callingTexting, games, everything
PhoneWhere: home, out
PhoneWhy: contact, emergency, fun, other
PhoneHowOften: often, notOften
TvWith: alone, aloneBrotherSister, aloneParents, parents, brotherSister, many
TvWhere: livingroom, bedroom, other
TvWhy: fun, bored, interesting, other
TvWhen: morning, afternoon, evening, weekend
TvHowOften: everyday, sometimes, allTheTime
ComputerWhat: games, internet, communication, info, everything
ComputerWhere: bedroom, livingroom, other
ComputerWhen: morning, afternoon, evening, weekend
ComputerWhy: bored, fun, info, communication, other
ComputerHowOften: everyday, sometimes, rarely, allTheTime
ComputerWith: alone, aloneParents, parents, brotherSister, many
Table 2. Variables in the dataset, with their category labels. Every variable also included the category ‘NA’, meaning that the device/attribute at hand was not used.

In the next sections, the entire procedure to arrive at the persona segmentation is described. This segmentation is based on a 3-dimensional solution of an MCA; the essential elements of this solution are presented, both in the text and in a set of plots. For reasons of clarity and conciseness, however, (a) only a subset of variables (presented in Table 2) was included in the analysis; (b) only two-dimensional plots are visually presented; and (c) the procedure is demonstrated with a few example variables from the data set instead of presenting separate results for each variable.

Model summary
Table 3 shows the summary of our MCA solution, with three dimensions explaining the model. Almost 75% of the variance in the data is accounted for by this solution. Each dimension in the table is listed according to the amount of variance explained in the model. Dimension 1 explains the most variance in the model (31%), followed by dimension 2 (25%) and 3 (18%). The internal consistency of the three dimensions was very good with Cronbach's Alpha coefficients (a measure of reliability) of .884, .843 and .765 for dimensions 1 to 3 respectively. We chose a three-dimensional solution, because the amount of additional variance accounted for by adding a fourth dimension to the model was negligible, and lower-dimensional solutions are easier to interpret (see further).

                               Variance Accounted For
Dimension   Cronbach's Alpha   Inertia   % of Variance
1           .884               .312      31.203
2           .843               .251      25.148
3           .765               .183      18.272
Total                          .746
Table 3. Model summary.

Object scores
Figure 2 presents the scores of each case (according to the terminology of MCA, these are called objects here) on the first two dimensions. Examining the plot, we can distinguish 5 clusters of objects, indicated with different colours (objects indicated in grey do not really seem to be part of a cluster). The first dimension discriminates the blue, red and orange cluster (on one end of the horizontal axis) from the yellow and green cluster (on the other end of the horizontal axis). It also separates the blue cluster from all the others. The second dimension seems to separate the red and yellow cluster from the orange and green cluster, with the blue cluster going in between them. The blue cluster lies much farther from the origin than the other objects, suggesting that, taken as a whole, many of the characteristics of these objects (or cases) are not shared by the other objects, since the distance from an object to the origin reflects variation from the average response pattern. Taking into account the third dimension in our solution reveals that the orange and green clusters (scoring moderately low on the second dimension) both have moderately negative (to neutral) scores on the third dimension. The yellow cluster has very variable scores on the third dimension, ranging from very low to very high; the same goes for the blue cluster. Within the red cluster, two subgroups can be distinguished, one of them scoring very low on the third dimension, the other one scoring very high. In the next section, discrimination measures of the variables in the model will be discussed. Afterwards, category quantifications (i.e. how the categories are spread along the dimensions) are described. These measures will then lead us to a detailed interpretation of the object clusters and the dimensions.
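The figures in Table 3 can be cross-checked with a little arithmetic. The sketch below assumes two conventions that are not stated in the text: that the inertia of a dimension is its eigenvalue divided by the number of variables (20 here), and that Cronbach's alpha is derived from that eigenvalue as alpha = m / (m - 1) * (1 - 1 / eigenvalue). Under these assumptions the reported alphas of .884, .843 and .765 are reproduced to three decimals. The function name `alpha_from_inertia` is hypothetical.

```python
M_VARIABLES = 20  # number of variables (columns) in the SPSS input file

# Percentage of variance per dimension, as reported in Table 3.
pct_variance = {1: 31.203, 2: 25.148, 3: 18.272}


def alpha_from_inertia(inertia, m=M_VARIABLES):
    """Cronbach's alpha from a dimension's inertia (assumed eigenvalue = m * inertia)."""
    eigenvalue = m * inertia
    return m / (m - 1) * (1 - 1 / eigenvalue)


alphas = {dim: alpha_from_inertia(pct / 100) for dim, pct in pct_variance.items()}
total_pct = sum(pct_variance.values())

for dim, alpha in alphas.items():
    print(f"dimension {dim}: alpha = {alpha:.3f}")
print(f"total variance accounted for: {total_pct:.3f}%")
```

The total of the three percentages is 74.623, which matches the "almost 75%" quoted in the model summary and the total inertia of .746.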
Figure 2. Object scores on dimensions 1 and 2.

Figure 3. Discrimination measures on dimensions 1 and 2.

Discrimination measures
For each variable, a discrimination measure is computed for each dimension, with large discrimination measures indicating a higher degree of discrimination between the categories of a variable along that dimension [10]. The plots in Figure 3 show that the first dimension is associated with the variables relating to TV and book use (except for tvWhen and booksWhen, which do not seem to discriminate well in any dimension). The variables related to phone use have a large value on the second dimension and a small value on the first dimension. The variables booksWith and tvWith have large values on both dimensions, indicating a large spread of categories along both dimensions. Taking into account the third dimension (not shown in the figure) shows that variables related to computer use (except computerWhen) discriminate well here. Regarding the interpretation of the dimensions, it can be seen that each dimension corresponds to the use of a particular device/attribute.

Category quantifications
The discrimination plot in the previous result section contains the variances of the quantified variables along a particular dimension. However, the same variance could correspond to all categories spread moderately far apart or to most categories being close together, with a few categories differing from this group [10]. Since discrimination plots cannot discriminate between these two conditions, MCA also provides category quantification plots, displaying the coordinates of each category on each dimension, resulting in a clear view of the spread of categories in the dimensional space.

Figure 4. Quantifications of categories.

Figure 4 shows such a plot for two example variables, booksWith and computerWhy. booksWith scores high on dimensions 1 and 2 (as described in the previous result section). Plot (a) presents the seven categories of booksWith, six of which group together at the right of the plot, with NA very far from this group. The large discrimination for this variable along dimension 1 is the result of this latter category being very different from the other categories of booksWith. The group of six categories is spread along the second dimension, with brotherSister located at the bottom of the plot, alone and aloneParents nearer to the top, and the remaining categories near the center of the graph (friends, many, parents). Similarly, the spread of the categories of variable computerWhy (b), which scores high on the third dimension, suggests that cases with computer use because of informative reasons are rather different from cases using the computer because it is fun, and very different from those who do not use a computer. These category coordinates were calculated for each variable, resulting in a detailed insight into the spread of the categories of each variable.

Interpreting the object scores
Taking into account the information coming from discrimination measures and category quantifications, a greater insight into the object scores displayed in Figure 2 can now be gained. In Figure 5, these object scores are
displayed again, but this time they are labeled by their
category value for the variable booksWith. The plot shows
there is an almost perfect differentiation of the categories
noBooks and brothers&Sisters, demonstrating that
booksWith discriminates in the first and second dimensions
respectively. Similarly, categories alone and
aloneWithParents mainly have positive scores, while values
of category manyOthers are scattered throughout the plot.
The same procedure can be followed for each variable.
Mapping the entire set of variable category quantifications
onto the clusters of objects reveals a lot of information on
the distribution of categories over objects (cases).
Additionally, it can aid in interpreting dimensions. Indeed,
while the first dimension seems to be related to the
frequency of use of several media, the second dimension
provides information on the kind of contact users expect
from these media, e.g. via a phone, or via live contact.
The third dimension, then, shows that the use of the computer
crosses both other dimensions; its use is not associated with
the use of the other media. All this information brought
together enables a detailed analysis and interpretation of the
coloured clusters displayed in Figure 2, and thus, of initial
persona segmentations.
Figure 5. Object scores labeled by the category values of
variable booksWith.
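The three quantities used in this section (object scores, category quantifications and discrimination measures) can be illustrated with a minimal sketch. It assumes the homogeneity-analysis formulation of MCA, computed with a plain singular value decomposition; it is not the software used in the case study. The function names `indicator` and `mca` are hypothetical, and the toy data are invented for illustration only (the variable and category names are borrowed from the text).

```python
import numpy as np

def indicator(column):
    """Dummy-code one categorical variable: G[i, c] = 1 if object i has category labels[c]."""
    labels = sorted(set(column))
    G = np.array([[float(value == label) for label in labels] for value in column])
    return G, labels

def mca(data, n_dims=2):
    """Sketch of MCA on a dict {variable name: list of category labels, one per object}."""
    names = list(data)
    n, m = len(data[names[0]]), len(names)
    blocks, scaled = {}, []
    for name in names:
        G, labels = indicator(data[name])
        counts = G.sum(axis=0)                 # category frequencies
        blocks[name] = (G, labels, counts)
        scaled.append(G / np.sqrt(counts))     # indicator columns scaled by 1/sqrt(frequency)
    Z = np.hstack(scaled) / np.sqrt(m)
    Z -= Z.mean(axis=0)                        # centering drops the trivial constant dimension
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    X = np.sqrt(n) * U[:, :n_dims]             # object scores, normalized so that X'X = n I
    eigenvalues = s[:n_dims] ** 2
    quantifications, discrimination = {}, {}
    for name, (G, labels, counts) in blocks.items():
        Y = (G.T @ X) / counts[:, None]        # category quantification = centroid of its objects
        quantifications[name] = dict(zip(labels, Y))
        # discrimination measure = variance of the quantified variable on each dimension
        discrimination[name] = (counts[:, None] * Y ** 2).sum(axis=0) / n
    return X, eigenvalues, quantifications, discrimination

# Invented answers of eight children; NOT data from the case study.
toy = {
    "booksWith": ["noBooks", "noBooks", "alone", "alone",
                  "brotherSister", "brotherSister", "alone", "noBooks"],
    "tvWith": ["alone", "parents", "alone", "parents",
               "brotherSister", "brotherSister", "parents", "alone"],
    "computerWhy": ["fun", "fun", "info", "info", "fun", "fun", "info", "noComputer"],
}

scores, eig, quant, disc = mca(toy, n_dims=2)
for name in toy:
    print(name, "discrimination per dimension:", np.round(disc[name], 3))
```

In this formulation, the average of the discrimination measures over all variables equals the eigenvalue of a dimension, and each category quantification is the mean object score of the cases in that category. This is why labeling object scores by category, as in Figure 5, shows the same structure as the category quantification plots.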
Towards persona segmentation hypotheses
Initial persona segmentations in our case study are thus
based on: (1) distribution of all variable categories over
object clusters, e.g. the blue cluster consists of objects with
category ‘noBooks’ on variable booksWith while the orange
cluster consists of objects with category ‘brothers&sisters’
on variable booksWith; (2) interpretation of discrimination
measures and, thus, dimensions.
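Step (1), the distribution of variable categories over object clusters, amounts to a cross-tabulation of cluster membership against category values. The sketch below assumes that every object has already been assigned a cluster label (for example by visual inspection of the object score plot). The helper `category_distribution` and all labels are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

def category_distribution(cluster_of, category_of):
    """Count, per cluster, how often each category of one variable occurs."""
    table = defaultdict(Counter)
    for cluster, category in zip(cluster_of, category_of):
        table[cluster][category] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}

# Invented labels for seven objects; NOT data from the case study.
clusters = ["blue", "blue", "orange", "orange", "yellow", "yellow", "yellow"]
books_with = ["noBooks", "noBooks", "brotherSister", "brotherSister",
              "alone", "aloneParents", "manyOthers"]

distribution = category_distribution(clusters, books_with)
for cluster, counts in distribution.items():
    total = sum(counts.values())
    shares = {category: round(count / total, 2) for category, count in counts.items()}
    print(cluster, shares)
```

A cluster dominated by a single category (here the invented blue and orange clusters) is easy to characterize on that variable, whereas a cluster with evenly spread categories is not described by it.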
Following this procedure led us to the following persona
segmentations (described very generally here). To start
with, although the yellow cluster contains a variety of
cases, a few characteristics seem to be unique to this
cluster. Many of these children read books and watch TV
because they are feeling bored. They watch TV very
regularly (every day), and they do so alone or with their
parents. They use their mobile phone and their computer for
various reasons and purposes, and in different locations.
The yellow cluster is the largest one. The red cluster, then,
is characterized by its lack of interest in books. These
children use their phone mainly to have contact with others.
They watch TV very frequently, and their parents are not
involved in these activities. They use a computer, often
together with other people, and they use it to play games,
but also to communicate with others and to look for
information. The orange and green clusters contain children
who read and watch TV because it is fun. They don’t do
these activities by themselves but together with their
brothers and sisters. Their computer is used to play games,
and they do this by themselves. The blue cluster contains
only children who do not make any use of mobile phone,
TV and books. They do use a computer, however, mainly to
play games.
This is the initial segmentation, based on a first MCA.
In a next step, it can be further adjusted or
refined by (1) adding supplementary variables
to the model, and (2) omitting outliers from the model.
(1) MCA allows adding supplementary variables to the
model; this means that these variables have no influence on
the solution, but it is possible to place them onto the model
a posteriori. Examining object score plots labeled with
the supplementary demo/psychographic variables age, gender,
and deaf/hearing shows that the oldest
children mainly appear in the red cluster while the youngest
children occur in the blue and orange clusters; deaf children
are in the blue and orange clusters, and hearing children
mainly in the yellow and red clusters (Figure 6).
Figure 6. Quantifications of categories of supplementary
variables (a) deaf/hearing and (b) age.
(2) MCA also allows for easy treatment of outliers in the
model. The blue cluster at first sight seems to provide a
useful persona segment; indeed, it confirms the intuitive
assumption that many deaf (especially very young) children
do not make use of a mobile phone, do not read books or
watch TV. This could be valuable information to be taken
into account during the design process. However, this group
seems to be an outlier, characterized by a lot of unique
characteristics. Since the presence of such outliers might
influence the discrimination measures of the variables in the
model [23], it is possible to omit these objects from the
analysis. Applying this to our model revealed that
discrimination measures remained largely unchanged, so
there was no need to adjust our initial segmentation.
The persona segmentations were used as a starting point for
the second part of persona creation: writing the narratives.
This was done with the use of information from interviews
and observations conducted in both field studies. However,
since this is not the focus of our paper, we will not further
elaborate upon this process.
DISCUSSION
We will apply a qualitative analysis to describe how well
our method arrives at (a) producing useful and valid
persona segments, and how well it aids in (b) monitoring
the persona segmentation process.
(a) Creating useful segments
We will analyze the usefulness and validity of our
segmentation approach by applying a set of evaluation
criteria proposed by Mulder [25].
Do the segments explain key differences between the
users? – The discrimination measures of MCA clearly
show which variables are most discriminating within the
data set. Iteratively trying out a number of solutions with a
different number of dimensions or adding/omitting
particular variables from the model can reveal additional
discriminating variables. Depending on how well these
discriminating variables describe the resulting clusters, a
particular solution can be chosen.
Are the segments different enough from each other? –
Two points positioned close together in a low-dimensional
solution may lie far apart in a solution with higher
dimensionality [8]. Within a particular solution, a specific
number of clusters will organically emerge from the data. If
the segments in this solution are not different enough, the
number of dimensions can be enlarged in order to produce a
larger number of more differentiated clusters.
Do the segments feel like real people? – The possibility to
add supplementary variables to the MCA model enabled the
creation of more meaningful segments. Persona
segmentations are generally based on people’s attitudes,
goals and behaviors instead of demographic variables.
However, persona attitude, goal and behavior data is often
‘enriched’ (after the segmentation) with these
demo/psychographic characteristics, in order to give the
persona an identity or a ‘face’. As demonstrated in our case
study with the variables age, gender and deaf/hearing,
MCA allows for adding supplementary variables to the
model. Since these variables are not taken into account in
the analysis, they do not influence the solution (and they
shouldn’t), but putting them onto the model afterwards and
viewing the relationships between demographic variables
and attitudes/goals/behaviors can provide useful ways of
‘identifying’ the clusters and giving them more meaning. In
addition, taking into account all variables included in the
analysis resulted in detailed segment descriptions. Each
segment could be described as a multifaceted person with a
set of behaviours and attitudes. Moreover, the description of
our segments corresponded to information gathered in
interviews and observations.
Can the segments be described quickly? – Since MCA
identifies the variables with the highest discrimination
measures, it is possible to quickly describe each segment by
the factors that best define it.
Do the segments cover all users? – In our solution, the
majority of our users were described by the segmentation;
only a limited set of the data points were scattered
throughout the plot. In addition, MCA allows adding
supplementary data points to the model. This means that, in
the course of the design process, additional user data could
be placed onto the model. These data points could allow for
a more detailed interpretation of the results and further
refinement of segments.
(b) Monitoring the segmentation process
A number of properties specific to MCA make it possible to
closely follow up on the statistical segmentation process.
(1) MCA demonstrates how variables are associated, and
not simply that they are associated. While other
multivariate techniques, such as cluster analysis, all search
for patterns in a complex data set, MCA goes one step
further, by identifying relationships among cases, variables
and categories. This enables the researcher to gain more
insight into the data structure and closely follow up on the
segmentation process. By describing two examples of
pitfalls or lessons learned during our case study, we will
demonstrate that particular characteristics (e.g. errors) of
the research design can be easily revealed and evaluated by
making use of the specific properties of MCA during the
segmentation phase instead of validating the persona
segments afterwards. (i) The first example concerns an
aspect of reliability of the data collection method (the
children’s assignments) used in the case study, namely that
small differences in respondents’ answers might lead to
different categorizations: we used open-ended questions
and were thus obliged to categorize these answers to make
them suited for MCA. However, it is possible that different
respondents actually mean to give ‘the same answer’, but
name it differently, which could make it difficult to
categorize these answers. For example, to the question ‘how
often do you watch TV?’, someone might say ‘every day’,
and another person might say ‘all day’, while they actually
mean the same thing. A similar problem related to the
structure of the data set occurs when making use of the
response category ‘other’ or when making use of multiple
response categories. (ii) The second example is related to
the validity of a method: do we accurately understand what
a method is measuring? The data might contain, for
example, a consistent bias, in the form of, as in our case
study, very young children who love to say that they make
very frequent use of all kinds of technology and games, and
they do this all the time. Such a response pattern will
wrongly add these children (cases) to a cluster or category
of very intensive technology users. This is a problem in
many data collection methods, since respondents often give
socially desirable answers. In both examples, MCA will
produce ‘strange’ results. Regarding the first example,
imagine a two-dimensional representation of a number of
categories with all these categories grouped in a neat
cluster, except one: the category ‘other’. In the second
example, it will become immediately clear from the data
patterns that the cluster of the youngest children contains the
most intensive technology users, which is probably not a
correct representation of reality. The properties of MCA
(explicitly demonstrating the associations between variables,
categories and cases, and visually displaying them in plots) thus
force the researcher to obtain more insight into the set-up
of the research design and the structure of the data, and, if
necessary, to make corrections to the analysis.
(2) The plots generated in an MCA allow you to look at
relationships in a dimensional space. Presenting the analysis
results visually also allows for an easy discovery of
remarkable patterns in the data. In our case study, for
example, the blue cluster was positioned very far from the
other data points. All the cases in this cluster appeared to
refer to children who make no use of TV, mobile phone and
books, but who do use a computer. Extreme cases might be
useful in persona creation to explore aspects of use that
might be overlooked by focusing on more realistic model
users [12], or because they are more memorable [16].
Moreover, when designing smart games to support reading
comprehension, the blue cluster provides valuable
information, and could thus be retained as a useful persona
segment. Showing clusters of data points together with data
points scattered loosely throughout the dimensional space
also reminds the researcher that clusters are not (and are
not supposed to be) neat and tidy, but rather represent
continuous differences between cases. The latter is also
accomplished by looking at the visual representation of a
cluster of object scores labeled with their category values
on a particular variable.
(3) The presence of outliers, such as the blue cluster
referred to in the previous paragraph, might influence the
discrimination measures of the variables in the model.
MCA allows for easy treatment of outliers, however. They
can be omitted from the model, which might make other
patterns in the data more prominent. This way, MCA is a
useful aid in generating several segmentations and selecting
the most useful model. The latter can also be accomplished
by adding versus omitting particular variables to/from the
model. Indeed, MCA allows for supplementary variables to
be placed onto the model a posteriori, without influencing
the solution. We used this feature to add
psycho/demographic characteristics as supplementary
variables, but the same procedure can be followed to test
the influence or discrimination value of other variables.
(4) Finally, MCA provides ways to evaluate the validity of
the results. If, for example, the dimensions can be
interpreted in a sensible manner, this indicates that they are
justified. If a dimension cannot be interpreted, this could
indicate that it is just a result of random fluctuations in the
data. Including supplementary data points in an MCA also
makes it possible to check the validity of the results.
Indeed, since these points do not influence the solution to
the active elements, they serve as external criteria [8].
CONCLUSION AND FUTURE WORK
This paper described the use of MCA to create persona
segmentations. With our case study, we demonstrated that
it is a powerful tool for data reduction that can capture the
underlying patterns in complex categorical data and thus
provide a useful set of initial persona profiles.
Statistical techniques can indeed be applied to the persona
creation process and can provide accurate, representative
results. However, they should not be used blindly, since
statistics software always produces results, regardless of the
quality of the input data. With our case study we showed
that MCA can help the researcher to obtain insight into the
quality of the data and research design. By its properties,
such as showing associations between variables, categories
and cases, and visually displaying these in a dimensional
space, it allows for an easier discovery of unexpected
patterns in the data or even flaws in the research design.
The analysis can thus be closely monitored and, if
necessary, adjusted in order to adequately explain
similarities or differences between user groups during the
segmentation process instead of merely trying to validate
the segments afterwards, with the use of qualitative data.
The latter, of course, does not mean that the qualitative
persona creation approach is redundant. Although Cooper’s
initial suggestion that it is better to be as precise as possible
than to be as accurate as possible in persona creation [9] is a
little outdated, everyone probably agrees that qualitative
research is crucial to the enrichment of persona profiles,
and to the validation of quantitative persona segmentations.
The focus of this paper was on persona segmentation. Other
parts of the creation process, such as data collection, were
not discussed and might be worth further research. Since
quantitative data analysis techniques need quantitative
input, the design and use of surveys especially adapted to
persona creation, for example, could be further explored.
Second, the use of MCA for other applications within the
domain of HCI research could be explored. For example,
MCA could be used to map personas with scenarios of use.
Finally, detailed guidelines for creating persona
segmentations are not available yet. Although these partly
depend on the research questions, the process of creating
personas is usually still rather intuitive and ad hoc. Efforts
to standardize the process a little more could be of great
value to bring more rigor and science to persona creation.
ACKNOWLEDGMENTS
The research described in this paper has been made possible
by the TERENCE project funded by FP7-ICT-2009-5.
REFERENCES
1. Adlin, T., Pruitt, J., Goodwin, K., Hynes, C., McGrane,
K., Rosenstein, A., et al. Putting personas to work. Ext.
Abstracts CHI '06, ACM (2006), 13-16.
2. Antle, A. Child-based personas: need, ability and
experience. Cognition, Technology & Work, 10, 2008.
3. Blomquist, A., & Arvola, M. Personas in action:
Ethnography in an interaction design team. NordiCHI,
2002.
4. Brickey, J., Walczak, S., & Burgess, T. A comparative
analysis of persona clustering methods. Americas
Conference on Information Systems, 2010.
5. Broschinsky, D., & Baker, L. Using persona with XP at
LANDesk software. Agile 2008, 543-548.
6. Canossa, A., & Drachen, A. Play-personas: behaviours
and belief systems in user-centred game design.
INTERACT 2009, Springer-Verlag Berlin, 2009.
7. Chapman, C., & Milham, R. The personas' new
clothes: methodological and practical arguments
against a popular method. Proc. Human Factors and
Ergonomics Society 50th Annual Meeting, 2006.
8. Clausen, S. Applied correspondence analysis. An
introduction. Sage Publications, 1998.
9. Cooper, A. The inmates are running the asylum. USA:
Macmillan Publishing, 1999.
10. Correspondence Analysis (StatSoft, Inc.)
http://www.statsoft.com/textbook.
11. De Voil, N. Personas considered harmful, 2010.
http://www.devoil.com
12. Djajadiningrat, J., Gaver, W., & Frens, J. Interaction
relabelling and extreme characters: methods for
exploring aesthetic interactions. DIS '00, ACM (2000).
13. Doey, L., & Kurta, J. Correspondence analysis applied
to psychological research. Tutorials in Quantitative
Methods for Psychology, 7, 1 (2011), 5-14.
14. Faily, S., & Flechais, I. Persona cases: a technique for
grounding personas. CHI 2011, ACM (2011).
15. Goodwin, K. Getting from research to personas:
harnessing the power of data, 2002.
http://www.cooper.com/journal/2002/11/getting_from_research_to_perso.html
16. Greaney, J., & Riordan, M. The use of statistically
derived personas in modeling mobile user populations.
MobileHCI 2003.
17. Hisham, S. Experimenting with the use of persona in a
focus group discussion with older adults in Malaysia.
OZCHI 2009, ACM (2009), 333-336.
18. Hoffman, D., & Franke, G. Correspondence
analysis: graphical representation of categorical data in
marketing research. Journal of Marketing Research,
23 (1986), 213-227.
19. IBM SPSS Categories.
http://public.dhe.ibm.com/common/ssi/ecm/en/ytd03012usen/YTD03012USEN.PDF
20. Khalayi, N., Terum, T., Nyhus, S., & Hamnes, K.
Persona based rapid usability kick-off. Proc CHI 2007,
ACM (2007), 1771-1776.
21. Lindgren, A. C., Amdahl, P., & Chaikiat, P. Using
personas and scenarios as an interface design tool for
advanced driver assistance systems. HCII, 2007.
22. McGinn, J., & Kotamraju, N. Data-driven persona
development. CHI 2008, ACM (2008).
23. Meulman, J., & Heiser, W. IBM SPSS
Categories 19, 2010.
http://aspdf.com/view/6851/ibm-spss-categories-19-pdf.html
24. Miaskiewicz, T., Sumner, T., & Kozar, K. A. A latent
semantic analysis methodology for the identification
and creation of personas. CHI 2008, ACM (2008).
25. Mulder, S., & Yaar, Z. The user is always right. New
Riders, Berkeley, CA, USA, 2007.
26. Pruitt, J., & Grudin, J. Personas: practice and theory.
Proceedings of the 2003 conference on Designing for
user experiences, ACM (2003), 1-15.
27. Pruitt, T., & Adlin, T. The persona lifecycle: keeping
people in mind throughout product design. Elsevier,
2006.
28. Siegel, D. The mystique of numbers: belief in
quantitative approaches to segmentation and persona
development. CHI 2010, ACM (2010).
29. Sinha, R. Persona development for information-rich
domains. Ext. Abstracts CHI 03, ACM (2003).
30. Sinha, R. User research for information-rich domains.
IA Summit, 2003.
http://iainstitute.org/files/rashmi_aifia_Mar03.ppt.
31. TERENCE Working Group: http://terenceproject.eu.
32. Tu, N., Dong, X., Rau, P., & Zhang, T. Using cluster
analysis in persona development. International
Conference on Supply Chain Management and
Information Systems, 2010.
