
Product Variety and Customer Behaviour in Online Fast Fashion Retailing


Jean-Sébastien Matte
Desautels Faculty of Management, McGill University, jean-sebastien.matte@mail.mcgill.ca

Mehmet Gumus
Desautels Faculty of Management, McGill University, mehmet.gumus@mcgill.ca

Javad Nasiry
Desautels Faculty of Management, McGill University, javad.nasiry@mcgill.ca

In collaboration with one of Europe’s largest fast fashion retailers, we study the implications of assortment
variety on customer choice. We analyze a large, event-based clickstream dataset to characterize and quantify
the effects of assortment variety on customer choice. We propose a novel definition and representation of
assortment variety as a bipartite graph, which allows us to define variety along three dimensions: the number
of styles, the number of colours, and the density of the graph. We then develop a customer behaviour model,
formalized as a two-stage consider-then-choose framework, in which we specify a nonlinear utility for all variety variables.
Our results confirm that the relationship between variety and customer utility, and consequently, choice, is
nonlinear for all variety variables, and show that different dimensions of variety affect customers differently.
These findings provide evidence of choice overload in a natural setting of revealed preferences, highlighting
the importance of considering multidimensional metrics of product variety rather than a single aggregate
variety measure (e.g., the total number of products). We also investigate the possible moderating effects of
customer types and seasonality on customers’ response to assortment variety.

Keywords : product variety, choice overload, choice models, behavioural decision making, customer
segmentation, clickstream data, fashion retailing
History :

1. Introduction
The market size of the global fast fashion industry exceeded $90 billion in 2022, and is expected to
reach $133 billion by 2026, representing a compound annual growth rate of 7.7% (Statista 2022).
The success of the fast fashion business model lies in its ability to respond quickly to customers’
variety-seeking preferences. This ability results in quick turnarounds of assortments in fast fashion
companies (Closa 2015). For instance, Zara releases new products every two weeks, for a total of
approximately 10,000 new products a year, while SHEIN—which revolutionized ultra-fast fashion—
releases as many as 6,000 new items daily (Segran 2021). Specific to the online channel, the offered
assortment variety has exploded due to nonexistent physical display constraints and low inventory
costs.

Electronic copy available at: https://ssrn.com/abstract=4451618


2 Matte, Gumus, Nasiry: Product Variety and Customer Behaviour

Product variety, however, poses operational challenges as it may lead to a small fraction of
products collecting most of the views (and purchases), and the resulting heavy-tailed distribution
(Brynjolfsson et al. 2011) makes forecasting difficult. As a result, product inventory is not always
depleted, which leaves retailers with a large amount of unsold merchandise. For example, H&M
reported more than $4 billion worth of unsold inventory in just one year, representing a 7% increase
from the previous year (Paton 2018). These amounts exacerbate the fashion industry’s waste crisis,
which has tarnished the industry’s public image and heightened the regulatory pressure on fashion
brands.
Greater variety also forces trade-offs in the process of product manufacturing. Long and Nasiry
(2022) show that, to compensate for the higher design and production costs when offering greater
variety, fast fashion retailers lower the quality of the garments they produce. Lower quality, in turn, reduces
the clothes’ potential for reuse after being discarded by variety-seeking customers. This dynamic
is of particular concern in light of current social trends and ongoing regulatory emphasis on sustainable
operations.
The key question, then, is: why is fast fashion accelerating? The answer seems to be that retailers
believe customers prefer more variety. Two arguments support this belief. First, variety allows retail
companies to address heterogeneity in taste across the market. In other words, a varied assortment
increases the likelihood that customers will find a product to their liking. Second, assortment variety
serves to hedge against the uncertainty in fast-changing fashion trends. Production decisions are
usually made in advance of selling seasons and so more variety improves the odds of assortments
matching the latest fashion trends (Long and Nasiry 2022).
Yet, there is growing evidence in the consumer psychology and marketing literature that, from a
customer’s perspective, more variety does not always lead to a positive experience and a purchase
decision. In other words, variety may cause choice overload (Iyengar and Lepper 2000). Instead of
helping customers find their preferred product, variety can confuse customers and make them abandon
their shopping session. Moreover, variety in the fast fashion industry involves multiple dimensions
(e.g., styles and colours). It is therefore crucial for brands and retailers to understand how these
multiple dimensions can affect customer choice, and to balance the trade-offs when defining their
product assortments. Offering a wide range of styles allows customers to find unique and trendy
pieces that fit their individual preferences, and can also help the retailer to differentiate itself from
competitors. At the same time, offering a wide range of colours is important as it allows customers
to find products that fit their specific colour preferences and match with their existing wardrobe.
In this context, we pose three research questions. First, can we develop measures of product
variety that capture its multidimensional nature, and that are readily and easily implementable on
large and complex fashion assortments? Second, how are consumers affected by the multiple dimensions
of this growing variety, and can we find evidence of choice overload in a revealed-preference
setting? Finally, how are customer choices affected by moderating factors such as customer types
and seasonality?
To address these questions, we partnered with one of the largest fast fashion retailers in Europe
that offers more than 30,000 unique style and colour combinations in every selling season. A unique
feature of our setting is that our partner retailer’s online channel provides us with a large event-
based clickstream dataset that captures snapshots of more than 400 million events (e.g., product
view, add-to-basket, and checkout) generated by online customers visiting the retailer’s online store.
The dataset also contains firm-side operational data such as product descriptions, and daily prices,
discounts, and inventory levels. We use this rich and customer-specific dataset to micro-model the
behaviour of online customers, in particular how they respond to the variety of the proposed
product assortment, while controlling for (a) online price and promotion information posted on
the page and (b) stock availability information.
We propose a novel way of defining and representing assortment variety as a bipartite graph.
This representation allows us to define variety along three dimensions: the number of styles, the
number of colours, and the density of the graph. These measures of variety are especially suitable
for large and complex assortments, and are applicable to research in operations management,
marketing, and consumer psychology. In practice, our operationalization of variety can be
readily adopted by managers.
Next, we characterize and quantify the effects of product variety on customer utility by modelling
the customer behaviour using a two-stage consider-then-choose framework. Our model defines the
customer utility as a function of the information available on the product pages, and of variety-
specific contextual variables related to the assortment’s bipartite graph representation. A potential
issue arising from the variables included in our model is that decisions made by a retailer during the
selling season may correlate discounts and product availability (inventory levels) with unobservable
shocks. As a result, these variables, which enter the customer utility, may be endogenous. We use
the control function approach of Petrin and Train (2010) to isolate the effects of discount
and availability (inventory levels) on customer choices.
We test our model empirically on customer-level clickstream data. Our results confirm that the
relationship between variety and customer utility is nonlinear for all three dimensions of variety
that we consider. Importantly, we provide the first empirical evidence of choice overload in a
revealed-preference setting from data collected during the natural course of operations.
Finally, we investigate potential moderators of the effect of assortment variety on customer choice,
namely heterogeneity in customer types and seasonality. We test the robustness of our results to
aggregation bias by performing a segmentation analysis in which we adopt a behavioural K-means
clustering approach based on customer session patterns. We find that our dataset has two main
segments, which can be further subdivided to form a total of five segments. Although different
segments display contrasting session behaviours, the shape of the relationship between variety and
utility is invariant to customer type, whereas its magnitude is not. We also investigate the temporal
dimension of variety within a selling season and how it affects customers’ relationship to assortment
variety by separating our dataset into subsets, one per month of data, and re-estimating the model
coefficients. We find that the shape of the relationship is invariant to seasonality, but that its
magnitude increases as the season progresses.
The rest of our paper proceeds as follows. In Section 2, we review the related literature. Section
3 motivates the business context and presents the measures of variety we use in our analysis. In
this section, we also develop the theoretical model of customer behaviour. In Section 4, we describe
the data in detail and test our model empirically. Section 5 identifies customer types and assesses
how the behaviour of those types is affected by variety. Section 6 presents our robustness analyses.
We conclude in Section 7 with a brief summary of our context and findings, as well as suggestions
for future research.

2. Literature Review
In this section, we review the literature most closely related to our work.

Effects of Variety. The theory of rational choice implies that more variety increases
the likelihood of the assortment containing an option that matches a customer’s preferences and
decreases the choice probability of the no-purchase option (Lancaster 1990). The literature on
consumer psychology suggests that customers may exhibit a variety-seeking behaviour (Kahn 1995,
Kahn 1998, Simonson 1999), and derive utility from having the ability to choose among multiple
products in an offer set (Moe 2003). Research in operations management further supports this
notion. For example, Borle et al. (2005) show that a reduction in the number of stock keeping units
(SKUs) leads to an overall decrease in retail sales. Ton and Raman (2010) and Kok and Şimşek
(2021) show that increases in product variety and inventory levels result in higher sales.
There is growing evidence in the marketing and consumer psychology literature that choosing
from large assortments is cognitively taxing for consumers, which is why more variety does not always
translate into either an enhanced experience or an increased likelihood of making a purchase.
Iyengar and Lepper (2000) show that customers are more likely to choose from a limited offer set
rather than from an extensive one. Their study led to the development of the choice overload concept
(for a review, see Chernev et al. 2015). Several studies support this concept (see e.g., Chernev
2003b, Chernev 2003a, Gourville and Soman 2005). Broniarczyk et al. (1998) and Boatwright and
Nunes (2001) argue that substantive reductions in the number of SKUs at a grocery retailer can be
made without negatively affecting store choice (i.e., patronage) or category sales. Fox et al. (2004)
show that assortment variety in a grocery retail setting (defined as the ratio of total products in a
category to the average market basket) increases the likelihood of patronage but reduces the total
expenditure per trip.
The literature discussed so far assumes a linear relationship between assortment variety
and consumer choice, that is, the relationship is either positive (more variety leads to more sales),
or negative (more variety leads to fewer sales). More elaborate models may be helpful to capture
the complexity of the relationship. Wan et al. (2012), Wan et al. (2014), and Lu et al. (2022)
all include a quadratic term for the variety variable in their regression model specifications, and
obtain an inverted U-shaped relationship between variety and sales. This supports the intuition
that some level of variety is desirable and increases sales up to a threshold after which additional
variety leads to lower sales.
Our study specifies the customer utility as a quadratic function of the variety-specific variables
related to our proposed bipartite graph representation, which allows us to characterize the
relationship between variety and consumer choice more accurately than before.
Evidence from the consumer psychology literature is based mostly on survey data (see e.g.,
Gourville and Soman 2005), or laboratory and field experiments (see e.g., Broniarczyk et al. 1998,
Iyengar and Lepper 2000, Chernev 2003a); in contrast, evidence from the marketing and operations
management literature (e.g., Boatwright and Nunes 2001, Kok and Şimşek 2021, Lu et al. 2022)
is based on sales data. Another noteworthy distinction is that, whereas the marketing literature
typically relies on choice models, the operations management literature typically relies on regression
models (e.g., Kok and Şimşek 2021, Lu et al. 2022), although recent studies have proposed choice
models (e.g., Aouad et al. 2019). Recently, Long et al. (2021) present evidence from a randomized
field experiment. No evidence of choice overload has yet come from a choice modelling approach using
full-information, revealed-preference consumer data such as the clickstream data we use in this study.

Dimensions of Variety. Most existing studies define variety as the total number of products
(or SKUs) in an assortment. While adopting this definition, Boatwright and Nunes
(2001) propose an attribute-based model and show that SKU reductions are most effective when
removing SKUs with duplicate attributes; doing so does not significantly reduce the assortment
variety (in terms of available attribute levels) but rather declutters it. Similar results are also
reported by Broniarczyk et al. (1998). Gourville and Soman (2005) introduce the construct of
assortment type as the difference between alignable (alternatives defined using similar attributes)
and nonalignable (alternatives defined using different attributes) assortments, and show that it
affects conversion. The effect of variety is positive when products differ along a single compensatory
dimension such that choosing from that assortment only requires within-attribute trade-offs. van
Herpen and Pieters (2002) present an attribute-based product space and report that assortment
size (in terms of total number of products) is not always a good proxy for perceived variety.
The preceding examples highlight the importance of defining assortment variety in a way that
accurately reflects the concept’s complexity, which may lead to nuances in the debate on whether
assortment variety is desirable or not. Studies that map product attributes to a product space
(see e.g., Hoch et al. 1999, van Herpen and Pieters 2002, Gaur and Honhon 2006, Rooderkerk
et al. 2011, Long and Nasiry 2022) assume that the distance between products can be computed,
therefore providing an easy measure of similarity between products. Despite being desirable and
useful, this approach is often impractical. Many product attributes are defined by categorical
variables that are unordered and not equidistant (e.g., colour, bill of materials), and in such cases,
it is impossible to define a product space. Rooderkerk et al. (2011) discuss this issue as a limitation
of their work. We further note that no one has studied how seasonal variations in product variety
affect consumer choice.
Our proposed representation of an assortment as a bipartite graph allows us to conceptualize
variety along three dimensions: the number of styles, the number of colours, and the density of
the graph. Additionally, it is suitable for applications in many fields, and can be used readily by
managers.

Context Effects and Choices in Large Assortments. Discrete choice models have been used
extensively in the literature, and particularly in marketing, to model how rational consumers make
choices. The theory of rational choice assumes the relative valuation between two alternatives does
not depend on the presence (or absence) of other alternatives—referred to as the independence
of irrelevant alternatives (IIA) assumption (Luce 1959)—and implies that a consumer chooses
the alternative with the highest utility from an offer set. An immediate implication of the IIA
assumption is the regularity condition: the choice probability of all alternatives in the offer set
decreases by the same relative amount as more alternatives are added to the offer set. When an
offer set contains the choice of not purchasing a product (i.e., the no-purchase alternative), then
adding more products to the offer set effectively decreases the choice probability of the no-purchase
alternative. This intuition guided the development of the fast fashion business model.
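To make the regularity condition concrete, the following minimal sketch assumes a multinomial logit model in which the no-purchase utility is normalized to 0. The function name `mnl_probabilities` and all utility values are hypothetical illustrations, not quantities taken from the study.

```python
import math


def mnl_probabilities(utilities):
    """Return MNL choice probabilities.

    Index 0 of the result is the no-purchase option (utility 0);
    the remaining entries follow the order of `utilities`.
    """
    weights = [1.0] + [math.exp(u) for u in utilities]  # exp(0) = 1 for no-purchase
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical deterministic utilities for an assortment of two products.
small = mnl_probabilities([0.5, -0.2])
# The same assortment after adding one more (hypothetical) product.
large = mnl_probabilities([0.5, -0.2, 0.1])

# Under IIA, every pre-existing alternative, including no-purchase,
# shrinks by the same multiplicative factor.
ratios = [large[k] / small[k] for k in range(len(small))]
print(ratios)
```

All three ratios coincide, and the no-purchase probability falls, which is exactly the prediction that context-dependent models relax.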
There is empirical evidence that consumers are not rational (Huber et al. 1982, Tversky and
Kahneman 1991), but rather influenced by the context of the offer set in which the decision is
made. In such context-dependent situations, the assumptions of the theory of rational choice (IIA
and regularity) are violated, and substitution patterns can be significantly different. In a fast
fashion retail setting, this can imply the choice probability of the no-purchase alternative may not
be strictly decreasing as more products are added to the assortment. Context effects also imply
that choice probabilities for products that become dominating (a product that attains the same or
higher level across all attributes when compared with another product) or dominated (a product that
attains the same or lower level across all attributes when compared with another product) when
the assortment increases in size may increase or decrease considerably. We refer the reader to
Rooderkerk et al. (2011) for a concise exposition of context effects (and in particular, the attraction
effect) in choice models.
Our proposed bipartite graph representation allows us to contextualize variety-specific variables, and
enables a richer and more accurate characterization and quantification of the relationship between
variety and consumer choice.

3. The Empirical Model


In this section, we first discuss the business context that motivates our study. We then present our
modelling framework and develop the measures to capture different dimensions of product variety.

3.1. Business Context


Our partner is a European fast fashion retailer operating brick-and-mortar stores and an online
store. In this study, we focus solely on the online store operations.
The landing page of the online store displays different product categories for men, women,
children, babies, and accessories. The website has a search function in addition to multiple banners
promoting a range of products including new arrivals, sales, and seasonal essentials. A customer’s
search eventually lands on a product category page (e.g., women’s dresses, sunglasses, sales) that
displays, in a continuous scroll, thumbnail photos of products from the selected category along with
product titles, available colours, sizes, and pricing information. Each product category page allows
the customer to sort products. The default order is a combination of new arrivals and discounted
products; however, a customer can also sort by ascending or descending price or by new
arrivals, among other options. A customer can also choose to filter products on the category page by style,
colour, size, and price range, or any combination thereof.
While scrolling through the chosen product category, a customer can access detailed information
for a product by clicking on its thumbnail photo. Doing so takes the customer to the product
page, which contains more pictures of the product, available colours, sizes and sizing information,
product description, and information on shipping and returns. The customer can see the maximum
number (usually 10) of units of a product she can order, but cannot see the product’s inventory
(i.e., how many units are in stock).
Production decisions are made well in advance of the selling season, so there is considerable
uncertainty about how fashionable a product will be upon release. To mitigate this uncertainty, the
retailer offers a large assortment of products in hopes that a number of products will fit customer
tastes and generate demand.

3.2. A Multidimensional Representation of Variety


We now propose a novel representation of product variety that explicitly incorporates its multiple
dimensions. We define a product by the combination of a style and a colour. This definition is
standard in the industry and followed by our partner retailer; it has also been used in the literature
(e.g., Boada-Collado and Martínez-De-Albéniz 2020). We can then represent an assortment of
products as a bipartite graph (see Figure 1). Such a graph has a set of style vertices and a set
of colour vertices and each edge connecting a style vertex to a colour vertex corresponds to a
product. One could employ similar representations in industries where the product variety is defined
naturally along multiple dimensions, such as automobiles (models, colours, trim-levels) and food
(flavor, processed vs. non-processed).

Figure 1 Bipartite Graph Representation of an Assortment. Panel (a): style vertices S1, S2, S3 and colour vertices C1, C2, C3, C4. Panel (b): style vertices S1, S2 and colour vertices C1, C2.

Let $V_S$ and $V_C$ be the sets of style and colour vertices, respectively, and denote the cardinality
of these sets as $|V_S|$ and $|V_C|$. Also, let $E$ be the set of edges in the graph, with $|E|$ denoting its
cardinality. Define the degree of a style vertex, $\deg(v_S)$, as the total number of edges leaving the
vertex $v_S$, and the degree of a colour vertex, $\deg(v_C)$, as the total number of edges connecting to
the vertex $v_C$. Furthermore, define $\text{density}$ and $\text{density}_{\min}$ as:

$$\text{density} := \frac{|E|}{|V_S| \cdot |V_C|} \tag{1}$$

$$\text{density}_{\min} := \frac{\max\{|V_S|, |V_C|\}}{|V_S| \cdot |V_C|} \tag{2}$$

These measures are useful to capture how connected the graph is. If a style or colour vertex is
not connected (i.e., $\deg(v) = 0$), then the vertex should not be included in the graph. Consequently,
the minimum density is a function of the minimum number of products in the assortment
($\max\{|V_S|, |V_C|\}$). A minimal density implies that more edges can be added to the graph, whereas
$\text{density} = 1$ implies a fully connected graph. For example, the bipartite graph in Panel (a) of Figure
1 has a density of 0.5 and a minimum density of 0.33.
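As an illustration of eqs. (1) and (2), the sketch below computes both measures from a list of (style, colour) products. The helper name `density_measures` and the edge list are assumptions: the edges are a hypothetical assortment with three styles, four colours, and six products, chosen to match the vertex counts and density reported for Panel (a), not the actual edges drawn in Figure 1.

```python
def density_measures(edges):
    """Compute (density, density_min) of a style-colour bipartite graph.

    `edges` is a non-empty iterable of (style, colour) pairs, one per product.
    Vertices of degree 0 never appear in an edge list, which matches
    the convention that unconnected vertices are excluded from the graph.
    """
    edges = set(edges)
    styles = {style for style, _ in edges}
    colours = {colour for _, colour in edges}
    cells = len(styles) * len(colours)  # |V_S| * |V_C|
    density = len(edges) / cells
    density_min = max(len(styles), len(colours)) / cells
    return density, density_min


# Hypothetical assortment: 3 styles, 4 colours, 6 products.
panel_a = [
    ("S1", "C1"), ("S1", "C2"),
    ("S2", "C2"), ("S2", "C3"),
    ("S3", "C3"), ("S3", "C4"),
]
density, density_min = density_measures(panel_a)
print(round(density, 2), round(density_min, 2))  # prints 0.5 0.33
```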
A typical retailer must decide on the number of styles to design, the colour palette for the season,
and which styles to offer in which colour. In our graph representation, these decisions correspond
to choosing the vertices and how densely to connect the graph. The minimum density and density
measures can be directly used by the retailer as a gauge of how sparse (or dense) an assortment’s
offering is.
However, the density measure in eq. (1) does not allow a comparison between assortments that
differ in order (number of vertices) and size (number of edges). To illustrate, consider the two
assortments in Figure 1. Both assortments have a density of 0.5. However, the graph in Panel (a)
has a minimum density of 0.33, while the graph in Panel (b) has a minimum density of 0.5. This
limitation on comparisons is especially important from a practical perspective as a retailer must
decide on assortments for different categories (and subcategories within). Having a standardized
measure that allows for a straightforward comparison is thus critical.
To overcome this limitation, we introduce a normalized measure of density as:

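$$\text{norm.density} := \frac{\text{density} - \text{density}_{\min}}{\text{density}_{\max} - \text{density}_{\min}} = \frac{\text{density} - \text{density}_{\min}}{1 - \text{density}_{\min}} \tag{3}$$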

The normalized density captures the added connectivity in a graph beyond the minimum density.
Normalized density equals its lower bound of 0 when the graph is minimally connected (e.g., the graph in
Panel (b) of Figure 1), and equals its upper bound of 1 when the graph is fully connected. In
the case of a division by 0 ($\text{density}_{\max} = \text{density}_{\min}$), we assume the normalized density is equal
to 0.
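The normalization in eq. (3), including the division-by-zero convention, can be sketched as follows. The function name `normalized_density` is hypothetical; the inputs for Panels (a) and (b) are the vertex and edge counts implied by the densities quoted in the text (six edges on a 3 by 4 graph, two edges on a 2 by 2 graph).

```python
def normalized_density(n_styles, n_colours, n_edges):
    """Normalized density from eq. (3).

    Returns 0 when density_max equals density_min, which happens when
    there is a single style or a single colour, so that the minimally
    connected graph is already fully connected.
    """
    if min(n_styles, n_colours) == 1:
        return 0.0
    cells = n_styles * n_colours
    density = n_edges / cells
    density_min = max(n_styles, n_colours) / cells
    return (density - density_min) / (1.0 - density_min)


print(normalized_density(3, 4, 6))   # Panel (a): some connectivity beyond the minimum
print(normalized_density(2, 2, 2))   # Panel (b): minimally connected
print(normalized_density(3, 4, 12))  # fully connected graph
```

Although both panels share a raw density of 0.5, their normalized densities differ, which is what makes assortments of different order and size comparable.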
The bipartite graph representation has the additional advantage of offering a context-dependent
definition of our variety measures, thereby indirectly capturing context effects. For a given assort-
ment, how the graph is connected (i.e., how style vertices are connected to colour vertices), and
the presence of dominating and dominated vertices, can affect customers’ choice.

3.3. Model Development


To operationalize the effect of variety on customer utility, we develop a customer behaviour model
and formalize it as a two-stage consider-then-choose framework. When dealing with large assort-
ments, customers use different decision-making strategies than the one assumed by the multinomial
logit (MNL) model (McFadden 1974), in which customers look at all alternatives before making
a choice. Two-stage choice models, such as the consider-then-choose model, have been proposed
to more accurately model customer search and choice behaviour in the context of larger offer sets,
and are particularly well suited to clickstream data; see, for example, the applications in Moe (2006), Li
et al. (2018), and Aouad et al. (2019).
We assume that a customer has imperfect information about product fit, that is, she does not
know a priori which product from the offered assortment she prefers, and therefore engages in a
search and deliberation process to resolve this uncertainty. Given a large assortment, the search
process is cognitively costly and time-consuming. Consequently, a customer can afford to only look
at a subset of the offered products (the consider stage) before deliberating if she either selects one
of the products in her consideration set, or leaves the online store without making a purchase (the
choose stage).
Consider a retailer that offers a set $S_t \subseteq N$ of products, indexed by $j = 1, \dots, n$, on date $t$. The
utility of product $j \in S_t$ for customer $i$ on date $t$ can be written as:

$$U_{ijt} := u_{ijt} + \xi_{ijt}, \tag{4}$$

where $u_{ijt}$ is the deterministic part of the utility and $\xi_{ijt}$ represents independent and identically
distributed (iid) idiosyncratic shocks to utility. Conditional on the information visible to the customer
on the category and product pages, we can express $u_{ijt}$ as follows:

$$u_{ijt} := \beta_1\,\text{price}_j + \beta_2\,\text{discount}_{jt} + \beta_3\,\text{age}_{jt} + \beta_4\,\text{broken.assort}_{jt} + \beta_5\,\text{view.count}_{ijt} + f(\text{variety}^k_{ijt}). \tag{5}$$

Here, $\text{price}_j$ is the base price of product $j$, $\text{discount}_{jt}$ is the discount rate of product $j$ on date
$t$, $\text{age}_{jt}$ is the age of product $j$ on date $t$, $\text{broken.assort}_{jt}$ is a binary variable indicating if at
least one size of product $j$ on date $t$ is stocked out ($\text{broken.assort}_{jt} = 1$) or if all sizes are available
($\text{broken.assort}_{jt} = 0$), and $\text{view.count}_{ijt}$ corresponds to the number of times customer $i$ viewed product
$j$ during her session on date $t$. The variables $\text{variety}^k_{ijt}$ are the variety measures for product $j$, which
belongs to subcategory $k$, on date $t$, and $f(\cdot)$ is a function that reflects the effect of product variety
on customer utility. We assume the effects of variety on customer utility are nonlinear, and specify
$f(\cdot)$ as a quadratic function. Furthermore, without loss of generality, we normalize the utility of
the no-purchase option to 0 so that $U_{i0t} = \xi_{i0t}$.
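The choose stage can be illustrated with a short sketch of specification (5). Two assumptions are made here that the text above does not state: the shocks are taken to be Gumbel distributed, so that choice probabilities over the consideration set have the logit form, and the coefficients in `BETA` are hypothetical placeholders rather than estimates. The function names are likewise illustrative.

```python
import math

# Hypothetical coefficients beta_1..beta_5; these are NOT estimates from the study.
BETA = {
    "price": -0.03,
    "discount": 1.2,
    "age": -0.01,
    "broken_assort": -0.4,
    "view_count": 0.6,
}


def deterministic_utility(product, variety_effect=0.0):
    """u_ijt as in eq. (5): linear covariates plus the variety term f(variety)."""
    linear = sum(BETA[name] * product[name] for name in BETA)
    return linear + variety_effect


def choice_probabilities(consideration_set):
    """Logit probabilities over the consideration set (assumes Gumbel shocks).

    Index 0 is the no-purchase option, whose deterministic utility is 0.
    """
    weights = [1.0] + [math.exp(deterministic_utility(p)) for p in consideration_set]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical products viewed during one session.
viewed = [
    {"price": 29.99, "discount": 0.2, "age": 14, "broken_assort": 0, "view_count": 2},
    {"price": 49.99, "discount": 0.0, "age": 3, "broken_assort": 1, "view_count": 1},
]
probs = choice_probabilities(viewed)
print(probs)
```

Only the products in the consideration set enter the denominator, which is what distinguishes this setup from a model in which the customer evaluates the entire assortment.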
The variety measures included in our specification relate to the bipartite graph representation
of an assortment presented in §3.2. Consequently, our specification includes the number of
style vertices $\text{tot.styles}^k_{ijt}$, the number of colour vertices $\text{tot.colours}^k_{ijt}$, and the normalized density
$\text{norm.density}^k_{ijt}$, where $k$ denotes the subcategory to which product $j$ belongs, as we assume the
effect of variety on a customer choice within a product category is at the product subcategory
level. We define here a subcategory as a separation of products within a category under smaller
subsets (e.g., within the dress category, there is a sleeveless dress subcategory, a strapless dress
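subcategory, a back-detailed dress subcategory, etc.). The effects of the variety measures on utility
can then be written as:

$$\begin{aligned} f(\text{variety}^k_{ijt}) = {} & \delta_1\,\text{tot.styles}^k_{ijt} + \delta_2\,(\text{tot.styles}^k_{ijt})^2 + \delta_3\,\text{tot.colours}^k_{ijt} + \delta_4\,(\text{tot.colours}^k_{ijt})^2 \\ & + \delta_5\,\text{norm.density}^k_{ijt} + \delta_6\,(\text{norm.density}^k_{ijt})^2. \end{aligned} \tag{6}$$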
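A quadratic specification such as eq. (6) implies, for each variety dimension, a stationary point at minus the linear coefficient divided by twice the quadratic coefficient; when the quadratic coefficient is negative, utility rises with variety up to that point and falls beyond it. The sketch below evaluates eq. (6) and locates these points. The function names and the values in `delta` are hypothetical and chosen only to produce an inverted U shape; they are not estimates.

```python
def variety_effect(tot_styles, tot_colours, norm_density, delta):
    """f(variety) as in eq. (6); `delta` holds (delta_1, ..., delta_6)."""
    d1, d2, d3, d4, d5, d6 = delta
    return (
        d1 * tot_styles + d2 * tot_styles ** 2
        + d3 * tot_colours + d4 * tot_colours ** 2
        + d5 * norm_density + d6 * norm_density ** 2
    )


def turning_point(linear, quadratic):
    """Stationary point -linear / (2 * quadratic) of a one-variable quadratic."""
    if quadratic == 0:
        raise ValueError("no turning point when the quadratic coefficient is 0")
    return -linear / (2 * quadratic)


# Hypothetical coefficients with negative quadratic terms (inverted U).
delta = (0.04, -0.0002, 0.10, -0.005, 0.8, -1.0)

styles_peak = turning_point(delta[0], delta[1])
colours_peak = turning_point(delta[2], delta[3])
density_peak = turning_point(delta[4], delta[5])
print(styles_peak, colours_peak, density_peak)
```

With these illustrative values, utility from the number of styles peaks near 100 styles; adding styles beyond that point lowers utility, which is the pattern associated with choice overload.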

Under model specification (5), some variables may be correlated with the idiosyncratic shocks ξijt .
In particular, decisions made by the retailer during the selling season are subject to endogeneity.
We believe that only two variables, discount and broken.assort, are potentially endogenous. The
base price of the product is a decision that is made prior to the selling season and therefore does not
vary.1 In addition, our partner retailer has confirmed that the release of new products is scheduled
ahead of time, and therefore, variety measures are not dependent on sales.
Endogeneity in the discount and broken.assort variables may lead to biased estimates of the
model’s coefficients. To address this potential endogeneity issue, we follow Petrin and Train (2010)
and use the two-stage control function approach. In the first stage, we define two linear regression
models, one for each potentially endogenous variable, and regress the endogenous (either discount
or broken.assort) variable on all model variables and an appropriate instrument $z_{ijt}$. That is:

$$\begin{aligned}
\text{discount}_{jt} &:= u'_{ijt} + \gamma_3\,\text{broken.assort}_{jt} + \mu_{\text{discount}}\, z^{\text{discount}}_{ijt} + \mu_{\text{broken.assort}}\, z^{\text{broken.assort}}_{ijt} + \zeta^{\text{discount}}_{ijt}, \\
\text{broken.assort}_{jt} &:= u'_{ijt} + \gamma_3\,\text{discount}_{jt} + \mu_{\text{discount}}\, z^{\text{discount}}_{ijt} + \mu_{\text{broken.assort}}\, z^{\text{broken.assort}}_{ijt} + \zeta^{\text{broken.assort}}_{ijt},
\end{aligned}$$

where $u'_{ijt} := \gamma_1\,\text{price}_j + \gamma_2\,\text{age}_{jt} + f(\text{variety}^k_{ijt})$. In the first-stage regression, we omit the variable
view.count because it is a customer-specific variable. In the second stage, we include the first-stage
residuals $\zeta^{\text{discount}}_{ijt}$ and $\zeta^{\text{broken.assort}}_{ijt}$ in specification (5). By adding the residuals of the first-stage
regressions to the specification, $\xi_{ijt}$ are now independent of all other covariates. Specification (5)
now becomes:

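$$
\begin{aligned}
U_{ijt} :={}& \beta_1\,\text{price}_{j} + \beta_2\,\text{discount}_{jt} + \beta_3\,\text{age}_{jt} + \beta_4\,\text{broken.assort}_{jt} + \beta_5\,\text{view.count}_{ijt} \\
& + f(\text{variety}^{k}_{ijt}) + \eta_1\,\zeta^{\text{discount}}_{ijt} + \eta_2\,\zeta^{\text{broken.assort}}_{ijt} + \xi_{ijt}.
\end{aligned}
\tag{7}
$$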

The control function approach requires the covariates of specification (7) and the instruments $z^{\text{discount}}_{ijt}$
and $z^{\text{broken.assort}}_{ijt}$ to be independent of the residuals $\zeta^{\text{discount}}_{ijt}$ and $\zeta^{\text{broken.assort}}_{ijt}$, but does not require
that the covariates and the instruments be independent of each other. Lagged variables have been
used as instrumental variables in similar settings (e.g., Tan and Netessine 2014, Chuang et al. 2016,
Boada-Collado and Martı́nez-De-Albéniz 2020). We use the lagged discount rate discountij,t−1 as

1
Contrary to some studies (e.g., Berry et al. 1995), we understand from conversations with our partner retailer
that the base price of products is not influenced by unobserved characteristics, and instead reflects cost and market
competition.

an instrument for discount because the discount rate of the previous date t − 1 does not directly
affect the customer at time t, as the customer makes a choice based on the discount she sees on date
t. We justify the use of the lagged broken assortment indicator broken.assortij,t−1 as an instrument
for broken.assort for the same reasons.
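
As a rough illustration of the first stage described above, the following minimal sketch regresses a potentially endogenous variable on an intercept and its one-period lag and keeps the residuals. It is deliberately simplified: the first-stage regressions in the text also include the remaining model variables, which are omitted here, and the function name and the discount series are hypothetical.

```python
from statistics import mean

def first_stage_residuals(endogenous, instrument):
    """OLS of `endogenous` on an intercept and a single instrument; returns residuals.

    Simplifying assumption: the other first-stage covariates are left out.
    """
    x_bar, y_bar = mean(instrument), mean(endogenous)
    sxx = sum((x - x_bar) ** 2 for x in instrument)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(instrument, endogenous))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(instrument, endogenous)]

# Hypothetical daily discount rates for one product (illustrative numbers only).
discount = [0.00, 0.10, 0.10, 0.20, 0.30, 0.30]
lagged = discount[:-1]    # instrument: discount on date t-1
current = discount[1:]    # potentially endogenous variable: discount on date t

# These residuals would enter the second-stage utility as an extra covariate.
resid_discount = first_stage_residuals(current, lagged)
```

By construction, the residuals sum to zero and are orthogonal to the instrument, which is what allows them to absorb the part of the discount not explained by its lag.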
Finally, contrary to classical applications of customer choice models (e.g., preference surveys,
conjoint experiments), the consideration set data we use in our study to estimate model parameters
do not possess a panel structure. Consequently, our model does not include product fixed effects.2
We do not include time (date) fixed effects either because we are observing each customer exactly
once and so there can be no unobserved customer-specific and time-specific effects in our data.

4. Empirical Analysis
In this section, we describe our data and examine the multidimensional effect of variety on customer
utility.

4.1. Data
Our analysis leverages two unique datasets provided by our partner retailer: (i) a large event-based
clickstream dataset containing customer visit sessions, and (ii) a firm-side dataset containing daily
information on pricing, inventory, and product attributes.
We apply our model to data from the women’s dress category. Approximately 60% of the products
with which customers interact in our clickstream data are from the women category, and the dress
category is the largest and most active category for women. This was confirmed in discussions with
our partner retailer. We therefore select only customers who viewed dresses during their session
(i.e., they had dress products in their consideration sets). For tractability, we restrict our study to
the period from February 19, 2018, to May 27, 2018. This three-month period corresponds to the
online store’s regular selling season. Figure 2 illustrates the evolution of the daily total number of
dresses available at the online store over those three months. The figure clearly shows a ramping
up of product introduction between February 19 and April 8, followed by a plateau until a sharp
decrease around the end of May when the liquidation season starts. The two blue dashed lines
represent the beginning and end of the three-month period.

Clickstream Data. Clickstream data were collected directly from the retailer’s website and
include snapshots of more than 400 million events such as page view, add to basket, checkout, etc.,
2
The clickstream data we use to estimate eq. (7) is collected during the online store’s natural course of operations,
and as such, is highly unstructured due to the high variability of the subcategories and products each customer views.
Therefore, there is little to no repeated cross-section structure in the data. We assume there are no unobserved effects
related to products or subcategories in our data and so do not include product or subcategory fixed effects in our
specification. In addition, as highlighted by Brownstone et al. (2000), estimating choice models with large choice sets
(in their study, they have 698 alternatives, half of what we have in our data) creates computational difficulties, and
solutions do not always lead to robust results. In the context of a consider-then-choose framework, we argue that
excluding product fixed effects is the appropriate modelling choice.

[Figure 2 plot: Count (y-axis) against Date (x-axis).]

Figure 2 Evolution of Daily Number of Unique Dress Products Available

generated by visitors to the site. These data were collected between January 2017 and December
2018. The clickstream data describe customers’ browsing, from the moment they arrive at the
retailer’s online store to the moment they leave (whether or not they make a purchase). The granularity
of the data allows us to see every step of a customer’s journey through the website. In particular,
the actions we extracted from the source files correspond to browsing clicks (home page, main
page, category page, brand page), search clicks (search page), interaction clicks (page view, product
view, basket add or remove), and checkout clicks (purchase event). For each click event, we collect
the following information: unique anonymized user identification number, time stamp, click type,
details about the click type (e.g., the name of the category page, promotion banner, unique product
identification number), and product information. For basket events, such as add-to-basket or
checkout, the following additional information is collected: product list, unit price, quantity (units)
in the basket, and total basket price.
The unique anonymized user identification is generated by cookies. One limitation of these data
is that, if a user clears her cookies between visits, the website can no longer recognize her,
and she will be assigned a new unique identification number. We assume the frequency at which
this happens is low.
We also note that the clickstream data were collected at the product level and do not contain
any sizing information (an SKU is the combination of a product and a size), even for basket events.
As such, it is not possible from our data to know which SKU a user purchased. This limitation does
not constrain our application, however. We argue that size is not an attribute on
which a user makes a preference choice (as opposed to style, colour, or pattern), but is rather
similar to the stock availability of an item. If a user finds an item she likes, but her size is not
available, she will move on to other styles, or leave. This is naturally captured by the clickstream
data. In addition, and most importantly, the focus of the study is on the effects of product variety
on customer choice.

Firm Data. The second dataset consists of firm-side data and contains daily pricing and inven-
tory information at the SKU level, as well as product information.

Data Cleaning and Preparation. The clickstream data were collected in the natural course
of operations of the online store. Unlike data collected in a controlled experiment
(e.g., Aouad et al. 2019), the clickstream data we obtained from our partner retailer are rich, yet
noisy. Consequently, we devoted considerable effort to extracting, cleaning, and preparing the
clickstream data to retain their richness, while allowing a tractable implementation of the estimation
procedure. The data cleaning and preparation, including handling outliers and imputation of firm
data, is detailed in EC.1.3
We focus on a customer’s first session at the online store. This removes any complementarity
or sequential effects that might lead a customer to return to the online store to purchase a product
she saw previously. Put differently, we focus on customers’ first impression of an assortment,
and how it affects their choice. The procedure results in a final dataset with 1.5 million observations
from 159,772 unique customers who viewed 1,208 unique products. Table 1 summarizes the variables
entering the customer utility of Specification (5). A detailed discussion of the variables is presented
in EC.1.3, along with summary statistics of the variables.
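
To make the first-session restriction concrete, the sketch below extracts each user's first session from a list of click events and returns the unique products viewed (the consideration set). It is an illustration only: the event layout, the click-type label "product_view", and the 30-minute inactivity cut-off are assumptions made here, not details taken from the retailer's data.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity cut-off; not stated in the text

def first_session_consideration_sets(events):
    """events: iterable of (user_id, unix_time, click_type, product_id) tuples."""
    by_user = defaultdict(list)
    for user_id, ts, click_type, product_id in events:
        by_user[user_id].append((ts, click_type, product_id))

    result = {}
    for user_id, clicks in by_user.items():
        clicks.sort(key=lambda c: c[0])      # order each user's clicks by time
        viewed, previous_ts = [], None
        for ts, click_type, product_id in clicks:
            if previous_ts is not None and ts - previous_ts > SESSION_GAP_SECONDS:
                break                        # the first session has ended
            previous_ts = ts
            if click_type == "product_view" and product_id not in viewed:
                viewed.append(product_id)    # keep unique products, in viewing order
        result[user_id] = viewed
    return result

# Hypothetical events for two users.
events = [
    ("u1", 0, "category_page", None),
    ("u1", 60, "product_view", "p1"),
    ("u1", 120, "product_view", "p2"),
    ("u1", 130, "product_view", "p1"),       # repeat view, same session
    ("u1", 10_000, "product_view", "p3"),    # later session, ignored
    ("u2", 5, "product_view", "p9"),
]
consideration_sets = first_session_consideration_sets(events)
```

Repeat views within the session do not enlarge the consideration set; they would instead feed a count such as view.count.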

Variable                  Definition
price_j                   Base price of product j (in the local currency)
discount_jt               Discount rate of product j on date t, ranging from 0 (no discount) to 1
                          (fully discounted)
age_jt                    Number of days, on date t, since the release of product j on the online store
broken.assort_jt          Binary variable indicating whether at least one size of product j is stocked
                          out on date t (= 1) or all sizes are available (= 0)
view.count_ijt            Number of times customer i viewed product j on date t
tot.styles^k_jt           Total number of style vertices on date t in subfamily k to which product j
                          belongs
tot.colours^k_jt          Total number of colour vertices on date t in subfamily k to which product j
                          belongs
norm.density^k_jt         Normalized density on date t in subfamily k to which product j belongs
Table 1 Description of Variables

4.2. Empirical Findings


We show in EC.2 that the log-likelihood of the two-stage customer utility model presented in §3
separates into two independent maximization problems, one for each stage of the consider-then-
choose model. Because we observe the consideration set formation stage for each customer in our
clickstream dataset (i.e., the first stage), we focus our attention exclusively on the second stage
estimation using maximum likelihood estimation (MLE) to estimate model parameters. Table 2
presents our main findings.
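
For intuition on the second-stage estimation, here is a minimal sketch of maximum likelihood for a choice among the products in an observed consideration set. It assumes a plain multinomial logit form with a linear utility, ignores any no-purchase option, and uses fixed-step gradient ascent; the actual specification is the one given in §3, and the feature layout and data below are hypothetical.

```python
import math

def choice_probabilities(beta, alternatives):
    """Logit probabilities over one consideration set.

    alternatives: list of feature vectors, one per considered product.
    """
    utilities = [sum(b * x for b, x in zip(beta, feats)) for feats in alternatives]
    top = max(utilities)                     # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(beta, sessions):
    """sessions: list of (alternatives, chosen_index) pairs."""
    return sum(math.log(choice_probabilities(beta, alts)[chosen])
               for alts, chosen in sessions)

def gradient(beta, sessions):
    """Score of the logit log-likelihood: chosen features minus expected features."""
    grad = [0.0] * len(beta)
    for alts, chosen in sessions:
        probs = choice_probabilities(beta, alts)
        for k in range(len(beta)):
            expected = sum(p * feats[k] for p, feats in zip(probs, alts))
            grad[k] += alts[chosen][k] - expected
    return grad

def fit(sessions, n_features, step=0.05, iterations=500):
    """Fixed-step gradient ascent on the (concave) log-likelihood."""
    beta = [0.0] * n_features
    for _ in range(iterations):
        g = gradient(beta, sessions)
        beta = [b + step * gk for b, gk in zip(beta, g)]
    return beta

# Hypothetical sessions; each alternative is [price, discount].
sessions = [
    ([[1.0, 0.0], [2.0, 0.5], [3.0, 0.0]], 0),
    ([[2.0, 0.0], [1.5, 0.3]], 1),
    ([[3.0, 0.2], [1.0, 0.0], [2.0, 0.0]], 1),
    ([[1.0, 0.0], [2.0, 0.0]], 1),
]
beta_hat = fit(sessions, n_features=2)
```

A production implementation would use a second-order optimizer and report standard errors from the Hessian; the sketch only shows how observed consideration sets define the choice probabilities that enter the likelihood.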

3
All code and notebooks (in Python) used to generate the data, redacted to preserve proprietary information, are
available upon request. Sample data used for the choice model estimation algorithm are presented in EC.1.

                    Base Model        Benchmark Model   Variety Model     Variety Model with
                                                                          control function
                    (1)               (2)               (3)               (4)
price               -0.056***         -0.041***         -0.031***         -0.030***
                    (0.0004)          (0.0004)          (0.0004)          (0.0004)
discount            1.762***          1.711***          1.134***          1.127***
                    (0.078)           (0.079)           (0.091)           (0.096)
age                 -0.010***         -0.009***         -0.003***         -0.003***
                    (0.0004)          (0.0004)          (0.0005)          (0.0005)
broken.assort       -1.128***         -1.133***         -0.729***         -0.855***
                    (0.03)            (0.031)           (0.036)           (0.039)
view.count          0.393***          0.463***          0.584***          0.585***
                    (0.012)           (0.012)           (0.011)           (0.011)
tot.prods           -                 -0.086***         -                 -
                                      (0.002)
tot.prods^2         -                 0.001***          -                 -
                                      (0.00002)
tot.styles          -                 -                 0.080***          0.080***
                                                        (0.005)           (0.005)
tot.styles^2        -                 -                 -0.001***         -0.001***
                                                        (0.00003)         (0.00003)
tot.colours         -                 -                 -0.586***         -0.581***
                                                        (0.015)           (0.015)
tot.colours^2       -                 -                 0.015***          0.014***
                                                        (0.0004)          (0.0004)
norm.density        -                 -                 -25.761***        -25.522***
                                                        (1.141)           (1.142)
norm.density^2      -                 -                 47.659***         47.254***
                                                        (3.416)           (3.408)
resid.discount      -                 -                 -                 0.243
                                                                          (0.312)
resid.ba            -                 -                 -                 0.686***
                                                                          (0.083)

Observations        1,498,821         1,498,821         1,498,821         1,498,821
Log-Likelihood      -25,789.53        -24,491.44        -22,022.97        -21,988.92
AIC                 51,589.05         48,996.89         44,067.94         44,003.83
BIC                 51,649.59         49,081.64         44,201.11         44,161.22
McFadden R^2        0.926             0.930             0.937             0.937
*: p ≤ 0.1; **: p ≤ 0.05; ***: p ≤ 0.01; (std dev)
Table 2 Estimation Results

Column (1) presents results for a base model without any variety variables. Column (2) presents
results for a model we refer to as the benchmark variety model, in which variety is modelled by
including only the aggregate variety measure of the total number of products in the assortment,
corresponding to the most common modelling approach found in the literature. Column (3) presents
results of our proposed variety model. Finally, Column (4) presents second stage estimation results
of our proposed variety model using the control function approach. The first stage regression results
are presented and discussed in EC.3.
Columns (1) and (3) show that including our proposed variety variables significantly improves
the goodness-of-fit on all measures (e.g., McFadden $R^2$ increases from 0.926 in the base model to 0.937 in
the variety model). A likelihood ratio test between base
and variety models for each customer type returns a significant difference, and we conclude that
the model containing variety-specific variables is preferable because it better captures customer
choice behaviour.
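
The fit comparison can be checked directly from the log-likelihoods reported in Table 2. The sketch below recomputes the AIC and a likelihood ratio statistic for the aggregate base and variety models, Columns (1) and (3). The parameter counts (5 and 11) are read off the rows of Table 2, and the closed-form chi-square tail used here is valid only for an even number of degrees of freedom; the helper names are our own.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i            # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

# Log-likelihoods from Table 2; parameter counts inferred from the listed coefficients.
ll_base, k_base = -25789.53, 5
ll_variety, k_variety = -22022.97, 11

lr_stat = 2 * (ll_variety - ll_base)                       # likelihood ratio statistic
p_value = chi2_sf_even_df(lr_stat, k_variety - k_base)     # 6 added variety coefficients
```

The recomputed AIC values agree with Table 2 up to rounding, and the likelihood ratio statistic of about 7,533 on 6 degrees of freedom gives a p-value that is numerically indistinguishable from zero.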
Recall that the variables discount and broken.assort may be endogenous. Column (4) shows that
η1 , the coefficient estimate for resid.discount in eq. (7), is statistically insignificant, which suggests
that discount is not an endogenous variable in our model. On the other hand, η2 , the coefficient
estimate for resid.ba, is highly significant, and so broken.assort is endogenous. Consequently, we
can estimate coefficients of eq. (5) consistently without controlling for endogeneity in the discount
variable. We note that controlling for endogeneity in the variable broken.assort only affects the
estimation results for this variable significantly, and has little impact on the coefficient estimates
for other variables, and especially the variety measures.
Results shown in Column (2) reveal a U-shaped relationship between customer utility and the
aggregate variety measure (i.e., the total number of products). This finding contrasts with the
results of our proposed model (Columns (3) and (4)), where we observe both U-shaped and inverted
U-shaped relationships between variety variables and customer utility. This difference highlights
the need for adequate measures of assortment variety; measures that will capture its complex effects
on customer utility. Furthermore, the results of our proposed model (Columns (3)-(4)) confirm that
the relationship between variety variables and customer utility is nonlinear as all variety variable
coefficient estimates are highly significant.
Figure 3 illustrates the estimated relationship between variety variables and their contribution
to customer utility, based on coefficient estimates of Column (3), with 95% confidence interval
bands (shaded area). Panel (a) of Figure 3 shows that the number of styles has an inverted U-
shaped relationship with customer utility. This result supports similar findings in the literature; for
example, Wan et al. (2012), Wan et al. (2014), and Lu et al. (2022), who use nonlinear specifications
of the relationship between variety and customer choice. However, we are the first to provide
evidence of choice overload in a revealed-preference setting.
Panels (b) and (c) of Figure 3 show a U-shaped relationship between the number of colours
and customer utility, and normalized density and customer utility respectively. This U-shaped
pattern is significant as it shows that customers are affected by dimensions of variety differently
and supports our argument that a more nuanced approach to understanding variety in assortment
design is necessary.
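
The location of each turning point follows from the quadratic form in eq. (6): for f(x) = δ_a x + δ_b x², the extremum is at x* = −δ_a / (2 δ_b). The short sketch below applies this to the colour and density coefficients of Column (3) in Table 2. Because the reported coefficients are rounded, the results are approximate; the quadratic term for styles is reported to a single significant digit, so its turning point is not computed here.

```python
def turning_point(linear, quadratic):
    """Location and value of the extremum of f(x) = linear*x + quadratic*x**2."""
    x_star = -linear / (2 * quadratic)
    return x_star, linear * x_star + quadratic * x_star ** 2

# Rounded coefficient estimates from Table 2, Column (3).
colours_x, colours_f = turning_point(-0.586, 0.015)
density_x, density_f = turning_point(-25.761, 47.659)
```

With these rounded estimates, the colour curve bottoms out at roughly 19.5 colour vertices and the density curve at a normalized density of about 0.27.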

[Figure 3 plot: (a) Total Number of Styles, f(tot.styles) against Total Style Vertices; (b) Total Number of Colours, f(tot.colours) against Total Colour Vertices; (c) Normalized Density, f(norm.density) against Normalized Density. Legend: Aggregated.]

Figure 3 Estimated Effect of Variety Variables on Customer Utility

The decreasing pattern in Panel (b) suggests that, within a subcategory, a retailer is worse off
offering two colours rather than one, three rather than two, and so on. To provide a rationale, we
look at the distribution of sales (in units) by colour in our data (see Table 8 in the Appendix). We
observe that 41% of the products purchased were black. The second most purchased
colour is red, with only 7.6% of all sales, suggesting strong attraction context effects in the colour
dimension. In other words, customers shopping for dresses at our partner retailer have a strong
preference for black. This preference helps explain the result in Panel (b): as the number
of colour vertices increases, the likelihood of finding the preferred style linked to the preferred
colour (in our case, black) decreases. We further posit that the
increasing part may be attributed to demand creation at high levels of colour vertices. We verify
both mechanisms in §5.2.
The U-shaped relationship in Panel (c) suggests that utility falls as normalized density increases
from low levels and then recovers at higher levels, almost to the point of indifference (i.e., where
f (norm.density) = 0). We posit that the initial negative effect of normalized density on utility
can be attributed to repeated information that is not valued by customers. Category pages of the
online store show all products (i.e., all style-colour combinations) in the assortment, such that
a style connected to multiple colours appears as many times as the degree of that style vertex.
Such products are usually presented close to one another. Consequently, for a limited search (the
majority of customers only see a fraction of the full assortment), customers may see fewer unique
styles as the density of the assortment increases. Given that customers value styles more than
colours (as suggested by Panels (a) and (b)), seeing the same styles repeated in different colours only adds
noise to category pages. On the other hand, the positive effects of increasing normalized density
beyond a threshold could be explained by demand creation for high levels of normalized density
(i.e., more densely connected graphs), as it may provide customers with more opportunities to find
a preferred product in a preferred colour. We verify this mechanism in §5.2.
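
The repeated-listing argument can be illustrated with a toy style-colour bipartite graph. In the sketch below, each product is an edge between a style vertex and a colour vertex. The density computed here is the plain bipartite density (edges over all possible style-colour pairs), which is an assumption for illustration; the normalized density used in the paper is defined in §3.2 and may differ. The product list is hypothetical.

```python
from collections import Counter

# Hypothetical subcategory: each product is one (style, colour) edge,
# listed in the order it might appear on a category page.
products = [
    ("wrap", "black"), ("wrap", "red"), ("wrap", "navy"),
    ("shift", "black"),
    ("maxi", "black"), ("maxi", "red"),
]

styles = {s for s, _ in products}
colours = {c for _, c in products}
style_degree = Counter(s for s, _ in products)   # number of listings per style

tot_styles = len(styles)
tot_colours = len(colours)
# Plain bipartite density: realized edges over all possible style-colour pairs.
density = len(set(products)) / (tot_styles * tot_colours)

def unique_styles_seen(listing, n):
    """Unique styles among the first n listings a customer scrolls past."""
    return len({s for s, _ in listing[:n]})
```

In this toy assortment, a customer who views only the first four listings sees just two of the three styles, because the "wrap" style occupies three adjacent slots. This is the sense in which denser graphs can crowd out unique styles under limited search.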
To summarize, our results provide evidence that different dimensions of variety affect customers
differently, and that only looking at variety in aggregate terms (i.e., as the total number of products)
may lead researchers and managers to miss important nuances in customer preferences. Our results
further confirm that customer preferences for variety vary nonlinearly, and more importantly, that
the shape and magnitude of the effect is different from one dimension to another.

5. Heterogeneous Effects
In Section 4, we analyzed the effect of variety on customer utility. In this section, we introduce
customer types, study how the effect of variety changes with customer type, and identify the
mechanisms behind it.

5.1. Customer Types


A major challenge with our sizable clickstream dataset and the model defined in §3 is the possibility
of aggregation bias due to the unobserved heterogeneity in customers visiting the online store.
Effects obtained from aggregating customers of different types may not necessarily exist when types
are considered separately. Hutchinson et al. (2000) offer aggregation bias as an alternative explanation
for context effects. Hence, it is important to study whether customers are of different types
in their online behaviour. To this end, we first run a clustering algorithm to separate customers in
our dataset into different types.4

5.1.1. Clustering. The first step is to determine the number of customer types in the data.
We adopt a behavioural clustering approach5 based on customer session patterns and use K-means
clustering. The data we use for clustering are not limited to the dress category, but leverage the
full information available on each customer’s session. This method has been previously used in the
literature to cluster customer types (Moe 2003). Table 3 summarizes the variables used for the
K-means clustering algorithm.
We evaluate the goodness-of-fit of our clustering algorithm using the scree plot approach (commonly
referred to as the “elbow” method) and the Davies-Bouldin Index (Davies and Bouldin
4
We opt for a clustering algorithm over mixture models (e.g., mixed MNL) because we aim to develop a prescriptive
model to study customer behaviour. We argue a non-probabilistic labelling is more useful in this case.
5
See EC.4 for a detailed presentation of the behavioural assumptions and customer typology for our study.

Variable                  Definition
cat.pages                 Number of unique category pages
avg.prod.cat              Average number of unique product pages viewed per category
std.prod.cat              Standard deviation of number of unique product pages viewed per category
repeat.cat.ratio          Number of unique category pages to number of (total) unique product pages
tot.views                 Total number of product page views
prod.pages                Number of unique product pages viewed (equal to size of consideration set)
repeat.prod.ratio         Number of unique product pages viewed to total product pages viewed
max.views                 Maximum number of repeat product page views
avg.consid.set.price      Average price of the products in the consideration set
avg.consid.set.discount   Average discount rate of the products in the consideration set
add.cnt                   Number of add-to-basket operations
bv.cnt                    Number of basket-view operations
Table 3 Definition of Clustering Variables

1979). The elbow method is a simple yet useful heuristic to determine the number of clusters
in the data by plotting the inertia (sum of squared distances of data points to their closest
cluster centers) as a function of the number of clusters. The plot should reveal
an “elbow” beyond which adding clusters yields only small further reductions in inertia. The
Davies-Bouldin index can be interpreted as the average similarity between each cluster and its most
similar cluster, where similarity compares the dispersion within clusters with the distance between
their centers. A lower score indicates dense and
well separated clusters.
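
Both evaluation metrics are simple to compute once cluster labels are available. The following self-contained sketch implements a basic K-means (Lloyd's algorithm with a deterministic start), the inertia, and the Davies-Bouldin index on a handful of hypothetical sessions described by two features. It is meant only to show what the metrics measure: the features are not standardized, the data are invented, and in practice one would typically rely on a library such as scikit-learn (`KMeans`, `davies_bouldin_score`).

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm, initialized at the first k points (deterministic)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                      # keep the old centre if a cluster empties
                centres[j] = centroid(members)
    return labels, centres

def inertia(points, labels, centres):
    """Sum of squared distances of points to their assigned cluster centre."""
    return sum(dist(p, centres[l]) ** 2 for p, l in zip(points, labels))

def davies_bouldin(points, labels, centres):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / centre distance."""
    k = len(centres)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dist(p, centres[j]) for p in members) / len(members))
    worst = [max((scatter[i] + scatter[j]) / dist(centres[i], centres[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

# Hypothetical sessions described by (cat.pages, tot.views).
sessions = [(1, 9), (1, 11), (2, 10), (1, 12), (5, 28), (4, 30), (6, 27), (5, 31)]
labels, centres = kmeans(sessions, 2)
```

Repeating the last line for several values of K and plotting `inertia` and `davies_bouldin` against K yields the two panels used to choose the number of clusters.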
Figure 4 presents the evaluation metrics as a function of the number of clusters.

[Figure 4 plot: (a) Scree Plot, Inertia (×10^6) against Number of clusters (K); (b) Davies-Bouldin Index against Number of clusters (K).]

Figure 4 Evaluation Metrics–Clustering

From Panel (a) of Figure 4, we observe that the largest difference in slope between two segments
(i.e., the elbow in the plot) occurs at K = 2. Panel (b) of Figure 4 also shows that K = 2 yields
a sufficiently good Davies-Bouldin score compared to other options. To provide a parsimonious and
interpretable model, we conduct our analysis with two customer types and present the mean and
standard deviation of the clustering variables for each customer type in Table 4. The best fit to
our data is obtained with K = 5; in §6.1, we conduct a robustness analysis with K = 5 and show that our
insights are robust to this alternative clustering of customer types.

                          Cluster 1 (Exploratory)     Cluster 2 (Goal-directed)
                          (0.6% conversion)           (8.5% conversion)
proportion                77%                         23%
                          Mean        St. Dev.        Mean        St. Dev.
cat.pages                 1.168       0.449           4.629       3.318
avg.prod.cat              8.941       4.335           10.544      5.991
std.prod.cat              0.306       0.800           3.833       2.334
repeat.cat.ratio          0.144       0.067           0.217       0.113
tot.views                 11.062      5.521           28.551      16.479
prod.pages                9.356       4.269           22.674      12.366
repeat.page.ratio         1.185       0.264           1.280       0.324
max.views                 1.886       1.043           2.919       1.792
avg.consid.set.price      131.60      37.50           97.16       27.37
avg.consid.set.discount   0.236       0.149           0.253       0.119
add.cnt                   0.105       0.482           1.784       3.308
bv.cnt                    0.100       0.682           2.097       4.754
Table 4 Clustering Results–Two Customer Types

The clustering suggests different patterns of behaviour between two types of customers. Com-
pared to the customers in the first cluster (Exploratory customers in Table 4), the customers in
the second cluster (Goal-directed customers in Table 4) look at more products within categories,
generate more views in terms of both total views and unique product views, engage in more repeat
views of product pages (i.e., they navigate back to products already viewed more frequently), and
have more basket operations, which results in a higher conversion rate for that segment. Moreover,
goal-directed customers have lower average price and higher average discount than exploratory
customers, suggesting they are more price sensitive.

5.1.2. Estimation Results. We employ MLE to estimate the effect of variety variables on
customer utility for each customer type. Table 5 presents the estimation results for the variety
model without the control function approach. We refer the reader to EC.5.2 for the full set of
results (base model, variety model and variety model with control function) and discussion.
Figure 5 illustrates the estimated relationship between variety variables and their contribution
to customer utility by type, with 95% confidence interval bands (shaded area). For comparison, we
also include the effect of variety on the aggregated model of §4.2.
All three panels of Figure 5 show similar relationships to those in §4, irrespective of the customer
type. However, Figure 5 reveals that the magnitude of the effect depends greatly on the customer
type. To avoid repetition, we focus our discussion on the key differences between customer types,
and highlight how accounting for customer heterogeneity offers valuable insights to researchers and
retailers.
Panel (a) of Figure 5 shows that the magnitude of the effect of style vertices is greater for
exploratory customers than for goal-directed customers. The tight and non-overlapping confidence
                    Exploratory       Goal-directed
                    (1)               (2)
price               -0.030***         -0.024***
                    (0.001)           (0.0005)
discount            0.690***          0.840***
                    (0.186)           (0.107)
age                 -0.0055***        -0.0023***
                    (0.001)           (0.001)
broken.assort       -0.734***         -0.675***
                    (0.068)           (0.042)
view.count          0.520***          0.498***
                    (0.029)           (0.012)
tot.styles          0.160***          0.059***
                    (0.009)           (0.005)
tot.styles^2        -0.0011***        -0.0004***
                    (0.00007)         (0.00004)
tot.colours         -0.906***         -0.479***
                    (0.032)           (0.016)
tot.colours^2       0.022***          0.012***
                    (0.0009)          (0.0004)
norm.density        -23.464***        -22.528***
                    (2.254)           (1.301)
norm.density^2      37.148***         43.244***
                    (8.144)           (3.82)

Observations        1,106,010         392,811
Log-Likelihood      -6,808.78         -13,804.88
AIC                 13,639.56         27,631.75
BIC                 13,769.34         27,750.35
McFadden R^2        0.974             0.837
*: p ≤ 0.1; **: p ≤ 0.05; ***: p ≤ 0.01; (std dev)
Table 5 Estimation Results–Two Customer Types

interval bands suggest a significant difference between the two types. We also observe that the
effect on utility (f (tot.styles)) obtains its maximum at a higher threshold of styles for exploratory
customers (71 versus 68 style vertices), providing evidence that exploratory customers value this
dimension of variety more.
Panel (b) of Figure 5 shows exploratory customers are more affected by the number of colours
in the assortment than goal-directed customers. The tight and non-overlapping confidence interval
bands again suggest a significant difference between the two types. We see this difference both in the
magnitude of the relationship (i.e., exploratory customers derive more disutility from the number of
colours in the assortment), and the location of the minimum (20 for exploratory customers versus
19 for goal-directed customers) of the relationship. The location of the minimum suggests that
goal-directed customers start valuing additional colour vertices positively before exploratory customers do.
We observe that, on average, goal-directed customers have more colours in their consideration sets
[Figure 5 plot: (a) Total Number of Styles, f(tot.styles) against Total Style Vertices; (b) Total Number of Colours, f(tot.colours) against Total Colour Vertices; (c) Normalized Density, f(norm.density) against Normalized Density. Legend: Aggregated, Exploratory, Goal-directed.]

Figure 5 Estimated Effect of Variety Variables on Customer Utility

(seven unique colours) than do exploratory customers (five unique colours), and that this difference
is significant (p-value ≈ 0). These results are consistent with the pattern in Panel (b), where
goal-directed customers are affected less negatively by the number of colours in the assortment
than exploratory customers. Recall that goal-directed customers are motivated by finding and purchasing their
preferred product, and therefore, more colours may provide them with more opportunities to find
their preferred product.
Panel (c) of Figure 5 shows overlapping confidence bands, suggesting there is no significant
difference among the aggregated model, exploratory customers, and goal-directed customers. The diverging
confidence band for exploratory customers also seems to indicate higher variability in utility for
denser assortments, although we obtain a highly significant coefficient estimate for the quadratic
term.
Recall from §3.2 that the size of an assortment is a function of the number of styles and colours,
and the connectivity (i.e., the normalized density). Our results suggest that goal-directed (respec-
tively, exploratory) customers want fewer (respectively, more) styles, and more (respectively, fewer)
colours, while both types do not differ with respect to normalized density. Consequently, con-
sidered separately, an assortment generated to meet the preferences of goal-directed customers
would likely be larger (in terms of total products) than one generated to meet the preferences of
exploratory customers. This observation constitutes an added complexity to the retailer’s assortment
problem, should it decide to account for the different customer types. On the one hand, the
goal-directed customers represent the segment most likely to convert to a purchase (and thus contribute
to the retailer’s profit), but prefer larger, denser assortments, which are usually associated
with a higher managerial cost (production, transportation, inventory costs, etc.). On the other
hand, exploratory customers represent the largest segment (77% of customers versus 23% for
goal-directed customers, per Table 4) and prefer smaller, sparser assortments. Here also, this type of
assortment has its pros (a smaller assortment is easier to manage) and cons (less commonality in
products may increase production costs). Our results therefore further highlight the trade-offs a retailer must
face when making assortment decisions.

5.2. Understanding the Mechanisms


Our results on the effect of each variety measure on customer utility in §4.2 and §5.1 reveal a U-
shaped relationship between customer utility and number of colour vertices and normalized density
of the assortment. In this section, we explore and validate possible mechanisms that could explain
these patterns.

5.2.1. The Impact of Colour Vertices. Recall from §4.2 that customers in our dataset
have a strong preference for the colour black and that we believe the likelihood of a connection
between a preferred style and a popular colour in the assortment decreases as the number of colour
vertices increases. We verify the latter assumption by focusing on the colour black, and examine the
relationship between the total number of colour vertices and the normalized vertex degree, which we
define as the ratio of the degree of a colour vertex and the total number of styles in the subcategory
(the total number of styles is the maximum possible degree of that colour vertex). Our correlation
analysis shows that there exists a negative correlation (r = −0.509) between the total number of
colour vertices and the normalized black colour vertex degree. That is, as the number of colours
in the assortment increases, the fraction of products with black colour decreases (see Figure 8).
This provides evidence of strong attraction context effects, in which one colour dominates others,
and violates the key assumptions of the theory of rational choice (IIA and regularity assumptions,
see §2). Therefore, the mechanism explaining the decreasing part of Panel (b) in Figures 3 and 5
emphasizes the necessity of including context effects directly through our variety measures, in the
customer utility specification. Otherwise, a retailer could be led to believe they are not offering
sufficient variety, when in fact, any additional colour added to the assortment will result in poorer
outcomes.
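As an illustration of the quantities used in this check, the following minimal sketch computes the normalized degree of a colour vertex (its degree divided by the number of styles in the subcategory) and the Pearson correlation between that ratio and the number of colour vertices. The toy subcategories, style names, and function names are hypothetical and are not taken from our dataset; each subcategory is assumed to be stored as a set of (style, colour) edges.

```python
from math import sqrt


def normalized_degree(edges, colour):
    """Degree of `colour` divided by the number of styles in the subcategory.

    `edges` is a set of (style, colour) pairs; the number of styles is the
    maximum possible degree of any colour vertex.
    """
    styles = {s for s, _ in edges}
    styles_with_colour = {s for s, c in edges if c == colour}
    return len(styles_with_colour) / len(styles)


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical subcategories (illustrative only, not the paper's data).
subcategories = [
    {("s1", "black"), ("s2", "black"), ("s1", "red")},
    {("s1", "black"), ("s2", "black"), ("s3", "red"), ("s3", "blue"), ("s2", "green")},
    {("s1", "black"), ("s2", "red"), ("s3", "blue"), ("s4", "green"),
     ("s4", "pink"), ("s3", "white")},
]

n_colours = [len({c for _, c in edges}) for edges in subcategories]
black_ratio = [normalized_degree(edges, "black") for edges in subcategories]
r = pearson_r(n_colours, black_ratio)
print(n_colours, [round(v, 3) for v in black_ratio], round(r, 3))
```

In this toy example the black degree ratio falls as colours are added, so the correlation is negative, which is the direction of the effect reported above.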
Customer utility increases in colour options after a threshold, which we posit could be attributed
to demand creation because a higher number of colour vertices provides customers with more
opportunities to find their preferred style-colour combination. To test the validity of this argument,
we compare sales for subcategories with low and high levels of colour options. We define the
threshold between low and high as the average of the locations of the minima of the relationships
shown in Panel (b) of Figure 3 for exploratory and goal-directed customers. We find this average
to be 20. We test the difference in mean sales (in units) of the low and high colour subcategories
in our dataset, and find a significant difference (x̄low = 0.588, x̄high = 2.680, p-value = 0.00).
This confirms that the mechanism driving the increase in customer utility for high levels of colour
vertices in a subcategory is demand creation.
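A test of difference in mean sales of this kind can be sketched as follows. The text does not state which test statistic is used, so the two-sided permutation test below is an assumption chosen for illustration, and the sales figures are made up; only the idea of splitting subcategories at a colour threshold and comparing mean unit sales comes from the analysis above.

```python
import random


def permutation_test_mean_diff(low, high, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean(high) - mean(low)) and an
    approximate p-value from `n_perm` random relabellings of the pooled data.
    """
    rng = random.Random(seed)
    n_low = len(low)
    observed = sum(high) / len(high) - sum(low) / n_low
    pooled = list(low) + list(high)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_low, perm_high = pooled[:n_low], pooled[n_low:]
        diff = sum(perm_high) / len(perm_high) - sum(perm_low) / len(perm_low)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical unit sales for subcategories below and above the colour threshold.
sales_low = [0, 1, 0, 1, 0, 2, 0, 1]
sales_high = [2, 3, 4, 2, 3, 1, 4, 3]

observed_diff, p_value = permutation_test_mean_diff(sales_low, sales_high)
print(observed_diff, p_value)
```

A permutation test makes no distributional assumption about sales, which is convenient for sparse count data; a two-sample t-test would be an equally plausible reading of the text.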
The mechanisms uncovered here show that colour acts as a vertical (dominating or dominated)
rather than a horizontal differentiation dimension. At low levels, customers are not willing to
substitute for a different colour and an increase in the number of colours only makes them less
likely to find the style-colour combination they want and so more likely to leave without a purchase.
Beyond a threshold on the number of colours, we show there is demand creation, and we interpret
our results as customers being more likely to find the style-colour combination they want, and
therefore opt to purchase.6 However, a retailer may face high costs for having subcategories with
many colour options, and our results suggest that the additional sales from demand creation may
not make up for the additional product and operational costs of having larger subcategories, as
customer utility for high levels of colours is still much lower than for low levels.

5.2.2. The Impact of Normalized Density. An increase in utility for high levels of normal-
ized density could be attributed to demand creation because denser assortments provide customers
with more opportunities to find their preferred product. We test for this possibility in the case
of goal-directed customers, as we observe this phenomenon only for this type; see Panel (c) in
Figure 5.
We define the threshold between sparse and dense subcategories as the location of the minimum
of the relationship shown in Panel (c) of Figure 3 for goal-directed customers, which is equal to
0.260. The test of difference in mean sales (in units) of the sparse and dense subcategories returns
a significant difference, but in favour of the sparse subcategories (x̄sparse = 0.600, x̄dense = 0.121,
p-value = 0). This finding is evidence against demand creation for denser subcategories.
Another possible mechanism that might explain why, for goal-directed customers, utility increases
with assortment density beyond a threshold is that denser assortments
increase the likelihood of finding their preferred product. To test this mechanism, we first identify
the product subcategories whose normalized density is greater than 0.260 (location of the minimum).
In our data, only two such subcategories exist, and they are small, each with a maximum of
2 styles and 3 colours and a total number of products not exceeding 4. We subset customers that
have at least one product of the two subcategories in their consideration set (80% of goal-directed
customers), and collect the subcategory purchased (if a purchase happens). We then compare the
average number of times a product is chosen from a sparse subcategory (subcategory with normalized
density lower than 0.260) and from the two dense ones. The result (x̄sparse = 4.207, x̄dense = 8,
p-value = 0.092) is not significant at the 5% level, but we do see a marked difference in
the frequency of choice in favour of the denser (and smaller) subcategories. We note that the non-significance
of our tests is in line with the observed diverging confidence interval bands in Figures 3 and 5
for high levels of normalized density. Results from our test, although not statistically significant,
point in the direction that smaller and more densely connected subcategories, for which customers
perceive most of the variety, increase the likelihood of choice. More evidence is needed to confirm
this observation, which would help researchers and retailers alike to contrast the trade-offs implied
by smaller subcategories and simpler cognitive choice processes for goal-directed customers.

6 To put it differently: If a customer is presented with multiple styles in one colour, and her preferred style is
unavailable, she is more likely to purchase a product (through substitution of styles) than if the same customer is
presented with one style in different colours, when her preferred colour is missing.

6. Robustness Analysis
In this section, we conduct two robustness tests. The first is concerned with investigating a larger
number of customer segments, and comparing the effects of variety on each segment’s utility. The
second robustness test is concerned with examining how the progression of variety during the selling
season affects customer utility. In light of our robustness tests, we revisit the mechanisms discussed
in §5.2 for the normalized density measure.

6.1. Refined Customer Types


In §5.1, we looked at the effects of variety on customer utility for two segments. In this section,
we discuss the estimation results for five customer types (i.e., when K = 5). For comparison, Table
EC.5 shows how exploratory and goal-directed customers are distributed between the five clusters.
Descriptive statistics (mean and standard deviation) of the clustering variables, a discussion of
customer typology for each of the new clusters, and the full estimation results for the five new
clusters are presented in EC.5.3. Based on the results in §4.2, we do not control for endogeneity
in the estimation, both because broken.assort is not the focus of our study but a control
variable, and because coefficient estimates for the variety measures are not
significantly affected by whether we control for endogeneity.
Coefficient estimates obtained for the product-specific variables are consistent with results in
Table 2. Here, Figure 6 illustrates the estimated relationship between variety variables and their
contribution to customer utility by type, with 95% confidence interval bands (shaded area). For
comparison, we also include the effect of variety on the aggregated model of §4.2.
When compared to the estimation results for the aggregate model, we observe that the shape of the
relationship between customer utility and number of styles and number of colours in the assortment
(Panels (a) and (b) of Figure 6) is invariant to the refined clustering, but that the magnitude of
the relationship is not.

[Figure 6: three rows of panels, each comparing the aggregated model with Clusters 1 to 5. (a) Total Number of Styles: f(tot.styles) against Total Style Vertices. (b) Total Number of Colours: f(tot.colours) against Total Colour Vertices. (c) Normalized Density: f(norm.density) against Normalized Density.]

Figure 6 Estimated Effect of Variety Variables on Customer Utility–Five Customer Types

Our results show that customer utility decreases in the normalized density for Cluster 4 and
Cluster 5. The non-significant coefficient estimate for the linear term, together with the lower
significance of the coefficient estimate for the quadratic term of Cluster 4, suggests that this customer
type derives strict disutility from an assortment’s (normalized) density. For Cluster 5, only the
quadratic coefficient estimate is non-significant, indicating a negative linear relationship. Nonetheless,
the large and diverging confidence interval bands (and the increasing upper limit) suggest
that this type is also mostly indifferent to normalized density. This result further confirms that not
all customer types are affected similarly by variety.
Although customers in each of the five new clusters exhibit different session behaviour, we see
from Figure 6 that confidence bands overlap for several clusters (specifically, Cluster 2, Cluster 3,
Cluster 4, and the aggregated model results) for each variety measure, and that the significant
difference can be traced back to the difference between exploratory and goal-directed customers.
More importantly, our results suggest that any new customer cluster that emerged from this refined
clustering exercise is bounded above by exploratory customer behaviour, and below by goal-directed
customer behaviour. However, we note differences in the location of the optima of the relationships.
Table 6 summarizes the location of the optima for each customer type. This observation is especially
important as it may greatly influence how retailers set their variety, as variety directly affects customer
behaviour.

              Base Variety Model            Refined Customer Clusters
              Exploratory   Goal-directed   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
tot.styles    71.679        68.922          69.212      74.570      64.662      71.255      71.436
tot.colours   20.520        19.666          19.742      19.752      18.660      20.353      20.772
Table 6 Locations of Optima
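
To make the notion of "location of the optimum" concrete, the sketch below assumes that the contribution of a variety variable x to utility takes the quadratic form f(x) = b1·x + b2·x², in line with the linear and quadratic coefficient estimates discussed above. Under that assumption the optimum sits at x* = −b1 / (2·b2), and it is a minimum when b2 > 0 and a maximum when b2 < 0. The coefficient values in the example are hypothetical and are not our estimates.

```python
def quadratic_optimum(beta_linear, beta_quadratic):
    """Location and type of the optimum of f(x) = beta_linear*x + beta_quadratic*x**2."""
    if beta_quadratic == 0:
        raise ValueError("a purely linear relationship has no interior optimum")
    location = -beta_linear / (2 * beta_quadratic)
    kind = "minimum" if beta_quadratic > 0 else "maximum"
    return location, kind


# Hypothetical coefficients for illustration only.
u_shaped = quadratic_optimum(-1.0, 0.025)    # U-shaped: utility bottoms out
inverted_u = quadratic_optimum(1.4, -0.01)   # inverted U: utility peaks
print(u_shaped, inverted_u)
```

Comparing such locations across customer types is what Table 6 reports; whether a given entry is a minimum or a maximum depends on the sign of the quadratic term for that variable.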

6.2. Seasonality Effects


Products in fast fashion are seasonal goods and are usually released gradually over the span of a selling
season. As a result, there is often a fluctuation in the demand from the beginning to the end of the
season. We are interested in understanding if this temporal dynamic within a selling season has an
effect on how customers react to variety.
The variety model presented so far does not control for this effect because our data contain
only unique visits from customers. We overcome this limitation by separating our dataset into
three subsets depending on the time of visit and re-estimate model coefficients. The first subset
contains data from mid-February to mid-March, the second from mid-March to mid-April, and the
last subset from mid-April to mid-May. This subset selection allows us to study the online store’s
different regimes.
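The split into three subsets can be sketched as follows. The exact cut-off dates and the calendar year are not stated in the text, so taking "mid-month" to mean the 15th and using a placeholder year are assumptions made purely for illustration; the function name and the half-open window convention are likewise hypothetical.

```python
from bisect import bisect_right
from datetime import date


def assign_period(visit, boundaries):
    """Return the 1-based index of the window [boundaries[i-1], boundaries[i])
    that contains `visit`, or None if the visit falls outside all windows."""
    i = bisect_right(boundaries, visit)
    if i == 0 or i == len(boundaries):
        return None
    return i


YEAR = 2021  # placeholder year, not taken from the paper
boundaries = [date(YEAR, 2, 15), date(YEAR, 3, 15), date(YEAR, 4, 15), date(YEAR, 5, 15)]

# Hypothetical visit dates.
visits = [date(YEAR, 2, 20), date(YEAR, 3, 15), date(YEAR, 4, 30), date(YEAR, 6, 1)]
periods = [assign_period(v, boundaries) for v in visits]
print(periods)
```

Each subset would then be used to re-estimate the model coefficients separately, as described above.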
Descriptive statistics of the variables for the three subsets, as well as estimation results are
presented in EC.6. Based on the results in §4.2, we do not control for endogeneity for the same
reasons given in the previous sections.
Coefficient estimates obtained for the product-specific variables are consistent with the results
presented in Table 2. Here again, Figure 7 illustrates the estimated relationship between variety
variables and their contribution to customer utility by month and customer type (exploratory and
goal-directed), with 95% confidence interval bands (shaded area). For comparison, we also include
the effect of variety on the aggregated model of §4.2. Not all variety variables cover the full range
of values in the first two months, which explains the truncation of some of the relationships shown.
To put the results presented in this subsection in context, we present the proportion of customers
from the full dataset, a proxy for store traffic, and conversion rates for each month in Table 7. We
also refer the reader to Figure 2 which shows the evolution of the total number of dress products
over the same three-month period. For both customer types, the proportion of customers is highest
in Month 2 (lowest in Month 1), and conversion rate is highest in Month 1 (lowest in Month 2).
Conversion rate is the highest when variety is at its lowest.

                                Month 1                       Month 2                       Month 3
                                Exploratory    Goal-directed  Exploratory    Goal-directed  Exploratory    Goal-directed
Number of customers (% total)   26,300 (16%)   9,932 (6%)     60,548 (38%)   15,222 (9%)    36,689 (23%)   12,115 (8%)
Conversion rate                 0.78%          7.72%          0.55%          5.87%          0.61%          6.02%
Table 7 Seasonal Variations

[Figure 7: three rows of panels, one panel per month (Month 1, Month 2, Month 3), each comparing the aggregated model with exploratory and goal-directed customers. (a) Total Number of Styles: f(tot.styles) against Total Style Vertices. (b) Total Number of Colours: f(tot.colours) against Total Colour Vertices. (c) Normalized Density: f(norm.density) against Normalized Density.]

Figure 7 Estimated Effect of Variety Variables on Customer Utility–Seasonality Effects

When compared to the estimation results in Figure 3, we see from Figure 7 that the shape of
the relationship for style and colour vertices (Panels (a) and (b) in both figures) is invariant to
seasonality. We see from Panel (a) of Figure 7 an increase in the magnitude of the effect of style vertices
on customer utility as the season progresses for exploratory customers, particularly between Month
1 and Month 2. We see from Panel (b) of Figure 7 that the increasing part of the relationship for
exploratory customers is more marked in Month 1 than in the other months and than in the results
presented in Panel (b) of Figure 3. This slightly more pronounced increase could be explained by
more demand creation coming from the higher conversion rates during that month.
Panel (c) of Figure 7 shows an interesting evolution for exploratory customers as the season progresses.
In Month 1, when variety is low and products are still being released, customer utility
increases for a low level of normalized density (up to around 0.035) before sharply decreasing.
As the variety plateaus in Month 2, exploratory customers transition to a strictly negative relationship,
which then resembles that shown in Panel (c) of Figure 3. It is also interesting
to note that in Month 1, the confidence interval bands of both customer types are tight and not
overlapping (except where the two relationships intersect), suggesting a significant difference
in the effect between customer types, a marked difference from the behaviour of the subsequent
months, when the confidence interval bands overlap. Panel (c) also shows that the relationship for
goal-directed customers is mostly invariant to seasonality.
Finally, results of Panel (a) and Panel (b) show that the relationships from the aggregated model
follow closely those of the goal-directed customers, and suggest there are no significant differences
in Month 1 and Month 3 given the overlapping confidence interval bands.

6.3. Understanding the Mechanisms–The Impact of Normalized Density


In §5.2.2, we explored the likely mechanisms that could explain the increase in utility for high
levels of normalized density shown in Figure 5. Specifically, we looked at two possible mechanisms.
We revisit those using our robustness results.
The first mechanism is that the increase in utility for high levels of normalized density could be
explained by demand creation. In the case of K = 5 customer clusters, we perform the same test
taking into account customer types (Cluster 1 and Cluster 3 only), and obtain a similar result for
Cluster 1 (x̄sparse = 0.512, x̄dense = 0.117, p-value = 0), and an inconclusive result for Cluster 3
(x̄sparse = 0.057, x̄dense = 0.026, p-value = 0.254). We also perform the same test for seasonality
(Month 3 only), and obtain similar results (x̄sparse = 0.504, x̄dense = 0.057, p-value = 2.283e-3).
The second potential mechanism proposed that the increase in utility for high levels of normalized
density is due to an increased choice likelihood of products from dense subcategories, when present
in a consideration set. In the case of K = 5 clusters, we apply the same methodology as in §5.2.2 to
test this difference on Cluster 1 customers and find that the difference is still not significant at
the 5% level (x̄sparse = 3.536, x̄dense = 7, p-value = 0.074). Our result, however, further points in
the direction that smaller and more densely connected subcategories, for which customers perceive
most of the variety, increase the likelihood of choice.

7. Conclusions, Limitations, and Future Work


In collaboration with one of the largest fast fashion retailers in Europe, our study aimed at under-
standing how assortment variety affects customer choice in large fashion assortments. We develop
a novel representation of an assortment as a bipartite graph, and define three measures of variety
based on this representation: number of style vertices, number of colour vertices, and normalized
density. We then define a customer behaviour model, formalized as a two-stage consider-then-
choose model, and test it empirically on a large customer-level clickstream dataset by estimating
model parameters of the second stage (i.e., choose stage).
Our paper bridges several research gaps between operations management, marketing and cus-
tomer psychology literatures. First, our study provides further evidence that the relationship
between all our proposed variety measures and customer utility is nonlinear. Moreover, our results
support the concept of choice overload for the style dimension of variety; we appear to be the first
to provide empirical evidence in a setting with revealed preferences. Our results also show strong
attraction context effects for the colour dimension of variety up to a threshold on the number
of colours, after which demand creation explains an increase in customer utility (for high levels
of colour vertices). Our results also suggest that sparser assortments are preferred, and point in
the direction that smaller, more densely connected subcategory assortments, for which customers
perceive most of the variety, may help customers in choosing their preferred product. However,
more evidence is needed to support this proposition.
We test the robustness of our results to aggregation (heterogeneity) bias by performing a seg-
mentation analysis in which we adopt a behavioural clustering approach based on customer session
patterns using K-means clustering. We also investigate the moderating effects of seasonality. We
find that our dataset has two main segments, which can be further subdivided to form a total
of five segments. Our estimation results show that the relationship between variety variables and
customer utility is similar across segments, and differs only in magnitude and in the location of the optimum. Moreover, our
results show that the shape of the relationship between variety variables and customer utility is invariant to
seasonality, but that its magnitude increases as the season progresses.

Managerial Implications. Our results provide several managerial implications for fast-fashion
retailers seeking to optimize their product variety along multiple dimensions. Our results display
a different relationship than the aggregate variety measure (i.e. the total number of products),
highlighting the importance of adequately defining assortment variety to capture the complex
relationship to customer utility. Failing to do so could lead managers to miss important nuances in
customer preferences. For example, even with demand creation, customer utility at high numbers of
colours never comes close to the utility obtained when a low number of colour vertices is included in the assortment.
This result highlights the importance for managers to understand which dimensions of variety
customers are unwilling to substitute, and which vertices within the dimension dominate others.
With a better understanding of purchase behaviour, we contribute to the greater goal of increasing
product variety efficiency to achieve more sustainable resource consumption levels.
Our segmentation study further shows that different segments prefer different levels of variety,
and that their preferences are misaligned (i.e. when one wants more, the other wants less). Given
that different segments represent different proportions of the population and conversion rates,
heterogeneity in customer segments suggests a careful evaluation of an assortment’s offering by the
retailer.
Finally, our results are especially relevant for the assortment optimization problem. Adequately
specifying the effects of variety on customer demand could lead to reductions in assortment sizes;
anecdotal evidence suggests that such reductions can help reduce waste and simplify operations
without significantly affecting profits (Riesenegger and Hübner 2022). Furthermore, offering a wide
range of styles in a particular colour set may have different cost implications than offering a wide
range of colours with limited style variations. The former results in more design and development
efforts, whereas the latter leads to inventory management challenges due to the need for maintaining
inventory for each colour option. We think that our bipartite representation of product variety can
be used in conjunction with supply chain costs to optimize assortment decisions at the product
design stage.

Limitations and Future Work. A first limitation to our study is that the results are only
applicable to our dataset. It would be interesting to see if similar results replicate for different fast
fashion retailers, or even different retail industries (e.g., grocery). A replication of our study on
different product categories and datasets will further allow confirming if the effects of, for example,
colour vertices are similar to what we report. It could be that the dress category was particularly
prone to having a dominating colour, just as it could be possible that another category has a
dominating style.
A further limitation of our study, relating to our dataset, is that we do not have attributes of
the dress products (e.g., length of sleeves, length of dress, details, etc.), which precludes us from building
an attribute-based model. Styles are a bundle of attributes, and so, variety within the attribute
space could influence customer utility differently (i.e., variety of some attribute may have a U-
shaped or inverted U-shaped relationship with the utility). Additionally, there is evidence that
customers use simple screening rules when making choices (Gilbride and Allenby 2004), suggesting
that not all attributes contribute equally to a customer’s choice. Separating styles into their bundle
of attributes could help further nuance the effect of assortment variety in a manner similar to
Boatwright and Nunes (2001), and help in supporting the discussion on dominated products coming
from the attraction context effect. One could also define additional graph-related variety measures
that more directly capture context effects.
Finally, this study only looked at the effect of variety on the first appearance of customers in our
data. It would be interesting to see the evolution of the effects of variety on returning customers.
In particular, it would be interesting to see if there are any shifts as customers become more loyal
to a brand or retailer, or when they transition from exploratory in one session to goal-directed in
their next session.

Appendix. Additional Tables & Figures


Table 8 below presents the distribution of units sold in the dress category for the 3-month period of interest
for this study by colour groups (see §EC.1.3).

Colour        Sales (in units)   % of total
Black         1295               41.09%
Red           239                7.58%
Navy Blue     235                7.46%
Pink          233                7.39%
Gray          200                6.35%
Green         186                5.90%
Blue          164                5.20%
White         144                4.57%
Yellow        143                4.54%
Burgundy      79                 2.51%
Coffee        73                 2.32%
Ecru          67                 2.13%
Mixed         35                 1.11%
Purple        33                 1.05%
Orange        26                 0.82%
Table 8 Distribution of Units Sold by Colour

[Figure 8: plot of Degree Ratio–Black Colour (vertical axis) against Total Number of Colour Vertices (horizontal axis), with fitted line y = 0.671 − 0.020x, r = −0.509.]

Figure 8 Relationship between Number of Colour Vertices and Degree of Black Colour Vertex

References
Aouad A, Feldman J, Segev D, Zhang DJ (2019) The Click-Based MNL Model: A Novel Framework for
Modeling Click Data in Assortment Optimization, URL http://dx.doi.org/10.2139/ssrn.3340620,
Working paper. Available at SSRN 3340620.

Berry S, Levinsohn J, Pakes A (1995) Automobile Prices in Market Equilibrium. Econometrica 63(4):841–
890.

Bigon L, Cassani G, Greco C, Lacasa L, Pavoni M, Polonioli A, Tagliabue J (2019) Prediction is very hard,
especially about conversion. Predicting user purchases from clickstream data in fashion e-commerce.
URL http://arxiv.org/abs/1907.00400, arXiv: 1907.00400.

Boada-Collado P, Martı́nez-De-Albéniz V (2020) Estimating and optimizing the impact of inventory on con-
sumer choices in a fashion retail setting. Manufacturing and Service Operations Management 22(3):582–
597, ISSN 15265498, URL http://dx.doi.org/10.1287/msom.2018.0764.

Boatwright P, Nunes JC (2001) Reducing assortment: An attribute-based approach. Journal of Marketing
65(3):50–63, ISSN 00222429, URL http://dx.doi.org/10.1509/jmkg.65.3.50.18330.

Borle S, Boatwright P, Kadane JB, Nunes JC, Shmueli G (2005) The effect of product assortment changes
on customer retention. Marketing Science 24(4):616–622, ISSN 07322399, URL http://dx.doi.org/
10.1287/mksc.1050.0121.

Broniarczyk SM, Hoyer WD, McAlister L (1998) Consumers’ Perceptions of the Assortment Offered in a Grocery
Category: The Impact of Item Reduction. Journal of Marketing Research 35(2):166–176.

Brownstone D, Bunch DS, Train K (2000) Joint mixed logit models of stated and revealed preferences for
alternative-fuel vehicles. Transportation Research Part B: Methodological 34(5):315–338.

Brynjolfsson E, Hu YJ, Simester D (2011) Goodbye Pareto Principle, Hello Long Tail: The Effect of Search
Costs on the Concentration of Product Sales. Management Science 57(8):1373–1386, ISSN 0025-1909,
1526-5501, URL http://dx.doi.org/10.1287/mnsc.1110.1371.

Chernev A (2003a) Product Assortment and Individual Decision Processes. Journal of Personality and Social
Psychology 85(1):151–162, ISSN 00223514, URL http://dx.doi.org/10.1037/0022-3514.85.1.151.

Chernev A (2003b) When More Is Less and Less is More: The Role of Ideal Point Availability and Assortment
in Choice. Journal of Consumer Research 30(2):170–183.

Chernev A, Böckenholt U, Goodman J (2015) Choice overload: A conceptual review and meta-analysis.
Journal of Consumer Psychology 25(2):333–358, ISSN 10577408, URL http://dx.doi.org/10.1016/
j.jcps.2014.08.002.

Chuang HHC, Oliva R, Perdikaki O (2016) Traffic-Based Labor Planning in Retail Stores. Production
and Operations Management 25(1):96–113, ISSN 10591478, URL http://dx.doi.org/10.1111/poms.
12403.

Closa S (2015) Zara: disrupting the fashion industry. Harvard Business Review URL https://d3.harvard.
edu/platform-rctom/submission/zara-disrupting-the-fashion-industry/.

Davies DL, Bouldin DW (1979) A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and
Machine Intelligence PAMI-1(2):224–227.

Fox EJ, Montgomery AL, Lodish LM (2004) Consumer Shopping and Spending across Retail Formats. The
Journal of Business 77(S2):S25–S60, ISSN 0021-9398, 1537-5374, URL http://dx.doi.org/10.1086/
381518.

Gaur V, Honhon D (2006) Assortment planning and inventory decisions under a locational choice model.
Management Science 52(10):1528–1543, ISSN 00251909, URL http://dx.doi.org/10.1287/mnsc.
1060.0580.

Gilbride TJ, Allenby GM (2004) A choice model with conjunctive, disjunctive, and compensatory screening
rules. Marketing Science 23(3):391–406, ISSN 07322399, URL http://dx.doi.org/10.1287/mksc.
1030.0032.

Gourville JT, Soman D (2005) Overchoice and Assortment Type: When and Why Variety Backfires. Mar-
keting Science 24(3):382–395, ISSN 0732-2399, 1526-548X, URL http://dx.doi.org/10.1287/mksc.
1040.0109.

He X, Zhang H, Kan MY, Chua TS (2016) Fast matrix factorization for online recommendation with implicit
feedback. SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and
Development in Information Retrieval, 549–558, ISBN 978-1-4503-4290-2, URL http://dx.doi.org/
10.1145/2911451.2911489.

Hoch SJ, Bradlow ET, Wansink B (1999) The variety of an assortment. Marketing Science 18(4):527–546,
ISSN 07322399, URL http://dx.doi.org/10.1287/mksc.18.4.527.

Honhon D, Kadiyala B, Ulu C (2017) Learning from Clickstream Data in Online Retail, URL http://dx.
doi.org/10.2139/ssrn.2830877, Working paper. Available at SSRN 2830877.

Huber J, Payne JW, Puto C (1982) Adding Asymmetrically Dominated Alternatives: Violations of Regularity
and the Similarity Hypothesis. Journal of Consumer Research 9(1):90–98, ISSN 0093-5301, 1537-5277,
URL http://dx.doi.org/10.1086/208899.

Hubert M, Vandervieren E (2008) An adjusted boxplot for skewed distributions. Computational Statistics &
Data Analysis 52(12):5186–5201, ISSN 01679473, URL http://dx.doi.org/10.1016/j.csda.2007.
11.008.

Hutchinson JW, Kamakura WA, Lynch JG (2000) Unobserved Heterogeneity as an Alternative Explanation
for “Reversal” Effects in Behavioral Research. Journal of Consumer Research 27(3):324–344.

Iyengar SS, Lepper MR (2000) When choice is demotivating: Can one desire too much of a good thing?
Journal of Personality and Social Psychology 79(6):995–1006, ISSN 00223514, URL http://dx.doi.
org/10.1037/0022-3514.79.6.995.

Janiszewski C (1998) The influence of display characteristics on visual exploratory search behavior. Journal
of Consumer Research 25:290–301, ISSN 00935301, URL http://dx.doi.org/10.1086/209540.

Kahn BE (1995) Consumer variety-seeking among goods and services. Journal of Retailing and Consumer
Services 2(3):139–148, ISSN 09696989.

Kahn BE (1998) Dynamic relationships with customers: High-variety strategies. Journal of the
Academy of Marketing Science 26(1):45–53, ISSN 00920703, URL http://dx.doi.org/10.1177/
0092070398261005.

Kok G, Şimşek AS (2021) Variety and Inventory Trade-off in Retailing: An Empirical Study, URL http:
//dx.doi.org/10.2139/ssrn.3791772, Working paper. Available at SSRN 3791772.

Lancaster K (1990) The Economics of Product Variety: A Survey. Marketing Science 9(3):189–206, ISSN
0732-2399, 1526-548X, URL http://dx.doi.org/10.1287/mksc.9.3.189.

Li MM, Liu X, Huang Y, Shi C, Hua C (2018) Integrating Empirical Estimation and Assortment Personal-
ization for E-Commerce: A Consider-Then-Choose Model, URL http://dx.doi.org/10.2139/ssrn.
3247323, Working paper. Available at SSRN 3247323.

Long X, Nasiry J (2022) Sustainability in the Fast Fashion Industry. Manufacturing & Service Operations
Management 24(3):1261–1885.

Long X, Sun J, Dai H, Zhang DJ, Zhang J, Chen Y, Hu H, Zhao B (2021) Choice Overload with Search
Cost and Anticipated Regret: Theoretical Framework and Field Evidence, URL http://dx.doi.org/
10.2139/ssrn.3890056, Working paper. Available at SSRN 3890056.

Lu G, Lee H, Son J (2022) Product variety in local grocery stores: Differential effects on stock-keeping
unit level sales. Journal of Operations Management 68(1):33–54, ISSN 0272-6963, 1873-1317, URL
http://dx.doi.org/10.1002/joom.1158.

Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis. (New York: Wiley), URL http://dx.
doi.org/10.2307/2343282.

McFadden D (1974) Conditional logit analysis of qualitative choice behavior. Zarembka P, ed., Frontiers in
Econometrics, 105–142 (New York: Academic Press).

Moe WW (2003) Buying, searching, or browsing: Differentiating between online shoppers using in-store
navigational clickstream. Journal of Consumer Psychology 13(1-2):29–39, ISSN 10577408, URL http:
//dx.doi.org/10.1207/s15327663jcp13-1&2_03.

Moe WW (2006) An empirical two-stage choice model with varying decision rules applied to Inter-
net clickstream data. Journal of Marketing Research 43(4):680–692, ISSN 00222437, URL http:
//dx.doi.org/10.1509/jmkr.43.4.680.

Paton E (2018) H&M, a fashion giant, has a problem: $4.3 billion in unsold clothes. https://www.nytimes.
com/2018/03/27/business/hm-clothes-stock-sales.html, accessed: 02-25-2021.

Petrin A, Train K (2010) A control function approach to endogeneity in consumer choice models. Journal of
Marketing Research 47(1):3–13, ISSN 00222437, URL http://dx.doi.org/10.1509/jmkr.47.1.3.

Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L (2009) BPR: Bayesian personalized ranking from
implicit feedback. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI
2009, 452–461.

Riesenegger L, Hübner A (2022) Reducing Food Waste at Retail Stores—An Explorative Study. Sustainability
(Switzerland) 14(5), ISSN 20711050, URL http://dx.doi.org/10.3390/su14052494.

Rooderkerk RP, Van Heerde HJ, Bijmolt TH (2011) Incorporating context effects into a choice model. Journal
of Marketing Research 48(4):767–780.

Segran E (2021) You thought the pandemic killed fast fashion? Not
even close. Fast Company URL https://www.fastcompany.com/90688828/
you-thought-the-pandemic-killed-fast-fashion-not-even-close.

Simonson I (1999) The Effect of Product Assortment on Buyer Preferences. Journal of Retailing 75(3):347–
370.

Statista (2022) Fast fashion market value forecast worldwide from 2021 to 2026. "https://www.statista.
com/statistics/1008241/fast-fashion-market-value-forecast-worldwide", (accessed: 03-08-
2023).

Tagliabue J, Greco C, Roy JF, Yu B, Chia PJ, Bianchi F, Cassani G (2021) SIGIR 2021 E-Commerce Work-
shop Data Challenge, URL http://arxiv.org/abs/2104.09423, arXiv preprint. arXiv:2104.09423.

Tan TF, Netessine S (2014) When does the devil make work? An empirical study of the impact of workload
on worker productivity. Management Science 60(6):1574–1593, URL http://dx.doi.org/10.1287/
8a886aa9-b76a-4a67-b881-e7c244855e1f.

Ton Z, Raman A (2010) The effect of product variety and inventory levels on retail store sales: A longitudinal
study. Production and Operations Management 19(5):546–560, ISSN 10591478, URL http://dx.doi.
org/10.1111/j.1937-5956.2010.01120.x.

Tversky A, Kahneman D (1991) Loss Aversion in Riskless Choice: A Reference-Dependent Model. The
Quarterly Journal of Economics 106(4):1039–1061.

van Herpen E, Pieters R (2002) The Variety of an Assortment: An Extension to the Attribute-Based
Approach. Marketing Science 21(3):331–341, ISSN 0732-2399, 1526-548X.

Wan X, Dresner ME, Evers PT (2014) Assessing the Dimensions of Product Variety on Performance: The
Value of Product Line and Pack Size. Journal of Business Logistics 35(3):213–224, ISSN 07353766,
URL http://dx.doi.org/10.1111/jbl.12054.

Wan X, Evers PT, Dresner ME (2012) Too much of a good thing: The impact of product variety on operations
and sales performance. Journal of Operations Management 30(4):316–324, ISSN 02726963, URL http:
//dx.doi.org/10.1016/j.jom.2011.12.002.

Wang R, Sahin O (2018) The impact of consumer search cost on assortment planning and pricing. Manage-
ment Science 64(8):3649–3666, ISSN 15265501, URL http://dx.doi.org/10.1287/mnsc.2017.2790.

e-companion to Matte, Gumus, Nasiry: Product Variety and Customer Behaviour ec1

Electronic Companion to “Product Variety and Customer Behaviour in Online Fast Fashion Retailing”

EC.1. Data Preparation, Variable Definition & Sample Data


EC.1.1. Data Preparation
Recall that we are only interested in dress shoppers, i.e., customers who have at least one dress
product in their consideration set. Our data collection and cleaning serve two purposes:
obtaining structured data on customers’ full sessions for clustering, and obtaining structured
data on the dress category only for estimation. For each dress shopper, we extract from the raw
clickstream data the list of product view, basket operation, and purchase events. If a customer
leaves the online store without making a purchase, we assign the no-purchase option as the
purchased product. If a dress shopper makes a purchase in a category other than dresses (e.g.,
pants), we modify the sales event to represent the no-purchase option (in the dress category).
We filter the customers in our data to retain only those who have interacted with five or more
dress products, which gives us customers with a sufficient number of interactions for clustering
and estimation. We assume that sessions of customers who interact with fewer than five products
do not contain enough information to provide insights into customer choice. This is common
practice in machine learning applications to recommender systems (e.g., Rendle et al. 2009, He
et al. 2016, Bigon et al. 2019). Furthermore, Moe (2003) empirically shows that customers
interacting with on average 2 pages were clustered together, and did not provide insights in
terms of customer behaviour or typology. We choose a cutoff of 5 products because, based on
preliminary inspection of our dataset, a cutoff of 10 products would be too limiting, while a
cutoff of two products would still leave too much noise in the data. Applying this cutoff rule
removes 66% of the unique customers in the dataset.
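The cutoff rule above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the (user, product, category) record layout is a hypothetical stand-in for the raw clickstream schema, and we assume that "five dress products" means five distinct products.

```python
from collections import defaultdict

MIN_DRESS_PRODUCTS = 5  # cutoff stated in the text


def filter_dress_shoppers(events, min_products=MIN_DRESS_PRODUCTS):
    """Return the users who interacted with at least `min_products` distinct dresses.

    `events` is an iterable of (user_id, product_id, category) tuples; this
    layout is an assumption made for the example only.
    """
    dress_products = defaultdict(set)
    for user_id, product_id, category in events:
        if category == "dress":
            dress_products[user_id].add(product_id)
    return {u for u, prods in dress_products.items() if len(prods) >= min_products}


# Toy data: u1 views five distinct dresses, u2 views two dresses and one pair of pants.
events = [("u1", f"p{k}", "dress") for k in range(5)]
events += [("u2", "p1", "dress"), ("u2", "p2", "dress"), ("u2", "p9", "pants")]
kept = filter_dress_shoppers(events)
```

Only `u1` survives the filter in this toy example; views in other categories do not count towards the cutoff.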
Furthermore, as mentioned previously, we are only interested in applying our model to each
customer’s first visit to the online store, and consequently remove any subsequent visits by
returning customers.
We also modify our dataset to retain customers who purchased more than one dress product in the
same sales event (e.g., a customer purchases 3 different dresses on her first visit). Since the
standard MNL model we use in our study assumes customers can only choose one product at a time
(i.e., purchase exactly one unit of one product), in those few cases (∼1% of users), we create
as many copies of the consideration set as unique products purchased, and assume separate
purchase events for each. Aouad et al. (2019) use the same approach. For the few cases (<1% of
users) in which customers purchase more than one unit of the same product, we modify the sales
event to be of one unit.
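A minimal sketch of this expansion, under stated assumptions: the function name, the `NO_PURCHASE` label, and the dictionary of purchased units are hypothetical and chosen for illustration only.

```python
NO_PURCHASE = "no_purchase"  # hypothetical label for the outside option


def expand_purchases(consideration_set, units_by_product):
    """Turn one session into single-choice events compatible with the MNL model.

    One copy of the consideration set is created per distinct purchased product;
    quantities above one are treated as a single unit. Purchases outside the
    consideration set (e.g., another category) are ignored, so such a session
    maps to the no-purchase option.
    """
    considered = list(consideration_set)
    purchased = [p for p in considered if units_by_product.get(p, 0) > 0]
    if not purchased:
        return [(considered, NO_PURCHASE)]
    return [(list(considered), p) for p in purchased]


# A customer considers three dresses, buys two units of d1 and one unit of d3.
choice_events = expand_purchases(["d1", "d2", "d3"], {"d1": 2, "d3": 1})
```

Each resulting event pairs the full consideration set with exactly one chosen alternative, which is the form the MNL likelihood expects.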
To have the information contained in the firm-side data at the same level as the clickstream
data, the first step is to aggregate, for each day, the SKU information at the product level, i.e.
aggregating information of all sizes of a product.

EC.1.1.1. Firm Data Imputation. We noted that a large fraction of the products (∼65%) had
missing entries for one or more dates of their selling season. We assume that the missing
entries correspond to dates on which there were no transactions (either customer purchases or
inventory restocks), and impute the missing entries based on the last available date. This
allows us to have continuous entries for each product in our dataset, which is necessary for the
computation of several variables and estimation of the model.
For example, for a hypothetical product that is put on the online store on 2018-02-20, we see daily
entries the first week of availability of the product (from 2018-02-20 to 2018-02-27), 5 subsequent
dates missing (2018-02-28 to 2018-03-04), and then daily entries for the remainder of the time it is
available (from 2018-03-05 to 2018-06-24). We impute the data by copying, for each missing date,
the row corresponding to the entry of 2018-02-27.
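The imputation described in this example amounts to a forward fill over calendar days. The sketch below assumes a hypothetical per-product dictionary keyed by date; the field names are illustrative and not taken from the firm-side data.

```python
from datetime import date, timedelta


def forward_fill(rows):
    """Fill every missing calendar day with a copy of the last available record.

    `rows` maps a date to that day's record for one product (hypothetical layout).
    """
    days = sorted(rows)
    filled = {}
    current, last = days[0], rows[days[0]]
    while current <= days[-1]:
        if current in rows:
            last = rows[current]
        filled[current] = dict(last)  # copy so imputed days do not share state
        current += timedelta(days=1)
    return filled


# Entries exist for 2018-02-26, 2018-02-27 and 2018-03-05; five dates are missing.
observed = {
    date(2018, 2, 26): {"stock": 12, "discount": 0.0},
    date(2018, 2, 27): {"stock": 9, "discount": 0.0},
    date(2018, 3, 5): {"stock": 7, "discount": 0.1},
}
filled = forward_fill(observed)
```

The five missing dates (2018-02-28 to 2018-03-04) all receive a copy of the 2018-02-27 record, mirroring the example in the text.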

EC.1.2. Outliers
Upon preliminary inspection of our dataset, we noted the extreme long-tail nature of the distribu-
tion of the size of the consideration set of each customer (the largest consideration set has 8,176
products). Because our data is collected from our partner retailer’s website, we were concerned some
data might not necessarily have been generated by humans, but rather by bots or web crawlers.
Note here that, as opposed to other applications (e.g., clinical trials), what we call outliers here
are neither measurement errors nor novel data. They are data points generated outside the data
generating process we aim to study, i.e. human customers shopping online. Therefore, we remove
those data points.
The simple rule of Q3 + 1.5 × IQR felt restrictive when compared to the distribution. Therefore,
we opted for the rule that an outlier is any data point lying more than two standard deviations
from the mean. Applying this rule, we remove any customer that has more than 72 products in her
(full) consideration set (1% of the unique customers), or more than 22 products in her dress
consideration set (3% of the unique customers).
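The two candidate rules can be compared on a toy long-tailed sample, as sketched below. The data are invented, the quartiles use the default method of Python's `statistics.quantiles` (which may differ from the quartile definition used in the study), and only the upper cutoff is computed because consideration-set sizes are bounded below.

```python
import statistics


def two_sd_cutoff(sizes):
    """Upper cutoff: mean plus two (population) standard deviations."""
    return statistics.mean(sizes) + 2 * statistics.pstdev(sizes)


def boxplot_cutoff(sizes):
    """Upper cutoff of the classical box plot rule: Q3 + 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(sizes, n=4)
    return q3 + 1.5 * (q3 - q1)


# Invented consideration-set sizes with one bot-like session of 8,000 products.
sizes = [5] * 50 + [10] * 30 + [20] * 15 + [60] * 4 + [8000]
flagged_two_sd = [s for s in sizes if s > two_sd_cutoff(sizes)]
flagged_boxplot = [s for s in sizes if s > boxplot_cutoff(sizes)]
```

On this sample the box plot rule flags every session above 17.5 products, whereas the two-standard-deviation rule flags only the extreme session, which illustrates why the former can feel restrictive on a long-tailed distribution.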
We later learned of a method proposed by Hubert and Vandervieren (2008) which applies
specifically to skewed distributions, and is considered more robust than the original box plot
approach. The thresholds obtained from this method are slightly different from those obtained
from our simple standard deviation rule (more than 61 products in the (full) consideration set,
more than 32 products in the dress consideration set). We rerun the entire data pipeline and the
estimation (for the aggregate model only), and find that our results are robust to this
difference. Overall, there is a 2% increase in the number of unique customers between the two
approaches (from 159,772 to 162,884 unique customers).
Clustering provides a similar segmentation, both in terms of the proportion of customers and
descriptive statistics. Clustering results further confirm that K = 5 is the optimal number of
clusters in our case. We re-estimate the coefficients for the aggregate model, and find them to
be consistent with the results presented in Table 2. We present in Table EC.1 below a comparison
of the estimation results for the variety model without the control function (Column (3) of
Table 2).

                   Variety Model    Variety Model with revised
                                    cutoff for outliers
                        (1)                  (2)
price                −0.031∗∗∗           −0.030∗∗∗
                     (0.0004)             (0.0004)
discount              1.134∗∗∗             1.104∗∗∗
                     (0.091)              (0.089)
age                  −0.003∗∗∗           −0.003∗∗∗
                     (0.0005)             (0.0004)
broken.assort        −0.729∗∗∗           −0.731∗∗∗
                     (0.036)              (0.035)
view.count            0.584∗∗∗             0.580∗∗∗
                     (0.011)              (0.011)
tot.styles            0.080∗∗∗             0.083∗∗∗
                     (0.005)              (0.004)
tot.styles²          −0.001∗∗∗           −0.001∗∗∗
                     (0.00003)            (0.00003)
tot.colours          −0.586∗∗∗           −0.596∗∗∗
                     (0.015)              (0.014)
tot.colours²          0.015∗∗∗             0.015∗∗∗
                     (0.0004)             (0.0004)
norm.density        −25.761∗∗∗          −25.207∗∗∗
                     (1.141)              (1.109)
norm.density²        47.659∗∗∗            46.072∗∗∗
                     (3.416)              (3.338)

Observations         1,498,821            1,595,556
Log-Likelihood      −22,022.97           −23,219.00

∗: p ≤ 0.1; ∗∗: p ≤ 0.05; ∗∗∗: p ≤ 0.01; (std dev)
Table EC.1 Comparison of Estimation Results

EC.1.3. Variable Definition


We present below a short description of the variables included in our model specification. Table
EC.2 below presents a sample of the data used for the choice estimation, Table EC.3 presents
summary statistics, and Table EC.4 the correlation matrix for the variety variables.
Price (Base Price). We include the base price (in the local currency) of each product as it has
been shown to be a strong predictor of purchasing decision. Base prices are obtained from firm-side
data.
Discount rate. The discount rate indicates the level of promotion of a product on a given date,
and ranges from 0 (no discount) to 1 (fully discounted). The discount rate is calculated with
respect to the base price of the product. The daily discount rate is included in our model to
capture customers’ sensitivity to discounts.
Age. We include the age of a product, that is, the number of days it has been on the website,
as a proxy for its position on the category page.
Broken Assortments. Recall that we do not have the size information (i.e., SKU-level
information) in our clickstream data. In addition, recall from §3.1 that inventory information
is not readily available to customers, and that they only learn that a product is stocked out
when adding it to their basket. We therefore define a proxy variable to control for the
possibility that a customer finds her preferred product stocked out, leading her to not purchase
the product (and either substitute the next best product, or leave without a purchase).
Following the broken assortment literature, we add the binary variable broken.assort, equal to 1
if at least one of the sizes is stocked out, and 0 otherwise.
View count. View counts indicate the number of times a customer has looked at a product
during her session. We include it as it has been observed that customers that return to a product
more often may be more likely to buy it.
A note on colour grouping (for tot.colours and norm.density). The retailer has a total
of 3,478 different colour codes for the seasons we restrict our application to. Those are highly
specific colour codes that have a product-differentiation purpose on the retailer’s end more than
for customers. On the online store, products of the dress category are categorized into 80 different
colours that are found in the product descriptions. Those colours cover the full range of “basic”
colours (e.g., red, green, blue, black, white, etc.), but also include specific colours (e.g., mint, navy,
bordeaux, etc.) as well as patterns (e.g., black striped, black patterned, turquoise check, etc.).
Finally, the retailer also has a higher-level categorization of colours with 15 distinct categories.
Those include only “basic” colours.

To define the subcategory variety variables tot.colours and norm.density, we use the middle
level of 80 colours. This allows us to retain enough flexibility in the definition of the
different colours a customer can perceive (black, black striped and black patterned are all
assumed to be distinct colours/patterns). In our data, no subcategory ever contains more than 30
of the 80 colours.
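As an illustration of the broken.assort definition given above, the sketch below derives the flag when SKU-level (size-level) inventory is aggregated to the product-day level. The row layout (day, product, size, units in stock) is a hypothetical stand-in for the firm-side data.

```python
from collections import defaultdict


def broken_assort_flags(sku_rows):
    """Return {(day, product_id): 1 if any size is stocked out, else 0}.

    `sku_rows` is an iterable of (day, product_id, size, units_in_stock) tuples;
    this layout is an assumption made for the example.
    """
    stocked_out = defaultdict(bool)
    for day, product_id, _size, units in sku_rows:
        stocked_out[(day, product_id)] |= units <= 0
    return {key: int(flag) for key, flag in stocked_out.items()}


sku_rows = [
    ("2018-03-01", "dressA", "S", 3),
    ("2018-03-01", "dressA", "M", 0),  # size M is stocked out
    ("2018-03-01", "dressA", "L", 5),
    ("2018-03-01", "dressB", "S", 1),
    ("2018-03-01", "dressB", "M", 2),
]
flags = broken_assort_flags(sku_rows)
```

In this toy input `dressA` is flagged as a broken assortment on 2018-03-01 while `dressB` is not.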

userId        productId        price   discount  age  view.cnt  broken.assort  tot.styles  tot.colours  norm.density
user 1 hash   product 1 hash   149.99  0.47      129  1         1              11          6            0.036
user 1 hash   product 2 hash   179.99  0.00      63   1         1              40          16           0.015
user 1 hash   product 3 hash   149.99  0.57      129  1         1              21          8            0.020
user 1 hash   product 4 hash   249.99  0.48      129  1         0              21          8            0.020
user 1 hash   product 5 hash   199.99  0.50      129  1         1              40          16           0.015
Table EC.2 Sample Choice Model Estimation Data

Variable        Mean     Std. Dev.  Min.    Max.
price           131.70   54.08      24.99   299.99
discount        0.25     0.23       0       0.76
age             69       47         1       173
broken.assort   0.70     0.46       0       1
view.count      1.22     0.63       1       36
tot.styles      35.56    27.76      1       115
tot.colours     13.03    6.94       1       29
norm.density    0.02     0.04       0       0.5
Table EC.3 Descriptive Statistics

               tot.styles   tot.colours   tot.products   norm.density
tot.styles       1.000000      0.839983       0.993009      -0.180424
tot.colours                    1.000000       0.881020      -0.170871
tot.products                                  1.000000      -0.163321
norm.density                                                 1.000000
Table EC.4 Correlation Matrix–Variety Variables

EC.2. Estimation Procedure


In the first stage, a customer looks at some or all of the products in the offer set $S_t$
proposed by the retailer on that date, and adds product $j$ to her consideration set $C_{it}$
with probability $\lambda_{i(jt)}$. We assume that a customer adds a product to her
consideration set if, from the category page, she clicks on the product to access the product
page. The same assumption is used by Moe (2006), Honhon et al. (2017), and Aouad et al. (2019).

In the second stage, she chooses the product in her consideration set that maximizes her
utility. This stage can be readily modeled using a Multinomial Logit (MNL) choice model on each
customer’s consideration set, for which we can define the probability of customer $i$ choosing
product $j \in C_{it} \cup \{0\}$ on date $t$ as
\[
q_{i(jt)} := \frac{\exp\{u_{i(jt)}\}}{\sum_{k \in C_{it}} \exp\{u_{i(kt)}\} + 1} \tag{EC.1}
\]

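Define the probability of customer $i$ choosing product $j$ on date $t$ given the offer set
$S_t$ as
\[
\pi_{i(jt)}(S_t) := \lambda_{i(jt)} \, q_{i(jt)} \tag{EC.2}
\]
Then, assuming each customer decision is independent of all others, the likelihood function
defining the probability of each customer choosing the alternative observed in the data can be
written as
\[
L(\beta, \delta, \eta) := \prod_t \prod_i \prod_{j \in C_{it}} \lambda_{i(jt)} \left(q_{i(jt)}\right)^{y_{i(jt)}} \prod_{k \notin C_{it}} \left(1 - \lambda_{i(kt)}\right) \tag{EC.3}
\]
where $y_{i(jt)}$ is a binary variable equal to 1 if customer $i$ chooses product $j$ on date
$t$. Taking the logarithm of eq. (EC.3), the log-likelihood can then be written as
\[
L(\beta, \delta, \eta) := \sum_t \sum_i \left[ \sum_{j \in C_{it}} \left( \log \lambda_{i(jt)} + y_{i(jt)} \log q_{i(jt)} \right) + \sum_{k \notin C_{it}} \log\left(1 - \lambda_{i(kt)}\right) \right] \tag{EC.4}
\]
\[
= \sum_t \sum_i \sum_{j \in C_{it}} \log \lambda_{i(jt)} + \sum_t \sum_i \sum_{k \notin C_{it}} \log\left(1 - \lambda_{i(kt)}\right) + \sum_t \sum_i \sum_{j \in C_{it}} y_{i(jt)} \log q_{i(jt)} \tag{EC.5}
\]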

From eq. (EC.5), we note that the log-likelihood function separates into two independent
maximization problems, one for each stage of the consider-then-choose model. Since the first
stage of the process is given, i.e. we observe exactly what each customer includes in their
consideration sets from the clickstream data, we focus our attention exclusively on the second
stage estimation. We use Maximum Likelihood Estimation (MLE) to estimate model parameters
$\beta$ and $\eta$. Consequently, we maximize the following log-likelihood function
\[
L(\beta, \delta, \eta) := \sum_t \sum_i \sum_{j \in C_{it}} y_{i(jt)} \log q_{i(jt)} \tag{EC.6}
\]

which we can recognize as the classical MNL model log-likelihood, estimated on each customer’s
consideration set $C_{it}$. The log-likelihood of the MNL model is concave in the model
parameters (McFadden 1974) and can therefore be easily maximized. All model estimations were
carried out in R using a custom Newton-Raphson algorithm with line search (R code is available
upon request).
We consider customer types independently for the estimation of the choose-stage coefficients,
based on the clustering obtained from the K-Means algorithm.
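To make the second-stage estimation concrete, the sketch below maximizes the MNL log-likelihood with a Newton-Raphson iteration and a simple step-halving line search. It is not the authors' R implementation: it is written in plain Python, assumes a utility that is linear in product features with the no-purchase utility normalized to zero, and all function names and the toy data are hypothetical.

```python
import math


def choice_probs(beta, X):
    """Return [q_0, q_1, ..., q_J]; index 0 is the no-purchase option (utility 0)."""
    u = [sum(b * x for b, x in zip(beta, row)) for row in X]
    m = max(0.0, max(u))  # shift utilities to keep exp() numerically stable
    e = [math.exp(-m)] + [math.exp(v - m) for v in u]
    total = sum(e)
    return [v / total for v in e]


def log_lik(beta, sessions):
    """Second-stage log-likelihood: sum of log q for the observed choice y."""
    return sum(math.log(choice_probs(beta, X)[y]) for X, y in sessions)


def grad_hess(beta, sessions):
    """Gradient and Hessian of the log-likelihood with respect to beta."""
    p = len(beta)
    g = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for X, y in sessions:
        q = choice_probs(beta, X)[1:]  # the outside option has all-zero features
        xbar = [sum(qj * row[a] for qj, row in zip(q, X)) for a in range(p)]
        for a in range(p):
            g[a] += (X[y - 1][a] if y > 0 else 0.0) - xbar[a]
            for b in range(p):
                exx = sum(qj * row[a] * row[b] for qj, row in zip(q, X))
                H[a][b] -= exx - xbar[a] * xbar[b]
    return g, H


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_mnl(sessions, p, tol=1e-8, max_iter=50):
    """Newton-Raphson with step halving; sessions is a list of (features, choice)."""
    beta = [0.0] * p
    for _ in range(max_iter):
        g, H = grad_hess(beta, sessions)
        if max(abs(v) for v in g) < tol:
            break
        step = solve([[-h for h in row] for row in H], g)  # Newton direction
        t, base = 1.0, log_lik(beta, sessions)
        # Halve the step until the log-likelihood no longer decreases.
        while t > 1e-10 and log_lik(
            [b + t * s for b, s in zip(beta, step)], sessions
        ) < base - 1e-12:
            t /= 2
        beta = [b + t * s for b, s in zip(beta, step)]
    return beta


# Toy data: two products with indicator features; choice 0 means no purchase.
two_products = [[1.0, 0.0], [0.0, 1.0]]
sessions = [(two_products, 1)] * 3 + [(two_products, 2)] + [(two_products, 0)] * 2
beta_hat = fit_mnl(sessions, p=2)
```

With indicator features the maximum likelihood estimate reproduces the observed choice shares (1/2, 1/6 and 1/3 for product 1, product 2 and no purchase), so the fitted coefficients can be checked against log(1.5) and log(0.5).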

Finally, we measure the goodness-of-fit of our model using three different metrics: the Akaike
Information Criterion ($\mathrm{AIC} := -2L + 2k$), the Bayesian Information Criterion
($\mathrm{BIC} := -2L + \log(n)\,k$), and McFadden’s pseudo-$R^2$
($R^2_{\mathrm{McFadden}} := 1 - L/L_0$). In the previous equations, $L$ is the log-likelihood
of the estimated model, $k$ is the number of covariates included in the model, and $n$ is the
number of observations in the dataset. $L_0$ corresponds to the log-likelihood of the null
model, for which all products have an equal choice probability (i.e.,
$q_{i(jt)} = 1/|C_{it} \cup \{0\}|$).
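The three metrics follow directly from these definitions, as the short sketch below shows. The function names and the numbers in the example are hypothetical; the null log-likelihood assumes one observed choice per consideration set, with the no-purchase option counted as an alternative.

```python
import math


def null_log_lik(consideration_set_sizes):
    """Log-likelihood when every alternative, including no purchase, is equally likely."""
    return -sum(math.log(size + 1) for size in consideration_set_sizes)


def goodness_of_fit(log_lik, log_lik_null, k, n):
    """Return (AIC, BIC, McFadden pseudo-R^2) as defined in the text."""
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + math.log(n) * k
    pseudo_r2 = 1 - log_lik / log_lik_null
    return aic, bic, pseudo_r2


# Hypothetical fit: log-likelihood -10 versus a null of -20, 2 covariates, 100 observations.
aic, bic, pseudo_r2 = goodness_of_fit(-10.0, -20.0, k=2, n=100)
```

Lower AIC and BIC indicate a better trade-off between fit and model size, while a pseudo-R² closer to one indicates a larger improvement over the equal-probability null model.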

EC.3. Endogeneity Results


Table EC.1 presents the results of the first-stage regressions. Note that the first-stage
regression is at the product level, and does not include any customer-specific data (view.count
variable). Consequently, the first-stage regression is identical for both customer types.
The first-stage results show a high goodness-of-fit (R² = 0.941 for discount and R² = 0.864 for
broken.assort), and therefore support the use of the instruments in the control function
approach. The discount model (Column (1)) explains 94% of the variance in discount, and the
broken.assort model (Column (2)) explains 86% of the variance in broken.assort. In each model
the lagged instrument (discount.lag and broken.assort.lag, respectively) has a coefficient close
to one (0.965 and 0.917) and is significant at the 1% level, showing that the chosen instruments
significantly contribute to the prediction of the endogenous variables.
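The mechanics of a first-stage regression in a control function approach (Petrin and Train 2010) can be sketched as follows. This is a deliberately simplified illustration with a single regressor (the lagged value of the endogenous variable) and invented data; the first stage reported in Table EC.1 also includes the other product-level covariates.

```python
def first_stage(instrument, endogenous):
    """OLS of the endogenous variable on one instrument plus an intercept.

    Returns (intercept, slope, r_squared, residuals). The residuals are what a
    control function approach adds as an extra regressor in the choice model.
    """
    n = len(instrument)
    mx = sum(instrument) / n
    my = sum(endogenous) / n
    sxx = sum((x - mx) ** 2 for x in instrument)
    sxy = sum((x - mx) * (y - my) for x, y in zip(instrument, endogenous))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(instrument, endogenous)]
    sst = sum((y - my) ** 2 for y in endogenous)
    r_squared = 1 - sum(e * e for e in residuals) / sst
    return intercept, slope, r_squared, residuals


# Invented daily discount path for one product; the instrument is the one-day lag.
discount = [0.0, 0.0, 0.1, 0.1, 0.3, 0.3, 0.5]
discount_lag, discount_today = discount[:-1], discount[1:]
intercept, slope, r_squared, residuals = first_stage(discount_lag, discount_today)
```

Because discounts change slowly from one day to the next, the lagged value is a strong predictor of the current one, which is the property the first-stage goodness-of-fit is meant to document.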

EC.4. Customer Typology


We develop in this section the theoretical background of the customer behaviour and formalize the
two-stage consider-then-choose framework. In our study, we assume that a customer has imperfect
information about product fit, i.e. she does not know a priori which product from the offered
assortment she prefers, and therefore engages in a search and deliberation behaviour to resolve
this uncertainty. Since the search process is costly, and the offered assortment is large, a customer
will only look at a subset of the offered products (the consider stage) before deliberating if she
either selects one of the products viewed, or leaves the online store without making a purchase (the
choose stage).
Recall that the utility of product $j \in S_t$ for customer $i$ on date $t$ is expressed in eq. (4).

Assumption EC.1. Demand is unit-inelastic.

In other words, we assume that a customer that purchases a product purchases at most one unit
of the product.

Assumption EC.2. There are two main types of customers visiting the online store: those who
exhibit a goal-directed search behaviour, and those who exhibit an exploratory search behaviour.
The former have a higher propensity to convert than the latter. A customer is aware of her type
prior to starting an online session.

                                       Dependent variable:
                                  discount          broken.assort
                                    (1)                  (2)
price                           0.00002∗∗∗          −0.00005∗∗∗
                                (0.00001)            (0.00002)
discount                            -                −0.014
                                                     (0.013)
age                             0.00001∗∗            0.0001∗∗∗
                                (0.00001)            (0.00002)
broken.assort                  −0.001                    -
                                (0.001)
num.styles                     −0.0001∗             −0.0001
                                (0.0001)             (0.0002)
num.styles²                     0.00000∗∗            0.00000
                                (0.00000)            (0.00000)
num.colours                    −0.0001              −0.0002
                                (0.0002)             (0.001)
num.colours²                    0.00000             −0.00001
                                (0.00001)            (0.00002)
norm.density                   −0.038∗∗∗            −0.012
                                (0.015)              (0.046)
norm.density²                   0.010                0.070
                                (0.041)              (0.129)
discount.lag                    0.965∗∗∗             0.035∗∗∗
                                (0.001)              (0.013)
broken.assort.lag               0.0001               0.917∗∗∗
                                (0.001)              (0.002)

Observations                    62,266               62,266
R²                              0.941                0.864
Adjusted R²                     0.940                0.864
F Statistic (df = 11; 62157)    89,416.260∗∗∗        36,047.150∗∗∗

∗: p ≤ 0.1; ∗∗: p ≤ 0.05; ∗∗∗: p ≤ 0.01; (std dev)
Table EC.1 First-Stage Regression Estimation Results

We base this assumption on the idea that search behaviour can be separated into two main cate-
gories: goal-oriented and exploratory search (Janiszewski 1998). We assume that some customers
visit the online store motivated by making a purchase. They either have a clear idea of a product to
purchase, or can be converted to purchase a product while visiting the online store. Naturally, those
customers have a higher conversion rate. We refer to those customers as goal-directed customers.
On the other hand, we also assume that some customers are motivated by simply browsing and
learning the assortment offered. Naturally, those customers have a significantly lower propensity
to convert to a purchase. We refer to those customers as exploratory customers.
We distinguish between sessions of the same customer, and allow for the possibility that a
customer is of the goal-directed type in one session, and of the exploratory type in another.
To simplify our setting, however, we focus in this study on a customer’s first session at the
online store. We motivate this choice by our desire to remove any complementarity or sequential
effects that might lead a customer to return to the online store to purchase a product she saw
previously. Put differently, we focus on customers’ first impression of an assortment, and how
it affects their choice.
Based on this segmentation, we assume that a goal-directed customer will have a session which
exhibits a more directed and structured search pattern than that of an exploratory customer.
Specifically, we focus on four main dimensions of the customer session to refine the customer
behaviour by type: category pages, product pages, pricing, and basket operations. We make the
following additional assumptions about the online session behaviour of the two customer types.

Assumption EC.2a. The search behaviour of goal-directed customers is structured and focused.
It is characterized by interacting with a few product categories, but many products within, with a high
number of repeat views for certain products. Their session is also characterized by multiple basket
operations. Consideration sets tend to be large. Goal-directed customers are more price sensitive,
particularly to discounts.

Because goal-directed customers are motivated by a purchase, and may have an idea of the product
they would like to purchase, they tend to focus on particular product categories where they
could find their preferred product. Their session is therefore focused on accessing product
information (i.e., product page views). Consequently, goal-directed customers will interact with
many products from a few categories. Because they are trying to find the best product to
purchase, they will look at multiple substitute products during their session, and it is typical
for them to see certain products multiple times. Consequently, they will have many total product
views, and a large consideration set. Finally, because their session is motivated by making a
purchase, goal-directed customers will tend to have many basket operations, and will be
sensitive to base price and any discounts.

Assumption EC.2b. The search behaviour of exploratory customers is characterized by interacting
with many product categories, but few products within each. Their session is also characterized
by few or no repeat product views and basket operations. Consideration sets tend to be small.
Exploratory customers are less price sensitive.

Because exploratory customers are motivated by browsing and learning the assortment, they have
category-focused sessions. They will access multiple product categories, but will interact with
a moderate number of products within each. Consequently, total views will be low, and their
consideration sets small. Because they are less focused on purchasing a product, they tend to be
less sensitive to price and discounts, and seldom perform basket operations.
Table EC.1 below presents an overview of our assumptions for each customer type along the
dimensions of a session identified. Note that the typology presented in Assumptions EC.2 to
EC.2b borrows from the work by Moe (2003). In particular, we base the assumed session behaviour
in Table EC.1 on the empirical findings presented by Moe (2003).

                                                    Exploratory Customer   Goal-directed Customer
Category Pages                                      High                   Low
Unique Products per Category Page                   Low                    High
Category Pages to Unique Products                   High                   Low
Total Product Views                                 Low                    High
Unique Product Views (Size of Consideration Set)    Low                    High
Repeat Views of Product Pages                       Low                    High
Price Sensitivity                                   Low                    High
Add-to-Basket                                       Low                    High
View Basket                                         Low                    High
Table EC.1 Assumed Session Behaviour per Customer Type

We assume that exploratory customers constitute the majority of the customers visiting the
online store, for the simple reason that the upfront cost of visiting an online store nowadays
is negligible compared to that of visiting a brick-and-mortar store. Consequently, it is easy
for customers to visit an online store to browse and learn a retailer’s offering, compare with
other retailers, or simply pass the time by looking at clothes. This assumption is supported by
observed conversion rates of 1% in a typical online store (e.g., Moe 2003, Tagliabue et al.
2021, this study), versus an observed conversion rate that is usually much higher in
brick-and-mortar stores (e.g., ∼6% as reported by Boada-Collado and Martı́nez-De-Albéniz 2020).

Assumption EC.3. There exists an outside (no-purchase) option available at all times. Cus-
tomers can learn the utility, both the deterministic and random part, of the outside option at no
cost.

In other words, the customer can, at any point in her session, decide to leave the store without
making a purchase, and she knows exactly the utility of doing so.

Assumption EC.4. Customers form their consideration set by balancing search cost and prod-
uct match. Exploratory customers have a higher search cost than goal-directed customers.

In the first stage of the consider-then-choose framework, the consider stage, a customer will select a
subset of products from the offered assortment to consider further before making her final purchase
decision. The size of the subset accounts for the costly search, and therefore balances the trade-off
between finding a better alternative (but not necessarily the next one) and the cost of searching for
a longer time. Due to the large size of the offering, it is unlikely that a customer will look through
the entire offer set. Consequently, a large portion of products in the offer set $S_t$ will never be seen by
customers because they either have found a product they like enough within the first few they see,
or deemed the offer set was not matching their expectations and left without making a purchase. It
is assumed that when the customer leaves the online store, the sale is lost and the customer never
returns to the store. Because of their type, we assume the search process of exploratory customers
is more costly, compared to goal-directed customers. Consequently, exploratory customers will on
average have a smaller consideration set than goal-directed customers. This is supported empirically
by Chernev (2003a), who shows that customers with an idea of what they want to purchase (i.e.
goal-directed customers) prefer larger assortments to those without an articulated idea. This is
reflected in Table EC.1 and supported empirically (Moe 2003). The first stage ends when the
customer has identified her optimal consideration set.
During the consider stage, the customer only decides on the products to include in the
consideration set, but does not resolve uncertainty about them.

Assumption EC.5. Customers resolve uncertainty towards a product by accessing the product
page.

Based on the description of the online store provided in §3.1, on the product category page, a
customer only has access to a limited amount of information. While this limited information is
enough for a customer to decide if she wants to seek further information about a specific product,
we assume it is not enough for her to fully resolve the uncertainty about the product. Consequently,
a customer has to access the product page to obtain full information about products and resolve
uncertainty.

Assumption EC.6. Customers choose the alternative which maximizes their utility, after
resolving any uncertainty about products.

In other words, after defining their consideration set and resolving uncertainty about the products
within, a customer enters the choose stage, where she chooses the product that maximizes her
utility.
Here, one of two scenarios may happen. On the one hand, a customer may find that her preferred
product is stocked out (e.g., she sees depleted inventory for her size when trying to add to basket).
In this case, she may either substitute the next best product, or leave without making a purchase.
Otherwise, she chooses the product that maximizes her utility and proceeds to checkout.

Assumption EC.7. Customer choice is a function of product variety.

In addition to product-specific variables, we assume that customer choice depends on the context
in which it is made. Recall that the consider-then-choose framework naturally mitigates potential
violations of the IIA assumption because it implicitly models the consideration set formation
(Wang and Sahin 2018). However, product variety may still significantly influence customer choice
in the second stage. We propose to account for this directly by disaggregating the deterministic
part of the utility presented in eq. (4) into a product-specific and a variety-specific utility as:

$$U_{i(jt)} := u^{p}_{i(jt)} + u^{v}_{i(jt)} + \xi_{i(jt)} \tag{EC.7}$$

where $u^{p}_{i(jt)}$ corresponds to the deterministic product-specific utility, $u^{v}_{i(jt)}$ corresponds to the
deterministic variety-specific utility, and $\xi_{i(jt)}$ are the iid idiosyncratic shocks.
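
To illustrate how the decomposition in eq. (EC.7) can feed into choice probabilities, the following minimal sketch assumes that the idiosyncratic shocks are Gumbel-distributed, which yields logit probabilities. This distributional choice, the function name, the zero utility of the outside option, and the simplification that the variety-specific utility is common to all considered products are assumptions made for illustration only; they are not taken from the estimation procedure of the paper.

```python
import math

def choice_probabilities(product_utils, variety_util, outside_util=0.0):
    """Logit choice probabilities over a consideration set plus the outside option.

    product_utils: deterministic product-specific utilities (one per considered product)
    variety_util:  deterministic variety-specific utility (assumed common to the set)
    outside_util:  deterministic utility of the no-purchase option (assumed 0 here)
    """
    inside = [u_p + variety_util for u_p in product_utils]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(inside + [outside_util])
    weights = [math.exp(u - shift) for u in inside]
    outside_weight = math.exp(outside_util - shift)
    total = sum(weights) + outside_weight
    return [w / total for w in weights], outside_weight / total

# Hypothetical utilities for a two-product consideration set.
probs_low, no_buy_low = choice_probabilities([1.0, 0.5], variety_util=0.0)
probs_high, no_buy_high = choice_probabilities([1.0, 0.5], variety_util=1.0)
print(no_buy_low, no_buy_high)
```

Under these assumptions, a higher variety-specific utility leaves the relative odds between the considered products unchanged and only lowers the probability of the no-purchase option.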
Based on the above assumptions, we formulate below the main hypothesis of this paper, which
we aim to test empirically.

Hypothesis EC.1. Exploratory customers are positively affected by variety, and possess a
threshold beyond which they become indifferent. Goal-directed customers possess an optimal level
of variety.

In other words, we posit that exploratory customers are generally positively affected by variety.
Recall that this customer type derives utility from browsing the online store (hedonic browsing) or
learning the assortment (extending market expertise). Consequently, more variety translates into
more information. However, beyond some level, even exploratory customers can no longer see and
appreciate additional variety. Conversely, goal-directed customers have an optimal level of variety.
Too little variety does not let them feel that they have much choice, but too much variety leads to
choice paralysis.

EC.5. Clustering Procedure & Results


EC.5.1. Clustering Variables
We present here a detailed definition of the clustering variables, in addition to our motivation for
using them. Table EC.1 presents descriptive statistics for the clustering variables, Table EC.2 a
sample of the data used for the clustering algorithm, and Table EC.3 the correlation matrix for
the clustering variables.
cat.pages captures how many unique category pages a customer visits during a session. Note
that this does not equate to the number of product categories in a customer’s consideration set,
but rather reflects how many category pages they access. avg.prod.cat captures, in a compact
statistic, the number of unique products viewed per category page accessed. Because one of our main
assumptions is that one customer type will view more products than another, we also include
std.prod.cat to measure the dispersion of the number of products viewed across category pages.
repeat.cat.ratio captures how focused on the category pages each customer is during their session.
A low ratio indicates a low number of category pages relative to unique products, while a large
ratio indicates a scattered interest. tot.views measures the total number of product views during a
session (including repeat views), and prod.pages corresponds to the size of the consideration set for
the full session (i.e. not restricted to the dress category), while repeat.prod.ratio captures whether
customers exhibit the behaviour of returning multiple times to products when comparing and slowly
progressing to a preferred product. max.views captures the maximum number of times a product was
viewed during a session, as this is also considered a strong indicator of conversion.
avg.consid.set.price and avg.consid.set.discount capture the price sensitivity of a customer’s
session in compact statistics. Finally, add.cnt and bv.cnt capture basket operations statistics
(add-to-basket and view-basket counts), which are also considered strong indicators of conversion.

Variable                   Mean      Std. Dev.  Min.     Max.
cat.pages                  1.958     2.186      1        35
avg.prod.cat               9.307     4.811      2.200    65
std.prod.cat               1.111     1.982      0        22.054
repeat.cat.ratio           0.161     0.085      0.015    0.697
tot.views                  15.053    11.805     5        191
prod.pages                 12.395    8.955      5        74
repeat.prod.ratio          1.207     0.282      1        6.375
max.views                  2.122     1.327      1        36
avg.consid.set.price       123.74    38.28      20.66    289.99
avg.consid.set.discount    0.240     0.143      0        0.771
add.cnt                    0.488     1.781      0        96
bv.cnt                     0.556     2.494      0        80
Table EC.1 Clustering Variables–Descriptive Statistics

From Table EC.1, it is interesting to see that the descriptive statistics of prod.pages, which
corresponds to the size of the consideration set, support our consider-then-choose approach.
On average, for their full session (i.e. not limited to dress products), customers in our data
have a consideration set of ∼12 products, with a maximum of 74 (after removal of outliers), which
is only a fraction of the average of ∼30,000 products available on the online store on any day.
Finally, we log-transform, center, and scale all variables used in the K-Means algorithm, as the
raw variables did not meet the Normality assumption. After these transformations, all variables met,
or came close enough to meeting, the Normality assumption to proceed with the K-Means clustering
algorithm.
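
The preprocessing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not say how zero values (e.g., in add.cnt or bv.cnt) are handled, so the sketch assumes a log(1 + x) transform; the helper name and the input values are hypothetical.

```python
import math
import statistics

def log_standardize(values):
    """Apply log(1 + x), then center to mean 0 and scale to unit standard deviation."""
    # Assumption: log1p is used so that zero counts remain defined.
    logged = [math.log1p(v) for v in values]
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)
    if std == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in logged]
    return [(x - mean) / std for x in logged]

# Hypothetical add.cnt values for five sessions (illustrative only, not from the data).
add_cnt = [0, 0, 1, 4, 5]
scaled = log_standardize(add_cnt)
print([round(x, 3) for x in scaled])
```

Each clustering variable would be transformed column by column in this way before being passed to a K-Means routine, so that no single variable dominates the Euclidean distances because of its scale.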

userId       cat.pages  avg.prod.cat  std.prod.cat  repeat.cat.ratio  tot.views  prod.pages  repeat.prod.ratio  max.views  avg.consid.set.price  avg.consid.set.discount  add.cnt  bv.cnt
user 1 hash  1          8.000         0.000         0.125             51         8           6.375              36         107.49                0.18                     0        0
user 2 hash  1          20.000        0.000         0.050             112        20          5.600              36         178.49                0.37                     4        24
user 3 hash  1          29.000        0.000         0.034             91         29          3.138              29         71.78                 0.01                     5        18
user 4 hash  8          14.405        9.385         0.216             83         37          2.243              26         85.67                 0.03                     5        15
user 5 hash  5          14.184        4.640         0.102             114        49          2.327              26         87.49                 0.21                     5        14
Table EC.2 Sample Clustering Data
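
As a small worked check, the sample rows in Table EC.2 are consistent with the two ratio variables being computed as cat.pages / prod.pages (repeat.cat.ratio) and tot.views / prod.pages (repeat.prod.ratio). These formulas are inferred from the sample values rather than stated explicitly in the text, and the function name below is hypothetical.

```python
def session_ratios(cat_pages, tot_views, prod_pages):
    """Return (repeat.cat.ratio, repeat.prod.ratio) under the inferred definitions."""
    repeat_cat_ratio = cat_pages / prod_pages    # category pages per unique product
    repeat_prod_ratio = tot_views / prod_pages   # total views per unique product
    return round(repeat_cat_ratio, 3), round(repeat_prod_ratio, 3)

# (cat.pages, tot.views, prod.pages) for the five sample sessions in Table EC.2.
sample = [(1, 51, 8), (1, 112, 20), (1, 91, 29), (8, 83, 37), (5, 114, 49)]
ratios = [session_ratios(*row) for row in sample]
for row, pair in zip(sample, ratios):
    print(row, pair)
```

The computed pairs can be compared against the repeat.cat.ratio and repeat.prod.ratio columns of the table.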

Variable                             (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)       (10)       (11)       (12)
(1) cat.pages                   1.000000  -0.086179   0.664227   0.564410   0.661457   0.717411  -0.002450   0.171938  -0.331307   0.010142   0.405503   0.278511
(2) avg.prod.cat                           1.000000   0.191503  -0.654788   0.518157   0.541807   0.041786   0.201426  -0.166324   0.104294   0.116373   0.085855
(3) std.prod.cat                                      1.000000   0.344571   0.644833   0.683078   0.016447   0.206494  -0.302208   0.039395   0.344786   0.268829
(4) repeat.cat.ratio                                             1.000000  -0.062464  -0.050177  -0.042125  -0.057256  -0.058914  -0.075649   0.080779   0.062915
(5) tot.views                                                               1.000000   0.945436   0.298519   0.514347  -0.365660   0.062256   0.487385   0.403175
(6) prod.pages                                                                         1.000000   0.037241   0.289343  -0.387336   0.075860   0.459633   0.321040
(7) repeat.prod.ratio                                                                             1.000000   0.807609  -0.011795  -0.034793   0.120418   0.204092
(8) max.views                                                                                                1.000000  -0.116034  -0.003519   0.249438   0.337276
(9) avg.consid.set.price                                                                                                1.000000  -0.091048  -0.224494  -0.167422
(10) avg.consid.set.discount                                                                                                       1.000000   0.054741   0.036686
(11) add.cnt                                                                                                                                  1.000000   0.560237
(12) bv.cnt                                                                                                                                              1.000000
Table EC.3 Correlation Matrix–Clustering Variables

EC.5.2. Clustering Results–K = 2

Table EC.4 presents estimation results for K = 2. Columns (1) and (2) present estimation results of
a base model, in which no variety variables are incorporated, for exploratory and goal-directed
customers, respectively. Columns (3) and (4) present estimation results of our proposed model
including variety-specific variables for exploratory and goal-directed customers, respectively.
Columns (5) and (6) present estimation results of our variety model controlling for endogeneity
using the control function for exploratory and goal-directed customers, respectively. The table
also contains the log-likelihood of each model, as well as goodness-of-fit measures (AIC, BIC,
McFadden R²).
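
As a consistency check on the goodness-of-fit rows of Table EC.4, the standard definition AIC = 2k − 2 ln L reproduces the reported AIC values up to rounding. The sketch assumes that k equals the number of coefficients listed for each model (5 for the base model, 11 for the variety model, and 13 once the two control-function residuals are added); these counts are read off the table rows and are not stated in the text. BIC is left out because it depends on a sample-size convention that the text does not specify.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood

# column: (assumed number of coefficients k, reported log-likelihood, reported AIC)
table_ec4 = {
    "(1) base, exploratory": (5, -8357.84, 16725.67),
    "(2) base, goal-directed": (5, -15680.55, 31371.10),
    "(3) variety, exploratory": (11, -6808.78, 13639.56),
    "(4) variety, goal-directed": (11, -13804.88, 27631.75),
    "(5) control function, exploratory": (13, -6794.22, 13614.44),
    "(6) control function, goal-directed": (13, -13787.42, 27600.85),
}

for column, (k, log_lik, reported) in table_ec4.items():
    print(f"{column}: computed {aic(k, log_lik):.2f}, reported {reported:.2f}")
```

Because the log-likelihoods in the table are themselves rounded to two decimals, differences of about 0.01 between computed and reported values are expected.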

                  Base Model                    Variety Model                 Variety Model with control function
                  Exploratory    Goal-directed  Exploratory    Goal-directed  Exploratory    Goal-directed
                  (1)            (2)            (3)            (4)            (5)            (6)
price             -0.053∗∗∗      -0.044∗∗∗      -0.030∗∗∗      -0.024∗∗∗      -0.030∗∗∗      -0.024∗∗∗
                  (0.001)        (0.0004)       (0.001)        (0.0005)       (0.001)        (0.0005)
discount          1.349∗∗∗       1.193∗∗∗       0.690∗∗∗       0.840∗∗∗       0.600∗∗∗       0.882∗∗∗
                  (0.154)        (0.095)        (0.186)        (0.107)        (0.197)        (0.112)
age               -0.012∗∗∗      -0.009∗∗∗      -0.0055∗∗∗     -0.0023∗∗∗     -0.005∗∗∗      -0.002∗∗∗
                  (0.001)        (0.0004)       (0.001)        (0.001)        (0.001)        (0.001)
broken.assort     -1.135∗∗∗      -1.040∗∗∗      -0.734∗∗∗      -0.675∗∗∗      -0.890∗∗∗      -0.783∗∗∗
                  (0.059)        (0.037)        (0.068)        (0.042)        (0.075)        (0.046)
view.count        -0.436∗∗∗      0.374∗∗∗       0.520∗∗∗       0.498∗∗∗       0.523∗∗∗       0.499∗∗∗
                  (0.057)        (0.012)        (0.029)        (0.012)        (0.029)        (0.012)
tot.styles        -              -              0.160∗∗∗       0.059∗∗∗       0.159∗∗∗       0.058∗∗∗
                                                (0.009)        (0.005)        (0.009)        (0.005)
tot.styles²       -              -              -0.0011∗∗∗     -0.0004∗∗∗     -0.001∗∗∗      -0.0004∗∗∗
                                                (0.00007)      (0.00004)      (0.00007)      (0.00004)
tot.colours       -              -              -0.906∗∗∗      -0.479∗∗∗      -0.898∗∗∗      -0.474∗∗∗
                                                (0.032)        (0.016)        (0.032)        (0.016)
tot.colours²      -              -              0.022∗∗∗       0.012∗∗∗       0.022∗∗∗       0.012∗∗∗
                                                (0.0009)       (0.0004)       (0.0009)       (0.0004)
norm.density      -              -              -23.464∗∗∗     -22.528∗∗∗     -23.308∗∗∗     -22.272∗∗∗
                                                (2.254)        (1.301)        (2.254)        (1.301)
norm.density²     -              -              37.148∗∗∗      43.244∗∗∗      36.914∗∗∗      42.841∗∗∗
                                                (8.144)        (3.82)         (8.095)        (3.816)
resid.discount    -              -              -              -              0.971          -0.250
                                                                              (0.605)        (0.361)
resid.ba          -              -              -              -              0.824∗∗∗       0.576∗∗∗
                                                                              (0.155)        (0.098)

Observations      1,106,010      392,811        1,106,010      392,811        1,106,010      392,811
Log-Likelihood    -8,357.84      -15,680.55     -6,808.78      -13,804.88     -6,794.22      -13,787.42
AIC               16,725.67      31,371.10      13,639.56      27,631.75      13,614.44      27,600.85
BIC               16,784.66      31,425.00      13,769.34      27,750.35      13,767.81      27,741.00
McFadden R²       0.968          0.815          0.974          0.837          0.974          0.838
∗ : p ≤ 0.1; ∗∗ : p ≤ 0.05; ∗∗∗ : p ≤ 0.01; (std dev)
Table EC.4 Estimation Results–Two Customer Types (K = 2)
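
Because each variety variable enters the utility through a linear and a squared term, the level at which its marginal effect changes sign follows from the vertex of the quadratic, x* = −b / (2c). The sketch below applies this formula to the rounded coefficients in columns (3) and (4) of Table EC.4. It assumes that the variables enter the model in the same raw units as in the descriptive statistics, which the text does not confirm, so the resulting numbers are indicative only; the function name is hypothetical.

```python
def turning_point(linear, quadratic):
    """Stationary point x* = -b / (2c) of u(x) = b*x + c*x**2, and whether it is a max or min."""
    if quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    x_star = -linear / (2 * quadratic)
    kind = "maximum" if quadratic < 0 else "minimum"
    return x_star, kind

# Rounded coefficients taken from Table EC.4, columns (3) and (4).
cases = {
    "tot.styles, exploratory": (0.160, -0.0011),
    "tot.styles, goal-directed": (0.059, -0.0004),
    "tot.colours, exploratory": (-0.906, 0.022),
    "tot.colours, goal-directed": (-0.479, 0.012),
}
for name, (b, c) in cases.items():
    x_star, kind = turning_point(b, c)
    print(f"{name}: {kind} at about {x_star:.1f}")
```

A negative squared coefficient gives an inverted-U shape (utility peaks at x*), while a positive one gives a U shape (utility is lowest at x*). The squared coefficients are reported with few significant digits, so the computed turning points are sensitive to rounding.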

EC.5.3. Clustering Results–K = 5


EC.5.3.1. Clustering & Customer Typology. Table EC.5 shows how exploratory and
goal-directed customers are distributed between the five clusters. Each row sums to the total
number of customers in the exploratory and goal-directed segments, and the number in parentheses
below indicates the fraction of customers in each of the five clusters that make up the exploratory
and goal-directed segments. Each column sums to the total number of customers in the new clusters,
and the number in parentheses on the right indicates the fraction of exploratory and goal-directed
customers that make up the new clusters.

                Cluster 1         Cluster 2         Cluster 3         Cluster 4         Cluster 5
Exploratory     51 (0.49%)        61426 (99.99%)    8622 (29.54%)     33702 (90.13%)    19510 (91.01%)
                (0.04%)           (49.81%)          (6.99%)           (27.33%)          (15.82%)
Goal-directed   10273 (99.51%)    9 (0.01%)         20563 (70.46%)    3689 (9.87%)      1927 (8.99%)
                (28.18%)          (0.02%)           (56.40%)          (10.12%)          (5.29%)
Table EC.5 Clustering with K = 2 and K = 5
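
The two kinds of percentages in Table EC.5 can be recomputed from the counts alone. The following sketch derives, for each cell, the share of the cluster (number on the right of the count) and the share of the segment (number below the count); the function and variable names are illustrative and not part of the original analysis.

```python
counts = {
    "Exploratory": [51, 61426, 8622, 33702, 19510],
    "Goal-directed": [10273, 9, 20563, 3689, 1927],
}

def cross_tab_shares(table):
    """Return {segment: [(share of cluster in %, share of segment in %), ...]}."""
    segments = list(table)
    n_clusters = len(table[segments[0]])
    cluster_totals = [sum(table[s][k] for s in segments) for k in range(n_clusters)]
    shares = {}
    for s in segments:
        segment_total = sum(table[s])
        shares[s] = [
            (100 * table[s][k] / cluster_totals[k], 100 * table[s][k] / segment_total)
            for k in range(n_clusters)
        ]
    return shares

shares = cross_tab_shares(counts)
for segment, cells in shares.items():
    print(segment, [(round(c, 2), round(r, 2)) for c, r in cells])
```

For example, the 20563 goal-directed customers in Cluster 3 make up about 70.46% of that cluster and about 56.40% of the goal-directed segment.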

Recall from §5.1 that the fit metrics of the clustering algorithm suggest that the two larger
clusters (exploratory and goal-directed) may further be divided into smaller subsegments. The
distribution presented in Table EC.5 partially supports this approach. In particular, Cluster 1 and
Cluster 2 consist, almost exclusively, of goal-directed and exploratory customers, respectively.
It is interesting, however, to observe that the remaining three clusters consist predominantly
of one segment, but do include a significant fraction from the other. Cluster 3 is predominantly
composed of goal-directed customers, and Cluster 4 and Cluster 5 are predominantly composed of
exploratory customers.
Table EC.6 presents descriptive statistics (mean and standard deviation) of the clustering
variables per cluster.

                          Cluster 1             Cluster 2             Cluster 3             Cluster 4             Cluster 5
                          (19.3% conversion)    (0.26% conversion)    (0.70% conversion)    (0.64% conversion)    (1.56% conversion)
proportion                6%                    38%                   18%                   23%                   13%
                          mean      std.        mean      std.        mean      std.        mean      std.        mean      std.
cat.pages                 5.482     3.910       1.110     0.319       4.207     2.809       1.093     0.297       1.136     0.381
avg.prod.cat              11.666    6.220       6.566     1.518       8.070     3.817       14.904    4.622       7.949     2.723
std.prod.cat              4.347     2.930       0.199     0.591       3.440     1.757       0.373     1.275       0.283     0.813
repeat.cat.ratio          0.192     0.096       0.169     0.050       0.253     0.102       0.076     0.021       0.147     0.051
tot.views                 38.677    20.924      7.270     1.648       20.293    12.198      17.857    6.860       13.954    6.387
prod.pages                28.833    15.203      6.813     1.540       17.548    10.084      15.341    5.103       8.321     2.904
repeat.prod.ratio         1.376     0.327       1.071     0.106       1.155     0.169       1.159     0.160       1.669     0.371
max.views                 3.767     2.134       1.361     0.482       2.102     0.989       2.071     0.894       3.628     1.516
avg.consid.set.price      88.09     26.74       138.75    39.06       103.10    27.34       117.91    28.75       136.14    41.16
avg.consid.set.discount   0.270     0.113       0.229     0.160       0.239     0.126       0.265     0.120       0.215     0.155
add.cnt                   5.162     4.464       0.056     0.319       0.280     0.792       0.223     0.811       0.219     0.670
bv.cnt                    6.635     6.992       0.056     0.532       0.146     0.628       0.136     0.661       0.349     1.267
Table EC.6 Clustering Results–Five Customer Types

We present here a short description of the customer typology of each of the five new clusters
obtained, based on the session behaviours presented in Table EC.6.
Cluster 1 suggests an impulsive or bargain shopper customer type, as indicated by the highest
numbers of category and product pages viewed, the largest consideration set, high levels of repeat
views, max views, and basket operations (indicating an intense deliberation process), and the lowest
average base price and highest discount rates. Additionally, this cluster has the highest conversion
rate of all new clusters, by a large margin. Consequently, customers in this cluster are highly
motivated by purchasing a product (goal-directed), have high uncertainty about their preferences,
and are stimulus-driven, as suggested by the variety of categories and products they view, which
explains their erratic search behaviour. We expect them to behave similarly to goal-directed
customers with respect to variety (Cluster 1 is made up almost entirely of goal-directed customers,
see Table EC.5).
Cluster 2 can be seen as abandoning or non-directed window-shopper customers. We use this
term because Cluster 2 has the lowest conversion rate and the lowest category and product page
interactions among all clusters. Additionally, the ratio of category pages to unique products is
relatively high (third largest), indicating a scattered interest during the shopping session. This
suggests customers who have no clear idea of what they are looking for and leave quickly after
starting their online session. It is also the largest cluster (38% of all customers in our data),
which supports our assumption that the largest fraction of customers visiting the online store are
motivated by simply browsing and learning the assortment (exploratory window-shoppers). Consistent
with the result that this cluster is mainly composed of exploratory customers, we expect them to
behave similarly to exploratory customers with respect to variety.
Cluster 3 can be seen as deliberating customers: customers in this cluster interact with a
large number of category and product pages, have the highest ratio of category pages to unique
products (indicative of scattered interest) and large consideration sets, yet their conversion rate
is relatively low (third largest). In addition, they have moderate levels of repeat views and max
views, and little to no basket operations, supporting the idea that these customers are more
interested in learning the assortment, but can be converted into a purchase if they find the right
product. Interestingly, this cluster is mostly composed of goal-directed customers (70.46%), yet
they exhibit behaviour that is more closely related to that of exploratory customers. Consequently,
we expect them to exhibit behaviour in between that of exploratory and goal-directed customers with
respect to variety.
Cluster 4 suggests knowledge-building customers. The low number of category page interactions,
high number of products viewed within a category, and low (lowest of all clusters) ratio of category
pages to product pages suggest that customers are focused on one category and a moderate number of
products within it. However, the high total views and large consideration set, combined with
moderate max views and repeat ratio, suggest that they are still converging on which product is the
best alternative to purchase. Note that Cluster 4 has the second-lowest conversion rate, supporting
the idea that customers in Cluster 4 are not yet in a position to make a purchase, although they
are working towards it. Because sessions of Cluster 4 customers seem motivated by learning the
assortment (in view of a purchase), we expect them to exhibit behaviour closer to that of
exploratory customers with respect to variety.
Cluster 5 can be seen as direct buyer customers: customers in this cluster interact with few
products and categories, have a high level of repeat views and the second-highest level of max
views, and have the second-largest conversion rate of the five clusters obtained. Additionally,
basket operations are low, which could be interpreted as customers knowing exactly the product (or
type of product) they are looking for and proceeding to checkout once they find it. It is
interesting to see that customers in Cluster 5 exhibit among the lowest price and discount
sensitivity, suggesting that customers in this cluster may have different motives for their online
sessions (either higher socio-economic status, or deriving more utility from the luxury than from
the affordable nature of fast fashion). Because sessions of Cluster 5 customers seem motivated by
the purchase of a product, we expect them to behave similarly to exploratory customers with respect
to variety, given their session behaviour regarding category and product pages, and extreme focus
on finding a specific product.

EC.5.3.2. Estimation Results. Table EC.7 presents the estimation results for each of the
five new clusters.

EC.6. Seasonality Estimation Results


Table EC.1 presents descriptive statistics, per month, for the estimation variables. Table EC.2
presents the full set of estimation results for seasonality effects. Columns (1)-(3) present
estimation results of Month 1 for aggregated, exploratory, and goal-directed customers,
respectively. Columns (4)-(6) present estimation results of Month 2 for aggregated, exploratory,
and goal-directed customers, respectively. Columns (7)-(9) present estimation results of Month 3
for aggregated, exploratory, and goal-directed customers, respectively.

References
See references list in the main paper.

                  Cluster 1      Cluster 2      Cluster 3      Cluster 4      Cluster 5
                  (1)            (2)            (3)            (4)            (5)
price             -0.018∗∗∗      -0.015∗∗∗      -0.028∗∗∗      -0.036∗∗∗      -0.023∗∗∗
                  (0.001)        (0.002)        (0.002)        (0.001)        (0.001)
discount          0.388∗∗∗       -0.508         0.343          1.365∗∗∗       1.456∗∗∗
                  (0.123)        (0.387)        (0.344)        (0.264)        (0.259)
age               -0.002∗∗∗      0.002          -0.002         -0.004∗∗∗      -0.008∗∗∗
                  (0.001)        (0.002)        (0.002)        (0.001)        (0.001)
broken.assort     -0.531∗∗∗      -0.653∗∗∗      -0.779∗∗∗      -0.745∗∗∗      -0.636∗∗∗
                  (0.049)        (0.136)        (0.128)        (0.104)        (0.1)
view.count        0.465∗∗∗       -2.190∗∗∗      0.155          0.450∗∗∗       0.417∗∗∗
                  (0.014)        (0.255)        (0.103)        (0.066)        (0.02)
tot.styles        0.043∗∗∗       0.099∗∗∗       0.101∗∗∗       0.111∗∗∗       0.152∗∗∗
                  (0.006)        (0.019)        (0.017)        (0.014)        (0.013)
tot.styles²       -0.0003∗∗∗     -0.0007∗∗∗     -0.0008∗∗∗     -0.0008∗∗∗     -0.001∗∗∗
                  (0.00004)      (0.0001)       (0.0001)       (0.0001)       (0.0001)
tot.colours       -0.368∗∗∗      -0.655∗∗∗      -0.742∗∗∗      -0.750∗∗∗      -0.840∗∗∗
                  (0.018)        (0.066)        (0.055)        (0.051)        (0.044)
tot.colours²      0.009∗∗∗       0.017∗∗∗       0.020∗∗∗       0.018∗∗∗       0.020∗∗∗
                  (0.0005)       (0.002)        (0.002)        (0.001)        (0.001)
norm.density      -19.066∗∗∗     -14.420∗∗∗     -22.799∗∗∗     -4.026         -18.807∗∗∗
                  (1.522)        (4.56)         (3.54)         (8.661)        (3.872)
norm.density²     39.129∗∗∗      23.442∗        44.016∗∗∗      -274.180∗∗     11.706
                  (4.589)        (13.387)       (9.412)        (117.626)      (22.744)

Observations      119,909        442,016        265,858        491,746        179,292
Log-Likelihood    -9,189.64      -2,042.31      -1,917.63      -3,014.08      -2,779.38
AIC               18,401.28      4,106.62       3,857.26       6,050.15       5,580.75
BIC               18,506.86      4,225.96       3,971.38       6,171.44       5,690.41
∗ : p ≤ 0.1; ∗∗ : p ≤ 0.05; ∗∗∗ : p ≤ 0.01; (std dev)
Table EC.7 Estimation Results–Five Customer Types

                Month 1                                 Month 2                                 Month 3
Variable        Mean      Std. Dev.  Min.     Max.      Mean      Std. Dev.  Min.     Max.      Mean      Std. Dev.  Min.     Max.
price           127.24    48.42      24.99    299.99    135.23    55.65      24.99    299.99    129.55    55.26      24.99    299.99
discount        0.341     0.214      0        0.72      0.24      0.24       0        0.76003   0.195     0.220      0        0.76003
age             68.540    31.266     1        103       65.11     47.82      1        138       74.414    54.283     1        173
broken.assort   0.732     0.443      0        1         0.682     0.466      0        1         0.691     0.462      0        1
view.count      1.149     0.522      1        24        1.23      0.64       1        36        1.252     0.678      1        26
tot.styles      29.960    24.300     1        88        35.77     27.63      1        108       39.428    29.642     1        115
tot.colours     10.941    6.359      1        22        12.97     6.66       1        27        14.675    7.367      1        29
norm.density    0.019     0.041      0        0.33      0.02      0.04       0        0.333     0.022     0.043      0        0.5
Table EC.1 Descriptive Statistics–Seasonality Robustness Check

                Month 1                                       Month 2                                       Month 3
                Aggregated     Exploratory    Goal-directed  Aggregated     Exploratory    Goal-directed  Aggregated     Exploratory    Goal-directed
                (1)            (2)            (3)            (4)            (5)            (6)            (7)            (8)            (9)
price           -0.028∗∗∗      -0.028∗∗∗      -0.022∗∗∗      -0.031∗∗∗      -0.030∗∗∗      -0.024∗∗∗      -0.028∗∗∗      -0.025∗∗∗      -0.023∗∗∗
                (0.001)        (0.002)        (0.001)        (0.001)        (0.001)        (0.001)        (0.001)        (0.001)        (0.001)
discount        1.074∗∗∗       0.805∗         0.916∗∗∗       1.198∗∗∗       1.228∗∗∗       0.576∗∗∗       1.268∗∗∗       0.781∗∗        1.133∗∗∗
                (0.188)        (0.422)        (0.213)        (0.139)        (0.26)         (0.166)        (0.186)        (0.393)        (0.214)
age             -0.008∗∗∗      -0.007∗∗       -0.007∗∗∗      -0.003∗∗∗      -0.007∗∗∗      -0.001∗        -0.001∗        -0.003∗∗       -0.001
                (0.001)        (0.003)        (0.001)        (0.001)        (0.001)        (0.001)        (0.001)        (0.002)        (0.001)
broken.assort   -0.477∗∗∗      -0.495∗∗∗      -0.485∗∗∗      -0.660∗∗∗      -0.650∗∗∗      -0.600∗∗∗      -0.877∗∗∗      -0.814∗∗∗      -0.834∗∗∗
                (0.071)        (0.151)        (0.08)         (0.057)        (0.103)        (0.069)        (0.065)        (0.127)        (0.078)
view.count      0.747∗∗∗       0.758∗∗∗       0.629∗∗∗       0.574∗∗∗       0.521∗∗∗       0.483∗∗∗       0.528∗∗∗       0.421∗∗∗       0.459∗∗∗
                (0.024)        (0.06)         (0.025)        (0.017)        (0.041)        (0.019)        (0.019)        (0.056)        (0.02)
tot.styles      0.058∗∗∗       0.160∗∗∗       0.056∗∗∗       0.112∗∗∗       0.219∗∗∗       0.071∗∗∗       0.083∗∗∗       0.202∗∗∗       0.053∗∗∗
                (0.01)         (0.024)        (0.011)        (0.008)        (0.015)        (0.009)        (0.009)        (0.021)        (0.01)
tot.styles²     -0.001∗∗∗      -0.001∗∗∗      -0.0006∗∗∗     -0.001∗∗∗      -0.002∗∗∗      -0.0005∗∗∗     -0.001∗∗∗      -0.001∗∗∗      -0.0004∗∗∗
                (0.0001)       (0.0002)       (0.0001)       (0.0001)       (0.0001)       (0.00007)      (0.0001)       (0.0002)       (0.00007)
tot.colours     -0.712∗∗∗      -1.306∗∗∗      -0.612∗∗∗      -0.668∗∗∗      -1.048∗∗∗      -0.515∗∗∗      -0.597∗∗∗      -1.046∗∗∗      -0.464∗∗∗
                (0.032)        (0.097)        (0.034)        (0.025)        (0.051)        (0.027)        (0.028)        (0.069)        (0.03)
tot.colours²    0.024∗∗∗       0.043∗∗∗       0.021∗∗∗       0.016∗∗∗       0.025∗∗∗       0.013∗∗∗       0.015∗∗∗       0.025∗∗∗       0.012∗∗∗
                (0.001)        (0.003)        (0.001)        (0.001)        (0.001)        (0.0008)       (0.001)        (0.002)        (0.0008)
norm.density    -13.209∗∗∗     55.144∗∗∗      -13.508∗∗∗     -30.016∗∗∗     -28.744∗∗∗     -26.855∗∗∗     -28.609∗∗∗     -29.199∗∗∗     -24.319∗∗∗
                (2.16)         (14.199)       (2.399)        (2.017)        (3.672)        (2.405)        (2.264)        (4.384)        (2.57)
norm.density²   17.397∗∗       -782.745∗∗∗    25.210∗∗∗      56.715∗∗∗      50.167∗∗∗      52.519∗∗∗      51.644∗∗∗      52.538∗∗∗      43.997∗∗∗
                (7.606)        (191.354)      (8.25)         (8.31)         (17.022)       (9.712)        (5.514)        (10.19)        (6.483)

Observations    339,449        235,787        103,662        705,461        542,959        162,502        453,911        327,264        126,647
LL              -6,299.89      -1,677.18      -4,196.45      -8,890.47      -3,028.70      -5,282.48      -6,608.52      -1,977.42      -4,224.52
AIC             12,621.78      3,376.36       8,414.91       17,802.95      6,079.40       10,586.96      13,239.04      3,976.83       8,471.03
BIC             12,738.63      3,489.13       8,518.84       17,927.83      6,201.35       10,695.86      13,359.07      4,093.21       8,577.17
∗ : p ≤ 0.1; ∗∗ : p ≤ 0.05; ∗∗∗ : p ≤ 0.01; (std dev)
Table EC.2 Estimation Results–Seasonality Effects
