Experimental Design Slides 2021-2022

Setting up DCEs and why choosing

D-efficiency as the ED criterion

1. What are DCEs?

2. Setting up DCEs

3. Survey development

4. Experimental design

5. Experimental design criteria

6. Assignment

1. What are DCEs?
• In discrete choice experiments (DCEs), respondents are asked to choose their
preferred alternative from a given set of alternatives
defined by levels of attributes
Choice set 1    Car          Train
Travel time     10 minutes   15 minutes
Travel cost     €1.00        €0.50
(Car and Train are labeled alternatives; each cell value, e.g. 15 minutes, is a level)


* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical
study in air travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.
1. What are DCEs?
• What DCEs can be used for:

– Computing parameter weights
– Computing the utility of an alternative
– Computing the probability of an alternative/scenario
– Computing (marginal) willingness to pay, (m)WTP
– Forecasting market shares
– Computing elasticities
             Car                        Train
Choice set   Travel time  Travel cost   Travel time  Travel cost
1            10           1.00          15           0.50
2            20           1.50          20           0.50
3            15           2.00          25           0.50
4            15           1.50          15           1.00
5            10           2.00          20           1.00
6            20           1.00          25           1.00
7            20           2.00          15           1.50
8            15           1.00          20           1.50
9            10           1.50          25           1.50

How can you collect choice data?

How many stated choice tasks can be created for this problem: 2 alternatives with 2 attributes having 3 levels each?
= Difference between full factorial and fractional factorial
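
The size of the full factorial can be checked by enumeration. The sketch below is only an illustration: it assumes the level values shown in the car-train table, and the variable names are made up for this example.

```python
from itertools import product

# Attribute levels taken from the car-train table above.
levels = {
    "car_time": [10, 15, 20],
    "car_cost": [1.00, 1.50, 2.00],
    "train_time": [15, 20, 25],
    "train_cost": [0.50, 1.00, 1.50],
}

# Full factorial: every combination of every level of every attribute
# over both alternatives, i.e. 3^(2 alternatives x 2 attributes).
full_factorial = list(product(*levels.values()))
n_full = len(full_factorial)

# The table shows 9 choice sets, a fraction of the full space.
n_fraction = 9
print(n_full, n_fraction)
```

The nine choice sets in the table are therefore a fractional factorial: a selection from the full set of possible choice tasks.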


1. What are DCEs?
• Decision maker
– Individual
– Household/group
– Company/organization
• Choice set
– Made up of a finite number of mutually exclusive alternatives
– Alternatives are described by attributes
– One alternative is chosen
• Decision rule
– Y= f(x, z, β) with Y = discrete
1. What are DCEs?
• Compared to revealed preference (RP) choice data
– Need for experimental design to encourage trade-offs
– Respondents face multiple choice sets
– Attributes and levels are chosen by the analyst
– Allows for hypothetical choice scenarios
• Control vs reliability (no consequences)

• Can we replace people fully by more choice sets?

Maximize information with
limited number of observations

1. What are DCEs?
• DCEs are based on 2 fundamental building blocks:

– Lancaster’s theory of value: utility from a good arises from the characteristics of that good

– Random utility theory (RUT): latent utility (U) can be divided into an observable or systematic (V) and unobservable or random (ε) part => probabilistic utility function

• The indirect utility function describes the individual's $V_{m,s}$ of an alternative m for a given choice set s
1. What are DCEs?
U = utility
n= individual
i,j = alternative

1. What are DCEs?
• Common assumption I: the error term enters the utility function as an additive term.

• Common assumption II: the utility function is a linear function of the attributes

$U_{ik} = \beta a_i + \gamma M_k + \varepsilon_{ik}$

There is a trade-off between the benefits of assuming a less restrictive formulation and the complications that arise from doing so. This is especially relevant for the way income enters the utility function.

A simpler functional form (e.g. linear in income) makes estimation of the parameters and calculation of welfare effects easier (remember that the welfare effect for a linear additive function is the negative of a ratio with the price coefficient in the denominator), but the estimates are based on restrictive assumptions.
1. What are DCEs?
Historically, the most common model for estimating a DCE is the multinomial logit (MNL), also called conditional logit (CL). The main reason is that it is simple to estimate. However, over the last 10 years or so, rapid development of other models, computer capacity and algorithms has made this model somewhat less dominant.

Suppose we have a choice set with J alternatives. The probability that individual k chooses alternative i can then be expressed as

$$P_{ik} = P\{v_{ik}(a_i, y - p_i) + \varepsilon_{ik} > v_{jk}(a_j, y - p_j) + \varepsilon_{jk};\ \forall j \neq i\}$$
$$\phantom{P_{ik}} = P\{\varepsilon_{jk} - \varepsilon_{ik} < v_{ik} - v_{jk};\ \forall j \neq i\}$$

We assume that the error terms are iid with an extreme value type I distribution. The variance of this distribution is

$$\mathrm{var}(\varepsilon) = \frac{\pi^2}{6\mu^2}$$

where $\mu$ is the scale parameter.
1. What are DCEs?
Properties of any discrete choice model
1. The true parameters are confounded with the scale parameter
2. Only the utility difference matters. Consequently, there must be a difference between the alternatives in order to estimate a parameter
3. This means that we can only include M-1 alternative specific constants

Properties of the MNL model

4. The alternatives are independent (because of the IID assumption). This results in the independence of irrelevant alternatives (IIA) property
5. Limited modeling of taste variation. Unobserved heterogeneity is captured via the error term in a simple fashion. However, socio-economic variables can account for observed heterogeneity

1. What are DCEs?
It can be shown that the choice probability for an MNL can be expressed as

$$P_{ik} = \frac{\exp(\mu v_{ik})}{\sum_{j=1}^{J} \exp(\mu v_{jk})}$$

This is a very simple and nice expression! But it comes with some ”costs”.

The parameters are normalised by a scale parameter. This complicates the interpretation of models, and in particular comparisons among models.
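
To see how the scale parameter enters the MNL choice probability, the following minimal sketch may help. The utilities are hypothetical values chosen only for illustration, and `mnl_probabilities` is a helper name introduced here, not part of any software mentioned in these slides.

```python
import numpy as np

def mnl_probabilities(v, scale=1.0):
    """MNL probabilities: exp(scale * v_i) / sum_j exp(scale * v_j)."""
    z = scale * np.asarray(v, dtype=float)
    z = z - z.max()  # subtracting a constant leaves the probabilities unchanged
    expz = np.exp(z)
    return expz / expz.sum()

# Hypothetical systematic utilities for three alternatives.
v = [0.5, 0.2, 0.0]

p_scale1 = mnl_probabilities(v, scale=1.0)
p_scale10 = mnl_probabilities(v, scale=10.0)
print(p_scale1.round(3), p_scale10.round(3))
```

A larger scale (smaller error variance) makes the choice more deterministic: the alternative with the highest utility receives a larger probability, even though the utilities themselves are unchanged.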

1. What are DCEs?
The ratio of choice probabilities between two alternatives in a choice set is unaffected by which other alternatives are available in the choice set and by the levels of the attributes of the other alternatives (scale normalised to 1):

$$\frac{P_{ik}}{P_{nk}} = \frac{\exp(v_{ik}) / \sum_{j \in S_m} \exp(v_{jk})}{\exp(v_{nk}) / \sum_{j \in S_m} \exp(v_{jk})} = \frac{\exp(v_{ik})}{\exp(v_{nk})}$$

IIA may or may not be satisfied; in many cases it is not. With many alternatives this is nevertheless a useful property.

Can be tested with the Hausman-McFadden test (1984).

Essentially: if IIA is satisfied, then the ratio of choice probabilities should not be affected by whether another alternative is in the choice set or not. Hence, one way of testing IIA is to remove one alternative, re-estimate the model and compare the choice probabilities.
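
The IIA property of the MNL formula can be illustrated numerically. This is only a sketch with hypothetical utilities and a made-up third alternative ("bus"); it shows the property of the formula, not the Hausman-McFadden test itself.

```python
import numpy as np

def mnl_probabilities(v):
    """MNL probabilities with the scale normalised to 1."""
    expv = np.exp(np.asarray(v, dtype=float))
    return expv / expv.sum()

# Hypothetical systematic utilities.
v = {"car": 0.4, "train": 0.1, "bus": -0.3}

# Ratio P(car) / P(train) with all three alternatives available.
p_full = mnl_probabilities([v["car"], v["train"], v["bus"]])
ratio_full = p_full[0] / p_full[1]

# Same ratio after removing the bus from the choice set.
p_reduced = mnl_probabilities([v["car"], v["train"]])
ratio_reduced = p_reduced[0] / p_reduced[1]

print(ratio_full, ratio_reduced)
```

Both ratios equal exp(v_car - v_train): under MNL, removing the third alternative changes the individual probabilities but not their ratio.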
1. What are DCEs?
We will use the Wetland study: a mail survey of Swedish households about the possible development of wetland areas (for both biodiversity and recreation reasons).

Attribute | Description | Variable | Levels
Total cost | The total cost for the individual | Cost | 200, 400, 700, 850
Surrounding vegetation | Forest or meadow-land | Meadow | Forest, Meadow
Biodiversity | The wetland can contain different numbers of both rare and more common species. | Medbio, Highbio | Low, Medium, High
Fish | The design of the wetland area can improve the conditions for fish | Fish | No, Yes
Fenced waterline | The water is surrounded with a 1m fence in order to prevent drowning accidents. | Fence | No, Yes
Crayfish | Introduction of Swedish crayfish and allow fishing. | Crayfish | No, Yes
Walking facilities | Construction of the wetland area for outdoor life | Walk | No, Yes
1. What are DCEs?
Each respondent faced at most 4 choice situations. There was
always an opt-out alternative.
Choice 1

Of the three alternatives below, mark the alternative you prefer.

Your choice
(Mark your choice)
Wetland Alternative 1 Alternative 2 Alternative 3
Simple ponds

Surrounding vegetation
1. Surrounding vegetation Forest Meadow-land Forest

Water issues
2. Fish Good conditions No actions No actions

3. Cray fish Introduction No introduction No introduction

Other attributes
4. Biodiversity Low High Low

5. Walking facilities No walking facilities Walking facilities No walking facilities

6. Fence No Fence No
___________________ ______________ _______________ ______________
= Total cost per citizen SEK 850 SEK 400 SEK 0
1. What are DCEs?
The data needs to be arranged in a specific way for
STATA/NLOGIT. Each row in the data set represents one of the
alternatives of a choice set. So if there are 3 alternatives (as in
this case) each choice set will have 3 rows in the data set.
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
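
The long format can be checked programmatically before estimation. The sketch below uses the first rows of the data above and assumes a column order that the slides do not state explicitly: respondent id, choice set, alternative, choice indicator, followed by the attribute columns.

```python
from collections import defaultdict

raw = """\
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
"""

# Assumed column order (not stated in the slides): respondent id,
# choice set, alternative, choice indicator, then attribute columns.
choice_sets = defaultdict(list)
for line in raw.splitlines():
    resp, cset, alt, choice, *attributes = (int(x) for x in line.split())
    choice_sets[(resp, cset)].append((alt, choice, attributes))

# Long format: three rows per choice set, exactly one of them chosen.
chosen = {}
for key, rows in choice_sets.items():
    assert len(rows) == 3
    assert sum(choice for _, choice, _ in rows) == 1
    chosen[key] = next(alt for alt, choice, _ in rows if choice == 1)

print(chosen)
```

Under that assumed column order, respondent 104 picks the third alternative (all attributes zero, i.e. the opt-out) in the choice set shown here.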
1. What are DCEs?
We use the NLOGIT command in LIMDEP, together with the Model
command, where we specify the utility function for each of the
alternatives.


Note 1: Choice is the choice indicator.

Note 2: In this particular case we assume that the two ”new”
alternatives have a common alternative specific constant.
Note 3: The number of ASC cannot be higher than the number of
alternatives minus one.
1. What are DCEs?
Multinomial Logit
Coefficient Standard error
Intercept 0.1195 0.3384
Cost -0.0012 0.0000
Meadow -0.0518 0.3967
Highbio 0.7835 0.0000
Medbio 0.5906 0.0000
Fish 0.4051 0.0000
Fence -0.1946 0.0016
Crayfish -0.1301 0.0339
Walk 0.7532 0.0000
1. What are DCEs?
Can we compare the coefficients for different attributes within this model?

Yes and no. A meaningful comparison of coefficients requires that they are measured on the same scale. In this particular case all variables except cost are dummy variables, so for those it is actually possible. But be careful here!

Can we compare the coefficients of this model with the coefficients from another model?

Yes and no. You can compare sign and significance, but you cannot compare the size of the coefficients. This is because all the coefficients are scaled by an unknown scale parameter, and without further information we cannot say anything about the scale parameter.
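
Although the size of individual coefficients cannot be compared across models, ratios of coefficients can, because the unknown scale cancels. The sketch below illustrates this with the marginal WTP (the negative of a ratio with the cost coefficient in the denominator), using the rounded coefficients from the MNL table. The rescaling factor 2.5 is an arbitrary, hypothetical value.

```python
# Rounded coefficients from the MNL output table above.
coef = {
    "Cost": -0.0012,
    "Highbio": 0.7835,
    "Medbio": 0.5906,
    "Fish": 0.4051,
    "Walk": 0.7532,
}

def marginal_wtp(coefficients, attribute, cost="Cost"):
    """Marginal WTP = -(attribute coefficient) / (cost coefficient)."""
    return -coefficients[attribute] / coefficients[cost]

wtp_walk = marginal_wtp(coef, "Walk")

# Multiply every coefficient by the same (hypothetical) scale factor:
# the individual coefficients change, the ratio does not.
rescaled = {name: 2.5 * b for name, b in coef.items()}
wtp_walk_rescaled = marginal_wtp(rescaled, "Walk")

print(round(wtp_walk, 1), round(wtp_walk_rescaled, 1))
```

Because the cost coefficient is reported with only four decimals, the resulting WTP figure (in the cost unit of the survey) is approximate.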


2. Setting up DCEs

To come

* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.

3. Survey development
• Designing a SP survey (lay-out):

– Introductory section (easy = warm-up, engage, screen)
• Intro
• Socio-demographics
• Use of the good and substitutes

– Valuation section (cognitively demanding, unfamiliar)
• Valuation scenario
• Value elicitation questions
• Follow-up questions

– Final section (sensitive information)
• Socio-demographics
• Attitude/opinion
• Identification
4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice tasks*

*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

4. Choice task
scenario: Describes the background and choice text; it is fixed across alternatives, yet may vary over choice tasks and may be different across respondents

alternative: Labeled or unlabeled; typically fixed

scenario or profile: Describes the alternative. Attributes are typically fixed, levels vary over choice tasks; attributes and levels may be different across respondents

response: Best, best and worst, conditional best

4. Unlabeled choice task
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer

alternative Laptop A Laptop B Laptop C

scenario or

response All alternatives have the same utility function; used for valuation

4. Unlabeled choice task
scenario You are looking for a way to get home. Which of the following
roads would you prefer

alternative Road A Road B Road C (current)

scenario or
profile Fixed levels


4. Labeled choice task
scenario Consider a 70 year old patient with advanced breast cancer. As his
doctor, what treatment do you recommend?

alternative Radiotherapy Surgery No treatment

scenario or
profile No levels

response Each alternative may have a different utility function; also used
for prediction and for elasticities

4. (Un)labeled choice task?
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer

alternative Laptop A Laptop B Neither

scenario or


4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice tasks*

– An experimental design is a matrix of values that is used to

determine what goes where in the survey
• Each row represents a choice task
• Each column represents an attribute of an alternative

*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

4. Experimental design
             Car                        Train
Choice set   Travel time  Travel cost   Travel time  Travel cost
1            10           1.00          15           0.50
2            20           1.50          20           0.50
3            15           2.00          25           0.50
4            15           1.50          15           1.00
5            10           2.00          20           1.00
6            20           1.00          25           1.00
7            20           2.00          15           1.50
8            15           1.00          20           1.50
9            10           1.50          25           1.50

Fractional factorial = “smart” selection out of the full space (= full factorial)
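
Since an experimental design is just a matrix, its statistical properties can be checked directly. The sketch below stores the nine choice sets from the table as a NumPy array (one row per choice task, one column per attribute of an alternative) and checks attribute level balance and orthogonality. The variable names are illustrative only.

```python
import numpy as np
from collections import Counter

# Columns: car time, car cost, train time, train cost (from the table above).
design = np.array([
    [10, 1.00, 15, 0.50],
    [20, 1.50, 20, 0.50],
    [15, 2.00, 25, 0.50],
    [15, 1.50, 15, 1.00],
    [10, 2.00, 20, 1.00],
    [20, 1.00, 25, 1.00],
    [20, 2.00, 15, 1.50],
    [15, 1.00, 20, 1.50],
    [10, 1.50, 25, 1.50],
])

# Attribute level balance: each level appears equally often in its column.
level_counts = [Counter(design[:, k].tolist()) for k in range(design.shape[1])]
balanced = all(len(set(c.values())) == 1 for c in level_counts)

# Orthogonality: zero correlation between every pair of columns.
corr = np.corrcoef(design, rowvar=False)
off_diagonal = corr[~np.eye(design.shape[1], dtype=bool)]
orthogonal = bool(np.allclose(off_diagonal, 0.0))

print(balanced, orthogonal)
```

For this particular fraction both checks pass: every level appears three times per column, and all pairwise correlations between columns are zero.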


* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.

4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice sets*:

– Identification: effects independently estimated

– Cognitive complexity: burden on respondent
– Market realism: presented choices are realistic

=> Efficiency: precision of parameter estimates

*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

4. Experimental design
• Before we can start with ED, we need to know
– Attributes, alternatives <-> subset
– Levels <-> non-linearities, cover range evenly, easy
– What response we want <-> real situation
– What the utility function will look like <-> write-out, test
– What model will be estimated <-> test
– What statistical properties the design should have
– How many choice tasks <-> respondent burden vs statistical efficiency
– How the survey will be taken <-> f(complexity)
4. Experimental design
• After we have generated the ED
– Randomise the order of the
CT/alternatives/attributes between respondents
– Decide on the format
• Table
• Picture
• Cartoon
• Movie
• VR

1. What are DCEs?
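• Indirect utility function car-train DCE example*:

$V_{car,s} = \beta_{time} \cdot TT_{car,s} + \beta_{cost} \cdot TC_{car,s}$

$V_{train,s} = \beta_{train} + \beta_{time} \cdot TT_{train,s} + \beta_{cost} \cdot TC_{train,s}$

where V is the observable utility, the β are parameter weights, and TT (travel time) and TC (travel cost) are attribute levels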

* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical study in air
travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.

4. Write out utility functions
scenario You are looking for a way to get home. Which of the following
roads would you prefer

alternative Road A Road B neither

20 35
20 10
2 1
Travel cost


5. ED criteria
• Past studies: orthogonal or random designs

• More and more: D-efficiency (D-error) designs generated by computer

– Random designs do not give a nice spread with a low number of choice tasks
– Some orthogonal designs are less efficient than other (non-)orthogonal designs
– Unlabeled orthogonal main effects plans (OMEPs) often contain dominating alternatives
– Orthogonality is lost in data collection/estimation
– Orthogonality leads to independently estimable parameters for linear models only
– Parameters are also unbiased for non-orthogonal designs

5. ED criteria
• When all respondents face the same design, $X_n = X$ for all n

5. ED criteria
• Generating efficient designs
– Software: Ngene, SAS, JMP
– Algorithms to go from candidate set to current best
– Different software, different algorithms
– D-error > 1 = alert -> mistake in coding?
– More choice tasks give a lower D-error, but only because there are more questions, not because the individual tasks are more efficient
– A D-efficient design has some degree of utility balance, but not too much (random choice), and not too little (dominant alternative)
– More levels, narrower range = larger D-error -> 3 levels
– Check effect of misspecification -> graph D-error (y) vs size of prior; if U-shaped -> more effort into finding a good prior

5. ED criteria
• Multiple ways to create D-efficient (D-error) EDs:

– A priori assumptions on model type

• Optimize for MNL and evaluate for advanced

– A priori assumptions on parameter values (priors)

• No info -> zero prior or orth
• Literature, pilot studies, sign
• Safe = small number

– For same X, lower error is better design

5. ED criteria
• Steps in generating efficient designs
– Step 1: Specify the utility function and priors for the likely final model to be estimated from data collected using the stated choice (SC) design.
– Step 2: Randomly populate the design matrix, X, to create an initial design. The initial design, however, should incorporate all the constraints that the analyst wishes to impose upon the final design. For example, if the analyst wishes to retain attribute level balance, then the initial design should display this property. The initial design can be constructed with the desired number of rows; however, the number of rows should be greater than or equal to K/(J-1), with K the number of parameters and J the number of alternatives.
– Steps 3 and 4: Calculate the choice probabilities (P) and construct the asymptotic variance-covariance (AVC) matrix
– Step 5: Evaluate efficiency
– Step 6: Change the design and repeat steps 3-5
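
Steps 3 to 5 can be sketched for the car-train example as follows. This is an illustration under stated assumptions, not the procedure of any particular software: it assumes an MNL model with the utility functions V_car = β_time·TT + β_cost·TC and V_train = β_train + β_time·TT + β_cost·TC, uses the standard MNL Fisher information for a single respondent, and takes the D-error as det(AVC)^(1/K). The non-zero priors are hypothetical values.

```python
import numpy as np

# Car-train design; columns: car time, car cost, train time, train cost.
design = np.array([
    [10, 1.00, 15, 0.50], [20, 1.50, 20, 0.50], [15, 2.00, 25, 0.50],
    [15, 1.50, 15, 1.00], [10, 2.00, 20, 1.00], [20, 1.00, 25, 1.00],
    [20, 2.00, 15, 1.50], [15, 1.00, 20, 1.50], [10, 1.50, 25, 1.50],
])

def build_attribute_matrices(design):
    """One (J x K) matrix per choice task.
    Parameter order: [beta_train, beta_time, beta_cost]."""
    tasks = []
    for car_time, car_cost, train_time, train_cost in design:
        tasks.append(np.array([
            [0.0, car_time, car_cost],      # V_car has no constant
            [1.0, train_time, train_cost],  # V_train includes beta_train
        ]))
    return tasks

def d_error(design, beta):
    """D-error of an MNL design for given priors (one respondent)."""
    beta = np.asarray(beta, dtype=float)
    info = np.zeros((beta.size, beta.size))
    for X in build_attribute_matrices(design):
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()                                  # step 3: probabilities
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    avc = np.linalg.inv(info)                         # step 4: AVC matrix
    return np.linalg.det(avc) ** (1.0 / beta.size)    # step 5: D-error

d_z = d_error(design, [0.0, 0.0, 0.0])        # zero priors
d_p = d_error(design, [-0.2, -0.05, -0.5])    # hypothetical priors
print(d_z, d_p)
```

Step 6 would then modify the design (for example by swapping levels within a column) and keep the change whenever the D-error decreases. With K = 3 parameters and J = 2 alternatives, the nine rows comfortably satisfy the K/(J-1) minimum.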

Thank you for your attention!

Additional literature
• Hensher, Rose, Greene (2015) Applied Choice Analysis, Cambridge University Press
• Rose, Bliemer (2014) Stated choice experimental design theory: the who, the what and the why, in Hess and Daly, Handbook of Choice Modelling, Edward Elgar
• Rose, Bain, Bliemer (2011) Experimental design strategies for SP studies dealing with non-market goods, in Bennett, International Handbook on Non-Marketed Environmental Valuation, Edward Elgar
• Bateman, I., et al. (2002) Economic Valuation with Stated Preference Techniques: A Manual, Cheltenham: Edward Elgar

Assignment (20p)
Because a working session on campus is less appropriate given the circumstances, the working session is replaced by an assignment that you can send in (Word file + Excel file(s)) via email in groups of 2. The time it takes to complete the assignment is estimated to be equivalent to the foreseen time of the working session.

Deadline: Sunday 23/01/2022

Assignment part 1 (2p)
Explain in your own words why choice modelling
benefits from a probabilistic framework, being
random utility theory.

(max +- 5 sentences)
Assignment part 2 (1p)
Seeing that to estimate the β of a conditional
logit model you maximize the log likelihood

– What will be the sign of the log-likelihood and

why? (max 1 line)
Assignment part 3 (1p)
Imagine a stated choice having 2 labeled
alternatives with 4 attributes. 1 attribute has 2
levels, 2 attributes have 3 levels, and 1 attribute
has 4 levels.

– How many choice sets are in the full factorial?

(max 1 line -> show calculation)
Assignment part 4 (6p)
Calculate the Dp-error using the “evaluate
experimental design students.xlsx” for the
experimental design and given priors

– You can find the formulas you need to implement on slides 40 and 44.
• To multiply matrices in Excel, use the function MMULT
• To transpose matrices in Excel, use the function TRANSPOSE
• To calculate a determinant in Excel, use the function MDETERM
• For matrix calculations to work, press ctrl+shift+enter to automatically fill in the entire array
Assignment part 5 (4p)
Suppose we want to calculate how much
Flemish adults are WTP to avoid the extinction
of the panda using a double bounded CVM.

– What would your valuation scenario and value elicitation question look like? Please create one as if you were actually going to distribute a survey on this topic (max 10 lines).
Assignment part 6 (6p)
• Given the experimental design for the following choice set
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
• The following explanation of the attributes and levels
Assignment part 6
• And the following regression output for a conditional logit (aka multinomial
logit) model

– Please calculate the probability of selecting alternative 1 (you don’t have to take into account the attribute crayfish, and the coefficient for the intercept can be added to the utility function of alternatives 1 and 2). Write down intermediate results. Please do this twice: once assuming that the scale = 1 and once assuming that the scale = 10. See slide 31 of the previous slide show for the formula that includes scale.

