Experimental Design Slides 2021-2022

Setting up DCEs and why choosing
D-efficiency as the ED criterion
Duurzaamheidsanalyse (Sustainability Analysis)
2021-2022
Contents
1. What are DCEs?
2. Setting up DCEs
3. Survey development
4. Experimental design
5. Experimental design criteria
6. Assignment
2
1. What are DCEs?
• In DCEs respondents are asked to choose their
preferred alternative from a given set of alternatives
defined by levels of attributes
Choice set 1 [scenario]   Car          Train         [labeled alternatives]
Travel time               10 minutes   15 minutes    [attribute and its levels]
Travel cost               €1           €0.5          [attribute and its levels]
Choice
* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical
study in air travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.
3
1. What are DCEs?
• Purposes for which DCEs can be used:
– Computing parameter weights
– Computing the utility of an alternative
– Computing the probability of an alternative/scenario
– Computing (marginal) WTP
– Forecasting market shares
– Computing elasticities
– …
4
How can you collect choice data?
              Car                        Train
Choice set    Travel time  Travel cost   Travel time  Travel cost
1             10           1.00          15           0.50
2             20           1.50          20           0.50
3             15           2.00          25           0.50
4             15           1.50          15           1.00
5             10           2.00          20           1.00
6             20           1.00          25           1.00
7             20           2.00          15           1.50
8             15           1.00          20           1.50
9             10           1.50          25           1.50
How many stated choice tasks can be created for this problem: 2 alt with 2 attr having 3 levels?
= Difference between full factorial and fractional factorial
L^(M×A), with L levels, M alternatives and A attributes per alternative
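The count behind the L^(M×A) rule can be checked by brute force. The sketch below enumerates every combination of the attribute levels that appear in the car/train table; the variable names are illustrative and not part of the slides.

```python
from itertools import product

levels_per_attribute = 3    # L
alternatives = 2            # M (car, train)
attributes_per_alt = 2      # A (travel time, travel cost)

# Size of the full factorial according to L^(M*A)
full_factorial_size = levels_per_attribute ** (alternatives * attributes_per_alt)

# Attribute levels as they appear in the 9-row design table
car_time = [10, 15, 20]
car_cost = [1.00, 1.50, 2.00]
train_time = [15, 20, 25]
train_cost = [0.50, 1.00, 1.50]

# Every possible choice task: one level per attribute per alternative
full_factorial = list(product(car_time, car_cost, train_time, train_cost))

# Share of the full factorial used by the 9-row fractional factorial
fraction_used = 9 / len(full_factorial)

print(full_factorial_size, len(full_factorial), round(fraction_used, 3))
```

The 9 choice sets in the table are therefore a fractional factorial: a selection from the full set of possible choice tasks.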
5
1. What are DCEs?
• Decision maker
– Individual
– Household/group
– Company/organization
• Choice set
– Made up of finite number of mutually exclusive alt’s
– Alt’s described by att’s
– One alt is chosen
• Decision rule
– Y= f(x, z, β) with Y = discrete
6
1. What are DCEs?
• Compared to RP choice
– Need for experimental design to encourage trade-off
– Respondents face multiple choice sets
– Attributes and levels are chosen by analyst
– Allows for hypothetical choice scenarios
• Control vs reliability (no consequences)
• Can we replace people fully by more choice sets?

Maximize information with
limited number of observations
7
1. What are DCEs?
• DCEs are based on 2 fundamental building blocks:
– Lancaster’s theory of value: utility from a good arises from the characteristics of that good
– Random utility theory (RUT): latent utility (U) can be divided into an observable or systematic (V) and unobservable or random (ε) part => probabilistic utility function
• The indirect utility function describes the individual's systematic utility V_{m,s} of an alternative m for a given choice set s
8
1. What are DCEs?
U = utility
n= individual
i,j = alternative
9
1. What are DCEs?
• Common assumption I: the error term enters the utility function as an additive term.
• Common assumption II: the utility function is a linear function of the attributes
$U_{ik} = \beta a_i + \gamma M_k + \varepsilon_{ik}$
There is a trade-off between the benefits of assuming a less restrictive formulation and the complications that arise from doing so. This is especially relevant for the way income enters the utility function.
A simpler functional form (e.g. linear in income) makes estimation of the parameters and calculation of welfare effects easier (remember that the welfare effect for a linear additive function is the negative of a ratio with the price coefficient in the denominator), but the estimates are based on restrictive assumptions.
10
1. What are DCEs?
Historically the most common model to estimate a DCE is the multinomial logit (MNL), also called the conditional logit (CL). The main reason is that it is simple to estimate. However, over the last 10 years or so, rapid development of other models, computer capacity and algorithms has made this model somewhat less important.
Suppose we have a choice set with J alternatives. The probability that individual k chooses alternative i can then be expressed as
$P_{ik} = P\{ v_{ik}(a_i, y - p_i) + \varepsilon_{ik} > v_{jk}(a_j, y - p_j) + \varepsilon_{jk} ;\ \forall j \neq i \}$
$\quad\;\; = P\{ \varepsilon_{jk} - \varepsilon_{ik} < v_{ik} - v_{jk} ;\ \forall j \neq i \}$
We assume that the error terms have an extreme value type I distribution (iid). The variance of this distribution is
$\mathrm{var}(\varepsilon) = \pi^2 / (6 \mu^2)$
where μ is the scale parameter.
11
1. What are DCEs?
Properties of any discrete choice model
1. The true parameters are confounded with the scale parameter
2. Only the utility difference matters. Consequently, there must be a difference between the alternatives in order to estimate a parameter
3. This means that we can only include M-1 alternative specific constants.
Properties of the MNL model
4. The error terms of the alternatives are independent (because of the IID assumption). This results in the independence of irrelevant alternatives (IIA) property
5. Limited modeling of taste variation. Unobserved heterogeneity is captured via the error term in a simple fashion. However, socio-economic variables can account for observed heterogeneity.
12
1. What are DCEs?
It can be shown that the choice probability for an MNL can be expressed as
$P_{ik} = \frac{\exp(\mu v_{ik})}{\sum_{j=1}^{J} \exp(\mu v_{jk})}$
Which is a very simple and nice expression! But this will come with some "costs".
The parameters are normalised with a scale parameter. This complicates the interpretation of models, and in particular a comparison among models.
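A minimal sketch of the MNL probability with an explicit scale parameter may help to see why the scale and the coefficients cannot be separated. The utilities below are hypothetical numbers chosen for illustration, and the function name is not from any package.

```python
import math

def mnl_probabilities(utilities, scale=1.0):
    """MNL probabilities exp(scale * v_i) / sum_j exp(scale * v_j)."""
    # Subtracting the maximum utility avoids overflow; only utility
    # differences matter, so the probabilities are unchanged.
    v_max = max(utilities)
    weights = [math.exp(scale * (v - v_max)) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

v = [0.5, 0.2, 0.0]  # hypothetical systematic utilities of 3 alternatives

p_scale_1 = mnl_probabilities(v, scale=1.0)
p_scale_10 = mnl_probabilities(v, scale=10.0)

# Confounding: doubling every utility at scale 1 gives exactly the same
# probabilities as keeping the utilities and doubling the scale.
p_doubled_utilities = mnl_probabilities([2 * x for x in v], scale=1.0)
p_doubled_scale = mnl_probabilities(v, scale=2.0)

print([round(p, 3) for p in p_scale_1])
print([round(p, 3) for p in p_scale_10])
```

A larger scale (smaller error variance) pushes the probabilities towards the alternative with the highest systematic utility, while the data alone cannot tell a larger scale apart from larger coefficients.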
13
1. What are DCEs?
The ratio of choice probabilities between two alternatives in a choice set is unaffected by which other alternatives are available in the choice set and by the attribute levels of those other alternatives
$\frac{P_{ik}}{P_{nk}} = \frac{\exp(\mu v_{ik}) / \sum_{j \in S_m} \exp(\mu v_{jk})}{\exp(\mu v_{nk}) / \sum_{j \in S_m} \exp(\mu v_{jk})} = \frac{\exp(\mu v_{ik})}{\exp(\mu v_{nk})}$
IIA may or may not be satisfied; in many cases it is not. With many alternatives this is nevertheless a useful property.
It can be tested with the Hausman-McFadden test (1984).
Essentially: if IIA is satisfied, then the ratio of choice probabilities should not be affected by whether another alternative is in the choice set or not. Hence, one way of testing IIA is to remove one alternative, re-estimate the model and compare the choice probabilities.
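The IIA property itself (not the Hausman-McFadden test, which requires re-estimating the model on data) can be illustrated numerically. The sketch below uses hypothetical utilities and an assumed third alternative, "bus", purely for illustration.

```python
import math

def mnl_probabilities(utilities):
    """MNL probabilities for a dict {alternative: systematic utility}."""
    weights = {alt: math.exp(v) for alt, v in utilities.items()}
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}

# Hypothetical utilities; "bus" is an assumed extra alternative
full_set = {"car": 0.4, "train": 0.1, "bus": -0.3}
reduced_set = {alt: v for alt, v in full_set.items() if alt != "bus"}

p_full = mnl_probabilities(full_set)
p_reduced = mnl_probabilities(reduced_set)

# Under the MNL the car/train ratio equals exp(v_car - v_train)
# regardless of whether "bus" is in the choice set.
ratio_full = p_full["car"] / p_full["train"]
ratio_reduced = p_reduced["car"] / p_reduced["train"]

print(round(ratio_full, 4), round(ratio_reduced, 4))
```

The individual probabilities change when the third alternative is removed, but their ratio does not. In real data this proportional substitution pattern is often violated, which is what the test looks for.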
14
1. What are DCEs?
We will use the Wetland study: a mail survey to Swedish households about the possible development of wetland areas (both for biodiversity and recreation reasons).
Attribute | Description | Variable | Levels
Total cost (Cost) | The total cost for the individual | Cost | 200, 400, 700, 850
Surrounding vegetation | Forest or meadow-land | Meadow | Forest, Meadow
Biodiversity | The wetland can contain different numbers of both rare and more common species. | Medbio, Highbio | Low, Medium, High
Fish | The design of the wetland area can improve the conditions for fish species | Fish | No, Yes
Fenced waterline | The water is surrounded with a 1m fence in order to prevent drowning accidents. | Fence | No, Yes
Crayfish | Introduction of Swedish crayfish and allow fishing. | Crayfish | No, Yes
Walking facilities | Construction of the wetland area for outdoor life | Walk | No, Yes
1. What are DCEs?
Each respondent faced at most 4 choice situations. There was always an opt-out alternative.
Choice 1
Of the three alternatives below, mark the alternative you prefer.
Wetland | Alternative 1 | Alternative 2 | Alternative 3 (Simple ponds)
Surrounding vegetation
1. Surrounding vegetation | Forest | Meadow-land | Forest
Water issues
2. Fish | Good conditions | No actions | No actions
3. Crayfish | Introduction | No introduction | No introduction
Other attributes
4. Biodiversity | Low | High | Low
5. Walking facilities | No walking facilities | Walking facilities | No walking facilities
6. Fence | No | Fence | No
= Total cost per citizen | SEK 850 | SEK 400 | SEK 0
Your choice (mark your choice) | | |
1. What are DCEs?
The data needs to be arranged in a specific way for
STATA/NLOGIT. Each row in the data set represents one of the
alternatives of a choice set. So if there are 3 alternatives (as in
this case) each choice set will have 3 rows in the data set.
ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
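Before estimation it is worth checking that the long format is consistent. The sketch below, a plain-Python illustration rather than anything STATA/NLOGIT requires, parses the first two choice sets of respondent 102 from the table above and verifies that each choice set has 3 rows and exactly one chosen alternative.

```python
from collections import defaultdict

# First two choice sets of respondent 102, copied from the data table
raw = """ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0"""

lines = raw.splitlines()
header = lines[0].split()
# One dict per row, i.e. per alternative within a choice set
rows = [dict(zip(header, map(int, line.split()))) for line in lines[1:]]

rows_per_set = defaultdict(int)
chosen_per_set = defaultdict(int)
for row in rows:
    key = (row["ID"], row["SET"])      # identifies one choice set
    rows_per_set[key] += 1
    chosen_per_set[key] += row["CHOICE"]

# Each choice set needs 3 alternatives and exactly one CHOICE = 1
valid = all(rows_per_set[k] == 3 and chosen_per_set[k] == 1 for k in rows_per_set)
print(valid)
```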
1. What are DCEs?
We use the NLOGIT command in LIMDEP, together with the Model command, where we specify the utility function for each of the alternatives.
nlogit;lhs=choice;choices=new1,new2,base;
Model:
U(new1)=alfa+b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk/
U(new2)=alfa+b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk/
U(base)=b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk$
Note 1: Choice is the choice indicator.

Note 2: In this particular case we assume that the two ”new”
alternatives have a common alternative specific constant.
Note 3: The number of ASC cannot be higher than the number of
alternatives minus one.
1. What are DCEs?
Multinomial Logit
Coefficient Standard error
Intercept 0.1195 0.3384
Cost -0.0012 0.0000
Meadow -0.0518 0.3967
Highbio 0.7835 0.0000
Medbio 0.5906 0.0000
Fish 0.4051 0.0000
Fence -0.1946 0.0016
Crayfish -0.1301 0.0339
Walk 0.7532 0.0000
1. What are DCEs?
Can we compare the coefficients for different attributes within this
model?
Yes and no. A meaningful comparison of coefficients requires that they are measured on the same scale. In this particular case all attributes except cost are dummy variables, so for those it is actually possible. But be careful here!
Can we compare the coefficients of this model with the coefficients from another model?
Yes and no. You can compare sign and significance. But you cannot compare the size of the coefficients. This is because all the coefficients are scaled with an unknown scale parameter. And without further information we cannot say anything about the scale parameter.
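One quantity that can be compared across models is a ratio of coefficients, because the unknown scale cancels. A sketch of marginal WTP as the negative ratio of an attribute coefficient to the cost coefficient, using the MNL coefficients from the table above (the function name is illustrative; the cost coefficient is reported with only four decimals, so the results are rough):

```python
# MNL coefficients as reported in the results table
coefficients = {
    "cost": -0.0012, "meadow": -0.0518, "highbio": 0.7835, "medbio": 0.5906,
    "fish": 0.4051, "fence": -0.1946, "crayfish": -0.1301, "walk": 0.7532,
}

def marginal_wtp(coefs, attribute, cost_name="cost"):
    """Marginal WTP = -(attribute coefficient) / (cost coefficient)."""
    return -coefs[attribute] / coefs[cost_name]

wtp = {a: marginal_wtp(coefficients, a) for a in coefficients if a != "cost"}

# If every coefficient is multiplied by the same scale, the ratios
# (and hence the WTP values) do not change.
rescaled = {name: 10 * value for name, value in coefficients.items()}
wtp_rescaled = {a: marginal_wtp(rescaled, a) for a in rescaled if a != "cost"}

for attribute, value in wtp.items():
    print(f"{attribute}: {value:.0f} SEK")
```

With these rounded coefficients, the marginal WTP for high biodiversity comes out at roughly SEK 650, and it is identical after rescaling all coefficients.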

2. Setting up DCEs
[Figure: overview of the stages in setting up a DCE, marked "Done" and "To come"]
* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.
22
3. Survey development
• Designing a SP survey (lay-out):
– Introductory section (easy = warm-up, engage, screen):
• Intro
• Socio-demographics
• Use of the good and substitutes
– Valuation section (cognitively demanding, unfamiliar):
• Valuation scenario
• Value elicitation questions
• Follow-up questions
– Final section (sensitive information):
• Socio-demographics
• Attitude/opinion
• Identification
23
• Experimental design (ED): how attributes and levels
are combined into different choice tasks*
*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.
24
4. Choice task
scenario
alternative
scenario or
profile
response
25
4. Choice task
scenario | Describes the background and choice text and is fixed across alt’s, yet may vary over choice tasks and may be different across resp’s
alternative | Labeled | Unlabeled | Typically fixed
scenario or profile | Attributes are typically fixed, levels vary over choice tasks | Attributes and levels may be different across respondents | Describe the alternative
response | Best, best and worst, conditional best
26
4. Unlabeled choice task
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer?
alternative Laptop A Laptop B Laptop C
scenario or
profile
response All alternatives have the same utility function; used for valuation
27
4. Unlabeled choice task
scenario You are looking for a way to get home. Which of the following
roads would you prefer?
alternative Road A Road B Road C (current)
scenario or
profile Fixed levels
response
28
4. Labeled choice task
scenario Consider a 70 year old patient with advanced breast cancer. As his
doctor, what treatment do you recommend?
alternative Radiotherapy Surgery No treatment
scenario or
profile No levels
response Each alternative may have a different utility function; also used
for prediction and for elasticities
29
4. (Un)labeled choice task?
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer?
alternative Laptop A Laptop B Neither
scenario or
profile
response
30
• Experimental design (ED): how attributes and levels are combined into different choice tasks*
– An experimental design is a matrix of values that is used to

determine what goes where in the survey
• Each row represents a choice task
• Each column represents an attribute of an alternative
31
              Car                        Train
Choice set    Travel time  Travel cost   Travel time  Travel cost
1             10           1.00          15           0.50
2             20           1.50          20           0.50
3             15           2.00          25           0.50
4             15           1.50          15           1.00
5             10           2.00          20           1.00
6             20           1.00          25           1.00
7             20           2.00          15           1.50
8             15           1.00          20           1.50
9             10           1.50          25           1.50
Fractional factorial = “smart” selection out of the full space (= full factorial)
32
Reminder
* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.
33
• Experimental design (ED): how attributes and levels are combined into different choice sets*:
– Identification: effects independently estimated
– Cognitive complexity: burden on respondent
– Market realism: presented choices are realistic
=> Efficiency: precision of parameter estimates

34
• Before we can start with ED, we need to know
– Attributes, alternatives <-> subset
– Levels <-> non-linearities, cover range evenly, easy
– What response we want <-> real situation
– What the utility function will look like <-> write-out, test
– What model will be estimated <-> test
– What statistical properties should the design have
– How many choice tasks <-> resp vs statistical eff
– How will the survey be taken <-> f(complexity)
35
• After we have generated the ED
– Randomise the order of the
CT/alternatives/attributes between respondents
– Decide on the format
• Table
• Picture
• Cartoon
• Movie
• VR
36
1. What are DCEs?
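• Indirect utility function car-train DCE example*:
$V_{car,s} = \beta_{time} \cdot TT_{car,s} + \beta_{cost} \cdot TC_{car,s}$
$V_{train,s} = \beta_{train} + \beta_{time} \cdot TT_{train,s} + \beta_{cost} \cdot TC_{train,s}$
where V is the observable utility, the β are the parameter weights, and TT and TC are the attribute levels (travel time and travel cost)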

* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical study in air
travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.
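As a sketch of how these utility functions are used, the example below evaluates them for choice set 1 of the car/train design and converts them into MNL probabilities. The parameter values are hypothetical priors chosen only for illustration; they do not come from the slides or from the cited paper.

```python
import math

# Hypothetical priors (assumptions, not estimates from any study)
beta_train = 0.5   # alternative specific constant for train
beta_time = -0.1   # per minute of travel time
beta_cost = -0.8   # per euro of travel cost

def v_car(travel_time, travel_cost):
    """Observable utility of car (no constant: car is the reference)."""
    return beta_time * travel_time + beta_cost * travel_cost

def v_train(travel_time, travel_cost):
    """Observable utility of train, including its constant."""
    return beta_train + beta_time * travel_time + beta_cost * travel_cost

# Choice set 1 of the design: car (10 min, 1.00), train (15 min, 0.50)
vc = v_car(10, 1.00)     # -1.0 - 0.8 = -1.8
vt = v_train(15, 0.50)   # 0.5 - 1.5 - 0.4 = -1.4

# MNL probabilities with the scale normalised to 1
p_car = math.exp(vc) / (math.exp(vc) + math.exp(vt))
p_train = 1.0 - p_car

print(round(vc, 2), round(vt, 2), round(p_car, 3), round(p_train, 3))
```

Writing the functions out this way also makes clear how many parameters (here K = 3) the design has to identify.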
37
4. Write out utility functions
scenario You are looking for a way to get home. Which of the following roads would you prefer?
alternative | Road A | Road B | neither
Distance | 20 | 35 |
Travel time | 20 | 10 |
Travel cost | 2 | 1 |
response
38
5. ED criteria
• Past studies: orthogonal or random designs
• More and more: D-efficiency/D-error, generated by computer
– Random designs do not give a nice spread with a low number of choice tasks (CT)
– Some orthogonal designs are less efficient than other (non-)orthogonal designs
– Unlabeled orthogonal main effects plans (OMEPs) often contain dominating alternatives
– Orthogonality is lost in data collection/estimation
– Orthogonality leads to independently estimable parameters for linear models only
– Parameter estimates are also unbiased for non-orthogonal designs
39
5. ED criteria
40
5. ED criteria
• When all face the same design, Xn = X for all n
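For reference, the standard MNL expressions behind the D-error can be written out as follows. This is a sketch in commonly used notation for generic parameters, which may differ from the notation on the original slides: X_s stacks the attribute levels of the J alternatives in choice task s, and P_s holds the MNL probabilities evaluated at the priors.

```latex
% X_s : J x K matrix of attribute levels in choice task s (one row per alternative)
% P_s : J x 1 vector of MNL choice probabilities in choice task s, evaluated at the priors
% Fisher information of a single respondent facing all S choice tasks:
I_1(\beta) = \sum_{s=1}^{S} X_s^{\top} \left( \operatorname{diag}(P_s) - P_s P_s^{\top} \right) X_s

% Asymptotic variance-covariance (AVC) matrix when all N respondents face the same design:
\Omega_N = \left( N \, I_1(\beta) \right)^{-1} = \frac{1}{N} \, I_1(\beta)^{-1}

% D-error for a single respondent, normalised by the number of parameters K:
D\text{-error} = \det\left( \Omega_1 \right)^{1/K}
```

Because X_n = X for all n, the information matrix of N respondents is simply N times that of one respondent, so designs can be compared on the single-respondent D-error.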
41
5. ED criteria
• Generating efficient designs
– Software: Ngene, SAS, JMP
– Algorithms to go from candidate set to current best
– Different software, different algorithms
– D-error > 1 = alert, alert -> mistake in coding?
– More choice tasks give a lower D-error only because there are more questions, not because the individual tasks are more efficient
– A D-efficient design has some degree of utility balance, but not too much (random choice), and not too little (dominant)
– More levels, narrower range = larger D-error -> 3 levels
– Check effect of misspecification -> graph D-error (y) vs size of prior, if U-shaped -> more effort into finding good prior
42
5. ED criteria
43
5. ED criteria
• Multiple ways to create D-efficient (D-error) EDs:
– A priori assumptions on model type

• Optimize for MNL and evaluate for advanced
– A priori assumptions on parameter values (priors)

• No info -> zero prior or orth
• Literature, pilot studies, sign
• Safe = small number
– For same X, lower error is better design

44
5. ED criteria
45
5. ED criteria
• Steps in generating efficient designs
– Step 1: Specify the utility specification and priors for the likely final model to be estimated from data collected using the SC design.
– Step 2: Randomly populate the design matrix, X, to create an initial design. The initial design, however, should incorporate all the constraints that the analyst wishes to impose upon the final design outcome. For example, if the analyst wishes to retain attribute level balance, then the initial design should display this property. The initial design can be constructed with the desired number of rows; however, the number of rows should be greater than or equal to K/(J-1), with K parameters and J alternatives.
– Steps 3 and 4: Calculate the choice probabilities P and construct the asymptotic variance-covariance (AVC) matrix
– Step 5: Evaluate the efficiency
– Step 6: Change the design and repeat steps 3-5
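The steps above can be sketched for the car/train example as follows. This is a minimal illustration, not the algorithm of Ngene or any other package: the priors are hypothetical, the initial design is the 9-row table from earlier slides, and the search is a simple random swap within columns, which preserves attribute level balance.

```python
import math
import random

PRIORS = [0.5, -0.1, -0.8]  # hypothetical: beta_train, beta_time, beta_cost

def rows_for_task(task):
    """Step 1: rows of X for one task, columns = [train constant, time, cost]."""
    car_tt, car_tc, train_tt, train_tc = task
    return [[0.0, car_tt, car_tc], [1.0, train_tt, train_tc]]

def information_matrix(design, priors):
    """Steps 3-4: sum over tasks of X_s'(diag(P_s) - P_s P_s')X_s."""
    k = len(priors)
    info = [[0.0] * k for _ in range(k)]
    for task in design:
        x = rows_for_task(task)
        v = [sum(b * a for b, a in zip(priors, row)) for row in x]
        v_max = max(v)
        w = [math.exp(u - v_max) for u in v]
        p = [wi / sum(w) for wi in w]
        # Equivalent form: sum_j p_j (x_j - xbar)(x_j - xbar)'
        xbar = [sum(p[j] * x[j][a] for j in range(len(x))) for a in range(k)]
        for j, row in enumerate(x):
            d = [row[a] - xbar[a] for a in range(k)]
            for a in range(k):
                for b in range(k):
                    info[a][b] += p[j] * d[a] * d[b]
    return info

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return det

def d_error(design, priors):
    """Step 5: det(AVC)^(1/K) = det(information)^(-1/K)."""
    det_info = determinant(information_matrix(design, priors))
    if det_info <= 0:
        return float("inf")  # parameters not identified by this design
    return det_info ** (-1.0 / len(priors))

def improve(design, priors, iterations=2000, seed=1):
    """Step 6: swap two levels within a column, keep the swap if D-error drops."""
    rng = random.Random(seed)
    best = [list(t) for t in design]
    best_err = d_error(best, priors)
    for _ in range(iterations):
        cand = [t[:] for t in best]
        col = rng.randrange(len(cand[0]))
        r1, r2 = rng.sample(range(len(cand)), 2)
        cand[r1][col], cand[r2][col] = cand[r2][col], cand[r1][col]
        err = d_error(cand, priors)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Step 2: initial design (the 9 choice sets of the car/train table)
INITIAL = [
    [10, 1.00, 15, 0.50], [20, 1.50, 20, 0.50], [15, 2.00, 25, 0.50],
    [15, 1.50, 15, 1.00], [10, 2.00, 20, 1.00], [20, 1.00, 25, 1.00],
    [20, 2.00, 15, 1.50], [15, 1.00, 20, 1.50], [10, 1.50, 25, 1.50],
]
start_error = d_error(INITIAL, PRIORS)
final_design, final_error = improve(INITIAL, PRIORS)
print(start_error, final_error)
```

With K = 3 parameters and J = 2 alternatives, the 9 rows comfortably satisfy the K/(J-1) = 3 minimum. Dedicated software uses more refined search algorithms, but the evaluate-change-repeat loop is the same idea.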
46
Thank you for your attention!
47
Additional literature
• Hensher, Rose, Greene (2015) Applied Choice Analysis, Cambridge University Press
• Rose, Bliemer (2014) Stated choice experimental design theory: the who, the what and the why, in Hess and Daly, Handbook of Choice Modelling, Edward Elgar
• Rose, Bain, Bliemer (2011) Experimental design strategies for SP studies dealing with non-market goods, in Bennett, International Handbook on Non-Marketed Environmental Valuation, Edward Elgar
• Bateman, I., et al. (2002) Economic Valuation with Stated Preference Techniques: A Manual, Cheltenham: Edward Elgar, 458 p.
48
Assignment (20p)
Because a working session on campus is less appropriate given the circumstances, the working session is replaced by an assignment that you can send in (Word file + Excel file(s)) via email in groups of 2. The time it takes to complete the assignment is estimated to be equivalent to the foreseen time of the working session.
Deadline: Sunday 23/01/2022

Assignment part 1 (2p)
Explain in your own words why choice modelling
benefits from a probabilistic framework, being
random utility theory.
(max +- 5 sentences)
To estimate the β of a conditional logit model, you maximize the log-likelihood function.
– What will be the sign of the log-likelihood and why? (max 1 line)
Imagine a stated choice having 2 labeled
alternatives with 4 attributes. 1 attribute has 2
levels, 2 attributes have 3 levels, and 1 attribute
has 4 levels.
– How many choice sets are in the full factorial?

(max 1 line -> show calculation)
Calculate the Dp-error using the “evaluate experimental design students.xlsx” for the experimental design and given priors
– You can find the formulas you need to implement on slides 40 and 44.
• To multiply matrices in Excel you need to use the command MMULT
• To transpose matrices in Excel you need the command TRANSPOSE
• To calculate a determinant in Excel you need the command MDETERM
• For calculations with matrices to work you need ctrl+shift+enter to automatically fill in the entire array
Suppose we want to calculate how much Flemish adults are willing to pay (WTP) to avoid the extinction of the panda using a double bounded CVM.
– What would your valuation scenario and value elicitation question look like? Please create one as if you were actually going to distribute a survey on this topic (max 10 lines).
Assignment part 6
• Given the experimental design for the following choice set
ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
• The following explanation of the attributes and levels
• And the following regression output for a conditional logit (aka multinomial
logit) model
– Please calculate the probability of selecting alternative 1 (you don’t have to take into account
the attribute crayfish and the coefficient for the intercept can be added to the utility function of
alternatives 1 and 2). Write down intermediate results. Please do this 2 times. Once you assume
that the scale = 1 and once you assume that the scale is 10. See slide 31 of previous slide show
for the formula that includes scale.
