Experimental Design Slides 2021-2022


Setting up discrete choice experiments (DCEs) and why to choose D-efficiency as the experimental design (ED) criterion

Duurzaamheidsanalyse
2021-2022
Contents
1. What are DCEs?

2. Setting up DCEs

3. Survey development

4. Experimental design

5. Experimental design criteria

6. Assignment

2
1. What are DCEs?
• In DCEs respondents are asked to choose their
preferred alternative from a given set of alternatives
defined by levels of attributes
Choice set 1     Car            Train
Travel time      10 minutes     15 minutes
Travel cost      €1.00          €0.50
Choice

Figure labels: labeled alternative (Car, Train), attributes (travel time, travel cost), level (e.g. 10 minutes), scenario, choice.

* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical
study in air travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.
3
1. What are DCEs?
• What DCEs can be used for:

– Computing parameter weights
– Computing the utility of alternatives
– Computing alternative/scenario probabilities
– Computing (marginal) willingness to pay ((m)WTP)
– Forecasting market shares
– Computing elasticities
– …
4
How can you collect choice data?

             Car                         Train
Choice set   Travel time   Travel cost   Travel time   Travel cost
1            10            1.00          15            0.50
2            20            1.50          20            0.50
3            15            2.00          25            0.50
4            15            1.50          15            1.00
5            10            2.00          20            1.00
6            20            1.00          25            1.00
7            20            2.00          15            1.50
8            15            1.00          20            1.50
9            10            1.50          25            1.50

How many stated choice tasks can be created for this problem: 2 alternatives with 2 attributes, each having 3 levels?
= Difference between full factorial and fractional factorial

Full factorial: L^(MA) choice tasks (L levels, M alternatives, A attributes)
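As a minimal sketch of the L^(MA) count, the full factorial for the car/train example can be enumerated directly. The level values are taken from the table above; the variable names are illustrative.

```python
from itertools import product

# Attribute levels per alternative, taken from the car/train table above.
levels = {
    "car_time": [10, 15, 20],
    "car_cost": [1.00, 1.50, 2.00],
    "train_time": [15, 20, 25],
    "train_cost": [0.50, 1.00, 1.50],
}

# Full factorial: every combination of every level of every attribute
# of every alternative is one possible choice task.
full_factorial = list(product(*levels.values()))

n_levels, n_alternatives, n_attributes = 3, 2, 2
expected = n_levels ** (n_alternatives * n_attributes)  # L^(M*A)

print(len(full_factorial), expected)
```

The 9 choice sets in the table are therefore a fractional factorial: a small subset of all possible choice tasks.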

5
1. What are DCEs?
• Decision maker
– Individual
– Household/group
– Company/organization
• Choice set
– Made up of finite number of mutually exclusive alt’s
– Alt’s described by att’s
– One alt is chosen
• Decision rule
– Y= f(x, z, β) with Y = discrete
6
1. What are DCEs?
• Compared to RP choice
– Need for experimental design to encourage trade-off
– Respondents face multiple choice sets
– Attributes and levels are chosen by analyst
– Allows for hypothetical choice scenarios
• Control vs reliability (no consequences)

• Can we replace people fully by more choice sets?


Maximize information with
limited number of observations

7
1. What are DCEs?
• DCEs are based on 2 fundamental building blocks:

– Lancaster’s theory of value: utility from a good arises from the characteristics of that good

– Random utility theory (RUT): latent utility (U) can be divided into an observable or systematic (V) and an unobservable or random (ε) part => probabilistic utility function

• The indirect utility function describes the individual's systematic utility V_{m,s} of an alternative m for a given choice set s
8
1. What are DCEs?
U = utility
n= individual
i,j = alternative
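In standard random utility notation, and assuming the symbols above are used in the usual way, the utility of individual n for alternative i and the resulting choice probability can be written as follows. This is a textbook formulation, not necessarily the exact expression used on the slide.

```latex
U_{in} = V_{in} + \varepsilon_{in}, \qquad
P_n(i) = \Pr\left( U_{in} > U_{jn} \;\; \forall j \neq i \right)
```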

9
1. What are DCEs?
• Common assumption I: the error term enters the utility
function as an additive term.

• Common assumption II: the utility function is a linear function of the attributes

$$U_{ik} = \beta a_i + \gamma M_k + \varepsilon_{ik}$$

There is a trade-off between the benefits of assuming a less restrictive formulation and the complications that arise from doing so. This is especially relevant for the way income enters the utility function.

A simpler functional form (e.g. linear in income) makes estimation of the parameters and calculation of welfare effects easier (recall that, for a linear additive function, the welfare effect is the negative of a ratio with the price coefficient in the denominator), but the estimates rest on restrictive assumptions.
10
1. What are DCEs?
Historically, the most common model for estimating a DCE has been the multinomial logit (MNL), also called the conditional logit (CL). The main reason is that it is simple to estimate. Over the last 10 years or so, however, the rapid development of other models, computing capacity, and algorithms has made this model somewhat less important.

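Suppose we have a choice set with J alternatives. The probability that individual k chooses alternative i can then be expressed as

$$
\begin{aligned}
P_{ik} &= P\{ v_{ik}(a_i, y - p_i) + \varepsilon_{ik} > v_{jk}(a_j, y - p_j) + \varepsilon_{jk};\ \forall j \neq i \} \\
       &= P\{ \varepsilon_{jk} - \varepsilon_{ik} < v_{ik} - v_{jk};\ \forall j \neq i \}
\end{aligned}
$$

We assume that the error terms are independently and identically distributed (iid) extreme value type I. The variance of this distribution is

$$\operatorname{var}(\varepsilon) = \frac{\pi^2}{6\mu^2}$$

where μ is the scale parameter.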
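The link between the scale parameter and the error variance can be checked by simulation. The sketch below draws extreme value type I errors with the inverse-CDF method and compares the sample variance with π²/(6μ²). The seed, number of draws, and scale values are arbitrary choices for illustration.

```python
import math
import random

def draw_gumbel(scale_mu, rng):
    """Draw one extreme value type I error with scale mu (inverse-CDF method)."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u)) / scale_mu

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(42)  # fixed seed so the run is reproducible
n_draws = 200_000
results = {}
for mu in (1.0, 2.0):
    draws = [draw_gumbel(mu, rng) for _ in range(n_draws)]
    results[mu] = (sample_variance(draws), math.pi ** 2 / (6 * mu ** 2))

for mu, (simulated, theoretical) in results.items():
    print(f"mu={mu}: simulated {simulated:.3f}, theoretical {theoretical:.3f}")
```

A larger scale parameter means a smaller error variance, i.e. choices that are more deterministic given the systematic utilities.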
11
1. What are DCEs?
Properties of any discrete choice model:
1. The true parameters are confounded with the scale parameter.
2. Only utility differences matter. Consequently, an attribute must differ between the alternatives for its parameter to be estimated.
3. This means that we can include at most M-1 alternative-specific constants.

Properties of the MNL model:
4. The alternatives are independent (because of the IID assumption). This results in the IIA (independence of irrelevant alternatives) property.
5. Limited modeling of taste variation. Unobserved heterogeneity is captured only in a simple fashion via the error term. However, socio-economic variables can account for observed heterogeneity.

12
1. What are DCEs?
It can be shown that the choice probability for an MNL can be expressed as

$$P_{ik} = \frac{\exp(\mu v_{ik})}{\sum_{j=1}^{J} \exp(\mu v_{jk})}$$

This is a very simple and convenient expression, but it comes with some ”costs”.

The parameters are normalised by a scale parameter (μ). This complicates the interpretation of models and, in particular, comparisons between models.
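A minimal sketch of the MNL choice probability with an explicit scale parameter is shown below. The utilities and the scale values are hypothetical and chosen only for illustration.

```python
import math

def mnl_probabilities(utilities, scale=1.0):
    """Return MNL probabilities exp(mu * v_i) / sum_j exp(mu * v_j)."""
    scaled = [scale * v for v in utilities]
    largest = max(scaled)  # subtracting the maximum avoids overflow
    exp_values = [math.exp(v - largest) for v in scaled]
    total = sum(exp_values)
    return [e / total for e in exp_values]

# Hypothetical systematic utilities for three alternatives (illustration only).
v = [0.4, 0.1, 0.0]

p_scale_1 = mnl_probabilities(v, scale=1.0)
p_scale_5 = mnl_probabilities(v, scale=5.0)
print([round(p, 3) for p in p_scale_1])
print([round(p, 3) for p in p_scale_5])
```

With a larger scale the same utility differences translate into more extreme probabilities, which is why coefficient sizes cannot be interpreted without knowing the scale.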

13
1. What are DCEs?
The ratio of choice probabilities between two alternatives in a choice set is unaffected by which other alternatives are available in the choice set and by the attribute levels of those other alternatives (scale parameter omitted for brevity):

$$\frac{P_{ik}}{P_{nk}} = \frac{\exp(v_{ik}) \big/ \sum_{j \in S_m} \exp(v_{jk})}{\exp(v_{nk}) \big/ \sum_{j \in S_m} \exp(v_{jk})} = \frac{\exp(v_{ik})}{\exp(v_{nk})}$$

This property (IIA) may or may not be satisfied; in many cases it is not. With many alternatives it is nevertheless a useful property.

It can be tested with the Hausman-McFadden test (1984). Essentially: if IIA is satisfied, the ratio of choice probabilities should not be affected by whether another alternative is in the choice set or not. Hence, one way of testing IIA is to remove one alternative, re-estimate the model, and compare the choice probabilities.
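The IIA property itself (not the Hausman-McFadden test, which requires re-estimating the model on data) can be illustrated numerically. In the sketch below the alternatives and utilities are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """MNL probabilities for a dict {alternative: systematic utility}."""
    exp_values = {alt: math.exp(v) for alt, v in utilities.items()}
    total = sum(exp_values.values())
    return {alt: e / total for alt, e in exp_values.items()}

# Hypothetical utilities (illustration only).
full_set = {"car": 0.5, "train": 0.2, "bus": -0.1}
reduced_set = {alt: v for alt, v in full_set.items() if alt != "bus"}

p_full = mnl_probabilities(full_set)
p_reduced = mnl_probabilities(reduced_set)

# Under MNL the car/train ratio depends only on the car and train utilities.
ratio_full = p_full["car"] / p_full["train"]
ratio_reduced = p_reduced["car"] / p_reduced["train"]
print(ratio_full, ratio_reduced, math.exp(0.5 - 0.2))
```

Removing the bus changes both probabilities, but their ratio stays equal to exp(v_car - v_train).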
14
1. What are DCEs?
We will use the Wetland study: a mail survey of Swedish households about possible development of wetland areas (for both biodiversity and recreation reasons).

Attribute | Description | Variable | Levels
Total cost | The total cost for the individual | Cost | 200, 400, 700, 850
Surrounding vegetation | Forest or meadow-land | Meadow | Forest, Meadow
Biodiversity | The wetland can contain different numbers of both rare and more common species. | Medbio, Highbio | Low, Medium, High
Fish | The design of the wetland area can improve the conditions for fish species | Fish | No, Yes
Fenced waterline | The water is surrounded with a 1m fence in order to prevent drowning accidents. | Fence | No, Yes
Crayfish | Introduction of Swedish crayfish and allow fishing. | Crayfish | No, Yes
Walking facilities | Construction of the wetland area for outdoor life | Walk | No, Yes
1. What are DCEs?
Each respondent faced at most 4 choice situations. There was always an opt-out alternative.

Choice 1

Of the three alternatives below, mark the alternative you prefer.

Wetland attributes           Alternative 1           Alternative 2        Alternative 3 (simple ponds)

Surrounding vegetation
1. Surrounding vegetation    Forest                  Meadow-land          Forest

Water issues
2. Fish                      Good conditions         No actions           No actions
3. Crayfish                  Introduction            No introduction      No introduction

Other attributes
4. Biodiversity              Low                     High                 Low
5. Walking facilities        No walking facilities   Walking facilities   No walking facilities
6. Fence                     No                      Fence                No
___________________          ______________          _______________      ______________
= Total cost per citizen     SEK 850                 SEK 400              SEK 0

Your choice (mark your choice)
1. What are DCEs?
The data needs to be arranged in a specific way for
STATA/NLOGIT. Each row in the data set represents one of the
alternatives of a choice set. So if there are 3 alternatives (as in
this case) each choice set will have 3 rows in the data set.
ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
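The long format described above can be sanity-checked before estimation. The sketch below uses the first two choice sets of respondent 102 from the table and verifies that each choice set has 3 rows and exactly one chosen alternative. The parsing approach and variable names are illustrative and not part of the STATA/NLOGIT workflow.

```python
from collections import defaultdict

# First two choice sets of respondent 102, copied from the table above.
header = "ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK".split()
raw_rows = """
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
""".strip().splitlines()

# One dict per row: each row is one alternative of one choice set.
rows = [dict(zip(header, map(int, line.split()))) for line in raw_rows]

# Group rows by (respondent, choice set).
choice_sets = defaultdict(list)
for row in rows:
    choice_sets[(row["ID"], row["SET"])].append(row)

chosen = {}
for key, alternatives in choice_sets.items():
    assert len(alternatives) == 3, "each choice set needs 3 rows"
    assert sum(r["CHOICE"] for r in alternatives) == 1, "exactly one choice"
    chosen[key] = next(r["ALT"] for r in alternatives if r["CHOICE"] == 1)

print(chosen)
```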
1. What are DCEs?
We use the NLOGIT command in limdep, together with the Model
command, where we specify the utility function for each of the
alternatives.

nlogit;lhs=choice;choices=new1,new2,base;
Model:
U(new1)=alfa+b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk/
U(new2)=alfa+b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk/
U(base)=b_cost*cost+b_meadow*meadow+b_highbi*highbio+b_medbio*
medbio+b_fish*fish+b_fence*fence+b_cray*crayfish+b_walkt*walk$

Note 1: Choice is the choice indicator.
Note 2: In this particular case we assume that the two ”new” alternatives have a common alternative-specific constant (alfa).
Note 3: The number of alternative-specific constants (ASCs) cannot be higher than the number of alternatives minus one.
1. What are DCEs?
Multinomial Logit
            Coefficient   P-value
Intercept    0.1195       0.3384
Cost        -0.0012       0.0000
Meadow      -0.0518       0.3967
Highbio      0.7835       0.0000
Medbio       0.5906       0.0000
Fish         0.4051       0.0000
Fence       -0.1946       0.0016
Crayfish    -0.1301       0.0339
Walk         0.7532       0.0000
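As a rough illustration of how these estimates translate into marginal willingness to pay (the negative of the ratio of an attribute coefficient to the cost coefficient), consider the sketch below. The coefficients are copied from the table above; because the cost coefficient is rounded to four decimals, the resulting amounts are only approximate. The function name is illustrative.

```python
# Coefficients copied from the multinomial logit table above.
coefficients = {
    "Cost": -0.0012, "Meadow": -0.0518, "Highbio": 0.7835, "Medbio": 0.5906,
    "Fish": 0.4051, "Fence": -0.1946, "Crayfish": -0.1301, "Walk": 0.7532,
}

def marginal_wtp(attribute, coefs, cost_name="Cost"):
    """Marginal WTP = -(attribute coefficient) / (cost coefficient)."""
    return -coefs[attribute] / coefs[cost_name]

mwtp = {
    name: marginal_wtp(name, coefficients)
    for name in coefficients
    if name != "Cost"
}
for name, value in mwtp.items():
    print(f"{name}: {value:.0f} SEK")
```

Because both coefficients are multiplied by the same unknown scale parameter, the scale cancels in the ratio, which is what makes WTP comparable across models.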
1. What are DCEs?
Can we compare the coefficients for different attributes within this model?

Yes and no. A meaningful comparison of coefficients requires that they are measured on the same scale. In this particular case all attributes except cost are dummy variables, so their coefficients can actually be compared. But be careful here!

Can we compare the coefficients of this model with the coefficients from another model?

Yes and no. You can compare sign and significance, but you cannot compare the size of the coefficients. This is because all the coefficients are scaled by an unknown scale parameter, and without further information we cannot say anything about the scale parameter.
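The confounding of parameters and scale can be made concrete with a small sketch: doubling all coefficients while halving the scale gives exactly the same choice probabilities, so the data cannot tell the two apart. All attribute values and parameters below are hypothetical.

```python
import math

def mnl_probabilities(utilities, scale):
    """MNL probabilities with an explicit scale parameter."""
    exp_values = [math.exp(scale * v) for v in utilities]
    total = sum(exp_values)
    return [e / total for e in exp_values]

# Hypothetical (time, cost) levels for two alternatives and hypothetical parameters.
attributes = [(10, 1.0), (15, 0.5)]
beta = (-0.10, -1.00)

def utilities(beta_values):
    """Linear-in-attributes systematic utility for each alternative."""
    return [sum(b * x for b, x in zip(beta_values, alt)) for alt in attributes]

p_a = mnl_probabilities(utilities(beta), scale=1.0)
beta_doubled = tuple(2 * b for b in beta)
p_b = mnl_probabilities(utilities(beta_doubled), scale=0.5)
print(p_a, p_b)
```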



2. Setting up DCEs

Done
To come

* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.

22
3. Survey development
• Designing an SP survey (layout):

– Introductory section (easy: warm-up, engage, screen)
• Intro
• Socio-demographics
• Use of the good and substitutes

– Valuation section (cognitively demanding, unfamiliar)
• Valuation scenario
• Value elicitation questions
• Follow-up questions

– Final section (sensitive information)
• Socio-demographics
• Attitude/opinion
• Identification
23
4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice tasks*

*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

24
4. Choice task
scenario

alternative

scenario or
profile

response

25
4. Choice task
scenario: Describes the background and choice text and is fixed across alternatives, yet may vary over choice tasks and may be different across respondents

alternative: Labeled or unlabeled; typically fixed

scenario or profile: Describes the alternative. Attributes are typically fixed, levels vary over choice tasks. Attributes and levels may be different across respondents

response: Best, best and worst, conditional best

26
4. Unlabeled choice task
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer

alternative Laptop A Laptop B Laptop C

scenario or
profile

response All alternatives have the same utility function; used for valuation

27
4. Unlabeled choice task
scenario You are looking for a way to get home. Which of the following
roads would you prefer

alternative Road A Road B Road C (current)

scenario or
profile Fixed levels

response

28
4. Labeled choice task
scenario Consider a 70 year old patient with advanced breast cancer. As his
doctor, what treatment do you recommend?

alternative Radiotherapy Surgery No treatment

scenario or
profile No levels

response Each alternative may have a different utility function; also used
for prediction and for elasticities

29
4. (Un)labeled choice task?
scenario You are looking to buy a new laptop for use at home. Which of
the following laptops would you prefer

alternative Laptop A Laptop B Neither

scenario or
profile

response

30
4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice tasks*

– An experimental design is a matrix of values that is used to determine what goes where in the survey
• Each row represents a choice task
• Each column represents an attribute of an alternative

*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

31
4. Experimental design
             Car                         Train
Choice set   Travel time   Travel cost   Travel time   Travel cost
1            10            1.00          15            0.50
2            20            1.50          20            0.50
3            15            2.00          25            0.50
4            15            1.50          15            1.00
5            10            2.00          20            1.00
6            20            1.00          25            1.00
7            20            2.00          15            1.50
8            15            1.00          20            1.50
9            10            1.50          25            1.50

Fractional factorial = “smart” selection out of the full space (= full factorial)
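One way to see why this 9-row selection is "smart" is to check the correlations between its columns. The sketch below computes all pairwise Pearson correlations for the design above; the column names are illustrative.

```python
from itertools import combinations

# Columns of the 9-row car/train design shown above.
columns = {
    "car_time":   [10, 20, 15, 15, 10, 20, 20, 15, 10],
    "car_cost":   [1.0, 1.5, 2.0, 1.5, 2.0, 1.0, 2.0, 1.0, 1.5],
    "train_time": [15, 20, 25, 15, 20, 25, 15, 20, 25],
    "train_cost": [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5],
}

def correlation(x, y):
    """Pearson correlation between two equally long columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

correlations = {
    (a, b): correlation(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 10))
```

All pairwise correlations are zero, so this fractional factorial is orthogonal and each attribute level appears equally often (attribute level balance).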

32
Reminder

* Source: Ryan, M., K. Gerard, and M. Amaya-Amaya, Discrete Choice Experiments in a Nutshell, in Using Discrete Choice Experiments to Value Health and
Health Care, M. Ryan, K. Gerard, and M. Amaya-Amaya, Editors. 2008, Springer Netherlands. p. 13-46.

33
4. Experimental design
• Experimental design (ED): how attributes and levels
are combined into different choice sets*:

– Identification: effects independently estimated
– Cognitive complexity: burden on respondent
– Market realism: presented choices are realistic

=> Efficiency: precision of parameter estimates


*Source: Louviere, J.J., D.A. Hensher, and J.D. Swait, Stated choice methods: analysis and applications. 2000:
Cambridge University Press.

34
4. Experimental design
• Before we can start with ED, we need to know
– Attributes, alternatives <-> subset
– Levels <-> non-linearities, cover range evenly, easy
– What response we want <-> real situation
– What the utility function will look like <-> write-out, test
– What model will be estimated <-> test
– What statistical properties should the design have
– How many choice tasks <-> resp vs statistical eff
– How will the survey be taken <-> f(complexity)
35
4. Experimental design
• After we have generated the ED
– Randomise the order of the
CT/alternatives/attributes between respondents
– Decide on the format
• Table
• Picture
• Cartoon
• Movie
• VR

36
1. What are DCEs?
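• Indirect utility function car-train DCE example*:

$$V_{car,s} = \beta_{time} \cdot TT_{car,s} + \beta_{cost} \cdot TC_{car,s}$$

$$V_{train,s} = \beta_{train} + \beta_{time} \cdot TT_{train,s} + \beta_{cost} \cdot TC_{train,s}$$

V = observable utility, β = parameter weight, TT and TC = attribute levels (travel time and travel cost)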


* Source: Bliemer, M.C.J. and J.M. Rose, Experimental design influences on stated choice outputs: An empirical study in air
travel choice. Transportation Research Part A: Policy and Practice, 2011. 45(1): p. 63-79.

37
4. Write out utility functions
scenario You are looking for a way to get home. Which of the following
roads would you prefer

alternative    Road A   Road B   neither

Distance       20       35
Travel time    20       10
Travel cost    2        1

response
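One possible way to write out the utility functions for this choice task is sketched below. It assumes generic parameters shared by both roads (the task is unlabeled) and a constant for the "neither" option; the parameter names are illustrative, and other specifications are possible.

```latex
\begin{aligned}
V_{A}       &= \beta_{dist} \cdot 20 + \beta_{time} \cdot 20 + \beta_{cost} \cdot 2 \\
V_{B}       &= \beta_{dist} \cdot 35 + \beta_{time} \cdot 10 + \beta_{cost} \cdot 1 \\
V_{neither} &= \beta_{neither}
\end{aligned}
```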

38
5. ED criteria
• Past studies: orthogonal or random designs

• More and more: D-efficiency (D-error) designs generated by computer

– Random designs do not give a nice spread with a low number of choice tasks (CT)
– Some orthogonal designs are less efficient than other (non-)orthogonal designs
– Unlabeled OMEPs (orthogonal main effects plans) often contain dominating alternatives
– Orthogonality is lost in data collection/estimation
– Orthogonality leads to independently estimable parameters for linear models only
– Parameter estimates are also unbiased for non-orthogonal designs

39
5. ED criteria

40
5. ED criteria
• When all face the same design, Xn = X for all n

41
5. ED criteria
• Generating efficient designs
– Software: Ngene, SAS, JMP
– Algorithms to go from candidate set to current best design
– Different software, different algorithms
– D-error > 1 is a warning sign: is there a mistake in the coding?
– More choice tasks give a lower D-error, but only because there are more questions, not because the individual tasks are more efficient
– A D-efficient design has some degree of utility balance, but not too much (random choice) and not too little (dominant alternative)
– More levels or a narrower range = larger D-error -> 3 levels
– Check the effect of misspecification -> graph D-error (y) vs size of prior; if U-shaped -> put more effort into finding a good prior

42
5. ED criteria

43
5. ED criteria
• Multiple ways to create D-efficient (low D-error) EDs:

– A priori assumptions on model type
• Optimize for MNL and evaluate for advanced models

– A priori assumptions on parameter values (priors)
• No info -> zero prior or orthogonal design
• Literature, pilot studies, sign
• Safe = small number

– For same X, lower error is better design


44
5. ED criteria

45
5. ED criteria
• Steps in generating efficient designs
– Step 1: Specify the utility functions and priors for the likely final model to be estimated from data collected using the stated choice (SC) design.
– Step 2: Randomly populate the design matrix, X, to create an initial design. The initial design should, however, incorporate all the constraints that the analyst wishes to impose upon the final design. For example, if the analyst wishes to retain attribute level balance, then the initial design should display this property. The initial design can be constructed with the desired number of rows; however, the number of rows should be greater than or equal to K/(J-1) (K parameters, J alternatives).
– Steps 3 and 4: Calculate the choice probabilities (P) and construct the asymptotic variance-covariance (AVC) matrix
– Step 5: Evaluate efficiency
– Step 6: Change the design and repeat steps 3-5
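Steps 3 to 5 can be sketched for an MNL model as follows. The example uses the car/train design and utility functions from the earlier slides; the prior values are hypothetical, the AVC matrix is computed for a single respondent, and the standard MNL result that the information matrix equals the sum over choice tasks of X'(diag(P) - PP')X is assumed. The D-error is det(AVC)^(1/K), which equals det(information)^(-1/K).

```python
import math

# Car/train design: (TT_car, TC_car, TT_train, TC_train) per choice task.
design = [
    (10, 1.0, 15, 0.5), (20, 1.5, 20, 0.5), (15, 2.0, 25, 0.5),
    (15, 1.5, 15, 1.0), (10, 2.0, 20, 1.0), (20, 1.0, 25, 1.0),
    (20, 2.0, 15, 1.5), (15, 1.0, 20, 1.5), (10, 1.5, 25, 1.5),
]
# Hypothetical priors for (beta_train, beta_time, beta_cost); assumptions only.
priors = [0.2, -0.05, -0.8]

def alternatives(task):
    """Rows of X for one task, matching V_car and V_train."""
    tt_car, tc_car, tt_train, tc_train = task
    return [[0.0, tt_car, tc_car], [1.0, tt_train, tc_train]]

def information_matrix(tasks, beta):
    """MNL probabilities per task, then the Fisher information (inverse AVC)."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for task in tasks:
        x = alternatives(task)
        v = [sum(b * xi for b, xi in zip(beta, row)) for row in x]
        exp_v = [math.exp(vi - max(v)) for vi in v]
        p = [e / sum(exp_v) for e in exp_v]
        # Probability-weighted mean of each column within the task.
        x_bar = [sum(p[j] * x[j][a] for j in range(len(x))) for a in range(k)]
        for j, row in enumerate(x):
            for a in range(k):
                for b in range(k):
                    info[a][b] += p[j] * (row[a] - x_bar[a]) * (row[b] - x_bar[b])
    return info

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def d_error(tasks, beta):
    """D-error = det(AVC)^(1/K) = det(information)^(-1/K)."""
    return determinant(information_matrix(tasks, beta)) ** (-1.0 / len(beta))

dp_error = d_error(design, priors)              # with (hypothetical) priors
dz_error = d_error(design, [0.0, 0.0, 0.0])     # with zero priors
dp_error_doubled = d_error(design + design, priors)
print(dp_error, dz_error, dp_error_doubled)
```

Repeating every choice task twice exactly halves the D-error in this sketch, which illustrates that more choice tasks lower the D-error through more questions rather than through more efficient individual tasks. The absolute size of a D-error also depends on the units of the attributes, so it is mainly useful for comparing designs for the same model and priors.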

46
Thank you for your attention!

47
Additional literature
• Hensher, Rose, Greene (2015) Applied Choice Analysis, Cambridge University Press
• Rose, Bliemer (2014) Stated choice experimental design theory: the who, the what and the why, in Hess and Daly, Handbook of Choice Modelling, Edward Elgar
• Rose, Bain, Bliemer (2011) Experimental design strategies for SP studies dealing with non-market goods, in Bennett, International Handbook on Non-Marketed Environmental Valuation, Edward Elgar
• Bateman, I., et al. (2002) Economic Valuation with Stated Preference Techniques: A Manual, Edward Elgar, Cheltenham

48
Assignment (20p)
Because a working session on campus is less appropriate given the circumstances, the working session is replaced by an assignment that you can send in via email (Word file + Excel file(s)) in groups of 2. The time it takes to complete the assignment is estimated to be equivalent to the foreseen time of the working session.

Deadline: Sunday 23/01/2022


Assignment part 1 (2p)
Explain in your own words why choice modelling benefits from a probabilistic framework, namely random utility theory.

(max. ± 5 sentences)
Assignment part 2 (1p)
To estimate the β of a conditional logit model, you maximize the log-likelihood function.

– What will be the sign of the log-likelihood and why? (max 1 line)
Assignment part 3 (1p)
Imagine a stated choice having 2 labeled
alternatives with 4 attributes. 1 attribute has 2
levels, 2 attributes have 3 levels, and 1 attribute
has 4 levels.

– How many choice sets are in the full factorial? (max 1 line -> show calculation)
Assignment part 4 (6p)
Calculate the Dp-error using the “evaluate
experimental design students.xlsx” for the
experimental design and given priors

– You can find the formulas you need to implement on slides 40 and 44.
• To multiply matrices in Excel, use the function MMULT
• To transpose matrices in Excel, use the function TRANSPOSE
• To calculate a determinant in Excel, use the function MDETERM
• For matrix calculations to work, press Ctrl+Shift+Enter to fill in the entire array automatically
Assignment part 5 (4p)
Suppose we want to calculate how much Flemish adults are willing to pay (WTP) to avoid the extinction of the panda, using a double-bounded CVM.

– What would your valuation scenario and value elicitation question look like? Please create one as if you were actually going to distribute a survey on this topic (max 10 lines).
Assignment part 6 (6p)
• Given the experimental design for the following choice set
ID SET ALT CHOICE COST HIGHBIO MEDBIO MEADOW FISH FENCE WALK
102 1 1 0 400 0 0 1 0 0 0
102 1 2 1 200 0 1 0 1 1 1
102 1 3 0 0 0 0 0 0 0 0
102 2 1 0 200 0 0 1 1 1 0
102 2 2 1 700 1 0 0 0 0 1
102 2 3 0 0 0 0 0 0 0 0
102 3 1 1 400 1 0 1 1 0 0
102 3 2 0 850 0 1 0 0 1 1
102 3 3 0 0 0 0 0 0 0 0
102 4 1 0 400 0 0 0 1 0 0
102 4 2 1 700 0 1 1 0 1 1
102 4 3 0 0 0 0 0 0 0 0
104 1 1 0 200 1 0 1 1 1 1
104 1 2 0 700 0 0 0 0 0 0
104 1 3 1 0 0 0 0 0 0 0
104 2 1 0 200 0 0 1 0 0 0
104 2 2 0 850 1 0 0 1 1 1
104 2 3 1 0 0 0 0 0 0 0
104 3 1 0 700 0 1 0 1 1 0
104 3 2 0 400 0 0 1 0 0 1
104 3 3 1 0 0 0 0 0 0 0
104 4 1 0 850 0 0 0 1 1 0
104 4 2 0 700 0 1 1 0 0 1
104 4 3 1 0 0 0 0 0 0 0
• The following explanation of the attributes and levels
Assignment part 6
• And the following regression output for a conditional logit (aka multinomial logit) model

– Please calculate the probability of selecting alternative 1. You do not have to take the attribute crayfish into account, and the coefficient for the intercept can be added to the utility functions of alternatives 1 and 2. Write down intermediate results. Please do this twice: once assuming that the scale = 1 and once assuming that the scale = 10. See slide 31 of the previous slide show for the formula that includes scale.
