Exploring causal effects of neighborhood type on walking

behavior using stratification on the propensity score


Xinyu (Jason) Cao
Humphrey Institute of Public Affairs
University of Minnesota
301 19th Ave S., Minneapolis, MN, 55455
Email: cao@umn.edu
Phone: 612-625-5671
Fax: 612-625-3513

Abstract
The causality issue has become one of the key questions in the debate over the relationship
between the built environment and travel behavior. To ascertain whether changes to the built
environment are a cost-effective way to change travel behavior, it is necessary to determine the
magnitude of the effect. Further, it is important to understand if the observed influence of the
built environment on travel behavior diminishes substantially once we control for self-selection.
Using 1,553 residents living in four traditional and four suburban neighborhoods in Northern
California, this study explores the causal effect of neighborhood type on walking behavior and
the relationship between this effect and the observed influence of neighborhood type on walking
behavior. Specifically, this study applied propensity score stratification, which has been widely
used to reduce selection bias. The results showed that, on average, the causal influences of
neighborhood type are likely to be overstated by 64% for utilitarian walking frequency and 16%
for recreational walking frequency, if residential self-selection is not controlled for. However,
neighborhood type still plays a more important role in affecting walking behavior than self-
selection. This study also offers a basic tutorial for the propensity score stratification approach
and discusses its strengths and weaknesses for applications in the field of land use and travel
behavior.

Key words: causality, land use, smart growth, transportation, treatment effect

1. INTRODUCTION
Suburban development has been widely criticized for its contribution to auto dependence and its

consequences: air pollution, global climate change, and oil dependence. Numerous studies have

investigated the relationships between the built environment and travel behavior since the 1990s

(Crane, 2000; Ewing and Cervero, 2001; Frank and Engelke, 2001; Handy, 1996). These studies

found that many attributes of traditional neighborhoods (such as high density, high accessibility,

and mixed land use) have a positive association with walking and/or a negative relationship with

driving. These results lend support to the movement to use land use and transportation policies to

reduce auto dependence and its negative impacts. Most recently, decision-makers at the state

and local levels have been considering land use policies as a way to reduce vehicle-miles

traveled (VMT) and thus greenhouse gas emissions. The recent report Growing Cooler (Ewing

et al., 2008) concluded that “it is realistic to assume a 30 percent cut in VMT [for people in areas

of] compact development” (p. 9). However, association does not necessarily mean causality. It

is possible that residential self-selection is at work - individuals who prefer walking may

selectively live in a neighborhood conducive to walking and walk more.

The goal of research regarding self-selection is to establish whether there is a causal relationship

between the built environment and travel behavior, and ultimately to determine the magnitude of

this relationship. Such evidence provides a basis for the adoption of policies that aim to change

travel behavior by changing the built environment. The existence of self-selection doesn't mean

that the built environment is irrelevant. For the sake of increasing active travel, self-selection is

desirable if there is an unmet demand for pedestrian-oriented neighborhoods. Although some

studies contended those neighborhoods are undersupplied, Cao (Cao, 2008) offered a critique of

the studies and his empirical results did not support the argument of unmet demand. Further, the

demand for pedestrian-oriented neighborhoods may be softer than it appears. Although

individuals may desire walkability-related attributes (such as stores within walking distance),

they may also prefer contradictory qualities (such as large lots and free parking) (Walker and Li,

2007). Therefore, to the extent self-selection exists but is not controlled for, we are likely to

misestimate the influence of built environment when we use land use policies to try to reduce

travel, fuel consumption, and emissions. For example, if those who have an automobile-oriented

lifestyle end up living in dense and diverse neighborhoods despite their preferences (e.g. because

of undersupply of the neighborhoods), their travel behavior will probably not match that of those

who actively want and choose to live in such neighborhoods.

Recent studies have investigated the causal relationships between the built environment and

travel behavior. Among 38 studies reviewed in Cao et al. (Cao et al., 2009), many found

evidence of residential self-selection, and virtually every study found a statistically significant

influence of the built environment on travel behavior, controlling for self-selection (Boarnet et

al., 2005; Boarnet and Sarmiento, 1998; Frank et al., 2007; Khattak and Rodriguez, 2005;

Krizek, 2003; Vance and Hedel, 2007). It is arguable that the magnitude of an effect is at least as

important as statistical significance of the effect, especially as statistical significance is affected

by sample size (Ziliak and McCloskey, 2004). Therefore, to ascertain whether changes to the

built environment are a cost-effective way to change travel behavior, given the opportunity costs

of spending resources another way, it is necessary to determine the magnitude of the effect, not

just whether one occurs or not. Further, it is known that the observed influence of the built

environment on travel behavior (without a correction for self-selection) constitutes the influence

of the built environment itself (the causal influence of the built environment) and the influence of

self-selection. Planners therefore want to know whether the observed

influence diminishes substantially once we control for self-selection.

However, few studies have shed light on the proportion of the causal influence of the built

environment in the observed influence of the built environment on travel behavior. Using travel

diary data from the Regional Travel – Household Interview Survey, Salon (Salon, 2006)

estimated a three-tiered nested logit model of residential choice, auto ownership, and walking

level. She concluded the effect of the built environment itself accounted for 1/2 to 2/3 of the

effect of a change in population density on walking level in most areas of New York City. Using

the 1998-1999 Austin Travel Survey, Zhou and Kockelman (Zhou and Kockelman, 2008)

employed a sample selection model to investigate the causal influence of residential location on

VMT. They found that the causal effect of the built environment accounted for 58-90% of the

“total” (not observed but derived from models) influence of residential location on VMT,

depending on model specifications.

In addition, a few studies indicated whether the built environment itself or self-selection has

the stronger effect on travel behavior, but they did not show how much stronger (although the

causal effects may be calculated using parameter estimates). The results of these studies were

mixed, however. For example, using a 2003-2004 survey in the San Diego and San Francisco

metropolitan areas, Chatman (Chatman, 2009) employed a negative binomial model to explore

the impact of the built environment on trip frequencies by different modes, controlling for

preferences for mode choice and socio-demographics. He concluded the built environment

impacts travel behavior, and residential self-selection bias is modest. Schwanen and Mokhtarian

(Schwanen and Mokhtarian, 2005) compared the mode choice of consonant residents (those

whose residential choices match their travel/residential preferences) and dissonant residents.

They found that urban-oriented suburbanites (dissonant suburbanites) commuted by car at rates

almost as high as other suburbanites. Therefore, for suburbanites, the built environment has a

relatively stronger influence on mode choice than attitudes. However, in the urban

neighborhood, they found that the built environment has an influence on mode choice similar in

magnitude to travel preferences. Kitamura et al. (Kitamura et al., 1997) evaluated the relative

contributions of built environment variables and attitudes by gradually including different groups

of variables in their model specifications. They found that attitudes explain travel behavior

better than neighborhood characteristics. Using the 1995 National Personal Transportation

Survey, Boer et al. (Boer et al., 2007) found that a number of built environment variables were

significantly associated with the choice of walking or not. However, after propensity score

matching (with demographics being independent variables in the propensity score model), many

previously significant built environment variables became insignificant, although a few remained

significant. They concluded that self-selection played an important role in walking choice.

Given limited research and mixed results, the extent to which the built environment itself

contributes to the observed influence of the built environment on travel behavior is inconclusive.

Using the 2003 data collected from Northern California, this study applied a propensity score

stratification approach to identify the respective effects of neighborhood type and self-selection

on walking behavior. Handy et al. (Handy et al., 2006) used the same dataset and examined

whether the influence of the built environment on walking behavior reflects causality or mere correlation.

They adopted negative binomial models and did not quantify the effect of the built environment

and the effect of self-selection on walking behavior. This study moves beyond Handy et al. and

aims to answer the following questions: (1) How large is the causal influence of neighborhood

type on walking behavior? (2) To what extent do neighborhood type itself and self-selection

contribute to the observed influence of neighborhood type on walking behavior? Further, this

paper is one of few applications of the propensity score approach in the field of land use and

transportation. It offers a tutorial for the method and discusses its strengths and weaknesses.

The organization of this paper is as follows. Section 2 reviews the conceptual connection

between residential self-selection and misestimation. Section 3 describes the propensity score

approach. The next section presents the data and variables. Section 5 discusses modeling

results. The last section presents the limitations and summarizes major findings.

2. SELF-SELECTION AND MISESTIMATION


Many studies have speculated that if the influence of self-selection on travel behavior exists but

is not taken into account, we are likely to overestimate the causal influence of the built

environment (Mokhtarian and Cao, 2008; Pinjari et al., 2007). However, is it possible that we

underestimate the influence?

First assume that the relationship between walking behavior and neighborhood type is

confounded by only walking preference. Let ATE (average treatment effect) denote the causal

influence of neighborhood type on walking behavior, across the entire population. As in a

randomized experiment, suppose we could (though it is impossible in practice) randomly assign half of a sample to a

walkable neighborhood and the other half to a non-walkable neighborhood. Then the ATE is the

observed difference in walking behavior between residents in walkable and non-walkable

neighborhoods, because the influence of walking preference on walking behavior will be

cancelled out due to random assignment (Figure 1). Now, assume that all people can self-select

the neighborhoods that match their walking preference. For people selectively living in the

walkable neighborhood, walking preference tends to facilitate walking behavior in addition to

the effect of neighborhood type, and vice versa. Therefore, the observed difference in walking

behavior between the two neighborhoods (ATE1) will be larger than the ATE. By contrast, if all

people are mismatched (that is, people who prefer walking live in the non-walkable

neighborhood, and vice versa), the observed difference in walking behavior between the

neighborhoods (ATE2) will be smaller than the ATE. The ATE1 and ATE2 are the upper and

lower bounds of the observed influence of neighborhood type on walking behavior, respectively.

[Figure 1]
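
The bounding argument above can be written compactly under a deliberately simple assumption that is not part of the paper: a linear outcome model with a constant treatment effect and walking preference as the only confounder. The symbols Y, T, W, \tau, and \gamma are illustrative names introduced here.

```latex
% Illustrative model (an assumption, not taken from the paper):
%   Y = walking frequency, T = 1 if the resident lives in the walkable neighborhood,
%   W = walking preference (the single confounder)
Y_i = \alpha + \tau\,T_i + \gamma\,W_i + \varepsilon_i,
\qquad \gamma > 0,\quad \mathbb{E}[\varepsilon_i \mid T_i, W_i] = 0

% Observed difference in walking between the two neighborhood types:
\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]
  = \underbrace{\tau}_{\text{ATE}}
  + \underbrace{\gamma\,\bigl(\mathbb{E}[W \mid T=1] - \mathbb{E}[W \mid T=0]\bigr)}_{\text{self-selection term}}
```

Under random assignment the two conditional means of W are equal and the self-selection term vanishes. If everyone is matched, the term is positive and the observed difference (ATE1) exceeds the ATE; if everyone is mismatched, it is negative and the observed difference (ATE2) falls below the ATE.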

In reality, not all people can find the neighborhoods that match their walking preference and not

all people are mismatched. For such a sample, the observed influence of neighborhood type on

walking behavior (ATE3) will lie somewhere between the ATE1 and the ATE2. Therefore,

conceptually, we can overestimate or underestimate the influence of neighborhood type on

walking behavior. Some studies found that up to about a quarter of residents were mismatched

(Cao, 2008; Schwanen and Mokhtarian, 2004). This evidence seems to suggest that the ATE3 is

larger than the ATE but smaller than the ATE1. However, the misestimation remains uncertain

because (1) the influence of travel preferences on travel behavior does not appear to be

symmetric for residents living in different types of neighborhoods (Schwanen and Mokhtarian,

2003; Schwanen and Mokhtarian, 2005); (2) individuals’ travel behavior is influenced by many

factors beyond neighborhood type and walking preferences.

3. METHODOLOGY
Stratification is an effective way to control for selection bias (Rosenbaum and Rubin, 1984). In

observational studies, observations in the treatment group often differ systematically from those

in the control group. In this context, treatment consists of residents living in traditional

neighborhoods and control includes suburbanites. To reduce the bias, we can classify residents

into several strata based on their characteristics, and then compare travel behavior between

residents living in traditional and suburban neighborhoods that were grouped into the same

stratum (Rosenbaum and Rubin, 1984). If the assignment of a treatment is confounded by only a

single variable, treatment and control units in a given stratum tend to carry similar values (within

a prespecified range) of the variable (i.e., to be balanced). Within the stratum, this sub-

classification practice is approximately equivalent to randomly assigning similar units into a

treatment group and a control group. That is, stratification roughly resembles a true random

experiment. Cochran (Cochran, 1968) concluded that sub-classification into five strata can remove more than

90% of the bias resulting from one continuous variable. Through two pair-wise

comparisons of randomized experiments and quasi-experiments, Luellen et al. (Luellen et al., 2005)

found that five strata (quintiles) reduced selection bias by 73% and 90%, respectively.

When we balance strata simultaneously on many confounding variables, it is desirable to rely on a

scalar function of those variables (Rosenbaum and Rubin, 1984). Self-selection bias can result

from a large number of variables. However, stratification is convenient for only a small number

of variables because the number of strata grows at an exponential rate as the number of variables

increases. If, for example, each variable is sub-classified into five strata, a stratification of k

variables will produce 5^k groups. Excess stratification may lead to some empty strata or few

observations in some strata, and hence make a direct comparison of outcomes impossible. The

limitation calls for a scalar that carries the information required to balance all variables.
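
A short calculation makes the growth concrete. The sketch below assumes five strata per variable and, optimistically, that respondents spread evenly across cells; the sample size of 1,553 is the figure reported in the abstract, and the function name is illustrative.

```python
def n_cells(k, strata_per_variable=5):
    """Number of cells when each of k variables is cut into the same number of strata."""
    return strata_per_variable ** k


sample_size = 1553  # respondents reported in the abstract

for k in range(1, 7):
    cells = n_cells(k)
    # Average respondents per cell if they were spread evenly (an optimistic case).
    print(f"k={k}: {cells} cells, about {sample_size / cells:.1f} respondents per cell")
```

With five variables there are already more cells than respondents, so some cells must be empty.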

The propensity score is a scalar function that can be used to balance multiple variables. Using

large and small sample theory, Rosenbaum and Rubin (Rosenbaum and Rubin, 1983) have

proved that “adjustment for the scalar propensity score is sufficient to remove bias due to all

observed covariates” (p.41). According to their definition, the propensity score in this context is

the conditional probability that an individual lives in traditional neighborhoods (receives a

treatment) given her observed characteristics. It can be estimated using binary choice models

(Rosenbaum and Rubin, 1983). It is worth noting that the propensity score model can also be

applied to cases with one control and multiple treatments, although such applications are few. Due to

limitations of land use data in this study, we chose a binary classification: one control and one

treatment.
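
As a minimal sketch of what estimating a propensity score with a binary choice model involves, the code below fits a logistic regression by plain gradient ascent and returns the predicted probability of living in a traditional neighborhood. It is not the software or specification used in this study: the single covariate, the toy data, and all function names are hypothetical, and a statistical package would normally replace the hand-written fitting loop.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_index(beta, x):
    """Intercept plus the weighted sum of the covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logit(X, treated, learning_rate=0.1, n_iter=5000):
    """Fit P(treated = 1 | x) by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, t in zip(X, treated):
            resid = t - sigmoid(linear_index(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for every unit."""
    return [sigmoid(linear_index(beta, x)) for x in X]


# Hypothetical toy data: one standardized pro-walking attitude score per respondent;
# treated = 1 means the respondent lives in a traditional neighborhood.
X = [[-2.0], [-1.0], [-1.0], [0.0], [0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
treated = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logit(X, treated)
scores = propensity_scores(X, beta)
```

Because the model is used only for prediction, what matters downstream is the fitted score for each respondent rather than the individual coefficients.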

Here, the goal of propensity score stratification is to estimate the causal effect of neighborhood

type on travel behavior. Our interest is the ATE, which, ideally, represents the average change

in travel behavior from moving a randomly-selected person from a suburban neighborhood to a

traditional one (Mokhtarian and Cao, 2008). Before estimating the ATE, propensity score sub-

classification and balance assessment are in order (as discussed in Section 5). Then, for each

stratum, we calculate the difference in travel behavior between residents in traditional and

suburban neighborhoods. The ATE is a weighted average of the differences of all strata.
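
The two steps just described, sub-classification on the score and a weighted average of within-stratum differences, can be sketched as follows. The weights here are each stratum's share of the sample, which is an assumption of this sketch; strata lacking either group are skipped, also an assumption. The toy data are constructed so that the true effect is 2 in every stratum while residents with higher scores both walk more and are more often treated.

```python
from statistics import mean


def quintile_strata(scores):
    """Assign each unit to stratum 0..4 by the rank of its propensity score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // n
    return strata


def stratified_ate(outcome, treated, strata):
    """Sample-share-weighted average of treated-minus-control differences by stratum."""
    weighted_sum, n_used = 0.0, 0
    for s in sorted(set(strata)):
        y1 = [y for y, t, g in zip(outcome, treated, strata) if g == s and t == 1]
        y0 = [y for y, t, g in zip(outcome, treated, strata) if g == s and t == 0]
        if not y1 or not y0:
            continue  # no within-stratum comparison is possible
        n_s = len(y1) + len(y0)
        weighted_sum += n_s * (mean(y1) - mean(y0))
        n_used += n_s
    return weighted_sum / n_used


# Hypothetical toy data: 5 score levels, 6 residents each; level s has s + 1 treated.
scores, treated, outcome = [], [], []
for s in range(5):
    for j in range(6):
        t = 1 if j <= s else 0
        scores.append(0.1 + 0.2 * s)
        treated.append(t)
        outcome.append(s + 2 * t)  # baseline walking rises with s; true effect is 2

strata = quintile_strata(scores)
ate = stratified_ate(outcome, treated, strata)
naive = (mean([y for y, t in zip(outcome, treated) if t == 1])
         - mean([y for y, t in zip(outcome, treated) if t == 0]))
print(f"stratified ATE = {ate:.2f}, unadjusted difference = {naive:.2f}")
```

In this constructed example the unadjusted difference exceeds the stratified estimate because treated residents are concentrated in the high-score strata.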

The propensity score method is desirable for its ability to estimate the causal influence of the

built environment on travel behavior. It is different from other approaches for causal effects.

First, it is distinct from the statistical control method, which explicitly measures attitudes and

incorporates them in the behavior equation (Mokhtarian and Cao, 2008). Conceptually, the

propensity score approach controls for the observed characteristics that affect whether an

individual lives in traditional or suburban neighborhoods. The attention is directed to the

imbalance in variables between traditional and suburban neighborhoods. By contrast, statistical control identifies the

determinants of travel behavior through incorporating confounding factors directly into the

behavior equation, so that we can eliminate all differences between traditional and suburban

neighborhoods that affect the behavior. The attention is directed to the behavior outcome (Oakes

and Johnson, 2006; Winship and Morgan, 1999). Empirically, the model used to estimate a

propensity score is a prediction model so it is not necessary to evaluate multicollinearity and

statistical significance of independent variables; interaction and polynomial terms are always

encouraged for propensity score estimation (Oakes and Johnson, 2006). However,

multicollinearity and statistical significance are important for a model aiming to explain travel

behavior in the statistical control approach.

The sample selection model is essentially a generalized propensity score approach, although the

former was applied earlier than the latter (Winship and Morgan, 1999). The

sample selection approach first estimates individuals’ prior selection into different types of

residential locations, and then models travel behavior as conditional on that prior selection

(Mokhtarian and Cao, 2008). The difference between the two approaches is that the sample

selection model requires a strong normality assumption and inserts a lambda (selection

correction factor) in the behavior equation whereas the model using propensity score as a

regressor inserts the estimated propensity score (predicted probability) in the behavior equation

(Winship and Morgan, 1999).

4. DATA AND VARIABLES


The data came from a self-administered survey mailed in late 2003 to residents of eight

neighborhoods in Northern California. The neighborhoods were selected to vary systematically

on neighborhood type, size of the metropolitan area, and region of the state. Neighborhood type

was differentiated as “traditional” for areas built mostly in the pre-World War II era, and

“suburban” for areas built more recently. This distinction reflects a significant change in design

characteristics for residential neighborhoods as the suburban boom took place following World

War II. Using the US Census, we screened potential neighborhoods to ensure that average

income and other characteristics were near the average for the region. The traditional

neighborhoods included Mountain View (Downtown), Sacramento (Midtown), Santa Rosa

(Junior College area), and Modesto (Central). The suburban neighborhoods were Sunnyvale (I-

280 area), Sacramento (Natomas area), Santa Rosa (Rincon Valley area), and Modesto (suburban

area). The four traditional neighborhoods differ in visible ways from the four suburban

neighborhoods – the layout of the street network, the age and style of the houses, and the

location and design of commercial centers (Figure 2). A selection of the objective accessibility

measures reveals distinct differences between traditional and suburban neighborhoods (Table 1).

Residents of traditional neighborhoods on average have two to four times as many businesses within

400m and 1600m of home. In addition, the average distance to the nearest establishment of

any type for residents of traditional neighborhoods (247m) is less than half the distance for

suburban residents (557m), and residents of traditional neighborhoods are closer to every type of

establishment on average than suburban residents. Further, there are some differences in

accessibility among the four traditional (suburban) neighborhoods. However, in general, the

differences between traditional and suburban neighborhoods are much larger than the differences

within the same types of neighborhoods.

[Figure 2 and Table 1]

The original database consisted of 6,746 valid addresses (out of 8,000 addresses). A total of 1,682 surveys

were returned, for a response rate of about 25%. This response rate is considered quite good

for a survey of 14 pages, since the response rate for a survey administered to the general

population is typically 10-40% (Sommer and Sommer, 1997). A comparison of sample

characteristics to population characteristics, based on the 2000 U.S. Census (Table 2), shows that

survey respondents tend to be older than residents of their neighborhood as a whole, and that the

percent of households with children is lower for the sample for most neighborhoods. In addition,

median household income for survey respondents was higher than the census median for all but

one neighborhood, a typical result for voluntary self-administered surveys. However, since the

focus of our study is on explaining travel behavior as a function of other variables rather than on

describing the simple univariate distribution of the behavior per se, these differences are not

expected to materially affect the results (Babbie, 2007). This study also applies the propensity

score approach to control for self-selection bias.

[Table 2]

The dependent variables are walking to store frequency and strolling frequency. In the survey,

respondents were asked to report the frequency in the last 30 days they walked from their

residence to a local store or shopping area, and the frequency in the last 30 days they took a walk

or stroll around their neighborhood.

The independent variables are classified into three groups: residential preferences, travel

attitudes, and socio-demographics. Respondents were asked to indicate the importance of 34

attributes regarding their residence and neighborhood when/if they were looking for a new place

to live, on a four-point scale from 1 (“not at all important”) to 4 (“extremely important”). A

factor analysis reduced these items to six factors: accessibility, physical activity options, safety,

socializing, attractiveness, and outdoor spaciousness (Table 3). To measure attitudes regarding

travel, the survey asked respondents whether they agreed or disagreed with a series of 32

statements on a 5-point scale from 1 (“strongly disagree”) to 5 (“strongly agree”). Factor

analysis reduced these 32 items to six underlying dimensions: pro-bike/walk, pro-transit, pro-

travel, travel minimizing, car dependent, and safety of car (Table 3). Refer to Handy et al.

(Handy et al., 2006; Handy et al., 2004) for detailed discussion on both factor analyses. Finally,

the survey contained a list of socio-demographic variables including gender, age, employment

status, educational background, household income, household size, the number of children in the

household, and so on.

[Table 3]

5. RESULTS
Binary logistic regression in SPSS 15.0 was used to estimate the propensity score. The

inclusion or exclusion of a variable in the propensity score model is based on its relevance to

residential choice rather than its statistical significance (Rubin and Thomas, 1996). In fact,

scholars strongly oppose using statistical significance as a criterion (Luellen et al., 2005, p. 536).

Residents living in traditional neighborhoods differ from suburbanites in socio-demographics,

residential preferences, and travel attitudes. Hence, these characteristics are potential independent variables for

the model.

The procedure for developing the propensity score model is as follows. All socio-demographics,

residential preferences, and travel attitudes were allowed to enter the model. Based on quintiles

of the propensity score (as recommended by Rosenbaum and Rubin (Rosenbaum and Rubin,

1984)), respondents were classified into five strata. The next step is to examine whether residents in

traditional neighborhoods differ (in terms of their characteristics) from suburbanites in the

same stratum. A two-way (2 treatments x 5 strata) ANOVA was adopted (Rosenbaum and Rubin,

1984). When the interaction effect and main effect of the treatment are insignificant at the 0.05

level, the variables are considered to be balanced. Otherwise, we need to adjust the propensity

score using a different model specification. Specifically, the unbalanced variable, its higher-order

forms (such as polynomial terms), and its interactions with other variables can enter the model

until the balance of all variables is achieved (Oakes and Johnson, 2006; Rosenbaum and Rubin,

1984). Note that some variables in these data contain missing values. Including all independent

variables in the model inevitably reduces effective sample size. So variables that are not

significantly different between traditional and suburban neighborhoods before (and after) the

stratification were removed from the model specification.
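
The balance check at the center of this loop can be illustrated with a simpler diagnostic than the two-way ANOVA used here. The sketch below swaps in a standardized mean difference, computed once on the raw sample and once as a stratum-weighted average of within-stratum differences; it is a stand-in chosen for brevity, not the test applied in this study, and the covariate and data are hypothetical.

```python
import math
from statistics import mean, variance


def split(x, treated):
    """Separate covariate values into treated and control groups."""
    xt = [v for v, t in zip(x, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    return xt, xc


def pooled_sd(x, treated):
    """Pooled standard deviation of the covariate in the unstratified sample."""
    xt, xc = split(x, treated)
    return math.sqrt((variance(xt) + variance(xc)) / 2)


def raw_std_diff(x, treated):
    """Standardized treated-minus-control difference before stratification."""
    xt, xc = split(x, treated)
    return (mean(xt) - mean(xc)) / pooled_sd(x, treated)


def stratified_std_diff(x, treated, strata):
    """Stratum-size-weighted within-stratum difference, on the same scale as above."""
    weighted_sum, n_used = 0.0, 0
    for s in set(strata):
        xs = [v for v, g in zip(x, strata) if g == s]
        ts = [t for t, g in zip(treated, strata) if g == s]
        xt, xc = split(xs, ts)
        if not xt or not xc:
            continue  # stratum lacks one of the two groups
        weighted_sum += len(xs) * (mean(xt) - mean(xc))
        n_used += len(xs)
    return (weighted_sum / n_used) / pooled_sd(x, treated)


# Hypothetical toy data: 5 strata of 6 residents; stratum s has s + 1 treated residents,
# and the covariate rises with the stratum index plus a small within-stratum wiggle.
x, treated, strata = [], [], []
for s in range(5):
    for j in range(6):
        x.append(s + 0.5 * (j % 2))
        treated.append(1 if j <= s else 0)
        strata.append(s)

before = raw_std_diff(x, treated)
after = stratified_std_diff(x, treated, strata)
print(f"standardized difference before = {before:.2f}, after = {after:.2f}")
```

If the difference after stratification remained large for some variable, the procedure described above would add that variable's higher-order and interaction terms to the score model and repeat the check.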

Table 4 presents the final propensity score model. Pseudo R-square of the model is 0.178, which

is considered typical for a model with balanced market shares and using disaggregate data with a

large sample size. Renters, the number of children under 18 years old, and number of adults are

significant in the model, with expected signs. High-income residents are more likely to live in

traditional neighborhoods, which is consistent with the observed bivariate relationship shown in

Table 5. Residential preferences for socializing and attractiveness and the pro-bike/walk attitude

are positively associated with the choice of traditional neighborhoods, but those preferring

neighborhood safety and valuing the safety of car tend to live in suburban neighborhoods. When

evaluating whether variables were balanced with the original model, I found that residential

preference for outdoor spaciousness showed systematic differences. Several different model

specifications were tried and the inclusion of the preference for outdoor spaciousness, its

quadratic term, and its interaction with household size in the model balances the variable.

[Tables 4 and 5]

Table 5 compares variables between traditional and suburban neighborhoods before and after

propensity score stratification. Before the adjustment, the majority of socio-demographics differ

significantly, even at the 0.01 and 0.001 levels. Two residential preferences (safety and

spaciousness) and four travel attitudes (pro-bike/walk, pro-transit, safety of car, and car

dependent) are also different. After the adjustment, none of them are significant at the 0.05

level, as indicated by the small F-statistics of the interaction effect and main effect of the

treatment. Therefore, the propensity score stratification successfully balances the quintiles

simultaneously on these variables. The quintiles are the final five strata for stratification. Note,

although the strata were classified based on only the propensity score, the influences of

socio-demographics and attitudes have been incorporated into the propensity score (Table 4).

To estimate the ATE with a propensity score adjustment, we first calculate the treatment effect for

each of the five quintiles and then take a weighted average of these treatment effects (Rosenbaum

and Rubin, 1984). Table 6 presents the ATEs of neighborhood type on walking behavior.

Overall, the results suggest that residents living in traditional neighborhoods tend to walk more

than suburbanites. In particular, the ATE of neighborhood type on walking to store (utilitarian

walking) frequency is 1.86 times per month, which accounts for 61% (=1.86/3.05) of the

observed difference between residents in traditional and suburban neighborhoods. The causal

influence of neighborhood type on strolling (recreational walking) frequency is 2.05 trips per

month, which accounts for 86% (=2.05/2.38) of the observed difference. The difference in the

percentages shows that residential self-selection tends to have a stronger influence on utilitarian

walking than on recreational walking. This makes sense since access to destinations is one of the more

important factors influencing residential choices. To test the robustness of stratification, I

conducted propensity score matching using a caliper approach and a kernel approach in Limdep

9.0. As shown in the last two columns of Table 6, the differences among different approaches

are less than 10%. Therefore, propensity score stratification is considered to be reliable.
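
The shares quoted above follow directly from the ATE and the observed difference, and the same two numbers give the overstatement reported in the abstract. The sketch below reproduces that arithmetic from the figures in the text; the function names are illustrative.

```python
def causal_share(ate, observed_difference):
    """Fraction of the observed difference attributable to neighborhood type itself."""
    return ate / observed_difference


def overstatement(ate, observed_difference):
    """How far the observed difference exceeds the ATE, relative to the ATE."""
    return observed_difference / ate - 1


# (ATE, observed difference) in trips per month, as reported in the text.
estimates = {
    "walking to store": (1.86, 3.05),
    "strolling": (2.05, 2.38),
}

for name, (ate, observed) in estimates.items():
    print(f"{name}: causal share {causal_share(ate, observed):.0%}, "
          f"overstatement {overstatement(ate, observed):.0%}")
```

The two measures are reciprocal views of the same ratio: a causal share of s corresponds to an overstatement of 1/s - 1.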

[Table 6]

As discussed earlier, there are some differences in accessibility measures among the four

traditional (and suburban) neighborhoods, although not substantial. A sensitivity analysis was

conducted to examine how pooling the four traditional (and suburban) neighborhoods influences

travel behavior outcomes. In particular, I re-ran the models eight times, each time leaving

out one neighborhood. Overall, for walking to store frequency, the results are fairly stable

(Table 7). With all neighborhoods in the model, the ATE accounts for 61% of the observed

influence of neighborhood type on walking to store frequency; for the eight leave-one-out models, this share

ranges from 55% to 62%. With all neighborhoods in the model, the ATE accounts for 54% of

the mean frequency for the whole sample; for the eight models, this proportion ranges from 47% to

59%. For strolling frequency, the results show similar patterns, although the range is somewhat

larger than that for walking to store frequency. Therefore, the results based on all neighborhoods have

magnitudes similar to those from the sensitivity analysis.
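
The leave-one-neighborhood-out loop can be sketched generically. In the code below, `estimate` is any function that maps a set of records to an effect estimate; a plain difference in means stands in for the full propensity score pipeline purely to keep the example short. The record fields and the four-neighborhood toy data are hypothetical.

```python
def leave_one_out(records, estimate):
    """Re-estimate the effect once per neighborhood, omitting that neighborhood."""
    neighborhoods = sorted({r["neighborhood"] for r in records})
    return {
        left_out: estimate([r for r in records if r["neighborhood"] != left_out])
        for left_out in neighborhoods
    }


def mean_difference(records):
    """Stand-in estimator: traditional-minus-suburban mean walking frequency."""
    y1 = [r["walks"] for r in records if r["traditional"]]
    y0 = [r["walks"] for r in records if not r["traditional"]]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Hypothetical toy data: neighborhoods A and B are traditional, C and D suburban.
toy = [("A", True, 6), ("A", True, 6), ("B", True, 4), ("B", True, 4),
       ("C", False, 2), ("C", False, 2), ("D", False, 0), ("D", False, 2)]
records = [{"neighborhood": n, "traditional": t, "walks": w} for n, t, w in toy]

full_sample = mean_difference(records)
sensitivity = leave_one_out(records, mean_difference)
for left_out, value in sensitivity.items():
    print(f"without {left_out}: {value:.2f} (full sample: {full_sample:.2f})")
```

A narrow spread of the leave-one-out estimates around the full-sample value is what the stability claim above rests on.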

[Table 7]

6. CONCLUSIONS
This study applies propensity score stratification to determine the causal effect of neighborhood type on walking behavior and its share in the observed influence of neighborhood type. Although the approach can be used to estimate treatment effects, it is not a panacea for addressing selection bias. First, although propensity score stratification can remove about 90% of selection bias, it cannot fully eliminate the influence of residential self-selection on travel behavior. Second, because the propensity score model assumes that all variables affecting outcomes and treatment assignments are captured by observed characteristics, hidden bias can be a concern (Rosenbaum and Rubin, 1983). If unmeasured characteristics are a source of self-selection (for example, attitudes are not measured in most travel diary surveys), this approach cannot compensate for that. In this study, attitudinal factors were measured, so hidden bias is presumably not a major problem. Further, it is desirable to model two or more treatments. The overwhelming majority of propensity score applications involve only a single treatment and one control. Recently, there have been a few applications of multinomial logit and ordered propensity score models (Imai and van Dyk, 2004). In this study, we chose a binary neighborhood type, which is a coarse measurement of the built environment. People living in the same type of neighborhood (and hence presumably receiving the same treatment) are often exposed to different levels of treatment. Therefore, it is ideal to use a composite measure, derived from various dimensions of the built environment, to classify the environment. The pedestrian environment factor in Portland, OR and the transit serviceability index in Montgomery County, MD are potentially good measures, although such measures are not widely available.

Nevertheless, this study provides useful evidence for understanding the causal influence of the built environment on walking behavior. First, the results show that if residential self-selection is not controlled for, we are likely to overestimate (rather than underestimate) the causal influence of the built environment. In particular, the causal influence of neighborhood type on walking to store frequency and strolling frequency will be overstated by 64% (=3.05/1.86-1) and 16%, respectively. These findings weaken the observed connections between neighborhood type and walking behavior, especially utilitarian walking. However, although both the built environment and self-selection influence walking behavior, the former tends to play a more important role. For walking to store frequency, the ATE of neighborhood type is 1.86 trips per month, which accounts for 54% of the mean frequency for the whole sample. This considerable influence provides supportive evidence that changes in the built environment can stimulate meaningful changes in walking behavior. However, given our cross-sectional sample, we tested only a single direction of causality (attitudes influencing behavior) and not the converse. Accordingly (like other such studies), by not allowing for the possibility that both travel choices and the chosen built environment change attitudes over time, we may be overestimating the influence of attitudes (self-selection) on the built environment and travel behavior, and hence underestimating the influence of the built environment on travel behavior. Therefore, a longitudinal analysis is called for, although it is time-consuming and costly.
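The overstatement figures, and the sense in which the built environment outweighs self-selection, can be traced back to the ATE and unadjusted mean differences in Table 6. The decomposition below is an interpretive sketch rather than a result reported in the paper: it treats the gap between the observed difference and the ATE as the self-selection component, and the symbols $\Delta_{\text{obs}}$ and $B$ are introduced here only for illustration.

```latex
\begin{align*}
\Delta_{\text{obs}} &= \text{ATE} + B \\[4pt]
\text{Walking to store:}\quad
  & 3.05 = 1.86 + 1.19, \qquad
    \frac{3.05}{1.86} - 1 \approx 0.64, \qquad
    \frac{1.86}{3.05} \approx 0.61 \\[4pt]
\text{Strolling:}\quad
  & 2.38 = 2.05 + 0.33, \qquad
    \frac{2.38}{2.05} - 1 \approx 0.16, \qquad
    \frac{2.05}{2.38} \approx 0.86
\end{align*}
```

On this reading, the ATE makes up about 61% of the observed difference for walking to the store and about 86% for strolling, so the remainder attributable to self-selection is the smaller part in both cases.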

ACKNOWLEDGEMENTS
The data collection was funded by the UC Davis-Caltrans Air Quality Project, the Robert Wood Johnson Foundation, and the University of California Transportation Center. The survey was designed by Susan Handy and Patricia Mokhtarian. I thank Michael Oakes for his help with technical concepts. Comments from three anonymous referees have greatly improved the paper.

Table 1. Accessibility of Residents in Traditional vs. Suburban Neighborhoods

                      Traditional                        Suburban
                      All    SV-MV  SR-JC  MD-C   SC-M   All    SV-S   SR-RV  MD-S   SC-N   p-value (nbhd type)
No. of business types within...
  400m                2.6    2.5    2.1    1.2    4.1    0.8    1.1    0.8    0.8    0.6    0.00
  1600m               13.0   13.5   13.4   10.4   14.1   9.6    9.1    8.7    10.9   9.4    0.00
Minimum distance in meters to...
  Any business        247    284    235    298    192    557    462    581    502    704    0.00
  Institutional       377    417    381    427    305    760    574    727    683    1087   0.00
  Maintenance         380    351    408    478    317    819    873    851    663    898    0.00
  Eat-out             526    587    438    816    349    789    794    955    696    740    0.00
  Leisure             508    547    618    654    293    814    692    932    799    869    0.00
N                     882    220    208    183    271    741    209    155    197    180
Columns: SV-MV = Silicon Valley - Mountain View; SR-JC = Santa Rosa - Junior College; MD-C = Modesto - Central; SC-M = Sacramento - Midtown; SV-S = Silicon Valley - Sunnyvale; SR-RV = Santa Rosa - Rincon Valley; MD-S = Modesto - Suburban; SC-N = Sacramento - Natomas.
Note: accessibility measures were estimated for each respondent, based on distance along the street network from home to a variety of destinations classified as institutional (bank, church, library, and post office), maintenance (grocery store and pharmacy), eating-out (bakery, pizza, ice cream, fast food, and take-out), and leisure (health club, bookstore, bar, theater, and video rental). Commercial establishments were identified using on-line yellow pages, and ArcGIS was used to calculate network distances between addresses for survey respondents and commercial establishments.

Table 2. Sample vs. Population Characteristics

                          Traditional                             Suburban
                          Mountain  SR Junior MD        SC        Sunnyvale SR Rincon MD        SC
                          View      College   Central   Midtown             Valley    Suburban  Natomas
Sample Characteristics
  Number                  228       215       184       271       217       165       220       182
  Percent of females      47.3      54.3      56.3      58.2      46.9      50.9      50.9      54.9
  Average auto ownership  1.80      1.63      1.59      1.50      1.79      1.66      1.88      1.68
  Age                     43.3      47.0      51.3      43.4      47.1      54.7      53.2      45.6
  Average HH size         2.08      2.03      2.13      1.78      2.58      2.19      2.41      2.35
  Percent of HHs w/kids   21.1      18.6      21.7      8.9       42.4      24.8      25.5      31.9
  Percent of home owners  51.1      57.8      75.6      47.0      61.1      68.7      81.0      82.4
  Median HH income (k$)   98.7      55.5      45.5      64.2      95.0      49.5      55.5      55.3
Population Characteristics
  Age                     36.1      36.3      36.5      42.7      35.9      38.3      38.1      31.7
  Average HH size         2.08      2.21      2.46      1.79      2.66      2.48      2.51      2.57
  Percent of HHs w/kids   19.3      20.3      32.9      12.4      35.3      35.4      34.2      41.7
  Percent of home owners  34.3      31.2      58.8      34.3      53.2      63.5      61.4      55.2
  Median HH income (k$)   74.3      40.2      42.5      43.8      88.4      49.6      40.2      46.2
Notes: SR = Santa Rosa, MD = Modesto, SC = Sacramento, HH = household

Table 3. Key Variables Loading on Residential Preference and Travel Attitude Factors

Factor: Statement (loading)

Residential Preferences
Accessibility: Easy access to a regional shopping mall (0.854); easy access to downtown (0.830); other amenities such as a pool or a community center available nearby (0.667); shopping areas within walking distance (0.652); easy access to the freeway (0.528); good public transit service (bus or rail) (0.437)
Physical activity options: Good bicycle routes beyond the neighborhood (0.882); sidewalks throughout the neighborhood (0.707); parks and open spaces nearby (0.637); good public transit service (bus or rail) (0.353)
Safety: Quiet neighborhood (0.780); low crime rate within neighborhood (0.759); low level of car traffic on neighborhood streets (0.752); safe neighborhood for walking (0.741); safe neighborhood for kids to play outdoors (0.634); good street lighting (0.751)
Socializing: Diverse neighbors in terms of ethnicity, race, and age (0.789); lots of people out and about within the neighborhood (0.785); lots of interaction among neighbors (0.614); economic level of neighbors similar to my level (0.476)
Attractiveness: Attractive appearance of neighborhood (0.780); high level of upkeep in neighborhood (0.723); variety in housing styles (0.680); big street trees (0.451)
Outdoor spaciousness: Large back yards (0.876); large front yards (0.858); lots of off-street parking (garages or driveways) (0.562); big street trees (0.404)

Travel Attitudes
Pro-bike/walk: I like riding a bike (0.880); I prefer to bike rather than drive whenever possible (0.865); biking can sometimes be easier for me than driving (0.818); I prefer to walk rather than drive whenever possible (0.461); I like walking (0.400); walking can sometimes be easier for me than driving (0.339)
Pro-transit: I like taking transit (0.778); I prefer to take transit rather than drive whenever possible (0.771); public transit can sometimes be easier for me than driving (0.757); I like walking (0.363); walking can sometimes be easier for me than driving (0.344); traveling by car is safer overall than riding a bicycle (0.338)
Pro-travel: The trip to/from work is a useful transition between home and work (0.683); Travel time is generally wasted time (-0.681); I use my trip to/from work productively (0.616); The only good thing about traveling is arriving at your destination (-0.563); I like driving (0.479)
Travel minimizing: Fuel efficiency is an important factor for me in choosing a vehicle (0.679); I prefer to organize my errands so that I make as few trips as possible (0.671); I often use the telephone or the Internet to avoid having to travel somewhere (0.514); The price of gasoline affects the choices I make about my daily travel (0.513); I try to limit my driving to help improve air quality (0.458); Vehicles should be taxed on the basis of the amount of pollution they produce (0.426); When I need to buy something, I usually prefer to get it at the closest store possible (0.332)
Safety of car: Traveling by car is safer overall than riding a bicycle (0.489); traveling by car is safer overall than walking (0.753); traveling by car is safer overall than taking transit (0.633); the region needs to build more highways to reduce traffic congestion (0.444); the price of gasoline affects the choices I make about my daily travel (0.357)
Car dependent: I need a car to do many of the things I like to do (0.612); getting to work without a car is a hassle (0.524); we could manage pretty well with one fewer car than we have (or with no car) (-0.418); traveling by car is safer overall than riding a bicycle (0.402); I like driving (0.356)
Note: The numbers in parentheses are the pattern matrix loadings for the obliquely rotated factors.

Table 4. Binary Logit Model for Propensity Score
Coefficients p-value
Constant 0.726 0.036
Socio-demographics
Renter 0.822 0.000
Income (k$) 0.006 0.002
Age -0.003 0.555
# adults in the household -0.412 0.000
# children (<18) in the household -0.404 0.000
Female 0.209 0.095
Neighborhood Preferences
Spaciousness -0.107 0.440
Spaciousness-square -0.136 0.010
Spaciousness x household size 0.071 0.200
Accessibility 0.061 0.455
Physical activity options -0.088 0.288
Safety -0.527 0.000
Socializing 0.239 0.001
Attractiveness 0.338 0.000
Travel Attitudes
Pro-bike/walk 0.314 0.000
Pro-travel -0.072 0.232
Travel minimizing -0.043 0.481
Pro-transit 0.091 0.175
Safety of car -0.456 0.000
Car dependent -0.090 0.153
N 1553
Log-likelihood at zero -1076.46
Log-likelihood at constant -1070.93
Log-likelihood at convergence -884.93
McFadden R-square 0.178
Suburban neighborhood is the reference category.
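To make the table concrete, the sketch below turns the Table 4 coefficients into a predicted propensity score and recomputes the McFadden R-square from the reported log-likelihoods. It is an illustration only: the dictionary keys, the function names, the example respondent, and the assumption that household size equals adults plus children are all hypothetical and not taken from the paper.

```python
import math

# Coefficients copied from Table 4 (traditional = 1, suburban = reference).
COEF = {
    "constant": 0.726,
    "renter": 0.822,
    "income_k": 0.006,
    "age": -0.003,
    "n_adults": -0.412,
    "n_children": -0.404,
    "female": 0.209,
    "spaciousness": -0.107,
    "spaciousness_sq": -0.136,
    "spaciousness_x_hhsize": 0.071,
    "accessibility": 0.061,
    "physical_activity": -0.088,
    "safety": -0.527,
    "socializing": 0.239,
    "attractiveness": 0.338,
    "pro_bike_walk": 0.314,
    "pro_travel": -0.072,
    "travel_minimizing": -0.043,
    "pro_transit": 0.091,
    "safety_of_car": -0.456,
    "car_dependent": -0.090,
}


def propensity_score(covariates):
    """Predicted probability of living in a traditional neighborhood.

    Covariates that are omitted are treated as zero.
    """
    x = dict(covariates)
    s = x.get("spaciousness", 0.0)
    # Assumption: household size = number of adults + number of children.
    hh_size = x.get("n_adults", 0.0) + x.get("n_children", 0.0)
    x["spaciousness_sq"] = s ** 2
    x["spaciousness_x_hhsize"] = s * hh_size
    utility = COEF["constant"] + sum(COEF[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-utility))


def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-square: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


if __name__ == "__main__":
    # Hypothetical respondent: female renter, 40 years old, $60k income,
    # two adults, no children, one unit above average on pro-bike/walk.
    person = {"renter": 1, "income_k": 60, "age": 40, "n_adults": 2,
              "n_children": 0, "female": 1, "pro_bike_walk": 1.0}
    print(f"propensity score: {propensity_score(person):.3f}")
    print(f"McFadden R-square: {mcfadden_r2(-884.93, -1076.46):.3f}")
```

The reported R-square of 0.178 is reproduced when the log-likelihood at zero, rather than the log-likelihood at constant, is used as the baseline.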

Table 5. Comparison of Covariates between Traditional and Suburban Neighborhoods before and after Stratification
Columns: Variable; Traditional neighborhood; Suburban neighborhood; Treatment t-statistic before stratification (a); Treatment F-statistic after stratification (b); Interaction F-statistic after stratification (c)
Socio-demographics
Education background 4.280 (0.0450, 841)d 4.080 (0.0500, 709) 2.91** 0.02 0.06
Household income ($) 71508 (1259, 842) 68176 (1327, 711) 1.82 0.06 1.51
# cars 1.660 (0.0280, 842) 1.820 (0.0320, 711) -3.93*** 0.15 0.15
Age 45.0 (0.519, 842) 49.1 (0.558, 711) -5.36*** 0.03 0.95
Household size 2.020 (0.0360, 842) 2.470 (0.0500, 711) -7.37*** 1.88 2.35
# adult 1.725 (0.0233, 842) 1.900 (0.0297, 711) -7.84*** 1.36 1.07
# children (≤5) 0.120 (0.0140, 842) 0.180 (0.0190, 711) -2.87** 0.33 1.88
# children (≤12) 0.200 (0.0190, 842) 0.350 (0.0280, 711) -4.37*** 0.01 1.95
# children (<18) 0.290 (0.0240, 842) 0.570 (0.0350, 711) -6.49*** 0.74 1.76
Renter (dummy) 0.440 (0.0170, 842) 0.260 (0.0170, 711) 7.21*** 0.08 1.38
Female (dummy) 0.530 (0.0170, 842) 0.500 (0.0190, 711) 1.49 0.10 0.37
Worker (dummy) 0.840 (0.0130, 837) 0.790 (0.0150, 705) 2.45* 0.35 0.83
Neighborhood Preferences
Accessibility -0.357 (0.0304, 842) -0.409 (0.0372, 711) 1.08 0.34 0.42
Physical activity options -0.306 (0.0357, 842) -0.329 (0.0382, 711) 0.45 0.05 0.95
Safety 0.215 (0.0297, 842) 0.609 (0.0270, 711) -9.84*** 0.57 0.41
Socializing -0.199 (0.0368, 842) -0.294 (0.0412, 711) 1.71 0.40 0.43
Spaciousness -0.121 (0.0333, 842) 0.006 (0.0376, 711) -2.53* 0.20 2.14
Attractiveness 0.080 (0.0318, 842) 0.013 (0.0311, 711) 1.51 1.03 0.71
Travel Attitudes
Pro-bike/walk 0.216 (0.0359, 842) -0.219 (0.0334, 711) 8.86*** 0.08 0.47
Pro-travel -0.027 (0.0353, 842) 0.021 (0.0366, 711) -0.95 0.27 0.49
Travel minimizing 0.0166 (0.0343, 842) -0.018 (0.0377, 711) 0.68 0.00 1.24
Pro-transit 0.146 (0.0353, 842) -0.171 (0.0355, 711) 6.33*** 0.01 0.15
Safety of car -0.276 (0.0346, 842) 0.288 (0.0332, 711) -11.75*** 2.52 2.04
Car dependent -0.059 (0.0358, 842) 0.076 (0.0351, 711) -2.70** 0.08 1.13
* 0.01<p<0.05 ** 0.001<p<0.01 *** p<0.001
a. t-statistic = independent sample t-statistic with Levene’s test for equality of variances.
b. F-statistic for main effect of neighborhood type after adjusting propensity score quintiles.
c. F-statistic for interaction effect between neighborhood type and propensity score quintile.
d. mean (standard error of mean, number of observations).
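The pre-stratification t-statistics can be approximately recovered from the group means and standard errors in the table. The sketch below assumes the unequal-variance form of the two-sample t-statistic; the table note does not say which form was used for each row, so small discrepancies from the printed values are expected. The function name `two_sample_t` is hypothetical.

```python
import math


def two_sample_t(mean_a, se_a, mean_b, se_b):
    """Two-sample t-statistic from group means and standard errors of the mean.

    Assumption: unequal-variance form, t = (a - b) / sqrt(se_a^2 + se_b^2).
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (variable, traditional mean, traditional SE, suburban mean, suburban SE,
#  reported t), copied from Table 5.
ROWS = [
    ("Age", 45.0, 0.519, 49.1, 0.558, -5.36),
    ("Pro-bike/walk", 0.216, 0.0359, -0.219, 0.0334, 8.86),
    ("Safety of car", -0.276, 0.0346, 0.288, 0.0332, -11.75),
]

if __name__ == "__main__":
    for name, m_t, se_t, m_s, se_s, reported in ROWS:
        t = two_sample_t(m_t, se_t, m_s, se_s)
        print(f"{name:15s} computed {t:7.2f}   reported {reported:7.2f}")
```

The same calculation is a quick way to spot transcription errors in a standard error, since a misplaced decimal point changes the implied t-statistic by roughly an order of magnitude.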

Table 6. Average Treatment Effects on Travel Behavior

                     Q1     Q2     Q3     Q4     Q5     ATE    Mean a  Caliper  Kernel
Walk    Suburban     1.25   1.71   2      2.36   3.79          1.81
        N            219    213    155    87     34            708
        Traditional  1.48   3.71   3.12   4.39   7.68          4.86
        N            88     96     154    222    275           835
        ATE          0.23   2      1.12   2.03   3.89   1.86   3.05    1.98     1.95
Stroll  Suburban     7.27   7.24   8.37   7.7    6.88          7.53
        N            218    214    155    87     34            708
        Traditional  8.35   9.65   8.83   9.2    11.7          9.91
        N            88     95     155    222    274           834
        ATE          1.08   2.41   0.46   1.5    4.82   2.05   2.38    2.20     2.23
a. Mean outcomes without a propensity score adjustment
Note: Q1 is the stratum of observations whose propensity scores to live in a traditional neighborhood do not exceed the 20th percentile of the propensity scores (people with the lowest propensity to live in a traditional neighborhood); Q2 is the stratum of observations whose propensity scores are larger than the 20th percentile but do not exceed the 40th percentile of the propensity scores; similarly for Q3, Q4, and Q5. Q5 includes people with the highest propensity to live in a traditional neighborhood.
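The overall ATE in Table 6 can be reproduced from the stratum-level means. The sketch below assumes that each quintile's traditional-minus-suburban difference is weighted by that quintile's share of the pooled sample; this weighting is an assumption that happens to match the reported values, and the names `WALK`, `STROLL`, and `stratified_ate` are hypothetical.

```python
# Each tuple: (suburban mean, suburban N, traditional mean, traditional N),
# copied from Table 6 for quintiles Q1 to Q5.
WALK = [
    (1.25, 219, 1.48, 88),
    (1.71, 213, 3.71, 96),
    (2.00, 155, 3.12, 154),
    (2.36, 87, 4.39, 222),
    (3.79, 34, 7.68, 275),
]
STROLL = [
    (7.27, 218, 8.35, 88),
    (7.24, 214, 9.65, 95),
    (8.37, 155, 8.83, 155),
    (7.70, 87, 9.20, 222),
    (6.88, 34, 11.70, 274),
]


def stratified_ate(strata):
    """Sample-share-weighted average of within-stratum mean differences."""
    total = sum(n_sub + n_trad for _, n_sub, _, n_trad in strata)
    return sum(
        (mean_trad - mean_sub) * (n_sub + n_trad) / total
        for mean_sub, n_sub, mean_trad, n_trad in strata
    )


if __name__ == "__main__":
    print(f"Walk ATE:   {stratified_ate(WALK):.2f}")
    print(f"Stroll ATE: {stratified_ate(STROLL):.2f}")
```

Because the quintiles are nearly equal in size, a simple unweighted average of the five differences gives almost the same answer.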

Table 7. Sensitivity Analysis of Neighborhood Pooling

                                Models without One Neighborhood
                      All       MV        SV        SR-urban  SR-sub    MD-urban  MD-sub    SC-urban  SC-sub    Min.      Max.
Walk    ATE           1.86      1.76      1.97      1.68      1.75      2.11      1.87      1.44      1.9       1.44      2.11
        Observed      3.05      2.91      3.17      3.06      2.89      3.7       3.02      2.43      3.11      2.43      3.7
        ATE/Observed  0.61      0.60      0.62      0.55      0.61      0.57      0.62      0.59      0.61      0.55      0.62
        Sample Mean   3.46      3.17      3.66      3.26      3.70      3.60      3.70      2.91      3.63      2.91      3.70
        ATE/Mean      0.54      0.56      0.54      0.52      0.47      0.59      0.50      0.50      0.52      0.47      0.59
Stroll  ATE           2.05      2.25      1.89      1.73      2.14      2.45      2.06      1.46      2.18      1.46      2.45
        Observed      2.38      2.6       2.6       2.17      2.43      2.79      2.43      1.91      2.08      1.91      2.79
        ATE/Observed  0.86      0.87      0.73      0.80      0.88      0.88      0.85      0.76      1.05      0.73      1.05
        Sample Mean   8.82      8.74      8.93      8.56      8.94      8.89      8.99      8.39      9.09      8.39      9.09
        ATE/Mean      0.23      0.26      0.21      0.20      0.24      0.28      0.23      0.17      0.24      0.17      0.28
Notes: MV = Mountain View, SV = Sunnyvale, SR = Santa Rosa, MD = Modesto, SC = Sacramento

Figure 1. The Relationship between Self-Selection and Misestimation

Assignment        Non-Walkable   Walkable   Estimated effect
Random            μ1             μ2         ATE = μ2 - μ1
All Matched       μ1'            μ2'        ATE1 = μ2' - μ1'
All Mismatched    μ1''           μ2''       ATE2 = μ2'' - μ1''

μ1, μ1', and μ1'' are observed mean walking behavior of people living in the non-walkable neighborhood;
μ2, μ2', and μ2'' are observed mean walking behavior of people living in the walkable neighborhood.

Figure 2. Comparison of Traditional and Suburban Neighborhoods (Sacramento)
[Image panels: street network, houses, and commercial centers, each shown for Sacramento - Traditional and Sacramento - Suburban]

REFERENCES
Babbie E R, 2007 The practice of social research (Thomson Wadsworth, Belmont, CA)
Boarnet M G, Day K, Anderson C, McMillan T, Alfonzo M, 2005, "California's safe routes to school
program - Impacts on walking, bicycling, and pedestrian safety" Journal of the American Planning
Association 71 301-317
Boarnet M G, Sarmiento S, 1998, "Can land-use policy really affect travel behaviour? A study of the link
between non-work travel and land-use characteristics" Urban Studies 35 1155-1169
Boer R, Zheng Y, Overton A, Ridgeway G K, Cohen D A, 2007, "Neighborhood Design and Walking
Trips in Ten U.S. Metropolitan Areas" American Journal of Preventive Medicine 32 298-304
Cao X, 2008, "Is Alternative Development Undersupplied? Examination of Residential Preferences and
Choices of Northern California Movers" Transportation Research Record: Journal of the Transportation
Research Board 2077 97-105
Cao X, Mokhtarian P L, Handy S L, 2009, "Examining the Impacts of Residential Self-Selection on
Travel Behaviour: A Focus on Empirical Findings" Transport Reviews 29 359-395
Chatman D G, 2009, "Residential choice, the built environment, and nonwork travel: evidence using new
data and methods" Environment and Planning A 41 1072-1089
Cochran W G, 1968, "The Effectiveness of Adjustment by Subclassification in Removing Bias in
Observational Studies" Biometrics 24 295-313
Crane R, 2000, "The Influence of Urban Form on Travel: An Interpretive Review" Journal of Planning
Literature 15 3-23
Ewing R, Bartholomew K, Winkelman S, Walters J, Chen D, 2008, "Growing Cooler: The Evidence on
Urban Development and Climate Change", (Urban Land Institute, Washington, DC)
Ewing R, Cervero R, 2001, "Travel and the Built Environment: A Synthesis" Transportation Research
Record: Journal of the Transportation Research Board 1780 87-114
Frank L D, Engelke P O, 2001, "The Built Environment and Human Activity Patterns: Exploring the
Impacts of Urban Form on Public Health" Journal of Planning Literature 16 202-218
Frank L D, Saelens B E, Powell K E, Chapman J E, 2007, "Stepping towards causation: Do built
environments or neighborhood and travel preferences explain physical activity, driving, and obesity?"
Social Science & Medicine 65 1898-1914
Handy S, 1996, "Methodologies for exploring the link between urban form and travel behavior"
Transportation Research Part D: Transport and Environment 1 151-165
Handy S, Cao X, Mokhtarian P L, 2006, "Self-Selection in the Relationship between the Built
Environment and Walking: Empirical Evidence from Northern California" Journal of the American
Planning Association 72 55-74
Handy S, Mokhtarian P, Buehler T J, Cao X, 2004, "Residential Location Choice and Travel Behavior:
Implications for Air Quality", (University of California, Davis)
Imai K, van Dyk D A, 2004, "Causal inference with general treatment regimes: Generalizing the
propensity score" Journal of the American Statistical Association 99 854-866
Khattak A J, Rodriguez D, 2005, "Travel behavior in neo-traditional neighborhood developments: A case
study in USA" Transportation Research Part A: Policy and Practice 39 481-500
Kitamura R, Mokhtarian P L, Laidet L, 1997, "A micro-analysis of land use and travel in five
neighborhoods in the San Francisco Bay Area" Transportation 24 125-158
Krizek K J, 2003, "Residential Relocation and Changes in Urban Travel: Does Neighborhood-Scale
Urban Form Matter?" Journal of the American Planning Association 69 265-281
Luellen J K, Shadish W R, Clark M H, 2005, "Propensity Scores: An Introduction and Experimental Test"
Evaluation Review 29 530-558
Mokhtarian P L, Cao X, 2008, "Examining the impacts of residential self-selection on travel behavior: A
focus on methodologies" Transportation Research Part B: Methodological 42 204-228

Oakes M J, Johnson P J, 2006, "Propensity score matching for social epidemiology", in Methods in
epidemiology Eds M J Oakes, J S Kaufman (John Wiley & Sons, Inc., New York)
Pinjari A, Pendyala R, Bhat C, Waddell P, 2007, "Modeling residential sorting effects to understand the
impact of the built environment on commute mode choice" Transportation 34 557-573
Rosenbaum P R, Rubin D B, 1983, "The Central Role of the Propensity Score in Observational Studies
for Causal Effects" Biometrika 70 41-55
Rosenbaum P R, Rubin D B, 1984, "Reducing Bias in Observational Studies Using Subclassification on
the Propensity Score" Journal of the American Statistical Association 79 516-524
Rubin D B, Thomas N, 1996, "Matching Using Estimated Propensity Scores: Relating Theory to
Practice" Biometrics 52 249-264
Salon D, 2006 Cars and the city: An investigation of transportation and residential location choices in
New York city, Agricultural and Resource Economics, University of California, Davis
Schwanen T, Mokhtarian P L, 2003, "Does dissonance between desired and current neighborhood type
affect individual travel behaviour? An empirical assessment from the San Francisco Bay Area", in
Proceedings of the European Transport Conference (ETC), Strasbourg, France
Schwanen T, Mokhtarian P L, 2004, "The extent and determinants of dissonance between actual and
preferred residential neighborhood type" Environment and Planning B: Planning and Design 31 759-784
Schwanen T, Mokhtarian P L, 2005, "What affects commute mode choice: neighborhood physical
structure or preferences toward neighborhoods?" Journal of Transport Geography 13 83-99
Sommer B B, Sommer R, 1997 A practical guide to behavioral research: tools and techniques (Oxford
University Press, New York)
Vance C, Hedel R, 2007, "The impact of urban form on automobile travel: disentangling causation from
correlation" Transportation 34 575-588
Walker J, Li J, 2007, "Latent lifestyle preferences and household location decisions" Journal of
Geographical Systems 9 77-101
Winship C, Morgan S L, 1999, "The estimation of causal effects from observational data" Annual Review
of Sociology 25 659-706
Zhou B, Kockelman K, 2008, "Self-Selection in Home Choice: Use of Treatment Effects in Evaluating
Relationship Between Built Environment and Travel Behavior" Transportation Research Record: Journal
of the Transportation Research Board 2077 54-61
Ziliak S T, McCloskey D N, 2004, "Size matters: the standard error of regressions in the American
Economic Review" Journal of Socio-Economics 33 527-546
