2012 - Estimating A Vehicle Ownership Model From Targeted Marketing Data

Estimating a Vehicle Ownership Model on Targeted Marketing Data

Gregory Macfarlane and Laurie Garrow


gregmacfarlane@gatech.edu, laurie.garrow@ce.gatech.edu
Abstract
Data used to estimate regional travel demand models are typically
collected in household travel surveys or by the US Census Bureau. It
may be possible to estimate many models of travel behavior on data
available in targeted marketing (TM) records, which are sold by credit
reporting agencies; recent political and budgetary situations render
exploration of alternative data sources prudent. In this work, we
estimate a vehicle ownership model on a dataset assembled by joining
TM records representing the Atlanta region to the Georgia vehicle
registration database. We compare this model to the one currently
included in the Atlanta Regional Commission (ARC) model system in
qualitative and quantitative terms, and discuss obstacles associated
with using TM records to model urban travel behavior. We also present
suggestions and opportunities for model improvement that targeted
marketing data represent.
Atlanta Model
The Atlanta travel demand model [1] was updated in 2008, based on
work done for Honolulu by Ryan and Han [2]. The model as currently
implemented uses the following utility equations in a multinomial
logit model:
0 Cars = 0
1 Car  = 1.992 + 0.619 Inc1 + 1.929 Inc2 + 2.205 Inc3
         + 1.869 Inc4 - 0.115 ln(Density)
         + 3.38 WSuff * Import + 1.27 OSuff * Import
2 Cars = 3.314 - 0.450 Inc1 + 1.559 Inc2 + 3.085 Inc3
         + 3.582 Inc4 - 0.295 ln(Density)
         + 3.38 WSuff * Import + 1.27 OSuff * Import
3 Cars = 3.482 + 1.76 Inc1 + 0.352 Inc2 + 2.324 Inc3
         + 3.141 Inc4 - 0.803 ln(Density)
         + 3.38 WSuff * Import + 1.27 OSuff * Import
Where
Income is given by a set of dummy variables with breakpoints at
$20,000, $50,000, and $100,000.
Density is the jobs and population of a zone divided by its acreage.
Sufficiency for workers (WSuff) is the number of household workers
with a car available under each alternative. Others' sufficiency
(OSuff) is the number of residual cars. If there are no workers or
no other household members, these values are undefined.
Importance (Import) is the relative importance of cars to
accessibility in a zone,
$$
\mathrm{Import} = \frac{\sum_{j} S_j / T_{ij\,\mathrm{HWY}}^{2}}{\sum_{j} S_j / T_{ijk}^{2}} \tag{1}
$$
where S_j is the employment and population of zone j, and T_ijk
is the loaded network travel time from i to j by mode k.
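
As a rough illustration of equation (1), the sketch below computes the importance measure for a single origin zone. The function name, argument layout, and sample numbers are hypothetical. It also assumes that the denominator totals accessibility over every mode k, which the equation as printed does not state explicitly.

```python
def importance(size, time_hwy, time_by_mode):
    """Relative importance of cars to accessibility for one origin zone i.

    size[j]            -- employment plus population of destination zone j (S_j)
    time_hwy[j]        -- loaded highway travel time from i to j
    time_by_mode[k][j] -- loaded travel time from i to j by mode k
    """
    # Numerator: highway accessibility, sum over j of S_j / T_ij^2.
    hwy_access = sum(s / t ** 2 for s, t in zip(size, time_hwy))
    # Assumption: the denominator sums accessibility over all modes k.
    all_access = sum(
        s / t ** 2
        for times in time_by_mode.values()
        for s, t in zip(size, times)
    )
    return hwy_access / all_access


# Hypothetical two-zone example with highway and transit times in minutes.
zone_size = [1000.0, 4000.0]
modes = {"HWY": [10.0, 20.0], "TRANSIT": [20.0, 40.0]}
example_import = importance(zone_size, modes["HWY"], modes)
```

With these made-up inputs, highway travel supplies 20 of the 25 accessibility units, so the measure is 0.8; values near 1 indicate zones where cars dominate accessibility.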
Data
We joined TM records purchased from a credit reporting agency
to the Georgia motor vehicle registration database. The records
represent households in the Atlanta region in December 2010. The
full sample contains 423,717 records, from which we extract
an estimation sample of 25,000 records.
The sampling methodology specifies that the alternatives for this
model are 1, 2, or 3 or more vehicles, and that the sample is
not exogenous to the choice. Parameters in choice-based samples
are unbiased except for the alternative-specific constants, but this
bias has a known correction (McFadden, presented in [3]) which we
employ in this analysis.
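
A minimal sketch of the constant correction for choice-based samples follows, assuming the standard form in which each estimated alternative-specific constant is reduced by the log of the ratio of the alternative's sample share to its population share. The function name and all shares and constants below are hypothetical and are not the values used in this analysis.

```python
import math


def correct_constants(constants, sample_shares, population_shares, base):
    """Adjust alternative-specific constants estimated on a choice-based sample.

    Each constant is reduced by ln(sample share / population share), then all
    constants are shifted so that the base alternative stays at zero.
    """
    adjusted = {
        alt: constants[alt]
        - math.log(sample_shares[alt] / population_shares[alt])
        for alt in constants
    }
    shift = adjusted[base]
    return {alt: value - shift for alt, value in adjusted.items()}


# Hypothetical example: alternative "2" is under-represented in the sample.
corrected = correct_constants(
    constants={"1": 0.0, "2": 1.0},
    sample_shares={"1": 0.5, "2": 0.5},
    population_shares={"1": 0.25, "2": 0.75},
    base="1",
)
```

In this made-up case the constant for alternative "2" rises from 1.0 to 1 + ln(3), reflecting that the sample understated how often that alternative is chosen in the population.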
The TM data used in this analysis do not distinguish workers
from non-workers. For this analysis, we use an imputed labor-force
participation variable. The imputation introduces some uncertainty
as to whether household members are workers or not.
Sufficiency
The sufficiency variable introduces a problem: for households
without any workers or without any other members, the associated
sufficiency is undefined. It cannot simply be zero because this would
imply that none of the workers had an available vehicle, when in fact
all of them do.
The chart below gives the frequencies of household vehicles by
number of workers, with cell colors in the original chart indicating
the value of the sufficiency variable for that alternative.
Sufficiency Values and Sample Frequencies
              1 Car   2 Cars   3+ Cars   Total
0 Workers      3513     2244      1335    7092
1 Worker       4145     3075      2002    9222
2 Workers      1309     2947      2717    6973
3 Workers       219      447       817    1483
4+ Workers       22       56       152     230
Sufficiency color key: 1, 2, 3, NA
This variable definition causes 7092 records to be invalid. We can,
however, re-define the variable as insufficiency, or the number of
workers without a car available to them. This new variable takes the
values defined in the chart below.
Insufficiency Values and Sample Frequencies
              1 Car   2 Cars   3+ Cars   Total
0 Workers      3513     2244      1335    7092
1 Worker       4145     3075      2002    9222
2 Workers      1309     2947      2717    6973
3 Workers       219      447       817    1483
4+ Workers       22       56       152     230
Insufficiency color key: 3, 2, 1, 0
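
The two definitions can be sketched as small functions. This is an illustrative reading of the definitions in the text, not the code used for the analysis; the function names are hypothetical, and `cars` is the number of vehicles under the alternative being evaluated.

```python
def worker_sufficiency(workers, cars):
    """Number of workers with a car available; undefined with no workers."""
    if workers == 0:
        return None  # undefined: the record cannot be used in estimation
    return min(workers, cars)


def worker_insufficiency(workers, cars):
    """Number of workers without a car available; defined for every household."""
    return max(workers - cars, 0)


# A zero-worker household is dropped under sufficiency but kept under
# insufficiency, where it simply takes the value 0.
zero_worker_suff = worker_sufficiency(0, 2)
zero_worker_insuff = worker_insufficiency(0, 2)
```

Under this reading, a household with four workers evaluated at the one-car alternative has an insufficiency of 3, which matches the largest value in the key above.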
Estimation
                            Atlanta  Estimated      Work Only      Insufficiency  Seniors        Adults
2:(intercept)               1.32     0.22  (0.25)   0.20  (0.08)   0.39* (0.07)   0.39* (0.07)   0.39* (0.07)
3:(intercept)               1.49     0.67  (0.28)   1.04* (0.10)   1.26* (0.08)   1.27* (0.08)   1.28* (0.08)
WorkSuff * Import           3.38     6.45* (0.89)   7.29* (0.23)
OtherSuff * Import          1.27     2.22* (0.71)
WorkInsuff * Import                                                8.12* (0.22)
OtherInsuff * Import                                               4.91* (0.30)
Non-SeniorInsuff * Import                                                         8.14* (0.22)
SeniorInsuff * Import                                                             4.51* (0.31)
AdultInsuff * Import                                                                             7.12* (0.19)
2:Lower-middle Income       0.08     0.50  (0.24)   0.17  (0.08)   0.33* (0.06)   0.32* (0.06)   0.33* (0.06)
3:Lower-middle Income       1.33     0.46  (0.23)   0.34* (0.10)   0.43* (0.08)   0.44* (0.08)   0.44* (0.08)
2:Upper-middle Income       2.16     0.74* (0.23)   0.44* (0.08)   0.52* (0.06)   0.52* (0.06)   0.54* (0.06)
3:Upper-middle Income       3.37     0.71* (0.23)   0.70* (0.10)   0.71* (0.07)   0.71* (0.07)   0.73* (0.07)
2:High Income               1.64     1.06* (0.25)   0.84* (0.09)   0.93* (0.07)   0.92* (0.07)   0.96* (0.07)
3:High Income               0.49     1.06* (0.24)   1.14* (0.10)   1.16* (0.08)   1.17* (0.08)   1.21* (0.08)
2:log(Density)              0.18     0.18  (0.07)   0.18* (0.02)   0.18* (0.02)   0.18* (0.02)   0.19* (0.02)
3:log(Density)              0.69     0.25* (0.08)   0.31* (0.02)   0.29* (0.02)   0.29* (0.02)   0.30* (0.02)
N                                    1996           17908          24010          24010          24010
LL(C)                                -2142.23       -19635.71      -26257.44      -26257.44      -26257.44
LL(beta)                    NA       -2081.14       -18487.29      -24563.95      -24554.64      -24606.89
rho^2_C                     NA       0.0285         0.0585         0.0645         0.0649         0.0629
chi^2                       NA       122.17         2296.85        3386.98        3405.61        3301.11
p(chi^2)                    NA       0.00           0.00           0.00           0.00           0.00
DF                          10       10             9              10             10             9
Standard errors in parentheses
* indicates significance at p < 0.01
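
The fit statistics in the lower rows appear to follow from the two log-likelihoods in the usual way: rho-squared is one minus the ratio of the model log-likelihood to the constants-only log-likelihood, and the chi-squared statistic is twice their difference. The sketch below reproduces two columns under that assumption; the function names are hypothetical, and small differences in the last digit come from rounding in the reported log-likelihoods.

```python
def rho_squared(ll_constants, ll_model):
    """McFadden-style rho-squared relative to a constants-only model."""
    return 1.0 - ll_model / ll_constants


def chi_squared(ll_constants, ll_model):
    """Likelihood ratio statistic of the model against constants only."""
    return -2.0 * (ll_constants - ll_model)


# Log-likelihoods taken from the "Estimated" and "Seniors" columns.
estimated_rho2 = rho_squared(-2142.23, -2081.14)
estimated_chi2 = chi_squared(-2142.23, -2081.14)
seniors_rho2 = rho_squared(-26257.44, -24554.64)
seniors_chi2 = chi_squared(-26257.44, -24554.64)
```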


Discussion
Models are estimated using mlogit for R [4].
Atlanta: These coefficients have been normalized against the
one-vehicle alternative, and the income variables have also been
normalized against the low-income dummy variable.
Estimated: The Atlanta ownership model specification estimated
on TM records. Many coefficients have the same relationship
to each other, but income is less separating, and the coefficients
on higher incomes take a different sign.
Insufficiency: This is the Atlanta specification with insufficiency
replacing the sufficiency specification. We eliminate far fewer
records, allowing for a more efficient and predictive model.
Seniors: Instead of comparing workers to non-workers, a comparison
between the availability of cars to seniors versus non-seniors
yields a significant improvement to model fit, with a
likelihood ratio test p-value of approximately 0.
Adults: This model examines whether it is even necessary to
discriminate between adults of different types. In this case, either
a workers or a seniors specification is indeed significantly
preferred.
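
The comparison with the Adults model can be sketched as a likelihood ratio test, treating Adults as the restricted model because it uses one insufficiency coefficient where the other two specifications use two (9 versus 10 degrees of freedom in the estimation table). This is an illustrative check on the reported log-likelihoods, not the authors' own computation; 6.635 is the standard chi-squared critical value for one degree of freedom at p = 0.01.

```python
CHI2_CRITICAL_1DF_P01 = 6.635  # chi-squared critical value, 1 df, p = 0.01


def likelihood_ratio(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic for a restricted model nested in a larger one."""
    return -2.0 * (ll_restricted - ll_unrestricted)


# Log-likelihoods from the estimation table.
LL_ADULTS = -24606.89
LL_INSUFFICIENCY = -24563.95
LL_SENIORS = -24554.64

lr_workers = likelihood_ratio(LL_ADULTS, LL_INSUFFICIENCY)
lr_seniors = likelihood_ratio(LL_ADULTS, LL_SENIORS)
workers_preferred = lr_workers > CHI2_CRITICAL_1DF_P01
seniors_preferred = lr_seniors > CHI2_CRITICAL_1DF_P01
```

Both statistics (about 85.9 and 104.5) are far above the critical value, consistent with the statement that either split specification is preferred to pooling all adults.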
Recommendation
Replacing the automobile sufficiency variables with an
insufficiency specification may allow for a more efficient and
more predictive model. The fact that targeted marketing records
do not contain worker status explicitly does not appear to be a
drawback, given that senior status is significantly more predictive.
Further analysis into other variables available in TM records, such
as lifestyle or children's ages, is warranted.
References
[1] Atlanta Regional Commission. The Travel Forecasting Model Set For the
Atlanta Region, 2008.
[2] James M. Ryan and Gregory Han. Vehicle-Ownership Model Using Family
Structure and Accessibility Application to Honolulu, Hawaii. Transportation
Research Record, 1676:1–10, 1999.
[3] Charles F. Manski and Steven R. Lerman. The Estimation of Choice
Probabilities from Choice Based Samples. Econometrica, 45(8):1977–1988, 1977.
[4] Yves Croissant. mlogit: Multinomial Logit Model, 2011.
