Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Essays on Housing Markets:
Dynamics, Fundamentals, and Measurement
by
Christian Landers Redfearn
B.S. (Northwestern University) 1988

M.S. (University of California, Berkeley) 1992
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Economics
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor John M. Quigley, Chair
Professor Alan J. Auerbach
Professor Nancy E. Wallace
Spring 2000
UMI Number: 9979776
Copyright 2000 by
Redfearn, Christian Landers
All rights reserved.
UMI Microform 9979776
Copyright 2000 by Bell & Howell Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
Bell & Howell Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
Essays on Housing Markets:
Dynamics, Fundamentals, and Measurement
Copyright 2000
by
Christian Landers Redfearn
To my family, in gratitude for their confidence and patience— I relied on both when
my own wavered, and to the memories of Richard E. Phillips and Paul S. McCord
whose influence on my life became clear only after they were gone.
Contents
List of Figures
List of Tables
1 Transaction Costs, Price Discovery, and the Dynamics of Owner-Occupied Housing Prices
1.1 Introduction
1.2 Market Frictions & Housing Price Dynamics
1.3 A Model of Persistence in Housing Prices
1.4 The Data
1.5 Do Housing Prices Follow a Random Walk?
1.6 The Predictability of the Returns to Housing
1.7 Is that a $20 Dollar Bill on the Porch?
1.8 Conclusion
2 The Composition of Metropolitan Employment and the Correlation of Housing Prices Across Metropolitan Areas
2.1 Introduction
2.2 Atlantic City and “Industrial Distance”
2.3 Related Research
2.4 The Data
2.5 Spatial Factors, Industrial Mix, and the Returns to Owner-Occupied Housing
2.6 Estimation and Results
2.7 Conclusion
3 Do Housing Transactions Provide Misleading Evidence About the Course of Housing Values?
3.1 Introduction
3.2 Repeat Sales and Sample Selectivity
3.3 The Estimation Procedure
3.4 The Data
3.5 Sample Selectivity and House Prices
3.6 Conclusion
Bibliography
A Swedish Data
B The Hybrid Method
C Time-Independent Selection
List of Figures
1.1 Actual and Predicted Variances
1.2 Likelihood Functions - Stockholm and Uppsala
1.3 Likelihood Functions - Malmo and Gothenburg
2.1 Housing Prices - New Jersey MSAs Relative to New Jersey State
2.2 Housing Prices - Other MSAs Relative to New Jersey State
2.3 Industrial Distance - New Jersey MSAs Relative to Atlantic City
2.4 Industrial Distance - Other MSAs Relative to Atlantic City
3.1 Effects of Selectivity upon House Price Indexes - Stockholm
List of Tables
1.1 The Distribution of Paired Sales
1.2 Tests for a Random Walk in Housing Prices
1.3 Forecasts of Aggregate Housing Returns
1.4 Forecasting Individual Housing Returns
1.5 Transactions Costs
1.6 Real Returns for Different Types of Home Buyers
1.7 Excess Returns for Different Types of Home Buyers
2.1 Descriptive Statistics
2.2 Static Regression Results
2.3 Dynamic Regression Results
3.1 Frequency of Sales
3.2 Average Characteristics of Dwellings as a Function of Sales Frequency: Regions I-IV
3.3 Average Characteristics of Dwellings as a Function of Sales Frequency: Regions V-VIII
3.4 Estimated Coefficients from Time-Invariant Probit Selection Model
3.5 Implications of Alternate Models of Sample Selectivity on House Price Estimates
A.1 Distribution of Housing Sales
A.2 Average Housing Characteristics
C.1 Implications of the Poisson Selectivity Model on House Price Estimates
Acknowledgements
Giving a blank page to a graduate student about to file his or her dissertation is
a dangerous thing. Whether gushing and earnest or professional and restrained, anything
that I write, I have been informed, will seem, well, unseemly as soon as I reread it
after it has been set in microfiche for posterity. Seven years of work (uncountable
late nights, conferences, drafts, lost data and found errors) are represented by the
three chapters that begin on the next page, and yet somehow I’m certain at this
moment that I will be judged by future generations on the basis of an adjective on
this page. Who and how to thank for all the help I’ve received in this endeavor? My
feeling now, with less than an hour to file and, I guess, for the record, is: who not to
thank? The time constraint may save me.
I’d like to begin by expressing my gratitude to my advisor, John Quigley. He
gambled on me when he helped an aimless operations research student with a single
undergraduate class in economics into the doctoral program at Berkeley. He has been
an exceptional advisor and has supported me throughout my time here. My committees,
both orals and dissertation, prepared me for virtually everything that was thrown
at me during the job search. Alan Auerbach, Aviv Nevo, James Pierce, and Nancy
Wallace took my raw ideas and made them defensible. Those of you who have
stood in front of a room full of strangers, any one of whom could be waiting for an
opening to grind an ax, know that confidence in your preparation is invaluable.
Their guidance made it possible for me to enjoy the job market, and I am indebted
to them.
While none of the research that I helped with when I visited the U.S. Census
is included in this document, the research methods that I picked up from Kathy
O’Regan influence all of the empirical work herein. It is not clear whether I adopted
them or she imposed them, but it is clear that I’ve benefited from her rigor and from
all of the advice she’s given over the years.
The data from two of the three papers contained in this dissertation are Swedish,
and I had the good fortune to spend two summers there doing research. It was my
first chance to live abroad, and it was then that I first began to gravitate towards the topics on
which this dissertation is based. Peter Englund supported my visit and has become a
friend and coauthor. His graduate students and others were terrific hosts, and I look
forward to seeing them as often as I can. Thanks in particular to Per Asberg, Tommy
Berger, Robert Boije, and Mats Wilhelmsson.
I’ve benefited from my weekly meetings with Kristina Lybecker, Patrick McCabe,
and Clara Wang. This dissertation/support group has become a personal institution,
and I will miss it. I’d also like to thank Cathy Meuter and Athena Carillo for making
my life as a research assistant easier than it would have been without their help.
The rest of those to whom I owe thanks defy simple categories; they’ve helped
in too many ways to list. Thanks to my classmates Michael Ash, Dan Covitz, and
Brendan Cushing-Daniels, who, with or without their knowledge, became role models
for me. To my housemates, Helen Sillet, Paul Davies, Mark Glickman, and Ethan
Pollack, thank you for helping me have a life outside the classroom and office. To
Dave Vockell, Benedicte Callan, Gedge Knopf, and Chad Slawner, thanks for the
support. To Jennifer Gold, thank you for doing the impossible: I’d happily repeat
the last year. To my family, whom, very simply, I cannot thank enough.
Introduction
Despite their obvious significance to individual consumers and relevance within
the larger national economy, housing and the markets in which it trades are not well
understood. It is clear, for example, that more of a typical consumer’s budget is
allocated to housing than to any other good, but little is known about how consumers
form their valuations of individual dwellings. In this age of the mutual fund, investors
of every stripe understand the concept of a diversified portfolio. However, diversification
within a portfolio of real estate assets is typically implemented without an
understanding of the economic fundamentals at work, as if space itself generated the
returns to real estate. Consider also that until recently in the United States, the stock
of owner-occupied housing represented greater wealth than the capitalized value of
American stock markets. However, whereas reports on stock price levels and movements
are difficult to avoid, accurate and systematic aggregate housing price indexes
are hard to find. That important questions remain unanswered is more a testament
to the challenges endemic to housing research than to scholarly inattention.
This dissertation represents an ongoing effort to understand the functioning of housing
markets and to advance the tools used in housing research; it proceeds in three
parts. The first chapter addresses the micro-foundations of housing markets.
Specifically, it examines the dynamics of owner-occupied housing prices at the level of the
individual dwelling and the impact of these local phenomena on aggregate housing prices.
At issue is the assumption that housing prices follow a random walk.
The empirical evidence overwhelmingly rejects this assumption in favor of slow mean
reversion in prices. The model developed in this chapter suggests that the persistence
arises from a price discovery process that carries forward valuation errors embedded
in past sales.
Chapter 1 also addresses a question that arises naturally in light of the result that
housing prices do not follow a random walk: how can predictable movements
in housing prices, implied by their mean reversion, be sustainable? The chapter uses
routine bootstrap techniques to evaluate the profitability of a simple investment rule,
one that exploits the predictable component of housing returns. The results indicate
that, while returns are forecastable, the large transactions costs associated with home
ownership prevent profitable speculation in owner-occupied housing markets.
In the second chapter, the scale of the research widens in order to examine the
behavior of aggregate metropolitan housing prices. Specifically, the presence of common
fundamentals is inferred from the correlation in housing price movements. Despite their
idiosyncrasies, housing markets, like the markets for any other good, operate under
the laws of supply and demand. That is, if metropolitan areas are viewed as small open
economies, they will face shocks to the supply of and demand for common imports and
exports, shocks that may ultimately influence housing markets in similar ways.
This chapter demonstrates that the correlation of returns to residential housing
between two metropolitan areas is a function not only of their physical proximity,
but also of the similarity in their industrial composition. This implies that as local
economies evolve, so will the covariance of housing returns. It is common to see real
estate diversification based on the historical relationship between broad geographic
regions. The results presented in this chapter suggest that the benefits derived from
diversification are maximized by considering the industry risk inherent in the current
composition of metropolitan economies, not just the correlation of past returns.
Chapter 3 examines one of the issues central to the viability of systematic and
accurate housing price indexes in the United States. While hedonic analyses are
infeasible on a systematic basis because of data limitations, an alternative technique for measuring
aggregate prices may be possible: the repeat sales method, which estimates price
indexes on the basis of multiple sales of individual dwellings. These transactions may
be a nonrandom sample of the underlying population of dwellings, which may preclude
the construction of price indexes that truly reflect the evolution of housing prices in
the entire stock of housing. For example, it is widely thought that smaller “starter
homes” sell more frequently than more expensive properties and that the frequency
of transactions on high-valued properties varies over the business cycle. If rates of
appreciation vary across different types of housing, then the repeat sales method may
introduce bias into aggregate indexes, because price appreciation within the sample
of repeat-sale dwellings may differ from that of the rest of the housing stock.
The third chapter considers the importance of these selectivity issues in making
inferences about housing price movements. A model of housing price determination
is estimated that accounts for the nonrandom selection of observed transactions. The
factors affecting the probabilities that transactions on different houses will be observed
are also analyzed, and the effect of these factors upon housing prices is considered.
The analysis considers a variety of plausible selection models, using nonparametric
as well as parametric methods. For each of the alternatives, the estimated effect of
selectivity upon housing price calculations is substantial.
Chapter 1
Transaction Costs, Price Discovery,
and the Dynamics of
Owner-Occupied Housing Prices
This paper examines the dynamics of owner-occupied housing prices both at the
level of the individual dwelling and in aggregate. Using a unique data set, a model
of individual dwelling prices is estimated; the model represents features of housing
markets more accurately than standard models of housing prices. Statistical tests
strongly reject the hypothesis that individual housing prices follow a random walk
in favor of the alternative hypothesis that housing prices are mean reverting. This
result also holds in aggregate, offering an additional explanation for the “inertia”
commonly found in housing returns. Finally, the paper shows that excess returns
are forecastable but demonstrates, using realized returns to individual dwellings,
that the large transactions costs associated with home ownership prevent profitable
speculation in owner-occupied housing markets.
1.1 Introduction
The past decade of research on housing price dynamics has established that
changes in housing prices exhibit “inertia.” That is, the returns to housing have
consistently been shown to be serially correlated. The existing research is typically
aggregate in scope and focused on documenting rather than explaining the observed
pattern of returns. In contrast, this paper addresses housing price dynamics at the
level of the individual dwelling and provides empirical evidence that transaction costs
play a significant role in restraining the market’s response to apparent excess returns
to housing.
The model of housing prices developed in this paper extends a standard pricing
model to represent features specific to housing markets more accurately. It
incorporates a more general, and more appropriate, error structure for housing prices
at the level of the individual dwelling. As a result, the model is a generalization
of other widely used methods of measuring aggregate housing prices. In particular,
the most common method (used extensively in academic and professional research
and reported by government agencies such as the Office of Federal Housing Enterprise
Oversight [OFHEO]) is shown to be a special case. The model supports tests of
the assumptions implicit in the conventional models. Specifically, it supports a direct
test of the hypothesis that individual dwelling prices follow a random walk against
the alternative hypothesis that prices are mean reverting.
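For readers unfamiliar with the repeat sales method, a minimal sketch of its basic, unweighted form (the Bailey, Muth, and Nourse regression) follows. Each pair of sales of the same dwelling contributes one equation: the log price change equals the difference between the log index at the second and first sale dates, plus noise. This is an illustration only. It is not the generalized model developed in this paper, and published indexes such as OFHEO's use a weighted variant that is not implemented here. The function name `repeat_sales_index` and the simulated market are hypothetical.

```python
import numpy as np


def repeat_sales_index(first_period, second_period, log_price_change, n_periods):
    """Basic (unweighted) repeat-sales regression.

    Each sale pair becomes one row of the design matrix, with -1 in the column
    of the first sale period and +1 in the column of the second. Period 0 is the
    base period (log index fixed at 0), so its column is dropped before OLS.
    Returns the estimated log price index for periods 0..n_periods-1.
    """
    first_period = np.asarray(first_period)
    second_period = np.asarray(second_period)
    n_pairs = len(log_price_change)
    design = np.zeros((n_pairs, n_periods))
    rows = np.arange(n_pairs)
    design[rows, first_period] -= 1.0
    design[rows, second_period] += 1.0
    coef, *_ = np.linalg.lstsq(design[:, 1:], log_price_change, rcond=None)
    return np.concatenate(([0.0], coef))


# Hypothetical simulated market: 10 periods, 5,000 sale pairs.
rng = np.random.default_rng(0)
n_periods, n_pairs = 10, 5000
true_index = np.concatenate(([0.0], np.cumsum(rng.normal(0.02, 0.05, n_periods - 1))))
first = rng.integers(0, n_periods - 1, n_pairs)
second = np.array([rng.integers(f + 1, n_periods) for f in first])
dlog = true_index[second] - true_index[first] + rng.normal(0.0, 0.1, n_pairs)

est = repeat_sales_index(first, second, dlog, n_periods)
print(np.round(est - true_index, 3))  # estimation errors, small relative to the index
```

Here the pair-level noise is independent of the holding period. The generalized model in this chapter relaxes exactly that assumption.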
The research design used in this analysis requires repeat sales of identical dwellings
to identify the structure of housing prices. The data are drawn from virtually every
arm’s-length sale of residential housing in Sweden over a 13-year period; the raw
data total over 500,000 observations. Each observation consists of the date of sale,
the sale price, and almost 30 variables describing the physical attributes of the dwelling.
Moreover, the sales are recorded so that repeat sales of dwellings can be identified.
The empirical results clearly indicate that, at the individual dwelling level, housing
prices do not follow a random walk. The variance of price appreciation increases with
the time between sales, but not linearly, as a random walk would imply.
The high, but not perfect, serial correlation in the error structure
at the individual dwelling level implies that changes in aggregate housing prices are
serially correlated in a predictable way. The results suggest that shocks to prices
persist over several years, offering an explanation for the observed predictability in
aggregate housing prices.
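The contrast between linear and nonlinear growth in variance can be illustrated with a small Monte Carlo sketch. It assumes the error structure developed in Section 1.3: a persistent AR(1) error plus white-noise error at the time of sale. The parameter values are hypothetical illustrative choices, not the paper's estimates: λ = 0.8, σ_μ = σ_ν = 0.05, annual periods, and 13 dates to echo the 13-year sample. Under a random walk, the variance of the error in measured appreciation rises linearly with the holding period. Under mean reversion it rises at a decreasing rate.

```python
import numpy as np


def simulate_composite_errors(lam, sigma_mu, sigma_nu, n_dwellings, n_periods, rng):
    """Simulate xi_t = eps_t + nu_t for many dwellings over n_periods dates.

    eps_t = lam * eps_{t-1} + mu_t is the persistent error; nu_t is white-noise
    error at the time of sale. For |lam| < 1, eps starts from its stationary
    distribution; for lam == 1 (a random walk) it starts at zero.
    """
    mu = rng.normal(0.0, sigma_mu, (n_dwellings, n_periods))  # mu[:, 0] unused
    nu = rng.normal(0.0, sigma_nu, (n_dwellings, n_periods))
    eps = np.zeros((n_dwellings, n_periods))
    if abs(lam) < 1:
        eps[:, 0] = rng.normal(0.0, sigma_mu / np.sqrt(1.0 - lam**2), n_dwellings)
    for t in range(1, n_periods):
        eps[:, t] = lam * eps[:, t - 1] + mu[:, t]
    return eps + nu


def variance_by_holding_period(xi, max_gap):
    """Cross-dwelling variance of xi_k - xi_0 (the error in log appreciation
    between a sale at date 0 and a resale k periods later), k = 1..max_gap."""
    return np.array([np.var(xi[:, k] - xi[:, 0]) for k in range(1, max_gap + 1)])


rng = np.random.default_rng(1)
# Hypothetical parameters: annual periods, 13 dates, 20,000 dwellings.
mean_reverting = variance_by_holding_period(
    simulate_composite_errors(0.8, 0.05, 0.05, 20_000, 13, rng), 12)
random_walk = variance_by_holding_period(
    simulate_composite_errors(1.0, 0.05, 0.05, 20_000, 13, rng), 12)

for k in (1, 3, 6, 12):
    print(k, round(mean_reverting[k - 1], 4), round(random_walk[k - 1], 4))
```

In this simulation the random-walk variances keep rising by roughly σ_μ² = 0.0025 per period. The mean-reverting variances flatten out after a few periods.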
This paper addresses the apparent profitability of predictable returns to housing
for four distinct types of economic agents in housing markets. Using repeat sales data
and standard bootstrap techniques, and accurately accounting for transactions and
holding costs, the value of an investment rule that exploits housing return
predictability is evaluated. The results suggest that institutional features of housing markets
preclude consistent excess returns. In fact, the simulations indicate that economic
agents in housing markets compete away the excess profits to approximately the level
of the transactions costs of buying and selling housing for the typical participant. The
results should be generalizable to other housing markets in which the costs associated
with market entry and exit are similar.
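The bootstrap evaluation is not spelled out in this section. One standard form such an evaluation could take is a percentile bootstrap of the mean return net of a round-trip transaction cost, sketched below. Everything in the sketch is a hypothetical placeholder rather than the paper's data, rule, or estimates: the simulated returns, the flat 6 percent cost, and the function name `bootstrap_mean_ci`. Subtracting the cost directly from log returns is also a simplification.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Sample mean and percentile-bootstrap (1 - alpha) confidence interval."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample observations with replacement: one row of indices per replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper


# Hypothetical inputs: per-holding log excess returns earned by following a
# forecasting rule, and a flat round-trip transaction cost subtracted from each.
rng = np.random.default_rng(42)
gross_excess = rng.normal(0.04, 0.15, 500)
round_trip_cost = 0.06
net_excess = gross_excess - round_trip_cost

gross = bootstrap_mean_ci(gross_excess)
net = bootstrap_mean_ci(net_excess)
print("gross mean and 95% CI:", np.round(gross, 4))
print("net mean and 95% CI:  ", np.round(net, 4))
```

Both calls use the same resampling seed, so the net interval is the gross interval shifted down by the cost. The substantive question is whether that shifted interval still lies above zero.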
Section 1.2 discusses housing market frictions and housing price dynamics. Section
1.3 develops a general model of housing prices. The data are described in Section
1.4. Section 1.5 presents tests for a random walk in housing prices against the
alternative of a mean-reverting process. The link between individual pricing errors and
aggregate price movements, and tests for the predictability of aggregate returns,
appear in Section 1.6. The profitability of forecastable housing returns and the impact
of transaction costs on housing markets are addressed in Section 1.7. Section 1.8
concludes.
1.2 Market Frictions & Housing Price Dynamics

Several characteristics that distinguish housing from other goods create
frictions that may limit an agent’s ability to perceive or respond to predictable housing
returns. Housing is expensive (by far the largest single purchase a typical household
will make) and requires financing, with its associated costs. Its inherent heterogeneity
and fixed location result in high search and relocation costs. Even in equilibrium,
these transaction costs can create a wedge between observed and fundamental prices,
because it is not beneficial for agents to engage in the trades that would otherwise
remove systematic excess returns. Given these frictions, observed sale prices are noisy
estimates of underlying fundamental prices.
Transaction costs are identified, if not measured, in the housing price literature
as examples of market frictions that could allow correlated returns to housing to persist.
However, other frictions have also been examined. Malpezzi (1999) provides
empirical evidence that regulation slows the speed of adjustment to changes in
aggregate demand. Stein (1995) constructs a model in which liquidity constraints cause
correlated housing returns. Spiegel and Strange (1992) develop a model in which
principal-agent problems lead to credit constraints and result in correlated returns to
housing. Case and Shiller (1989) suggest that agents fail to incorporate predictable
interest rate movements into housing prices, resulting in forecastable returns. Gatzlaff
(1994) finds that if expected inflation is controlled for, the predictability of housing
returns is reduced, but not eliminated.
Two papers address housing price dynamics at the level of the individual dwelling.
Hill, Sirmans, and Knight (1999), using a small data set from Baton Rouge, Louisiana,
reject a random walk in housing prices by finding sample variances that
violate those implied by a random walk. Kuo (1996) estimates an AR(2) error
process using several different methods, including Bayesian estimation. Only the
Bayesian method yields consistent and statistically significant results. The economic
significance of the coefficients from these regressions is quite small: over a typical
interval between sales, the implied autocorrelation is essentially zero.
In order for market participants to respond to pricing errors, two things are
required. First, an action must be profitable after accounting for transaction costs,
which are substantial in housing markets. Second, market participants must be able
to perceive the errors. Slow price discovery may make this difficult, and it may be an
additional source of noise in housing prices. In response to the arrival of information,
new fundamental prices are “discovered” through the repetition of individual price
formation throughout the market. The speed of this process is certain to be a
function of the level of activity, or thickness, in the market, because agents learn as they
observe the prices at which others trade.
Even in financial markets, widely thought to be the most informationally
efficient, there is evidence that prices are established not instantaneously but slowly, as
agents incorporate both news and the information embodied in other trades. Leach
and Madhavan (1993) and Leach and Madhavan (1992) suggest that financial market
makers “facilitate price discovery” by setting prices to induce trade, recovering any
loss by making better pricing decisions in the future due to the information gained
by inducing trade. Romer (1993) argues that the trading process itself is a cause of
movements in prices, characterizing the financial market (although the
characterization applies equally well to housing markets) as “engaged in a many-dimensional and
a many-agent inference problem with multiple layers of uncertainty and
heterogeneity and with frictions in the trading process.” Romer continues, “As a result, market
prices are not related in any simple and mechanical way to news. Nonetheless,
market participants are groping toward reasonable estimates of fundamentals, and price
movements, even when they are unrelated to outside news, generally represent
improvements in assessments of underlying fundamentals” (p. 1129). Although rarely discussed
in the housing literature, price discovery is likely to influence price dynamics in the
thinly traded housing market.
Price discovery in housing markets cannot occur as it does in financial or many
other markets. Unlike other consumption goods, housing is transacted
infrequently. Most other goods can be transported to the market offering the highest
return; this is impossible in the case of housing. Housing’s fixed location implies that
even dwellings with identical physical structures may differ in price simply because the
price incorporates a complicated set of implied locational amenities and costs.
Furthermore, the stock of housing is characterized by diversity: dwellings vary widely
across structural attributes, style, and vintage. In short, “comparison shopping” in
housing markets is different than it is in other goods markets; it is substantially more
difficult to determine the market price of a dwelling when every other previously sold
dwelling is necessarily an imperfect substitute.1

1 There is considerable research on observed price dispersion, even among goods close in
characteristics to the “widget” found in hypothetical perfect markets. Varian (1980), in assessing the
sharp distinction between theoretical prices and those observed in real markets, notes that the law
of one price is “no law at all.” See also Rothschild (1974), Reinganum (1979), and Williams (1995).

In practice, buyers and sellers estimate the fundamental price of a home by
using the information embodied in a set of previous sales. The usefulness of any
one of these sales as a reference depends on its similarity across physical, spatial,
and temporal dimensions. Inferences about the fundamental price of the home can
be drawn only imperfectly from the set of past sales, because dwellings differ
structurally, enjoy different locational attributes, and are valued under different market
conditions as time passes. Because housing trades infrequently, new information
about market values arrives slowly. Indeed, from an informational standpoint,
the closest comparable sale across these three dimensions may be the last sale of the
same dwelling. If previously observed sales occur at prices that deviate from their
fundamental values, the information set used to establish reservation prices and bids
will be contaminated with past errors and lead to new errors in price formation. In
this way pricing errors may persist.
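The mechanism just described can be sketched with a deliberately simple, hypothetical anchoring rule. Each new sale price of a dwelling is a weighted average of its previous sale price and its current fundamental log price. The fundamental is held fixed, and no new shocks arrive. Under those assumptions a one-time pricing error decays geometrically rather than vanishing at the next sale, which has the same form as the autocorrelated error introduced in the next section. The rule, the weight of 0.7, and the function name are illustrative assumptions, not the paper's model of price formation.

```python
def anchored_sale_prices(fundamental, first_price, weight_on_last_sale):
    """Log sale prices when each new price is a weighted average of the
    previous sale price of the same dwelling (weight `weight_on_last_sale`)
    and the current fundamental log price. Hypothetical rule, no new shocks."""
    prices = [first_price]
    for f in fundamental[1:]:
        prices.append(weight_on_last_sale * prices[-1] + (1.0 - weight_on_last_sale) * f)
    return prices


fundamental = [1.0] * 6                                # constant fundamental log price
prices = anchored_sale_prices(fundamental, 1.10, 0.7)  # first sale 0.10 too high
errors = [p - f for p, f in zip(prices, fundamental)]
print([round(e, 4) for e in errors])  # each error is 0.7 times the previous one
```

If the fundamental itself moved between sales, the error would also pick up a term proportional to that movement. That is one reason observed errors need not decay smoothly.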
The effort to uncover the fundamental value of a dwelling is further complicated
by the fact that an observed sale price is a function not only of the dwelling’s fundamental value
but also of unobserved buyer and seller characteristics (Quan and Quigley 1991). For
any given sale, all that is known is that an offer was received that was at least as
large as the owner’s reservation price.
This discussion suggests that housing markets are characterized by a costly
matching process: heterogeneous agents on both sides of a transaction involving a
heterogeneous good. The expensive and time-consuming search in which buyers and sellers
engage implies that prices of individual dwellings are determined by a small
number of participants informed by noisy prices from previous sales. In thin markets,
such as the housing market, the fundamental price may never be realized. That is,
the observed sale price may deviate substantially from the one that would be obtained
if housing markets were frictionless. Furthermore, the deviation may persist for an
extended period of time.
The model developed in this paper addresses the price dynamics in housing
markets at the level of the individual dwelling by accounting for two sources of pricing
errors. The first is the error that occurs at the time of sale; it captures the noise
around the true market price that results from unobserved buyer and seller characteristics.
The buyer’s urgency to purchase and the holding costs incurred by
the seller are examples of factors that could cause the price of a particular dwelling
to deviate from its fundamental price.
The second type of error captures persistence in housing returns at the level of
the individual dwelling. In a frictionless housing market, “innovations” would replace
“errors.” These would reflect new information about the market’s valuation of the
dwelling and would be fully and permanently incorporated into the dwelling’s
price. Housing markets are far from frictionless; pricing errors may persist because
the price discovery process is slow and markets fail to perceive them, or because
transaction costs may prevent the realization of apparent gains. This proposition is
addressed more formally in the next section.
1.3 A Model of Persistence in Housing Prices

Let the log sale price of dwelling i at time t be given by
Vit = P t + Q it + £it = P t + X itP + & t, (1 .1 )
where $V_{it}$ is the log of the observed sale price of dwelling $i$ at time $t$, $P_t$ is the log of aggregate housing prices, and $Q_{it}$ is the log of housing quality. Housing quality is parameterized by $X_{it}$, the set of relevant dwelling attributes, and $\beta$, a vector of coefficients from which implicit prices can be derived for each attribute. The stochastic component is a composite error,
$$\xi_{it} = \varepsilon_{it} + \eta_{it}, \tag{1.2}$$
reflecting the two sources of uncertainty in the model discussed above: that which occurs at the time of sale, $\eta_{it}$, and that which persists over time, $\varepsilon_{it}$. $\eta_{it}$ is white noise, with mean 0 and variance $\sigma_\eta^2$. As discussed above, the persistence of pricing errors reflects the process by which the housing market incorporates new information about the market price of a dwelling. The arrival of new information in the form of other dwelling sales will eventually eliminate the previous pricing error. We model this persistence as an autocorrelated process:
$$\varepsilon_{it} = \lambda\varepsilon_{i,t-1} + \mu_{it}, \tag{1.3}$$
where $\mu_{it}$ is distributed with mean 0 and variance $\sigma_\mu^2$. If $|\lambda| < 1$, the first two moments of $\xi_{it}$ are finite and given by
$$E[\xi_{it}] = E[\varepsilon_{it} + \eta_{it}] = E[\lambda\varepsilon_{i,t-1} + \mu_{it} + \eta_{it}] = \sum_{s=0}^{\infty}\lambda^s E[\mu_{i,t-s}] + E[\eta_{it}] = 0 \tag{1.4}$$
and, using $E[\mu_{it}\eta_{j\tau}] = 0$ for all $\{i, j, t, \tau\}$ and $E[\mu_{it}\mu_{j\tau}] = 0$ whenever $i \neq j$ or $t \neq \tau$,
$$E[(\xi_{it})^2] = E[(\varepsilon_{it} + \eta_{it})^2] = \sum_{s=0}^{\infty}\lambda^{2s}E[\mu_{i,t-s}^2] + E[\eta_{it}^2] = \sigma_\mu^2\sum_{s=0}^{\infty}\lambda^{2s} + \sigma_\eta^2 = \frac{\sigma_\mu^2}{1-\lambda^2} + \sigma_\eta^2. \tag{1.5}$$
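Because housing sales are infrequent, the covariance is more usefully defined for general intervals between sales, i.e., for $t > \tau$,
$$\begin{aligned} E[\xi_{it}\xi_{i\tau}] &= E[(\varepsilon_{it} + \eta_{it})(\varepsilon_{i\tau} + \eta_{i\tau})] \\ &= E[\varepsilon_{it}\varepsilon_{i\tau}] + E[\eta_{it}\eta_{i\tau}] \\ &= E\Big[\Big(\lambda^{t-\tau}\varepsilon_{i\tau} + \sum_{j=0}^{t-\tau-1}\lambda^j\mu_{i,t-j}\Big)\varepsilon_{i\tau}\Big] + 0 \\ &= \lambda^{t-\tau}E[\varepsilon_{i\tau}^2] \\ &= \lambda^{t-\tau}\frac{\sigma_\mu^2}{1-\lambda^2}. \end{aligned} \tag{1.6}$$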
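The moments in (1.5) and (1.6) can be checked by simulation. The sketch below is a minimal standard-library version. It assumes Gaussian innovations, although the text specifies only means and variances, and it uses arbitrary illustrative parameter values. The function names are hypothetical. The simulation starts $\varepsilon$ from its stationary distribution, which requires $|\lambda| < 1$, so no burn-in period is needed.

```python
import random


def simulate_composite_errors(lam, s2_mu, s2_eta, n, seed=0):
    """Simulate xi_t = eps_t + eta_t with eps_t = lam * eps_{t-1} + mu_t (eqs. 1.2-1.3).

    Assumes Gaussian mu and eta (an illustrative choice) and |lam| < 1.
    """
    rng = random.Random(seed)
    sd_mu, sd_eta = s2_mu ** 0.5, s2_eta ** 0.5
    # Draw the starting eps from its stationary distribution, so no burn-in is needed.
    eps = rng.gauss(0.0, (s2_mu / (1.0 - lam ** 2)) ** 0.5)
    xs = []
    for _ in range(n):
        eps = lam * eps + rng.gauss(0.0, sd_mu)   # persistent component
        xs.append(eps + rng.gauss(0.0, sd_eta))   # plus white-noise sale error
    return xs


def theoretical_var(lam, s2_mu, s2_eta):
    """Unconditional variance of xi, eq. (1.5)."""
    return s2_mu / (1.0 - lam ** 2) + s2_eta


def theoretical_cov(lam, s2_mu, lag):
    """Covariance of xi at lag t - tau > 0, eq. (1.6); eta drops out."""
    return lam ** lag * s2_mu / (1.0 - lam ** 2)


def sample_autocov(xs, lag):
    """Mean-centred sample autocovariance at the given lag (lag 0 gives the variance)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((xs[k] - m) * (xs[k + lag] - m) for k in range(n - lag)) / (n - lag)
```

With a long simulated series, for example `sample_autocov(simulate_composite_errors(0.9, 1.0, 0.5, 200000, seed=1), 4)`, the estimate should lie close to `theoretical_cov(0.9, 1.0, 4)`.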
In the following analysis, we employ a repeat sales model. We do so for two reasons. The first is that the error structure developed in this section is only identified by multiple observations of sales on the same dwelling. The second is that we want to test the null hypothesis that the autocorrelation coefficient $\lambda$ equals one. If this is true, the derivation of the moments of the error structure presented above is not meaningful, as the infinite series in equations (1.5) and (1.6) do not converge. If prices follow a random walk, the unconditional variance of $\varepsilon_{it}$, and therefore of $\xi_{it}$, does not exist. However, use of a repeat sales model solves this problem, as discussed below.
The repeat sales model is obtained by differencing the hedonic model, equation (1.1):
$$V_{it} - V_{i\tau} = X_{it}\beta + P_t - X_{i\tau}\beta - P_\tau + \xi_{it} - \xi_{i\tau}. \tag{1.7}$$
If dwellings are unchanged between sales,² the model simplifies to
$$V_{it} - V_{i\tau} = P_t - P_\tau + \xi_{it} - \xi_{i\tau}. \tag{1.8}$$
² This assumption is discussed below.
This can be estimated with the regression
$$V_{it} - V_{i\tau} = D_{it\tau}P + \Xi_{it\tau}, \tag{1.9}$$
where $D_{it\tau}$ is a matrix of dummy variables indicating time of sale, whose coefficients are the log price levels $P$. $D_{it\tau}$ takes $-1$ at the time of the first sale, $+1$ at the second, and 0 elsewhere. $\Xi_{it\tau}$ is the differenced stochastic term, $\xi_{it} - \xi_{i\tau}$.
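The dummy structure of $D_{it\tau}$ can be built row by row, as in the minimal Python sketch below. The function names are hypothetical. Every row sums to zero, so the price level is identified only up to a constant. Dropping the first period's column, which normalizes $P_0 = 0$, is a conventional choice assumed here; the text does not state which normalization is used.

```python
from typing import List, Sequence, Tuple


def sale_dummies(first: int, second: int, n_periods: int) -> List[int]:
    """One row of D in eq. (1.9): -1 in the period of the first sale, +1 in the second."""
    if not 0 <= first < second < n_periods:
        raise ValueError("need 0 <= first < second < n_periods")
    row = [0] * n_periods
    row[first] = -1
    row[second] = 1
    return row


def design_matrix(pairs: Sequence[Tuple[int, int]], n_periods: int) -> List[List[int]]:
    """Stack one dummy row per paired sale, dropping period 0 (assumed normalization P_0 = 0)."""
    return [sale_dummies(first, second, n_periods)[1:] for first, second in pairs]
```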
Note that in this form, the unconditional variance of the stochastic term exists even if the error process follows a random walk. For finite intervals, the variance of the error term $\Xi_{it\tau}$ is linear in the time between sales when $\lambda = 1$. That is, for two sales at $t$ and $\tau$,
$$E[(\Xi_{it\tau})^2 \mid t - \tau < \infty] = \sigma_\mu^2(t - \tau) + 2\sigma_\eta^2. \tag{1.10}$$
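The covariance matrix associated with the repeat sales regression, equation (1.9), is block diagonal, with each block associated with multiple repeat sales of an unchanged dwelling. The general form of the covariance matrix is
$$E[\Xi_{it\tau}\Xi_{jg\gamma}] = \begin{cases} \dfrac{\sigma_\mu^2}{1-\lambda^2}\left(\lambda^{|t-g|} - \lambda^{|t-\gamma|} - \lambda^{|\tau-g|} + \lambda^{|\tau-\gamma|}\right) + \sigma_\eta^2\left(I_{tg} - I_{t\gamma} - I_{\tau g} + I_{\tau\gamma}\right) & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{1.11}$$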
The indicator variables $I_{jk}$ equal 1 if $j = k$ and 0 otherwise. This general formulation can be expressed more succinctly by considering the three types of elements found in the covariance matrix. The diagonal elements, the variances of each draw of $\Xi_{it\tau}$ (where $t = g$ and $\tau = \gamma$), are given by
$$V[(\xi_{it} - \xi_{i\tau}), (\xi_{it} - \xi_{i\tau})] = \frac{2\sigma_\mu^2}{1-\lambda^2}\left(1 - \lambda^{t-\tau}\right) + 2\sigma_\eta^2. \tag{1.12}$$
Where unchanged dwellings are sold three or more times, there exist “adjacent” paired sales,³ whose covariances are given by
$$V[(\xi_{it} - \xi_{i\tau}), (\xi_{i\tau} - \xi_{i\gamma})] = \frac{\sigma_\mu^2}{1-\lambda^2}\left(\lambda^{t-\tau} - \lambda^{t-\gamma} - 1 + \lambda^{\tau-\gamma}\right) - \sigma_\eta^2. \tag{1.13}$$
³ Adjacent paired sales are those which share an individual sale. That is, if dwelling $i$ sells three times ($t$, $\tau$, $\gamma$), then the first paired sale consists of the sales at times $t$ and $\tau$, while the second results from the two sales at $\tau$ and $\gamma$. They are adjacent in the sense that both pairs share the observation at time $\tau$.
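Finally, where four or more sales occur, a third type of covariance element exists: “non-adjacent” paired sales. In these cases, there is no common individual sale observation in either paired sale. These elements are defined by⁴
$$V[(\xi_{it} - \xi_{i\tau}), (\xi_{ig} - \xi_{i\gamma})] = \frac{\sigma_\mu^2}{1-\lambda^2}\left(\lambda^{|t-g|} - \lambda^{|t-\gamma|} - \lambda^{|\tau-g|} + \lambda^{|\tau-\gamma|}\right). \tag{1.14}$$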
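The three element types in (1.12) to (1.14) are special cases of (1.11). The minimal Python sketch below therefore builds one dwelling's covariance block directly from (1.11). It assumes two things. First, paired sales are formed from consecutive sales of the same dwelling, as in footnote 3. Second, $|\lambda| < 1$; the $\lambda = 1$ case corresponds to the limit in (1.10). The function names are hypothetical.

```python
def _ind(a, b):
    """Indicator I_ab from eq. (1.11)."""
    return 1.0 if a == b else 0.0


def pair_cov(t, tau, g, gamma, lam, s2_mu, s2_eta):
    """Covariance of paired sales (t, tau) and (g, gamma) of the same dwelling, eq. (1.11)."""
    ar_part = (lam ** abs(t - g) - lam ** abs(t - gamma)
               - lam ** abs(tau - g) + lam ** abs(tau - gamma))
    noise_part = _ind(t, g) - _ind(t, gamma) - _ind(tau, g) + _ind(tau, gamma)
    return s2_mu / (1.0 - lam ** 2) * ar_part + s2_eta * noise_part


def dwelling_block(sale_times, lam, s2_mu, s2_eta):
    """Covariance block for one dwelling: one row/column per consecutive paired sale.

    sale_times must be sorted; each pair is (later sale, earlier sale), matching V_it - V_itau.
    """
    pairs = [(sale_times[k + 1], sale_times[k]) for k in range(len(sale_times) - 1)]
    return [[pair_cov(t, tau, g, gamma, lam, s2_mu, s2_eta) for (g, gamma) in pairs]
            for (t, tau) in pairs]
```

For a dwelling sold four times, the resulting 3 by 3 block has diagonal entries of the form (1.12), adjacent entries of the form (1.13), and one pair of non-adjacent entries of the form (1.14). Blocks of different dwellings are not correlated with each other.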
It should be noted that, if indeed $\lambda$ does equal one, then the model developed above collapses into the “weighted repeat sales” model proposed by Case and Shiller (1987) and widely employed in academic research.⁵ The weighted repeat sales model uses assumptions about the error structure in house prices to generate efficient parameter estimates of the effect of time on aggregate housing price levels. Case and Shiller argue that there is a drift in housing “value,” that it follows a Gaussian random walk, and that the variance of housing prices is therefore linear in the time between sales. The weighted repeat sales procedure, as typically implemented, makes this explicit assumption about the form of heteroskedasticity in addition to an implied assumption concerning the covariance between any two paired sales. Specifically, it is assumed that the covariances are zero everywhere. Neither of these assumptions has been tested formally. However, because the “weighted repeat sales” method is a special case of the more general model developed above, a joint test of its maintained hypotheses can be developed by testing the null hypothesis that $\lambda = 1$ against $\lambda \neq 1$.
⁴ The derivation of the general form of the covariance matrix is straightforward. Consider $Var[X \pm Y] = Var[X] + Var[Y] \pm 2\,Cov[X, Y]$, or $Cov[X, Y] = \frac{1}{2}\left(Var[X + Y] - Var[X] - Var[Y]\right)$. Substituting $\xi_{it} - \xi_{i\tau}$ for $X$ and $\xi_{jg} - \xi_{j\gamma}$ for $Y$, we get
$$Cov[(\xi_{it} - \xi_{i\tau}), (\xi_{jg} - \xi_{j\gamma})] = \frac{1}{2}\left(Var[\xi_{it} - \xi_{i\tau} + \xi_{jg} - \xi_{j\gamma}] - Var[\xi_{it} - \xi_{i\tau}] - Var[\xi_{jg} - \xi_{j\gamma}]\right).$$
By assumption, the covariance is 0 across units ($i \neq j$), and the stochastic terms have mean zero. Therefore, only within-unit covariation is considered, and the unit subscript is dropped below. Note that the covariance matrix is block diagonal under the assumption that the errors are neither spatially nor temporally correlated across dwellings. This may be a strong assumption, but it is routinely made; see Goetzmann and Wachter (1995) for a discussion of this assumption. The general covariation equation is solved in parts, using the elements of the covariance matrix, equations (1.5) and (1.6), developed above:
$$\begin{aligned} Var[\xi_t - \xi_\tau + \xi_g - \xi_\gamma] &= E[\xi_t^2] + E[\xi_\tau^2] + E[\xi_g^2] + E[\xi_\gamma^2] \\ &\quad + 2\left(-E[\xi_t\xi_\tau] + E[\xi_t\xi_g] - E[\xi_t\xi_\gamma] - E[\xi_\tau\xi_g] + E[\xi_\tau\xi_\gamma] - E[\xi_g\xi_\gamma]\right) \\ &= \frac{4\sigma_\mu^2}{1-\lambda^2} + 4\sigma_\eta^2 + \frac{2\sigma_\mu^2}{1-\lambda^2}\left(-\lambda^{|t-\tau|} + \lambda^{|t-g|} - \lambda^{|t-\gamma|} - \lambda^{|\tau-g|} + \lambda^{|\tau-\gamma|} - \lambda^{|g-\gamma|}\right) \\ &\quad + 2\sigma_\eta^2\left(-I_{t\tau} + I_{tg} - I_{t\gamma} - I_{\tau g} + I_{\tau\gamma} - I_{g\gamma}\right), \\ Var[\xi_t - \xi_\tau] &= E[\xi_t^2] + E[\xi_\tau^2] - 2E[\xi_t\xi_\tau] = \frac{2\sigma_\mu^2}{1-\lambda^2} + 2\sigma_\eta^2 - \frac{2\sigma_\mu^2\lambda^{|t-\tau|}}{1-\lambda^2}, \\ Var[\xi_g - \xi_\gamma] &= E[\xi_g^2] + E[\xi_\gamma^2] - 2E[\xi_g\xi_\gamma] = \frac{2\sigma_\mu^2}{1-\lambda^2} + 2\sigma_\eta^2 - \frac{2\sigma_\mu^2\lambda^{|g-\gamma|}}{1-\lambda^2}. \end{aligned}$$
Since the two sales in a pair occur at different dates ($I_{t\tau} = I_{g\gamma} = 0$), substituting these three expressions into the covariance identity above yields (1.11).
⁵ To see this, reconsider the structure of the repeat sales model developed above. Specifically, let $\lambda = 1$ in the repeat sales regression, that is,
$$\begin{aligned} V_{it} - V_{i\tau} &= P_t - P_\tau + \varepsilon_{it} - \varepsilon_{i\tau} + \eta_{it} - \eta_{i\tau} \\ &= P_t - P_\tau + (\lambda^{t-\tau} - 1)\varepsilon_{i\tau} + \sum_{j=0}^{t-\tau-1}\lambda^j\mu_{i,t-j} + \eta_{it} - \eta_{i\tau} \\ &= P_t - P_\tau + \sum_{j=0}^{t-\tau-1}\mu_{i,t-j} + \eta_{it} - \eta_{i\tau}. \end{aligned} \tag{1.15}$$
This equation is the same regression employed, either directly or indirectly, in hundreds of academic papers and is the same technique utilized by the Office of Federal Housing Enterprise Oversight (OFHEO) in the construction of their aggregate price indexes. The variance structure of the weighted repeat sales method can also be derived from the more general model developed above. Examine the definition of the diagonal elements of the covariance matrix, (1.12). In the limit as $\lambda$ approaches 1, these become $\sigma_\mu^2(t - \tau) + 2\sigma_\eta^2$, exactly equal to the theoretical variance which is the basis for the weighted repeat sales method.
1.4 The Data

The data utilized in this paper have been compiled by Statistics Sweden and consist of all arm’s-length sales of single family housing in Sweden from 1981:I to 1993:III. The data are unique both in their breadth (each housing sale in Sweden is recorded) and in their detail, with an extensive array of physical characteristics reported. The research reported below employs a subset of the data, focusing on those four of the eight administrative regions in which these data are recorded and which are primarily “urban.” These four metropolitan regions contain each of the major Swedish urban areas; they are referred to by the most populous city in each region.
The metropolitan regions are Gothenburg, Malmo, Stockholm, and Uppsala.⁶ The data are discussed in detail in Appendix A, but several features of the data central to this analysis warrant discussion here.
First, the detailed set of characteristics that describe the physical structure makes it possible to verify the central assumption of repeat sales models of housing prices, which is that twice-sold dwellings are identical in attributes at each time of sale. In practice, this assumption is difficult to verify, but the attributes reported in the data set make this possible. The data contain not only primary characteristics such as lot size and living area, but also additional variables that offer a comprehensive description of the dwelling, including the number of garages, kitchen quality, type of wall, roof, and floor, and the presence of amenities such as sauna, fireplace, and furnished basement. The list is extensive (in excess of thirty housing characteristics) and enables the detection of even minor structural changes that would invalidate the assumption of constant quality.⁷ The research presented in this paper uses only those repeat sales of homes which are verified to remain unchanged across the measured characteristics.
Second, the population of dwellings is a panel, so that units that sold more than once can be distinguished from those that did not. The panel nature of the data identifies the appropriate specification of the model developed in Section 1.3.
Table 1.1 illustrates the extent to which multiple (unchanged) paired sales are observed during the sample period for each of the four metropolitan regions. It shows that while the large majority of repeat sale dwellings sold twice during the sample period, a significant minority were sold three or more times. Table 1.1 also presents the average time interval between sales, and the mean and standard deviation of nominal price appreciation for each of the sale frequencies. The relationship between the interval of time separating two sales and the variance of prices is one of the central empirical questions addressed in this paper and will be revisited below.
⁶ I am warned by a former resident of Norrkoping, one of several prominent cities in the “East Middle Sweden” region, that I will offend someone regardless of which city I choose as the reference. I use population as an objective standard for size, and do so because Uppsala is a less clumsy reference than “East Middle Sweden.”
⁷ Englund, Quigley, and Redfearn (1999a) examine in detail the validity of the constant quality assumption. They find that it does not hold in general, and that failure to exclude altered dwellings can lead to substantial bias in measured aggregate housing prices. Approximately 40% of all paired sales in the raw data had to be removed as a result of measured change in the physical attributes of the dwelling between sales.
Table 1.1: The Distribution of Paired Sales

Number of    Frequency    Average Interval    Mean Price      Std. Dev. of Price
Paired Sales (percent)    Between Sales       Appreciation    Appreciation
                          (quarters)          (percent)       (percent)

A. Stockholm (7677 observations)
1            87.82        12.81               25.26           32.01
2            10.51        10.70               22.04           28.49
3            1.45         9.06                19.67           27.21
4            0.18         7.54                15.84           21.23
5            0.03         6.40                14.65           17.52
6            0.01         3.83                5.48            11.27
B. Uppsala (11023 observations)
1            86.44        12.77               18.31           23.75
2            11.70        10.41               15.44           20.71
3            1.38         8.32                13.71           18.97
4            0.31         6.79                12.27           15.77
5            0.11         6.38                11.05           14.90
6            0.05         4.22                8.13            12.73
7            0.01         5.71                11.36           10.26
C. Malmo
1            87.39        12.07               19.65           27.04
2            11.34        9.97                17.06           22.99
3            1.10         8.47                14.93           21.60
4            0.14         7.93                11.60           17.77
5            0.03         5.00                6.76            11.53
D. Gothenburg
1            86.95        11.84               19.53           25.84
2            11.40        9.81                15.97           21.56
3            1.47         7.89                12.55           18.45
4            0.16         6.06                8.27            16.90
5            0.01         4.60                2.74            7.12
6            0.01         5.00                10.32           21.43
The table shows that the time between sales, the mean price appreciation, and the variance of housing price appreciation exhibit strong tendencies to decline as the number of sales increases. Of course, the average time between sales declines with the number of sales, since the sample period is fixed. The majority of the sales during the sample period are drawn from a period of rising nominal housing prices, so it is not surprising that total appreciation is higher when the average time between sales is longer. However, the standard deviation of price appreciation shows a consistent reduction in price volatility as the interval between sales decreases. This is consistent with the hypothesis that heteroskedasticity in housing price appreciation is a function of the time interval between sales of the same dwelling.
The table also indicates that while the distribution of paired sales and their associated time intervals are similar between Stockholm and the other regions, both the mean and standard deviation of price appreciation are higher in Stockholm than in the other three regions, suggesting that price appreciation is not uniform across regions.
1.5 Do Housing Prices Follow a Random Walk?

The model presented above supports an explicit test of the hypothesis that individual housing prices follow a random walk against the alternative hypothesis that they follow a mean reverting process. The test is implemented simply by estimating the model developed in Section 1.3 using generalized least squares over a range of values for the autocorrelation coefficient, $\lambda$. For each $\lambda$, the concentrated log-likelihood is calculated using
$$\log L = -\frac{n}{2}\log\left(e_{gls}'\Psi^{-1}e_{gls}\right) - \frac{1}{2}\log|\Psi|, \tag{1.16}$$
where $n$ is the number of observations and $e_{gls}$ is the vector of residuals from a generalized least squares regression using $\Psi$ as the estimated covariance matrix described by (1.11). Well-defined probabilistic statements about $\lambda$ can be made using likelihood-ratio tests.
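Because $\Psi$ is block diagonal, with one block per dwelling, both terms of (1.16) can be accumulated block by block. A Cholesky factor $L$ of each block gives $e'\Psi^{-1}e = z'z$ with $Lz = e$, and $\log|\Psi| = 2\sum \log L_{kk}$. The standard-library sketch below assumes that the GLS residuals and the covariance blocks have already been computed elsewhere, for example from (1.11) for a given $\lambda$. As in (1.16), additive constants are dropped. The function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L' = a, for a small symmetric positive-definite block."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("block is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def concentrated_loglik(resid_blocks, psi_blocks):
    """Eq. (1.16) without constants: -(n/2) log(e' Psi^-1 e) - (1/2) log|Psi|.

    resid_blocks[d] and psi_blocks[d] hold the GLS residuals and covariance block of dwelling d.
    """
    n, quad, logdet = 0, 0.0, 0.0
    for e, psi in zip(resid_blocks, psi_blocks):
        L = cholesky(psi)
        z = forward_solve(L, e)                     # e' Psi^-1 e = z'z within this block
        quad += sum(v * v for v in z)
        logdet += 2.0 * sum(math.log(L[k][k]) for k in range(len(L)))
        n += len(e)
    return -0.5 * n * math.log(quad) - 0.5 * logdet
```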
As discussed in Section 1.3, the covariance matrix predicted by (1.11) is block diagonal, with the dimension of the blocks determined by the number of sales of an unchanged dwelling. The elements of the covariance matrix depend on the time between sales, the value of the autocorrelation coefficient, and the variances of the pricing errors defined in (1.1), $\sigma_\mu^2$ and $\sigma_\eta^2$. Consistent estimates of these error variances are obtained by estimating the following stacked regression:
$$e_{it\tau}\,e_{ig\gamma} = \begin{cases} 2\sigma_\eta^2 + h(t,\tau,g,\gamma)\,\sigma_\mu^2 & \text{if } t = g,\ \tau = \gamma \text{ (diagonal elements)}, \\ -1\,\sigma_\eta^2 + h(t,\tau,g,\gamma)\,\sigma_\mu^2 & \text{if } \tau = g \text{ (off-diagonal elements)}, \\ 0\,\sigma_\eta^2 + h(t,\tau,g,\gamma)\,\sigma_\mu^2 & \text{otherwise (off-off-diagonal elements)}. \end{cases} \tag{1.17}$$
The errors $e_{it\tau}$ and $e_{ig\gamma}$ are obtained from the appropriate elements of the outer product $e \cdot e'$, where $e$ is the vector of residuals from a first-step regression. $h(t,\tau,g,\gamma) = \left(\lambda^{|t-g|} - \lambda^{|t-\gamma|} - \lambda^{|\tau-g|} + \lambda^{|\tau-\gamma|}\right)/(1-\lambda^2)$ is the $\lambda$-dependent part of the corresponding element of the covariance matrix given by (1.11), defining the expected covariance between two paired sales of the same dwelling.
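For a given $\lambda$, the stacked regression (1.17) is an OLS regression without an intercept of residual cross-products on two regressors. The coefficient on the first regressor, which takes the values 2, −1, or 0, estimates $\sigma_\eta^2$. The coefficient on $h(t,\tau,g,\gamma)$ estimates $\sigma_\mu^2$. The minimal Python sketch below solves the resulting 2 by 2 normal equations. It assumes that each observation has been tagged with its two paired sales. The function name is hypothetical.

```python
def _ind(a, b):
    return 1.0 if a == b else 0.0


def stacked_variance_regression(obs, lam):
    """Estimate (s2_eta, s2_mu) from eq. (1.17) by OLS without intercept.

    obs: iterable of ((t, tau), (g, gamma), product), where product is the cross-product
    e_{i t tau} * e_{i g gamma} of first-step residuals for two paired sales of one dwelling.
    """
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [0.0, 0.0]
    for (t, tau), (g, gamma), y in obs:
        # Coefficient on s2_eta: 2 on the diagonal, -1 for adjacent pairs, 0 otherwise.
        c_eta = _ind(t, g) - _ind(t, gamma) - _ind(tau, g) + _ind(tau, gamma)
        h = (lam ** abs(t - g) - lam ** abs(t - gamma)
             - lam ** abs(tau - g) + lam ** abs(tau - gamma)) / (1.0 - lam ** 2)
        x = (c_eta, h)
        for r in range(2):
            sxy[r] += x[r] * y
            for c in range(2):
                sxx[r][c] += x[r] * x[c]
    # Cramer's rule for the 2 x 2 normal equations.
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    s2_eta = (sxx[1][1] * sxy[0] - sxx[0][1] * sxy[1]) / det
    s2_mu = (sxx[0][0] * sxy[1] - sxx[1][0] * sxy[0]) / det
    return s2_eta, s2_mu
```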
This regression differs from the typical implementation of conventional repeat sales models. The “weighted repeat sales” method imposes two assumptions on the form of the covariance matrix: housing prices follow a random walk, and the covariance between any two paired sales is zero. The covariance matrix is, under these assumptions, diagonal with the variance for each paired sale defined as a linear function of the time interval between sales. The model developed above predicts nonzero off-diagonal elements and improves identification of the error variances by including them in the stacked regression.
The estimation procedure outlined above evaluates the log-likelihood function as a function of the serial correlation coefficient, $\lambda$. Obtaining $\lambda_{max}$, the value of the correlation coefficient that maximizes the likelihood function, equation (1.16), follows directly. Figure 1.1 illustrates the nature of the maximization problem.
The upper panel of Figure 1.1 shows the sample error variances and the relative frequency of observations by elapsed time between sales. Two things are clear from this panel. First, the error variances are an increasing function of the time between sales. Second, the large majority of observations are paired sales whose two sales occur less than six years apart. The large sample variances observed in the longer intervals are poorly estimated as a result of the small number of observations.
The covariance matrix described in (1.11) defines the predicted variance of price appreciation as a function of two variables: the elapsed time between sales and the correlation of the errors. This relationship is illustrated in the lower panel of Figure 1.1. It shows the sample error variances from Gothenburg superimposed against lines mapping specific values of $\lambda$ and time between sales into error variances predicted by the covariance matrix developed above, equation (1.12).
Clearly, the three lines with positive serial correlation fit the data far better than does the line implied by no serial correlation. In this case, where $\lambda = 0$, the variance is constant in the time interval between sales, a condition clearly violated by the sample variances shown in Figure 1.1. The line associated with the random walk increases linearly with the elapsed time between sales, as it would if the housing market permanently incorporated pricing errors.
The two lines representing intermediate values of serial correlation rise asymptotically toward a level given by the standard definition of the (unconditional) variance of an autocorrelated process. That is, if the errors follow (1.3),
$$\varepsilon_{it} = \lambda\varepsilon_{i,t-1} + \mu_{it},$$
where $\mu_{it}$ is distributed with mean zero and variance $\sigma_\mu^2$ and where $|\lambda| < 1$, the unconditional variance of $\varepsilon$ is
$$E[(\varepsilon_{it})^2] = \frac{\sigma_\mu^2}{1-\lambda^2}.$$
This implies that, for any paired sale, the variance of price appreciation approaches $2\sigma_\mu^2/(1-\lambda^2) + 2\sigma_\eta^2$ as the interval between sales grows, a level that is an increasing function of the correlation coefficient. An autocorrelated process with correlation coefficient $\lambda > 0$ is consistent with pricing errors for a particular dwelling persisting as a function of the arrival and incorporation of new information about housing prices.
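The predicted-variance lines in the lower panel of Figure 1.1 can be traced from (1.12) with a short function. The minimal Python sketch below uses the $\lambda \to 1$ limit $\sigma_\mu^2 n + 2\sigma_\eta^2$ at $\lambda = 1$, which gives the random-walk line. The function names are hypothetical. The variance values in the usage line are placeholders, not the estimates behind the figure.

```python
def predicted_pair_variance(n, lam, s2_mu, s2_eta):
    """Variance of price appreciation for two sales n quarters apart, eq. (1.12).

    At lam == 1 the limit s2_mu * n + 2 * s2_eta is used (random walk, eq. 1.10).
    """
    if lam == 1.0:
        return s2_mu * n + 2.0 * s2_eta
    return 2.0 * s2_mu * (1.0 - lam ** n) / (1.0 - lam ** 2) + 2.0 * s2_eta


def asymptote(lam, s2_mu, s2_eta):
    """Level approached as n grows when |lam| < 1; it rises with lam."""
    return 2.0 * s2_mu / (1.0 - lam ** 2) + 2.0 * s2_eta


# Curves analogous to the four lines in the figure (placeholder variances).
curves = {lam: [predicted_pair_variance(n, lam, 0.001, 0.01) for n in range(1, 41)]
          for lam in (0.0, 0.85, 0.95, 1.0)}
```

At $\lambda = 0$ the curve is flat at $2\sigma_\mu^2 + 2\sigma_\eta^2$. At intermediate values it bends toward `asymptote(lam, ...)`, and at $\lambda = 1$ it is a straight line.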
Figure 1.1: Actual and Predicted Variances
[Upper panel: Sample Variance as a Function of Time Between Sales - Gothenburg. Series: sample variance; relative frequency of observations. Axes: Time Between Sales; Variance.]
[Lower panel: Predicted Variance as a Function of Time Between Sales and Correlation Coefficient. Series: sample variance; lambda = 0.00, 0.85, 0.95, 1.00. Axes: Time Between Sales; Variance.]
Table 1.2: Tests for a Random Walk in Housing Prices

                              Stockholm   Uppsala   Malmo     Gothenburg
Observations                  7767        11203     10720     11460
λ_max                         0.860       0.930     0.912     0.912
Likelihood ratio (χ²) tests:
  H0: λ = 1                   1340.1      743.5     903.9     1155.5
  H0: λ = 0                   513.2       1977.7    1576.8    2053.7

Critical values (1 degree of freedom): χ²(.005) = 7.88, χ²(.010) = 6.63, χ²(.050) = 3.84
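The tests in Table 1.2 compare the maximized concentrated log-likelihood with its value at a restricted $\lambda$. The minimal Python sketch below shows this step. It assumes that the log-likelihood has already been evaluated on a grid of $\lambda$ values, for example with (1.16), and it uses the χ² critical values for one degree of freedom listed under the table. The function names are hypothetical.

```python
# Chi-square critical values with one degree of freedom, as listed under Table 1.2.
CHI2_1 = {0.05: 3.84, 0.01: 6.63, 0.005: 7.88}


def grid_max(loglik_by_lam):
    """Return (lam_max, loglik_max) from a dict mapping lambda -> concentrated log-likelihood."""
    lam_max = max(loglik_by_lam, key=loglik_by_lam.get)
    return lam_max, loglik_by_lam[lam_max]


def lr_test(loglik_by_lam, lam_null, alpha=0.01):
    """LR statistic 2 * (l(lam_max) - l(lam_null)) and whether it rejects at level alpha."""
    _, l_max = grid_max(loglik_by_lam)
    stat = 2.0 * (l_max - loglik_by_lam[lam_null])
    return stat, stat > CHI2_1[alpha]
```

A grid search only locates $\lambda_{max}$ to the resolution of the grid, so the grid used for a statistic like those reported should be fine near the maximum.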
Results of the estimation procedure for the four metropolitan regions are plotted in Figures 1.2 and 1.3. The results offer strong support for positive serial correlation, with the maximum value of the likelihood function occurring at $\lambda$ approximately equal to 0.9 in all four regions. The figures also indicate that neither the random walk hypothesis, imposed by the weighted repeat sales method, nor the case of no serial correlation is supported by the data.
Formal tests of these conclusions are reported in Table 1.2. In all four regions, both null hypotheses, $H_0: \lambda = 0$ and $H_0: \lambda = 1$, are rejected at the one percent level in favor of $\lambda = \lambda_{max}$, the serial correlation coefficient that maximizes the log-likelihood function. These results indicate that the housing market removes pricing errors quite slowly. A quarterly serial correlation coefficient of 0.9 implies that if a dwelling is sold with a pricing error, the market will eliminate only about one-third of the error over the course of a year ($0.9^4 \approx 0.66$ remains). That is, if a dwelling sells twice, with a year between sales, the expected sale price at the second sale will differ from its “fundamental” price by about two-thirds of the previous pricing error. After two years just under half of the error will remain. The initial error will not fade to less than ten percent of its initial value for almost six years.
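The decay arithmetic in the preceding paragraph follows from the expected pricing error shrinking by a factor $\lambda$ each quarter. The short Python sketch below reproduces it for $\lambda = 0.9$. The function names are hypothetical. Solving $\lambda^q < 0.1$ gives $q > \log(0.1)/\log(0.9) \approx 21.9$, so 22 quarters are needed, about five and a half years.

```python
import math


def share_remaining(lam, quarters):
    """Fraction of an initial pricing error expected to remain after `quarters` quarters."""
    return lam ** quarters


def quarters_until(lam, threshold):
    """Smallest whole number of quarters after which less than `threshold` of the error remains.

    lam**q < threshold  <=>  q > log(threshold) / log(lam), for 0 < lam < 1.
    """
    return math.floor(math.log(threshold) / math.log(lam)) + 1


for q in (4, 8):
    print(q, round(share_remaining(0.9, q), 3))   # 4 0.656, then 8 0.43
print(quarters_until(0.9, 0.10))                 # 22
```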
Figure 1.2: Likelihood Functions - Stockholm and Uppsala
[Panels: Maximum Likelihood Estimation of Serial Correlation Coefficient - Stockholm; Maximum Likelihood Estimation of Serial Correlation Coefficient - Uppsala. Axes: Serial Correlation Coefficient (lambda); Value of Log-Likelihood.]
Figure 1.3: Likelihood Functions - Malmo and Gothenburg
[Panels: Maximum Likelihood Estimation of Serial Correlation Coefficient - Malmo; Maximum Likelihood Estimation of Serial Correlation Coefficient - Gothenburg. Axes: Serial Correlation Coefficient (lambda); Value of Log-Likelihood.]
1.6 The Predictability of the Returns to Housing

Because the large majority of previous literature on housing price dynamics is based on aggregate measures of housing prices, it is instructive to establish a link between the findings of existing research and the results of the last section. This is done in two ways. First, the relationship between persistence in individual dwelling prices and aggregate housing prices is demonstrated. Then, following previous research, the predictability of three aggregate return series is formally tested.
It is straightforward to compute the effect of pricing errors at the individual dwelling level upon aggregate housing prices. Consider an economy of identical and unchanging dwellings. Let housing quality be normalized to one so that the log of quality, $X_{it}\beta$, is zero for all $i$ and $t$. Let an estimate of the housing price level be the mean of the sold dwellings in each period. Without dwelling heterogeneity there is no need to control for quality. That is, let
$$\hat{P}_t = n^{-1}\sum_i V_{it} = n^{-1}\left(\sum_i P_t + \sum_i \varepsilon_{it} + \sum_i \eta_{it}\right). \tag{1.18}$$
The true housing price index, $P_t$, is a constant, and the mean of the white-noise errors, $\eta_{it}$, is zero. Denote the average of the autocorrelated errors as
$$\bar{\varepsilon}_t = n^{-1}\sum_i \varepsilon_{it}. \tag{1.19}$$
The estimator of aggregate prices becomes
$$\hat{P}_t = P_t + \bar{\varepsilon}_t. \tag{1.20}$$
Unconditionally, $\hat{P}_t$ is unbiased, since $E[\bar{\varepsilon}_t] = 0$. However, in the case of an aggregate shock to prices one period earlier, $\bar{\varepsilon}_{t-1} = e > 0$, the aggregate price estimator is no longer unbiased. The expected mismeasurement is
$$E[\hat{P}_t \mid \bar{\varepsilon}_{t-1} = e] = P_t + E[\bar{\varepsilon}_t \mid \bar{\varepsilon}_{t-1} = e] = P_t + \lambda e. \tag{1.21}$$
Moreover, aggregate pricing errors will lead to correlation in aggregate returns to housing for many periods if, as indicated above, $\lambda$ is close to one.
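In the stylized economy of (1.18) to (1.21), the true index is constant. Any measured return therefore comes only from the decaying error: $k$ periods after the shock, the expected bias is $\lambda^k e$ and the expected measured return is $\lambda^k e - \lambda^{k-1} e = -(1-\lambda)\lambda^{k-1} e$. The minimal Python sketch below traces this path. The function names are hypothetical. Each expected return is a fixed fraction $\lambda$ of the previous one, so in this setting lagged measured returns forecast future measured returns for many periods.

```python
def bias_path(lam, shock, periods):
    """Expected error E[P_hat - P] for k = 0..periods after an aggregate shock (k = 0 is t-1)."""
    return [lam ** k * shock for k in range(periods + 1)]


def spurious_returns(lam, shock, periods):
    """Expected measured return in each period after the shock, with the true index constant."""
    path = bias_path(lam, shock, periods)
    return [path[k] - path[k - 1] for k in range(1, len(path))]


# With lam = 0.9, a shock of 0.05 produces a long run of small, predictable negative returns.
returns_after_shock = spurious_returns(0.9, 0.05, 8)
```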
In order to test this proposition, three return series are calculated and their predictability is examined using the data described above. The first series considered is the nominal return due to capital appreciation in housing prices. Price appreciation, $\pi_t$, is simply
$$\pi_t = \frac{P_t}{P_{t-1}} - 1, \tag{1.22}$$
where $P_t$ is the index of aggregate housing prices.
The second series is real returns, which include not only the return due to price appreciation but also the “dividend” from the implicit rent paid to owner-occupiers. This total return is adjusted for the change in the cost of living index (less shelter). The real return at time $t$, $r_t$, is given by
$$r_t = \frac{P_t + R_t}{P_{t-1}} - \frac{CPI_t}{CPI_{t-1}}, \tag{1.23}$$
where $R_t$ and $CPI_t$ are, respectively, the implicit rent and cost of living indexes at time $t$.
The third series is the excess return to home ownership. This series, denoted by $ER_t$, measures the nominal returns to housing in excess of the home owner’s cost of housing capital, or
$$ER_t = \frac{P_t + R_t}{P_{t-1}} - 1 - \left[(1 - \alpha_y)(i + \alpha_p) + \delta\right], \tag{1.24}$$
where $\alpha_y$ and $\alpha_p$ are the marginal income tax rate and the property tax rate, respectively, $\delta$ is the rate of depreciation, and $i$ is the mortgage rate. The bracketed term is the home owner’s cost of capital, and reflects the tax-advantaged status of housing (Poterba 1984): both the interest costs and property taxes are deductible from pre-tax income. Note that the user cost of capital does not include an adjustment for risk. This is significant for the interpretation of the results and is discussed below.
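The three return definitions can be written as small functions, as in the minimal Python sketch below. The function names are hypothetical. The sketch assumes that all rates are expressed per period, consistently with the quarterly index. It also assumes the reading of (1.24) given above, in which the implicit rent enters the housing return. The parameter values in the usage line are illustrative only. The 40 percent marginal tax rate and the property tax of 1.5 percent on an assessment at 75 percent of market price are those assumed later in this section; the mortgage rate and the depreciation rate $\delta$ are placeholders, since no value for $\delta$ is given.

```python
def price_appreciation(p, p_prev):
    """Nominal capital gain, eq. (1.22)."""
    return p / p_prev - 1.0


def real_return(p, p_prev, rent, cpi, cpi_prev):
    """Capital gain plus implicit rent, less inflation, eq. (1.23)."""
    return (p + rent) / p_prev - cpi / cpi_prev


def user_cost(i, alpha_y, alpha_p, delta):
    """Home owner's cost of capital: the bracketed term in eq. (1.24)."""
    return (1.0 - alpha_y) * (i + alpha_p) + delta


def excess_return(p, p_prev, rent, i, alpha_y, alpha_p, delta):
    """Nominal housing return in excess of the user cost, eq. (1.24) as reconstructed."""
    return (p + rent) / p_prev - 1.0 - user_cost(i, alpha_y, alpha_p, delta)


# Effective property tax on market value: 1.5% of an assessment set at 75% of price.
alpha_p = 0.015 * 0.75
cost = user_cost(i=0.08, alpha_y=0.40, alpha_p=alpha_p, delta=0.02)  # placeholder i, delta
```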
The aggregate housing price index employed in this section is constructed using the hybrid model developed in a companion paper, Englund, Quigley, and Redfearn (1998), and reviewed in Appendix B. The hybrid method uses all available sales information to construct the aggregate price index. Not only are the large majority of housing transactions during the sample period single sales, but many of those which do sell more than once are altered between sales and, for the purposes of measuring aggregate prices, are no longer considered repeat sales. For the Swedish data used in this paper, the usable data sets differ in size by approximately a factor of five.⁸
Moreover, the appropriate approach for testing the predictability of returns is to approximate, as closely as possible, the information set available to the investor at each point in time. The small sample problems inherent in repeat sales models are exacerbated over short intervals.⁹ In order to avoid this problem, typically the aggregate housing price index is estimated for the entire sample period and it is assumed that investors observe prices only up to the time of their decision. However, this approach does not address the fact that the indexes are conditional on all repeat sales over the entire sample period. This means that when forecasting changes forward from time $t$, the estimated price index is calculated using sales information from $t + 1$ on. The repeat sales indexes used to evaluate the forecastability of housing returns are, in fact, not the indexes available to agents at time $t$.
If there were sufficient observations, repeat sales indexes, and the return series dependent on them, could be estimated at each point in time. In general, the small fraction of sales that are repeats precludes period-by-period estimation. Use of the hybrid method avoids this problem. It is possible, using the Swedish data described above, to estimate an aggregate price index and each of the returns defined above at each point in time, accurately reflecting the information available when the housing investment decision is made.
Calculation of both real and excess return series requires an estimate of the implicit rent that accrues to owners who live in their home. The proxy used in this research is obtained by assuming that the implicit rent equals the real interest rate times the value of the home.¹⁰ The marginal income tax rate is assumed to be 40 percent, and the property tax is 1.5 percent of an assessment of the dwelling, which is taken as 75 percent of the market price. The proxy for the mortgage interest rate is the rate on six-month government notes minus two percent, reflecting the normal spread between the two instruments.
⁸ Sample selectivity is also a problem in the repeat sales data sets. Gatzlaff and Haurin (1997), Gatzlaff and Haurin (1998), and Englund, Quigley, and Redfearn (1999a) find that the sample of sold dwellings is not a random sample of all dwellings and that the appreciation rates of those homes selected into the observed sample are not reflective of the stock as a whole. Biased estimates of the aggregate price indexes result if the selection process is not controlled for.
⁹ This problem is discussed and measured in Englund, Quigley, and Redfearn (1999b). They find that the confidence intervals around measured prices are significantly larger for the indexes constructed using the repeat sales method than for those constructed using the hybrid method.
¹⁰ Previous research has employed local rent indexes as a proxy for the implicit rents, but this is less appropriate in the Swedish context. The rental market is highly regulated, and rents are based on construction costs rather than on supply and demand in spot markets. Moreover, at the regulated price, rental markets do not clear: there are queues for apartments in Stockholm.
After constructing the three series described above, their forecastability is tested by estimating an AR(4) process for each. For example, the regression for price appreciation is given by
$$\pi_t = \kappa + \beta_1\pi_{t-1} + \beta_2\pi_{t-2} + \beta_3\pi_{t-3} + \beta_4\pi_{t-4} + u_t, \tag{1.25}$$
where $u_t$ is white noise. Regressions for real and excess returns are analogous. Forecastability is accepted if the null hypothesis that the coefficients $\beta$ are jointly zero is rejected.
Table 1.3 shows the results of the regressions for each of the return series of the four metropolitan regions. The table indicates that, with the exception of Gothenburg, all three return series are “predictable.” In Stockholm, Uppsala, and Malmo, the models explain between twenty-two and forty-five percent of the variance in returns. The models use 39 quarters of data and estimate four lag coefficients, implying critical F-values of 4.02, 2.69, and 2.14 for the one-, five-, and ten-percent levels of significance, respectively. The null hypothesis that the set of coefficients on the lagged returns is zero is rejected for all three series in Stockholm, Uppsala, and Malmo.
The results presented in this section suggest that persistence at the individual dwelling level can cause persistence at the aggregate level. While Case and Shiller (1990) argue that the predictability in housing returns is caused by the failure of housing markets to incorporate predictable interest rates over the course of their sample period, the simple model of aggregate prices developed above indicates that predictable changes in aggregate housing prices may arise because housing markets only slowly incorporate information about shocks. The empirical results presented above are consistent with previous research: housing returns are predictable.
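The AR(4) forecastability regression (1.25) and its joint test can be sketched with the standard library alone. The test compares the fitted model with an intercept-only model through an F statistic with 4 and $T - 5$ degrees of freedom, where $T$ is the number of usable observations. The sketch solves the normal equations by Gaussian elimination, which is adequate for five regressors; this is an illustrative choice, not the author's estimation software, and the function names are hypothetical. If the 39 quarters include the four initial lags, the regression has 35 usable observations and 30 residual degrees of freedom. The critical values quoted above (4.02, 2.69, 2.14) are those of F(4, 30).

```python
def lagged_design(series, p=4):
    """Rows [1, y_{t-1}, ..., y_{t-p}] and targets y_t for an AR(p) regression, eq. (1.25)."""
    X = [[1.0] + [series[t - j] for j in range(1, p + 1)] for t in range(p, len(series))]
    return X, list(series[p:])


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def ar_f_test(series, p=4):
    """F statistic for H0: all p lag coefficients are zero (intercept kept).

    Returns (coefficients, F, numerator df, denominator df).
    """
    X, y = lagged_design(series, p)
    b = ols(X, y)
    rss_u = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))
    ybar = sum(y) / len(y)
    rss_r = sum((yi - ybar) ** 2 for yi in y)        # intercept-only (restricted) model
    df_num, df_den = p, len(y) - (p + 1)
    F = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return b, F, df_num, df_den
```

The resulting statistic is compared with the F(4, 30) critical values above. The sketch estimates on a single sample and does not reproduce the real-time re-estimation of the index described earlier in this section.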
Table 1.3: Forecasts of Aggregate Housing Returns

                Stockholm   Uppsala     Malmo       Gothenburg
A. Dependent Variable: Price Appreciation, π_t
R²              0.42        0.40        0.26        0.15
F-statistic     6.93        6.61        3.44        1.76
Intercept       0.002       0.002       0.001       0.002
                (0.37)      (0.37)      (0.09)      (0.34)
π_{t-1}         0.222       -0.141      -0.028      0.007
                (1.41)      (1.00)      (0.18)      (0.04)
π_{t-2}         0.324       0.286       0.277       0.131
                (1.99)      (2.04)      (1.79)      (0.81)
π_{t-3}         0.182       0.190       0.202       0.225
                (1.16)      (1.36)      (1.30)      (1.39)
π_{t-4}         0.067       0.447       0.316       0.275
                (0.40)      (3.08)      (1.97)      (1.58)
B. Dependent Variable: Real Return, r_t
R²              0.45        0.34        0.24        0.10
F-statistic     6.32        3.97        2.42        0.89
Intercept       0.001       0.005       0.001       0.005
                (0.23)      (0.68)      (0.12)      (0.54)
r_{t-1}         0.080       -0.238      0.067       -0.021
                (0.48)      (1.59)      (0.38)      (0.11)
r_{t-2}         0.083       0.101       0.296       0.036
                (0.53)      (0.67)      (1.65)      (0.20)
r_{t-3}         0.367       0.237       0.079       0.272
                (2.37)      (1.57)      (0.44)      (1.48)
r_{t-4}         0.332       0.540       0.320       0.214
                (1.94)      (3.55)      (1.78)      (1.11)
C. Dependent Variable: Excess Return, ER_t
R²              0.31        0.22        0.23        0.13
F-statistic     3.48        2.17        2.35        1.20
Intercept       0.008       0.008       0.005       0.006
                (1.19)      (1.15)      (0.69)      (0.74)
ER_{t-1}        0.208       -0.154      -0.024      -0.152
                (1.14)      (1.04)      (0.14)      (0.86)
ER_{t-2}        0.098       0.044       0.291       0.099
                (0.49)      (0.30)      (1.67)      (0.57)
ER_{t-3}        0.136       0.101       0.072       0.287
                (0.68)      (0.65)      (0.40)      (1.63)
ER_{t-4}        -0.040      0.308       0.110       0.154
                (0.20)      (2.01)      (0.60)      (0.80)
1.7 Is that a $20 Bill on the Porch?

Housing markets lack the financial instruments needed to exploit the aggregate relationships estimated in the previous section. Of the set of options available in financial markets, only the buy-and-hold strategy exists in housing markets; investment in housing requires taking ownership of a particular property. This appears to be substantially more risky than indicated by aggregate returns: the standard deviation of price change reported in Table 1.1 is more than twice as large as the standard deviation of aggregate prices. The tests developed in this section use iterative cross-validation techniques (bootstrap) to determine the value of the predictable relationship between past aggregate and future individual returns using a simple investment rule while accounting for transaction and holding costs.
The investment rule is straightforward: buy if the model predicts returns greater than zero. For real returns, this means that the combined return due to capital gains and implicit rent need only surpass inflation. For excess returns, the combined return has to exceed the home owner's cost of capital. The profitability of this rule is examined over three investment horizons: 4, 8, and 12 quarters.
The relationship between past and future returns is estimated by utilizing the panel nature of the data. For each paired sale, both the lagged aggregate returns from the periods preceding the first sale and the realized return at the second sale are known. Paired sales whose two transactions are 4, 8, or 12 quarters apart are extracted from the repeat sales data described above and then merged with lagged aggregate real and excess returns. An observation is then the realized sale price at the second sale, the observed price at the first sale, and a vector of lagged aggregate returns from the periods immediately prior to the first sale. The subsample is split: ninety percent of the merged data are used to estimate the relationship between lagged returns and realized returns, and the other ten percent are used to test for excess profitability given this relationship.
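The construction of an observation and the 90%/10% split can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the return is computed from the two sale prices alone, whereas the returns in the text also include implicit rent and, for excess returns, net out the cost of capital. The per-quarter geometric mean computed here is the form of dependent variable used in the regression that follows.

```python
import numpy as np


def quarterly_geometric_mean(first_price, second_price, n_quarters):
    """Per-quarter geometric-mean return implied by two sale prices n quarters apart."""
    return (second_price / first_price) ** (1.0 / n_quarters) - 1.0


def split_90_10(n_obs, rng):
    """Randomly assign 90% of observation indices to estimation and 10% to testing."""
    idx = rng.permutation(n_obs)
    n_train = int(round(0.9 * n_obs))
    return idx[:n_train], idx[n_train:]


# Hypothetical paired sales held 8 quarters apart (prices in kronor).
first = np.array([1.00e6, 0.85e6, 1.20e6])
second = np.array([1.10e6, 0.80e6, 1.45e6])
print(quarterly_geometric_mean(first, second, 8))

# Hypothetical sample of 500 merged observations.
train_idx, test_idx = split_90_10(500, np.random.default_rng(0))
print(len(train_idx), len(test_idx))
```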
For each of the test intervals, the quarterly geometric mean of realized returns in the ninety-percent subsample is regressed on lagged aggregate returns to test for predictability over a specific investment horizon. Using excess returns as an example, the regression is

$$\left(1 + ER^{i}_{t,t+n}\right)^{1/n} - 1 = k + \beta_1 ER_{t-1} + \beta_2 ER_{t-2} + \beta_3 ER_{t-3} + \beta_4 ER_{t-4} + \omega_t, \qquad (1.26)$$

where $\omega_t$ is white noise, $ER^{i}_{t,t+n}$ is the realized excess return for dwelling $i$ between $t$ and $t+n$, and $ER_{t-1}$ is the aggregate excess return in period $t-1$. The investment rule can then be evaluated using the remaining observations in the ten-percent subsample, those not used in the regression. For each, the forecasted return is calculated. If the predicted return is greater than zero, the dwelling is "purchased." Realized returns for "purchased" properties are then compared with those of properties that are not purchased.
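One estimation-and-test pass of equation (1.26) can be sketched as follows in Python with NumPy. The function `fit_and_evaluate` is a hypothetical helper, not the author's code: it fits $k$ and $\beta_1, \dots, \beta_4$ by ordinary least squares on the estimation rows, forecasts the held-out rows, "purchases" those with a positive forecast, and reports the average realized return of each group.

```python
import numpy as np


def fit_and_evaluate(y, lags, train_idx, test_idx):
    """Estimate eq. (1.26) by OLS on training rows, then apply the buy rule to test rows.

    y    : realized per-quarter geometric-mean return for each paired sale
    lags : (n_obs, 4) array of aggregate returns for t-1, ..., t-4
    Returns (mean realized return of "purchased" test dwellings, mean realized
    return of the rest, number purchased, number not purchased); a mean is NaN
    when its group is empty.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(lags, dtype=float)])
    # Coefficients are k, beta_1, ..., beta_4.
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    forecast = X[test_idx] @ beta
    buy = forecast > 0  # investment rule: "purchase" if the forecast return is positive
    realized = y[test_idx]
    mean_buy = realized[buy].mean() if buy.any() else np.nan
    mean_pass = realized[~buy].mean() if (~buy).any() else np.nan
    return mean_buy, mean_pass, int(buy.sum()), int((~buy).sum())


# Hypothetical synthetic data in which the second lag carries some predictive power.
rng = np.random.default_rng(1)
lags = rng.normal(0.0, 0.02, size=(400, 4))
y = 0.3 * lags[:, 1] + rng.normal(0.0, 0.01, size=400)
print(fit_and_evaluate(y, lags, np.arange(360), np.arange(360, 400)))
```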
The number of observations in any one interval is in the hundreds, so the 90%/10% split leaves few observations on which to judge profitability. To overcome this problem, the cross-validation procedure described above is repeated: fifty iterations are executed for each interval in each of the four regions. The results of this procedure are displayed in several tables below.
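The repeated procedure can be sketched as a loop over random 90%/10% splits whose out-of-sample results are pooled. This is an assumption-laden illustration in Python with NumPy: `repeated_split_evaluation` is a hypothetical name, and it draws each split by random permutation without replacement. The text calls the procedure a bootstrap, and whether the author resampled with replacement is not stated in this section.

```python
import numpy as np


def repeated_split_evaluation(y, lags, n_iter=50, train_share=0.9, seed=0):
    """Repeat a random estimation/test split n_iter times and pool test-sample results.

    On each iteration eq. (1.26) is fitted by OLS on the estimation rows, and each
    test row is classified as "purchased" (positive forecast) or not.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(lags, dtype=float)])
    n_train = int(round(train_share * len(y)))
    bought, passed = [], []
    for _ in range(n_iter):
        idx = rng.permutation(len(y))
        train, test = idx[:n_train], idx[n_train:]
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        buy = X[test] @ beta > 0
        bought.append(y[test][buy])
        passed.append(y[test][~buy])
    bought = np.concatenate(bought)
    passed = np.concatenate(passed)
    return {
        "obs_buy": int(bought.size),
        "mean_buy": float(bought.mean()) if bought.size else float("nan"),
        "obs_pass": int(passed.size),
        "mean_pass": float(passed.mean()) if passed.size else float("nan"),
    }


# Hypothetical synthetic data, not the Swedish repeat-sales sample.
rng = np.random.default_rng(2)
lags = rng.normal(0.0, 0.02, size=(300, 4))
y = 0.3 * lags[:, 1] + rng.normal(0.0, 0.01, size=300)
print(repeated_split_evaluation(y, lags))
```

Pooling across iterations yields counts and means of the kind reported in Table 1.4, although the sketch is not claimed to reproduce those figures.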
Table 1.4 shows the distribution of forecasts for real and excess returns. It reports the number of sampled dwellings that are forecast to have positive and non-positive returns, together with the average return actually observed over the period between the two sales. It is immediately clear that the combined return due to capital gains and implicit rent exceeded inflation for most of the sample period, as almost every sampled dwelling is forecast to have positive real returns. This is not true of the forecasted excess returns. While a clear majority of sampled dwellings is forecast to have positive excess returns, a sizeable minority is not. The difference in realized excess returns between the two groups is striking: dwellings forecast to earn positive excess returns realize average excess returns roughly one to three percentage points per quarter higher than those forecast to earn non-positive excess returns. This indicates that the predictable relationship observed at the aggregate level also holds when measured at the level of the individual dwelling.
However, predictability alone does not necessarily indicate arbitrage opportunities. To determine whether the relationship between past and future returns can be exploited, it is necessary to account for the substantial costs of buying, owning, and selling housing. These costs are explained in detail in Soderberg (1995). Table 1.5 summarizes the major costs of participating in the housing market for four hypothetical investors.

Table 1.4: Forecasting Individual Housing Returns
(Returns Reported as Percent per Quarter)

                      Realized Real Return                      Realized Excess Return
               Given E[r_{t+i} > 0]  Given E[r_{t+i} ≤ 0]  Given E[ER_{t+i} > 0]  Given E[ER_{t+i} ≤ 0]
                 obs     mean          obs     mean          obs     mean           obs     mean
A. Interval = 4 Quarters
Stockholm        666     2.41%          17     1.04%         387     1.89%          296    -1.23%
Uppsala         1069     2.77            -       -           639     1.77           457    -0.15
Malmo           1192     4.09            -       -          1140     2.45            52     1.17
Gothenburg      1288     3.31            -       -          1271     1.54            17     0.30
B. Interval = 8 Quarters
Stockholm        636     2.64            -       -           459     1.29           177    -0.18
Uppsala         1009     2.66            -       -           631     1.32           378     0.11
Malmo           1138     2.76            -       -           768     1.49           370     0.00
Gothenburg      1027     2.21            -       -           603     1.15           424    -0.68
C. Interval = 12 Quarters
Stockholm        442     3.44            -       -           442     1.89             -       -
Uppsala          811     2.79            -       -           750     1.35            61    -2.08
Malmo            720     2.62            -       -           679     1.04            41    -0.06
Gothenburg       743     2.87            -       -           743     1.12             -       -

Table 1.5: Transactions Costs
(Costs reported as percent of dwelling price, unless otherwise indicated)

                                                Type of Buyer
Cost                          New Entrant   Passive Investor   Resident Investor   Outside Investor
Registration, Inspection,
  Assessment                  none          2%                 2%                  2%
Tax on Rental Income          none          none               none                40% (of rent)
Broker's Fee at Sale          none          5%                 5%                  5%
Capital Gains Tax             none          none               30% (of gain)       30% (of gain)
The first is referred to as a new entrant. This individual is living at home with his parents and waiting for the right time to buy. He faces no transaction costs, as these are treated as sunk costs. The second buyer is the passive investor, whose investments in housing are guided primarily by her desired level of housing consumption. She moves as family size and employment change, incurring the costs of entry and exit, but continues to roll any accumulated capital gains back into housing and thus avoids paying capital gains taxes. The resident investor is an individual who invests in housing because of its total returns; he will move out and sleep in his office when higher returns appear elsewhere. The last of the four buyers is the outside investor, who acquires housing but does not occupy it and is therefore subject to tax on the rental income he earns from the property.
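The cost schedule in Table 1.5 can be encoded per buyer type and applied to a single holding period. The Python sketch below is hypothetical throughout: the class, dictionary, and function names are illustrative, and the way the costs are combined (entry costs added to the purchase outlay, the broker's fee deducted from the sale price, capital gains tax on the simple price increase, tax on undiscounted rent received over the holding period) is an assumption, not the author's calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuyerCosts:
    entry_cost: float  # registration, inspection, assessment (share of purchase price)
    broker_fee: float  # broker's fee at sale (share of sale price)
    rent_tax: float    # tax on rental income (share of rent)
    gains_tax: float   # capital gains tax (share of gain)


# Cost rates transcribed from Table 1.5.
COSTS = {
    "new entrant": BuyerCosts(0.00, 0.00, 0.00, 0.00),
    "passive investor": BuyerCosts(0.02, 0.05, 0.00, 0.00),
    "resident investor": BuyerCosts(0.02, 0.05, 0.00, 0.30),
    "outside investor": BuyerCosts(0.02, 0.05, 0.40, 0.30),
}


def net_quarterly_return(buy_price, sell_price, rent_per_quarter, n_quarters, costs):
    """Per-quarter geometric-mean return after costs, under a hypothetical cost treatment."""
    outlay = buy_price * (1.0 + costs.entry_cost)
    gain = max(sell_price - buy_price, 0.0)  # assumption: taxable gain is the price increase
    proceeds = sell_price * (1.0 - costs.broker_fee) - costs.gains_tax * gain
    rent = n_quarters * rent_per_quarter * (1.0 - costs.rent_tax)  # undiscounted rent
    return ((proceeds + rent) / outlay) ** (1.0 / n_quarters) - 1.0


# Hypothetical dwelling: bought at 100, sold at 110 four quarters later, rent 1 per quarter.
for name, c in COSTS.items():
    print(name, round(100 * net_quarterly_return(100.0, 110.0, 1.0, 4, c), 2))
```

Under these assumptions the ordering of buyers by net return follows the ordering of their costs, which is the pattern the text describes.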
Clearly, these are simplifications, but the four investors described above provide enough structure to understand the approximate magnitude of the restraints on competition in housing markets. For example, the tax on rental income faced by owners who do not occupy their dwelling penalizes the outside investor relative to the other types of investors. These institutional features of housing markets represent a substantial barrier to entry for economic agents who are strong forces for efficiency in other markets.
Based on the investment rule employed above ("purchase" if expected returns are positive), realized profits from trade are calculated and presented in Tables 1.6 and 1.7. Table 1.6 shows that, even after incorporating the appropriate costs, the real returns to housing are generally positive for all four types of buyer. The increasingly higher costs facing the passive, resident, and outside investors significantly reduce the realized real returns for each. In the shortest interval, the outside investor experiences negative real returns in three of the four regions, but as the interval increases, real returns rise, reflecting the amortization of the fixed costs incurred at purchase and sale.
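The amortization effect can be made concrete with simple arithmetic. A minimal Python sketch, assuming a 2% cost at purchase and a 5% broker's fee at sale (the rates in Table 1.5) and no other costs: the one-time costs multiply the gross holding-period return by (1 - 0.05)/(1 + 0.02), so their per-quarter drag is 1 - ((1 - 0.05)/(1 + 0.02))^(1/n), which shrinks as the holding period n lengthens. The function name is hypothetical.

```python
def quarterly_cost_drag(entry_cost, exit_cost, n_quarters):
    """Per-quarter drag from one-time purchase and sale costs over an n-quarter holding.

    entry_cost: share of the purchase price paid at purchase (e.g. 0.02)
    exit_cost:  share of the sale price paid at sale (e.g. 0.05 broker's fee)
    With no price change, the net per-quarter return equals minus this drag.
    """
    return 1.0 - ((1.0 - exit_cost) / (1.0 + entry_cost)) ** (1.0 / n_quarters)


for n in (4, 8, 12):
    # Prints roughly 1.76, 0.88 and 0.59 percent per quarter.
    print(n, round(100 * quarterly_cost_drag(0.02, 0.05, n), 2))
```

The sketch ignores taxes on rent and gains, so it isolates only the fixed-cost component discussed in the text.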
Table 1.7 shows the realized excess returns for the same four types of buyers, and it differs sharply from Table 1.6. After accounting for the costs associated with investment in housing, excess returns are smaller than real returns. The table shows that both the resident and outside investors lose money when buying and selling housing over all three time horizons. The new entrant, as in Case and Shiller (1989), is able to earn consistent excess returns by waiting for a positive forecast.
The most interesting results in Table 1.7 are the returns experienced by the passive investor. This type of buyer is the most common; the majority of housing transactions are not first-time purchases but reflect adjustments to housing consumption. Most of these transactions do not incur capital gains taxes because the gains are reinvested in more expensive housing. For these individuals, the excess returns are close to zero over the shortest interval and increase slightly as the interval increases. While this appears to be evidence that excess returns are available to typical market participants, it should be noted that the costs of moving have not been included because no data were available. Furthermore, the measure of excess returns is relative to the risk-free rate of short-term government securities, not the "market" return. The home owner is not compensated for the risk implicit in owning housing. Table 1.1 showed considerable volatility in price appreciation. It seems likely that the true excess returns will be close to zero or even negative.

Table 1.6: Real Returns for Different Types of Home Buyers
"Realized" Real Return Given E[r_{t+i} > 0]
(Real Return Reported as Percent per Quarter)

                  New        Passive    Resident   Outside
                  Entrant    Investor   Investor   Investor
A. Interval = 4 Quarters
Stockholm         2.41%      0.60%     -0.02%     -0.72%
Uppsala           2.77       0.94       0.19      -0.49
Malmo             4.09       2.17       0.97       0.32
Gothenburg        3.31       1.43       0.50      -0.18
B. Interval = 8 Quarters
Stockholm         2.64       1.65       0.89       0.27
Uppsala           2.66       1.65       0.87       0.25
Malmo             2.76       1.77       0.98       0.36
Gothenburg        2.21       1.26       0.68       0.04
C. Interval = 12 Quarters
Stockholm         3.44       2.67       1.57       1.00
Uppsala           2.79       2.07       1.24       0.65
Malmo             2.62       1.92       1.13       0.55
Gothenburg        2.87       2.14       1.27       0.69

Table 1.7: Excess Returns for Different Types of Home Buyers
"Realized" Excess Return Given E[ER_{t+i} > 0]
(Excess Return Reported as Percent per Quarter)

                  New        Passive    Resident   Outside
                  Entrant    Investor   Investor   Investor
A. Interval = 4 Quarters
Stockholm         1.89%      0.01%     -0.99%     -1.68%
Uppsala           1.77      -0.11      -1.11      [illegible]
Malmo             2.46       0.54      -0.67      [illegible]
Gothenburg        1.54      -0.34      -1.27      -1.95
B. Interval = 8 Quarters
Stockholm         1.29       0.29      -0.57      -1.18
Uppsala           1.32       0.30      -0.59      -1.18
Malmo             1.50       0.48      -0.46      -1.05
Gothenburg        1.15       0.15      -0.65      -1.26
C. Interval = 12 Quarters
Stockholm         1.84       1.07      -0.03      -0.59
Uppsala           1.35       0.62      -0.27      -0.86
Malmo             1.04       0.33      -0.48      -1.06
Gothenburg        1.20       0.47      -0.39      -0.98
In incomplete markets, risk-averse agents will require a premium to hold risky assets. Because the returns reported in Table 1.7 are uncompensated for risk, they should be viewed as an upper bound on the true excess return. As such, the results presented in Table 1.7 offer evidence in support of housing market efficiency, in the sense that excess returns are essentially zero.
These results should hearten believers in competitive markets. Housing markets exhibit correlated returns but do not offer arbitrage opportunities. For the majority of participants in housing markets, the predictability of returns cannot be profitably exploited. Furthermore, it appears that market forces have driven the returns to housing to approximately the home owner's cost of capital after accounting for the fixed costs associated with buying and selling housing.
1.8 Conclusion
Over the past decade, researchers have appealed to transaction costs to explain consistent findings of inertia in the movement of aggregate housing prices. Typically, however, no empirical evidence has been provided to support these claims, and little is known about the impact that these or other housing market features have on price dynamics.
Using a body of data uniquely suited to the task, this paper addresses the price dynamics of owner-occupied housing markets in several ways. In contrast to the majority of existing research on housing prices, this paper focuses on the behavior of individual dwelling prices. In addition, the data support an examination of the impact of transaction costs on housing price dynamics.
The data suggest that Case and Shiller's (1989) speculation "that the variance of the noise [in price appreciation] increases with the interval between sales" is correct. The analysis shows that the error variance is positively correlated with the interval of time between sales. However, the results also indicate that the autocorrelation of the errors is less than one. Rather than following a random walk, the errors are characterized more accurately as an autocorrelated process with a quarterly correlation coefficient of about 0.9, so pricing errors will persist for many years. This result is particularly important for the measurement of aggregate housing prices. The most common method for constructing indexes of aggregate housing prices, the "weighted repeat sales method," imposes the random walk assumption and therefore weights observations incorrectly. The serial correlation of individual dwelling returns found in this paper is shown to lead to serial correlation in aggregate housing returns as well.
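The contrast between a random walk and an autocorrelated pricing error can be made concrete. A minimal sketch in Python, assuming the dwelling-level error follows a stationary AR(1) in levels, e_t = ρ e_{t-1} + u_t with Var(u_t) = σ² (the thesis's exact specification is not reproduced in this section): under a random walk, Var(e_{t+k} - e_t) = kσ² grows without bound in the interval k, while under AR(1) it equals 2σ²(1 - ρ^k)/(1 - ρ²), which rises with k but levels off. The function names are hypothetical.

```python
import math


def error_increment_variance(k, sigma2=1.0, rho=None):
    """Variance of e_{t+k} - e_t, the change in a dwelling's pricing error over k quarters.

    rho=None  : random walk with innovation variance sigma2, giving k * sigma2.
    |rho| < 1 : stationary AR(1), e_t = rho * e_{t-1} + u_t with Var(u_t) = sigma2,
                giving 2 * sigma2 * (1 - rho**k) / (1 - rho**2).
    """
    if rho is None:
        return k * sigma2
    return 2.0 * sigma2 * (1.0 - rho ** k) / (1.0 - rho ** 2)


def half_life(rho):
    """Quarters until the expected AR(1) pricing error falls to half its current size."""
    return math.log(0.5) / math.log(rho)


for k in (4, 8, 12, 40):
    print(k, error_increment_variance(k), round(error_increment_variance(k, rho=0.9), 2))
print("half-life at rho = 0.9:", round(half_life(0.9), 1), "quarters")
```

With ρ = 0.9 the half-life is about 6.6 quarters, consistent with the statement that pricing errors persist for years, and the AR(1) variance never exceeds 2σ²/(1 - ρ²) however long the interval.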
The panel nature of the data also allows an evaluation of the profitability of a simple investment rule. This rule takes advantage of the relationship between past and future returns using the only investment strategy possible in housing markets: buy-and-hold. The data also allow the calculation of the costs associated with entry into and exit from housing markets, as well as the cost of holding housing as an income-producing asset.
This paper demonstrates that the serial correlation that characterizes individual home prices can be used to predict excess returns. However, only for new entrants, for whom entry costs are sunk and capital gains taxes are essentially irrelevant, does the relationship between past and future returns yield positive excess returns. "Speculators" earn large negative returns, even after exploiting this predictability. It appears that excess returns have been driven down to approximately the transaction costs of buying and selling housing for the typical home owner. In this sense, the results offer no evidence of market inefficiency: the typical investor in owner-occupied housing is earning zero excess returns, even though the measure of excess returns does not compensate the owner for risk. It appears that the intuition of previous researchers is well-founded: transaction costs in housing markets allow predictable returns to persist. While some housing investments will yield excess returns, Lo and MacKinlay's (1999) summary of the consensus on financial market efficiency seems appropriate: "an occasional free lunch is permitted, but free lunch plans are ruled out" (p. 7).
Chapter 2
The Composition of Metropolitan Employment and the Correlation of Housing Prices Across Metropolitan Areas
"The variation in default rates by region is quite substantial. Default rates in the Northcentral states were about five times as large as default rates in the Southeastern states. These differences reflect the credit rate risk associated with the real estate markets in each of the regions, the fortunes of the regional economies, and the loan-to-value ratios and ages of the mortgages."
- Quigley and Van Order (1991), p. 358, italics added

"The pattern of covariances in these returns suggests that portfolio risk can be reduced by geographical diversification . . . "
- Quigley and Van Order (1991), p. 361
Implicit in Quigley and Van Order's (1991) argument is an assumption that the fundamentals that generate returns to housing are not perfectly correlated across space. However, if metropolitan areas are viewed as small open economies, they will share shocks to the prices of common imports and exports, shocks that may spill over to housing markets. This paper demonstrates that the correlation of returns to residential housing between two metropolitan areas is a function not only of their physical proximity but also of the similarity of their industrial composition. This implies that as local economies evolve, so will the covariance of housing returns, which suggests that the benefits derived from diversification are maximized by considering the industry risk inherent in the current composition of metropolitan economies, not just the correlation of past returns.
2.1 Introduction
In 1978 the composition of employment in Atlantic City, New Jersey, shifted dramatically with the legalization of gambling. Thereafter, hotel and casino expansion further differentiated its economic base from the rest of the state. Through the first half of the next decade, housing price appreciation in Atlantic City followed a course that reflected its new industrial composition more than its location.
While the other metropolitan areas in New Jersey experienced similar housing price growth, the value of Atlantic City's owner-occupied housing exhibited a pronounced cycle around the rest of the state. Between 1978 and 1982, aggregate house prices in Atlantic City grew by 55 percent more than the state average. By 1988 the index of aggregate housing prices in Atlantic City had depreciated relative to the state housing price level by 20 percent, a fall of almost half from its 1982 high. In contrast, average housing price indexes in other major metropolitan areas in New Jersey deviated from the state average by more than 10 percent in only a handful of quarters during the entire 21-year sample period.
The introduction of gambling into Atlantic City induced both an increase in total employment and a shift in the fraction of employment dedicated to hotels and casinos. After a short period of adjustment, Atlantic City's cross section of employment resembled that of the gambling cities of Nevada. Furthermore, for the years following the legalization of gambling, movements in house prices in Atlantic City more closely resembled those of Las Vegas and Reno than those of the metropolitan areas immediately surrounding it.

This paper examines the influence of industrial similarity on the correlation of aggregate house prices between metropolitan areas. The model parameterizes correlation as a function of national, regional, state, and local factors. By partitioning the fundamental determinants of house prices in this way, it is possible to test the hypothesis that industrial similarity influences the correlation of housing returns independently of physical proximity. The analysis tests whether metropolitan areas with more similar industrial composition also share more similar movements in housing prices.
A reduced-form model of housing returns is presented. It relies on the spatial scope of the factors of supply and demand to identify the separate effects of physical proximity and industry mix. For example, national interest rates, state taxes, and changes in the demand for land in a nearby city all exert independent effects on the price of local housing. The empirical results suggest that housing returns across metropolitan areas are related by their industrial similarity. In each version of the model tested, the relative similarity of two metropolitan areas' industrial composition is a significant predictor of similarity in their housing price movements. Because shared exposure to common shocks varies over time, the relationship between two housing markets may change as well, and historical time series may become a less meaningful basis for diversification as metropolitan economies evolve.
Section 2.2 develops the concept of industrial similarity and provides a suggestive anecdote as to its influence by examining the experience of Atlantic City after the legalization of gambling in 1978. Section 2.3 reviews several papers that focus on the role of industry and space in housing markets. Section 2.4 describes the data. Section 2.5 discusses the role of industrial similarity and physical proximity in determining housing prices and outlines the research design. Estimation results are discussed in Section 2.6. Section 2.7 previews ongoing extensions and concludes.
2.2 Atlantic City and “Industrial Distance”

In 1978 gambling was legalized in Atlantic City. This exogenous shock greatly altered the structure of Atlantic City's employment, shifting it dramatically toward services. Hotel and casino employment growth was particularly large. Over the next five to seven years, housing prices in Atlantic City behaved remarkably unlike those of the other five metropolitan areas within New Jersey, and more like those of three metropolitan areas well outside the state and region: New Orleans, Las Vegas, and Reno. I will refer to these cities as "destination cities."¹
Figure 2.1 shows the course of aggregate housing prices in New Jersey's metropolitan areas normalized by aggregate New Jersey state housing price levels. Figure 2.2 shows the same for Atlantic City, Las Vegas, Reno, and New Orleans. Compared with New Jersey's other major metropolitan areas, the idiosyncratic movement of Atlantic City's housing prices is striking. Beginning in 1979, the evolution of prices in Atlantic City diverged and did not return to a pattern typical of the other New Jersey cities until more than a decade later. In the interim, housing prices followed a path similar to those of the other "destination cities."
Another way to view the information shown in Figures 2.1 and 2.2 is to compare the correlation coefficients for the two groups of cities. For the entire sample period, from 1975 to the third quarter of 1996, the average of the correlations between Atlantic City and the five other New Jersey metropolitan areas is 0.37; between Atlantic City and the "destination cities" it is 0.21. However, from 1978 to 1985 the average correlation with Las Vegas, Reno, and New Orleans rose to 0.47, while the average correlation with Bergen-Passaic, Middlesex-Hunterdon-Somerset, Monmouth-Ocean City, Newark, and Trenton weakened slightly to 0.34.
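The group averages above are means of pairwise correlation coefficients between one city's quarterly housing returns and each comparison city's returns. A minimal sketch in Python with NumPy, for illustration only: `average_correlation` is a hypothetical helper, the returns in the usage example are made up, and the row indices that select a sub-period such as 1978 to 1985 depend on how the quarterly data are stored.

```python
import numpy as np


def average_correlation(target, others, start=None, end=None):
    """Mean Pearson correlation between one city's returns and each of several others.

    target     : 1-D array of quarterly returns for the reference city
    others     : 2-D array, one column per comparison city, same quarters as target
    start, end : optional row bounds selecting a sub-period
    """
    t = np.asarray(target, dtype=float)[start:end]
    o = np.asarray(others, dtype=float)[start:end]
    corrs = [np.corrcoef(t, o[:, j])[0, 1] for j in range(o.shape[1])]
    return float(np.mean(corrs))


# Hypothetical quarterly returns for a reference city and three comparison cities.
rng = np.random.default_rng(3)
reference = rng.normal(0.01, 0.02, size=40)
group = np.column_stack([reference + rng.normal(0.0, 0.02, size=40) for _ in range(3)])
print(average_correlation(reference, group))          # full sample
print(average_correlation(reference, group, 12, 44))  # a sub-period, by row index
```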
In Figures 2.1 and 2.2, physical proximity appears to be more influential than industrial similarity during the late 1980s, when housing prices rose steeply in all of the metropolitan areas surrounding Atlantic City. At this point the correlation of housing prices with the "destination" cities weakened as the growth in Atlantic City's housing prices accelerated along with the rest of New Jersey's.

¹ More specifically, Las Vegas and Reno are "destinations" for gambling. While New Orleans also offers some opportunity to gamble, its employment is not dominated by this industry. The three share a high percentage of their workforce in services, especially hotels. Overall industrial similarity is discussed below.

Figure 2.1: Housing Prices, New Jersey MSAs Relative to New Jersey State (horizontal axis: Year, 1976 to 1996)

Figure 2.2: Housing Prices, Other MSAs Relative to New Jersey State; series: Atlantic City, Las Vegas, Reno, New Orleans (horizontal axis: Year, 1976 to 1996)

Figure 2.3: Industrial Distance, New Jersey MSAs Relative to Atlantic City; series: Monmouth-OC, Other NJ MSAs (horizontal axis: Year, 1975 to 2000)
Figures 2.3 and 2.4 preview a key variable developed below, industrial distance. The figures show this measure of industrial similarity between Atlantic City and the other metropolitan areas over the sample period. Figure 2.3 shows how Atlantic City's cross section of employment diverged from that of the rest of the state (how much greater the "industrial distance" became) as employment in Atlantic City evolved rapidly after the legalization of gambling. Conversely, Figure 2.4 demonstrates how this measure became smaller (the "industrial distance" narrowed) as the cross section of employment in Atlantic City grew more similar to those of Reno and Las Vegas.
2.3 Related Research

There is little existing literature on spatial correlation in housing prices across metropolitan areas, and even less on spatial correlation across regions. Space and geography have received more research attention recently, but the majority of this work has concentrated on spatial correlation in housing prices within housing markets.²

Figure 2.4: Industrial Distance, Other MSAs Relative to Atlantic City; series: Las Vegas, Reno, New Orleans (horizontal axis: Year, 1975 to 2000)
Clapp, Dolde, and Tirtiroglu (1995) find a significant spatial diffusion process in
housing prices across neighboring municipalities within larger metropolitan areas in
their study of San Francisco and Connecticut. Pollakowski and Ray (1997) obtain
similar results, finding a significant lead/lag structure in intrametropolitan housing
prices within a large urban area, but do not reach the same conclusion for Census
divisions.
The economic rationales that support these types of results include informational
decay as distance increases, spatial spillovers of shocks, or “ripples” (Meen and
Andrew 1998, Cromwell 1992), and broadly similar economic fundamentals within
regions.
Others have examined the impact of industry mix on economic outcomes. Terkla and Doeringer (1991) use a modified shift-share analysis to examine the relative importance of industry mix and local cost factors for employment growth. They find "that industry mix interacting with national trends dominates the economic performance of regions over the short-run periods." Clark (1998) finds that industry-specific shocks are important but concludes that they are dominated by region-specific shocks. Browne (1992) compares the industrial structures of New England and Texas in an attempt to explain the boom and bust cycles in each. Case and Shiller (1996) find that the share of local employment in manufacturing was a significant influence on the course of house prices within the Boston consolidated metropolitan statistical area.

² Volume 14(3), 1997, of the Journal of Real Estate Finance and Economics is dedicated to spatial correlation at the micro level.
The mechanism by which housing markets may be influenced by changes in employment is developed in Blanchard and Katz (1992) and, more recently, in Johnes and Hyclak (1999). In both, shocks to demand for locally produced goods lead to changes in local employment, with migration restoring equilibrium. That is, the level of local employment, not the level of wages, adjusts to aggregate shocks. The link to local housing markets is through the aggregate demand for housing, which falls with net emigration and higher short-run unemployment. While not specifically addressing industry mix, these papers illustrate how the composition of local economic activity might influence housing markets, and they suggest that aggregate industry shocks might systematically influence geographically independent housing markets. Clearly, industry shocks occur: recent shocks to industries such as motor vehicles and the energy sector, with their asymmetric impacts on Detroit and Texas, respectively, are now familiar. Similarly, changes in military spending had a disproportionate effect in areas heavily invested in defense activities, such as New England and Southern California.
Abraham, Goetzmann, and Wachter (1994), henceforth AG&W, establish metropolitan area groupings based on the "closeness" of their housing price movements. Similarity in this regard is agnostic as to any dimension of comparison save quarterly returns to owner-occupied housing. They find that "geography dominates economics when it comes to differentiating housing markets." This inference is based on the city groupings they observe by clustering cities according to similar movements in house prices. Membership in a typical cluster appears to be primarily a function of geographical proximity and secondarily a function of gross concentrations of economic activity, e.g., the Rust Belt, the Oil Patch states, etc. Their results and supporting arguments are intuitively appealing, but unsatisfactory on several points.
The first of these is the lack of systematic analysis of their clusters. Visual inspection of the results supports the notion that geography is relatively more important than economics in determining the relationship between outcomes in housing markets across metropolitan areas, but no formal test is undertaken. A second weakness is that the research method employed in AG&W is essentially static. While the location of metropolitan areas is fixed over time, their economies are not; as local employment evolves, so will the similarity in housing returns. Finally, the AG&W results are based on only 30 metropolitan areas, limiting the extent to which industry concentration effects might be found.³
This paper examines the extent to which industrial similarity influences the correlation of returns to housing across metropolitan areas. The use of historical returns in any analysis is valid only if the data generating process is time-invariant. The results of AG&W suggest that industrial concentration has guided return similarity. This implies that as the underlying economies evolve, so may the similarity of returns across metropolitan areas. Testing this proposition is the major goal of this paper.
2.4 The Data

This paper utilizes the Fannie Mae/Freddie Mac Conventional Mortgage Home Price Indexes and the state and area employment time series from the Bureau of Labor Statistics (BLS).
The Conventional Mortgage Home Price Indexes are based on the combined history of mortgages purchased by Fannie Mae and Freddie Mac. The indexes are constructed using the weighted repeat sales method described in Case and Shiller (1989), and so hold quality constant.⁴ The combined mortgage pool includes refinances as well as house sales, so not all house prices are derived from market transactions; when refinancing occurs, the appraised value is used instead.⁵

The strength of the Fannie Mae/Freddie Mac indexes is the breadth of their coverage. The data include indexes for all 50 states and the District of Columbia, as well as 151 metropolitan areas. The indexes are published quarterly, beginning in 1975. The indexes employed in this paper extend through the third quarter of 1996.

Employment information comes from the Bureau of Labor Statistics (BLS). These time series include aggregate employment data by state and metropolitan area, as well as by 1-, 2-, 3-, and 4-digit SIC codes. The length of each time series is, in general, inversely related to its specificity, with very little comprehensive data existing for most 3- and 4-digit industry classifications. For the broad employment categories the data are excellent; the 1-digit time series are available monthly beginning in 1939.

The final source of data is the location coordinates of each metropolitan area, that is, the metropolitan "centers" as defined by the U.S. Census. The latitude and longitude of each city are used to establish the physical distance between each pair of cities, using an adjustment to Pythagoras' Theorem that accounts for the curvature of the earth. These are great circle distances that do not consider natural features, such as lakes and mountains, that would influence actual travel distances.

Table 2.1 describes the data. Panel A summarizes the movement of housing returns for four consecutive five-year periods. It is immediately clear that aggregate house price movements have varied substantially, both across time and across Census divisions. The divisional returns are calculated using the unweighted quarterly housing returns of the metropolitan areas within each of the nine Census divisions. The returns are nominal, which is readily apparent from the consistently high returns during the high-inflation period of the late 1970s. Also noteworthy is the variation in returns within each cross section, suggestive of idiosyncratic regional movement in housing prices.

³ Henderson (1997) finds that "medium-sized" cities are much more likely to specialize in a particular industry. Industry effects are more likely to be found where concentration is high, so expanding the analysis to the largest 150 cities is likely to help identify industry effects.

⁴ This method is not without its detractors. Meese and Wallace (1991) find evidence that it substantially overstates price increases in rising markets. Englund, Quigley, and Redfearn (1999a, 1999b) and Gatzlaff and Haurin (1997, 1998) find significant evidence that the use of only those dwellings that sell twice imparts selection bias and results in a biased price index.

⁵ For the period 1975-1994, Pollakowski and Ray (1997) report a total pool of 17.5 million mortgages yielding 4.6 million matching transactions. The presence of refinances in the pool does not seem to cause any consistent bias. Where possible, the Freddie/Fannie indexes have been compared with commercially available repeat-sales indexes, which use only transactions data. While there are short-run deviations between the indexes, there is a high correlation between the indexes over the entire sample period.
Panel B reports the summary statistics for the correlations in housing returns between metropolitan areas, the dependent variable in the first model presented below. Not surprisingly, there is considerable variation in each of the three measures of correlation. Panel B also shows that, although quarterly changes in the price indexes are only weakly correlated across metropolitan areas, the index levels are highly correlated, since they generally rise together with the overall price level.
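The gap between the level and return correlations in Panel B can be illustrated with a small constructed example. The sketch below uses hypothetical series rather than the Freddie Mac/Fannie Mae data: two indexes that share a common upward trend have highly correlated levels even when their quarter-to-quarter returns move in opposite directions. The helper names `pearson` and `index_from_returns` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def index_from_returns(returns_pct, start=100.0):
    """Cumulate quarterly percent returns into an index level series."""
    levels = [start]
    for r in returns_pct:
        levels.append(levels[-1] * (1.0 + r / 100.0))
    return levels


# Hypothetical cities: both average 2 percent per quarter, but their deviations
# from that average always have opposite signs.
returns_a = [1.0, 3.0] * 10
returns_b = [3.0, 1.0] * 10
level_corr = pearson(index_from_returns(returns_a), index_from_returns(returns_b))
return_corr = pearson(returns_a, returns_b)  # exactly -1 for these series
```

Here `level_corr` is close to one while `return_corr` is minus one, the same qualitative pattern as the Panel B means (0.89 for index levels against 0.06 for quarterly returns).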
A measure of similarity in metropolitan employment structure is constructed. The
measure, referred to in this paper as industrial distance, is the Euclidean distance
between the industrial compositions of two cities. That is, the industrial distance between city i and city j is defined as
$$\text{Industrial Distance} = ID_{ijt} = \Big[\sum_{k}\big(\text{share}_{ikt} - \text{share}_{jkt}\big)^{2}\Big]^{1/2} \tag{2.1}$$
where k and t index industry and time, respectively, and $\text{share}_{ikt}$ is the proportion of total employment in metropolitan area i that is in industry k at time t.
Industrial distance is calculated using one-digit SIC codes. This is a coarse level of
aggregation, which hides much of the specific industrial composition of a metropolitan
economy. However, it does identify broad industrial structure—cities identified by
concentrations in government, finance, and manufacturing are differentiated.6
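A minimal sketch of equation (2.1), assuming each city’s one-digit employment shares for a given quarter are stored as a dictionary keyed by industry label (a hypothetical data layout); an industry missing from one city is treated as a zero share. The function name is illustrative.

```python
import math


def industrial_distance(shares_i, shares_j):
    """Euclidean distance between two cities' employment-share vectors, as in (2.1).

    shares_i, shares_j: dicts mapping industry label -> share of total metro
    employment (as fractions) in the same quarter.
    """
    industries = set(shares_i) | set(shares_j)
    return math.sqrt(sum((shares_i.get(k, 0.0) - shares_j.get(k, 0.0)) ** 2 for k in industries))
```

For example, `industrial_distance({"Manufacturing": 0.30, "Services": 0.70}, {"Manufacturing": 0.10, "Services": 0.90})` is about 0.283, or 28.3 on the “times 100” scale used in Table 2.1. Because every industry’s share difference enters with equal weight, a large gap in a small sector counts as much as the same gap in a large one.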
The components of industry mix are given in Panel C. The share of local employment has been calculated for the one-digit industries with the exception of mining, which employs an insignificant proportion of the labor force in the metropolitan areas included in this research. Panel C emphasizes the substantial variation in metropolitan employment, with the larger variances occurring in the proportion of employment in manufacturing and government.
From the industry shares, industrial distance is calculated. The summary statistics for industrial distance are reported in Panel D. The minimum of .005 describes the close similarity in the shares of employment between Nashua, New Hampshire and Rockford, Illinois in 1975. Nashua is also part of the pair of cities that is “furthest” as measured by industrial distance: at .556, Nashua and Las Vegas, Nevada in 1980 are the most dissimilar metropolitan areas during the sample period.
Panel E reports the summary statistics for the physical distance measure. The maximum distance of 3899 miles may be surprising; it is the distance from Honolulu, Hawaii to Portland, Maine.
6The appropriate industrial classification level is the topic of ongoing research. The key issue for this research is the extent to which similarity arises from the specific bundle of goods a city produces or whether broader types of activities, such as manufacturing or finance, insurance, and real estate (FIRE), are sufficient.

Table 2.1: Descriptive Statistics

A. Average Quarterly Returns (nominal, in percent)
Census Division        1976-1980   1981-1985   1986-1990   1991-1995
East North Central       2.23%       0.29%       1.28%       1.26%
East South Central       2.63        0.87        0.83        1.07
Mountain                 3.20        0.64        0.38        1.84
Middle Atlantic          2.16        1.90        1.81        0.38
New England              2.47        3.16        1.46       -0.07
Pacific                  3.90        0.62        2.45        0.09
South Atlantic           2.17        1.13        1.06        0.74
West North Central       2.36        0.49        0.69        1.15
West South Central       3.01        1.01       -0.51        1.03

B. Average Intermetropolitan Correlations
                         Mean    Std. Dev.   Minimum   Maximum
Index Levels             0.89      0.10        0.08      1.00
Quarterly Returns        0.06      0.21       -0.72      0.93
Yearly Returns           0.23      0.24       -0.71      0.96

C. Share of Metropolitan Employment by Industry (percent)
                                     Mean    Std. Dev.   Minimum   Maximum
Services                            24.27%     5.81%       8.96%    50.17%
Trade                               23.50      2.57       13.40     37.02
Manufacturing                       18.77      8.84        2.87     47.64
Government                          17.23      6.07        6.17     42.98
FIRE                                 5.94      2.25        2.21     19.11
Transportation & Public Utilities    5.18      1.61        1.58     11.29
Construction                         5.09      1.97        1.36     16.65

D. Industrial Distance (times 100)
                         Mean    Std. Dev.   Minimum   Maximum
Industrial Distance     15.35      7.59        0.53     55.65

E. Physical Distance (in miles)
                         Mean    Std. Dev.   Minimum   Maximum
Physical Distance     1332.27    897.65       11.67   3899.99
2.5 Spatial Factors, Industrial Mix, and the Returns to Owner-Occupied Housing
The anecdotal evidence presented in Section 2.2 suggests that at least two local factors play a significant role in generating correlated outcomes across housing markets: similarity in industrial composition and physical proximity. In order to estimate the magnitudes of their influence, it is necessary to control for other sources of correlation in housing returns. Previous research provides guidance in identifying them (see especially Reichert (1990)).
In general, changes in inflation, interest rates, and the federal tax code have a nationwide impact on housing prices. These factors set the parameters of user cost (see Poterba (1984)) and, therefore, influence the price of owner-occupied housing systematically across the country. Gross migratory trends, such as the exodus from the Midwest to the Sunbelt or to the Rocky Mountain states, are region-wide phenomena and should have distinct effects on housing prices in the affected areas. State tax and expenditure policies directly and indirectly influence house prices within a state’s borders. Exposure to common policies that affect the value of housing will cause house prices in different metropolitan areas to be correlated as a result of their co-location within the same government jurisdictions. The focus of this paper is local metropolitan characteristics as an additional source of common movements in housing prices.
If industries concentrate to the point that they are entirely located within one geographical area, then it will not be possible to differentiate between the influence of industrial composition and that of location on housing prices. Identification requires variation in metropolitan industry mix, within and across regions. Ellison and Glaeser (1997) find that spatial concentration varies substantially across industries. Henderson (1997) finds that medium-sized cities are more likely to have a portion of their employment concentrated in one industry than are larger cities. Diminishing economies of scale in combination with increasing transportation costs to suppliers and markets ensure that the production of goods and services is located throughout the country.7 It is through this variation in local employment that aggregate industry shocks link local economies, even those separated by great physical distances. These shocks, in turn, affect housing markets.
Changes in the health of local economies affect house prices through shifts in the
demand for residential real estate. Housing cannot migrate in response to changing
demand, so the transmission of shocks to the local economy should be apparent in
local housing prices as a result of their inelastic supply. It should be noted that supply responses to these shocks are likely to differ across metropolitan areas. To the extent that they do, the measured correlation in housing returns will be dampened, making it more difficult to find a statistically significant relationship between correlated movements in housing prices and the factors that cause them.
Neighboring metropolitan areas may experience similar co-movements in housing prices due not to similar industrial compositions but to competition for land. In this sense, all houses within commuting distance can be viewed as imperfect substitutes, where the ability to substitute depends in part on the relative proximity of the
metropolitan areas. For example, the success of Silicon Valley has been felt in every
part of the San Francisco Bay Area, including the Central Valley city of Tracy, a city
very far from San Jose in many dimensions, but not distance. In this case, housing
price correlation is driven by physical proximity and not the underlying structure of
the local economy.
The research design follows from the proposition that existing industry structure is itself a filter through which aggregate industry shocks are passed to local housing markets. This implies that the correlation between the returns to housing in two cities will be a function of the similarity of their local industrial composition.
7See Kim (1995) for a discussion of long-run trends in deconcentration in manufacturing.
AG&W employ the k-means algorithm8 to allocate metropolitan areas to one of a predetermined number of clusters.9 In the context of housing returns, the statistical relationship that drives the results of the k-means algorithm is the correlation of housing returns between metropolitan areas. That is, metropolitan areas are grouped to maximize the correlation between the return series within clusters and minimize the correlation across clusters. In order to compare the relative importance of geography and economics, the first model presented in this paper also exploits the correlation in returns to owner-occupied housing.
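Footnote 8 describes k-means only in words. The following is a minimal sketch of Lloyd’s algorithm, the standard k-means iteration; it is not AG&W’s code. Each metropolitan area is assumed to be represented by a numeric vector (for instance, its standardized quarterly return series), and the caller supplies the starting centroids so the result is deterministic. For series standardized to mean zero and unit (population) variance over n quarters, the squared Euclidean distance between two series equals 2n(1 - ρ), so minimizing within-cluster distance on such series groups highly correlated markets together.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm for k-means.

    points: list of equal-length numeric vectors (one per metropolitan area).
    initial_centroids: list of k starting vectors; k is fixed in advance.
    Returns (labels, centroids): each point's cluster index and the final centroids.
    """
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid with the smallest squared distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids
```

Lloyd’s algorithm reaches a local optimum that can depend on the starting centroids as well as on the chosen k.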
The correlation between the time series of quarterly house price changes in two cities is regressed on measures of physical proximity and industrial similarity. That is,
$$\rho_{ij} = \alpha + \gamma\, ID_{ij} + \delta\, PD_{ij} + \beta_C\, \text{CMSA}_{ij} + \beta_S\, \text{State}_{ij} + \beta_R\, \text{Region}_{ij} \tag{2.2}$$
where $PD_{ij}$ and $ID_{ij}$ are the physical distance and the industrial distance between metropolitan areas i and j, respectively, $\alpha$ is an intercept, and $\delta$ and $\gamma$ are the effects of the two distances. $\text{CMSA}_{ij}$ and $\text{State}_{ij}$ are dummy variables that take the value one if metropolitan areas i and j are in the same consolidated metropolitan statistical area or state, respectively. $\text{Region}_{ij}$ is also a dummy variable, indicating whether the two cities are in the same region. Region is defined by Census region, Census division, or Bureau of Economic Analysis region, depending on the specification.
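Equation (2.2) is a cross-sectional regression with one observation per pair of cities. A minimal sketch of the point estimates follows, assuming the distances are supplied in the log form used in Table 2.2 and using a small hand-written least-squares solver so the example needs only the standard library; `design_row` and `ols` are hypothetical names. Standard errors are deliberately left out: pairs that share a city are not independent observations, so the textbook OLS formula would likely need adjustment.

```python
def design_row(ln_id, ln_pd, same_cmsa, same_state, same_region):
    """Regressors for one city pair in (2.2): intercept, log distances, dummies."""
    return [1.0, ln_id, ln_pd, float(same_cmsa), float(same_state), float(same_region)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    aug = [xtx[a] + [xty[a]] for a in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[a][k] / aug[a][a] for a in range(k)]
```

Each element of the returned list lines up with a column of `design_row`: intercept, γ, δ, β_C, β_S, β_R.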
One potential challenge to the preceding analysis is the static treatment of industrial composition. The economies of major U.S. metropolitan areas have changed substantially over the twenty-year sample period. If this transformation has occurred unevenly across metropolitan areas, the shared exposure to aggregate industry shocks may have changed substantially over time.
8Simply put, the k-means algorithm clusters data so that the within-cluster variation is minimized while the across-cluster variation is maximized. The k refers to the specified number of groups into which the data are clustered.
9AG&W attempt to endogenize the number of clusters, but find little evidence that strongly favors one partition over another. They argue that “meaningful divisions can be identified at several levels of aggregation.”
In order to capture the evolution of industrial composition, and therefore of the industrial distance between two metropolitan areas, an analogous model is estimated using quarterly data. The model is dynamic, using quarter-by-quarter returns and industrial distances. The static elements (physical distance and the state and region dummies) are defined as above.
Three examples of regressions will help demonstrate the mechanics of the second model. Consider the case in which correlation is purely a national effect, possibly through inflation, national economic growth, etc. In this instance, regressing the period t return of city i on the period t return of city j for every combination of cities will obtain the correct estimate of the average intratemporal correlation. That is,
$$r_{it} = \rho \cdot r_{jt} \tag{2.3}$$
If, on the other hand, correlation in housing returns is purely an intrastate phenomenon (returns to aggregate metropolitan housing prices move together within state borders), then equation (2.3) is modified slightly:
$$r_{it} = \rho \cdot I_s \cdot r_{jt} \tag{2.4}$$
where $I_s$ is an indicator taking the value one when cities i and j are in the same state and zero otherwise.
Finally, consider the case in which the correlation in housing returns declines linearly with the physical distance between the cities, that is
(2.5)
In this case, the regression described by
(2.6)
will yield an estimate of the distance-based intermetropolitan correlation, p.
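Equations (2.3) and (2.4) are pooled regressions without an intercept, stacked over every pair of cities and every quarter. As a sketch, assuming each observation is stored as a tuple (r_it, r_jt, same_state), the least-squares slope has the closed form Σ r_it·x / Σ x², where the regressor x is r_jt in (2.3) and I_s·r_jt in (2.4). The function name `pooled_rho` is hypothetical.

```python
def pooled_rho(observations, within_state=False):
    """No-intercept least-squares slope of r_it on the regressor x.

    observations: iterable of (r_it, r_jt, same_state) tuples, one per city pair and quarter.
    within_state=False uses x = r_jt (equation 2.3);
    within_state=True uses x = I_s * r_jt (equation 2.4), with I_s = 1 for same-state pairs.
    """
    num = den = 0.0
    for r_i, r_j, same_state in observations:
        x = r_j * (1.0 if same_state else 0.0) if within_state else r_j
        num += r_i * x
        den += x * x
    return num / den
```

Under (2.4), pairs in different states contribute nothing to either sum because their regressor is zero, so the slope is driven entirely by same-state pairs.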

In addition to testing for national, state, and distance effects, the influences of co-location within a region or consolidated metropolitan area and of similarity of industrial mix can also be examined. These tests are undertaken simultaneously by executing the following regression:
$$r_{it} = \big(\alpha_t + \gamma\, ID_{ijt} + \delta\, PD_{ij} + \beta_C\, \text{CMSA}_{ij} + \beta_S\, \text{State}_{ij} + \beta_R\, \text{Region}_{ij}\big)\, r_{jt} + u_{ijt} \tag{2.7}$$
In equation (2.7), $r_{it}$ and $r_{jt}$ are the returns to housing in metropolitan areas i and j, respectively, at time t.10 $\alpha$, $\text{CMSA}_{ij}$, $\text{State}_{ij}$, and $\text{Region}_{ij}$ are defined as above, with an added time index, t, on the intercept $\alpha$. These terms should capture systematic national, regional, and state effects. $PD_{ij} \cdot r_{jt}$ is the interaction of the return in city j at time t with the physical distance between the two cities; the parameter $\delta$ captures the effect of distance-weighted house price changes outside city i on house price changes in city i. $ID_{ijt} \cdot r_{jt}$ and $\gamma$ are defined analogously, with the important difference that industrial distance varies with time. $u_{ijt}$ is white noise.
The model is slightly unusual in that all of the coefficients are effects of interaction terms. However, interpreting them is straightforward. For example, if there is a common element in the returns of metropolitan areas within the same state, the interaction of the state dummy variable with the return should be positive and significant. Similarly, a significantly negative coefficient on the industrial distance interaction term can be interpreted as evidence that correlation in housing returns declines with the dissimilarity between the industrial structures of two cities.
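In equation (2.7) every regressor is an interaction with city j’s return. A minimal sketch of one row of the design matrix follows, assuming a single intercept (Table 2.3 reports one intercept row) rather than a separate α_t for each quarter; `interaction_row` is a hypothetical name, and the distance arguments may be passed in logs as in the tables. Stacking these rows over all pairs and quarters and regressing r_it on them by least squares yields estimates of α, γ, δ, β_C, β_S, and β_R.

```python
def interaction_row(r_jt, id_ijt, pd_ij, same_cmsa, same_state, same_region):
    """Regressors for one (i, j, t) observation of equation (2.7).

    Each term is multiplied by city j's return, so the fitted model is
    r_it = (alpha + gamma*ID_ijt + delta*PD_ij + beta_C*CMSA_ij
            + beta_S*State_ij + beta_R*Region_ij) * r_jt + u_ijt.
    A single alpha is assumed; a quarter-specific alpha_t would replace the first
    entry with a set of quarter dummies, each multiplied by r_jt.
    """
    base = [1.0, id_ijt, pd_ij, float(same_cmsa), float(same_state), float(same_region)]
    return [term * r_jt for term in base]
```

The dot product of this row with the coefficient vector equals the bracketed multiplier in (2.7) times r_jt, which is why each coefficient reads as a shift in how strongly city i’s return tracks city j’s.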
2.6 Estimation and Results

For each possible pairing of metropolitan areas, the correlation of yearly changes in house prices was calculated, as were the physical and industrial distances between
the two cities. The industrial distance used in this static analysis is the average of
the quarterly industrial distances. A variety of other measures of industrial distance
were examined without any significant effect on the results described below.
10The return to housing is measured as a function of the change in the aggregate housing price index. Three series were utilized: quarterly percent change, year-over-year percent change, and a kernel estimator of quarterly percent change. The results were robust to the choice of return measure.
Table 2.2 reports the results of estimating several specifications of equation (2.2). The explanatory power of the models is low: at best they account for about fourteen percent of the variation in the observed correlation in housing price changes between two cities. However, each of the parameters is of the predicted sign and all are highly significant.
The coefficients on industrial distance11 strongly suggest that similarity in industrial structure indeed influences the co-movement of returns to housing across markets, that is, that ceteris paribus greater industrial similarity leads to more highly correlated returns. The estimated parameter on industrial distance is highly stable across the different models, indicating that the effect of industrial distance is robust to the parameterization of physical proximity.
The variables capturing spatial proximity (physical distance and the dummies for same CMSA, state, and region) are also highly significant and of the predicted sign. Specifically, the coefficient on physical distance is significant and negative in each model, varying with the combination of included spatial dummies. The difference between the regional partitions is somewhat surprising. The Census divisions are smaller and allow specific divisional effects to be captured, but the models indicate that the broader Census regions are more appropriate.
All of the national, regional, and state effects discussed above are visible in the different specifications. The intercept, interpreted as the national contribution to intermetropolitan correlation, is positive and highly significant. Regional and state effects are also important. Taken together, the static model suggests that the strength of the correlation between housing returns in two metropolitan areas diminishes with the dissimilarity of their underlying economies and with the physical distance between them.
This static model is restrictive in the sense that the industrial distance between each pair of cities is the average industrial distance over two decades. Clearly, holding relative industrial similarity constant over this period is unrealistic. The second model allows the underlying metropolitan economies and their relative industrial similarity to evolve over time.
11The models presented use log transformations of both industrial distance and physical distance. The relationship between these variables and the correlation of housing returns may be highly nonlinear. A limited set of other nonlinear transformations of these distances yielded no improvement in explanatory power.
Table 2.2: Static Regression Results

(t-statistic in parentheses, 8911 observations)
Model 1 2 3 4 5 6 7 8 9 10
R-squared 0.008 0.107 0.110 0.112 0.118 0.128 0.122 0.119 0.137 0.127
Intercept 0.252 1.101 1.026 0.982 0.924 0.758 0.831 0.896 0.643 0.760
(19.05) (48.12) (37.58) (34.29) (31.44) (22.74) (25.82) (29.61) (18.12) (22.58)
ln(Industrial Dist.) -0.054 -0.031 -0.032 -0.029 -0.027 -0.029 -0.030 -0.027 -0.029
( 8.35) ( 4.99) ( 5.20) ( 4.74) ( 4.48) ( 4.73) ( 4.92) ( 4.49) ( 4.82)
ln(Physical Dist.) -0.108 -0.106 -0.100 -0.092 -0.070 -0.079 -0.088 -0.054 -0.070
(32.73) (31.94) (28.47) (24.94) (16.76) (19.53) (23.18) (12.00) (16.24)
CMSA dummy 0.163 0.125 0.152 0.150
( 5.01) ( 3.84) ( 4.70) ( 4.59)
State dummy 0.164 0.154 0.131 0.094
( 9.15) ( 8.56) ( 7.32) ( 4.92)
Census Reg. dummy 0.121 0.118
(13.80) (13.36)
Census Div. dummy 0.124 0.106
(11.31) ( 9.08)

While the form of the second model, defined in equation (2.7), differs from the first,
the interpretation of the estimated coefficients is the same. If industrial similarity is
a determinant of housing price correlation, then the interaction term relating the
housing returns of two cities as a function of the industrial distance between the two
should be negative and significant. The same logic applies for the physical distance-
return interaction terms.
Table 2.3 reports the results of the estimation of the time-varying correlation model, and offers more support for the hypothesis that both physical and industrial distances influence the correlation of housing returns across metropolitan areas.
With the exception of the CMSA dummy, the results presented in Table 2.3 are similar to those presented in Table 2.2. The dynamic models explain about 24 percent of the variation in annual housing returns, approximately twice as much as the static models. Again, the coefficient on industrial distance is highly significant and stable. The regional dummies are similar in both absolute and relative magnitude: the coefficient on the state dummy is the largest, followed by those on the Census division and Census region dummies.
The coefficient on the CMSA dummy is much larger relative to the other dummies in the dynamic model than in the static model. This indicates that there is considerable co-movement of housing returns within large conurbations that is measured only after controlling for the evolution of the underlying economies.
The coefficient on industrial distance is highly significant statistically, but less so economically. In order to understand what the magnitude of the estimate implies, a simple counterfactual can be calculated. If two cities were to converge slightly in their industrial composition, that is, if the industrial distance between them were reduced by 10 percent, what would the effect be on the correlation of their aggregate housing returns? The coefficients from model 9 predict that a one-standard-deviation increase in industrial similarity, starting from the mean, would lead to a three percent increase in the correlation between their aggregate housing returns. A symmetric change in the physical proximity of the two cities has a much larger effect, increasing the correlation of the returns by 19 percent.
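Because the tables use the logarithms of the distances (footnote 11), the predicted change in correlation depends only on the proportional change in a distance: Δρ = γ · ln(D_after / D_before). A minimal sketch of this arithmetic follows. The coefficient in the usage lines is an illustrative value of the same order as the industrial-distance coefficients in Table 2.3, and the sketch does not attempt to reproduce the specific figures quoted above; `correlation_change` is a hypothetical name.

```python
import math


def correlation_change(gamma, distance_before, distance_after):
    """Predicted change in housing-return correlation when a distance that enters
    the regression in logs moves from distance_before to distance_after, all else fixed."""
    return gamma * (math.log(distance_after) - math.log(distance_before))


# Illustrative only: a 10 percent cut in industrial distance from the Panel D mean (15.35).
gamma_id = -0.013  # hypothetical coefficient of roughly the size reported in Table 2.3
delta_rho = correlation_change(gamma_id, 15.35, 0.9 * 15.35)  # about 0.0014
```

With the log specification, a 10 percent reduction has the same predicted effect whether the pair starts out industrially similar or very different, which is worth keeping in mind when translating the coefficients into counterfactuals.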
Table 2.3: Dynamic Regression Results

(t-statistic in parentheses, 152267 observations)
Model 1 2 3 4 5 6 7 8 9 10
R-squared 0.229 0.240 0.240 0.241 0.241 0.242 0.241 0.242 0.244 0.243
Intercept 0.042 0.041 0.041 0.042 0.041 0.041 0.041 0.042 0.042 0.042
(222.41) (221.37) (221.38) (222.19) (221.81) (221.61) (221.66) (222.47) (222.73) (222.60)
ln(Industrial Dist.) -0.025 -0.012 -0.014 -0.011 -0.011 -0.011 -0.013 -0.013 -0.012
( 7.87) ( 3.84) ( 4.46) ( 3.56) ( 3.59) ( 3.38) ( 4.16) ( 3.99) ( 3.90)
ln(Physical Dist.) -0.080 -0.080 -0.067 -0.066 -0.049 -0.056 -0.057 -0.026 -0.041
(46.76) (46.26) (35.81) (33.76) (21.17) (25.86) (27.66) (10.25) (17.46)
CMSA dummy 0.254 0.227 0.250 0.244
(17.28) (15.33) (16.79) (16.38)
State dummy 0.142 0.122 0.105 0.066
(15.03) (12.75) (10.99) ( 6.42)
Census Reg. dummy 0.100 0.102
(19.32) (19.48)
Census Div. dummy 0.112 0.098
(17.85) (14.44)
2.7 Conclusion
The inference that should be drawn from this research is that industrial composition matters in a particular way to the co-movement of aggregate housing returns across metropolitan areas. The relationship is consistent with the theory that aggregate industry shocks are transmitted to local economies as a function of the types of economic activity that the city undertakes.
This finding should inform an investor attempting to hedge residential housing risk in two ways. The first is that understanding portfolio risk in residential real estate requires understanding the industry risk metropolitan areas face. The second is that the use of backward-looking correlations may not be wise as the industrial composition of cities evolves. Optimal hedging will have to consider the underlying process that produces returns in housing markets.
Clearly more work is necessary to understand and control for the influence of spatial proximity. Additional research is also under way to better characterize regional business cycles and the way in which shocks are propagated across industries. It is likely that the marginal propensity to consume owner-occupied housing differs across industries, implying that the implicit weighting scheme used in calculating the industrial similarity between cities, which treats every industry alike, could be improved upon. A more relevant measure would account for asymmetries along these lines.
Advances in the treatment of these variables should only increase the reliability of the results presented in this paper. However, the findings of this paper strongly suggest that exposure to common industry shocks systematically influences outcomes in residential housing markets.
Chapter 3
Do Housing Transactions Provide Misleading Evidence About the Course of Housing Values?
This paper is coauthored with Peter Englund, Stockholm School of Economics, and John Quigley, University of California, Berkeley.
Estimates of the prices of housing and the value of its stock are derived from
observations on housing transactions. These transactions may well be a nonrandom
sample of the underlying population of dwellings. For example, it is widely thought
that smaller “starter homes” sell more frequently than more expensive properties and
that the frequency of transactions on high-valued properties varies over the business
cycle.
This paper considers the importance of these selectivity issues in making inferences about housing price trends. We estimate a model of housing price determination and of the nonrandom selection of observed transactions. We analyze the factors affecting the probabilities that transactions on different houses will be observed, and we estimate the effect of these factors upon housing prices. The analysis considers a variety of plausible selection models, using nonparametric as well as parametric methods. For each of the alternatives, the estimated effect of selectivity upon housing price calculations is substantial.
The analysis is based on a unique body of data containing observations of all house
sales in Sweden during the period 1981-1993.
3.1 Introduction
Estimates of the value of stocks of durable goods are derived from observations
on sales. Often the sales represent a small fraction of the stock, and imputations of
value may be crude. In the property market, appraisers use sales of houses or other
real property to estimate the values of other properties. Sales information is also used
to compute price indexes for the housing stock by relying upon a variety of statistical
techniques. These aggregate price measures, however, are derived from a very small
amount of information. In the U.S. single family housing market, for example, only
about seven percent of the standing stock is sold in any year. In most other countries
the fraction is even smaller. In the Swedish housing market, the source of the data
analyzed below, only about three percent of the stock turns over in a given year.
There are several mechanisms that could generate a sample of house sales out of a population of houses during any time interval. First, the observable characteristics of houses or of time periods may affect the likelihood of a dwelling being traded. Life-cycle savings behavior may suggest that young households will purchase smaller, less expensive dwellings and will “trade up” several times as circumstances permit. In this case, with a growing population, a sample of sales would include a disproportionate share of these “starter homes.”
Second, the unobservable characteristics of houses sold frequently could differ from
those sold infrequently. For example, if some defects in dwellings were difficult for
potential purchasers to uncover, then as long as the number of transactions on a
house were public information, dwellings sold more frequently would sell for less than
those sold infrequently (regardless of their underlying quality). This is a standard “lemons effect” arising from the asymmetry of information between buyer and seller (see Akerlof 1970).
Third, house sales could be a random sample from the stock of houses. People
die; they are transferred; they move to other regions. For a variety of idiosyncratic
reasons, dwellings appear on the market in any given time interval.
Little empirical evidence exists on potential selectivity. Case, Pollakowski, and
Wachter (1997) analyzed the housing characteristics and price appreciation patterns
for houses in four U.S. counties. They compared houses which sold more frequently
with those sold less frequently, finding significant differences in types of dwellings and
patterns of price change. Gatzlaff and Haurin (1997) analyzed house sales in Dade
County, Florida. Clapp, Pollakowski, and Lynford (1992) analyzed house sales in
Connecticut, and Jud and Seaks (1994) analyzed house sales in Greensboro, North
Carolina.1 These studies provide weak evidence that house sales are not a random
sample of the stock of houses. Presumably, failure to account for nonrandom selection
of houses biases statistical analyses based on samples of observed sales.
This paper extends these analyses in two ways. First, it provides a more complete analysis of the nature of nonrandomness in samples of housing transactions than has been previously reported. We present and test several models of the selection process. In the most general model, we postulate that the probability that a dwelling is sold at two points in time varies systematically with its physical characteristics and with the specific time periods themselves. We also test special cases of this model, including the hypothesis that the number of sales of any dwelling in a given time interval depends only upon the characteristics of the dwelling. We test these models using nonparametric as well as parametric methods (Ahn and Powell 1993), and we test specifically for “lemons” behavior in the housing market (Akerlof 1970).
Second, the paper provides a far more complete quantitative analysis of the effects of these forms of selectivity on housing price calculations. We accomplish this by analyzing all single-family housing transactions in Sweden during a 13-year period; the analysis is based on almost half a million transactions, including more than 100,000 repeat sales of owner-occupied dwellings. We estimate the nature and incidence of selectivity in samples of house transactions for each of the eight administrative regions in the country. We use this information to analyze the effects of sample selectivity on measures of housing prices in each of these regions.
1All of these studies deal with the selection problem within a repeat-sales framework, i.e., they compare a repeat-sales price index with an index computed from single sales. DiPasquale and Somerville (1995) analyze the selectivity of single sales compared to unsold dwellings.
We find in all cases that samples of sold dwellings are decidedly nonrandom samples of the housing markets from which they are selected. In general, the probability that any house sells depends upon the physical characteristics of the dwellings and the time period under consideration.2 We also find, with one important exception,
that this selectivity has substantial effects upon estimates of housing prices. In seven
of the eight regions in Sweden, selectivity-corrected price indexes show smaller price
increases over the 13-year period investigated. The differences are reasonably large
and are consistent across various selection models, suggesting that, over this period,
the price appreciation of sold houses was 5 to 11 percent larger than the unrealized
capital gains on elements in the larger stock of unsold dwellings.
We find essentially no evidence that the unobserved characteristics of dwellings
affect housing prices after controlling for those observable characteristics which influence the frequency of sale. Apparently, the transactions costs of buying and selling
are large enough, relative to the cost of repairing defects, to prevent disappointed
purchasers from disposing of lemons.
Section 3.2 presents a simple model of housing sales and selectivity. Section 3.3
outlines the estimation procedure. Section 3.4 describes the data utilized. Section 3.5 presents empirical estimates of the model and reports their implications for the
estimation of aggregate housing prices. Section 3.6 is a brief conclusion; an appendix
provides more detail on the sample selectivity issue.
2Of course, to the extent that sale probabilities also vary by region, the empirical models reflect
this as well. There is some evidence that sale probabilities vary by neighborhood (Jud and Seaks
1994). We do not measure neighborhood directly, but in several of the models reported below we
do control for unobserved characteristics of dwellings (e.g., the neighborhoods in which they are
located) in a perfectly general way.
3.2 Repeat Sales and Sample Selectivity

An accurate measure of aggregate housing prices must account for the physical diversity within the housing stock. We utilize a method which controls for this heterogeneity by comparing the observed sales price of the same unit at two points in time (see Bailey, Muth, and Nourse (1963)). With quality held constant, changes in price are attributed solely to the effect of time. However, limiting the sample to dwellings that sell two or more times greatly reduces the fraction of the stock represented in the data. For reasons noted above, this may leave the resulting estimates of the aggregate price index particularly susceptible to sample selection bias.
To analyze this, let $i$ and $t$ index dwellings and time periods, respectively. Define
$P_{it}$ as the logarithm of house value (i.e., selling price), $X_{it}$ as the set of relevant
characteristics of the physical structure, including location, $D_{it}$ as a set of dummy
variables with a value of one for the time period of sale (and zero otherwise), and $\varepsilon_{it}$
as a well-behaved error term. Then we may express the price as
$$P_{it} = X_{it}\beta + D_{it}\delta + \varepsilon_{it} \tag{3.1}$$
where $\beta$ and $\delta$ represent vectors of hedonic coefficients. The price difference between
two sales of the same unit at times $t$ and $\tau$ is
$$P_{it} - P_{i\tau} = (X_{it} - X_{i\tau})\beta + (D_{it} - D_{i\tau})\delta + \varepsilon_{it} - \varepsilon_{i\tau}. \tag{3.2}$$
If the set of physical characteristics remains unchanged over time, i.e., $X_{it} = X_{i\tau}$,
then equation (3.2) simplifies to
$$P_{it} - P_{i\tau} = D_{it\tau}\delta + v_{it\tau}, \tag{3.3}$$
where the $s$th element of $D_{it\tau}$ is
$$D_{it\tau,s} = \begin{cases} 1 & \text{if } s = t \\ -1 & \text{if } s = \tau \\ 0 & \text{otherwise} \end{cases} \tag{3.4}$$
and
$$v_{it\tau} = \varepsilon_{it} - \varepsilon_{i\tau}. \tag{3.5}$$
Estimates of the effect of time are obtained by regressing the difference in log
price on $D_{it\tau}$. Because the characteristics of the dwelling unit are identical at the
time of the two sales, quality is held constant. Because it requires only the transaction prices
and dates for two sales of the same unit, the model is a parsimonious means of
obtaining estimates of the course of aggregate housing prices.
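Estimating equation (3.3) amounts to a least-squares regression of log price changes on a matrix whose rows contain +1 in the period of the later sale, -1 in the period of the earlier sale, and 0 elsewhere. The Python sketch below illustrates that construction on hypothetical, noise-free pairs. The function name `repeat_sales_index`, the 0-based period coding, and the choice of period 0 as the normalized base are assumptions made for illustration; this is not the code used to produce the estimates in this chapter.

```python
import numpy as np


def repeat_sales_index(first, second, dlogp, n_periods):
    """Bailey-Muth-Nourse repeat-sales estimator, eqs. (3.3)-(3.4).

    first, second : 0-based periods of the earlier (tau) and later (t) sale
                    of each pair, with first < second.
    dlogp         : log price change P_it - P_itau for each pair.
    Returns estimated log price levels delta_s, base period 0 fixed at zero.
    """
    first = np.asarray(first)
    second = np.asarray(second)
    n = len(dlogp)
    D = np.zeros((n, n_periods))
    rows = np.arange(n)
    D[rows, second] = 1.0    # +1 in the period of the later sale (s = t)
    D[rows, first] = -1.0    # -1 in the period of the earlier sale (s = tau)
    X = D[:, 1:]             # drop the base-period column for identification
    coef, *_ = np.linalg.lstsq(X, np.asarray(dlogp, dtype=float), rcond=None)
    return np.concatenate([[0.0], coef])


# Hypothetical check: noise-free pairs generated from a known log index.
true_delta = np.array([0.0, 0.05, 0.12, 0.10])
first = np.array([0, 0, 0, 1, 1, 2])
second = np.array([1, 2, 3, 2, 3, 3])
dlogp = true_delta[second] - true_delta[first]
est = repeat_sales_index(first, second, dlogp, n_periods=4)
price_index = 100.0 * np.exp(est)   # index equal to 100 in the base period
```

With noise-free data the regression recovers the generating log index exactly; exponentiating the estimates gives an index normalized to 100 in the base period.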
Following Gatzlaff and Haurin (1997), a sale is observed as the result of two price-generating
processes. Let $P^o_{it}$ be the log offer price made in period $t$ by a potential
buyer of unit $i$, and $P^r_{it}$ be the log reservation price held by the owner and potential seller
of unit $i$. These prices can be described by
$$P^o_{it} = P_{it} + \varepsilon^o_{it} \tag{3.6}$$
and
$$P^r_{it} = P_{it} + \varepsilon^r_{it}. \tag{3.7}$$
Offer prices reflect buyers’ preferences, reservation prices, and perceptions of market
conditions. Reservation prices reflect sellers’ costs of waiting as well. We assume
the errors in equations (3.6) and (3.7) are well-behaved.
A sale occurs when the price offered by a potential buyer, $P^o_{it}$, is at least as large as
the reservation price held by the potential seller, $P^r_{it}$. Because data are generated only
for sold dwellings, the expected transaction price of an observed sale, the expectation
of equation (3.1), is
$$E(P_{it}) = X_{it}\beta + D_{it}\delta + E(\varepsilon_{it} \mid P^o_{it} \ge P^r_{it}), \tag{3.8}$$
where $\beta$ and $\delta$ are the hedonic coefficient vectors. Estimates of $\beta$ and $\delta$ are subject
to bias if sample selection is not random. In the repeat sales model, equation (3.3),
an observation is generated only if two sales of the same unit occur, that is, only if
$$P^o_{it} \ge P^r_{it} \quad \text{and} \quad P^o_{i\tau} \ge P^r_{i\tau}. \tag{3.9}$$
The expected difference in log price for the sample of observed sales is
$$E(P_{it} - P_{i\tau}) = D_{it\tau}\delta + E(v_{it\tau} \mid P^o_{it} \ge P^r_{it},\ P^o_{i\tau} \ge P^r_{i\tau}). \tag{3.10}$$
As in the single-period model, the estimated coefficients in equation (3.10) are subject
to bias if the conditional expectation of the error term is nonzero.
As shown by Heckman (1979), consistent estimates of the coefficients in equation
(3.8) or (3.10) may be obtained by modeling the process that selects dwellings into
the set of observed sales. Heckman shows that including the inverse Mills’
ratio, derived from the selection process, in the subsequent regression yields consistent
estimates of the parameters, despite nonrandom sample selection. The selectivity-corrected
repeat sales model associated with equation (3.10) is
$$P_{it} - P_{i\tau} = D_{it\tau}\delta + \psi\lambda_{it\tau} + \omega_{it\tau}, \tag{3.11}$$
where $\lambda_{it\tau}$ is the inverse Mills’ ratio associated with an observation of paired sales
at times $t$ and $\tau$, and $\omega_{it\tau}$ is a well-behaved error term. Thus, consistent estimates of
aggregate price movements in the stock of housing may be based on the nonrandom
sample of dwellings sold two or more times during a time interval.
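To make concrete why a nonzero conditional expectation in equation (3.10) biases least squares, and how a term of the form ψλ in equation (3.11) removes that bias, the following Python simulation gives a minimal, generic Heckman-style illustration rather than the chapter's repeat-sales specification. It assumes a single outcome regressor, a selection index that is known rather than estimated by a first-stage probit, an extra selection shifter that does not enter the outcome, and jointly normal errors with correlation ρ. All names and parameter values are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z): mean of eta given eta > -z, eta ~ N(0, 1)."""
    z = np.asarray(z, dtype=float)
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return pdf / cdf


def simulate_selection(n=200_000, rho=0.8, seed=0):
    """Naive OLS versus a Heckman-style second step on simulated data.

    Outcome:   y = 1 + 2 x + eps                (true slope 2.0)
    Selection: observed only if z + eta > 0, with z = 0.3 + w + x
    corr(eps, eta) = rho, so E[eps | selected] = rho * lambda(z).
    The index z is treated as known; in practice it comes from a probit.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)      # regressor in the outcome equation
    w = rng.standard_normal(n)      # shifter entering selection only
    eta = rng.standard_normal(n)    # selection error
    eps = rho * eta + math.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    z = 0.3 + w + x
    sel = z + eta > 0               # the "sale" is observed only if selected
    y = 1.0 + 2.0 * x + eps

    X = np.column_stack([np.ones(sel.sum()), x[sel]])
    b_naive = np.linalg.lstsq(X, y[sel], rcond=None)[0]
    X_corr = np.column_stack([X, inverse_mills(z[sel])])  # adds the psi * lambda term
    b_corr = np.linalg.lstsq(X_corr, y[sel], rcond=None)[0]
    return b_naive, b_corr


b_naive, b_corr = simulate_selection()
```

With these settings the naive slope falls noticeably below 2.0, while the corrected regression returns a slope close to 2.0 and a coefficient on λ close to ρ; exact values depend on the random seed.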
3.3 The Estimation Procedure

As indicated above, a house sale is observed in period $t$ if and only if the price
offered by a potential buyer exceeds the reservation price of the current owner. Let
$S_{it\tau}$ equal one if the $i$th dwelling is sold in period $t$ and also in period $\tau$. In general, the
probability that $S_{it\tau}$ equals one depends both on the specific time periods involved
(e.g., because mobility varies) and on house characteristics (e.g., because smaller
houses, “starter homes,” are easier to sell). This may be expressed as
$$\text{prob}(S_{it\tau} = 1) = \text{prob}(f(Z_i, t, \tau) + \eta_{it\tau} > 0), \tag{3.12}$$
where $Z_i$ is some set of physical characteristics, and the composite error term $\eta_{it\tau}$
includes any idiosyncratic characteristics of the sellers and prospective buyers of
dwelling $i$ at $t$ and $\tau$.
Equation (3.12) may be estimated as a probit and the inverse Mills’ ratio $\lambda_{it\tau}$
computed directly for inclusion in equation (3.11).3 In this formulation, the probability
of sale of a dwelling in two specific periods is a function of the specific time
periods involved and some set of physical characteristics, $Z$.
3The inverse Mills’ ratio is defined as $\lambda_{it\tau} = \phi(f(Z_i, t, \tau)) / \Phi(f(Z_i, t, \tau))$, where $\phi$ is the standard normal density
function and $\Phi$ is the cumulative normal distribution function.
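The first step described around equation (3.12) and footnote 3 can be sketched as a probit fitted by maximum likelihood, followed by evaluating φ/Φ at the fitted index for the observed sales. The Python sketch below uses Fisher scoring with only NumPy and the standard library; the single simulated attribute, the linear index, and the function names are assumptions for illustration. In applied work a packaged probit routine would normally be used instead.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def fit_probit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood probit by Fisher scoring.

    X : (n, k) design matrix, including a constant column.
    y : (n,) array of 0/1 outcomes (e.g., sold / not sold).
    """
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ b
        cdf = np.clip(norm_cdf(xb), 1e-12, 1.0 - 1e-12)
        pdf = norm_pdf(xb)
        # Score of sum[y log Phi + (1 - y) log(1 - Phi)]
        score = X.T @ (pdf * (y - cdf) / (cdf * (1.0 - cdf)))
        # Expected information: sum of phi^2 / [Phi (1 - Phi)] x x'
        weights = pdf**2 / (cdf * (1.0 - cdf))
        info = X.T @ (X * weights[:, None])
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def inverse_mills(index):
    """lambda = phi(index) / Phi(index) at the fitted probit index."""
    return norm_pdf(index) / norm_cdf(index)


# Hypothetical data: one dwelling attribute and a latent sale propensity.
rng = np.random.default_rng(1)
n = 50_000
z_attr = rng.standard_normal(n)
X = np.column_stack([np.ones(n), z_attr])
sold = (0.5 + 1.0 * z_attr + rng.standard_normal(n) > 0).astype(float)
b_hat = fit_probit(X, sold)
lam = inverse_mills(X[sold == 1] @ b_hat)   # regressor for the second step
```

The resulting `lam` values, one per observed sale, are what a second-stage regression of the form (3.11) would take as the extra regressor.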
Simpler special cases may be more plausible. For example, suppose the sale of a
particular house at $t$ is independent4 of its sale at $\tau$, i.e.,
$$\text{prob}(S_{it\tau} = 1) = \text{prob}(S^*_{it} = 1) \cdot \text{prob}(S^*_{i\tau} = 1), \tag{3.13}$$
where
$$\text{prob}(S^*_{it} = 1) = \text{prob}(g(Z_i, t) + \eta_{it} > 0). \tag{3.14}$$
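Under the independence assumption in equations (3.13) and (3.14), the probability that a dwelling sells in both half-years of a pair is simply the product of two single-period probit probabilities. The minimal Python sketch below assumes the fitted indices g(Z_i, t) are already available as numbers; the function names are hypothetical. The second function further assumes that the independence of (3.13) is retained in the time-invariant case of (3.15) and (3.16), so that the same index enters both periods.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pair_sale_prob(g_t, g_tau):
    """prob(S_it_tau = 1) under independence, eqs. (3.13)-(3.14).

    g_t, g_tau : fitted probit indices g(Z_i, t) and g(Z_i, tau) for one dwelling.
    """
    return norm_cdf(g_t) * norm_cdf(g_tau)


def pair_sale_prob_time_invariant(h):
    """Same index h(Z_i) in every period: the pair probability is Phi(h) squared."""
    return norm_cdf(h) ** 2


p_even = pair_sale_prob(0.0, 0.0)   # each single-period probability is 0.5
```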
Alternatively, and still more restrictively, suppose the probability of sale is a
function only of the characteristics of the dwelling itself, i.e.,
$$\text{prob}(S^*_{it} = 1) = \text{prob}(S^{**}_i = 1), \tag{3.15}$$
where
$$\text{prob}(S^{**}_i = 1) = \text{prob}(h(Z_i) + \eta^{**}_i > 0). \tag{3.16}$$
This special case may reflect the belief that “starter homes” are equally likely to
sell in any time period and are more likely to sell than larger and more expensive
properties (see Case, Pollakowski, and Wachter (1997) for a discussion). Note that,
in these selectivity models, the probability of sale is a function of characteristics
observable to buyers and sellers. See footnote 9, below, for evidence from models
where we postulate that unobservables also affect the probability of sale and the
selling price.
3.4 The Data

The data used in this analysis consist of all sales of owner-occupied housing in
Sweden during the period from January 1, 1981 through August 28, 1993. Contract
data reporting the transaction price for each sale have been merged with tax assessment
records containing detailed information about the characteristics of each house.
The merged data set contains 462,749 observations on sales from 393,908 separate
dwellings in eight administrative regions. Figure 3.4 indicates the regional character
of the data. The largest conurbations are located in Region I (Stockholm), Region
V (Gothenburg), and Region IV (Malmo). Time is recorded in 26 half-year intervals.
The data set is exceptional in its detailed description of each dwelling at the date of
sale and its identification of repeat sales. These data are described in more detail in
Englund, Quigley, and Redfearn (1998).
The selection process is estimated from observations on the attributes of each
dwelling and a set of dummy variables indicating two half-years of potential sale.
The dependent variable in the most general selection model ($S_{it\tau}$, from equation
(3.12)) has a value of one if the dwelling was sold in both half-years indicated by
the dummy variables. Each dwelling is observed in 325 (= 26 × 25/2) pairs of half-year
periods. Dwellings are observed to sell up to eight times during the period.
Apart from the characteristics of the dwelling, one additional independent variable is
included in the analysis: gross migration, the total number of in- and out-migrants,
measured separately by region and half-year interval.
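The pair-level data behind the most general selection model can be sketched as follows: each dwelling is expanded into 325 rows, one for every pair of the 26 half-years, and the outcome is one when the dwelling sold in both half-years of the pair. This Python sketch assumes 0-based period indices and, reading the text literally, treats every pair of sale dates (not only consecutive ones) as a positive outcome; the gross-migration regressors are omitted, and the function name is hypothetical.

```python
from itertools import combinations

N_PERIODS = 26  # half-year intervals, 1981:1 through 1993:2


def pair_rows(sale_periods, n_periods=N_PERIODS):
    """Expand one dwelling into (tau, t, S) rows for the pair-level probit.

    sale_periods : 0-based half-year indices in which the dwelling sold.
    Every unordered pair tau < t gets a row; S = 1 only if the dwelling
    sold in both half-years of the pair, as for S_it_tau in eq. (3.12).
    """
    sold = set(sale_periods)
    return [(tau, t, int(tau in sold and t in sold))
            for tau, t in combinations(range(n_periods), 2)]


rows = pair_rows([3, 10, 20])           # hypothetical dwelling sold three times
n_positive = sum(s for _, _, s in rows)  # pairs (3, 10), (3, 20), (10, 20)
```

A dwelling sold k times thus contributes k(k - 1)/2 positive rows under this reading, which is one reason the choice-based sampling described in footnote 5 keeps every positive outcome.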
Table 3.1 indicates the extent to which reliance on unchanged repeat sales
limits the size of the available sample. The large majority of dwellings were sold only
once during the 13-year sample period. The tail of the distribution of sales is long but
thin: note that 334,007 dwellings sold once between 1981 and 1993, but only 52,097
dwellings with unchanged characteristics were exchanged twice. Only 7,804 dwellings
with unchanged characteristics sold three or more times. Note that we restrict the
sample to transactions without changes in physical characteristics between sales. See
Englund, Quigley, and Redfearn (1999) for an indication of the importance of this
restriction (which typically cannot be made in repeat-sales studies).
It is quite clear that estimation of house price indices using repeat sales (i.e.,
equation (3.3)) utilizes data covering only a small fraction of sold homes, and an even
smaller fraction of the entire stock. The sample of dwellings sold during the entire
sample period represents only about 25 percent of the stock of single-family houses in
Sweden. This sample shrinks when restricted to unchanged repeat sales, accounting
for only five percent of the housing stock.
4See Gatzlaff and Haurin (1997) for a discussion.
Table 3.1: Frequency of Sales
Number                             Region                               Total number     Total number
of sales       I      II     III      IV       V      VI     VII    VIII  of dwellings  of transactions
Dwellings sold only once during sample period:
1         47,100  59,170  34,013  54,806  67,014  38,440  14,455  19,009       334,007          334,007
Dwellings sold two or more times and unchanged between sales, by number of sales:
2          6,766   9,576   5,197   9,449  10,034   5,670   2,285   3,120        52,097          104,194
3            811   1,301     697   1,238   1,317     760     304     438         6,866           20,598
4            112     154      73     120     170      73      39      52           793            3,172
5             14      34       5      15      18       2       7      11           106              530
6              2      12       1       3       1       1       2       5            27              162
7              1       6       0       0       1       0       1       1            10               70
8              0       1       0       0       0       0       1       0             2               16
Total dwellings sold two or more times and unchanged between sales:
           7,706  11,084   5,973  10,825  11,541   6,506   2,639   3,627        59,901          128,742
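The totals in Table 3.1 can be reproduced from the distribution of dwellings by number of sales, since each dwelling sold k times contributes k transactions. A short Python check using only the all-region column of the table:

```python
# Dwellings by number of recorded sales, all eight regions (Table 3.1).
dwellings_by_sales = {1: 334_007, 2: 52_097, 3: 6_866, 4: 793,
                      5: 106, 6: 27, 7: 10, 8: 2}

total_dwellings = sum(dwellings_by_sales.values())
total_transactions = sum(k * n for k, n in dwellings_by_sales.items())
repeat_dwellings = total_dwellings - dwellings_by_sales[1]
three_or_more = sum(n for k, n in dwellings_by_sales.items() if k >= 3)
repeat_share_of_sold = repeat_dwellings / total_dwellings

print(f"dwellings={total_dwellings:,} transactions={total_transactions:,}")
print(f"repeat-sale dwellings={repeat_dwellings:,} "
      f"({repeat_share_of_sold:.1%} of sold dwellings)")
print(f"sold three or more times={three_or_more:,}")
```

The sums match the 393,908 dwellings and 462,749 transactions given in the data description, as well as the 59,901 repeat-sale dwellings and 7,804 dwellings sold three or more times; repeat-sale dwellings are about 15 percent of the dwellings that sold at least once.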

Tables 3.2 and 3.3 report averages of selected housing attributes as a function of
the frequency of sale of dwellings in each of the eight regions.
The sample is divided into single and repeat sales, and then further restricted to
dwellings that sold three or more times. The pattern is clear: newer, smaller, lower
quality, and lower priced houses sell more frequently. Repeat-sale dwellings are also
more likely to be close to the center of the local labor market, and are less likely to be
detached units. The tables provide support for the notion that lower priced dwellings
sell more often, but they also suggest that the population of repeat sales may not be
representative of the larger stock of dwellings.
3.5 Sample Selectivity and House Prices

The most general form of the probit selection model, equation (3.12), is, assuming
linearity,
$$\text{prob}(S_{it\tau} = 1) = \text{prob}(\alpha Z_i + \gamma T + \theta_t M_t + \theta_\tau M_\tau + \eta_{it\tau} > 0), \tag{3.17}$$
where $Z_i$ is a vector including 11 characteristics of dwelling $i$, $M_j$ represents gross
migration at time $j$, and $T$ is a vector of 26 variables measuring time periods, with a
value of one for each of the two periods in which a sale is recorded, and zero otherwise.
The symbols $\alpha$, $\gamma$, and $\theta$ represent estimated coefficients, and $\eta_{it\tau}$ is a composite error
term, assumed normally distributed. Each dwelling is observed 325 times, with the
periods in which sales occur noted in the vector $T$.5
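Footnote 5 states that the pair-level probits use choice-based samples (all double sales plus roughly five percent of the non-sale combinations) weighted following Manski and McFadden (1981). One common form of such weighting assigns each outcome the ratio of its population share to its sample share. The Python sketch below computes those weights under that assumption; the counts are hypothetical, and the exact weighting scheme behind the reported estimates is not spelled out in the text.

```python
def wesml_weights(n_pos_pop, n_neg_pop, neg_sampling_rate):
    """Choice-based sampling weights: population share / sample share.

    Assumes every positive outcome (double sale) is kept and a random
    fraction neg_sampling_rate of negative outcomes is retained.
    """
    n_neg_sample = neg_sampling_rate * n_neg_pop
    n_pop = n_pos_pop + n_neg_pop
    n_sample = n_pos_pop + n_neg_sample
    w_pos = (n_pos_pop / n_pop) / (n_pos_pop / n_sample)
    w_neg = (n_neg_pop / n_pop) / (n_neg_sample / n_sample)
    return w_pos, w_neg


# Hypothetical counts: 20,000 double sales among 1,000,000 dwelling-pair cells.
w_pos, w_neg = wesml_weights(20_000, 980_000, 0.05)
```

With a five percent sampling rate for non-sales, each retained non-sale carries twenty times the weight of a double sale, and the weights sum to the sample size. In a weighted probit, each observation's log-likelihood contribution would be multiplied by its weight.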
The more restrictive models of selectivity, equations (3.14) and (3.16), are
$$\text{prob}(S^*_{it} = 1) = \text{prob}(\alpha Z_i + \gamma T^* + \theta M_t + \eta_{it} > 0), \tag{3.18}$$
5The probit models in equation (3.17) are estimated using choice-based samples drawn from the
325 alternative ways in which two sales can be consummated in 26 half-year periods. The samples
include observations on all double sales and a random sample of approximately five percent of
the alternative time-period combinations in which repeat sales did not occur. These choice-based
samples are weighted according to the technique suggested by Manski and McFadden (1981). The
sample sizes for the probit results range from 486,000 in Region VII to 2,219,000 in Region V.
[Table 3.2: Average Characteristics of Dwellings as a Function of Sales Frequency: Regions I-IV. The table body is illegible in the source reproduction.]
Table 3.3: Average Characteristics of Dwellings as a Function of Sales Frequency: Regions V-VIII
(Standard Deviations in parentheses)
Region                        V                          VI                         VII                        VIII
Times sold               1       2+       3+        1       2+       3+        1       2+       3+        1       2+       3+
Price               506.68   478.75   450.10   383.73   384.57   371.59   376.13   383.35   375.04   405.69   402.70   387.52
(000s SEK)
Year built           55.25    57.27    58.00    52.22    53.69    53.66    51.24    54.01    54.87    55.80    57.89    58.18
(19xx)              (23.7)   (22.7)   (22.4)   (24.8)   (24.6)   (24.4)   (25.2)   (24.4)   (23.6)   (24.8)   (23.7)   (23.3)
Interior size       119.03   116.93   114.24   116.17   116.64   114.26   116.77   116.10   113.11   117.74   117.83   115.59
(square meters)     (38.3)   (36.8)   (38.3)   (37.9)   (37.5)   (37.0)   (38.6)   (36.1)   (33.1)   (36.0)   (35.1)   (31.6)
Parcel size        1140.73  1011.04   952.31  1405.92  1271.49  1212.83  1469.36  1214.25  1120.85  1280.36  1082.97   974.13
(square meters)   (1145.4) (1041.4) (1016.1) (1240.6) (1144.5) (1126.2) (1309.0) (1088.9) (1068.3) (1177.3)  (959.9)  (765.7)
Two car garage       0.064    0.052    0.045    0.076    0.068    0.056    0.061    0.048    0.042    0.106    0.094    0.079
(fraction)          (0.24)   (0.22)   (0.21)   (0.26)   (0.25)   (0.23)   (0.24)   (0.21)   (0.20)   (0.31)   (0.29)   (0.27)
Sauna                0.180    0.171    0.163    0.186    0.191    0.182    0.207    0.203    0.172    0.377    0.405    0.397
(fraction)          (0.38)   (0.38)   (0.37)   (0.39)   (0.39)   (0.39)   (0.41)   (0.40)   (0.38)   (0.49)   (0.49)   (0.49)
Detached House       0.808    0.743    0.698    0.897    0.853    0.829    0.874    0.816    0.784    0.885    0.838    0.791
(fraction)          (0.39)   (0.44)   (0.46)   (0.30)   (0.35)   (0.38)   (0.33)   (0.39)   (0.41)   (0.32)   (0.37)   (0.41)
Stone/Brick ext.     0.296    0.275    0.253    0.248    0.234    0.210    0.165    0.163    0.156    0.190    0.189    0.171
(fraction)          (0.46)   (0.45)   (0.44)   (0.43)   (0.42)   (0.41)   (0.37)   (0.37)   (0.36)   (0.39)   (0.39)   (0.38)
Fireplace            0.354    0.313    0.276    0.386    0.366    0.355    0.357    0.326    0.297    0.295    0.275    0.263
(fraction)          (0.48)   (0.46)   (0.45)   (0.49)   (0.48)   (0.48)   (0.48)   (0.47)   (0.46)   (0.46)   (0.45)   (0.44)
Winter walls         0.201    0.185    0.172    0.189    0.192    0.186    0.210    0.212    0.198    0.286    0.291    0.291
(fraction)          (0.40)   (0.39)   (0.38)   (0.39)   (0.39)   (0.39)   (0.41)   (0.41)   (0.40)   (0.45)   (0.45)   (0.45)
Good kitchen         0.259    0.225    0.213    0.307    0.271    0.266    0.285    0.237    0.220    0.236    0.189    0.185
(fraction)          (0.44)   (0.42)   (0.41)   (0.46)   (0.45)   (0.44)   (0.45)   (0.43)   (0.42)   (0.43)   (0.39)   (0.39)
Good/ex. roof        0.798    0.746    0.702    0.786    0.764    0.745    0.609    0.595    0.583    0.501    0.431    0.375
(fraction)          (0.40)   (0.44)   (0.46)   (0.41)   (0.43)   (0.44)   (0.49)   (0.49)   (0.49)   (0.50)   (0.50)   (0.48)
Dist. to center      5.947    5.719    5.625    5.859    5.330    5.098   11.629   10.399    9.918    7.243    5.834    5.355
(kilometers)        (5.85)   (5.76)   (5.64)   (9.14)   (8.07)   (7.62)  (14.47)  (13.48)  (12.83)  (14.10)  (12.20)  (11.18)
and
$$\text{prob}(S^{**}_i = 1) = \text{prob}(\alpha Z_i + \eta^{**}_i > 0). \tag{3.19}$$
In equation (3.18), $T^*$ is a vector of 26 time variables, with a value of one in the
time period in which a sale is recorded.6
Table 3.4 summarizes the estimates of the time-invariant probit selection model,
equation (3.19).
By and large, the probit results confirm the patterns noted in Tables 3.2 and
3.3. Smaller dwellings with fewer amenities are more likely to trade. In general,
the results are not sensitive to the choice of model, even though the estimated $\alpha$
coefficients are generally less significant when time dummies are included. Further,
the gross migration variables are only marginally significant when the model includes
dummy variables for time, as in equations (3.17) and (3.18). The results indicate that
the probability of sale, ceteris paribus, dropped sharply after 1991.
Table 3.5 summarizes the implications of these models of selectivity for the estimates
of housing prices. The table reports the coefficient of the inverse Mills’ ratio
in the equation estimating the selectivity-corrected price index (i.e., the coefficient $\psi$
in equation (3.11)). It also summarizes the difference between prices computed from
the uncorrected estimator, equation (3.3), and the selectivity-corrected estimator,
equation (3.11). These results are reported for each of the three selectivity models,
equations (3.17), (3.18), and (3.19).
The coefficients of the inverse Mills’ ratio based upon these selection models are
large and highly significant in the estimation of the price index, at least for all
regions outside Stockholm. This indicates that sample selectivity “matters” in the
computation of the appropriate housing price index.7 Outside Stockholm, the inverse Mills’ ratio is
significantly positive and important for all three formulations of the selection model.8
6The probit models in equation (3.18) are estimated using samples which include all sales during
the period, but without distinguishing multiple sales of any property. The dependent variable for
these analyses is the sale or nonsale of each dwelling in each time period.
7The selection specification in equation (3.18) seems more plausible to us, but not to some others
who have read preliminary versions of this paper.
8Strictly interpreted, the standard selection-correction method which underlies the results reported
in Table 3.5 (and Appendix Table A1) requires that the errors in equations (3.10) and (3.12)
be jointly normally distributed. A test of the restriction was made using a nonparametric technique
Table 3.4: Estimated Coefficients from Time-Invariant Probit Selection Model
(t-statistics in parentheses)
Region                          I      II     III      IV       V      VI     VII    VIII
Intercept                  -1.271  -1.412  -1.275  -1.413  -1.246  -1.196  -0.980  -1.204
                          (13.43) (16.78) (10.17) (15.93) (17.38) (10.07)  (5.84)  (8.07)
Interior size*             -0.027  -0.010  -0.009  -0.014  -0.016  -0.002  -0.004  -0.006
(square meters)            (4.29)  (1.92)  (1.32)  (2.73)  (3.30)  (0.39)  (0.43)  (0.61)
Parcel size*               -0.067  -0.051  -0.086  -0.043  -0.091  -0.118  -0.167  -0.095
(square meters)            (2.27)  (2.07)  (2.37)  (1.64)  (4.25)  (3.49)  (3.44)  (2.20)
Square of parcel size*      0.004   0.003   0.005   0.002   0.006   0.007   0.010   0.005
(square meters)            (1.66)  (1.64)  (1.98)  (1.29)  (3.74)  (3.08)  (2.80)  (1.57)
Tiled bathroom              0.021   0.005   0.005   0.016   0.004   0.003   0.005  -0.001
(1 = yes)                  (3.87)  (0.86)  (0.74)  (3.50)  (0.86)  (0.49)  (0.44)  (0.05)
Sauna                       0.013   0.009   0.009   0.004   0.006   0.007   0.000   0.012
(1 = yes)                  (2.94)  (2.35)  (1.60)  (0.77)  (1.61)  (1.29)  (0.00)  (2.01)
Single detached house      -0.017  -0.013  -0.007  -0.018  -0.012  -0.017   0.006  -0.008
(1 = yes)                  (2.90)  (2.38)  (0.92)  (2.98)  (2.24)  (2.17)  (0.50)  (0.76)
Laundry room               -0.005   0.003   0.004   0.000   0.005   0.003   0.005  -0.007
(1 = yes)                  (0.94)  (0.60)  (0.64)  (0.11)  (1.20)  (0.62)  (0.59)  (0.94)
“Winter” walls/windows      0.015   0.015   0.011   0.013   0.010   0.010   0.016   0.010
(1 = yes)                  (3.21)  (3.79)  (2.06)  (3.08)  (2.70)  (1.86)  (2.00)  (1.69)
Electric furnace            0.009   0.003  -0.001   0.007   0.003   0.003  -0.001  -0.012
(1 = yes)                  (1.57)  (0.56)  (0.07)  (1.21)  (0.55)  (0.49)  (0.07)  (1.73)
Slate/copper roof           0.001  -0.003  -0.005  -0.006  -0.010   0.004   0.012   0.001
(1 = yes)                  (0.14)  (0.77)  (0.89)  (1.75)  (2.66)  (0.88)  (1.85)  (0.26)
Distance to City Center     0.000   0.000  -0.001   0.001   0.000   0.000   0.000   0.000
(Kilometers)               (1.30)  (0.54)  (2.15)  (2.88)  (0.22)  (0.37)  (0.02)  (1.83)
Note: * Variable measured in logarithms.
Table 3.5: Implications of Alternate Models of Sample Selectivity on House Price Estimates
A. Estimated Coefficient of the Inverse Mills Ratio in Price Index Equation:
(t-ratio in parentheses)
Region                 I      II     III      IV       V      VI     VII    VIII
Equation (3.17)   -0.151   1.124   2.587   2.496   1.680   2.084   1.460   1.693
                  (0.57)  (5.39)  (6.72)  (9.12)  (7.04)  (7.18)  (3.01)  (4.64)
Equation (3.18)   -0.037   2.258   3.991   4.901   3.092   3.920   3.294   3.782
                  (0.08)  (6.61)  (7.20) (10.64)  (8.09)  (7.73)  (3.74)  (5.35)
Equation (3.19)    0.017   1.577   2.568   2.907   1.895   2.624   2.229   2.206
                  (0.06)  (7.69)  (8.01) (10.94)  (8.70)  (8.44)  (4.42)  (5.48)
B. Average deviation between biased and selectivity-corrected price index:
(in percentage points; 1981:1 = 100)
Equation (3.17)    -0.33    2.07    4.19    4.60    3.26    3.83    2.54    3.24
Equation (3.18)    -0.07    3.83    5.71    8.39    5.45    6.21    5.09    6.40
Equation (3.19)     0.07    5.50    7.92   10.51    7.17    8.62    7.23    7.99
C. Maximum deviation between biased and selectivity-corrected price index:
(in percentage points; 1981:1 = 100)
Equation (3.17)     0.68    4.26   10.10   10.25    7.05    7.97    5.36    6.53
Equation (3.18)     0.16    8.26   12.91   19.93   12.00   13.83   11.78   15.05
Equation (3.19)     0.14   11.91   18.58   24.52   15.01   19.97   16.80   17.54
Figure 3.1: Effects of Selectivity upon House Price Indexes - Stockholm
Panels B and C of Table 3.5 summarize the extent of the differences between the
biased estimates of housing prices and the selectivity-corrected estimates. In this
comparison, we normalize the indexes at 100 at the beginning of the period, 1981:1,
for each region, and compare the subsequent estimates. We report this comparison for
each of the three selection models. The average discrepancy between the uncorrected
and corrected price indexes is negligible in Stockholm (Region I) but quite large in all
other regions, ranging from two to eleven percentage points depending on model and
region. Figure 3.1 is based upon equation (3.18); it presents the biased and selectivity-corrected
estimates of housing prices for Stockholm, Gothenburg and Malmo, Sweden’s three
largest metropolitan regions, during the period 1987-93. As the figures illustrate, the
selectivity correction adds little in Stockholm, but the selectivity-corrected measures
of housing prices are quite a bit lower in the other two metropolitan areas. The
differences peak towards the end of the period, when the selectivity-corrected indexes
are 2-5 percent below the uncorrected indexes.9
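Panels B and C of Table 3.5 compare indexes normalized to 100 in 1981:1. The Python sketch below shows one way to compute such summaries from two sets of estimated log price levels. Because Panel B contains negative entries for Stockholm while the corresponding Panel C entries are positive, it assumes the average is a signed mean and the maximum is taken in absolute value; it also assumes the base period is excluded from the average. The chapter does not spell out these definitions, and the input numbers are hypothetical.

```python
import numpy as np


def to_index(log_levels):
    """Turn estimated log price levels into an index equal to 100 in the first period."""
    d = np.asarray(log_levels, dtype=float)
    return 100.0 * np.exp(d - d[0])


def deviation_summary(log_uncorrected, log_corrected):
    """Signed mean and maximum absolute gap, in index points, after the base period."""
    gap = to_index(log_uncorrected)[1:] - to_index(log_corrected)[1:]
    return gap.mean(), np.abs(gap).max()


# Hypothetical log indexes for three half-years.
avg_dev, max_dev = deviation_summary([0.0, 0.10, 0.20], [0.0, 0.08, 0.15])
```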
suggested by Newey, Powell, and Walker (1990). (See also Ahn and Powell (1993).) The structure
of the selection correction terms, $\lambda_{it\tau}$, $\lambda_{it}$, and $\lambda_i$, in the different models is approximated through
a series of basis functions, whose arguments are the single-valued index function of the selection model. Numerous
combinations of the approximations were included in the second-step regressions reported in Table
3.5 (and in Appendix Table A1 as well) without any significant change in the estimated coefficients
in equation (3.11) or the resulting housing price indexes.
9Note that in all three selectivity models, the probability of sale in any period is a function of
The value of owner-occupied homes comprises about two-thirds of household net
wealth in Sweden. This suggests that correcting for sample selectivity lowers the
estimate of 1993 household wealth by 2-3 percent, relative to its value in 1981. The
overestimate of valuation accumulates gradually over time but is arrested in 1991
when the housing price cycle in Sweden reached its peak. There is a slight tendency
in the opposite direction after 1991, suggesting that the direction of the bias might be
related to the housing price cycle itself.10 Generally, the differences in time patterns
between the two index series are not dramatic. The differences in rates of change do
not exceed 1.4 percent in any half-year.
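The wealth figure amounts to scaling the revision in housing values by housing's share of net wealth. As a back-of-the-envelope check, assuming only the value of owner-occupied housing is revised and writing $s_H$ for its share of net wealth $W$ and $V_H$ for its value (notation introduced here for illustration), a 2-3 percent revision of net wealth corresponds to a revision of roughly 3 to 4.5 percent in housing values, within the 2-5 percent gap reported above for the end of the period:
$$\frac{\Delta W}{W} \approx s_H \, \frac{\Delta V_H}{V_H}, \qquad s_H = \tfrac{2}{3}: \quad \tfrac{2}{3} \times 3\% = 2\%, \qquad \tfrac{2}{3} \times 4.5\% = 3\%.$$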
The appendix presents the results from a special case of the time-invariant model
of selectivity, equation (3.19), corresponding to a Poisson process generating house
sales from the population. In this case, the average deviation between the uncorrected
and the selectivity-corrected price index ranges between 5 and 10 percentage points
in the regions outside Stockholm. The maximum deviation approaches 24 percentage
points.
3.6 Conclusion
In this paper, we have examined the nature of the selection process that distinguishes
dwellings which are sold frequently from the entire stock of sold dwellings.
Specifically, we consider the influence of time and a dwelling’s physical attributes on
its probability of sale at two points in time. We have also explored the impact of
these relationships on the measurement of aggregate housing prices.
the observable characteristics of dwellings at time $t$. We analyzed the extent to which unobservables
affected house prices by adding a variable indicating the number of times each house was sold to
equations (3.17), (3.18), and (3.19). After controlling for the observable characteristics of dwellings
which affect the frequency of sale, the additional variable reflects any unmeasured characteristics of
dwellings which affect sale frequencies. For seven of the eight regions, the coefficient estimate was
negative, but only in one case was the estimate significantly less than zero, providing only quite
weak evidence that “lemons” behavior is important in this market. Of course, if transaction costs
are 5-10 percent of sales prices, it would require the concealment of very expensive defects to induce
high turnover in the housing market.
10The time span covered by the data is too short to allow us to distinguish this from the alternative
interpretation that the biased indexes consistently overestimate the rate of price change.
We find, using a sample of essentially all arm’s-length sales in Sweden during a
13-year period, that the selection process governing dwelling unit sales is distinctly
nonrandom, confirming earlier suggestive work. We also find that the appropriate
correction for the selection process implies that housing price appreciation is otherwise
overstated in a conventional repeat-sales price index.
The ramifications for national housing wealth may be substantial. Outside Stockholm, the results
indicate average deviations in the estimated indexes attributable to sample selection
ranging between two and eleven percent towards the end of the period, a substantial
difference given the size of the housing stock. The implications are clear: the use
of transactions data requires careful consideration of the process that generates the
observations, and the nonrandom nature of the selection process has a significant
impact on measured aggregate housing prices.
Bibliography
Abraham, J. M., W. N. Goetzmann, and S. M. Wachter (1994): “Homogeneous Groupings of Metropolitan Housing Markets,” Journal of Housing Economics, 3(3), 186-206.
Ahn, H., and J. L. Powell (1993): “Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism,” Journal of Econometrics, 58(1-2), 3-29.
Akerlof, G. (1970): “The Market for Lemons: Quality Uncertainty and the Market Mechanism,” Quarterly Journal of Economics, 84, 488-500.
Bailey, M. J., R. F. Muth, and H. O. Nourse (1963): “A Regression Method for Real Estate Price Index Construction,” Journal of the American Statistical Association, 58, 933-42.
Blanchard, O. J., and L. F. Katz (1992): “Regional Evolutions,” Brookings Papers on Economic Activity, 0(1), 1-61.
Browne, L. E. (1992): “Why New England Went the Way of Texas Rather than California,” New England Economic Review, 0(0), 23-41.
Case, B., H. O. Pollakowski, and S. M. Wachter (1997): “Frequency of Transaction and House Price Modeling,” Journal of Real Estate Finance and Economics, pp. 173-87.
Case, K. E. (1991): “The Real Estate Cycle and the Economy: Consequences of the Massachusetts Boom of 1984-87,” New England Economic Review, 0(0), 37-46.
Case, K. E., and R. J. Shiller (1987): “Prices of Single-Family Homes since 1970: New Indexes for Four Cities,” New England Economic Review, 0(0), 45-56.
---------- (1989): “The Efficiency of the Market for Single-Family Homes,” American Economic Review, 79(1), 125-37.
---------- (1990): “Forecasting Prices and Excess Returns in the Housing Market,” American Real Estate and Urban Economics Association Journal, 18(3), 253-73.
---------- (1996): “Mortgage Default Risk and Real Estate Prices: The Use of Index-Based Futures and Options in Real Estate,” Journal of Housing Research, 7(2), 243-58.
Clapp, J., H. O. Pollakowski, and L. Lynford (1992): “Intrametropolitan Location and Office Market Dynamics,” American Real Estate and Urban Economics Association Journal, 20(2), 229-57.
Clapp, J. M., W. Dolde, and D. Tirtiroglu (1995): “Imperfect Information and Investor Inferences from Housing Price Dynamics,” Real Estate Economics, 23(3), 239-69.
Clark, T. E. (1998): “Employment Fluctuations in U.S. Regions and Industries: The Roles of National, Region-Specific, and Industry-Specific Shocks,” Journal of Labor Economics, 16(1), 202-29.
Cromwell, B. A. (1992): “Does California Drive the West? An Econometric Investigation of Regional Spillovers,” Federal Reserve Bank of San Francisco Economic Review, 0(2), 13-23.
DiPasquale, D., and C. T. Somerville (1995): “Do House Price Indices Based on Transacting Units Represent the Entire Stock? Evidence from the American Housing Survey,” Journal of Housing Economics, 4(3), 195-229.
Ellison, G., and E. L. Glaeser (1997): “Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach,” Journal of Political Economy, 105(5), 889-927.
Englund, P., J. M. Quigley, and C. L. Redfearn (1998): “Improved Price Indexes for Real Estate: Measuring the Course of Swedish Housing Prices,” Journal of Urban Economics, 44(2), 171-96.
---------- (1999): “The Choice of Methodology for Computing Housing Price Indexes: Comparisons of Temporal Aggregation and Sample Definition,” Journal of Real Estate Finance and Economics, 19(2), 91-112.
---------- (1999a): “Do Housing Transactions Provide Misleading Evidence About the Course of Housing Values?,” unpublished manuscript, pp. 1-26.
---------- (1999b): “The Choice of Methodology for Computing Price Indexes: Comparisons of Temporal Aggregation and Sample Definition,” Journal of Real Estate Finance and Economics, 19(3), 91-112.
Gatzlaff, D. H. (1994): “Excess Returns, Inflation and the Efficiency of the Housing Market,” Journal of the American Real Estate and Urban Economics Association, 22(4), 553-81.
Gatzlaff, D. H., and D. R. Haurin (1997): “Sample Selection Bias and Repeat-Sales Index Estimates,” Journal of Real Estate Finance and Economics, 14(1-2), 33-50.
---------- (1998): “Sample Selection and Biases in Local House Value Indices,” Journal of Urban Economics, 43(2), 199-222.
Goetzmann, W. N., and S. M. Wachter (1995): “Clustering Methods for Real Estate Portfolios,” Real Estate Economics, 23(3), 271-310.
Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47(1), 153-61.
Henderson, V. (1997): “Medium Size Cities,” Regional Science and Urban Economics, 27(6), 583-612.
Hill, R. C., C. F. Sirmans, and J. R. Knight (1999): “A Random Walk Down Main Street?,” Regional Science and Urban Economics, 29, 89-103.
Johnes, G., and T. Hyclak (1999): “House Prices and Regional Labor Markets,” Annals of Regional Science, 33(1), 33-49.
Jud, G. D., and T. G. Seaks (1994): “Sample Selection Bias in Estimating Housing Sales Prices,” Journal of Real Estate Research, 9(3), 289-98.
Kim, S. (1995): “Expansion of Markets and the Geographic Distribution of Economic Activities: The Trends in U.S. Regional Manufacturing Structure, 1860-1987,” Quarterly Journal of Economics, 110(4), 881-908.
Kuo, C.-L. (1996): “Serial Correlation and Seasonality in the Real Estate Market,” Journal of Real Estate Finance and Economics, 12, 139-62.
Leach, J. C., and A. N. Madhavan (1992): “Intertemporal Price Discovery by Market Makers: Active versus Passive Learning,” Journal of Financial Intermediation, 2(2), 207-35.
---------- (1993): “Price Experimentation and Security Market Structure,” Review of Financial Studies, 6(2), 375-404.
Lo, A. W., and A. C. MacKinlay (1999): A Non-Random Walk Down Wall Street. Princeton: Princeton University Press.
Malpezzi, S. (1999): “A Simple Error Correction Model of Housing Prices,” Journal of Housing Economics, 8(1), 27-62.
Manski, C. F., and D. McFadden (1981): Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press.
Meen, G., and M. Andrew (1998): Modelling Regional House Prices: A Review of the Literature. A Report Prepared for the Department of the Environment, Transport, and the Regions.
Meese, R., and N. Wallace (1991): “Nonparametric Estimation of Dynamic Hedonic Price Models and the Construction of Residential Housing Price Indices,” American Real Estate and Urban Economics Association Journal, 19(3), 308-32.
Newey, W. K., J. L. Powell, and J. R. Walker (1990): “Semiparametric Estimation of Selection Models: Some Empirical Results,” American Economic Review, 80(2), 324-28.
Pollakowski, H. O., and T. S. Ray (1997): “Housing Price Diffusion Patterns at Different Aggregation Levels: An Examination of Housing Market Efficiency,” Journal of Housing Research, 8(1), 107-24.
Poterba, J. M. (1984): “Tax Subsidies to Owner-Occupied Housing: An Asset-Market Approach,” Quarterly Journal of Economics, 99(4), 729-52.
Quan, D. C., and J. M. Quigley (1991): “Price Formation and the Appraisal Function in Real Estate Markets,” Journal of Real Estate Finance and Economics, 4(2), 127-46.
Quigley, J. M. (1995): “A Simple Hybrid Model for Estimating Real Estate Price Indexes,” Journal of Housing Economics, 4(1), 1-12.
Quigley, J. M., and R. Van Order (1991): “Defaults on Mortgage Obligations and Capital Requirements for U.S. Savings Institutions: A Policy Perspective,” Journal of Public Economics, 44(3), 353-69.
Reichert, A. K. (1990): “The Impact of Interest Rates, Income, and Employment upon Regional Housing Prices,” Journal of Real Estate Finance and Economics, 3(4), 373-91.
Reinganum, J. F. (1979): “A Simple Model of Equilibrium Price Dispersion,” Journal of Political Economy, 87(4), 851-58.
Romer, D. (1993): “Rational Asset-Price Movements without News,” American Economic Review, 83(5), 1112-30.
Rothschild, M. (1974): “Searching for the Lowest Price When the Distribution of Prices Is Unknown,” Journal of Political Economy, 82(4), 689-711.
Samuelson, P. (1965): “Proof That Properly Anticipated Prices Fluctuate Randomly,” Industrial Management Review, 6, 41-49.
Söderberg, B. (1995): “Transaction Costs in the Market for Residential Real Estate,” Department of Real Estate Economics Working Paper no. 20, Royal Institute of Technology, Sweden, 20, 1-12.
Spiegel, M., and W. Strange (1992): “A Theory of Predictable Excess Returns in Real Estate,” Journal of Real Estate Finance and Economics, 5(4), 375-92.
Stein, J. C. (1995): “Prices and Trading Volume in the Housing Market: A Model with Down-Payment Effects,” Quarterly Journal of Economics, 110(2), 379-406.
Terkla, D. G., and P. B. Doeringer (1991): “Explaining Variations in Employment Growth: Structural and Cyclical Change among States and Local Areas,” Journal of Urban Economics, 29(3), 329-48.
Varian, H. R. (1980): “A Model of Sales,” American Economic Review, 70(4), 651-59.
Williams, J. T. (1995): “Pricing Real Assets with Costly Search,” Review of Financial Studies, 8(1), 55-90.
Chapter A
Swedish Data
Housing transactions data are recorded routinely in Sweden and are maintained historically by Statistics Sweden in Stockholm. The raw data utilized in this study consist of all residential (non-farm) housing sales recorded during the 1981:I-1993:III period, divided into eight geographical regions. These regions were defined by Statistics Sweden for administrative purposes. The four regions used in this analysis are the four most populous, and most urban, of the eight administrative regions.
Sales prices are taken directly from the sales contract that is submitted to court in order to obtain legal confirmation of ownership. Information about housing characteristics derives from forms submitted by the homeowner to the tax authorities, which are used to assess properties.
Table A.1 reports the distribution of dwellings by the number of sales during the 13-year period in each region. About three-quarters of the dwellings in the sample were sold once, but the distribution of sales has a long tail in each region. The entire sample (all eight regions) consists of 533,894 transactions on 423,963 dwellings.
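The shares quoted above can be checked directly against Table A.1. The sketch below is a minimal Python illustration, not part of the original analysis: it assumes the table's counts are transcribed correctly into the hypothetical `TABLE_A1` dictionary and reads each "-" in the table as zero.

```python
# Hypothetical transcription of Table A.1: TABLE_A1[region][k - 1] is the number
# of dwellings sold exactly k times (k = 1..8); a "-" in the table is read as 0.
TABLE_A1 = {
    "Stockholm": [47100, 10083, 1829, 273, 40, 3, 2, 0],
    "Uppsala": [59170, 12867, 2704, 364, 66, 14, 7, 1],
    "Malmo": [54806, 12858, 2759, 397, 67, 2, 1, 0],
    "Gothenburg": [67014, 14429, 2798, 404, 48, 3, 1, 0],
}


def summarize(counts):
    """Return (dwellings, transactions, share of dwellings sold exactly once)."""
    dwellings = sum(counts)
    # A dwelling sold k times contributes k transactions.
    transactions = sum(k * n for k, n in enumerate(counts, start=1))
    return dwellings, transactions, counts[0] / dwellings


for region, counts in TABLE_A1.items():
    dwellings, transactions, share_once = summarize(counts)
    print(f"{region:<11} dwellings={dwellings:>6} sales={transactions:>6} "
          f"sold once={share_once:.3f}")
```

For Stockholm this gives 59,330 dwellings and 74,077 sales, with about 79% of dwellings sold exactly once; the other three regions fall between roughly 77% and 79%, in line with "about three-quarters."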
Table A.1: Distribution of Housing Sales

# of sales  Stockholm    Uppsala      Malmo Gothenburg
1               47100      59170      54806      67014
2               10083      12867      12858      14429
3                1829       2704       2759       2798
4                 273        364        397        404
5                  40         66         67         48
6                   3         14          2          3
7                   2          7          1          1
8                   -          1          -          -

Table A.2 summarizes the average characteristics of dwellings sold in the four “urban” regions during this period. As indicated in Table A.2, the average selling price of dwellings in Stockholm, 772,655 SEK (roughly $100,000 U.S.), was more than 55% larger than the average in the other regions. The average interior size, about 120 square meters, was quite similar across regions, while the average lot size was much smaller in Stockholm, 827 square meters, than in the other three regions, which average approximately 1100 square meters.
The raw data include a wide variety of indicators of the quality and amenities of dwellings. These include two size variables and dummy variables for the number of garages; nine dummy variables recording various amenities, including, for example, the existence of a fireplace, a sauna, or a laundry room; and 12 dummy variables measuring the quality of insulation, heating systems, kitchens, and roofs. In addition, the vintage (year of construction) and the age at the time of sale are recorded for each dwelling.
Each dwelling is located in one of 111 well-defined labor market areas. Stockholm is a single labor market area; the other regions are composed of several labor market areas (up to 24 for Gothenburg). For each dwelling, the straight-line distance to the economic center of its labor market area is recorded.
In addition to these qualitative and quantitative aspects of the dwellings, the existence of transferable capital subsidies is measured. Beginning in 1975, the Swedish government provided loans with guaranteed interest rates to the purchasers of newly constructed dwellings. The rules governing these subsidies varied over time, i.e., with year of construction. They also varied with the size and construction cost of dwellings. Because the subsidy is tied to a specific dwelling, in the event of sale it is potentially capitalized into the selling price of a used dwelling. Table A.2 indicates that the present value of the average subsidy for all transactions in this sample is small, about 3000 SEK or $400 U.S. However, for those transactions on a subsidized dwelling, the conditional average value of the remaining capital subsidy is as high as 26,000 SEK or $3300 U.S. The average conditional subsidy is approximately 3% to 6% of the average sales price.

Table A.2: Average Housing Characteristics
(Means; standard deviations in parentheses)

                                Stockholm    Uppsala      Malmo Gothenburg
Sales price                       772.655    475.247    438.664    496.385
(000's crowns, SEK)              (462.05)   (239.55)   (283.62)   (346.02)
Size
Interior size                     122.004    119.483    119.678    118.256
(square meters)                   (35.98)    (36.38)    (39.88)    (37.78)
Parcel size                       827.392   1138.848   1084.492   1092.914
(square meters)                  (814.00)  (1138.62)  (1080.94)  (1109.95)
One-car garage                      0.705      0.706      0.581      0.621
(1=yes)                            (0.46)     (0.46)     (0.49)     (0.49)
Two-car garage                      0.047      0.078      0.044      0.059
(1=yes)                            (0.21)     (0.27)     (0.20)     (0.24)
Amenity
Tile bath                           0.118      0.091      0.143      0.110
(1=yes)                            (0.32)     (0.29)     (0.35)     (0.31)
Sewer connection                    0.988      0.984      0.974      0.977
(1=yes)                            (0.11)     (0.13)     (0.16)     (0.15)
Sauna                               0.217      0.216      0.122      0.177
(1=yes)                            (0.41)     (0.41)     (0.33)     (0.38)
Stone/brick                         0.234      0.353      0.548      0.288
(1=yes)                            (0.42)     (0.48)     (0.50)     (0.45)
Single detached                     0.664      0.807      0.865      0.784
(1=yes)                            (0.47)     (0.39)     (0.34)     (0.41)
Finished basement                   0.162      0.185      0.134      0.171
(1=yes)                            (0.37)     (0.39)     (0.34)     (0.38)
Fireplace                           0.368      0.342      0.259      0.339
(1=yes)                            (0.48)     (0.47)     (0.44)     (0.47)
Laundry room                        0.842      0.817      0.784      0.811
(1=yes)                            (0.36)     (0.39)     (0.41)     (0.39)
Waterfront location                 0.007      0.005      0.004      0.004
(1=yes)                            (0.08)     (0.07)     (0.07)     (0.06)
Quality
Age at time of sale                26.572     30.738     39.674     30.578
(years)                           (20.48)    (24.75)    (28.42)    (23.46)
Vintage                            59.915     55.705     47.057     55.995
(19xx)                            (20.35)    (24.63)    (28.33)    (23.33)
“Winter quality” insulation
Walls only                          0.163      0.191      0.179      0.195
(1=yes)                            (0.37)     (0.39)     (0.38)     (0.40)
Walls and windows                   0.832      0.802      0.802      0.791
(1=yes)                            (0.37)     (0.40)     (0.40)     (0.41)
Kitchen
Good                                0.789      0.760      0.687      0.725
(1=yes)                            (0.41)     (0.43)     (0.46)     (0.45)
Excellent                           0.198      0.217      0.279      0.247
(1=yes)                            (0.40)     (0.41)     (0.45)     (0.43)
Heating system
Electric radiator                   0.400      0.322      0.323      0.359
(1=yes)                            (0.49)     (0.47)     (0.47)     (0.48)
Electric furnace                    0.111      0.086      0.090      0.106
(1=yes)                            (0.31)     (0.28)     (0.29)     (0.31)
Solar/other                         0.344      0.367      0.478      0.424
(1=yes)                            (0.48)     (0.48)     (0.50)     (0.49)
Exterior steam                      0.083      0.176      0.067      0.037
(1=yes)                            (0.28)     (0.38)     (0.25)     (0.19)
Other central heat                  0.050      0.021      0.021      0.051
(1=yes)                            (0.22)     (0.14)     (0.14)     (0.22)
Wood-burning stove                  0.009      0.021      0.009      0.018
(1=yes)                            (0.09)     (0.14)     (0.09)     (0.13)
Roof
Cement/steel                        0.009      0.009      0.015      0.013
(1=yes)                            (0.10)     (0.09)     (0.12)     (0.11)
Slate/other                         0.663      0.768      0.657      0.766
(1=yes)                            (0.47)     (0.42)     (0.47)     (0.42)
Other
Distance to center                  4.744      5.685      5.318      5.863
                                   (6.09)     (6.84)     (5.30)     (5.81)
Urban area                          0.903      0.791      0.757      0.745
(1=yes)                            (0.30)     (0.41)     (0.43)     (0.44)
Capital subsidy                     2.979      2.847      2.361      2.845
(000's SEK)                       (12.16)    (11.14)    (10.85)    (11.64)
Conditional subsidy                25.863     22.501     24.769     24.013
(000's SEK)                       (26.30)    (23.20)    (26.08)    (25.21)
This appendix borrows heavily from Englund, Quigley, and Redfearn (1999b).
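The 3% to 6% range for the conditional subsidy is simple arithmetic on the Table A.2 means. A minimal sketch, assuming the means are entered in thousands of SEK as in the table and that the columns after Stockholm follow the region order of Table A.1; the names `MEAN_PRICE`, `MEAN_CONDITIONAL_SUBSIDY`, and `subsidy_share` are illustrative only.

```python
# Means from Table A.2, in thousands of SEK. Column labels other than
# Stockholm are assumed to follow the region order used in Table A.1.
MEAN_PRICE = {"Stockholm": 772.655, "Uppsala": 475.247,
              "Malmo": 438.664, "Gothenburg": 496.385}
MEAN_CONDITIONAL_SUBSIDY = {"Stockholm": 25.863, "Uppsala": 22.501,
                            "Malmo": 24.769, "Gothenburg": 24.013}


def subsidy_share(region):
    """Mean conditional subsidy as a fraction of the mean sales price."""
    return MEAN_CONDITIONAL_SUBSIDY[region] / MEAN_PRICE[region]


for region in MEAN_PRICE:
    print(f"{region:<11} {100 * subsidy_share(region):.1f}% of mean price")
```

The ratios run from about 3.3% in Stockholm to about 5.6% in the third column. Because the conditional subsidy is divided by the mean price of all sales, not only subsidized ones, these are rough magnitudes rather than dwelling-level shares.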
Chapter B
The Hybrid Method
The aggregate housing price series used in the tests of market efficiency is constructed using the method developed in Englund, Quigley, and Redfearn (1998). It is a hybrid index, originally suggested in Quigley (1995), and is discussed in Case (1991). The intent of this index construction method is to make full use of a data set that is rich in information about dwelling characteristics and in which repeat sales are identified. The method uses multiple sales of the same dwelling to estimate the error structure of the price-generation process. This supports generalized least squares estimation of a hedonic regression that uses all observed housing sales. In this way, much more information is used in the estimation of the aggregate prices than in repeat-sales methods, improving the efficiency of the parameter estimates and the accuracy of the resulting housing price index.
Assume that the sale price of a housing unit is an amalgam, $PQ$, of an index representing the price, $P$, of a housing unit and another index representing the level of services, $Q$, emitted by that unit. To represent this, suppose

$$V_{it} = Q_{it} + P_t + \omega_{it}, \tag{B.1}$$

where $V_{it}$ is the logarithm of the observed selling price of dwelling $i$ at time $t$, $Q_{it}$ is the log of the quality of dwelling $i$ sold at time $t$, and $P_t$ is the log of the constant-quality housing price index at time $t$. $\omega_{it}$ is a random error reflecting idiosyncratic aspects of particular transactions, e.g., a “distressed” sale.
According to (B.1), each dwelling emits a quality of service, $Q_{it}$, which is priced at $P_t$ at a particular point in time. $Q_{it}$ is unobserved, but

$$Q_{it} = \beta X_{it} + \xi_i + \nu_{it}. \tag{B.2}$$

According to (B.2), housing quality is a function of the observable characteristics of dwellings at time $t$. $X_{it}$ may include the vintage (production year) of the dwelling as well as the accumulated physical depreciation of that dwelling at year $t$. The term $\xi_i$ represents the unmeasured characteristics of dwelling $i$. Combining (B.1) and (B.2) yields

$$V_{it} = \beta X_{it} + P_t + \xi_i + \varepsilon_{it}, \tag{B.3}$$

where $\varepsilon_{it}$ is a composite error term,

$$\varepsilon_{it} = \nu_{it} + \omega_{it}.$$
Assume $E[\xi_i] = 0$ and $E[\xi_i^2] = \sigma_\xi^2$.

If all dwellings in a given sample were repeat sales, all the parameters of the model could be estimated by making further assumptions about the structure of the errors, $\varepsilon_{it}$. In the model developed in the body of this paper, the errors are assumed to be generated by a first-order autoregressive process.
A sample of single sales permits (B.3) to be estimated, but it does not permit the measured characteristics of dwellings to be distinguished from the unmeasured, individual-specific characteristics of those dwellings. Presumably, many characteristics of individual dwellings that are difficult to measure quantitatively are important in determining housing values.
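To combine samples of single and multiple sales in the same analysis, rewrite (B.3) as

$$\begin{aligned} V_{it} &= \beta X_{it} + P_t + \xi_i + \varepsilon_{it} \\ &= \beta^* X^*_{it} + \beta_{yr} YR_i + \beta_d AGE_{it} + \sum_{j=0} \delta_j D_j + \xi_i + \varepsilon_{it}. \end{aligned} \tag{B.4}$$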
Two components of the vector of housing characteristics have been expressed separately, leaving the remaining vector $X^*_{it}$ and the coefficient vector $\beta^*$. The characteristics represented separately are the year the dwelling was built ($YR_i$) and the age of the dwelling at year $t$ ($AGE_{it}$). The combined effect of the year built on housing prices is $\beta_{yr} YR_i$, and the accumulated effect of depreciation is $\beta_d AGE_{it}$, where $\beta_{yr}$ and $\beta_d$ are estimated coefficients. Since $AGE_{it} = t - YR_i$, $\beta_{yr} YR_i + \beta_d AGE_{it} = (\beta_{yr} - \beta_d) YR_i + \beta_d t$, so the separate effect of the vintage of construction, $\beta_v$, on housing prices is simply $\beta_{yr} - \beta_d$.

The price level at any period $t$ is parameterized by a set of dummy variables $D_j$, which take on a value of 1 for all time periods prior to the observed sale and 0 for all periods after the observed sale, i.e.,

$$P_t = \sum_{j=0} \delta_j D_j.$$
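The cumulative time dummies can be made concrete with a short Python sketch. It assumes, as an interpretation rather than something the text states, that "prior to the observed sale" includes the sale period itself, so that $D_0 = 1$ for every sale and each $\delta_j$ is the log change in the price level between periods $j-1$ and $j$. The functions `time_dummies` and `log_price_level` and the values in `delta` are hypothetical.

```python
def time_dummies(t, n_periods):
    """Cumulative dummies D_j for a sale in period t: 1 for j <= t, 0 afterwards.

    Assumption: "prior to the observed sale" is read as including the sale
    period itself, so D_0 = 1 for every sale.
    """
    return [1 if j <= t else 0 for j in range(n_periods)]


def log_price_level(t, delta):
    """P_t = sum_j delta_j * D_j, i.e. the running sum delta_0 + ... + delta_t."""
    return sum(d_j * dummy for d_j, dummy in zip(delta, time_dummies(t, len(delta))))


delta = [0.0, 0.05, -0.02, 0.01]  # hypothetical period-to-period log changes
print([round(log_price_level(t, delta), 4) for t in range(len(delta))])
```

With these illustrative values the printed price levels are `[0.0, 0.05, 0.03, 0.04]`: each level is the running sum of the $\delta_j$ up to the sale period, which is why cumulative dummies make each coefficient a one-period change.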
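Obviously, the parameters of (B.4) cannot be estimated directly, since vintage, age, and time of sale are collinear. Using the subsample of repeat sales alone, however, we can estimate

$$V_{it} = \beta^* X^*_{it} + \beta_v YR_i + \sum_{j=0} \delta_j D_j + \gamma_{it}, \tag{B.5}$$

where

$$\gamma_{it} = \beta_d AGE_{it} + \xi_i + \varepsilon_{it}. \tag{B.6}$$

The residuals from (B.5) are sufficient to provide an estimate of the depreciation parameter $\beta_d$, because the dwelling effect $\xi_i$ cancels when two sales of the same dwelling are differenced:

$$\gamma_{it} - \gamma_{i\tau} = \beta_d (t - \tau) + \varepsilon_{it} - \varepsilon_{i\tau}. \tag{B.7}$$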
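Two steps in this paragraph lend themselves to a short sketch: the identity behind the collinearity ($AGE_{it} = t - YR_i$ when age and date are measured in the same units) and the estimation of $\beta_d$ from (B.7). The text says only that the residuals suffice; ordinary least squares through the origin, used below, is one simple estimator we assume for illustration. A weighted version could be more efficient, since under an AR(1) error the variance of $\varepsilon_{it} - \varepsilon_{i\tau}$ changes with the gap $t - \tau$. All data and function names are hypothetical.

```python
def collinear(yr, age, t):
    """True if AGE_it = t - YR_i for every sale: the identity that makes
    vintage, age, and time of sale perfectly collinear in (B.4)."""
    return all(a == period - y for y, a, period in zip(yr, age, t))


def estimate_beta_d(pairs):
    """Least squares through the origin for (B.7).

    pairs: (gamma_first, gamma_second, tau, t) for each repeat-sale pair, where
    the gammas are residuals from (B.5) at the earlier (tau) and later (t) sale.
    """
    num = sum((g_t - g_tau) * (t - tau) for g_tau, g_t, tau, t in pairs)
    den = sum((t - tau) ** 2 for _, _, tau, t in pairs)
    return num / den


print(collinear([1970, 1985], [15, 3], [1985, 1988]))  # True
pairs = [(0.30, 0.32, 1, 3), (-0.10, -0.06, 2, 6)]  # hypothetical residuals
print(estimate_beta_d(pairs))
```

With the illustrative residuals, every pair implies a change of 0.01 per period, so the estimate is 0.01 up to floating-point rounding.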
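The error structure of $\varepsilon_{it}$, obtained from the model developed in the main body of this paper, is

$$\varepsilon_{it} = \lambda \varepsilon_{i,t-1} + \mu_{it}. \tag{B.8}$$

Together, these parameters identify the covariance matrix of disturbances in (B.5):

$$E[\gamma_{it}\gamma_{j\tau}] = \begin{cases} 0 & \text{for } i \neq j, \\ \sigma_\xi^2 + \sigma_\mu^2 \left[\lambda^{(t-\tau)}/(1-\lambda^2)\right] + \beta_d^2 \, AGE_{it} \, AGE_{i\tau} & \text{for } i = j. \end{cases} \tag{B.9}$$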
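Equation (B.9) translates directly into a function that builds the covariance block for one dwelling's sales. This is a sketch under stated assumptions: the parameter values are hypothetical, `omega_block` is an illustrative name, and we take the exponent as $|t - \tau|$ so that the block is symmetric.

```python
def omega_block(sale_periods, ages, sigma_xi2, sigma_mu2, lam, beta_d):
    """Covariance matrix of gamma for the sales of one dwelling, following (B.9).

    Sales of different dwellings are uncorrelated, so the full matrix is block
    diagonal with one such block per dwelling. The lag is taken as |t - tau|
    so that the block is symmetric (our reading of the exponent in B.9).
    """
    if not -1 < lam < 1:
        raise ValueError("the AR(1) parameter must satisfy |lam| < 1")
    n = len(sale_periods)
    omega = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            lag = abs(sale_periods[r] - sale_periods[s])
            omega[r][s] = (sigma_xi2                                  # dwelling effect
                           + sigma_mu2 * lam ** lag / (1 - lam ** 2)  # AR(1) part
                           + beta_d ** 2 * ages[r] * ages[s])         # age term
    return omega


# Hypothetical parameter values for a dwelling sold in periods 0 and 4.
for row in omega_block([0, 4], [10, 11], sigma_xi2=0.01, sigma_mu2=0.02,
                       lam=0.5, beta_d=0.005):
    print([round(v, 5) for v in row])
```

Because covariances across dwellings are zero, the full matrix is block diagonal, and its inverse can be formed one block at a time.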
Using the entire sample of single and repeat sales, (B.5) is estimated by generalized least squares (GLS), where the GLS weighting matrix is the inverse of the matrix given in (B.9). Finally, the aggregate index of housing prices, $I_t$, is constructed by

$$I_t = \exp(\hat{P}_t) = \exp\Big(\sum_{j=0} \hat{\delta}_j D_j\Big).$$

This appendix borrows heavily from Englund, Quigley, and Redfearn (1998).
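The final GLS step and the index construction can be sketched in pure Python. Because (B.9) makes the covariance matrix block diagonal by dwelling, $X'\Omega^{-1}X$ and $X'\Omega^{-1}V$ are sums of per-dwelling terms, which is what the hypothetical `gls` function exploits. The data are hypothetical and noise-free, the only regressors are the cumulative dummies (in the application $X^*_{it}$ and $YR_i$ would be added, and each block would come from (B.9)), and normalizing the index to 1 in the first period is our choice, not the text's.

```python
import math
from itertools import accumulate


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def inverse(a):
    """Invert a small square matrix one column at a time."""
    n = len(a)
    cols = [solve(a, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def gls(blocks):
    """GLS with a block-diagonal covariance matrix (one block per dwelling).

    blocks: (X_i, y_i, omega_i) per dwelling, where X_i holds one regressor row
    per sale, y_i the log prices, and omega_i the covariance of that dwelling's
    errors. Solves (sum X_i' W_i X_i) b = sum X_i' W_i y_i, W_i = inverse(omega_i).
    """
    k = len(blocks[0][0][0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for x_i, y_i, omega_i in blocks:
        w_i = inverse(omega_i)
        for r in range(len(y_i)):
            for s in range(len(y_i)):
                for a in range(k):
                    xtwy[a] += x_i[r][a] * w_i[r][s] * y_i[s]
                    for b in range(k):
                        xtwx[a][b] += x_i[r][a] * w_i[r][s] * x_i[s][b]
    return solve(xtwx, xtwy)


def price_index(delta_hat):
    """I_t = exp(P_t), P_t = delta_0 + ... + delta_t, rescaled so that I_0 = 1."""
    p = list(accumulate(delta_hat))
    return [math.exp(p_t - p[0]) for p_t in p]


# Hypothetical data: regressors are the cumulative dummies (D_0, D_1, D_2).
true_delta = [12.0, 0.05, -0.02]
blocks = [
    ([[1, 0, 0]], [12.0], [[0.02]]),                 # single sale, t = 0
    ([[1, 1, 0]], [12.05], [[0.02]]),                # single sale, t = 1
    ([[1, 0, 0], [1, 1, 1]], [12.0, 12.03],          # repeat sale, t = 0 and 2
     [[0.03, 0.01], [0.01, 0.03]]),
]
delta_hat = gls(blocks)
print([round(d, 6) for d in delta_hat])
print([round(v, 4) for v in price_index(delta_hat)])
```

Since the hypothetical prices contain no noise, `gls` recovers $(12.0, 0.05, -0.02)$ up to rounding, and the normalized index is approximately $(1, 1.0513, 1.0305)$.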
Chapter C
Time-Independent Selection
The simple time-independent form of selection equation (3.19) suggests a more powerful method of estimating this model, using the total number of sales of each dwelling during the 13-year analysis period. Specifically, if $Y_i$ is the number of sales of dwelling $i$ during the period and this count follows a Poisson process, then the truncated Poisson distribution describes the probability that $Y_i$ equals the count of sales observed during the period:

$$\mathrm{prob}(Y_i = y \mid y \geq 1) = \frac{e^{-\lambda_i}\lambda_i^{y}}{y!\,(1 - e^{-\lambda_i})}, \tag{C.1}$$
where
M A ,) = * ( * , ) + & • (C.2)
The Poisson arrival parameter, $\lambda_i$, is estimated for the sample period for each dwelling $i$. The arrival rate of sales for a single period is then $\lambda_i/T$, where $T$ is the number of periods. The probability of sale in both periods $t$ and $\tau$ is

$$\mathrm{prob}(S_{it\tau}) = \mathrm{prob}(S_{it}) \cdot \mathrm{prob}(S_{i\tau}) = \left(1 - \mathrm{prob}(Y_i = 0)\right)^2. \tag{C.3}$$
Equation (C.1) is estimated by maximum likelihood. Equation (C.3) can then be computed directly from equations (C.1) and (C.2). When the selectivity correction is based upon equation (C.3), the Mills’ ratio is again highly significant in seven of the eight regions. Outside Stockholm, the average deviation between the uncorrected and the selectivity-corrected price index ranges between roughly 5 and 10 percentage points.
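A simplified version of the estimation behind (C.1) and (C.3) fits a single arrival rate by maximum likelihood. The sketch below is an intercept-only illustration under assumptions of our own: every dwelling in a region shares one $\lambda$ (the appendix lets $\lambda_i$ vary across dwellings through (C.2)), the counts are the Stockholm column of Table A.1, and the sample is divided into $T = 51$ quarterly periods spanning 1981:I-1993:III. Function names are hypothetical.

```python
import math


def truncated_poisson_pmf(y, lam):
    """(C.1): P(Y = y | Y >= 1) for a Poisson(lam) count truncated at zero."""
    if y < 1:
        return 0.0
    return math.exp(-lam) * lam ** y / (math.factorial(y) * (1 - math.exp(-lam)))


def fit_common_lambda(counts):
    """ML estimate of a single lam from counts[k - 1] = dwellings sold k times.

    The score equation of the zero-truncated Poisson is
    mean(y) = lam / (1 - exp(-lam)); the right-hand side rises from 1 as lam
    grows, so bisection finds the unique root whenever mean(y) > 1.
    """
    n = sum(counts)
    ybar = sum(k * c for k, c in enumerate(counts, start=1)) / n
    lo, hi = 1e-9, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prob_sold_in_both(lam, n_periods):
    """(C.3): per-period rate lam / T, so P(sale in a period) = 1 - exp(-lam / T)."""
    p_sale = 1 - math.exp(-lam / n_periods)
    return p_sale ** 2


stockholm = [47100, 10083, 1829, 273, 40, 3, 2, 0]  # Table A.1, Stockholm column
lam_hat = fit_common_lambda(stockholm)
T = 51  # assumption: quarterly periods, 1981:I-1993:III
print(f"lambda over the sample period: {lam_hat:.3f}")
print(f"predicted share sold once: {truncated_poisson_pmf(1, lam_hat):.3f}")
print(f"P(sale in two given quarters): {prob_sold_in_both(lam_hat, T):.2e}")
```

With Stockholm's counts the fitted $\lambda$ is about 0.46 over the whole sample, and the implied share of dwellings sold exactly once is about 0.79, close to the observed share in Table A.1.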
Table C.1: Implications of the Poisson Selectivity Model on House Price Estimates
(Columns correspond to the eight regions.)

A. Estimated coefficient of the inverse Mills ratio in the price index equation
   (t-ratio in parentheses)
      0.004   0.026   0.046   0.049   0.030   0.048   0.046   0.042
     (0.84)  (7.50)  (8.53) (10.57)  (8.08)  (9.18)  (5.23)  (5.86)

B. Average deviation between biased and selectivity-corrected price index
   (in percentage points)
       0.93    5.38    8.40   10.21    6.66    9.30    8.52    8.56

C. Maximum deviation between biased and selectivity-corrected price index
   (in percentage points)
       1.99   11.64   19.71   23.84   13.96   21.46   19.91   18.90
Appendix Table C.1 reports the implications of the sample selectivity model based upon the Poisson model. The coefficients are similar to those reported in Table 3.5. The t-ratios of the selectivity parameter are somewhat higher than those reported in the text. (Again, there is no evidence of sample selectivity in Stockholm.) The average deviation estimated by this selectivity model is somewhat larger, and the maximum deviation substantially larger, than in the text.
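In a Heckman (1979) two-step correction, an inverse Mills ratio built from the selection probability enters the price equation as an additional regressor, and panel A of Table C.1 reports its coefficient. The appendix does not spell out how the probability from (C.3) is converted into a Mills ratio; the sketch below assumes one standard route, treating the probability as generated by a standard-normal index, $p = \Phi(z)$, so that the ratio is $\phi(z)/\Phi(z)$. It uses only Python's `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills_from_prob(p):
    """Map a selection probability p to phi(z) / Phi(z), with z = Phi^{-1}(p).

    Assumption: the probability from (C.3) is treated as if generated by a
    standard-normal index model; the appendix does not state this mapping.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = STD_NORMAL.inv_cdf(p)
    return STD_NORMAL.pdf(z) / p  # Phi(z) equals p by construction


for p in (0.9, 0.5, 0.1, 1e-4):
    print(f"p={p:g}  inverse Mills ratio={inverse_mills_from_prob(p):.3f}")
```

The ratio rises as the selection probability falls, so sales of dwellings that are unlikely to be observed twice receive the largest adjustment.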

In order to capture the evolution of industrial composition and, therefore, of in