Were railways indispensable for urbanisation?
evidence from England and Wales

Dan Bogart∗, Xuesheng You†, Eduard Alvarez‡, Max Satchell§, and Leigh Shaw-Taylor¶ ‖
Draft: November 19, 2018
Abstract
England and Wales underwent a remarkable urbanisation during the railway era
in the nineteenth century. Yet this economy was already industrialised with well-developed
transport infrastructure prior to railways. This raises the question of whether
railways were indispensable for urbanisation over the medium and long term. In this
paper, we examine the population growth effects of being close to railway stations
versus being close to turnpike roads, inland waterways, and ports. Our estimates
show that being within a short commuting or shipping distance to all infrastructures
significantly increased a locality's population growth from 1841 to 1891. The same is
true for population growth from 1891 to 2011. Across numerous specifications, we find
that railways had the largest growth effects, but turnpike roads and inland waterways
had significant effects too, and even more so for ports. Our estimates contribute to a
deeper understanding of the spatial patterns of growth during the industrial revolution
and the effects of transport infrastructure on long-run urbanisation.
Keywords : Urbanisation, railways, transport, spatial reorganization

JEL Codes : N4, O18, R11
∗ Corresponding author. Associate Professor, Department of Economics, UC Irvine, dbogart@uci.edu
† Research Associate, Faculty of History, University of Cambridge, xy242@cam.ac.uk
‡ Senior Lecturer, Economics and Business, Universitat Oberta de Catalunya, ealvarezp@uoc.edu
§ Research Associate, Dept. of Geography, University of Cambridge, aems2@cam.ac.uk
¶ Senior Lecturer, Faculty of History, University of Cambridge, lmws2@cam.ac.uk
‖ Data for this paper was created thanks to grants from the Leverhulme Trust (RPG-2013-093), Transport
and Urbanization c.1670-1911, NSF (SES-1260699), Modelling the Transport Revolution and the Industrial
Revolution in England, the ESRC (ES 000-23-0131), Male Occupational Change and Economic Growth in
England 1750 to 1851, and ESRC (RES-000-23-1579) the Occupational Structure of Nineteenth Century
Britain: Grant. We thank Walker Hanlon, Gary Richardson, Petra Moser, Kara Dimitruk, Arthi Vellore,
William Collins, Jeremy Atack, Alan Rosevear, and Elisabet Viladecans Marsal for comments on earlier
drafts and seminar participants at UC Irvine, UC San Diego, NYU, Florida State, Trinity College Dublin,
Queens Belfast, the University of Los Andes, Vanderbilt, and EHA Meetings. We also thank Cranfield
University for sharing their soils data.
1 Introduction
Improvements in transport infrastructure can substantially change trade and travel patterns.
However, it is not obvious that transport improvements, even large ones, significantly
change urbanisation. Population may continue to cluster around older transport infrastructures
that remain in use or get transformed to new uses through technological change.
In this paper, we use the lens of history to examine how a large-scale transport improvement,
the railway, changed the population geography of England and Wales relative to
previous infrastructures. There is a broader literature on whether railways were crucial (or
even indispensable) to economic development in the nineteenth century and over the longer
run.[1] England and Wales is an interesting case because it was already industrialised when
railways started spreading in the 1830s. It had large secondary employment compared to
other economies and was more urbanised with high levels of migration.[2] This economy also
had well developed transport infrastructure before railways, including a large network of
ports, roads, navigable rivers, and canals.
There is a long-standing debate on the impact of railways in England and Wales. One
argument is that canals, roads, and ports helped determine the location of new urban
centers in the eighteenth century and these centers persisted into the railway era. A related
argument is that shipping and inland water transport remained competitive with railways on
bulky, low-value goods, like coal, and hence continued to influence the location of population.
The opposing argument notes that railways generally provided superior transport services
compared to preexisting modes. This view sees railways as shaping location within major
urban centers, including their suburbs.[3]
We examine the medium and long-run population growth effects of being within a short
commuting or shipping distance to railway stations versus being within the same distance to
turnpike roads, inland waterways, and ports. We use a new data set with local populations
in every decennial census year from 1801 to 1891 and in 2011. Our new spatial units are
consistent across time and are 15 square km on average. They are similar to parishes and
townships, the smallest places reported in the British Census.[4] We also incorporate GIS
data on railway lines and stations, turnpike roads, ports, and inland waterways. Most of
these networks are observed between 1830 and 1860, but for some we have earlier dates. The
networks are created from historical maps and allow us to measure the distance between
units and infrastructure with great precision. Finally, we add geographic characteristics,
like elevation, ruggedness, soils, rainfall, temperature, coastline, and coal.
[1] For example, see foundational papers by Fogel (1964) and Fishlow (1965). For more recent studies see
Berger and Enflo (2015), Hornung (2015), Jedwab, Kerby, and Moradi (2015).
[2] See Shaw-Taylor and Wrigley (2014) for an overview of occupational structure and urbanisation. See
Redford (1976) and Long (2005) for studies on migration.
[3] See Hawke (1974), Dyos and Aldcroft (1974), Simmons (1986), Leunig (2006), Crafts and Mulatu (2006),
Armstrong (2009), Kellet (2012), Maw (2013), Crafts and Wolf (2014).
[4] Unfortunately, our population data do not include Scotland or Ireland, and thus we cannot make firm
statements about the UK.
Our main specification is a long-difference, where unit population growth from 1841 to
1891 is regressed on indicators for being within 2 km of infrastructures. 2 km corresponds to
30 minutes walking distance, which is approximately the average commuting time in developed
economies today.[5] The baseline specification also includes geographic and structural
controls, like population density and occupational shares c.1840, pre-trends in population
growth, along with registration district fixed effects. Registration districts are about 250
square km on average and encompass our spatial units. That implies we are controlling for
unobservable factors common to all units within a relatively small area.
We further address endogeneity of railways using propensity score matching and instrumental
variables (IV). For the IV, we construct a Least Cost Path (LCP) connecting large
towns in 1801 and incorporating the added costs of building railways over rugged terrain.
Proximity to the LCP is a good instrument because it identifies units that were close to
stations mainly because they were near favorable routes for connecting large towns.[6]
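To make the least cost path idea concrete, the following is a minimal sketch in Python. It assumes a small raster in which the cost of entering a cell rises with ruggedness, and it uses Dijkstra's algorithm as one standard way to find the cheapest route between two towns. The grid, the cost values, and the function name `least_cost_path` are illustrative assumptions, not the data or settings used in this paper.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_dist, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]

# Hypothetical cost raster: 1 = flat terrain, 9 = rugged terrain.
terrain_cost = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
# Two hypothetical towns in the top-left and top-right cells.
route, total_cost = least_cost_path(terrain_cost, start=(0, 0), goal=(0, 2))
print(route, total_cost)
```

In this toy grid the cheapest route detours around the rugged column. For the instrument described above, what matters is how close a unit lies to such a route, not whether it is one of the towns being connected.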
Our first main finding is that proximity to railway stations had a large effect on population
growth between 1841 and 1891. This result is consistent across all specifications,
including IV. In our preferred specification, being within 2 km of a railway station increased
unit population growth by 15.9 percentage points (pp) over 50 years, or an increased growth
rate of 0.3% per year. To put this estimate into perspective, the total population in England
and Wales increased by 79 pp between 1841 and 1891. The average population growth
across all units was 1 pp, which is equivalent to a 0.02% annual growth rate.
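The conversion from a 50-year effect to an annual rate can be checked with simple arithmetic. The sketch below assumes that percentage points are log differences multiplied by 100 (population growth is measured in log differences in this paper) and that the annual rate is the log difference divided by the number of years. The function name is hypothetical.

```python
def annual_log_growth(log_difference, years):
    """Average annual growth rate implied by a total log difference."""
    return log_difference / years

# Values quoted in the text, read as log differences over 1841-1891.
station_effect = annual_log_growth(0.159, 50)   # roughly 0.3% per year
mean_unit_growth = annual_log_growth(0.01, 50)  # 0.02% per year

print(f"{station_effect:.2%} per year, {mean_unit_growth:.2%} per year")
```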
Our second main finding is that proximity to pre-rail infrastructures also had a large effect
on population growth between 1841 and 1891. Being within 2 km of an inland waterway
(turnpike road) increased population growth by 5.6 pp (3.8 pp). We also find that being
within 4 km of a port increased population growth by 15.5 pp, which is similar to railways.
Separating the effects across time reveals that railways had a much larger effect than
inland waterways and turnpike roads from 1841 to 1871. Railways were a strong substitute
for roads and inland waterways initially, but by 1891 the latter gained new users like omnibuses
and steamboats. Port effects are largest from 1841 to 1871 and less significant by
1891. Coastal shipping was highly productive in the mid-1800s and was crucial in supplying
London's coal. But coastal shipping increasingly lost its market share to railways and only
some ports gained from international shipping.
[5] The US census reports that commuting times in US cities average 26.1 minutes. See U.S. Census Bureau,
2012-2016 American Community Survey 5-year estimates.
[6] Our methodology draws on the so-called inconsequential place approach and other studies which use least
cost paths as instruments for infrastructure. See Chandra and Thompson (2000), Michaels (2008), Faber
(2014), and Lipscombe et al. (2013).
The third main finding is that being within a short distance of each infrastructure
contributed to higher population growth between 1891 and 2011. The long run effects are
sizable. A one standard deviation change in proximity to 1851 railway stations accounts for
0.161 standard deviations of growth. The corresponding figures for ports, turnpike roads, and
inland waterways are 0.078, 0.044, and 0.065 standard deviations. Together our results show that railways
were the most important driver of long-run urbanisation among the historic infrastructures.
However, railways were not indispensable for all growth because pre-rail infrastructures also
made significant contributions.
There is a remaining question as to whether units close to infrastructures pulled population
from areas more distant and grew at their expense. In extensions, we use units more
than 10 km from infrastructures as the control group. For railway stations, we find more
growth between 0 and 6 km distance, but there was no difference in growth for units between
6 and 10 km from stations. Thus, we do not find any evidence that railways contributed to
relative population declines just beyond the commuting zone measured by 6 km. Nevertheless,
we still think railways pulled population near stations. There are two reasons. First,
migration rates were very high in nineteenth century England (Long 2005). Second, we use
additional data sources to show that proxies for migration, like the Irish born population
percentage, increase more near railway stations. Also, fertility, another driver of population
growth, decreases more near railway stations.
The previous findings raise another question: did railways or other infrastructures reorganize
population with little impact on productivity? In extensions, we provide evidence that
railways increased population growth by 25 pp for units at the 75th percentile of 1841 population
density, 15 pp for units at the 50th percentile, and 8 pp at the 25th percentile. The
same is true for inland waterways, although to a lesser extent. This heterogeneity suggests
that railways and waterways attracted migrants to localities that were more productive.
Some supporting evidence suggests moving a worker from a unit in the 50th to the 85th percentile
of population density increased their wages by 6.8%. Over the longer term, we think
the higher population and productivity near early infrastructures attracted new investments
and technologies and hence attracted even more migrants during the twentieth century. In
other words, infrastructure, technology, and migration were reinforcing processes.
Our results contribute to a broader literature on the spatial patterns of growth during
the industrial revolution. A wide range of factors are discussed like endowments, markets,
and human capital.[7] Our new data set can test many growth channels. Here we show that
transport infrastructures had significant effects on population growth. But our analysis and
data also document the importance of other factors like coal, climate, and ruggedness.
We also add to the large literature on railways and nineteenth century growth.[8] Perhaps
our main contribution is to reemphasize the importance of comparing railways with inland
waterways, roads, and ports. Modal substitution and complementarity are central to the
analysis of any transport improvement.
Finally, our study contributes to the literature on infrastructure and urbanisation in
contemporary contexts.[9] Over the last 50 years there has been a dramatic rise in urbanisation
across the world. Given the significant social and economic implications, it is useful to look
at history. The English and Welsh case shows that multiple infrastructures can evolve and
influence urbanisation long into the future.
2 Background on transport infrastructure

England and Wales (EW) had a well-developed transport network long before its railway
network grew. Figure 1 shows the length of turnpike road, inland waterway, and railway
networks from 1700 to 1890.[10] In this section, we discuss each of these networks and ports.
As early as 1680 EW had about 12,000 km of main roads.[11] But they were in poor
condition and local governments, then in charge, had little capability to improve them. As
an alternative, EW turned to using tolls and non-governmental organizations called turnpike
trusts. Their powers came from an act of parliament. Acts named local landowners and
merchants to serve as trustees. Parliament offered little in subsidies. Locals purchased
bonds to finance improvements and were repaid using toll revenues.
The first trusts generally improved the main roads already in place. Later trusts extended
the road network and transformed dirt paths into roads suitable for wheeled traffic. As figure
1 shows, the turnpike network grew from about 8000 km in 1750 to about 38,000 km in 1830.
[7] See Fernihough and Hjortshøj O'Rourke (2014), Crafts and Wolf (2014), Klein and Crafts (2012), Becker,
Hornung, and Woessmann (2011).
[8] For previous studies on English and Welsh railways see Hawke (1974), Leunig (2006), Casson (2013),
Gregory and Marti Henneberg (2010), Alvarez et al. (2013), and Heblich, Redding, and Sturm (2018),
who focus on London. For other countries see Berger and Enflo (2015) for Sweden, Tang (2014, 2017) for
Japan, Hornung (2015) for Prussia, Atack, Bateman, Haines, and Margo (2010), Atack and Margo (2011),
Donaldson and Hornbeck (2016), and Hodgson (2018) for the US, and Donaldson (2014) for India.
[9] See Redding and Turner (2015) for an overview. Some papers of related interest include Duranton and
Turner (2012), Faber (2014), Jedwab et al. (2015), Storeygard (2016), and Baum-Snow et al. (2017).
[10] Network maps are available at https://www.campop.geog.cam.ac.uk/research/projects/transport/data/
[11] These are documented in Ogilby's Britannia Atlas. See Satchell (2017) for a description of these roads.
Figure 1: Evolution and size of infrastructure networks in England and Wales 1700-1890
Sources: see data section.
At their peak there were approximately 1000 different trusts. They managed all main roads,
although some turnpike roads could be considered secondary in importance.[12]
Many users of turnpike roads obtained transport services from public carriers and coaching
companies. How did they benefit from turnpike roads? Transport costs could have
increased because of the tolls and the localism of trusts, but that did not happen.[13] The
shift to fly-by-night services and stagecoaches with steel springs meant that passenger travel
times fell substantially between 1750 and 1820. Real freight rates also fell by over 40% as
wagons got bigger and load sizes increased. The growing use of stagecoaches and wagons
led to the concentration of economic activity around turnpike roads.[14]
Turnpike trusts faced a crisis when railways became widespread. The finances of many
trusts deteriorated and most stopped functioning by the 1870s (see their decline in figure 1).
Responsibility for maintaining turnpike roads passed to newly formed highway districts and
county councils.
The inland waterway network developed at the same time as turnpike roads. Around
1700 EW had a large system of navigable rivers including the Thames, Severn, Great Ouse,
and Trent (Willan 1964). River navigations, or improved rivers which bypassed difficult
sections, were added between 1700 and 1750. Canals or artificial waterways were built
between 1760 and 1830. Like turnpike roads, river navigations and canals were authorized
by acts of parliament (Bogart 2017). Acts granted authority to companies and included
procedures to negotiate the purchase of land. Most canals required many investors and were
organized as joint stock companies.
[12] For a summary see Bogart (2017).
[13] For a summary of the effects of turnpike trusts see Bogart (2005).
[14] See Bogart (2009) and Pawson (1977) for this evidence.
By 1830 there were several long-distance canals linking important centers. One example
is the Leeds and Liverpool Canal, which connected the leading woolen and cotton textile
towns. Another example is the Grand Junction Canal, which shortened the waterway distance
between London and Manchester. Independent carriers were hired by individuals and
firms to provide freight services on canals. Like road carriers, they relied on horsepower to
draw their boats. Nevertheless, the efficiency of hauling over water brought low cost transport
to inland regions.[15] Canals were especially important in the movement of coal. As one
illustration, the price of coal in Manchester fell by half after the completion of the nearby
Bridgewater Canal in 1761. Some historians argue that canals led to the development of
inland industrial centers by providing cheap fuel (Maw 2013, Crafts and Wolf 2014).
There was also significant investment in ports. It is estimated there were 391 acres of wet
dock space and 50 harbors in 1830. By contrast, England had no wet docks and a handful
of harbors in 1660 (Pope and Swann 1960). The ports of Liverpool and London provide two
illustrations. In Liverpool, dock acreage increased 11-fold between 1710 and 1830, including
the first commercial wet dock. The investment was financed by Liverpool merchants who
wanted to facilitate the import of cotton and foodstuffs. London was a center for domestic
and international trade, but there was little investment in its ports between 1700 and 1799.
Then there was a dock building boom from 1799 to 1825 which transformed the capacity of
the capital port (Dyos and Aldcroft 1974, p. 58).
Port infrastructures complemented improvements in shipping technology. After 1800
sailing vessels became larger and more durable with metaled hulls. Sails and rigging also
improved. One indication is the greater speeds achieved by sailing vessels in the early 1800s
(Solar 2013). The arrival of the steamship was even more revolutionary, although its impact
was delayed until the 1860s. Before that steamships were expensive and not as cost efficient
as sailing ships (Dyos and Aldcroft 1974, p. 257). Improvements in engine efficiency, steel
hulls, and propellers eventually turned the tide. Steamship capacity exceeded sail for the
first time in 1883. Steamships would go on to revolutionize trade and travel across the
oceans (Pascali 2017). However, coastal sailing vessels continued at many ports (Armstrong
2009, Langton and Morris 2002).
[15] See Turnbull (1987) and Bogart, Lefors, and Satchell (forthcoming) for a discussion of canal carriers.
The first steam powered rail service open to the public came in 1825 in the northern
coal mining region between Stockton and Darlington. In 1830, the Liverpool and Manchester
railway opened to facilitate passenger traffic. The rail network expanded dramatically
following the `Mania' of the mid-1840s. The significance of the Mania can be seen in Figure
1 through the growth of track mileage. By 1851 regional rail networks had formed around
the large towns in addition to trunk lines connecting larger towns.[16]
Railways were built and operated by joint stock companies. They provided passenger
and freight services directly to customers. Passengers accounted for most revenues initially,
but after 1850 freight accounted for more. One of the most difficult challenges facing railway
companies was their high construction costs. A key factor was the route of their lines. A
distinction was made between the original line, which often aimed to connect large trading
towns, and the branch lines, which linked smaller towns to the original lines. Railway
companies preferred original lines whenever possible. One promoter advised the following:
`stick to the original line; keep down the capital and let competing schemes do their worst'
(quoted in Simmons 1986, p. 271).
Historians often emphasize the impact of railways on the EW economy. Dyos and
Aldcroft (1974, p. 229) argue that urban growth was the most `conspicuous product of
railway development.' The impact of railways seems to have grown over time as regulations forced
railway companies to provide transport to lower socioeconomic groups. The Cheap Trains
Act of 1883 made daily workman's trains mandatory and led to lower commuting fares.
Despite the popular view that railways were economically crucial, there is a debate as
to whether they significantly changed the location of population and economic activity.
Central to this debate is the degree of substitution or complementarity between railways
and preexisting modes. We now turn to this issue.
3 Modal substitution and complementarity

In this section, we define how transport modes can be substitutes or complements and
we briefly examine evidence from the literature. The standard mode choice model considers
a traveler or shipper who has $N$ transport options (i.e. modes).[17] Each mode has a set of
attributes like the fare, travel time, and convenience. A traveler will also have an idiosyncratic
preference yielding an individual utility from each transport mode $U_i$. The traveler
will choose mode $i$ if $U_i > U_j$ for all $j \neq i$.

[16] For the literature on the mania see Casson (2009), Odlyzko (2010), Campbell and Turner (2012, 2015).
[17] See Small (2013) for an overview of transport demand.
Modal substitution occurs if the demand for transport mode i decreases when the fare or
travel time of another mode j decreases. Consider a case where a new mode is better than
existing modes on all attributes. Only those travelers with a high idiosyncratic demand for
the old mode will keep using it. All others will shift to the new mode. We call this `complete' substitution.
There is another case where the new transport mode is better on only some attributes. Say
the new mode offers lower travel time than an existing transport mode, but its fare is higher.
In that case, some travelers will shift to the new mode because they value time more,
and others will continue to use the old mode because they are more fare sensitive. We call this
`partial' substitution. It implies both modes co-exist in a market. High fixed costs can also
lead to co-existence because they prevent a mode from being available to all travelers. In this
case, one could observe two modes in use even though one is better on all attributes.
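The choice rule and the two kinds of substitution can be illustrated with a small example. The sketch below is stylised and is not a model estimated in this paper. It assumes that utility is the negative of a generalised cost (fare plus value of time multiplied by travel time) plus an optional idiosyncratic taste term. All fares, travel times, values of time, and names are made-up.

```python
def choose_mode(modes, value_of_time, taste=None):
    """Return the utility-maximising mode for one traveler.

    Assumed utility: -(fare + value_of_time * hours) + idiosyncratic taste.
    """
    taste = taste or {}

    def utility(name):
        fare, hours = modes[name]
        return -(fare + value_of_time * hours) + taste.get(name, 0.0)

    return max(modes, key=utility)

# Hypothetical attributes: (fare, travel time in hours).
modes = {"canal": (2.0, 10.0), "rail": (5.0, 2.0)}

# Partial substitution: rail is faster but dearer, so travelers sort
# according to how much they value time.
time_sensitive = choose_mode(modes, value_of_time=1.0)  # cost 7.0 vs 12.0
fare_sensitive = choose_mode(modes, value_of_time=0.1)  # cost 5.2 vs 3.0

# Complete substitution: rail is better on both attributes, so only a
# strong idiosyncratic taste keeps a traveler on the old mode.
dominant = {"canal": (5.0, 10.0), "rail": (2.0, 2.0)}
typical = choose_mode(dominant, value_of_time=0.1)
loyalist = choose_mode(dominant, value_of_time=0.1, taste={"canal": 10.0})

print(time_sensitive, fare_sensitive, typical, loyalist)
```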
Transport modes can also be complements, which means the demand for transport mode
i increases when the fare or travel time of another transport mode j decreases. In one case,
different transport modes are links in the same journey. Introducing better attributes on
one link increases the demand on all links. Complementarity can also arise if the new
mode increases overall transport demand. Here there will be more travel by those who value
attributes of the old mode.
What does the literature say about substitution and complementarity concerning railways
in EW? It appears railways were a complete substitute for long distance road transport.
Railways offered faster services at less than half the fares and freight rates. As a result,
long-distance coaching and road freight services were largely displaced by the 1850s.[18]
There is a counter-argument that railways could not be built everywhere due to fixed
costs. This allowed some short-distance road transport to continue. There was also innovation
in road transport. The omnibus spread in the mid 1800s. It carried more passengers at lower
fares than coaches of old. Highway districts and county councils also assisted by improving
the former turnpike roads (Dyos and Aldcroft 1974, p. 241).
The standard view is that most canals failed to compete with railways on long distance
traffic because of their slow speed. Some tried to compete on cost, but the lack of coordination
led many canal companies to sell out to railways. In 1883 half of inland waterway
mileage was leased or owned by railway companies. The remaining canals served short distance
traffic in industrialised areas. There is some evidence for a canal revival after the
1870s. New regulations helped by requiring canals to publish through rates on long distance
journeys and by limiting railway control. The application of steam power to canal boats
was another factor (Boughey and Hadfield 2012).
[18] Between 1845 and 1850, the number of passenger journeys by rail increased by 117%, and again by 65%
between 1850 and 1855 (Mitchell 1998).
Railways are thought to have been a partial substitute for coastal shipping. For example,
railways gained in the biggest market, the transport of coal to London. In 1850, 98.4% of
coal imported into London came by coastal ship. By 1870 the rail share was 55.7% and
in 1880 it was 62%.[19] Armstrong (2009) argues that steamships halted the further decline
in coastal shipping. However, the extent to which the two modes co-existed is debatable
because many ports came under the authority of railways (Dyos and Aldcroft 1974).
In the case of international shipping, railways were a complement. There was a tremen-
dous growth in foreign trade from the 1840s, especially in grain (O'Rourke 1997). Sailing
ships and then steamships transported the grain from the Americas, India, and Russia to
EW, where it was transported inland by rail. Hawke (1970, p. 128) estimates that imports
represented more than half of all wheat hauled by English railways in 1865. Some would
even argue that railways and shipping created the world grain market together.
4 Theoretical and empirical frameworks

Our main goal is to identify the relative importance of rail versus pre-rail infrastructures
in nineteenth century population growth in EW. This section discusses how we adapt common
theoretical and empirical frameworks for our research question. Redding and Turner (2015)
summarize a theoretical model which links transport infrastructure and the location of economic
activity. They show that equilibrium population in any location is increasing in the quality of its
commuting technology and its firm and consumer market access. Consumer market access
is measured by the variety of goods available and the trade costs of shipping those varieties
to the location. Firm market access is a weighted sum of firm demands and depends on the
cost of shipping goods to other markets. Better transport infrastructure plays a role in this
model by reducing trade costs and hence increasing market access for some locations. Better
infrastructure also reduces commuting time and hence increases effective units of labor.
[19] These figures are reported in Hawke (1970, p. 168).
Redding and Turner (2015) argue that a fairly standard regression specification provides
a reduced form version of the model. City $i$ population in year $t$, $Y_{it}$, is regressed on a
measure of transport infrastructure access, such as an indicator for connection to a highway
network $d_{it}$, plus time-varying controls and location and time specific fixed effects. Duranton
and Turner (2012) adapt a similar specification to incorporate a partial adjustment process
where population growth is a function of the difference between a city's actual population
and its equilibrium or target population. The estimating equation becomes
\[ y_{it+1} - y_{it} = \lambda y_{it} + a d_{it} + c x_{it} + \varepsilon_{it} \tag{1} \]
where the left hand-side variable $y_{it+1} - y_{it}$ is the log difference in city population between $t$
and $t + 1$. The right hand side includes the log of initial population $y_{it}$, indicators for being
connected to a transport network $d_{it}$, and controls $x_{it}$.
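As a rough illustration of how a specification like (1) is taken to data, the sketch below fits the three coefficients by ordinary least squares in plain Python. Everything in it is an assumption made for illustration: the six units are synthetic, the "true" parameters are made-up, the error term is set to zero so that the coefficients are recovered exactly, and fixed effects and the other controls used in this paper are omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(regressors, outcome):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(regressors[0])
    xtx = [[sum(r[i] * r[j] for r in regressors) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(regressors, outcome)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic units: (log initial population y_it, indicator d_it, control x_it).
units = [(6.0, 1, 0.3), (7.2, 0, 1.1), (5.1, 1, 0.8),
         (8.0, 0, 0.2), (6.6, 1, 1.5), (7.9, 1, 0.4)]
LAMBDA, A, C = -0.05, 0.159, 0.02  # made-up "true" parameters

# Left-hand side of equation (1), y_it+1 - y_it, with the error set to zero.
growth = [LAMBDA * y + A * d + C * x for y, d, x in units]

lam_hat, a_hat, c_hat = ols(units, growth)
print(lam_hat, a_hat, c_hat)
```

The coefficient on the indicator, `a_hat`, is the object of interest: it is the growth difference associated with being connected, holding initial population and the control fixed.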
Specification (1) is appealing for our study because we can estimate the growth impacts of
railways, turnpikes, ports, and waterways by including indicator variables for being within a
short commuting or shipping distance of these infrastructures. If the railway was a complete
substitute for say canals, then we should expect zero effect for the inland waterway indicator
all else equal. The reason is that all shippers, outside of idiosyncratic types, would have
preferred using railways. Over time individuals should migrate from areas with canals to
areas with railway stations leading to population growth near the latter. By contrast, if
railways were a partial substitute say for canals, then being near inland waterways should
contribute to some growth. Users that preferred cheap water transport would migrate to
areas with rivers or canals and users that preferred the speed of railways would migrate to
stations.
There are two limitations to specification (1). First, the indicator variable for infrastructure
access does not account for network structure. Some studies address this issue by
estimating market access, or population-weighted inverse trade costs between all locations
(see Donaldson and Hornbeck 2016). We do not follow the market access approach because
it estimates trade costs for a single user type. In our case, multiple users appear to be
important. Also, the market access approach identifies the effects of trade costs without
differentiating by infrastructure type. Therefore, the methodology does not easily lend itself
to identifying the effects of railway stations, roads, inland waterways, and ports.[20]
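To see what the market access alternative involves, the sketch below computes a simple version of it for three hypothetical locations. It takes the verbal definition above at face value (each location's access is the sum of other locations' populations divided by the pairwise trade cost) and omits any elasticity parameter that specific studies may apply. The populations, costs, and function name are assumptions. The last line counts the distinct pairs among the 9489 units used in this paper, which shows the scale of the trade cost matrix that would be needed.

```python
import math

def market_access(populations, trade_cost):
    """Population-weighted inverse trade costs for each location.

    access_i = sum over j != i of populations[j] / trade_cost[i][j]
    """
    n = len(populations)
    return [
        sum(populations[j] / trade_cost[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]

# Three hypothetical locations with symmetric pairwise trade costs.
populations = [1000, 500, 2000]
trade_cost = [
    [0, 2, 4],
    [2, 0, 5],
    [4, 5, 0],
]
access = market_access(populations, trade_cost)

# Distinct unit pairs that would each need a trade cost with 9489 units.
unit_pairs = math.comb(9489, 2)
print(access, unit_pairs)
```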
Second, specification (1) cannot account for spatial reorganization. Localities just beyond
a short commuting distance of infrastructures may not be a clean control group for localities
within the commuting distance. They are potentially treated by infrastructure and could
lose population. Therefore, estimates based on (1) identify differences in relative growth, not
absolute growth. Below we address this issue further by studying different control groups
and by looking at migration proxies near railways. We also examine heterogeneity according
to initial population density.
[20] Also, as our data include population for 9489 units, we would need to calculate trade costs for more
than 45 million unit-pairs. That presents a major computational issue.
5 Data
Our population data come from British censuses, available every decade starting in
1801. They are digitized at the smallest census place level (e.g. parishes and townships)
up to 1891.[21] The census also published occupational counts starting in the early
nineteenth century. The counts for 1851 and 1881 are available through the Integrated
Census Microdata project (Schürer and Higgs 2014). The census places with population
and occupations from 1801 to 1891 are not always the same across time. To address boundary
changes, researchers at Cambridge University have created consistent spatial units between
1801 and 1891 and linked them with census population data.[22] Using similar techniques, we
create 9489 consistent units mapping population from 1801 to 1891 and male occupations
from 1851 to 1881.[23] We call these `units' for short. Units are 15 square km on average
and they belong to larger jurisdictions called registration districts. There are 616 unique
registration districts in our data and they average 250 square km.
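The idea of building consistent units through a transitive closure can be sketched as follows. This is an illustrative reconstruction, not the Cambridge Group's code: it assumes that boundary changes are recorded as pairs of linked census places and groups every set of directly or indirectly linked places into one unit with a union-find structure. The place names, the links, and the function name are hypothetical.

```python
def consistent_units(places, links):
    """Group places into units via the transitive closure of `links`.

    `links` holds pairs of places assumed to have shared territory at some
    census, for example through a boundary change or a merger.
    """
    parent = {place: place for place in places}

    def find(place):
        # Follow parent pointers to the representative, compressing the path.
        while parent[place] != place:
            parent[place] = parent[parent[place]]
            place = parent[place]
        return place

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for place in places:
        groups.setdefault(find(place), set()).add(place)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical census places and boundary-change links.
places = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "C")]  # A-B and B-C put A, B, and C in one unit
units = consistent_units(places, links)
print(units)
```

Populations of the places in each group can then be summed census by census, which gives a series that is unaffected by boundary changes inside the group.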
Very long-run outcomes are studied by merging our 9489 historical units with 34,753
Lower Super Output Areas (LSOAs) with population in 2011.[24] We use the intersect function
in ArcMap applied to the boundary lines of LSOAs and the boundary lines of our units.
The population variables are expressed in natural log differences over time (see table 1).
The mean 1841 to 1891 log difference is 0.01, which implies a mean population growth of
1 percentage point between 1841 and 1891. The mean is low in part because some units
in central London experienced large population declines due to out-migration of residents.
Overall there was an increase in urbanisation. The share of the population living in units
with at least 400 persons per square km increased from 42% in 1841 to 68% by 1891.
Our infrastructure data include GIS shapefiles for turnpike roads in 1830, inland
waterways in 1680 and 1830, and railway lines and stations in every census year starting in
1831.[25] We also have GIS data on the main roads in 1680 as surveyed by John Ogilby.[26] In
all cases, the networks are created using historical sources, improving their accuracy.
21 The Cambridge Group for the History of Population and Social Structure kindly provided this data.
22 For details see https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation.
23 Ms Gill Newton, of the Cambridge Group, developed the Python code for Transitive Closure as part
of the research project `The occupational structure of Britain, 1379-1911' based at the Cambridge Group.
Xuesheng You implemented this code for this particular paper.
24 Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and
Research Agency (2017): 2011 Census aggregate data. UK Data Service (Edition: February 2017). DOI:
http://dx.doi.org/10.5257/census/aggregate-2011-2.
25 See Rosevear et al. (2017), Martí-Henneberg et al. (2017a, b),
and Satchell, Shaw-Taylor, and Wrigley (2017a, b). For a description see
https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation
26 For a description of the 1680 Ogilby roads data see Satchell (2017).
For ports we draw on a list provided in The Shipowner's and Shipmaster's Directory
published in 1842. This source identifies 247 ports in use. It also describes whether loading
occurred on the beach and water depths at spring and neap tides. Our baseline model
considers all 247 ports regardless of their features. Thresholds for water depth yielded less
precise results. We also use a source published in 1787, which lists the main ports in 1680
(see Alvarez et al. 2017).
To analyze infrastructures, a straight line is drawn from the center of each unit to its
nearest station, road, waterway, and port. The unit center corresponds to the market square
if the unit had a town, or to the centroid if it had no town.[27] The mean distance to an 1851
station is 10.4 km (see table 1). The mean distances to a waterway and a turnpike road in 1830
were smaller, at 7.2 and 1.9 km respectively. As expected, mean distances to ports were greater, but given
that England had such a large network of ports in 1842 the average was only 30.2 km. We
use these distances to calculate indicators for being close to infrastructures.
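A minimal sketch of how such a proximity indicator could be computed is given below. It assumes unit centers and infrastructure locations are already projected to planar coordinates in km. The coordinates and the function names `nearest_distance_km` and `near_indicator` are hypothetical, and line features such as roads or waterways would need a point-to-line distance rather than the point-to-point distance used here.

```python
import math

def nearest_distance_km(center, points):
    """Straight-line distance (km) from a unit center to the nearest point."""
    return min(math.dist(center, p) for p in points)

def near_indicator(center, points, threshold_km=2.0):
    """Return 1 if the nearest point lies within threshold_km, otherwise 0."""
    return int(nearest_distance_km(center, points) < threshold_km)

# Hypothetical planar coordinates in km, for illustration only.
stations = [(0.0, 0.0), (10.0, 5.0)]
unit_a = (1.0, 1.0)    # roughly 1.41 km from the first station
unit_b = (5.0, 12.0)   # more than 2 km from both stations

print(near_indicator(unit_a, stations), near_indicator(unit_b, stations))
```

The 2 km default mirrors the baseline threshold in the text; other thresholds can be passed through `threshold_km`.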
An important fact concerns the spread of railway stations over time. In 1841, 1851, and
1861, 4.6%, 13.6% and 19.7% of units had a railway station within 2 km. By 1881 29.9% of
units had railway stations within 2 km.
The geographic data include variables for being on exposed coalfields, being on the coast,
ruggedness, average rainfall, average temperature, an index for wheat suitability, and the
share of land in 10 different soil types.[28] We call these `first-nature' variables following the
literature in economic geography (see Fujita et al. 2001). Coastal is identified using an
intersection of the seacoast with unit boundaries. The ruggedness measures include average
elevation within units, the average elevation slope, and the standard deviation in elevation
slope. See appendix A.2 for details. Rainfall, temperature, and wheat suitability come from
FAO.[29] Of special significance, Satchell and Shaw-Taylor (2013) identify those areas with
exposed coal bearing strata (i.e. not overlain by younger rocks). Exposed coalfields were
more easily exploited by early nineteenth century technology compared to concealed coal.[30]
27 We identify if a market existed at some point between 1600 and 1850. This
applies to 746 of the 9489 units. It should be noted that little error is introduced by
using the market or the centroid since units are so small. For a description of towns see
https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation
28 Soils data (c) Cranfield University (NSRI) 2017 used with permission. The 10 soil categories are based
on Avery (1980) and Clayden and Hollis (1985). They include (1) Raw gley, (2) Lithomorphic, (3) Pelosols,
(4) Brown, (5) Podzolic, (6) Surface-water gley, (7) Ground-water gley, (8) Man made, (9) Peat soils, and
(10) other. See http://www.landis.org.uk/downloads/classification.cfm#Clayden_and_Hollis. Brown soil
is the most common and serves as the comparison group in the regression analysis.
29 See the Global Agro-Ecological Zones data at
http://www.fao.org/nr/gaez/about-data-portal/agricultural-suitability-and-potential-yields/en/.
We selected low input and rain fed for wheat suitability.
30 For a description see https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation
Table 1: Summary statistics
Variable Obs. Mean Std. Dev. Min Max
Population growth variables
Ln diff. population 1841 to 1891 9489 0.010 0.513 -3.079 4.874
Ln diff. population 1891 to 2011 9488 0.545 0.965 -4.202 5.617
Infrastructure variables
Distance to rail station in 1851 km 9489 10.45 11.065 0.021 73.12
Distance to LCP km 9489 11.86 16.548 0.000 116.3
Distance to inland waterway 1830 km 9489 7.231 6.501 0.000 48.38
Distance to turnpike road 1830 km 9489 1.983 2.458 0.000 22.47
Distance to port 1842 km 9489 30.20 22.81 0.059 99.71
Indicator distance to rail station in 1851<2km 9489 0.136 0.342 0 1
Indicator distance to inland waterway in 1830 <2km 9489 0.233 0.423 0 1
Indicator distance to turnpike road in 1830<2km 9489 0.662 0.472 0 1
Indicator distance to port in 1842 <2km 9489 0.027 0.163 0 1
First-nature controls
Indicator exposed coal 9489 0.080 0.271 0 1
Indicator coastal unit 9489 0.147 0.355 0 1
Elevation 9489 89.72 74.02 -1.243 524.3
Average elevation slope within unit 9489 4.767 3.615 0.484 37.42
SD elevation slope within unit 9489 3.432 2.717 0 23.17
Average rainfall 9484 755.7 191.7 555 1424
Average temperature 9484 8.958 0.658 5.5 10
Wheat suitability (low input level rain-fed) 9484 2188.1 273.25 272 2503
Land area in sq. km. 9484 15.63 22.18 0.003 499.8
Perc. of land with Raw gley soil 9489 0.084 1.327 0 76.49
Perc. of land with Lithomorphic soil 9489 8.615 19.83 0 100
Perc. of land with Pelosols soil 9489 8.203 20.63 0 100
Perc. of land with Podzolic soil 9489 4.624 14.32 0 99.56
Perc. of land with Surface-water gley soil 9489 24.63 29.46 0 100
Perc. of land with Ground-water gley soil 9489 10.187 20.11 0 100
Perc. of land with Man made soil 9489 0.363 3.262 0 94.99
Perc. of land with Peat soil 9489 1.187 5.279 0 91.44
Perc. of other soil 9489 0.535 1.966 0 65.15
Second nature controls
Ln 1841 population per sq. km 9489 4.209 1.346 0.805 11.53
Share of male tertiary empl. in 1851 9489 0.149 0.109 0 0.941
Share of male secondary empl. in 1851 9489 0.196 0.123 0 0.800
Share of male agricultural empl. in 1851 9489 0.553 0.227 0 1
Share of male mining & forestry empl. in 1851 9489 0.025 0.076 0 0.745
Share of male unspecified empl. in 1851 9489 0.074 0.090 0 0.760
Ln distance to major city in 1801 9487 4.756 0.620 0.594 6.037
Sources: see text.
8% of our units are on exposed coalfields.
We have another set of unit-level variables called `second-nature' factors. These include
distance to one of the ten largest cities in 1801, log population density in 1841, and 1851 male
occupational shares in five categories: (1) tertiary, (2) agriculture, (3) secondary, (4)
mining/forestry, and (5) unspecified.[31] Population density in 1841 varied significantly, although
much of the population was concentrated near large cities, like Manchester and London. Male occupational
structures also exhibit concentration in 1851, especially in secondary employment. The top
1% of units accounted for 57% of male secondary employment in 1851.[32]
Figure 2 shows the kernel density estimates for the distribution of population growth
from 1841 to 1891 depending on whether units are within 2 km of various infrastructures.
The first panel clearly shows that units within 2 km of 1851 railway stations tended to have
higher growth than units more than 2 km from 1851 stations. There were some exceptions,
however, as growth was sometimes negative for units within 2 km of stations, as indicated
by the longer left tail. Panels b to d show a similar pattern for being within 2 km of 1830
inland waterways and turnpike roads and 1842 ports. Hence there is some initial evidence
that railways were one of several infrastructures increasing population growth in the second
half of the nineteenth century.
6 Main results
In this section, we estimate how population growth was affected by infrastructures. We
begin by analyzing the following `long differences' specification:
$$y_{i1891} - y_{i1841} = \beta_1 I^{rail}_{i1851} + \beta_2 I^{pre-rail}_{i1840} + \gamma x_i + \varepsilon_i \qquad (2)$$
where $y_{i1891} - y_{i1841}$ is the natural log difference in unit population. The initial year 1841 is chosen because there
were few railway stations open in 1831. 1891 is the last year for which we have historical
data.
One main explanatory variable is $I^{rail}_{i1851}$, equal to one if unit $i$ is within 2 km of a railway
station in 1851 and 0 otherwise. 1851 is chosen because the rail network underwent its
largest 10-year expansion in the 1840s. As a robustness check, we examine whether station proximity in
earlier or later years changes the conclusions. 2 km is chosen because it takes approximately
30 minutes to walk 2 km. We think 30 minutes represents a typical commute time for
31 Here we follow the primary, secondary, and tertiary (PST) coding system described in detail in
Shaw-Taylor et al. (2014) and Wrigley (2015). We do not code female occupations because there is less agreement
in the literature (see You 2014).
32 For more details on occupational structure see Shaw-Taylor and Wrigley (2014).
Figure 2: The distribution of population growth and infrastructure access
Sources: see text.
individuals who worked near the station or for firms carting their goods to the station for
quick delivery. However, this assumption is based on limited data, and therefore in a later
section, we consider greater distances.[33] Other main explanatory variables are included in
$I^{pre-rail}_{i1840}$. They are three indicators identifying whether a unit is within 2 km distance from
turnpike roads, inland waterways, and ports.
There are two sets of control variables included in $x_i$. The first nature controls are
those listed in table 1, together with the squares of temperature and rainfall, which allow for non-linear
effects in climate variables. The second nature controls are also listed in table 1. The
log of 1841 population density accounts for the regularity that initially dense units tend
to grow less. The 1851 male occupational shares address the possibility that areas more
specialized in agriculture grow less. Note that roads and canals built in the 1700s may have
33 In support of this assumption, Heblich, Redding, and Sturm (2018) use data from a single London firm
to show that 90% of workers lived within 5 km of their workplace from 1857 to 1877.
caused development by 1841. Therefore, specifications with second nature controls hide
some of their effects. However, they capture persistent effects of roads and canals in the era
supposedly dominated by railways.
Our list of control variables is large, but even so there are some factors that cannot
be measured. We use several approaches to address unobserved heterogeneity. First, we
include registration district fixed effects. Districts are approximately 250 square km, and
within such an area there were factors affecting growth that are similar across units. Our
second approach recognizes that even within a district there could be unobservable factors
correlated with infrastructures. Some can be captured by a variable for population growth
in the decades before railways. Other approaches include panel regressions, propensity score
matching, and instrumental variables. These approaches are discussed in the next section.
The main coefficient estimates for equation 2 are shown in table 2. The standard errors
are clustered on registration districts. In column (1), the only explanatory variable is the
indicator for units within 2 km of 1851 stations, which is associated with 24.5 higher log
points of population growth (approximately 28 percentage points or pp). Column (2) adds
indicators for pre-rail infrastructures. Being within 2 km of inland waterways and turnpike
roads has positive and significant effects on population growth equal to 9.1 and 6.4 log points
respectively. Being within 2 km of ports is associated with 13.9 higher log points of growth
but the coefficient is not precisely estimated.
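The conversion from log points to percentage terms used above can be checked with a short sketch. The function name `log_points_to_percent` is hypothetical; the only input taken from the paper is the column (1) coefficient of 0.245.

```python
import math

def log_points_to_percent(coef):
    """Convert a log-difference coefficient into a percent change."""
    return (math.exp(coef) - 1.0) * 100.0

# Column (1) of table 2: units within 2 km of an 1851 station.
rail_only = log_points_to_percent(0.245)
print(round(rail_only))  # about 28, matching the figure quoted in the text
```

For small coefficients the log-point and percent figures are nearly identical, which is why the distinction matters mainly for the larger railway estimate.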
We now consider specifications that include more controls. Column (3) in table 2 adds
the first nature controls. The estimates for these additional variables are not shown to
save space.[34] Interested readers should consult table 10 in appendix A.3. The coefficients
for railways, turnpike roads, and inland waterways change little. In fact, the estimates
become more precise. But the estimate for ports falls substantially and becomes close to
zero. Examining this specification more closely we find that being coastal is correlated with
being within 2 km of a port. We will return to the impact of ports later. The specification
in column (4) adds second nature controls. The estimates are broadly similar except the
railway coefficient increases in magnitude and the turnpike and inland waterway coefficients
decrease. The latter makes sense because some of the contribution of roads and waterways is
captured by population density in 1841 and occupational shares in 1851. The specification in
column (5) adds 616 district fixed effects (FEs). The coefficients on infrastructures decline
but they remain significant. The specification in (6) adds a control for unit population
growth from 1801 to 1831. The results are nearly identical, diminishing concerns about
34 Of most importance we find that units with coal have 36.7 higher log points population growth from
1841 to 1891.
pre-trends.
In our preferred model (5), being close to stations increased the annual growth rate by
0.3%, while being close to inland waterways increased the annual growth rate by 0.1%. In
terms of beta coefficients, a one standard deviation increase in the station variable increases
population growth by 0.106 standard deviations. A one standard deviation increase in the
inland waterway and turnpike variables increased population growth by 0.046 and 0.035
standard deviations. These results imply that in terms of explaining population growth,
being close to railways was more important. Nevertheless, it is striking that roads and
inland waterways had quantitatively significant effects even after accounting for railways
and other factors.
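The beta coefficients quoted above can be approximately reproduced from the column (5) coefficients in table 2 and the standard deviations in table 1. The sketch below assumes the usual standardization, coefficient times the standard deviation of the regressor divided by the standard deviation of the outcome, and uses the full-sample standard deviations from table 1 even though the regression sample is slightly smaller. The function name is hypothetical, and the text does not confirm that this is exactly how the figures were computed.

```python
def beta_coefficient(coef, sd_x, sd_y):
    """Standardized (beta) coefficient: effect of a one-SD change in x, in SDs of y."""
    return coef * sd_x / sd_y

SD_GROWTH = 0.513  # SD of Ln diff. population 1841 to 1891 (table 1)

# Column (5) coefficients (table 2) and indicator SDs (table 1).
station = beta_coefficient(0.159, 0.342, SD_GROWTH)
waterway = beta_coefficient(0.056, 0.423, SD_GROWTH)
turnpike = beta_coefficient(0.038, 0.472, SD_GROWTH)

print(round(station, 3), round(waterway, 3), round(turnpike, 3))
```

Under these assumptions the three values round to the 0.106, 0.046, and 0.035 reported in the text.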
Table 2: Access to infrastructures and local population growth: baseline estimates
Dep. var.: unit pop. growth 1841 to 1891. Coefficients with standard errors in parentheses.

Variable                                           (1)       (2)       (3)       (4)       (5)       (6)
Indicator dist. to rail station in 1851 <2km       0.245***  0.186*    0.173***  0.199***  0.159***  0.159***
                                                   (0.114)   (0.099)   (0.051)   (0.023)   (0.019)   (0.020)
Indicator dist. to inland waterway in 1830 <2km              0.091**   0.081***  0.068***  0.056**   0.054**
                                                             (0.042)   (0.019)   (0.020)   (0.019)   (0.019)
Indicator dist. to turnpike road in 1830 <2km                0.064***  0.064***  0.046***  0.038***  0.038***
                                                             (0.015)   (0.015)   (0.011)   (0.010)   (0.010)
Indicator dist. to port in 1842 <2km                         0.130     0.009     0.041     0.094     0.094
                                                             (0.088)   (0.074)   (0.068)   (0.061)   (0.062)
First nature controls                              No        No        Yes       Yes       Yes       Yes
Second nature controls                             No        No        No        Yes       Yes       Yes
Registration district fixed effects                No        No        No        No        Yes       Yes
Control for pop. growth 1801 to 1831               No        No        No        No        No        Yes
N                                                  9489      9489      9484      9482      9482      9478
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1.
The preceding conclusion is supported in alternative specifications. These are reported in
appendix A.3 table 11. One alternative uses indicators for units within 2 km of 1841, 1861,
or 1871 stations instead of 1851 stations. The coefficients are very similar, so the choice of
date for nearby railway stations does not matter. A second alternative specification uses an
indicator for whether a unit has any railway line in its boundaries. Our estimates show that
units with any railway line had 16.4 more log points of growth. One might find it surprising
that indicators for stations and railway lines have similar estimated effects. We think the
high number of stations and relative uniformity across lines in England and Wales meant that units close
to railway lines were generally close to stations.
A third alternative specification includes an indicator if the unit had more than one
station. For reference, 2.4% of units had more than 1 station in 1851. The results in table 11
appendix A.3 show these units had 27 log points higher growth compared to units without
any stations. The coefficients for one station, inland waterways and turnpike roads are
similar to before. These findings make sense since greater station density offered more local
and long-distance connections.[35]
A fourth alternative examines the effects of infrastructures over different periods. We
regress population growth from 1841 to 1861 on the same variables including all controls.
The same is done for growth from 1841 to 1871 and so on up to 1841 to 1891. The effects of
infrastructure could diminish with time, in which case the coefficient should stay the same
or increase slightly as the time frame increases. The results are reported in table 3. The
effects of railways diminished little. In the specification for 1841 to 1861, the 0.074 coefficient
implies a 0.35% higher annual growth rate. For 1841 to 1891, the coefficient 0.159 implies a
0.30% higher annual growth rate.
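As a rough illustration of how a cumulative coefficient translates into an annual rate, the sketch below simply divides the log-difference coefficient by the number of years. This is an assumed simplification, since the exact annualization behind the figures in the text is not stated, and its output can differ slightly from those rounded figures. The function name is hypothetical.

```python
def annual_growth_pct(coef, years):
    """Average annual growth effect (percent per year) from a log-difference coefficient."""
    return coef / years * 100.0

# Railway coefficient for 1841 to 1891 (table 3, column 4).
long_run = annual_growth_pct(0.159, 50)
print(round(long_run, 2))  # roughly 0.3 percent per year
```

Comparing annualized rates across time frames, rather than raw coefficients, is what allows the text to judge whether an effect diminished over time.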
The effects of turnpikes and inland waterways are small and insignificant from 1841 to
1871. This era marked the peak of railway influence as turnpike and canal companies failed
to compete. However, after 1871 their impact becomes larger and more significant. These
findings suggest that turnpike roads and inland waterways were put to new uses after 1871.
The estimated effects of ports diminish with time. For example, being within 2 km of
ports increases the annual growth rate by 0.33% up to 1861 and by 0.18% up to 1891. These
findings are consistent with (1) shipping playing an important role in the mid-nineteenth
century and (2) railways eventually making inroads into markets previously dominated by
coastal shipping.
35 In another related specification, we use log meters of 1851 railway line per square km, log meters of
1830 turnpike road per square km, and log meters of 1830 waterway per sq km. The results show a similar
importance of railways. Railway density has a beta coefficient of 0.144, waterway density has a beta
coefficient of 0.026, and turnpike road density has a beta coefficient of 0.04.
Table 3: Access to infrastructures and local population growth over different periods
Dep. var.: unit pop. growth in the period shown. Coefficients with standard errors in parentheses.

                                                   1841 to 61  1841 to 71  1841 to 81  1841 to 91
Variable                                           (1)         (2)         (3)         (4)
Indicator dist. to rail station in 1851 <2km       0.074***    0.106***    0.143***    0.159***
                                                   (0.009)     (0.013)     (0.016)     (0.019)
Indicator dist. to inland waterway in 1830 <2km    0.006       0.017       0.032**     0.054***
                                                   (0.007)     (0.011)     (0.014)     (0.019)
Indicator dist. to turnpike road in 1830 <2km      0.004       0.007       0.029***    0.038***
                                                   (0.005)     (0.007)     (0.009)     (0.010)
Indicator dist. to port in 1842 <2km               0.069***    0.071**     0.085*      0.094
                                                   (0.026)     (0.032)     (0.046)     (0.062)
First nature controls                              Yes         Yes         Yes         Yes
Second nature controls                             Yes         Yes         Yes         Yes
Registration district fixed effects                Yes         Yes         Yes         Yes
Control for pop. growth 1801 to 1831               Yes         Yes         Yes         Yes
N                                                  9478        9478        9478        9478
7 Addressing endogeneity of stations

Railway stations were not randomly assigned across space and even with controls our
previous estimates could be biased. An upward bias is the most obvious concern. Railway
companies might have selected units with better growth prospects to earn higher future
revenues. If there is an upward bias then perhaps we are over-stating the relative importance
of railways. In this section, we use three approaches to assess the bias and its direction.
The first approach estimates a panel regression with unit and census year fixed effects.
Recall we observe population and whether a unit has a station nearby in each decade from
1801 to 1891. Estimates from a panel regression are shown in appendix A.4 table 14. Briefly,
they show getting a railway station within 2 km raised unit population density by 16%. They
also reveal pre-trends, in which population grew before stations opened. Therefore, panel
regression estimates could also be biased because the parallel trends assumption is violated.
The second approach applies propensity score matching. The treatment variable is being
within 2 km of an 1851 railway station. We use a parsimonious set of matching covariates:
(1) population density in 1841, (2) the share of male agricultural employment in 1851, (3)
having exposed coal, and (4) population growth from 1801 to 1831. We match exactly one
Table 4: Matching estimator for effect of distance to railway stations
Units within 2 km of 1851 stations (1 vs. 0)

Covariate                            Standardized differences (raw)      Variance ratio (raw)
Ln pop. per sq. km 1841              1.071                               8.184
Has exposed coal                     0.291                               2.123
Share of 1851 male emp. in agric.    -1.137                              1.753
Ln difference pop. 1831 and 1801     0.230                               2.517
N                                    9,489

Covariate                            Standardized differences (matched)  Variance ratio (matched)
Ln pop. per sq. km 1841              0.007                               1.010
Has exposed coal                     -0.020                              0.938
Share of 1851 male emp. in agric.    0.004                               0.922
Ln difference pop. 1831 and 1801     -0.016                              1.153
N                                    9,485

Units within 2 km of 1851 stations (1 vs. 0)
Av. Ln diff. pop. 1891 and 1841      Diff. in means (raw data)           Diff. in means (matched data)
                                     (standard error)                    (robust standard error)
0.010                                0.244                               0.206
                                     (0.015)***                          (0.022)***
N                                    9,489                               9,485
Notes: * p<0.05, ** p<0.01, *** p<0.001.
nearest neighbor using the logit model. This set of covariates yields a balanced matched
sample. Table 4 shows the standardized differences in the covariate means are close to zero in
the matched sample but not in the raw data. The bottom rows of table 4 show the average
difference in means for population growth from 1841 to 1891. In the raw data units within 2
km of stations have 24.4 log points higher population growth. In the matched sample they
have 20.6 log points higher growth. By comparison our preferred OLS specification in table
2 implied being near railways increased population growth by 15.9 log points. Thus, our
parsimonious matching exercise implies slightly larger effects.
Our third and most detailed approach uses an instrumental variable derived from the
`inconsequential places' approach.[36] The key assumption is that some units became close to
railway stations simply because they were on the route designed to connect larger towns at
a low capital cost. In other words, they were not selected based on their potential for future
growth. The first step in creating the instrument is to select the towns that will be connected
by railways. We start with all English and Welsh towns having a population greater than
5000 in 1801.[37] Their larger size meant they were almost certain to get at least one railway
36 See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscombe et al. (2013).
37 The data come from Law (1967) and Robson (2006).
line connecting them with another town above 5000. But not all large town-pairs would
be connected. Existing levels of trade and communication were often lower between distant
towns or towns of moderate size. A profit-seeking promoter would see little value in building
a railway to connect them. We use a simple gravity model (GM) to calculate the relative
value of connecting any town-pairs each with a population above 5000. The equation for
town pairs $i$ and $j$ is $GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}}$, where $Dist_{ij}$ is the straight line distance between
towns $i$ and $j$.
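The gravity screening step can be sketched as follows. The town names, populations, coordinates, and the threshold value are all hypothetical, and because the measurement units behind the gravity values are not stated here, the threshold is left as a free parameter rather than tied to any figure in the paper.

```python
import math
from itertools import combinations

def gravity_value(pop_i, pop_j, dist_ij):
    """GM_ij = Pop_i * Pop_j / Dist_ij."""
    return pop_i * pop_j / dist_ij

def candidate_pairs(towns, threshold):
    """Return town pairs whose gravity value exceeds the threshold.

    towns maps a name to (population, x_km, y_km) in planar coordinates.
    """
    pairs = []
    for (a, (pa, xa, ya)), (b, (pb, xb, yb)) in combinations(towns.items(), 2):
        dist = math.dist((xa, ya), (xb, yb))  # straight-line distance
        if gravity_value(pa, pb, dist) > threshold:
            pairs.append((a, b))
    return pairs

# Hypothetical towns, for illustration only.
towns = {
    "A": (20000, 0.0, 0.0),
    "B": (8000, 30.0, 40.0),     # 50 km from A
    "C": (6000, 300.0, 400.0),   # 500 km from A
}
print(candidate_pairs(towns, threshold=1_000_000))
```

In this toy example only the large, nearby pair survives the screen, which is the intended behavior: distant or moderately sized pairs are dropped before any route is drawn.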
Next we identified a least cost path (LCP) connecting town pairs above a threshold
$GM_{ij} > 10{,}000$.[38] We assume that in considering their routes, railway companies tried to
minimize the construction costs considering distance and elevation slope. We use
construction cost data for railways built in the 1830s and early 1840s. We also measure the distance
of the lines and total elevation changes between towns at the two ends of the line. The
construction cost is then regressed on the distance and the elevation change to identify the
parameters (the details are in appendix A.1). Based on this analysis we find a baseline
construction cost per km when the slope is zero and for every 1% increase in slope the
construction cost rises by three times the baseline ($\text{cost per km} = 1 + 3 \times \text{slope}\%$). Next, we
use this formula to identify the LCP connecting our town pairs above the threshold. The
result is a network of candidate railway lines.
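The cost formula can be illustrated with a short sketch that prices two alternative routes, each described as a list of segments. The routes, the function names, and the use of the absolute value of slope (so that descents cost the same as ascents) are assumptions made for illustration; the least cost path search itself, which would run over an elevation surface, is not shown.

```python
def cost_per_km(slope_pct):
    """Relative construction cost per km: 1 + 3 * slope (slope in percent)."""
    return 1.0 + 3.0 * slope_pct

def path_cost(segments):
    """Total relative cost of a route given (length_km, slope_pct) segments."""
    # Assumption: uphill and downhill grades are equally costly.
    return sum(length * cost_per_km(abs(slope)) for length, slope in segments)

# Hypothetical routes between the same two towns.
flat_route = [(10.0, 0.0)]                # longer, but level
hilly_route = [(4.0, 0.0), (4.0, 1.0)]    # shorter, with a 1% grade on half of it

print(path_cost(flat_route), path_cost(hilly_route))
```

Here the longer level route is cheaper than the shorter graded one, which shows why a least cost path can deviate from the straight line between two towns.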
The LCP network is shown in the right of figure 3. The left shows the real railway
network in 1851. The overlap is fairly high. Locations close to the LCP are also generally
close to railway stations because stations were so numerous along the lines.
We use an indicator for being within 2 km of the LCP as our instrument for within 2 km
of stations. The exclusion restriction requires that the instrument only affects population
growth between 1841 and 1891 through its effect on station access. We think this assumption
is plausible under two conditions. First, units containing the town nodes used to construct
the LCP are excluded. They were clearly targeted by railways for their size and possibly
their growth potential. Second, the regression model should contain distance to pre-rail
infrastructures as control variables. If omitted, then one might worry the instrument affects
growth partly through road and waterway access.
We provide a `plausibility check' for the exclusion restriction by testing whether being less
than 2 km from the LCP is correlated with unit population growth between 1801 and 1831.
The results are shown in table 5. Note in all specifications we exclude 364 units within 2
km of the town nodes used to construct the LCP. The standard errors are always clustered
38 The 10,000 threshold is arbitrary, but as shown below this threshold does a good job predicting the
location of lines and stations.
Figure 3: The rail network in 1851 and the least cost path (LCP) network
Sources: see text.
on the registration district. In column (1), being within 2 km of the LCP is positively
and significantly associated with higher population growth from 1801 to 1831. Columns (2)
and (3) show the same result holds after including district FEs and first nature controls.
The conclusion changes in column (4), which adds pre-rail infrastructure controls. In this
specification, being within 2 km of the LCP is not significantly associated with higher
population growth from 1801 to 1831. In column (5) we add second nature controls and
the results are unchanged. Similar specifications use decade population growth (e.g. from
1811 to 1821) as dependent variables. The results are reported in appendix A.3 table
12. None finds a large and significant effect from being within 2 km of the LCP.
Table 5: Pre-trend tests for the validity of instrument distance to LCP
Dep. var.: unit pop. growth 1801 to 1831. Coefficients with standard errors in parentheses.

Variable                                           (1)        (2)        (3)        (4)        (5)
Distance to LCP for railways <2km                  0.0303***  0.0188*    0.0178*    0.0098     0.0057
                                                   (0.008)    (0.007)    (0.007)    (0.007)    (0.007)
Indicator dist. to inland waterway in 1830 <2km                                     0.0362***  0.0181*
                                                                                    (0.008)    (0.007)
Indicator dist. to turnpike road in 1830 <2km                                       0.0325***  0.0111*
                                                                                    (0.005)    (0.005)
Indicator dist. to port in 1842 <2km                                                0.0806**   0.0566*
                                                                                    (0.026)    (0.025)
Units within 2 km of LCP nodes removed?            Yes        Yes        Yes        Yes        Yes
First nature controls                              No         No         Yes        Yes        Yes
Second nature controls                             No         No         No         No         Yes
Registration district fixed effects                No         Yes        Yes        Yes        Yes
N                                                  9121       9121       9116       9116       9114
The previous results confirm the importance of including pre-rail infrastructures as
controls in the IV specification for railway access. However, one might worry that turnpike
roads, waterways, and ports are endogenous too. To address this issue, we also estimate a
specification instrumenting with indicators for the LCP and 1680 main roads, waterways,
and ports. Using historic infrastructure to instrument for later infrastructure is another
common approach in the literature (e.g. Duranton and Turner 2012).
The IV results for the baseline model are shown in column (2) of table 6 along with
the OLS for comparison in (1).[39] The Kleibergen-Paap F statistic is fairly large, indicating
there is not a weak instruments problem. The IV estimate implies that being within 2 km
of a railway station caused population growth to rise by 35.5 log points. In OLS the same
estimate is 16.5 log points. Note that the IV estimate is less precise, but it is statistically
significant at the 10% level. The lower precision is expected given the instrument needs to
predict whether a unit is within 2 km of a station.
39 The OLS and IV models exclude units within 2 km of the LCP nodes. The estimates are similar if they are
included, but we think it is appropriate to drop them as explained earlier.
Table 6: Railway stations and population growth: IV estimates
Dep. var.: unit pop. growth 1841 to 1891. Coefficients with (t-stat) in parentheses, as labeled in the source.

                                                       OLS        IV rail    IV rail    IV rail, road, water, and port
Variable                                               (1)        (2)        (3)        (4)
Indicator distance to 1851 railway station <2km        0.165***   0.355*     0.368*     0.376*
                                                       (0.019)    (0.193)    (0.191)    (0.196)
Indicator distance to 1830 inl. waterway <2km          0.048**    0.031      0.027      -0.012
                                                       (0.019)    (0.026)    (0.026)    (0.037)
Indicator distance to 1830 turnpike road <2km          0.034***   0.027**    0.027**    0.127**
                                                       (0.009)    (0.011)    (0.011)    (0.046)
Indicator distance to 1842 port <2km                   0.141**    0.139**    0.140**    0.271**
                                                       (0.068)    (0.067)    (0.065)    (0.138)
Kleibergen-Paap rk Wald F statistic                               33.12      33.64      8.24
Variables for pop. growth 1821 to 31 and 1831 to 41    No         No         Yes        No
Units within 2 km of LCP nodes removed?                Yes        Yes        Yes        Yes
N                                                      9118       9118       9118       9118
and second nature controls see table 1. The instrument for being within 2 km of rail is an indicator for being within 2 km of
the LCP. The instruments for being within 2 km of 1830 turnpike roads, inland waterways, and ports are the equivalent for
1680 main roads, waterways, and ports.
The increased size of the IV estimate is contrary to the argument that railways were
built in units with unobservable factors related to higher growth. One speculation is that
individuals anticipated the building of railways. In order to gain from increased property
values or employment, they moved to future railway units prior to 1841. In that case, one
might expect OLS to yield a downward-biased estimate for population growth from 1841 to 1891.
We test for this type of mechanism by including controls for population growth from 1821
to 31 and from 1831 to 41. They would capture the movement into units just before railway
stations opened. The IV estimates are fairly similar as shown in column (3) of table 6.
Therefore, it does not appear that OLS is downward biased because of anticipation effects.
There is another explanation. According to Kellet (2012), railway companies sometimes
selected routes that went through dilapidated residential areas. These presented less political
opposition and they tended to have a single landowner making right-of-way negotiations
easier. If this was generally true, then units within 2 km of stations probably had negative
growth potential in the absence of railways.
The final IV specification instruments for all infrastructure variables (column 4). The
Kleibergen-Paap F statistic is smaller in this case, so these results need to be interpreted
with caution. The estimated effect of being close to stations is very similar, suggesting
our estimate for railways is not affected by endogeneity of turnpike roads, waterways, and
ports. These results also show that units close to ports and turnpike roads grow significantly
more even in the IV model. The effect of inland waterways is close to zero, but the same
is true in the previous IV specifications (see columns 2 and 3 in table 6). Overall, the IV
results further confirm the importance of at least three infrastructures for nineteenth century
growth (ports, turnpikes, and railways).
8 Reorganization and Heterogeneous effects
Our analysis thus far does not account for spatial reorganization. Yet there is some
evidence it mattered. One of the leading historians argues "the railway did not necessarily
produce growth in population or business. It might take people or business away" (Simmons
1986, p. 16). Redding and Turner (2015) propose a method to identify reorganization effects.
They suggest defining a control group more distant from infrastructure and comparing it
with a set of 'treated' groups nearby. In our setting one might expect that units just beyond
the 1.5 or 2-hour commuting distance to infrastructures (6 or 8 km) might lose population
due to out-migration to closer units. To identify such an effect, we use units beyond 10 km
as the control group. This approach is not perfect because we do not know if units 10 km
away from infrastructures are truly unaffected. Nevertheless, this approach yields insights
on the relative growth effects of infrastructure at varying distances up to 10 km.
We estimate a model with five distance bins to stations, inland waterways, and ports: 0
to 2, 2 to 4, 4 to 6, 6 to 8, and 8 to 10 km. For turnpikes, only around 1% of units were more
than 10 km away, so we continue to use the simple indicator for being less than 2 km as the only
treatment. The results are reported in table 7. Units 0 to 2 km, 2 to 4 km, and 4 to 6
km from 1851 stations all have higher population growth relative to units more than 10 km
from stations. We also find that population growth is not significantly different from zero
in units between 6 and 10 km from stations.
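The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the estimates: the function name, the label format, and the treatment of distances that fall exactly on a bin edge are assumptions, since the text only gives the bin ranges and the 10 km control threshold.

```python
# Bin edges follow the text (0-2, 2-4, 4-6, 6-8, 8-10 km); units beyond
# 10 km form the omitted control group. Everything else is hypothetical.
BIN_EDGES_KM = [0, 2, 4, 6, 8, 10]


def distance_bin_indicators(distance_km):
    """Return a dict of 0/1 indicators, one per distance bin.

    All indicators are 0 for units more than 10 km away (control group).
    A distance exactly on an edge is assigned to the nearer bin; the paper
    does not state its boundary convention, so this is an assumption.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    indicators = {}
    for lower, upper in zip(BIN_EDGES_KM[:-1], BIN_EDGES_KM[1:]):
        label = f"{lower}_to_{upper}km"
        if lower == 0:
            in_bin = distance_km <= upper
        else:
            in_bin = lower < distance_km <= upper
        indicators[label] = int(in_bin)
    return indicators
```

Applied separately to the distance to stations, waterways, and ports, this would give the fifteen bin indicators that enter the regression alongside the single turnpike indicator.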
Table 7: Population growth at varying distances from infrastructures
variable                    coeff.        variable                coeff.        variable            coeff.
                            (std. err.)                           (std. err.)                       (std. err.)
rail station <2km           0.225***      waterway <2km           0.068***      Port <2km           0.168**
                            (0.027)                               (0.025)                           (0.068)
rail station >2km & <4km    0.1087***     waterway >2km & <4km    0.039*        Port >2km & <4km    0.167***
                            (0.023)                               (0.021)                           (0.045)
rail station >4km & <6km    0.0438**      waterway >4km & <6km    0.028         Port >4km & <6km    0.018
                            (0.021)                               (0.018)                           (0.026)
rail station >6km & <8km    0.071         waterway >6km & <8km    -0.007        Port >6km & <8km    0.015
                            (0.020)                               (-0.018)                          (0.028)
rail station >8km & <10km   0.0133        waterway >8km & <10km   -0.010        Port >8km & <10km   -0.001
                            (0.022)                               (0.016)                           (0.021)
turnpike <2km               0.030***
                            (0.010)
First nature controls       Yes
Second nature controls      Yes
Reg. District Fixed effects Yes
N                           9482
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district. For the list of first and second nature
controls see table 1.
The estimates for waterways and ports yield a similar conclusion. Units within 4 km
had higher growth compared to units more than 10 km from inland waterways and ports.
By contrast, units between 4 and 10 km of either did not grow significantly less than units
more than 10 km away. Also striking is that being within 4 km of ports increased growth by
approximately the same amount as being within 4 km of railway stations. However, ports
cannot explain as much of the variation. The beta coefficients for having railway stations
within 0 to 2 and 2 to 4 km are 0.151 and 0.078. The corresponding figures for having ports
within 0 to 2 and 2 to 4 km are 0.053 and 0.067.
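To see why similar coefficients can go with quite different beta coefficients, it helps to write out the textbook definition: the regression coefficient multiplied by the standard deviation of the regressor and divided by the standard deviation of the outcome. The text does not state how its beta coefficients were computed, so the sketch below assumes this standard definition; the function names are hypothetical.

```python
import math
import statistics


def beta_coefficient(coefficient, x_values, y_values):
    """Standardized (beta) coefficient under the textbook definition.

    beta = b * sd(x) / sd(y), using population standard deviations.
    Assumed definition; the paper does not spell out its formula.
    """
    return coefficient * statistics.pstdev(x_values) / statistics.pstdev(y_values)


def indicator_sd(share_treated):
    """Standard deviation of a 0/1 indicator with the given share of ones."""
    return math.sqrt(share_treated * (1.0 - share_treated))
```

Because the standard deviation of an indicator shrinks as the share of treated units moves away from one half, an infrastructure that few units are close to will have a smaller beta coefficient than one with the same regression coefficient that many units are close to. This is presumably part of why ports explain less of the variation than stations.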
The main takeaway from table 7 is that units just outside the commuting zone of stations
or other infrastructures did not grow less than units farther from the commuting zone. This
finding does not imply that spatial reorganization was absent. Railways might have pulled
in population equally from units between 6 and 10 km as they did from units between 10
and 15 km or farther. In other words, these results do not rule out higher net migration as
the key reason population grew more near railway stations.
The significance of migration over fertility and mortality effects is supported by a more
aggregated data analysis. We observe the fertility rate (number of children per woman)
and the infant mortality rate (number of children born per 1000 that died before their first
birthday)⁴⁰ at the sub-registration district level at each decennial census from 1851.
Sub-districts are 70 square km on average and equal about 4 or 5 of our units. There is also
data at the sub-district level on the percentage of the population that is Irish born and the
number of working age men per 100 working age women. The percentage born in Ireland
is a good indicator of in-migration. The sex ratio is more subtle. A decline in the ratio of
working age males to females is thought to have been caused by greater in-migration of young
women to work as servants. Of course, this assumes women are more mobile than men, which is
not true in all cases.
To make use of this data we need to match sub-districts across time. Unfortunately, the
sub-districts are not always spatially consistent. We matched sub-districts in 1851 and 1861
based on name and total land area. At this step, we lose about 8% of sub-districts due to
boundary and name changes. We also link our earlier units to sub-districts to identify which
had at least one railway station in 1851. This second step reduces our sample to about 75%
of all sub-districts because of inconsistencies in names.
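The first matching step can be illustrated with a short Python sketch. It is only an illustration of matching on name and land area: the record layout, the name normalization, and the 5% area tolerance are all assumptions, as the text does not report the exact matching rule.

```python
def normalize_name(name):
    """Lower-case a name and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def match_subdistricts(records_1851, records_1861, area_tolerance=0.05):
    """Match sub-districts across censuses on name and total land area.

    Each record is a dict with hypothetical keys "name" and "area_km2".
    A pair is kept only if the names agree after normalization and the land
    areas differ by at most `area_tolerance` (relative to the 1851 area).
    The tolerance value is an assumption made for illustration.
    """
    by_name = {}
    for rec in records_1861:
        by_name.setdefault(normalize_name(rec["name"]), []).append(rec)

    matched, unmatched = [], []
    for rec in records_1851:
        candidates = by_name.get(normalize_name(rec["name"]), [])
        close = [
            c for c in candidates
            if abs(c["area_km2"] - rec["area_km2"]) <= area_tolerance * rec["area_km2"]
        ]
        if len(close) == 1:
            matched.append((rec, close[0]))
        else:
            # Name change, boundary change, or ambiguous match: drop the unit.
            unmatched.append(rec)
    return matched, unmatched
```

Sub-districts that end up in the unmatched list correspond to the losses from boundary and name changes described above.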
Table 8 reports specifications that regress the change in demographic or migration
variables from 1851 to 1861 on an indicator for having at least one station in 1851.⁴¹ Panel A
reports specifications for the change in fertility. Column (1) includes only the station
variable. It shows the fertility rate decreased more in sub-districts with stations. On average
fertility rates change by -0.014 and therefore the coefficient -0.048 is fairly large. Column
(2) adds a quadratic in sub-district latitude and longitude. The coefficient on stations is
similar. Column (3) adds county fixed effects. Now the coefficient decreases in size and is
no longer significant. If anything, these results go against population growth near stations
being caused by higher fertility.
Panels B, C, and D analyze changes in infant mortality, the % Irish born, and the male
to female ratio respectively using the same specifications. Railways do not have a significant
effect on infant mortality in any specification. Changes in the % Irish born are positively
associated with stations in the first two specifications without county fixed effects. The
average change in % Irish born is 0.102 percentage points, indicating a fairly large effect
40 This data comes from Populations Past. https://www.populationspast.org/imr/1861/#7/53.035/-2.895.
This data has been produced by the 'Atlas of Victorian Fertility Decline' project (PI: A.M. Reid) with
funding from the ESRC (ES/L015463/1), using an enhanced version of data from Schurer, K. and Higgs, E.
(2014). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. Colchester, Essex: UK Data
Archive [distributor]. SN: 7481, http://dx.doi.org/10.5255/UKDA-SN-7481-1. Dataset last updated: 24th
May 2018.
41 We also use the data to run a regression of sub-district population growth from 1851 to 1861 on an
indicator for having at least one station in 1851. The results are reported in table 13 in appendix A.3.
They confirm our earlier conclusion that being near railway stations increased population growth.
from stations. Changes in the male to female ratio are negatively associated with railway
stations in the first two specifications, but again the estimate is not significant with county
fixed effects. Overall, these results support the argument that railways grew population
through in-migration, with the caveat that the estimates are not always precise.
Table 8: Stations, demography, and migration: estimates for sub-districts
                               Panel A: ∆ fertility rate               Panel B: ∆ inf. mortality rate
                               (1)          (2)          (3)          (4)          (5)          (6)
variable                       (t-stat)     (t-stat)     (t-stat)     (t-stat)     (t-stat)     (t-stat)
Indicator for station in 1851  -0.0487**    -0.0662***   -0.0284      0.0348       -0.0296      -0.237
                               (0.0232)     (0.0226)     (0.0219)     (0.282)      (0.290)      (0.402)
Quadratic in lat. and long.    No           Yes          Yes          No           Yes          Yes
County fixed effects           No           No           Yes          No           No           Yes
N                              1,568        1,568        1,568        1,360        1,360        1,360
                               Panel C: ∆ % Irish born                 Panel D: ∆ male to female ratio
                               (7)          (8)          (9)          (10)         (11)         (12)
variable                       (t-stat)     (t-stat)     (t-stat)     (t-stat)     (t-stat)     (t-stat)
Indicator for station in 1851  0.141**      0.127***     0.0540       -1.130**     -1.248**     -1.045
                               (0.0533)     (0.0445)     (0.0494)     (0.519)      (0.486)      (0.681)
Quadratic in lat. and long.    No           Yes          Yes          No           Yes          Yes
County fixed effects           No           No           Yes          No           No           Yes
N                              1,591        1,591        1,591        1,340        1,340        1,340
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on county in all specifications.
Assuming that migration was the main factor, one may ask: did railways draw
population into more or less dense units? If the answer is more dense, then railways likely
raised the productivity of migrants due to agglomeration effects.⁴² We test for heterogeneous
effects by 1841 density using the following specification.
\[
y_{i1891} - y_{i1841} = \beta_0 I^{rail}_{i1851} + \beta_1 I^{rail}_{i1851}\,\mathit{lnpop41}_i
+ \beta_2 I^{rail}_{i1851}\,(\mathit{lnpop41}_i)^2 + \gamma x_i + \varepsilon_i \tag{3}
\]
where the natural log of 1841 population density and its square are interacted with the
indicator for being within 2 km of an 1851 station. The quadratic formulation is flexible
and allows for non-linear effects. Note that 1841 population density and its square are
included as controls in $x_i$, along with district FEs and first and second nature controls.
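In this specification the growth effect of being near a station is not a single number but a function of initial density: it equals β0 + β1·lnpop41 + β2·(lnpop41)². The short Python sketch below evaluates that function and locates its turning point. The function names are hypothetical, and any coefficient values passed in are placeholders, since the estimated coefficients are not reported in this section.

```python
import math


def station_effect(beta0, beta1, beta2, density_1841):
    """Implied effect of being within 2 km of a station at a given density.

    Evaluates beta0 + beta1 * ln(d) + beta2 * ln(d)**2, the part of the
    specification that multiplies the station indicator.
    """
    ln_d = math.log(density_1841)
    return beta0 + beta1 * ln_d + beta2 * ln_d ** 2


def peak_log_density(beta1, beta2):
    """Log density at which the station effect is largest.

    Only defined for an inverted-U shape (beta2 < 0); the maximum of the
    quadratic is at -beta1 / (2 * beta2).
    """
    if beta2 >= 0:
        raise ValueError("the effect has an interior maximum only if beta2 < 0")
    return -beta1 / (2.0 * beta2)
```

An inverted-U shape (β2 < 0) would be consistent with an effect that peaks at medium to high densities rather than at the very top of the distribution.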
42 See Fujita et al. (2001) and Desmet and Rossi-Hansberg (2014) for agglomeration models.
Figure 4: Heterogeneity with initial population density
Sources: see text.
The estimates show that being close to railway stations had a significantly larger growth
effect for units with medium to large population density. To illustrate, we plot our predicted
population growth for units between the 5th and 95th percentiles in 1841 population density.
One prediction is for units less than 2 km from 1851 stations and the other is for units more
than 2 km from stations (see figure 4). Railways have their largest effect for population
densities between the 75th and 90th percentiles. The increase in population was around 25
percentage points for these units. At the 50th percentile railways increased population by
around 16 pp. To put these figures into perspective, a unit at the 85th percentile of 1841
population density was 183% more populous than a unit at the 50th percentile. How did this
density matter? Crafts and Leunig estimate that doubling a town's population increased its
wages by 11% in 1868.⁴³ Using this figure, if railways reallocated population from the 50th
to the 85th percentile then it would raise their wages by 20%, a significant change.
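The text does not state how the 20% figure is obtained. One reading that reproduces it, offered here as an assumption, scales the 11% wage gain per doubling linearly with the 183% difference in population:

```latex
\[
\Delta w \approx 0.11 \times 1.83 \approx 0.20
\]
```

Under a constant-elasticity reading instead, a 183% increase corresponds to about $\log_2(2.83) \approx 1.5$ doublings, giving $1.11^{1.5} - 1 \approx 0.17$, so the order of magnitude is similar either way.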
We use a similar methodology to test for an interaction effect between proximity to inland
waterways and 1841 population density. The predictions are summarized in the right-hand
panel of figure 4. They also show bigger effects on units around the 75th percentile. It
appears that inland waterways also had productivity enhancing effects on migrants.
9 Persistence results
We have shown that being within a short commuting or shipping distance of infrastructures
c.1840 affected population growth in the nineteenth century. Now we want to know if
they affected population growth in the twentieth century and up to the present. This issue
43 See Crafts and Leunig, 'Transport improvements'.
is related to the impact of adopting railways at an early stage. Previous studies show that
some infrastructures, like railways, have significant persistent effects.⁴⁴ If turnpike roads,
waterways, and ports also have significant persistent effects then this would cast further
doubt on railways being indispensable for all urbanisation.
Persistence is tested using our historical units merged with Lower Super Output Areas
(LSOAs) in 2011. We estimate the following 'very' long differences specification
\[
y_{i2011} - y_{i1891} = \beta_1 I^{rail}_{i1851} + \beta_2 I^{pre\text{-}rail}_{i1840} + \gamma x_i + \varepsilon_i \tag{4}
\]
where the dependent variable $y_{i2011} - y_{i1891}$ measures the log difference in population from 1891 to
2011. The variables $I^{rail}_{i1851}$ and $I^{pre\text{-}rail}_{i1840}$ are indicators for being within 4 km of
mid-nineteenth century infrastructures.
The results are reported in table 9. Column (1) includes 1841 population density as a
control along with all the others in table 1. Column (2) shows a similar specification but
replaces 1841 with 1891 population density as a control. The results are similar. In (1)
being within 4 km of 1851 railway stations increases population growth by 29.4 log points
from 1891 to 2011, or 0.21% higher annual growth. The coefficients also reveal that being
within 4 km of turnpike roads and inland waterways increased annual growth by 0.10%. For
ports, the increase in annual growth was 0.18%. The beta coefficients show that railways
explain more of the variation in population growth (0.14 for railways compared to 0.058,
0.045, and 0.066 for waterways, turnpikes, and ports). Perhaps more striking is how much
growth is explained by the pre-rail infrastructures. While we cannot trace out exactly why
pre-rail infrastructures matter so much up to the present, it is likely their uses evolved with
the modern era. For example, inland waterways are now seen as amenities in many areas.
44 See Bleakley and Lin (2012), Redding, Sturm, and Wolf (2011), Garcia-López et al. (2015), Jedwab
and Moradi (2016).
Table 9: Infrastructure access and population growth over the very long run
Dep. var.: unit pop. growth 1891 to 2011              (1)           (2)
                                                      OLS           OLS
                                                      coeff.        coeff.
variable                                              (std. err.)   (std. err.)
Indicator distance to 1851 railway station <4km       0.294***      0.338***
                                                      (0.031)       (0.031)
Indicator distance to 1830 inland waterway <4km       0.115***      0.129***
                                                      (0.032)       (0.031)
Indicator distance to 1830 turnpike road <4km         0.124***      0.121***
                                                      (0.031)       (0.031)
Indicator distance to 1842 port <4km                  0.245***      0.293***
                                                      (0.062)       (0.067)
Control for 1841 pop. density                         Yes           No
Control for 1891 pop. density                         No            Yes
First nature controls                                 Yes           Yes
Second nature controls                                Yes           Yes
Reg. district Fixed effects                           Yes           Yes
N                                                     9481          9481
10 Conclusion
This paper examines whether railways' effect on urbanisation was substantially larger
than that of other infrastructures, like turnpike roads, canals, and ports. The paper has several
main points. First, England had a well-developed transport network before railways. It had
many good roads, a huge inland waterway network based on improved rivers and canals,
and numerous ports along its coastline. These early transport improvements helped develop
the English economy before the 1830s when railways started to spread. Many of these
older transport modes went into decline in the mid-nineteenth century when faced with
competition from railways. However, some of these modes had a resurgence as their uses
changed. As our estimates show, population growth from 1841 to 1891 was higher near
railway stations, but it was also high near turnpike roads, waterways, and especially ports.
This case reminds us that the impacts of a single infrastructural improvement, however
large, may not be the only important factor affecting urbanisation.
Second, this paper shows that the location of transport infrastructures can help explain
the spatial patterns of economic growth during the later phases of the industrial revolution.
Previous studies have not been able to identify the effects of all infrastructures, and not at a
disaggregated level. Using our preferred OLS estimates, a counterfactual calculation implies
that if no units were within 2 km of railways then aggregate population growth would have
been 20% lower between 1841 and 1891. A different counterfactual implies that if no units
were within 2 km of railways, waterways, or turnpike roads then aggregate growth would
have been 35% lower.
Third, we provide evidence that railways mainly grew population by attracting migrants.
Higher fertility near stations, another candidate mechanism, is rejected by the data. How
then did railways grow the economy beyond providing a new transport mode with better
attributes? Railways increased productivity by moving the population from low to high
density areas, where agglomeration economies were present.
Fourth, we show that population growth in England and Wales between 1891 and 2011
was influenced by infrastructures in the mid-nineteenth century. This suggests the policy
decisions made today regarding transport infrastructure will have effects on urbanisation for
decades, perhaps even centuries.
References
1. Aldcroft, Derek Howard, and H. J. Dyos. British transport: an economic survey from the
seventeenth century to the twentieth. Penguin Books, 1974.
2. Alvarez, Eduard, Xavi Franch, and Jordi Martí-Henneberg. "Evolution of the territorial
coverage of the railway network and its influence on population growth: The case of England
and Wales, 1871-1931." Historical Methods: A Journal of Quantitative and Interdisciplinary
History 46.3 (2013): 175-191.
3. Alvarez, E., Dunn, O., Bogart, D., Satchell, M., Shaw-Taylor, L., 'Ports of England and
Wales, 1680-1911', 2017.
4. Armstrong, John. The Vital Spark: The British Coastal Trade, 1700-1930. International
Maritime Economic History Association, 2009.
5. Atack, Jeremy, Fred Bateman, Michael Haines, and Robert A. Margo. "Did railroads induce
or follow economic growth?." Social Science History 34, no. 2 (2010): 171-197.
6. Atack, Jeremy, and Robert A. Margo. "The Impact of Access to Rail Transportation on
Agricultural Improvement: The American Midwest as a Test Case, 1850-1860." Journal of
Transport and Land Use 4.2 (2011).
7. Avery, Brian William. Soil classification for England and Wales: higher categories. No.
631.44 A87. 1980.
8. Baines, Dudley. Migration in a mature economy: emigration and internal migration in
England and Wales 1861-1900. Vol. 3. Cambridge University Press, 2002.
9. Baum-Snow, N., Brandt, L., Henderson, J. V., Turner, M. A., & Zhang, Q. (2017). Roads,
railroads, and decentralization of Chinese cities. Review of Economics and Statistics, 99(3),
435-448.
10. Becker, Sascha O., Erik Hornung, and Ludger Woessmann. "Education and catch-up in the
industrial revolution." American Economic Journal: Macroeconomics (2011): 92-126.
11. Berger, Thor, and Kerstin Enflo. "Locomotives of local growth: The short- and long-term
impact of railroads in Sweden." Journal of Urban Economics (2015).
12. Bleakley, Hoyt, and Jeffrey Lin. "Portage and path dependence." The quarterly journal of
economics 127.2 (2012): 587-644.
13. Bogart, Dan. "Turnpike trusts and the transportation revolution in 18th century England."
Explorations in Economic History 42.4 (2005): 479-508.
14. Bogart, Dan. "Turnpike trusts and property income: new evidence on the effects of transport
improvements and legislation in eighteenth-century England." The Economic History
Review 62.1 (2009): 128-152.
15. Bogart, Dan. The Transport Revolution in Industrializing Britain, in Floud, Roderick, Jane
Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain:
Volume 1, Industrialisation, 1700-1870. Cambridge University Press, 2014.
16. Bogart, Dan. "Party Connections, Interest Groups and The Slow Diffusion of Infrastructure:
Evidence From Britain's First Transport Revolution." The Economic Journal 128.609 (2017):
541-575.
17. Bogart, Dan. `The Turnpike Roads of England and Wales', in The Online Historical Atlas
of Transport, Urbanization and Economic Development in England and Wales c.1680-1911.
Eds. L. Shaw-Taylor, D. Bogart and A.E.M. Satchell, 2017.
18. Bogart, Dan, Michael Lefors, and A. E. M. Satchell. "Canal carriers and creative destruction
in English transport." Explorations in Economic History (forthcoming).
19. Boughey, J., & Hadfield, C. (2012). British Canals: The Standard History. The History
Press.
20. Campbell, Gareth, and John D. Turner. "Dispelling the Myth of the Naive Investor during
the British Railway Mania, 1845-1846." Business History Review 86.01 (2012): 3-41.
21. Campbell, Gareth, and John D. Turner. "Managerial failure in mid-Victorian Britain?:
Corporate expansion during a promotion boom." Business History 57.8 (2015): 1248-1276.
22. Casson, Mark. The world's first railway system: enterprise, competition, and regulation on
the railway network in Victorian Britain. Oxford University Press, 2009.
23. Casson, Mark. "The determinants of local population growth: A study of Oxfordshire in the
nineteenth century." Explorations in Economic History 50.1 (2013): 28-45.
24. Chandra, Amitabh, and Eric Thompson. "Does public infrastructure affect economic
activity?: Evidence from the rural interstate highway system." Regional Science and Urban
Economics 30.4 (2000): 457-490.
25. Clayden, Benjamin, and John Marcus Hollis. Criteria for differentiating soil series. No. Tech
Monograph 17. 1985.
26. Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest and Clifford Stein: Introduction
to Algorithms, Cambridge, MA, MIT Press (3rd ed., 2009) pp.695-6.
27. Crafts, Nicholas, and Abay Mulatu. "How did the location of industry respond to falling
transport costs in Britain before World War I?." The Journal of Economic History 66.03
(2006): 575-607.
28. Crafts, Nicholas, and Nikolaus Wolf. "The location of the UK cotton textiles industry in
1838: A quantitative analysis." The Journal of Economic History 74.04 (2014): 1103-1139.
29. Crafts, Nicholas, and Tim Leunig. 'Transport improvements, agglomeration economies and
city productivity: did commuter trains raise nineteenth century British wages?', working
paper.
30. Desmet, Klaus, and Esteban Rossi-Hansberg. "Spatial development." The American Eco-
nomic Review 104.4 (2014): 1211-1243.
31. Donaldson, Dave. Railroads of the Raj: Estimating the impact of transportation infrastruc-
ture. No. w16487. National Bureau of Economic Research, 2010.
32. Donaldson, Dave, and Richard Hornbeck. "Railroads and American economic growth: A
market access approach." The Quarterly Journal of Economics 131.2 (2016): 799-858.
33. Duranton, Gilles, and Matthew A. Turner. "Urban growth and transportation." The Review
of Economic Studies 79.4 (2012): 1407-1440.
34. Faber, Benjamin. "Trade integration, market size, and industrialization: evidence from
China's National Trunk Highway System." Review of Economic Studies 81.3 (2014): 1046-
1070.
35. Fernihough, Alan, and Kevin Hjortshøj O'Rourke. Coal and the European industrial revolu-
tion. No. w19802. National Bureau of Economic Research, 2014.
36. Fishlow, Albert. American Railroads and the Transformation of the Ante-bellum Economy.
Vol. 127. Cambridge, MA: Harvard University Press, 1965.
37. Fogel, R. "Railroads and American Economic Growth." Baltimore: Johns Hopkins Press (1964).
38. Freeman, Michael J., and Derek H. Aldcroft, eds. Transport in Victorian Britain. Manchester
University Press, 1991.
39. Fujita, Masahisa, Paul R. Krugman, and Anthony Venables. The spatial economy: Cities,
regions, and international trade. MIT press, 2001.
40. Garcia-López, Miquel-Àngel, Adelheid Holl, and Elisabet Viladecans-Marsal. "Suburbanization
and highways in Spain when the Romans and the Bourbons still shape its cities." Journal
of Urban Economics 85 (2015): 52-67.
41. Gourvish, Terence Richard. Railways and the British economy, 1830-1914. Macmillan Inter-
national Higher Education, 1980.
42. Gregory, Ian N., and Jordi Martí Henneberg. "The railways, urbanization, and local
demography in England and Wales, 1825-1911." Social Science History 34.2 (2010): 199-228.
43. Hawke, Gary Richard. Railways and economic growth in England and Wales, 1840-1870.
Clarendon Press, 1970.
44. Heblich, Stephan, Stephen J. Redding, and Daniel M. Sturm. The Making of the Modern
Metropolis: Evidence from London. No. w25047. National Bureau of Economic Research,
2018.
45. Hodgson, Charles. "The effect of transport infrastructure on the location of economic activity:
Railroads and post offices in the American West." Journal of Urban Economics 104 (2018):
59-76.
46. Hornung, Erik. "Railroads and growth in Prussia." Journal of the European Economic As-
sociation 13.4 (2015): 699-736.
47. Jedwab, Remi, Edward Kerby, and Alexander Moradi. "History, path dependence and de-
velopment: Evidence from colonial railroads, settlers and cities in Kenya." The Economic
Journal (2015).
48. Jedwab, Remi, and Alexander Moradi. "The permanent effects of transportation revolutions
in poor countries: evidence from Africa." Review of economics and statistics 98.2 (2016):
268-284.
49. Jarvis A., H.I. Reuter, A. Nelson, E. Guevara (2008). Hole-filled seamless SRTM data V4,
International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.
50. Jaworski, Taylor, and Carl T. Kitchens. "National Policy for Regional Development: Histor-
ical Evidence from Appalachian Highways." (2017).
51. Kellett, John R. The impact of railways on Victorian cities. Routledge, 2012.
52. Klein, Alexander, and Nicholas Crafts. "Making sense of the manufacturing belt:
determinants of US industrial location, 1880-1920." Journal of Economic Geography 12.4 (2012):
775-807.
53. Langton, John, and Robert John Morris. Atlas of industrializing Britain, 1780-1914. Rout-
ledge, 2002.
54. Law, Christopher M. "The growth of urban population in England and Wales, 1801-1911."
Transactions of the Institute of British Geographers (1967): 125-143.
55. Leunig, Timothy. "Time is money: a re-assessment of the passenger social savings from
Victorian British railways." The Journal of Economic History 66.3 (2006): 635-673.
56. Lipscomb, Molly, Mushfiq A. Mobarak, and Tania Barham. "Development effects of
electrification: Evidence from the topographic placement of hydropower plants in Brazil." American
Economic Journal: Applied Economics 5.2 (2013): 200-231.
57. Long, Jason. "Rural-urban migration and socioeconomic mobility in Victorian Britain." The
Journal of Economic History 65.1 (2005): 1-35.
58. Martí-Henneberg, J., Satchell, M., You, X., Shaw-Taylor, L., Wrigley E.A., 'England Wales
and Scotland rail lines shapefile' (2017a).
59. Martí-Henneberg, J., Satchell, M., You, X., Shaw-Taylor, L., Wrigley E.A., 'England, Wales
and Scotland railway stations 1807-1994 shapefile' (2017b).
60. Michaels, Guy. "The effect of trade on the demand for skill: Evidence from the interstate
highway system." The Review of Economics and Statistics 90.4 (2008): 683-701.
61. Odlyzko, Andrew. "Collective hallucinations and inefficient markets: The British Railway
Mania of the 1840s." University of Minnesota (2010).
62. O'Rourke, Kevin H. "The European grain invasion, 1870-1913." The Journal of Economic
History 57.4 (1997): 775-801.
63. Pascali, Luigi. "The wind of change: Maritime technology, trade, and economic develop-
ment." American Economic Review 107.9 (2017): 2821-54.
64. Pascual Domènech, P. (1999). Los caminos de la era industrial: la construcción y financiación
de la red ferroviaria catalana, 1843-1898 (Vol. 1). Edicions Universitat Barcelona.
65. Pawson, Eric. Transport and economy: the turnpike roads of eighteenth century Britain.
Academic Press, 1977.
66. Pope, Alexander, and D. Swann. "The pace and progress of port investment in England
1660-1830." Bulletin of Economic Research 12.1 (1960): 32-44.
67. Poveda, G. (2003). El antiguo ferrocarril de Caldas. Dyna, 70 (139), pp. 1-10.
68. Purcar, Cristina. "Designing the space of transportation: railway planning theory in nine-
teenth and early twentieth century treatises." Planning Perspectives 22.3 (2007): 325-352.
69. Redding, Stephen J., Daniel M. Sturm, and Nikolaus Wolf. "History and industry location:
evidence from German airports." Review of Economics and Statistics 93.3 (2011): 814-831.
70. Redding, Stephen J., and Matthew A. Turner. "Transportation costs and the spatial organi-
zation of economic activity." Handbook of regional and urban economics. Vol. 5. Elsevier,
2015. 1339-1398.
71. Redford, Arthur. Labour migration in England, 1800-1850. Manchester University Press,
1976.
72. Riley, S. J., S. D. Gloria, and R. Elliot (1999). A terrain Ruggedness Index that quantifies
Topographic Heterogeneity, Intermountain Journal of Sciences, 5(2-4), 23-27.
73. Robson, Brian T. Urban growth: an approach. Vol. 9. Routledge, 2006.
74. Rosevear, A., Satchell, M., Bogart, D., Shaw Taylor, L., Aidt, T. and Leon, G., 'Turnpike
roads of England and Wales, 1667-1892', 2017.
75. Satchell, M. 'Identifying the Trunk Roads of Early Modern England and Wales,' 2017.
76. Satchell, M. and Shaw-Taylor, L., 'Exposed coalfields of England and Wales' 2013.
77. Satchell, M., Shaw-Taylor, L., Wrigley E.A., '1680 England and Wales navigable waterways
shapefile', 2017a.
78. Satchell, M., Shaw-Taylor, L., Wrigley E.A., '1830 England and Wales navigable waterways
shapefile', 2017b.
79. Schurer, K., Higgs, E. (2014). Integrated Census Microdata (I-CeM), 1851-1911. [data
collection]. UK Data Service. SN: 7481, http://doi.org/10.5255/UKDA-SN-7481-1.
80. Shaw-Taylor, L. and Wrigley, E. A. Occupational Structure and Population Change, in
Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History
of Modern Britain: Volume 1, Industrialisation, 1700-1870. Cambridge University Press,
2014.
81. Simmons, Jack. The railway in town and country, 1830-1914. (1986).
82. Small, Kenneth. Urban transportation economics. Taylor & Francis, 2013.
83. Storeygard, Adam. "Farther on down the road: transport costs, trade and urban growth in
sub-Saharan Africa." The Review of Economic Studies 83.3 (2016): 1263-1295.
84. Tang, John P. "Railroad expansion and industrialization: evidence from Meiji Japan." The
Journal of Economic History 74.03 (2014): 863-886.
85. Tang, John P. "The Engine and the Reaper: Industrialization and mortality in late nineteenth
century Japan." Journal of health economics 56 (2017): 145-162.
86. Turnbull, Gerard. "Canals, coal and regional growth during the industrial revolution." The
Economic History Review 40.4 (1987): 537-560.
87. Wellington, A.M. The Economic Theory of the Location of Railways: An Analysis of the
Conditions Controlling the Laying Out of Railways to Effect the Most Judicious Expenditure
of Capital. Ed. J. Wiley & sons, 1877.
88. Willan, Thomas Stuart. River navigation in England, 1600-1750. Psychology Press, 1964.
89. Wrigley, Edward Anthony. Energy and the English industrial revolution. Cambridge Uni-
versity Press, 2010.
90. Wrigley, E. A. The PST system of classifying occupations, Working paper 2015.
91. You, Xuesheng. Women's employment in England and Wales, 1851-1911, University of Cam-
bridge, unpublished phd dissertation, 2014.
92. U.S. Census Bureau, 2012-2016 American Community Survey 5-year estimates.
A Appendices:
A.1 The least cost path instrument
In this appendix, we describe the instrument for distance to railway stations. The first
step is to select the nodes of the hypothetical network and then decide which nodes become
origins and destinations connected by the least cost path (LCP). The candidate nodes are all
the towns with a population over 5,000 inhabitants in 1801. These were the major population
centers. Each pair of such towns is a potential origin and destination for a railway line.
A gravity model selects the origins and destinations that will be connected, based on an
approximation of the value of trade between the potential origin and destination. We assume
the value of connecting an origin and destination pair is given by
$$GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}},$$
where $GM_{ij}$ is the gravitational potential between towns $i$ and $j$, $Pop_i$ is the 1801
population of town $i$, and $Dist_{ij}$ is the straight-line distance between $i$ and $j$. We
chose the town pair $i$ and $j$ as an origin and destination in our LCP if $GM_{ij} > 10,000$.
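The selection rule can be illustrated with a minimal sketch. The town names, populations, coordinates and distance units below are hypothetical, as are the function names; the text does not state the distance unit behind the 10,000 threshold, so the numbers only show the mechanics of the rule.

```python
from itertools import combinations
from math import hypot


def gravity_potential(pop_i, pop_j, dist_ij):
    """GM_ij = Pop_i * Pop_j / Dist_ij, as defined in the text."""
    return pop_i * pop_j / dist_ij


def select_pairs(towns, min_pop=5000, threshold=10_000):
    """Return town pairs whose gravitational potential exceeds the threshold.

    towns maps a name to (population, x, y); the coordinate units are an
    assumption of this sketch, not something the paper specifies.
    """
    # Candidate nodes: towns with a population over the cutoff.
    candidates = {name: v for name, v in towns.items() if v[0] > min_pop}
    selected = []
    for a, b in combinations(sorted(candidates), 2):
        pop_a, xa, ya = candidates[a]
        pop_b, xb, yb = candidates[b]
        dist = hypot(xa - xb, ya - yb)  # straight-line distance
        if gravity_potential(pop_a, pop_b, dist) > threshold:
            selected.append((a, b))
    return selected


# Hypothetical towns, for illustration only.
towns = {
    "A": (6000, 0.0, 0.0),
    "B": (8000, 3000.0, 4000.0),
    "C": (4000, 1.0, 1.0),      # below the population cutoff, never a node
    "D": (5500, 0.0, 4000.0),
}
pairs = select_pairs(towns)
```

In this toy example only the pair B and D passes the threshold, because A is too far from B and too small relative to its distance from D.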
The second step is to identify the LCP connecting our nodes. The main criterion used
to plan linear projects is usually the minimization of earth-moving works. Assuming that
the track structure (composed of rails, sleepers and ballast) is the same along the entire
length, the main cost differences arise in the track foundation. Terrain with steeper slopes
requires more earth-moving and, in consequence, construction costs are higher (Pascual 1999,
Poveda 2003, Purcar 2007). The traction power of the locomotives and the potential adherence
between wheels and rails could be the main reason. It is also important to highlight that
slopes over 2% might require building tunnels, cut-and-cover tunnels or even viaducts. The
perpendicular slope was also crucial. During the construction of a track section, excavation
and filling have to be balanced in order to minimize provisions, waste and the transportation
of earth. Nowadays bulldozers and trailers are used, but historically workers did this
manually, which implied a direct link between construction costs, wages and the availability
of skilled laborers. In fact, it is commonly accepted in the literature that early railways
were highly restricted by several factors, among them the quality of the soil, the need to
construct tunnels and bridges, and interference with pre-existing structures (buildings and
land dispossession). Longitudinal and perpendicular slope were the most significant, and we
focus on these below.
Slopes are determined using elevation data. We analyzed several DEM rasters in preliminary
tests and finally chose the Shuttle Radar Topography Mission (SRTM) data at a resolution of
90 meters (3 arc-seconds). Although SRTM is a modern raster data set, created in 2000 from a
radar system on board the Space Shuttle, results based on it should not differ much from the
historical reality. The LCP tool calculates the route between an origin and a destination
that minimizes the accumulated elevation difference (or cost in our case). Our method is
based on the ESRI Least-Cost-Path algorithm, with additional tasks implemented to optimize
the results and to offer different scenarios. The input data was the SRTM elevation raster,
converted into slope. This conversion was necessary in order to input different construction
costs.
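As a rough illustration of the elevation-to-slope conversion, the sketch below computes percent slope from a small elevation grid with NumPy finite differences. It is an assumption-laden stand-in: the GIS slope tool may use a different estimator, and the function name and the 90 m default cell size are illustrative choices.

```python
import numpy as np


def percent_slope(elevation, cell_size=90.0):
    """Percent slope of each cell from finite differences of elevation.

    elevation: 2-D array of heights in meters; cell_size: grid spacing in meters.
    """
    # Rate of change of elevation along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(np.asarray(elevation, dtype=float), cell_size)
    # Rise over run, expressed in percent.
    return 100.0 * np.hypot(dz_dx, dz_dy)


# A plane rising 0.9 m per 90 m cell along x, i.e. a uniform 1% slope.
dem = np.tile(np.arange(5) * 0.9, (4, 1))
slope = percent_slope(dem)
```

On the inclined plane above every cell has a slope of about 1%, which is the quantity that the cost reclassification later takes as input.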
The third step is to specify the relationship between construction costs and slope. One
approach is to use the historical engineering literature. Wellington (1877) discusses elevation
slope (i.e. gradients), distance, and operational costs of railways, but this is not ideal as we
are interested in construction costs. We could not find an engineering text that specified
the relationship between construction costs and slopes. As an alternative we use historical
construction cost data. The following details our data and procedure.
A select committee on railways in 1844 published a table on the construction costs of 54
railways (see footnote 45). There were 45 with a clear origin and destination, for which we
can measure total elevation change along the route (details are available). For these 45
railways we calculate the distance of the railway line in meters and the total elevation
change (all meters of ascent and descent). We then ran the following regression for railway i:
$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \varepsilon_i \qquad (5)$$
where construction costs are measured in pounds sterling (£). This regression produces
unsatisfactory results, with total elevation change having a negative sign. We think the main
reason is that the sample includes railways with London as an origin or destination. Land
values in London were much higher than elsewhere and thus construction costs were higher
there. Therefore, we omit railways with a London connection. We also think it is important to
account for railways in mining areas, as they were typically built to serve freight traffic
rather than a mix of freight and passenger traffic.
Our extended model uses construction costs for 36 non-London railways and has the
following specification:
$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \mu \, MiningRailway_i + \varepsilon_i \qquad (6)$$
45 See the Fifth report from the Select Committee on Railways; together with the minutes of evidence,
appendix and index (BPP 1844 XI). The specific section with the data is appendix number 2, report to the
lords of the committee of the privy council for trade on the statistics of British and Foreign railways, pp.
4-5.
The results imply that for every 100 meters of distance construction costs rise by £128.9
(st. err. 45.27) and, holding distance constant, construction costs rise by £382.6 (st. err.
274.5) for every 1 meter increase in total elevation change. Construction costs for mining
railways are £340,418 less (st. err. 179,815). For our LCP model we assume a non-mining
railway, re-scale the figures into construction costs per 100 meters, and normalize so that
costs per 100 meters are 1 at zero elevation change. The formula becomes
$$NormalizedCostPer100Meters = 1 + 2.96 \times \frac{ElevationChangeMeters}{Distance100Meters}.$$
The elevation change divided by distance can be considered as the slope in percent, in which
case our formula becomes $Cost = 1 + 2.96 \times \%slope$. We think this is a reasonable
approximation of the relationship between construction costs, distance, and elevation slope.
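The normalization step can be traced with a short worked sketch. It divides the elevation coefficient by the distance coefficient, both taken from the estimates reported above; the function and constant names are illustrative. The ratio evaluates to roughly 2.97, close to the 2.96 used in the formula.

```python
ALPHA = 128.9  # reported cost increase per 100 meters of distance
BETA = 382.6   # reported cost increase per meter of total elevation change


def slope_coefficient(alpha=ALPHA, beta=BETA):
    """Cost of one percent of slope relative to the cost of flat track."""
    return beta / alpha


def normalized_cost(percent_slope, k=2.96):
    """Normalized cost per 100 meters: 1 on flat ground, rising linearly in slope."""
    return 1.0 + k * percent_slope


ratio = slope_coefficient()          # about 2.97
flat = normalized_cost(0.0)          # flat track costs 1 by construction
two_percent = normalized_cost(2.0)   # a 2% slope costs about 6.9 times flat track
```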
For computational purposes it is convenient to divide slope into bins of 0 to 1%, 1 to 2%,
and so on. The following table gives the costs over a standardized distance for different
slope bins in our preferred specification, labeled scenario 2. For comparison, we also show
parameters assuming a constant unitary linear cost in slope (scenario 1) and a case where
slope costs are graded: they are constant below the 2 to 3% bin, then rise up to the 6 to 7%
bin, after which they are constant (scenario 3).
slope %    cost scenario 1    cost scenario 2 (preferred)    cost scenario 3
0          0                  1                              1
0-1        1                  4                              1
1-2        2                  7                              1
2-3        3                  10                             4
3-4        4                  13                             7
4-5        5                  16                             11
5-6        6                  19                             15
6-7        7                  22                             19
7-8        8                  25                             19
8-9        9                  28                             19
9-10       10                 31                             19
>10        ...                34                             19
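The reclassification in the table can be written as a simple lookup, sketched below for scenarios 2 and 3. Two assumptions are ours: each bin includes its upper bound, and the names are illustrative. Scenario 1 is left out because its value for slopes above 10% is not given in the table.

```python
import math

# Costs by slope bin: index 0 is exactly 0%, index k is the bin (k-1, k] percent,
# and index 11 covers everything above 10%.
SCENARIO_2 = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34]
SCENARIO_3 = [1, 1, 1, 4, 7, 11, 15, 19, 19, 19, 19, 19]


def slope_bin(percent_slope):
    """Index of the table row for a given slope in percent."""
    if percent_slope < 0:
        raise ValueError("slope must be non-negative")
    return min(math.ceil(percent_slope), 11)


def reclassified_cost(percent_slope, scenario=SCENARIO_2):
    """Construction cost over the standardized distance for this slope."""
    return scenario[slope_bin(percent_slope)]


cost_gentle = reclassified_cost(0.5)             # 4 in the preferred scenario
cost_steep = reclassified_cost(12.0)             # 34 in the preferred scenario
cost_graded = reclassified_cost(2.5, SCENARIO_3)  # 4 in scenario 3
```

The scenario 2 column follows 1 + 3 × (upper bound of the bin), which is the formula in the text with the slope coefficient rounded to 3.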
The LCP algorithm is implemented using ESRI Python, taking as initial variables the
elevation slope raster, the reclassification table of construction costs, and the
origin-destination nodes. The cost distance and the back-link rasters are computed using the
formulation below:
$$CostDistance(ab) = \frac{CostSurface(a) \times HF(a) + CostSurface(b) \times HF(b)}{2} \times SurfaceDistance(ab) \times VF(ab) \qquad (7)$$
where $CostSurface(j)$ is the cost of travel for cell $j$, $HF(j)$ is the horizontal factor for
cell $j$, $SurfaceDistance(ab)$ is the surface distance from $a$ to $b$, and $VF(ab)$ is the
vertical factor from $a$ to $b$. Note that the division by 2 of the friction of the segments is
deferred until the horizontal factor is integrated. Finally, we implemented the least-cost-path
function to obtain the LCP corridors. These corridors were converted to lines, exported, merged
and post-processed. Maps of our preferred LCP using scenario 2 are shown in the text.
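To make the accumulation logic concrete, the following is a minimal stand-in for a least-cost-path search, not the ESRI implementation. It assumes horizontal and vertical factors equal to 1, moves only between the four edge-adjacent cells, and uses Dijkstra's algorithm; the step cost is the average of the two cell costs times the distance, as in the formulation above. All names are illustrative.

```python
import heapq


def least_cost_path(cost, start, goal, cell_size=1.0):
    """Cheapest route between two cells of a cost grid (list of lists).

    Returns (accumulated cost, list of cells from start to goal).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}   # cheapest accumulated cost found so far per cell
    back = {}             # back-link: cell -> previous cell on its cheapest route
    queue = [(0.0, start)]
    while queue:
        acc, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if acc > best[cell]:
            continue      # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                # Average friction of the two cells times the distance travelled.
                step = (cost[r][c] + cost[nr][nc]) / 2.0 * cell_size
                new_acc = acc + step
                if new_acc < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_acc
                    back[(nr, nc)] = cell
                    heapq.heappush(queue, (new_acc, (nr, nc)))
    # Follow the back-links from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(back[path[-1]])
    return best[goal], path[::-1]


# A steep ridge (cost 34) in the middle column with a flat pass in the bottom row.
grid = [
    [1, 34, 1],
    [1, 34, 1],
    [1, 1, 1],
]
total, route = least_cost_path(grid, (0, 0), (0, 2))
```

In this toy grid the route detours through the bottom row at a total cost of 6 rather than crossing the ridge directly at a cost of 35, which is the kind of slope-avoiding behavior the instrument relies on.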
A.2 Elevation, slope, and ruggedness variables
The aim of this appendix is to explain the creation of the elevation variables, including
the original sources and the method we followed to estimate them. Several initiatives provide
high-resolution elevation raster data across the world. The databases differ mainly in their
geographical coverage, the precision of the data, and the treatment of urban surroundings.
We obtained several elevation DEM rasters, preferably DTMs, covering the whole of England
and Wales. In decreasing order of accuracy, the most precise database was LIDAR (5x5m.),
the Landmap data set contained in the NEODC Landmap Archive (Centre for Environmental Data
Archival). Second, we used EU-DEM (25x25m.) from the GMES RDA project, available in the EEA
Geospatial Data Catalogue (European Environment Agency). The third data set was the Shuttle
Radar Topography Mission (SRTM 90x90m), created in 2000 from a radar system on board the
Space Shuttle Endeavour by the National Geospatial-Intelligence Agency (NGA) and NASA.
Finally, we also used GTOPO30 (1,000x1,000m), developed in a collaborative effort led by staff
at the U.S. Geological Survey's Center for Earth Resources Observation and Science (EROS).
All these sources were created using satellite data, which means all of them are based on
current data. The lack of historical sources of elevation data obliges us to use them. This
simplification may be considered reasonable for rural places, but it is more problematic in
urban surroundings, where the urbanization process altered the original landscape. Even
using DTM rasters, the construction of buildings and technical networks involved a severe
change in the surface of the terrain. Several tests at a local scale were conducted with the
different rasters in order to strike a balance between precision and the computing time spent
in the calculations. The total size of the files, the time spent in different calculations,
and precision relative to the finest data were some of the comparisons carried out. After
these, we opted for SRTM90.

Figure 5: Slope and ruggedness measures
As stated in the text, the spatial units used as a basis for the present paper were civil
parishes, comprising over 9000 continuous units. We therefore needed a method to obtain
unique elevation variables for each unit while keeping comparability across the country. We
estimated six variables in total: elevation mean, elevation std, slope mean, slope std,
ruggedness mean and ruggedness std. Before creating the different variables, some work had
to be done to prepare the data. In order to obtain full coverage of England and Wales with
SRTM data, we had to download 7 raster tiles. These images were merged together, projected
into the British National Grid and clipped to the coastline in ArcGIS software.
Having the elevation raster of England and Wales, we proceeded to calculate the first two
variables: the elevation mean and its standard deviation. A python script was written to
split the raster using the continuous units, to calculate the raster properties (mean and
standard deviation) of all the cells in each sub-raster, and to aggregate the information
obtained in a text file. These files were subsequently joined to the previous shapefile of
civil parishes, offering the possibility to plot the results.
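The per-unit aggregation can be sketched with NumPy, under the assumption that the units have already been rasterized into an array of unit identifiers aligned with the elevation grid. The function name, the identifiers and the use of the population standard deviation are illustrative choices, not details taken from the original script.

```python
import numpy as np


def zonal_mean_std(values, zones):
    """Mean and standard deviation of raster cells within each zone.

    values and zones are arrays of the same shape; zones holds integer unit ids.
    Returns {zone_id: (mean, std)} using the population standard deviation.
    """
    values = np.asarray(values, dtype=float).ravel()
    zones = np.asarray(zones).ravel()
    stats = {}
    for zone_id in np.unique(zones):
        cells = values[zones == zone_id]  # all cells belonging to this unit
        stats[int(zone_id)] = (float(cells.mean()), float(cells.std()))
    return stats


# Hypothetical 2 x 3 elevation grid covering two units.
elevation = np.array([[10.0, 12.0, 30.0],
                      [14.0, 16.0, 30.0]])
unit_id = np.array([[1, 1, 2],
                    [1, 1, 2]])
stats = zonal_mean_std(elevation, unit_id)
```

The resulting dictionary has one row per unit and could be written to a text file and joined to the unit boundaries by identifier.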
A second set of variables aims to capture the variability of elevation between adjacent
cells. Two measures were developed for this purpose: ruggedness and slope. Ruggedness is a
measure of topographical heterogeneity defined by Riley et al (1999). In order to calculate
the ruggedness index for each unit, a python script was written to convert each raster cell
into a point keeping the elevation value, to select the adjacent values using a distance
tool, to apply the ruggedness equation to every single point, to spatially join the points to
their spatial units, and to calculate aggregated indicators (mean and standard deviation) for
each continuous unit.
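For illustration, the ruggedness calculation can be sketched on a raster rather than on points. The sketch follows our understanding of the Riley et al. index, the square root of the sum of squared elevation differences between a cell and its eight neighbors. Treating cells at the raster edge by repeating the border values is an assumption of this sketch, and the function name is illustrative.

```python
import numpy as np


def terrain_ruggedness_index(elevation):
    """Ruggedness of each cell from its eight neighbors."""
    z = np.asarray(elevation, dtype=float)
    rows, cols = z.shape
    # Repeat border values so that edge cells also have eight neighbors.
    padded = np.pad(z, 1, mode="edge")
    sq_sum = np.zeros_like(z)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            sq_sum += (neighbour - z) ** 2
    return np.sqrt(sq_sum)


# A single 1 m bump on otherwise flat ground.
bump = np.zeros((3, 3))
bump[1, 1] = 1.0
tri = terrain_ruggedness_index(bump)
```

The per-cell index can then be aggregated to unit means and standard deviations in the same way as elevation.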
In order to calculate the slope variable for each unit, a python script was written to
convert the elevation into a slope raster, to split the raster using the continuous units, to
calculate the raster properties (mean and standard deviation) of all the cells in each
sub-raster, and to aggregate the information obtained in a text file. The results for both
ruggedness and slope are displayed in Figure 5. As the reader will appreciate, the scale of
the indices is different (1 - 2 times) but the geographical pattern is rather similar. We
therefore used in the paper the variables derived from slope measures, because the time spent
in calculations was lower.
A.3 Additional results
Table 10: All coefficients in model without district FEs

Dep. var.: unit pop. growth 1841 to 1891               coeff.           (std. err.)
Indicator distance to rail station in 1851 <2km        0.199***         (0.0231)
Indicator distance to inland waterway in 1830 <2km     0.0680***        (0.0205)
Indicator distance to turnpike road in 1830 <2km       0.0468***        (0.0119)
Indicator distance to port in 1842 <2km                0.0411           (0.0681)
First-nature controls
Indicator exposed coal                                 0.274***         (0.0448)
Indicator coastal unit                                 0.215***         (0.0361)
Elevation                                              -0.000818***     (0.000196)
Average elevation slope within unit                    -0.0275*         (0.0149)
SD elevation slope within unit                         0.0301**         (0.0137)
Average rainfall                                       0.000233         (0.000496)
Average rainfall squared                               -1.38e-07        (2.57e-07)
Average temperature                                    -0.848***        (0.301)
Average temperature squared                            0.0470***        (0.017)
Wheat suitability (low input level rain-fed)           0.0471***        (0.0173)
Land area in sq. km.                                   0.000207         (0.000308)
Perc. of land with Raw gley soil                       -0.00225         (0.00657)
Perc. of land with Lithomorphic soil                   3.85e-05         (0.000360)
Perc. of land with Pelosols soil                       -0.000954***     (0.000361)
Perc. of land with Podzolic soil                       0.00256***       (0.000866)
Perc. of land with Surface-water gley soil             0.000612         (0.000378)
Perc. of land with Ground-water gley soil              -0.00243         (0.00161)
Perc. of land with Man made soil                       0.00534**        (0.00251)
Perc. of land with Peat soil                           -0.00271**       (0.00130)
Perc. of other soil                                    0.00205          (0.00305)
Second-nature controls
Ln 1841 population per sq. km                          -0.193***        (0.0333)
Share of male tertiary empl. in 1851                   -0.390           (0.342)
Share of male agricultural empl. in 1851               -1.498***        (0.215)
Share of male mining & forestry empl. in 1851          -0.510**         (0.224)
Share of male unspecified empl. in 1851                -1.183***        (0.171)
Ln distance to major city in 1801                      -0.00853         (0.00739)
Registration district fixed effects                    NO
R-square                                               0.256
N                                                      9482
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications.
Table 11: Different specifications for railway variables
Dep. var.: unit pop. growth 1841 to 1891 (1) (2) (3) (4) (5)
coeff. coeff. coeff. coeff. coeff.
variable (std. err.) (std. err.) (std. err.) (std. err.) (std. err.)
Indicator distance to station in 1841 <2km 0.186***
(0.0469)
(0.0159)
(0.0141)
Indicator any rail line in 1851 0.164***
(0.0145)
Exactly one railway station in 1851 0.168***
(0.0208)
More than one railway station in 1851 0.270***
(0.0419)
First nature controls Yes Yes Yes Yes Yes
Second nature controls Yes Yes Yes Yes Yes
District fixed effects Yes Yes Yes Yes Yes
N 9482 9482 9482 9482 9482
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
Table 12: Pre-trend tests for the validity of instrument distance to LCP
Dep. var.: unit pop. growth 1801 to 1831 1801 to 1811 1811 to 1821 1821 to 1831 1831 to 1841
coeff. coeff. coeff. coeff.
variable (std. err.) (std. err.) (std. err.) (std. err.)
Distance to LCP for railways <2km 0.00335 0.00321 -0.000787 -0.00343
(0.00439) (0.00379) (0.00462) (0.00518)
Indicator dist. to inland waterway in 1830 <2km 0.000107 0.00319 0.0148*** 0.0118**
(0.00426) (0.00468) (0.00418) (0.00499)
Indicator dist. to turnpike road in 1830<2km 0.00177 0.00405 0.00531 -0.00174
(0.00408) (0.00350) (0.00364) (0.00374)
Indicator dist. to port in 1842 <2km 0.0406*** 0.0140 0.00198 0.0329**
(0.0154) (0.0127) (0.0117) (0.0145)
Units within 2 km of LCP nodes removed? Yes Yes Yes Yes

N 9114 9114 9114 9114
Table 13: Stations and population growth: estimates for sub-districts
Dep. var.: log difference in pop., 1851 and 1861

                                  (1)           (2)           (3)
                                  coeff.        coeff.        coeff.
variable                          (t-stat)      (t-stat)      (t-stat)
Indicator for station in 1851     0.0736***     0.0645***     0.0432***
                                  (0.0108)      (0.0107)      (0.00818)
Quadratic in lat. and long.       No            Yes           Yes
County fixed effects              No            No            Yes
N                                 1,599         1,599         1,599
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on county in all specifications.
A.4 Panel regression results
We observe whether a unit has a railway station nearby in each decade from 1831 to 1891.
A panel or difference-in-difference model will identify whether population density increased
after a unit got its first station nearby. The specification is the following:
$$y_{it} = \beta I^{rail}_{it} + \alpha_i + \delta_t + \gamma x_i \delta_t + \varepsilon_{it} \qquad (8)$$
where $y_{it}$ is the log of population density in unit $i$ in year $t$, $I^{rail}_{it}$ is an
indicator equal to 1 if unit $i$ is within 2 km of a station in year $t$, $\alpha_i$ is a unit
fixed effect, $\delta_t$ is a census year fixed effect, and $x_i \delta_t$ is an interaction
between the census year effects and several time-invariant controls (see footnote 46). The
time span is 10 decades and there are 85,365 unit-year observations. The standard errors are
clustered on the unit to allow for correlation within a unit across time. The results are
reported in column 1 of table 14. The estimated coefficient on $I^{rail}_{it}$ is 0.161 with a
standard error of 0.008. In other words, getting a railway station nearby raises the unit
population density by approximately 16%.
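The mechanics of the unit and census year fixed effects can be sketched on a synthetic balanced panel. This is a simplified illustration only: it omits the interactions between year effects and controls and the clustered standard errors, and the data, timing pattern and names are invented for the example. The noiseless outcome is built with a true coefficient of 0.161 so that the estimator can be checked against it.

```python
import numpy as np


def two_way_within(a):
    """Remove unit means and year means from a balanced (units x years) array."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def fe_estimate(y, treat):
    """Coefficient on the treatment indicator with unit and year fixed effects."""
    y_w, d_w = two_way_within(y), two_way_within(treat)
    return float((d_w * y_w).sum() / (d_w ** 2).sum())


# Synthetic panel: 50 units observed in 9 census years.
n_units, n_years, beta = 50, 9, 0.161
alpha = 0.1 * np.arange(n_units)[:, None]          # unit effects
delta = np.linspace(0.0, 1.0, n_years)[None, :]    # census year effects
# Staggered arrival of the first station; values of 9 or more mean never treated.
first_period = (np.arange(n_units) % 11 + 1)[:, None]
treat = (np.arange(n_years)[None, :] >= first_period).astype(float)
y = alpha + delta + beta * treat                   # log density without noise
beta_hat = fe_estimate(y, treat)
```

Because the unit and year effects are removed by the within transformation, the estimate recovers 0.161 in this noiseless example.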
We also modify the panel specification to test for pre-trends. Specifically, we create a
dummy variable for two decades before a unit gets its first station, one decade before a
unit gets its first station, and so on up to an indicator for at least 5 decades after a unit
gets its first station. The omitted group is three or more decades before. The estimates
are shown in column 2 of table 14. The estimates reveal a significant pre-trend. Two
decades prior a unit has 2.2% higher population density and one decade prior it has 5.3%
higher population density. However, after the first railway station opens, population growth
increases significantly. In the first decade after the station opens the coefficient is 0.146
and in the second decade after it opens the coefficient is 0.164.
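The construction of the event-time indicators can be sketched as follows. The labels, the treatment of units that never get a station (all indicators zero), and the use of floor division to assign a census year to a decade are assumptions of this sketch rather than details stated in the text.

```python
# Event-time bins: decades relative to the opening of a unit's first station.
EVENT_LABELS = {
    -2: "two_decades_before",
    -1: "one_decade_before",
    0: "opening_decade",
    1: "one_decade_after",
    2: "second_decade_after",
    3: "third_decade_after",
    4: "fourth_decade_after",
    5: "fifth_decade_or_later",
}


def event_time_bin(census_year, first_station_year):
    """Decades since the first station, or None for the omitted group."""
    if first_station_year is None:
        return None                       # never within 2 km of a station
    k = (census_year - first_station_year) // 10
    if k <= -3:
        return None                       # three or more decades before: omitted
    return min(k, 5)                      # pool five or more decades after


def event_dummies(census_year, first_station_year):
    """One 0/1 indicator per event-time bin for a unit-year observation."""
    k = event_time_bin(census_year, first_station_year)
    return {label: int(key == k) for key, label in EVENT_LABELS.items()}


row = event_dummies(1871, 1851)   # two decades after the first station
```

Each unit-year observation switches on at most one indicator, and observations in the omitted group have all indicators equal to zero.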
46 We include interactions with coal, coastal, elevation, average slope, the standard deviation of slope, a
cubic polynomial in longitude, the same for latitude, and the bottom three quartiles of 1801 population
density.
Table 14: Panel regression estimates
Dep. var.: log unit pop. density in year t                        (1)            (2)
                                                                   coeff.         coeff.
variable                                                           (std. err.)    (std. err.)
Indicator distance to station <2km                                 0.161***
                                                                   (0.008)
Indicator two decades before distance to first station <2km                       0.022***
                                                                                  (0.053)
Indicator one decade before distance to first station <2km                        0.053***
                                                                                  (0.006)
Indicator decade distance to first station <2km                                   0.109***
                                                                                  (0.007)
Indicator one decade after distance to first station <2km                         0.146***
                                                                                  (0.008)
Indicator second decade distance to first station <2km                            0.164***
                                                                                  (0.009)
Indicator third decade distance to first station <2km                             0.133***
                                                                                  (0.009)
Indicator fourth decade distance to first station <2km                            0.066***
                                                                                  (0.008)
Indicator fifth decade distance to first station <2km                             0.020***
                                                                                  (0.004)
Census year fixed effects                                          Yes            Yes
Unit fixed effects                                                 Yes            Yes
Census year fixed effects * first nature controls                  Yes            Yes
N                                                                  85,365         85,365
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on the unit in all specifications. For the list of first
nature controls see table 1.