Were railways indispensable for urbanisation?

evidence from England and Wales


Dan Bogart,∗ Xuesheng You,† Eduard Alvarez,‡ Max Satchell,§ and Leigh Shaw-Taylor¶

Draft: November 19, 2018‖

Abstract
England and Wales underwent a remarkable urbanisation during the railway era in the nineteenth century. Yet this economy was already industrialised with well-developed transport infrastructure prior to railways. This raises the question of whether railways were indispensable for urbanisation over the medium and long term. In this paper, we examine the population growth effects of being close to railway stations versus being close to turnpike roads, inland waterways, and ports. Our estimates show that being within a short commuting or shipping distance to all infrastructures significantly increased a locality's population growth from 1841 to 1891. The same is true for population growth from 1891 to 2011. Across numerous specifications, we find that railways had the largest growth effects, but turnpike roads and inland waterways had significant effects too, and even more so for ports. Our estimates contribute to a deeper understanding of the spatial patterns of growth during the industrial revolution and the effects of transport infrastructure on long-run urbanisation.

Keywords: Urbanisation, railways, transport, spatial reorganization

JEL Codes: N4, O18, R11

∗ Corresponding author. Associate Professor, Department of Economics, UC Irvine, dbogart@uci.edu
† Research Associate, Faculty of History, University of Cambridge, xy242@cam.ac.uk
‡ Senior Lecturer, Economics and Business, Universitat Oberta de Catalunya, ealvarezp@uoc.edu
§ Research Associate, Dept. of Geography, University of Cambridge, aems2@cam.ac.uk
¶ Senior Lecturer, Faculty of History, University of Cambridge, lmws2@cam.ac.uk
‖ Data for this paper was created thanks to grants from the Leverhulme Trust (RPG-2013-093), Transport and Urbanization c.1670-1911, NSF (SES-1260699), Modelling the Transport Revolution and the Industrial Revolution in England, the ESRC (ES 000-23-0131), Male Occupational Change and Economic Growth in England 1750 to 1851, and ESRC (RES-000-23-1579) the Occupational Structure of Nineteenth Century Britain: Grant. We thank Walker Hanlon, Gary Richardson, Petra Moser, Kara Dimitruk, Arthi Vellore, William Collins, Jeremy Atack, Alan Rosevear, and Elisabet Viladecans Marsal for comments on earlier drafts and seminar participants at UC Irvine, UC San Diego, NYU, Florida State, Trinity College Dublin, Queens Belfast, the University of Los Andes, Vanderbilt, and EHA Meetings. We also thank Cranfield University for sharing their soils data.

1 Introduction
Improvements in transport infrastructure can substantially change trade and travel patterns. However, it is not obvious that transport improvements, even large ones, significantly change urbanisation. Population may continue to cluster around older transport infrastructures that remain in use or get transformed to new uses through technological change. In this paper, we use the lens of history to examine how a large-scale transport improvement, the railway, changed the population geography of England and Wales relative to previous infrastructures. There is a broader literature on whether railways were crucial (or even indispensable) to economic development in the nineteenth century and over the longer run.¹ England and Wales is an interesting case because it was already industrialised when railways started spreading in the 1830s. It had large secondary employment compared to other economies and was more urbanised with high levels of migration.² This economy also had well-developed transport infrastructure before railways, including a large network of ports, roads, navigable rivers, and canals.

There is a long-standing debate on the impact of railways in England and Wales. One argument is that canals, roads, and ports helped determine the location of new urban centers in the eighteenth century and these centers persisted into the railway era. A related argument is that shipping and inland water transport remained competitive with railways on bulky, low-value goods, like coal, and hence continued to influence the location of population. The opposing argument notes that railways generally provided superior transport services compared to preexisting modes. This view sees railways as shaping location within major urban centers, including their suburbs.³

We examine the medium and long-run population growth effects of being within a short commuting or shipping distance to railway stations versus being within the same distance to turnpike roads, inland waterways, and ports. We use a new data set with local populations in every decennial census year from 1801 to 1891 and in 2011. Our new spatial units are consistent across time and are 15 square km on average. They are similar to parishes and townships, the smallest places reported in the British Census.⁴ We also incorporate GIS data on railway lines and stations, turnpike roads, ports, and inland waterways. Most of these networks are observed between 1830 and 1860, but for some we have earlier dates. The networks are created from historical maps and allow us to measure the distance between units and infrastructure with great precision. Finally, we add geographic characteristics, like elevation, ruggedness, soils, rainfall, temperature, coastline, and coal.

1 For example, see foundational papers by Fogel (1964) and Fishlow (1965). For more recent studies see Berger and Enflo (2015), Hornung (2015), Jedwab, Kerby, and Moradi (2015).
2 See Shaw-Taylor and Wrigley (2014) for an overview of occupational structure and urbanisation. See Redford (1976) and Long (2005) for studies on migration.
3 See Hawke (1974), Dyos and Aldcroft (1974), Simmons (1986), Leunig (2006), Crafts and Mulatu (2006), Armstrong (2009), Kellet (2012), Maw (2013), Crafts and Wolf (2014).
4 Unfortunately, our population data do not include Scotland or Ireland, and thus we cannot make firm statements about the UK.

Our main specification is a long-difference, where unit population growth from 1841 to 1891 is regressed on indicators for being within 2 km of infrastructures. A distance of 2 km corresponds to about 30 minutes of walking, which is approximately the average commuting time in developed economies today.⁵ The baseline specification also includes geographic and structural controls, like population density and occupational shares c.1840, pre-trends in population growth, along with registration district fixed effects. Registration districts are about 250 square km on average and encompass our spatial units. That implies we are controlling for unobservable factors common to all units within a relatively small area.
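
The proximity indicators described above can be illustrated with a minimal sketch. It assumes each unit is represented by a single point and that all coordinates are in metres on a projected grid; both are assumptions for illustration, since this section does not state how distance is measured. The function name `within_km` and the coordinates are hypothetical.

```python
import math

def within_km(unit_xy, features_xy, threshold_km=2.0):
    """Return 1 if any feature lies within threshold_km of the unit point, else 0.

    Coordinates are assumed to be in metres on a projected grid (an assumption).
    """
    ux, uy = unit_xy
    nearest_m = min(
        (math.hypot(ux - fx, uy - fy) for fx, fy in features_xy),
        default=math.inf,  # no features at all means the unit is never "near"
    )
    return int(nearest_m <= threshold_km * 1000)

# Made-up station coordinates, purely for illustration.
stations = [(1500.0, 0.0), (9000.0, 4000.0)]
near = within_km((0.0, 0.0), stations)      # nearest station is 1.5 km away
far = within_km((0.0, 5000.0), stations)    # nearest station is over 5 km away
```

The same function could be applied with a 4 km threshold for ports, matching the distance used for the port results.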

We further address endogeneity of railways using propensity score matching and instrumental variables (IV). For the IV, we construct a Least Cost Path (LCP) connecting large towns in 1801 and incorporating the added costs of building railways over rugged terrain. Proximity to the LCP is a good instrument because it identifies units that were close to stations mainly because they were near favorable routes for connecting large towns.⁶
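
To give a sense of what a least cost path is, here is a toy sketch using Dijkstra's algorithm on a small grid, where each cell carries a construction cost that is higher over rugged terrain. This is not the paper's procedure, which is built in GIS on real terrain. The grid, the cost values, and the function name `least_cost_path` are all hypothetical.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Dijkstra on a 4-connected grid; cost[r][c] is the cost of entering cell (r, c).

    Assumes the goal is reachable from the start.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]

# Two "towns" in the bottom corners, separated by a rugged ridge (cost 9).
terrain = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 9, 1],
]
total_cost, route = least_cost_path(terrain, (2, 0), (2, 2))
```

In this toy grid the cheapest route detours around the ridge through the top row. Cells on that route are "near the LCP" only because they lie on a favorable path between the two towns, which is the intuition behind the instrument.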

Our first main finding is that proximity to railway stations had a large effect on population growth between 1841 and 1891. This result is consistent across all specifications, including IV. In our preferred specification, being within 2 km of a railway station increased unit population growth by 15.9 percentage points (pp) over 50 years, or an increased growth rate of 0.3% per year. To put this estimate into perspective, the total population in England and Wales increased by 79 pp between 1841 and 1891. The average population growth across all units was 1 pp, which is equivalent to a 0.02% annual growth rate.
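
The conversion from a 50-year effect to an annual rate can be checked with a short calculation. The sketch below assumes the reported effects are log differences, as the data section describes the population variables, so that 15.9 pp corresponds to a log difference of 0.159. The function name is hypothetical.

```python
import math

def annual_growth_pct(log_diff, years):
    """Average annual growth rate in percent implied by a total log difference."""
    return math.expm1(log_diff / years) * 100

railway_effect = annual_growth_pct(0.159, 50)  # station effect over 1841 to 1891
mean_growth = annual_growth_pct(0.01, 50)      # mean unit growth over 1841 to 1891
```

Rounded, these give roughly 0.3% and 0.02% per year, in line with the figures quoted above.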

Our second main finding is that proximity to pre-rail infrastructures also had a large effect on population growth between 1841 and 1891. Being within 2 km of an inland waterway (turnpike road) increased population growth by 5.6 pp (3.8 pp). We also find that being within 4 km of a port increased population growth by 15.5 pp, which is similar to railways.

Separating the effects across time reveals that railways had a much larger effect than inland waterways and turnpike roads from 1841 to 1871. Railways were a strong substitute for roads and inland waterways initially, but by 1891 the latter gained new users like omnibuses and steamboats. Port effects are largest from 1841 to 1871 and less significant by 1891. Coastal shipping was highly productive in the mid-1800s and was crucial in supplying London's coal. But coastal shipping increasingly lost its market share to railways and only some ports gained from international shipping.

5 The US census reports that commuting times in US cities average 26.1 minutes. See U.S. Census Bureau, 2012-2016 American Community Survey 5-year estimates.
6 Our methodology draws on the so-called inconsequential place approach and other studies which use least cost paths as instruments for infrastructure. See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscombe et al. (2013).

The third main finding is that proximity to each type of infrastructure contributed to higher population growth between 1891 and 2011. The long-run effects are sizable. A one standard deviation change in proximity to 1851 railway stations accounts for 0.161 standard deviations of growth. The corresponding figures for ports, turnpike roads, and inland waterways are 0.078, 0.044, and 0.065 standard deviations. Together our results show that railways were the most important driver of long-run urbanisation among the historic infrastructures. However, railways were not indispensable for all growth because pre-rail infrastructures also made significant contributions.

There is a remaining question as to whether units close to infrastructures pulled population from areas more distant and grew at their expense. In extensions, we use units more than 10 km from infrastructures as the control group. For railway stations, we find more growth between 0 and 6 km distance, but there was no difference in growth for units between 6 and 10 km from stations. Thus, we do not find any evidence that railways contributed to relative population declines just beyond the commuting zone measured by 6 km. Nevertheless, we still think railways pulled population near stations. There are two reasons. First, migration rates were very high in nineteenth century England (Long 2005). Second, we use additional data sources to show that proxies for migration, like the Irish-born population percentage, increase more near railway stations. Also, fertility, another driver of population growth, decreases more near railway stations.

The previous findings raise another question: did railways or other infrastructures reorganize population with little impact on productivity? In extensions, we provide evidence that railways increased population growth by 25 pp for units at the 75th percentile of 1841 population density, 15 pp for units at the 50th percentile, and 8 pp at the 25th percentile. The same is true for inland waterways, although to a lesser extent. This heterogeneity suggests that railways and waterways attracted migrants to localities that were more productive. Some supporting evidence suggests moving a worker from a unit in the 50th to the 85th percentile of population density increased their wages by 6.8%. Over the longer term, we think the higher population and productivity near early infrastructures attracted new investments and technologies and hence attracted even more migrants during the twentieth century. In other words, infrastructure, technology, and migration were reinforcing processes.

Our results contribute to a broader literature on the spatial patterns of growth during the industrial revolution. A wide range of factors are discussed like endowments, markets, and human capital.⁷ Our new data set can test many growth channels. Here we show that transport infrastructures had significant effects on population growth. But our analysis and data also document the importance of other factors like coal, climate, and ruggedness.

We also add to the large literature on railways and nineteenth century growth.⁸ Perhaps our main contribution is to reemphasize the importance of comparing railways with inland waterways, roads, and ports. Modal substitution and complementarity are central to the analysis of any transport improvement.

Finally, our study contributes to the literature on infrastructure and urbanisation in contemporary contexts.⁹ Over the last 50 years there has been a dramatic rise in urbanisation across the world. Given the significant social and economic implications, it is useful to look at history. The English and Welsh case shows that multiple infrastructures can evolve and influence urbanisation long into the future.

2 Background on transport infrastructure


England and Wales (EW) had a well-developed transport network long before its railway network grew. Figure 1 shows the length of turnpike road, inland waterway, and railway networks from 1700 to 1890.¹⁰ In this section, we discuss each of these networks and ports.

As early as 1680 EW had about 12,000 km of main roads.¹¹ But they were in poor condition and local governments, then in charge, had little capability to improve them. As an alternative, EW turned to using tolls and non-governmental organizations called turnpike trusts. Their powers came from an act of parliament. Acts named local landowners and merchants to serve as trustees. Parliament offered little in subsidies. Locals purchased bonds to finance improvements and were repaid using toll revenues.

The first trusts generally improved the main roads already in place. Later trusts extended the road network and transformed dirt paths into roads suitable for wheeled traffic. As figure 1 shows, the turnpike network grew from about 8000 km in 1750 to about 38,000 km in 1830. At their peak there were approximately 1000 different trusts. They managed all main roads, although some turnpike roads could be considered secondary in importance.¹²

Figure 1: Evolution and size of infrastructure networks in England and Wales 1700-1890
Sources: see data section.

7 See Fernihough and Hjortshøj O'Rourke (2014), Crafts and Wolf (2014), Klein and Crafts (2012), Becker, Hornung, and Woessmann (2011).
8 For previous studies on English and Welsh railways see Hawke (1974), Leunig (2006), Casson (2013), Gregory and Marti Henneberg (2010), Alvarez et al. (2013), and Heblich, Redding, and Sturm (2018), who focus on London. For other countries see Berger and Enflo (2015) for Sweden, Tang (2014, 2017) for Japan, Hornung (2015) for Prussia, Atack, Bateman, Haines, and Margo (2010), Atack and Margo (2011), Donaldson and Hornbeck (2016), and Hodgson (2018) for the US, and Donaldson (2014) for India.
9 See Redding and Turner (2015) for an overview. Some papers of related interest include Duranton and Turner (2012), Faber (2014), Jedwab et al. (2015), Storeygard (2016), and Baum-Snow et al. (2017).
10 Network maps are available at https://www.campop.geog.cam.ac.uk/research/projects/transport/data/
11 These are documented in Ogilby's Britannia Atlas. See Satchell (2017) for a description of these roads.

Many users of turnpike roads obtained transport services from public carriers and coaching companies. How did they benefit from turnpike roads? Transport costs could have increased because of the tolls and the localism of trusts, but that did not happen.¹³ The shift to fly-by-night services and stagecoaches with steel springs meant that passenger travel times fell substantially between 1750 and 1820. Real freight rates also fell by over 40% as wagons got bigger and load sizes increased. The growing use of stagecoaches and wagons led to the concentration of economic activity around turnpike roads.¹⁴

Turnpike trusts faced a crisis once railways became widespread. The finances of many trusts deteriorated and most stopped functioning by the 1870s (see their decline in figure 1). Responsibility for maintaining turnpike roads passed to newly formed highway districts and county councils.

The inland waterway network developed at the same time as turnpike roads. Around 1700 EW had a large system of navigable rivers including the Thames, Severn, Great Ouse, and Trent (Willan 1964). River navigations, or improved rivers which bypassed difficult sections, were added between 1700 and 1750. Canals or artificial waterways were built between 1760 and 1830. Like turnpike roads, river navigations and canals were authorized by acts of parliament (Bogart 2017). Acts granted authority to companies and included procedures to negotiate the purchase of land. Most canals required many investors and were organized as joint stock companies.

12 For a summary see Bogart (2017).
13 For a summary of the effects of turnpike trusts see Bogart (2005).
14 See Bogart (2009) and Pawson (1977) for this evidence.

By 1830 there were several long-distance canals linking important centers. One example is the Leeds and Liverpool Canal, which connected the leading woolen and cotton textile towns. Another example is the Grand Junction Canal, which shortened the waterway distance between London and Manchester. Independent carriers were hired by individuals and firms to provide freight services on canals. Like road carriers, they relied on horsepower to draw their boats. Nevertheless, the efficiency of hauling over water brought low-cost transport to inland regions.¹⁵ Canals were especially important in the movement of coal. As one illustration, the price of coal in Manchester fell by half after the completion of the nearby Bridgewater Canal in 1761. Some historians argue that canals led to the development of inland industrial centers by providing cheap fuel (Maw 2013, Crafts and Wolf 2014).

There was also significant investment in ports. It is estimated there were 391 acres of wet dock space and 50 harbors in 1830. By contrast, England had no wet docks and a handful of harbors in 1660 (Pope and Swann 1960). The ports of Liverpool and London provide two illustrations. In Liverpool, dock acreage increased 11-fold between 1710 and 1830, including the first commercial wet dock. The investment was financed by Liverpool merchants who wanted to facilitate the import of cotton and foodstuffs. London was a center for domestic and international trade, but there was little investment in its ports between 1700 and 1799. Then there was a dock building boom from 1799 to 1825 which transformed the capacity of the capital port (Dyos and Aldcroft 1974, p. 58).

Port infrastructures complemented improvements in shipping technology. After 1800 sailing vessels became larger and more durable with metaled hulls. Sails and rigging also improved. One indication is the greater speeds achieved by sailing vessels in the early 1800s (Solar 2013). The arrival of the steamship was even more revolutionary, although its impact was delayed until the 1860s. Before that steamships were expensive and not as cost efficient as sailing ships (Dyos and Aldcroft 1974, p. 257). Improvements in engine efficiency, steel hulls, and propellers eventually turned the tide. Steamship capacity exceeded sail for the first time in 1883. Steamships would go on to revolutionize trade and travel across the oceans (Pascali 2017). However, coastal sailing vessels continued at many ports (Armstrong 2009, Langton and Morris 2002).

15 See Turnbull (1987) and Bogart, Lefors, and Satchell (forthcoming) for a discussion of canal carriers.

The first steam-powered rail service open to the public came in 1825 in the northern coal mining region between Stockton and Darlington. In 1830, the Liverpool and Manchester railway opened to facilitate passenger traffic. The rail network expanded dramatically following the `Mania' of the mid-1840s. The significance of the Mania can be seen in Figure 1 through the growth of track mileage. By 1851 regional rail networks had formed around the large towns in addition to trunk lines connecting larger towns.¹⁶

Railways were built and operated by joint stock companies. They provided passenger and freight services directly to customers. Passengers accounted for most revenues initially, but after 1850 freight accounted for more. One of the most difficult challenges facing railway companies was their high construction costs. A key factor was the route of their lines. A distinction was made between the original line, which often aimed to connect large trading towns, and the branch lines, which linked smaller towns to the original lines. Railway companies preferred original lines whenever possible. One promoter advised the following: "stick to the original line; keep down the capital and let competing schemes do their worst" (quoted in Simmons 1986, p. 271).

The impact of railways on the EW economy is often emphasized by historians. Dyos and Aldcroft (1974, p. 229) argue that urban growth was the most `conspicuous product of railway development.' Their impact seems to have grown over time as regulations forced railway companies to provide transport to lower socioeconomic groups. The Cheap Trains Act of 1883 made daily workman's trains mandatory and led to lower commuting fares.

Despite the popular view that railways were economically crucial, there is a debate as to whether they significantly changed the location of population and economic activity. Central to this debate is the degree of substitution or complementarity between railways and preexisting modes. We now turn to this issue.

3 Modal substitution and complementarity


In this section, we define how transport modes can be substitutes or complements and we briefly examine evidence from the literature. The standard mode choice model considers a traveler or shipper who has $N$ transport options (i.e. modes).¹⁷ Each mode has a set of attributes like the fare, travel time, and convenience. A traveler will also have an idiosyncratic preference yielding an individual utility from each transport mode $U_i$. The traveler will choose mode $i$ if $U_i > U_j$ for all $j \neq i$.


16 For the literature on the mania see Casson (2009), Odlyzko (2010), Campbell and Turner (2012, 2015).
17 See Small (2013) for an overview of transport demand.
Modal substitution occurs if the demand for transport mode $i$ decreases when the fare or travel time of another mode $j$ decreases. Consider a case where a new mode is better than existing modes on all attributes. Only those travelers with a high idiosyncratic demand will use the old mode. Everyone else will shift to the new mode. We call this `complete' substitution.

There is another case where the new transport mode is better on some attributes. Say the new mode offers lower travel time than an existing transport mode, but its fare is higher. In that case, there will be some travelers that shift to the new mode because they value time more and others will continue to use the old mode because they are more fare sensitive. We call this `partial' substitution. It implies both modes co-exist in a market. High fixed costs can also lead to co-existence because they prevent a mode from being available to all travelers. In this case, one could observe two modes in use even though one is better on all attributes.
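
The partial substitution case can be illustrated with a minimal sketch of the mode choice rule. It assumes a simple linear utility, the negative of fare plus time cost, with an optional idiosyncratic taste term. This functional form, the mode names, and the numbers are illustrative assumptions, not estimates from the literature.

```python
def choose_mode(modes, value_of_time, taste=None):
    """Pick the mode with the highest utility U = -(fare + value_of_time * hours) + taste."""
    taste = taste or {}

    def utility(name):
        attrs = modes[name]
        return -(attrs["fare"] + value_of_time * attrs["hours"]) + taste.get(name, 0.0)

    return max(modes, key=utility)

# Hypothetical attributes: the old mode is cheap but slow, the new mode fast but dearer.
modes = {
    "canal": {"fare": 2.0, "hours": 10.0},
    "rail": {"fare": 5.0, "hours": 2.0},
}
fare_sensitive = choose_mode(modes, value_of_time=0.1)  # low value of time
time_sensitive = choose_mode(modes, value_of_time=1.0)  # high value of time
loyal = choose_mode(modes, value_of_time=1.0, taste={"canal": 6.0})
```

With these numbers the fare-sensitive traveler stays with the canal and the time-sensitive traveler switches to rail, so both modes remain in use. A large enough taste term keeps even a time-sensitive traveler on the old mode, which corresponds to the idiosyncratic users in the complete substitution case.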

Transport modes can also be complements, which means the demand for transport mode $i$ increases when the fare or travel time of another transport mode $j$ decreases. In one case, different transport modes are links in the same journey. Introducing better attributes on one link increases the demand on all links. Complementarity can also arise if the new mode increases overall transport demand. Here there will be more travel by those who value attributes of the old mode.

What does the literature say about substitution and complementarity concerning railways in EW? It appears railways were a complete substitute for long distance road transport. Railways offered faster services at less than half the fares and freight rates. As a result, long-distance coaching and road freight services were largely displaced by the 1850s.¹⁸

There is a counter-argument that railways could not be built everywhere due to fixed costs. This allowed some short-distance road transport to continue. There was also innovation in road transport. The omnibus spread in the mid-1800s. It carried more passengers at lower fares than coaches of old. Highway districts and county councils also assisted by improving the former turnpike roads (Dyos and Aldcroft 1974, p. 241).

The standard view is that most canals failed to compete with railways on long distance traffic because of their slow speed. Some tried to compete on cost, but the lack of coordination led many canal companies to sell out to railways. In 1883 half of inland waterway mileage was leased or owned by railway companies. The remaining canals served short distance traffic in industrialised areas. There is some evidence for a canal revival after the 1870s. New regulations helped by requiring canals to publish through rates on long distance journeys and by limiting railway control. The application of steam power to canal boats was another factor (Boughey and Hadfield 2012).

18 Between 1845 and 1850, the number of passenger journeys by rail increased by 117%, and again by 65% between 1850 and 1855 (Mitchell 1998).

Railways are thought to have been a partial substitute for coastal shipping. For example, railways gained in the biggest market: the transport of coal to London. In 1850, 98.4% of coal imported into London came by coastal ship. By 1870 the rail share was 55.7% and in 1880 it was 62%.¹⁹ Armstrong (2009) argues that steamships halted the further decline in coastal shipping. However, the extent to which the two modes co-existed is debatable because many ports came under the authority of railways (Dyos and Aldcroft 1974).

In the case of international shipping, railways were a complement. There was a tremendous growth in foreign trade from the 1840s, especially in grain (O'Rourke 1997). Sailing ships and then steamships transported the grain from the Americas, India, and Russia to EW, where it was transported inland by rail. Hawke (1970, p. 128) estimates that imports represented more than half of all wheat hauled by English railways in 1865. Some would even argue that railways and shipping created the world grain market together.

4 Theoretical and empirical frameworks


Our main goal is to identify the relative importance of rail versus pre-rail infrastructures in nineteenth century population growth in EW. This section discusses how we adapt common theoretical and empirical frameworks for our research question. Redding and Turner (2015) summarize a theoretical model which links transport infrastructure and the location of economic activity. They show equilibrium population in any location is increasing in the quality of its commuting technology and its firm and consumer market access. Consumer market access is measured by the variety of goods available and the trade costs of shipping those varieties to the location. Firm market access is a weighted sum of firm demands and depends on the cost of shipping goods to other markets. Better transport infrastructure plays a role in this model by reducing trade costs and hence increasing market access for some locations. Better infrastructure also reduces commuting time and hence increases effective units of labor.

19 These figures are reported in Hawke (1970, p. 168).

Redding and Turner (2015) argue that a fairly standard regression specification provides a reduced form version of the model. City $i$ population in year $t$, $Y_{it}$, is regressed on a measure of transport infrastructure access, such as an indicator for connection to a highway network $d_{it}$, plus time-varying controls and location and time specific fixed effects. Duranton and Turner (2012) adapt a similar specification to incorporate a partial adjustment process where population growth is a function of the difference between a city's actual population and its equilibrium or target population. The estimating equation becomes

$$y_{it+1} - y_{it} = \lambda y_{it} + a\,d_{it} + c\,x_{it} + \varepsilon_{it} \qquad (1)$$

where the left-hand side variable $y_{it+1} - y_{it}$ is the log difference in city population between $t$ and $t+1$. The right-hand side includes the log of initial population $y_{it}$, indicators for being connected to a transport network $d_{it}$, and controls $x_{it}$.
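
As a rough illustration of how specification (1) is estimated, the sketch below fits it by ordinary least squares on a handful of made-up observations. It assumes NumPy is available, uses one indicator and one control, and adds a constant term in place of fixed effects. The data, the function name, and the coefficient labels are hypothetical.

```python
import numpy as np

def fit_long_difference(y0, y1, d, x):
    """OLS of (y1 - y0) on a constant, initial log population y0, indicator d, control x."""
    y0, y1, d, x = (np.asarray(v, dtype=float) for v in (y0, y1, d, x))
    design = np.column_stack([np.ones_like(y0), y0, d, x])
    coef, *_ = np.linalg.lstsq(design, y1 - y0, rcond=None)
    return dict(zip(["const", "lambda", "a", "c"], coef))

# Made-up data generated without noise, so OLS should recover the inputs.
y0 = np.array([5.0, 6.0, 7.0, 8.0, 5.5, 6.5])   # log population at t
d = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])    # e.g. within 2 km of a station
x = np.array([1.0, 2.0, 0.0, 1.0, 3.0, 2.0])    # a single control
y1 = y0 + 0.1 - 0.05 * y0 + 0.16 * d + 0.02 * x  # log population at t+1
est = fit_long_difference(y0, y1, d, x)
```

Here `est["a"]` plays the role of the coefficient $a$ in (1): the difference in log growth associated with the indicator, holding initial population and the control fixed.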

Specification (1) is appealing for our study because we can estimate the growth impacts of railways, turnpikes, ports, and waterways by including indicator variables for being within a short commuting or shipping distance of these infrastructures. If the railway was a complete substitute for, say, canals, then we should expect zero effect for the inland waterway indicator, all else equal. The reason is that all shippers, outside of idiosyncratic types, would have preferred using railways. Over time individuals should migrate from areas with canals to areas with railway stations, leading to population growth near the latter. By contrast, if railways were a partial substitute for, say, canals, then being near inland waterways should contribute to some growth. Users that preferred cheap water transport would migrate to areas with rivers or canals and users that preferred the speed of railways would migrate to stations.

There are two limitations to specification (1). First, the indicator variable for infrastructure access does not account for network structure. Some studies address this issue by estimating market access, or population-weighted inverse trade costs between all locations (see Donaldson and Hornbeck 2016). We do not follow the market access approach because it estimates trade costs for a single user type. In our case, multiple users appear to be important. Also, the market access approach identifies the effects of trade costs without differentiating by infrastructure type. Therefore, the methodology does not easily lend itself to identifying the effects of railway stations, roads, inland waterways, and ports.²⁰
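
For concreteness, the sketch below shows one simple form of the market access measure, a population-weighted sum of inverse trade costs, together with the count of unit pairs that a full trade cost matrix would require. The exponent `theta` and its default of 1, the toy populations, and the toy costs are illustrative assumptions.

```python
import math

def market_access(i, population, trade_cost, theta=1.0):
    """Sum over other locations j of population[j] / trade_cost[i][j] ** theta."""
    return sum(
        population[j] / trade_cost[i][j] ** theta
        for j in range(len(population))
        if j != i
    )

# Toy example with three locations and symmetric trade costs.
population = [100, 200, 50]
trade_cost = [
    [0, 2, 4],
    [2, 0, 5],
    [4, 5, 0],
]
ma_first = market_access(0, population, trade_cost)

# Number of unordered pairs among the 9489 units in the data.
n_pairs = math.comb(9489, 2)
```

The pair count comes to just over 45 million, which matches the computational burden mentioned in the accompanying footnote.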

Second, specification (1) cannot account for spatial reorganization. Localities just beyond a short commuting distance of infrastructures may not be a clean control group for localities within the commuting distance. They are potentially treated by infrastructure and could lose population. Therefore, estimates based on (1) identify differences in relative growth, not absolute growth. Below we address this issue further by studying different control groups and by looking at migration proxies near railways. We also examine heterogeneity according to initial population density.

20 Also, as our data include population for 9489 units, we would need to calculate trade costs for more than 45 million unit-pairs. That presents a major computational issue.
5 Data
Our population data come from British censuses, available every decade starting in 1801. They are digitized at the smallest census place level (e.g. parishes and townships) up to 1891.²¹ The census also published occupational counts starting in the early nineteenth century. The counts for 1851 and 1881 are available through the Integrated Census Microdata project (Schürer and Higgs 2014). The census places with population and occupations from 1801 to 1891 are not always the same across time. To address boundary changes, researchers at Cambridge University have created consistent spatial units between 1801 and 1891 and linked them with census population data.²² Using similar techniques, we create 9489 consistent units mapping population from 1801 to 1891 and male occupations from 1851 to 1881.²³ We call these `units' for short. Units are 15 square km on average and they belong to larger jurisdictions called registration districts. There are 616 unique registration districts in our data and they average 250 square km.

Very long-run outcomes are studied by merging our 9489 historical units with 34,753 Lower Super Output Areas (LSOAs) with population in 2011.[24] We use the intersect function in ArcMap applied to the boundary lines of LSOAs and the boundary lines of our units.

The population variables are expressed in natural log differences over time (see table 1). The mean 1841 to 1891 log difference is 0.01, which implies mean population growth of about 1 percent between 1841 and 1891. The mean is low in part because some units in central London experienced large population declines due to out-migration of residents. Overall there was an increase in urbanisation. The share of the population living in units with at least 400 persons per square km increased from 42% in 1841 to 68% by 1891.

Our infrastructure data include GIS shapefiles for turnpike roads in 1830, inland waterways in 1680 and 1830, and railway lines and stations in every census year starting in 1831.[25] We also have GIS data on the main roads in 1680 as surveyed by John Ogilby.[26] In all cases, the networks are created using historical sources, improving their accuracy.

21 The Cambridge Group for the History of Population and Social Structure kindly provided this data.
22 For details see https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation.
23 Ms Gill Newton, of the Cambridge Group, developed the Python code for Transitive Closure as part
of the research project `The occupational structure of Britain, 1379-1911' based at the Cambridge Group.
Xuesheng You implemented this code for this particular paper.
24 Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research
Agency (2017): 2011 Census aggregate data. UK Data Service (Edition: February 2017). DOI:
http://dx.doi.org/10.5257/census/aggregate-2011-2.
25 See Rosevear et al. (2017), Martí-Henneberg et al. (2017a, b),
and Satchell, Shaw-Taylor, and Wrigley (2017a, b). For a description see
https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation
26 For a description of the 1680 Ogilby roads data see Satchell (2017).

For ports we draw on a list provided in The Shipowner's and Shipmaster's Directory published in 1842. This source identifies 247 ports in use. It also describes whether loading occurred on the beach and water depths at spring and neap tides. Our baseline model considers all 247 ports regardless of their features. Thresholds for water depth yielded less precise results. We also use a source published in 1787, which lists the main ports in 1680 (see Alvarez et al. 2017).

To analyze infrastructures, a straight line is drawn from the center of each unit to its nearest station, road, waterway, and port. The unit center corresponds to the market square if the unit had a town, or to the centroid if it had no town.[27] The mean distance to an 1851 station is 10.4 km (see table 1). The mean distances to a waterway and to a turnpike road in 1830 were smaller, at 7.2 and 2.0 km respectively. As expected, mean distances to ports were greater, but given that England had such a large network of ports in 1842 the average was only 30.2 km. We use these distances to calculate indicators for being close to infrastructures.
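The step from distances to closeness indicators can be illustrated with a minimal sketch. It assumes planar coordinates in metres and treats each infrastructure as a set of points (for roads and waterways, which are lines, this would mean sampled vertices); the function names and the coordinates are hypothetical and are not taken from the GIS workflow behind the data.

```python
import math

def nearest_distance_km(center, points):
    """Straight-line distance (km) from a unit center to the closest point.

    Assumes planar (x, y) coordinates in metres; this is an assumption of
    the sketch, not a description of the projection used for the data.
    """
    cx, cy = center
    return min(math.hypot(px - cx, py - cy) for px, py in points) / 1000.0

def close_indicator(center, points, threshold_km=2.0):
    """Return 1 if the nearest point is within threshold_km, otherwise 0."""
    return 1 if nearest_distance_km(center, points) < threshold_km else 0

# Hypothetical unit center and station locations (metres).
unit_center = (0.0, 0.0)
stations = [(1200.0, 900.0), (8000.0, 0.0)]

dist = nearest_distance_km(unit_center, stations)
near_station = close_indicator(unit_center, stations)
far_station = close_indicator(unit_center, [(3000.0, 0.0)])
```

The 2 km default mirrors the threshold used for the indicators in table 1; other thresholds can be passed through `threshold_km`.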

An important fact concerns the spread of railway stations over time. In 1841, 1851, and

1861, 4.6%, 13.6% and 19.7% of units had a railway station within 2 km. By 1881 29.9% of

units had railway stations within 2 km.

The geographic data include variables for being on exposed coalfields, being on the coast, ruggedness, average rainfall, average temperature, an index for wheat suitability, and the share of land in 10 different soil types.[28] We call these `first-nature' variables following the literature in economic geography (see Fujita et al. 2001). Coastal is identified using an intersection of the seacoast with unit boundaries. The ruggedness measures include average elevation within units, the average elevation slope, and the standard deviation in elevation slope. See appendix A.2 for details. Rainfall, temperature, and wheat suitability come from FAO.[29] Of special significance, Satchell and Shaw-Taylor (2013) identify those areas with exposed coal bearing strata (i.e. not overlain by younger rocks). Exposed coalfields were more easily exploited by early nineteenth century technology compared to concealed coal.[30]

27 We identify if a market existed at some point between 1600 and 1850. This applies to 746 of the
9489 units. It should be noted that little error is introduced by using the market or the centroid
since units are so small. For a description of towns see
https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation
28 Soils data (c) Cranfield University (NSRI) 2017 used with permission. The 10 soil categories are based
on Avery (1980) and Clayden and Hollis (1985). They include (1) Raw gley, (2) Lithomorphic, (3) Pelosols,
(4) Brown, (5) Podzolic, (6) Surface-water gley, (7) Ground-water gley, (8) Man made, (9) peat soils, and
(10) other. See http://www.landis.org.uk/downloads/classification.cfm#Clayden_and_Hollis. Brown soil
is the most common and serves as the comparison group in the regression analysis.
29 See the Global Agro-Ecological Zones data at http://www.fao.org/nr/gaez/about-data-
portal/agricultural-suitability-and-potential-yields/en/. We selected low input and rain fed for wheat
suitability.
30 For a description see https://www.campop.geog.cam.ac.uk/research/occupations/datasets/catalogues/documentation

Table 1: Summary statistics
Variable Obs. Mean Std. Dev. Min Max
Population growth variables
Ln diff. population 1841 to 1891 9489 0.010 0.513 -3.079 4.874
Ln diff. population 1891 to 2011 9488 0.545 0.965 -4.202 5.617
Infrastructure variables
Distance to rail station in 1851 km 9489 10.45 11.065 0.021 73.12
Distance to LCP km 9489 11.86 16.548 0.000 116.3
Distance to inland waterway 1830 km 9489 7.231 6.501 0.000 48.38
Distance to turnpike road 1830 km 9489 1.983 2.458 0.000 22.47
Distance to port 1842 km 9489 30.20 22.81 0.059 99.71
Indicator distance to rail station in 1851<2km 9489 0.136 0.342 0 1
Indicator distance to inland waterway in 1830 <2km 9489 0.233 0.423 0 1
Indicator distance to turnpike road in 1830<2km 9489 0.662 0.472 0 1
Indicator distance to port in 1842 <2km 9489 0.027 0.163 0 1
First-nature controls
Indicator exposed coal 9489 0.080 0.271 0 1
Indicator coastal unit 9489 0.147 0.355 0 1
Elevation 9489 89.72 74.02 -1.243 524.3
Average elevation slope within unit 9489 4.767 3.615 0.484 37.42
SD elevation slope within unit 9489 3.432 2.717 0 23.17
Average rainfall 9484 755.7 191.7 555 1424
Average temperature 9484 8.958 0.658 5.5 10
Wheat suitability (low input level rain-fed) 9484 2188.1 273.25 272 2503
Land area in sq. km. 9484 15.63 22.18 0.003 499.8
Perc. of land with Raw gley soil 9489 0.084 1.327 0 76.49
Perc. of land with Lithomorphic soil 9489 8.615 19.83 0 100
Perc. of land with Pelosols soil 9489 8.203 20.63 0 100
Perc. of land with Podzolic soil 9489 4.624 14.32 0 99.56
Perc. of land with Surface-water gley soil 9489 24.63 29.46 0 100
Perc. of land with Ground-water gley soil 9489 10.187 20.11 0 100
Perc. of land with Man made soil 9489 0.363 3.262 0 94.99
Perc. of land with Peat soil 9489 1.187 5.279 0 91.44
Perc. of other soil 9489 0.535 1.966 0 65.15
Second nature controls
Ln 1841 population per sq. km 9489 4.209 1.346 0.805 11.53
Share of male tertiary empl. in 1851 9489 0.149 0.109 0 0.941
Share of male secondary empl. in 1851 9489 0.196 0.123 0 0.800
Share of male agricultural empl. in 1851 9489 0.553 0.227 0 1
Share of male mining & forestry empl. in 1851 9489 0.025 0.076 0 0.745
Share of male unspecified empl. in 1851 9489 0.074 0.090 0 0.760
Ln distance to major city in 1801 9487 4.756 0.620 0.594 6.037
Sources: see text.

8% of our units are on exposed coalfields.

We have another set of unit-level variables called `second-nature' factors. These include distance to one of the ten largest cities in 1801, log population density in 1841, and 1851 male occupational shares in five categories: (1) tertiary, (2) agriculture, (3) secondary, (4) mining/forestry, and (5) unspecified.[31] Population density in 1841 varied significantly, although much was concentrated near large cities, like Manchester and London. Male occupational structures also exhibit concentration in 1851, especially in secondary employment. The top 1% of units accounted for 57% of male secondary employment in 1851.[32]

Figure 2 shows the kernel density estimates for the distribution of population growth from 1841 to 1891 depending on whether units are within 2 km of various infrastructures. The first panel clearly shows that units within 2 km of 1851 railway stations tended to have higher growth than units more than 2 km from 1851 stations. There were some exceptions, however, as growth was sometimes negative for units within 2 km of stations, as indicated by the longer left tail. Panels b to d show a similar pattern for being within 2 km of 1830 inland waterways and turnpike roads and 1842 ports. Hence there is some initial evidence that railways were one of several infrastructures increasing population growth in the second half of the nineteenth century.

6 Main results
In this section, we estimate how population growth was affected by infrastructures. We begin by analyzing the following `long differences' specification:

$$y_{i,1891} - y_{i,1841} = \beta_1 I^{\text{rail}}_{i,1851} + \beta_2 I^{\text{pre-rail}}_{i,1840} + \gamma x_i + \varepsilon_i \qquad (2)$$

where $y_{i,1891} - y_{i,1841}$ is the natural log difference in population for unit $i$. The initial year 1841 is chosen because there were few railway stations open in 1831. 1891 is the last year for which we have historical data.

One main explanatory variable is $I^{\text{rail}}_{i,1851}$, equal to one if unit $i$ is within 2 km of a railway station in 1851 and 0 otherwise. 1851 is chosen because the rail network underwent its largest 10-year expansion in the 1840s. As robustness, we check whether station proximity in earlier or later years changes the conclusions. 2 km is chosen because it takes approximately 30 minutes to walk 2 km. We think 30 minutes represents a typical commute time for

31 Here we follow the primary, secondary, and tertiary (PST) coding system described in detail in
Shaw-Taylor et al. (2014) and Wrigley (2015). We do not code female occupations because there is less agreement
in the literature (see You 2014).
32 For more details on occupational structure see Shaw-Taylor and Wrigley (2014).

Figure 2: The distribution of population growth and infrastructure access

Sources: see text.

individuals who worked near the station or for firms carting their goods to the station for quick delivery. However, this assumption is based on limited data, and therefore in a later section, we consider greater distances.[33] Other main explanatory variables are included in $I^{\text{pre-rail}}_{i,1840}$. They are three indicators identifying whether a unit is within 2 km of turnpike roads, inland waterways, and ports.

There are two sets of control variables included in $x_i$. The first nature controls are those listed in table 1 plus the squares of temperature and rainfall, which allow for non-linear effects in climate variables. The second nature controls are also listed in table 1. The log of 1841 population density accounts for the regularity that initially dense units tend to grow less. The 1851 male occupational shares address the possibility that areas more specialized in agriculture grow less. Note that roads and canals built in the 1700s may have

33 In support of this assumption, Heblich, Redding, and Sturm (2018) use data from a single London firm
to show that 90% of workers lived within 5 km of their workplace from 1857 to 1877.

caused development by 1841. Therefore, specifications with second nature controls hide some of their effects. However, they capture persistent effects of roads and canals in the era supposedly dominated by railways.

Our list of control variables is large, but even so there are some factors that cannot be measured. We use several approaches to address unobserved heterogeneity. First, we include registration district fixed effects. Districts are approximately 250 square km, and within such an area there were factors affecting growth that are similar across units. Our second approach recognizes that even within a district there could be unobservable factors correlated with infrastructures. Some can be captured by a variable for population growth in the decades before railways. Other approaches include panel regressions, propensity score matching, and instrumental variables. These approaches are discussed in the next section.

The main coefficient estimates for equation 2 are shown in table 2. The standard errors are clustered on registration districts. In column (1), the only explanatory variable is the indicator for units within 2 km of 1851 stations, which is associated with 24.5 log points higher population growth (approximately 28 percentage points or pp). Column (2) adds indicators for pre-rail infrastructures. Being within 2 km of inland waterways and turnpike roads has positive and significant effects on population growth equal to 9.1 and 6.4 log points respectively. Being within 2 km of ports is associated with 13.9 log points higher growth but the coefficient is not precisely estimated.

We now consider specifications that include more controls. Column (3) in table 2 adds the first nature controls. The estimates for these additional variables are not shown to save space. Interested readers should consult table 10 in appendix A.3.[34] The coefficients for railways, turnpike roads, and inland waterways change little. In fact, the estimates become more precise. But the estimate for ports falls substantially and becomes close to zero. Examining this specification more closely we find that being coastal is correlated with being within 2 km of a port. We will return to the impact of ports later. The specification in column (4) adds second nature controls. The estimates are broadly similar except the railway coefficient increases in magnitude and the turnpike and inland waterway coefficients decrease. The latter makes sense because some of the contribution of roads and waterways is being captured by population density in 1841 and occupational shares in 1851. The specification in column (5) adds 616 district fixed effects (FEs). The coefficients on infrastructures decline but they remain significant. The specification in (6) adds a control for unit population growth from 1801 to 1831. The results are nearly identical, diminishing concerns about

34 Of most importance we nd that units with coal have 36.7 higher log points population growth from
1841 to 1891.

pre-trends.

In our preferred model (5), being close to stations increased the annual growth rate by 0.3%, while being close to inland waterways increased the annual growth rate by 0.1%. In terms of beta coefficients, a one standard deviation increase in the station variable increases population growth by 0.106 standard deviations. A one standard deviation increase in the inland waterway and turnpike variables increased population growth by 0.046 and 0.035 standard deviations. These results imply that in terms of explaining population growth, being close to railways was more important. Nevertheless, it is striking that roads and inland waterways had quantitatively significant effects even after accounting for railways and other factors.
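The arithmetic behind these magnitudes can be sketched as follows. The sketch assumes the usual definitions: a log-point coefficient is annualised by spreading it over the 50 years from 1841 to 1891, and a beta coefficient is the coefficient multiplied by the ratio of the standard deviation of the regressor to that of the dependent variable. Applying these definitions to the column (5) coefficients in table 2 and the standard deviations in table 1 gives values matching those quoted above to three decimals; the function names are illustrative only.

```python
import math

def total_growth_pct(log_points):
    """Percent growth implied by a log difference (e.g. 0.245 log points)."""
    return (math.exp(log_points) - 1.0) * 100.0

def annual_growth_pct(log_points, years):
    """Average annual percent growth implied by a log difference over `years`."""
    return (math.exp(log_points / years) - 1.0) * 100.0

def beta_coefficient(coef, sd_x, sd_y):
    """Standardised (beta) coefficient: coef * sd(x) / sd(y)."""
    return coef * sd_x / sd_y

SD_GROWTH = 0.513  # std. dev. of ln diff. population 1841 to 1891 (table 1)

station_annual = annual_growth_pct(0.159, 50)             # table 2, column (5)
station_beta = beta_coefficient(0.159, 0.342, SD_GROWTH)
waterway_beta = beta_coefficient(0.056, 0.423, SD_GROWTH)
turnpike_beta = beta_coefficient(0.038, 0.472, SD_GROWTH)
raw_station_growth = total_growth_pct(0.245)              # table 2, column (1)
```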

Table 2: Access to infrastructures and local population growth: baseline estimates

Dep. var.: unit pop. growth 1841 to 1891          (1)         (2)         (3)         (4)         (5)         (6)
Variable                                          coeff.      coeff.      coeff.      coeff.      coeff.      coeff.
                                                  (std. err.) (std. err.) (std. err.) (std. err.) (std. err.) (std. err.)
Indicator dist. to rail station in 1851 <2km      0.245***    0.186*      0.173***    0.199***    0.159***    0.159***
                                                  (0.114)     (0.099)     (0.051)     (0.023)     (0.019)     (0.020)
Indicator dist. to inland waterway in 1830 <2km               0.091**     0.081***    0.068***    0.056**     0.054**
                                                              (0.042)     (0.019)     (0.020)     (0.019)     (0.019)
Indicator dist. to turnpike road in 1830 <2km                 0.064***    0.064***    0.046***    0.038***    0.038***
                                                              (0.015)     (0.015)     (0.011)     (0.010)     (0.010)
Indicator dist. to port in 1842 <2km                          0.130       0.009       0.041       0.094       0.094
                                                              (0.088)     (0.074)     (0.068)     (0.061)     (0.062)

First nature controls                             No          No          Yes         Yes         Yes         Yes
Second nature controls                            No          No          No          Yes         Yes         Yes
Registration district fixed effects               No          No          No          No          Yes         Yes
Control for pop. growth 1801 to 1831              No          No          No          No          No          Yes
N                                                 9489        9489        9484        9482        9482        9478
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1.

The preceding conclusion is supported in alternative specifications. These are reported in appendix A.3 table 11. One alternative uses indicators for units within 2 km of 1841, 1861, or 1871 stations instead of 1851 stations. The coefficients are very similar. Using different dates for nearby railway stations does not matter. A second alternative specification uses an indicator for whether a unit has any railway line in its boundaries. Our estimates show that units with any railway line had 16.4 more log points of growth. One might find it surprising that indicators for stations and railway lines have similar estimated effects. We think the high number of stations and relative uniformity across lines in England and Wales meant that units close

to railway lines were generally close to stations.

A third alternative specification includes an indicator if the unit had more than one station. For reference, 2.4% of units had more than one station in 1851. The results in table 11 of appendix A.3 show these units had 27 log points higher growth compared to units without any stations. The coefficients for one station, inland waterways, and turnpike roads are similar to before. These findings make sense since greater station density offered more local and long-distance connections.[35]

A fourth alternative examines the effects of infrastructures over different periods. We regress population growth from 1841 to 1861 on the same variables including all controls. The same is done for growth from 1841 to 1871 and so on up to 1841 to 1891. The effects of infrastructure could diminish with time, in which case the coefficient should stay the same or increase slightly as the time frame increases. The results are reported in table 3. The effects of railways diminished little. In the specification for 1841 to 1861, the 0.074 coefficient implies a 0.35% higher annual growth rate. For 1841 to 1891, the coefficient 0.159 implies a 0.30% higher annual growth rate.

The effects of turnpikes and inland waterways are small and insignificant from 1841 to 1871. This era marked the peak of railway influence as turnpike and canal companies failed to compete. However, after 1871 their impact becomes larger and more significant. These findings suggest that turnpike roads and inland waterways were put to new uses after 1871.

The estimated effects of ports diminish with time. For example, being within 2 km of ports increases the annual growth rate by 0.33% up to 1861 and by 0.18% up to 1891. These findings are consistent with (1) shipping playing an important role in the mid-nineteenth century and (2) railways eventually making inroads into markets previously dominated by coastal shipping.

35 In another related specification, we use log meters of 1851 railway line per square km, log meters of
1830 turnpike road per square km, and log meters of 1830 waterway per square km. The results show a similar
importance of railways. Railway density has a beta coefficient of 0.144, waterway density has a beta
coefficient of 0.026, and turnpike road density has a beta coefficient of 0.04.

Table 3: Access to infrastructures and local population growth over different periods

Dep. var.: unit pop. growth in                    1841 to 61  1841 to 71  1841 to 81  1841 to 91
                                                  (1)         (2)         (3)         (4)
Variable                                          coeff.      coeff.      coeff.      coeff.
                                                  (std. err.) (std. err.) (std. err.) (std. err.)
Indicator dist. to rail station in 1851 <2km      0.074***    0.106***    0.143***    0.159***
                                                  (0.009)     (0.013)     (0.016)     (0.019)
Indicator dist. to inland waterway in 1830 <2km   0.006       0.017       0.032**     0.054***
                                                  (0.007)     (0.011)     (0.014)     (0.019)
Indicator dist. to turnpike road in 1830 <2km     0.004       0.007       0.029***    0.038***
                                                  (0.005)     (0.007)     (0.009)     (0.010)
Indicator dist. to port in 1842 <2km              0.069***    0.071**     0.085*      0.094
                                                  (0.026)     (0.032)     (0.046)     (0.062)

First nature controls                             Yes         Yes         Yes         Yes
Second nature controls                            Yes         Yes         Yes         Yes
Registration district fixed effects               Yes         Yes         Yes         Yes
Control for pop. growth 1801 to 1831              Yes         Yes         Yes         Yes
N                                                 9478        9478        9478        9478
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1.

7 Addressing endogeneity of stations


Railway stations were not randomly assigned across space and even with controls our previous estimates could be biased. An upward bias is the most obvious concern. Railway companies might have selected units with better growth prospects to earn higher future revenues. If there is an upward bias then perhaps we are over-stating the relative importance of railways. In this section, we use three approaches to assess the bias and its direction.

The first approach estimates a panel regression with unit and census year fixed effects. Recall we observe population and whether a unit has a station nearby in each decade from 1801 to 1891. Estimates from a panel regression are shown in appendix A.4 table 14. Briefly, they show getting a railway station within 2 km raised unit population density by 16%. They also reveal pre-trends, in which population grew before stations opened. Therefore, panel regression estimates could also be biased because the parallel trends assumption is violated.

The second approach applies propensity score matching. The treatment variable is being within 2 km of an 1851 railway station. We use a parsimonious set of matching covariates: (1) population density in 1841, (2) the share of male agricultural employment in 1851, (3) having exposed coal, and (4) population growth from 1801 to 1831. We match exactly one

Table 4: Matching estimator for effect of distance to railway stations

Units within 2 km of 1851 stations (1 vs. 0)
Covariate                             Standardized differences (raw)       Variance ratio (raw)
Ln pop. per sq. km 1841               1.071                                8.184
Has exposed coal                      0.291                                2.123
Share of 1851 male emp. in agric.     -1.137                               1.753
Ln difference pop. 1831 and 1801      0.230                                2.517
N                                     9,489

Covariate                             Standardized differences (matched)   Variance ratio (matched)
Ln pop. per sq. km 1841               0.007                                1.010
Has exposed coal                      -0.020                               0.938
Share of 1851 male emp. in agric.     0.004                                0.922
Ln difference pop. 1831 and 1801      -0.016                               1.153
N                                     9,485

Units within 2 km of 1851 stations (1 vs. 0)
Av. Ln diff. pop. 1891 and 1841       Diff. in means, raw data             Diff. in means, matched data
                                      (standard error)                     (robust standard error)
0.010                                 0.244                                0.206
                                      (0.015)***                           (0.022)***
N                                     9,489                                9,485
Notes: * p<0.05, ** p<0.01, *** p<0.001.

nearest neighbor using the logit model. This set of covariates yields a balanced matched sample. Table 4 shows the standardized differences in the covariate means are close to zero in the matched sample but not in the raw data. The bottom rows of table 4 show the average difference in means for population growth from 1841 to 1891. In the raw data, units within 2 km of stations have 24.4 log points higher population growth. In the matched sample they have 20.6 log points higher growth. By comparison, our preferred OLS specification in table 2 implied being near railways increased population growth by 15.9 log points. Thus, our parsimonious matching exercise implies slightly larger effects.
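The balance diagnostics of the kind reported in table 4 can be illustrated with a short sketch. It assumes the common definitions: the standardized difference is the difference in group means divided by the square root of the average of the two group variances, and the variance ratio is the treated variance over the control variance. Whether table 4 uses exactly these formulas is an assumption, and the covariate values below are hypothetical.

```python
from statistics import mean, variance

def balance_stats(treated, control):
    """Return (standardized difference, variance ratio) for one covariate.

    Assumed definitions (not confirmed by the text):
      std. diff. = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)
      var. ratio = var_t / var_c
    """
    m_t, m_c = mean(treated), mean(control)
    v_t, v_c = variance(treated), variance(control)
    std_diff = (m_t - m_c) / ((v_t + v_c) / 2.0) ** 0.5
    return std_diff, v_t / v_c

# Hypothetical covariate values (e.g. log population density in 1841).
treated_units = [5.0, 6.0, 7.0]   # units within 2 km of a station
control_units = [3.0, 4.0, 5.0]   # units farther away

std_diff, var_ratio = balance_stats(treated_units, control_units)
```

A well-balanced matched sample would show standardized differences near zero and variance ratios near one, which is the pattern in the matched panel of table 4.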

Our third and most detailed approach uses an instrumental variable derived from the `inconsequential places' approach.[36] The key assumption is that some units became close to railway stations simply because they were on the route designed to connect larger towns at a low capital cost. In other words, they were not selected based on their potential for future growth. The first step in creating the instrument is to select the towns that will be connected by railways. We start with all English and Welsh towns having a population greater than 5000 in 1801.[37] Their larger size meant they were almost certain to get at least one railway

36 See Chandra and Thompson (2000), Michaels (2008), Faber (2014), and Lipscombe et al. (2013).
37 The data come from Law (1967) and Robson (2006).

line connecting them with another town above 5000. But not all large town-pairs would be connected. Existing levels of trade and communication were often lower between distant towns or towns of moderate size. A profit-seeking promoter would see little value in building a railway to connect them. We use a simple gravity model (GM) to calculate the relative value of connecting any town-pairs each with a population above 5000. The equation for town pairs $i$ and $j$ is $GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}}$, where $Dist_{ij}$ is the straight line distance between town $i$ and $j$.
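The gravity calculation can be sketched for a handful of towns. The town names, populations, distances, and the threshold below are all hypothetical; in particular, the units in which population and distance enter the formula are not stated here, so the threshold is passed as a parameter and should not be read as the one used to build the instrument.

```python
from itertools import combinations

def gravity(pop_i, pop_j, dist_ij):
    """Gravity value for a town pair: Pop_i * Pop_j / Dist_ij."""
    return pop_i * pop_j / dist_ij

def candidate_pairs(populations, distances, threshold):
    """Return (town_a, town_b, GM) for every pair whose GM exceeds threshold.

    populations: dict mapping town name -> population
    distances:   dict mapping frozenset({a, b}) -> straight-line distance
    """
    selected = []
    for a, b in combinations(sorted(populations), 2):
        gm = gravity(populations[a], populations[b], distances[frozenset((a, b))])
        if gm > threshold:
            selected.append((a, b, gm))
    return selected

# Hypothetical towns and distances, for illustration only.
populations = {"A": 6000, "B": 8000, "C": 50000}
distances = {
    frozenset(("A", "B")): 400.0,
    frozenset(("A", "C")): 50.0,
    frozenset(("B", "C")): 100.0,
}

pairs = candidate_pairs(populations, distances, threshold=1_000_000)
pair_names = [(a, b) for a, b, _ in pairs]
```

In this toy example the two small, distant towns A and B fall below the threshold, while both pairs involving the large town C are retained.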
Next we identified a least cost path (LCP) connecting town pairs above a threshold $GM_{ij} > 10{,}000$.[38] We assume that in considering their routes, railway companies tried to minimize the construction costs considering distance and elevation slope. We use construction cost data for railways built in the 1830s and early 1840s. We also measure the distance of the lines and total elevation changes between towns at the two ends of the line. The construction cost is then regressed on the distance and the elevation change to identify the parameters (the details are in appendix A.1). Based on this analysis we find a baseline construction cost per km when the slope is zero, and for every 1% increase in slope the construction cost rises by three times the baseline ($\text{cost per km} = 1 + 3 \times \text{slope}\%$). Next, we use this formula to identify the LCP connecting our town pairs above the threshold. The result is a network of candidate railway lines.
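A least cost path under this cost formula can be sketched with Dijkstra's algorithm on a small elevation grid. The grid, the 1 km cell size, and the restriction to four-neighbour moves are assumptions made for illustration; the GIS procedure actually used to trace the LCP is not described here, and only the cost rule (cost per km = 1 + 3 x slope%) comes from the text.

```python
import heapq

def step_cost(elev_a, elev_b, step_km):
    """Cost of one move: step_km * (1 + 3 * slope%), elevations in metres."""
    slope_pct = abs(elev_b - elev_a) / (step_km * 1000.0) * 100.0
    return step_km * (1.0 + 3.0 * slope_pct)

def least_cost_path(elev, start, goal, step_km=1.0):
    """Dijkstra search over a grid of elevations with four-neighbour moves.

    Returns (total cost, list of (row, col) cells from start to goal).
    """
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best[node]:
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + step_cost(elev[r][c], elev[nr][nc], step_km)
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[goal], path[::-1]

# Hypothetical 3 x 3 grid (metres) with a 50 m hill in the middle cell.
elevation = [
    [0, 0, 0],
    [0, 50, 0],
    [0, 0, 0],
]
lcp_cost, lcp_path = least_cost_path(elevation, start=(1, 0), goal=(1, 2))
```

Crossing the hill directly would take two steps at a 5% slope, costing 16 each, so the cheapest route goes around it over four flat cells.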

The LCP network is shown in the right of figure 3. The left shows the real railway network in 1851. The overlap is fairly high. Locations close to the LCP are also generally close to railway stations because they were so numerous along the line.

We use an indicator for being within 2 km of the LCP as our instrument for within 2 km of stations. The exclusion restriction requires that the instrument only affects population growth between 1841 and 1891 through its effect on station access. We think this assumption is plausible under two conditions. First, units containing the town nodes used to construct the LCP are excluded. They were clearly targeted by railways for their size and possibly their growth potential. Second, the regression model should contain distance to pre-rail infrastructures as control variables. If omitted, then one might worry the instrument affects growth partly through road and waterway access.

We provide a `plausibility check' for the exclusion restriction by testing whether being less than 2 km from the LCP is correlated with unit population growth between 1801 and 1831. The results are shown in table 5. Note in all specifications we exclude 364 units within 2 km of the town nodes used to construct the LCP. The standard errors are always clustered

38 The 10,000 threshold is arbitrary, but as shown below this threshold does a good job predicting the
location of lines and stations.

Figure 3: The rail network in 1851 and the least cost path (LCP) network

Sources: see text.

on the registration district. In column (1), being within 2 km of the LCP is positively and significantly associated with higher population growth from 1801 to 1831. Columns (2) and (3) show the same result holds after including district FEs and first nature controls. The conclusion changes in column (4), which adds pre-rail infrastructure controls. In this specification, being within 2 km of the LCP is not significantly associated with higher population growth from 1801 to 1831. In column (5) we add second nature controls and the results are unchanged. Similar specifications use decade population growth (e.g. from 1811 to 1821) as dependent variables. The results are reported in appendix A.3 table 12. None finds a large and significant effect from being within 2 km of the LCP.

Table 5: Pre-trend tests for the validity of instrument distance to LCP

Dep. var.: unit pop. growth 1801 to 1831          (1)         (2)         (3)         (4)         (5)
Variable                                          coeff.      coeff.      coeff.      coeff.      coeff.
                                                  (std. err.) (std. err.) (std. err.) (std. err.) (std. err.)
Distance to LCP for railways <2km                 0.0303***   0.0188*     0.0178*     0.0098      0.0057
                                                  (0.008)     (0.007)     (0.007)     (0.007)     (0.007)
Indicator dist. to inland waterway in 1830 <2km                                       0.0362***   0.0181*
                                                                                      (0.008)     (0.007)
Indicator dist. to turnpike road in 1830 <2km                                         0.0325***   0.0111*
                                                                                      (0.005)     (0.005)
Indicator dist. to port in 1842 <2km                                                  0.0806**    0.0566*
                                                                                      (0.026)     (0.025)

Units within 2 km of LCP nodes removed?           Yes         Yes         Yes         Yes         Yes
First nature controls                             No          No          Yes         Yes         Yes
Second nature controls                            No          No          No          No          Yes
Registration district fixed effects               No          Yes         Yes         Yes         Yes
N                                                 9121        9121        9116        9116        9114
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1.

The previous results confirm the importance of including pre-rail infrastructures as controls in the IV specification for railway access. However, one might worry that turnpike roads, waterways, and ports are endogenous too. To address this issue, we also estimate a specification instrumenting with indicators for the LCP and 1680 main roads, waterways, and ports. Using historic infrastructure to instrument for later infrastructure is another common approach in the literature (e.g. Duranton and Turner 2012).

The IV results for the baseline model are shown in column (2) of table 6 along with the OLS for comparison in (1).[39] The Kleibergen-Paap F statistic is fairly large, indicating there is not a weak instruments problem. The IV estimate implies that being within 2 km of a railway station caused population growth to rise by 35.5 log points. In OLS the same estimate is 16.5 log points. Note that the IV estimate is less precise, but it is statistically significant at the 10% level. The lower precision is expected given the instrument needs to predict whether a unit is within 2 km of a station.
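The logic of instrumenting a binary treatment with a binary instrument can be illustrated with the simple Wald estimator: the difference in mean outcomes between units near and far from the LCP, divided by the difference in the share of those units that are near a station. This sketch omits the controls, district fixed effects, and clustered standard errors used for the estimates in table 6, and the data are hypothetical.

```python
from statistics import mean

def wald_iv(y, x, z):
    """Wald IV estimate with a binary instrument z.

    y: outcomes, x: treatment indicators, z: instrument indicators (0 or 1).
    Returns (reduced form) / (first stage), i.e.
    (E[y|z=1] - E[y|z=0]) / (E[x|z=1] - E[x|z=0]).
    """
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    x1 = mean([xi for xi, zi in zip(x, z) if zi == 1])
    x0 = mean([xi for xi, zi in zip(x, z) if zi == 0])
    return (y1 - y0) / (x1 - x0)

# Hypothetical data: z = near the LCP, x = near a station, y = log growth.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]
y = [0.3 * xi for xi in x]  # growth is 0.3 log points higher near a station

iv_estimate = wald_iv(y, x, z)
```

Because the hypothetical outcome depends on the treatment only, the estimator recovers the 0.3 effect; a weak first stage (a small denominator) is what makes IV estimates imprecise.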

39 The OLS and IV models exclude units within 2 km of the LCP nodes. The estimates are similar if they are
included, but we think it is appropriate to drop them as explained earlier.

Table 6: Railway stations and population growth: IV estimates

Dep. var.: unit pop. growth 1841 to 1891              OLS         IV rail     IV rail     IV rail, road, water, and port
                                                      (1)         (2)         (3)         (4)
Variable                                              coeff.      coeff.      coeff.      coeff.
                                                      (std. err.) (std. err.) (std. err.) (std. err.)
Indicator distance to 1851 railway station <2km       0.165***    0.355*      0.368*      0.376*
                                                      (0.019)     (0.193)     (0.191)     (0.196)
Indicator distance to 1830 inl. waterway <2km         0.048**     0.031       0.027       -0.012
                                                      (0.019)     (0.026)     (0.026)     (0.037)
Indicator distance to 1830 turnpike road <2km         0.034***    0.027**     0.027**     0.127**
                                                      (0.009)     (0.011)     (0.011)     (0.046)
Indicator distance to 1842 port <2km                  0.141**     0.139**     0.140**     0.271**
                                                      (0.068)     (0.067)     (0.065)     (0.138)

Kleibergen-Paap rk Wald F statistic                               33.12       33.64       8.24
Variables for pop. growth 1821 to 31 and 1831 to 41   No          No          Yes         No
Units within 2 km of LCP nodes removed?               Yes         Yes         Yes         Yes
First nature controls                                 Yes         Yes         Yes         Yes
Second nature controls                                Yes         Yes         Yes         Yes
Registration district fixed effects                   Yes         Yes         Yes         Yes
N                                                     9118        9118        9118        9118
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1. The instrument for being within 2 km of rail is an indicator for being within 2 km of
the LCP. The instruments for being within 2 km of 1830 turnpike roads, inland waterways, and ports are the equivalent for
1680 main roads, waterways, and ports.

The increased size of the IV estimate is contrary to the argument that railways were built in units with unobservable factors related to higher growth. One speculation is that individuals anticipated the building of railways. In order to gain from increased property values or employment, they moved to future railway units prior to 1841. In that case, one might expect OLS to yield a downward-biased estimate for population growth from 1841 to 1891. We test for this type of mechanism by including controls for population growth from 1821 to 31 and from 1831 to 41. They would capture the movement into units just before railway stations opened. The IV estimates are fairly similar, as shown in column (3) of table 6. Therefore, it does not appear that OLS is downward biased because of anticipation effects.

There is another explanation. According to Kellett (2012), railway companies sometimes
selected routes that went through dilapidated residential areas. These presented less political
opposition and they tended to have a single landowner, making right-of-way negotiations
easier. If this was generally true, then units within 2 km of stations probably had negative
growth potential in the absence of railways.

The final IV specification instruments for all infrastructure variables (column 4). The
Kleibergen-Paap F statistic is smaller in this case, so these results need to be interpreted
with caution. The estimated effect of being close to stations is very similar, suggesting
our estimate for railways is not affected by endogeneity of turnpike roads, waterways, and
ports. These results also show that units close to ports and turnpike roads grew significantly
more even in the IV model. The effect of inland waterways is close to zero, but the same
is true in the previous IV specifications (see columns 2 and 3 in table 6). Overall the IV
results further confirm the importance of at least three infrastructures for nineteenth century
growth (ports, turnpikes, and railways).

8 Reorganization and Heterogeneous effects


Our analysis thus far does not account for spatial reorganization. Yet there is some
evidence it mattered. One of the leading historians argues "the railway did not necessarily
produce growth in population or business. It might take people or business away" (Simmons
1986, p. 16). Redding and Turner (2015) propose a method to identify reorganization effects.
They suggest defining a control group more distant from infrastructure and comparing it
with a set of 'treated' groups nearby. In our setting one might expect that units just beyond
the 1.5 or 2-hour commuting distance to infrastructures (6 or 8 km) might lose population
due to out-migration to closer units. To identify such an effect, we use units beyond 10 km
as the control group. This approach is not perfect because we don't know if units 10 km
away from infrastructures are truly unaffected. Nevertheless, this approach yields insights
on the relative growth effects of infrastructure at varying distances up to 10 km.

We estimate a model with five distance bins to stations, inland waterways, and ports: 0
to 2, 2 to 4, 4 to 6, 6 to 8, and 8 to 10 km. For turnpikes, only around 1% of units were more
than 10 km away, so we continue to use the simple indicator for being less than 2 km as the only
treatment. The results are reported in table 7. Units 0 to 2 km, 2 to 4 km, and 4 to 6
km from 1851 stations all have higher population growth relative to units more than 10 km
from stations. We also find that population growth is not significantly different from zero
in units between 6 and 10 km from stations.
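The construction of the distance-bin indicators can be sketched as follows. This is an illustration under our own assumptions: the helper names `distance_bin` and `bin_indicators` are hypothetical, and the treatment of units lying exactly on a bin boundary (assigned to the farther bin here) is a choice the text does not specify.

```python
# Bin edges in km; units at or beyond the last edge form the control group.
BIN_EDGES_KM = [0, 2, 4, 6, 8, 10]


def distance_bin(distance_km):
    """Return the bin label (e.g. '2-4') for a distance, or None for the control group."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    for lo, hi in zip(BIN_EDGES_KM[:-1], BIN_EDGES_KM[1:]):
        if lo <= distance_km < hi:
            return f"{lo}-{hi}"
    return None  # 10 km or more: omitted (control) category


def bin_indicators(distance_km):
    """Return a dict of 0/1 indicators, one per bin, for use as regressors."""
    label = distance_bin(distance_km)
    return {
        f"{lo}-{hi}": int(label == f"{lo}-{hi}")
        for lo, hi in zip(BIN_EDGES_KM[:-1], BIN_EDGES_KM[1:])
    }


print(bin_indicators(5.0))   # a unit 5 km from the nearest station
print(bin_indicators(12.0))  # a control unit: all indicators are zero
```

Because the control group is the omitted category, each coefficient on a bin indicator is read as growth relative to units more than 10 km away.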

Table 7: Population growth at varying distances from infrastructures

variable                    coeff.       variable                 coeff.       variable             coeff.
                            (std. err.)                           (std. err.)                       (std. err.)
rail station <2km           0.225***     waterway <2km            0.068***     Port <2km            0.168**
                            (0.027)                               (0.025)                           (0.068)
rail station >2km & <4km    0.1087***    waterway >2km & <4km     0.039*       Port >2km & <4km     0.167***
                            (0.023)                               (0.021)                           (0.045)
rail station >4km & <6km    0.0438**     waterway >4km & <6km     0.028        Port >4km & <6km     0.018
                            (0.021)                               (0.018)                           (0.026)
rail station >6km & <8km    0.071        waterway >6km & <8km     -0.007       Port >6km & <8km     0.015
                            (0.020)                               (-0.018)                          (0.028)
rail station >8km & <10km   0.0133       waterway >8km & <10km    -0.010       Port >8km & <10km    -0.001
                            (0.022)                               (0.016)                           (0.021)
turnpike <2km               0.030***
                            (0.010)

First nature controls       Yes
Second nature controls      Yes
Reg. district fixed effects Yes
N                           9482

Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district. For the list of first and second nature
controls see table 1.

The estimates for waterways and ports yield a similar conclusion. Units within 4 km
had higher growth compared to units more than 10 km from inland waterways and ports.
By contrast, units between 4 and 10 km of either did not grow significantly less than units
more than 10 km away. Also striking is that growth in units within 4 km of ports increased by
approximately the same amount as in units within 4 km of railway stations. However, ports
cannot explain as much of the variation. The beta coefficients for having railway stations
within 0 to 2 and 2 to 4 km are 0.151 and 0.078. The corresponding beta coefficients for
having ports within 0 to 2 and 2 to 4 km are 0.053 and 0.067.
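For reference, a beta (standardized) coefficient is conventionally obtained by rescaling the raw coefficient by a ratio of standard deviations. The expression below assumes this conventional definition, which the text does not spell out; the symbols $\hat{b}_k$, $\sigma_{x_k}$, and $\sigma_y$ are notation introduced here for illustration.

```latex
\beta^{\mathrm{std}}_k = \hat{b}_k \, \frac{\sigma_{x_k}}{\sigma_y}
```

Under this definition, an indicator that equals one for relatively few units has a small standard deviation, so it can carry a raw coefficient similar to that of stations and still have a smaller beta coefficient.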

The main takeaway from table 7 is that units just outside the commuting zone of stations
or other infrastructures did not grow less than units farther from the commuting zone. This
finding does not imply that spatial reorganization was absent. Railways might have pulled
in population equally from units between 6 and 10 km as they did from units between 10
and 15 km or farther. In other words, these results do not rule out higher net migration as
the key reason population grew more near railway stations.

The significance of migration over fertility and mortality effects is supported by a more
aggregated data analysis. We observe the fertility rate (number of children per woman)
and the infant mortality rate (number of children per 1,000 born who died before their first
birthday) at the sub-registration district level at each decennial census from 1851.⁴⁰ Sub-districts
are 70 square km on average and equal about 4 or 5 of our units. There is also
data at the sub-district level on the percentage of the population that is Irish born and the
number of working age men per 100 working age women. The percentage born in Ireland
is a good indicator of in-migration. The sex ratio is more subtle. A decline in the ratio of working age
males to females is thought to have been caused by greater in-migration of young women to
work as servants. Of course, this assumes women are more mobile than men, which is not
true in all cases.

To make use of this data we need to match sub-districts across time. Unfortunately, the
sub-districts are not always spatially consistent. We matched sub-districts in 1851 and 1861
based on name and total land area. At this step, we lose about 8% of sub-districts due to
boundary and name changes. We also link our earlier units to sub-districts to identify which
had at least one railway station in 1851. This second step reduces our sample to about 75%
of all sub-districts owing to inconsistencies in names.
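A matching rule of this kind can be sketched as below. The exact rule used in the paper is not stated, so everything specific here is an assumption made for illustration: the function name `match_subdistricts`, the name normalisation, the 5% relative tolerance on land area, and the place names and areas in the toy data.

```python
def match_subdistricts(units_1851, units_1861, area_tolerance=0.05):
    """Return names of 1851 sub-districts that have a 1861 counterpart.

    A match requires the same normalised name and a land area within a
    relative tolerance (hypothetical default: 5%). Inputs are lists of
    (name, land_area) tuples.
    """
    areas_by_name = {}
    for name, area in units_1861:
        areas_by_name.setdefault(name.strip().lower(), []).append(area)

    matched = []
    for name, area in units_1851:
        candidates = areas_by_name.get(name.strip().lower(), [])
        if any(abs(a - area) <= area_tolerance * area for a in candidates):
            matched.append(name)
    return matched


# Invented toy data: (name, land area in square km).
subdistricts_1851 = [("Ashford", 70.0), ("Bexley", 55.0), ("Colne", 80.0)]
subdistricts_1861 = [("ashford ", 71.0), ("Bexley", 90.0), ("Dover", 60.0)]

print(match_subdistricts(subdistricts_1851, subdistricts_1861))
```

In this toy case one unit is lost to a boundary change (same name, very different area) and one to a name change, which mirrors the two sources of attrition described above.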

Table 8 reports specifications that regress the change in demographic or migration variables
from 1851 to 1861 on an indicator for having at least one station in 1851.⁴¹ Panel A
reports specifications for the change in fertility. Column (1) includes only the station variable.
It shows the fertility rate decreased more in sub-districts with stations. On average
fertility rates change by -0.014 and therefore the coefficient -0.048 is fairly large. Column
(2) adds a quadratic in sub-district latitude and longitude. The coefficient on stations is
similar. Column (3) adds county fixed effects. Now the coefficient decreases in size and is
no longer significant. If anything, these results go against population growth near stations
being caused by higher fertility.

Panels B, C, and D analyze changes in infant mortality, the % Irish born, and the male
to female ratio respectively using the same specifications. Railways do not have a significant
effect on infant mortality in any specification. Changes in the % Irish born are positively
associated with stations in the first two specifications without county fixed effects. The
average change in % Irish born is 0.102 percentage points, indicating a fairly large effect
from stations. Changes in the male to female ratio are negatively associated with railway
stations in the first two specifications, but again the estimate is not significant with county
fixed effects. Overall these results support the argument that railways grew population
through in-migration with the caveat that the estimates are not always precise.

40 This data comes from Populations Past. https://www.populationspast.org/imr/1861/#7/53.035/-2.895.
This data has been produced by the 'Atlas of Victorian Fertility Decline' project (PI: A.M. Reid) with
funding from the ESRC (ES/L015463/1), using an enhanced version of data from Schurer, K. and Higgs, E.
(2014). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. Colchester, Essex: UK Data
Archive [distributor]. SN: 7481, http://dx.doi.org/10.5255/UKDA-SN-7481-1. Dataset last updated: 24th
May 2018.
41 We also use the data to run a regression of sub-district population growth from 1851 to 1861 on an
indicator for having at least one station in 1851. The results are reported in table 13 in appendix A.3. They
confirm our earlier conclusion that being near railway stations increased population growth.

Table 8: Stations, demography, and migration: estimates for sub-districts

                               Panel A: ∆ fertility rate                Panel B: ∆ inf. mortality rate
                               (1)          (2)          (3)          (4)          (5)          (6)
variable                       coeff.       coeff.       coeff.       coeff.       coeff.       coeff.
                               (std. err.)  (std. err.)  (std. err.)  (std. err.)  (std. err.)  (std. err.)
Indicator for station in 1851  -0.0487**    -0.0662***   -0.0284      0.0348       -0.0296      -0.237
                               (0.0232)     (0.0226)     (0.0219)     (0.282)      (0.290)      (0.402)
Quadratic in lat. and long.    No           Yes          Yes          No           Yes          Yes
County fixed effects           No           No           Yes          No           No           Yes
N                              1,568        1,568        1,568        1,360        1,360        1,360

                               Panel C: ∆ % Irish born                  Panel D: ∆ male to female ratio
                               (7)          (8)          (9)          (10)         (11)         (12)
variable                       coeff.       coeff.       coeff.       coeff.       coeff.       coeff.
                               (std. err.)  (std. err.)  (std. err.)  (std. err.)  (std. err.)  (std. err.)
Indicator for station in 1851  0.141**      0.127***     0.0540       -1.130**     -1.248**     -1.045
                               (0.0533)     (0.0445)     (0.0494)     (0.519)      (0.486)      (0.681)
Quadratic in lat. and long.    No           Yes          Yes          No           Yes          Yes
County fixed effects           No           No           Yes          No           No           Yes
N                              1,591        1,591        1,591        1,340        1,340        1,340

Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on county in all specifications.

Assuming that migration was the main factor, one may ask: did railways draw
population into more or less dense units? If the answer is more dense, then railways likely
raised the productivity of migrants due to agglomeration effects.⁴² We test for heterogeneous
effects by 1841 density using the following specification.

\[
y_{i1891} - y_{i1841} = \beta_0 I^{\mathrm{rail}}_{i1851} + \beta_1 I^{\mathrm{rail}}_{i1851} \ln pop41_i + \beta_2 I^{\mathrm{rail}}_{i1851} (\ln pop41_i)^2 + \gamma x_i + \varepsilon_i \tag{3}
\]

where the natural log of 1841 population density and its square are interacted with the
indicator for being within 2 km of an 1851 station. The quadratic formulation is flexible
and allows for non-linear effects. Note that 1841 population density and its square are
included as controls in $x_i$, along with district FEs and first and second nature controls.
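In specification (3) the implied effect of being near a station at a given 1841 density is $\beta_0 + \beta_1 \ln pop41_i + \beta_2 (\ln pop41_i)^2$. The sketch below evaluates this expression and locates its peak. The coefficient values are hypothetical placeholders chosen for round numbers, not the paper's estimates, and the function names are ours.

```python
import math


def station_effect(density_1841, b0, b1, b2):
    """Implied station effect (in log points) at a given 1841 population density."""
    ln_d = math.log(density_1841)
    return b0 + b1 * ln_d + b2 * ln_d ** 2


def peak_log_density(b1, b2):
    """Log density at which the quadratic effect is largest (requires b2 < 0)."""
    if b2 >= 0:
        raise ValueError("the quadratic has an interior maximum only if b2 < 0")
    return -b1 / (2 * b2)


# Hypothetical coefficients for illustration only.
B0, B1, B2 = 0.0, 0.1, -0.01

for ln_d in (2.0, 5.0, 8.0):
    effect = station_effect(math.exp(ln_d), B0, B1, B2)
    print(f"ln density {ln_d}: implied effect {effect:.3f}")
print("effect peaks at ln density", peak_log_density(B1, B2))
```

A negative coefficient on the squared term produces the hump shape described in the text, with the largest effect at intermediate to high densities rather than at the extremes.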

42 See Fujita et al. (2001) and Desmet and Rossi-Hansberg (2014) for agglomeration models.

Figure 4: Heterogeneity with initial population density

Sources: see text.

The estimates show that being close to railway stations had a significantly larger growth
effect for units with medium to large population density. To illustrate, we plot our predicted
population growth for units between the 5th and 95th percentiles in 1841 population density.
One prediction is for units less than 2 km from 1851 stations and the other is for units more
than 2 km from stations (see figure 4). Railways have their largest effect for population
densities between the 75th and 90th percentiles. The increase in population was around 25
percentage points for these units. At the 50th percentile railways increased population by
around 16 pp. To put these figures into perspective, a unit at the 85th percentile of 1841
population density was 183% more populous than a unit at the 50th percentile. How did this
density matter? Leunig and Crafts estimate that doubling a town's population increased its
wages by 11% in 1868.⁴³ Using this figure, if railways reallocated population from the 50th
to the 85th percentile then it would raise their wages by 20%, a significant change.
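The arithmetic behind the 20% figure can be reproduced as follows. The first calculation scales the 11% gain per doubling linearly with the 183% population gap, which matches the figure in the text. The second, which compounds the gain over the number of doublings, is an alternative shown only for comparison and is our assumption rather than the paper's calculation.

```python
import math

population_gap = 1.83          # 85th percentile is 183% more populous than the 50th
wage_gain_per_doubling = 0.11  # 11% wage gain from doubling population

# Linear scaling: a 100% increase gives 11%, so a 183% increase gives 1.83 x 11%.
linear_gain = population_gap * wage_gain_per_doubling

# Alternative (assumption): compound the 11% gain over log2(2.83) doublings.
doublings = math.log2(1 + population_gap)
compound_gain = (1 + wage_gain_per_doubling) ** doublings - 1

print(f"linear scaling: {linear_gain:.1%}")
print(f"compounded:     {compound_gain:.1%}")
```

The compounded variant gives a somewhat smaller gain, so the 20% figure is best read as an approximation at the upper end.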

We use a similar methodology to test for an interaction effect between proximity to inland
waterways and 1841 population density. The predictions are summarized in the right-hand
panel of figure 4. They also show bigger effects on units around the 75th percentile. It
appears that inland waterways also had productivity enhancing effects on migrants.

9 Persistence results
We have shown that being within a short commuting or shipping distance of infrastructures
c.1840 affected population growth in the nineteenth century. Now we want to know if
it affected population growth in the twentieth century and up to the present. This issue
is related to the impact of adopting railways at an early stage. Previous studies show that
some infrastructures, like railways, have significant persistent effects.⁴⁴ If turnpike roads,
waterways, and ports also have significant persistent effects then this would cast further
doubt on railways being indispensable for all urbanisation.

43 See Crafts and Leunig, 'Transport improvements'.

Persistence is tested using our historical units merged with Lower Super Output Areas
(LSOAs) in 2011. We estimate the following 'very' long differences specification

\[
y_{i2011} - y_{i1891} = \beta_1 I^{\mathrm{rail}}_{i1851} + \beta_2 I^{\mathrm{pre\text{-}rail}}_{i1840} + \gamma x_i + \varepsilon_i \tag{4}
\]

where the dependent variable $y_{i2011} - y_{i1891}$ measures the log difference in population from 1891 to
2011. The variables $I^{\mathrm{rail}}_{i1851}$ and $I^{\mathrm{pre\text{-}rail}}_{i1840}$ are indicators for being within 4 km of mid-nineteenth
century infrastructures.

The results are reported in table 9. Column (1) includes 1841 population density as a
control along with all the others in table 1. Column (2) shows a similar specification but
replaces 1841 with 1891 population density as a control. The results are similar. In (1)
being within 4 km of 1851 railway stations increases population growth by 29.4 log points
from 1891 to 2011, or 0.21% higher annual growth. The coefficients also reveal that being
within 4 km of turnpike roads and inland waterways increased annual growth by 0.10%. Being
within 4 km of ports increased annual growth by 0.18%. The beta coefficients show that railways
explain more of the variation in population growth (0.14 for railways compared to 0.058,
0.045, and 0.066 for waterways, turnpikes, and ports). Perhaps more striking is how much
growth is explained by the pre-rail infrastructures. While we cannot trace out exactly why
pre-rail matters so much up to the present, it is likely their uses evolved with the modern
era. For example, inland waterways are now seen as amenities in many areas.

44 See Bleakley and Lin (2012), Redding, Sturm, and Wolf (2011), Garcia-López et al. (2015), and Jedwab
and Moradi (2016).

Table 9: Infrastructure access and population growth over the very long run

Dep. var.: unit pop. growth 1891 to 2011              (1)          (2)
                                                      OLS          OLS
variable                                              coeff.       coeff.
                                                      (std. err.)  (std. err.)
Indicator distance to 1851 railway station <4km       0.294***     0.338***
                                                      (0.031)      (0.031)
Indicator distance to 1830 inland waterway <4km       0.115***     0.129***
                                                      (0.032)      (0.031)
Indicator distance to 1830 turnpike road <4km         0.124***     0.121***
                                                      (0.031)      (0.031)
Indicator distance to 1842 port <4km                  0.245***     0.293***
                                                      (0.062)      (0.067)

Control for 1841 pop. density                         Yes          No
Control for 1891 pop. density                         No           Yes
First nature controls                                 Yes          Yes
Second nature controls                                Yes          Yes
Reg. district fixed effects                           Yes          Yes
N                                                     9481         9481

Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first
and second nature controls see table 1.

10 Conclusion
This paper examines whether the effect of railways on urbanisation was substantially larger
than that of other infrastructures, like turnpike roads, canals, and ports. The paper has several
main points. First, England had a well-developed transport network before railways. It had
many good roads, a huge inland waterway network based on improved rivers and canals,
and numerous ports along its coastline. These early transport improvements helped develop
the English economy before the 1830s when railways started to spread. Many of these
older transport modes went into decline in the mid-nineteenth century when faced with
competition from railways. However, some of these modes had a resurgence as their uses
changed. As our estimates show, population growth from 1841 to 1891 was higher near
railway stations, but it was also high near turnpike roads, waterways, and especially ports.
This case reminds us that the impacts of a single infrastructural improvement, however
large, may not be the only important factor affecting urbanisation.

Second, this paper shows that the location of transport infrastructures can help explain
the spatial patterns of economic growth during the later phases of the industrial revolution.
Previous studies have not been able to identify the effects of all infrastructures, nor to do so at a
disaggregated level. Using our preferred OLS estimates, a counterfactual calculation implies
that if no units were within 2 km of railways then aggregate population growth would have
been 20% lower between 1841 and 1891. A different counterfactual implies that if no units
were within 2 km of railways, waterways, or turnpike roads then aggregate growth would
have been 35% lower.

Third, we provide evidence that railways mainly grew population by attracting migrants.
Higher fertility near stations, another candidate mechanism, is rejected by the data. How
then did railways grow the economy beyond providing a new transport mode with better
attributes? Railways increased productivity by moving the population from low to high
density areas, where agglomeration was present.

Fourth, we show that population growth in England and Wales between 1891 and 2011
was influenced by infrastructures in the mid-nineteenth century. This suggests the policy
decisions made today regarding transport infrastructure will have effects on urbanisation for
decades, perhaps even centuries.

References
1. Aldcroft, Derek Howard, and H. J. Dyos. British transport: an economic survey from the
seventeenth century to the twentieth. Penguin Books, 1974.

2. Alvarez, Eduard, Xavi Franch, and Jordi Martí-Henneberg. "Evolution of the territorial
coverage of the railway network and its influence on population growth: The case of England
and Wales, 1871–1931." Historical Methods: A Journal of Quantitative and Interdisciplinary
History 46.3 (2013): 175-191.

3. Alvarez, E., Dunn, O., Bogart, D., Satchell, M., Shaw-Taylor, L., 'Ports of England and
Wales, 1680-1911', 2017.

4. Armstrong, John. The Vital Spark: The British Coastal Trade, 1700-1930. International
Maritime Economic History Association, 2009.

5. Atack, Jeremy, Fred Bateman, Michael Haines, and Robert A. Margo. "Did railroads induce
or follow economic growth?." Social Science History 34, no. 2 (2010): 171-197.

6. Atack, Jeremy, and Robert A. Margo. "The Impact of Access to Rail Transportation on
Agricultural Improvement: The American Midwest as a Test Case, 1850-1860." Journal of
Transport and Land Use 4.2 (2011).

7. Avery, Brian William. Soil classification for England and Wales: higher categories. No.
631.44 A87. 1980.

8. Baines, Dudley. Migration in a mature economy: emigration and internal migration in
England and Wales 1861-1900. Vol. 3. Cambridge University Press, 2002.

9. Baum-Snow, N., Brandt, L., Henderson, J. V., Turner, M. A., & Zhang, Q. (2017). Roads,
railroads, and decentralization of Chinese cities. Review of Economics and Statistics, 99(3),
435-448.

10. Becker, Sascha O., Erik Hornung, and Ludger Woessmann. "Education and catch-up in the
industrial revolution." American Economic Journal: Macroeconomics (2011): 92-126.

11. Berger, Thor, and Kerstin Enflo. "Locomotives of local growth: The short- and long-term
impact of railroads in Sweden." Journal of Urban Economics (2015).

12. Bleakley, Hoyt, and Jeffrey Lin. "Portage and path dependence." The Quarterly Journal of
Economics 127.2 (2012): 587-644.

13. Bogart, Dan. "Turnpike trusts and the transportation revolution in 18th century England."
Explorations in Economic History 42.4 (2005): 479-508.

14. Bogart, Dan. "Turnpike trusts and property income: new evidence on the effects of transport
improvements and legislation in eighteenth-century England." The Economic History
Review 62.1 (2009): 128-152.

15. Bogart, Dan. The Transport Revolution in Industrializing Britain, in Floud, Roderick, Jane
Humphries, and Paul Johnson, eds. The Cambridge Economic History of Modern Britain:
Volume 1, Industrialisation, 1700–1870. Cambridge University Press, 2014.

16. Bogart, Dan. "Party Connections, Interest Groups and The Slow Diffusion of Infrastructure:
Evidence From Britain's First Transport Revolution." The Economic Journal 128.609 (2017):
541-575.

17. Bogart, Dan. 'The Turnpike Roads of England and Wales', in The Online Historical Atlas
of Transport, Urbanization and Economic Development in England and Wales c.1680-1911.
Eds. L. Shaw-Taylor, D. Bogart and A.E.M. Satchell, 2017.

18. Bogart, Dan, Michael Lefors, and A. E. M. Satchell. "Canal carriers and creative destruction
in English transport." Explorations in Economic History (forthcoming).

19. Boughey, J., & Hadfield, C. (2012). British Canals: The Standard History. The History
Press.

20. Campbell, Gareth, and John D. Turner. "Dispelling the Myth of the Naive Investor during
the British Railway Mania, 1845–1846." Business History Review 86.01 (2012): 3-41.

21. Campbell, Gareth, and John D. Turner. "Managerial failure in mid-Victorian Britain?:
Corporate expansion during a promotion boom." Business History 57.8 (2015): 1248-1276.

22. Casson, Mark. The world's first railway system: enterprise, competition, and regulation on
the railway network in Victorian Britain. Oxford University Press, 2009.

23. Casson, Mark. "The determinants of local population growth: A study of Oxfordshire in the
nineteenth century." Explorations in Economic History 50.1 (2013): 28-45.

24. Chandra, Amitabh, and Eric Thompson. "Does public infrastructure affect economic activity?:
Evidence from the rural interstate highway system." Regional Science and Urban
Economics 30.4 (2000): 457-490.

25. Clayden, Benjamin, and John Marcus Hollis. Criteria for differentiating soil series. No. Tech
Monograph 17. 1985.

26. Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction
to Algorithms, Cambridge, MA, MIT Press (3rd ed., 2009), pp. 695-6.

27. Crafts, Nicholas, and Abay Mulatu. "How did the location of industry respond to falling
transport costs in Britain before World War I?." The Journal of Economic History 66.03
(2006): 575-607.

28. Crafts, Nicholas, and Nikolaus Wolf. "The location of the UK cotton textiles industry in
1838: A quantitative analysis." The Journal of Economic History 74.04 (2014): 1103-1139.

29. Crafts, Nicholas, and Tim Leunig. 'Transport improvements, agglomeration economies and
city productivity: did commuter trains raise nineteenth century British wages?', working
paper.

30. Desmet, Klaus, and Esteban Rossi-Hansberg. "Spatial development." The American Economic
Review 104.4 (2014): 1211-1243.

31. Donaldson, Dave. Railroads of the Raj: Estimating the impact of transportation infrastructure.
No. w16487. National Bureau of Economic Research, 2010.

32. Donaldson, Dave, and Richard Hornbeck. "Railroads and American economic growth: A
market access approach." The Quarterly Journal of Economics 131.2 (2016): 799-858.

33. Duranton, Gilles, and Matthew A. Turner. "Urban growth and transportation." The Review
of Economic Studies 79.4 (2012): 1407-1440.

34. Faber, Benjamin. "Trade integration, market size, and industrialization: evidence from
China's National Trunk Highway System." Review of Economic Studies 81.3 (2014): 1046-1070.

35. Fernihough, Alan, and Kevin Hjortshøj O'Rourke. Coal and the European industrial revolution.
No. w19802. National Bureau of Economic Research, 2014.

36. Fishlow, Albert. American Railroads and the Transformation of the Ante-bellum Economy.
Vol. 127. Cambridge, MA: Harvard University Press, 1965.

37. Fogel, R. "Railways and American Economic Growth." Baltimore: Johns Hopkins Press (1964).

38. Freeman, Michael J., and Derek H. Aldcroft, eds. Transport in Victorian Britain. Manchester
University Press, 1991.

39. Fujita, Masahisa, Paul R. Krugman, and Anthony Venables. The spatial economy: Cities,
regions, and international trade. MIT Press, 2001.

40. Garcia-López, Miquel-Àngel, Adelheid Holl, and Elisabet Viladecans-Marsal. "Suburbanization
and highways in Spain when the Romans and the Bourbons still shape its cities." Journal
of Urban Economics 85 (2015): 52-67.

41. Gourvish, Terence Richard. Railways and the British economy, 1830-1914. Macmillan
International Higher Education, 1980.

42. Gregory, Ian N., and Jordi Martí Henneberg. "The railways, urbanization, and local demography
in England and Wales, 1825–1911." Social Science History 34.2 (2010): 199-228.

43. Hawke, Gary Richard. Railways and economic growth in England and Wales, 1840-1870.
Clarendon Press, 1970.

44. Heblich, Stephan, Stephen J. Redding, and Daniel M. Sturm. The Making of the Modern
Metropolis: Evidence from London. No. w25047. National Bureau of Economic Research,
2018.

45. Hodgson, Charles. "The effect of transport infrastructure on the location of economic activity:
Railroads and post offices in the American West." Journal of Urban Economics 104 (2018):
59-76.

46. Hornung, Erik. "Railroads and growth in Prussia." Journal of the European Economic
Association 13.4 (2015): 699-736.

47. Jedwab, Remi, Edward Kerby, and Alexander Moradi. "History, path dependence and development:
Evidence from colonial railroads, settlers and cities in Kenya." The Economic
Journal (2015).

48. Jedwab, Remi, and Alexander Moradi. "The permanent effects of transportation revolutions
in poor countries: evidence from Africa." Review of Economics and Statistics 98.2 (2016):
268-284.

49. Jarvis A., H.I. Reuter, A. Nelson, E. Guevara (2008). Hole-filled seamless SRTM data V4,
International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.

50. Jaworski, Taylor, and Carl T. Kitchens. "National Policy for Regional Development: Historical
Evidence from Appalachian Highways." (2017).

51. Kellett, John R. The impact of railways on Victorian cities. Routledge, 2012.

52. Klein, Alexander, and Nicholas Crafts. "Making sense of the manufacturing belt: determinants
of US industrial location, 1880–1920." Journal of Economic Geography 12.4 (2012):
775-807.

53. Langton, John, and Robert John Morris. Atlas of industrializing Britain, 1780-1914.
Routledge, 2002.

54. Law, Christopher M. "The growth of urban population in England and Wales, 1801-1911."
Transactions of the Institute of British Geographers (1967): 125-143.

55. Leunig, Timothy. "Time is money: a re-assessment of the passenger social savings from
Victorian British railways." The Journal of Economic History 66.3 (2006): 635-673.

56. Lipscomb, Molly, Mushfiq A. Mobarak, and Tania Barham. "Development effects of electrification:
Evidence from the topographic placement of hydropower plants in Brazil." American
Economic Journal: Applied Economics 5.2 (2013): 200-231.

57. Long, Jason. "Rural-urban migration and socioeconomic mobility in Victorian Britain." The
Journal of Economic History 65.1 (2005): 1-35.

58. Martí-Henneberg, J., Satchell, M., You, X., Shaw-Taylor, L., Wrigley E.A., 'England Wales
and Scotland rail lines shapefile' (2017a).

59. Martí-Henneberg, J., Satchell, M., You, X., Shaw-Taylor, L., Wrigley E.A., 'England, Wales
and Scotland railway stations 1807-1994 shapefile' (2017b).

60. Michaels, Guy. "The effect of trade on the demand for skill: Evidence from the interstate
highway system." The Review of Economics and Statistics 90.4 (2008): 683-701.

61. Odlyzko, Andrew. "Collective hallucinations and inefficient markets: The British Railway
Mania of the 1840s." University of Minnesota (2010).

62. O'Rourke, Kevin H. "The European grain invasion, 1870–1913." The Journal of Economic
History 57.4 (1997): 775-801.

63. Pascali, Luigi. "The wind of change: Maritime technology, trade, and economic development."
American Economic Review 107.9 (2017): 2821-54.

64. Pascual Domènech, P. (1999). Los caminos de la era industrial: la construcción y financiación
de la red ferroviaria catalana, 1843-1898 (Vol. 1). Edicions Universitat Barcelona.

65. Pawson, Eric. Transport and economy: the turnpike roads of eighteenth century Britain.
Academic Press, 1977.

66. Pope, Alexander, and D. Swann. "The pace and progress of port investment in England
1660–1830." Bulletin of Economic Research 12.1 (1960): 32-44.

67. Poveda, G. (2003). El antiguo ferrocarril de Caldas. Dyna, 70 (139), pp. 1-10.

68. Purcar, Cristina. "Designing the space of transportation: railway planning theory in nineteenth
and early twentieth century treatises." Planning Perspectives 22.3 (2007): 325-352.

69. Redding, Stephen J., Daniel M. Sturm, and Nikolaus Wolf. "History and industry location:
evidence from German airports." Review of Economics and Statistics 93.3 (2011): 814-831.

70. Redding, Stephen J., and Matthew A. Turner. "Transportation costs and the spatial organization
of economic activity." Handbook of regional and urban economics. Vol. 5. Elsevier,
2015. 1339-1398.

71. Redford, Arthur. Labour migration in England, 1800-1850. Manchester University Press,
1976.

72. Riley, S. J., S. D. Gloria, and R. Elliot (1999). A terrain Ruggedness Index that quantifies
Topographic Heterogeneity, Intermountain Journal of Sciences, 5(2-4), 23-27.

73. Robson, Brian T. Urban growth: an approach. Vol. 9. Routledge, 2006.

74. Rosevear, A., Satchell, M., Bogart, D., Shaw Taylor, L., Aidt, T. and Leon, G., 'Turnpike
roads of England and Wales, 1667-1892', 2017.

75. Satchell, M. 'Identifying the Trunk Roads of Early Modern England and Wales,' 2017.

76. Satchell, M. and Shaw-Taylor, L., 'Exposed coalfields of England and Wales' 2013.

77. Satchell, M., Shaw-Taylor, L., Wrigley E.A., '1680 England and Wales navigable waterways

shapele', 2017a.

78. Satchell, M., Shaw-Taylor, L., Wrigley E.A., '1830 England and Wales navigable waterways

shapele', 2017b.

79. Schurer, K., Higgs, E. (2014). Integrated Census Microdata (I-CeM), 1851-1911. [data

collection]. UK Data Service. SN: 7481, http://doi.org/10.5255/UKDA-SN-7481-1.

80. Shaw-Taylor, L. and Wrigley, E. A. Occupational Structure and Population Change, in

Floud, Roderick, Jane Humphries, and Paul Johnson, eds. The Cambridge Economic History

of Modern Britain: Volume 1, Industrialisation, 17001870. Cambridge University Press,

2014.

81. Simmons, Jack. The railway in town and country, 1830-1914. (1986).

82. Small, Kenneth. Urban transportation economics. Taylor & Francis, 2013.

83. Storeygard, Adam. "Farther on down the road: transport costs, trade and urban growth in

sub-Saharan Africa." The Review of Economic Studies 83.3 (2016): 1263-1295.

84. Tang, John P. "Railroad expansion and industrialization: evidence from Meiji Japan." The

Journal of Economic History 74.03 (2014): 863-886.

85. Tang, John P. "The Engine and the Reaper: Industrialization and mortality in late nineteenth

century Japan." Journal of health economics 56 (2017): 145-162.

86. Turnbull, Gerard. "Canals, coal and regional growth during the industrial revolution." The

Economic History Review 40.4 (1987): 537-560.

87. Wellington, A.M. The Economic Theory of the Location of Railways: An Analysis of the

Conditions Controlling the Laying Out of Railways to Effect the Most Judicious Expenditure

of Capital. Ed. J. Wiley & sons, 1877.

88. Willan, Thomas Stuart. River navigation in England, 1600-1750. Psychology Press, 1964.

89. Wrigley, Edward Anthony. Energy and the English industrial revolution. Cambridge Uni-

versity Press, 2010.

90. Wrigley, E. A. The PST system of classifying occupations, Working paper 2015.

91. You, Xuesheng. Women's employment in England and Wales, 1851-1911, University of Cam-

bridge, unpublished phd dissertation, 2014.

92. U.S. Census Bureau, 2012-2016 American Community Survey 5-year estimates.

A Appendices:
A.1 The least cost path instrument

In this appendix, we describe the instrument for distance to railway stations. The first

step is to select the nodes of the hypothetical network and then which nodes will become

origins and destinations connected by the least cost path (LCP). The candidate nodes are all

the towns with a population over 5,000 inhabitants in 1801. These were the major population

centers. Each pair of towns, both with a population above 5000, is a potential origin and

destination for railway lines. A gravitational model selects the origins and destinations that

will be connected based on an approximation for the value of trade between the potential

origin and destination. We assume the value of connecting an origin and destination pair is given by $GM_{ij} = \frac{Pop_i \, Pop_j}{Dist_{ij}}$, where $GM_{ij}$ is the gravitational potential between towns $i$ and $j$, $Pop_i$ is the 1801 population of town $i$, and $Dist_{ij}$ is the straight line distance between $i$ and $j$. We chose the town pair $i$ and $j$ as origins and destinations in our LCP if $GM_{ij} > 10{,}000$.
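The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the town names, populations and coordinates are hypothetical, and the distance unit is arbitrary because the text does not state the unit behind the 10,000 threshold.

```python
from itertools import combinations
from math import hypot

def gravity_pairs(towns, min_pop=5000, threshold=10_000):
    """Return (town_i, town_j, GM_ij) for pairs whose gravity potential exceeds threshold.

    towns maps a name to (population_1801, x, y). Coordinates and their unit
    are assumptions of this sketch, not values taken from the paper.
    """
    # Candidate nodes: towns above the population cut-off
    candidates = {n: v for n, v in towns.items() if v[0] > min_pop}
    selected = []
    for (a, (pop_a, xa, ya)), (b, (pop_b, xb, yb)) in combinations(candidates.items(), 2):
        dist = hypot(xa - xb, ya - yb)      # straight line distance
        gm = pop_a * pop_b / dist           # GM_ij = Pop_i * Pop_j / Dist_ij
        if gm > threshold:
            selected.append((a, b, gm))
    return selected

# Hypothetical towns: only A and B are large and close enough to be paired
example_towns = {
    "A": (12000, 0, 0),
    "B": (8000, 3000, 4000),
    "C": (6000, 60000, 80000),
    "D": (4000, 100, 100),   # below the population cut-off
}
print(gravity_pairs(example_towns))
```

Town D is dropped by the population cut-off, while C stays a candidate node but is too far from A and B for any pair to pass the threshold.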

The second step is to identify the LCP connecting our nodes. The main criterion used to plan linear projects is usually the minimization of earth-moving works. Assuming that the track structure (composed of rails, sleepers and ballast) is the same along the entire length, the main cost differences arise in the track foundation. Terrain with steeper slopes requires more earth-moving and, in consequence, construction costs are higher (Pascual 1999, Poveda 2003, Purcar 2007). The limited traction power of locomotives and the limited adhesion between wheels and rails could be the main reason. In addition, slopes over 2% might require building tunnels, cut-and-cover tunnels or even viaducts. The perpendicular slope was also crucial. During the construction of a track section, excavation and filling have to be balanced in order to minimize provisions, waste and transportation of earth. Nowadays bulldozers and trailers are used, but historically workers did this manually, which implied a direct link between construction costs, wages and the availability of skilled laborers. In fact, it is commonly accepted in the literature that early railways were highly restricted by several factors, including the quality of the soil, the need to construct tunnels and bridges, and interference with existing property (buildings and land dispossession). Longitudinal and perpendicular slope were the most significant factors, and we focus on these below.

Slopes are determined using elevation data. Several DEM rasters were analyzed in preliminary tests, but we finally chose the Shuttle Radar Topography Mission (SRTM) raster at 90 meter resolution (3 arc-second). Although this is a modern raster data set, created in 2000 from a radar system on-board the Space Shuttle, the results should not differ much from the historical reality. The LCP tool calculates the route between an origin and a destination, minimizing the elevation difference (or cost in our case) in cumulative terms. The method was based on the ESRI Least-Cost-Path algorithm, although additional tasks were implemented to optimize the results and to offer different scenarios. The input data was the SRTM elevation raster, converted into slope. This conversion was necessary in order to input different construction costs.

The third step is to specify the relationship between construction costs and slope. One

approach is to use the historical engineering literature. Wellington (1877) discusses elevation

slope (i.e. gradients), distance, and operational costs of railways, but this is not ideal as we

are interested in construction costs. We could not find an engineering text that specified

the relationship between construction costs and slopes. As an alternative we use historical

construction cost data. The following details our data and procedure.

A select committee on railways in 1844 published a table on the construction costs of 54 railways.[45] There were 45 with a clear origin and destination, for which we can measure total

elevation change along the route (details are available). For these 45 railways we calculate

the distance of the railway line in meters and the total elevation change (all meters of ascent

and descent). We then ran the following regression for railway i:

$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \varepsilon_i \qquad (5)$$

where construction costs are measured in £. This regression produces unsatisfactory results,
with total elevation change having a negative sign. We think the main reason is that the

sample includes railways with London as an origin and destination. Land values in London

were much higher than elsewhere and thus construction costs were higher there. Therefore,

we omit railways with a London connection. We also think it is important to account for

railways in mining areas, as they were typically built to serve freight traffic rather than a

mix of freight and passenger traffic.

45 See the Fifth report from the Select Committee on Railways; together with the minutes of evidence, appendix and index (BPP 1844 XI). The specific section with the data is appendix number 2, report to the lords of the committee of the privy council for trade on the statistics of British and Foreign railways, pp. 4-5.

Our extended model uses construction costs for 36 non-London railways and the following specification:

$$ConstructionCosts_i = \alpha \, Distance100Meters_i + \beta \, ElevationChangeMeters_i + \mu \, MiningRailway_i + \varepsilon_i \qquad (6)$$

The results imply that for every 100 meters of distance construction costs rise by £128.9 (st. err. 45.27) and, holding distance constant, construction costs rise by £382.6 (st. err. 274.5) for every 1 meter increase in total elevation change. Construction costs for mining railways are £340,418 lower (st. err. 179,815). For our LCP model we assume a non-mining railway, re-scale the figures into construction costs per 100 meters, and normalize so that costs per 100 meters are 1 at zero elevation change. The formula becomes

$$NormalizedCostPer100Meters = 1 + 2.96 \times \frac{ElevationChangeMeters}{Distance100Meters}.$$

The elevation change divided by distance can be considered as the slope in percent, in which case our formula becomes $Cost = 1 + 2.96 \times \%slope$. We think this is a reasonable approximation of the relationship between construction costs, distance, and elevation slope.
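The normalization can be checked with a short calculation. The sketch below only restates the arithmetic: the two constants are the regression estimates quoted above, while the function names are illustrative. The ratio of the two estimates is about 2.97, which the text reports as 2.96.

```python
# Regression estimates quoted in the text (pounds)
ALPHA_PER_100M = 128.9    # cost per 100 meters of distance
BETA_PER_METER = 382.6    # cost per meter of total elevation change

def slope_multiplier():
    """Cost of one meter of elevation change relative to 100 meters of flat track."""
    return BETA_PER_METER / ALPHA_PER_100M

def normalized_cost(slope_percent, multiplier=2.96):
    """Normalized cost per 100 meters; equals 1 on flat terrain.

    slope_percent is elevation change in meters per 100 meters of distance.
    The default multiplier is the value reported in the text.
    """
    return 1 + multiplier * slope_percent

print(round(slope_multiplier(), 3))
print(normalized_cost(0), normalized_cost(2))
```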

For computational purposes it is convenient to divide slope into bins of 0 to 1%, 1 to 2%, and so on. The following table gives the costs over a standardized distance for different slope bins in our preferred specification, which is labeled scenario 2. For comparison, we also show parameters assuming a constant unitary linear cost in slope (scenario 1) and a case where slope costs are graded: they are constant until the 2 to 3% bin, then rise up to the 6-7% bin, after which they are constant again (scenario 3).

slope %   cost scenario 1   cost scenario 2 (preferred)   cost scenario 3
0         0                 1                             1
0-1       1                 4                             1
1-2       2                 7                             1
2-3       3                 10                            4
3-4       4                 13                            7
4-5       5                 16                            11
5-6       6                 19                            15
6-7       7                 22                            19
7-8       8                 25                            19
8-9       9                 28                            19
9-10      10                31                            19
>10       ...               34                            19
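The scenario 2 column can be reproduced with a small reclassification function. The rule used here (1 plus 3 times the upper edge of the slope bin, that is, the 2.96 multiplier rounded to 3) is inferred from the tabulated values rather than stated in the text. Treating bin edges as upper-inclusive and slopes above 10% as an eleventh bin are further assumptions of this sketch.

```python
from math import ceil

def scenario2_cost(slope_percent):
    """Reclassified cost for a slope given in percent (scenario 2 column).

    Inferred rule: 1 on flat terrain, otherwise 1 + 3 * upper edge of the
    1%-wide bin, with every slope above 10% sharing one top bin.
    """
    if slope_percent <= 0:
        return 1
    upper_edge = min(ceil(slope_percent), 11)   # bins are upper-inclusive (assumption)
    return 1 + 3 * upper_edge

# One slope per table row: 0, 0-1, 1-2, ..., 9-10, >10
for slope in (0, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 12):
    print(slope, scenario2_cost(slope))
```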

The LCP algorithm is implemented in Python using ESRI tools, taking as inputs the elevation slope raster, the reclassification table of construction costs, and the origin-destination nodes. The cost distance and back-link rasters are computed using the formulation below:

$$CostDistance(ab) = \frac{CostSurface(a) \times HF(a) + CostSurface(b) \times HF(b)}{2} \times SurfaceDistance(ab) \times VF(ab) \qquad (7)$$

where $CostSurface(j)$ is the cost of travel for cell $j$, $HF(j)$ is the horizontal factor for cell $j$, $SurfaceDistance(ab)$ is the surface distance from $a$ to $b$, and $VF(ab)$ is the vertical factor from $a$ to $b$. Note that the division by 2 of the friction of the segments is deferred until the horizontal factor is integrated. Finally, we implemented the least-cost-path function to obtain the LCP corridors. These corridors were converted to lines, exported, merged and post-processed. Maps of our preferred LCP using scenario 2 are shown in the text.
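To make the idea of accumulating cost between neighbouring cells concrete, the following sketch runs Dijkstra's algorithm on a tiny cost grid. It is not the ESRI implementation: it assumes horizontal and vertical factors of 1, a planar step of 1 (or the square root of 2 on diagonals) in place of the surface distance, and a hypothetical 3x3 grid of reclassified costs.

```python
import heapq
from math import sqrt

def least_cost_path(cost, start, goal):
    """Return (path, total_cost) between two cells of a 2D cost grid.

    Moving between adjacent cells a and b costs
    (cost[a] + cost[b]) / 2 * step, with HF = VF = 1 (assumption).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}     # cheapest accumulated cost found so far
    back = {}               # back-link: previous cell on the cheapest route
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > best[(r, c)]:
            continue        # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = sqrt(2) if dr and dc else 1.0
                nd = d + (cost[r][c] + cost[nr][nc]) / 2 * step
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    back[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Follow the back-links from the goal to the start
    path = [goal]
    while path[-1] != start:
        path.append(back[path[-1]])
    return path[::-1], best[goal]

# Hypothetical grid: flat cells cost 1, the steep centre cell costs 34
grid = [[1, 1, 1],
        [1, 34, 1],
        [1, 1, 1]]
path, total = least_cost_path(grid, (0, 0), (2, 2))
print(path, round(total, 3))
```

In this example the route goes around the expensive centre cell instead of crossing it diagonally, which is the behaviour the slope-based cost surface is meant to produce.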

A.2 Elevation, slope, and ruggedness variables

The aim of this appendix is to explain the creation of the elevation variables, including

the original sources and method we followed to estimate them. There are several initiatives

working on the provision of high-resolution elevation raster data across the world. The databases differ mainly in their geographical coverage, the precision of the data and the treatment of urban surroundings.

We obtained several elevation DEM rasters, preferably DTMs, covering all of England and Wales. In decreasing order of accuracy, the most precise database was LIDAR (5x5m), from the Landmap data set contained in the NEODC Landmap Archive (Centre for Environmental Data Archival). Second, we used EU-DEM (25x25m) from the GMES RDA project, available in the EEA Geospatial Data Catalogue (European Environment Agency). The third dataset was the Shuttle Radar Topography Mission (SRTM 90x90m), created in 2000 from a radar system on-board the Space Shuttle Endeavour by the National Geospatial-Intelligence Agency (NGA) and NASA. Finally, we also used GTOPO30 (1,000x1,000m), developed by a collaborative effort led by staff at the U.S. Geological Survey's Center for Earth Resources Observation and Science (EROS). All of these sources were created using satellite data, which means all of them are based on current data. The lack of historical sources of elevation data obliges us to use them. This simplification may be considered reasonable for rural places, but it is less reliable in urban surroundings where the urbanization process altered the original landscape. Even with DTM rasters, the construction of buildings and technical networks involved a severe change in the surface of the terrain. Several tests at a local scale were conducted with the different rasters in order to strike a balance between precision and the time spent on calculations. The comparisons included the total size of the files, the time spent on different calculations, and precision relative to the finest data. After these tests, we opted for SRTM90.

Figure 5: Slope and ruggedness measures

As stated in the text, the spatial units used as a basis for the present paper were civil

parishes, comprising over 9000 continuous units. In this regard, we had to provide a method

to obtain unique elevation variables for each unit, keeping the comparability across the

country. We estimated six variables in total: elevation mean, elevation std, slope mean,

slope std, ruggedness mean and ruggedness std. Before creating the different variables, some work had to be done to prepare the data. In order to obtain full coverage of England and Wales with SRTM data, we had to download 7 raster tiles. Those images were merged together, projected into the British National Grid and clipped to the coastline in ArcGIS software.

Having the elevation raster of England and Wales, we proceeded to calculate the first two variables: the elevation mean and its standard deviation. A Python script was written to split the raster using the continuous units, to calculate the raster properties (mean and standard deviation) of all the cells in each sub-raster, and to aggregate the information obtained in a text file. These files were subsequently joined to the shapefile of civil parishes, offering the possibility to plot the results.
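The per-unit aggregation described above amounts to a zonal statistic. A minimal pure-Python sketch follows; it is not the original ArcGIS script. It assumes the parish boundaries have already been rasterized onto the same grid as the elevation data, with None marking cells outside any unit, and it uses the population standard deviation, which the text does not specify.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zonal_stats(elevation, units):
    """Return {unit_id: (mean, std)} of the elevation cells inside each unit.

    elevation and units are equally sized 2D lists; units holds the id of the
    spatial unit covering each cell, or None outside the study area.
    """
    cells = defaultdict(list)
    for elev_row, unit_row in zip(elevation, units):
        for value, unit_id in zip(elev_row, unit_row):
            if unit_id is not None:
                cells[unit_id].append(value)
    return {unit_id: (mean(values), pstdev(values))
            for unit_id, values in cells.items()}

# Hypothetical 2x2 raster covering two units
elevation = [[10, 20],
             [30, 40]]
units = [["A", "A"],
         ["B", None]]
print(zonal_stats(elevation, units))
```

The same function applies unchanged to a slope raster, which is how the slope mean and standard deviation per unit could be obtained.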

A second set of variables was derived to capture the variability of elevation between adjacent cells. Two measures were developed for this purpose: ruggedness and slope. Ruggedness is a measure of topographical heterogeneity defined by Riley et al (1999). In order to calculate the ruggedness index for each unit, a Python script was written to convert each raster cell into a point keeping the elevation value, to select the adjacent values using a distance tool, to apply the ruggedness equation to every single point, to spatially join the points to their spatial units and to calculate aggregated indicators (mean and standard deviation) for each continuous unit.
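As an illustration of the cell-level calculation, the sketch below computes a ruggedness value for each interior cell of a small elevation grid. It assumes the common statement of the Riley et al (1999) index, the square root of the summed squared elevation differences between a cell and its eight neighbours. The paper does not reproduce the equation, so this form, and leaving edge cells undefined, are assumptions.

```python
from math import sqrt

def terrain_ruggedness(elev):
    """Return a grid of ruggedness values; edge cells are left as None."""
    rows, cols = len(elev), len(elev[0])
    tri = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = elev[r][c]
            # Sum of squared differences with the eight adjacent cells
            total = sum((elev[r + dr][c + dc] - centre) ** 2
                        for dr in (-1, 0, 1)
                        for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
            tri[r][c] = sqrt(total)
    return tri

# Hypothetical grid: a 2 meter bump in otherwise flat terrain
elev = [[10, 10, 10],
        [10, 12, 10],
        [10, 10, 10]]
print(terrain_ruggedness(elev)[1][1])
```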

In order to calculate the slope variable for each unit, a Python script was written to convert the elevation into a slope raster, to split the raster using the continuous units, to calculate the raster properties (mean and standard deviation) of all the cells in each sub-raster, and to aggregate the information obtained in a text file. The results for both ruggedness and slope are displayed in figure 5. As the reader will appreciate, the scales of the indices differ (by a factor of 1 to 2) but the geographical pattern is rather similar. For this reason, the paper uses the variables derived from slope measures, because their calculation took less time.

A.3 Additional results

Table 10: All coefficients in model without district FEs

Dep. var.: unit pop. growth 1841 to 1891 coef. (std. err.)
Indicator distance to rail station in 1851<2km 0.199*** (0.0231)
Indicator distance to inland waterway in 1830 <2km 0.0680*** (0.0205)
Indicator distance to turnpike road in 1830<2km 0.0468*** (0.0119)
Indicator distance to port in 1842 <2km 0.0411 (0.0681)
First-nature controls
Indicator exposed coal 0.274*** (0.0448)
Indicator coastal unit 0.215*** (0.0361)
Elevation -0.000818*** (0.000196)
Average elevation slope within unit -0.0275* (0.0149)
SD elevation slope within unit 0.0301** (0.0137)
Average rainfall 0.000233 (0.000496)
Average rainfall squared -1.38e-07 (2.57e-07)
Average temperature -0.848*** (0.301)
Average temperature squared 0.0470*** (0.017)
Wheat suitability (low input level rain-fed) 0.0471*** (0.0173)
Land area in sq. km. 0.000207 (0.000308)
Perc. of land with Raw gley soil -0.00225 (0.00657)
Perc. of land with Lithomorphic soil 3.85e-05 (0.000360)
Perc. of land with Pelosols soil -0.000954*** (0.000361)
Perc. of land with Podzolic soil 0.00256*** (0.000866)
Perc. of land with Surface-water gley soil 0.000612 (0.000378)
Perc. of land with Ground-water gley soil -0.00243 (0.00161)
Perc. of land with Man made soil 0.00534** (0.00251)
Perc. of land with Peat soil -0.00271** (0.00130)
Perc. of other soil 0.00205 (0.00305)
Second nature controls
Ln 1841 population per sq. km -0.193*** (0.0333)
Share of male tertiary empl. in 1851 -0.390 (0.342)
Share of male agricultural empl. in 1851 -1.498*** (0.215)
Share of male mining & forestry empl. in 1851 -0.510** (0.224)
Share of male unspecified empl. in 1851 -1.183*** (0.171)
Ln distance to major city in 1801 -0.00853 (0.00739)
Registration district fixed effects NO
R-square 0.256
N 9482
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications.

Table 11: Different specifications for railway variables

Dep. var.: unit pop. growth 1841 to 1891    (1)         (2)         (3)         (4)         (5)
variable                                    coef.       coef.       coef.       coef.       coef.
                                            (std. err.) (std. err.) (std. err.) (std. err.) (std. err.)
Indicator distance to station in 1841 <2km  0.186***
                                            (0.0469)
Indicator distance to station in 1861 <2km              0.177***
                                                        (0.0159)
Indicator distance to station in 1871 <2km                          0.175***
                                                                    (0.0141)
Indicator any rail line in 1851                                                 0.164***
                                                                                (0.0145)
Exactly one railway station in 1851                                                         0.168***
                                                                                            (0.0208)
More than one railway station in 1851                                                       0.270***
                                                                                            (0.0419)
First nature controls                       Yes         Yes         Yes         Yes         Yes
Second nature controls                      Yes         Yes         Yes         Yes         Yes
District fixed effects                      Yes         Yes         Yes         Yes         Yes
N                                           9482        9482        9482        9482        9482
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first and second nature controls see table 1.

Table 12: Pre-trend tests for the validity of instrument distance to LCP

Dep. var.: unit pop. growth 1801 to 1831 1801 to 1811 1811 to 1821 1821 to 1831 1831 to 1841
variable                                          coef.       coef.       coef.       coef.
                                                  (std. err.) (std. err.) (std. err.) (std. err.)
Distance to LCP for railways <2km                 0.00335     0.00321     -0.000787   -0.00343
                                                  (0.00439)   (0.00379)   (0.00462)   (0.00518)
Indicator dist. to inland waterway in 1830 <2km   0.000107    0.00319     0.0148***   0.0118**
                                                  (0.00426)   (0.00468)   (0.00418)   (0.00499)
Indicator dist. to turnpike road in 1830 <2km     0.00177     0.00405     0.00531     -0.00174
                                                  (0.00408)   (0.00350)   (0.00364)   (0.00374)
Indicator dist. to port in 1842 <2km              0.0406***   0.0140      0.00198     0.0329**
                                                  (0.0154)    (0.0127)    (0.0117)    (0.0145)
Units within 2 km of LCP nodes removed?           Yes         Yes         Yes         Yes
First nature controls                             Yes         Yes         Yes         Yes
Second nature controls                            Yes         Yes         Yes         Yes
Registration district fixed effects               Yes         Yes         Yes         Yes
N                                                 9114        9114        9114        9114
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on district in all specifications. For the list of first and second nature controls see table 1.

Table 13: Stations and population growth: estimates for sub-districts

Dep. var.: log difference in pop., 1851 and 1861

                                  (1)         (2)         (3)
variable                          coef.       coef.       coef.
                                  (std. err.) (std. err.) (std. err.)
Indicator for station in 1851     0.0736***   0.0645***   0.0432***
                                  (0.0108)    (0.0107)    (0.00818)
Quadratic in lat. and long.       No          Yes         Yes
County fixed effects              No          No          Yes
N                                 1,599       1,599       1,599
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on county in all specifications.

A.4 Panel regression results

We observe whether a unit has a railway station nearby in each decade from 1831 to 1891.

A panel or difference-in-difference model will identify whether population density increased after a unit got its first station nearby. The specification is the following:

$$y_{it} = \beta I^{rail}_{it} + \alpha_i + \delta_t + \gamma x_i \delta_t + \varepsilon_{it} \qquad (8)$$

where $y_{it}$ is the log of population density in unit $i$ in year $t$, $I^{rail}_{it}$ is an indicator equal to 1 if unit $i$ is within 2 km of a station in year $t$, $\alpha_i$ is a unit fixed effect, $\delta_t$ is a census year fixed effect, and $x_i \delta_t$ is an interaction between the census year effects and several time-invariant controls.[46] The time span is 10 decades and there are 85,365 unit-year observations. The standard errors are clustered on the unit to allow for correlation within a unit across time. The results are reported in column 1 of table 14. The estimated coefficient on $I^{rail}_{it}$ is 0.161 with a standard error of 0.008. In other words, getting a railway station nearby raises the unit population density by approximately 16%.

We also modify the panel specification to test for pre-trends. Specifically, we create a dummy variable for two decades before a unit gets its first station, one decade before a unit gets its first station, and so on up to an indicator for at least 5 decades after a unit gets its first station. The omitted group is three or more decades before. The estimates are shown in column 2 of table 14. The estimates reveal a significant pre-trend. Two decades prior, a unit has 2.2% higher population density, and one decade prior it has 5.3% higher population density. However, after the first railway station opens, population growth increases significantly. In the first decade after the station opens the coefficient is 0.146, and in the second decade the coefficient is 0.164.
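The event-time dummies used in the pre-trend test can be built from two pieces of information per unit-year: the census year and the first census year in which the unit is within 2 km of a station. The sketch below shows one way to assign the categories. The function name and labels are illustrative, and placing units that never get a station in the omitted group is an assumption.

```python
def event_time_label(census_year, first_station_year):
    """Return the event-time category of a unit-year, or None for the omitted group.

    Categories run from "-2" (two decades before the first station) to
    "+5 or more" (at least five decades after); "+0" is the decade in which
    the unit first has a station within 2 km.
    """
    if first_station_year is None:
        return None                      # never treated (assumption)
    decades = (census_year - first_station_year) // 10
    if decades <= -3:
        return None                      # omitted: three or more decades before
    if decades >= 5:
        return "+5 or more"
    return f"{decades:+d}"

# A hypothetical unit whose first nearby station appears in the 1861 census
for year in range(1801, 1892, 10):
    print(year, event_time_label(year, 1861))
```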

46 We include interactions with coal, coastal, elevation, average slope, the standard deviation of slope, a
cubic polynomial in longitude, the same for latitude, and the bottom three quartiles of 1801 population
density.

Table 14: Panel regression estimates

Dep. var.: log unit pop. density in year t                    (1)         (2)
variable                                                      coef.       coef.
                                                              (std. err.) (std. err.)
Indicator distance to station <2km                            0.161***
                                                              (0.008)
Indicator two decades before distance to first station <2km               0.022***
                                                                          (0.053)
Indicator one decade before distance to first station <2km                0.053***
                                                                          (0.006)
Indicator decade distance to first station <2km                           0.109***
                                                                          (0.007)
Indicator one decade after distance to first station <2km                 0.146***
                                                                          (0.008)
Indicator second decade distance to first station <2km                    0.164***
                                                                          (0.009)
Indicator third decade distance to first station <2km                     0.133***
                                                                          (0.009)
Indicator fourth decade distance to first station <2km                    0.066***
                                                                          (0.008)
Indicator fifth decade distance to first station <2km                     0.020***
                                                                          (0.004)
Census year fixed effects                                     Yes         Yes
Unit fixed effects                                            Yes         Yes
Census year fixed effects * first nature controls             Yes         Yes
N                                                             85,365      85,365
Notes: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are clustered on the unit in all specifications. For the list of first nature controls see table 1.
