
The Principle of Scaling of Geographic Space and its Application in Urban Studies
Xintao Liu
Doctoral Thesis in Geodesy and Geoinformatics
with Specialisation in Geoinformatics
Stockholm, Sweden 2012
TRITA-SoM 2012-03
ISSN 1653-6126
ISRN KTH/SoM/12-03/SE
ISBN 978-91-7501-277-3
KTH 2012
www.kth.se
The Principle of Scaling of Geographic Space and
its Application in Urban Studies
Xintao Liu
Doctoral Thesis

© Xintao Liu 2012
Doctoral dissertation

Royal Institute of Technology (KTH)
Department of Urban Planning and Environment
Division of Geodesy and Geoinformatics
SE-100 44 Stockholm
Sweden
TRITA-SoM 2012-03
ISSN 1653-6126
ISRN KTH/SoM/12-03/SE
ISBN 978-91-7501-277-3
Printed by e-print, Sweden 2012

Abstract
Geographic space is the large-scale and continuous space that encircles the earth
and in which human activities occur. The study of geographic space has drawn
attention in many different fields and has been applied in a variety of studies,
including those on cognition, urban planning and navigation systems. A scaling
property indicates that small objects are far more numerous than large ones, i.e.,
the sizes of objects are extremely diverse. The concept of scaling resembles a
fractal in geometric terms and a power law distribution from the perspective of
statistical physics, but it is different from both in terms of application.
Combining the concepts of geographic space and scaling, this thesis proposes
the concept of the scaling of geographic space, which refers to the phenomenon
that small geographic objects or representations are far more numerous than
large ones. From the perspectives of statistics and mathematics, the scaling of
geographic space can be characterized by the fact that the sizes of geographic
objects follow heavy-tailed distributions, i.e., special non-linear relationships
between the values of a variable and their probabilities.
In this thesis, the heavy-tailed distributions refer to the power law, lognormal,
exponential, power law with an exponential cutoff and stretched exponential.
The first three are the basic distributions, and the last two are their degenerate
versions. If the measurements of the geographic objects follow a heavy-tailed
distribution, then their mean value can divide them into two groups: large ones
(a low percentage) whose values lie above the mean value and small ones (a
high percentage) whose values lie below. This regularity is termed the
head/tail division rule. That is, a two-tier hierarchical structure can be obtained
naturally. The scaling property of geographic space and the head/tail division
rule are verified at city and country levels from the perspectives of axial lines
and blocks, respectively.
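The head/tail division rule described above can be illustrated with a minimal sketch. The function name `head_tail_split` and the sample sizes are hypothetical and are not taken from the thesis. Placing values exactly equal to the mean in the tail is also an assumption, since the text only speaks of values above and below the mean.

```python
from statistics import mean


def head_tail_split(values):
    """Split values around their arithmetic mean.

    Returns (mean, head, tail), where head holds the values above the
    mean and tail holds the rest. Values equal to the mean go to the
    tail, which is an assumption made for this sketch.
    """
    m = mean(values)
    head = [v for v in values if v > m]
    tail = [v for v in values if v <= m]
    return m, head, tail


# Hypothetical object sizes: many small values and a few large ones.
sizes = [1, 1, 1, 2, 2, 3, 4, 8, 20, 58]
m, head, tail = head_tail_split(sizes)

# Share of objects that fall in the head (the "large" group).
head_share = len(head) / len(sizes)
print(m, head, tail, head_share)
```

For data of this kind the head holds only a small share of the objects, which is the two-tier hierarchy that the rule refers to. For roughly symmetric data the two groups would be close to equal in size.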
In the study of geographic space, the most important concept is geographic
representation, which represents or partitions a large-scale geographic space into
numerous small pieces, e.g., vector and raster data in conventional spatial
analysis. In different contexts, each geographic representation possesses
different geographic implications and a rich partial knowledge of space. The
emergence of geographic information science (GIScience) and volunteered
geographic information (VGI) has greatly enabled the generation of new types of
geographic representations. In addition to the old axial lines, this thesis
generated several types of representations of geographic space: (a) blocks that
were decomposed from road segments, each of which forms a minimum cycle
such as city and field blocks; (b) natural streets that were generated from street
center lines using the Gestalt principle of good continuity; (c) new axial lines
that were defined as the least number of individual straight line segments that
mutually intersect along natural streets; (d) the fewest-turn map direction
(route) that possesses the hierarchical structure and indicates the scaling of
geographic space; (e) spatio-temporal clusters of the stop points in the
trajectories of large-scale floating car data.
Based on the generated geographic representations, this thesis further applies the
scaling property and the head/tail division rule to these representations for urban
studies. First, all of the above geographic representations demonstrate the
scaling property, which indicates the scaling of geographic space. Furthermore,
the head/tail division rule performs well in obtaining the hierarchical structures
of geographic objects. In a sense, the scaling property reveals the hierarchical
structures of geographic objects. According to the above analysis and findings,
several urban studies are performed as follows: (1) generate new axial lines
based on natural streets for a better understanding of urban morphologies; (2)
compute the fewest-turn and shortest map direction; (3) identify urban sprawl
patches based on the statistics of blocks and natural cities; (4) categorize
spatio-temporal clusters of long stop points into hotspots and traffic jams; and (5)
perform an across-country comparison of hierarchical spatial structures.
The overall contribution of this thesis is first to propose the principle of scaling
of geographic space as well as the head/tail division rule, which provide a new
and quantitative perspective to efficiently reduce the high degree of complexity
and effectively solve the issues in urban studies. Several successful applications
show that the scaling of geographic space and the head/tail division rule are
inspiring and can in fact be applied as a universal law, in particular to urban
studies but also to other fields. The data sets that were generated via an intensive
geo-computation process are as large as hundreds of gigabytes and will be of great
value to further data mining studies.
Keywords: geographic space, scaling, GIScience, VGI, OSM, heavy-tailed
distribution, the head/tail division rule, space syntax, natural street, urban sprawl,
floating car data, hierarchical spatial structure.
List of papers
I: Liu X. and Jiang B. (2011), Defining and generating axial lines from street
center lines for better understanding of urban morphologies, International
Journal of Geographical Information Science, Accepted.
II: Liu X. and Jiang B. (2011), A novel approach to the identification of urban
sprawl patches based on the scaling of geographic space, International Journal
of Geomatics and Geosciences, 2(2), 415-429.
III: Liu X. and Ban Y. (2012), Uncovering urban mobility patterns with massive
floating car data, Submitted to ISPRS Journal of Photogrammetry and Remote
Sensing.
IV: Liu X., Jiang B. and Ban Y. (2012), An across-country comparison of
hierarchical spatial structures of cities, Submitted to Computers, Environment
and Urban Systems.
V: Jiang B. and Liu X. (2011), Computing the fewest-turn map directions based
on the connectivity of natural roads, International Journal of Geographical
Information Science, 25(7), 1069-1082 (with equivalent contribution).
VI: Jiang B. and Liu X. (2011), Scaling of geographic space from the
perspective of city and field blocks and using volunteered geographic
information, International Journal of Geographical Information Science,
Accepted.
VII: Jiang B. and Liu X. (2010), Automatic generation of the axial lines of
urban environments to capture what we perceive, International Journal of
Geographical Information Science, 24(4), 545-558.
Table of Contents
Abstract
List of papers
Table of Contents
List of abbreviations
List of figures
List of tables
Acknowledgements
1. Introduction
1.1. Background
1.1.1. Geographic information science
1.1.2. Geographic space and its representations
1.1.3. The scale in geography
1.2. Research objectives
1.3. Thesis outline
2. Literature review
2.1. The principle of scaling of geographic space
2.1.1. Preliminaries
2.1.2. VGI and OSM
2.1.3. Understanding VGI
2.1.4. OpenStreetMap project
2.1.5. Scaling of geographic space
2.2. Related urban theories and studies
2.3. Applications in urban studies
3. Experimental design
3.1. Description of the study areas
3.2. Processing road networks
3.2.1. Highway extraction
3.2.2. Topological correction
3.3. Floating car data
3.4. Implementations
4. Methodology
4.1. Overall structure
4.2. Heavy-tailed distributions
4.2.1. Concept and definitions
4.2.2. Mathematical detection
4.3. Scaling, hierarchies and head/tail division rule
4.3.1. The head/tail division rule
4.3.2. Hierarchical structures and geographic implications
5. Results and discussion
5.1. Overview
5.2. Paper VII: Scaling at city level from axial line perspective
5.3. Paper VI: Scaling at country level from block perspective
5.4. Paper V: Computing the fewest-turn map directions
5.5. Paper I: Defining and auto-generating axial lines
5.6. Paper II: Identification of urban sprawl patches
5.7. Paper III: Uncovering urban mobility patterns
5.8. Paper IV: Comparison of hierarchical spatial structures
6. Conclusions and future research
6.1. Conclusions
6.2. Future research
References
List of abbreviations
AJAX - Asynchronous JavaScript and XML
API - Application Programming Interface
BFS - Breadth-first Search
CBD - Central Business District
CPT - Central Place Theory
DFS - Depth-first Search
ERM - Entity Relationship Model
FCD - Floating Car Data
GDP - Gross Domestic Product
GPS - Global Positioning System
LOD - Level of Detail
OGC - Open Geospatial Consortium
OSM - OpenStreetMap
SOAP - Simple Object Access Protocol
TIGER - Topologically Integrated Geographic Encoding and Referencing
VGI - Volunteered Geographic Information
W3C - World Wide Web Consortium
WMS - Web Map Service
WSDL - Web Services Description Language
XML - Extensible Markup Language
List of figures List of figures
Figure 1: three basic geographic representations .................................................. 4 Figure 1: three basic geographic representations .................................................. 4
Figure 2: Illustration of simple graph and multi graph ....................................... 12 Figure 2: Illustration of simple graph and multi graph ....................................... 12
Figure 3: Concepts of axial lines and converted graph of Gassin town.............. 13 Figure 3: Concepts of axial lines and converted graph of Gassin town.............. 13
Figure 4: Examples of Gestalt principle of continuity ........................................ 14 Figure 4: Examples of Gestalt principle of continuity ........................................ 14
Figure 5: Deflection angles between road segments ........................................... 15 Figure 5: Deflection angles between road segments ........................................... 15
Figure 6: graph processed by BFS algorithm to structured graph ...................... 16 Figure 6: graph processed by BFS algorithm to structured graph ...................... 16
Figure 7: Web 2.0 conceptual model for VGI..................................................... 20 Figure 7: Web 2.0 conceptual model for VGI..................................................... 20
Figure 8: Main web site of OSM......................................................................... 22 Figure 8: Main web site of OSM......................................................................... 22
Figure 9: XML examples of three OSM data primitives .................................... 23 Figure 9: XML examples of three OSM data primitives .................................... 23
Figure 10: Hierarchical structure of node, way and relation............................... 24 Figure 10: Hierarchical structure of node, way and relation............................... 24
Figure 11: Illustration of Koch curve .................................................................. 27 Figure 11: Illustration of Koch curve .................................................................. 27
Figure 12: Two basic concepts threshold and range in central place model ...... 32 Figure 12: Two basic concepts threshold and range in central place model ...... 32
Figure 13: Hierarchical structure of center places .............................................. 32 Figure 13: Hierarchical structure of center places .............................................. 32
Figure 14: Concentric zone model (Source: Wikipedia). ................................... 33 Figure 14: Concentric zone model (Source: Wikipedia). ................................... 33
Figure 15: A basic version of the Sector model (Source: Wikipedia). ............... 34 Figure 15: A basic version of the Sector model (Source: Wikipedia). ............... 34
Figure 16: Multiple nuclei model (Source: Wikipedia). ..................................... 35 Figure 16: Multiple nuclei model (Source: Wikipedia). ..................................... 35
Figure 17: The urban realms model, which includes a central downtown ......... 36 Figure 17: The urban realms model, which includes a central downtown ......... 36
Figure 18: The selected study area where the OSM data are used ..................... 39 Figure 18: The selected study area where the OSM data are used ..................... 39
Figure 19: Study area of Wuhan city, China where FCD data is applied. .......... 40 Figure 19: Study area of Wuhan city, China where FCD data is applied. .......... 40
Figure 20: Pseudo codes and example of extracted highway ............................. 43 Figure 20: Pseudo codes and example of extracted highway ............................. 43
Figure 21: Clip overlapped highway. .................................................................. 45 Figure 21: Clip overlapped highway. .................................................................. 45
Figure 22: Highway intersection. ........................................................................ 46 Figure 22: Highway intersection. ........................................................................ 46
Figure 23: Data model representation of FCD .................................................... 47 Figure 23: Data model representation of FCD .................................................... 47
Figure 24: Flow chart to extract highway from OSM data ................................. 48 Figure 24: Flow chart to extract highway from OSM data ................................. 48
Figure 25: Flow chart of topological processing highway to road segments ..... 49 Figure 25: Flow chart of topological processing highway to road segments ..... 49
Figure 26: Logical data model for processing data. ............................................ 50 Figure 26: Logical data model for processing data. ............................................ 50
Figure 27: Generated physical data model for OSM data................................... 51 Figure 27: Generated physical data model for OSM data................................... 51
ix ix
Figure 28: Generated physical data model for floating car data
Figure 29: Schematic overall structure for this thesis
Figure 30: Example of normal distribution and heavy-tailed distribution
Figure 31: Flow chart of the detection of heavy-tailed distributions
Figure 32: The rank-size plot of block areas in Texas
Figure 33: The flow chart of generation of axial lines from algorithmic view
Figure 34: Comparison between urban and rural blocks
Figure 35: Small spaces perceived by people along the route
Figure 36: Shortest and the fewest-turn routes on grid road network
Figure 37: Illustration of generating new axial lines based on natural street
Figure 38: Conceptual model of the three-step method
Figure 39: The basic view of the method by order
Figure 40: Comparison between traffic jam and hotspot
Figure 41: Step-by-step data processing solution
List of tables
Table 1: Types of highways and the count
Table 2: Four levels of natural cities in Germany according to the city size
Table 3: The percentages in the head and tail of all blocks for Texas
Acknowledgements
This doctoral dissertation marks the end of the journey of my PhD study. I still remember
the first time I contacted my primary supervisor, Professor Bin Jiang. It
was four years ago, but it feels as if it happened only yesterday. Looking back at the
past years of study, I want to express my thanks to everybody who has accompanied me
along the way or supported me at any stage.
I want to first thank my primary supervisor, Professor Bin Jiang. He accepted
and registered me as a PhD candidate at the Royal Institute of Technology when I
worked up the courage to pursue PhD study after years of work in China. The
subsequent experience of studying under him gave me a valuable chance to be
exposed to the cutting edge of geographic information systems and science. His
enthusiasm and concentration set me a good example of the real spirit of research.
I would like to thank my assistant supervisor, Professor Yifang Ban, for her
guidance, support and encouragement during the PhD study. I am grateful for her
important comments and suggestions on the thesis. I also wish to thank
Associate Professor Hans Hauska for his comments and suggestions on the
thesis. In addition, I appreciate the financial support from the Hägerstrand project
entitled “GIS-based mobility information for sustainable urban planning and
design” and the Lundbergs scholarship.
I owe many thanks to the colleagues at the University of Gävle and the Royal Institute of
Technology who helped me during my study in Sweden. In particular, I would like
to thank my colleagues Tao Jia and Yingying Duan for the time we had together
and the discussions on this research work.
Lastly, I want to dedicate this dissertation to my parents and share it with my
wife Hui Wang. They hold different meanings in my life. If I had not met my
wife, I would be a very different person today. Special thanks go to her for
taking care of me all the time.
Xintao Liu

Gävle, December 2011
1. Introduction
1.1. Background
Geographic space refers to the continuous and large-scale space that covers the
earth’s surface and in which human activities occur. Its spatial arrangements as
well as its configuration inevitably affect human activities, people’s lives and
urban environments. Today, people and space have become more closely related
to each other because human mobility has increased more than ever before.
Meanwhile, it has become feasible and, in fact, easier to collect large-scale mobility
data due to advances in information technology, particularly
location-aware devices such as GPS and cell phones. Consequently, research with the
aim of understanding geographic space and its regularities has become a very
active topic and is essential for applications in various fields, especially urban
studies.
With the evolution from geographic information system (GIS) to geographic
information science (GIScience), new data sets, theories and technologies have
been developed and dedicated to the study of geographic space, directly and
indirectly. The study of geographic space has drawn great attention from both
the research and industrial communities. For instance, space syntax (Hillier 1996
and 1997) decomposes a continuous and large-scale geographic space into
connected axial lines covering an entire space, and then the parameter metrics
are calculated based on graph theory to evaluate differences between small parts
of geographic space that are represented by axial lines. Applying space syntax as
a tool, the Space Syntax Limited Company in London provides strategic and
evidence-based consulting services in economics, planning, design, transport
and property development.
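As a rough illustration of the graph-based view described above, the following minimal sketch treats each axial line as a graph node and each intersection between two lines as an edge, and then computes two measures commonly used in space syntax: connectivity (the number of lines a line intersects) and mean depth (the average number of steps from one line to all others). The function names and the toy input are assumptions made for this example only; they are not taken from the thesis or from any space syntax software.

```python
from collections import defaultdict, deque


def build_axial_graph(intersections):
    """Build an undirected graph: nodes are axial lines, edges are intersections."""
    graph = defaultdict(set)
    for line_a, line_b in intersections:
        graph[line_a].add(line_b)
        graph[line_b].add(line_a)
    return graph


def connectivity(graph):
    """Connectivity of each axial line = number of lines it directly intersects."""
    return {line: len(neighbours) for line, neighbours in graph.items()}


def mean_depth(graph, source):
    """Average topological distance (in steps) from `source` to every reachable line."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    others = [d for line, d in depth.items() if line != source]
    return sum(others) / len(others) if others else 0.0


# Hypothetical toy network: line B crosses lines A, C and D.
toy_intersections = [("A", "B"), ("B", "C"), ("B", "D")]
toy_graph = build_axial_graph(toy_intersections)
```

In this toy network, line B has the highest connectivity and the lowest mean depth, which is the kind of difference between parts of a space that such metrics are meant to expose.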
Nevertheless, as a complex system, geographic space has an inherently high degree
of complexity. Due to the limitations of the collection of geographic data,
technologies and theories, the study of geographic space is restricted to
some extent. For instance, from the technical and theoretical perspectives,
geographic representation is the most important issue in the study of geographic
space. It represents or partitions large geographic space into small pieces, such
as vector and raster data, in conventional spatial analysis. However, it is hard to
rely only on conventional geographic representations to explore the
diverse geographic phenomena with the major advances in GIScience and the
data explosion. Moreover, the scaling properties of geographic representations
are now being re-examined and cast in new light from other fields, such as
statistics and mathematics.
The rest of this section first presents a brief review of GIScience as the context
of this thesis and then describes the concept of geographic space and the
state-of-the-art in geographic representations. The section then turns to scale as
well as the scaling property in geography.
1.1.1. Geographic information science
To some extent, geographic information science (GIScience) evolved from the
conventional geographic information system (GIS), which is a computer-based
system designed to capture, store and analyze spatially referenced data. To
provide a better understanding of GIScience as the context of this thesis, this
section first provides a brief review of the definition as well as the evolution of
conventional GIS to GIScience and then describes how geographic
representation and space are important to GIScience and why they have been
selected as the topics of this thesis.
The first fully operational GIS (Canada Geographic Information System, CGIS)
was conceived and developed in the 1960s by Roger Tomlinson, who also coined the
term GIS. Definitions of GIS have been offered by many people during its
development. For example, Tomlin (1990) defined GIS as “a configuration of
computer hardware and software specifically designed for the acquisition,
maintenance, and use of cartographic data”. The definition given by Star and
Estes (1990) was “a GIS is both a database system with specific capabilities for
spatially referenced data, as well [as] a set of operations for working with data”.
The Environmental Systems Research Institute (ESRI, 1990) defined GIS as “an
organized collection of computer hardware, software, geographic data, and
personnel designed to efficiently capture, store, update, manipulate, analyze, and
display all forms of geographically referenced information”. Practitioners also
believe that the people who operate GIS and the data should be regarded as part
of GIS. From these definitions, we can see that spatial references and data
analysis are emphasized as the essential features of GIS. Additionally, it can be
noted that the definitions of GIS place much emphasis on computer hardware
and software. This is because the computing, storage and visual display
capabilities of computers were quite expensive and limited at the early
development stage of GIS.
With the great advances in computer science and other information technology
(e.g., the Internet) in recent years, conventional GIS has also experienced rapid
growth from a desktop expert system (large or small but complete) to various
GIS-based applications, such as web mapping and online navigation services.
Based on the concept of cloud computing technology, the functions of GIS
have become more distributed and can be accessed from any computer terminal without
installing GIS software. GIS has become ubiquitous and increasingly popular.
Despite this, GIS is still criticized for being technology driven, i.e., “a
technology in search of applications” (Goodchild 1992). GIS itself is focused on
functionalities for spatial analysis and decision making. In a sense, GIS is only a
spatial decision-making support system and a tool for science. Some people
even believe that GIS seems to be drowning in a sea of technology and that it
will finally become a common technique. However, a paradigm shift from
conventional GIS to geographic information science (GIScience) has
fundamentally changed the face of GIS in the last two decades.
GIScience is the science behind the GIS software technology, i.e., the science of
GIS. GIScience examines and casts new light on many of the most fundamental
themes of conventional GIS, while incorporating new advances in and
overlapping with other fields, such as cognitive and information science,
computer science, statistics, mathematics and others (Mark 2000). Nevertheless,
some fundamental laws of geography still remain applicable. For example,
Miller (2004) noted that the first law of geography formulated by Tobler (1970), i.e.,
“everything is related to everything else, but near things are more related than
distant things”, is useful to guide geographic research in the future. The
development of GIScience can be dated back to the 1980s. The first step was to
seek funding devoted to GIS from the US National Science Foundation (NSF)
(Mark 2003). After that, many organizations were formed, such as the National
Center for Geographic Information and Analysis (NCGIA, founded in 1988) and
the University Consortium for Geographic Information Science (UCGIS,
founded in 1995). Many reports have been dedicated to describing GIScience
(Goodchild 1992, Goodchild 2004, Longley et al. 2001, Mark 2000 and 2003).
Although a clear definition of GIScience is not given by these reports or
organizations, the components of GIScience are described. Mark (2003)
reviewed these components (topics) and summarized the ones that characterize
the nature of GIScience. Ontology and representation of geographic phenomena
are the most important among the topics.
Meanwhile, on the research agenda of the University Consortium for Geographic
Information Science (UCGIS 2004), the scale in geography and geographic
representation are two important topics. Geographic representation is the
description of the phenomena of the real world and their relationships from a
philosophical point of view. It is vital to modeling our world and is a key to the
discipline of GIScience. The scale of geography remains “a broad and difficult
topic to tackle” (UCGIS 2004).
1.1.2. Geographic space and its representations
Geographic space is one of the most important geographic phenomena, both in
conventional GIS and GIScience (e.g., Wright and Wang 2011) and, thus, is
selected as the research topic of this thesis. As mentioned previously,
geographic space is the large-scale and continuous space that covers the earth
and in which human activities are involved. It is too large to be perceived and
studied from a single viewpoint; rather, it must be learned via symbolic
representations (Montello 1993).
As mentioned above, to study the large-scale geographic space, we must
represent or partition it into small pieces, which constitute what we call
geographic representations, such as vector and raster data in conventional spatial
analysis. In a sense, geographic representation is a type of ontology, through
which people specify and conceptualize the real world. It is like a bridge
between the real world and our research models. The representations possess
rich partial knowledge of space (Kuipers 1978). How we represent the real
world fundamentally affects the way we perform the corresponding
interpretation and analysis. In conventional GIS, there are three basic geographic
representations: “as a collection of discrete features in vector format, as a grid of
cells with spectral or attribute data, or as a set of triangulated points modeling a
surface” (Zeiler 2000). This set of representations is adopted by most
mainstream GIS software, including that of the GIS giant ESRI (Environmental Systems
Research Institute; Figure 1).
Figure 1: Three basic geographic representations: (a) vector data, (b) raster data
and (c) triangulated data (Source: Zeiler 2000, p. 66)
However, current representations are somewhat rooted in a two-dimensional and
geometrically oriented map paradigm and, thus, inherit limitations in
representing some geographic phenomena, such as volumetric and temporal
objects, heterogeneous types of data from an integrated global perspective and at
multiple scales, dynamic geographic processes, and others (Yuan et al. 2004).
To some extent, conventional spatial analysis simplifies geographic
representations, whereas the major advances in GIScience and other fields
strongly suggest that re-examining geographic representations is worthwhile
(Miller 2000). In turn, an increasing number of efforts have been made to extend
geographic representations by many researchers (e.g., Reitz 2010) from different
perspectives to provide new representational theories. These efforts contributed
significantly to the extension of geographic representations. For instance, the
concept of time-geography developed by Hägerstrand (1970) and the research
on geospatial lifelines (e.g., Mark and Egenhofer 1998) are very successful and
inspiring. Nevertheless, relative to these great demands, progress has been
limited, and some proposed methods are not yet feasible.
In a sense, geographic representation is a process of selecting and generalizing
geographic information from our real world to some appropriate levels, which
also depend on the context of the study and application purposes. Technically
speaking, any of the geographic representations must be implemented as binary
bytes in a computer system, e.g., GIS software and some research prototypes. In
contrast, based on the same type of GIS techniques, different representations can
be generated in terms of the geographic implications. For example, the adjacent
road segments in a road network with the smallest deflection angles can be
connected to form a stroke or a natural street/road (Jiang et al. 2008). These
natural streets possess geographic meanings that are completely different from
the road segments. From this perspective, this thesis presents several types of
geographic representations under different contexts in terms of research and
urban applications.
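The idea of joining adjacent road segments by smallest deflection angle can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure of Jiang et al. (2008): segments are assumed to be straight lines given as pairs of (x, y) endpoints, the helper names are hypothetical, and the 45-degree cut-off is an arbitrary value chosen for the example.

```python
import math


def deflection_angle(seg_a, seg_b):
    """Deflection angle in degrees between two straight segments sharing one endpoint.

    0 means seg_b continues seg_a in a straight line; 180 means a full reversal.
    """
    shared = set(seg_a) & set(seg_b)
    if len(shared) != 1:
        raise ValueError("segments must share exactly one endpoint")
    node = shared.pop()
    # Far end of each segment, i.e. the endpoint that is not the shared node.
    start = seg_a[0] if seg_a[1] == node else seg_a[1]
    end = seg_b[1] if seg_b[0] == node else seg_b[0]
    v_in = (node[0] - start[0], node[1] - start[1])   # direction arriving at the node
    v_out = (end[0] - node[0], end[1] - node[1])      # direction leaving the node
    dot = v_in[0] * v_out[0] + v_in[1] * v_out[1]
    cross = v_in[0] * v_out[1] - v_in[1] * v_out[0]
    return abs(math.degrees(math.atan2(cross, dot)))


def best_continuation(segment, candidates, threshold_deg=45.0):
    """Pick the candidate with the smallest deflection angle, if within the threshold.

    The threshold is an assumed parameter for this sketch; returns None if no
    candidate qualifies.
    """
    best, best_angle = None, threshold_deg
    for candidate in candidates:
        angle = deflection_angle(segment, candidate)
        if angle <= best_angle:
            best, best_angle = candidate, angle
    return best


# Hypothetical junction at (1, 0): one sharp turn and one nearly straight continuation.
incoming = ((0.0, 0.0), (1.0, 0.0))
turn = ((1.0, 0.0), (1.0, 1.0))
straight_on = ((1.0, 0.0), (2.0, 0.1))
```

Applying `best_continuation` repeatedly from segment to segment would chain segments into one longer street, which is why the resulting unit carries a different geographic meaning from the individual segments.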
To obtain the representation of real geographic space, we must rely on available
geographic datasets. Due to significant advances in information technology, e.g.,
the Internet, mobile technologies and the global positioning system
(GPS), the collection and dissemination of geographic data has undergone great
changes. A brand new geographic data acquisition method has emerged that is
different from traditional surveying and remote-sensing (RS) methods. Many
ordinary individuals continue to create and update geographic information
voluntarily using data from portable GPS devices, aerial photography and other
free sources or simply from local knowledge with the support of Web 2.0
technologies. This phenomenon has been identified as volunteered geographic
information (VGI) by Michael Goodchild (2007a), and this type of geospatial
data is characterized by free access, fast updates and rich
attributes. With the proliferation of VGI and other data sets, it is difficult to
analyze geographic phenomena based only on basic representations. Thus,
investigating the scaling of geographic space from the perspectives of geographic
representations and their applications to urban studies in the context of
GIScience forms the primary motivation of this thesis.
5 5
1.1.3. The scale in geography
Scale holds different meanings in different geographic contexts and involves
important geospatial issues. From the cartographic point of view, scale is the
ratio of a distance on a map to the corresponding distance on the Earth, so that a
smaller-scale map covers a larger area and shows less detail of that area, and
vice versa. Montello (2001) noted that scale concerns space as well as temporal
and thematic domains in geography. In the same study, Montello summarized
two types of scale other than cartographic scale: analysis scale and
phenomenon scale: “Analysis scale refers to the size of the unit at which some
problem is analyzed, such as at the county or state level. Phenomenon scale
refers to the size at which human or physical earth structures or processes exist,
regardless of how they are studied or represented”. These three scales are, in
fact, interrelated.
This thesis focuses on the latter two types of scales in geography and, if
possible, does not differentiate between the types of scales because both
phenomenon scale and analysis scale place emphasis on the size of the
geographic entities at different levels and from different perspectives. As one of
the important topics on the agenda of UCGIS (2004), scale has drawn much
attention. In fact, in other fields (e.g., statistical physics and biology), the term
scale also indicates the size, shape, extent and function of the corresponding
processes (originally in [Church and Mark 1980], see also [Mark 2003]).
Recently, studies on the size distribution of geographic entities (e.g., cities and
roads) show that the sizes follow heavy-tailed distributions and that some of the
entities possess the scaling property. With the emergence of complex network
theory and the small world model (Barabási 1999, Watts and Strogatz 1998),
several new methods have appeared (Clauset et al. 2009, Newman 2005), and
studies (e.g., Lämmer et al. 2006) on the size of geographic entities have made
profound contributions. However, these findings have not been summarized or
treated as a universal regularity in geographic space, nor have they been widely
applied.
The challenges facing studies of scale in geography include the verification and
summary of scaling as a universal regularity and the provision of a quantitative
way to apply the rules in urban studies. In this thesis, the principle of scaling of
geographic space and the head/tail division rule are presented as a universal law.
Furthermore, different types of geographic representations are designed and
implemented in various urban studies.
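
As a rough illustration of how such a rule can be applied quantitatively, the sketch below splits a set of sizes into a small "head" and a large "tail". It assumes that the head/tail division rule uses the arithmetic mean of the values as the dividing point, as the rule is commonly stated in the related literature; the function name head_tail_split and the sample values are hypothetical and do not come from the thesis data.

```python
from statistics import mean


def head_tail_split(values):
    """Split values at their arithmetic mean (assumed dividing point).

    Returns (head, tail): values above the mean and values at or below it.
    """
    m = mean(values)
    head = [v for v in values if v > m]
    tail = [v for v in values if v <= m]
    return head, tail


# Hypothetical sizes with a heavy tail: many small values, very few large ones.
sizes = [1, 1, 1, 2, 2, 3, 5, 8, 40, 100]
head, tail = head_tail_split(sizes)

print("mean:", mean(sizes))
print("head (few large):", head)
print("tail (many small):", tail)
```

For heavy-tailed data the head is a small minority of the values, which is what makes the mean a usable threshold; for data that are closer to normally distributed, the two parts would be of similar size.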
1.2. Research objectives
The overall objective of this thesis is to examine the phenomenon of the scaling
of geographic space and to propose and verify the principle of scaling of
geographic space at the local (city) and global (country) levels. Based on the
proposed principle, a quantitative method will be presented for application to
urban studies. In addition, appropriate geographic representations
and entities should be selected and generated. Combining the scaling of
geographic space, the head/tail division rule and geographic representations,
this thesis seeks applications in the context of real urban environments.
To achieve the objectives, there are three basic research issues to be addressed:
(1) How to select and generate appropriate geographic representations?
(2) How to verify the principle of scaling of geographic space and provide a
quantitative approach?
(3) How to apply the principle and rule in urban studies and further assess
these applications?
According to the research issues, the corresponding research aims are threefold:
(1) Study the new theories, methods and related work of GIScience and then
analyze the characteristics of VGI (i.e., OSM) data and other data sets to
generate suitable geographic representations.
(2) Detect the heavy-tailed distributions of geographic representations and
further obtain their hierarchical structures, based on the scaling principle
and the head/tail division rule.
(3) Explore the geographic implications behind the hierarchical structures of
the generated geographic representations and then apply them to urban
studies to solve practical urban problems, such as those related to urban
morphology, urban sprawl and urban mobility patterns.
This study has been performed as part of the research project from the Swedish
Research Council FORMAS entitled “Hägerstrand: GIS-based mobility
information for sustainable urban planning and design” (Jiang 2006). This
research is expected to contribute a new approach to urban
studies from a geospatial perspective, using data sets of hundreds of gigabytes in a
data-intensive geo-computation process for a range of applications in other
fields.
1.3. Thesis outline
The six chapters of this thesis are structured as follows. Chapter 1 briefly
introduces the background, research objectives and outline of this thesis.
Chapter 2 reviews important related work and theories, such as preliminaries,
data sets and applications, which are helpful for understanding the entire study.
Chapter 3 describes the study area and experiments, with particular emphasis on
the processing of the major data sets of this thesis. In particular, the
implementation of processing road networks from OSM data is described in
detail. Thereafter, Chapter 4 presents methodologies, such as those used to
identify the phenomenon of the scaling of geographic space and to reveal
hierarchical structures indicating geographic implications. Chapter 5
summarizes the implementation details and the main results of selected papers
and discusses their contributions to this thesis. Finally, Chapter 6 presents the
conclusions of this thesis and then describes the challenges that lie ahead and
opportunities for further study.
This thesis is based on the papers listed below, which are referred to in the text
by the corresponding Roman numerals:
I: Liu X. and Jiang B. (2011), Defining and generating axial lines from street
center lines for better understanding of urban morphologies, International
Journal of Geographical Information Science, Accepted.
II: Liu X. and Jiang B. (2011), A novel approach to the identification of urban
sprawl patches based on the scaling of geographic space, International Journal
of Geomatics and Geosciences, 2(2), 415-429.
III: Liu X. and Ban Y. (2012), Uncovering urban mobility patterns with massive
floating car data, Submitted to ISPRS Journal of Photogrammetry and Remote
Sensing.
IV: Liu X., Jiang B. and Ban Y. (2012), An across-country comparison of
hierarchical spatial structures of cities, Submitted to Computers, Environment
and Urban Systems.
V: Jiang B. and Liu X. (2011), Computing the fewest-turn map directions based
on the connectivity of natural roads, International Journal of Geographical
Information Science, 25(7), 1069-1082 (with equivalent contribution).
VI: Jiang B. and Liu X. (2011), Scaling of geographic space from the
perspective of city and field blocks and using volunteered geographic
information, International Journal of Geographical Information Science,
Accepted.
VII: Jiang B. and Liu X. (2010), Automatic generation of the axial lines of
urban environments to capture what we perceive, International Journal of
Geographical Information Science, 24(4), 545-558.
Specifically, paper VII uses automatically generated axial lines to represent the
geographic space and verifies that there are many more trivial axial lines than
important ones through visualized patterns and histogram statistics. Paper VI
characterizes the scaling property of geographic space with heavy-tailed
distributions in a quantitative manner, from the perspectives of two geographic
representations: blocks and natural cities. Moreover, the head/tail division rule is
described. That is, papers VI and VII verify the principle of scaling of geographic
space and the head/tail division rule (Objective 2 and part of Objective 1), which also
provides a theoretical framework for this thesis.
Based on the scaling property of geographic space and the head/tail division rule
stated above, papers I, II, III, IV and V report on different experiments using
various geographic representations, such as blocks, natural streets, routing turns
and spatio-temporal trajectories (Objectives 1 and 3). Paper V computes the
fewest-turn and shortest route (map direction) based on the concept of natural
roads. In such a route, there are far more small spaces than road
segments and far more road segments than turns,
which indicates the scaling property and hierarchical structure of the geographic
space at the levels of perception and cognition. In turn, the cognitive burden of
the route is reduced efficiently and effectively. Paper I redefines the axial lines
as the fewest number of straight lines that are mutually intersected along
individual natural streets. According to the new definition, this paper develops a
model to generate the new axial lines, where the parameters are naturally
obtained based on the head/tail division rule as well as the scaling of geographic
space. The generated new axial lines provide a better understanding of the urban
morphology in terms of the scaling pattern. Paper II develops a statistical
approach to identify the urban sprawl patches based on the scaling properties of
block sizes and morphologies. Similarly, the thresholds of the variables that group
the blocks into sprawling and non-sprawling ones are also calculated by the
head/tail division rule. Paper III adopts a massive data set of GPS points
collected from taxicabs as a geographic representation. These GPS points show
the scaling property, and traffic jams and hot spots are generated and
differentiated using the head/tail division rule. Paper IV defines the hierarchical
spatial structures from the perspective of blocks and natural cities and then
describes an across-country comparison from both the geospatial and economic
viewpoints.
The author of this doctoral dissertation is responsible for the writing of all first-
author papers under the supervision of the advisor. In addition, the author
contributed equally to one co-first author IJGIS paper in which his name is listed
as the second author. He is responsible for all the coding and the implementation
of all listed papers.
2. Literature review
The study of geographic space has drawn much attention from both the research
and industrial communities, and the idea of the scaling property has been
applied to various urban studies. With the emergence of GIScience and VGI
data, there is an increasing demand for studies of the scaling property and its
applications. This chapter aims to provide a thorough review of the state of
the art on the scaling of geographic space, applications to urban studies and the
phenomena of VGI, including some related concepts.
2.1. The principle of scaling of geographic space
2.1.1. Preliminaries
To provide easier reading and better understanding of this thesis, this section
presents a brief introduction to some related important concepts, theories,
algorithms and notations that are used but cannot be covered in later chapters of
the listed papers due to space limitations. The part concerning the algorithms is
especially important for those who are interested in the implementations of the
ideas in this thesis.
Graph and related concepts
The graph is one of the most used concepts in this thesis, both in terms of axial
lines in space syntax and other algorithms. Generally, a graph is defined to represent a
set of objects, some of which are connected by links. The objects denote
vertices/nodes, and the connections between the vertices/nodes are called
edges/links. The motivation to use a graph is that a graph is a convenient and
intuitive way to represent the relationship between objects in the real world.
That is, a graph is topologically oriented, in contrast to geometric methods.
In this way, a graph can be treated as a mathematical model with which the
original problems, and even further theoretical ones, can be appropriately
solved and deduced. Graph theory is the study of graphs, and the first theorem
of graph theory is generally considered to be Euler’s solution to the Seven
Bridges of Königsberg problem.
There are many classifications of graphs from different perspectives, e.g. finite
and infinite graphs, weighted and un-weighted graphs, simple and multi graphs
and so on. Mathematically, a graph can be written as G = (V, E), which
consists of nodes V and edges E. For example, axial lines and natural street
networks (Papers I and II) can be represented as a graph where nodes V denote
axial lines or natural streets and edges E are connections between the nodes.
These kinds of graphs are called multi graphs, which means that more than one
edge could exist between two nodes (Harary 1994). For instance, parallel
roads that diverge and join again without side streets give rise to multiple
edges between the same pair of nodes. Simple graph and multi
graph are illustrated in Figure 2, where nodes V are represented by black dots
(v1, v2, v3 and v4), and edges E are represented by curves and straight line segments
(e1, e11, e12, e2, e3, e31, e32, and e4). In Figure 2 (b), there are three edges e12, e1
and e11 between nodes v1 and v2, while there are two edges e31, e3 between v2
and v4. There is a loop edge on node v4.
[Figure 2: panel (a) shows nodes v1 to v4 joined by edges e1 to e4; panel (b) adds edges e11, e12, e31 and e32.]
Figure 2: Illustration of (a) simple graph and (b) multi graph
Below are some frequently used concepts and definitions in this thesis:
• Adjacent nodes and lines: if two nodes are connected by an edge, e.g. (v1, v2)
and (v1, v4), then we say they are adjacent; if two edges share one node,
e.g. (e1, e2) and (e1, e4), then we say they are adjacent.
• Connectivity or degree of a node is the total number of edges incident upon
it. In Figure 2 (a), the degree of v2 is 3, while in (b) it is 6. Node v3 is called a
pendant because it has degree 1. An isolated node has degree 0.
• Path, walk and cycle. Given start node S and destination node D, we define a
path as a sequence of adjacent edges P = (e1, e2…en). A path does not visit
any point more than once. A walk is like a path except that there is no
restriction on the number of times a point can be visited. That is, a path is a
kind of walk. A cycle is like a path except that it starts and ends at the same
point.
• Topological distance: the length of the shortest path in a graph, i.e. the
number of intermediate nodes between node S and destination node D plus
one.
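
The definitions above can be checked on a small example. The sketch below encodes the multi graph of Figure 2 (b) as a plain dictionary and computes node degrees and a topological distance with breadth-first search. The endpoints of e2 and e4, and the identification of e32 as the loop on v4, are inferred from the figure labels and the surrounding text and should be read as assumptions; counting a loop twice in the degree is a common convention that the text does not state. The names EDGES_B, degrees and topological_distance are hypothetical.

```python
from collections import Counter, deque

# Multi graph of Figure 2 (b); edge endpoints are partly inferred (assumption).
EDGES_B = {
    "e1": ("v1", "v2"), "e11": ("v1", "v2"), "e12": ("v1", "v2"),
    "e2": ("v2", "v3"),
    "e3": ("v2", "v4"), "e31": ("v2", "v4"),
    "e32": ("v4", "v4"),  # loop edge on v4
    "e4": ("v1", "v4"),
}


def degrees(edges):
    """Count edge ends per node; a loop contributes 2 (assumed convention)."""
    deg = Counter()
    for u, v in edges.values():
        deg[u] += 1
        deg[v] += 1
    return deg


def topological_distance(edges, start, goal):
    """Number of edges on a shortest path, found by breadth-first search."""
    adjacency = {}
    for u, v in edges.values():
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is not reachable from start


deg = degrees(EDGES_B)
print(deg["v2"], deg["v3"])                          # degree of v2 and of the pendant v3
print(topological_distance(EDGES_B, "v1", "v3"))     # one intermediate node (v2) plus one
```

Parallel edges change the degree of a node but not the topological distance, because the breadth-first search only needs to know which nodes are adjacent.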
Space syntax
Space syntax is a set of theories and tools used for spatial morphological
analysis with particular applications in urban science (Hillier 1996, Hillier 1997,
Hillier and Hanson 1984). Space syntax decomposes a geographic space into small
spaces, and then converts the small spaces into a connectivity graph based on the
spatial intersection relationship. According to graph theory, we can
quantitatively assess each individual part. That is, space syntax is by nature a
network representation of geographic space. In this case, the network is the
graph inferred from geographic representations, i.e. axial lines. Therefore, the axial
line is the fundamental tool. Axial lines are defined as the longest visibility lines
for representing individual linear spaces in urban environments. The least
number of axial lines that cover the free space of an urban environment or the
space between buildings constitute what is often called an axial map.
The most widely used example in space syntax is the town of Gassin (Figure 3). In
Figure 3 (a), the solid red lines are the original axial lines of Gassin. Each
axial line represents an individual space. The lines intersect one another and
together cover the entire free space; the full set of axial lines is called the
axial map. The important feature is that the number of axial lines is the least
possible, which means each axial line is the longest one in the space it
represents. Axial lines are thus one representation of geographic space that
follows particular rules, i.e. they decompose the geographic space in a special
way. We can then convert the axial map into a graph, in which the axial lines are
the nodes and the intersections between them are the links (Figure 3b). This kind
of connectivity graph enables the quantitative computation of space syntax
variables.
Figure 3: Concepts of space syntax: (a) original axial lines of Gassin town and
(b) converted graph of axial lines of Gassin town
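The conversion from an axial map to a connectivity graph can be sketched as follows. This is a minimal illustration, not the tool used in the thesis: the five line segments, their names and the helper functions are hypothetical, and axial lines are assumed to be straight segments given by two endpoints. Two axial lines are linked whenever their segments intersect.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _in_box(p, q, r):
    """True if r lies inside the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b):
    """Standard orientation test for two closed 2D segments."""
    p1, p2 = a
    p3, p4 = b
    o1, o2 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    o3, o4 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _in_box(p1, p2, p3)) or (o2 == 0 and _in_box(p1, p2, p4))
            or (o3 == 0 and _in_box(p3, p4, p1)) or (o4 == 0 and _in_box(p3, p4, p2)))


def axial_graph(lines):
    """Nodes are axial lines; a link joins every pair of intersecting lines."""
    graph = {name: set() for name in lines}
    for u, v in combinations(lines, 2):
        if segments_intersect(lines[u], lines[v]):
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical axial map (not the Gassin data): four lines forming a ring
# and a fifth line that crosses only line "d".
axial_lines = {
    "a": ((0, 0), (10, 0)),
    "b": ((2, -1), (2, 5)),
    "c": ((8, -1), (8, 5)),
    "d": ((1, 4), (9, 4)),
    "e": ((5, 3), (5, 8)),
}
graph = axial_graph(axial_lines)
```

The pairwise test is quadratic in the number of lines, which is acceptable for a small town; a spatial index would be the usual choice for city-scale axial maps.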
Based on the converted graph, we can use the metrics of space syntax, which are
grounded in graph theory, to quantify the individual spaces represented. These
metrics are Connectivity, Control, Mean Depth, Global Integration, Local
Integration, Total Depth and Local Depth. The values of these parameters
quantify the differences between axial lines. In fact, these variables also
appear in general graph analysis. To this point, we have explained how space
syntax represents and quantitatively assesses a geographic space. For more
details, please refer to the literature, e.g. Hillier (1996, 1997) and Jiang (2003).
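Some of these metrics can be sketched on a small connectivity graph. The definitions below are the ones commonly given in the space syntax literature and are assumed, not confirmed, to match those used in this thesis: connectivity is the number of directly linked lines, control is the sum of the reciprocals of the neighbours' connectivity, total depth is the sum of topological distances to all other lines, and mean depth is total depth divided by the number of other lines. The integration measures are omitted because they require a further normalization step. The graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical connectivity graph of five axial lines (node -> linked nodes).
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def connectivity(graph, node):
    """Number of lines directly linked to the given line."""
    return len(graph[node])


def control(graph, node):
    """Sum of 1/connectivity over the neighbours of the given line."""
    return sum(1.0 / len(graph[nb]) for nb in graph[node])


def depths(graph, source):
    """Topological distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def total_depth(graph, node):
    """Sum of topological distances from the given line to all others."""
    return sum(depths(graph, node).values())


def mean_depth(graph, node):
    """Total depth divided by the number of other lines (connected graph assumed)."""
    return total_depth(graph, node) / (len(graph) - 1)


for name in sorted(graph):
    print(name, connectivity(graph, name), round(control(graph, name), 3),
          total_depth(graph, name), mean_depth(graph, name))
```

In this toy graph, line "d" has the highest connectivity and the lowest mean depth, i.e. it is the most directly linked and the topologically closest to all other lines.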
Gestalt principle
Gestalt is a psychological theory of visual perception developed by German
psychologists in the 1920s. The word is German and means “meaningful whole”.
Gestalt theory focuses on the idea of “grouping”, and it follows the basic
principle that the whole is greater than the sum of its parts. It describes how
people apply different principles to organize visual elements into groups or
unified wholes. Specifically, Gestalt theory comprises a series of principles of
perceptual totality, and continuance (continuity) is one of them. Continuity
occurs when the eye is compelled to move through one object and continue to
another object (SFCC 2011). That is, once you begin to look at a composition in a
particular direction, you will continue looking in that direction until you see
something significant (Saw 2000).
Figure 4: Examples of Gestalt principle of continuity
Figure 4 presents two examples of the Gestalt principle of continuity. In Figure
4 (a), the gray part represents buildings or spatial obstacles where people cannot
walk or drive. Correspondingly, the white part is the free continuous space. When
people look at the picture, their eyes follow the smooth boundaries and thus
form a curve. In this way, people can pick out the separate roads from the entire
free space. In Figure 4 (b), the red points are the recorded locations of a moving
taxicab. When people look at these discrete points, they perceive the trajectory
of the taxicab even though the points are not connected. This follows from the
principle of continuity, which predicts a preference for continuous figures.
Natural roads/Streets
Natural roads/streets are defined as joined road segments which have the
appropriate deflection angles based on the Gestalt principle of good continuity,
and they are self-organized in nature (Jiang et al. 2008). Three principles for
generating natural roads, i.e. every-best-fit, self-best-fit and self-fit, are
well documented in the same study. These three principles are defined from the
algorithm’s point of view: given a threshold angle, the join process iteratively
compares the current deflection angle with the threshold.
The deflection angle is the angle between the forward tangent of the first
segment at the junction point and the back tangent of the second segment at the
junction point. As shown in Figure 5, there are three road segments in red (1),
blue (2) and green (3). The three angles α, β and σ in corresponding colors are
the deflection angles of each segment to the others.
Figure 5: Deflection angles between road segments
In Figure 5, we can see that, according to the Gestalt principle of good
continuity, the red road segment will connect with the green one, rather than the
blue one, to form a natural road. In the process of organizing natural roads, if
more than one deflection angle is not greater than the threshold, there are three
ways to choose, i.e. the three join principles. In fact, according to the
concept of continuity explained above, when starting to look at a segment in a
road network from any of its endpoints, people will tend to select the best-fit
segment, rather than a worse one, to form a natural street. This process
coincides with the every-best-fit principle, which is adopted in this thesis to
form natural roads with a threshold angle of 45 degrees. The natural roads
generated in this way are thus more “natural”. As evidence, they match named
roads fairly well (Jiang et al. 2008), and named roads are themselves a kind of
“natural” road.
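The deflection angle and a best-fit join at a single junction can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Jiang et al. (2008): segments are assumed straight and are described only by their far endpoints, the coordinates are hypothetical stand-ins for the layout of Figure 5, and the pairing rule (sort all pairs at the junction by deflection angle and join the smallest ones first, up to the 45 degree threshold) is one plausible reading of every-best-fit.

```python
import math
from itertools import combinations


def deflection_angle(junction, end1, end2):
    """Deflection in degrees when moving end1 -> junction -> end2 (0 = straight on)."""
    v_in = (junction[0] - end1[0], junction[1] - end1[1])
    v_out = (end2[0] - junction[0], end2[1] - junction[1])
    dot = v_in[0] * v_out[0] + v_in[1] * v_out[1]
    cross = v_in[0] * v_out[1] - v_in[1] * v_out[0]
    return abs(math.degrees(math.atan2(cross, dot)))


def best_fit_pairs(junction, ends, threshold=45.0):
    """Greedily join segment pairs at one junction, smallest deflection first.

    ends maps a segment name to the endpoint that is NOT the junction.
    Each segment is joined to at most one other segment.
    """
    candidates = sorted(
        (deflection_angle(junction, ends[a], ends[b]), a, b)
        for a, b in combinations(sorted(ends), 2)
    )
    used, pairs = set(), []
    for angle, a, b in candidates:
        if angle > threshold:
            break  # all remaining pairs deflect even more
        if a in used or b in used:
            continue
        pairs.append((a, b))
        used.update((a, b))
    return pairs


# Hypothetical coordinates: red and green are almost collinear, blue branches off.
junction = (0.0, 0.0)
ends = {"red": (-10.0, 0.0), "blue": (5.0, 8.0), "green": (10.0, 1.0)}
pairs = best_fit_pairs(junction, ends)
```

With these coordinates the red and green segments deflect by about 6 degrees and are joined, while both pairs involving the blue segment exceed the threshold and are left unjoined.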
Algorithmic strategy
In the age of data-intensive computation, computing efficiency is essential to the
success of a project. In this thesis, many algorithms are developed to
traverse a large graph or network and calculate designed parameters in different
applications. Several strategies are adopted to improve computing efficiency,
such as breadth-first search (BFS), depth-first search (DFS) and binary
search (Cormen et al. 2001), through which the searching efficiency can
be dramatically improved.
1) Breadth-first search

As a graph search algorithm, breadth-first search (BFS) traverses a graph from
a given source node. BFS is very similar to Dijkstra’s algorithm in
nature, but it is simpler and is suitable for graphs in which every edge has the
same value, i.e. a connectivity graph. It begins at a source node,
searches all the adjacent nodes and adds them to the searching list;
then each node in the searching list is removed in turn, and its adjacent
nodes that have not yet been searched are added to the searching list. This
process is repeated until the target node is reached. Unlike the A* algorithm
(first described by Hart et al., 1968), BFS does not use any heuristic.
During the process, BFS explores the nodes of a graph step by step until it
finds the target node, and the step value increases by one at each step. We can
assign the step value to the newly found nodes at each step. Thus, the nodes of
the graph acquire a level-like structure based on their step values from the
source node. The step value is the length of the shortest path between the
source node and the other node, i.e. the topological distance. In Figure 6 (a),
there is no level structure because there is no source node. If we take node A
as the source node and apply BFS to the graph, we obtain the level structure
shown in Figure 6 (b). There are four levels: source node A is at level 0; nodes
B, C, D and E are at level 1; nodes F, G and H are at level 2; nodes I and J are
at level 3.
Figure 6: Graph (a) processed by the BFS algorithm into a level-structured graph (b)
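The level structure described above can be reproduced with a short BFS sketch. The edge list is hypothetical: the exact links of Figure 6 are not recoverable from the text, so the edges below are merely chosen to be consistent with the levels and the example paths stated in this section.

```python
from collections import deque

# Hypothetical edges consistent with the levels described for Figure 6 (b).
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "F"), ("C", "G"), ("D", "H"),
    ("E", "H"), ("F", "I"), ("G", "I"), ("H", "J"),
]


def build_graph(edges):
    """Undirected adjacency sets from a list of node pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_levels(graph, source):
    """Step value (topological distance) of every node reachable from source."""
    levels = {source: 0}
    queue = deque([source])          # the "searching list", processed first-in first-out
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in levels:  # not searched yet
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels


graph = build_graph(EDGES)
levels = bfs_levels(graph, "A")
```

Because every node and every edge is handled once, the whole level structure is obtained in time proportional to the size of the graph, which is what makes BFS attractive for large connectivity graphs.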
2) Depth-first search
Depth-first search (DFS) is a general technique which can be used to search a
tree or graph. It is often used to traverse all the possible paths between source
and target nodes. Like BFS, it starts from a source node, and it generally runs
in a recursive way. When the target node is found or there is no unvisited
adjacent node, it backtracks to the previous level of the recursion stack. During
the search each node is in one of two states: visited or unvisited. According to
graph theory, a path visits any node only once. If the non-recursive method is
adopted, the newly found nodes are added to the exploring list and removed when
selected.
The paths traversed using DFS are not necessarily the shortest paths. However, it
is easy to modify the DFS algorithm to compute the shortest paths
in a graph. First, we use BFS to construct the level structure of the graph.
After that, we use DFS to traverse the paths between source and target nodes.
When we search the adjacent nodes, we only select nodes that have a greater step
value. In this way, we get the shortest paths and the corresponding topological
distance. For example, in Figure 6 (b), let’s assume that node A is the source
and node I is the target. Under this modification, path (A, B, C, G and I) is
excluded because B and C are at the same level, while path (A, B, F and I) is a
shortest path.
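The level-guided DFS can be sketched as follows. As before, the edge list is a hypothetical graph consistent with the levels described for Figure 6 (b), and the function names are illustrative. The search only steps to neighbours whose step value is exactly one greater than that of the current node, so every path it completes is a shortest path.

```python
from collections import deque

# Hypothetical edges consistent with the levels described for Figure 6 (b).
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "F"), ("C", "G"), ("D", "H"),
    ("E", "H"), ("F", "I"), ("G", "I"), ("H", "J"),
]


def build_graph(edges):
    """Undirected adjacency sets from a list of node pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_levels(graph, source):
    """Step value of every node reachable from source."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels


def shortest_paths(graph, source, target):
    """All shortest paths from source to target via level-guided DFS."""
    levels = bfs_levels(graph, source)
    if target not in levels:
        return []                      # target is unreachable
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(list(path))
            return
        for neighbour in sorted(graph[node]):
            # Only move one level further from the source, never past the target's level.
            if levels[neighbour] == levels[node] + 1 and levels[neighbour] <= levels[target]:
                path.append(neighbour)
                dfs(neighbour, path)
                path.pop()             # backtrack

    dfs(source, [source])
    return paths


graph = build_graph(EDGES)
paths = shortest_paths(graph, "A", "I")
```

In this hypothetical graph the search returns two shortest paths of topological distance 3, (A, B, F, I) and (A, C, G, I); the longer route (A, B, C, G, I) is never explored because B and C share the same level.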
2.1.2. VGI and OSM
This section first presents an overview of the phenomenon of VGI in terms of
its development, major characteristics and the technical framework behind it.
Then we present the basic content of the OSM project and how to contribute to
it, i.e. what OSM data is, how to access it and its data quality. It can be seen
that VGI data sets play an important role in, and fit well into, this thesis.
2.1.3. Understanding VGI
Goodchild (2007a) coined the term volunteered geographic information (VGI)
and defined VGI as the phenomenon that numerous people create, assemble and
disseminate geographic information voluntarily with the support of web 2.0
technologies. In the same study, Goodchild also pointed out that VGI has
emerged against the broader background of the more general Web phenomenon of
user-generated content (UGC) driven by web 2.0 technologies. VGI is a special
case of UGC because, unlike other kinds of UGC, it is geographically related.
VGI is a new, bottom-up way of collecting geographic data,
which is totally different from the conventional top-down mode. Extensive
research has been carried out to examine this phenomenon from different
perspectives such as theory and application (e.g. Craglia 2007, Elwood 2008,
Flanagin and Metzger 2008, Goodchild 2007b, Grossner and Glennon 2007,
Gupta 2007, Sui 2008). Among these studies, several typical questions such as
“how good is the data”, “why do people do this”, “what are the driving
technologies” and “what kind of societal impacts it brings” have attracted
considerable attention.
These questions can be considered as some of the important issues of VGI. First,
with regard to the data quality of VGI, OpenStreetMap (OSM) is an excellent
example. OSM is one of the most successful VGI examples, alongside Google
My Maps, Wikimapia and Flickr. Its use shows that OSM data can satisfy various
kinds of applications in many fields (see section 3.2). Second, the motivations
that drive people to get involved in this phenomenon are very complex. The
people who are involved in VGI can be anyone. Most are ordinary, untrained
amateurs compared with the professionals working for conventional
mapping agencies. Some people believe that a good map would change the world.
The number of such amateurs is much greater than that of professionals, and
such a huge number of people implies diverse motivations. Third, as stated
above, web 2.0 is one of the key driving technologies. In addition, the increasing
availability of mobile devices equipped with GPS also plays an important role
(this is discussed in section 2.1.4). Last, the societal impacts of VGI are
manifold and profound. The emergence of VGI fundamentally changed many aspects of
conventional GIS. To date, tens of hundreds of gigabytes of geographic
information are freely accessible, but privacy remains a disputed issue.
The latter is not discussed in this thesis.
To understand the difference in data collection between VGI and conventional
GIS, we first take a look at how each is carried out. Generally, data collection
in conventional GIS is done via traditional surveying and mapping missions,
which can be categorized into different types, e.g. general mapping, homestead
surveying and engineering surveying. Each type has its own emphasis and its own
code, such as the code for engineering surveying, and each code provides all of
the technical specifications and contents in detail. In other words, when people
take on a surveying task, what to survey and how to survey it are determined by
the corresponding code, e.g. the mapping scale, the plotting accuracy and the
surface features to be surveyed. After the mission is finished, the results have
to pass quality control before submission. Notably, the people involved in
these tasks are trained for years and qualified by the national administration of
surveying and mapping. We can see that in the model of traditional surveying
and mapping, it is expensive to obtain geographic information. As a result, such
geographic data are beyond the reach of common users or researchers.
Generally speaking, it is a top-down model.
In contrast, VGI has no code that specifies what kind of geographic information
to obtain or how to obtain it. Traditional geographic information
such as roads, buildings, lakes, farmlands and power facilities is of course
included. Beyond this, people can use any method to create any kind of geographic
information. Because of this, the methods and the geographic data are
unprecedented, and the geographic information can be much richer than
that specified in the traditional mode. For example, people can share their
experienced routes, mental maps and so on, which may never be considered in a
traditional mode. The obtained geographic information is added to the server or
used to edit the existing information. Most importantly, all the shared geographic
information on the server is freely accessible to, and editable by, anyone. In
this process of VGI, no one organizes the collection of data, and all the
collected data is licensed to no one. Everyone can use the shared data, which is
hard to imagine in the conventional mode. Besides free access, VGI has other
advantages. For example, the richer VGI data has more potential and is more
suitable for mining geographically related knowledge and modeling
the real world; some regions where it is not convenient to obtain geographic
data by traditional methods can now be mapped using the VGI mode, e.g. the
WikiProjects for the Gaza Strip and Haiti. Compared with the traditional top-down
mode, VGI is entirely bottom-up.
As stated above, VGI has emerged as a special case of UGC, both of which
share the underlying web 2.0 and other key technologies. The next section moves
to the technical framework of VGI, i.e. how technology enables VGI.
Technical perspectives of VGI
A series of technologies such as web 2.0, geo-referencing, geo-tags, GPS,
graphics and broadband communication collectively enable VGI (Goodchild
2007a). The web 2.0 technologies are not new advances compared with the other
ones, but they are the key among them. As shown in Figure 7, web 2.0 simply
inherits the common 3-tier Browser/Server (B/S) structure, which means there is
no specific technical update, although web 2.0 indicates a fundamental change in
how people use the internet. In the stage of web 1.0, people just retrieved
information from web pages. Some early big websites (e.g. Yahoo) provided
information including text, pictures and animation on their web pages, and
people could only browse these web pages. Such an application mode is one-way.
Web 2.0, on the other hand, allows people to participate in the content of the
web pages. People can contribute and share information on web sites, and even
edit the content from other users and design the web pages. Although the owners
of websites still publish content, some user-oriented websites such as blogs and
social networks are empty without the participation of users. The quantity of
UGC surges, and its quality is surprisingly as good as the wisdom of experts,
which we call collective intelligence. Tim Berners-Lee (2005), who is one of the
World Wide Web inventors, precisely called this new version of the web “a
collaborative medium, a place where we could all meet and read and write”, i.e.
the “Read/Write Web”.
[Diagram with three tiers labeled Client, Middle tier and Dataset. Recoverable labels: Users; Internet, Mobile, Other; Web 2.0 interface/mashup applications (Ajax, HTML, web services, Flash, ...); Web 2.0 server; XML; Volunteered Geographic Information]
Figure 7: Web 2.0 conceptual model for VGI
From Figure 7 we can see that the client tier is enhanced in comparison
with web 1.0. For instance, people use devices equipped with GPS to obtain
geographic data based on their location. The data could be a collection of points
whose coordinates are given as latitude, longitude and height. These
points can then be connected to form a highway. People can also digitize
satellite images or scanned paper maps to get geographic data. Many other
methods are available without restriction, which makes the resulting data sets
unprecedented. People can also add the obtained geographic information to other
media such as photos (generally called geo-tagging). At this stage, higher-level
geographic information is produced. After that, people go to the web 2.0
interfaces and share the collected information with the server via broadband
communication and web services. Notably, this process is a repeating loop, which
means that each step iteratively refines the previous results. Kuhn (2007) calls
this “scaling up of closed loops”. Under the web 2.0 situation, the server tier
provides facilities for amateurs to publish their own content. For example, in a
traditional encyclopedia, only a few authors write the content, whereas most
people are only readers. By contrast, in the successful VGI example
Wikipedia, people are readers and writers at the same time.
The core technique in web 2.0 is the web service. However, many things
on the web are called “services” or “web services”. For people who visit web sites
with a web browser, all reachable functions such as map search and
electronic shopping are “services”. They do not have to care about how to call
them or how these functions are implemented and distributed. Nevertheless, this
view is oversimplified from the technical point of view. The World Wide Web
Consortium (W3C) defines a web service as an abstract notion that must be
implemented by a concrete program. More specifically, a true web service is a kind
of distributed, interoperable application programming interface (API) and
standardized protocol, and it involves many technologies such as Extensible
Markup Language (XML), Web Services Description Language (WSDL), Simple
Object Access Protocol (SOAP), etc. For the sake of detail and later discussion,
the definition based on W3C (2012) is given below.
“A web service is a software system designed to support interoperable
machine-to-machine interaction over a network. It has an interface described in a
machine-processable format (specifically WSDL). Other systems interact with
the web service in a manner prescribed by its description using SOAP messages,
typically conveyed using HTTP with an XML serialization in conjunction with
other Web-related standards.”
From the above definition we can see that a web service focuses on the
interoperability of online applications. These supporting technologies made the
advent of VGI possible. OpenStreetMap (OSM) is perhaps one of the most
successful VGI examples and is discussed in the following section as the
main data source of this thesis.
2.1.4. OpenStreetMap project
OSM was inspired by sites such as Wikipedia and founded in July 2004 by
Steve Coast at University College London (UCL). The main objective of OSM
is to create a set of maps that is free to use, editable, and licensed under new
copyright schemes (Haklay and Weber 2008). Since its inception, OSM
has experienced rapid growth. In April 2006, a foundation was
established to encourage this growth. So far, the OSM planet data exceeds 100
gigabytes and covers most of the world, and the number of registered users
exceeds 420,000 (OpenStreetMap Statistics 2012). According to the same
statistics, the amount of data and the number of users are still growing at
an accelerating rate. OSM is probably the most effective and successful VGI
project.
Figure 8: Main web site of OSM.
In the first place, OSM is a web 2.0 mapping site, i.e. www.OpenStreetMap.org
(Figure 8). People go to this website to view the map, edit the data, download
the data, etc. (see the top tabs and left panel in Figure 8). Editing is the key
function for contributing to OSM.
• View: visitors are first greeted with a Google Maps style online mapping
interface, which lets them pan, zoom and search the OSM world map
and discover which geographical areas are complete.
• Edit: the editing tab allows anyone to contribute to the project by
digitizing geographical features, uploading GPS traces from hand-held
GPS units, or correcting errors they have discovered in their local
areas. To make changes to the OpenStreetMap data, users must log in with an
account or an alternative OpenID (e.g. Google and Yahoo),
which is a bit different from Wikipedia’s strategy.
• Export: allows users to download a specified area of OSM data in
different raster and vector formats.
• GPS Traces: users can view the uploaded public traces one by one,
including their animation and information. Users can also upload their own
collected data, such as GPS traces.
• User diaries: a page where users can publish diaries and add
comments.
Besides being a website, OSM is also a technological platform. OSM is built
iteratively using the principle of the simplest approach to ensure the success of
the project as a whole. OSM’s developers deliberately steered away from using
existing standards for geographical information from standards bodies such as the
Open Geospatial Consortium (OGC), e.g., its WMS standard. They felt that
most such tools and standards are hard to use and maintain, and that
OGC-compliant software packages lack the adaptability needed to
support wiki-style behavior. The key to the technical infrastructure is the central
database, which handles tens of gigabytes of live data. The key feature of
the database schema is its support for wiki-style behavior in OSM operations, e.g.
handling transactions and keeping all versions of the data as history
(Haklay 2008). Initially the database was MySQL; it was migrated
to a PostgreSQL server on April 19, 2009 (Ramsey 2009).
OpenStreetMap data

There are three kinds of basic and simple elements in OSM data, i.e. node, way
and relation (XML schema). All three OSM data primitives are organized in
XML format. The most important feature is the set of attributes used to describe
an object, which can be extended as needed. Simply put, when you want to
store the data for something, you can create your own tags as attributes. All
attributes are stored and transported in a simple structure, i.e. key and value.
The key gives the meaning of the tag, and the value gives the specific
content. Figure 9 gives an XML example of a node, a way and a relation.
Figure 9: XML examples of three OSM data primitives
Nodes consist of latitude and longitude coordinates, along with user name,
timestamp and other optional information. Ways are linear features such as roads,
rivers and power lines, each of which is a series of ordered nodes referenced
by node ID. A relation consists of a list of ways with associated roles. It can be
used to model both real and abstract objects. In addition to the default tags in
node, way and relation, the optional tags are defined as semicolon separated
key=value pairs (Figure 9). The XML tags are self-explanatory, i.e. you can
understand the meaning of a tag from its name. For example, the pair “<tag
k="highway" v="trunk"/>” of the way in Figure 9 denotes that the way is a
major/trunk highway; “<tag k="name" v="Tvörvägen"/>” means that the name of
this way is Tvörvägen; and “<tag k="surface" v="paved"/>” shows the material
of the highway surface. Anyone can contribute by adding a new
key=value pair. The tagging schema is one of OSM’s major initiatives and it is
community-driven (Haklay 2008).
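The key=value tagging described above can be illustrated with a minimal Python sketch. The XML fragment below is a hypothetical example and not the content of Figure 9: the element IDs and coordinates are invented, and only the three tags quoted in the text are taken from the thesis. The helper name read_tags is likewise an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OSM XML fragment; IDs and coordinates are invented for illustration.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="59.3200" lon="18.0500"/>
  <node id="2" lat="59.3210" lon="18.0520"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="trunk"/>
    <tag k="name" v="Tvörvägen"/>
    <tag k="surface" v="paved"/>
  </way>
</osm>"""


def read_tags(element):
    """Return the key=value tags of one OSM element as a dictionary."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


root = ET.fromstring(SAMPLE_OSM)
way = root.find("way")
tags = read_tags(way)

# Each tag is reachable by its key, e.g. the road class of the way.
print(tags["highway"])
```

Because every attribute is a plain key and value, a new kind of attribute needs no schema change: an additional tag element simply appears as one more dictionary entry.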
[Diagram labels: Node, Way, Closed way/Area, Relation]
Figure 10: Hierarchical structure of node, way and relation
It is easy to see that there is a hierarchical structure among these three elements
(Figure 10). From Figure 9 we can see that the blue ID of the node is referenced
by the way, and the green ID of the way is referenced by the relation. That is,
lower-level elements are always referenced by higher-level elements.
Moreover, a relation can be embedded in another relation. In this way, the OSM
data schema avoids redundant storage. This structure is very important for
retrieving information from OSM data.
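The reference structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the XML fragment is hypothetical (invented IDs and coordinates), the function names are chosen here for clarity, and nested relations are skipped to keep the sketch short. It shows how a relation is resolved into ways and each way into node coordinates, with a shared node stored only once.

```python
import xml.etree.ElementTree as ET

# Hypothetical OSM XML fragment; node 2 is shared by both ways.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="59.3200" lon="18.0500"/>
  <node id="2" lat="59.3210" lon="18.0520"/>
  <node id="3" lat="59.3220" lon="18.0540"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
  <way id="11">
    <nd ref="2"/>
    <nd ref="3"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role=""/>
    <member type="way" ref="11" role=""/>
    <tag k="type" v="route"/>
  </relation>
</osm>"""


def index_nodes(root):
    """Map node ID -> (lat, lon)."""
    return {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
            for n in root.findall("node")}


def index_ways(root):
    """Map way ID -> ordered list of referenced node IDs."""
    return {w.get("id"): [nd.get("ref") for nd in w.findall("nd")]
            for w in root.findall("way")}


def relation_geometry(relation, ways, nodes):
    """Resolve each member way of a relation into a list of (lat, lon) tuples."""
    geometry = []
    for member in relation.findall("member"):
        if member.get("type") != "way":
            continue  # node members and nested relations are skipped in this sketch
        geometry.append([nodes[ref] for ref in ways[member.get("ref")]])
    return geometry


root = ET.fromstring(SAMPLE_OSM)
nodes = index_nodes(root)
ways = index_ways(root)
lines = relation_geometry(root.find("relation"), ways, nodes)
print(lines)
```

Retrieving the geometry of a higher-level element therefore always means following references downwards until the coordinates stored in the nodes are reached.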
Now that we know what OSM data is and how to view and contribute to it,
how can we get access to it? More and more developers are
involved in developing software to access OSM data “across different
application domains, software platforms, and hardware devices” (Haklay 2008).
OSM officially provides three ways to access OSM data: URLs, Osmosis and
planet download:
(1) URLs. For an area of interest, users can send a URL request to download OSM
data, e.g. http://api.openstreetmap.org/api/0.6/map?bbox=18.05,59.32,18.06,59.33.
In this link, the coordinates after “bbox” are the minimum longitude,
minimum latitude, maximum longitude and maximum latitude of the
area of interest. In this way, the database is accessed via the OSM API. This can
also be done via the “Export” tab of the OSM website, where the
minimum and maximum longitudes and latitudes are filled in automatically and
can also be edited by users. However, the extent of the area that can be
requested with this method is limited.
(2) Osmosis. Osmosis (http://wiki.openstreetmap.org/wiki/Osmosis) is a Java
application. It includes a list of pluggable components that can perform many
functions. Extracting data inside a bounding box or polygon is one of its
main functions. This method requires downloading the software, configuring a
Java runtime environment on the computer, and then running the commands. It is a
bit complicated and depends on the storage and memory of the computer.
(3) Web downloads. If the complete OSM data is needed, OSM provides the FTP
site http://planet.osm.org for direct download. There are also many
alternative sources (http://wiki.openstreetmap.org/wiki/Planet.osm) for
downloading data. Some of these websites split OSM data into pieces by country or
region, and www.CloudMade.com is perhaps the most complete one.
There is no data size limitation with this method.
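As a small illustration of the first access method, the following Python sketch assembles the request URL from a bounding box. The base URL and the order of the bbox coordinates are those quoted in the text; the function name build_map_url and the ordering check are assumptions added for illustration. The sketch only builds the URL and does not send any request.

```python
# Base URL as quoted in the text for the OSM API version 0.6 map call.
API_BASE = "http://api.openstreetmap.org/api/0.6/map"


def build_map_url(min_lon, min_lat, max_lon, max_lat):
    """Build a map request URL for the given bounding box.

    The bbox order follows the text: minimum longitude, minimum latitude,
    maximum longitude, maximum latitude.
    """
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box must satisfy min < max on both axes")
    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    return f"{API_BASE}?bbox={bbox}"


url = build_map_url(18.05, 59.32, 18.06, 59.33)
print(url)
```

A common mistake is to swap latitude and longitude; the ordering check catches only the case where a minimum exceeds the corresponding maximum, so the coordinate order still has to be verified by the caller.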
Spatial data sets are the basis of GIS and its related applications, while quality is
the core of spatial data. The collective way in which OSM data is collected
naturally raises concerns about data quality: contributors are not trained
in geographic data collection, and they contribute data according to their own
interests; various techniques and heterogeneous data sources are used; and there
is no spatial data quality control. Data quality underpins any application,
and this is particularly important for applications using OSM data. Positional and
attribute accuracy, completeness and consistency are three key concepts of
spatial data quality:
(1) Positional accuracy mainly refers to the geometric error of real objects in
terms of coordinates, and attribute accuracy means the correctness of attributes in
terms of geographic information, e.g. whether the name of a street is correct;
(2) Completeness mainly refers to the proportion of features missing from a given
study or application area, e.g. national river data should cover all the
rivers throughout the country;
(3) Consistency mainly refers to the uniformity of data definitions, i.e. within
the same spatial database, data definitions should be consistent.
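To make the first two concepts more concrete, the following Python sketch shows one possible way to quantify them. It is a sketch under stated assumptions and not a method used in this thesis: positional error is taken as the mean great-circle (haversine) distance between matched point pairs, completeness is taken as the share of reference features that are present in the tested data set, and all function names and the Earth radius value are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters, a common approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_positional_error(pairs):
    """Mean distance over pairs of ((lat, lon) tested, (lat, lon) reference)."""
    return sum(haversine_m(*tested, *ref) for tested, ref in pairs) / len(pairs)


def completeness(tested_ids, reference_ids):
    """Share of reference features that also occur in the tested data set."""
    reference = set(reference_ids)
    return len(reference & set(tested_ids)) / len(reference)


# Hypothetical matched points and feature identifiers.
pairs = [((0.0, 0.0), (0.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]
print(mean_positional_error(pairs))
print(completeness(["a", "b"], ["a", "b", "c", "d"]))
```

In practice, the difficult part is matching features between the two data sets; the sketch assumes that this matching has already been done.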
Some efforts have been made to assess the quality of OSM data. For example,
Cherldu (2007) presented a paper on quality at the first OSM community
conference; Mauer (2008) evaluated OSM data quality by comparing it with
Google Maps using a visual method, and the result shows that OSM is slightly
ahead of Google worldwide (especially in Europe) in terms of completeness;
Haklay (2008) analyzed OSM data quality systematically by comparing it with
the London and England datasets of Ordnance Survey (Great Britain’s national
mapping agency), and the analytical results show that “OpenStreetMap
information can be fairly accurate: on average within about 6 meters of the
position recorded by the Ordnance Survey, and with approximately 80% overlap
of motorway objects between the two datasets”.
Furthermore, some tools, such as KeepRight (http://keepright.ipax.at/) and the data
consistency check tool by Harald Kleiner, have been developed to improve OSM
data quality, and the quality is getting better.
2.1.5. Scaling of geographic space
Many aspects of geographic space have been studied from different perspectives
in terms of hierarchy, size distribution, scaling, fractality and self-similarity.
This section first presents a brief overview of these terms and then provides a
review of the state of the art on the scaling property.
Fractal, scaling and self-similarity
To understand scaling, some related concepts must be explained: fractal,
self-similarity and scale-free. The term fractal was coined by Benoit Mandelbrot in
1975 (Batty and Longley 1994). Fractal was derived from the Latin fractus,
which means broken or fractured. Mandelbrot (1982) provided the definition of
a fractal as “a rough or fragmented geometric shape that can be split into parts,
each of which is (at least approximately) a reduced-size copy of the whole”. In
other words, a fractal means that the spatial form of an object is “nowhere
smooth” (irregular) and the irregularity of form maintains similarity from scale
to scale (Batty and Longley 1994). We call this property self-similarity or
scale-invariance, and if an object possesses such a property, it is fractal. There
is a large literature dedicated to fractals. For instance, Lauwerier (1991) stated
that “A fractal is a geometrical figure that consists of an identical motif
repeating itself on an ever-reduced scale”. We can see that the key characteristic
of a fractal is self-similarity or scale-invariance. Some reports do not strictly
distinguish between fractality and self-similarity in examining such phenomena
(Song et al. 2005, 2006).
The study of fractals is rooted in mathematics and geometry, and there are many
classic fractal geometries, such as the Mandelbrot set, the Cantor set, the
Sierpinski triangle, the Sierpinski carpet, the Koch curve and others. Here, we
take the Koch curve (Figure 11) as an example to discuss the concepts of
scale-invariance and fractal dimension. We use the Hausdorff dimension to
calculate the value: “if we take an object residing in Euclidean dimension D and
reduce its linear size by 1/r in each spatial direction, it takes N = r^D number of
self similar objects to cover the original object”. That is, D = ln(N)/ln(r). The
construction process of a Koch curve is shown in Figure 11 (a). At each level,
r is 3 and N is 4. Thus, D = ln(4)/ln(3) ≈ 1.2619. Obviously, D
remains the same across all the levels (scales). Fractal dimension is the
counterpart of the dimensions 1, 2 and 3, where D is the same for each type
of geometry. In a sense, fractal dimension reflects the structure and irregularity
of a fractal object, or how efficiently a fractal object occupies space. Thus, the
fractal dimension of a fractal curve is always between 1 and 2 because a curve
cannot occupy all the space of the plane, while the fractal dimension of a fractal
surface is always between 2 and 3 because a surface cannot occupy all of the volume.
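The calculation above can be reproduced with a few lines of Python. This is a minimal sketch of the formula D = ln(N)/ln(r) only; the function name similarity_dimension is chosen here for illustration, and the values of N and r for the Sierpinski triangle and the Cantor set are well-known textbook values that are not taken from this thesis.

```python
import math


def similarity_dimension(n_copies, reduction_factor):
    """Dimension D such that N = r**D, i.e. D = ln(N) / ln(r)."""
    return math.log(n_copies) / math.log(reduction_factor)


# Koch curve: each segment is replaced by N = 4 copies reduced by r = 3.
koch = similarity_dimension(4, 3)

# Ordinary geometries recover the integer dimensions:
# a line split in 2 gives 2 copies, a square split in 2 gives 4 copies.
line = similarity_dimension(2, 2)
square = similarity_dimension(4, 2)

# Other classic fractals (textbook values, assumed here).
sierpinski_triangle = similarity_dimension(3, 2)
cantor_set = similarity_dimension(2, 3)

print(round(koch, 4), line, round(square, 4))
```

The integer results for the line and the square illustrate why fractal dimension can be read as the counterpart of the ordinary dimensions 1, 2 and 3.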
Figure 11: Illustration of the Koch curve: (a) process of construction and (b)
measurement of length with different scales
As stated previously, the use of the term scale varies depending on the area of
interest. First of all, scale refers to cartographic or map scale, i.e., the ratio
of a distance on the map to the corresponding distance on the ground. Scale is
also analogous to the resolution of a raster dataset: if the length of an object
is shorter than the scale, the object is neglected. For example, when a scale of,
e.g., 1 cm is used to measure the length of a curve, any part of the curve that is
smaller than the scale is neglected and is not counted in the total length. In
Figure 11 (b), three different scales are used to measure the length of the
fractal curve of level 5. The smaller the scale is, the longer the measured
length becomes. As the scale is reduced towards zero, i.e., a point, the measured
length of the ideal curve tends to positive infinity. In this sense, there is no
scale that can characterize the curve, which is one of the underlying meanings of
scale-free or scale-invariance. Strict fractal geometry, such as the Koch curve
(Figure 11 (a)), is tightly constrained by mathematical formulae and is therefore
rarely found in natural phenomena. However, the core meaning of fractal is
self-similarity, i.e., the same structure can be found at various scales.
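The dependence of measured length on scale can be sketched numerically. The
sketch below makes a simplifying assumption that is not in the thesis: a ruler of
size 1/3^k is taken to trace exactly the level-k construction of a Koch curve
built on a unit segment, so that it needs 4^k steps. The function name
koch_measured_length is hypothetical.

```python
import math


def koch_measured_length(level: int, base_length: float = 1.0) -> float:
    """Length seen by a ruler of size base_length / 3**level.

    Idealisation: such a ruler follows the level-`level` construction,
    which consists of 4**level segments of exactly that size.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    ruler = base_length / 3 ** level
    steps = 4 ** level
    return steps * ruler


D = math.log(4) / math.log(3)  # similarity dimension of the Koch curve

for k in range(6):
    ruler = 1.0 / 3 ** k
    measured = koch_measured_length(k)
    # The same value follows from the scaling relation L = ruler**(1 - D).
    predicted = ruler ** (1 - D)
    print(k, round(ruler, 5), round(measured, 4), round(predicted, 4))

lengths = [koch_measured_length(k) for k in range(6)]
```

Each reduction of the ruler by a factor of 3 multiplies the measured length by
4/3, so the length keeps growing as the scale shrinks and no single scale
characterizes the curve.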
Scaling in this thesis is more closely related to the concept of scale-free
networks, viewed from the perspective of statistics. Scaling refers to the
phenomenon that small things are very common, while large ones are very rare.
From this viewpoint, scaling is rooted in the ideas of geometric fractals and is
related to the scale-free property of networks. However, the aim of scaling here
is to discover hierarchical structures and the geographic implications behind
them; in this sense, scaling differs in its specific emphasis. Mathematically and
statistically, a scaling phenomenon is characterized in this thesis by
heavy-tailed distributions.
Studies of the scaling property
As mentioned above, the scaling property focuses on hierarchical structures from
the statistical perspective. The scaling property of geographic phenomena in
geographic space has been studied from different aspects, although Egenhofer
(1993), at an early stage, thought that geographic space is irregular and does
not follow any preconceived pattern. Many studies have been dedicated to the size
distribution of cities, the spatial structures of urban environments and
universal scaling laws for urban mechanisms.
City size distributions: it has been assumed for several decades that
distributions of city size follow a universal power function (Decker et al.
2007). This pattern of distribution has been examined and explained by many
studies and theories (such as Gabaix 1999, Song and Zhang 2002, Ioannides and
Overman 2003, Eeckhout 2004, Soo 2005, Soo 2007, Decker et al. 2007, Córdoba
2008, Peng 2010, Xu and Harriss 2010, Giesen and Sudekum 2011, Jiang and Jia
2011, etc.). In these studies, different data sets, including census data of
urban areas (Ioannides and Overman 2003), city population (Peng 2010) and
generated natural cities (Jiang and Jia 2011), are employed to validate the size
distribution of cities at different scales (regional or global level). In some
cases, Zipf’s or Pareto’s law is found to hold anywhere in space (Giesen and
Sudekum 2011, Jiang and Jia 2011, Song and Zhang 2002), whereas in other cases
Zipf’s law holds only for part of the distribution (e.g. the upper tail, Eeckhout
2004) or does not hold but can be replaced by another kind of heavy-tailed
distribution, such as the lognormal. That is, in all of these studies the city
sizes (at a reasonable scale) follow a heavy-tailed distribution, although the
exponents of the specific distributions vary between situations. Different
reasons as well as dynamics have been explored in these studies, such as the
proportional growth process (Gabaix 1999), growth rate (Ioannides and Overman
2003), the random productivity process of local economies and the perfect
mobility of workers (Eeckhout 2004), political economy variables (Soo 2005), the
faster growth of smaller cities (Soo 2007) and fundamental ecological principles
(Decker et al. 2007).
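A common first check of Zipf’s law is to regress the logarithm of city size on
the logarithm of rank and inspect the slope, which is close to -1 when the law
holds. The sketch below illustrates that check only; it is not the procedure of
any of the cited studies. The function name rank_size_slope is hypothetical, and
the input is synthetic data constructed to follow size = 1,000,000 / rank
exactly, not real city populations.

```python
import math


def rank_size_slope(sizes):
    """Least-squares slope of ln(size) against ln(rank), largest city ranked 1."""
    ordered = sorted(sizes, reverse=True)
    if len(ordered) < 2 or ordered[-1] <= 0:
        raise ValueError("need at least two strictly positive sizes")
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic sizes that obey the rank-size rule exactly (illustration only).
synthetic_sizes = [1_000_000 / rank for rank in range(1, 201)]
slope = rank_size_slope(synthetic_sizes)
print(round(slope, 6))
```

A slope near -1 on such a plot is only a rough diagnostic: a log-log regression
cannot by itself distinguish a power law from other heavy-tailed candidates such
as the lognormal, which is one reason the cited studies reach different
conclusions about the same regularity.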
From the statistical point of view, the heavy-tailed distributions of city sizes
indicate the scaling property, and the proposed reasons and dynamics behind them
are instructive. Some of the studies cited above aimed to derive urban models
with restrictions (e.g. on preferences and technologies) to explain this
regularity (Córdoba 2008). Xu and Harriss (2010) also developed a strategy that
can be used to explain the spatially and temporally autocorrelated growth of
Texas cities and to reconstruct the empirical distribution.
Spatial structure: the above studies on the scaling property focus on size,
which is identified as the “major determinant of most characteristics of a
city” (Bettencourt and West 2010). Meanwhile, other studies put the emphasis on
the spatial structure of the geographic environment. As early as 1976, Bon
pointed out the allometry in the topological structure of transportation
networks. Maritan et al. (1996) verified that the length and area of river
networks possess the scaling property, and that the power law exponent is
“directly related to a suitable fractal dimension of the boundaries, to the
elongation of the basin and to the scaling exponent of mainstream length”. Mark
and Frank (1996) modeled geographic space with both experiential and formal
models.
Interestingly, Rui and Alan (2004) represented urban open space via axial lines
and found that urban open space structures show a size-independent universal
feature, and that the open space is self-similar and possesses a fractal
structure. Lämmer et al. (2006) analyzed the urban road networks of the 20
largest German cities and found that several aspects of the road networks
demonstrate the scaling property, such as the distribution of cell sizes and the
number of nodes reachable within a travel-time budget. Similarly, Kalapala et al.
(2006) converted the road networks of the U.S., Denmark and the United Kingdom
into dual graphs, whose degree distributions demonstrate scale invariance and
therefore the scaling property. A fractal toy model for the placement of roads is
also introduced.
Samaniego and Moses (2008) presented another perspective on the scaling property
of urban road networks. They analyzed cities across the U.S. and found that urban
road networks show allometric relationships with city population and area, which
differ from those of biological vascular networks and determine the traffic
behavior in cities. Jiang et al. (2008) started from the topological perspective
of self-organized road networks and found that they demonstrate the scaling
property. Hu et al. (2009) found that a geographic feature of social networks,
i.e. the distribution of distances between friends, demonstrates the scaling
property, and that this is a universal property of such networks.
The above studies present good examples of the scaling property of urban
environments, mainly based on their size and spatial structures. Some demonstrate
the scaling property directly, whereas others demonstrate allometric
relationships.
Scaling of urban mechanism: the following studies aim to provide a theoretical
framework to model and explain the dynamics behind the scaling properties of
cities. Makse et al. (1995) proposed a diffusion-limited aggregation model to
simulate urban growth, through which the city morphology and area distribution
can be better reproduced. This physical model is based on the fact that
development attracts further development in urban areas, through which the
scaling behavior of urban morphologies can be predicted. The scale-free property
of cities has been studied by Montello and Golledge (1999) and Warren et al.
(2002). Chen (2006) pointed out the scaling laws in cities and showed that their
hierarchical structures can be described by power law distributions. Moreover,
that study shows that the scaling laws originate from the self-organized
criticality in urban systems.
Bettencourt et al. (2007) proposed that, in the process of urbanization, diverse
properties of cities, such as patent production and personal income, demonstrate
the scaling property. They also point to the similarity between a city and an
organism in terms of the scaling property. Moreover, the scaling relations are
explored and the results suggest that urban systems could collapse unless major
innovation cycles are generated at a continually accelerating rate as the
population grows. Bettencourt et al. (2010) further demonstrate the agglomeration
nonlinearities in cities, which are reflected in the super-linear power law
scaling of most urban socioeconomic indicators with population size. Thus,
properties of a city such as wealth and crime are not proportional to its size,
and a new metric is built to re-evaluate U.S. cities. Moreover, Bettencourt and
West (2010) stated that, besides the scaling property of city characteristics,
cities are universal, quantifiable and predictable. Shalizi (2011) re-analyzed
properties of cities such as gross economic production and personal income.
Although some of these properties do not strictly follow power law scaling, other
heavy-tailed distributions can fit them, which also indicates scaling. Shalizi
further noted the hierarchical structure of cities.
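The statement that an indicator is not proportional to city size under
super-linear scaling can be made explicit with a short worked equation. The
symbols Y (indicator), P (population), Y_0 (prefactor) and \beta (exponent) are
notation assumed here for illustration, and the value \beta = 1.15 is a
hypothetical example of a super-linear exponent, not a result quoted from the
studies above.

```latex
% Power law scaling of an urban indicator Y with population P
Y(P) = Y_0 \, P^{\beta}

% Effect of doubling the population
\frac{Y(2P)}{Y(P)} = \frac{Y_0 (2P)^{\beta}}{Y_0 P^{\beta}} = 2^{\beta}

% Per-capita value as a function of population
\frac{Y(P)}{P} = Y_0 \, P^{\beta - 1}

% Illustrative super-linear case, \beta = 1.15
2^{1.15} \approx 2.22, \qquad 2^{\beta - 1} = 2^{0.15} \approx 1.11
```

With \beta = 1 the indicator doubles when the population doubles and the
per-capita value is constant. With \beta > 1 the indicator more than doubles, so
in this illustrative case the per-capita value rises by about 11 percent with
each doubling of population.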
The scaling property and hierarchy of the urban environment have been studied
from different perspectives such as size, spatial structure and even economic
factors. However, some of these findings are either implicit or limited to the
theoretical level, and thus to some extent cannot be applied to urban studies in
a quantitative manner, although such applications were expected from the outset.
The next section focuses on potential applications in urban studies.
2.2. Related urban theories and studies
Urban study is inherently broad and interdisciplinary because of the high degree
of complexity of urban systems. In the beginning, urban theories were regarded as
a collection of social theories ranging from economics and psychology to religion
(e.g. Simmel 1903). Over more than a century of studying urban phenomena, urban
theories and studies have developed in creative ways, and some, such as the
“Chicago school” in sociology and economics, have been very influential.
Nowadays, urban theories and studies attract much attention across many fields.
This section focuses on geographically related urban theories and studies such as
the fractal city (Batty and Longley 1994, see previous section), space syntax
(Hillier 1996 and 1997, see previous section), central place theory (Christaller
1933), concentric zone theory (Park et al. 1925), sector theory (Hoyt 1939), the
multiple nuclei model (Harris and Ullman 1945) and urban realm models (Vance
1964). It presents a brief review of the concepts of these well-known models and
some related studies.
Central place theory
Central place theory was originally proposed by Walter Christaller (1933). With
this theory, Christaller aimed to explain the spatial patterns and sizes of
central places all over the world, i.e. how central places are located and affect
each other, and why they take on different functions, such as villages and towns.
Christaller first defined a central place as a place or settlement where services
and goods are provided to the people living around it. The services are grouped
into low-order (e.g. shopping store) and high-order ones (e.g. school), and
high-order services are normally surrounded by lower-order ones. Based on this
definition, Christaller made some assumptions, such as that the population and
resources are evenly distributed and that transportation costs are the same in
all directions.
Figure 12: Two basic concepts, threshold and range, in the central place model.
(Source: Vogeler 2012)
To illustrate this spatial theory, two concepts need to be introduced (Figure 12):
(a) “threshold -- the minimum market area needed to bring a firm or city selling
goods and services into existence and to keep it in business (b) range -- the
average maximum distance people will travel to purchase goods and services”.
Figure 13: Hierarchical structure of central places (Source: Altonen 2011)
Based on the above conditions, the central place model evolves in the following
way: a hexagonal region encloses each lowest-level central place; each
higher-level central place then encloses a group of lower-level hexagonal
regions, and so on. Suppose the lowest-level central place is the hamlet; the
next level up is the village, followed by the town and finally the city. Figure
13 shows that the spatial distribution of the different levels of central places
forms a hierarchical structure. Most importantly, this structure emerges from the
bottom up. Dennis et al. (2002) carried out a practical experiment in the United
Kingdom based on the central place model.
Concentric zone theory
Park et al. (1925) developed the concentric zone model (Figure 14), which
describes the spatial layout of city structure as a radial pattern of several
zones. In this way, the mixed and complex social pattern is greatly simplified
into a concentric pattern. At the core of the model is the central business
district (CBD), shown in yellow. Around the core zone, the city grows in rings of
different land use. The second zone (in pink) is characterized by industry or
wholesale manufacturing and is sometimes called the factory zone. The next three
zones are all residential areas: the third zone (in red) is mainly filled with
the lowest-income housing, including some slums, and is sometimes called the
transition zone; the fourth zone (in blue) is the middle-class residential area,
where most of the people are blue-collar workers; the last residential ring is
the fifth zone (in green), which is filled with higher-income families. On the
periphery of the model lies the commuter zone, which some of the literature omits
from its explanation of the model.
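The ring structure described above amounts to a lookup from distance to zone,
which can be sketched as follows. The zone names follow the description in the
text, but the ring radii are invented purely for illustration and do not
describe Chicago or any other real city; the names ZONE_NAMES, OUTER_RADII_KM
and zone_at are likewise hypothetical.

```python
from bisect import bisect_right

# Zones from the centre outwards, as described in the text.
ZONE_NAMES = [
    "central business district",
    "factory zone",
    "transition zone",
    "middle-class residential zone",
    "higher-income residential zone",
    "commuter zone",
]

# Hypothetical outer radius (km) of each of the first five zones.
OUTER_RADII_KM = [1.0, 2.5, 4.0, 7.0, 11.0]


def zone_at(distance_km: float) -> str:
    """Return the zone containing a point at the given distance from the centre."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    # A point exactly on a boundary is assigned to the outer ring.
    return ZONE_NAMES[bisect_right(OUTER_RADII_KM, distance_km)]


for d in (0.5, 3.0, 20.0):
    print(d, zone_at(d))
```

Because the zone depends on distance from the centre alone, the sketch assigns
the same structure to every direction, which is the simplification that the
sector and multiple nuclei models relax.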
Figure 14: Concentric zone model (Source: Originally in Haggett 1965, see also
Garner 1968).
Although the descriptions of this model differ somewhat across the literature,
the main structure remains the same. One of the most striking characteristics of
this model is that the structure does not vary between cities. The main city that
Park et al. worked on with this model is Chicago, which limits its generality.
For example, because of barriers, some zones are rarely complete although they do
exist, and thus some real city maps do not match this model well.
Sector theory
Homer Hoyt conceived the sector model (Hoyt model) (Figure 15) in 1939. In
essence, the sector model is a modified version of the concentric zone model.
When Hoyt proposed the sector model, the existence of the CBD was accepted.
However, Hoyt realized that some factors, especially the transportation system,
make the urban area grow along transportation routes, and thus the city tends to
take on wedge-like or sector patterns. Figure 15 shows that the zones coincide
with those of the concentric zone model but are distributed in different forms.
Moreover, compared with the concentric zone model, the spatial patterns are more
diverse across cities. The advantage of this model is that it allows for the
outward progression of urban growth.
Figure 15: A basic version of the Sector model (Source: Originally in Haggett
1965, see also Garner 1968).
Some efforts have been dedicated to urban studies based on the sector model,
e.g. Beauregard (2007) reviewed Hoyt’s contributions in terms of both the theory
and the model. However, the patterns of modern cities differ from what Hoyt
supposed. In Hoyt’s model, transportation routes play an important role in
shaping urban patterns, and the high-rent sectors basically expand along them;
today such areas have become low-rent, which is contrary to Hoyt’s theory.
Multiple nuclei model
Harris and Ullman (1945) proposed the multiple nuclei model (Figure 16). As
the name implies, the CBD is no longer the only core of the city. This means
that a city can develop at different points with equal density, i.e. the land use
pattern can be built around several discrete nuclei (Harris and Ullman 1945).
Compared with the concentric zone model and the sector model, the multiple
nuclei model takes more factors into account in the formalization of urban
structure, e.g. the importance of location and the aggregation effect.
Figure 16: Multiple nuclei model (Source: Originally in Haggett 1965, see also
Garner 1968).
In this model, Harris and Ullman successfully integrated diverse factors, which
made the model workable in many fields and earned it the support of many urban
scholars. However, the model cannot be adapted to all kinds of complex cities
in the world. In addition, it has drawn some criticism, e.g. that the traditional
spatial separation of workplace and home is no longer appropriate today.
Urban realms model
Vance (1964) suggested the urban realms model (Figure 17). In this model,
Vance pointed out that cities had become non-centric. In contrast to the
multiple nuclei model, some areas of the city become self-contained and
independent of the CBD; these are called urban realms, and in fact they are
becoming new CBDs. Although a central city still exists (in yellow in Figure 17),
the whole structure of the spatial patterns has changed.
Figure 17: The urban realms model, which includes a central downtown
(Source: Hartshorn and Muller 1989).
These urban theories and models give deep insights into the regularities and
laws of urban systems and space from different perspectives, and they deeply
affect people’s understanding of the city. To conclude this brief review of
geographically related urban theories, the hierarchical structure and how each
part functions at different levels of the hierarchy are of the most importance.
However, these theories and workable models keep evolving as previous
assumptions and conditions change with the development of society; e.g., modern
information technology plays an increasingly important role and has a more
profound impact in shaping urban systems (Graham and Marvin 1996). Research on
how to reduce and understand the high degree of complexity of urban systems
will continue to move forward using new theories and methods. In this thesis,
statistical methods are used to differentiate the hierarchical structures of
geographic phenomena and then to explore the underlying implications for
different urban studies.
Moreover, besides the above urban theories and models, remote sensing has
been widely used in addressing urban issues such as urban growth and spatial
patterns, and urban issues have become a research topic in remote sensing
(Rashed and Jürgens 2010). That book provides methods and techniques based on
remote sensing data to tackle various urban phenomena. For example, Maktav and
Sunar (chapter 15 in Rashed and Jürgens 2010) use remotely sensed data sets to
detect land use change in developing countries; Sutton, Taylor and Elvidge
(chapter 17 in Rashed and Jürgens 2010) use remote sensing imagery to
characterize urban population. Urban sprawl is also analyzed using remote
sensing techniques.
2.3. Applications in urban studies
The scaling property (fractal, scale-invariant or scale-free) is regarded as a
universal law in many fields, and it has had a significant impact on related
studies. It has been widely applied in fields such as urban science (Batty and
Longley 1994). In fact, the scaling property can be applied in both
geographically related and non-geographically related cases (e.g. Sheridan
et al. 2010).
However, applications in urban studies are only at the beginning and are worth
further exploration. In the case of urban studies, this thesis focuses on
applications to geographic phenomena rather than non-geographic ones.
Meanwhile, the applications based on the scaling property are promising and
wide-ranging, so this thesis does not try to exhaust all applications of the
scaling property in urban studies; rather, some of the most common and
successful examples are listed.
The essence of applying the scaling property lies in identifiable hierarchies
or irregularities. Taking complex networks as an example, the scaling property
is twofold: first, there is no limit on the size of a variable, e.g. the
connectivity of nodes in complex networks is extremely diverse; second, the
values follow a heavy-tailed distribution. In such a situation, the network
possesses some special features, e.g. a few hubs connected to a massive number
of nodes, robustness to random attack and fragility to coordinated attack.
Urban planning: Batty (2008) reviewed many studies on scaling and fractal
cities. He also pointed out a new approach to urban planning, which is to
combine the scaling property of cities with other urban regularities to form a
bottom-up planning strategy, in place of past urban planning policies.
However, this application is only at the beginning.
Anti-epidemic: The main purpose of anti-epidemic measures is to reduce the
circulation of influenza viruses in the air space. People flow between cities
and form a connected network. By calculating the accumulated flows of cities
and measuring them using graph theory, it is found that very few cities have
huge flow volumes and act like hubs. Therefore, the best anti-epidemic strategy
is to take good control of such hub cities. Pastor-Satorras and Vespignani
(2001) discussed epidemic spreading on scale-free networks using the above
strategy.
Generalization: Based on the hierarchical structures (scaling property) of the
centralities (e.g. degree, closeness and betweenness) of urban road networks,
Jiang and Claramunt (2004) proposed a simple model to generalize the road
network by selecting the important roads. Jiang et al. (2011) extended this
generalization work to the country level and to river networks using the
head/tail division rule.
Transportation: Jiang et al. (2008) predicted traffic flow based on the
scaling property of urban road networks. They found a strong correlation
between the degree of connectivity of roads and traffic flow, based on which
the traffic flow of the road network can be predicted. Jiang (2009) also
mentioned that the application areas include wayfinding, navigation and
pedestrian modeling.
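The head/tail division rule mentioned under Generalization can be illustrated
with a minimal sketch. It assumes the usual statement of the rule: for a
heavy-tailed variable, the arithmetic mean splits the values into a small head
(above the mean) and a large tail (at or below it), and the split can be
repeated on the head. The function names, the sample connectivity values and
the 40% stopping threshold are hypothetical choices made for illustration, not
values taken from the cited studies.

```python
from statistics import mean

def head_tail_split(values):
    """Split values around their arithmetic mean into (head, tail)."""
    m = mean(values)
    head = [v for v in values if v > m]
    tail = [v for v in values if v <= m]
    return head, tail

def head_tail_levels(values, max_head_share=0.4):
    """Repeat the split on the head; return the mean used at each level.

    max_head_share is an illustrative stopping threshold (an assumption):
    the recursion stops once the head is no longer a clear minority.
    """
    breaks = []
    current = list(values)
    while len(current) > 1:
        head, _tail = head_tail_split(current)
        if not head or len(head) / len(current) > max_head_share:
            break
        breaks.append(mean(current))
        current = head
    return breaks

# Hypothetical road connectivities: many weakly and few highly connected roads.
degrees = [1] * 50 + [2] * 25 + [4] * 12 + [8] * 6 + [16] * 3 + [32] * 2 + [64]
head, tail = head_tail_split(degrees)
levels = head_tail_levels(degrees)
print(len(head), len(tail), len(levels))
```

Selecting only the roads in the head at a given level is one way to read the
idea of keeping the important roads; the actual selection criteria are those of
the cited papers.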
3. Experimental design
In this study, data-intensive geo-computation is a key step, and thus the data
sets and data processing play an important role. This chapter presents the data
structures, algorithms and software design models through which the data
processing functions are implemented as a framework. These functions also
include converting the data sets and generated geographic representations into
ArcGIS data formats for visualization. Two main data sets are involved:
OpenStreetMap (OSM) data and floating car data (FCD). Both are types of VGI
data because they are collected from the bottom up by non-professional
volunteers. These contents are not fully covered in the listed papers because
of limited space.
3.1. Description of the study areas
The data sets in this thesis are mainly VGI data, the most important of which
is OSM data. Therefore, the selection of study areas relies heavily on the
availability and usability of OSM data in terms of data quality, i.e., accuracy,
completeness and consistency (see section 2.1.4). In this thesis, the study
areas where OSM data sets are used are Texas, USA and twenty-nine European
countries (Figure 18).
Figure 18: The selected study areas where the OSM data are used: (a) Texas, the
largest state in the contiguous USA, and (b) twenty-nine European countries.
These study areas were selected primarily because the quality of OSM data is
unevenly distributed across regions. Judging visually and by the sizes of the
data sets, the quality of OSM data in European countries and the U.S. is clearly
better than in other areas, such as Africa and Asia. First, OSM originated in
Europe, so the OSM project can be expected to be more popular in European
countries. More significantly, because of the openness and popularity of GPS
techniques and devices in these countries, it is easy for them to use these
advantages to develop the OSM project. In addition to the contributions of
volunteers, some large organizations and GIS-related companies have donated
vast amounts of spatial data to the OSM project. For example, the US Census
Bureau donated all roads (Topologically Integrated Geographic Encoding and
Referencing (TIGER)) in 2007 and 2008 to the OSM databases; Yahoo! allowed the
OSM project to use its aerial photography as a backdrop for map production for
free (Coast 2006); Automotive Navigation Data (AND) donated a complete road data
set for the Netherlands, and trunk road data for India and China, to the project
(Coast 2007), etc.
As stated at the end of section 2.1.4, the data quality is analyzed to ensure
that it is high enough in the European areas. Moreover, data quality is
evaluated according to how well the data satisfy the given task. According to
the above study, the quality of OSM data is sufficient for the applications in
this thesis.
Figure 19: Study area of Wuhan city, China, where the FCD data are applied
(Note: the images are snapshots from Google Maps).
Another study area is the large city of Wuhan, Hubei province, China (Figure
19). The massive floating car data (FCD) for this city were shared through one
of the papers in the special issue on data-intensive geospatial computing of
the International Journal of Geographical Information Science (IJGIS). Although
GPS devices have become cheaper and modern information technologies make it
feasible to collect such mobile data at a low price, it is hard to obtain such
data sets because of privacy and security reasons, and thus the usage of such
data sets should be regulated by relevant rules. This kind of data set is of
great value in exploring urban and human mobility patterns. To conclude, the
study areas are diverse, as they span Europe, North America and Asia.
3.2. Processing road networks
Road networks are the underlying data set of this thesis. In this section, we
focus on the analysis and extraction of road networks from OSM data. For
convenience, we download the OSM data directly from the www.CloudMade.com
website, because there the planet OSM XML data has been split into pieces
according to continents as well as countries and regions. This website also
provides web mapping and navigation services based on OSM data. It was created
by Steve Coast, who is also the founder of the OSM project. The data sets on
the website are updated regularly (once a week).
3.2.1. Highway extraction
Highways are an important part of the OSM data. As a general rule, the highway
tag is the primary (often the only) tag used for a highway, and its value
describes the importance of the highway. Although the description is vague, it
does not affect the usability of the data sets. Some tag statistics (e.g.,
http://tagstat.hypercube.telascience.org) are available online, but they are
not sufficient for highway analysis. Thus, before extracting the highways from
the OSM data, we compiled statistics on the tags by reading through the OSM
data line by line. This experiment was done with all the European OSM data
(around 40 gigabytes, downloaded in 2009). We obtained 19,495,156 ways in
total, of which 10,963,307 are highways with 482 unique highway types. Table 1
lists twenty-six of these types; except for tertiary_link, their counts are all
greater than 1,000. The levels of highway are obtained according to the normal
tag usage of OSM highways. The two transport modes, i.e., drive and walk, are
marked with “” or “” to show whether the current type of highway is
accessible for cars or pedestrians. The standards and the levels are derived
from the comments on the OSM wiki Map Features page, available at
http://wiki.openstreetmap.org/wiki/Map_Features#Highway.
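The tag statistics described above can be sketched as follows. This is a
minimal illustration, not the thesis implementation: it swaps the line-by-line
reading for Python's streaming XML parser, and it assumes the standard OSM XML
layout in which each way element carries tag children with k and v attributes.
The function name and the inline sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_highway_types(source):
    """Stream an OSM XML source; return (total ways, Counter of highway values)."""
    total_ways = 0
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "way":
            total_ways += 1
            for tag in elem.iter("tag"):
                if tag.get("k") == "highway":
                    counts[tag.get("v")] += 1
                    break
            elem.clear()  # release the children of a finished way
        elif elem.tag in ("node", "relation"):
            elem.clear()
    return total_ways, counts

# Tiny hypothetical extract standing in for a real OSM file.
SAMPLE_WAYS = b"""<osm>
  <way id="1"><nd ref="1"/><tag k="highway" v="residential"/></way>
  <way id="2"><nd ref="2"/><tag k="highway" v="secondary"/></way>
  <way id="3"><nd ref="3"/><tag k="highway" v="residential"/></way>
  <way id="4"><nd ref="4"/><tag k="building" v="yes"/></way>
</osm>"""

total_ways, type_counts = count_highway_types(io.BytesIO(SAMPLE_WAYS))
for highway_type, n in type_counts.most_common():
    print(highway_type, n)
```

Streaming keeps memory use roughly constant per element, which matters for
inputs on the order of tens of gigabytes.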
Table 1: Types of highways and the count (total 482)
Level  Highway types    Count       Drive  Walk
1      Motorway         94,725           
2      Motorway_link    87,373           
3      Trunk            97,573           
4      Trunk_link       47,719           
5      Primary          305,341          
6      Primary_link     36,800           
7      Secondary        552,149          
8      Secondary_link   4,196            
9      Tertiary         709,754          
10     Tertiary_link    116              
11     Residential      3,389,694        
12     Unclassified     1,499,826        
13     Road             187,525          
14     living_street    94,460           
15     Service          847,179          
16     Track            1,234,278        
17     Pedestrian       123,569          
18     Path             310,685          
19     Cycleway         270,328          
20     Footway          930,008          
21     Byway            1,693            
22     Steps            95,095           
23     Bridleway        18,999           
24     Construction     9,899            
25     Unsurfaced       5,413            
26     Platform         1,068            
…      …
These statistics are a byproduct of the extraction process. They provide a
complete view of the tag schema mentioned above. At the same time, they serve
as a test of data completeness, and the result is a contribution to potential
further research. During extraction, we first read through the OSM file to get
all of the highway IDs and the node IDs that are referred to by them, and then
we sort both ID lists. After that, we create an X array and a Y array according
to the size of the highway node ID list. Then we read the OSM file from the
beginning again. When reading the nodes, if the node ID is in the highway node
ID list, the x and y coordinates of the node are stored in the X array and the
Y array. Here we use binary search to find the position of the ID in the array,
which dramatically improves the performance. After reading the nodes, we read
the ways. We also use binary search to check whether the way ID is in the
highway ID list. Each obtained highway is written to an XML file using a
self-defined schema. This process can be described by the pseudocode shown
below:
Algorithm I: Extracting highways from OSM XML file
----------------------------------------------------------------------------------------------------------
Function ReadNodeHighwayIDs (OSM file)
    While (Not EndOfFile)
        Readline ();
        If (Highway) Then
            Record highway IDs
            Record highway node IDs
    Sort highway IDs;
    Sort highway node IDs;

Function ReadHighway (OSM file, Sorted highway IDs, Sorted highway node IDs)
    Create double array of X and Y according to the size of highway node IDs
    While (Not EndOfFile)
        Readline ();
        If (Node) Then
            If (Binary Search node ID in Sorted highway node IDs) Then
                Record to X array and Y array
        Else If (Way) Then
            If (Binary Search way ID in Sorted highway IDs) Then
                Binary Search node ID and get X and Y coordinates
                Transform routing mode and write to output file
<highway id="61431" points="5" length="742.17">
    <vt x="1909587.90340671" y="8517596.45565811"/>
    <vt x="1909555.99924065" y="8517422.07682731"/>
    <vt x="1909503.55662854" y="8517287.15801793"/>
    <vt x="1909417.5623219" y="8517105.90807998"/>
    <vt x="1909389.120192" y="8516888.22796572"/>
    <tag k="name" v="Strömsbrovägen"/>
    <tag k="highway" v="secondary"/>
    <tag k="car" v="yes"/>
    <tag k="walk" v="yes"/>
</highway>
Figure 20: Pseudocode and an example of an extracted highway
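The two passes of Algorithm I can be sketched in Python as follows. This is a
hedged illustration rather than the thesis code: it uses a streaming XML parser
instead of line-by-line reading, returns raw longitude/latitude pairs instead
of writing the self-defined XML schema, and omits the projection and the
routing-mode tags. It also assumes that nodes precede ways in the file, as the
pseudocode does. All function names and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from bisect import bisect_left

def index_of(sorted_ids, key):
    """Binary search: position of key in sorted_ids, or -1 if absent."""
    i = bisect_left(sorted_ids, key)
    return i if i < len(sorted_ids) and sorted_ids[i] == key else -1

def is_highway(way):
    return any(t.get("k") == "highway" for t in way.iter("tag"))

def extract_highways(f):
    """Two passes over a seekable OSM XML file object."""
    # Pass 1: collect IDs of highway ways and of the nodes they reference.
    way_ids, node_ids = set(), set()
    for _ev, el in ET.iterparse(f, events=("end",)):
        if el.tag == "way":
            if is_highway(el):
                way_ids.add(int(el.get("id")))
                node_ids.update(int(nd.get("ref")) for nd in el.iter("nd"))
            el.clear()
        elif el.tag == "node":
            el.clear()
    way_ids, node_ids = sorted(way_ids), sorted(node_ids)

    # Pass 2: fill the coordinate arrays, then resolve each highway's nodes.
    xs = [0.0] * len(node_ids)
    ys = [0.0] * len(node_ids)
    highways = {}
    f.seek(0)
    for _ev, el in ET.iterparse(f, events=("end",)):
        if el.tag == "node":
            i = index_of(node_ids, int(el.get("id")))
            if i >= 0:
                xs[i], ys[i] = float(el.get("lon")), float(el.get("lat"))
            el.clear()
        elif el.tag == "way":
            wid = int(el.get("id"))
            if index_of(way_ids, wid) >= 0:
                pts = []
                for nd in el.iter("nd"):
                    i = index_of(node_ids, int(nd.get("ref")))
                    if i >= 0:
                        pts.append((xs[i], ys[i]))
                highways[wid] = pts
            el.clear()
    return highways

# Tiny hypothetical extract: one highway and one non-highway way.
SAMPLE_OSM = b"""<osm>
  <node id="3" lat="60.0" lon="17.0"/>
  <node id="1" lat="60.1" lon="17.1"/>
  <node id="2" lat="60.2" lon="17.2"/>
  <node id="9" lat="0.0" lon="0.0"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><nd ref="3"/><tag k="highway" v="secondary"/></way>
  <way id="11"><nd ref="9"/><tag k="building" v="yes"/></way>
</osm>"""

highways = extract_highways(io.BytesIO(SAMPLE_OSM))
print(highways)
```

Sorted ID lists with binary search keep each lookup logarithmic in the number
of highway nodes while storing only two flat coordinate arrays.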
In the extracted highway, we keep only the name and highway type for further
use, rather than copying all the tags of the original way, which reduces the
file size dramatically. Meanwhile, we transform the original latitude and
longitude into World Mercator projection coordinates. Notably, the extracted
file (Figure 20) is also organized in XML format, which is self-explanatory and
flexible.
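The coordinate transformation can be illustrated with the standard forward
Mercator equations. The sketch below assumes the ellipsoidal variant on the
WGS84 ellipsoid; the text names only "World Mercator", so the choice of variant
and the function name are assumptions made for illustration.

```python
import math

A = 6378137.0                # WGS84 semi-major axis in metres
F = 1 / 298.257223563        # WGS84 flattening
E = math.sqrt(F * (2 - F))   # first eccentricity

def world_mercator(lon_deg, lat_deg):
    """Forward ellipsoidal Mercator: degrees in, metres out."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = A * lam
    s = E * math.sin(phi)
    y = A * math.log(
        math.tan(math.pi / 4 + phi / 2) * ((1 - s) / (1 + s)) ** (E / 2)
    )
    return x, y

# Hypothetical point in central Sweden.
x, y = world_mercator(17.15, 60.7)
print(round(x, 2), round(y, 2))
```

Latitudes must stay strictly between -90 and 90 degrees, because the projection
diverges at the poles.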
3.2.2. Topological correction
The extracted highways are named or unnamed roads stored as series of ordered
nodes with geometric coordinates. This resembles the concept of a polyline in
ArcGIS and other conventional GIS datasets. However, the topological
relationships between highways are implicit and need to be checked and
corrected. The objective is to build a “clean” road network, similar to the
Coverage data used in ArcGIS, without intersections and overlaps between arcs.
Generally, there are two kinds of topological errors: first, self-overlap and
self-intersection, and second, overlap and intersection between highways. To
eliminate these topological errors, two methods can be adopted. The first is to
convert all the highways into a shapefile and use ArcTools to convert the
shapefile to coverage format. This method is relatively straightforward and
simple, but a shapefile is limited to 2 gigabytes in size. Thus, this method is
not suitable for massive OSM data. The second method can be divided into the
following three steps:
a) Check self-intersection and self-overlap of highway.
The algorithm for detecting self-intersection and self-overlap of a polyline is
complicated, and thus hard and time-consuming to implement. For the sake of
efficiency and simplicity, we use the ArcGIS Objects method
ITopologicalOperator.Simplify to achieve this goal. In this way, the road
networks become topologically correct, and the method proves to perform well.
For more details of this ArcObjects method, refer to the ESRI online resource
(ESRI 2012).
First, we read the highway nodes (a series of ordered vertices with
coordinates) and convert them into ArcGIS polyline objects. After that, we use
the polyline Simplify method to correct the errors of self-intersection and
self-overlap. Finally, we write the result to the self-defined XML file again.
For example, if the simplified polyline consists of more than one part, we
treat each part as a new highway, while the original attributes are kept for
later use. Here we call the processed highway data simplified highways to show
the distinction. The algorithm can be described using the following pseudocode:
Algorithm II: Check self-intersection and self-overlap of highway
----------------------------------------------------------------------------------------------------------
Function ReadHighwayIDs (Extracted highways)
    For each highway
        Convert into ArcGIS polyline
        (Polyline as ITopologicalOperator).Simplify ()
        For each polyline part
            Inherit tag attributes from original highway
            Write to output file as new highway
b) Check overlap between simplified highways.
In Table 1, the hierarchy shows the importance of highways. Thus, we first rank
the highways according to this hierarchy. Then we start from the “lowest”
highway to check for overlap. If any other highway overlaps the current
highway, the current highway is clipped at the overlap position. The reason for
ranking highways by importance and starting from the lowest one is to use the
major highway to clip the minor one. In this way, the more important parts,
those of the major highway, are kept. As shown in Figure 21, the blue one is
the minor highway, while the red is the major one.
Figure 21: Clip overlapped highway (left: highways ① and ② before clipping; right: highways ①, ② and ③ after clipping).
The basic idea of the clip function is as follows: for each line segment (made
up of two consecutive points of the current highway), if consecutive points of
any other highway lie on the line segment, then the other highway is used to
clip the current highway at these points. As we can see, on the left of Figure
21 there are two highways: ① and ②. After the clip operation, ② and ③ on the
right together equal ② on the left, while highway ① stays the same. To tell
whether a point is on a line segment, we define the line segment between point
P1 (X1, Y1) and point P2 (X2, Y2) as vector P1P2. For any other point
P3 (X3, Y3), we obtain another vector P1P3. Then we compute the cross product
of the two vectors:
\[ \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (X_2 - X_1)(Y_3 - Y_1) - (X_3 - X_1)(Y_2 - Y_1) \tag{1} \]
If the cross product is equal to 0, then P1, P2 and P3 are collinear. Furthermore,
if the following condition (2) is satisfied, then point P3 is on line segment P1P2.
\[ X_3 \ge \min(X_1, X_2) \;\land\; X_3 \le \max(X_1, X_2) \;\land\; Y_3 \ge \min(Y_1, Y_2) \;\land\; Y_3 \le \max(Y_1, Y_2) \tag{2} \]
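
The point-on-segment test of equations (1) and (2) can be sketched as follows. This is a minimal illustration rather than the thesis implementation; the function names and the tolerance `eps`, which replaces the exact comparison with zero to allow for floating-point error, are assumptions.

```python
def cross(p1, p2, p3):
    """Cross product of vectors P1P2 and P1P3, as in equation (1)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)


def point_on_segment(p1, p2, p3, eps=1e-9):
    """Return True if P3 lies on segment P1P2 (collinear and inside its extent)."""
    # Equation (1): a non-zero cross product means the points are not collinear.
    if abs(cross(p1, p2, p3)) > eps:
        return False
    # Equation (2): P3 must fall within the coordinate range of the segment.
    return (min(p1[0], p2[0]) <= p3[0] <= max(p1[0], p2[0])
            and min(p1[1], p2[1]) <= p3[1] <= max(p1[1], p2[1]))
```

With projected coordinates in the millions of metres, a tolerance scaled to the segment length may be more appropriate than a fixed constant.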
Algorithm III: Check overlap between simplified highways
----------------------------------------------------------------------------------------------------------
Function CheckOverlap (Current Highway)
    For each line segment in current highway
        If (any other highway overlaps) Then
            Mark at overlap positions
    Split current highway at each marked position
    Inherit tag attributes from current highway
c) Check intersection between simplified and non-overlap highways.
Some highways intersect without a node at the crossing (Figure 22). For each
line segment in a highway, if it intersects a line segment of another highway
and the intersection point is not one of the line segment end points, then both
line segments are split at the intersection point. Before calculating the
intersection point between two line segments, we use the bounding extents
(boundaries) of the two line segments to check whether they can intersect.
Suppose line segment ① is P1P2, with P1 at (X1, Y1) and P2 at (X2, Y2), and
line segment ② is P3P4, with P3 at (X3, Y3) and P4 at (X4, Y4); then we use the
following formula to calculate the intersection point:
\[
\begin{cases}
\mathrm{Delta} = (Y_4 - Y_3)(X_1 - X_2) - (Y_2 - Y_1)(X_3 - X_4) \\
X = \dfrac{(X_3 Y_4 - X_4 Y_3)(X_1 - X_2) - (X_1 Y_2 - X_2 Y_1)(X_3 - X_4)}{\mathrm{Delta}} \\
Y = \dfrac{(Y_4 - Y_3)(X_1 Y_2 - X_2 Y_1) - (Y_2 - Y_1)(X_3 Y_4 - X_4 Y_3)}{\mathrm{Delta}}
\end{cases} \tag{3}
\]
Figure 22: Highway intersection.
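
The intersection computation of equation (3), preceded by the bounding-extent check, can be sketched as below. This is an illustrative sketch, not the thesis code; the function names, the tolerances and the choice to return `None` for parallel or non-crossing segments are assumptions.

```python
def extents_overlap(p1, p2, p3, p4):
    """Cheap pre-check: do the bounding extents of P1P2 and P3P4 overlap?"""
    return (max(p1[0], p2[0]) >= min(p3[0], p4[0])
            and max(p3[0], p4[0]) >= min(p1[0], p2[0])
            and max(p1[1], p2[1]) >= min(p3[1], p4[1])
            and max(p3[1], p4[1]) >= min(p1[1], p2[1]))


def _within(a, b, v, tol):
    """True if v lies between a and b, allowing a small tolerance."""
    return min(a, b) - tol <= v <= max(a, b) + tol


def segment_intersection(p1, p2, p3, p4, eps=1e-12, tol=1e-9):
    """Intersection point of segments P1P2 and P3P4 via equation (3), or None."""
    if not extents_overlap(p1, p2, p3, p4):
        return None
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    delta = (y4 - y3) * (x1 - x2) - (y2 - y1) * (x3 - x4)
    if abs(delta) < eps:
        return None  # parallel or collinear: no single crossing point
    c12 = x1 * y2 - x2 * y1
    c34 = x3 * y4 - x4 * y3
    x = (c34 * (x1 - x2) - c12 * (x3 - x4)) / delta
    y = ((y4 - y3) * c12 - (y2 - y1) * c34) / delta
    # Equation (3) intersects the infinite lines, so confirm that the point
    # actually falls on both segments before accepting it.
    on_first = _within(x1, x2, x, tol) and _within(y1, y2, y, tol)
    on_second = _within(x3, x4, x, tol) and _within(y3, y4, y, tol)
    return (x, y) if on_first and on_second else None
```

A caller would additionally compare the returned point with the four end points and split the segments only when it matches none of them.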

Algorithm IV: Check intersection without node
----------------------------------------------------------------------------------------------------------
Function CheckIntersection (Current Highway)
    For each line segment in current highway
        If (any other line segment intersects) Then
            Calculate the intersection point
            If (intersection point is not a line segment end point) Then
                Mark the line segment at intersection point
    Split current highway at each marked position
    Inherit tag attributes from current highway
As stated above, the final objective is to obtain the road segments as a network
from OSM data for further study. The datasets involved in this study are very
large, and thus the process can be regarded as data-intensive computing to some
extent. Much effort has been devoted to the design of the data structures and
algorithms. Including the pre-processing part, there are hundreds of thousands
of lines of code in total. Specific implementations are discussed and presented
in the next section.
3.3. Floating car data
Floating car data (FCD) is a kind of GPS data collected as follows: taxicabs
equipped with GPS devices drive around the city and send information to the
data server at a given time resolution. The information includes car ID,
timestamp, coordinates, speed and angle (Figure 23a). The time resolution is set
at around 10 to 30 seconds. It should be noted that the original FCD includes
many other kinds of useful information, such as passenger information, which
are filtered out for security and other reasons (for more details, refer to Li
et al. 2011). The FCD used in this study is shared by the same study, and the
original data is given in simple text format.
As mentioned above, the FCD is also a kind of VGI data because all the drivers
contribute the data voluntarily. To give an intuitive impression of the FCD, we
convert it into an ArcGIS shapefile (Figure 23b). The red points are the
locations where taxicabs stop. From the spatial distribution, it is easy to see
that the city space is fully covered, which makes FCD suitable for urban
studies without data completeness problems. Moreover, the spatial patterns are
rich and inspiring. The algorithms for the analysis of urban mobility patterns
are elaborated in Paper II. The implementation on the same platform is
described in the next section.
Figure 23: Data model representation: (a) a slice of real data sorted by time in
ascending order and (b) stop points in the floating car data, which cover the
whole city space of Wuhan, China (Note: the middle part is the Yangtze River)
3.4. Implementations
Hardware is the basis of any computing project. In this project, large memory
and high-speed processors are both necessary. We adopt the HP ProLiant DL380 G6
Server: two Intel® Xeon® Processor 5500 series (270 clocked at 2.5 GHz) with
Intel Turbo Boost technology, 42GB physical memory and a 2TB hard drive,
running 64-bit Windows Server 2008 SP2. The development environment is
Microsoft Visual Studio 2010 (VS2010) ASP.net using the C# programming
language, a highly integrated, object-oriented and user-friendly development
tool for the Windows platform. Moreover, as a major software vendor, Microsoft
offers good continuity in terms of version updates and code maintainability,
which extends the life cycle of a software project.
Figure 24: Flow chart to extract highway from OSM data (steps shown: read the
OSM XML file line by line until the end of file; on a way entry, read all node
IDs and add them to the ID list; on a node entry, read X, Y and add them to the
array; on a way entry, write to the highway file)
Figure 25: Flow chart of topological processing from highways to road segments
(three stages: read the highway XML file, convert each highway to an ArcGIS
polyline, simplify it and output to the simplified file; load all simplified
highways, initialize the grid index, check overlap and output to the checked
file; load all checked highways, initialize the grid index, check intersection
and output to the clean file)
Throughout the implementation, the principle of simplicity permeates the whole
project. We do not follow all the complicated specifications and
standardizations. Instead, we borrow the idea of the Agile data modeling method
(Wells 2009) and implement the design of the data model using object-oriented
techniques. Agile methods are evolutionary and iterative, and have proven to be
very successful, especially for research projects. As the name implies, the
project team starts from the basics, focuses on the overall project goals, and
then keeps communicating and modifying in an iterative way. The key to
understanding Agile methods is that there is no such thing as an Agile method
in itself; there are only Agile teams that learn how to be agile (Wells 2009).
Figures 24 and 25 present the algorithm design processes, which can be
efficiently implemented in an object-oriented development environment.
Furthermore, the algorithms can be easily extended based on the ideas of Agile
programming.
Generally, a data model serves as a link between application requirements and
software data structures, through which project team members can communicate
with each other. Hoberman (2009) states that “a data model is a wayfinding tool
for both business and IT professionals, which uses a set of symbols and text to
precisely explain a subset of real information to improve communication within
the organization and thereby lead to a more flexible and stable application
environment.” When application requirements are defined, data models should
be created and visualized to make sure that the requirements are covered
without omitting details. A well-defined data model depends on the application
and affects its performance.
There are various traditional data modeling methods, e.g. the
entity-relationship model (ERM). Unfortunately, most of these work in a (near)
serial manner, and thus cannot reflect iterative and incremental (evolutionary)
processes. Using the object-oriented method, we start from the OSM XML file and
follow the simplicity and Agile principles (Wells 2009). First, we define the
highway data extracted from OSM data. Besides the coordinates, there are
attributes such as name, highway, car and pedestrian. Then we create the arc or
segment data model, based on which we build the connectivity graph. After that,
we generate blocks and natural streets. At the very beginning, we just outline
the basic data types. As the process goes into detail, we design and modify
them iteratively.
OSM data → Highway → Road arc/segments → Connectivity graph → Blocks, Natural streets
Figure 26: Logical data model for processing data.
In Figure 26, both highways and road segments are intermediate results. Blocks
and natural streets store the results generated from the segment-based
connectivity graph. It should be noted that the connectivity graph is not a
separate dataset, but a connection attribute integrated in the road segments.
In the VS2010 development environment, each of the data types is designed as an
interface and a class, in which both the data and the methods are encapsulated.
Figure 27 shows the main physical data models. To build the connectivity graph
of road segments, we need to calculate the deflection angles between road
segments, and a spatial grid index is chosen to accelerate the computation.
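
The role of a spatial grid index can be illustrated with a minimal sketch. The thesis does not describe its index in detail, so the uniform-cell design, the class name `GridIndex` and the `cell_size` parameter below are assumptions made for illustration.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid that maps each cell to the IDs of segments touching it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _cells_for(self, x1, y1, x2, y2):
        # Enumerate every cell covered by the bounding extent of the input.
        c0 = math.floor(min(x1, x2) / self.cell_size)
        c1 = math.floor(max(x1, x2) / self.cell_size)
        r0 = math.floor(min(y1, y2) / self.cell_size)
        r1 = math.floor(max(y1, y2) / self.cell_size)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                yield (c, r)

    def insert(self, seg_id, x1, y1, x2, y2):
        """Register a segment in all cells its bounding extent covers."""
        for cell in self._cells_for(x1, y1, x2, y2):
            self.cells[cell].add(seg_id)

    def query(self, x1, y1, x2, y2):
        """Return IDs of segments that share at least one cell with the extent."""
        found = set()
        for cell in self._cells_for(x1, y1, x2, y2):
            found |= self.cells.get(cell, set())
        return found
```

Only the candidates returned by `query` need an exact geometric test, which avoids comparing every segment with every other one.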
Figure 27: Generated physical data model for OSM data
Figure 28: Generated physical data model for floating car data
The algorithms for generating natural roads, together with pseudocode, are
given by Jiang et al. (2008). However, their algorithms are based on
segment-based ArcGIS shapefiles, and they search for connected road segments
and calculate the deflection angles between them in real time using ArcObjects
methods. The datasets used in this research are self-defined XML files, and not
all of the objects in the datasets are road segments. Thus, we adopt a new
strategy to generate the natural roads, which could be more efficient for large
datasets. First, we build the segment-oriented connectivity graph, based on
which the deflection angles between segments are calculated and stored as
pre-processed results. After that, the segments are loaded as a connectivity
graph and traversed to generate natural streets using a loop algorithm. The
reason we adopt a loop algorithm rather than an iterative one is that, for
large datasets, an iterative algorithm consumes much more memory and leads to
low performance, while loop functions avoid this problem. Blocks are also
extracted in the same way. For more details of the algorithms, please refer to
Chapter 2.
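
A loop-based traversal of a connectivity graph with pre-computed deflection angles can be sketched as follows. This is a simplified illustration and not the natural street algorithm of Jiang et al. (2008): it merely groups segments linked by small deflection angles, using an explicit stack so that no nested function calls accumulate. The input layout, the function name and the 45-degree threshold are all assumptions.

```python
def group_segments(links, max_angle=45.0):
    """Group segment IDs connected through links with small deflection angles.

    links maps a segment ID to a list of (neighbour ID, deflection angle in
    degrees) pairs, i.e. the pre-processed connectivity graph.
    """
    visited = set()
    groups = []
    for start in sorted(links):
        if start in visited:
            continue
        visited.add(start)
        stack = [start]  # explicit stack instead of nested calls
        group = []
        while stack:
            seg = stack.pop()
            group.append(seg)
            for neighbour, angle in links.get(seg, []):
                if angle <= max_angle and neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups
```

Because the pending segments live in an ordinary list, the memory used grows with the number of segments waiting to be visited rather than with the depth of a call chain.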
4. Methodology
4.1. Overall structure
In this thesis, the overall strategy is as follows: first, to identify a scaling
phenomenon in geographic space where one exists, and then to define
mathematically a quantitative rule that can be applied. Following that, the
principle and the rule (i.e., the head/tail division rule) are used to obtain
hierarchical structures in urban studies, based on appropriately generated
geographic representations in the context of urban environments. Based on the
obtained hierarchies, the specific physical meanings behind the issues in
different urban studies can be explored. In this way, the high complexity of
urban issues can be efficiently reduced, and thus urban issues can be solved
effectively at different levels. The overall structure of this thesis is shown
in Figure 29.
Theoretical foundation: scaling of geographic space, comprising scaling at city level (Paper VII) and scaling at country level (Paper VI)
VGI data
Application in urban studies, city level: urban route planning service (Paper V), urban morphologies understanding (Paper I), urban sprawl patches identification (Paper II), urban mobility patterns exploration (Paper III)
Application in urban studies, country level: country hierarchical structures comparison (Paper IV)
Figure 29: Overall schematic structure of this thesis
First, this thesis verifies and elaborates on the principle of scaling of
geographic space and the head/tail division rule at the city and country levels
from the perspectives of old axial lines, blocks and natural cities (Papers VI
and VII). The scaling of geographic space refers to the phenomenon that small
geographic objects or representations are far more numerous than large ones in
a large-scale geographic space, whereas the head/tail division rule states
that, given a variable X, if its values x follow a heavy-tailed distribution,
then the mean (m) of the values can be used to divide all the values into two
parts: large ones (a low percentage) whose values lie above the mean and small
ones (a high percentage) whose values lie below it.
In this thesis, the scaling property is characterized by the fact that the values of
geographic objects follow heavy-tailed distributions, namely the power law,
lognormal, exponential, power law with an exponential cutoff and stretched
exponential. In essence, a heavy-tailed distribution indicates a special
nonlinear relationship between a variable and its probability. This thesis
therefore considers the definition and detection of heavy-tailed distributions
(in the following section).
The methodology then shifts to the discussion of how to choose appropriate
geographic representations in the context of urban environments and to the
exploration of the physical meanings behind the hierarchical structures obtained
through the head/tail division rule. Particular emphasis is placed on the workings
of the head/tail division rule and on how the obtained hierarchical structures can
be understood. Data processing has been described in Chapter 3.
4.2. Heavy-tailed distributions
The heavy-tailed distribution is one of the mainstays in the analysis of topics in
this thesis. This section first provides a brief introduction to the concept of
heavy-tailed distributions, i.e., what a heavy-tailed distribution is and how to
describe it with mathematical formulae. This section then illustrates how to detect
a heavy-tailed distribution when one exists, i.e., given a series of values, how to
determine which type of heavy-tailed distribution best fits the data.
4.2.1. Concept and definitions
A heavy-tailed distribution is a special non-linear relationship between a
quantity x and its probability or frequency, which can be described as a power
law, lognormal, exponential, power law with an exponential cutoff or stretched
exponential (Clauset et al. 2009, Adamic 2002). The first three are the basic
distributions, and the last two are their degenerate versions. These
distributions are termed heavy-tailed because either the rank-size
plot (Figure 30 (b)) or the probability distribution function plot shows a “heavy”
tail compared with a normal distribution. Many reports explore the heavy-tailed
distribution from different perspectives, and even the name, mathematical
formula and physical meaning associated with this phenomenon vary, e.g.,
the long tail (Anderson 2006) and fat tail distributions. However, the physical
meaning behind a heavy-tailed distribution is that objects with small sizes are
very common, while objects with large sizes are very rare. The frequencies or
probabilities follow the mathematical distributions mentioned above. Thus, this
introduction provides a brief description of the concepts from a statistical
viewpoint and several new advances in the context of this thesis.
Comparatively, a normal distribution is fundamentally different from a
heavy-tailed distribution (Figure 30): a normal distribution has a
thin tail. Given a series of values X that follows a normal distribution, the
values cluster around the mean (μ). More precisely, Figure 30
(a) shows that around 68.2%, 95.4% and 99.7% of the values lie within one, two
and three standard deviations (σ) of the mean, respectively. That is,
nearly all (99.7%) of the values lie within three standard deviations (σ) of
the mean (μ). This phenomenon is termed the 68-95-99.7 rule, the empirical
rule, or the 3-sigma rule. For example, suppose the mean height of people in
the world is 175 cm and the standard deviation is 10 cm; then the heights of 68.2%
of the people lie between 165 and 185 cm, 95.4% between 155 and 195 cm, and
99.7% between 145 and 205 cm. A heavy-tailed
distribution behaves in the opposite way: the values do not center on the mean.
As shown in Figure 30 (b), the mean divides the values into the head with a low
percentage and the tail with a high percentage, which is often characterized by
the 80/20 rule. The key idea behind the 80/20 rule is the imbalance between the
head and the tail, rather than the precise percentages (Jiang and Jia 2011a).
That is, the percentages could also be around 90% and 10%, or even around 70%
and 30%. An alternative graph is also shown by Jiang and Jia (2011a) as a
rank-size plot: the bigger the size, the smaller the rank. That is, the biggest
size is ranked first with rank 1, and so on.
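As a quick check of the normal-distribution percentages above, the following minimal sketch computes the share of values within k standard deviations of the mean from the error function. The function name `normal_coverage` is illustrative and not part of the thesis; the height figures are the ones used in the example. The exact one-sigma share is about 68.27%, so it prints as 68.3%, close to the 68.2% read from the figure.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Height example from the text: mean 175 cm, standard deviation 10 cm.
mean_cm, sd_cm = 175.0, 10.0
for k in (1, 2, 3):
    low, high = mean_cm - k * sd_cm, mean_cm + k * sd_cm
    print(f"{low:.0f}-{high:.0f} cm: {100 * normal_coverage(k):.1f}%")
```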
[Figure 30 panels: (a) normal distribution; (b) heavy-tailed distribution with
the head (20%) and the tail (80%) marked. Axis labels: P(x>=x), X, Log(x).]
Figure 30: Example graphs of (a) normal distribution (source: Sedgewick and
Wayne 2010) and (b) heavy-tailed distribution
To determine which kind of distribution fits the given data best, we first need to
present the mathematical formulas of the five heavy-tailed distributions. Given
a variable x, its probability is denoted f(x). The idealized power law and
exponential distributions can be expressed as $f(x) = x^{-\alpha}$ and
$f(x) = e^{-x}$, respectively.
For the sake of completeness, the definitions used for real data are briefly
given here. More details can be found in the literature (e.g., Clauset et al. 2009,
Jiang and Jia 2011b). The relationship holds only for values greater than a
minimum value $x_{\min}$. The power law can be expressed as follows:
$$f(x) = C_1 x^{-\alpha}, \qquad C_1 = (\alpha - 1)\, x_{\min}^{\alpha - 1} \tag{1}$$
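As a short worked step (not given in the original text), the constant in the power law definition follows from requiring that the density integrates to one over values of at least the minimum value, assuming an exponent greater than one:

```latex
\int_{x_{\min}}^{\infty} C_1 x^{-\alpha}\,dx
  = C_1 \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_{x_{\min}}^{\infty}
  = C_1 \frac{x_{\min}^{1-\alpha}}{\alpha - 1} = 1
  \quad\Longrightarrow\quad
  C_1 = (\alpha - 1)\, x_{\min}^{\alpha - 1}, \qquad \alpha > 1 .
```

The same normalization argument gives the constant of the exponential distribution, with the integral of the exponential term taken from the minimum value to infinity.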
The degenerate version of the power law is a power law with an exponential
cutoff. The expression is:
$$f(x) = C_2 x^{-\alpha} e^{-\lambda x}, \qquad C_2 = \frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha, \lambda x_{\min})} \tag{2}$$
Generally, an exponential distribution can be expressed as follows:
$$f(x) = C_3 e^{-\lambda x}, \qquad C_3 = \lambda e^{\lambda x_{\min}} \tag{3}$$
The degenerate version of an exponential distribution is the stretched exponential,
which can be expressed as:
$$f(x) = C_4 x^{\beta - 1} e^{-\lambda x^{\beta}}, \qquad C_4 = \beta \lambda e^{\lambda x_{\min}^{\beta}} \tag{4}$$
The lognormal distribution indicates that the logarithm of the variable x follows
a normal distribution. Following Clauset et al. (2009), the expression is:
$$f(x) = C_5 \frac{1}{x} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \qquad C_5 = \sqrt{\frac{2}{\pi \sigma^2}} \left[ \operatorname{erfc}\!\left( \frac{\ln x_{\min} - \mu}{\sqrt{2}\,\sigma} \right) \right]^{-1} \tag{5}$$
4.2.2. Mathematical detection
Clauset et al. (2009) detect a power law distribution based on a set of statistical
tests. This method is extended by Jia (2012) and Jiang and Jia (2011b). Along
the same lines, the detection of the heavy-tailed distributions stated above can be
summarized as follows: (a) the parameters, i.e. $x_{\min}$, α, β, λ, σ and μ, in each of
the definitions (1) to (5) are estimated via the maximum
likelihood estimation (MLE) method; (b) the p values for each
distribution are calculated via the Kolmogorov-Smirnov (KS) test (Chakravarti et
al. 1967) for goodness of fit, to check which kinds of distribution fit the data; and
(c) if none of the distributions fits the data, likelihood ratio (LR)
values are calculated between every pair of distributions, based on which the
best-fitting distribution is determined. Similarly, if more than one
distribution fits the data, the same process is repeated on the
fitted distributions to find the best-fitting one. If only one
distribution fits the data, then it is the one selected. The implemented
procedures for estimating the parameters can be found in the literature (Jiang
and Jia 2011b).
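The three steps can be illustrated with a deliberately reduced sketch. It is not the implementation of Jiang and Jia (2011b): it compares only two of the five definitions (power law and exponential), assumes the minimum value is already known rather than estimated, and reports the KS distance instead of a p value (Clauset et al. (2009) obtain p values by bootstrapping, which is omitted here). All function names and the synthetic sample are hypothetical.

```python
import math


def fit_power_law(tail, x_min):
    """MLE of alpha for definition (1): alpha = 1 + n / sum(ln(x / x_min))."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def fit_exponential(tail, x_min):
    """MLE of lambda for definition (3): lambda = n / sum(x - x_min)."""
    return len(tail) / sum(x - x_min for x in tail)


def ks_distance(tail, cdf):
    """Largest gap between the empirical CDF of the sorted tail and a model CDF."""
    n = len(tail)
    return max(
        max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
        for i, x in enumerate(tail)
    )


def compare(values, x_min):
    """Fit both candidates to the values >= x_min and compare them."""
    tail = sorted(x for x in values if x >= x_min)
    alpha = fit_power_law(tail, x_min)
    lam = fit_exponential(tail, x_min)
    ks_pl = ks_distance(tail, lambda x: 1.0 - (x / x_min) ** (1.0 - alpha))
    ks_exp = ks_distance(tail, lambda x: 1.0 - math.exp(-lam * (x - x_min)))
    # Log-likelihoods under definitions (1) and (3).
    ll_pl = sum(math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min) for x in tail)
    ll_exp = sum(math.log(lam) - lam * (x - x_min) for x in tail)
    return {
        "alpha": alpha,
        "lambda": lam,
        "ks_power_law": ks_pl,
        "ks_exponential": ks_exp,
        # Positive values favour the power law, negative the exponential.
        "log_likelihood_ratio": ll_pl - ll_exp,
    }


# Synthetic, deterministic sample: power-law quantiles with alpha = 2.5, x_min = 1.
n, true_alpha, x_min = 1000, 2.5, 1.0
sample = [x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0)) for i in range(n)]
result = compare(sample, x_min)
print(result)
```

On this synthetic sample the estimated exponent should be close to 2.5, and both the KS distance and the sign of the log-likelihood ratio should favour the power law. A faithful version would add the remaining three definitions, the estimation of the minimum value and a significance test for the likelihood ratio.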
It should be noted that the detection of a heavy-tailed distribution is based on the
definitions stated above. If further definitions of heavy-tailed
distributions are given, the above process can be repeated, because the
MLE method, the KS test and the LR calculation can each be formulated for a
specific distribution. That is, these methods can be adapted
to the different definitions of different heavy-tailed distributions. To describe
the above computational process in more detail, a flow chart is shown below
(Figure 31).
[Figure 31 flow chart: Quantity (x) -> MLE calculation for 5 definitions -> KS
test for 5 definitions. If none is passed: LR calculation for 5 definitions, then
select the best one. If only one is passed: select the passed one. If more than
one is passed: LR calculation for passed ones, then select the best one. -> End]
Figure 31: Flow chart of the detection of a heavy-tailed distribution
As a conclusion to this section, the scaling property here inherits the core idea
from fractals and is in line with the statistical methodology. However, the
definitions are relaxed and the detection methods are extended. The focus is
put on the discovery of the hierarchical structure of geographic phenomena and the
physical meaning behind it, which will be elaborated in later chapters.
4.3. Scaling, hierarchies and head/tail division rule
In this section, the concepts of and relationships between scaling of geographic
space, heavy-tailed distributions, the head/tail division rule and the hierarchical
structures are elaborated in detail. Moreover, how to uncover the geographic
implications behind the hierarchical structures and apply them to urban studies
is also discussed.
4.3.1. The head/tail division rule
The scaling property is the basic principle of geographic space, i.e. scaling of
geographic space. It refers to the phenomenon that small geographic objects or
representations are far more numerous than large ones for a large-scale
geographic space. It can be characterized by the heavy-tailed distributions
mentioned above. To some extent, the heavy-tailed distribution of a series of
geographic observations indicates a strong hierarchical structure, i.e., a small
percentage with large sizes in the head and a larger percentage with small sizes
in the tail. The hierarchies are divided by the mean value of all of the
observations. This process or method is termed the head/tail division rule:
“given a variable X, if its values x follow
a heavy-tailed distribution, then the mean (m) of the values can divide all the
values into two parts: a high percentage in the tail, and a low percentage in the
head” (Paper VI). Coincidentally, Adamic (2011) mentioned that the shared
feature of the distributions describes the division of items into groups.
Moreover, within the head part, further hierarchical levels
can be obtained by applying the head/tail division rule repeatedly.
To give an intuitive explanation, let us take the natural cities delineated in
Germany in Paper VI as a specific example. As we can see in Table 2, there are
5160 natural cities whose size and area follow the power law and lognormal
distributions, respectively (Paper VI). First, the mean size of the 5160 cities is
calculated as 174; the head part is then defined as the cities whose sizes are
greater than or equal to this mean value, which gives 682 cities. Correspondingly,
the percentage in the head is the number of cities in the head (682) divided by the
current total number (5160), i.e. 682/5160 = 13%. This process is the head/tail
division rule applied at the first level. For the second level, the 682 cities in the
head part are treated as the current total, and the same process is
repeated. In this way, the hierarchical structure of the natural
cities is generated.
Table 2: Four levels of natural cities in Germany according to the city size
(Note: # = number, % = percentage)
Level   # of cities   # in Head   % in Head   # in Tail   Mean (count)
1       5160          682         13%         4478        174
2       682           104         15%         578         980
3       104           25          24%         79          4410
4       25            10          40%         15          12121
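The repeated division that produces a table of this form can be sketched as follows. This is an illustrative sketch rather than the code used in Paper VI: the function name `head_tail_levels`, the toy input and the stopping threshold `max_head_share` are assumptions, since the text does not state when the recursion should stop.

```python
def head_tail_levels(values, max_head_share=0.4):
    """Apply the head/tail division rule repeatedly to the head part.

    At each level the mean splits the current values into a head (>= mean)
    and a tail (< mean). Assumption: the recursion stops once the head is no
    longer a clear minority (share >= max_head_share) or one value remains.
    """
    levels = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v >= mean]
        share = len(head) / len(current)
        levels.append({
            "n_total": len(current),
            "n_head": len(head),
            "head_share": share,
            "n_tail": len(current) - len(head),
            "mean": mean,
        })
        if share >= max_head_share:
            break
        current = head  # the head becomes the "current total" of the next level
    return levels


# Toy, hypothetical sizes (many small values, few large ones).
sizes = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 8, 16, 64]
for level, row in enumerate(head_tail_levels(sizes), start=1):
    print(level, row)
```

For the toy input, the first level keeps 3 of 16 values in the head (mean 6.875) and the second level keeps 1 of 3, after which a single value remains and the recursion ends.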
From the above we can see that the rule is a natural and objective way to find
the hierarchical structures in geographic phenomena. Behind each
hierarchy, some specific geographic implications are indicated. For example,
small blocks imply urban areas while large blocks imply rural areas. In this sense,
the mean value works like a natural and magic ruler.
A question remains: what if a range around the mean is used instead? Let us take
all the blocks of Texas as an example (Table 3): the mean value is 1,063,600
square meters, based on which all blocks are divided into a high percentage of
85.8% in the tail and a low percentage of 14.2% in the head. Given a range of
λ = 50,000 square meters, we get two more “rulers”: a minimum value of Mean – λ
and a maximum value of Mean + λ. Based on these two values, the percentages in
the head and tail are re-calculated (Table 3). We can see that the percentages
differ only slightly.
Table 3: The percentages in the head and tail of all blocks for Texas (λ = 50,000
square meters)
Threshold (square meters)   # of all   # in Head (>= threshold)   % in Head   # in Tail
Mean–λ: 1,013,600           622,841    90,301                     14.5%       532,540
Mean:   1,063,600           622,841    88,490                     14.2%       534,351
Mean+λ: 1,113,600           622,841    86,803                     13.9%       536,038
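The same kind of sensitivity check can be sketched on synthetic data. The series below is a hypothetical Zipf-like stand-in, not the Texas block data, and the names `head_share` and `band` are illustrative; `band` plays the role of the range λ in Table 3.

```python
def head_share(values, threshold):
    """Share of values at or above the threshold (the head)."""
    return sum(1 for v in values if v >= threshold) / len(values)


# Hypothetical, deterministic stand-in for block areas: a Zipf-like series.
areas = [1_000_000 / rank for rank in range(1, 1001)]
mean_area = sum(areas) / len(areas)
band = 500.0  # hypothetical range around the mean

thresholds = (mean_area - band, mean_area, mean_area + band)
shares = [head_share(areas, t) for t in thresholds]
for label, share in zip(("Mean - band", "Mean", "Mean + band"), shares):
    print(f"{label}: head {100 * share:.1f}%, tail {100 * (1 - share):.1f}%")
```

Raising the threshold can only shrink the head, so the three shares are non-increasing; for a heavy-tailed series they stay a small minority throughout, which mirrors the pattern in Table 3.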
Moreover, we plot all the block areas of Texas in rank-size form together with the
three values Mean, Mean – λ and Mean + λ (Figure 32). We can see that
the red point (Mean) already gives a clear division of the hierarchy, while the two
range points (Mean – λ and Mean + λ) are somewhat ambiguous and arbitrary as
division points. Note that the differences between these three points are amplified
for clarity; they are actually very close to each other on the plot.
Figure 32: The rank-size plot of block areas in Texas, where the red, yellow and
green points denote the Mean, Mean – λ and Mean + λ, respectively (Source:
Paper II).
It should be noted that the principle of scaling of geographic space can be
characterized by a heavy-tailed distribution, by a dominant visualized
pattern, or by a comparison study. However, a heavy-tailed distribution is the
necessary and sufficient condition for the head/tail division rule to make
sense. This can be treated as their basic relationship. More importantly,
the physical meanings behind the hierarchical
structures need to be detected and then applied to urban studies.
4.3.2. Hierarchical structures and geographic implications
Hierarchical theory and structure is one of the ways to tackle complexity
(Levin 1992), and there is evidence that people perceive and process spatial
information in a hierarchical way. For scaling geographic objects, the
hierarchical structure has two levels, i.e. head and tail. These two parts differ
greatly from each other, which is the typical feature of scaling of geographic space.
Each hierarchy indicates different geographic implications. Let us take the
example of the natural cities in the previous section again. There are 4 levels in
terms of city size, and the cities can be ranked by their levels. The 25 cities
at the fourth level can be treated as large cities such as Berlin and
Frankfurt. The cities at the remaining levels can be called middle cities, small
cities and small towns. This classification of cities is consistent with our common
knowledge, and thus makes sense. These are the geographic
implications behind the hierarchical structures differentiated by the head/tail
division rule.
The process of detecting hierarchical structures and the underlying geographic
implications (physical meanings) is similar to the concept of knowledge
discovery and data mining (KDD) but geographically related. To some extent, it
is a method of using hierarchy to infer spatial knowledge and thus resembles the
concept of hierarchical spatial reasoning, where the hierarchy abstraction is a
mechanism for problem solving (Car 1997). In this way, the complexity of the
original problem is dramatically reduced and the problem can thus be handled.
Some studies (e.g. Hirtle and Jonides 1985) also show that people tend to use
hierarchical strategies to reason in geographic space; thus this method has the
virtue of simplicity. For example, houses with high prices are generally
located together, and vice versa. That is, houses with similar prices are
normally adjacent. To put it more generally, geographic objects with similar
properties are related to each other in terms of geographic location. This is
actually one case of the First Law of Geography, i.e. "everything is related to
everything else, but near things are more related than distant things" (Tobler
1970). But the process of inferring spatial knowledge is not that straightforward.
In the above process, geographic observations are the quantified values of
geographic objects or representations. Thus, inferring the geographic implication
also depends on the availability of appropriate representations of geographic
space. An urban system is a self-organized complex system, and heavy-tailed
distributions are the emergent patterns of such a self-organized process (Chen
and Zhou 2003, 2008). The geographic objects or representations are the basic
units of the urban system. That is, if the geographic objects or representations
are properly selected, the physical meaning behind the hierarchies can be
obtained for urban studies. The hierarchical structures in the observed values
reflect the internal differences between these geographic objects. For instance,
the above classification of natural cities has four levels, i.e. large cities,
middle-sized cities, small cities and small towns. A city is a place where people
exchange and share information and materials. Large cities play a role in the
city system of a country similar to that of hubs in the Internet. This
phenomenon supports the view that the city system in a country evolves in a
self-organized manner, and that the scale and function of cities can be
differentiated quantitatively using the principle of scaling of geographic space
and the head/tail division rule.
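This section does not spell out how more than two levels are derived from the head/tail division rule. One plausible reading, sketched below under that assumption, is to reapply the rule to the head (the values above the mean) until the head stops being a clear minority. The function name `head_tail_levels`, the stopping parameter `max_head_share` and the sample sizes are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def head_tail_levels(values, max_head_share=0.4):
    """Assign a hierarchy level to each value by repeatedly splitting at the mean.

    Level 0 is the tail of the first split; each further split of the head
    raises the level by one. max_head_share is an assumed stopping criterion:
    the recursion stops once the head is no longer a clear minority.
    """
    levels = [0] * len(values)
    current = list(range(len(values)))  # indices still being subdivided
    level = 0
    while len(current) > 1:
        m = mean(values[i] for i in current)
        head = [i for i in current if values[i] > m]
        if not head or len(head) / len(current) > max_head_share:
            break
        level += 1
        for i in head:
            levels[i] = level
        current = head
    return levels


# Made-up city sizes: many small values and a few very large ones.
sizes = [1] * 20 + [2] * 10 + [5] * 5 + [20] * 3 + [100, 500]
levels = head_tail_levels(sizes)
print("number of levels:", max(levels) + 1)
```

The number of levels is not fixed in advance; it follows from the data and from the assumed stopping criterion.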
5. Results and discussion
5.1. Overview
This chapter summarizes the main results of the listed papers and discusses the
contribution of each to this thesis. The structure of these papers is
illustrated in Figure 29 and described in section 1.3. That is, papers VI and
VII examine and describe the theoretical framework of this thesis, i.e. the
scaling of geographic space and the head/tail division rule, whereas papers I,
II, III, IV and V describe the applications of the above rules to urban studies,
such as urban sprawl and mobility patterns. Together, these papers verify the
scaling law from the perspectives of theory and application; the two groups of
papers are closely related to one another but have different emphases.
First, paper VII decomposes the geographic space into old axial lines at the
city scale, and paper VI represents the geographic space using blocks at the
country level. The sizes of blocks and the attributes of axial lines are then
used to elaborate the scaling law of geographic space as well as the head/tail
division rule in a general sense. In paper I, the geographic space refers to the
“curvy space” of generated natural streets, and the attributes of curviness
demonstrate the scaling property. Therefore, the head/tail division rule can be
used to obtain thresholds for generating newly defined axial lines along street
center lines. Although both papers I and VII are closely related to the concept
of axial lines in space syntax, they are fundamentally different from one
another: paper VII provides evidence of the scaling law of geographic space,
whereas paper I applies the scaling law to define and generate new axial lines
in order to better understand urban morphologies.
Similarly, papers II, III, IV and V apply the scaling principle to different
geographic representations in the context of urban environments. Both papers II
and IV use blocks as the geographic representation. The former analyzes urban
sprawl at the city level, and the latter compares the hierarchical structure of
cities at the country level. Unlike papers II and IV, papers III and V adopt
different types of geographic representations to characterize the geographic
space. In paper III, the stop points in the trajectories of taxicabs are
selected. The spatiotemporal clusters of stop points demonstrate the scaling
property and further uncover urban mobility patterns. Paper V describes the
generation of the fewest-turn route, along which the number of small spaces
perceived is much greater than the number of turns, in the cognitive sense, from
the origin space to the destination space.
These experiments perform well, and the results verify the rules stated above.
In turn, the results can be further used to reveal the scaling patterns of
geographic phenomena. By combining theory and experiments, this thesis provides
a fresh perspective on urban studies from a geospatial viewpoint.
5.2. Paper VII: Scaling at city level from axial line perspective
According to the classic definition in the space syntax literature, axial lines
are the least number of longest visibility lines that can represent the
individual linear spaces of an urban environment. That is, the axial line is a
kind of representation of geographic space, and all axial lines together cover
the entire geographic space. This paper develops an automatic solution to
generate axial lines based on the concepts of isovists and medial axes. The
generated axial lines demonstrate a hierarchical structure, that is, a very few
long, visually dominating axial lines and many shorter, trivial ones. Moreover,
the axial lines also exhibit a power-law-like, heavy-tailed distribution. The
above findings show the scaling property of geographic space and the urban
environment, which is also a key feature of complex systems.
As described in the above sections, most previous urban theories focused on
geographic elements or units, such as the land use types in urban environments.
Space syntax, however, focuses on the opposite side: geographic space. The
geographic elements or units and the space between them are like the two parts
of a black-and-white photo, which are complementary to each other. To this
extent, space syntax provides a distinctive way to study urban phenomena.
Moreover, the study in this paper shows that space syntax not only reveals a
hierarchical structure and spatial distribution that are in good agreement with
previous urban models, but also provides a quantitative way to evaluate the
different parts of geographic space accurately. Based on the parameters
calculated in space syntax, the statistical distribution can be identified. By
nature, the geographic elements or units are also part of geographic space. In
this sense, space syntax is a better solution because it studies space directly.
The revealed scaling patterns of geographic space are therefore more
representative and deserve to be regarded as a more universal law for geographic
phenomena.
Axial lines have been the key tool in the study of space syntax. However, they
are conventionally drawn by hand in GIS software, a practice that has long been
criticized as time consuming, subjective, or even arbitrary. To eliminate such
disadvantages, this paper develops an automatic solution to generate axial
lines. First of all, the geographic space refers to the space between obstacles
such as buildings, where the obstacles are treated as simple 2D polygons and the
space is assumed to be freely visible and continuous. The automatic solution
then generates axial lines in three steps: (1) create the medial axes between
the polygons; (2) at each vertex along the medial axes, generate 360 rays, one
for each degree, and sort all the rays by length in descending order; and (3)
generate the bucket of the longest ray to eliminate the redundant rays, keep the
current longest one, and repeat until no more redundant rays are left. In this
way, the remaining rays are the axial lines.
[Flow chart: obstacles in geographic space -> obstacles to polygons -> Voronoi
regions of polygons -> medial axes from Voronoi regions -> all rays on medial
axes -> sort rays by length in descending order -> select the longest ray as
current axial line -> generate bucket of current axial line to delete redundant
rays -> no rays left? (No: select the next longest ray; Yes: end)]
Figure 33: The flow chart of generation of axial lines from algorithmic view
The above automatic solution relies on the obstacles in geographic space, and
the concept of the bucket is the mainstay of this study. Figure 33 gives the
flow chart of generating axial lines from an algorithmic perspective. The
elimination of redundant rays is based on the percentage of a ray that is
contained within the current bucket; the threshold is set to 80% by default. The
solution exhausts the rays, and the iterative algorithm ensures the correctness
of the results. It should be noted that, when generating all rays, if the
distance between two vertices is greater than a threshold, e.g. one meter, then
points are interpolated between them until the distance satisfies the threshold.
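The selection step can be illustrated with a small greedy loop. The sketch below is not the paper's implementation: the geometry of the bucket is not defined in this section, so it is approximated here by a fixed-width corridor around the current axial line, and the share of a ray inside the corridor is estimated by sampling points along the ray. The names `select_axial_lines`, `half_width` and `samples`, and the sample rays, are hypothetical; only the 80% default comes from the text.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_inside_bucket(ray, axis, half_width, samples=50):
    """Estimate the share of `ray` lying within `half_width` of `axis`.

    The corridor stands in for the bucket; this is an assumption.
    """
    a, b = ray
    inside = 0
    for i in range(samples + 1):
        t = i / samples
        p = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        if point_segment_distance(p, axis[0], axis[1]) <= half_width:
            inside += 1
    return inside / (samples + 1)


def ray_length(ray):
    (x1, y1), (x2, y2) = ray
    return math.hypot(x2 - x1, y2 - y1)


def select_axial_lines(rays, half_width=2.0, overlap=0.8):
    """Keep the longest ray, drop rays mostly inside its bucket, and repeat."""
    remaining = sorted(rays, key=ray_length, reverse=True)
    axial_lines = []
    while remaining:
        current = remaining.pop(0)  # longest ray still available
        axial_lines.append(current)
        remaining = [r for r in remaining
                     if share_inside_bucket(r, current, half_width) < overlap]
    return axial_lines


# Made-up rays: two long ones crossing each other and two near-duplicates.
rays = [((0, 0), (100, 0)), ((10, 1), (60, 1)),
        ((50, 0), (50, 80)), ((50, 5), (50, 60))]
print(select_axial_lines(rays))
```

Treating a ray as redundant when at least 80% of it lies inside the bucket is also an assumption about how the threshold is applied at the boundary.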
The main contribution of this paper is two-fold: (1) it develops an automatic
solution that generates axial lines consisting of the least number of longest
visibility lines and that works for different urban environments; and (2) it
shows that the partition or representation of geographic space by the generated
axial lines demonstrates the scaling pattern from both visual and statistical
perspectives. This study captures the true image of urban environments and is
the first step towards the scaling of geographic space.
5.3. Paper VI: Scaling at country level from block perspective
In this paper, the geographic space is set at the country level and the
geographic representations are blocks, which are the minimum cycles decomposed
from country road networks. The road network is closely related to human
activities and the economy; arguably, without roads there is no society. As
described in this paper, the blocks completely cover the whole geographic space
of the country (see Figure 34 as an example). To this extent, blocks are
suitable for analyzing geographic phenomena at the country level. Based on the
statistics of block sizes, i.e. areas, it is found that the sizes follow a
heavy-tailed distribution. That is, there are far more small blocks than large
ones. By extension, this paper defines the scaling of geographic space as the
fact that there are far more small geographic representations than large ones in
a large-scale geographic area. Furthermore, this paper describes the head/tail
division rule as “given a variable X, if its values x follow a heavy-tailed
distribution, then the mean (m) of the values can divide all the values into two
parts: a high percentage in the tail, and a low percentage in the head”. Based
on this rule, blocks are categorized into two groups: small blocks, whose areas
are less than the mean value and which imply city areas, and large blocks, whose
areas are greater than the mean value and which indicate rural areas.
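The rule quoted above amounts to a single split at the arithmetic mean. The following minimal sketch shows it on made-up block areas; the function name `head_tail_split` and the sample values are hypothetical, and assigning values exactly equal to the mean to the tail is an assumption, since the text only covers values below and above the mean.

```python
from statistics import mean


def head_tail_split(values):
    """Split values at their mean into the head (above) and the tail (the rest)."""
    m = mean(values)
    head = [v for v in values if v > m]   # few large values
    tail = [v for v in values if v <= m]  # many small values (ties: assumption)
    return m, head, tail


# Made-up block areas in square kilometres: mostly small, a few very large.
block_areas = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 60.0, 120.0]
m, large_blocks, small_blocks = head_tail_split(block_areas)
print(f"mean = {m:.2f}")
print(f"tail (small, urban): {len(small_blocks) / len(block_areas):.0%}")
print(f"head (large, rural): {len(large_blocks) / len(block_areas):.0%}")
```

For heavy-tailed data the tail holds most of the values, which is what makes the mean a meaningful break; for roughly symmetric data the two parts would be of similar size and the split would carry little information.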
Figure 34: Comparison between urban and rural blocks (Note: the red line is the
boundary of a natural city. The blocks outside the red line are large rural
blocks and the blocks inside the red line are small urban blocks)
Previous urban theories and studies aimed to discover the reasons behind the
spatial patterns and hierarchical structure of geographic phenomena such as
cities and land use types. In this paper, according to the geographic
implications behind the hierarchies obtained via the head/tail division rule,
the small blocks are clustered into groups, which constitute what we call
natural cities. To give an intuitive illustration of the concepts of urban
blocks, rural blocks and natural cities, a small part of the United Kingdom is
enlarged to show the difference between them (Figure 34). Note that a natural
city is an aggregation of small (urban) blocks at a reasonable scale, e.g. at
least 10 or 15 blocks. That is, if the number of aggregated blocks is too small,
the aggregation is not regarded as a natural city.
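The aggregation described above can be sketched as finding connected groups of small blocks. The code below is an illustration under stated assumptions rather than the paper's procedure: blocks are given as a mapping from hypothetical integer ids to areas, adjacency is given as pairs of ids that share a border, and `min_blocks=10` is taken from the lower of the two example values in the text.

```python
from collections import defaultdict, deque
from statistics import mean


def natural_cities(areas, adjacency, min_blocks=10):
    """Group adjacent small blocks (area below the mean) into natural cities.

    areas: dict mapping block id -> area.
    adjacency: iterable of (id_a, id_b) pairs of blocks sharing a border.
    """
    m = mean(areas.values())
    small = {b for b, a in areas.items() if a < m}

    # Keep only the adjacency between small blocks.
    neighbours = defaultdict(set)
    for a, b in adjacency:
        if a in small and b in small:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, cities = set(), []
    for start in sorted(small):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:  # breadth-first search over one connected group
            block = queue.popleft()
            cluster.append(block)
            for n in neighbours[block]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(cluster) >= min_blocks:  # too-small groups are not cities
            cities.append(sorted(cluster))
    return cities


# Made-up data: a chain of 12 small blocks, a group of 3 small blocks,
# and two large rural blocks (ids 100 and 101) in between.
areas = {i: 1.0 for i in range(12)}
areas.update({20: 1.0, 21: 1.0, 22: 1.0, 100: 500.0, 101: 500.0})
adjacency = [(i, i + 1) for i in range(11)]
adjacency += [(20, 21), (21, 22), (11, 100), (100, 20), (100, 101)]
print(natural_cities(areas, adjacency))
```

In this example only the chain of 12 blocks qualifies; the group of three small blocks is below the assumed minimum and is discarded.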
Experiments are carried out on the three biggest European countries, i.e.
Germany, France and the United Kingdom. The scaling property of natural city
sizes and the spatial distribution of natural city locations are both in good
agreement with previous studies, and the hierarchical structures are identified
in a quantitative manner. However, previous urban theories explained the spatial
patterns from an economic perspective. Instead, we believe that the topological
information of blocks (i.e. border number) and the spatial distributions of
natural cities map the real image and mental map of a country. This
understanding of morphology comes mainly from the geospatial point of view. This
paper further compares the pattern and structure of blocks and cities with those
of organisms, and points out that there are many similarities between geographic
space and complex organisms in terms of their self-organized functions. Thus,
this paper advances the understanding of urban hierarchies beyond previous urban
studies and gives deeper insights into the scaling law of geographic space.
The main contributions of this paper are that it: (1) proposes the concept of
the block in a road network as a new kind of geographic representation; (2)
defines the principle of scaling of geographic space as the phenomenon that can
be characterized by heavy-tailed distributions; (3) describes the head/tail
division rule as a specific application; and (4) delineates natural cities based
on the small blocks differentiated by the head/tail division rule. Both the
scaling principle and the head/tail division rule form the theoretical basis for
further studies.
5.4. Paper V: Computing the fewest-turn map directions
In this paper, the authors use the fewest-turn criterion to model a route in a
real road network. Given a route for people to walk or drive through, what
people perceive at each point along the route is a small space (Figure 35).
Perceptually, the route consists of numerous such small spaces. From the
geometrical point of view, the route contains many road segments, each of which
consists of numerous small spaces. However, if the road segments are joined
together based on the Gestalt principle of good continuity, they form natural
roads or strokes, which are people’s cognitive concept of space. A turn is
defined as the change from one natural road to another (Figure 35), and thus the
fewest-turn route is the shortest path, in terms of topological connectivity, in
the natural road network. Therefore, the fewest-turn route includes numerous
perceptual small spaces, many geometric road segments and very few cognitive
natural roads.
In this sense, the fewest-turn map direction demonstrates the hierarchical
structure and the scaling property of geographic space, i.e. in the route there
are far more small spaces than road segments and far more road segments than
turns. The number of turns in the routes is much smaller than would be expected
in terms of perception and geometry, and thus the cognitive burden of the route
is effectively reduced. As stated previously, the scaling of geographic space is
to some extent a hierarchical view of space. Thus, the fewest-turn map direction
is a scaling view of geographic space in a cognitive and hierarchical way and
demonstrates the scaling pattern at the levels of perception and cognition.
Figure 35: The numerous small spaces perceived by people along the route and
the dramatically simplified cognitive space (one turn), i.e. the hierarchy from
perception to cognition reflects the scaling of geographic space. (Note: the
image is obtained from hitta.se)
To obtain the route with minimized cognitive burden, this model relies on the
concept of natural roads and their converted topological representation, i.e.
the connectivity graph. Because natural roads possess the property of good
continuity, the route derived from the connectivity graph that is converted from
natural roads has the fewest turns. Turns are the semantic information at
decision points in a navigation task in the road network. A turn means the
action of leaving one road for another. In fact, to turn is to leave one segment
for another; changing direction within a road segment is not a “true” turn. For
instance, if people are driving within a road segment, although they change
direction from time to time by turning the wheel, these are not turns in terms
of decision making. Physically, the decision points are the junctions of the
road network. The fewer turns in the planned route, the lower the cognitive
burden of the route.
It is easy to see that two concepts, i.e. natural road and connectivity graph, are It is easy to see that two concepts, i.e. natural road and connectivity graph, are
the mainstay of this study. In addition, a series of algorithms that process road the mainstay of this study. In addition, a series of algorithms that process road
segments, build up the natural roads and calculate the routes from connectivity segments, build up the natural roads and calculate the routes from connectivity
graph also play an important role. The fewest-turn route is particularly favored graph also play an important role. The fewest-turn route is particularly favored
by people when navigating in an unfamiliar environment because it bears less by people when navigating in an unfamiliar environment because it bears less
cognitive burden. Moreover, the fewest-turn route implies less need to slow cognitive burden. Moreover, the fewest-turn route implies less need to slow
down and speed up, which is good for saving time, economizing petroleum, and down and speed up, which is good for saving time, economizing petroleum, and
reducing emission and thus environmentally responsible. reducing emission and thus environmentally responsible.
For illustration of these representations, we present an artificial grid city
road network in Figure 36 (a). In this grid road network, there are 24 segments
(1, 2…24) sharing 16 junctions (j1, j2…j16). We assume that the segments
sharing the boundary junctions (i.e. all junctions except j6, j7, j10, j11)
extend forward in their directions. For example, at junction j4, segment 3 and
segment 22 will extend forward in their respective directions. We make this
assumption because the grid-style artificial road network is part of the block
structure of a city road network; only in this way do the segments and
junctions make sense. Furthermore, the geometric lengths of the 24 segments are
strictly equal to each other. The green route is the one with the fewest turns,
while the red route is the one with the most turns. Note that the routes with
the fewest turns are those with the shortest topological distances in the
topological representation.
Figure 36: The shortest (red) and the fewest-turn routes on (a) the grid road
network and (b) the connectivity graph based on natural roads (R1, R2…R8).
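As a minimal sketch of how a fewest-turn route can be read from a connectivity graph (not the implementation used in the paper), a breadth-first search over natural roads returns the road sequence with the fewest hops, and each hop corresponds to one turn. The assignment of R1 to R4 and R5 to R8 to two mutually crossing groups, and the function names, are assumptions made for illustration only.

```python
from collections import deque

# Assumed layout: R1-R4 run one way and R5-R8 cross them, so every road
# in the first group intersects every road in the second group.
GROUP_A = ["R1", "R2", "R3", "R4"]
GROUP_B = ["R5", "R6", "R7", "R8"]


def build_connectivity_graph():
    """Nodes are natural roads; an edge links two roads that intersect."""
    graph = {road: set() for road in GROUP_A + GROUP_B}
    for a in GROUP_A:
        for b in GROUP_B:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def fewest_turn_route(graph, start, goal):
    """Breadth-first search: the road sequence with the fewest hops (turns)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        road = queue.popleft()
        if road == goal:
            route = []
            while road is not None:
                route.append(road)
                road = previous[road]
            return route[::-1]
        for neighbour in sorted(graph[road]):
            if neighbour not in previous:
                previous[neighbour] = road
                queue.append(neighbour)
    return None  # goal not reachable


graph = build_connectivity_graph()
route = fewest_turn_route(graph, "R1", "R4")
turns = len(route) - 1  # one turn per change of natural road
```

In this assumed grid, two parallel roads are always two turns apart, and two crossing roads are one turn apart, which is the topological distance referred to in the text.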
Based on the concepts and algorithms stated above, a model is designed to
calculate the fewest-turn-and-shortest paths. This approach adopts topological
information and semantic information in named streets to derive map directions.
Experiments are carried out on four cities in North America and four cities in
Europe with different morphological structures of road networks. The
experimental results show that the computation of the routes is more efficient
and that the fewest-turn routes possess fewer turns and shorter distances than
the simplest paths and the routes provided by Google Maps. For example, the
fewest-turn-and-shortest routes are on average 15% shorter than the routes
suggested by Google Maps, while the number of turns is just half as many.
5.5. Paper I: Defining and auto-generating axial lines
The objective of this paper is to redefine axial lines and develop an
auto-solution to generate them by applying the scaling of geographic space and
the head/tail division rule. The new definition of axial lines is based on the
walkability or drivability of street center lines, whereas the old (classic)
axial lines are defined based on the visibility between spatial obstacles in
urban environments. Based on this new definition, a non-parametric model is
designed to generate new axial lines to provide a better understanding of urban
morphologies.
First of all, this paper is an application of the principle of scaling of
geographic space and the head/tail division rule. That is, we need to select
the appropriate geographic entities in geographic space. In this paper, the
natural streets (ref. Section 2.1.1) are selected to represent the geographic
space because of their walkability and
drivability, which indicate the range of human activities. Based on the natural
streets, the process of generating new axial lines is very similar to the
Douglas-Peucker simplification algorithm (Douglas and Peucker 1973) but
fundamentally different because of the threshold. The process can be described
as follows: first we connect the first and last points of a natural street to
form a straight line, which we call the base line; then we find the point on
the natural street that is furthest from the base line. In this way, we get
three parameters: the distance x, the base line length d and the ratio x/d. By
repeating this process, the original natural street line is recursively
divided. When the distance x and the ratio satisfy the given thresholds, the
process stops. Figure 37 shows the generation of new axial lines step by step.
The subscripts l and r mean left and right. When both x and the ratio satisfy
the given thresholds, the process stops and 5 axial lines are generated for the
red natural street. The newly defined axial lines thus refer to the least
number of individual straight line segments mutually intersected along natural
streets.
Figure 37: Illustration of generating new axial lines based on a natural
street, in four panels: (a) step 1, (b) step 2, (c) step 3 and (d) step 4
(Note: the current natural street is the red one, and the black solid lines are
the generated new axial lines)
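The recursive division described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the function names, the sample coordinates and the threshold values are hypothetical, and the stopping rule is read as "stop when both x and x/d are within their thresholds".

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    d = math.hypot(bx - ax, by - ay)
    if d == 0:  # degenerate base line: fall back to point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / d


def split_street(points, x_max, ratio_max):
    """Return the vertices of the straight lines that replace one street."""
    first, last = points[0], points[-1]
    if len(points) < 3:
        return [first, last]
    d = math.hypot(last[0] - first[0], last[1] - first[1])  # base line length
    # Furthest interior vertex from the base line and its distance x.
    index, x = max(
        ((i, point_line_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    ratio = x / d if d > 0 else math.inf
    if x <= x_max and ratio <= ratio_max:  # assumed stopping rule
        return [first, last]
    left = split_street(points[: index + 1], x_max, ratio_max)
    right = split_street(points[index:], x_max, ratio_max)
    return left[:-1] + right  # drop the shared vertex once


# Hypothetical L-shaped street: the corner is kept, collinear vertices are not.
street = [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10)]
axial_vertices = split_street(street, x_max=1.0, ratio_max=0.1)
```

Each pair of consecutive vertices in the result stands for one generated straight line; with very loose thresholds the whole street collapses to its base line.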
[Flow chart: street center line segments are joined by the Gestalt principle of
good continuity into natural streets, which are divided using x and x/d
obtained by the head/tail division rule into new axial lines]
Figure 38: Conceptual model of the three-step method
According to the above process, a straightforward method (Figure 38) is
developed to generate the new axial lines: the first step is to generate
natural streets from street center line segments, the second is to determine
the thresholds of x and the ratio, and the third is to apply the algorithm to
the natural streets to generate new axial lines. The thresholds are calculated
as follows: we apply the algorithm repeatedly to all natural streets until x is
equal to zero. Then we take the mean values of all x and of all ratios x/d as
the thresholds, because both of them follow a heavy-tailed distribution.
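The threshold calculation can be illustrated with a short sketch. It assumes that every (x, x/d) pair met while dividing a street down to x = 0 is recorded and that zero values are left out of the means; neither detail is stated in the text, and all names and sample coordinates are hypothetical.

```python
import math
from statistics import mean


def furthest_vertex(points):
    """Return (index, x, d) for the interior vertex furthest from the base line."""
    (ax, ay), (bx, by) = points[0], points[-1]
    d = math.hypot(bx - ax, by - ay)  # base line length
    best_index, best_x = 0, 0.0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        if d > 0:
            x = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / d
        else:
            x = math.hypot(px - ax, py - ay)
        if x > best_x:
            best_index, best_x = i, x
    return best_index, best_x, d


def collect_parameters(points, records):
    """Divide one street until x == 0, recording each (x, x/d) on the way."""
    if len(points) < 3:
        return
    index, x, d = furthest_vertex(points)
    if x == 0 or d == 0:  # straight piece, or closed street (skipped here)
        return
    records.append((x, x / d))
    collect_parameters(points[: index + 1], records)
    collect_parameters(points[index:], records)


def thresholds(streets):
    """Mean of all x and mean of all x/d over every street (assumed rule)."""
    records = []
    for street in streets:
        collect_parameters(street, records)
    if not records:
        return 0.0, 0.0
    return mean(x for x, _ in records), mean(r for _, r in records)


# Two hypothetical natural streets given as vertex lists.
streets = [
    [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10)],
    [(0, 0), (4, 3), (8, 0)],
]
x_threshold, ratio_threshold = thresholds(streets)
```

With real coordinates an exact comparison with zero is fragile, so a small tolerance would normally replace the `x == 0` test.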
In this method, the two parameters x and x/d and their thresholds for later
calculation are both objectively obtained from the bottom up. In essence, they
reflect the curviness of natural streets, which is a kind of spatial
morphology. As stated above, the two parameters demonstrate the scaling
property, which means that the geographic space represented by natural streets
also possesses the scaling property. The hierarchical structures of the
curviness obtained via the head/tail division rule provide a natural way to
generate the newly defined axial lines.
The automatic solution is applied to six selected cities, i.e. Copenhagen,
London, Paris, Manhattan, San Francisco and Toronto. These six cities reflect
two different types of typical street patterns: the three European cities are
irregular and self-evolved, while the three North American cities are grid-like
and planned. A comparison between the new axial lines and the conventional old
axial lines, and between the new axial lines and natural streets, shows the new
axial lines to be a better alternative for capturing the underlying urban
structure.
The contributions of this paper are multi-fold: (1) a less ambiguous definition
of axial line is provided; (2) based on the new definition of axial line, an
auto-solution is developed; (3) the scaling of geographic space and the
head/tail division rule are successfully applied in the model; (4) the
generated axial lines prove to work well in cities with different morphologies;
and (5) as a new kind of geographic representation, the new axial lines further
show the underlying urban structure and the scaling of geographic space.
5.6. Paper II: Identification of urban sprawl patches
This paper attempts to identify urban sprawl patches based on the principle of
scaling of geographic space and the head/tail division rule. Previous studies
dedicated to detecting urban sprawl are mainly based on census and remote
sensing data sets, which suffer to some extent from being subjective and out of
date. In this paper, the detection of urban sprawl is based entirely on one
kind of spatial data: blocks. Therefore, the first task is to figure out how
blocks can be identified as sprawling or not. As stated previously, blocks are
the minimum cycles decomposed from road networks. They cover the whole space
that the road networks occupy and are important geographic elements in the
process of urbanization. This means that the attributes of blocks carry rich
knowledge of urban growth. According to previous urban studies, urban systems
evolve in a self-organized manner and possess hierarchical structures.
Therefore, if we can find the appropriate attributes of blocks and prove their
scaling property, then we can apply the head/tail division rule to group blocks
into normal and abnormal ones in terms of urban growth. Because urban sprawl is
the disorderly growth in the transition from rural area to urban area, the
abnormal blocks can be identified as sprawling patches. This is the basic idea
of this paper.
We can see that this method applies the scaling property of geographic space in
a quantitative way to detect urban sprawl. In this study, we find that the
areas, diameters and dangling lines of blocks all demonstrate the scaling
property. Similarly, the hierarchical structures can be detected based on the
scaling properties of the area and dangling lines of blocks and the head/tail
division rule. The geographic implication behind the hierarchies is that blocks
with larger areas and more dangling lines inside natural cities imply urban
sprawl, because such blocks lead to lower density and poorer accessibility.
Urban sprawl is detected within cities. Thus, the first step is to delineate
natural cities. Natural cities are the aggregations of blocks with small areas,
because the geographic implication is that small blocks represent the urban
area. Moreover, the diameter and ratio of blocks also demonstrate the scaling
pattern. The hierarchical structures divided by these two attributes indicate
that blocks (1) with smaller area and ratio are urban ones, and (2) with small
diameter, whose neighbors are not all blocks with larger area or ratio, are
city blocks. On the other hand, blocks with large area or diameter are rural
ones. In this way, the natural cities are re-aggregated, and the results prove
to be refined compared with the old ones. Under this modified strategy, the new
natural cities become more complete and possible errors are eliminated.
The above model is applied to refine the natural cities in Texas and identify
the urban sprawl patches in the natural city of Dallas, TX. The identified
urban sprawl patches are further classified into different levels. Through a
comparison between randomly picked sprawling blocks and the corresponding
imagery obtained from Google Maps, it is found that the identified blocks are
where urban sprawl is actually occurring.
[Flow chart: blocks that represent urban space, then detect scaling attributes
of blocks, then get hierarchies by the head/tail rule, then refine natural
cities, then identify urban sprawl patches]
Figure 39: The basic view of the method by order
This study is an application of the scaling of geographic space and the
head/tail division rule. The main contributions of this study are to: (1)
detect the scaling attributes of blocks besides area, i.e. diameter, ratio and
dangling lines, and reveal their geographic implications; (2) refine the
natural cities based on the scaling properties of area, diameter and ratio; and
(3) identify urban sprawl patches (sprawling blocks) based on the scaling of
the area and dangling lines of blocks inside natural cities. This kind of work
is inspiring and deserves further study.
5.7. Paper III: Uncovering urban mobility patterns
In this paper, the principle of scaling of geographic space is applied to
spatio-temporal data sets. The data sets include location, attribute and time
information. The motions of objects in geographic space generate massive
mobility data, which are normally recorded as trajectories in the context of
time-geography (Hägerstrand, 1970). Collectively, the trajectories of many
objects over a relatively long time cover the whole geographic space, and their
temporal and spatial patterns reflect the inner structures of the local urban
systems in terms of mobility. Previous urban theories and models focused on the
spatial attribute of geographic entities, whereas some other studies focused on
the whole trajectory. This paper, however, takes a deeper look at the
trajectories, divides them into moves and stops, and then focuses on the time
attribute of the latter, because, as components of the complex traffic system,
the
stop parts of trajectories always indicate behavior that is abnormal in terms
of normal driving, such as traffic jams. Furthermore, if the time attribute of
the stop part demonstrates the scaling property, then the head/tail division
rule can immediately differentiate the hierarchies, behind which lie the
implications for urban mobility patterns. This is the basic idea of this study,
which extends the scaling principle to the time dimension to analyze the inner
structure of urban systems.
According to the basic idea stated above, more than fourteen million GPS points
collected from over ten thousand taxicabs in Wuhan, Hubei, China are first
grouped by taxicab ID and connected as trajectories. Then, the trajectories are
separated into moves and stops and associated with taxicab IDs through a kind
of data-intensive geo-computation. Thereafter, it is found that the time
intervals of all the stops demonstrate the scaling property. That is, there are
far more short stops than long ones, which can be clearly differentiated by the
head/tail division rule. From this point of view, the principle of scaling of
geographic space is fully applicable to its time dimension. The long stops are
considered to be a typical hallmark of the traffic system. Based on the long
stops, the spatiotemporal clusters (Figure 40) are obtained via a simple
algorithm at different timelines. Interestingly, the scales of these
spatiotemporal clusters also show the scaling property in terms of time
interval and number of taxicabs. Thus, the hierarchies of these clusters can be
obtained via the head/tail division rule in a quantitative manner for analyzing
urban structures and patterns. Furthermore, the geographic implications behind
the hierarchical structures indicate where the traffic jams and hot spots
happen, respectively. In this way, the evolution of each traffic jam or hot
spot becomes trackable. The distributions of traffic jams and hot spots suggest
dynamic and multi-nuclei urban mobility patterns.
Figure 40: Comparison between hotspot and traffic jam (Source: paper III)
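To make the separation of stops and the selection of long stops concrete, the following sketch treats a stop as a run of consecutive GPS points that stay within a small distance of the first point of the run, and then keeps the stops whose duration exceeds the mean. The stop definition, the 10-unit tolerance, the sample points and the function name are all assumptions for illustration; the paper's own procedure may differ.

```python
import math
from statistics import mean


def extract_stops(points, tolerance=10.0):
    """points: time-ordered (t_seconds, x, y) fixes of one taxicab.

    Returns (start_time, end_time) for every run of two or more consecutive
    points that stay within `tolerance` of the first point of the run.
    """
    stops = []
    start = 0
    for i in range(1, len(points) + 1):
        still = (
            i < len(points)
            and math.hypot(points[i][1] - points[start][1],
                           points[i][2] - points[start][2]) <= tolerance
        )
        if not still:
            if i - 1 > start:  # the run start..i-1 holds at least two points
                stops.append((points[start][0], points[i - 1][0]))
            start = i
    return stops


# Hypothetical trajectory: two brief halts and one long wait.
points = [
    (0, 0, 0), (30, 1, 0), (60, 2, 0),
    (90, 500, 0),
    (120, 1000, 0), (150, 1001, 0),
    (180, 1500, 0), (210, 1500, 0), (240, 1500, 0),
    (270, 1500, 0), (300, 1500, 0), (900, 1501, 0),
]
stops = extract_stops(points)
durations = [end - begin for begin, end in stops]
# Head/tail division: stops longer than the mean duration form the head.
long_stops = [s for s, d in zip(stops, durations) if d > mean(durations)]
```

Here the three stops last 60, 30 and 720 seconds, so only the last one falls in the head.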
In addition to the idea, data processing is also an important part of this
study. It is a step-by-step solution with some simple algorithms. The following
figure provides a conceptual view of this method:
[Flow chart: floating car data (GPS points), then trajectories of each taxicab,
then stops and moves, then spatiotemporal clusters of stops]
Figure 41: Step-by-step data processing solution
This paper successfully extends and applies the scaling of geographic space and
the head/tail division rule to the time dimension. The main contributions are:
(1) the stop part of the GPS points is used to uncover the mobility patterns,
in contrast to conventional trajectory methods, which mix the moving and stop
points together; and (2) the urban mobility patterns are analyzed in a
quantitative manner. That is, the traffic jams and hot spots are extracted, and
their evolution can be accurately tracked.
5.8. Paper IV: Comparison of hierarchical spatial structures
The aim of this paper is first to define the hierarchical spatial structure at the
country level and then to compare these structures across countries. The
hierarchical spatial structure at the country level is defined as follows: because
the city sizes of each country demonstrate the scaling property, we can apply
the head/tail division rule to all cities to separate the small ones from the
large ones, which yields a two-tier hierarchical structure. Repeatedly applying
the head/tail division rule to the head part of the cities in each country yields
further such structures at different levels. Together, the two-tier structures at
each level of a country are called its hierarchical spatial structure.
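The repeated division described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the function name `head_tail_levels`, the treatment of values equal to the mean (placed in the tail), and the stopping threshold `max_head_share` are all assumptions introduced here, since the text only says that the rule is applied repeatedly to the head.

```python
from statistics import mean


def head_tail_levels(sizes, max_head_share=0.4):
    """Repeatedly split values around their mean and recurse into the head.

    Returns one record per level with the mean used for the split and the
    number of values in the head (above the mean) and tail (at or below it).
    max_head_share is a hypothetical stopping criterion: the recursion ends
    once the head is no longer a clear minority.
    """
    levels = []
    current = list(sizes)
    while len(current) > 1:
        m = mean(current)
        head = [s for s in current if s > m]
        tail = [s for s in current if s <= m]
        # Stop when no value lies above the mean or the head is too large.
        if not head or len(head) / len(current) > max_head_share:
            break
        levels.append({"mean": m, "head": len(head), "tail": len(tail)})
        current = head  # the next level divides only the head part
    return levels


# Made-up city sizes: many small cities, few large ones.
city_sizes = [1] * 80 + [10] * 16 + [100] * 4
for level, record in enumerate(head_tail_levels(city_sizes), start=1):
    print(level, record)
```

With these made-up sizes the sketch yields two levels, each with a 20/80 split between head and tail; real city-size data would of course give less tidy proportions.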
The definition of the hierarchical spatial structure of a country rests on two
points. The first is the scaling theory and the hierarchical models from previous
studies, which imply the self-similarity of the inner structure. In this paper, we
simply adopt the head/tail division rule to obtain the similar structures; this
study is therefore, in essence, an application of the scaling of geographic
space. The second is that blocks are considered to represent the geographic
carrying capacity of a country, because human activities are mainly constrained
inside blocks.
Based on the above ideas, we set up the comparison standard as follows: at each
level, we compute the ratio of the tail to the head. The ratios vary from country
to country, but they are basically in agreement with the 80/20 principle.
Therefore, if the ratio deviates too far from this, e.g., toward 50/50, we can say
that the structure at that level of the country is not good. That is, the more
pronounced the scaling, the better the structure as a self-organized complex
system. Moreover, 85% of the blocks in a country belong to cities on average, and
we find that the correlation coefficients between city sizes/number of blocks and
GDP/population are up to 0.87. Therefore, based on the results of the comparison,
the economic structure of the countries can be further assessed.
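As a rough illustration of this comparison standard, the sketch below measures how far the tail share at one level lies from the 80/20 benchmark. The function names, the two made-up countries and the choice of an absolute difference as the deviation measure are assumptions made here for illustration; the paper does not specify how the deviation is quantified.

```python
def tail_share(sizes):
    """Share of values at or below the mean, i.e. the tail of one division.

    Placing values equal to the mean in the tail is an assumption.
    """
    m = sum(sizes) / len(sizes)
    tail = sum(1 for s in sizes if s <= m)
    return tail / len(sizes)


def deviation_from_80_20(sizes, expected_tail_share=0.8):
    """Absolute gap between the observed tail share and the 80/20 benchmark."""
    return abs(tail_share(sizes) - expected_tail_share)


# Hypothetical city sizes (e.g. number of blocks per city) for two made-up countries.
countries = {
    "A": [1] * 80 + [10] * 16 + [100] * 4,  # strongly skewed sizes
    "B": [1, 2, 3, 4],                      # evenly spread sizes
}
for name, sizes in countries.items():
    print(name, round(tail_share(sizes), 2), round(deviation_from_80_20(sizes), 2))
```

Country A matches the benchmark exactly, whereas country B splits 50/50 and would be judged to have a poor structure at this level under the standard described above.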
This paper applies the scaling principle to obtain and compare the inner
structures. However, the hierarchical spatial structures at the country level
could be twofold: hierarchical structures of blocks and hierarchical structures of
natural cities. This kind of hierarchical structure is easy to understand. We
chose the structures based on cities not only because cities contain most of
the blocks in a country, but also because cities are the functioning units of a
country. Therefore, a hierarchical structure based directly on blocks does not
make sense.
Compared with previous papers, the hierarchical structures at the country level
are analyzed in detail and compared with each other. The main contribution of this
paper is to apply the principle of scaling of geographic space and the head/tail
division rule to define the hierarchical structure at the country level and to set
up a standard for comparison.
6. Conclusions and future research
6.1. Conclusions
In this thesis we examine the scaling property of geographic space using
volunteered geographic information from different perspectives in a quantitative
manner. The scaling of geographic space refers to the phenomenon that small
objects are much more numerous than large ones. This phenomenon can be
characterized mathematically by a heavy-tailed distribution. Moreover, the mean
of the sizes of all geographic representations can be used to divide the
representations into those that lie above the mean (a low percentage), in the
head, and those that lie below it (a high percentage), in the tail. This rule is
termed the head/tail division rule, and it naturally differentiates the
hierarchical structures in geographic phenomena. Based on the obtained
hierarchies, the corresponding geographic implications can be revealed and applied
to urban studies on a range of topics. This is a bottom-up approach to efficiently
reducing the high degree of complexity and effectively solving issues in urban
studies.
The principle of scaling of geographic space is first examined from the
perspective of city and field blocks at the country level. All of the block sizes
are characterized by a lognormal distribution, which indicates the scaling
property of geographic space. The head/tail division rule is further defined
and applied to all the blocks. The resulting clear hierarchies show the
geographic implications of urban and rural areas. An automatic solution is
developed to delineate them into groups, which we call natural cities. The sizes
of natural cities follow a power law distribution, which provides further
evidence of the scaling of geographic space. The perspective of blocks is unique
in the sense that it can capture the underlying structure and patterns of
geographic space. This is concluded to be the law of the scaling of geographic
space. Similarly, the generated old axial lines at the city level also verify the
scaling of geographic space.
The scaling of geographic space is consistent with the concept of spatial
heterogeneity in conventional spatial analysis. The scaling of geographic space
has emerged against the broad background of GIScience. GIScience is a paradigm
shift from conventional GIS and casts new light on some fundamentals of
conventional GIS with a series of new theories and methods. Some of the primary
new theories and methods involved in this thesis, i.e., graphs, space syntax,
fractals, heavy-tailed distributions and the Gestalt principle, are briefly
introduced and reexamined to show how they can be applied to the study of the
scaling of geographic space and its application to urban studies.
As applications of the law of the scaling of geographic space, the law is used for
the automatic generation of newly defined axial lines, urban sprawl detection,
urban mobility pattern discovery, route planning and an across-country comparison
of hierarchical spatial structures: (a) the curviness of natural streets
demonstrates a heavy-tailed distribution, and the parameters in the model for
generating axial lines are determined by the head/tail division rule. The newly
defined and generated axial lines prove to be a powerful geographic
representation for capturing the underlying urban morphologies in urban studies;
(b) two types of block measurements, i.e., the diameter and ratio of the geometry
and the dangling lines in the structure, are found to exhibit the scaling
property. The hierarchies of blocks obtained via the head/tail division rule are
used to identify the sprawling blocks or patches from the bottom up; (c) the time
intervals of all the GPS points of taxicabs and the extracted spatiotemporal
clusters are found to demonstrate the scaling property. Based on the head/tail
division rule, the obtained hierarchical structures of spatiotemporal clusters are
quantitatively categorized into hotspots and traffic jams; (d) the cognitive turn
and numerous perceptual small spaces in the generated route demonstrate the
scaling property and hierarchical structure; and (e) the natural cities at the
country level are naturally categorized into small, middle, big and mega cities
via the head/tail division rule according to their scaling property. Moreover,
the scaling property is evaluated in detail. In turn, these successful
applications verify the principle of scaling of geographic space and the
head/tail division rule.
Meanwhile, this study is a data-intensive geo-computation. The emergence of
VGI provides a free-to-use and editable geographic data source. As a special
case of UGC, VGI is also driven by Web 2.0 technologies. The definition, issues
and contents of VGI are discussed from a technical point of view. OSM is one of
the most successful VGI examples and is also the underlying data set used in
this thesis. The basic content and components of OSM are briefly reviewed. In
particular, the data structure and usage, the data quality analysis, the
implementation of OSM data processing and the extraction of road networks are
described in detail. The OSM data prove to be a very successful geographic data
source. VGI and OSM play an important role in GIScience and in this thesis.
Through a series of studies, the research objectives of this thesis stated in
section 1.2 are achieved. First, different geographic representations, such as
axial lines, blocks and spatiotemporal clusters of taxicab mobility data, are
appropriately selected in different urban environment contexts using volunteered
geographic information. Second, the quantitative approach to obtaining the
hierarchical structure is defined as the scaling of geographic space and the
head/tail division rule. Third, the physical meanings behind the hierarchies are
explored to effectively solve issues in urban studies.
6.2. Future research
Despite the discovery of the law of scaling of geographic space and its
successful application to various urban problems, several potential research
issues related to this thesis are to be considered in the future.
First, additional representations of geographic space should be extracted from
OSM data as well as from other resources. Currently, the road network is one of
the main data sources taken from OSM. Based on the processed road network,
blocks, natural cities and new axial lines are generated. OSM also contains
other data types, such as buildings, points of interest (POI) and land uses.
New geographic representations should be extracted from them to uncover the
geographic implications for additional topics of importance to the urban
environment, e.g., human mobility and urban system dynamics.
Second, additional theories and methods must be explored to examine the
scaling property of geographic space. In essence, the scaling of geographic
space concerns the differentiation of hierarchies from all quantified geographic
representations and reveals the geographic implications (e.g., urban and rural
areas) behind the structures. Thus, additional applications to urban studies
should be explored. In this thesis, the heavy-tailed distribution is one of the
primary means of characterizing this phenomenon. That is, the scaling of
geographic space relies heavily on physical statistics. The head/tail division
rule is used to divide the hierarchies. Although this method is simple and
performs well, geographic hierarchies need not coexist with heavy-tailed
distributions. That is, as long as we can identify the hierarchies of a
geographic phenomenon and reveal the geographic implications, any type of
distribution and rule will work and be worthy of additional study.
Third, the definition of the heavy-tailed distribution is restricted to only five
types of distributions, i.e., the power law, lognormal, exponential, power law
with an exponential cutoff and stretched exponential. Additional statistical
distributions that can characterize the phenomenon of the scaling of geographic
space should be explored. Furthermore, the KS test for identifying a
heavy-tailed distribution has several limitations. Moreover, the head/tail
division rule is performed simply by calculating the mean of all values. Although
this is simple enough, the rule could be derived in another way, e.g., using the
cumulative distribution function.
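For reference, the five candidate forms named above are commonly written as follows, up to normalizing constants. The parameterization and the symbols $\alpha$, $\lambda$, $\beta$, $\mu$ and $\sigma$ are a standard convention from the power-law fitting literature and are assumed here; they are not necessarily the notation used elsewhere in this thesis. The last line gives the usual form of the KS statistic, where $S(x)$ is the empirical cumulative distribution function of the data and $P(x)$ that of the fitted model.

```latex
\begin{align*}
\text{power law:} \quad & p(x) \propto x^{-\alpha} \\
\text{power law with exponential cutoff:} \quad & p(x) \propto x^{-\alpha} e^{-\lambda x} \\
\text{exponential:} \quad & p(x) \propto e^{-\lambda x} \\
\text{stretched exponential:} \quad & p(x) \propto x^{\beta - 1} e^{-\lambda x^{\beta}} \\
\text{lognormal:} \quad & p(x) \propto \frac{1}{x} \exp\!\left[-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right] \\[6pt]
\text{KS statistic:} \quad & D = \max_{x} \left| S(x) - P(x) \right|
\end{align*}
```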
Fourth, this study is a data-intensive geo-computation to some extent, and there
is still much room for improvement in terms of computing ability. For example,
block extraction is extremely time- and memory-consuming. To handle the
global data, additional memory is needed on our current server, and the computing
time may be too long, e.g., more than three weeks. Furthermore, to follow the
spirit of VGI, the corresponding program must be shared online so that more
people can become involved and contribute to the studies.
Finally, this study on the scaling of geographic space is neither definitive nor
exhaustive, but rather represents a beginning. Additional studies are worth
conducting to further complete and mature this research.
References References
Adamic L. (2002), Zipf, Power-laws, and Pareto – A ranking tutorial, Adamic L. (2002), Zipf, Power-laws, and Pareto – A ranking tutorial,
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html, accessed http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html, accessed
on December 2011. on December 2011.
Adamic L. (2011), Unzipping Zipf’s law, Nature, 474, 164-165. Adamic L. (2011), Unzipping Zipf’s law, Nature, 474, 164-165.
Alperovich G.A. (1984), The size distribution of cities: on the empirical validity Alperovich G.A. (1984), The size distribution of cities: on the empirical validity
of the rank-size rule, Journal of Urban Economics, 16(2), 232-239. of the rank-size rule, Journal of Urban Economics, 16(2), 232-239.
Anderson C. (2006), The Long Tail: Why the Future of Business Is Selling Less Anderson C. (2006), The Long Tail: Why the Future of Business Is Selling Less
of More, Hyperion: New York. of More, Hyperion: New York.
Barabási A.L. (1999), Emergence of scaling in random networks, Science, 286, Barabási A.L. (1999), Emergence of scaling in random networks, Science, 286,
509-512. 509-512.
Batty M. and Longley P. (1994), Fractal Cities: A Geometry of Form and Batty M. and Longley P. (1994), Fractal Cities: A Geometry of Form and
Function, Academic Press, San Diego, CA and London. Function, Academic Press, San Diego, CA and London.
Batty M. (2008), The Size, Scale, and Shape of Cities, Science, 319 (5864), 769- Batty M. (2008), The Size, Scale, and Shape of Cities, Science, 319 (5864), 769-
771. 771.
Beauregard R. (2007), Journal of Planning History, 6(3), 248-271. Beauregard R. (2007), Journal of Planning History, 6(3), 248-271.
Berners-Lee Tim (2005), http://news.bbc.co.uk/2/hi/technology/4132752.stm, Berners-Lee Tim (2005), http://news.bbc.co.uk/2/hi/technology/4132752.stm,
accessed on December 2011. accessed on December 2011.
Bettencourt L.M.A., Lobo J., Helbing D., Kunhert C. and West G.B. (2007), Bettencourt L.M.A., Lobo J., Helbing D., Kunhert C. and West G.B. (2007),
Growth, innovation, scaling, and the pace of life in cities, Proceedings of Growth, innovation, scaling, and the pace of life in cities, Proceedings of
the National Academy of Sciences (USA), 104, 7301-7306. the National Academy of Sciences (USA), 104, 7301-7306.
Bettencourt L.M.A. , Lobo J., Strumsky D. and West G.B. (2010), Urban scaling Bettencourt L.M.A. , Lobo J., Strumsky D. and West G.B. (2010), Urban scaling
and its deviations: Revealing the structure of wealth, innovation and crime and its deviations: Revealing the structure of wealth, innovation and crime
across cities, PLoS One, 5:e13541. across cities, PLoS One, 5:e13541.
Bettencourt L.M.A. and West G.B. (2010), A unified theory of urban living. Bettencourt L.M.A. and West G.B. (2010), A unified theory of urban living.
Nature, 467, 912-913. Nature, 467, 912-913.
Bon R. (1979), Allometry in topologic structure of transportation networks, Bon R. (1979), Allometry in topologic structure of transportation networks,
Quality and Quantity, 13(4), 307-326. Quality and Quantity, 13(4), 307-326.
Altonen B. (2011), Hexagonal Grid Analysis, http://brianaltonenmph.com/6-gis- Altonen B. (2011), Hexagonal Grid Analysis, http://brianaltonenmph.com/6-gis-
ecology-and-natural-history/hexagonal-grid-analysis, accessed on February ecology-and-natural-history/hexagonal-grid-analysis, accessed on February
2012. 2012.
Car A. (1997), Hierarchical spatial reasoning: theoretical consideration and its Car A. (1997), Hierarchical spatial reasoning: theoretical consideration and its
application to modeling wayfinding, published Ph.D. thesis, Technical application to modeling wayfinding, published Ph.D. thesis, Technical
University Vienna, Austria. University Vienna, Austria.
Chakravarti, Laha, and Roy (1967), Handbook of Methods of Applied Statistics, Chakravarti, Laha, and Roy (1967), Handbook of Methods of Applied Statistics,
John Wiley and Sons, 1, 392-394. John Wiley and Sons, 1, 392-394.
Chen Y. and Zhou Y. (2003), The rank-size rule and fractal hierarchies of cities: Chen Y. and Zhou Y. (2003), The rank-size rule and fractal hierarchies of cities:
mathematical models and empirical analyses, Environment and Planning mathematical models and empirical analyses, Environment and Planning
B: Planning and Design, 30(6), 799-818. B: Planning and Design, 30(6), 799-818.
Chen Y. and Zhou Y. (2008), Scaling laws and indications of self-organized Chen Y. and Zhou Y. (2008), Scaling laws and indications of self-organized
criticality in urban systems, Chaos Solitons Fractals, 35(1), 85-98. criticality in urban systems, Chaos Solitons Fractals, 35(1), 85-98.
83 83
Cherldu E. (2007), OSM and the art of bicycle maintenance, the State of the Cherldu E. (2007), OSM and the art of bicycle maintenance, the State of the
Map 2007 conference, Manchester UK, on 14th July 2007. Map 2007 conference, Manchester UK, on 14th July 2007.
Christaller W. (1933), Central places in southern germany (translated by Baskin Christaller W. (1933), Central places in southern germany (translated by Baskin
C. (1966)), Prentice-Hall: Englewood Cliffs, NJ. C. (1966)), Prentice-Hall: Englewood Cliffs, NJ.
Church M. and Mark D.M. (1980). On size and scale in geomorphology. Church M. and Mark D.M. (1980). On size and scale in geomorphology.
Progress in Physical Geography, 4:342–390. Progress in Physical Geography, 4:342–390.
Clauset A., Shalizi C. R., and Newman M. E. J. (2009), Power-law distributions Clauset A., Shalizi C. R., and Newman M. E. J. (2009), Power-law distributions
in empirical data, SIAM Review, 51, 661-703. in empirical data, SIAM Review, 51, 661-703.
Coast S. (2006), Yahoo! aerial imagery in OSM, OpenGeoData. Coast S. (2006), Yahoo! aerial imagery in OSM, OpenGeoData.
http://old.opengeodata.org/2006/12/04/yahoo-aerial-imagery-in-osm, http://old.opengeodata.org/2006/12/04/yahoo-aerial-imagery-in-osm,
Coast S. (2007), AND donate entire Netherlands to OpenStreetMap, Coast S. (2007), AND donate entire Netherlands to OpenStreetMap,
OpenGeoData, http://old.opengeodata.org/2007/07/04/and-donate-entire- OpenGeoData, http://old.opengeodata.org/2007/07/04/and-donate-entire-
netherlands-to-openstreetmap/index.html, accessed on December 2011. netherlands-to-openstreetmap/index.html, accessed on December 2011.
Coleman D.J., Georgiadou Y. and Lobonte J. (2009), Volunteered geographic Coleman D.J., Georgiadou Y. and Lobonte J. (2009), Volunteered geographic
information: the nature and motivation of produsers, International Journal information: the nature and motivation of produsers, International Journal
of Spatial Data Infrastructures Research, 4, 332-358. of Spatial Data Infrastructures Research, 4, 332-358.
Cormen T.H., Leiserson C.E., Rivest R.L. and Stein C. (2001), Introduction to Cormen T.H., Leiserson C.E., Rivest R.L. and Stein C. (2001), Introduction to
Algorithms, MIT Press, second edition, 2001. Algorithms, MIT Press, second edition, 2001.
Craglia M. (2007), Volunteered geographic information and spatial data Craglia M. (2007), Volunteered geographic information and spatial data
infrastructures: When do parallel lines converge?, A Position paper for the infrastructures: When do parallel lines converge?, A Position paper for the
workshop on Volunteered Geographic Information, Santa Barbara, CA, 13- workshop on Volunteered Geographic Information, Santa Barbara, CA, 13-
14, December 2007. 14, December 2007.
Córdoba J.C. (2008), On the distribution of city sizes, Journal of Urban Córdoba J.C. (2008), On the distribution of city sizes, Journal of Urban
Economics, 63, 177-197. Economics, 63, 177-197.
Decker E.H., Kerkhoff A.J., Moses M.E. (2007), Global patterns of city size Decker E.H., Kerkhoff A.J., Moses M.E. (2007), Global patterns of city size
distributions and their fundamental drivers, PLoS ONE, 2(9): e934. distributions and their fundamental drivers, PLoS ONE, 2(9): e934.
Dennis C., Marsland D. and Cockett T. (2002), Central place practice: shopping Dennis C., Marsland D. and Cockett T. (2002), Central place practice: shopping
centre attractiveness measures, hinterland boundaries and the UK retail centre attractiveness measures, hinterland boundaries and the UK retail
hierarchy, Journal of Retailing and Consumer Services, 9, 185-199. hierarchy, Journal of Retailing and Consumer Services, 9, 185-199.
Dijkstra E.W. (1959), A note on two problems in connexion with graphs, Dijkstra E.W. (1959), A note on two problems in connexion with graphs,
Numerische Mathematik, 1, 269–271. Numerische Mathematik, 1, 269–271.
Douglas D.H. and Peucker T.K. (1973), Algorithms for the reduction of the Douglas D.H. and Peucker T.K. (1973), Algorithms for the reduction of the
number of points required to represent a line or its caricature, The number of points required to represent a line or its caricature, The
Canadian Cartographer, 10(2), 112-122. Canadian Cartographer, 10(2), 112-122.
Eeckhout J. (2004), Gibrat’s law for (all) cities, American Economic Review, 94, Eeckhout J. (2004), Gibrat’s law for (all) cities, American Economic Review, 94,
1429-1451. 1429-1451.
Egenhofer M.J. (1993), What's Special about Spatial? Database Requirements Egenhofer M.J. (1993), What's Special about Spatial? Database Requirements
for Vehicle Navigation in Geographic Space, In SIGMOD '93: Proceedings for Vehicle Navigation in Geographic Space, In SIGMOD '93: Proceedings
of the 1993 ACM SIGMOD international conference on Management of of the 1993 ACM SIGMOD international conference on Management of
data, 22(2), 398-402. data, 22(2), 398-402.
84 84
Elwood S. (2008), Volunteered Geographic Information: Future Research Elwood S. (2008), Volunteered Geographic Information: Future Research
Directions Motivated by Critical, Participatory, and Feminist GIS, Directions Motivated by Critical, Participatory, and Feminist GIS,
GeoJournal, 72 (3&4), 173–183. GeoJournal, 72 (3&4), 173–183.
ESRI (1990), Understanding GIS: The ARC/INFO Method, ESRI Press: ESRI (1990), Understanding GIS: The ARC/INFO Method, ESRI Press:
Redlands, CA, 2nd edition. Redlands, CA, 2nd edition.
ESRI (2012), ArcObjects Library Reference (Geometry), ESRI (2012), ArcObjects Library Reference (Geometry),
http://resources.esri.com/help/9.3/ArcGISServer/apis/ArcObjects/esriGeom http://resources.esri.com/help/9.3/ArcGISServer/apis/ArcObjects/esriGeom
etry/ITopologicalOperator_Simplify.htm, accessed on February 2012. etry/ITopologicalOperator_Simplify.htm, accessed on February 2012.
Flanagin A., and Metzger M. (2008), The credibility of volunteered geographic Flanagin A., and Metzger M. (2008), The credibility of volunteered geographic
information, GeoJournal, 72(3&4), 137-148. information, GeoJournal, 72(3&4), 137-148.
Gabaix X. (1999), Zipf’s law for cities: an explanation, Quarterly Journal of Gabaix X. (1999), Zipf’s law for cities: an explanation, Quarterly Journal of
Economics, 114, 739-767. Economics, 114, 739-767.
Garner B. (1968), Models of Urban Geography and Settlement Location, in Garner B. (1968), Models of Urban Geography and Settlement Location, in
Socio-economic Models of Geography, 303-360. Socio-economic Models of Geography, 303-360.
Giesen K. and Sudekum J. (2011), Zipf’s law for cities in the regions and the Giesen K. and Sudekum J. (2011), Zipf’s law for cities in the regions and the
country, Journal of Economic Geography, 11, 667-686. country, Journal of Economic Geography, 11, 667-686.
Goodchild M.F. (1992), Geographical information science. International Goodchild M.F. (1992), Geographical information science. International
Journal of Geographical Information Science, 6(1), 31-45. Journal of Geographical Information Science, 6(1), 31-45.
Goodchild M.F. (2004), GIScience, Geography, Form, and Process, Annals of Goodchild M.F. (2004), GIScience, Geography, Form, and Process, Annals of
the Association of American Geographers, 94(4), 709–714. the Association of American Geographers, 94(4), 709–714.
Goodchild M.F. (2007a), Citizens as sensors: the world of volunteered Goodchild M.F. (2007a), Citizens as sensors: the world of volunteered
geography, GeoJournal 69 (4), 211–221. geography, GeoJournal 69 (4), 211–221.
Goodchild M.F. (2007b), Citizens as voluntary sensors: Spatial data Goodchild M.F. (2007b), Citizens as voluntary sensors: Spatial data
infrastructures in the world of Web 2.0, International Journal of Spatial infrastructures in the world of Web 2.0, International Journal of Spatial
Data Infrastructure Research, 2, 24–32. Data Infrastructure Research, 2, 24–32.
Graham S. and Marvin S. (1996), Telecommunications and the City: Electronic Graham S. and Marvin S. (1996), Telecommunications and the City: Electronic
Spaces, Urban Places, London: Routledge. Spaces, Urban Places, London: Routledge.
Grossner K. and Glennon A. (2007), Volunteered geographic information: Level Grossner K. and Glennon A. (2007), Volunteered geographic information: Level
III of a digital earth system, A position paper for the workshop on III of a digital earth system, A position paper for the workshop on
Volunteered Geographic Information, Santa Barbara, CA, 13-14, Volunteered Geographic Information, Santa Barbara, CA, 13-14,
December 2007. December 2007.
Gupta R. (2007), Mapping the global energy system using wikis, open sources, Gupta R. (2007), Mapping the global energy system using wikis, open sources,
WWW, and Google Earth, A position paper for the workshop on WWW, and Google Earth, A position paper for the workshop on
Volunteered Geographic Information, Santa Barbara, CA, 13-14, Volunteered Geographic Information, Santa Barbara, CA, 13-14,
December 2007. December 2007.
Hägerstrand T. (1970), What about people in regional science?, Papers of the Hägerstrand T. (1970), What about people in regional science?, Papers of the
Regional Science Association, 24, 1-12. Regional Science Association, 24, 1-12.
Haggett P. (1965), Locational analysis in human geography, London: Edward Haggett P. (1965), Locational analysis in human geography, London: Edward
Arnold. Arnold.
Haklay M. and Weber P. (2008), OpenStreetMap: User-Generated Street Maps, Haklay M. and Weber P. (2008), OpenStreetMap: User-Generated Street Maps,
IEEE Pervasive Computing, October–December, 12-18. IEEE Pervasive Computing, October–December, 12-18.
85 85
Haklay M. (2010), How good is volunteered geographical information? A Haklay M. (2010), How good is volunteered geographical information? A
comparative study of OpenStreetMap and Ordnance Survey datasets, comparative study of OpenStreetMap and Ordnance Survey datasets,
Environment and Planning B: Planning and Design, 37(4), 682–703. Environment and Planning B: Planning and Design, 37(4), 682–703.
Harary F (1994), Graph Theory, Reading, MA: Addison-Wesley, p. 10. Harary F (1994), Graph Theory, Reading, MA: Addison-Wesley, p. 10.
Harris C.D. and Ullman E.L. (1945), The nature of cities, Annals of the Harris C.D. and Ullman E.L. (1945), The nature of cities, Annals of the
American Academy of Political and Social Science, 242, 7-17. American Academy of Political and Social Science, 242, 7-17.
Hart P., Nilsson N., and Raphael B. (1968), A Formal Basis for the Heuristic Hart P., Nilsson N., and Raphael B. (1968), A Formal Basis for the Heuristic
Determination of Minimum Cost Paths, IEEE Trans. Syst. Science and Determination of Minimum Cost Paths, IEEE Trans. Syst. Science and
Cybernetics, SSC-4(2),100-107. Cybernetics, SSC-4(2),100-107.
Hartshorn T.A. and Muller P.O. (1989), Suburban Downtowns and the Hartshorn T.A. and Muller P.O. (1989), Suburban Downtowns and the
Transformation of Metropolitan Atlanta's Business Landscape, Urban Transformation of Metropolitan Atlanta's Business Landscape, Urban
Geography, 10 (4), 375-395. Geography, 10 (4), 375-395.
Hillier B. (1997), Space Syntax: First International Symposium, proceedings, Hillier B. (1997), Space Syntax: First International Symposium, proceedings,
16-18 April 1997, University College London. 16-18 April 1997, University College London.
Hillier B. (1996), Space is the Machine: A Configurational Theory of Hillier B. (1996), Space is the Machine: A Configurational Theory of
Architecture, Cambridge University Press. Architecture, Cambridge University Press.
Hillier B. and Hanson J. (1984), The Social Logic of Space, Cambridge Hillier B. and Hanson J. (1984), The Social Logic of Space, Cambridge
University Press. University Press.
Hirtle S.C. and Jonides J. (1985), Evidence of Hierarchies in Cognitive Maps, Hirtle S.C. and Jonides J. (1985), Evidence of Hierarchies in Cognitive Maps,
Memory & Cognition, 13(3), 208-217. Memory & Cognition, 13(3), 208-217.
Hoberman S. (2009), Data Modeling Made Simple, 2nd Edition, Technics Hoberman S. (2009), Data Modeling Made Simple, 2nd Edition, Technics
Publications, LLC. Publications, LLC.
Hoyt H. (1939), The structure and growth of residential neighbourhoods in Hoyt H. (1939), The structure and growth of residential neighbourhoods in
american cities, Washington DC, Federal Housing Administration. american cities, Washington DC, Federal Housing Administration.
Hu Y., Wang Y. and Di Z. (2009), The scaling laws of spatial structure in social Hu Y., Wang Y. and Di Z. (2009), The scaling laws of spatial structure in social
networks, Submitted, arxiv.org/abs/0802.0047. networks, Submitted, arxiv.org/abs/0802.0047.
Ingolf Vogeler (2012), Central Place Theory, Ingolf Vogeler (2012), Central Place Theory,
http://www.uwec.edu/geography/ivogeler/w111/urban.htm, accessed on http://www.uwec.edu/geography/ivogeler/w111/urban.htm, accessed on
February 2012. February 2012.
Ioannides, Y.M. and Overman, H.G. (2003), Zipf’s law for cities: an empirical Ioannides, Y.M. and Overman, H.G. (2003), Zipf’s law for cities: an empirical
examination, Regional Science and Urban Economics, 33(2), 127-137. examination, Regional Science and Urban Economics, 33(2), 127-137.
Jia T. (2012), Building and analyzing US airport network based on en-route Jia T. (2012), Building and analyzing US airport network based on en-route
location information, submitted to Physica A. location information, submitted to Physica A.
Jiang B. et al. (2000), An integration of space syntax into GIS for modeling Jiang B. et al. (2000), An integration of space syntax into GIS for modeling
urban spaces, International Journal of Applied Earth Observation and urban spaces, International Journal of Applied Earth Observation and
Geoinformation, 2, 161-171. Geoinformation, 2, 161-171.
Jiang B. and Claramunt C. (2004), A structural approach to model generalisation Jiang B. and Claramunt C. (2004), A structural approach to model generalisation
of an urban street network, Geoinformatica: an International Journal on of an urban street network, Geoinformatica: an International Journal on
Advances of Computer Science for Geographic Information Systems, 8(2), Advances of Computer Science for Geographic Information Systems, 8(2),
157-171. 157-171.
Jiang B. (2006), Hägerstrand project of GIS-based Mobility Information for Jiang B. (2006), Hägerstrand project of GIS-based Mobility Information for
Sustainable Urban Planning, funded by FORMAS. Sustainable Urban Planning, funded by FORMAS.
86 86
Jiang B. (2009), Lecture on spatial analysis and modeling, University of Gävle, Jiang B. (2009), Lecture on spatial analysis and modeling, University of Gävle,
Sweden. Sweden.
Jiang B., Liu X. and Jia T. (2011), Scaling of geographic space as a universal Jiang B., Liu X. and Jia T. (2011), Scaling of geographic space as a universal
rule for mapping or cartographic generalization, Submitted, Feb. 2011. rule for mapping or cartographic generalization, Submitted, Feb. 2011.
Jiang B., Zhao S., and Yin J. (2008), Self-organized natural roads for predicting Jiang B., Zhao S., and Yin J. (2008), Self-organized natural roads for predicting
traffic flow: a sensitivity study, Journal of Statistical Mechanics: Theory traffic flow: a sensitivity study, Journal of Statistical Mechanics: Theory
and Experiment, July, P07008, Preprint, arxiv.org/abs/0804.1630. and Experiment, July, P07008, Preprint, arxiv.org/abs/0804.1630.
Jiang B. and Liu X. (2011), Computing the fewest-turn map directions based on Jiang B. and Liu X. (2011), Computing the fewest-turn map directions based on
the connectivity of natural roads, International Journal of Geographical the connectivity of natural roads, International Journal of Geographical
Information Science, x, xx-xx, Preprint, arxiv.org/abs/1003.3536. Information Science, x, xx-xx, Preprint, arxiv.org/abs/1003.3536.
Jiang B. and Jia T. (2011a), Zipf's law for all the natural cities in the United Jiang B. and Jia T. (2011a), Zipf's law for all the natural cities in the United
States: a geospatial perspective, International Journal of Geographical States: a geospatial perspective, International Journal of Geographical
Information Science, x, xx-xx, Preprint, arxiv.org/abs/1006.0814. Information Science, x, xx-xx, Preprint, arxiv.org/abs/1006.0814.
Jiang B. and Jia T. (2011b), Exploring Human Mobility Patterns Based on Jiang B. and Jia T. (2011b), Exploring Human Mobility Patterns Based on
Location Information of US Flights, Preprint, arxiv.org/abs/1104.4578. Location Information of US Flights, Preprint, arxiv.org/abs/1104.4578.
Kalapala V., Sanwalani V., Clauset A. and Moore C. (2006), Scale invariance in Kalapala V., Sanwalani V., Clauset A. and Moore C. (2006), Scale invariance in
road networks, Physical Review E, 73(2). road networks, Physical Review E, 73(2).
Kraak M.J. (2006), Why Maps Matter in GIScience, The Cartographic Journal Kraak M.J. (2006), Why Maps Matter in GIScience, The Cartographic Journal
43(1), 82–89. 43(1), 82–89.
Kuhn W. (2007), Volunteered geographic information and GIScience, A Kuhn W. (2007), Volunteered geographic information and GIScience, A
position paper for the workshop on Volunteered Geographic Information, position paper for the workshop on Volunteered Geographic Information,
Santa Barbara, CA, 13-14, December 2007. Santa Barbara, CA, 13-14, December 2007.
Kuipers B.J. (1978), Modelling spatial knowledge, Cognitive Science, 2, 129- Kuipers B.J. (1978), Modelling spatial knowledge, Cognitive Science, 2, 129-
153. 153.
Lämmer S., Gehlsen B. and Helbing D. (2006), Scaling laws in the spatial Lämmer S., Gehlsen B. and Helbing D. (2006), Scaling laws in the spatial
structure of urban road networks, Physica A, 363(1), 89 - 95. structure of urban road networks, Physica A, 363(1), 89 - 95.
Lauwerier H. (1991), Fractals: Endlessly Repeated Geometrical Figures, Lauwerier H. (1991), Fractals: Endlessly Repeated Geometrical Figures,
Princeton University Press: Princeton, NJ. Princeton University Press: Princeton, NJ.
Longley P.A., Goodchild M.F., Maguire D.J. and Rhind D.W. (2001), Longley P.A., Goodchild M.F., Maguire D.J. and Rhind D.W. (2001),
Geographic Information Systems and Science, John Wiley & Sons: Geographic Information Systems and Science, John Wiley & Sons:
Chichester. Chichester.
Mandelbrot B.B. (1982), The Fractal Geometry of Nature, W.H. Freeman and Mandelbrot B.B. (1982), The Fractal Geometry of Nature, W.H. Freeman and
Company. Company.
Makse H.A., Havlin S. and Stanley H.E. (1995), Modelling urban growth Makse H.A., Havlin S. and Stanley H.E. (1995), Modelling urban growth
patterns, Nature, 377, 608-612. patterns, Nature, 377, 608-612.
Maritan A., Rinaldo A., Rigon R., Giacometti A. and Rodriguez I.I. (1996), Maritan A., Rinaldo A., Rigon R., Giacometti A. and Rodriguez I.I. (1996),
Scaling law for river networks, Physical Review E, 53(2), 1510-1515. Scaling law for river networks, Physical Review E, 53(2), 1510-1515.
Mark D.M. and Frank A.U. (1996), Experiential and formal models of Mark D.M. and Frank A.U. (1996), Experiential and formal models of
geographic space, Environment and Planning B: Planning and Design, geographic space, Environment and Planning B: Planning and Design,
23(1), 3-24. 23(1), 3-24.
Mark D.M. (2000), Geographic Information Science: Critical Issues in an Mark D.M. (2000), Geographic Information Science: Critical Issues in an
Emerging Cross-Disciplinary Research Domain, URISA Journal, 12(1), Emerging Cross-Disciplinary Research Domain, URISA Journal, 12(1),
45-54. 45-54.
87 87
Mark D.M. (2003) Geographic information science: Defining the field, In Mark D.M. (2003) Geographic information science: Defining the field, In
Foundations of Geographic Information Science, ed. Duckham M., Foundations of Geographic Information Science, ed. Duckham M.,
Goodchild M. F. and Worboys M. F., 3–18, Taylor and Francis: New York. Goodchild M. F. and Worboys M. F., 3–18, Taylor and Francis: New York.
Mark D. and Egenhofer M. (1998), Geospatial lifelines, In Günther O., Sellis T. Mark D. and Egenhofer M. (1998), Geospatial lifelines, In Günther O., Sellis T.
and Theodoulidis B. (Eds), Integrating Spatial and Temporal Databases, and Theodoulidis B. (Eds), Integrating Spatial and Temporal Databases,
Dagstuhl Seminar Report No. 228. Dagstuhl Seminar Report No. 228.
Miller H.J. (2000), Geographic representation in spatial analysis, Journal of Miller H.J. (2000), Geographic representation in spatial analysis, Journal of
Geographical Systems, 2(1), 55-60. Geographical Systems, 2(1), 55-60.
Miller H.J. (2004), Tobler’s First Law and Spatial Analysis, Annals of the Miller H.J. (2004), Tobler’s First Law and Spatial Analysis, Annals of the
Association of American Geographers, 94(2), 284–289. Association of American Geographers, 94(2), 284–289.
Montello D. (1993), Scale and multiple psychologies of space. In Frank A. U. Montello D. (1993), Scale and multiple psychologies of space. In Frank A. U.
and Campari I. (Eds), Spatial Information Theory, A Theoretical Basis for and Campari I. (Eds), Spatial Information Theory, A Theoretical Basis for
GIS, volume 716 of Lecture Notes in Computer Science, Springer-Verlag: GIS, volume 716 of Lecture Notes in Computer Science, Springer-Verlag:
Berlin, 312-321. Berlin, 312-321.
Montello D.R. and Golledge R.G. (1999), Scale and detail in the cognition of Montello D.R. and Golledge R.G. (1999), Scale and detail in the cognition of
geographic information, Technical Report Specialist Meeting, Project geographic information, Technical Report Specialist Meeting, Project
Varenius, University of California at Santa Barbara, Santa Barbara, CA. Varenius, University of California at Santa Barbara, Santa Barbara, CA.
Montello D.R. (2001), Scale in Geography, International Encyclopedia of the Montello D.R. (2001), Scale in Geography, International Encyclopedia of the
Social & Behavioral Sciences, 13501-13504. Social & Behavioral Sciences, 13501-13504.
Mauer A. (2008), http://lamp2.fhstp.ac.at/~lbz/beispiele/ws2008/capitals, Mauer A. (2008), http://lamp2.fhstp.ac.at/~lbz/beispiele/ws2008/capitals,
Newman M.E.J. (2005), Power laws, Pareto distributions and Zipf's law, Newman M.E.J. (2005), Power laws, Pareto distributions and Zipf's law,
Contemporary Physics, 46, 323. Contemporary Physics, 46, 323.
O’Rourke J. (1987), Art gallery theorems and algorithms, New York: Oxford O’Rourke J. (1987), Art gallery theorems and algorithms, New York: Oxford
University Press. University Press.
OpenStreetMap Statistics (2012), http://wiki.openstreetmap.org/wiki/Stats, OpenStreetMap Statistics (2012), http://wiki.openstreetmap.org/wiki/Stats,
acessed on February 2012. acessed on February 2012.
Park R.E., Burgess E.W and Roderick D.M. (1925), The city, Chicago: Park R.E., Burgess E.W and Roderick D.M. (1925), The city, Chicago:
University of Chicago Press. University of Chicago Press.
Park R.E. (1952), Human communities: the city and human ecology, The Free Park R.E. (1952), Human communities: the city and human ecology, The Free
Press, New York. Press, New York.
Pastor S.R. and Vespignani A. (2001), Epidemic Spreading in Scale-Free Pastor S.R. and Vespignani A. (2001), Epidemic Spreading in Scale-Free
Networks, Physical Review Letters, 86(14), 3200–3203. Networks, Physical Review Letters, 86(14), 3200–3203.
Peng G. (2010), Zipf's law for Chinese cities: Rolling sample regressions, Peng G. (2010), Zipf's law for Chinese cities: Rolling sample regressions,
Physica A, 389(18), 3804-3813. Physica A, 389(18), 3804-3813.
Peponis J. et al. (1997), On the description of shape and spatial configuration Peponis J. et al. (1997), On the description of shape and spatial configuration
inside buildings: convex partitions and their local properties, Environment inside buildings: convex partitions and their local properties, Environment
and Planning B: Planning and Design, 24,761–781. and Planning B: Planning and Design, 24,761–781.
Peponis J. et al. (1998), On the generation of linear representations of spatial Peponis J. et al. (1998), On the generation of linear representations of spatial
configuration, Environment and Planning B: Planning and Design, 25, configuration, Environment and Planning B: Planning and Design, 25,
559–576. 559–576.
88 88
Ramsey P. (2009), OpenStreetMap moves to PostgreSQL, Ramsey P. (2009), OpenStreetMap moves to PostgreSQL,
http://blog.cleverelephant.ca/2009/04/openstreetmap-moves-to- http://blog.cleverelephant.ca/2009/04/openstreetmap-moves-to-
postgresql.html, accessed on December 2011. postgresql.html, accessed on December 2011.
Rashed T. and Jürgens C. (2010), Remote Sensing of Urban and Suburban Rashed T. and Jürgens C. (2010), Remote Sensing of Urban and Suburban
Areas, Springer, 1st Edition. Areas, Springer, 1st Edition.
Reitz T. (2010), A mismatch description language for conceptual schema Reitz T. (2010), A mismatch description language for conceptual schema
mapping and its cartographic representation, In S.I. Fabrikant et al. (Eds.), mapping and its cartographic representation, In S.I. Fabrikant et al. (Eds.),
GIScience 2010, LNCS 6292, 204-218. GIScience 2010, LNCS 6292, 204-218.
Rui C. and Alan P. (2004), Scaling and universality in the micro-structure of Rui C. and Alan P. (2004), Scaling and universality in the micro-structure of
urban space, Physica A, 32, 539-547. urban space, Physica A, 32, 539-547.
Samaniego H. and Moses M.E. (2008), Cities as organisms: allometric scaling Samaniego H. and Moses M.E. (2008), Cities as organisms: allometric scaling
of urban road networks, Journal of Transport and Land Use, 1(1), 21-39. of urban road networks, Journal of Transport and Land Use, 1(1), 21-39.
Santa fe community college (SFCC 2011), The Gestalt principles, Santa fe community college (SFCC 2011), The Gestalt principles,
http://graphicdesign.spokanefalls.edu/tutorials/process/gestaltprinciples/ges http://graphicdesign.spokanefalls.edu/tutorials/process/gestaltprinciples/ges
taltprinc.htm, accessed on December 2011. taltprinc.htm, accessed on December 2011.
Saw J.T. (2000), Gestalt, http://daphne.palomar.edu/design/gestalt.html, Saw J.T. (2000), Gestalt, http://daphne.palomar.edu/design/gestalt.html,
Sedgewick R. and Wayne K. (2010), introcs.cs.princeton.edu/java/11gaussian, Sedgewick R. and Wayne K. (2010), introcs.cs.princeton.edu/java/11gaussian,
accessed on Februry 2012. accessed on Februry 2012.
Shalizi C.R. (2011), Scaling and hierarchy in urban economies, Proceedings of Shalizi C.R. (2011), Scaling and hierarchy in urban economies, Proceedings of
the National Academy of Sciences (USA), Submitted, arxiv:1102.4101. the National Academy of Sciences (USA), Submitted, arxiv:1102.4101.
Sheridan P., Kamimura T. and Shimodaira H. (2010), A Scale-Free Structure Sheridan P., Kamimura T. and Shimodaira H. (2010), A Scale-Free Structure
Prior for Graphical Models with Applications in Functional Genomics, Prior for Graphical Models with Applications in Functional Genomics,
PLoS ONE, 5(11): e13580. PLoS ONE, 5(11): e13580.
Simmel G. (1903), The metropolis of modern life, In Simmel: On individuality Simmel G. (1903), The metropolis of modern life, In Simmel: On individuality
and social forms, Levine D.(Eds), Chicago University Press, 1971. p324. and social forms, Levine D.(Eds), Chicago University Press, 1971. p324.
Song C., Havlin S. and Makse H.A. (2006), Origins of fractality in the growth of Song C., Havlin S. and Makse H.A. (2006), Origins of fractality in the growth of
complex networks, Nature Physics, 2, 275- 81. complex networks, Nature Physics, 2, 275- 81.
Song C., Havlin S. and Makse H.A. (2005), Self-similarity of complex Song C., Havlin S. and Makse H.A. (2005), Self-similarity of complex
networks, Nature, 433, 392-395. networks, Nature, 433, 392-395.
Song S. and Zhang K.H. (2002), Urbanisation and City Size Distribution in Song S. and Zhang K.H. (2002), Urbanisation and City Size Distribution in
China, Urban Studies, 39(12), 2317-2327. China, Urban Studies, 39(12), 2317-2327.
Soo K.T. (2005), Zipf’s law for cities: a cross-country investigation, Regional Soo K.T. (2005), Zipf’s law for cities: a cross-country investigation, Regional
Science and Urban Economics, 35, 239-263. Science and Urban Economics, 35, 239-263.
Soo K.T. (2007), Zipf’s law and urban growth in Malaysia, Urban Studies, Soo K.T. (2007), Zipf’s law and urban growth in Malaysia, Urban Studies,
44(1), 1-14. 44(1), 1-14.
Star J. and Estes J. (1990), Geographic Information Systems: An Introduction, Star J. and Estes J. (1990), Geographic Information Systems: An Introduction,
Prentice Hall: Englewood Cliffs, NJ. Prentice Hall: Englewood Cliffs, NJ.
Sui D. (2008), The wikification of GIS and its consequences: Or Angelina Sui D. (2008), The wikification of GIS and its consequences: Or Angelina
Jolie’s new tattoo and the future of GIS, Computers, Environment and Jolie’s new tattoo and the future of GIS, Computers, Environment and
Urban Systems, 32, 1–5. Urban Systems, 32, 1–5.
Tobler W. (1970), A computer movie simulating urban growth in the Detroit Tobler W. (1970), A computer movie simulating urban growth in the Detroit
region, Economic Geography, 46(2): 234-240. region, Economic Geography, 46(2): 234-240.
89 89
Tomlin C.D. (1990), Geographic Information Systems and Cartographic Tomlin C.D. (1990), Geographic Information Systems and Cartographic
Modeling, Prentice Hall: Englewood Cliffs, NJ. Modeling, Prentice Hall: Englewood Cliffs, NJ.
University Consortium for Geographic Information Science (UCGIS 2004), A University Consortium for Geographic Information Science (UCGIS 2004), A
Research Agenda for Geographic Information Science, Taylor & Francis. Research Agenda for Geographic Information Science, Taylor & Francis.
Vance J.E. (1964), Geography and urban evolution in the San Francisco Bay Vance J.E. (1964), Geography and urban evolution in the San Francisco Bay
Area, University of California, Berkeley. Area, University of California, Berkeley.
Wang X., Chen C. and Liu Z. (2008), Evolution of Geographic Information Wang X., Chen C. and Liu Z. (2008), Evolution of Geographic Information
Systems research front using information visualizing network, Proceedings Systems research front using information visualizing network, Proceedings
of the Fourth International Conference on Webometrics, Informetrics and of the Fourth International Conference on Webometrics, Informetrics and
Scientometrics & Ninth COLLNET Meeting, Berlin, Germany, 28 July - 1 Scientometrics & Ninth COLLNET Meeting, Berlin, Germany, 28 July - 1
August, 2008. August, 2008.
Warren C.P., Sander L.M. and Sokolov I.M. (2002), Geography in a scale-free Warren C.P., Sander L.M. and Sokolov I.M. (2002), Geography in a scale-free
network model, Physical Review E, 66, 056105. network model, Physical Review E, 66, 056105.
Watts D.J. and Strogatz S.H. (1998), Collective dynamics of small-world Watts D.J. and Strogatz S.H. (1998), Collective dynamics of small-world
networks, Nature, 393, 440. networks, Nature, 393, 440.
Wells D. (2009), http://www.agile-process.org, accessed on Februry 2012. Wells D. (2009), http://www.agile-process.org, accessed on Februry 2012.
Wrighta D.J. and Wang S.(2011), The emergence of spatial cyberinfrastructure, Wrighta D.J. and Wang S.(2011), The emergence of spatial cyberinfrastructure,
Proceedings of the National Academy of Sciences (USA), 108(14), 5488- Proceedings of the National Academy of Sciences (USA), 108(14), 5488-
5491. 5491.
World Wide Web Consortium (2012), http://www.w3.org, accessed on February World Wide Web Consortium (2012), http://www.w3.org, accessed on February
2012. 2012.
Xu Z. and Harriss R. (2010), A spatial and temporal autocorrelated growth Xu Z. and Harriss R. (2010), A spatial and temporal autocorrelated growth
model for city rank–size distribution, Urban Studies, 47(2), 321-335. model for city rank–size distribution, Urban Studies, 47(2), 321-335.
Yuan M., Mark D.M., Egenhofer M.J. and Peuquet D.J. (2004), Extensions to Yuan M., Mark D.M., Egenhofer M.J. and Peuquet D.J. (2004), Extensions to
Geographic Representations, In McMaster R. and Usery E.L. (Eds), A Geographic Representations, In McMaster R. and Usery E.L. (Eds), A
Research Agenda for Geographic Information Science, Taylor & Francis. Research Agenda for Geographic Information Science, Taylor & Francis.
Zeiler M. (2000). Modeling Our World: The ESRI Guide to Geodatabase Zeiler M. (2000). Modeling Our World: The ESRI Guide to Geodatabase
Design, ESRI Press. Design, ESRI Press.
90 90

