
ITU A|Z • Vol • No • March

Web scraping and mapping urban data to support urban design decisions

Elif ENSARİ¹, Bilge KOBAŞ²

¹ elifensari@gmail.com • Architectural Design Computing Program, Istanbul Technical University, Istanbul, Turkey
² bilge@super-eight.com • Bits'n Bricks Analytical Research Center, Istanbul, Turkey

Received: October • Final Acceptance: November 2017

doi: 10.5505/itujfa.2018.40360

Abstract
Cities generate data at increasing speed, volume and variety, and advances in technology make this data easier to access and process every day. Consequently, the potential for this data to feed back into the city to improve living conditions and the efficiency of resource use grows. Departing from this potential, this paper presents a study that proposes methods to collect and visualize urban data with the aim of supporting urban design decisions. We employed web scraping techniques to collect a variety of publicly available data within the Kadıköy municipal boundaries of Istanbul and utilized a visual programming software to map and visualize this information. Through this method and the superposition of our resulting maps, we visually communicate urban conditions including demographic and economic trends based on online real estate listings, as well as the spatial distribution and accessibility of public and commercial resources. We propose that this method and the resulting visualizations present valuable potential in supporting urban design decision-making processes.

Keywords
Geospatial data, Urban data, Urban design decision support, Web scraping.

1. Introduction
Data is becoming more relevant for urban design as it becomes more ubiquitously available and accessible through the advance of technology and as it is more widely incorporated in the decision-making processes regarding the urban environment. Recent years have seen the proliferation of big data as well as research concerned with its integration into the analysis, management and design of cities. Numerous academic, civic and commercial research centers have been founded dedicated to studying the urban environment through information generated by the cities (Batty). While the scale, variety and production speed of data (the 3 Vs) increase every day, its availability, accessibility and communication to designers, decision makers and the general public remain a critical issue. This is due to the range of skills required to access and process the data in order to make it relevant and meaningful.

The aim of our study is to present an urban data collection and visualization methodology which can help urban designers produce maps to communicate various urban patterns to stakeholders and decision makers in order to support urban design decisions, as well as to communities to better inform participatory processes. We also present a set of maps produced following this workflow, along with some basic geospatial analyses we have performed to create them. The questions we take on to answer through the study presented in this paper are as follows.
• How can sources of publicly available data with geolocation information be relevant and useful to urban designers?
• How can this data be gathered and visualized?
• What kind of knowledge can be generated through the visualization of this data that may be presentable to decision makers even before any geostatistical analysis is carried out?

The study is unique in that we present a methodology to compile a rich dataset which we propose to be relevant for urban design decision making processes, consisting of information drawn from multiple sources rather than studying a single source of data and its relevance for a specific urban issue. We also present a visualization of each data set with some basic spatial analysis. Through this, we aim to aid quick and easy visual analysis and better communication of this data to designers and decision makers. Furthermore, the Kadıköy region of Istanbul has never been subject to such a comprehensive study in terms of the variety of urban data and data sources utilized as well as the techniques adopted for mapping and visualization. The data collection was done utilizing Python scripting and all mapping, visualization and visual analysis processes were done using the visual programming environment Grasshopper for Rhino 3D.

Our research has revealed various urban patterns that became apparent through the maps we have created utilizing the data collected from multiple geolocated urban data sources. The two most significant cases that we observed to have potential to contribute to urban design decision dialogues are the exposition of public and commercial resource distribution, and the demographic and economic trends read through the various maps we present in the following sections.

We aim to develop this study by integrating more comprehensive analyses into our workflow in future research.

2. Literature review
While cities around the world are growing in size and population every day, digital devices and systems are becoming more and more integral to their functioning in parallel with technological advancements. This results in the generation of massive amounts of data in the urban environment, which have already been subject to a large body of research motivated by the belief that cities can learn from this data and become 'smarter' in order to provide better living conditions for their residents while utilizing resources more efficiently (Batty et al.; Townsend). Mobility patterns of urban dwellers drawn from digital public transportation ticketing systems, cellular phones or cameras detecting traffic patterns, GIS data compiled and made available to the public by governments, various use and activity data shared through


personal mobile devices and publicly available social media content are some of many examples of such data.

A part of these studies in urban data focuses on gathering, analyzing and visualizing data which is inherently linked to geolocation information, namely geospatial big data (Lee & Kang), making it possible to map and observe geospatial patterns of urban phenomena. While such studies often bring together geographers and computer scientists based on interest and the technical skills required, urban planners, designers and architects also see value in this data, which can provide precious insight on the functioning of cities and support urban decisions in order to improve the quality of the urban environment (Glaeser, Duke-Kominers, Luca, & Naik).

One of the most popular subjects of study of dynamic urban data is real estate rental and sales prices, the mapping and geographical analysis of which allow for understanding location driven contributors to property values (Cohen & Coughlin; Geoghegan, Wainger, & Bockstael; Waddell, Berry, & Hoch). More recently, studies utilize housing price information from public real-estate websites (Bency, Rallapalli, Ganti, & Srivatsa; Boeing & Waddell), facilitating the use of very large and up to date datasets, which allows for the development of more accurate housing price prediction models. One such study finds correlations of housing prices with geo-tagged social media content (Li, Ye, Lee, Gong, & Qin), which is another type of data that has been subject to urban analysis research as an indicator of urban activity (Jenkins, Croitoru, Crooks, & Stefanidis; Schreck & Keim). Walking, running and other forms of exercise data tracked and shared by urban residents have also been studied to better understand qualities of preferred locations and routes for recreational activity in the urban environment (Balaban & Tuncer; Clarke & Steele).

There are various methods and techniques which allow for the management of big data, yet the means of its collection and visualization are the main subject of our interest in this research. As a method to draw data from public web sites, web scraping, which we utilize in our study, involves automated means to query and download large sets of geospatial data. Boeing and Waddell utilize this technique to analyze real-estate listings in the US and are able to compare eleven million listings all over the country with government declared fair market rents in metropolitan areas. APIs (application programming interfaces) provided by many online platforms such as Google, Instagram, Facebook, Flickr, Foursquare or Twitter make it possible to query and download user generated content, which becomes interesting for urban researchers when the data is accessed together with geolocation information. Among several recent studies that utilize social network data to better understand urban patterns, Cranshaw and colleagues detect areas of concentrated urban activity ("Livehoods") using Foursquare check-ins, and Jiang, Alves, Rodrigues, Ferreira, & Pereira estimate land use based on Yahoo's point of interest (POI) data with a higher accuracy than with traditional methods. Chen provides a good overview of data collecting methods from social media platforms including web scraping and the use of APIs that were also utilized in our study.

All geospatial data is already big data (Lee & Kang), and velocity, variety and volume are the 3 Vs considered to define big data. While collecting, managing and analyzing such rapidly generated large scales of data already pose a significant challenge, visualizing and communicating it is utmost critical in deriving meaning and value from geospatial data in order for cities to learn from it. There are several tools, from geographical information systems (GIS) and building information models to virtual simulation environments, that make available various visualization techniques (Pettit et al.) for geospatial data.

A project that demonstrates the power of visualization of geospatial data, "Million Dollar Blocks" (Kurgan), maps the home addresses of criminals as they are admitted into prison as well as the length of their stay in five of the largest cities in the US and reveals



Figure 1. The workflow.
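
As an illustration of the first steps of the workflow in Figure 1 (downloading data in JSON and converting it to CSV), the sketch below flattens a JSON array into CSV text using only Python's standard library. The field names (`id`, `lat`, `lng`, `price`, `sqm`) and the sample payload are assumptions made for this example; they are not the schema of any website used in the study.

```python
import csv
import io
import json

# Hypothetical payload; the source websites' real schemas are not given in the paper.
SAMPLE_JSON = (
    '[{"id": 1, "lat": 40.99, "lng": 29.03, "price": 2500, "sqm": 100},'
    ' {"id": 2, "lat": 40.98, "lng": 29.05, "price": 1800, "sqm": 60}]'
)

# Assumed column order for the CSV file.
FIELDS = ["id", "lat", "lng", "price", "sqm"]


def json_to_csv(json_text, fields=FIELDS):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


csv_text = json_to_csv(SAMPLE_JSON)
```

A CSV file written this way can then be read by any tool that imports tabular data; how the authors' own scripts were organized is not described at this level of detail.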


a remarkable concentration of these addresses around a few city blocks. One of the project's most striking visuals maps the money spent by the state on each criminal to their home address, displaying the millions of dollars spent on criminals coming from within just four blocks in Brownsville, Brooklyn. A study conducted in a newly activating area of Amsterdam, The Knowledge Mile (Niederer, Colombo, Mauri, & Azzi), presents geographical maps of companies and their online hyperlinks, most shared images and most shared locations using data gathered from the online platforms Google Search, Panoramio, Instagram and Foursquare. Making apparent the online presence and character of the urban axis in the focus of the research, these maps were later shared with local stakeholders in participatory design sessions and reportedly initiated conversation and further field work.

Having various similarities with the research presented above, our study presents multiple layers of geographically linked urban data drawn from dynamic sources of public web content, mapped to reveal patterns of urban activity and the distribution of commercial and public facilities. We propose that these visualizations, created through basic spatial analysis and mapping techniques, allow for human visual assessment of a rich set of urban data, which is deemed as valuable as automated analytical processes (Schreck & Keim).

3. Methodology
Our process involves the following steps: drawing of data from online sources, organizing the data, performing mathematical operations to calculate some additional values on the database, mapping the data based on geolocation information, and performing basic geometrical analysis and visualization. Our workflow is presented in Figure 1.

We utilized multiple sources of publicly available data from within the Kadıköy municipal boundaries of Istanbul, to draw already geo-tagged data or data with address information that we converted to latitude and longitude values through Google's Geocode API. While mapping and visualizing the raw data we collected, we applied different levels of intervention, from simply positioning the circles at geographical locations and scaling them based on their values within the domain of the data set or connecting the geolocation



Figure 2. Unit rental prices (rent price per sqm).
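
A map such as Figure 2 rests on three small computations: a price per square meter for each listing, the exclusion of outliers, and a circle radius proportional to the unit price. The sketch below shows one way to do this in plain Python. The paper does not state which outlier rule was used, so the interquartile-range rule, the `k` factor and the `max_radius` value here are assumptions for illustration only.

```python
def unit_prices(listings):
    """Price per square meter for every listing with a positive size."""
    return [item["price"] / item["sqm"] for item in listings if item["sqm"] > 0]


def drop_outliers(values, k=1.5):
    """Keep values within k * IQR of the quartiles (an assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[n // 4]
    q3 = ordered[(3 * n) // 4]
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def scale_radii(values, max_radius=10.0):
    """Radius proportional to value; the largest value gets max_radius."""
    top = max(values)
    return [max_radius * v / top for v in values]


# Hypothetical listings; the last one is an obvious data-entry error.
listings = [
    {"price": 2000, "sqm": 100},
    {"price": 2500, "sqm": 100},
    {"price": 3000, "sqm": 100},
    {"price": 2200, "sqm": 100},
    {"price": 2800, "sqm": 100},
    {"price": 50000, "sqm": 100},
]
kept = drop_outliers(unit_prices(listings))
radii = scale_radii(kept)
```

Scaling against the maximum keeps all circles within a known size on the map; scaling against a fixed reference price would instead make maps of different areas directly comparable.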


points to create routes and superposing them, to performing surface to surface intersections to calculate areas of influence or accessibility of amenities, automated through custom code.

Our data comprises real estate rental and sales prices, property square meters, property bedroom numbers, ages of buildings, uses of buildings, social media activity, public park boundaries, commercial facilities, public transportation and other public facilities.

For the data collection phase, we use the Python programming language to send requests composed with Postman to the source websites' servers and generally download the data in JSON format. We then convert the data to CSV (comma separated value) format and import it to the Grasshopper visual programming environment, an add-on for Rhino 3D, using the Meerkat plugin. Here we process the values within the database or run simple spatial analysis after converting them to geometrical entities. All the steps carried out in the Grasshopper environment besides the final graphic touch-ups are automated so that they are easily repeatable for different locations in future studies.

Below we present the details of our methods, grouped based on the sources of data and the themes of the maps that were created.

3.1. Housing and real estate
We utilized one of the most widely used real estate listings websites in Turkey, with active members and monthly visits both numbering in the millions, that allows for posting and managing of apartment rental and sales listings by both the owner of the property as well as real estate agents. The website has a map interface through which we were able to draw the geolocation, size (in sqm), number of rooms and comments of the listing owner of each apartment in JSON format. Then we sent a request to the page of each listing to draw further text based information including the type of heating, additional amenities and whether the listing was posted by the owner or the real estate agent. The text data was parsed using the Beautiful Soup library for Python and converted into CSV format with the csv library for Python. The CSV files were then imported to Grasshopper and processed to calculate the rental and sales prices per sqm and create maps (Figures 2, 3). In these maps, the radius of each circle positioned at the geolocation of the



Figure 3. Sales prices per sqm.

Figure 4. Number of bedrooms for each house listing (rental and for sale).
listing is scaled proportionately with its price divided by its size in sqm, after the outliers were excluded from the dataset. The number of rooms per apartment and building age data were combined from both rental and sales listings and mapped with colors (Figures 4, 5). Payback values were calculated per square grid cell based on the average apartment rental and sales prices


Figure 5. Ages of the buildings of the listings (rental and for sale).

Figure 6. Payback time.
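
The payback map in Figure 6 is described as a per-grid-cell value derived from average sales and rental prices per square meter. A minimal sketch of such a calculation follows. The exact formula is not given in the text, so the ratio used here (average sales price per sqm divided by average monthly rent per sqm, giving months) is an assumption, as are the planar coordinates in meters and the `cell_size` value.

```python
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Index of the square grid cell containing the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def cell_averages(rows, cell_size):
    """Average the third item of (x, y, value) rows per grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in rows:
        acc = sums[cell_of(x, y, cell_size)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def payback_by_cell(sales, rentals, cell_size):
    """Assumed payback in months: average sale price / average monthly rent,
    both per sqm, for cells that contain both kinds of listing."""
    sale_avg = cell_averages(sales, cell_size)
    rent_avg = cell_averages(rentals, cell_size)
    return {
        cell: sale_avg[cell] / rent_avg[cell]
        for cell in sale_avg
        if cell in rent_avg and rent_avg[cell] > 0
    }


# Hypothetical listings as (x in m, y in m, price per sqm).
sales = [(10, 10, 6000), (20, 30, 8000), (300, 10, 5000)]
rentals = [(50, 50, 20), (60, 40, 30), (310, 20, 25)]
payback = payback_by_cell(sales, rentals, cell_size=100)
```

Cells with only sales or only rental listings are left out rather than filled with a guess, which keeps sparse areas visibly empty on the map.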


per sqm available in each grid square (Figure 6).

A GIS file created by Istanbul Metropolitan Municipality was used to map the residential/non-residential building use information, with the building footprints colored using a gradient based on the proportion of the number of non-residential units to the residential ones for each building (Figure 7).


Figure 7. Residential and non-residential uses in buildings.

Figure 8. Circle grid system for Instagram queries.
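
The circle grid in Figure 8 addresses a common constraint of location-based queries: each request returns results only within a radius of a point, so the query points must be arranged to leave no gaps. The sketch below generates such points for a rectangle, assuming planar coordinates in meters and a square lattice; the lattice type and spacing actually used in the study are not stated, so this is one possible arrangement rather than the authors' own.

```python
import math


def circle_grid(min_x, min_y, max_x, max_y, radius):
    """Centers of circles of the given radius that together cover the rectangle.

    Centers sit in the middle of square cells with side radius * sqrt(2),
    so the farthest corner of each cell is exactly one radius away.
    """
    step = radius * math.sqrt(2)
    nx = max(1, math.ceil((max_x - min_x) / step))
    ny = max(1, math.ceil((max_y - min_y) / step))
    centers = []
    for i in range(nx):
        for j in range(ny):
            centers.append((min_x + (i + 0.5) * step, min_y + (j + 0.5) * step))
    return centers


# Hypothetical 1000 m x 500 m area queried with a 100 m radius.
centers = circle_grid(0, 0, 1000, 500, 100)
```

Neighboring circles necessarily overlap in this arrangement, which is why results downloaded from adjacent query points contain duplicates that have to be removed afterwards.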


3.2. Social media
Instagram was the source of social media data we utilized for this study, as it is known to be favored over Twitter and other forms of social media in Turkey, given its reported number of users in the country (Statista.com). Python code and Instagram's own API were used to query a fixed number of posts at a time, from within a fixed radius of points arranged to cover all



Figure 9. Cafes, Instagram posts and their like numbers.
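
Before a map like Figure 9 can be drawn, posts downloaded more than once from overlapping query circles need to be removed, and very high like counts need to be limited so that a few posts do not dominate the circle sizes. A minimal sketch of both steps is given below; the `id` and `likes` keys and the cap value are assumptions for illustration, not the fields or threshold used in the study.

```python
def clean_posts(posts, like_cap):
    """Drop repeated post ids (keeping the first seen) and cap like counts."""
    seen = set()
    cleaned = []
    for post in posts:
        if post["id"] in seen:
            continue  # same post returned by an overlapping query circle
        seen.add(post["id"])
        cleaned.append({**post, "likes": min(post["likes"], like_cap)})
    return cleaned


# Hypothetical download with one duplicate and one unusually popular post.
posts = [
    {"id": "a", "likes": 10},
    {"id": "b", "likes": 5000},
    {"id": "a", "likes": 10},
]
cleaned = clean_posts(posts, like_cap=1000)
```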


the area within the municipal boundaries of Kadıköy (Figure 8). Besides the geolocation data, the number of likes and the text accompanying each image were drawn. The duplicate posts downloaded due to inevitable overlaps of circles were cleaned up. Posts liked more than a cutoff number of times were counted at a fixed value due to their low frequency, and posts within a short distance of each other were combined by adding up their number of likes for the sake of a more legible representation. A map was created with the scale of each circle representing the like numbers per post point (Figure 9).

The text data drawn from the Instagram posts were analyzed using a custom definition separating each word using Python's split function, defining the space sign " " as a delimiter, and hashtags were appended to the data of each post in the CSV file. Then the frequency of each hashtag was calculated per square grid cell and mapped (Figure 10).

3.3. Public and commercial amenities
Using the Google Places API, we drew the geolocations and ratings of cafés and restaurants which were marked on Google Maps. Similarly to the method with which we drew Instagram posts, we queried points with a fixed radius and recorded a fixed number of places at a time. Then we cleaned up the duplicates and created a simple point-location map.

One of our resources was a widely used food delivery portal, Yemeksepeti.com, with a user base in the millions, that directs orders from users to member restaurants in Turkey. Each item on a restaurant's menu is displayed with its price; however, the restaurant's address is not provided on the website. First, we queried the geolocation information of each restaurant using Google's Geocode API. Then the prices of the most commonly sold items, which were a can of Coke and a cup of Ayran (a yogurt based Turkish beverage), were queried. We also drew the hours of operation for each restaurant. The website does not provide the menu items in a database format, therefore a custom Python web scraper script was developed which searches for all restaurants delivering to the area and then follows each of their links to access their menus. The Beautiful Soup library was used to convert the HTML formatted site into a formatted spreadsheet, the prices were converted to the scales of circle radii distributed



Figure 10. Hashtag map.
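
The hashtag map in Figure 10 is based on splitting post captions on spaces and counting the resulting hashtags. The sketch below shows this with Python's built-in `str.split` and `collections.Counter`. Lowercasing the hashtags is an assumption added here so that variants such as `#Moda` and `#moda` are counted together; the text does not say whether the original definition did this.

```python
from collections import Counter


def extract_hashtags(caption):
    """Split on single spaces and keep the tokens that start with '#'."""
    return [
        word.lower()
        for word in caption.split(" ")
        if word.startswith("#") and len(word) > 1
    ]


def hashtag_counts(captions):
    """Frequency of each hashtag over a collection of captions."""
    counts = Counter()
    for caption in captions:
        counts.update(extract_hashtags(caption))
    return counts


# Hypothetical captions for the posts that fall into one grid cell.
captions = ["Sunset at #Moda #kadikoy", "coffee #kadikoy", "no tags here"]
counts = hashtag_counts(captions)
```

Splitting on spaces alone leaves hashtags that are written without separating spaces (for example `#a#b`) as a single token, which is one of the sources of noise in free-form social media text.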


on the map based on the geolocation of each restaurant, and the number of operating hours were converted to color codes to be visualized on the map (Figures 11, 12).

3.4. Transportation and access
We drew the geolocation data of stops for buses and metrobuses (a rapid bus transit line using dedicated bus lanes for much of the routes) from the website of IETT (the general transportation department of the City of Istanbul), as well as the number of times a bus or metrobus passes through each stop, to create routes on a map (Figure 13). We used their frequency data to define the thickness of the route lines on our map.

Then for each building centroid, we computed the number of public transport seats accessible within three radii. For this, we calculated the number of stops within these circles first, and then multiplied the seated passenger capacity of each vehicle with its frequency throughout the day. Without drawing the route lines, we did a similar calculation for ferries and subway lines, and added up all the seated passenger capacities within the mentioned radii from each building centroid. We illustrated these numbers through a gradient of colors for each building, representing a scale of accessibility to public transportation (Figure 14).

Another map we created aims to visualize the accessibility of recreational green spaces from buildings. For this, we utilized the geographically located outlines of public parks drawn from the GIS files and made surface intersections of them with the circles of three radii from



Figure 11. Soft drink prices.

Figure 12. Operating hours of the restaurants.


each building centroid. The areas of intersecting regions gave us a domain of accessible green space in sqm that we mapped as a gradient of colors for each building polygon (Figure 15).

We drew the exercise routes of Kadıköy residents from an activity tracking social network that allows for sharing of walking, running, biking and swimming activity data. Superposing these routes revealed


Figure 13. Bus and metrobus lines.

Figure 14. Accessibility to public transportation seats.
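
The accessibility measure behind Figure 14 sums, for each building centroid, the seated capacity of the vehicles serving the stops within a given radius, weighted by how often they pass. A minimal sketch under stated assumptions follows: stops are dictionaries with hypothetical `lat`, `lon`, `seats` and `passes_per_day` keys, distances are straight-line great-circle distances, and the radius is a free parameter rather than one of the values used in the study.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(a))


def accessible_seats(centroid, stops, radius_m):
    """Sum of seats * daily passes over the stops within radius_m of the centroid."""
    lat, lon = centroid
    return sum(
        stop["seats"] * stop["passes_per_day"]
        for stop in stops
        if haversine_m(lat, lon, stop["lat"], stop["lon"]) <= radius_m
    )


# Hypothetical stops roughly 110 m and 1.1 km north of the building.
stops = [
    {"lat": 40.991, "lon": 29.03, "seats": 30, "passes_per_day": 100},
    {"lat": 41.000, "lon": 29.03, "seats": 50, "passes_per_day": 20},
]
building = (40.99, 29.03)
near = accessible_seats(building, stops, radius_m=400)
far = accessible_seats(building, stops, radius_m=1500)
```

As the paper itself notes, a radius ignores the street network, so the result is an upper bound on what is reachable on foot within that distance.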


the most popular streets and in one case a swimming route, as well as the approximately estimated geographic locations of the buildings, residents of which use these routes most frequently (Figure 16).

4. Observations and urban design decision support
We propose that the set of maps produced by our automated workflow are powerful in demonstrating how urban


Figure 15. Accessibility to green spaces.
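
The green space measure in Figure 15 is the area of park surface that falls inside a circle around each building centroid. The study computes this with surface intersections in Grasshopper; the sketch below swaps in a simpler grid-sampling approximation that needs only Python's standard library. Planar coordinates in meters, the polygon given as a list of vertices and the `step` resolution are all assumptions of this example.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def accessible_green_area(centroid, radius, park, step=5.0):
    """Approximate area of the park inside the circle by sampling cell centers."""
    cx, cy = centroid
    n = int(math.ceil(2 * radius / step))
    area = 0.0
    for i in range(n):
        for j in range(n):
            x = cx - radius + (i + 0.5) * step
            y = cy - radius + (j + 0.5) * step
            in_circle = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if in_circle and point_in_polygon(x, y, park):
                area += step * step
    return area


# Hypothetical 100 m x 100 m park and a building at its center.
park = [(0, 0), (100, 0), (100, 100), (0, 100)]
whole_park = accessible_green_area((50, 50), 200, park)
no_park = accessible_green_area((1000, 1000), 200, park)
```

A smaller `step` gives a closer approximation at the cost of more samples; an exact polygon intersection, as in the original workflow, avoids this trade-off.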

Figure 16. Exercise routes.


data can be basically organized and visualized to communicate meaningful observations and aid urban design decisions.

Although the quantitative results we aim to generate in future studies through geospatial analysis of this data will advance this research significantly, we believe that the current stage of the study we present here is as useful in revealing the relevance of geospatial



information for urban design. Some of our observations at this stage of our research are presented below. We also present ways in which the visual information revealed in these maps can support urban planning and design decisions following each of our observations.

There is a concentration of apartments for sale and higher sales prices along the coast and a concentration of rentals of particular unit sizes on the west of Söğütlüçeşme, as the housing and real estate maps show. This is informative about the demographics of residents around these neighborhoods and can guide the municipality in deciding on the kind of investments they should make, be they co-working spaces, libraries, professional education facilities, kindergartens or retirement homes. Our Instagram hashtags and soft drink prices maps (Figures 10, 11) similarly contribute to revealing the demographics of the local residents and visitors, as well as providing a glimpse of their consumer trends.

Urban transformation, which is encouraged by the policy to renew the non-earthquake resistant housing stock of Istanbul, is made apparent in our maps displaying the ages of buildings (Figure 5), which show the geographic distribution of recent construction. The rapid transformation within the Caddebostan area in the last couple of years is revealed, which may be obvious to the neighborhood residents, but no known maps exist with updated information available to the public. The local authorities can utilize this information not only for understanding areas where transformation is slow, but also where it is rapid, and they can work with urban designers to take precautions against disadvantageous consequences of such a high density of construction sites and the rapid change of the built environment for the local residents.

Together with the cafes & Instagram likes map (Figure 9), our residential-nonresidential buildings map (Figure 7) reveals the main commercial axes and the mixed-use pattern in Kadıköy, which is known to be an indicator influencing the walkability of a neighborhood. The operating hours of restaurants (Figure 12) influence the liveliness and feeling of safety on a street, contributing to the "eyes on the street," a concept first proposed by Jane Jacobs. The geographical distribution of commercial amenities and social media activity can inform urban decisions of zoning and investment in new facilities or services by revealing if the existing facilities are sufficient or attractive to urban dwellers and how commercially active a street is throughout the day. Areas that are lacking in such amenities can be prioritized for making changes in planning to encourage a more vibrant street life and commercial activity on their streets.

Our public transportation and access maps (Figures 13, 14, 15) visually present one of the most common indicators of walkability for urban neighborhoods. Although the distance from each building was not measured through the street network but simply calculated based on a radius, the map showing access to public transportation seats provides an idea of how democratically distributed the transportation facilities are within the municipal boundaries of Kadıköy. Access to green space, for which we created a map (Figure 15), is a similarly significant contributor to walkability and quality of life. Our exercise routes map (Figure 16) gives an idea of which residential locations have better access to recreational activities within the city. Besides distance, we argue that this map may reveal additional contributors to the exercise behavior in the city such as the pleasantness, comfort and safety of these routes, as well as their accessibility from different neighborhoods.

Municipalities and central authorities can manage the allocation of their resources based on information regarding the distribution of public services and facilities easily observable through our maps demonstrating access to public transport, green areas and routes used for sports activity. Since residential neighborhoods less fortunate in terms of access to public transportation can quickly be identified in these maps, they can encourage local residents to demand an increase of routes and more frequent services from transportation authorities. Access to green areas and the shoreline



which is obviously attractive for exercisers can be enhanced by new bike lanes and improved street conditions for better walkability, especially in areas that link the northern parts of Kadıköy to these areas.

We present these observations and suggestions for urban designers through the case study of Kadıköy; however, we aim for this study and these maps to constitute a model for revealing and communicating information that can guide urban decisions in improving the street life, commercial activity, local development and life quality of urban residents through better allocation and more democratic geographic distribution of resources. Without doubt, each urban area from each country is unique in many aspects that also contribute to the urban qualities that we propose can be improved through our observations, thus all data should be mapped and interpreted with various such differences in mind to be relevant in supporting urban decisions.

5. Discussion
We do not present any geostatistical analysis of urban data in this paper, but rather propose this research as a preliminary step to be followed by rigorous spatial analyses in studies to follow. Nevertheless, we propose the methodology we present for urban data collection and visualization as a guide for designers to better understand and utilize means to gather and process data relevant for urban design decision support.

We acknowledge that there may be possible errors in information provided by the listing owners regarding the real-estate data we utilize. Voluntarily generated data may also cause inaccuracies due to the free data-entry format, as in the case of social media entries. Social media users tend to create their own use of language, abbreviations, and personal methods of using hashtags (Schreck & Keim), or simply make misspellings. Furthermore, Turkish is an agglutinative language and multiple suffixes make it unfavorable for text based analytics.

Additionally, the sources we have used for this research are limited: there are several websites and applications dedicated to similar purposes, and at this stage of the study, the datasets were created using the most widely used website and/or application in their respective category. While this approach gave us a strong initial start, additional sources may be added to the datasets in further research. In such a case, a common database format should be created, as the type of data retrieved from different sources will also begin to differ.

Final discussion remarks include copyright issues, as almost all of the websites used in these studies limit reproduction and re-distribution of the data for commercial use in their legal user agreements. Boeing and Waddell note a case where a federal US court decided that web scraping publicly available data did not constitute a copyright violation. Even though non-commercial research does not fall under legal restrictions, only a handful of the websites and applications make the data collecting process openly available to developers and researchers. This means that the sources without developer access may easily limit or restrict our access in the future by making changes to their web structures.

6. Conclusion
Nowadays, a significant portion of public and commercial activity takes place on online social networks. We swim in an enormous pool of data streaming all around us. The tweets, selfies, check-ins, likes, ignores, pokes, snaps, listings, stars and user comments constantly keep updating and adding to this pool. Mostly consisting of unorganized, unfiltered and uncategorized raw data, this pool is the ultimate collective subconscious of our society. It has the information on what people like doing; where, when, how; how they travel; how they spend their time; how they speak to each other; what their share of the economy is; how they take what was designed and how they are transformed by that. This is a valuable source of information for anyone interested in exploring the geospatial past, current time and future in great detail.

Although urban designers and authorities already make use of the data available to them, traditional data collecting methods tend to be more static



and data is not so easy to update, whereas spatial information is exponentially more valuable when the time layer is included in the equation. Deriving the data from constantly self-updating sources, and establishing a workflow that collects, organizes, and processes this data ensures access to up to date scenarios at all times. Additionally, the accumulation of this data creates a spatio-temporal archive in time, which would illustrate the development and evolution of urban use patterns.

Through this research we have integrated different computational approaches together to create an automated workflow that draws data from online sources and visualizes them on maps. A major part of this workflow can be utilized to draw data from various other sources with minor tweaks in the query code. We present multiple layers of urban data that we believe to be relevant for aiding urban design decisions by revealing various urban use patterns and the distribution of public and commercial amenities and services within a neighborhood. We visualize our data layers on maps which we propose to constitute a basis for further study where geospatial statistical analyses will be performed to achieve quantitative results.

We believe that with the further development of web technologies and integration of mobile devices into our lives, the definition of a city and what it encompasses is already being redefined fundamentally. Therefore, approaches to urban analysis and design must be re-evaluated and become as dynamic as our urban environments already have.

Acknowledgements
The authors would like to thank Can Kadir Sucuoğlu and Cemal Koray Bingöl for their outstanding contributions in this study.

References
Balaban, O., & Tuncer, B. Visualizing Urban Sports Movement. In Complexity & Simplicity - Proceedings of the 34th eCAADe Conference. Oulu, Finland: eCAADe and University of Oulu.
Batty, M. Urban Informatics and Big Data. London.
Batty, M., Axhausen, K., Giannotti, F., Pozdnoukhov, A., Bazzani, A., Wachowicz, M., Ouzounis, G., Portugali, Y. Smart cities of the future. European Physical Journal: Special Topics.
Bency, A., Rallapalli, S., Ganti, R., Srivatsa, M., & Manjunath, B. Beyond Spatial Auto-Regressive Models: Predicting Housing Prices with Satellite Imagery. In IEEE Winter Conference on Applications of Computer Vision. Santa Rosa, CA, USA: IEEE.
Boeing, G., & Waddell, P. New Insights into Rental Housing Markets across the United States: Web Scraping and Analyzing Craigslist Rental Listings. Journal of Planning Education and Research.
Chen, N. C. Urban Data Mining: Social Media Data Analysis as a Complementary Tool for Urban Design. MIT.
Clarke, A., & Steele, R. How personal fitness data can be re-used by smart cities. In 7th International Conference on Intelligent Sensors, Sensor Networks and Information Processing. Adelaide, SA, Australia: IEEE.
Cohen, J. P., & Coughlin, C. C. Spatial hedonic models of airport noise, proximity, and housing prices. Journal of Regional Science.
Cranshaw, J., Schwartz, R., Hong, J. I., & Sadeh, N. The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. Palo Alto, CA, USA: The AAAI Press.
Geoghegan, J., Wainger, L. A., & Bockstael, N. E. Analysis: Spatial landscape indices in a hedonic framework: an ecological economics analysis using GIS. Ecological Economics.
Glaeser, E. L., Duke-Kominers, S., Luca, M., & Naik, N. Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life. HKS Faculty Research Working Paper Series. Cambridge, MA.
Jacobs, J. The Death and Life of Great American Cities. New York,


NY, USA: Random House.
Jenkins, A., Croitoru, A., Crooks, A. T., & Stefanidis, A. Crowdsourcing a collective sense of place. PLoS ONE, 11.
Jiang, S., Alves, A., Rodrigues, F., Ferreira, J., & Pereira, F. C. Mining point-of-interest data from social networks for urban land use classification and disaggregation. Computers, Environment and Urban Systems, 53.
Kurgan, L. Million Dollar Blocks. In Close up at a distance: mapping, technology, and politics. Zone Books.
Lee, J. G., & Kang, M. Geospatial Big Data: Challenges and Opportunities. Big Data Research.
Li, S., Ye, X., Lee, J., Gong, J., & Qin, C. Spatiotemporal Analysis of Housing Prices in China: A Big Data Perspective. Applied Spatial Analysis and Policy.
Niederer, S., Colombo, G., Mauri, M., & Azzi, M. Street-level City Analytics: Mapping the Amsterdam Knowledge Mile. In Hybrid City 2015: Data to the People. URIAC.
Pettit, C., Widjaja, I., Russo, P., Sinnott, R., Stimson, R., & Tomko, M. Visualisation support for exploring urban space and place. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, I-2.
Schreck, T., & Keim, D. Visual Analysis of Social Media Data. Computer, 46.
Peng, D., Biagi, L., Ito, U. Leading countries based on number of monthly active Instagram users as of 1st quarter. Retrieved October, from http://www.statista.com
Townsend, A. M. Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia. New York: W. W. Norton & Company, Inc.
Waddell, P., Berry, B. J. L., & Hoch, I. Residential property values in a multinodal urban area: New evidence on the implicit price of location. The Journal of Real Estate Finance and Economics, 7.
Yemeksepeti'nden lezzet almanağı. Retrieved October, from http://www.yemeksepeti.com/

