Professional Documents
Culture Documents
Web Scraping
Web Scraping
BSDIt
Abstract
Cities generate data in increasing speed, volume and variety which is more eas-
ily accessed and processed by the advance of technology every day. Consequently,
the potential for this data to feedback into the city to improve living conditions
and efficiency of utilizing resources grows. Departing from this potential, this pa-
per presents a study that proposes methods to collect and visualize urban data
with the aim of supporting urban design decisions. We employed web scraping
techniques to collect a variety of publicly available data within the Kadıköy mu-
nicipal boundaries of Istanbul and utilized a visual programming software to map
and visualize this information. Through this method and superposition of our re-
doi: 10.5505/itujfa.2018.40360
Keywords
Geospatial data, Urban data, Urban design decision support, Web scraping.
6
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
7
personal mobile devices and publicly to draw data from public web sites, web
available social media content are some scraping which we utilize in our study
of many examples of such data. involves automated means to query and
" QBSU PG UIFTF TUVEJFT JO VSCBO download large sets of geospatial data.
data focuses on gathering, analyzing #PFJOHBOE8BEEFMM
VUJMJ[FUIJT
and visualizing data which is inher- technique to analyze real-estate listings
ently linked to geolocation informa- in the US and are able to compare elev-
tion, namely geospatial big data (Lee en million listings all over the country
,BOH
NBLJOH JU QPTTJCMF UP with government declared fair market
map and observe geospatial patterns of SFOUT JO NFUSPQPMJUBO BSFBT "1*T
urban phenomenon. While such stud- BQQMJDBUJPO QSPHSBNNJOH JOUFSGBDFT
ies often bring together geographers provided by many online platforms
and computer scientists based on in- such as Google, Instagram, Facebook,
UFSFTUBOEUFDIOJDBMTLJMMTSFRVJSFEVS- Flickr, Foursquare or Twitter make it
ban planners, designers and architects possible to query and download user
also see value in this data which can generated content, which becomes in-
provide precious insight on the func- teresting for urban researchers when
tioning of cities and support urban the data is accessed together with geo-
decisions in order to improve the qual- MPDBUJPO JOGPSNBUJPO "NPOH TFWFSBM
ity of the urban environment (Glaeser, recent studies that utilize social net-
%VLF,PNJOFST
-VDB
/BML
work data to better understand urban
One of the most popular subjects patterns, Cranshaw and colleagues
of study of dynamic urban data is real
EFUFDU BSFBT PG DPODFOUSBU-
estate rental and sales prices, mapping FE VSCBO BDUJWJUZ i-JWFIPPETw
VT-
and geographical analysis of which al- ing Foursquare check-ins and Jiang,
low for understanding location driven "MWFT
3PESJHVFT
'FSSFJSB
1FSFJSB
contributors to property values (Co-
FTUJNBUFMBOEVTFCBTFEPO:B-
IFO $PVHIMJO
(FPHIFHBO
IPPTQPJOUPGJOUFSFTU 10*
EBUBXJUI
8BJOHFS
#PDLTUBFM
8BEEFMM
a higher accuracy than with traditional
#FSSZ
)PDI
.PSF SFDFOUMZ
NFUIPET$IFO
QSPWJEFTBHPPE
studies utilize housing price informa- overview of data collecting methods
tion from public real-estate websites from social media platforms including
#FODZ
3BMMBQBMMJ
(BOUJ
4SJWBUTB
XFCTDSBQJOHBOEUIFVTFPG"1*TUIBU
#PFJOH 8BEEFMM
GB- were also utilized in our study.
cilitating the use of very large and up "MM HFPTQBUJBM EBUB JT BMSFBEZ CJH
to date datasets, which allows for the EBUB -FF,BOH
BOEWFMPDJUZ
development of more accurate hous- WBSJFUZ BOE WPMVNF BSF UIF 7T DPO-
ing price prediction models. One such sidered to define big data. While col-
study finds correlations of housing pric- lecting, managing and analyzing such
es with geo-tagged social media con- rapidly generated large scales of data
UFOU -J
:F
-FF
(POH
2JO
already pose a significant challenge,
which is another type of data that has visualizing and communicating it is ut-
been subject to urban analysis research most critical in deriving meaning and
as an indicator of urban activity (Jen- value from geospatial data in order for
kins, Croitoru, Crooks, & Stefanidis, cities to learn from it. There are several
4DISFDL,FJN
8BMLJOH
tools from geographical information
running and other forms of exercise TZTUFNT (*4
CVJMEJOH JOGPSNBUJPO
data tracked and shared by urban res- models to virtual simulation environ-
idents have also been studied to better ments that make available various visu-
understand qualities of preferred loca- BMJ[BUJPOUFDIOJRVFT 1FUUJUFUBM
tions and routes for recreational activ- for geospatial data.
JUZJOUIFVSCBOFOWJSPONFOU #BMBCBO " QSPKFDU UIBU EFNPOTUSBUFT UIF
5VODFS
$MBSLF4UFFMF
power of visualization of geospatial
There are various methods and tech- EBUB
i.JMMJPO %PMMBS #MPDLTw ,VS-
niques which allow for management of HBO
NBQT UIF IPNF BEESFTTFT
big data, yet means of its collection and of criminals as they are admitted into
visualization is the main subject of our prison as well as the length of their stay
JOUFSFTU JO UIJT SFTFBSDI "T B NFUIPE JOĕWFPG64TMBSHFTUDJUJFTBOESFWFBMT
Web scraping and mapping urban data to support urban design decisions
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
Web scraping and mapping urban data to support urban design decisions
Figure 4. Number of bedrooms for each house listing (rental and for sale).
listing is scaled proportionately with from both rental and sales listings and
its price divided by sqms, after the out- NBQQFEXJUIDPMPST 'JHVSFT
liers were excluded from the dataset. 1BZCBDL WBMVFT XFSF DBMDVMBUFE QFS
The number of rooms per apartment N CZ N BSFB CBTFE PO UIF BW-
and building age data were combined erage apartment rental and sales prices
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
11
Figure 5. Ages of the buildings of the listings (rental and for sale).
Web scraping and mapping urban data to support urban design decisions
12
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
Web scraping and mapping urban data to support urban design decisions
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
Web scraping and mapping urban data to support urban design decisions
16
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
17
Web scraping and mapping urban data to support urban design decisions
information for urban design. Some the liveliness and feeling of safety on
of our observations at this stage of our a street, contributing to the “Eyes on
research are presented below. We also the street,” a concept first proposed by
present ways in which the visual in- +BOF +BDPCT
ćF HFPHSBQIJDBM
formation revealed in these maps can distribution of commercial amenities
support urban planning and design and social media activity can inform
decisions following each of our obser- urban decisions of zoning and invest-
vations. ment in new facilities or services by re-
There is a concentration of apart- vealing if the existing facilities are suf-
ments for sale and higher sales prices ficient or attractive to urban dwellers
along the coast and a concentration and how commercially active a street is
PGSFOUBMTPGBOEVOJUTPOUIF UISPVHIPVUUIFEBZ"SFBTUIBUBSFMBDL-
XFTUPG4ÚʓàUMà±FʰNF 'JHVSFT
ing in such amenities can be prioritized
ćJT JT JOGPSNBUJWF BCPVU UIF EF- for making changes in planning to en-
mographics of residents around these courage a more vibrant street life and
neighborhoods and can guide the mu- commercial activity on their streets.
nicipality in deciding on the kind of Our public transportation and ac-
investments they should make, may DFTT NBQT 'JHVSFT
WJTVBM-
they be introducing co-working spac- ly present one of the most common
es, libraries, professional education indicators of walkability for urban
facilities, kindergartens or retirement OFJHICPSIPPET"MUIPVHIUIFEJTUBODF
homes. Our Instagram hashtags and from each building was not measured
TPę ESJOL QSJDFT NBQT 'JHVSFT through the street network but simply
TJNJMBSMZ DPOUSJCVUF JO SFWFBMJOH calculated based on a radius, the map
the demographics of the local residents showing access to public transpor-
and visitors, as well as providing a tation seats provides an idea of how
glimpse of their consumer trends. democratically distributed the trans-
Urban transformation which is portation facilities are within the mu-
encouraged by the policy to renew OJDJQBMCPVOEBSJFTPG,BELÚZ"DDFTT
non-earthquake resistant housing to greenspace that we created a map
stock of Istanbul is made apparent in GPS 'JHVSF
JTBTJNJMBSMZTJHOJĕDBOU
our maps displaying the ages of build- contributor to walkability and quality
JOHT 'JHVSF
TIPXJOHUIFHFPHSBQI- of life. Our exercise routes map (Figure
ic distribution of recent construction.
HJWFT BO JEFB PG XIJDI SFTJEFOUJBM
The rapid transformation within the locations have better access to recre-
Caddebostan area in the last couple of BUJPOBM BDUJWJUJFT XJUIJO UIF DJUZ #F-
years is revealed which may be obvious sides distance, we argue that this map
to the neighborhood residents but no may reveal additional contributors to
known maps exist with updated infor- the exercise behavior in the city such as
mation available to public. The local the pleasantness, comfort and safety of
authorities can utilize this information these routes, as well as their accessibili-
not only for understanding areas where ty from different neighborhoods.
transformation is slow, but also where .VOJDJQBMJUJFT BOE DFOUSBM BVUIPS-
it is rapid, and they can work with ities can manage the allocation of
urban designers to take precautions their resources based on information
against disadvantageous consequences regarding the distribution of public
of such a high density of construction services and facilities easily observ-
sites and the rapid change of the built able through our maps demonstrating
environment for the local residents. access to public transport, green ar-
Together with the cafes & Insta- eas and routes used for sports activity.
HSBN MJLFT NBQ 'JHVSF
PVS SFTJ- Since residential neighborhoods less
dential-nonresidential buildings map fortunate in terms of access to public
SFWFBM 'JHVSF
UIF NBJO DPNNFS- transportation can quickly be identi-
cial axes and the mixed-use pattern in fied in these maps, they can encourage
Kadıköy which is known to be an in- local residents to demand an increase
dicator influencing the walkability in of routes and more frequent services
a neighborhood. The operating hours GSPN USBOTQPSUBUJPO BVUIPSJUJFT "D-
PG SFTUBVSBOUT 'JHVSF
JOĘVFODF cess to green areas and the shoreline
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
which is obviously attractive for ex- es, and at this stage of the study, the
ercisers can be enhanced by new bike datasets were created using the most
lanes and improved street conditions widely used website and/or application
for better walkability especially in areas in their respective category. While this
that link the northern parts of Kadıköy approach gave us a strong initial start,
to these areas. additional sources may be added to the
We present these observations datasets in further research. In such a
and suggestions for urban designers case, a mutual database format should
through the case study of Kadıköy, be created as the type of data retrieved
however, we aim this study and these from different sources will also begin
maps to constitute a model for reveal- to differ.
ing and communicating information Final discussion marks include
that can guide urban decisions in im- DPQZSJHIU JTTVFT BT BMNPTU BMM PG UIF
proving the street life, commercial websites used in these studies limit re-
activity, local development and life production and re-distribution of the
quality of urban residents through data for commercial use in their legal
better allocation and more democratic VTFS BHSFFNFOUT #PFJOH BOE 8BEEFMM
geographic distribution of resources.
OPUFBDBTFXIFSFBGFEFSBM64
Without doubt, each urban area from court decided that web scraping pub-
each country is unique in many aspects licly available data did not constitute
that also contribute to the urban qual- a copyright violation. Even though
ities that we propose can be improved non-commercial research does not fall
through our observations, thus all data under legal restrictions, only a handful
should be mapped and interpreted with of the websites and applications make
various such differences in mind to be the data collecting process openly
relevant in supporting urban decisions. available to developers and researches.
This means that the sources without
5. Discussion developer access may easily limit or re-
We do not present any geostatistical strict our access in the future by mak-
analysis of urban data in this paper, but ing changes to their web structures.
rather propose this research as a pre-
liminary step to be followed by rigor- 6. Conclusion
ous spatial analyses in studies to follow. /PXBEBZT
B TJHOJĕDBOU QPSUJPO PG
/FWFSUIFMFTT
XFQSPQPTFUIFNFUIPE- public and commercial activity takes
ology we present for urban data collec- place on online social networks. We
tion and visualization as a guide for de- swim in an enormous pool of data
signers to better understand and utilize streaming all around us. The tweets,
means to gather and process data rele- selfies, check-ins, likes, ignores, pokes,
vant for urban design decision support. snaps, listings, stars, user comments
We acknowledge that there may be constantly keep updating and adding
possible errors in information provid- UP UIJT QPPM .PTUMZ DPOTJTUJOH PG VO-
ed by the listing owners regarding the organized, unfiltered and uncatego-
SFBMFTUBUF EBUB XF VUJMJ[F 7PMVOUBSJ- SJ[FESBXEBUBUIJTQPPMJTUIFVMUJNBUF
ly generated data may also cause in- collective subconscious of our society.
accuracies due to the free data-entry It has the information on what people
format, as in the case for social media MJLFEPJOHXIFSF
XIFO
IPXIPXUIFZ
entries. The social media users tend to USBWFMIPXUIFZTQFOEUIFJSUJNFIPX
create their own use of language, ab- UIFZ TQFBL UP FBDI PUIFS XIBU UIFJS
breviations, and personal methods of TIBSF PG FDPOPNZ JT IPX UIFZ UBLF
VTJOHIBTIUBHT 4DISFDL,FJN
what was designed and how they are
or simply make misspellings. Further- transformed by that. This is a valuable
more, Turkish is an agglutinative lan- source of information for anyone inter-
guage and multiple suffixes make it un- ested in exploring the geospatial past,
favorable for text based analytics. current time and future in great detail.
"EEJUJPOBMMZ
UIF TPVSDFT XF IBWF "MUIPVHI VSCBO EFTJHOFST BOE BV-
used for this research are limited: thorities already make use of the data
There are several websites and appli- available to them, traditional data col-
cations dedicated for similar purpos- lecting methods tend to be more static
Web scraping and mapping urban data to support urban design decisions
*56"];t7PM/Pt.BSDIt&&OTBSŔ
#,PCBʰ
21
Web scraping and mapping urban data to support urban design decisions