

Urban Planning and

Ivy Manalang
Vaishnavi Panga
Sarathi Prabu Mohan
David Gomez Camargo
This domain involves land use, transportation systems,
energy consumption, environmental impact, waste
management, and more.
As cities continue to grow in population and complexity,
effective planning is essential. Big data plays a key role
in tackling these challenges by providing technologies.
By 2050, the UN
projects that around 2/3 of the
global population will reside in
urban areas.
four V’s of
Big Data
Real-time data is crucial for making immediate
decisions and responding to changing
conditions in urban planning.
To illustrate the
magnitude, consider
that New York City’s
MTA (2019) generates
over 1 terabyte of data
each day, encompassing
information on subway
ridership, bus
movements, and station
foot traffic.
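
To put the daily figure in operational terms, the short sketch below converts a daily volume into an average ingest rate and a yearly total. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even flow over the day, which real transit data will not follow; the function names are illustrative only.

```python
# Back-of-the-envelope conversion of a daily data volume into a sustained rate.
# Assumptions: decimal units (1 TB = 10**12 bytes) and an even flow over 24 hours.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a given daily volume in TB."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def yearly_volume_tb(tb_per_day: float, days: int = 365) -> float:
    """Total volume in TB accumulated over a number of days."""
    return tb_per_day * days


rate = sustained_rate_mb_per_s(1.0)   # the 1 TB/day figure quoted above
yearly = yearly_volume_tb(1.0)
print(f"average rate: {rate:.1f} MB/s, yearly volume: {yearly:.0f} TB")
```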
four V’s of
Big Data

• Structured data often includes census data,
traffic records, and GIS (Geographic Information
System) datasets.
• Semi-structured data commonly involves sensor
data from IoT (Internet of Things) devices.
• Urban planners also exploit unstructured data
from sources like news articles, satellite imagery,
and surveillance cameras.
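
As a rough illustration of how the three kinds of data differ in handling, the sketch below reads one tiny sample of each. All sample values, field names, and helper functions are made up for illustration; real census, IoT, and news feeds will have their own schemas.

```python
import csv
import io
import json

# Hypothetical samples, one per category described above.
census_csv = "tract_id,population\nA1,4200\nB2,3100\n"                     # structured
sensor_json = '{"sensor_id": "s-17", "reading": {"pm25": 12.5}, "tags": ["roadside"]}'  # semi-structured
news_text = "Council approves new bus lanes on Main Street."              # unstructured


def total_population(csv_text: str) -> int:
    """Structured data: fixed columns, so values can be read by column name."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["population"]) for row in rows)


def pm25_reading(json_text: str):
    """Semi-structured data: nested fields may be absent, so look them up defensively."""
    doc = json.loads(json_text)
    return doc.get("reading", {}).get("pm25")


def mentions(text: str, keyword: str) -> bool:
    """Unstructured data: no schema at all; here a naive keyword search stands in for text mining."""
    return keyword.lower() in text.lower()


print(total_population(census_csv), pm25_reading(sensor_json), mentions(news_text, "bus lanes"))
```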
four V’s of
Big Data

The veracity of data is critical. Some issues include:
• Incomplete datasets can hinder
the accuracy of urban planning.
• Data collected from diverse sources
can carry inherent biases.
• The quality of data can vary
significantly, even within the same
dataset.
• Combining data from diverse
sources can also produce
integration challenges.
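
One simple way to make the incompleteness issue above measurable is a per-field completeness ratio. The sketch below is a minimal example on made-up traffic-sensor records; the field names and values are hypothetical, and a real pipeline would also need checks for bias and cross-source consistency, which are not covered here.

```python
# Hypothetical traffic-sensor records; None marks a missing value.
records = [
    {"sensor": "a", "count": 120, "speed_kmh": 42.0},
    {"sensor": "b", "count": None, "speed_kmh": 38.5},
    {"sensor": "c", "count": 95, "speed_kmh": None},
    {"sensor": "d", "count": 110, "speed_kmh": 40.0},
]


def completeness(rows, field):
    """Fraction of rows in which the given field is present and not None."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(field) is not None)
    return present / len(rows)


def complete_rows(rows, fields):
    """Keep only the rows where every listed field has a value."""
    return [row for row in rows if all(row.get(f) is not None for f in fields)]


for field in ("count", "speed_kmh"):
    print(field, completeness(records, field))
print(len(complete_rows(records, ["count", "speed_kmh"])), "fully populated rows")
```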
challenges in
big data
• Data Collection.
• Data Formats.
• Volume of Data.
• Data Quality and Accuracy.
Challenges in
Big Data

data collection
• Diverse data sources: Sensors (IoT
devices, CCTV), satellite imagery, drones,
social media, feedback, and more.
• Need for constant data collection.
• Digital divide: data collection in
developing countries.
• Lack of real-time data hindering decision-making.
Challenges in
Big Data

data formats
Modality gap: challenges in handling different data
structures, formats, and qualities.
Remote Sensing (RS).
• Spectral, textural, temporal and spatial features.
• Spatial distributions and relationships with the
surrounding environments.
• Machine-generated unstructured data from satellites
and drones, usually a grid in the form of a raster.
Geospatial Big Data (GBD).
• Human-generated data, which can be semi-structured or
unstructured.
• Data from fixed and mobile sensors such as
environmental sensors, cameras, webcams, social media.
• Comes in various formats, including image, geo-tagged
text, video, and vector.
• Reflects human behavior.
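
The gap between the two families can be seen in a small sketch: a raster is a grid of cell values addressed by position, while a geo-tagged post is a point with free text attached. The classes below are hypothetical stand-ins, not any particular GIS library, and the coordinates are arbitrary map units rather than real latitude and longitude.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """A minimal RS-style grid: rows run top to bottom, columns left to right."""
    origin_x: float   # x coordinate of the left edge
    origin_y: float   # y coordinate of the top edge
    cell_size: float  # width and height of one cell
    cells: list       # list of rows, each a list of cell values

    def value_at(self, x: float, y: float):
        """Return the cell value under a point, or None if the point is off the grid."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if 0 <= row < len(self.cells) and 0 <= col < len(self.cells[row]):
            return self.cells[row][col]
        return None


@dataclass
class GeoTaggedPost:
    """A minimal GBD-style record: a vector point plus human-generated text."""
    x: float
    y: float
    text: str


land_cover = Raster(origin_x=0.0, origin_y=3.0, cell_size=1.0,
                    cells=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
post = GeoTaggedPost(x=1.5, y=2.5, text="Flooding near the station")

# Linking the two modalities: which raster cell does the post fall in?
print(post.text, "->", land_cover.value_at(post.x, post.y))
```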
Challenges in
Big Data

volume and velocity

• Landsat 8 and 9 each complete an orbit
every 99 minutes, make about 14 full
orbits each day, and cross every point on
Earth once every 16 days. Between these
two satellites, approximately 1,500 scenes
are added to the USGS archive each day.
• Social media and sensor-generated GBD.
• Managing the massive influx of data.
• X: 500 million tweets every day.
• Meta: 30 billion posts every day.
• Raw, unprocessed satellite images can be
hundreds of megabytes or even gigabytes
per image.
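
The figures above can be checked and turned into rates with a few lines of arithmetic. The orbit period, scene count, and tweet count are taken from the slide; the average scene size is a placeholder assumption, since the slide only gives a range.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# A 99-minute orbit gives roughly 14.5 orbits, i.e. about 14 full orbits per day.
orbits_per_day = MINUTES_PER_DAY / 99

# 500 million tweets per day expressed as an average per-second rate.
tweets_per_second = 500_000_000 / SECONDS_PER_DAY


def daily_archive_gb(scenes_per_day: int, avg_scene_mb: float) -> float:
    """Daily archive growth in GB, using decimal units (1 GB = 1000 MB)."""
    return scenes_per_day * avg_scene_mb / 1000


# ASSUMPTION: 1000 MB per scene is a placeholder, not a figure from the slide.
assumed_scene_mb = 1000.0
growth_gb = daily_archive_gb(1500, assumed_scene_mb)

print(f"{orbits_per_day:.2f} orbits/day")
print(f"{tweets_per_second:.0f} tweets/s on average")
print(f"{growth_gb:.0f} GB/day at the assumed scene size")
```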

• Hadoop Distributed File System (HDFS) and

• HDFS: Java-based, high-throughput
distributed file system.
• Redundant data storage and task
division for resilience
• MapReduce: core of Hadoop, Yet
Another Resource Negotiator (YARN)-
based for parallel processing
• Apache Spark: A fast and general engine for
large-scale data processing implemented in
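
The MapReduce pattern named above can be illustrated in a single process. The sketch below is not Hadoop code: it only mimics the map, shuffle, and reduce phases in plain Python on made-up ridership records, whereas a real job would distribute each phase across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs; here one (station, riders) pair per record."""
    station, riders = record
    yield station, riders


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: (station, riders counted in one interval).
ridership = [("Central", 10), ("Harbor", 5), ("Central", 7), ("Harbor", 1)]
totals = run_job(ridership)
print(totals)
```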

• Amazon Elastic MapReduce (EMR): a managed service that makes it easy to
run petabyte-scale data analytics in the cloud.
• EMR automatically installs and configures open-source frameworks such as
Apache Spark, Hive, and Presto.
• Cloud-based data handling for efficiency: Amazon S3 is a versatile, secure,
and highly available storage service that is widely used.
• Streamlining Data ingestion from multiple sources.
• GBD from OpenStreetMap can, for example,
reside in a Postgres database in Amazon Aurora.
Another GBD dataset can be latitude and longitude
coordinates in CSV format stored in an S3 bucket. A
Spark application that uses the GeoSpark geospatial
library (now Apache Sedona) can then read data from
Aurora and S3, do a spatial join, and store the result as
CSV in a target S3 bucket. The Spark application runs
on an EMR cluster that runs Hadoop and Spark.
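
To show what the spatial join step in this pipeline computes, here is a single-machine sketch in plain Python. It is a stand-in, not GeoSpark or Spark code: it uses a brute-force ray-casting test instead of a distributed, indexed join, and the zones, points, and column names are hypothetical.

```python
import csv
import io


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, zones):
    """Pair every point with each zone polygon that contains it (brute force)."""
    return [(pid, name)
            for pid, x, y in points
            for name, polygon in zones.items()
            if point_in_polygon(x, y, polygon)]


def to_csv(pairs):
    """Serialize the join result as CSV text, mirroring the CSV output step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["point_id", "zone"])
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical inputs: polygons standing in for the database side, points for the CSV side.
zones = {"park": [(0, 0), (4, 0), (4, 4), (0, 4)],
         "depot": [(5, 5), (7, 5), (7, 7), (5, 7)]}
points = [("p1", 1, 1), ("p2", 6, 6), ("p3", 10, 10)]

joined = spatial_join(points, zones)
print(to_csv(joined))
```

Points that fall in no zone, such as p3 here, are dropped, which corresponds to an inner join. Points lying exactly on a polygon edge are not handled specially in this sketch.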

• Utilizing AWS Open Data for satellite
images from Sentinel-2 and Landsat,
as well as vetted GBDs.
• Building, training, and deploying ML
• Using pre-trained models to save time
and resources.
• Amazon SageMaker and Rekognition.
• Supervised ML for satellite image
labeling with pretrained models.
• Surveillance tool for classification like
detecting oil well pads.
• Assessing and improving model performance
using F1 score, average precision, and recall.
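
For reference, precision, recall, and F1 can be computed directly from detection counts. The sketch below uses made-up counts for a detector such as the well-pad example; average precision is omitted because it needs ranked confidence scores rather than plain counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed objects.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```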
