Big Data-Driven Sustainable Urban Planning and Management - Slides

Sustainable Urban Planning and Management
Ivy Manalang
Vaishnavi Panga
Sarathi Prabu Mohan
David Gomez Camargo
Application Domain
This domain involves land use, transportation systems, energy consumption, environmental impact, waste management, and more.
As cities continue to grow in population and complexity, effective planning is essential. Big data plays a key role in tackling these challenges by providing technologies.
By 2050, the UN projects that around 2/3 of the global population will reside in urban areas.
four V’s of Big Data
Velocity
Real-time data is crucial for making immediate decisions and responding to changing conditions in urban planning.
Volume
To illustrate the magnitude, consider that New York City’s MTA (2019) generates over 1 terabyte of data each day, encompassing information on subway ridership, bus movements, and station foot traffic.
four V’s of Big Data
Variety
• Structured data often includes census data, traffic records, and GIS (Geographic Information System) datasets.
• Semi-structured data commonly involves sensor data from IoT (Internet of Things) devices.
• Urban planners also exploit unstructured data from sources like news articles, satellite imagery, and surveillance cameras.
four V’s of Big Data
Veracity
The veracity of data is critical. Some issues include:
• Incomplete datasets that can hinder the accuracy of urban planning models.
• Data collected from diverse sources can carry inherent biases.
• The quality of data can vary significantly, even within the same dataset.
• Combining data from diverse sources can also produce integration challenges.
Challenges in Big Data
• Data Collection.
• Data Formats.
• Volume of Data.
• Data Quality and Accuracy.
Challenges in Big Data
Data collection
• Diverse data sources: Sensors (IoT devices, CCTV), satellite imagery, drones, social media, feedback, and more.
• Need for constant data collection.
• Digital divide - Data collection in developing countries.
• Lack of real-time data hindering decision-making.
Challenges in Big Data
Data formats
Modality gap: challenges in handling different data structures, formats, and qualities.
Remote Sensing (RS).
• Spectral, textural, temporal, and spatial features.
• Spatial distributions and relationships with the surrounding environments.
• Machine-generated unstructured data from satellites and drones, usually a grid in the form of a raster.
Geospatial Big Data (GBD).
• Human-generated data, which can be semi-structured or unstructured.
• Data from fixed and mobile sensors such as environmental sensors, cameras, webcams, and social media.
• Comes in various formats, including image, geo-tagged text, video, and vector.
• Reflects human behavior.
Challenges in Big Data
Volume and velocity
• Landsat 8 and 9 each complete an orbit every 99 minutes, make about 14 full orbits each day, and cross every point on Earth once every 16 days. Between these two satellites, approximately 1,500 scenes are added to the USGS archive each day.
• Social media and sensor-generated GBD.
• Managing the massive influx of data.
• X: 500 million tweets every day.
• Meta: 30 billion posts every day.
• Raw, unprocessed satellite images are hundreds of megabytes or even gigabytes per image.
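The Landsat figures above can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide, plus one labeled assumption: a scene size of about 1 GB, chosen purely for illustration from the "hundreds of megabytes or even gigabytes" range.

```python
MINUTES_PER_DAY = 24 * 60
ORBIT_MINUTES = 99       # orbital period quoted on the slide
SCENES_PER_DAY = 1500    # combined Landsat 8 and 9 figure quoted on the slide
ASSUMED_SCENE_GB = 1.0   # assumption for illustration, not a published figure

# Number of orbits that fit entirely within one day.
full_orbits_per_day = MINUTES_PER_DAY // ORBIT_MINUTES

# Rough daily growth of the archive under the assumed scene size.
daily_archive_tb = SCENES_PER_DAY * ASSUMED_SCENE_GB / 1000

print(f"full orbits per day: {full_orbits_per_day}")
print(f"daily archive growth: about {daily_archive_tb:.1f} TB (assumed scene size)")
```

Even under this rough assumption, the archive would grow on the order of a terabyte per day, which is why storage and processing have to be distributed.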
Existing solutions
• Hadoop Distributed File System (HDFS) and MapReduce.
  • HDFS: Java-based, high-throughput distributed file system.
  • Redundant data storage and task division for resilience.
  • MapReduce: core of Hadoop, running on Yet Another Resource Negotiator (YARN) for parallel processing.
• Apache Spark: a fast and general engine for large-scale data processing, implemented in Scala.
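The MapReduce idea can be illustrated without a cluster. The following is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It is not the Hadoop API, and the station names and rider counts are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    station, riders = record
    yield station, riders

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

# Hypothetical ridership records: (station, riders counted in one interval).
records = [("Times Sq", 120), ("Union Sq", 80), ("Times Sq", 45)]

mapped = chain.from_iterable(map_phase(r) for r in records)
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)
```

In a real Hadoop job the map and reduce functions run in parallel on different nodes, and the framework handles the shuffle and the redundancy.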
Proposed solutions
• Amazon Elastic MapReduce (EMR): a managed service that makes it easy to run petabyte-scale data analytics in the cloud.
• EMR automatically installs and configures open-source frameworks such as Apache Spark, Hive, and Presto.
• Cloud-based data handling for efficiency: Amazon S3 is a versatile, secure, and highly available storage service that is widely used.
• Streamlining data ingestion from multiple sources.
• For example, GBD from OpenStreetMap can reside in a Postgres database in Amazon Aurora. Another GBD dataset can be latitude and longitude coordinates in CSV format stored in an S3 bucket. A Spark application that uses the GeoSpark geospatial library (now Apache Sedona) can then read data from Aurora and S3, perform a spatial join, and store the result as CSV in a target S3 bucket. The Spark application runs on an EMR cluster that runs Hadoop and Spark.
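To make the spatial join step concrete, here is a minimal single-machine sketch in plain Python. It does not use GeoSpark, Spark, Aurora, or S3: the zones are hypothetical bounding boxes standing in for OpenStreetMap polygons, the CSV is an in-memory string standing in for the S3 file, and the point-in-box test is a simplification of a real geometric predicate.

```python
import csv
import io

# Hypothetical zones as (min_lat, min_lon, max_lat, max_lon) bounding boxes.
ZONES = {
    "zone_a": (40.70, -74.02, 40.75, -73.97),
    "zone_b": (40.75, -74.02, 40.80, -73.97),
}

# Made-up coordinates standing in for the CSV file in the source bucket.
POINTS_CSV = """id,lat,lon
p1,40.71,-74.00
p2,40.78,-73.98
p3,41.00,-73.50
"""

def spatial_join(points, zones):
    """Pair each point with every zone whose bounding box contains it."""
    joined = []
    for p in points:
        lat, lon = float(p["lat"]), float(p["lon"])
        for name, (min_lat, min_lon, max_lat, max_lon) in zones.items():
            if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
                joined.append({"id": p["id"], "zone": name})
    return joined

points = list(csv.DictReader(io.StringIO(POINTS_CSV)))
result = spatial_join(points, ZONES)

# Write the join result as CSV, mirroring the "store as CSV" step.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "zone"])
writer.writeheader()
writer.writerows(result)
print(out.getvalue())
```

The nested loop compares every point with every zone, which is what a distributed engine avoids through spatial partitioning and indexing.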
Proposed solutions
• Utilizing AWS Open Data for satellite images from Sentinel-2 and Landsat, as well as vetted GBDs.
• Building, training, and deploying ML models.
• Using pre-trained models to save time and resources.
• Amazon SageMaker and Rekognition.
• Supervised ML for satellite image labeling with pre-trained models.
• Surveillance tool for classification, such as detecting oil well pads.
• Evaluating and improving model performance using F1 score, average precision, and recall.
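As a reminder of how these metrics relate, the sketch below computes precision, recall, and F1 for a binary classifier in plain Python. The labels are made up (1 could stand for "oil well pad present"), and average precision is left out because it requires ranked confidence scores rather than hard predictions.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical ground truth and model predictions for six image tiles.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```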
