
Big Data-Driven Urban Planning and Sustainable Management
Ivy Manalang
Vaishnavi Panga
Sarathi Prabu Mohan
David Gomez Camargo
Application Domain

This domain involves land use, transportation systems, energy consumption, environmental impact, waste management, and more. As cities continue to grow in population and complexity, effective planning is essential. Big data plays a key role in tackling these challenges by providing the technologies to address them.

The UN projects that by 2050 around two-thirds of the global population will reside in urban areas.
Four V’s of Big Data

Velocity
Real-time data is crucial for making immediate decisions and responding to changing conditions in urban planning.

Volume
To illustrate the magnitude, consider that New York City’s MTA (2019) generates over 1 terabyte of data each day, encompassing information on subway ridership, bus movements, and station foot traffic.
Four V’s of Big Data

Variety
• Structured data often includes census data, traffic
records, and GIS (Geographic Information System)
datasets.

• Semi-structured data commonly involves sensor data from IoT (Internet of Things) devices.

• Urban planners also exploit unstructured data from sources like news articles, satellite imagery, and surveillance cameras.
Four V’s of Big Data

Veracity
The veracity of data is critical. Some issues include:
• Incomplete datasets that can hinder the
accuracy of urban planning models.
• Data collected from diverse sources
can carry inherent biases.
• The quality of data can vary significantly, even within the same dataset.
• Combining data from diverse sources
can also produce integration
challenges.
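To make the point about incomplete datasets concrete, the following is a minimal sketch of a per-field completeness check. The function name `completeness`, the field names, and the sensor readings are all hypothetical and used only for illustration; a real pipeline would likely apply richer rules than "value is missing or empty".

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Return, for each field, the fraction of records with a non-missing value."""
    if not records:
        return {f: 0.0 for f in fields}
    result = {}
    for f in fields:
        # A value counts as missing if the key is absent, None, or an empty string.
        present = sum(1 for r in records if r.get(f) not in (None, ""))
        result[f] = present / len(records)
    return result


# Hypothetical readings from four air-quality stations (made-up values).
readings = [
    {"station": "A", "pm25": 12.0, "noise_db": 61.5},
    {"station": "B", "pm25": None, "noise_db": 58.0},
    {"station": "C", "pm25": 9.5},
    {"station": "D", "pm25": 14.1, "noise_db": ""},
]

report = completeness(readings, ["station", "pm25", "noise_db"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

A report like this could be used to decide which fields are reliable enough to feed into a planning model.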
Challenges in Big Data
• Data collection.
• Data formats.
• Volume of data.
• Data quality and accuracy.
Challenges in Big Data

Data collection
• Diverse data sources: sensors (IoT devices, CCTV), satellite imagery, drones, social media, feedback, and more.
• Need for constant data collection.
• Digital divide: data collection in developing countries.
• Lack of real-time data hinders decision-making.
Challenges in Big Data

Data formats
Modality gap: challenges in handling different data structures, formats, and qualities.
Remote Sensing (RS).
• Spectral, textural, temporal, and spatial features.
• Spatial distributions and relationships with the surrounding environments.
• Machine-generated unstructured data from satellites and drones, usually a grid in the form of a raster.
Geospatial Big Data (GBD).
• Human-generated data, which can be semi-structured or unstructured.
• Data from fixed and mobile sensors such as environmental sensors, cameras, webcams, and social media.
• Comes in various formats, including image, geo-tagged text, video, and vector.
• Reflects human behavior.
Challenges in Big Data

Volume and velocity


• Landsat 8 and 9 each complete an orbit every 99 minutes, make about 14 full orbits each day, and cross every point on Earth once every 16 days. Between these two satellites, approximately 1,500 scenes are added to the USGS archive each day.
• Social media and sensor-generated GBD.​
• Managing the massive influx of data.​
• X: 500 million tweets every day.
• Meta: 30 billion posts every day.
• Raw, unprocessed satellite images can be hundreds of megabytes or even gigabytes per image.
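The Landsat figures above can be checked with a short back-of-the-envelope calculation. The orbit period and the scenes-per-day count come from the slide; the per-scene size of 1 GB is purely an assumption chosen for illustration, since the slide only says scenes range from hundreds of megabytes to gigabytes.

```python
MINUTES_PER_DAY = 24 * 60
ORBIT_MINUTES = 99            # orbit period, from the slide
SCENES_PER_DAY = 1_500        # Landsat 8 + 9 combined, from the slide
ASSUMED_SCENE_SIZE_GB = 1.0   # ASSUMPTION: illustrative size, not a published figure

# 1440 / 99 is roughly 14.5, which matches "about 14 full orbits each day".
orbits_per_day = MINUTES_PER_DAY / ORBIT_MINUTES

# Archive growth under the assumed scene size (1 TB = 1,000 GB).
daily_volume_tb = SCENES_PER_DAY * ASSUMED_SCENE_SIZE_GB / 1_000
yearly_volume_tb = daily_volume_tb * 365

print(f"Orbits per day: {orbits_per_day:.2f}")
print(f"Daily archive growth (assumed scene size): {daily_volume_tb:.1f} TB")
print(f"Yearly archive growth (assumed scene size): {yearly_volume_tb:.1f} TB")
```

Changing `ASSUMED_SCENE_SIZE_GB` shows how sensitive the storage estimate is to the scene size.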
Existing solutions

• Hadoop Distributed File System (HDFS) and MapReduce.
• HDFS: Java-based, high-throughput
distributed file system​.
• Redundant data storage and task division
for resilience​
• MapReduce: core of Hadoop, running on Yet Another Resource Negotiator (YARN) for parallel processing.
• Apache Spark: A fast and general engine for
large-scale data processing implemented in
Scala.
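The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is not Hadoop code: it runs in a single process and only illustrates the map, shuffle, and reduce phases. The station names and rider counts are made up, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: turn one 'station, riders' line into a (station, riders) pair."""
    station, riders = record.split(",")
    yield station.strip(), int(riders)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values by key, as the framework would between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the rider counts for one station."""
    return key, sum(values)


# Made-up turnstile records; on a real cluster these would be split across nodes.
records = ["Union Sq, 120", "Times Sq, 300", "Union Sq, 80", "Times Sq, 50"]

mapped = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)
```

In Hadoop, the map and reduce functions run in parallel on different nodes and the shuffle moves data between them; the logic per record stays the same.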
Proposed solutions

• Amazon Elastic MapReduce (EMR): a managed service that makes it easy to run petabyte-scale data analytics in the cloud.
• EMR automatically installs and configures open-source frameworks such as
Apache Spark, Hive, and Presto​.
• Cloud-based data handling for efficiency: Amazon S3 is a versatile, secure, and highly available storage service that is widely used.
• Streamlining data ingestion from multiple sources.
• GBD from OpenStreetMap could reside in a Postgres database in Amazon Aurora. Other GBD could be latitude and longitude coordinates in CSV format stored in an S3 bucket. A Spark application that uses the GeoSpark geospatial library (now Apache Sedona) can then read the data from Aurora and S3, perform a spatial join, and store the result as CSV in a target S3 bucket. The Spark application runs on an EMR cluster that runs Hadoop and Spark.
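To show what the spatial join in this pipeline computes, here is a minimal single-machine sketch in plain Python. It does not use Spark, GeoSpark, Aurora, or S3: the zone polygons stand in for OpenStreetMap features, an in-memory CSV string stands in for the S3 file, and all names and coordinates are hypothetical. The point-in-polygon test is the standard ray-casting method, swapped in here for whatever the geospatial library does internally.

```python
import csv
import io

Point = tuple[float, float]  # (x, y), i.e. (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: list[tuple[str, Point]],
                 zones: dict[str, list[Point]]) -> list[tuple[str, str]]:
    """Pair each point id with the name of every zone that contains it."""
    return [(pid, name)
            for pid, pt in points
            for name, poly in zones.items()
            if point_in_polygon(pt, poly)]


# Hypothetical zones (stand-ins for polygons loaded from a database).
zones = {
    "park": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "depot": [(5, 0), (9, 0), (9, 3), (5, 3)],
}

# Hypothetical coordinate file (stand-in for a CSV object in a bucket).
csv_text = "id,lon,lat\np1,1.0,1.0\np2,6.0,2.0\np3,10.0,10.0\n"
points = [(row["id"], (float(row["lon"]), float(row["lat"])))
          for row in csv.DictReader(io.StringIO(csv_text))]

joined = spatial_join(points, zones)

# Write the join result as CSV, mirroring the "store the result as CSV" step.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["point_id", "zone"])
writer.writerows(joined)
print(out.getvalue())
```

This nested loop compares every point with every zone, which is why a distributed engine with spatial indexing becomes attractive once the inputs grow large.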
Proposed solutions

• Utilizing AWS Open Data for satellite images from Sentinel-2 and Landsat, as well as vetted GBDs.
• Building, training, and deploying ML
models.​
• Using pre-trained models to save time
and resources.
• Amazon SageMaker and Rekognition.​
• Supervised ML for satellite image
labeling with pretrained models.​
• Surveillance tool for classification tasks, such as detecting oil well pads.
• Evaluating model performance with F1 score, average precision, and recall.
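The metrics named in the last bullet can be computed directly from predicted and true labels. The sketch below covers precision, recall, and F1 for a binary classifier; average precision is left out because it needs ranked confidence scores. The labels are made up (1 = "oil well pad present"), and the function name is hypothetical.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for eight image tiles.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes false alarms and recall penalizes missed detections; F1 is their harmonic mean, so it drops when either one is low.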
