
UNIT CODE: GGI 4202

UNIT NAME: SPATIAL BUSINESS INTELLIGENCE

Lecture 04-05
Spatial Big Data
Introduction
 Big Data are “data sets that are so big they cannot be handled efficiently by
common database management systems” (Dasgupta, 2013).
 Big Data have volume of 100 terabytes to petabytes, have structured and
unstructured formats, and have a constant flow of data (Davenport, 2014)
 Spatial Big Data represents Big Data in the form of spatial layers and
attributes.
 There is no standard threshold on minimum size of Big Data or Spatial Big
Data, although big data in 2013 was considered one petabyte (1,000
terabytes) or larger (Dasgupta, 2013).
 Big Data are getting unbelievably large.
 More video is captured each day now than was produced in the first 50 years of
television.
 The amount of data available today exceeds 2.8 zettabytes (2.8 trillion
gigabytes).
 Spatial big data are spatial data that challenge current computing systems in
terms of management, processing, or analysis.
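
The size figures quoted above can be checked with a little unit arithmetic. The sketch below assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units expressed in bytes.
# ASSUMPTION: the slides use decimal units, not binary (1,024-based) units.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21


def convert(amount, from_unit, to_unit):
    """Convert an amount from one unit to another (both given in bytes)."""
    return amount * from_unit / to_unit


# One petabyte expressed in terabytes.
petabyte_in_tb = convert(1, PETABYTE, TERABYTE)

# 2.8 zettabytes expressed in gigabytes.
zettabytes_in_gb = convert(2.8, ZETTABYTE, GIGABYTE)

print(f"1 PB   = {petabyte_in_tb:,.0f} TB")
print(f"2.8 ZB = {zettabytes_in_gb:,.0f} GB")
```

Both results agree with the text: a petabyte is 1,000 terabytes, and 2.8 zettabytes is 2.8 trillion gigabytes.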

Introduction
 FRAMINGHAM, Mass., November 9, 2015 – The Big Data market continues to
exhibit strong momentum as businesses accelerate their transformation into
data-driven companies.
 This momentum is driving strong growth in big data-related infrastructure,
software, and services.
 A new forecast from International Data Corporation (IDC) sees the big data
technology and services market growing at a compound annual growth rate
(CAGR) of 23.1% over the 2014-2019 forecast period, with annual spending
reaching $48.6 billion in 2019.
 And a new IDC Special Study examines spending on big data solutions in
greater detail across 19 vertical industries and eight big data technologies.
 "The ever-increasing appetite of businesses to embrace emerging big data-
related software and infrastructure technologies while keeping the
implementation costs low has led to the creation of a rich ecosystem of new
and incumbent suppliers,"
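
A compound annual growth rate relates a starting value to an ending value through end = start × (1 + rate)^years. The sketch below applies that formula to the forecast figures. The five-year span (2014 to 2019) is an assumption about the forecast period, and the function names are illustrative; the implied starting value is therefore only a rough back-calculation, not a figure from IDC.

```python
def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


def implied_start(end_value, rate, years):
    """Starting value that would grow to `end_value` at the given CAGR."""
    return end_value / (1 + rate) ** years


END_2019 = 48.6   # billions of USD, from the forecast quoted above
CAGR = 0.231      # 23.1% per year, from the forecast quoted above
YEARS = 5         # ASSUMPTION: forecast period runs from 2014 to 2019

base = implied_start(END_2019, CAGR, YEARS)
print(f"Implied starting market size: ${base:.1f} billion")
print(f"Check, projected forward:     ${project(base, CAGR, YEARS):.1f} billion")
```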

Spatial Big Data Enablers
 Technological advances (Lidar and Satellite imagery)
 Volunteered Geographic Information (VGI) (OpenStreetMap)
 GPS
 Geo-enabled Social Media (Twitter, Flickr, ...)
 Inexpensive storage of large volumes of data
 Inexpensive compute power
 Next Generation Analytics
 Moving from off-line to in-line embedded analytics
 Need to explain what happened
 Need to predict what will happen
 Operating on
 Data at rest – stored someplace
 Data in motion – streaming
 Multiple disparate data sources

So, we know that “big data” is BIG…

Sources of Spatial Big Data
 Sources of Spatial Big Data include:
 GPS, including GPS-enabled devices
 Satellite remote sensing
 Aerial surveying
 Radar
 Lidar
 Sensor networks
 Digital cameras
 RFID reading locations
 Mobile devices
 Internet of Things
 Satellites
 Drones
 Vehicles
 Geosocial networking services
 A significant portion of big data is in fact spatial big data

Where is this Big Data coming from?
 It’s from the Mobile Planet and Internet of Everything…

Where is this Big Data coming from?
 It’s User-Generated Content…

Where is this Big Data coming from?
 It’s Sensor Data…

Where is this Big Data coming from?
 It’s all these “Smart” “Things”…

Spatial Big Data vs Traditional Datasets
Data characteristic   Big Data                     Traditional analytics
Type of data          Unstructured formats         Formatted in columns and rows
Volume of data        100 terabytes to petabytes   10s of terabytes or less
Flow of data          Continual flow               Static pool of data
Analytical methods    Machine learning             Hypothesis-based
Primary purpose       Data-based products          Internal decision support and services

➢ Traditional datasets could be quite large, but they were typically formatted
in spreadsheets or databases, tended to be static, and were designed to test
hypotheses.
➢ Big Data is characterized by the 5 Vs and lends itself to machine learning,
which derives solutions from what works in big datasets rather than from a
prior hypothesis.
➢ In statistical terms, this approach is exploratory.

Five V’s of Spatial Big Data
 Volume
 Satellite imagery covers the globe so is vast.
 Sensors are expanding worldwide at a rapid rate.
 Digital cameras now number several billion, largely through spatially referenced
cell phones.
 One estimate indicates that 2.5 quintillion bytes (2.5 × 10^18) are generated
daily worldwide (www.ibm.com).
 Variety
 Spatial data are based on 2-D or 3-D points configured as vector or raster
imagery. This is entirely different from conventional big data, which is
alphanumeric or pixel-based (similar to raster but not vector).
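
The difference between the two spatial forms can be made concrete with a minimal data-structure sketch. The class and field names below are hypothetical and are not taken from any particular GIS library: a vector feature stores explicit coordinates plus attributes, while a raster stores a grid of cell values located by an origin and a cell size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (x, y); a 3-D point would add z


@dataclass
class VectorFeature:
    """A point, line, or polygon described by explicit coordinates."""
    geometry_type: str                 # "Point", "LineString", or "Polygon"
    coordinates: List[Coordinate]
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class RasterGrid:
    """A grid of cell (pixel) values anchored at its upper-left corner."""
    origin: Coordinate                 # (x, y) of the upper-left corner
    cell_size: float                   # width and height of one cell
    cells: List[List[float]]           # rows of values, top row first

    def value_at(self, x: float, y: float) -> float:
        """Look up the cell value covering the location (x, y)."""
        col = int((x - self.origin[0]) // self.cell_size)
        row = int((self.origin[1] - y) // self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[row])):
            raise ValueError("coordinate falls outside the grid")
        return self.cells[row][col]


# Hypothetical sample data for illustration only.
road = VectorFeature("LineString", [(0.0, 0.0), (1.0, 1.5)], {"name": "Main St"})
elevation = RasterGrid(origin=(0.0, 2.0), cell_size=1.0,
                       cells=[[1.0, 2.0], [3.0, 4.0]])
print(road.geometry_type, len(road.coordinates))
print(elevation.value_at(1.5, 0.5))
```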

Five V’s of Spatial Big Data
 Velocity
 Velocity is very high: imagery is transmitted electronically at close to the
speed of light, so new data arrive almost as fast as they are captured.

Five V’s of Spatial Big Data…
 Veracity
 Data veracity is the degree to which data is accurate, precise and trusted.
Data is often viewed as certain and reliable.
 The reality of problem spaces, data sets and operational environments is that
data is often uncertain, imprecise and difficult to trust. The following are
illustrative examples of data veracity:
 Attribute veracity
 For attribute (non-spatial) data, do the data meet data quality tests?
 Spatial veracity
 For vector data (imagery based on points, lines, and polygons), the quality varies. It
depends on whether the points were determined by GPS, manually, or by a method of
unknown origin. Resolution and projection issues can also alter veracity.
 For geocoded points, there may be errors in the address tables and in the point-location
algorithms associated with addresses.
 For raster data (imagery based on pixels), veracity depends on accuracy of recording
instruments in satellites or aerial devices, and on timeliness.
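
A basic veracity test for geocoded points can be sketched as a set of simple checks. The record layout (`lat`, `lon`, `address`) and the rules below are assumptions chosen for illustration; real data quality tests would be tailored to the dataset.

```python
def check_point(record):
    """Return a list of data-quality problems found in one geocoded record."""
    problems = []
    lat, lon = record.get("lat"), record.get("lon")

    if lat is None or lon is None:
        problems.append("missing coordinate")
    else:
        if not -90 <= lat <= 90:
            problems.append("latitude out of range")
        if not -180 <= lon <= 180:
            problems.append("longitude out of range")
        # (0, 0) is a common placeholder produced by failed geocoding.
        if lat == 0 and lon == 0:
            problems.append("suspicious (0, 0) location")

    if not record.get("address"):
        problems.append("missing address")
    return problems


# Hypothetical records for illustration only.
records = [
    {"lat": -1.29, "lon": 36.82, "address": "Example Road 1"},
    {"lat": 95.0, "lon": 36.82, "address": "Example Road 2"},
    {"lat": 0, "lon": 0, "address": ""},
]
for rec in records:
    print(rec["address"] or "(no address)", "->", check_point(rec) or "ok")
```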

Big Data Analytic Techniques
 Big data analytics examines large amounts of data to uncover hidden
patterns, correlations and other insights.
 With today’s technology, it is possible to analyze data and get answers
almost immediately, an effort that is slower and less efficient with
more traditional business intelligence solutions.
 Big data analytics helps organizations harness their data and use it to identify
new opportunities. That, in turn, leads to smarter business moves, more
efficient operations, higher profits and happier customers.
 Big data brings about the following advantages:
 Cost reduction.
 Big data technologies such as Hadoop and cloud-based analytics bring significant cost
advantages when it comes to storing large amounts of data – plus they can identify
more efficient ways of doing business.
 Faster, better decision making.
 With the speed of Hadoop and in-memory analytics, combined with the ability to
analyze new sources of data, businesses are able to analyze information immediately –
and make decisions based on what they’ve learned.
 New products and services.
 With the ability to gauge customer needs and satisfaction through analytics comes the
power to give customers what they want.
Traditional Big Data Analytic Techniques
 What is enabling them?
 Classification
 Clustering
 Regression
 Simulation
 Anomaly Detection
 Numerical Forecasting
 Optimization
 Geographic Mapping

Limitations: For Big Data, these techniques often cannot handle the 3 V’s of
volume, velocity, and variety well.
They tend to work best with “Small Data”.
“Non-traditional” Big Data Analytic Techniques
 Ensemble methods
 Combine multiple models (e.g. linear regression, decision tree, neural
network, spatial autocorrelation) that work together to yield one answer.
 Commodity models
 Apply complex models to address only the high-value data.
 For most of the data, use simple, less resource-intensive model(s)
 Modern Data Visualization
 Multiple graphs and charts linked to the same underlying Big Data, and
displayed in Dashboards, including maps
 Space-time slider visualizations, showing locational changes in a movie-like
sequence.
 3-D Displays. 3-D Mapping
 Text Analysis (Content Analysis)
 Appropriate for unstructured text. Opens up social media, call center
conversations, etc. for powerful analytics.
 Parse the text and use the components to extract meaning, valence, and
feelings.
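
The ensemble idea listed above, several models combined into one answer, can be sketched with three deliberately simple models whose predictions are averaged. These stand-ins (a mean predictor, a least-squares line, and a nearest-neighbour lookup) are substitutes chosen to keep the example short; they are not the regression, tree, and neural network models named on the slide, and averaging is only one of several ways to combine models.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Model 1: always predict the average of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Model 2: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """Model 3: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_ensemble(xs, ys, fitters):
    """Fit every model, then average their predictions into one answer."""
    models = [fit(xs, ys) for fit in fitters]
    return lambda x: mean(model(x) for model in models)


# Hypothetical training data for illustration only.
xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
ensemble = fit_ensemble(xs, ys, [fit_mean, fit_line, fit_nearest])
print(ensemble(3))
```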
“Non-traditional” Big Data Analytic Techniques
 Spatial Analysis
 Spatial sampling, auto-correlation, continuous contours (ocean, air), etc.
 Analytic Point Solutions
 Software that solves very specific Big Data analytics problems (e.g. Esri’s
ArcLogistics).
 Virtual Reality
 Google VR
 Can include fictional or actual geographic mapping
 Machine Learning
 AI-based programs that can learn without having been specifically pre-programmed
for the application.
 “Intelligent” robotics is one type.
 Neural networks are a form of ML, but they are often restricted to learning in
specialized ways.
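
Spatial autocorrelation, listed under Spatial Analysis above, is commonly measured with Moran's I. The sketch below is a minimal implementation assuming binary weights (1 for neighbours, 0 otherwise) on a hypothetical row of six locations; the function name and sample values are illustrative. Values near +1 indicate that similar values cluster together, and values near -1 indicate that neighbours tend to differ.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    values:    list of observations, one per location
    neighbors: dict mapping a location index to the indices of its neighbours
    """
    n = len(values)
    avg = sum(values) / n
    dev = [v - avg for v in values]

    # Sum of all weights (each neighbour link counts once per direction).
    total_weight = sum(len(js) for js in neighbors.values())
    # Cross-products of deviations over every neighbouring pair.
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    squared = sum(d * d for d in dev)

    return (n / total_weight) * (cross / squared)


# Six locations in a row; each is a neighbour of the adjacent ones.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = morans_i([1, 1, 1, 5, 5, 5], line)
alternating = morans_i([1, 5, 1, 5, 1, 5], line)
print(f"clustered:   {clustered:.2f}")
print(f"alternating: {alternating:.2f}")
```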

Big Data Analytic Platforms
 What is enabling them?
 Lower Cost
 Greater Storage (HD and RAM)
 Faster Input / Output Operations
 Faster Processing
 Increased Bandwidth
 Cloud / Distributed Computing
 New Data Management Tools (Hadoop, etc.)
 New Technologies (Spark, etc.)
 Ease-of-Use (Browser-based, etc.)

Techniques for Handling Big Data
 Spatial data distribution
 Large datasets are split into smaller datasets and distributed across a collection of machines
 Here, “distribution” means partitioning data across machines. It should not be confused
with a statistical distribution, in which values are ordered from smallest to largest and
charted to show how frequently each appears.
 Parallel processing
 A mode of operation in which a process is split into parts, which are executed
simultaneously on different processors attached to the same computer.
 Using a collection of machines to process the smaller datasets, combining the partial results
together.
 Fault tolerance
 The property that enables a system to continue operating properly in the event of the
failure of (or one or more faults within) some of its components.
 Making copies of the partitioned data to ensure that if a machine fails, the dataset can still
be processed
 Commodity hardware
 Using standard hardware that is not dependent upon exotic architectures, topologies, or
data storage (e.g., RAID)
 Scalability
 Algorithms and frameworks that can be easily scaled to run on larger collections of
machines in order to address larger datasets
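
The techniques above can be sketched together in a few lines: split the data by grid cell, process the partitions in parallel, combine the partial results, and keep replica copies so that one failure does not lose a partition. This is a single-machine illustration under stated assumptions: threads stand in for a collection of machines, and the function names, node names, and sample points are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_cell(points, cell_size):
    """Spatial data distribution: split points into datasets keyed by grid cell."""
    partitions = defaultdict(list)
    for x, y, value in points:
        key = (int(x // cell_size), int(y // cell_size))
        partitions[key].append((x, y, value))
    return dict(partitions)


def summarize(partition):
    """Partial result for one partition: (count, sum of values)."""
    return len(partition), sum(value for _, _, value in partition)


def parallel_mean(points, cell_size, workers=4):
    """Parallel processing: summarize partitions concurrently, then combine."""
    partitions = partition_by_cell(points, cell_size)
    # ASSUMPTION: worker threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, partitions.values()))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


def assign_replicas(keys, nodes, copies=2):
    """Fault tolerance: place each partition on `copies` distinct nodes."""
    return {key: [nodes[(i + r) % len(nodes)] for r in range(copies)]
            for i, key in enumerate(sorted(keys))}


# Hypothetical (x, y, value) readings for illustration only.
points = [(0.5, 0.5, 10), (1.5, 0.2, 20), (2.5, 2.5, 30), (0.1, 0.9, 40)]
print(parallel_mean(points, cell_size=1.0))

placement = assign_replicas(partition_by_cell(points, 1.0).keys(),
                            nodes=["node-a", "node-b", "node-c"])
print(placement)
```

Because only counts and sums are combined, adding more partitions or workers changes how the work is divided but not the final answer, which is what lets this pattern scale to larger collections of machines.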
Challenges of Spatial Big Data
 Retaining computational efficiency: computational systems such as desktop PCs
often cannot handle large volumes of data or data with rapid velocity.
 Availability of data vs availability of spatial technologies to manage, analyze
and disseminate the results.
 Storing Spatial Big Data into the cloud
 Adding new data to Spatial Big Data, or changing old data, requires
repartitioning.
 Security and integration concerns
 Spatial Big Data comprises structured and unstructured datasets
with massive data volumes that cannot be easily captured, stored,
manipulated, analyzed, managed and presented by traditional hardware.
 Algorithms and methods: the 3 V's challenge traditional algorithms and methods
that help make sense of all the data.
 Database design must handle variety of data as well as volume (storage) and
velocity (reading/writing speed).
 Quickly geovisualizing such varied data is challenging.
 Network limitations restrict the transfer of data with large volume or rapid velocity.
Summary: Spatial Big Data and Analytics
 Big Data refers to huge data-sets that overflow ordinary data management
systems.
 The 5 V’s that define big data are Volume, Variety, Velocity, Veracity, and
Value.
 Spatial Big Data is Big Data that is spatially referenced, so in addition to
common analytics techniques, mapping and spatial analytics can be applied.
 Ordinary, small-data approaches will not work, because most of the
traditional techniques cannot perform exploration of massive data sets.
 Big Data methods allow multidimensional screening and “data mining” to
locate parts of the mass that are showing interesting relationships, trends, or
comparisons.
 Those interesting parts of a Big Data Set can be sorted into small data-sets
that can have the more powerful traditional analysis methods applied to
them.
 Success needs to be studied from a management and organizational
standpoint to understand what works managerially and results in profits and
other benefits.
