L01 Introduction
Objective…
How to manage very large amounts of data and extract value and
knowledge from them
OUTLINE:
TYPES OF DIGITAL DATA
INTRODUCTION TO BIG DATA
BIG DATA ANALYTICS
Classification of Digital Data
Databases such as Oracle, DB2, Teradata, MySQL, PostgreSQL, etc.
OLTP Systems
Ease with Structured Data
• Input / Update / Delete
• Security
• Scalability
• Transaction Processing
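The operations listed above can be illustrated with a small sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the sample rows are hypothetical and chosen only for illustration; SQLite stands in for the larger database systems named on the slide.

```python
import sqlite3


def run_demo():
    """Insert, update, and delete rows, then show a transaction rolling back."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )

    # Input: a fixed schema means every row has the same columns.
    with conn:
        conn.executemany(
            "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
            [(1, "alice", 100), (2, "bob", 50), (3, "carol", 0)],
        )

    # Update and delete are single declarative statements.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")
    with conn:
        conn.execute("DELETE FROM accounts WHERE id = 3")

    # Transaction processing: both updates succeed together or not at all.
    # The second update violates the CHECK constraint, so the first one
    # is rolled back as well.
    try:
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + 500 WHERE id = 2")
            conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
    except sqlite3.IntegrityError:
        pass

    rows = conn.execute(
        "SELECT id, owner, balance FROM accounts ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The failed transfer leaves both balances unchanged, which is the all-or-nothing behaviour that transaction processing refers to.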
Semi-structured Data
Sources of Semi-structured Data
Semi-structured data:
• Inconsistent structure
• Self-describing (label/value pairs)
• Schema information is often blended with data values
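A minimal sketch of these properties, assuming JSON as the semi-structured format. The three contact records are made up for illustration: each one carries its own labels, and no two records share the same set of fields.

```python
import json

# Hypothetical records: the labels travel with the values,
# and the structure differs from record to record.
RAW_RECORDS = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ben", "phone": "555-0100", "tags": ["vip"]},
  {"name": "Chen", "address": {"city": "Pune"}}
]
"""


def summarize_fields(raw):
    """Count how many records carry each top-level label."""
    records = json.loads(raw)
    counts = {}
    for record in records:
        for label in record:
            counts[label] = counts.get(label, 0) + 1
    return counts


field_counts = summarize_fields(RAW_RECORDS)

if __name__ == "__main__":
    for label, count in sorted(field_counts.items()):
        print(f"{label}: present in {count} of 3 records")
```

Only the name label appears in every record, so the schema has to be discovered from the data itself instead of being declared up front as in a relational table.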
Unstructured data
• Images
• Free-form text
• Audio
• Videos
• Body of email
• Text messages
• Chats
• Social media data
• Word documents
Issues with terminology – Unstructured Data
Data Mining
What is big data?
• “Every day, we create 2.5 quintillion bytes of data, so
much that 90% of the data in the world today has been
created in the last two years alone.”
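To put the quoted figure in more familiar units, here is a small conversion sketch. It assumes the short-scale meaning of quintillion (10^18) and decimal SI prefixes; the helper name to_unit is introduced only for this example.

```python
# 2.5 quintillion bytes, with quintillion = 10**18 (short scale, assumed).
BYTES_PER_DAY = 25 * 10**17

# Decimal (SI) storage units.
UNITS = {
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def to_unit(num_bytes, unit):
    """Express a byte count in the given SI unit."""
    return num_bytes / UNITS[unit]


if __name__ == "__main__":
    for unit in ("PB", "EB", "ZB"):
        print(f"{to_unit(BYTES_PER_DAY, unit):g} {unit} per day")
```

Under these assumptions the quoted volume corresponds to 2.5 exabytes, or 2,500 petabytes, per day.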
Definition of Big Data
Big Data is high-volume, high-velocity, and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making.
Source: Gartner IT Glossary
Key elements of the definition:
• High-volume, high-velocity, high-variety
• Cost-effective, innovative forms of information processing
• Enhanced insight & decision making
Big data spans three dimensions: Volume, Velocity and Variety
• Velocity: Often time-sensitive, data must be analyzed as it’s
streaming in to maximize its value to patient care (e.g. patient monitoring)
• Variety: Structured and unstructured data: clinical notes, audio
transcription, imaging, click streams
• Volume: Volume in petabytes. Electronic medical records, images,
digital pathology, email, web communications
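The velocity point, analyzing data as it streams in, can be sketched with a rolling-window check over a stream of readings. The heart-rate values, the window size, and the alert threshold below are hypothetical and are not clinical guidance.

```python
from collections import deque


def monitor(readings, window=3, limit=120.0):
    """Return the indexes at which the rolling mean exceeds the limit.

    Each reading is processed once, as it arrives, and only the last
    `window` values are kept in memory.
    """
    recent = deque(maxlen=window)  # old readings fall off automatically
    alerts = []
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            alerts.append(index)
    return alerts


# Hypothetical heart-rate stream in beats per minute.
alerts = monitor([80, 85, 130, 135, 140, 90, 85])

if __name__ == "__main__":
    print("alert at reading indexes:", alerts)
```

Because the function never needs the full history, the same loop would work on an unbounded feed, which is the property that matters when value depends on reacting while the data is still arriving.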
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
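The volume figures above imply a yearly growth rate, which the following sketch works out. It assumes constant compound growth between the two endpoints, which is a simplification; the function name growth_summary is introduced only for this example.

```python
import math


def growth_summary(start_zb, end_zb, start_year, end_year):
    """Return (overall factor, annual growth rate, doubling time in years).

    Assumes the same compound growth rate applies in every year.
    """
    years = end_year - start_year
    factor = end_zb / start_zb
    annual_rate = factor ** (1 / years) - 1
    doubling_years = math.log(2) / math.log(1 + annual_rate)
    return factor, annual_rate, doubling_years


factor, annual_rate, doubling_years = growth_summary(0.8, 35.0, 2009, 2020)

if __name__ == "__main__":
    print(f"overall growth: {factor:.2f}x")
    print(f"annual growth:  {annual_rate:.1%}")
    print(f"doubling time:  {doubling_years:.2f} years")
```

With these inputs the overall factor is 43.75, which the slide rounds to 44x. Under the constant-rate assumption that corresponds to roughly 41% growth per year, or a doubling about every two years.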
Big Data Characteristics
How big is the Big Data?
- What is big today may not be big tomorrow
Variety
Velocity
• Volatility
• Variability
Sources of Big Data
Who’s Generating Big Data
Mobile devices
(tracking all objects all the time)
Challenges with Big Data
Capture
Storage
Curation
Analysis
Transfer
Visualization
Privacy Violations
Challenges in Handling Big Data
Why Big Data?
More Data
Diagram components:
• Operational Systems
• ERP
• CRM
• Data Warehouse
• Data Marts
• OLAP
• Reporting / Dashboarding
• Hadoop
• MapReduce
• Images and Videos
• Social Media (Twitter, Facebook, etc.)