Chapter 1
INTRODUCTION TO BIG DATA ANALYTICS
Dr G Sudha Sadasivam
• Introduction – Big Data & Characteristics, Cloud Computing, Virtualisation, Data Center Architecture
• Data Analytics Life Cycle & Hadoop, MapReduce
• NoSQL Stores – Key-Value Stores – Columnar Stores – Document Stores – Graph Databases
• Theory & Methods – Clustering, Classification, Regression – Stream Analytics – Spark
CA
• Test 1 – Units 1 and 2
• Test 2 – Units 3 and 4
• Assignment Presentation
– Hadoop
– MongoDB
– Neo4j
– Storm / Spark
• Tutorial 1 & 2
– Analysis questions
Agenda
• Current generation data
• Data Science and related fields
• Big Data Characteristics
• Traditional vs Big Data Systems
• Composition of Data Science Team
• Architecture of a Big Data System with a Use Case
• Big Data Use Cases
Introduction
• Social media, machine logs and transaction data, which include weblogs, text, videos, images and sensor readings, create large volumes of heterogeneous, real-time data.
• Organizing, managing, analyzing and disseminating this priceless data, and discovering knowledge from it, aid decision making in business.
• This requires large-scale computing infrastructure and sophisticated storage systems.
• Data is the raw material of the current industrial revolution, Industry 4.0.
Sales of Cakes
[Chart: monthly cake sales for Jan, Feb, Mar, Apr, May, Nov and Dec]
Data – Numbers
Information – More sales in Jan, Nov, Dec
Knowledge – Connect the sales peaks with a community festival
Action – Advertise to the community
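The Data, Information, Knowledge, Action chain above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sales figures are hypothetical, the festival calendar `FESTIVAL_MONTHS` (with placeholder names "Festival A/B/C") is invented, and "sales above the monthly average" is just one simple rule for turning raw numbers into information.

```python
# Hypothetical monthly cake sales in units (illustrative values, not taken from the slide).
SALES = {"Jan": 210, "Feb": 90, "Mar": 100, "Apr": 110, "May": 95, "Nov": 190, "Dec": 260}

# Hypothetical festival calendar: background knowledge about the community.
FESTIVAL_MONTHS = {"Jan": "Festival A", "Nov": "Festival B", "Dec": "Festival C"}


def to_information(sales: dict[str, int]) -> list[str]:
    """Data -> Information: months whose sales exceed the monthly average."""
    average = sum(sales.values()) / len(sales)
    return [month for month, units in sales.items() if units > average]


def to_knowledge(peak_months: list[str], calendar: dict[str, str]) -> dict[str, str]:
    """Information -> Knowledge: link each peak month to a known festival, if any."""
    return {month: calendar[month] for month in peak_months if month in calendar}


def to_action(links: dict[str, str]) -> list[str]:
    """Knowledge -> Action: one advertising suggestion per explained peak."""
    return [f"Advertise to the community ahead of {fest} ({month})" for month, fest in links.items()]


if __name__ == "__main__":
    peaks = to_information(SALES)              # ['Jan', 'Nov', 'Dec']
    links = to_knowledge(peaks, FESTIVAL_MONTHS)
    for suggestion in to_action(links):
        print(suggestion)
```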
Challenges & 3 Vs
[Figure: the 3 Vs of Big Data around a central Value. Volume axis: MB -> GB -> TB -> PB; Velocity axis: Batch -> Periodic -> Near Real Time -> Real Time; Variety axis: Structured -> Semi-structured -> Unstructured, with example sources such as salary systems, web, social media, audio, video, photos, sensors and mobile.]
• Volume (Terabytes -> Zettabytes) – Massive and growing amounts of information residing internal and external to the organization
• Variety (Structured -> Semi-structured -> Unstructured) – Unconventional semi-structured or unstructured (diverse) data, including web pages, log files, social media, click-streams, instant messages and data from active and passive systems, etc.
• Velocity (Batch -> Streaming Data) – Changing information
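Moving along the Velocity axis from batch to streaming changes how even a simple statistic is computed. The sketch below, with hypothetical function names and made-up sensor readings, contrasts a batch mean, computed once over the complete dataset, with a streaming mean updated incrementally as each event arrives. It illustrates the idea only and does not use any particular streaming framework's API.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch: wait until the whole dataset is available, then compute once."""
    return sum(values) / len(values)


def streaming_mean(events: Iterable[float]) -> Iterator[float]:
    """Streaming: update a running mean as each event arrives, keeping O(1) state."""
    count = 0
    mean = 0.0
    for x in events:
        count += 1
        mean += (x - mean) / count  # incremental update: new_mean = old_mean + (x - old_mean) / n
        yield mean


if __name__ == "__main__":
    readings = [20.0, 22.0, 27.0]             # hypothetical sensor readings
    print(batch_mean(readings))               # 23.0, available only after all data arrives
    print(list(streaming_mean(readings)))     # [20.0, 21.0, 23.0], an answer after every event
```

The streaming version never stores past readings, which is what makes it suitable for unbounded, fast-arriving data.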
Database vs Data Science:
– querying the past vs predicting the future
– ACID vs CAP
– SQL vs NoSQL (schemaless)
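The SQL vs NoSQL (schemaless) contrast can be illustrated with Python's built-in `sqlite3` module for the fixed-schema side and a plain list of dictionaries standing in for a document store such as MongoDB. The list-based collection and the `find` helper are hypothetical simplifications, not MongoDB's API, and the customer names and fields are made up.

```python
import sqlite3

# SQL side: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Asha"))


def sql_accepts_new_field() -> bool:
    """Return True if SQLite accepts a row with a column that is not in the schema."""
    try:
        conn.execute(
            "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
            (2, "Ravi", "Chennai"),
        )
        return True
    except sqlite3.OperationalError:  # e.g. "table customers has no column named city"
        return False


# Schemaless side: each record is a self-describing document, so records in the
# same (hypothetical) collection may carry different fields without any migration.
collection: list[dict] = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi", "city": "Chennai", "tags": ["festival-buyer"]},
]


def find(docs: list[dict], **criteria) -> list[dict]:
    """Hypothetical helper: documents whose fields equal all given key=value criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(sql_accepts_new_field())             # False: the fixed schema rejects 'city'
    print(find(collection, city="Chennai"))
```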
Data miners sort through huge datasets using sophisticated
software to identify undiscovered patterns and establish
hidden relationships.
Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher.
• Data analytics uses tools and techniques to discover knowledge from hidden patterns in data, to make predictions and to take effective action
– Descriptive analytics – analysis of the data describes the patterns it contains
– Diagnostic analytics – the knowledge discovered helps explain why the patterns occur
– Predictive analytics – the knowledge discovered is used to forecast future trends
– Prescriptive analytics – the knowledge discovered is used to suggest actions to take in the future
– Cognitive analytics – the knowledge discovered causes something to happen
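Three of the analytics types listed above can be sketched on a single sales series: descriptive (summary statistics), predictive (an ordinary least-squares linear trend extrapolated one month ahead) and prescriptive (a rule that turns the forecast into an action). The sales figures, the capacity threshold and the decision rule are hypothetical, and a linear trend is only one simple choice of forecasting model.

```python
from statistics import mean

# Hypothetical sales for six consecutive months (illustrative values only).
SALES = [100, 110, 125, 130, 145, 150]


def describe(series: list[float]) -> dict[str, float]:
    """Descriptive: summarise what happened."""
    return {"mean": mean(series), "min": min(series), "max": max(series)}


def linear_trend(series: list[float]) -> tuple[float, float]:
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = mean(series)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_bar - b * t_bar
    return a, b


def predict_next(series: list[float]) -> float:
    """Predictive: extrapolate the fitted trend one month ahead (t = n)."""
    a, b = linear_trend(series)
    return a + b * len(series)


def prescribe(forecast: float, capacity: float) -> str:
    """Prescriptive: turn the forecast into a suggested action (hypothetical rule)."""
    return "increase production" if forecast > capacity else "keep production unchanged"


if __name__ == "__main__":
    print(describe(SALES))
    forecast = predict_next(SALES)
    print(round(forecast, 1))                  # 162.7
    print(prescribe(forecast, capacity=150))   # increase production
```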
2. Diagnostic Analytics – Why are products sold?
Month  Jan  Feb  Mar  Apr  May  Nov  Dec
Sales  230  120  100  130  140  180  250
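Diagnostic analytics asks why a pattern occurs. One simple way to probe the festival explanation is to compare average sales in festival and non-festival months and to compute the Pearson correlation between sales and a 0/1 festival indicator. The figures and festival flags below are hypothetical, and a strong correlation only suggests, rather than proves, that festivals drive the peaks.

```python
from math import sqrt

# Hypothetical data: monthly cake sales and whether a community festival fell in that month.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Nov", "Dec"]
SALES = [210, 90, 100, 110, 95, 190, 260]
FESTIVAL = [1, 0, 0, 0, 0, 1, 1]  # 1 = festival month (assumed, not from the slide)


def group_means(values: list[float], flags: list[int]) -> tuple[float, float]:
    """Average of values in flagged (festival) months vs unflagged months."""
    on = [v for v, f in zip(values, flags) if f]
    off = [v for v, f in zip(values, flags) if not f]
    return sum(on) / len(on), sum(off) / len(off)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    print([m for m, f in zip(MONTHS, FESTIVAL) if f])  # festival months
    print(group_means(SALES, FESTIVAL))                 # (220.0, 98.75)
    print(round(pearson(FESTIVAL, SALES), 2))
```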
Complexity of different types of Analytics