
Big Data Analytics

Presented by:
Tanushree Soni
Himanshu Wagh
Sanskruti Chandorkar
What is Big Data?
“Big Data” exceeds the capacity of traditional analytics and information management
paradigms across what is known as the 4 V’s: Volume, Variety, Velocity, and Veracity

Veracity: Uncertainty of Data
With exponential increases of data from unfiltered and constantly flowing data sources, data quality often suffers and new methods must find ways to “sift” through junk to find meaning.

Velocity: Analysis of Streaming Data
The speed at which data is generated and used. New data is being created every second and in some cases it may need to be analyzed just as quickly.

Variety: Different Forms of Data
Represents the diversity of the data. Data sets will vary by type (e.g. social networking, media, text) and they will vary in how well they are structured.

Volume: Scale of Data
Reflects the size of a data set. New information is generated daily and in some cases hourly, creating data sets that are measured in terabytes and petabytes.

PwC 2
The Promise of Big Data
Even more important than its definition is what Big Data promises to achieve:
intelligence in the moment.
Traditional Techniques & Issues vs. Big Data Differentiators

Veracity
Traditional techniques and issues:
• Does not account for biases, noise and abnormality in data
Big Data differentiators:
• Data is stored and mined in ways meaningful to the problem being analyzed
• Keeps data clean, with processes to keep ‘dirty data’ from accumulating in your systems

Velocity
Traditional techniques and issues:
• No real-time analysis
Big Data differentiators (in real time):
• Dynamically analyze data
• Consistently integrate new information
• Automatically deletes unwanted data to ensure optimal storage

Variety
Traditional techniques and issues:
• Compatibility issues
• Advanced analytics struggle with non-numerical data
Big Data differentiators:
• Frameworks accommodate varying data types and data models
• Insightful analysis with very few parameters

Volume
Traditional techniques and issues:
• Analysis is limited to small data sets
• Analyzing large data sets = High Costs & High Memory
Big Data differentiators:
• Scalable for huge amounts of multi-sourced data
• Facilitation of massively parallel processing
• Low-cost data storage
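
To make the Velocity row concrete, here is a minimal Python sketch of analyzing data as it arrives rather than after it has all been stored. It is only an illustration, not a method from the slides: the class name `RunningStats`, the function `analyze_stream`, and the sample readings are hypothetical, and the update rule is Welford's incremental mean and variance, chosen here as one common way to integrate each new value without re-reading old ones.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and variance updated one value at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Fold one new reading into the statistics; no history is kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until two values have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def analyze_stream(readings):
    """Yield (count, mean) after every reading, as a stream consumer would."""
    stats = RunningStats()
    for value in readings:
        stats.update(value)
        yield stats.count, stats.mean


if __name__ == "__main__":
    # Hypothetical sensor readings standing in for a live feed.
    for count, mean in analyze_stream([2.0, 4.0, 6.0]):
        print(f"after {count} readings: mean = {mean}")
```

Because each reading is folded in and then discarded, memory use stays constant no matter how long the stream runs, which is the property the "no real-time analysis" limitation lacks.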

Types of Big Data
Variety is the most distinctive aspect of Big Data. New technologies and new types of data
have driven much of the evolution around Big Data.

Social Media: Twitter, LinkedIn, Facebook, Tumblr, Blog, SlideShare, YouTube, Google+, Instagram, Flickr, Pinterest, Vimeo, WordPress, IM, RSS, Review, Chatter, Jive, Yammer, etc.

Media: Images, videos, audio, Flash, live streams, podcasts, etc.

Docs: XLS, PDF, CSV, email, Word, PPT, HTML, HTML 5, plain text, XML, JSON, etc.

Sensor Data: Medical devices, smart electric meters, car sensors, road cameras, satellites, traffic recording devices, processors found within vehicles, video games, cable boxes, assembly lines, office buildings, cell towers, jet engines, air conditioning units, refrigerators, trucks, farm machinery, etc.

Public Web: Government, weather, competitive, traffic, regulatory, compliance, health care services, economic, census, public finance, stock, OSINT, the World Bank, SEC/Edgar, Wikipedia, IMDb, etc.

Machine Log Data: Event logs, server data, application logs, business process logs, audit logs, call detail records (CDRs), mobile location, mobile app usage, clickstream data, etc.

Archive: Archives of scanned documents, statements, insurance forms, medical records and customer correspondence, paper archives, and print stream files that contain original systems of record between organizations and their customers.

Business Apps: Project management, marketing automation, productivity, CRM, ERP, content management system, HR, storage, talent management, procurement, expense management, Google Docs, intranets, portals, etc.
Not to be confused with…

Big Data: Structured, semi-structured or unstructured information distinguished by one or more of the four “V”s: Veracity, Velocity, Variety, Volume.

Open Data: Public, freely available data.

Crowdsourced Data: Data collected through contributions from a large number of individuals.

Graphic and definitions based on “Big Data in Action for Development,” World Bank, worldbank.org

It’s not just about the data…
It is important to understand the distinction between Big Data sets (large, unstructured,
fast, and uncertain data) and ‘Big Data Analytics’.

Big Data + Big Data Analytics

Big Data: Refers to the DATA only
Big Data Analytics: Methods of using Big Data to generate insight

1. Machine Learning/Deep Learning
• Leveraging a computer’s ability to learn without being explicitly programmed to solve business problems

2. IoT (Internet of Things) & Sensor Analytics
• Understanding value drivers from the ever-growing network of connected physical objects and the communication between them

3. Modeling Willingness-to-Pay
• Mining product reviews to estimate willingness-to-pay for product features

4. Natural Language Processing
• Understanding human speech as it is spoken through application of computer science, AI, and computational linguistics

5. Analyzing Data @ Scale
• Using distributed computing and machine learning tools to analyze hundreds of gigabytes of data

6. Creating a Streaming Consumer Behavior Data Lake
• Mining social data in real time to understand when and where consumers are making choices
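
The idea behind "Analyzing Data @ Scale" can be sketched in a few lines of Python: split the data into chunks, summarize each chunk independently, then combine the partial results. This is a toy illustration under stated assumptions, not the tooling the slides refer to. The function names (`chunked`, `summarize_chunk`, `parallel_mean`) are hypothetical, and a local thread pool stands in for a real distributed cluster, so it shows the split/combine pattern rather than a genuine speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def summarize_chunk(chunk):
    """'Map' step: reduce one chunk to a small partial result (sum, count)."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean from per-chunk partial results.

    Each chunk is summarized independently, so in a real cluster the
    chunks could live on different machines. Here a thread pool is used
    purely as a stand-in for those workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunked(values, chunk_size)))
    # 'Reduce' step: combine the partial sums and counts.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else 0.0


if __name__ == "__main__":
    data = list(range(1, 101))  # hypothetical data set
    print(parallel_mean(data, chunk_size=10))
```

The key design point is that only the small `(sum, count)` pairs travel between workers, never the raw data, which is what lets this pattern scale to data sets far larger than one machine's memory.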

Data Mining, Text Mining, and Natural Language Processing
What are they and how are they used?

Data Mining: Extraction of implicit, previously unknown, and potentially useful information from data.

Text Mining: Analysis of large quantities of natural language text and detecting lexical or linguistic usage patterns to extract probably useful information.

Natural Language Processing: NLP is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications.
Source: Text Mining, Ian Witten, 2004
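
As a rough illustration of the text mining definition above (detecting lexical usage patterns in natural language text), the following Python sketch counts the most frequent terms across a handful of documents. Everything in it is an assumption made for the example: the function names `tokenize` and `top_terms`, the tiny stopword list, and the sample product reviews are hypothetical, and real text mining would use far richer linguistic processing.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=3):
    """Return the n most frequent non-stopword terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical product reviews.
    reviews = [
        "The battery life is great",
        "Battery drains fast",
        "Great screen, great battery",
    ]
    print(top_terms(reviews, n=2))
```

Even this simple frequency count surfaces which product features reviewers mention most, which is the kind of lexical pattern the definition describes.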
