Big Data Ibm 2014

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 33

Vikas K Manoria

Technical Consultant – Big Data & Analytics


vmanoria@in.ibm.com

Overview - Big Data & Analytics

1 © 2013 IBM Corporation


IBM Big Data & Analytics

Agenda
What is Big Data?
– Concepts
– Characteristics

Business Motivation
– Big Data Challenges
– How Big Data Impacts Every Aspect of Your Business
– A Big Data Journey

IBM Big Data Platform


– InfoSphere Data Explorer
– InfoSphere BigInsights
– IBM PureData Systems, InfoSphere Warehouse
– InfoSphere Streams

Big Data Use Cases

Get Started

2 © 2013 IBM Corporation


IBM Big Data & Analytics

What is Big Data?


All kinds of data
– Large volumes
– Valuable insight, but difficult to extract
– May be extremely time sensitive
Big Data is a Hot Topic Because Technology Makes it Possible to Analyze
ALL Available Data

“Big data technologies describe a new generation of technologies and architectures, designed to
economically extract value from very large volumes of a wide variety of data, by enabling
high velocity capture, discovery and/or analysis.”
Source: Matt Eastwood, IDC
3 © 2013 IBM Corporation
IBM Big Data & Analytics

Characteristics of Big Data


V4 = Volume Velocity Variety Veracity

Cost efficiently Responding to the Collectively analyzing


processing the increasing Velocity the broadening Variety
growing Volume

50x 30 Billion
35 ZB RFID 80% of the
sensors and worlds data is
counting unstructured

2010 2020

Establishing the 1 in 3 business leaders don’t trust


Veracity of big the information they use to make
data sources decisions

4 © 2013 IBM Corporation


IBM Big Data & Analytics

Information is at the Center … And Organizations


of a New Wave of Opportunity… Need Deeper Insights

44x 2020
Business leaders frequently make
as much Data and Content
Over Coming Decade
35 zettabytes
1 in 3 decisions based on information they
don’t trust, or don’t have

1 in 2 Business leaders say they don’t


have access to the information they
need to do their jobs

of CIOs cited “Business

83% intelligence and analytics” as part

2009
800,000 petabytes
80%
Of world’s data
of their visionary plans
to enhance competitiveness

is unstructured of CEOs need to do a better job

60% capturing and understanding


information rapidly in order to
make swift business decisions

5 © 2013 IBM Corporation


IBM Big Data & Analytics

Merging the Traditional and Big Data Approaches


Traditional Approach Big Data Approach
Structured & Repeatable Analysis Iterative & Exploratory Analysis

IT
Business Users
Delivers a platform to
Determine what enable creative
question to ask discovery

IT Business
Structures the Explores what
data to answer questions could be
that question asked

Monthly sales reports Brand sentiment


Profitability analysis Product strategy
Customer surveys Maximum asset utilization

6 © 2013 IBM Corporation


IBM Big Data & Analytics

Imagine the Possibilities of Harnessing Your Data Resources

Big data challenges exist in every organization today


Government cuts acoustic Utility avoids power Hospital analyses streaming
analysis from hours to failures by analyzing vitals to detect illness
70 Milliseconds 10 PB of data in minutes 24 hours earlier

Retailer reduces time to Stock Exchange cuts Telco analyses streaming


run queries by 80% to queries from 26 hours to network data to reduce
optimize inventory 2 minutes on 2 PB hardware costs by 90%

7 © 2013 IBM Corporation


IBM Big Data & Analytics

Leveraging Big Data Requires Multiple Platform Capabilities

Understand and Navigate Federated Discovery


Federated Big Data Sources and Navigation

Manage and Store Huge Hadoop File System


Volume of any Data MapReduce

Structure and Control Data Data Warehousing

Manage Streaming Data Stream Computing

Analyze Unstructured Data Text Analytics Engine

Integrate and Govern Integration, Data Quality,


all Data Sources Security, ILM, MDM

8
IBM Big Data & Analytics

IBM’s Business-centric Big Data Platform


Enables you to start with a critical business needs and expand the
foundation for future requirements
“Big data” isn’t just a
technology— it’s a business
strategy for capitalizing on
information resources

Getting started is crucial

Success at each entry point


is accelerated by products
within the big data platform

Build the foundation for


future requirements by
expanding further into the big
data platform

9 © 2013 IBM Corporation


IBM Big Data & Analytics

A Big Data Journey:


Anticipating and Improving Customer Interactions
Project 1: Big Data Foundation
• Financial and -Data Warehousing, Data Quality, Customer Data Hub
tax -Single view of the customer
preparation
software and
services Project 2: Analytics
-Customer behavior and segmentation analysis
• $4.15B rev -Reduced customer churn 10%
2012 -$10M new revenue in 12months

Project 3: Unstructured Data Analytics


-Social media analysis, Log Analysis, Text Analytics
-Augment customer profiles with new data sources
-Data warehouse cost optimization
-Data Exploration

Project 4: Real Time Analytics


-No latency analytics
-Real time behavior prediction
-Real time customer segmentation
10
IBM Big Data & Analytics

Big Data Platform and Application Frameworks


Speed time to
Solutions value with analytic
and application
Analytics and Decision Management accelerators
Gather, extract
and explore data
using best of IBM Big Data Platform
breed Analyze streaming
visualization data and large
Visualization Applications & Systems
& Discovery Development Management
data bursts for
real-time insights

Cost-effectively
analyze
Accelerators
Petabytes of Index and
structured and federated
unstructured Hadoop Stream Data Contextual discovery for
information System Computing Warehouse Discovery contextual
collaborative
insights

Govern data
quality and
manage Deliver deep insight
information with advanced
Information Integration & Governance
lifecycle in-database
analytics and
operational analytics

Big Data Infrastructure


Cloud | Mobile | Security
IBM Big Data & Analytics

An example of the big data platform in practice


Ingestion and Real-time Analytic Zone Analytics and
Streams Reporting Zone

Warehousing Zone

BI &
Reporting

Enterprise
Warehouse
Connectors

Predictive
Analytics
Hadoop

MapReduce Hive/HBase Data Marts


Col Stores
Visualization
& Discovery

Documents
in variety of formats ETL, MDM, Data Governance

Landing and Analytics Sandbox Zone Metadata and Governance Zone


12
IBM Big Data & Analytics

TECHNOLOGY

Example: Integrate big data sources with


enterprise data
IBM Business Analytics

Cognos
SPSS Cognos Cognos Cognos Consumer
Modeler RTM BI Insight Insight
Predictive Real-time Reporting / Analysis Export and Social Media
Analytics Dashboards Explore Analysis

Data In-Motion Data At-Rest

PureData InfoSphere
Systems BigInsights

IBM Big Data Platform Other Sources


IBM Big Data & Analytics

Big Data Key Use Cases:

Big Data Exploration Enhanced 360o View Security/Intelligence


Find, visualize, understand of the Customer Extension
all big data to improve Extend existing customer Lower risk, detect fraud
decision making views (MDM, CRM, etc) by and monitor cyber security
incorporating additional in real-time
internal and external
information sources

Operations Analysis Data Warehouse Augmentation


Analyze a variety of machine Integrate big data and data warehouse
data for improved business results capabilities to increase operational efficiency
14 © 2013 IBM Corporation
IBM Big Data & Analytics

Big Difference: Schema on Run

Regular database Big Data (Hadoop)


– Schema on load – Schema on run

Raw data
Raw data

Schema Storage
to filter (unfiltered,
raw data)

Schema
to filter

Storage
(pre-filtered data) Output

15 © 2013 IBM Corporation


IBM Big Data & Analytics
IBM Big Data & Analytics

BigInsights Enterprise Edition Open Source IBM

DB export
Optional Analytics and discovery “Apps” Administrative and
IBM and Text Web Crawler
DB import
development tools
Accelerator for
partner processing
engine and
social data
analysis Ad hoc query
offerings library Boardreader
Web console
BigSheets Accelerator for Machine
Distrib file copy learning
machine data
analysis • Monitor cluster health, jobs, etc.
... Data
• Add / remove nodes
processing
• Start / stop services
• Inspect job status
• Inspect workflow status
Infrastructure ZooKeeper Jaql Pig • Deploy applications
• Launch apps / jobs
Integrated • Work with distrib file system
Enhanced Oozie HBase Hive
installer security •Work with spreadsheet interface
•Support REST-based API
•...
Adaptive
Text compression Indexing Lucene MapReduce
MapReduce

Flexible HCatalog HDFS


scheduler
GPFS (EAP)
Eclipse tools
• Text analytics
Connectivity and Integration Streams • MapReduce programming
• Jaql, Hive, Pig development
• BigSheets plug-in development
JDBC Sqoop DB2 Netezza R • Oozie workflow generation

Flume Data Explorer Guardium DataStage Cognos BI


IBM Big Data & Analytics

Stream Computing Represents a Paradigm Shift


Traditional Computing Stream Computing

Historical fact finding Current fact finding

Find and analyze information stored on disk Analyze data in motion – before it is stored

Batch paradigm, pull model Low latency paradigm, push model

Query-driven: submits queries to static data Data driven – bring data to the analytics

Real-time
Analytics
18
IBM Big Data & Analytics

Big Data in real-time with InfoSphere Streams

Filter / Sample
Modify Annotate

Analyze

Fuse
Classify

Windowed
Score Aggregates
IBM Big Data & Analytics

Analytic Accelerators Designed for Velocity (and Variety)

Mining in Microseconds Acoustic


(included with Streams) (IBM Research)
(Open Source)

Simple & Advanced Text Advanced


Text
(listen, verb), (included with Streams) Mathematical
(radio, noun) (IBM Research) Models
(Open Source UIMA) (IBM Research)

Statistics
Predictive
(IBM Research)
∑ R(s , a )
population
t t (included with
Streams)
Image & Video
Geospatial (Open Source)
(IBM Research)

20
IBM Big Data & Analytics

Putting it all together …end-to-end big data solution

Discover
Netezza
Appliance Model
Visualize
& Publish

InfoSphere
Warehouse IBM SPSS

InfoSphere
InfoSphere Streams
BigInsights
IBM Cognos
Measure
Score

Streaming Data
Sources

21
IBM Big Data & Analytics
IBM Big Data & Analytics
IBM Big Data & Analytics
IBM Big Data & Analytics

Cognos Business Intelligence optimized for Big SQL

Big SQL enables the Cognos BI Cognos BI Server


server to delegate many types of
analytical computations to Explore &
Report & Act
Analyze
BigInsights MapReduce
processing instead of computing
SQL
them locally at a performance Interface
cost like it would do with Hive via JDBC

Application
(Map-Reduce)
Faster response times due to
Storage
increased opportunity for query (HBase, HDFS)

processing to occur closer to the InfoSphere BigInsights


data

Not hindered by the latency and


other limitations of querying
Hadoop via Hive
Hive
IBM Big Data & Analytics

Performance – Cognos BI + DB2 BLU

Faster cube load*


38x Average
Acceleration
Of database queries
for reporting2

Faster DB Query*
Cognos BI
+ Dynamic
Dynamic
Dynamic
Cubes
Cubes Compatible

DB2 BLU Query Query

Dynamic
Dynamic Cubes
Query
Compatible
Query
+ C1 C2 C3 C4 C5 C6 C7 C8

Power DB2 with BLU

2. Based on internal tests.


IBM Big Data & Analytics

IBM PureData Systems


Meeting Big Data Challenges – Fast and Easy!

For apps like Real-time Fraud Detection…


Operational data warehouse services optimized to
System for Operational Analytics
balance high performance analytics and real-time
operational throughput

For apps like Customer Analysis…


System for Analytics Data warehouse services optimized for
high-speed, peta-scale analytics and simplicity

For Exploratory Analysis & Queryable Archive


System for Hadoop
Hadoop data services optimized for big data analytics
and online archive with appliance simplicity

For apps like E-commerce…


System for Transactions
Database cluster services optimized for
transactional throughput and scalability
IBM Big Data & Analytics

Use Cases for a Big Data Platform


Know Everything about your Customer
Social media customer sentiment analysis
Innovate New Products
Promotion optimization at Speed and Scale
Segmentation Social Media - Product/brand Sentiment
Customer profitability analysis
Click-stream analysis Brand strategy
CDR processing Market analysis
Multi-channel interaction analysis RFID tracking & analysis
Loyalty program analytics Transaction analysis to create insight-
Churn prediction based product/service offerings

Run Zero Latency Instant Awareness of


Operations Risk and Fraud
Multimodal surveillance
Smart Grid/meter management
Cyber security
Distribution load forecasting
Fraud modeling & detection
Sales reporting
Risk modeling & management
Inventory & merchandising optimization
Regulatory reporting
Options trading
ICU patient monitoring
Disease surveillance
Transportation network optimization
Store performance Exploit Instrumented Assets
Environmental analysis
Experimental research Network analytics
Asset management and predictive issue resolution
Website analytics
IT log analysis

28 © 2013 IBM Corporation


IBM Big Data & Analytics

Every Industry can Leverage Big Data and Analytics.

Energy & Media &


Banking Insurance Telco Entertainment
Utilities
• Optimizing Offers and • 360˚ View of Domain • Pro-active Call Center • Smart Meter Analytics • Business process
Cross-sell or Subject • Network Analytics • Distribution Load transformation
• Customer Service and • Catastrophe Modeling Forecasting/Scheduling • Audience & Marketing
• Location Based
Call Center Efficiency • Fraud & Abuse Services • Condition Based Optimization
Maintenance

Travel & Consumer


Retail Government Healthcare
Transport Products
• Actionable Customer • Customer Analytics & • Shelf Availability • Civilian Services • Measure & Act on
Insight Loyalty Marketing • Promotional Spend • Defense & Intelligence Population Health
• Merchandise • Predictive Maintenance Optimization Outcomes
• Tax & Treasury Services
Optimization Analytics • Merchandising • Engage Consumers in
• Dynamic Pricing Compliance their Healthcare

Automotive Chemical & Aerospace & Electronics Life Sciences


Petroleum Defense
• Advanced Condition • Operational Surveillance, • Uniform Information • Customer/ Channel • Increase visibility into
Monitoring Analysis & Optimization Access Platform Analytics drug safety and
• Data Warehouse • Data Warehouse • Data Warehouse • Advanced Condition effectiveness
Optimization Consolidation, Integration Optimization Monitoring
& Augmentation

29
IBM Big Data & Analytics

Clients Achieve Breakthrough Outcomes With IBM’s Big Data Platform

Imperative Primary Capability Business Value

Provide single point of access to


Aircraft Secure single point InfoSphere Data disparate data sources
of access to all Explorer
Manufacturer enterprise data

Reduce maintenance costs


Run Zero Latency InfoSphere and differentiate by optimal
Operations BigInsights turbine placement

Know Everything Analyzed call records to drive


about your InfoSphere real-time promotions &
Customers Streams reduce churn

Increased network
Exploit Instrumented availability by identifying
PureData for
Assets and fixing holes
Analytics

Analysis time on 2 PB of
Instant Awareness of PureData for data cut from 26 hours to 2
Risk and Fraud Analytics minutes

30 © 2013 IBM Corporation


IBM Big Data & Analytics

A Catalyst for ISV and Partner Innovation


Traditional Approach Transformational Outcomes
Combining data from hundreds of
Managing rising cost of care hospitals to improve results across
the healthcare continuum

Historical analysis of 2 million events analyzed


per minute, delivering real-time
subscriber data insight to mobile operators

Customer segmentation based Capturing information from all


interactions to improve customer
on loyalty data lifetime value

Anti-corruption and bribery Use Big Data analytics to prioritize


compliance program and isolate areas of risk or rogue
activity

Manual supply chain Provide visibility, analysis and


reporting across the entire supply
integration chain (planning -> execution)
Measure and predict patient
Treat-first, seek-payment-later payment behavior, reduce risk from
and write off bad debt bad debt and boost collection rates
Analyzing parking systems to
Random parking meter patrols maximize revenue & improve the
& search for open spots parking experience in cities
31
IBM Big Data & Analytics

Get started!

Execute and
Think Big Pick your Spot
Deliver Value

New
New insights
insights and
and Identify
Identify and
and prioritize
prioritize Evolve
Evolve your
your existing
existing
new possibilities
new possibilities business
business use
use cases
cases analytics
analytics capabilities
capabilities

Process
Process and
and performance
performance Ensure
Ensure that
that the
the business
business Build
Build or
or acquire
acquire new
new
improvement
improvement is
is engaged
engaged skills
skills required
required

New
New revenue
revenue Agree
Agree on
on the
the key
key Measure
Measure and
and
opportunities
opportunities communicate
measures
measures for
for success
success communicate success
success
IBM Big Data & Analytics

Thank You

April 24, 2014 © 2013 IBM Corporation

You might also like