Key Trends and Emerging Technologies in Big Data Analytics

Rita Sallam

© 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in
any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on
gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness
or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research
organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a
discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its
shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these
firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information
on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."
"Not everything
Don't replace Commonthat counts
Sense and
can be
Experience counted,
With "Data Science"

and not everything that be


counted, counts."
Most of all data
is just noise

Photo Source: from Wikimedia Commons


© 2014 Gartner, Inc. and/or its affiliates. All rights reserved.
Key Issues

1. What are some of the key trends in data science?
2. What is happening at the cutting edge?
3. What challenges do organizations face, and how should they address them?


Key Issue 1: What are some of the key trends in data science?


Data Scientist: The Sexiest Job of the 21st Century
(Harvard Business Review, October 2012)
[Figure: job trends from Indeed.com]

Data science draws on statistics, machine learning, operations research, signal processing, linguistics, semantic technology and scientific computing.

Data Science: ... unified discipline that develops methodologies to utilize data for the purpose of monitoring, understanding, anticipating and controlling parts of the world.


Like Computer Scientists and Physicians, Data Scientists Will Specialize

Data science (advanced analytics) enables:
• Specialization by business process/industry: sales, supply chain, production; pharma, banking, telco, retail ...
• Specialization by data/aspect: "structured data," text, audio, social, image, speech, claims, loyalty, profitability, failures, pricing, fraud, security, quality, risk

Computer science (information technology) enables:
• Specialization by business process/industry: front-office, back-office, insurance, pharma, banking, telco, retail ...
• Specialization by aspect: security, databases, high-performance computing, application integration, software engineering, Web design, hardware, ERP, CRM


Data Science Drives All Kinds of Analytics Sophistication

From data to action, with increasing sophistication:
• Descriptive analytics: monitoring; a human makes the decision
• Diagnostic analytics: understanding; a human makes the decision
• Predictive analytics: anticipating; a human makes the decision
• Prescriptive analytics: decision support (a human makes the decision) or decision automation (controlling)

Example techniques: link analysis, clustering, decision trees, inversion, sensitivity analysis, design of experiments, optimization, decision management, next-best-action, neural nets, nearest neighbor, SVM, time series, Markov chain modeling ...
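Nearest neighbor is one of the predictive techniques named on this slide. A minimal sketch of a k-nearest-neighbor classifier in Python, assuming numeric feature vectors, Euclidean distance and a plurality vote among the k closest points; the function name `knn_predict`, the features and the labels are hypothetical illustrations, not taken from the deck.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Plurality vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customer data: (monthly_calls, complaints) -> outcome.
train = [
    ((2.0, 5.0), "churn"),
    ((1.0, 4.0), "churn"),
    ((3.0, 6.0), "churn"),
    ((20.0, 0.0), "stay"),
    ((18.0, 1.0), "stay"),
    ((25.0, 0.0), "stay"),
]

print(knn_predict(train, (2.5, 4.5)))
print(knn_predict(train, (22.0, 0.5)))
```

The choice of k trades sensitivity to individual noisy points (small k) against blurring the boundary between classes (large k).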


Data Science: Cycling Between Deductive and Inductive ...

Deductive/question-based:
• Do good: improve response rate, attract new customers, find the best route or schedule
• Avoid bad: fraud detection, avoid churn, quality defects, predict failures
• Monitor, diagnose, predict, understand

Inductive/drives understanding (data exploration):
• Binning, cleansing, slicing/dicing, transforming, filtering, sampling, visualizing
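The inductive exploration steps on this slide map onto simple data operations. A sketch in plain Python over hypothetical records (the field names, bin edges and sample size are assumptions, not from the deck) showing cleansing, transforming/binning, filtering, slicing/dicing and sampling:

```python
import random
from collections import defaultdict

# Hypothetical transaction records; field names are illustrative only.
records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": None},   # missing value
    {"region": "north", "amount": 35.5},
    {"region": "south", "amount": 980.0},
    {"region": "west", "amount": 410.0},
]

# Cleansing: drop records with a missing amount.
clean = [r for r in records if r["amount"] is not None]

# Binning: map an amount into a coarse band (edges are hypothetical).
def amount_bin(amount, edges=(100, 500)):
    if amount < edges[0]:
        return "low"
    if amount < edges[1]:
        return "medium"
    return "high"

# Transforming: add the derived band to each clean record.
for r in clean:
    r["band"] = amount_bin(r["amount"])

# Filtering: keep only the larger transactions.
large = [r for r in clean if r["amount"] >= 100]

# Slicing/dicing: total amount per region.
totals = defaultdict(float)
for r in clean:
    totals[r["region"]] += r["amount"]

# Sampling: a reproducible random sample of two records.
sample = random.Random(42).sample(clean, 2)

print(dict(totals))
```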


Advanced Analytics Market: Do More With Less

Tools, plotted by scope/breadth against the data science skills they require of users: C, C++, Java; R, Python, SAS, Matlab; advanced analytics platforms; data discovery vendors; text analytics; BI platforms; specialty analytics; packaged analytic applications; embedded analytics.

Users per million people in OECD nations (in parentheses: in the top 10 metropolitan areas), from most to least user-required data science skills:
• Top-notch: 2 (5)
• Data scientists: 20 (200)
• Power users: 2,000 (5,000)
• Business analysts: 20,000 (50K)
• Information consumers: everybody
Key Issue 2: What is happening at the cutting edge?


The Cutting Edge: AI/Machine Learning on the Front News Pages

Timeline, 1997 to 2014, with deep learning highlighted:
• 1997: Computer beats top chess player
• Self-driving cars in the desert
• Self-driving cars in normal traffic
• Self-driving cars on normal streets
• 2011: IBM Watson beats Jeopardy! experts
• Google/Facebook hire top ML experts
• 2014: Google acquires DeepMind ($400m)


Signal = Big Data − Noise

[Figure: over time, big data volume grows about 1.5x while the signal grows only about 1.2x, so the signal-to-noise ratio worsens.]
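The slide gives only relative growth factors. A worked version of the arithmetic, assuming each factor applies per period and (hypothetically) that signal starts at 10% of the data: the signal share S/D shrinks by 1.2 / 1.5 = 0.8 per period, and with noise = data − signal the signal-to-noise ratio S / (D − S) falls with it. The function names are illustrative.

```python
def signal_share(initial_share, data_growth=1.5, signal_growth=1.2, periods=1):
    """Fraction of all data that is signal after `periods` growth steps.

    Data volume grows by `data_growth` per period and signal by `signal_growth`,
    so the share S/D is multiplied by signal_growth / data_growth each period.
    """
    return initial_share * (signal_growth / data_growth) ** periods

def signal_to_noise(share):
    """Signal-to-noise ratio S / (D - S), given the signal share S/D."""
    return share / (1.0 - share)

# Hypothetical starting point: 10% of today's data is useful signal.
for t in range(4):
    s = signal_share(0.10, periods=t)
    print(f"period {t}: signal share {s:.4f}, SNR {signal_to_noise(s):.4f}")
```

Under these assumptions, after three periods only about 5% of the data is signal, so the same analysis has to sift roughly twice as much noise per unit of signal.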


Big Data? or Smarter Analytics?

[Figure: performance plotted against data volume.]


The Power of "More" Data Sources

External sources (commercial and public/open data): business information, crowdsourcing, communities, Twitter, Tumblr, WLAN, market research, blogs, economic data, census, credit bureaus, Facebook, weather, social media, LinkedIn, geoinformation

Internal sources: meters, RFID, GPS, sensors, monitoring, operational data, log data, enterprise transactions, contracts, customer interactions, field reports, "dark data"

Data fusion combines tabular and non-tabular sources from both sides.

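Data fusion, at its simplest, enriches internal records with attributes from an external source through a shared key. A minimal sketch, assuming hypothetical daily store transactions and a weather feed keyed by date; the `fuse` helper and all field names are illustrative. A left join keeps every internal record even when the external source has a gap.

```python
# Hypothetical internal data: daily store transactions.
transactions = [
    {"date": "2014-03-01", "store": "A", "sales": 1200},
    {"date": "2014-03-02", "store": "A", "sales": 800},
    {"date": "2014-03-03", "store": "A", "sales": 1500},
]

# Hypothetical external data: daily weather, keyed by date.
weather = {
    "2014-03-01": {"temp_c": 12, "rain_mm": 0.0},
    "2014-03-02": {"temp_c": 9, "rain_mm": 14.5},
}

def fuse(internal, external, key):
    """Left-join external attributes onto internal records by a shared key.

    Records without a match get None for the external fields, so gaps
    in the external source stay visible instead of silently dropping rows.
    """
    fields = {f for attrs in external.values() for f in attrs}
    fused = []
    for rec in internal:
        extra = external.get(rec[key], dict.fromkeys(fields))
        fused.append({**rec, **extra})
    return fused

for row in fuse(transactions, weather, "date"):
    print(row)
```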
Turning on the Audio

[Figure: calls correlated to churn (0% to 10%) plotted against the call population (0% to 100%) for a perfect model, a model on normal data and a model combining normal data with audio.]

Audio at a 10% touch rate:
Model      Correct   Wrong
Normal     55%       45%
Combined   73%       27%

Adapted from:

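The slide reports how often the contacted calls were "correct" at a 10% touch rate. One way to compute such a figure, assuming "correct" means that a touched call really belonged to a churner: rank calls by model score, contact the top share, and measure the fraction of true churners among them. The function name, scores and labels are hypothetical, and the toy example uses a 30% touch rate on 10 calls only so the numbers stay readable.

```python
def precision_at_touch_rate(scores, churned, touch_rate=0.10):
    """Share of contacted ("touched") calls that belong to true churners.

    Calls are ranked by model score, highest first, and only the top
    `touch_rate` fraction is contacted.
    """
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    n_touched = max(1, round(len(ranked) * touch_rate))
    hits = sum(1 for _, label in ranked[:n_touched] if label)
    return hits / n_touched

# Hypothetical toy data: 10 calls, True = the customer later churned.
churned = [True, False, True, False, False, True, False, False, False, False]
# Hypothetical scores from a model on normal call data only ...
normal_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4, 0.1, 0.6, 0.5, 0.05]
# ... and from a model that also uses audio-derived features.
combined_scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.65, 0.1, 0.7, 0.5, 0.05]

# A 30% touch rate on 10 calls contacts the top 3 calls per model.
p_normal = precision_at_touch_rate(normal_scores, churned, touch_rate=0.30)
p_combined = precision_at_touch_rate(combined_scores, churned, touch_rate=0.30)
print(f"normal: {p_normal:.2f}, combined: {p_combined:.2f}")
```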
Ensembles Are Cutting Edge

[Figure: Models 1 to 6 combined into one ensemble.]


The Power of Ensembles (The Wisdom of the Crowds)

One expert versus "the audience": for a question with answers A to D, the audience's majority vote is A. 95% of the time correct ...
Assume 20% of the audience has the answer:
[Figure: audience vote shares for answers A to D.]

Advantages:
• Avoid overfitting of a single model
• Robust regarding "hyper-parameters"
• Fairly easy to use
• Today's gold standard for high precision

Cautions:
• Require faster computers
• Less transparent models

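The audience premise can be checked with a small simulation. Assuming (beyond what the slide states) that the 80% who don't know the answer guess uniformly among four options, answer A expects 20% + 80%/4 = 40% of the votes against 20% for each other option, so the plurality vote recovers the right answer even though most voters are guessing. The same logic motivates voting ensembles of models whose errors are not perfectly correlated. The function `audience_vote` and its parameters are hypothetical.

```python
import random
from collections import Counter

def audience_vote(n_people=1000, share_knowing=0.20, answer="A",
                  options=("A", "B", "C", "D"), seed=7):
    """Simulate an audience in which some people know the answer and the rest guess.

    Returns the vote counts and the plurality (majority-vote) answer.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_people):
        if rng.random() < share_knowing:
            votes[answer] += 1                 # informed voter
        else:
            votes[rng.choice(options)] += 1    # uniform random guess
    winner, _ = votes.most_common(1)[0]
    return votes, winner

votes, winner = audience_vote()
print(dict(votes), "->", winner)
```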
From the Cutting Edge of Data Science: Deep Neural Nets

The trick: feature learning
Examples: biggest neural net so far; Merck's cheminformatics; word2vec; Android speech recognition
[Figure: inputs a0 and b0 pass through intermediate layers (a', b' and a", b") to learned features a* and b*, which produce the output y.]

Advantages:
• Complex features (a*, b*) are "learned"
• Utilize unlabeled and labeled data

Cautions:
• Less feature engineering?
• Novel approach
• Domain knowledge less important?
• Brittle hyper-parameters
Google Face Recognition



Crowdsourcing

A. "Outsource data science projects to scattered experts": a marketplace connects the customer with the experts.
Objective: access to the best of the best in data science.

B. "Outsource small micro-tasks (HITs) to scattered Internet users":
• Quality, time, cost trade-off
• Great for hard-to-automate scenarios
• Fixed costs versus variable costs
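For micro-tasks, one common way to work the quality/cost trade-off is to give each task to several workers and take a majority vote (an illustration, not something the slide prescribes). Assuming binary tasks, independent workers with a hypothetical accuracy of 0.7 and a hypothetical price of $0.05 per HIT, the binomial sum below shows accuracy rising from 0.7 (one worker) to 0.784 (three) and about 0.837 (five), while the variable cost per item grows linearly. Both function names are illustrative.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent workers is correct.

    Each worker answers a binary micro-task correctly with probability p;
    n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

def cost_per_item(n, price_per_hit):
    """Variable cost of labeling one item with n redundant HITs."""
    return n * price_per_hit

# Hypothetical worker accuracy and price per HIT.
for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.7, n), 4), round(cost_per_item(n, 0.05), 2))
```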


Crowdsourcing Applications (for HITs)

Good: scout newspapers ..., surveil neighborhoods, filter social media, scrutinize trains and pipes, observe traffic, classify medical images, moderate content, collect geo data, translate text, categorize products, take photos of properties ..., enrich product search, write down opening hours, extract metadata, measure environmental sound, scrape the Web for business information, question answering, mystery shopping, audio transcription, read greetings

Bad (mostly): buy likes, fake reviews (positive or negative), distort opinions, influence the influencers, organize flash mobs, click fraud, help with SEO (e.g., bad and good linking), buy followers


Tim Berners-Lee Video: Crowdsourcing Being Used for Data Collection


Key Issue 3: What challenges do organizations face, and how should they address them?


Where to Put the Data Scientists?

• Data scientists in the lines of business (LOBs)
• Data scientists in IT
• Data scientists as a separate business unit

[Figure: two further arrangements, scattered AA/DS experts in the LOBs, and an ACE (formerly BICC) coordinating AA/DS staff in the LOBs and IT.]
Considerations: agility, cross-functional view, knowledge sharing, business intimacy, proximity to process, IT and data

AA/DS = advanced analytics/data science; LOB = line of business; ACE = Analytic Center of Excellence

Where Do I Find the Talent?

Power users and lightweight data scientists
• Hiring: social sciences, natural sciences, electrical engineers, mathematicians, industrial engineers, computational linguistics
• Training: YouTube, MOOCs, local colleges, vendors, "on-the-job"

Mid- to heavyweight data scientists
• Hiring: statisticians, machine learning, operations researchers
• Training: "on-the-job"

Top-notch data scientists
• Hiring: Kaggle, TopCoder, CrowdAnalytix
• Training: academia; big data science firms (Google, Facebook, Amazon); "Wall Street"


Driving the Success of Data Science Solutions: Skills, Roles and Responsibilities ...

Skill areas: business, IT and data science. Who bridges them: an "analytics leader"?

• Ask good questions
• Know the constraints (e.g., legal, ethics, market)
• Decision making
• Latency at execution?
• Build, buy, outsource
• Transparent versus "black box"
• Gauge political friction
• Performance criteria that matter (ROI, accuracy, profitability versus market gain)
• Deployment
• Feature engineering
• Recalibration with new data?
• Data logistics
• High-performance computing
• Which analytics to choose?
• Project execution/monitoring
• Data exploration
• Data governance
• Creativity, communication

Choosing the Right Approach for the Right Data Science Solution

• An Eight-Question Decision Framework for Buying, Building and Outsourcing Data Science Solutions
  Alexander Linden and Lisa Kart (G00258056)

• Build: advanced analytics platforms (see Magic Quadrant)
• Buy: packaged analytics applications (see Hype Cycle)
• Outsource: data science service providers, e.g., IBM, Accenture, Mu Sigma, Opera Solutions


Recommendations
• Develop or hire "analytics leaders" who can fill the void between IT, business and quantitative data science.
• Skills and communication matter more than "architecture" and "strategy": $100K spent on skills is worth more than $100K spent on IT.
• Best-of-breed approaches lead to better accuracy!
• With increasing business complexity, creative approaches are becoming more important (again!)
• Intuition and problem insight will always trump "big data."
• Don't overengineer!


Recommended Gartner Research
• Organizational Principles for Placing Advanced Analytics and Data Science Teams
  Alexander Linden and others (G00255555)
• Extend Your Portfolio of Analytics Capabilities
  Lisa Kart, Alexander Linden and others (G00254653)
• Who's Who in Advanced Analytics
  Lisa Kart, Alexander Linden and others (G00254652)
• Magic Quadrant for Advanced Analytics Platforms
  Gareth Herschel and others (G00258011)

For more information, stop by Gartner Research Zone.
