
Implications of Big Data and Analytics for GTS Enterprise Architects


Tony Carrato, IBM Analytics
tony.carrato@us.ibm.com

© 2015 IBM Corporation


What we’ll discuss
 Why this talk, why now?
 Who I am and who I think you are
 The journey I’ve been on, bringing analytic solutions, including those dealing with “big data”, to customers
 Analytic technologies
 What are the main types of analytics
 What technologies are used
• From IBM
• From other software companies
• Open Source
 Experiences with Big Data:
 In our product team
 What we’re seeing at customers
 Product and technology thoughts
 Implications for Enterprise Architects
 Technology building blocks
 Governance
• Data governance
 Some examples
 Where to learn more
Why this talk, why now?

 I’m finding myself engaged with customers, including SO customers, who are moving from
analytics and big data “research” projects to early production projects with both analytics
and big data
 As a result, I’m getting a good view of the uncertainty IBM teams and, for that matter,
customer teams are facing in these areas
 Part of that uncertainty is not knowing where or how to get started in these discussions
 I’ve now spent a couple of years learning about and helping develop solutions around big
data and analytics
 So I think I’ve learned enough to have something to say



Business Drivers

Business Driver: Why Big Data & Analytics are Important?

Data is emerging as the world’s newest resource for competitive advantage

[Figure: The power of all data coming together (structured & unstructured data: sensor, social, logs, machine, etc.), with the power of New Technology (Hadoop, Streaming, Cognitive, In-Memory, Exploration, …): Real-Time Analytics Delivering Actionable Insights]

 Uncover new insights to transform the business
 Why did it happen?
 What is likely to happen?
 What’s the best course of action based on what you’ve learned?
 Empower many more employees within the organization
 At every level of the organization to make better decisions
 Improve effectiveness & competitiveness of the business
 Make speed a differentiator
 Monetize the data itself
 Be more right, more often
 Manage risk
 Protect against poor decision-making (risk-opportunity equation right)
 Protect against security and privacy risks
 New and more effective approach to perform analytics
 Move analytics to the data (in-motion & native format)
 Leverage open source and commodity hardware (cost savings)
Who I am and who I think you are

 Who I am:
 16-year IBMer
 A senior certified architect, in IBM Analytics
• Currently in Tech Sales
• Was previously in product development
 A member of the IBM Academy of Technology
 Someone who has been the CTA on two SO accounts: Telstra and Westpac
 Who I think you are:
 Senior GTS architects
 Some will be explicitly delivering Enterprise Architecture services
 Others will be in other leadership roles on SO accounts
 Many will have operational backgrounds



My journey in analytic solutions

 I’ve been involved with analytics products for about six years
 Starting with the Smarter Cities products
 For the last year plus, I’ve been working with Energy & Utilities customers, helping design
and helping sell IBM Insights Foundation for Energy (IFE)
 IFE’s main focus is supporting analytic solutions and providing an enterprise-grade
analytics platform
 Big data turns out to be an important element in many analytics solutions
 As I’ve mentioned, I’m seeing that this is new ground for many accounts and architects



Some definitions
 Analytics:
 From Wikipedia (yeah, I know): “Analytics is the discovery and communication of meaningful patterns in data.”
 Typically thought of as:
• Descriptive
• Predictive
• Prescriptive
 Usually has an industry or business context
• Which almost always requires appropriate SMEs
 Big Data:
 Wikipedia again: “Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy.”
 This is not just about Hadoop. Big Data requires addressing a range of things, including Enterprise Content Management, Master Data Management, ETL, Data Warehousing, Information/Data Security and so on
 Data curation: The University of Illinois (USA) defines data curation as “the active and ongoing management of data through its life cycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain its quality, add value, and provide for reuse over time, and this new field includes authentication, archiving, management, preservation, retrieval, and representation.”
We see many customers taking on analytics/big data projects
 Some of these are test driving new technologies
 For example, clients setting up a Hadoop cluster for a test drive
• These are usually driven by an IT group
 We also see line of business (LOB) teams adopting analytical tools
 There are often many different tools adopted, for different projects
• You can argue that this just shifts the “many versions of the truth” problem to another
space
 LOB teams may also take on projects with IBM Research and similar organizations, such as
universities
 These projects often produce impressive results
• Certainly that’s the most common case with IBM Research!
• Still, the projects may struggle to be adopted by the client’s mainstream business
• Organizational change and trusting analytic results are often hard
 A group which doesn’t seem to be engaged in these projects and discussions is the
enterprise architecture team at any given customer
 What I do see in EA teams are somewhat theoretical discussions, often without
awareness of projects going on in their own organization



The IBM Business Analytics and Optimization: Enterprise Architecture View*

* From the IBM Big Data & Analytics Reference Architecture
What about the technologies – analytics examples
 Descriptive analytics
 Generally, this means reporting
 IBM’s main product here is Cognos
• The RAVE interface, from Cognos 10.2.1, is very cool
 There are many competitors for reporting
 Predictive analytics
 Given a set of data, called independent variables, what
computed outcome (dependent variable) is likely?
 SPSS is IBM’s predictive analytics product
 R is a popular open source competitor
 Commercial competitors include SAS
 Prescriptive analytics
 Given what we think is likely to happen and a set of
constrained options, what action(s) should be taken?
• In many cases, this is a recommendation, not actually
an automated action
 ILOG CPLEX is IBM’s prescriptive analytics product
 There are fewer competitors in this area
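
To make the three types concrete, here is a minimal sketch in plain Python, not tied to Cognos, SPSS or CPLEX. The demand figures, the capacity options and the function names are all hypothetical, made up for illustration. It summarizes past data (descriptive), fits a straight line to forecast the next month (predictive), and picks the cheapest option that covers the forecast (prescriptive).

```python
from statistics import mean

# Hypothetical monthly demand figures, made up for illustration only.
months = [1, 2, 3, 4, 5, 6]
demand = [100.0, 104.0, 109.0, 113.0, 118.0, 122.0]


def describe(values):
    """Descriptive: summarize what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def recommend(forecast, options):
    """Prescriptive: cheapest option whose capacity covers the forecast."""
    feasible = [o for o in options if o["capacity"] >= forecast]
    return min(feasible, key=lambda o: o["cost"]) if feasible else None


summary = describe(demand)
slope, intercept = fit_line(months, demand)
forecast = slope * 7 + intercept  # independent variable: month 7

# Hypothetical constrained options to choose among.
options = [
    {"name": "small", "capacity": 120, "cost": 10},
    {"name": "medium", "capacity": 130, "cost": 14},
    {"name": "large", "capacity": 150, "cost": 20},
]
choice = recommend(forecast, options)
print(summary, round(forecast, 1), choice["name"])
```

Note that the last step produces a recommendation only; whether it triggers an automated action is a separate design decision.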



What about the technologies – big data examples
 Hadoop
 Spreads the data across multiple, inexpensive nodes
 IBM’s version is BigInsights
 Many customers choose open source alternatives
 Pure Data Appliance
 Also known as PureData System for Analytics (PDA)
 And formerly known as Netezza
 DB2 BLU
 Brings a set of technologies to speed DB2
• Includes in-memory as well as compression technologies
 Major competitor is SAP HANA
 Informix Time Series
 Time series data is usually measurements taken at successive points in time
 Time series data is more common than you may think
• Many Internet of Things devices produce this sort of data
 Spark
 Provides a layer above other data sets, including Hadoop and SQL
 Is very fast and offers a strong programming model
 Is attracting attention from developers, worldwide
 Has a big commitment from IBM
 It’s important to note that SAP HANA and Business Objects will show up quite regularly,
especially in any SAP account.
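
As a rough illustration of the Hadoop bullet above ("spreads the data across multiple, inexpensive nodes"), the sketch below splits records into blocks, places each block on more than one node, and counts words block by block before merging the partial results. It is a toy model in plain Python: the node names, the round-robin placement and the function names are assumptions for illustration, and real HDFS block placement and MapReduce scheduling are considerably more involved.

```python
from collections import Counter


def split_into_blocks(records, block_size):
    """Cut the data set into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_blocks(num_blocks, nodes, replicas=2):
    """Assign each block to `replicas` nodes, round-robin (toy policy).

    Copies land on distinct nodes as long as replicas <= len(nodes).
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def count_words(blocks):
    """Count each block separately (map), then merge the partial counts (reduce)."""
    partials = [
        Counter(word for line in block for word in line.split())
        for block in blocks
    ]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical log records, made up for illustration.
records = ["meter ok", "meter fault", "pump ok", "meter ok", "pump ok"]
blocks = split_into_blocks(records, block_size=2)
placement = place_blocks(len(blocks), nodes=["node-a", "node-b", "node-c"])
totals = count_words(blocks)
print(placement)
print(totals)
```

The point for an architect is the shape of the work: each block can be processed where it is stored, and only the small partial results need to move.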
More about the big data technologies
 IBM Spark Technology Center: Putting IBM engineers at the center of Spark innovation, in
San Francisco
 Spark TC site: http://www.spark.tc/
 On IBM.com http://www.ibm.com/analytics/us/en/technology/spark/
 Forbes magazine article: http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
 IBM (and friends) Open Data Platform: A common core platform for Open Data
 Hadoop, YARN, HDFS, Apache Ambari
• Able to be tested and certified
• That is – a standard!
 Home: http://opendataplatform.org/
 IBM.com: http://www-03.ibm.com/software/products/en/ibm-open-platform-with-apache-hadoop
 InfoQ article: http://www.infoq.com/news/2015/04/hortonworks-odp



You are going to see open source technologies in this area
 Important examples
 Big Data:
• Spark
• Hadoop
 Analytics
• Python and Python Notebooks
• R
 A thought about open source:
 Free software is often free for a reason
 Consider backups, alone
 For Hadoop, as an example, you have choices of:
• Teeing, to another HDFS cluster
• Replication, also to another cluster
• Snapshots
• SQL backups
• Hadoop usually doesn’t consider archiving
 As we all know, our customers may embrace open source or be suspicious of it
 And there are IBM rules for Global Services, when working with open source technologies
• But I’m not going into that here



What our (Insights Foundation for Energy) product team has
learned in looking at big data options

 All the noise about Spark is well justified


 Spark turns out to be easy to work with, using Python & Python Notebooks
• See http://ipython.org/notebook.html
 Performance, including ingesting lots of data, is excellent
 Previously, we’d used Informix Time Series, which outperforms DB2 for time series data
 Time series data includes things like meter readings, where there is an update
periodically, but most things don’t change
 This still works well but new projects seem to be migrating to Spark
• See http://www-01.ibm.com/software/data/informix/timeseries/
 Spark typically implies, but doesn’t require, Hadoop
 PureData, also known as Netezza, is really fast
 But requires specific and costly hardware



Example: National Grid UK uses predictive modeling
and big data analytics to implement condition-based
maintenance

~9% reduction in operating expenses with condition-based maintenance of about £200M
Provides alerts facilitating proactive rather than reactive responses
Eliminates costs of implementing or replacing infrastructure by using cloud-based hosting

Business challenge: Until recently, power grid operators have needed to rely on costly traditional scheduled asset maintenance to ensure the highest availability and reliability of power transmission because they could not plan maintenance around actual asset conditions.

The smarter solution: With a cloud-based big data and analytics solution, this national electricity grid operator has a 360-degree view of its assets from the transformer level to the entire grid. Predictive modeling and advanced analytics provide not only near-real-time asset status, but also long-term projections of maintenance requirements, helping the company plan future preventive maintenance. The company can now plan maintenance for each asset on an as-needed basis, rather than scheduling simultaneous maintenance for all assets of that type, adding to cost reductions.
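
The difference between scheduled and condition-based maintenance can be sketched in a few lines. This is an illustration only, not the customer's solution: the asset list, the `failure_risk` scores (standing in for the output of some predictive model) and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    asset_id: str
    asset_type: str
    failure_risk: float  # hypothetical model output between 0 and 1


def scheduled_plan(assets, asset_type):
    """Traditional approach: service every asset of a type at the same time."""
    return [a.asset_id for a in assets if a.asset_type == asset_type]


def condition_based_plan(assets, risk_threshold):
    """Condition-based approach: service only assets at or above the threshold, riskiest first."""
    due = [a for a in assets if a.failure_risk >= risk_threshold]
    ranked = sorted(due, key=lambda a: a.failure_risk, reverse=True)
    return [a.asset_id for a in ranked]


# Hypothetical assets and risk scores, made up for illustration.
assets = [
    Asset("T1", "transformer", 0.82),
    Asset("T2", "transformer", 0.10),
    Asset("T3", "transformer", 0.35),
    Asset("L1", "line", 0.67),
]

scheduled = scheduled_plan(assets, "transformer")
as_needed = condition_based_plan(assets, risk_threshold=0.5)
print(scheduled, as_needed)
```

In this toy data the scheduled plan touches all three transformers, while the condition-based plan touches only the two assets whose risk is high, regardless of type.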



Example: UK Retail
First two months…
 1520 products in ITP pilot
 4238 new prices recommended
 696 recommended prices executed
 249 modelled prices executed
 14 buyers using pilot
 62 modelled prices were to round the price
 40 hours buyer training & feedback
 240 slices of pizza eaten pre-go live
 £5m increased GM profit up for grabs
 Most broken rule: “Online vs Store”
 2nd most broken rule: “Bottoming out of the market”
 No price changes executed that are 100% policy compliant
 Commercial Smarts in “Price Trial”
Categories being used in pilot:
 Personal Electrical
 Small Appliances
 Small Kitchen Appliances
 Toys categories in pilot (but not used): Pre School & Girls
 All data for Electrical & Toys is in ITP ready for rollout
Buyer feedback:
 “The ITP tool has helped myself and the team feel more confident that we are competitive on price on a daily basis. I’m sure that once the planned changes are implemented in the tool this will become a slicker process that will aid in balancing the new daily workload.”
 “all prices look accurate and trustworthy”
 “Continued updates from the team mean that I can trust the tool”
 “the ITP tool has enabled me to monitor competitive pricing & reactive with ease and speed. Daily access to the tool allows me to have consistently competitive pricing”
Example: A gas and electricity provider in multiple US States
Risk Analytics for Gas Pipeline Safety and Reliability Management

Safety, Reliability: Real time and big data enable insight driven gas infrastructure management
$500k: Gas value lost from an unnoticed leak during 3 months
Data Driven Leak Detection: Detects leaks before SCADA sensors

Business challenge: Asset intensive industries delivering oil, gas and other liquid products rely on the pipeline infrastructure to provide safe and timely delivery to their customers. This infrastructure is critical from both a revenue and a safety perspective. Recent incidents in the pipeline industry have put the whole industry under tremendous pressure to provide improved safety and reliability.

The smarter solution: IBM, in collaboration with a number of pipeline companies, has built an innovative approach to risk mitigation by bringing the concept of real time analytics using descriptive, predictive and prescriptive analytics.
• Real Time Risk Assessment
• 360º Visibility in source of Risk
• Advanced What-If Dashboard



There are implications for enterprise architects
These implications include:
 Being able to describe how “big data” is different from a data warehouse
 What architectural principles should be considered/adopted for:
 Analytics
• Including when descriptive/predictive/prescriptive analytics might apply to a problem
• When prescriptive analytics could generate automated actions
• What sort of roadmap might there be for analytics:
• Uses/use cases
• Tools
 Big Data
• When is data “big”, for example
• Information model(s)
 Technology choices among
 Technologies
 Vendors
 Operational implications



Architecture Overview

Architecture Overview: Data and Analytics*

* From the IBM Big Data & Analytics Reference Architecture
Example for enterprise architecture: Sorting out the Analytics & Big Data tools jungle
[Figure: Analytic tools at a real customer!]



Example for enterprise architecture: Big Data and Analytics
Governance
 This should be an extension of overall architecture governance
 It will introduce new:
 Technologies
 Methods
 Vendors
 Stakeholders
 For analytics, line of business expertise is critical
 IT will not normally have the subject matter expertise to address analytics for a
given business area
 This has implications for membership for any governance council or team
 Guidance for project teams is going to be needed
 Architecture Principles
 Recorded and shared architectural decisions
 Approved tools
• For open source, IBM’s internal Green to Open Project is instructive
• See http://w3.ibm.com/articles/workingknowledge/2011/08/isc_g2oproject.html
Example for enterprise architects: Data governance

 It’s unlikely you/your customer are already doing this
 My hat is off to you, if you are!
 Don’t expect to ever get to completely clean data
 That doesn’t mean you should ignore it
 In fact, dirty data can tell you a lot
 Including where to focus cleansing work
 And give clues as to what is producing dirty data
 IBM has an information governance model
 See http://w3-03.ibm.com/software/spcn/content/N703009S74572N39.html
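
One way to see how dirty data "tells you a lot" is to profile it instead of just discarding it. The sketch below counts quality issues by source system and by issue type, which hints at where to focus cleansing and what is producing the problems. It is a minimal illustration in plain Python: the record layout, the source names and the three quality rules are hypothetical.

```python
from collections import Counter


def find_issues(record):
    """Return the list of quality issues found in one record (toy rules)."""
    issues = []
    if not record.get("meter_id"):
        issues.append("missing meter_id")
    reading = record.get("reading")
    if reading is None:
        issues.append("missing reading")
    elif reading < 0:
        issues.append("negative reading")
    return issues


def profile(records):
    """Count issues per source system and per issue type."""
    by_source = Counter()
    by_issue = Counter()
    for record in records:
        for issue in find_issues(record):
            by_source[record["source"]] += 1
            by_issue[issue] += 1
    return by_source, by_issue


# Hypothetical records from two source systems, made up for illustration.
records = [
    {"source": "billing", "meter_id": "m1", "reading": 10.5},
    {"source": "field_app", "meter_id": "", "reading": 7.0},
    {"source": "field_app", "meter_id": "m3", "reading": -2.0},
    {"source": "field_app", "meter_id": "m4", "reading": None},
    {"source": "billing", "meter_id": "m5", "reading": None},
]

by_source, by_issue = profile(records)
print(by_source.most_common())
print(by_issue.most_common())
```

In this made-up data most issues come from one source, which is the kind of clue that points at a process or system to fix rather than at individual records.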
Where you can find help

 Inside GTS
 The Enterprise Architecture SSA (of course)
 GTS has a Data Science community: https://ibm.biz/BdXu3R
 IBM Analytics Practitioners’ Community: https://ibm.biz/BdXu3E
 Beyond these, there are many communities out there, covering various aspects of analytics
and big data
 And your friends in the product organizations are ready to help, too!
 After all, that’s where this discussion came from



Some places to learn more about data science & analytics
 Software Group Architecture Board Analytics Education Series:
http://w3.blueprint.sby.ibm.com/b_dir/blueprint.nsf/url/AB493445?OpenDocument
 IBM Big Data & Analytics Reference Architecture in iRAM: https://w3-03.ibm.com/tools/cm/iram/oslc/assets/12F17905-3864-201B-BF14-55A563C4B63E/1.3.1
 Presentations on Big Data, SPSS & R on Media Library:
http://w3.tap.ibm.com/medialibrary/media_view?id=306478
 Spark @ IBM: http://w3.ibm.com/news/w3news/top_stories/2015/06/chq-ibm-spark.html
 Spark introductory video by Rob Thomas, VP, Product Development, IBM Analytics:
http://w3.ibm.com/news/w3news/top_stories/2015/06/chq-ibm-spark.html
 External IBM page on Data Science: http://www-01.ibm.com/software/data/infosphere/data-scientist/
 IBM external page on Big Data: http://www.ibm.com/big-data/us/en/
 IBM external page on Analytics: http://www.ibm.com/analytics/us/en/
 IBM Big Data Hub: http://www.ibmbigdatahub.com/
 Data Science Central web site: http://www.datasciencecentral.com/
 IBM Learning, with lots of material on data science, analytics, SPSS, Cognos: https://w3-151.ibm.com/learning/lms/Saba/Web/Main
 The Open Group’s Open Platform 3.0 Forum: http://opengroup.org/subjectareas/platform3.0
 A pretty good book from O'Reilly: Doing Data Science: https://ibm.biz/BdXEgJ (yes, Tony bought it)
Questions?