
Volume 11

The very best of


TDWI's BI articles,
research, and
newsletters

2013 IN REVIEW:

CLOUD BI

TAKES OFF

PLUS 2014 Forecast:


BI, Analytics, and Big Data Trends
and Recommendations for the New Year

Research Excerpts

The State of Big Data Management

Implementation Practices for Better Decisions

Insightful Articles

Inside Facebook's Relational Platform

What Data Warehouses Can Learn from Big Data

The Future of Customer-Centric Retail

See and understand your data in seconds with Tableau.
The data you capture has massive potential to give you a competitive advantage. Turn that
potential into reality with fast visual analysis and easy sharing of reports and dashboards from
Tableau. By revealing patterns, outliers and insights, Tableau helps you find more insights in
your data and get more value from it.
Tableau provides:
Live, optimized connection to a variety
of data sources
The ability to analyze truly big data at
interactive speeds
An in-memory data engine that
overcomes slow databases

Drag & drop data visualization: no coding required

Tableau is changing the way companies are analyzing and sharing
their data. For a free trial, visit www.tableausoftware.com/tdwi

Volume 11

Table of Contents

FEATURES

5 2013 in Review: Cloud BI Takes Off



Stephen Swoyer

10 2014 Forecast: BI, Analytics, and Big Data Trends


and Recommendations for the New Year
Fern Halper, Philip Russom, David Stodder

Sponsor Index
Birst
HP Vertica
Information Builders
Paxata
Tableau Software

TDWI BEST PRACTICES REPORTS

21 The State of Big Data Management


Philip Russom

25 Implementation Practices for Better Decisions
David Stodder

29 TEN MISTAKES TO AVOID SERIES


Ten Mistakes to Avoid When Delivering Business-Driven BI
Laura Reeves


TDWI FLASHPOINT
32 Enabling an Agile Information Architecture

William McKnight

34 TDWI Salary Survey: Average Wages Rise a Modest 2.3
Percent in 2012
Mark Hammond

BUSINESS INTELLIGENCE JOURNAL


36 The Database Emperor Has No Clothes

David Teplow

41 Dynamic Pricing: The Future of Customer-Centric Retail
Troy Hiltbrand

BI THIS WEEK
48 Inside Facebook's Relational Platform

Stephen Swoyer

50 Load First, Model Later: What Data Warehouses
Can Learn from Big Data
Jonas Olsson

TDWI CHECKLIST REPORT


52 How to Gain Insight from Text

Fern Halper

55 NEW! Vote for Your Favorite


Best of BI Story
56 TDWI WEBINAR SERIES
57 TDWI EDUCATION
60 BEST PRACTICES AWARDS 2013
66 BI SOLUTIONS
69 ABOUT TDWI

Think fast.

Enterprise-caliber BI
born in the cloud.
Pulling real insights from all your data
just got a whole lot easier. And faster.
Birst is engineered with serious muscle under the
hood. Like an automated data warehouse coupled with
powerful analytics. Now add the speed and ease of use
of the cloud and you get a solution more flexible than
legacy BI, and a whole lot more powerful than Data
Discovery. But, hey, don't just take our word for it: find
out why Gartner named us a Challenger in its most
recent BI Magic Quadrant and why more than a
thousand businesses rely on Birst for their analytic
needs. Learn to think fast at www.birst.com.

Take a look at what's under our hood at birst.com

Volume 11

Editorial Director's Note


tdwi.org

Editorial Director Denelle Hanlon

Senior Production Editor Roxanne Cooke


Graphic Designer Rod Gosser

President Richard Zbylut

Director of Education Paul Kautza

Director, Online Products & Marketing Melissa Reeve

President & Chief Executive Officer Neal Vitale

Senior Vice President & Chief Financial Officer Richard Vitale

Executive Vice President Michael J. Valenti

Vice President, Finance & Administration Christopher M. Coates

Vice President, Information Technology & Application Development Erik A. Lindgren

Vice President, Event Operations David F. Meyers

Chairman of the Board Jeffrey S. Klein


REACHING THE STAFF
Staff may be reached via e-mail, telephone, fax, or mail.

E-mail: To e-mail any member of the staff, please use the


following form: FirstinitialLastname@1105media.com
Renton office (weekdays, 8:30 a.m.–5:00 p.m. PT)
Telephone 425.277.9126; Fax 425.687.2842
555 S Renton Village Place, Ste. 700, Renton, WA 98057
Corporate office (weekdays, 8:30 a.m.–5:30 p.m. PT)
Telephone 818.814.5200; Fax 818.734.1522
9201 Oakdale Avenue, Suite 101, Chatsworth, CA 91311

Welcome to the eleventh annual TDWI's Best of Business Intelligence: A Year
in Review. Each year we select a few of TDWI's most well-received, hard-hitting
articles, research, and information, and present them to you in this
publication.
Stephen Swoyer kicks off this issue with a review of major business
intelligence (BI) developments. In "2013 in Review: Cloud BI Takes Off," he
argues that cloud BI has gained real and verifiable momentum in 2013.
In this issue's 2014 Forecast, TDWI Research analysts Fern Halper, Philip
Russom, and David Stodder share their predictions and recommendations for
the coming year, including BI and analytics trends and tips for successful big
data implementations.
To further represent TDWI Research, we've provided excerpts from some
of the past year's Best Practices Reports. Russom's "The State of Big
Data Management" covers big data implementations, and Stodder's
"Implementation Practices for Better Decisions" explains how to use data
visualization to help your organization succeed.
This volume's Ten Mistakes to Avoid will help you dodge common
blunders when delivering a BI solution. And thanks to articles from TDWI's
e-newsletters, you'll learn more about agile information architectures, salary
trends in the BI/DW industry, Facebook's relational platform, and how to
maximize the ROI of your data warehouse.
In "The Database Emperor Has No Clothes," one of our selections from
the Business Intelligence Journal, you'll read about Hadoop's advantages
over relational database management systems. Our second Journal piece,
"Dynamic Pricing: The Future of Customer-Centric Retail," describes a new toll
road in Washington, DC, that uses big data and advanced analytics to monitor
and manage traffic flow.

ADVERTISING OPPORTUNITIES
Scott Geissler, sgeissler@tdwi.org, 248.658.6365
Reprints and E-prints: For single article reprints (in
minimum quantities of 250–500), e-prints, plaques, and
posters, contact PARS International.
Phone 212.221.9595; E-mail 1105reprints@parsintl.com;
Web www.magreprints.com/QuickQuote.asp
Copyright 2014 by TDWI (The Data Warehousing Institute),
a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written
permission. Mail requests to Permissions Editor, c/o Best
of BI 2014, 555 S Renton Village Place, Ste. 700, Renton, WA
98057. The information in this magazine has not undergone
any formal testing by 1105 Media and is distributed without
any warranty expressed or implied. Implementation
or use of any information contained herein is the reader's sole
responsibility. While the information has been reviewed for
accuracy, there is no guarantee that the same or similar
results may be achieved in all environments. Technical inaccuracies may result from printing errors, new developments
in the industry, and/or changes or enhancements to either
hardware or software components. Produced in the USA.
TDWI is a trademark of 1105 Media, Inc. Other product and
company names mentioned herein may be trademarks
and/or registered trademarks of their respective companies.

We're also including a selection of our informative, on-demand Webinars, as
well as a peek inside a TDWI World Conference keynote address, given by
popular speaker Ken Rudin. And don't miss our Best of BI story survey on
page 55; we want to know your favorite stories from this issue.
TDWI is committed to providing industry professionals with information that
is educational, enlightening, and immediately applicable. Enjoy, and we look
forward to your feedback on the Best of Business Intelligence, Volume 11.

Denelle Hanlon
Editorial Director, TDWI's Best of Business Intelligence
The Data Warehousing Institute
dhanlon@tdwi.org


HP Vertica 7
FREE Download: HP Vertica Community Edition!

Big Data Analytics - No Limits, No Compromises

It's Time to Modernize Your Enterprise Data Warehouse

So, you have made a major investment in your data warehouse. However, squeezing out marginal
performance gains requires you to constantly invest in more hardware and services. There has to
be a better, more cost-effective solution: a solution where you can put an end to the constant
compromise between your quality of insight and speed of decision making.

Why make compromises or be limited by your data warehouse? Purpose built for Big Data analytics
from the very first line of code, only the HP Vertica Analytics Platform can modernize your enterprise
data warehouse by delivering:

Optimized Data Storage: Store 10x-30x more data per server than row databases to achieve the lowest TCO
Blazing-Fast Analytics: Run queries 50x-1,000x faster than legacy data warehouses
Open Architecture: Leverage tight built-in support for Hadoop, R, and your choice of ETL and BI
Massive Scalability: Add an unlimited number of industry-standard servers for petabyte-scale

DOWNLOAD NOW

feature

2013 in Review

ClOUD BI TAKES OFF

BY STEPHEN SWOYER

The year that was 2013 doesn't fit any obvious pattern.
It was like no other. Applications such as social media
sentiment analysis and big data analytics are hard, costly,
and time-consuming. In 2013, enterprises began to come
to terms with this inescapable fact.
Along the way, marketing shifted into overdrive, cloud BI
at long last took off, NoSQL grew exponentially, and a
co-creator of the data warehouse (which, by the way, turned
25 this year) served up a trenchant assessment of the state
of BI. Such was the year that was 2013.

Self-Service Salvation
This year, vendor marketers celebrated self service as a fix
for the usability, inflexibility, and adoption issues that
have long bedeviled BI. Selling it as something new, however,
was a tougher proposition. Self-service BI isn't new; it isn't even
sort of new. A decade ago, the former Business Objects,
the former Cognos, and the former Hyperion (along
with BI stalwarts such as Information Builders and

MicroStrategy) also championed self service as a solution
for what ailed BI. In painful point of fact, self service is
almost as old as BI itself.
Self service first became a mantra for BI back in the
early 1990s. "The problem was that data was locked in a
database and [users] didn't want to have to go to IT to
ask them to write SQL queries whenever they needed
something," industry luminary Cindi Howson, a principal
with BI Scorecard, told BI This Week in an interview in
2012. "That first form of self service was generating SQL
through a semantic layer, a business view, or whatever you
want to call it."

Search might be marketed as a silver


bullet, but there's no disputing that it's a
genuinely intriguing technology.
One of the things that ailed BI in 1993 and 2003 is the
same thing that ails BI today: anemic adoption. This
is in spite of the fact that today's BI tools incorporate a
staggering array of self-service features, along with (often
genuinely helpful) data visualization capabilities. They
likewise address new usage paradigms (such as visual BI
discovery) and, accordingly, are less rigid (in regard to
both usability and information access) than were their
predecessors.
BI vendors in 2013 talked about use cases and feature
bundles that couldn't have been anticipated a decade ago,
but has any of this actually made BI better, more usable,
or more pervasive? Are companies having more success
with BIand does this success actually correlate with
value?
Judging by BI adoption rates, you might not think so.
According to Howson, BI adoption has hovered at or
around 25 percent for the better part of a decade. In the
2013 installment of BI Scorecard's Successful BI Survey,
sizeable percentages of respondents reported that BI tools
still aren't easy enough to use (an issue cited by 24 percent
of survey respondents) or aren't able to answer complex
questions (23 percent).

In Search of a Silver Bullet


Self service is a time-tested prescription for the BI blues.
Information search is a new(er) solution.
The case for search goes something like this: thanks
to the inflexibility of the data warehouse and its

rigid data model, information from multi-structured


sources (e.g., machine or sensor logs, blog postings and
documents, videos and photos) can't easily be prepared
and schematized for SQL query access. Information
search promises to bridge the structured and multi-structured worlds, situating business facts in a rich
semantic context.
Search, too, is by no means new: Google, for example,
introduced an enterprise search appliance almost a decade
ago. That being said, 2013 produced some genuinely
interesting developments on the information search front,
such as IBM's still-incubating Project Neo natural-language search technology (NLS).
Elsewhere, Information Builders introduced a new version
of its WebFOCUS Magnify information search offering
and Microsoft touted NLS as a major part of PowerBI, the
BI and analytic component of its Office365 cloud service.
Also this year, vendors such as Cloudera (Cloudera Search),
MarkLogic, and DataRPM, along with established
players such as NeutrinoBI and Oracle (with its Endeca
product line), likewise touted search as a differentiating
technology.
True, search might be marketed as a silver bullet, but
there's no disputing that it's a genuinely intriguing
technology. What's more, it's becoming increasingly
commoditized. Tools such as Cloudera Search, DataRPM,
and WebFOCUS Magnify leverage open source software
(OSS) components such as Solr (an OSS search platform)
and Lucene (an OSS indexing library).
The upshot is that it's increasingly possible to build
a serviceable information search platform using free
OSS tools. For example, a savvy organization could
use a combination of Solr, the Apache unstructured
information management architecture (UIMA) project,
the R statistical programming environment, and other
technologies to build and deploy an analytic search
platform that addresses both faceted search (a non-taxonomic scheme for classifying information in multiple
dimensions) and NLS requirements. Look for information
search to play a more prominent role in 2014.
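To make that concrete, here is a minimal sketch of the kind of faceted query such a platform would serve, using Python's requests library against the Solr select API. The host, the "articles" core, and the "author" and "topic" facet fields are illustrative assumptions, not a reference deployment.

```python
# A minimal faceted-search sketch against a hypothetical Solr core.
import requests

SOLR_SELECT = "http://localhost:8983/solr/articles/select"  # assumed endpoint

def faceted_search(query, facet_fields, rows=10):
    """Run a keyword query and return matching docs plus facet counts."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",                 # ask Solr for a JSON response
        "facet": "true",
        "facet.field": facet_fields,  # requests encodes repeated params
        "facet.mincount": 1,
    }
    resp = requests.get(SOLR_SELECT, params=params)
    resp.raise_for_status()
    body = resp.json()

    docs = body["response"]["docs"]
    # Solr returns facets as a flat [value, count, value, count, ...] list
    facets = {}
    for field, flat in body["facet_counts"]["facet_fields"].items():
        facets[field] = list(zip(flat[::2], flat[1::2]))
    return docs, facets

if __name__ == "__main__":
    docs, facets = faceted_search("churn", ["author", "topic"])
    print(len(docs), "documents;", facets)
```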

At Long Last, Cloud?


In 2013, we saw BI marketing shift to the cloud, with
Microsoft's PowerBI for Office365, a new cloud offering
from Tableau, a new Active Data Warehouse Private
Cloud service from Teradata, a cloud BI platform-as-a-service (PaaS) offering from start-up RedRock BI, and a
data management splash by relative newcomer Treasure


Data, which markets a hosted big data analytic service


(this last mixes OSS pieces of Hadoop with proprietary
bits), among other entries.
There's been no shortage of SaaS BI offerings, including
solutions from Birst, Domo (a relative newcomer), and
GoodData, among others, but for a long time, prevailing
wisdom held that enterprise BI and data warehousing
just wouldn't take in the cloud. "BI information is
too sensitive and data warehousing workloads too
demanding for cloud environments," some argued.

As 2013 draws to a close, in fact, it's fair


to say that cloud BI has real and verifiable
momentum.
There's some truth to both claims: some workloads or
applications simply can't be shifted to the cloud, owing
chiefly to regulatory requirements. Give Mark Madsen,
a research analyst with IT strategy consultancy Third
Nature, a few hours and he'll exhaustively tally the many
and varied reasons data warehousing workloads aren't a
great fit for the highly virtualized, loosely coupled cloud.
That said, Madsen himself expects that a clear majority of
data warehousing (DW) workloads will shift to the cloud
over the next decade. As 2013 draws to a close, in fact, it's
fair to say that cloud BI has real and verifiable momentum.
Several vendors (RedRock BI, but also MicroStrategy
and Yellowfin) even market BI PaaS offerings, which
shift BI (i.e., platform infrastructure, workloads, data,
development) entirely to the cloud.
MicroStrategy hosts its own PaaS service, while Yellowfin
BI can be deployed on Amazon Web Services (AWS) and
used in conjunction with Amazon's Redshift massively
parallel processing (MPP) cloud data warehouse service.
Other BI vendors, such as Actuate, Jaspersoft, and Talend,
are available as PaaS packages, too.
Then there's AWS, which is a bona fide cloud powerhouse.
A year ago, Amazon announced Redshift, an MPP
cloud data warehouse for AWS. Let's not sugarcoat this:
there are significant challenges involved in shifting data
warehouse workloads into the cloud. With Redshift,
Amazon seems to have licked many of them.
Steve Dine, managing partner with DataSource
Consulting and a frequent instructor at TDWI
educational events, says he's worked with Redshift in a few
client engagements.

"It scales well. Just like any MPP system, it scales based on
how well you parallelize your workloads, how well you
partition your data, and how many nodes you spin up," he
explains.
"[Redshift is] just like any columnar database: if you're
isolating it to a subset of attributes, it's great; if you're
trying to do very wide queries, as you would in many retail
situations, you are likely to see better performance from a
row-based MPP database."
Dine doesn't think of Redshift as a silver bullet (e.g., even
though it's inexpensive, Redshift's per-TB pricing can
quickly add up) but sees it (1) as a compelling option for
smaller companies looking to build data warehouses in the
cloud, and (2) as a proof of concept for large companies
concerned about shifting MPP workloads to the cloud.
"It just democratizes [MPP analytic databases]," he points
out. "What's nice about it is that you can spin it up and
set it to automatically take snapshots. You can bring it
up and take it down whenever you want. Will it work for
everybody? As with any [MPP platform], it just depends
on what your workload is."
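As an illustration of the trade-off Dine describes, here is a minimal sketch that connects to a hypothetical Redshift cluster with psycopg2 (Redshift speaks the PostgreSQL wire protocol) and runs a narrow, columnar-friendly aggregate. The endpoint, credentials, and table definition are invented for the example.

```python
# A minimal sketch, assuming a provisioned Amazon Redshift cluster and the
# psycopg2 driver. Cluster endpoint, credentials, and the "sales" table
# are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="...")
cur = conn.cursor()

# Distribution and sort keys drive how well a query parallelizes across nodes.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        sale_id     BIGINT,
        customer_id BIGINT,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    )
    DISTKEY (customer_id)
    SORTKEY (sale_date);
""")
conn.commit()

# Columnar-friendly: the query touches only two columns, so only those
# column blocks are read from disk.
cur.execute("""
    SELECT sale_date, SUM(amount)
    FROM sales
    WHERE sale_date >= '2013-01-01'
    GROUP BY sale_date;
""")
print(cur.fetchall())

# By contrast, a very wide SELECT * drags every column off disk -- the
# pattern Dine notes may favor a row-oriented MPP database instead.
cur.close()
conn.close()
```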

Hadoop, NoSQL, and Google F1


As a combined market/technology segment, NoSQL
continues to grow like a flowering kudzu plant. At
O'Reilly's Strata conference in February, Pivotal, a big data
spin-off formed by EMC and VMWare last December,
announced Hawq, an ANSI SQL, ACID-compliant
database system (based on EMC's Greenplum MPP
database) for Hadoop.
Elsewhere this year, the OSS community (aided by
Cloudera, Hortonworks, MapR, and other commercial
software vendors) focused on bolstering Hadoop's security
and disaster recovery bona fides. One of the biggest
deliverables of 2013 was version 2.2 of the Hadoop
framework, which went live in early October. Hadoop 2.2
bundles YARN (a backronym for yet another resource
negotiator), which promises to make it easier to monitor
and manage non-MapReduce workloads in Hadoop
clusters. (Prior to YARN, Hadoop's JobTracker and
TaskTracker jointly managed resource negotiation. Both
daemons were built with the MapReduce compute engine
in mind.) Now that YARN's available, users should finally
be able to manage, monitor, and scale mixed workloads in
the Hadoop environment.
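A small, hedged example of what monitoring mixed workloads can look like in practice: assuming the ResourceManager's web services are reachable on the default port (8088), the sketch below groups running YARN applications by applicationType. The host name is illustrative.

```python
# A minimal sketch against YARN's ResourceManager REST API (Hadoop 2.x).
# The /ws/v1/cluster/apps endpoint lists applications of any type, not
# just MapReduce jobs; the host below is an assumption.
import requests

RM = "http://resourcemanager.example.com:8088"

def running_apps_by_type():
    """Group currently running YARN applications by applicationType."""
    resp = requests.get(RM + "/ws/v1/cluster/apps", params={"state": "RUNNING"})
    resp.raise_for_status()
    apps = (resp.json().get("apps") or {}).get("app", [])
    summary = {}
    for app in apps:
        summary.setdefault(app["applicationType"], []).append(app["name"])
    return summary

if __name__ == "__main__":
    for app_type, names in running_apps_by_type().items():
        print(app_type, "->", len(names), "running")
```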
Nor is Hadoop the last word in NoSQL. Vendors such as
Basho Technologies (which develops the Riak distributed
DBMS), Cloudant (which bases its distributed NoSQL


database on the Apache CouchDB project), DataStax


(a commercial distribution of Apache Cassandra),
FoundationDB, MarkLogic, and RainStor, among others,
would beg to differ. This October, Basho previewed a
version 2 release of Riak that claims to support strong
consistency (i.e., strong transactions, or the ACID that's
known and loved by DM types). Most distributed DBMSs
(such as NuoDB and Splice Machine) support what's
known as eventual consistency.
An altogether new entrant was F1, the ANSI SQL- and ACID-compliant database platform that Google
announced in September. Google, which uses F1 to
power its AdWords service, claims it can function as a
single platform for both OLTP and analytic workloads.
Unlike Hadoop, F1 addresses classic data management
requirements (with support for strong transactions and
row-level locking), and does so at Google scale. Google's
push behind F1 underscores an important consideration:
we conflate the terms NoSQL, big data, and Hadoop at our
own risk.

Final Thoughts: Business unIntelligence and the Data Warehouse at 25

This year was a milestone annum, too. The data warehouse
itself was born 25 years ago, in 1988, when Dr. Barry
Devlin and Paul Murphy published their seminal paper,
"An architecture for a business and information system," in
the IBM Systems Journal.

Over the last quarter century, organizations have invested
hundreds of billions of dollars in data warehouse systems,
to say nothing of the BI tools that are the DW's raison
d'être. There probably isn't a Global 2000 organization
that doesn't have at least one enterprise data warehouse.

Devlin was back in 2013 with a new book, Business
unIntelligence: Insight and Innovation beyond Analytics and
Big Data. In many ways, Devlin's book is a wry assessment
of his prodigal creation, which, a quarter century on, is at
once dominant and besieged.

The net net of this ubiquity, as Devlin demonstrates
with perspicacity and humor, is a kind of muddling
through. (The "unIntelligence" in Devlin's title speaks
to precisely this problem.) In this regard, Devlin could
well say of BI what philosopher Immanuel Kant famously
said of humankind: "Out of the crooked timber of [a BI
implementation project], no straight thing was ever made."

The Business of BI: Comings, Goings, and IPOs

This year, we bade adieu, chiefly by way of acquisition,
to several stalwart vendors. Composite Software, Kalido,
KXEN, ParAccel, and Pervasive Software, among others,
were acquired this year. Cisco Systems, which seems
poised to make a big push into data management in 2014
and beyond, snapped up Composite in June; Silverback
Enterprise Group, an Austin-based holding company,
acquired Kalido in October; SAP nabbed KXEN, a longtime
Teradata partner, in September; and Actian acquired
both ParAccel and Pervasive. Will these technologies
survive and thrive, or will they vanish (as with the former
Brio Software, the former Celequest, and the former
DecisionPoint Software, to name just a few) into the void
of the industry's memory hole?

Elsewhere this year, Tableau's long-rumoured IPO finally
(and successfully) took place. Dell pulled off a kind of
reverse IPO: in early February, its shares were delisted
from both the NASDAQ and the Hong Kong Stock
Exchange. At the same time, founder Michael Dell
(bolstered by VC giant Silver Lake Partners, and with
an additional $2 billion in financing from Microsoft)
came back to take Dell private. Prior to its delisting, Dell
had managed to cobble together a large information
management portfolio, anchored by its Toad assets, which
it acquired from the former Quest Software. Its execution
in 2014 will bear watching.

So, too, will that of Teradata, which this year became
acquainted with the downside of being a public
company. Teradata missed its earnings in Q2 and, just
ahead of its Partners conference in October, cut its
earnings outlook for the year. As a result, the DW giant's
stock was repeatedly buffeted by the market. Slings and
arrows, indeed.


Buzzwords: A Plea
A new year brings new buzzwords. By "buzzword," we
mean those once-unique coinages that, when used
sparingly or in isolation, have imaginative, conceptual,
and/or thematic power. As poet A.R. Ammons once aptly
put it, "A word too much repeated falls out of being."
And how. This year, adjectives such as "disruptive,"
"transformative," "self service," "high value," and
"advanced" and/or "predictive analytic" (along with
the term "analytic" itself) were taken up as adjectives or
adverbs by marketers everywhere. Even the word "cloud"
was used and misused by marketeers.
All of these terms passed into a lexicon of descriptive
words (sprinkled with a few scant nouns and verbs) that
includes marketing mainstays such as "game changing,"
"unprecedented," and "patent pending," as well as more
innocuous descriptors such as "visual," "intuitive," "market
leading," or "innovative."
These words comprise the noise that we must filter out if
we're to meaningfully assess products and technologies, or
(more important) make buying decisions about products
and technologies. It isn't that these words aren't useful
and don't mean anything. Rather, it's that the contexts in
which they're employed are so general (or so inapposite)
as to dilute their meanings. They're all noise, no (or very
little) signal.
This year, and with shocking frequency, BI and DM
vendors delivered "disruptive" tools that "surface" (a
popular alternative is "expose") "innovative" and/
or "intuitive" capabilities, almost always in a rich
"visual" and/or "self-service" context, and promise to
"transform" a typically dysfunctional status quo. In
many cases, tools tout "advanced analytic" or "predictive"
capabilities and (increasingly) have "cloud" components
or attributes, too.
This author is as guilty in the dilution of these terms as
anyone else. For the New Year, he's resolved to strike
them from his lexicon, at least in contexts where they're
inappropriate.
If only industry vendors would do the same.
Stephen Swoyer is a contributing editor for TDWI.


feature

2014 Forecast:

BI, Analytics, and Big Data Trends and


Recommendations for the New Year
By Fern Halper, Philip Russom, and David Stodder

Four Analytics
Technology Trends
for 2014
By Fern Halper, Research Director for Advanced
Analytics, TDWI

In 2013, TDWI saw increasing activity around predictive


analytics as a foothold for advanced analytics. Predictive
analytics is fast becoming an important component of
an organization's analytics arsenal, providing significant
advantage for achieving a range of desired business
outcomes, including higher customer profitability and
more efficient and effective operations. TDWI expects
interest will continue to build in 2014 for this technology,
and that it will continue to evolve. Additionally, other
advanced analytics technologies will begin to gain
momentum.

Advanced analytics provides algorithms for complex


analysis of either structured or unstructured data. It uses
sophisticated statistical models, machine learning, and
other advanced techniques to find patterns in data for
prediction and decision optimization. Although some of
the techniques for advanced analytics have been around
for many years, several factors have come together in
almost a perfect storm to ignite increasing market interest
in these technologies: the explosion of data (type, volume,
and frequency); the availability of cheap computing
power; and the realization that analytics can provide a
competitive advantage.
Here are four analytics trends that I see as we move into
2014.
Trend #1: Predictive analytics deployment models progress.
Predictive analytics is a technology whose time has finally
come. Although many of the algorithms have been around
for decades, more organizations want to utilize the power
of this technology. Use cases include predicting customer


and machine behavior, such as fraud, churn, or machine
failure. Figure 1, from a recent TDWI World Conference
survey, illustrates that close to 90% of respondents
cited predictive analytics as a technology they would be
using in the next three years. Twenty-seven percent were
currently using it. TDWI expects adoption of predictive
analytics will continue in 2014.

What kind of analytics are you currently using in your organization today to analyze data?
In three years? Please select all that apply. (Percentages = using now or will use within three years.)

Visualization tools: 96%
Predictive analytics: 88%
Geospatial analytics: 70%
Other advanced statistical techniques (e.g., clustering, forecasting, optimization): 64%
Text analytics: 53%
Web analytics: 51%
Social media analytics: 51%
Other data mining techniques (e.g., neural nets, machine learning): 45%
Link analysis: 28%

Figure 1. Source: TDWI World Conference Tech Survey, 2013.
The deployment options for this technology are evolving.
There has been a market move over the past few years to
democratize predictive analytics (i.e., make it easier to use).
This has fueled the growth of the technology. For example,
in the new TDWI Best Practices Report, Predictive
Analytics for Business Advantage, 86% of respondents cited
business analysts and 79% cited statisticians when we
asked the question, "In the near future, who do you expect
will be using predictive analytics tools in your company?"
This points to a shift occurring (at least in perception)
about who is going to make use of predictive analytics.
Many respondents believe that business analysts will build
the predictive models. Whether all business analysts have
the skills to build complex models is another discussion.
However, other deployment models for predictive
analytics that can help a wider group of people make
use of the technology will become more popular in 2014.
These include:
Operationalizing it. One way to make predictive analytics
more pervasive is to include it as part of an automated
business process. For instance, a data scientist or business
analyst might build a model for cross-sell and up-sell that
is instantiated into a call center system. A call center agent
might use the model output without even necessarily
knowing there is a complex model working behind the
scenes. The agent might only see the next best offer to
suggest to a customer. (A minimal sketch of this pattern
follows this list.)
Consumerizing it. Another way to make predictive
analytics more consumable is for a technical person to
build a model that someone less technical can interact
with. For instance, a data scientist might build a model
that a marketing analyst uses.
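The sketch below illustrates the operationalizing pattern with scikit-learn; the offers, features, and training data are hypothetical, and a production system would score against governed data rather than hard-coded arrays.

```python
# A minimal sketch of "operationalizing" a model: the model is fit offline,
# and the call center system only ever calls next_best_offer(), so the agent
# sees a suggestion, not the model. All data below is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

OFFERS = ["upgrade_plan", "add_line", "retention_discount"]

# Hypothetical history: [tenure_months, monthly_spend, support_calls,
# offer_index] -> accepted (1) or declined (0).
X_train = np.array([[24, 80.0, 1, 0], [3, 45.0, 4, 2], [60, 120.0, 0, 1],
                    [12, 55.0, 2, 2], [36, 95.0, 1, 0], [6, 40.0, 5, 2]])
y_train = np.array([1, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X_train, y_train)

def next_best_offer(tenure_months, monthly_spend, support_calls):
    """Score every offer for this customer; return the most likely to be accepted."""
    candidates = np.array([[tenure_months, monthly_spend, support_calls, i]
                           for i in range(len(OFFERS))])
    accept_probs = model.predict_proba(candidates)[:, 1]
    return OFFERS[int(accept_probs.argmax())]

# What the call center screen would show for one caller:
print(next_best_offer(tenure_months=9, monthly_spend=50.0, support_calls=3))
```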
Trend #2: Geospatial analytics continues to gain steam. Geospatial
data, sometimes referred to as location data or simply
spatial data, is emerging as an important source of
information in both traditional and big data analytics.
Geospatial data and geographic information systems
(GIS) software are being integrated with other analytics
products to enable analytics that utilize location and
geographic information. Use cases include market
segmentation, logistics, detecting fraud, and situational
awareness.
Geospatial analytics is being used in visualizations
that layer geospatial information together with other
information to spot patterns. Geospatial data is also
being combined with other forms of data to be used in
more sophisticated analysis, such as prediction. In this
case, the geospatial data is combined with other sources
of data as attributes for a model. In fact, more than
70% of respondents surveyed for Predictive Analytics for
Business Advantage indicated that they plan to incorporate


geospatial data into their predictive models within


the next three years (see Figure 3). And as Figure 1
illustrates, close to 70% of the respondents to the 2013
World Conference survey plan to be using geospatial
analytics in the next three years. This kind of analysis is
unmistakably gaining in popularity. TDWI expects this
trend to continue in 2014.
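A toy example of geospatial data used as a model attribute: the haversine distance from a customer to the nearest store is appended to an otherwise conventional feature vector. Store coordinates and customer fields are made up for illustration.

```python
# A toy sketch: location becomes one more numeric feature for a model.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

STORES = [(47.61, -122.33), (45.52, -122.68)]  # hypothetical store locations

def add_distance_feature(customer):
    """Append distance-to-nearest-store (km) to a customer's feature vector."""
    lat, lon = customer["lat"], customer["lon"]
    nearest = min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in STORES)
    return customer["features"] + [round(nearest, 1)]

print(add_distance_feature({"lat": 47.25, "lon": -122.44,
                            "features": [34, 72.5, 2]}))
```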
Trend #3: Analytics in the cloud becomes more popular. Although
the adoption of cloud analytics has been slower than
predicted, TDWI is seeing an increasing number of
companies investigating the technology. This trend
will continue into 2014. A hybrid approach (using a
combination of public and private clouds) will also become
more popular.
TDWI Research in predictive analytics supports this trend.
In Figure 2, only 26% of respondents stated that they
would never use the cloud for BI or predictive analytics,
whereas 35% were thinking about using the cloud for
some kind of analytics and 25% were currently using it.
This is an increase from previous surveys.
As companies start to think about analytic workloads, the
cloud will continue to become more popular. TDWI is
already seeing certain kinds of analytics workloads move
to the cloud. Typically, these are not well suited to the
data warehouse. For instance, companies that are already
collecting data in the public cloud are analyzing it there to
reduce it. This might include telemetry data or other kinds

of big data. They are then sending this reduced data set to
their on-premises data centers for further analysis.
Companies are also using the public cloud for analytics
sandboxes (i.e., test beds) for advanced analytics, but
more often leaving the data there and using the cloud for
many different kinds of advanced analysis. As this occurs,
service providers are also setting up communities where
data (such as census data) can be shared. TDWI expects to
see more use cases of cloud analytics emerge in 2014.
Trend #4: Data and the Internet of Things. There is little doubt
that the world will continue to create more data. TDWI
Research indicates that in addition to ever-increasing
amounts of structured data, companies will begin to use
other forms of data for analysis. For instance, in Predictive
Analytics for Business Advantage, we asked respondents who
were already using predictive analytics what data they plan
to use for it. Figure 3 illustrates their responses.
Three big growth areas (aside from geospatial data) over
the next three years include social media data, text data,
and real-time event data. For instance, TDWI sees
more companies utilizing text analytics technologies to
essentially structure unstructured data. Organizations are
using this data in isolation to discover patterns; however,
they are also marrying the text data with structured data
to provide lift to advanced analytics models.
Figure 2. "Does your organization use cloud computing for analytics?" Response options ranged
from using a public cloud or SaaS for BI and/or predictive analytics, a hybrid of public and private
cloud, or a private cloud, through thinking about the cloud, never using it, or "don't know."
Source: TDWI Research survey to support TDWI Best Practices Report: Predictive Analytics for
Business Advantage, 2014.

Figure 3. "What kind of data do you use for predictive analytics? Now? Three years from now?"
Data types surveyed include structured data (from tables, records), demographic data, time
series data, Web log data, clickstream data from websites, real-time event data, internal text
data (e.g., from e-mails, call center notes, claims), machine-generated data (e.g., RFID, sensor),
geospatial data, and external social media text data. Source: TDWI Best Practices Report:
Predictive Analytics for Business Advantage, 2014.

Likewise, more companies are looking to real-time data for
advanced analytics. New technologies, such as complex
event processing (CEP), stream mining, in-memory
analytics, and in-database analytics, have enabled real-time analytics for customer interactions and operational
and situational intelligence, among other use cases. TDWI
expects that more users will adopt real-time data in 2014.
One class of potentially real-time data is machine-generated
data, which is called out separately in Figure 3. Interestingly,
close to 20% of respondents are already using machine-generated data in some sort of predictive capacity. Another
20% expect to use it in the next three years.
This machine-generated data is part of the Internet of
Things (IoT), which TDWI expects to grow in market
awareness in 2014. This term refers to the fact that devices
are becoming more numerous, and they are equipped with
sensors and other technologies that can send data over
the Internet. These devices are generating huge amounts
of data that can be used in various ways. For instance,
insurance companies are looking to use telemetric data from
devices placed in cars to help develop better risk models
for insurance. Logistics providers are tracking produce
that is tagged with RFID tags to check for temperature
and spoilage. Since data simply dumped into some sort of
storage environment is not that useful, IoT analytics will
also start to evolve.

A Final Word
Companies are finally starting to utilize more advanced
analytics, and this trend will continue into 2014. This
will be the case whether organizations are dealing with
big data or with their current data sets. Organizations

are looking to employ more analytics over different kinds


of data, especially once they feel that they have their BI
implementations under control. 2014 promises to be an
exciting year for analytics!
Fern Halper is director of TDWI Research for advanced
analytics, focusing on predictive analytics, social media
analysis, text analytics, cloud computing, and other big data
analytics approaches. She has more than 20 years of experience
in data and business analysis, and has published numerous
articles on data mining and information technology. Halper
is co-author of Dummies books on cloud computing, hybrid
cloud, service-oriented architecture, service management, and
big data. She has been a partner at industry analyst firm
Hurwitz & Associates and a lead analyst for AT&T Bell Labs.
Her Ph.D. is from Texas A&M University. You can reach her
at fhalper@tdwi.org, or follow her on
Twitter: @fhalper.

10 Recommendations for
Big Data Implementations
By Philip Russom, Research Director for Data
Management, TDWI

It's human nature to start a new calendar year by
ruminating on the many things you'd like to achieve.
After all, most of us humans want to improve our work
and home lives, and setting goals is one way to achieve
improvement.
If you're a data management professional or similar
specialist, the spirit of the new year may have you thinking


about how to get a better grip on one of the most


apparent opportunities facing us nowadays, namely so-called big data.
Managing big data is a relatively new practice, so
its best practices and critical success factors are still
emerging. To give you a leg up, allow me to present 10
recommendations for successful big data implementations.

1. Demand business value from big data.


Think of big data as an opportunity, and seize it. In
TDWI's big data management survey of 2013, 89% of
survey respondents said the management of big data is an
opportunity. Sure, the management of big data presents
technical challenges, but the insights resulting from
the analysis of big data can lead to cost reductions and
revenue lift. Hence, the primary path to business value
from big data is through analytics. A second path joins
new big data with older enterprise data to extend complete
views of customers and other business entities. A third
path taps streaming big data to enlighten and accelerate
time-sensitive business processes.
Leverage big data, don't just manage it. It costs money,
time, bandwidth, and human resources to capture, store,
process, and deliver big data. Therefore, no one should be
content to simply manage big data as a cost center that
burns up valuable resources.

2. Put advanced analytics and big data together.


It's the analytics, stupid. Current consensus says that
analytics is the primary path to getting business value
from big data. Therefore, the point of managing big data is
to provide a large and rich data set for actionable business
insights, discovered via analytics. This fact is so apparent
that there's even a name for it: big data analytics.
For example, a common analytic application today is
the sessionization of website log data, which reveals the
behavior of site visitors, information that helps marketers
and Web designers do their jobs better. As another
example, trucks and railcars are loaded with sensors
and GPS systems nowadays so logistic firms can analyze
operator behavior, vehicle performance, onboard inventory,
and delivery route efficiency.
In these examples, collecting big data from Web
applications or sensors is almost incidental. The real point
is to elevate the business to the next level of corporate
performance based on insights gleaned from the analysis
of big data.
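For readers who want to see what sessionization actually involves, here is a toy sketch that groups raw clicks by visitor and starts a new session after a 30-minute gap; the field names and timeout are illustrative, not any product's defaults.

```python
# A toy sessionization sketch over raw click records.
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def sessionize(clicks):
    """clicks: iterable of dicts with 'visitor_id', 'ts' (ISO 8601), 'url'."""
    by_visitor = defaultdict(list)
    for c in clicks:
        by_visitor[c["visitor_id"]].append((datetime.fromisoformat(c["ts"]), c["url"]))

    sessions = []
    for visitor, events in by_visitor.items():
        events.sort()
        current, last_ts = [], None
        for ts, url in events:
            # Start a new session when the gap between clicks is too long.
            if last_ts is not None and ts - last_ts > SESSION_GAP:
                sessions.append({"visitor": visitor, "pages": current})
                current = []
            current.append(url)
            last_ts = ts
        sessions.append({"visitor": visitor, "pages": current})
    return sessions

log = [
    {"visitor_id": "v1", "ts": "2013-11-01T09:00:00", "url": "/home"},
    {"visitor_id": "v1", "ts": "2013-11-01T09:05:00", "url": "/pricing"},
    {"visitor_id": "v1", "ts": "2013-11-01T14:00:00", "url": "/home"},  # new session
]
print(sessionize(log))
```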

3. Don't expect new forms of analytics to replace


older forms.
Online analytic processing (OLAP) continues to be the
most common form of analytics today. OLAP is here
to stay because of its value serving a wide range of end
users. The current trend is to complement OLAP with
advanced forms of analytics based on technologies for
data mining, statistics, natural language processing, and
SQL-based analytics. These are more suited to exploration
and discovery than OLAP is. Note that most data
warehouses today are designed to provide data mostly for
standard reports and OLAP, whereas future-facing data
warehouses also provide additional data and functionality
for advanced analytics.
Use big data to create new applications and extend old
ones. For example, big data can expand the data samples
that data mining and statistical analysis applications
depend on for accurate actuarial calculations and
customer segments or profiles. Similarly, much of big
data's value comes from mixing it with other enterprise
data. The proverbial 360-degree view of customers and
other business entities accumulates more degrees when big
data from new sources is integrated into views.

4. Hire and train your staff for big data


management.
The focus should be on training and hiring data analysts,
data scientists, and data architects who can develop the
applications for data exploration, discovery analytics, and
real-time monitoring that organizations need if theyre to
get full value from big data. Most BI/DW professionals
are already cross-trained in many data disciplines; cross-train them more. When in doubt, hire and train data
specialists, not application specialists, to manage big data.
TDWI's take is that it's easier to train a BI professional in
Hadoop and other big data technologies than it is to train
an applications developer in BI and data warehousing.
As with all data management, collaboration is key to
the management of big data. Due to big data's diversity,
diverse technology teams will need to play coordinated
roles. From a business viewpoint, big data should be
managed as an enterprise asset, such that multiple business
units and stakeholders have access to big data so they can
leverage it. It takes a lot of collaboration, both business
and technical, to be sure everyone knows their role and
has their needs met.


5. Beware the proliferation of siloed repositories


for big data analytics.
Most analytic applications have a departmental bias.
For example, the sales and marketing departments
want to own and control customer intelligence, just as
procurement needs to control supply chain analytics,
and the financial department owns financial analysis.
Furthermore, unstructured and semi-structured big
data is regularly segregated because it cant be managed
properly in the usual relational databases. For these and
other reasons, TDWI sees big data collections and big data
platforms (such as Hadoop and NoSQL databases) too
often managed in isolated silos.
Your goal should be to integrate big data into your well-integrated
enterprise data and BI/DW environments, not
proliferate twenty-first-century spreadmarts and teramarts.
Besides, eventually we'll stop calling it big data and just
assume it's a subset of enterprise data. Someone (probably
not you) should decide whether big data platforms will be
departmentally owned (as a lot of analytic applications are)
or shared enterprise infrastructure supplied by central IT
(similar to how IT provides SAN/NAS storage, servers, the
network, and so on).

6. Consider a data warehouse architecture that


mixes relational and Hadoop technologies.
Architecture can enable or inhibit critical next-generation
big data management functions such as extreme scalability,
complete views, unforeseen forms of analytics, big data as
an enterprise asset, and real-time operation.
One architectural strategy being adopted by many
organizations is to reserve the data warehouse for the
relational and multidimensional data that populates the
majority of BI deliverables, namely standard reports,
reports in dashboard or scorecard styles, metrics and key
performance indicators for performance management, and
multidimensional OLAP. In most organizations, the list
constitutes a whopping 80% or more of the output of a BI
program. So it makes sense that you guard the DW that
your deliverables (and your job!) depend on most.
On one hand, this architecture assures that the vast
majority of BI deliverables have ample, clean, well-modeled, and well-documented data sourced from
a traditional warehouse. On the other hand, this
architecture assumes that the 20% or fewer deliverables for
data exploration and advanced analytics will be populated
with big data and similar data sets managed on other
data platforms, not the core data warehouse. This is quite

advantageous because each deliverable type and the data


it requires is supported by data platforms that are most
conducive and most easily optimized for them. It also
parallels team structures that separate reporting and
analytics, because the two require very different skills
and tools.
Organizations that have moved to this two-part DW
architecture usually depend on two key platform types: a
relational database management system (RDBMS) and
the Hadoop Distributed File System (HDFS). TDWI
keeps finding more RDBMS/HDFS data warehouses, in
a growing list of industries, which indicates this will soon
be a common architectural approach to data warehouse
environments, especially in organizations that need to
leverage big data via analytics while still maintaining high
standards in reporting and related deliverables.

7. Define places for big data in architectures


for data warehousing and enterprise data
management.
For example, an obvious place to start is to rethink the
data staging area within your data warehouse. That's
where big data enters a data warehouse environment and
where it is usually stored and processed before being
loaded into the warehouse proper. Consider moving your
data staging area to a standalone big data management
platform: on Hadoop, a columnar DBMS, a data
warehouse appliance, or a combination of these and other
alternative (non-relational) data platforms outside the core
data warehouse.
Data staging aside, there are many other areas within
standard DW architectures where alternative data
platforms can make a contribution, namely in archiving
detailed source data, managing non-structured data,
managing file-based data, data sandboxes, more processing
power for an ETL hub or ELT push down, and anywhere
you might use a non-dimensional operational data store.
Consider the many new architectures that boost scalability
and performance for big data. If your relational data
warehouse is still on an SMP platform, make migration to
MPP a priority. Consider distributing your data warehouse
architecture, largely to offload a workload to a standalone
platform that performs well with that workload. When
possible, take analytic algorithms to the data, instead of
data to the algorithm (as is the DW tradition); this new
paradigm is seen with in-database analytics, Hadoop with
MapReduce layered over it, and gate-array processing in
some storage platforms and appliances.
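As a small illustration of taking the algorithm to the data, the sketch below is a Hadoop Streaming style mapper and reducer, written as plain Python filters over stdin, that total sales per store inside the cluster instead of exporting detail rows first. The tab-delimited layout and file paths are hypothetical.

```python
# A minimal Hadoop Streaming sketch: push a rollup to the data rather than
# pulling detail rows out of the cluster. Input layout is assumed to be
# store_id<TAB>amount, one record per line.
import sys

def mapper():
    # Emit store_id<TAB>amount for each input record.
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            print(parts[0] + "\t" + parts[1])

def reducer():
    # Hadoop sorts mapper output by key, so equal store_ids arrive together.
    current, total = None, 0.0
    for line in sys.stdin:
        store, amount = line.rstrip("\n").split("\t")
        if store != current:
            if current is not None:
                print(current + "\t" + str(total))
            current, total = store, 0.0
        total += float(amount)
    if current is not None:
        print(current + "\t" + str(total))

if __name__ == "__main__":
    # Illustrative invocation (paths and jar location vary by distribution):
    #   hadoop jar hadoop-streaming.jar -mapper "python sales_rollup.py map" \
    #       -reducer "python sales_rollup.py reduce" -input /raw/sales -output /out
    mapper() if sys.argv[1] == "map" else reducer()
```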


8. Reevaluate your current portfolio of data


platforms and data management tools.
For one thing, big data management is, more and more,
a multi-platform solution (as are most data warehouse
architectures), so you should expect to further diversify
your software portfolio accordingly to fully accommodate
big data. For another thing, survey data shows that the
software types poised for the most brisk new adoption
in the next three years are Hadoop (including HDFS,
MapReduce, and miscellaneous Hadoop tools) and
complex event processing (for streaming real-time big
data). After those come NoSQL DBMSs, private clouds,
and data virtualization/federation. If you're like most
organizations surveyed, all of these have a potential use in
your big data management (BDM) solution, so you should
educate yourself about them, then evaluate the ones that
come closest to your BDM requirements.
In addition, diverse big data is subject to diverse processing,
which may require multiple platforms. To keep things
simple, users should manage big data on as few data
platform types as possible to minimize data movement as
well as to avoid data synchronization and silo problems
that work against the single version of the truth. Yet
there are ample exceptions to this rule, such as the multi-platform RDBMS/HDFS architecture for DWs discussed
earlier. As you expand into multiple types of analytics with
multiple big data structures, you will inevitably spawn
many types of data workloads. Because no single platform
runs all workloads equally well, most DW and analytic
systems are trending toward a multi-platform environment.

9. Embrace all formats of big data, not just


relational big data.
Non-structured and semi-structured data types are daunting
for the uninitiated, but they are the final frontier: the data
your enterprise hasn't tapped for analytics. For example,
human language text drawn from your website, call center
application, and social media can be processed by tools
for text mining or text analytics to create a sentiment
analysis, which in turn gives sales and marketing valuable
insights into what your customers think of your firm
and its products. As another example, organizations with
an active supply chain can analyze semi-structured data
exchanged among partners (in, say, XML, JSON, RFID, or
CSV formats) to understand which partners are the most
profitable and which supplies are of the highest quality.
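To make the sentiment-analysis beachhead concrete, here is a deliberately tiny, lexicon-based scorer; real text analytics tools apply far richer linguistics, and the word lists below are illustrative only.

```python
# A toy lexicon-based sentiment scorer for call center notes.
POSITIVE = {"great", "love", "fast", "helpful", "easy"}
NEGATIVE = {"slow", "broken", "awful", "confusing", "cancel"}

def sentiment(text):
    """Return a score in [-1, 1]: +1 all positive words, -1 all negative."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

calls = [
    "Agent was helpful and the fix was fast",
    "App is broken again, thinking about cancel",
]
for note in calls:
    print(round(sentiment(note), 2), "|", note)
```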

Create a phased plan that eventually addresses all types
of big data. You have to start somewhere, so start with
relational data, then move on to other structured data,
such as log files that have a recurring record structure.
Carefully select a beachhead for unstructured data, such
as text analytics applied to call center text in support
of sentiment analysis. Look for mission-critical data
that's semi-structured, as in the XML documents your
procurement department is exchanging with partnering
companies. Then continue down the line of big data types.

10. Embrace big data in motion, not just big data


at rest.
Some forms of big data are generated continuously. For
example, Web servers can capture every click of every
website visitor and append information about these to logs.
An RFID chip emits data every time it passes by an RFID
receiver. Sensors mounted on mobile assets (e.g., trucks,
railcars, shipping pallets) transmit valuable information
about their route and environment.
In a lot of ways, streaming data of this sort is the hardest
form of big data to handle, because it takes special
systems to capture and process the data in real time, such
as systems based on complex event processing (CEP). Yet
streaming data is worth the effort and expense when it
delivers unique insights into business processes, and does so
faster and more frequently than any other data source can.
Many organizations start with streaming big data by
capturing a stream and analyzing it offline. (Web log
data is a common stream to start with, followed by
various types of machine data.) Assuming the analysis
corroborates that the stream contains valuable content, the
next phase is to start processing the messages, events, and
other data in the stream as they arrive in real time.
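A toy sketch of that progression from offline analysis to processing events as they arrive: a sliding one-minute window counts events per sensor and flags any sensor that has gone quiet. The event format and window length are assumptions, not a CEP product's API.

```python
# A toy sliding-window monitor over a stream of sensor events.
from collections import defaultdict, deque

WINDOW_SECONDS = 60

class SlidingCounter:
    def __init__(self):
        self.events = defaultdict(deque)  # sensor_id -> deque of timestamps

    def observe(self, sensor_id, ts):
        """Record one event and drop anything older than the window."""
        q = self.events[sensor_id]
        q.append(ts)
        while q and ts - q[0] > WINDOW_SECONDS:
            q.popleft()

    def quiet_sensors(self, now):
        """Sensors with no events inside the current window."""
        return [s for s, q in self.events.items()
                if not q or now - q[-1] > WINDOW_SECONDS]

counter = SlidingCounter()
for sensor, ts in [("truck-17", 0), ("truck-17", 30), ("railcar-4", 5)]:
    counter.observe(sensor, ts)
print(counter.quiet_sensors(now=90))   # railcar-4 has gone quiet
```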
Philip Russom is director of TDWI Research for data
management and oversees many of TDWI's research-oriented
publications, services, and events. He is a well-known figure in
data warehousing and business intelligence, having published
over 500 research reports, magazine articles, opinion columns,
speeches, Webinars, and more. Before joining TDWI in 2005,
Russom was an industry analyst covering BI at Forrester
Research and Giga Information Group. He also ran his own
business as an independent industry analyst and BI consultant
and was a contributing editor with leading IT magazines.
Before that, Russom worked in technical and marketing
positions for various database vendors. You can reach him at
prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at
linkedin.com/in/philiprussom.



Driving the Next Phase


of BI Innovation: Four
Trends for 2014
By David Stodder, Research Director for Business
Intelligence, TDWI

Heading into the New Year, three Cs dominate many


discussions about business intelligence and self-directed
discovery analytics: content, context, and collaboration.
Users want to go beyond the traditional limits of BI and
data warehousing systems to access and analyze big data,
which includes unstructured and semi-structured content.
Users also seek more than just the numbers; they want
context to fill in the gaps. New requirements are making
it necessary for business users and IT to establish better
forms of collaboration to accomplish objectives, govern the
data, and ensure overall performance.
Here are four trends that I see guiding how organizations
will approach BI and emerging visual, self-directed
data discovery analytics deployments in the next 12
months. Note that given these trends, success with BI
and analytics will increasingly involve innovation in both
technology implementation and in solving the often more
difficult people and organizational challenges.
Trend #1: Expansion in users' consumption and direction of BI
and analytics will force recalibration of the relationship with IT.
IT's role in BI and analytics is and always will be vital,
but the spotlight today is on business-driven BI and
analytics. Widespread implementation of cloud computing
and software-as-a-service for customer relationship
management, project management, and other applications

has opened the door to the movement of BI and analytics


outside IT's direct control. Although data ownership and
governance concerns require organizations to be cautious
about who is doing what with the data, the clear trend
is toward increased data consumption and self-directed
analysis by a broad range of users.
Often, the initial intent is not to create permanent
business-driven systems. Built to be disposable, such
shadow BI and analytics systems are to be shut down
once a marketing campaign or particular analytical
inquiry has run its course (of course, we all know how
many such systems really die versus those that end
up in IT's lap). Shadow systems can be beneficial to
organizations as a place for experimentation with new and
innovative technologies or practices that are not part of
IT's repertoire. TDWI Research finds that one of the top
reasons for business users to deploy self-service, business-driven BI and analytics systems is that IT lacks the
experience and expertise to give users what they want (see
Figure 1).
The business-driven trend in BI and analytics means that
business and IT must define a new level of collaboration.
IT has a key role in governance, but IT must also excel at
preparing and provisioning data for the growing population
of data consumers and self-directed analysts. One key step
in forging a new collaborative relationship is the adoption of
agile development methods, which bring business users and
IT developers together in small teams to work on projects
designed to deliver continuous, incremental value. In the
coming year, we will see a growing number of organizations
either adopt agile methods or apply agile principles to
update and improve collaboration in the development of BI
and analytics systems.

What are your organization's main reasons for implementing self-service BI and analytics?

Users are requesting to do more on their own: 67%
IT cannot keep up with changing business needs: 58%
Users are going rogue and IT needs a comprehensive solution: 38%
Current BI processes cannot adapt to test-and-learn analytic processes: 32%
IT lacks adequate BI/analytics expertise: 31%
Lack of IT budget or need to reduce IT's BI/DW budget: 28%
Users need access to unstructured data sources and content: 27%
We do not have a self-service BI initiative: 23%
Poor quality of data in IT-managed BI reports: 18%

Figure 1. Source: TDWI Best Practices Report: Achieving Greater Agility with Business Intelligence, 2013.


Trend #2: BI and visual discovery implementations focus on improving business processes. In recent years, the biggest
buzz in BI has been about self-service visual discovery
tools. These have enabled nontechnical users to get
beyond simple reports to investigate the why questions
behind the numbers. The tools have accelerated the trend
toward greater independence from IT for data access and
BI dashboard creation. Although independence from
IT offers pluses and minuses from a data governance
perspective, enabling users to do more investigation and
sharing of insights on their own is critical to developing
a broader analytic culture and strengthening data-driven
decision making.
Improving user productivity has long been a key goal of
BI. Self-service discovery can help organizations accelerate
progress toward user productivity goals by more closely
integrating analytic activities with business process
workflow. Then, users can apply their discovery insights
more directly to the tasks and automated procedures for
which they are held accountable. Discovery analytics
can improve situational awareness by enabling users
to examine, in real time, patterns that they may detect
through activity monitoring or alerting functions.
In 2014, we will see leading BI and visual data discovery
tool vendors focus technology releases on achieving a
tighter integration with process workflow, role-based
responsibilities and interaction, and common metadata
across process and analysis applications.

Trend #3: Organizations will expand the role of text analytics and enterprise search in users' BI and discovery tool sets. For most organizations, textual content still accounts for the lion's share of their data and most of what lies beyond data warehouse systems that manage structured relational data. In addition to documents, e-mails, customer satisfaction surveys, and other internal content, organizations are reaching out externally into social media sources and are applying text analytics to interpret customer sentiment.

Along with text analytics, enterprise search tools are vital for exploration and navigation of content. Search-based discovery, using tagging or labeling to describe the data, is an important capability to have for finding content that exists outside of what is described in BI and DW systems' structured metadata. When integrated with BI systems, enterprise search can employ indexes to help users sift through and locate not only unstructured and semi-structured content, but also items in voluminous and numerous BI reports. Finally, enterprise search can help organizations gain a vital outside perspective, such as whether website visitors are finding information easily or whether refinements need to be made to content classification systems.

Moving forward, organizations need to address user requirements to be able to view or access the full spectrum of data. In the new year, leading vendors will more fully incorporate search and text analytics tools and functions to expand and unify data access and analysis beyond the limits of existing BI and DW boundaries.

Trend #4: Analytics will enable users to improve operational decisions through insights into broader patterns. Traditional BI applications often fall short of supporting operational decision makers because the systems exist in silos. They do not enable users to access data from outside single or small numbers of sources, and they limit views to simple reports and dashboards. Users can thus be blind to insights drawn from data scientists' analyses of multiple and diverse data sources that could be valuable to operational decisions and performance monitoring.

Leading organizations are beginning to link advanced analytics focused on larger trends and patterns to users' BI and performance management systems so that insights drawn from analytics can be brought to bear on daily interactions with customers, patients, partners, or other individuals. They are also basing key performance metrics on sharp, detailed analytics rather than standard budget numbers and forecasts.

A good example is the use of BI and analytics for population health management. Healthcare providers want to discover and track trends and patterns in larger populations and deliver analytic insights in real time so that practitioners can make better decisions at the point of care. Organizations also want to use knowledge of population health to predict and spot gaps in a patient's continuum of care from provider to provider so that all can be more proactive with patient treatments and avoid expensive emergency hospital visits.

While providers and policy organizations concerned with population health must work within the confines of patient health information privacy regulations such as HIPAA that limit how electronic health record (EHR) data can be used, organizations are finding ways to implement predictive analytics across a broad range of sources to assess health risks and disease patterns. BI dashboards, including on mobile devices, will be an increasingly effective means of delivering actionable, real-time insights into larger patterns to improve the care of individual patients.


User Satisfaction: Closer at Hand


BI technologies and practices are improving, enabling
diverse users from executives to frontline personnel to
be productive with more kinds of data. BI systems are
delivering actionable insight from low-latency data for
operations, while analytics applications are providing the
means for deeper, more exploratory discovery. In 2014,
some of the standard frustrations that have plagued users
of BI and analytics applications should thankfully fade
as tools become more user friendly, visual, universal, and
faster. Of course, as new objectives are defined, new
challenges will come to the fore. Business users and IT will
need a strong partnership to overcome them.
David Stodder is director of TDWI Research for business
intelligence. He focuses on providing research-based insights
and best practices for organizations implementing BI, analytics,
data discovery, data visualization, performance management,
and related technologies and methods. Stodder has provided
thought leadership about BI, analytics, information
management, and IT management for over two decades.
Previously, he headed up his own independent firm and served
as vice president and research director with Ventana Research.
He was the founding chief editor of Intelligent Enterprise and
served as editorial director for nine years. He was also one of
the founders of Database Programming & Design magazine.
You can reach him at dstodder@tdwi.org, or follow him on
Twitter: @dbstodder


Top 6 Worst Practices in Business Intelligence
Avoid these major pitfalls

Download the White Paper
Updated for 2014

Even the world's top organizations make poor decisions when planning, selecting, and rolling out a business intelligence (BI) solution: mistakes that can be detrimental to BI success.
In this white paper, updated for 2014, we detail the six worst practices in BI,
and show you how to avoid them.


Download your copy now to ensure a successful BI implementation in your


own organization by learning from the mistakes of others.

informationbuilders.com

Get Social

WebFOCUS
iWay Software
Omni

2013 TDWI Best Practices Report
TDWI Research

The State of Big Data Management
By Philip Russom

Status of Implementations for Big Data Management

A number of user organizations are actively managing big data today, as seen in survey results. However, do they manage big data with a dedicated BDM solution, as opposed to extending existing data management platforms? To quantify these issues, this report's survey asked: "What's the status of big data management in your organization today?" (See Figure 1.) The survey also asked: "When do you expect to have a big data management solution in production?" (See Figure 2.)

Dedicated BDM solutions are quite rare, for the moment. Only 10% of respondents report having deployed a special solution for managing big data today. Most of these are very new (7%), whereas a few are relatively mature (3%), as seen in Figure 1. This is consistent with the 11% of respondents who already have a BDM solution in production, as seen in Figure 2.

In the short term, the number of deployed BDM solutions will double. Another 10% of respondents say they have a BDM solution in development as a committed project, as seen in Figure 1. This is consistent with the 10% who say they will deploy a dedicated BDM solution within six months, as seen in Figure 2.


What's the status of BDM in your organization today?
Deployed and relatively mature: 3%
Deployed, but very new: 7%
In development, as a committed project: 10%
Prototype or proof-of-concept under way: 20%
Under discussion, but no commitment: 37%
No plans for managing big data with a special solution: 23%
Figure 1. Based on 461 respondents.

When do you expect to have a BDM solution in production?
It is already in production: 11%
Within 6 months: 10%
Within 12 months: 20%
Within 24 months: 19%
Within 36 months: 12%
In 3+ years: 22%
Never: 6%
Figure 2. Based on 461 respondents.

Half of surveyed organizations plan to bring a BDM solution online within three years. In addition to the 10% over six months just noted, more solutions will come online in 12 months (20%), 24 months (19%), and 36 months (12%). If users' plans pan out, dedicated BDM solutions will jump from rare to mainstream within three years. But note that users' plans are by no means certain, because many projects are still in the prototyping or discussion stage (20% and 37%, respectively, in Figure 1).

Few organizations don't need a special solution for managing big data. Just a quarter report no plans at present for such a solution (23% in Figure 1); even fewer say they'll never deploy a BDM solution (6% in Figure 2).

Strategies for Managing Big Data

Different organizations take different technology approaches to managing big data. On one hand, a fork-in-the-road decision is whether to manage big data in existing data management platforms or to deploy one or more dedicated solutions just for managing big data. On the other hand, some organizations don't have, or say they don't need, a strategy for managing big data. (See Figure 3.)

Half of organizations have a strategy for managing big data. This is true whether the strategy involves deploying new data management systems specifically for big data (20%) or extending existing systems to accommodate big data (31%). One survey respondent selected "other" and added the comment: "Our big data strategy is a core competency for our business."

The other half doesn't have a strategy, for various reasons. Some don't have a strategy because they're not committed to big data (15%). "The business value is questionable," said one respondent. Others lack a strategy for managing big data, as yet, even though they know they need one (30%). "Once our POC completes, strategy can be defined."

A lack of maturity can prevent a strategy from coalescing. One survey respondent added the comment: "We don't know enough yet to determine a strategy." Another commented: "Our data management is in a nascent stage. [It] needs to mature before a strategy becomes clear."

As with many strategies, hybrids can be useful. According to one respondent: "[We'll use] a blend of extending existing [platforms] and deploying new [ones] in a hybrid mode." Another echoed that strategy, but turned it into an evolutionary process: "[We'll] extend existing systems now, and add new and better systems later."

Which of the following best describes your organization's strategy for managing big data?
Deploy new data management systems specifically for big data: 20%
Extend existing data management systems to accommodate big data: 31%
No strategy for managing big data, although we do need one: 30%
No strategy for managing big data, because we don't need one: 15%
Other: 4%

Figure 3. Based on 461 respondents.



How successful has your organization been with the technical management of big data?
Highly successful: 11%
Moderately successful: 65%
Not very successful: 24%
Figure 4. Based on 188 respondents who have experience managing big data.

How successful has big data management been in terms of supporting business goals?
Highly successful: 12%
Moderately successful: 64%
Not very successful: 24%
Figure 5. Based on 188 respondents who have experience managing big data.

Strategy should be part business, part technology. Ideally, BDM strategy should start with upper management determining that big data and its management support business goals enough that the business should, in turn, support big data management. Without this business strategy in place first, technology strategies for BDM are putting the cart before the horse.

The Success of Big Data Management

Managing big data successfully on a technology level is one thing. Managing big data so that it supports business goals successfully is a different matter. For example, the benefits of BDM noted in the discussion of Figure 3 include business goals such as more numerous and accurate business insights, greater business value from big data, and business optimization.

To estimate metrics for these measures of success, this report's survey asked two related questions: How successful has your organization been with the technical management of big data? How successful has big data management been in terms of supporting business goals? (See Figures 4 and 5.) Note that these questions were answered by a subset of 188 survey respondents (41% of the total) who claim they've managed one or more forms of big data. Hence, their responses are strongly credible, as they are based on direct, hands-on experience.

Big data management (BDM) is moderately successful for both technology and business. A clear majority of respondents feel BDM (which they've done hands-on) is moderately successful on both technology and business levels (65% in Figure 4 and 64% in Figure 5, respectively). This is good news, considering that BDM is a relatively new practice. It also suggests that BDM can balance both technology and business goals.

Few consider BDM to be highly successful. This is the case for both technology (11%) and business (12%). No doubt, BDM will mature into higher levels of success.

Roughly a quarter of respondents consider BDM to be not very successful. Again, this is true for both technology (24%) and business (24%). The lack of success in some organizations may be due to the newness of BDM. At this point, we've mostly seen organizations' first attempts and early implementation stages; as these mature, success ratings will likely improve.
Philip Russom is director of TDWI Research for data
management and oversees many of TDWI's research-oriented
publications, services, and events. He is a well-known figure in
data warehousing and business intelligence, having published
over 500 research reports, magazine articles, opinion columns,
speeches, Webinars, and more. Before joining TDWI in 2005,
Russom was an industry analyst covering BI at Forrester
Research and Giga Information Group. He also ran his own
business as an independent industry analyst and BI consultant
and was a contributing editor with leading IT magazines.
Before that, Russom worked in technical and marketing
positions for various database vendors. You can reach him at
prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at
linkedin.com/in/philiprussom.
The report was sponsored by Cloudera, Dell Software, Oracle,
Pentaho, SAP, and SAS.

This article is an excerpt.



Read the full report


Read more reports


CERTIFIED BUSINESS INTELLIGENCE PROFESSIONAL

TDWI CERTIFICATION

Get Recognized as
an Industry Leader
Advance your career
with CBIP

Professionals holding a TDWI CBIP certification command an average salary of $113,500, more than $8,200 greater than the average for non-certified professionals.
(2013 TDWI Salary, Roles, and Responsibilities Report)

Distinguishing yourself in your career can be a difficult yet rewarding task. Let your résumé show that you have the powerful combination of experience and education that comes from the BI and DW industry's most meaningful and credible certification program.
Become a Certified Business Intelligence Professional today! Find out how
to advance your career with a BI certification credential from TDWI. Take the
first step: visit tdwi.org/cbip.


2013 TDWI Best Practices Report
TDWI Research

Implementation Practices for Better Decisions
By David Stodder

Increasingly, implementation success rises and falls with


users, not IT; dashboards, visual analytics, and discovery
tools are giving users more control, enabling them to
progress further on their own rather than depend on
IT. This is important for large organizations where IT
application backlogs are a problem; it is also a significant
benefit for small and midsize firms that do not have
extensive IT support for visual reporting and analysis.
However, as always, with the advantages come new
challenges.
One of the most potent benefits is better communication.
Our research makes it clear that performance
management continues to be a vital initiative and that the
associated dashboards are intended to be the centerpiece.
In Figure 1, we can see that KPI definition and delivery
is the most prevalent activity currently deployed for
users through implementation of data visualization and
visual analysis technologies (60%). Second and third
highest are snapshot report creation (45%) and alerting/
monitoring activity (44%). For all three of these activities,
visualizations are critical in providing actionable insight;


Which of the following business analysis, reporting, and alerting activities are currently deployed for users in your organization through implementation of data visualization and visual analysis technologies? (Please select all that apply.)
KPI definition and delivery: 60%
Snapshot report creation: 45%
Alerting/activity monitoring: 44%
Time series analysis: 39%
Pattern and trend analysis: 35%
Visual analysis of content: 34%
Forecasting, modeling, and simulation: 32%
Predictive analysis: 22%
Outlier, anomaly, or exception detection: 21%
Portfolio analysis: 21%
Quantitative modeling and scoring: 20%
List reduction: 6%

Figure 1. Based on answers from 408 respondents; respondents could select more than one answer.

they enable executives, managers, and other users to focus


on the situation at hand rather than having to tease out
facts from data tables, ratios, and formulas.
Visualizations enable new forms of collaboration on data.
Many tools allow users to publish charts, not only in
dashboards for viewers to share, but also through e-mail
and collaboration platforms such as Microsoft SharePoint.
Dashboards can deliver context for visualizations by
providing annotations and related charts, since one
chart often cannot tell the whole story. Other means of
storytelling, including animation or video and audio files,
may be part of the collaboration.
Storytelling is important because visualizations are
usually, and often intentionally, left open to
interpretation. Different viewers can draw different
interpretations, which they can investigate by drilling down
into the data. Some charts may hide the importance of
certain factors, while others might exaggerate them. This
ambiguity makes it important for executives, managers,
and users to work with visualizations as tools to engage
in a productive dialogue about metrics and measures.
Organizations can use visualizations to overcome the
one-way street limitations often cited as the bane of
performance management and standard BI reporting.
Time series analysis is an important focus. A significant
percentage of respondents implement visualizations for
time series analysis (39%). Users in most organizations
need to analyze change over time, and they typically
use various line charts for this purpose. Some will also

apply more exotic visualizations such as scatterplots for


specialized time series analysis, including examining
correlations over time between multiple data sources.
Visualizations for pattern and trend analysis, often
related to time series analysis, are employed by 35% of
respondents.
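As a simple illustration (a hypothetical sketch in Python using matplotlib, with invented numbers; it is not drawn from the survey or any particular BI product), the following plots a line chart for change over time alongside a scatterplot that relates measures from two sources:

# Illustrative sketch only: a time series line chart and a scatterplot relating
# two data sources, using matplotlib and made-up sample values.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [1.2, 1.4, 1.3, 1.7, 1.9, 2.1]      # hypothetical revenue, $M
web_visits = [30, 34, 33, 41, 45, 50]         # hypothetical site visits, thousands

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: change over time (the typical time series view)
ax1.plot(months, revenue, marker="o")
ax1.set_title("Monthly revenue ($M)")

# Scatterplot: how two sources move together over the same period
ax2.scatter(web_visits, revenue)
ax2.set_xlabel("Web visits (000s)")
ax2.set_ylabel("Revenue ($M)")
ax2.set_title("Visits vs. revenue")

plt.tight_layout()
plt.show()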
Time series, pattern, and trend analysis complement
predictive analysis. Organizations want to use history to
forecast what will happen next and identify what factors
will cause patterns to repeat themselves. Almost a third
(32%) of respondents use visualizations for forecasting,
modeling, and simulation, and 22% are doing so for
predictive analysis. Again, visualizations can improve
vital collaboration on predictive analysis among different
subject matter experts, who can share perspectives and
help the organization adjust strategies to be proactive. The
organization will anticipate events and be prepared with
the most intelligent way to respond.

Geospatial Analysis and Visualization


The ability to superimpose data visualizations on top of
maps is already a powerful asset for firms in industries
such as real estate, energy, telecommunications, land
management, law enforcement, and urban planning. As
more location-based data from geographical information
systems (GIS) becomes available, organizations in many
other industries are also becoming interested in analytical
capabilities. Retail firms, for example, can use the
combination of business data and maps to determine


where to locate stores; healthcare organizations can better


understand patient behavior and disease patterns;
insurance firms can use location analysis to improve risk
management; and marketing functions in a variety of
firms can overlay customer information and demographics
on maps to sharpen messaging to different neighborhoods.
Although just under half (49%) of organizations surveyed
are not currently implementing geospatial analysis, a
significant percentage are implementing visualization for
activities such as geographic targeting (35%), routing and
logistics (14%), and finding nearest locations. Nearly a
third (31%) of respondents seek to integrate geospatial
with other types of analysis. The ability to visualize
corporate data and advanced analysis such as time series
along with location information can help organizations
add a new dimension to business strategy and operational
intelligence. Mapping visualizations can be enhanced with
data to become geographical heat maps; these might show
the most or least profitable sales territories or where
customers are having particular kinds of
service problems.
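As a rough illustration of the idea (a hypothetical Python sketch using matplotlib, with invented territory coordinates and profit figures; a production deployment would typically layer such data onto a basemap from a GIS or mapping service):

# Illustrative sketch only: a crude "geographic heat map" that color-codes
# sales territories by profitability using latitude/longitude centroids.
import matplotlib.pyplot as plt

territories = {
    # name: (longitude, latitude, profit in $K) -- hypothetical values
    "North":   (-93.2, 45.0, 820),
    "South":   (-95.4, 29.8, 310),
    "East":    (-74.0, 40.7, 1150),
    "West":    (-122.4, 37.8, 640),
    "Central": (-87.6, 41.9, -120),   # a money-losing territory shows up "cold"
}

lons = [v[0] for v in territories.values()]
lats = [v[1] for v in territories.values()]
profit = [v[2] for v in territories.values()]

sc = plt.scatter(lons, lats, c=profit, cmap="RdYlGn", s=300)
for name, (lon, lat, _) in territories.items():
    plt.annotate(name, (lon, lat))
plt.colorbar(sc, label="Profit ($K)")
plt.title("Territory profitability (hypothetical)")
plt.show()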

David Stodder is director of TDWI Research for business


intelligence. He focuses on providing research-based insights
and best practices for organizations implementing BI, analytics,
data discovery, data visualization, performance management,
and related technologies and methods. Stodder has provided
thought leadership about BI, analytics, information
management, and IT management for over two decades.
Previously, he headed up his own independent firm and served
as vice president and research director with Ventana Research.
He was the founding chief editor of Intelligent Enterprise and
served as editorial director for nine years. He was also one of
the founders of Database Programming & Design magazine.
You can reach him at dstodder@tdwi.org, or follow him on
Twitter: @dbstodder.
The report was sponsored by Adaptive Planning, ADVIZOR
Solutions, Esri, Pentaho, SAS, and Tableau Software.

This article is an excerpt.



Read the full report


Read more reports


TDWI ONSITE EDUCATION

BI Training Solutions:
As Close as Your
Conference Room

TDWI Onsite Education brings our vendor-neutral BI and DW training to companies worldwide, tailored to meet the specific needs of your organization. From fundamental courses to advanced techniques, plus prep courses and exams for the Certified Business Intelligence Professional (CBIP) designation, we can bring the training you need directly to your team in your own conference room.
YOUR TEAM, OUR INSTRUCTORS, YOUR LOCATION.

Contact Yvonne Baho at 978.582.7105


or ybaho@tdwi.org for more information.

tdwi.org/onsite

Ten Mistakes to Avoid When Delivering Business-Driven BI
By Laura Reeves

FOREWORD
Most organizations have a reporting mechanism in place. Too
often, these reports are not enough to meet all business needs.
There is growing pressure to move forwardto an environment
that supports more sophisticated analysis. In this vision,
business users can use the environment themselves to access
and manipulate the necessary data and drive the analytics. This
is commonly called self-service business intelligence (BI). The most
successful BI solutions are those whose design and subsequent
use are driven by the business itself.
This is much easier said than done. It is easy to get caught up in
a frenzy of activity in the attempt to build and deliver something.
Too often, what is delivered is not well received by the business
community, or worse, met with disappointment or resistance.
The most common mistakes, and tips for avoiding them, are
explained here.

NOT SOLVING A REAL BUSINESS PROBLEM

Technology is cool. Working on a BI project can be great for your résumé. Everyone is doing it.

It's easy to get caught up in the hype, but these are not good reasons to launch a BI project. Maybe you already have
several reporting and BI environments in place. Maybe you are
starting fresh. Either way, too many organizations fail to seek real
business problems to address. The BI project may be IT driven,
and it is admirable to have an IT group that cares enough about
the organization to want to build a BI solution. The intentions
are good. However, no matter how much IT believes a project will
benefit the organization, most projects will fail unless they are
tied to problems the business community needs to address.
Developing a new BI solution can benefit IT by reducing
maintenance costs of the current reporting environment or
replacing technology that is no longer supported. These are
benefits to the overall organization, but they do not motivate
business people to use new solutions that are provided.

How can you find a real business problem to work on? Often
there is a critical business need where data is not accessible
or is only available to a few people with strong technical skills.
This can be a great place to start. If there is no single, overriding
concern, then more research is needed, typically through
requirements gathering.
Identify and prioritize business problems that need attention.
Have an analyst from IT collect requirements. Conduct interviews
or facilitated group sessions. (These should be business
discussions about the challenges facing the organization.) Flesh
out potential analyses and supporting data. Analyses that need
the same data can be grouped together into themes. Now, hold
a joint prioritization session to compare themes and evaluate
business impact and feasibility (complexity, effort, and cost).
Typically, there is a clear set of analyses and data that could
have significant business benefit and can be delivered in an
achievable project.


By jointly defining the scope of the project, you have a doable


project that is targeted to directly support specific business needs.

HAVING SOLUTION ENVY

BI solutions are everywhere. BI vendors tout their latest success stories. Take a look online; there are many impressive examples. You even see them on TV: financial services organizations offer analytics to individual investors to better manage their portfolios. It can feel like everyone has something better. However, it is a mistake to think you must have a _______ (fill in the blank: dashboard, colorful report, ad hoc interface) just because everyone else seems to have one.

It is easier than ever before to grab data from any source and present it in a visually pleasing manner. Resist the temptation to toss some data around just so you can have a state-of-the-art screen to brag about. Work together (IT staff and business users) to figure out what will help your organization, then invest the time and energy to find and prepare data that will feed a sustainable BI solution that aids the organization. Rather than feeling bad about seeing what others have, use this as inspiration to build solutions that meet the specific needs of your organization.

LACK OF ONGOING COMMUNICATION

BI projects are typically run by IT. Most teams attempt to gather some type of business requirements, then design and build a solution. If you don't have major progress to report, you may hesitate to convene a meeting. Sometimes you must call a meeting to ask for more time or money. In the meantime, the business community is left wondering what is being done and why it is taking so long. Business-driven BI requires more than gathering requirements at the beginning of a project; it means working together throughout the entire project.

A better approach is for one or two key representatives from the business to come on board as part of the core project team. These individuals work side by side with the project team and participate in weekly project status meetings. Project updates should also be provided to stakeholders on a regular basis, regardless of how much (or little) progress has been made. The message can simply be that the team is working hard and the project is on schedule. If there are concerns about meeting project deadlines, share them. Be honest. If there is anything that could make things better (access to a critical person, adding another ETL developer, a bigger server for development), speak up. You won't get any help unless you ask.

NOT INCLUDING BUSINESS USERS WHEN DATA MODELING

Over the years, most BI tools have been deemed too hard to use, yet the tools are embraced by business users within other organizations. How can the same tool be too hard, yet easy? It boils down to the data. When data is organized in a way that makes sense to the business, the BI environment is easier for users to use and queries run more efficiently. How can we make sure the data is organized intuitively? We need to have the business community work with us as we create the data models.

Human beings naturally think about data in a dimensional manner, which is one reason for the widespread adoption of dimensional modeling for analytics. Dimensional models are also well supported by technology, so we can deliver environments that perform well.

When we partner with the business to develop the dimensional


model, discussions must be conceptual rather than too technical
too soon. Have discussions about the attributes commonly used
as selection criteria for reports, drill paths, and what business
measures are frequently used together. Once you get the dialogue
rolling, youll need to continue collaborating as you document and
refine your model. Documentation of dimensional models is often
geared toward the IT members of the team. You must capture
these concepts in business terms. Once you have the business
perspective captured, the IT team can use it to design the physical
data structure that will best leverage the database technology and
support the business dimensional model.
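As a concrete (and entirely hypothetical) illustration of what such a dimensional model might look like once it is translated into a physical design, here is a minimal star schema sketched in Python with SQLite; the table and column names are invented for this example and are not from the article.

# Illustrative sketch only: a tiny retail star schema -- a fact table of measures
# surrounded by the dimensions business people use as selection criteria.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

-- The fact table holds the measures, keyed by the dimensions above.
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    units_sold INTEGER, sales_amount REAL
);
""")

con.execute("INSERT INTO dim_date VALUES (1, '2013-11-02', 'November', 2013)")
con.execute("INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware')")
con.execute("INSERT INTO dim_store VALUES (100, 'Main St', 'East')")
con.execute("INSERT INTO fact_sales VALUES (1, 10, 100, 5, 49.95)")

# A business question phrased in business terms: sales by region and month.
for row in con.execute("""
    SELECT s.region, d.month, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_store s ON f.store_key = s.store_key
    JOIN dim_date  d ON f.date_key  = d.date_key
    GROUP BY s.region, d.month
"""):
    print(row)

The point of the sketch is that the schema mirrors the business conversation (regions, months, products, measures), which is exactly what the joint modeling sessions described above are meant to capture.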

DESIGNING DASHBOARDS WITHOUT DATA

With good intentions, business people (often


without any IT or BI architects) gather in a
conference room to draft a dashboard design, and
theres plenty of great dialogue and discussion. This
is definitely business driven, which is what we want,
right? The resulting design, including key performance indicators,
is documented and presented to IT to build. Now, expectations
are set that the design will become reality soon.

The pitfall here is that the organization may not have the data
needed to actually build this solution. No matter how great the
project team and the technology, you will never meet these
expectations. Despite the best efforts of the team, the solution
will always fall short.
What would be better? Gather the same group and include
people who are knowledgeable about your systems, what data
is available, and any major deficiencies or problems with your
data. The design should be vetted against the data you actually
have so you can identify any gaps. Feasible BI and dashboard
projects can be defined. Business users and IT professionals
should work together to determine how to collect the rest of the
data that users need. Because business users are part of this
process, they will be aware of the data challenges, potential
solutions, and associated costs. IT and the business community
can work toward this vision. In the meantime, initial dashboards
can be deployed that support the business, with appropriate
expectations.


NOT PROVIDING EDUCATION


Business professionals have a working knowledge
of several business applications, such as Outlook,
Word, and Excel. You may think that as long as you
make the BI interface intuitive, training for your BI
solution is unnecessary, right?

Wrong.
People can easily open a dashboard or even answer prompts to
run a report. The real challenge is when you want to know more.
You need to educate the solutions consumers. Three different
types of training are recommended:
Training about the data: Users need to learn what data is
available and how it is organized. Understanding how the data
is organized will help business users navigate efficiently and
get what they need more directly.
Training in analysis and data use: The type of education
most critical for long-term adoption is teaching people how
data and analysis applies to their day-to-day work. Teach
people what to look for and what they should do next. Show
how using the BI environment can help them generate revenue
or increase customer satisfaction.

Intermediate and advanced tool training: Although many


information consumers wont need in-depth tool training,
some people will benefit from more technical training.
Provide the opportunity to learn more features and functions
of the technology, including techniques to perform more
sophisticated analytics.
Your enterprise must provide initial and ongoing training to
ensure initial and continued use of the environment as people
move in and out of the organization.

Laura Reeves, author of A Manager's Guide to Data Warehousing and co-author of the first edition of The Data Warehouse Lifecycle Toolkit, has 27 years of experience in end-to-end data warehouse development focused on creating comprehensive project plans, collecting business requirements, developing business dimensional models, designing databases (star/snowflake and entity-relationship designs), and developing enterprise data warehouse strategies and data architecture. As StarSoft Solutions' cofounder, Laura has implemented business intelligence/data warehouse solutions for many business functions for private and public industry.

This article is an excerpt.



Read the full issue (Premium Members)


Become a Premium Member


TDWI FlashPoint

Enabling an Agile Information Architecture
By William McKnight

Long gone are the days when enterprise information architectures were judged by their similarity to a generalized standard. Times were simpler when:
There was no big data of concern
Master data meant data warehouse dimensional data
The idea of a cross-system query was laughable
We tried to put all data in the data warehouse permanently
Information management meant data warehousing
The cloud was nonexistent
Syndicated data was not useful
Columnar orientation was only performed by the strange Sybase IQ
We secretly judged our success on the size of our data warehouse
Information management was separated from operations by an impenetrable brick wall
Time has seen these views fall by the wayside. Vendors
who still bring laminated architecture to the sales table
are quickly deemed out of touch. Today, the possibilities
for architecture are endless. There is no one size fits all
in terms of architecture from company to company; nor
is there a one-size-fits-all platform for data. It is likely
that at least five data structures will store 80 percent of a company's data, with a total of 15 or more capturing all of a company's data.
Information architecture is built from the ground up.
The key is the separation of workloads into the correct


platform, tied together by architecture elements. These


elements include data integration, data virtualization, and
master data management, as well as the softer forces of
data governance and program governance.
Today, these elements need to be in place to enable
agility in the architecture. Without them, inefficiencies
and suboptimal platform selection will result. The latter
is especially harmful because workload performance
will be harmed, and good performance translates to a
projects success.
Inefficiencies can also drag down a project's success and, consequently, the enterprise's success. Mastering data for an application, for example, is not really a choice. All applications need good enough data to pass muster. The question is where they are going to get it: from another one-off, grow-your-own effort, or via an API from an enterprise-adjudicated source?
Data virtualization helps fill in the cracks of information architecture, allowing for the one-off query for data that was placed across the enterprise. There may also be built-in data virtualization, when the performance price is acceptable, for data that is otherwise best placed in a heterogeneous structure. It's a judgment call, which is perhaps the best touchstone for the agile information architecture.
Agility is fused into architectures when it leverages what is in
place for new requirements. It may extend an existing data
store or two with more information for the requirement, or it
may flow data from a source into a new data store.
Take, for example, the possibilities for post-operational
analytic data stores: relational row-based data warehouses,
data warehouse appliances, data appliances such as
HANA that also serve as operational stores, columnar
databases, and Hadoop. Combinations of these include

hybrid relational/columnar DBMSes, appliances that are


columnar, and so on.
There are also cubes, relational marts, and others. Making
correct platform decisions will comprise half the success.
The other half is fitting the decision into the architecture
correctly, leveraging what is there, and quite possibly
receiving feeds to systems designed to feed quality data.
Both platform (or re-platform) selection and architecture
fit are necessary.
Any conversation about how information architecture
should look, void of requirements, is an academic
exercise. In the real world, the new requirement must
fit into the existing architecture, reusing and extending
it appropriately. No application takes the form of the
sacrificial lamb, taking one for the enterprise so that the
long-term architecture can be formed. Likewise, I do
five-year plans to yield a true north for the planners to
keep an eye on, but always stress how it will change as the
business changes. Architectures must ultimately be agile.
Every enterprise must have, or rent, knowledge of all the
relevant information management possibilities in order to
fully activate data and the workloads. It must then make
great selections and it must support the architecture with
cross-enterprise elements. This is the essence of the agile
information architectures that succeed today in support
of the modern gold that is information.
William McKnight is a consultant, speaker, and author in
information management. His company, McKnight Consulting
Group, has attracted such Global 2000 clients as Fidelity
Investments, Pfizer, and Verizon. William is a popular speaker
worldwide and a prolific writer. He provides clients with
action plans, architectures, strategies, complete programs, and
vendor-neutral tool selection to manage information. He can
be contacted at www.mcknightcg.com or 214.514.1444.

This article appeared in the May 2, 2013 issue.


Read more issues (Premium Members)
Become a Premium Member


TDWI FlashPoint

TDWI Salary Survey: Average Wages Rise a Modest 2.3 Percent in 2012
By Mark Hammond

Wondering how your wages, bonuses, job satisfaction, and


responsibilities compare to your peers? TDWI breaks it
down in the 2013 TDWI Salary, Roles, and Responsibilities
Report.
Average salaries for full-time permanent BI/DW employees
rose a modest, if unspectacular, 2.3 percent in 2012, to a
record high of $106,818. These findings, from TDWI's
survey of 885 BI/DW professionals in the U.S. and Canada
in fall 2012, are in line with the IT industry at large, with
both InformationWeek and Computerworld reporting similar
increases in their most recent salary surveys.
At the same time, average wages for BI/DW freelancers and
contractors (4 percent of our respondent pool) suffered a
precipitous 16 percent decline, shrinking to about $121,000.
The median wage across both permanent employees and
contractors was up 2.9 percent in 2012, to $105,000.
Average bonuses, meanwhile, dipped 13 percent to $14,252,
though that decrease comes after a stunning 51 percent
gain in 2011. Compared to 2010, bonuses in 2012 were
up a healthy 31 percent, and nearly two-thirds of BI/DW
professionals received some type of bonus, the same as 2011.
Available exclusively to TDWI Premium Members, the
2013 TDWI Salary, Roles, and Responsibilities Report


provides a valuable guide for both employees and managers


to assess compensation and bonuses by key roles, from the
highest paid (BI directors at an average salary of $136,187
in 2012) to the lowest (business requirements analysts at
$86,786).
The report slices and dices compensation data by a number
of dimensions, such as gender, organizational BI maturity,
organizational revenues, years of experience, certifications,
and industry. It also examines job satisfaction, top
considerations for a new job, and respondents' sense of job
security and fair compensation. Highlights include:
Gender. Women continue to lag far behind men in average
salary ($97,037 versus $111,225, a $14,188 gap) and bonuses
($9,566 versus $15,770). Moreover, fewer women than men
received bonuses (61 percent versus 67 percent).
BI maturity. Organizations with advanced BI
environments pay roughly $15,000 more in salary than
beginner BI organizations ($113,921 versus $98,804), as they tend to invest in seasoned, multi-skilled BI/DW
professionals with a proven track record of driving business
value from BI.
Company size. The largest organizations (those with
more than $50 billion in revenue) offer the largest
salaries: $120,820 on average, or nearly $22,000 more
than organizations with revenues between $500 million and
$1 billion.
Region. Geographic location is a large factor in
compensation. BI/DW professionals in the Mid-Atlantic
states command an average salary of $135,568, or about
$43,000 more than those in the Central Plains ($92,446).
Canadians earn less still, with an $88,251 average.

Certifications. IT-related certifications such as TDWIs


Certified Business Intelligence Professional (CBIP) are a
surefire way to increase earning power. BI/DW professionals
holding a TDWI CBIP certification command an average
salary of $113,501, which is $8,200 greater than the average for
non-certified professionals.
Job and compensation satisfaction. Forty-nine percent of
respondents ranked their job satisfaction as high or very
high, up notably from a record low of 42 percent during
the depths of the 2010 recession. Similarly, 48 percent
believe they are fairly compensated.
Secondary roles. BI/DW professionals are gradually taking
on a greater number of secondary roles. The average of
3.52 secondary roles is the highest in this survey series,
dating to 2003. This reflects an organizational focus on
cost-effectiveness through multi-skilled individuals, as well
as employee appetite for a challenge: "challenging work" and "chance to develop new skills" were among the top
considerations for a new job.
The salary report also takes an in-depth look at 10 key
roles, with breakdowns by salaries and bonuses, average age,
years of experience, certifications, professional background,
and more. If you are a TDWI Premium Member, you can
download a copy of the 2013 TDWI Salary, Roles, and
Responsibilities Report now.
Mark Hammond is a veteran contributor to TDWI,
including a number of research reports, Business Intelligence
Journal, and What Works.

This article appeared in the June 6, 2013 issue.


Read more issues (Premium Members)
Become a Premium Member


Business Intelligence Journal

The Database Emperor Has No Clothes
Hadoop's Inherent Advantages over RDBMS in the Big Data Era
By David Teplow

Background
Relational database management systems (RDBMS)
were specified by IBM's E.F. Codd in 1970, and first
commercialized by Oracle Corporation (then Relational
Software, Inc.) in 1979. Since that time, practically
every database has been built using an RDBMSeither
proprietary (Oracle, SQL Server, DB2, and so on) or
open source (MySQL, PostgreSQL). This was entirely
appropriate for transactional systems that dealt with
structured data and benefitted when that data was
normalized.
In the late 1980s, we began building decision support
systems (DSS)also referred to as business intelligence
(BI), data warehousing (DW), and analytics systems. We
used RDBMS for these, too, because it was the de facto
standard and essentially the only choice. To meet the
performance requirements of DSS, we denormalized the
data to eliminate the need for most table joins, which are
costly from a resource and time perspective. We accepted
this adaptation (some would say "misuse") of the relational model because there were no other options until
recently.
Relational databases are even less suitable for handling
so-called big data. Transactional systems were designed
for just that: transactions, data about a point in time
when a purchase occurred or an event happened. Big data
is largely a result of the electronic records we now have


about the activity that precedes and follows a purchase or


event. This data includes the path taken to a purchase, either physical (surveillance video, location service, or GPS device) or virtual (server log files or clickstream data). It also includes data on where customers may have veered away from a purchase (product review article or comment, shopping cart removal or abandonment, jumping to a competitor's site), and it certainly includes data
about what customers say or do as a result of purchases
or events via tweets, likes, blogs, reviews, customer
service calls, and product returns. All this data dwarfs
transactional data in terms of volume, and it usually does
not lend itself to the structure of tables and fields.

The Problems with RDBMS


To meet the response-time demands of DSS, we
pre-joined and pre-aggregated data into star schemas
or snowflake schemas (dimensional models) instead of
storing data in third normal form (relational models).
This implied that we already knew what questions we
would need to answer, so we could create the appropriate
dimensions by which to measure facts. In the real world,
however, the most useful data warehouses and data marts
are built iteratively. Over time, we realize that additional
data elements or whole new dimensions are needed or
that the wrong definition or formula was used to derive
a calculated field value. These iterations entail changes to
the target schema along with careful and often significant
changes to the extract-transform-load (ETL) process.


The benefit of denormalizing data in a data warehouse is


that it largely avoids the need for joining tables, which
are usually quite large and require an inordinate amount
of machine resources and time to join. The risk associated
with denormalization is that it makes the data susceptible
to update anomalies if field values change.
For example, suppose the price of a certain item changes
on a certain date. In our transactional system, we would
simply update the Price field in the Item table or age
out the prior price by updating the effective date and
adding a new row to the table with the new price and
effective dates. In our data warehouse, however, the price
would most likely be contained within our fact table and
replicated for each occurrence of the item.
Anomalies can be introduced by an update statement
that misses some occurrences of the old price or catches
some it shouldnt have. Anomalies might also result from
an incremental data load that runs over the weekend
and selects the new price for every item purchased in
the preceding week when, in fact, the price change was
effective on Wednesday (which may have been the first of
the month) and should not have been applied to earlier
purchases.
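To make the anomaly concrete, here is a small, hypothetical sketch in Python using SQLite; the table, item, and dates are invented for illustration and are not from the article.

# Illustrative sketch only: when price is replicated into a denormalized fact
# table, a careless UPDATE can rewrite history for purchases made before the
# price change took effect.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (item TEXT, sale_date TEXT, price REAL)")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("widget", "2013-10-28", 9.99),   # sold before the price change
    ("widget", "2013-11-01", 9.99),   # price change effective Nov 1
    ("widget", "2013-11-02", 9.99),
])

# Incorrect incremental load: applies the new price to every row for the item,
# including purchases that predate the effective date.
con.execute("UPDATE fact_sales SET price = 10.99 WHERE item = 'widget'")

# The Oct 28 sale now shows the new price -- an update anomaly.
print(con.execute("SELECT * FROM fact_sales").fetchall())

# A safer statement scopes the update to the effective date, for example:
# UPDATE fact_sales SET price = 10.99
# WHERE item = 'widget' AND sale_date >= '2013-11-01'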
With any RDBMS, the schema must be defined and
created in advance, which means that before we can load
our data into the data warehouse or data mart, it must
be transformed (the dreaded T in ETL). Transformation processes tend to be complex, as they involve some combination of deduplicating, denormalizing, translating, homogenizing, and aggregating data, as well as maintaining metadata (that is, data about the data, such as definitions, sources, lineage, derivations, and so on). Typically, they also entail the creation of an additional, intermediary database, commonly referred to as a staging area or an operational data store (ODS). This
additional database comes with the extra costs of another
license and database administrator (DBA). This is also
true for any data marts that are built, which is often done
for each functional area or department of a company.
Each step in the ETL process involves not only effort,
expense, and risk, but also requires time to execute
(not to mention the time required to design, code, test,
maintain, and document the process). Decision support
systems are increasingly being called on to support
real-time operations such as call centers, military intelligence, recommendation engines, and personalization
of advertisements or offers. When update cycles must
execute more frequently and complete more rapidly, a complex, multi-step ETL process simply will not keep up when high volumes of data arriving at high velocity must be captured and consumed.
Big data is commonly characterized as having high levels
of volume, velocity, and variety. Volume has always been
a factor in BI/DW, as discussed earlier. The velocity
of big data is high because it flows from the so-called
Internet of Things, which is always on and includes not
just social media and mobile devices but also RFID tags,
Web logs, sensor networks, on-board computers, and
more. To make sense of the steady stream of data that
these devices emit requires a DSS that, likewise, is always
on. Unfortunately, high availability is not standard with
RDBMS, although each brand offers options that provide
fault resilience or even fault tolerance. These options are
neither inexpensive to license nor easy to understand
and implement. To ensure that Oracle is always available
requires RAC (Real Application Clusters for server
failover) and/or Data Guard (for data replication). RAC
will add over 48 percent to the cost of your Oracle
license; Data Guard, over 21 percent.[1]

Furthermore, to install and configure RAC or Data
Guard properly is not simple or intuitive, but instead
requires specialized expertise about Oracle as well as your
operating system. We were willing to pay this price for
transactional systems because our businesses depended
on them to operate. When the DSS was considered a
downstream system, we didn't necessarily need it to
be available all the time. For many businesses today,
however, decision support is a mainstream system that is
needed 24/7.
Variety is perhaps the biggest big data challenge and
the primary reason it's poorly suited for an RDBMS. The
many formats of big data can be broadly categorized as
structured, semi-structured, or unstructured. Most data
about a product return and some data about a customer
service call could be considered structured and is readily
stored in a relational table. For the most part, however,
big data is semi-structured (such as server log files or
likes on a Facebook page) or completely unstructured
(such as surveillance video or product-related articles,

[1] Based on the Oracle Technology Global Price List dated July 19, 2012.


reviews, comments, or tweets). These data types do not


fit neatly, if at all, into tables made up of fields that
are rigidly typed (for example, six-digit integer, floating
point number, fixed- or variable-length character string
of exactly X or no more than Y characters, and so on)
and often come with constraints (for example, range
checks or foreign key lookups).
Like high availability, high performance is an option
for an RDBMS, and vendors have attempted to address
this with features that enable partitioning, caching, and
parallelization. To take advantage of these features, we
have to license these software options and also purchase
high-end (that is, expensive) hardware to run it on, full
of disks, controllers, memory, and CPUs. We then
have to configure the database and the application to
take advantage of components such as data partitions,
memory caches and/or parallel loads, parallel
joins/selects, and parallel updates.
A New Approach
In December of 2004, Google published a paper on
MapReduce, which was a method it devised to store
data across hundreds or even thousands of servers, then
use the power of each of those servers as worker nodes
to map its own local data and pass along the results
to a master node that would reduce the result sets to
formulate an answer to the question or problem posed.
This allowed a Google-like problem (such as which
servers across the entire Internet have content related to
a particular subject and which of those are visited most
often) to be answered in near real time using a divide-and-conquer approach that is both massively parallel and
infinitely scalable.
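As a rough illustration of the divide-and-conquer idea (a toy, single-machine sketch in Python, not Hadoop or Google code; the sample text chunks are invented):

# Toy sketch only: each "worker" maps over its own chunk of data, and a
# "master" reduces the partial results into one answer.
from collections import Counter
from functools import reduce

chunks = [                              # stand-ins for data blocks on different nodes
    "big data big answers",
    "big clusters of commodity servers",
    "map local data then reduce the results",
]

def map_chunk(text):
    """Map phase: each worker counts words in its own local chunk."""
    return Counter(text.split())

def reduce_counts(total, partial):
    """Reduce phase: the master merges partial counts into a final result."""
    total.update(partial)
    return total

partials = [map_chunk(c) for c in chunks]          # runs in parallel in a real cluster
word_counts = reduce(reduce_counts, partials, Counter())
print(word_counts.most_common(3))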
Yahoo! used this MapReduce framework with its distributed file system (which grew to nearly 50,000 servers)
to handle Internet searches and the required indexing
of millions of websites and billions of associated documents. Doug Cutting, who led these efforts at Yahoo!,
contributed this work to the open source community by
creating the Apache Hadoop project, which he named
for his sons toy elephant. Hadoop has been used by
Google and Yahoo! as well as Facebook to process over
300 petabytes of data. In recent years, Hadoop has been
embraced by more and more companies for the analysis
of more massive and more diverse data sets.
Data is stored in the Hadoop Distributed File System
(HDFS) in its raw form. There is no need to normalize
(or denormalize) the data, nor to transform it to fit

a fixed schema, as there is with RDBMS. Hadoop


requires no data schemaand no index schema. There
is no need to create indexes, which often have to be
dropped and then recreated after data loads in order to
accelerate performance. The common but cumbersome
practice of breaking large fact tables into data partitions
is also unnecessary in Hadoop because HDFS does
that by default. All of your data can be readily stored in
Hadoop regardless of its volume (inexpensive, commodity disk drives are the norm), velocity (there is no
transformation process to slow things down), or variety
(there is no schema to conform to).
As for availability and performance, Hadoop was
designed from the beginning to be fault tolerant and
massively parallel. By default, data is replicated on three
separate servers, and if a node is unavailable or merely
slow, one of the other nodes takes over processing that
data set. Servers that recover or new servers that are
added are automatically registered with the system and
immediately leveraged for storage and processing. High
availability and high performance are baked in without
the need for any additional work, optional software, or
high-end hardware.
Although getting data into Hadoop is remarkably
straightforward, getting it out is not as simple as with
RDBMS. Data in Hadoop is accessed by MapReduce
routines that can be written in Java, Python, or Ruby,
for example. This requires significantly more work than
writing a SQL query. A scripting language called Pig,
which is part of the Apache Hadoop project, can be
used to eliminate some of the complexity of a programming language such as Java. However, even Pig is not as
easy to learn and use as SQL.
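To give a feel for that gap in effort, here is a hypothetical pair of Python scripts in the style used with Hadoop Streaming, counting requests per page from a simple, invented log format (one line per request: timestamp, user, page). The equivalent question in an RDBMS would be a single SQL statement with a GROUP BY.

```python
# mapper.py -- reads raw log lines from stdin and emits (page, 1) pairs
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) == 3:              # assumed format: timestamp user page
        page = fields[2]
        print(f"{page}\t1")
```

```python
# reducer.py -- input arrives grouped by key; sum the counts for each page
import sys

current_page, total = None, 0
for line in sys.stdin:
    page, count = line.rstrip("\n").split("\t")
    if page != current_page and current_page is not None:
        print(f"{current_page}\t{total}")
        total = 0
    current_page = page
    total += int(count)
if current_page is not None:
    print(f"{current_page}\t{total}")
```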
Hive is another tool within the Apache Hadoop project
that allows developers to build a metadata layer on
top of Hadoop (called HCatalog) and then access
data using a SQL-like interface (called HiveQL). In
addition to these open source tools, several commercial
products can simplify data access in Hadoop. I expect
many more products to come from both the open
source and commercial worlds to ease or eliminate the
complexity inherent in MapReduce, which is currently
the biggest inhibitor to Hadoop adoption. One that
bears watching is a tool called Impala, which is
being developed by Cloudera and allows you to run
SQL queries against Hadoop in real time. Unlike Pig
and Hive, which must be compiled into MapReduce
routines and then run in batch mode, Impala runs

interactively and directly with the data in Hadoop so that
query results begin to return immediately.
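As a rough illustration of what the SQL-like route looks like from a developer's point of view, here is a hypothetical Python snippet that issues a HiveQL query through the open source PyHive client. The host name, table, and columns are invented, and the choice of client library is an assumption for the example, not something prescribed by the article.

```python
from pyhive import hive  # assumes the PyHive package is installed

# Connect to a HiveServer2 endpoint (hypothetical host and port).
conn = hive.connect(host="hadoop-edge.example.com", port=10000)
cursor = conn.cursor()

# HiveQL looks like SQL; behind the scenes it is compiled into MapReduce jobs.
cursor.execute(
    "SELECT product_id, COUNT(*) AS views "
    "FROM web_logs WHERE action = 'view' "
    "GROUP BY product_id ORDER BY views DESC LIMIT 10"
)
for product_id, views in cursor.fetchall():
    print(product_id, views)
```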
Summary
Relational databases have been around for more than 30
years and have proven to be a far better way to process
data than their predecessors. They are especially well
suited for transactional systems, which quickly and
rightfully made them a standard for the type of data
processing that was typical in the 1980s and 1990s. We
soon found ways to adapt RDBMS for decision support
systems, which weve been building for about the past
20 years. However, these adaptations were unnatural in
terms of the relational model, and inefficient in terms
of the data staging and transformation processes they
created. We tolerated this because it achieved acceptable
results, for the most part. Besides, what other option
did we have?

When companies such as Google, Yahoo!, and Facebook
found that relational databases were simply unable to
handle the massive volumes of data they have to deal
with, and necessity being the mother of invention, a
new and better way to process data for decision support
was developed. In this age of big data, more companies
must now deal with data that not only comes in much
higher volumes, but also at much faster velocity and in
much greater variety.
Relational databases are no longer the only game in town,
and for decision support systems, they are no longer the
best available option.
David Teplow was an early adopter of Relational DBMS
(Oracle v2) and is currently a managing partner at Sierra
Technology.

This article appeared in Volume 18, Number 1.


Business Intelligence Journal

Dynamic Pricing: The Future of Customer-Centric Retail
By Troy Hiltbrand

In November 2012, the state of Virginia brought the concept
of dynamic pricing into the spotlight with the launch of the
I-495 HOT (high-occupancy toll) lanes. This 14-mile stretch
of four-lane toll road, which is part of the Washington, DC
beltway, runs parallel to its toll-free counterpart. Commuters
often characterize the beltway as a combination of continuous delays and frustrated motorists.
What makes this toll road different is its use of big data and
advanced analytics to monitor and manage traffic flow based
on commuters' value judgment. The goal of the roadway is
to ensure traffic does not fall below 45 miles per hour, while
also maximizing revenue levels to continue funding upkeep
and maintenance. As traffic increases and the road becomes
more congested (and the speed slows), the price increases to
deter additional motorists from traversing it. As the speed
increases and the flow improves, the price drops to encourage
more price-sensitive motorists to divert their route and take
advantage of an underutilized resource.
Although some commuters are confused by the complexity
associated with the pricing system, others feel empowered to
decide whether the price-to-value ratio is sufficient to justify
leaving the congested public roadway for an easier commute.
The HOT lanes are not the only example of dynamic pricing
in the market, but their opening has brought the subject
into the public view in a whole new way. As the market
becomes more congested with competition, companies will

look to this and other examples to identify how they can
implement dynamic pricing to increase their market share
and maximize profits.
To implement dynamic pricing, enterprises must leverage
modern technological platforms that enable large-scale,
advanced analytics, and understand and address the behavioral science facets of the challenge. As with any successful
business analytics project, the first step is to understand
the business objectives of the initiative and determine
what impact the technology implementation will have on
achieving success. This step is independent of the technology
implementation and must be done first, or the project has a
significantly higher likelihood of failure.
Behavior
At its core, dynamic pricing is a problem of customer
behavior. If the price of a good or service changes, will the
consumer react in a desired manner? When profits are on the
line, dynamic pricing is not only about dynamic discounting, but also emphasizes the escalation of prices to meet
environmental factors.
Economists have historically looked at supply and demand
as a set of curves. As the price increases, consumers
will demand progressively less of a good or service and
companies are willing to sell progressively more of a good
or service. As price decreases, consumers desire to purchase
more and companies desire to sell less. By default, market
forces will adjust the price to a position of equilibrium where
the supply-and-demand curves intersect. These supply-and-demand curves take into consideration the entire market,
which includes the sum total of behaviors of the individuals.
Dynamic pricing assesses the demand and the supply curves
for a specific instance to find the point of equilibrium at a
more personal or situational level. It considers situational
facets of both supply and demand to dynamically establish
price equilibrium.
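As a toy illustration of the idea, the following sketch searches a grid of candidate prices for the point where hypothetical demand and supply curves come closest to balancing. Both curves and all the numbers are invented for the example; a real implementation would estimate them from data rather than hard-code them.

```python
# Hypothetical linear curves: demand falls and supply rises as price increases.
def demand(price):
    return max(0.0, 1000 - 12 * price)   # units consumers want at this price

def supply(price):
    return 20 * price                     # units the seller is willing to offer

# Scan candidate prices and keep the one where the two curves are closest.
candidate_prices = [p / 10 for p in range(1, 1001)]   # $0.10 to $100.00
equilibrium = min(candidate_prices, key=lambda p: abs(demand(p) - supply(p)))
print(f"approximate equilibrium price: ${equilibrium:.2f}")
```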
With I-495, the speed and traffic flow of the high-occupancy
lanes are major determining factors of the quantity the
company wants to sell. From these factors, the company
adjusts the price accordingly. During a period of heavy
traffic, an individuals demand curve shifts to represent a
willingness to pay a higher price for a service because of its
convenience. As this happens, the company determines how
much it is willing to supply to maintain its promised levels
of service, and the price moves to a new point of equilibrium
where the objectives of both the consumer and supplier
are met. This does not mean that everyone who wants to
travel these lanes can do so. Under these circumstances, the
individual demand curve of many consumers still puts the
relative value of the convenience below the equilibrium
price point.
In specific sectors, there are many highly mature examples
of dynamic pricing. In the financial industry, securities
prices are constantly moving toward a point of supply-and-demand equilibrium. The market price is representative of
the sum of environmental factors. In the past, the financial
market was based on interactions of humans negotiating
the price of a stock.
Because of the latency associated with communication,
prices of stocks in the mid-twentieth century adjusted
gradually over time. With the advent of new technology
came an era of high-frequency trading (HFT). This
landscape of algorithmic trading shortened the frequency of
price equilibrium from years to microseconds. Today, prices
move so rapidly, and with the support of huge arrays of
computers, that companies strive to eke out microseconds
of incremental speed to gain a competitive advantage.
You can find examples of dynamic pricing in retail as well,
such as online auctions on eBay. These auctions represent
individuals coming together to interactively determine the
point of equilibrium between supply and demand for a
product with respect to a specific supplier and consumer.
Again, this price equilibrium is determined by negotiations
of individuals.
Many companies would like to be able to maximize
their profit margin by dynamically adjusting their price
structure, but have been limited by implementation issues.
They seek to attract the widest possible audience to their
product line, but do not want to completely transition into
the auction model of name-your-own-price, which can
become difficult to manage and keep stable in financial and
manufacturing operations. This is where a new paradigm
comes into play, which involves utilizing analysis of historical data to dynamically anticipate price equilibrium, then
presenting this pricing structure to the consumer.
As this new market paradigm becomes a reality, there are
many advantages for enterprises that can master it and
take advantage of its fluidity. Enterprises must ensure
that implementation of dynamic pricing does not sour the
delicate customer relationship that has been developed over
time. These challenges include customer perception, data
accuracy, algorithm behavior, altered customer behavior,
and overall customer experience.
Customer Perception
In the realm of dynamic pricing, customer perception is
critical. Customer loyalty and retention are key factors to

business success. Business practices that tarnish customer
perception of fairness can quickly undermine these
objectives.
With dynamic pricing, enterprises can take several
approaches to achieve the desired result. The first is similar
to the dynamic pricing of the HOT lanes. The prices
change dynamically for everyone based on the current
environmental factors. These prices are publicly displayed
and individuals can decide if the price justifies the expense.
In this case, the service provider adjusts the supply curve
dynamically and the equilibrium point dictates which
commuters will take advantage of the new price. Under
this scenario, the price represents a snapshot of the current market conditions. Consumers are presented with a
consistent picture and have the choice of whether the value
is sufficient to meet their demand.

Another option is to make small, subtle changes to the price
dynamically to respond to environmental or behavioral
factors. These factors include past customer behavior, current
browsing patterns, current inventory levels, and external
market conditions. Many online retailers display their prices
as a discount percentage off the suggested retail price. Subtle
price changes are represented as changes to the discount
percentage and not to the retail price itself. This has a different psychological impact on customer perception compared
to constantly changing the base price of a product or service.
Amazon.com has employed this method successfully,
but it is not free of controversy. Amazon's pricing
practices, which cause variation in the discount of certain
products such as DVDs, have been spotlighted by the
press. Customers have raised concerns over whether its
practices represent price discrimination. A 2005 study by
the University of Pennsylvania found that 87 percent of
participants were discomforted by the idea that an online
retailer would charge different prices for the same product
at the same time (Turow, Feldman, and Meltzer).
Another option is to establish predefined price lists for
different types of customers and dynamically manage membership within the group, effectively matching a set of prices

to the individual consumer based on past behavior. For
example, an enterprise could offer a high-value customer
group better pricing than available to other customers and
dynamically place customers in this category (or remove
them) based on their behavior. With technology, movements between pricing groups can be more frequent and
targeted than in the past.
Another area where this membership-based pricing could
have a significant impact is attrition management. As
customers begin to demonstrate behaviors indicative of
attrition, new pricing models could be deployed to solidify
the relationship. Once the behaviors revert and the relationship is reestablished, the customer would be returned to a
more traditional pricing structure. This would allow the
company to be more responsive to customers on the edge of
departure and focus on retention efforts early in the attrition process, rather than trying to resurrect the relationship
after the fact or in the late stages of the process, when it is
more difficult.
A 2004 article in the Journal of Interactive Marketing
revealed that consumers report lower levels of trust, price
fairness, and repurchase intentions when Internet-enabled
buyer identification techniques are used to segment consumer markets. The authors of the article also found that
fairness and trust considerations may be greatly amplified
only for products and services that consumers perceive to
be the same and for which the firms communication is
unclear (Grewal, Hardesty, and Iyer, 2004). This indicates
that the more public the company is in deploying these
methods and justifying them appropriately to consumers,
the greater the acceptance of the practice.
To mitigate the risks associated with customer perception,
companies can start small, experimenting with a limited
selection of products of low criticality where they can assess
their customer bases readiness to this paradigm shift. If the
customer base does not express concern that the relationship has been tarnished, the company can determine
whether to expand the selection of products under this
model to a greater portion of their product catalog or move
this model to a product set of higher criticality. Products
of high criticality could include those with higher profit
margins or excess inventory that have a greater impact on
the company's overall profitability.
Accomplishing dynamic pricing takes a level of customer
intimacy not previously available due to the lack of data
and analytics to evaluate it. This is changing with the
inception of big data and predictive analytics.

Data Accuracy
Another key aspect of the challenges of implementing
dynamic pricing is the accuracy of the source data. The
reasonableness of a dynamic pricing schema depends on
whether the factors used to accurately determine the price
reflect either the market environment or customer behavior.
These predictive algorithms are not inherently intelligent;
they are a product of the source training data and the
patterns extrapolated from it.
When the quality of the training data used to develop the
models or of the data used in the prediction is suspect, the
results will be suspect. Unlike with human intervention,
the algorithms do not have a gut-feel factor. They develop
patterns based on the training data and apply those patterns
to attributes representing point-in-time scenarios to estimate
a prediction. To mitigate the risk of poor data quality, the
inputs and outputs must be inspected to ensure pricing
reasonableness. Predefined thresholds or statistical boundaries can be applied to assess the reasonable nature of data
quality. In addition, the algorithms must be tracked over
time to ensure the historical basis for the algorithm is still
applicable in the current environment.
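One simple way to put such guardrails in place is to reject or flag any model-generated price that falls outside predefined or statistically derived bounds before it ever reaches a customer. The sketch below is illustrative only; the floor, ceiling, and z-score threshold are invented numbers, not recommendations.

```python
import statistics

def sanity_check_price(proposed, recent_prices, floor=5.0, ceiling=500.0, max_z=3.0):
    """Return True if a proposed price passes basic reasonableness checks."""
    # Hard business limits (hypothetical values).
    if not (floor <= proposed <= ceiling):
        return False
    # Statistical boundary: flag prices far outside recent history.
    if len(recent_prices) >= 2:
        mean = statistics.mean(recent_prices)
        stdev = statistics.stdev(recent_prices)
        if stdev > 0 and abs(proposed - mean) / stdev > max_z:
            return False
    return True

history = [19.99, 21.50, 20.75, 22.00, 18.99]
print(sanity_check_price(21.25, history))        # True: within bounds
print(sanity_check_price(2_000_000.0, history))  # False: caught before publication
```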
Algorithm Mishaps
Computers can make decisions more rapidly than individuals and implement actions related to those decisions
nearly instantaneously. As enterprises employ algorithms
to adjust prices based on market factors, these decisions
can have significant consequences, both good and bad, in
near real time.
No market better represents the potential impact of
algorithm mishaps than the financial market. Over the past
decade, more Wall Street trading is performed by computer
programs. Some estimate two-thirds or more of trades today
are generated without human intervention and are executed
at lightning speed. This was evident on May 6, 2010, in
what is known as the Flash Crash. On that day, the market
experienced its largest single intraday drop in history. From
its intraday high, the market fluctuated by 1,010 points. The
reason for the wild ride was computers reacting to an erroneous trade that sent the market into a tailspin. Fortunately,
the market corrected itself and recovered most of the loss
by the end of the day. It was an ominous sign of the risks
associated with so much power in the hands of computers
without the luxury of human intuition.
The retail industry does not move at the same speed as the
financial market, but evidence of algorithm mishaps is just
as public for companies that implement intelligent dynamic
pricing. On Amazon in 2011, Peter Lawrence's The Making
of a Fly skyrocketed in price to more than $23 million due to irregularities in the behavior-tracking algorithms. The impact was
negative press for Amazon and its implementation of
dynamic pricing.
Altered Customer Behavior
Newton's third law of motion says that for every action
there is an equal and opposite reaction. The same is often
true for human behavior. As companies employ dynamic
pricing, consumers will react and adapt to meet these
changes. Some customers, sensing a change to their
established sense of trust and fairness, will
shop elsewhere for the products and services they desire,
effectively eliminating the positive effect of the change.
Other consumers will face the change head-on and look for
ways to game the system, altering their behavior to increase
their chances for better prices.

As Amazon commenced using its third-party sales markets
as a basis for adjusting its retail discount price, users
recognized this behavior and started posting nonexistent
products to the marketplace to drive the price down.
They purchased the product at a lower price and then
revoked their marketplace listing, effectively coercing the
market to be favorable to their cause. In an era of global
connectedness and online community supported by social
media, consumers have a new collaboration venue that
expedites their reaction to changes brought on by dynamic
pricing. As quickly as companies implement mechanisms to
optimize pricing, consumers will find ways to exploit them
to their own advantage.
Overall Customer Experience
Enterprises understand that the sales cycle is not complete
once the purchase is made. The relationship between
the provider and the consumer is complex. Dynamic
pricing affects the purchase phase of the sales life cycle
and impacts other areas, such as fulfillment and delivery,
service, returns, and customer retention. The purpose
of dynamic pricing is to change the nature of consumer
behavior to match more favorably with the companys
supply. Companies need to know whether they will be
able to respond appropriately when dynamic pricing drives
increased customer demand. They also need to be able to

gauge whether that increased customer demand represents
long-term stability or a short-term, unsustainable spike.
One area of particular interest in the overall experience is
customer service. With pricing determined dynamically
based on situational factors, the customer service aspect of
the sales cycle must adapt as well. When consumers engage
customer service about pricing, systems have to be sufficiently adept to provide corresponding pricing dynamism
to the customer service representative just as it is provided to
the customer.
In addition, price-matching policies and procedures must
be clearly defined and executed to ensure that customers
continue to feel they are being treated fairly and with
respect. If these issues are not addressed early and clearly,
customer relationships will be strained and dynamic pricing
objectives will fail.

Orbitz, an online travel site, looked at the overall experience
and executed a different strategy to drive customer demand
without altering prices (Mattioli, 2012). Their approach uses
behavior to alter the way search results are sorted. Orbitz
believed that customers would be more amenable to this
practice because it held constant the underlying sense of
trust among its customer base. It gives Orbitz the ability to
alter user behavior.
This is especially applicable to the casual shopper in the
market for travel. These shoppers are more likely to choose
a product that shows up within the first few results than
one that is a page farther down. Orbitz looks at factors
such as sites the user visited prior to coming to their site,
the location of the customer (based on IP address), past
behavior for registered users, and the deals that hotels are
offering to determine which items are displayed at the top of
a consumers list.

Whether a company can employ dynamic pricing with its
consumer base and in its market is complex and requires a
level of customer intimacy beyond what has traditionally
been understood. The enterprise must accept the risks and
identify mitigations to ameliorate the potential impact
to achieving their goals. Once a company understands
its business objectives, it must look at support from
its technological infrastructure, which has matured
significantly in recent years but still requires a high level of
expertise to employ.
Tools for Developing a Predictive Analytic Model
Big Data
After an enterprise has defined its business objectives, the
next stage of a dynamic pricing project is the gathering
and preparing of the data used to develop the predictive
analytic model. The industry as a whole has recognized
big datas importance to business. Big data is somewhat
of a misnomer because it represents much more than large
amounts of data. The difference between big data and
normal data falls into three categories, the famous three
Vs: volume, velocity, and variety.
When people think of big data, the first thing that comes
to mind is volume. New sources are generating significant
volumes of data. In the case of the HOT lanes, sensors
have been installed along the path to measure speed and
traffic throughput. These measurements are transformed
into signals processed by the algorithm that determines the
optimal price to drive behavior.
In e-business, customer behavior is often derived through
tracking clicks, views, and impressions. In a single
customer interaction, many individual events are executed
and tracked, which generates a significant amount of data
to analyze for behavior patterns. Some of this behavior
data is structured (such as the actual purchase history), but
much of it is wrapped up in large volumes of Web logs that
are often referred to as semi-structured. These logs must
be processed and the attributes extracted to leverage the
information buried within them.
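As a small illustration of turning semi-structured Web log lines into analyzable attributes, the sketch below parses a made-up log format with a regular expression. The fields and the sample line are invented, and real log formats vary widely.

```python
import re

# Hypothetical access-log line: timestamp, user id, action, product id.
sample = '2013-06-01T14:22:07Z user=4821 action=view product=SKU-1097'

LOG_PATTERN = re.compile(
    r'(?P<ts>\S+) user=(?P<user>\d+) action=(?P<action>\w+) product=(?P<product>\S+)'
)

def parse_line(line):
    """Extract structured attributes from one semi-structured log line."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

print(parse_line(sample))
# {'ts': '2013-06-01T14:22:07Z', 'user': '4821', 'action': 'view', 'product': 'SKU-1097'}
```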
In addition to volume, velocity is also an important
consideration with the evolving data landscape. Pricing that
reflects the market conditions from last week or yesterday
might be completely inconsequential to managing
consumer behavior today. Consumer behavior is not always
convenient. A customer doesn't necessarily visit a site one
day to establish their behavior profile, return the following
day to see the results of that behavior on prices, and then
complete their purchase.

The point of impact for pricing is at the time a customer is
shopping. To achieve this customer intimacy, the information has to be processed in near real time, the analytics
performed by automated algorithms, and adjustments
applied to entice the customer to close the sale before they
leave and move on to a competitor. In addition, the data is
destined to come at an irregular pace. With varied sources
that are each arriving in an uneven fashion, matching data
points to paint a complete picture becomes more difficult.

Much of the information needed to perform the analytics
supporting dynamic pricing is not neat and tidy and sitting
in an enterprise structured database, having passed a series of
data quality checks. Dynamic pricing is inherently a human
behavior problem, and human behavior is fundamentally
sporadic and hard to predict. As such, the information supporting it will come from multiple sources, many of which
are unstructured, semi-structured, or multi-structured.
Sources such as social media streams, Web logs, text from
product recommendations and customer complaints, and
purchase history can all be factors in developing a customer
behavior profile, which serves as a basis for determining the
point of price equilibrium specific to that individual.

Technology, such as the Hadoop stack, provides a basis
for enterprises to take advantage of big data. Its parallel
processing enables management of data velocity, its distributed storage enables massive volumes of data to be processed
and stored, and the flexibility of the MapReduce model
allows companies to customize their processing to handle all
types and sources of data. Hadoop supports both the data
acquisition and the data preparation phases of the analytic
process. Regardless of the nature of the data as it enters the
system, its output must have structure and format for it to be
useful in analytic processing.
Data Mining/Predictive Analytics
Once the data has been gathered and transformed into structured elements, the next step of the process is developing and
testing models based on a set of training data. The variety
and breadth of analytic methods is vast and new methods
are developed each year. To accomplish dynamic pricing, a
company has to look at the possibilities and identify which
of these analytic frameworks meet(s) the business objectives.
With predictive analytics, there are two major categories of
models: classification and estimation. Within these categories, there are several different independent methods, but the
final outcomes are similar in nature.
Classification
The first category of algorithms is classification, sometimes
referred to as clustering. The goal of this category is to
identify patterns within the historical data where entities
with similar attributes fall together. Once these patterns
are defined and modeled, future entities can be evaluated
to determine where they fit with respect to these predefined
categories.
In the area of dynamic pricing, classification provides a
basis for dynamically segmenting sets of customers and sets
of environmental factors. With customer segmentation,
attributes such as browsing history, known demographics,
site entry point, and referral site can all be used to develop
clusters of customers. These clusters can be assessed to
determine what type of consumers these attributes represent
and to assign an appropriate price or discount. Once these
clusters are defined by attributes in representative training
data, future customers who exhibit the same attributes can
be dynamically assigned to these customer segments and
receive the appropriate pricing.
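A minimal sketch of this kind of segmentation, using scikit-learn's k-means implementation on a few invented customer attributes; the feature values, the number of clusters, and the discounts are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented customer attributes: [pages viewed, past purchases, avg. order value].
customers = np.array([
    [3,  0,   0.0],
    [25, 4, 180.0],
    [30, 6, 220.0],
    [2,  1,  35.0],
    [18, 3, 150.0],
    [4,  0,   0.0],
])

# Learn three segments from the training data.
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(customers)

# A hypothetical discount assigned to each learned segment.
segment_discount = {0: 0.00, 1: 0.05, 2: 0.10}

# A new visitor is assigned to a segment and priced accordingly.
new_visitor = np.array([[22, 5, 190.0]])
segment = int(model.predict(new_visitor)[0])
print(f"segment {segment}, discount {segment_discount[segment]:.0%}")
```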
Classification can be applied at an environmental level
as well. Developing a model evaluating the behavior of
the entire base of customers, supply chain factors, or
company financials, and classifying it and adjusting prices
appropriately, is a viable option. One example is a company
that uses the behaviors exhibited from historical fashion
trends to identify a new fashion trend much sooner. Once a
fashion trend is flagged, the company can apply discounting
to increase the momentum of the trend prior to its cresting.
Gaining increased momentum early in the fashion trend and
increasing the size of the trend wave can equate to a marked
improvement in company profits.
Estimation
The second category of analytic models is estimation,
which deals with finding a relationship between attributes
and developing a formula that will predict future outcomes based on that relationship. Estimation focuses on
predicting an actual numeric estimate based on historical,
continuous values.
The most common type of estimation is linear regression,
which uses a method known as least squares to fit a line
through an existing set of points. The line that most closely
fits historical data also extends into the future. Linear regression can be applied at a macro level for all prices, at a micro
level for an individual product or service, or can be coupled

with classification to develop groups of products or services
to which regression is applied to develop an optimal pricing
structure for each category. The combination of classification
and regression is referred to as CART (classification and
regression trees).
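A least-squares fit is a one-liner with NumPy. The sketch below fits price as a function of a single demand signal using invented sample data, purely to show the mechanics.

```python
import numpy as np

# Invented history: units demanded per hour vs. the price that cleared the market.
demand = np.array([120, 150, 180, 210, 260, 300], dtype=float)
price  = np.array([9.5, 10.2, 11.0, 11.9, 13.1, 14.0])

# Least-squares fit of a straight line: price ~= slope * demand + intercept.
slope, intercept = np.polyfit(demand, price, deg=1)

# Extrapolate to a demand level not seen in the historical data.
forecast_demand = 280
print(round(slope * forecast_demand + intercept, 2))
```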
Another type of regression, which has become very popular
in predictive analytics and can be effectively applied in
the area of dynamic pricing, is logistic regression. Unlike
linear regression, which develops a model to extrapolate a
price value for future usage, logistic regression focuses on
developing a model that identifies the likelihood of an event
occurring under a given set of circumstances. In the case of
dynamic pricing, a logistic regression model takes behavioral
and environmental attributes and price as input variables
and predicts the likelihood a customer will complete the
purchase. The logistic regression model can be run for
multiple price scenarios and the price with the highest
probability of purchase that meets the companys supply
model can be used.
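The following sketch mirrors that idea with invented training data and scikit-learn: fit a logistic regression on past offers (price plus one simple behavioral attribute) labeled by whether the customer bought, then score several candidate prices for a new visitor. None of the numbers are real.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented history: [offered price, pages viewed], and whether the customer purchased.
X = np.array([[10, 2], [12, 8], [15, 3], [9, 6], [14, 9], [18, 2], [11, 7], [16, 5]])
y = np.array([1, 1, 0, 1, 1, 0, 1, 0])   # 1 = purchased, 0 = abandoned

model = LogisticRegression().fit(X, y)

# Score candidate prices for a visitor who has viewed 6 pages.
pages_viewed = 6
for candidate in [10, 12, 14, 16]:
    prob = model.predict_proba([[candidate, pages_viewed]])[0, 1]
    print(f"price ${candidate}: purchase probability {prob:.2f}")
# The business would pick the highest price whose probability still meets its supply model.
```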
Summary
These categories of analytic frameworks can be used to
develop the infrastructure and consume the data processed
in the data acquisition and transformation phase to provide
context and actionable insight to the dynamic pricing challenge. This provides a basis for determining the relative pricing
that optimizes the equilibrium point between the supply
and demand curves as they evolve to meet the changing
environmental or customer behavior factors.
Companies are looking at dynamic pricing as a mechanism
to expand their market and optimize the retail experience.
Several industry leaders have already started to experiment
with it to enhance their overall sales strategy. Dynamic pricing is not without risk, but it does have significant potential
in matching supply and demand on a more personal level.

With advances in technology, the tools and techniques exist
to support enterprises in advancing their pricing strategy in
ways not previously possible. With high-profile projects such
as I-495 managing supply and demand through dynamic
pricing, it will become an area of interest for forward-looking companies over the coming years.
References
Grewal, Dhruv, David M. Hardesty, and Gopalkrishnan R. Iyer [2004]. "The Effects of Buyer Identification and Purchase Timing on Consumers' Perceptions of Trust, Price Fairness, and Repurchase Intentions," Journal of Interactive Marketing, Volume 18, Number 4, pp. 87-100. http://itu.dk/~petermeldgaard/B12/lektion%209/The%20effects%20of%20buyer%20identification%20and%20purchase%20timing%20on.pdf

Mattioli, Dana [2012]. "On Orbitz, Mac Users Steered to Pricier Hotels," The Wall Street Journal, August 23. http://online.wsj.com/article/SB10001424052702304458604577488822667325882.html

Turow, Joseph, Lauren Feldman, and Kimberly Meltzer [2005]. "Open to Exploitation: America's Shoppers Online and Offline," Annenberg School for Communication Departmental Paper, University of Pennsylvania. http://repository.upenn.edu/cgi/viewcontent.cgi?article=1035&context=asc_papers
Troy Hiltbrand is the enterprise architect for Idaho
National Laboratory and an adjunct professor at Idaho
State University.

This article appeared in Volume 18, Number 3.


BI This Week Newsletter

Inside Facebook's Relational Platform
By Stephen Swoyer

Editor's Note: You can watch Ken Rudin's entire keynote
address on page 57 of this publication.
During his keynote address at the 2013 TDWI World
Conference in Chicago, Ken Rudin, director of analytics
for Facebook, surprised many attendees when he revealed
that Facebook has built itself a conventional data
warehouse.
The Facebook model has always been held up as an exemplar of The New. The company helped to develop Hive,
the SQL-like semantic layer for Hadoop, which it used to
power its Hadoop-based data warehouse environment.
In his keynote, however, Rudin staked out a pragmatic
position that many TDWI attendees, and most data
management (DM) practitioners, could easily endorse.
"[Facebook] started in the Hadoop world. We are now
bringing in relational to enhance that. We're kind of going
[in] the other direction," Rudin told attendees. "We've
been there, and [we] realized that using the wrong technology for certain kinds of problems can be difficult. We
started at the end and we're working our way backwards,
bringing in both."
Rudin invoked an aphorism by author James Collins, with
whom he studied at Stanford University's Graduate School
of Business. Collins, a critic of zero-sum decision making,
famously championed the genius of 'and' as an inclusive
alternative to the tyranny of 'or.'

"Big data is a great example of an inclusive 'and' scenario,"
Rudin argued.
"What traditional systems like relational are really good
at are [answering] the traditional business questions that
we all still ask and will ask, and that's not going away just
because the new technologies are there," he explained.
Rudin suggested "not only SQL" as a backronym for the
term NoSQL, which has been used to describe schemaless technologies such as Hadoop. Queries or workloads
that execute in fractions of a second on a relational
platform will run orders of magnitude slower on Hadoop,
he observed. The reverse is true of analytic algorithms running against large sets of multi-structured data: they'll run
orders of magnitude faster on Hadoop. "Trying to do ...
[a] complicated algorithm in a relational system is [going
to be] very, very painful," said Rudin, using the example
of an analysis of user-submitted Facebook photos.
You want to use the right kind of technology for the right
kind of question.
Throughout his keynote, Rudin distinguished between
analytic pragmatism and analytic perfection. In acknowledging the role of statistical rigor in product testing, for
example, Rudin stressed that what's most important is the
spirit and not the ideal of experimentation: don't forestall
or eschew testing simply because you can't come up with a
statistically perfect test.
"The most modern incarnation of [experimentation] is A/B
testing. This is figuring out ... whether my great idea is
actually great," he said. "Maybe you can't do a perfectly
statistically controlled A/B test, but ... there's always some
way ... to figure out how we've improved versus historical
trends. It's the spirit of the experimentation versus the
actual statistical significance of it that actually makes all of
the difference."
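For readers who want to see what even an imperfect check can look like, here is a small, illustrative two-proportion z-test in Python. The conversion counts are invented, and in practice the comparison against historical trends that Rudin describes may be far less formal than this.

```python
from math import sqrt
from statistics import NormalDist

# Invented results: conversions out of visitors for the control and the new idea.
conv_a, n_a = 410, 10_000   # control
conv_b, n_b = 465, 10_000   # variant

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)

# Standard two-proportion z-statistic.
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(f"lift: {p_b - p_a:.3%}, z = {z:.2f}, p-value = {p_value:.3f}")
```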
Earlier, Rudin had discussed the problem of managing
and governing data in the context of an analytic-driven
organization. There's a long-standing tension between the
DM-oriented need to tightly profile, control, and manage
data on the one hand, and the countervailing analytic
desire to access and consume data on terms that are
determined by the analyst herself.
In its old model, Facebook would have used a non-relational
platform such as Hive and Hadoop to address both
requirements; in its new, inclusive model, Facebook uses
non-relational platforms to empower analytic discovery and
experimentation; its relational data warehouse functions as a
consistent or reference platform for core business data.
"Think about the core elements of the data that you must
manage and then don't worry about everything else. That's
uncomfortable for a lot of us ... particularly in a ... relational environment where you want to have nice structured
schemas," he said.
Facebook, he said, has hundreds of thousands of database
tables in its analytic environment. Only on the order
of several dozen of these tables are core to its business,
however. "These are tables that we must manage very, very
carefully. ... For the things that we need to have consistency
on, ... that's a core table, that's managed relationally."
Stephen Swoyer is a contributing editor for TDWI.

This article appeared in the May 6, 2013 issue.


BI This Week Newsletter

Load First, Model Later: What Data Warehouses Can Learn from Big Data
By Jonas Olsson

Big data has captured the attention of everyone from the
common man to the CEO to the data professional. The
common man gets excited at big data's potential to solve
some of the world's biggest problems, from healthcare to
education to the environment. The CEO sees the power to
tap into new opportunities for revenue and growth.
The data professional gets excited about big data for a
more practicalbut equally criticalreason, one that
offers guidance on how to add flexibility to your existing
data environment, and one that could greatly benefit data
warehouses.
That is, big data is not just about the "big." In fact, I
would argue that key to big data's appeal is that it allows
data to be loaded first and modeled later. This approach,
"schema on read," is not new, but because it has never been
the traditional method for implementing data warehouses,
it certainly feels new.
Traditionally, data warehouses have been designed around
the opposite principle, "schema on write," also known

as extract, transform, and load (ETL). This approach
requires a predefined data model to be implemented as
a set number of tables. The data being loaded is then
mapped to the existing data model represented by the
tables in the database. If the data the user wants to load
into the warehouse does not match the existing tables,
changes must be made, such as to the ETL process or
the table structure. These changes can be expensive and
time-consuming, especially for businesses that deal with
complex data or work in a highly dynamic environment.
Big data's "schema on read" approach is differentand
much more appealing. You just throw the data in there
and then figure out what to grab and how to grab it later.
Because you basically apply the data model when you
read the data, you get much greater flexibility because
changes in the data model can be addressed in a layer
above the physical tables. As an added benefit, you don't
need to narrow your selection of data and you can, if your
business requires it, support multiple data definitions,
effectively having parallel data models applied to the same
set of physical data.
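A tiny illustration of the contrast, with invented records: the raw events are stored exactly as they arrived, and two different "schemas" are applied only at read time by pulling different fields and interpretations out of the same stored data.

```python
import json

# Raw events stored as-is (schema on read: no upfront model).
raw_events = [
    '{"ts": "2013-09-01", "customer": "C17", "amount": "42.50", "channel": "web"}',
    '{"ts": "2013-09-02", "customer": "C03", "amount": "19.99"}',
    '{"ts": "2013-09-02", "customer": "C17", "amount": "7.25", "promo": "FALL13"}',
]

def read_as_sales(events):
    """One read-time model: total spend per customer."""
    totals = {}
    for line in events:
        rec = json.loads(line)
        totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + float(rec["amount"])
    return totals

def read_as_channel_mix(events):
    """A second, parallel model over the same data: orders per channel."""
    mix = {}
    for line in events:
        channel = json.loads(line).get("channel", "unknown")
        mix[channel] = mix.get(channel, 0) + 1
    return mix

print(read_as_sales(raw_events))        # {'C17': 49.75, 'C03': 19.99}
print(read_as_channel_mix(raw_events))  # {'web': 1, 'unknown': 2}
```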
For businesses that don't have highly complex data
or whose business environments are more static, the
traditional approach of data warehousing will continue
to work well. However, in today's "Internet is everywhere"
economy, even the most hidebound and old-school
industries are increasingly dynamic and complex.

Data warehouses have earned a reputation for being difficult to change and costly to deploy, not just technologically and financially but also politically, due to the time and budget resources required from various business units.
We can look at the traditional ETL data warehouse
model, or "schema on write," as one of the culprits.
ETL has served data professionals well for many years,
but with the increase in data environments' complexity,
it should come as no surprise that people are looking for
other solutions, and that big data is where they are looking
for answers.
There is no inherent limitation in data warehouses that
prevents data professionals from using schema on read,
allowing warehouses to become lower cost, more easily deployed, and more
powerful.
The less-restrictive schema-on-read data model lets
companies collect data more freely without having to go
back to the drawing board as data sources change. With it,
data warehouse professionals will no longer have to know
how they will use the data, as that step can be done (and
repeatedly redone) later, as business dynamics dictate.
This simple change in approach can maximize the ROI of
data warehouses (already in place and a permanent budget
line item in most companies), making them a better
solution than rushing headlong into new and unproven big
data technologies and data management philosophies.
Jonas Olsson is the CEO and founder of Graz Sweden AB,
provider of Hinc, an ELT-ready data warehouse built for the
financial industry. You can contact the author at jonas.olsson@graz.se.


This article appeared in the October 15, 2013 issue.


TDWI research

how to gain insight from text
By Fern Halper

FOREWORD
Text is everywhere, and its volumes are growing rapidly.
Text comes from internal sources such as e-mail messages,
log files, call center notes, claims forms, and survey comments as well as external sources such as tweets, blogs,
and news. Text analytics, a technology used to analyze
the content of this text, is rapidly gaining momentum in
organizations that want to gain insight into their unstructured text and use it for competitive advantage. Factors
fueling growth include a better understanding of the
technology's value, a maturing of the technology, the high
visibility of big data solutions, and available computing
power to help analyze large amounts of data.
Text analytics is being used across industries in numerous
ways, including customer-focused solutions such as voice
of the customer, churn analysis, and fraud detection. In
fact, many early adopters have used the technology to
better understand customer experience, and this is still
one of the most popular use cases. However, text analytics
is also being used in other areas such as risk analysis,
warranty analysis, and medical research.
The technology provides valuable insight because it
can help answer questions involving why and what. For
example, why did my customer leave? What is causing
the increase in service calls? What are the best predictors
of a certain risk? Text analytics can aid in discovery and

improving the lift or accuracy of analytical models. This
impacts an enterprise's top and bottom lines.
Companies are realizing that text is an important source
of data that can improve and provide new insight. The
questions these companies face include where to start and
how to think about text as data. This Checklist Report
focuses on helping organizations understand how to get
started with text analytics, including:
Basic definitions of text analytics
How such analysis can add structure to unstructured
data
How unstructured data can be used and its importance
to data discovery and advanced analytics
How to think through a text analytics problem
What resources are needed to get the maximum value from text

number one

Identify a problem worth solving.


The first step in any analysis is to identify the problem
you're trying to solve. This is also true for text analytics. It
is important to start with the end goal in mind. It generally
makes sense to pick an initial problem that has relatively
high visibility and where it is fairly easy to get at the data.
If possible, it should be a quick win that uses a proof of
concept (POC). This accomplishes three objectives. First, it
costs less and carries a lower risk than going all out. Second,
a problem worth solving will earn a seat at the executive
table, which can help to keep momentum high. Finally, a
technical benefit of the POC is to ensure that the technology you're using works with your specific data.
An important facet of identifying the right problem is
the ability to justify spending money on it as part of the
business case. That means looking for a problem where
improvements can be measured in one or both of the
following ways:
Bottom-line improvements. This generally means
you will focus on reducing costs. For text analytics,
this often involves decreasing the time required
for a particular task (a productivity improvement).
Bottom-line impacts can include reducing human
misclassification errors of documents, which means
finding information more quickly. Another example

number two

Determine data requirements.


There are several issues to consider in terms of gaining
access to the data as well as preparing it for analysis.
Data sources. Data sources can be internal or external
to your organization. For example, if a company is
trying to predict churn, it may determine that internal
call center notes are the most important source of
text data. These call center notes might be stored
somewhere inside the firewall or outside it (such as in
the cloud). Part of identifying the data sources includes
determining what spoken languages you need to include
in the analysis.
Data access. This includes gaining the right to use
certain internal or inter-company data stores, which can
be a hurdle, as well as being able to physically connect
to the data. Will you access the data in real time via an
API or get a one-time export to some common format
such as comma-separated values (CSV)? Accessing
external data might require crawlers (programs that
find and gather Web pages) or working with a data
aggregator. What are the terms of service for the sites
where you are gathering data? Typically, companies tend
to deal with their internal data first, except when they
use a specific social media analysis solution.
Data security. Text data that contains personally
identifiable or other highly sensitive information must
be dealt with differently than a stream of public tweets.

Data timeliness. Analyzing numbers once a quarter will
require a different approach than analyzing data daily,
hourly, or in real time.
Data preparation. The most important aspects of this
step are data normalization and data cleansing. Data
normalization makes sure that the data acquired from
each source will be able to match with other sources.
Data cleansing deals with issues such as typos to ensure
completeness of input (e.g., for social data) and that
the data is trustworthy. Determining data quality for
unstructured data is a science that is still evolving and
can be time-consuming.
number three

Identify what needs to be extracted.


Text analytics is the process of analyzing unstructured
text, extracting relevant information, and transforming it
into structured information that can be leveraged in various ways. Text analytics can use a combination of natural
language processing, statistical, and machine learning
techniques. Entities, themes, concepts, and sentiment are
all examples of structured data that can be extracted from
text analysis. These are often referred to as text features:
Entities: often called named entities. Examples include
names of persons, companies, products, geographical
locations, dates, and times. Entities are generally about
who, what, and where.
Themes: important phrases or groups of co-occurring
concepts about what is being discussed. A theme might
be "women's rights" or "cloud computing." A particular
piece of content might contain many themes.
Concepts: sets of words and phrases that indicate
a particular idea or meaning with which the user
is concerned. A concept might be "business" or
"smartphones." A particular piece of content generally is
only about a few concepts.

Sentiment: Sentiment reflects the tonality or point
of view of the text. The concept "unhappy customer"
would lead to a negative sentiment.
The goal is to accurately extract the entities, concepts,
themes, and sentiment in which you are interested. Solutions offer various features out of the box. A vendor might
include only a dictionary, list of names, or synonym list.
Another might support hierarchical taxonomies to better
organize information. The disadvantage of any purely
list-based or taxonomic solution is that you're limited to
finding what's in the list.
To address this issue, some vendors now incorporate
statistical models based on machine learning into their
solutions to help users extract features that were not
preconfigured. Vendors that provide models often pretrain them so users don't need to do anything but simply
use the model. Some vendors provide hybrid approaches
(statistical and rules based), which provide the benefits of
collection investigation combined with the specificity that
comes from linguistic rules.
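To make the idea of extracting features from raw text concrete, here is a small, illustrative sketch that pulls named entities with the open source spaCy library and scores sentiment with a toy word list. The sample sentence and the tiny lexicon are invented, and a production solution would rely on far richer models than this.

```python
import spacy  # assumes the spaCy package and its small English model are installed

nlp = spacy.load("en_core_web_sm")

text = "Acme Corp's support team in Dallas was unhelpful, and the refund took three weeks."

# Entities: the who, what, and where.
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]

# A toy lexicon-based sentiment score (positive words minus negative words).
negative = {"unhelpful", "slow", "angry", "refund"}
positive = {"great", "helpful", "fast", "happy"}
tokens = [tok.text.lower() for tok in doc]
sentiment = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

print(entities)    # e.g., [('Acme Corp', 'ORG'), ('Dallas', 'GPE'), ('three weeks', 'DATE')]
print(sentiment)   # a negative number suggests an unhappy customer
```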
Fern Halper, Ph.D., is director of TDWI Research for
advanced analytics, focusing on predictive analytics, social
media analysis, text analytics, cloud computing, and other
big data analytics approaches. She has more than 20
years of experience in data and business analysis, and has
published numerous articles on data mining and information
technology. Halper is co-author of Dummies books on cloud
computing, hybrid cloud, service-oriented architecture, service
management, and big data. She has been a partner at industry
analyst firm Hurwitz & Associates and a lead analyst for
AT&T Bell Labs. Her Ph.D. is from Texas A&M University.
You can reach her at fhalper@tdwi.org, or follow her on
Twitter: @fhalper.
This report was sponsored by Angoss, Lexalytics, and SAS.

This article is an excerpt.


Vote for Your Favorite BI/DW Story from TDWI's Best of Business Intelligence
We want to hear from you! Which story from
TDWI's Best of Business Intelligence do you
consider to be the most important from the
past year?

TDWI Best Practices Report: The State of Big Data Management


TDWI Best Practices Report: Implementation Practices for Better Decisions
Ten Mistakes to Avoid Series: Ten Mistakes to Avoid When Delivering Business-Driven BI
TDWI FlashPoint: Enabling an Agile Information Architecture
TDWI FlashPoint: TDWI Salary Survey: Average Wages Rise a Modest 2.3 Percent in 2012
Business Intelligence Journal: The Database Emperor Has No Clothes
Business Intelligence Journal: Dynamic Pricing: The Future of Customer-Centric Retail
BI This Week: Inside Facebook's Relational Platform
BI This Week: Load First, Model Later: What Data Warehouses Can Learn from Big Data
TDWI Checklist Report: How to Gain Insight from Text

To vote, visit: tdwi.org/poll

tdwi webinar series


TDWI Webinars deliver unbiased information on many BI/DW topics.
Each live Webinar runs one hour in length and includes an interactive
Q&A session with TDWI's expert presenters.
Here are some of the most popular Webinars TDWI broadcast in 2013.
View on-demand and upcoming Webinars
TDWI.ORG/WEBINARS

The Doctor Is In: The Role of the Data Scientist for Analyzing Big Data
No profession is getting more attention these days than that of the data scientist. This TDWI Webinar will describe what data scientists do, their capabilities, and how they differ from other personnel, including those devoted to business intelligence and data warehousing. It will help organizations determine whether they need data scientists, particularly for projects involving big data.
Original Webcast: May 21, 2013
Speaker: David Stodder
Sponsor: Teradata
Watch Now

Busting 10 Myths about Hadoop
This presentation lists the 10 most common myths about Hadoop, then corrects them. The goal is to clarify what Hadoop is and does relative to BI, as well as in which business and technology situations Hadoop-based BI, data warehousing, and analytics can be useful.
Original Webcast: July 24, 2013
Speaker: Philip Russom
Sponsor: Teradata
Watch Now

Big Data and Your Data Warehouse
Big data and your data warehouse can be a powerful team, providing many new analytic applications that enterprises need to stay competitive. But you will need to make some changes to your existing infrastructure, tools, and processes to integrate big data into your current environment.
Original Webcast: September 5, 2013
Speaker: Philip Russom
Sponsor: Teradata
Watch Now

TDWI EDUCATION
TDWI is the premier provider of education and research in the business intelligence and data warehousing
industry. TDWI fosters a community of learning where business and technical professionals come together to
gain knowledge and skills, network with peers, and advance their careers. Through education and research
programs, TDWI enables individuals, teams, and organizations to leverage information to improve decision
making, optimize performance, and achieve business objectives.

TDWI World Conference Keynote


TDWI World Conferences provide the leading forum for business and technology professionals looking to
gain in-depth, vendor-neutral education on business intelligence and data warehousing. The TDWI World
Conference in Chicago, May 5-10, 2013, featured a Monday keynote presentation by Facebook's director of
analytics, Ken Rudin, who covered best practices to help you get the biggest impact from big data. Watch
the keynote below.
Big Data, Bigger Impact
Monday, May 6, 2013
Ken Rudin, Director of Analytics, Facebook

In most companies, data and analytics have historically been considered a service. However, analysts are
now taking a more proactive role in driving businesses, and the more recent introduction of big data has
accelerated this trend. This new world comes with a new set of best practices for leveraging big data and
driving even bigger results. This Webinar covers several of these best practices focused on getting the
biggest impact from big data and driving a proactive, data-driven culture.
Click below to watch the keynote now or view on YouTube.


TDWI EDUCATION
TDWI LIVE
TDWI LIVE captures the essence of our World Conferences by providing users with access to photos, videos,
tweets, and more. Highlights of each conference are posted daily using Storify, which pulls out the most
interesting photos, videos, and tweets into a format that allows them to be seen side by side. This blended
social media experience provides users with key moments from the conference without having to sift
through dozens of postings to multiple sites.


2014 TDWI Events Calendar


Each year, TDWI offers five major educational conferences, executive summits, educational seminars and
symposiums, and more. Here's a selection of some of our upcoming events.
View all TDWI Education events

Find out more about the many education events and programs TDWI has to offer!

TDWI World Conferences
TDWI BI Executive Summits
TDWI Seminars
TDWI Symposiums
TDWI Solution Summits
TDWI Onsite Education
TDWI CBIP Certification


Best Practices Awards 2013


TDWI's Best Practices Awards recognize organizations for developing and implementing world-class business intelligence and data warehousing solutions. Here are summaries of the winning solutions for 2013. For more information, visit tdwi.org/bpawards.


Organizational Structures
SAP
SAP is an enterprise application software vendor that helps companies run better by using business insight effectively to stay ahead of the competition.
To be a best-run business ourselves, we initiated the
One SAP for Data Quality program to improve business process efficiency, drive cost improvements, and
increase reliability of decision-making analysis. With
our organizational structure of global, regional, and
line-of-business teams, we are instilling a culture of data
ownership and quality at SAP. We partner with business
leads to increase the quality of enterprise master data
across all business lines.
This information governance program has delivered
business benefits since 2009. Unlike similar programs,
the business is the primary program sponsor and clear
stakeholder in the outcome of the programs process
re-engineering efforts and IT solution improvements.
The program began with clearly defined business goals,
guiding investment and prioritization of activities.
Although every organization has a different culture
and responds to different approaches, elements of our
approach could be helpful to every organization. At SAP,
we approach data governance by embedding metrics and
KPIs into the governance processes as a way to guide
investment, track progress, and encourage future investment. For example, sales force productivity has improved by more than 20 percent.
The core global data management team that drives and
manages the One SAP for Data Quality information
governance program is able to multiply this effect in
larger organizations with our commitment to communicate and demonstrate the business case for information
governance. We have developed innovative methods
for assessing information management capabilities and
combining process and data management.

Government and Non-Profit


USDA Risk Management Agency (RMA)

Solution Sponsor: Teradata Corporation

The Risk Management Agency (RMA) of the U.S.


Department of Agriculture (USDA) oversees the Federal
Crop Insurance Corporation, the primary source of risk
protection for America's farmers. The program provides
approximately 1.3 million crop insurance policies, with
over $124 billion in protection for major crops. The
RMA's compliance mission is to ensure that (1) adequate
safeguards are in place to avoid potential fraud and abuse
and (2) claims are accurate so benefits can be distributed
equitably.
To counter fraud, waste, and abuse, the Agriculture
Risk Protection Act of 2000 mandated the use of a data
warehouse and data mining technologies to improve crop
insurance program compliance and integrity. RMA asked
the Center for Agriculture Excellence (CAE) at Tarleton
State University to create a system to monitor and analyze
the program, identifying fraud using satellite, weather,
and remotely sensed data to analyze claims filed by farmers for anomalous behavior that could indicate fraudulent
or other improper payments. CAE is at the leading edge
of application of remote sensing to agricultural insurance.
The RMA program has had several significant impacts,
including:
Identification of anomalous claims, plus monitoring as a preventive measure
Linking claims histories with weather data
Integration of the latest MODIS and Landsat satellite data into the data mining process
Automated claims analysis

The results: cost avoidance of over $1.5 billion (2001–2007) scored by the Congressional Budget Office. Estimated reductions from prior-year indemnities represent more than a $23 return for every dollar spent by RMA on data mining since its inception.
One initiative produced a list of producers who were
subjected to increased compliance oversight; from 2001
to 2011, this reduced unneeded indemnity payments by
approximately $838 million.


Enterprise DM Strategies
Advance Auto Parts (co-winner)

Solution Sponsor: Stibo Systems

Advance Auto Parts, Inc. (AAP) is an automotive


aftermarket retailer established in 1932. With 2012 sales
of $6.2 billion, AAP offers parts, accessories, batteries,
and maintenance items to both the do-it-yourself and
professional installer markets, with 3,794 stores in 39 states, Puerto Rico, and the Virgin Islands, as well as an online shopping channel.
Advance Auto Parts embarked on its master data management (MDM) and product information management
(PIM) initiative to achieve service leadership and superior
product availability by: streamlining its merchandising
efforts and time to market; increasing revenue and
customer satisfaction; and receiving accurate information
from manufacturers and complying with the Automotive
Aftermarket Industry Association (AAIA) standards.
AAP sought to consolidate data for more than 650,000
products (SKUs) for over 350,000 vehicle models,
including more than 35 million part-to-vehicle relations
and over 500,000 interchange records (records that
cross-reference the part number with other part numbers
that could serve as a viable replacement).
Leveraging a multi-domain master data management
platform (Stibo Systems STEP), Advance Auto Parts
achieved buying accuracy, lowered return rates by offering the correct part/item the first time, improved sales
with accurate inventory, and created a fully automated
data processing pipeline for vendor-supplied data to be
delivered in industry standard formats.
The company's objectives for the MDM/PIM included:
Receive accurate information from manufacturers and
third-party vendors
Comply with Automotive Aftermarket Industry
Association (AAIA) standards
Reduce data entry costs
Reduce product returns
Improve reaction time

Improve time-to-market

The MDM/PIM automates the validation and processing


of vendor-supplied information and takes delivered data
through an optimized workflow process for the few pieces
of data that require manual review or approval. Once
processed and approved, the STEP system syndicates
the information to AAP's Enterprise Service Bus via a WebSphere MQ interface, ensuring guaranteed delivery.

Enterprise DM Strategies
Standard Life plc (co-winner)
Standard Life is an international long-term savings and
investments company headquartered in Edinburgh,
Scotland, responsible for the management of £218 billion
in assets (as of December 31, 2012).
The company wanted to develop a generic, reusable ETL
framework aimed at returning maximum ROI for its
business customers by increasing development productivity and reducing risk of deviation from design.
The project to deliver the EDW_TOOLKIT has ensured
consistent application of Kimball-based design patterns
across all projects. The EDW_TOOLKIT is composed of
over 40 generic components that form part of a mature
and extensive framework to enable cost-effective configuration of the ETL process. This has realized significant
cost savings (estimated to be in excess of £5M) through
reduced development and maintenance effort, while at
the same time significantly increasing the probability of
project success.
Having had a vision to build the EDW_TOOLKIT and
provide productivity and risk reduction benefits, we
are not finished. Our next chapter in innovating ETL
development involves continued streamlining of the
ETL development environment to minimize the requirement for technical resources in the project. This will
allow systems analysts and data analysts to configure
and build ETL jobs by configuring an end-to-end ETL
process rather than individual code modules within this
ETL process.
In addition, we want to expose business rules to systems
analysts, data analysts, and business support analysts so
they can view, define, and modify rules within the ETL
process. In a business-as-usual, support environment,
this will allow the ETL process to be modified with less
reliance on valuable technical resources.

Enterprise Data Warehousing


Stanford Hospital and Clinics

Solution Sponsor: Health Catalyst

Stanford Hospital and Clinics (SHC) is known worldwide


for advanced patient care provided by its physicians and
staff, particularly for the treatment of rare, complex
disorders. SHC is ranked by US News & World Report
as one of the best hospitals in America. Closely aligned
with Stanford University Medical School, SHC includes
a 613-bed ACS Level 1 Trauma Center, with over 2,100
medical staff, 700 interns and residents, and 2,100 RNs
servicing more than 725,000 inpatient, outpatient, and ER
visits each year.
SHC deployed the Health Catalyst Late-Binding
Enterprise Data Warehouse (EDW), along with a unique
multi-disciplinary approach and processes to drive quality
interventions that improve health outcomes for patients.
The project is especially critical to enable SHC to achieve
its triple aim of improved patient care, reduced costs, and
enhanced patient satisfaction.
Stanford Hospital and Clinics' EDW architecture, along with a unique multi-disciplinary approach and methodology, has driven quality interventions that, in early results, have improved health outcomes for Stanford's patients. These two innovations combined to give SHC's
users previously unheard-of insight into opportunities for
quality improvement and a clear process for the deployment of focused interventions.
As a result, SHC produced a number of early, promising
results, including reducing 30-day readmissions for heart
failure patients and significantly lowering staff surveillance
requirements for major hospital-acquired infections. Such
improvements are mission critical for today's hospitals,
which are confronted by a revolution in their fundamental
profit structure. To succeed under federal health reform,
they must improve the quality of care they deliver, be able
to measure and report on health outcomes across their
entire patient population, and drive down costs.

Performance Management
Quicken Loans, Inc.
Quicken Loans, Inc., is the nation's largest online mortgage lender and third largest retail lender. The company
closed more than $70 billion in home loans in 2012, a 133
percent increase over the previous record of $30 billion set
in 2011. The company also doubled in size.
This growth can be attributed to the success of our online
lending platform. Our scalable, technology-driven loan
platform has allowed us to handle a large surge in loan
applications while keeping closing times for the majority of
our loans at 30 days or less.
The speed from loan application to close is due in large
part to the focus Quicken Loans places on operations
performance management, which allows us to meet client
needs as thoroughly and quickly as possible. Over the
past eight years, performance management has evolved
from a manual process of report generation to self-service
dashboards and user-defined alerts that allow business
leaders to proactively deal with obstacles and identify
opportunities for growth and improvement.
At Quicken Loans, performance management is not a
top-down project but a collaborative process between
the business intelligence and technology teams and the
business. This has reduced the time it takes to provide the
metrics to business leaders and has dramatically increased
the effectiveness of our solutions. Our culture and our
processes drive how we identify metrics, measure them,
and use them as a basis for improvement.


Emerging Technologies and Methods
Motorola Mobility

Solution Sponsor: Cloudera

Motorola Mobility, a Google company, creates mobile


experiences for consumers. Whether calling, writing
e-mail, playing games, listening to music, watching a
movie, taking pictures and videos, surfing the Web,
collaborating, or sharing content, Motorola Mobility puts
consumers at the center of it all.
Motorola Mobility is leading the extension of traditional
enterprise data warehousing systems and tools to big data
and analytics systems. Motorola introduced the concept
of a unified data repository platform that breaks enterprise
data silos and helps drive a comprehensive global view of
the business and its customers by harnessing the power
of data. This platform is built using technologies that
scale on demand and at price points and speeds orders of
magnitude better than traditional technologies.
With the adoption of a cloud-based solution, Motorola
uses the power of the open source, distributed application
framework Hadoop to run computations previously
constrained by cost, scalability, speed, or resources. The
company can now conduct large-scale, complex analyses
across a mix of internal and external data sources so
the business can evaluate the performance of existing
products, understand consumer perceptions of its products
and features, get feedback about potential new product
offerings, and identify and address quality issues before
they impact customers.
Motorola's approach is beginning to have a meaningful
impact on business KPIs. Customer satisfaction scores
have risen based on call center insights, product returns (a
metric of quality) are decreasing based on service center
insights, and Motorola forums and other open forums
are helping drive insights into customer sentiment. This,
in turn, is helping internal business functions to respond
effectively to the voice of the customer. The business is
now better positioned to manage its product line, forecast
demand, and manage supply chain inventory with greater
accuracy and shorter turnaround.

Enterprise BI
Aircel Limited

Solution Sponsor: Teradata Corporation

Aircel is an innovative data-led telecom company with


operations across the country, serving more than 60
million subscribers.
The company had many heterogeneous systems capturing day-to-day customer interactions and transactions,
requiring analytical capabilities for business users to
generate an integrated analytical view of customer
demographics, usage patterns, and social behavior. With
market demands, future growth, and telecom dynamics,
Aircel initiated a business intelligence project to provide a
single customer view that helps it construct the best-suited
products and offers for increased customer satisfaction and
additional revenue.
The enterprise EDW/BI solution brings together lines
of business and over 20 varied data streams under one
umbrella, enabling a 360-degree view of customer lifetime
value and effecting synergies among sales and marketing
and customer relationship management. It has empowered
business users to explore previously unrealizable analytical
opportunities. The integrated view made it possible to
analyze the customer life cycle, including such tasks as
customer identification, customer acquisition, customer
relationship management, customer retention, and customer value enhancement, and to do it instantaneously.
Business users can now analyze all aspects of the customer
life cycle and micro-segment the customer base to create
or release segment-specific retention and cross-sell and
up-sell offers to improve average revenue per user while
ensuring a uniform experience across customer touch
points.
Examples of business value include:

Monthly revenue increase of INR 47 million (US $860,000) with reduced information latency and availability of the latest customer profile, which allows Aircel to recommend personalized products and content through various touch points.
The campaign management system leverages the BI profiling framework, which has helped increase annual revenue by more than INR 160 million (US $2.9 million) by pushing best-fit products and offers to selected customer segments.


BI on a Limited Budget
USAA
USAA is a member-owned Fortune 500 company that
serves more than 9.4 million members of the military
and their families with a variety of insurance, banking,
investment, and financial service products.
The company looked for a solution that would allow it to quickly determine the impact of global financial events on USAA's investment portfolio.
A core team (IT lead and business subject matter expert)
built the first iteration of the data mart tables and associated business semantic layer (the interface between the
data mart and the reports) to allow for fast development
and future intuitive ad hoc, self-service reporting. Only
after numerous report, table, and semantic layer iterations by the core team were ETL and database resources
brought in to build the ETL process and finalize the
database design. This minimized relatively expensive and
time-consuming ETL and data mart work.
The solution helps USAA meet increasingly stringent state
and federal requirements related to effective risk monitoring. It also reduces labor costs and increases accuracy in
reporting investment positions. The capabilities needed
to be delivered quickly due to the fast-paced and ever-changing investment landscape.
Faced with increasingly stringent state and federal investment portfolio management requirements, USAA's CIO
team needed to quickly implement an effective solution
utilizing the limited development dollars available. This
project utilized creative development techniques with a
core team of multi-skilled resources; 70 percent of the
work was completed by two people with all the business
and IT skills needed to build the first working development solution prototype. The business resource had an
understanding of relational databases; the IT lead resource
understood financial investments. The iterations were
fast and strictly focused on processes that were absolutely
necessary.
In just 10 weeks, with an IT investment of $26,000, a
fully automated solution was delivered that far exceeded
the CIO team's expectations.

Advanced Analytics
Telenor Pakistan

Solution Sponsor: Teradata Corporation

Telenor Pakistan is the second largest mobile telecom


operator in Pakistan, with over 30 million subscribers
and a 25 percent market share. It is 100 percent owned
by Telenor Group, a Norwegian multinational telecommunications company.
When Telenor Pakistan launched in 2005, it had a basic reporting data warehouse that lacked both the subject areas needed to support operational business requirements and the scalability to add sources crucial for strong positioning in a market increasingly dominated by price competition. The shift to a robust Teradata Enterprise
Data Warehouse (EDW) delivered the 360-degree
customer view using data from multiple source systems.
Today, the EDW and integrated BI platform go beyond
traditional reporting, extending the EDW to commercial
applications of analytics that intelligently route calls,
drive cross-selling and up-selling, enable customer
centricity in marketing strategies, customize offers for
individual customers, target micro-segments, and predict
customer behavior.
This strategic EDW platform provides a sustainable
competitive advantage. With improved customer service
and increased revenue, Telenor Pakistan has strengthened
its second-place market position. Among the applications:

Magic screen combines customer data, CRM functionality, BI, and data mining in a single view
Behavior-centric routing scores every customer using a host of criteria to route callers to the right type of agent
System X allows marketing to rapidly define incentives for customers in near real time
Churn prediction uses data mining techniques to quickly identify probable churners and facilitates incentives for them to stay
Behavioral segmentation allows respective brand managers to identify which groups of subscribers are behaving in line with the brand's unique selling proposition
Location intelligence is a strategic tool for creating a visual representation of geographic relationships by combining streams of commercial data and technology KPIs in the EDW


BI Solutions
Transforming Technologies

Our sponsors present their solutions in the following business intelligence categories:
Analytics and Reporting
Business Intelligence
Business Intelligence for SAP
Dashboards, Scorecards, and Visualization
Data Warehousing
Enterprise Business Analytics
Enterprise Business Intelligence
Predictive Analytics
Self-Service Analytics

Birst
www.birst.com
BI Categories: Analytics and Reporting; Business Intelligence for SAP; Dashboards, Scorecards, and Visualization; Enterprise Business Analytics; Enterprise Business Intelligence; Predictive Analytics

Birst is the only enterprise-caliber business intelligence platform born in the cloud, giving business teams the ability and agility to solve real problems. Fast. Engineered with automated data integration and powerful analytics, along with the ease of use of the cloud, Birst is less costly and more flexible than legacy BI and more powerful than data discovery. Find out why Gartner named Birst a Challenger in its most recent BI Magic Quadrant and why more than a thousand businesses rely on Birst for their analytic needs. Learn to think fast at www.birst.com and join the conversation @birstbi.

HP Vertica
www.vertica.com
BI CATEGORIES: Data Warehousing; Enterprise Business Analytics; Predictive Analytics

Purpose built for big data analytics from the very first line of code, the HP Vertica Analytics Platform enables companies to perform analytics at the speed and scale they need to thrive in today's big data world. HP Vertica's scalability and flexibility are unmatched in the industry, delivering 50x-1,000x faster performance than traditional enterprise data warehouses at an overall lower total cost of ownership. And to avoid vendor lock-in, HP Vertica is tightly integrated with Hadoop, R, and other open source technologies and leading BI/ETL environments, and runs on industry-standard hardware. The HP Vertica Analytics Platform is trusted by the largest and most innovative organizations in the world, including such blue-chip companies as Facebook, Verizon, Guess Inc., Zynga, Comcast, Cerner, and thousands of others.
HP Vertica is part of HP's HAVEn platform, pulling together multiple products and services into a comprehensive big data platform that provides end-to-end information management for a wide range of structured and unstructured data domains.

Information Builders
www.informationbuilders.com
BI CATEGORIES: Business Intelligence; Self-Service Analytics

Self-Service Analytics: Data-Informed Decisions for Everyone

The more independent your organization's decision makers can become, the more streamlined, efficient, and profitable your business can be. That starts with their ability to analyze enterprise information when making those decisions.
The self-service analytics capabilities in WebFOCUS, our business intelligence (BI) and analytics platform, give everyone in your organization, even nontechnical professionals, the power to generate dashboards and reports, run queries, and conduct their own analyses, without the assistance of IT staff.
Self-service analytics features in WebFOCUS include:
InfoAssist and App Studio, tools that enable business users and developers to explore information
InfoApps and BI Portal, self-service solutions that fill the needs of business users and consumers
Data Discovery capabilities that allow analytical users to solve issues by exploring their own data and enterprise data
With WebFOCUS, functional workers can make faster, better decisions because they no longer have to wait through long reporting backlogs. At the same time, technical teams will be freed from the burden of satisfying end user report requests, so they can focus their efforts on more strategic IT initiatives.
Just as importantly, we enable organizations to embed search, predictive, and social analytics into a single BI app.


Paxata
www.paxata.com
BI CATEGORIES: Data Preparation; Data Integration; Data Quality

Just as Tableau and QlikView have revolutionized the way you see data, Paxata has transformed the way you prepare data. With Paxata's Adaptive Data Preparation solution, every business analyst can:
Import data from Hadoop and other sources regardless of format (Excel, Flat Files, Relational Databases, XML, JSON). Paxata will automatically parse and identify the meaning of the data.
Explore data like never before, with full-text search, interactive textual and numeric filters and histograms, and visual data quality heat maps highlighting errors, duplicates, and missing data. Then transform data on the fly (split columns, concatenate, de-duplicate, detect and remediate blanks and other errors) without any scripting, SQL, or complex Excel functionality like VLOOKUPs, pivot tables, and macros.
Enrich data with additional firmographic, demographic, and social data based on the context of your analytics.
Share and work simultaneously with peers on data sets without a pre-determined workflow. The Paxata Data Time Machine, the industry's first cloud-based data lineage solution, provides security, versioning, and usage visibility on every project.
Combine as the system detects common attributes across multiple data sets, join them into a single view, and merge multiple entity references into a single, trusted master entity: the Paxata Answer Set.
Publish your Answer Sets into BI tools like Tableau, QlikView, and Excel so you use the visualization and discovery solutions you love for analytics.
Get a Paxata demo and trial today. Visit www.paxata.com.

Tableau Software
www.tableausoftware.com
BI CATEGORIES: Analytics and Reporting; Business Intelligence; Dashboards, Scorecards, and Visualization; Enterprise Business Analytics; Enterprise Business Intelligence; Predictive Analytics

Tableau Software helps people see and understand data. Anyone can analyze, visualize, and share information quickly. More than 15,000 customer accounts get rapid results with Tableau in the office and on the go. And tens of thousands of people use Tableau Public to share data in their blogs and websites. See how Tableau can help you by downloading the free trial at www.tableausoftware.com/tdwi.


About TDWI
TDWI, a division of 1105 Media,
Inc., is the premier provider of in-depth, high-quality education and
research in the business intelligence
and data warehousing industry.
TDWI is dedicated to educating
business and information technology
professionals about the best practices, strategies, techniques, and
tools required to successfully design,
build, maintain, and enhance
business intelligence and data warehousing solutions. TDWI also fosters
the advancement of business
intelligence and data warehousing
research and contributes to
knowledge transfer and the
professional development of its
members. TDWI offers a worldwide
membership program, five major
educational conferences, topical
educational seminars,
role-based training, onsite courses,
certification, solution provider
partnerships, an awards program
for best practices, live Webinars,
resourceful publications, an
in-depth research program, and a
comprehensive Web site, tdwi.org.

TDWI Education has even more to offer. Visit tdwi.org/education for a full lineup of Solution Summits, Solution Spotlights, Forums, and BI Executive Summits.
Premium Membership
tdwi.org/premiummembership

In a challenging and ever-changing business intelligence and data warehousing environment, TDWI Premium Membership offers a cost-effective solution for maintaining your competitive edge. TDWI will provide you with a comprehensive and constantly growing selection of industry research, news and information, and online resources. TDWI offers a cost-effective way to keep your entire team current on the latest trends and technologies. TDWI's Team Membership program provides significant discounts to organizations that register individuals as TDWI Team Members.

Chapters
tdwi.org/chapters

TDWI sponsors chapters in regions throughout the world to foster education and networking at the local level among business intelligence and data warehousing professionals. Chapter meetings are open to any BI/DW professional. Please visit our Web site to find a local chapter in your area.

World Conferences
tdwi.org/conferences

TDWI World Conferences provide a unique


opportunity to learn from world-class instructors, participate in one-on-one sessions with
industry gurus, peruse hype-free exhibits, and
network with peers. Each six-day conference
features a wide range of content that can help
business intelligence and data warehousing
professionals deploy and harness business
intelligence on an enterprisewide scale.

Seminar Series
tdwi.org/seminars

TDWI Seminars offer a broad range of courses


focused on the skills and techniques at the
heart of successful business intelligence
and data warehousing implementations. The
small class sizes and unique format of TDWI
Seminars provide a high-impact learning
experience with significant student-teacher
interactivity. TDWI Seminars are offered at
locations throughout the United States
and Canada.

Onsite Education
tdwi.org/onsite

TDWI Onsite Education is practical, high-quality, vendor-neutral BI/DW education brought to your location. With TDWI Onsite Education, you maximize your training budget as your team learns practical skills they can apply to current projects, with onsite training tailored to their specific needs.

Certified Business Intelligence Professional (CBIP)
tdwi.org/cbip

Convey your experience, knowledge, and expertise with a credential respected by employers and colleagues alike. CBIP is an exam-based certification program that tests industry knowledge, skills, and experience within four areas of specialization, providing the most meaningful and credible certification available in the industry.

Webinar Series
tdwi.org/webinars

TDWI Webinars deliver unbiased information


on pertinent issues in the business intelligence and data warehousing industry. Each
live Webinar is roughly one hour in length and
includes an interactive question-and-answer
session following the presentation.

