
Business Analytics

Sub Code - 705

Developed by
Prof. Abasaheb Chavan

On behalf of
Prin. L.N. Welingkar Institute of Management Development & Research
Advisory Board
Chairman
Prof. Dr. V.S. Prasad
Former Director (NAAC)
Former Vice-Chancellor
(Dr. B.R. Ambedkar Open University)

Board Members
1. Prof. Dr. Uday Salunkhe – Group Director, Welingkar Institute of Management
2. Dr. B.P. Sabale – Chancellor, D.Y. Patil University, Navi Mumbai; Ex Vice-Chancellor (YCMOU)
3. Prof. Dr. Vijay Khole – Former Vice-Chancellor (Mumbai University)
4. Prof. Anuradha Deshmukh – Former Director (YCMOU)

Program Design and Advisory Team

Prof. B.N. Chatterjee
Dean – Marketing
Welingkar Institute of Management, Mumbai

Mr. Manish Pitke
Faculty – Travel and Tourism
Management Consultant

Prof. Kanu Doshi
Dean – Finance
Welingkar Institute of Management, Mumbai

Prof. Dr. V.H. Iyer
Dean – Management Development Programs
Welingkar Institute of Management, Mumbai

Mr. Smitesh Bhosale
Faculty – Media and Advertising
Founder of EVALUENZ

Prof. Vineel Bhurke
Faculty – Rural Management
Welingkar Institute of Management, Mumbai

Prof. Venkat Iyer
Director – Intraspect Development

Dr. Pravin Kumar Agrawal
Faculty – Healthcare Management
Manager Medical – Air India Ltd.

Prof. Dr. Pradeep Pendse
Dean – IT/Business Design
Welingkar Institute of Management, Mumbai

Mrs. Margaret Vas
Faculty – Hospitality
Former Manager-Catering Services – Air India Ltd.

Prof. Sandeep Kelkar
Faculty – IT
Welingkar Institute of Management, Mumbai

Mr. Anuj Pandey
Publisher
Management Books Publishing, Mumbai

Prof. Dr. Swapna Pradhan
Faculty – Retail
Welingkar Institute of Management, Mumbai

Prof. Bijoy B. Bhattacharyya
Dean – Banking
Welingkar Institute of Management, Mumbai

Mr. P.M. Bendre
Faculty – Operations
Former Quality Chief – Bosch Ltd.

Mr. Ajay Prabhu
Faculty – International Business
Corporate Consultant

Mr. A.S. Pillai
Faculty – Services Excellence
Ex Senior V.P. (Sify)

Course Editor

Prof. Dr. P.S. Rao
Dean – Quality Systems
Welingkar Institute of Management, Mumbai

Prof. B.N. Chatterjee
Dean – Marketing
Welingkar Institute of Management, Mumbai

Course Coordinators

Prof. Dr. Rajesh Aparnath
Head – PGDM (HB)
Welingkar Institute of Management, Mumbai

Ms. Kirti Sampat
Manager – PGDM (HB)
Welingkar Institute of Management, Mumbai

Mr. Kishor Tamhankar
Manager (Diploma Division)
Welingkar Institute of Management, Mumbai

COPYRIGHT © by Prin. L.N. Welingkar Institute of Management Development & Research.


Printed and Published on behalf of Prin. L.N. Welingkar Institute of Management Development & Research, L.N. Road, Matunga (CR), Mumbai - 400 019.

ALL RIGHTS RESERVED. No part of this work covered by the copyright here on may be reproduced or used in any form or by any means – graphic,
electronic or mechanical, including photocopying, recording, taping, web distribution or information storage and retrieval systems – without the written
permission of the publisher.

NOT FOR SALE. FOR PRIVATE CIRCULATION ONLY.

1st Edition, June 2020


BIO SKETCH

Abasaheb Chavan is a professional banker currently working with one of the fastest growing private sector banks as a Consultant/Advisor and Trainer for Foreign Exchange and Trade Products. He has more than 43 years of banking experience.

His areas of expertise include International Trade Finance; Corporate and Retail Banking; Rupee Drawing Arrangements with International Banks and Exchange Houses; NRI business; External Commercial Borrowings (ECB); capital investment under Foreign Direct Investment in India (FDI); Overseas Direct Investment (ODI); and the setting up of offices in India by overseas entities, such as Liaison Offices (LO), Branch Offices (BO) and Project Offices (PO), as well as establishing offices overseas.

Mr. Abasaheb Chavan is experienced in the preparation of policies and procedural guidelines for various trade products and other regulatory matters, including project finance, mergers, acquisitions, amalgamations and takeovers, as well as other capital account deals pertaining to venture capital and crowdfunding.

Mr. Abasaheb Chavan was an Executive Committee Member of the International Chamber of Commerce (ICC), Paris/Delhi; a Managing Committee Member of the Foreign Exchange Dealers Association of India (FEDAI), Mumbai; an Examiner in Foreign Exchange and Risk Management for the Indian Institute of Banking and Finance (IIBF); and a Nodal Officer to the Reserve Bank of India.

Mr. Chavan holds a Master's degree in Science (M.Sc.) and is a Certified Associate of the Indian Institute of Banking and Finance (CAIIB).


He is also a professional trainer, providing training to bank employees at various levels on a range of subjects. He provides advisory services, conducts seminars for importers, exporters and investors, and has participated in various conferences organised by renowned bodies and government/semi-government organisations.

CONTENTS

Chapter No.  Chapter Name                                                Page No.

1   Business Analytics: Overview                                         6-43
2   Components of Business Analytics                                     44-92
3   Digital Data and its Types                                           93-136
4   Business Intelligence                                                137-162
5   Big Data                                                             163-197
6   Data Mining                                                          198-228
7   Descriptive Analytics                                                229-256
8   Diagnostic Analytics                                                 257-275
9   Predictive Analytics                                                 276-298
10  Prescriptive Analytics                                               299-325
11  Business Analytics Process                                           326-350
12  Business Analytics Applications                                      351-378
13  Programming Languages and Software Used in Data Analytics            379-411
14  Business Analytics and Digital Transformation                        412-439
15  Case Studies in Business Analytics                                   440-468


Chapter 1
BUSINESS ANALYTICS: OVERVIEW
Objectives:

This chapter provides an overview of Business Analytics, considering the major elements that appear whenever we talk about business analytics. There are, however, some other important elements that require detailed understanding; these will be covered in separate chapters. On completion of this chapter, you will understand the overall concept of Business Analytics, including the need for Business Analytics and its components, types and techniques, in brief, considering the following:

Structure:

1.1 Introduction

1.2 Business Analytics - Need

1.3 Developing Analytical Thinking

1.4 Components of Business Analytics

1.5 Types of Business Analytics

1.6 Analytical Models

1.7 Business Analytics, Reporting and Intelligence

1.8 Future of Business Analytics and Emerging Trends

1.9 Future of Consulting and Analytics

1.10 Introduction to New Analytical Architecture

1.11 Growth of Advanced Analytical Techniques

1.12 Business Analytics Examples

1.13 Summary

1.14 Self Assessment Questions


1.15 Multiple Choice Questions


1.1 INTRODUCTION

In the era of the knowledge economy, getting the right information to decision
makers at the right time is critical to their business success. One such
attempt includes the growing use of business analytics. Generally
speaking, business analytics refers to a broad use of various quantitative
techniques such as statistics, data mining, optimization tools, and
simulation supported by the query and reporting mechanism to assist
decision makers in making more informed decisions within a closed-loop
framework seeking continuous process improvement through monitoring
and learning. Business analytics also helps the decision maker predict the
future business activities based on the analysis of historical patterns of
past business activities. For example, your nearby grocery chain, such as
“Big Bazar”, might frequently issue discount coupons tailored for each
customer based on his past shopping patterns. This practice encourages
the customer to consider buying the discounted but favorite items
repeatedly, while building customer loyalty. This practice is possible due to
the smart use of business analytics, which allows the grocery store to figure out
which items are likely to be purchased by which customer in his next
grocery shopping trip.

Analytics is the science of analyzing data for decision making. Then,
what is Business Analytics? Business analytics is the process of collating,
sorting, processing, and studying business data, and using statistical
models and methodologies to transform data into business insights. The
goal of business analytics is to determine which datasets are useful and
how they can be leveraged to solve problems and increase efficiency,
productivity, and revenue.

Although business analytics has been rapidly gaining popularity among practitioners and academicians alike in the recent past, its conceptual foundation has existed for centuries. One of the first forms of business analytics may be statistics, whose uses can be traced back at least to biblical times in ancient Egypt, Babylon, and Rome. Regardless of historical facts, its longevity may be attributed to its usefulness in helping the policy maker take a better decision.

In other words, whatever the form of business analytics may be, it would
help us answer the following fundamental questions which are critical for
decision making:


1. What happened?
• What did the data tell us?

2. Why did a certain event take place?
• Why did it happen?
• What are the sources of problems?

3. Will the same event take place?
• Will the problem recur?
• Are there any noticeable patterns of the problem?

4. What will happen if we change what we used to do?
• How can we deal with the recurring problem?
• What is the value the change will bring?

5. How can we ensure that our changed practices actually work?
• Is there scientific evidence indicating the validity and usefulness of our
changed practices?

By answering the preceding questions, business analytics aims to accomplish the following goals:

• Gaining insights into business practices and customer behaviors: Business analytics is designed to transform unstructured, non-standardized big data originating from multiple sources into meaningful information that supports better business decisions.

• Improving predictability: By deriving insights into customer behavioral patterns and market trends, business analytics can improve the organization's ability to make demand forecasts more accurately.

• Identifying risk: Risk cannot be managed without identifying it and
then preparing for it. Business analytics can function as an early warning
system for detecting the signs or symptoms of potential troubles by
dissecting the business patterns (e.g., shrinking market share, a higher
rate of customer defection, declining stock price).

• Improving the effectiveness of communication: With its query and reporting mechanism, business analytics can not only speed up reporting procedures but also provide user-friendly reports, including

“what-if” scenarios. Such reports can be a valuable communication tool
among the decision makers and thus would help the management team
make more timely and accurate business decisions.

• Enhancing operating efficiency: By aiding the decision maker in
understanding the way business works and where the greatest business
opportunities are, business analytics can decrease the chances of making
poor investment decisions and misallocating the company’s resources
and thus would help improve the company’s operating efficiency.

1.2 BUSINESS ANALYTICS - NEED:

Today's businesses are growing increasingly digital and are capable of accurately measuring every aspect of their operations, from marketing to human resources, in real time.

However, data in its raw form is usually useless; the driving force behind any data-driven organization is insights: conclusions drawn from data that can suggest new courses of action. To reach these insights, an organization must use business analytics tools and techniques to connect data from multiple sources, analyze the data, and communicate the results in a way that decision makers can understand. Typically, commercial organizations use business analytics in order to:

• Analyze data from multiple sources
• Use advanced analytics and statistics to find hidden patterns in large datasets
• Disseminate information to relevant stakeholders through interactive dashboards and reports
• Monitor KPIs (Key Performance Indicators) and react to changing trends in real time
• Justify and revise decisions based on up-to-date information


If your business is looking to achieve one or more of these goals, business analytics is the way to go. The level of investment in tools, technology and manpower should vary according to your needs: in some cases, increasing your proficiency in Excel might suffice, while in others you might want to look at specialized solutions from BI (Business Intelligence) software vendors. Analytics adds value in three ways: Insight, Prediction and Optimization.

Insight delivers an accurate and deep understanding of the present. For example, a customer value model tells you why some customers are higher value, rather than just identifying them. Once you understand these reasons, you can focus on actions to obtain and retain high-value customers and spend less on lower-value customers.

Prediction uses what you currently know about your business to discover what will happen next. Imagine the value of knowing your customers' likely future value and their receptiveness to future sales from the moment they sign on. Then you could place them in the appropriate channel from the beginning and improve marketing bang for the buck. Or imagine forecasting KPIs based on all the data collected by your organization rather than just extrapolating trends.

Optimization turns analytics into action, bolstering your business intuition with data to narrow many alternatives down to the best one. A next-best-action model triggers the marketing approach most likely to yield success for a particular customer, based on segment and prior behavior. For customers on the verge of churning, a short-term discount might keep them on board, while a high-value customer may respond to a new "Enterprise" offering that meets their changing needs.


1.3 DEVELOPING ANALYTICAL THINKING

Analytical and Critical Thinking

Goal
• Analytical Thinking: Seek answers/solutions for ongoing issues
• Critical Thinking: Determine what is right or wrong

The Use of Facts
• Analytical Thinking: To support your conclusions
• Critical Thinking: To form your own opinions and beliefs

Style of Thinking
• Analytical Thinking: Streamlined problem-solving approach using a step-by-step breakdown of the cause-and-effect relationships of events/datasets
• Critical Thinking: Opinion-based approach with constant reasoning and questioning

Mind-sets
• Analytical Thinking: Organized system of thoughts
• Critical Thinking: Open-minded

Interpretation of Information
• Analytical Thinking: Information is useful for understanding certain events and explaining patterns/trends; thus, it helps gain insights into problems to be solved.
• Critical Thinking: No information is considered valid, true, applicable, and accurate automatically without clear evidence.

Key Tools
• Analytical Thinking: Mind-maps, flow diagrams, mathematical tools
• Critical Thinking: Brainstorming sessions, open forums for arguments

Developing analytical thinking is the first step of business analytics,
because it will determine which data should be gathered, where these data
should be acquired, how these data should be analyzed to extract
meaningful information, and how such information can be exploited to
address ongoing business issues. Analytical thinking, however, should not
be confused with critical thinking.

The table above summarizes the subtle differences between analytical and critical thinking.


Given the importance of analytical thinking to the successful application of
business analytics, the development of analytical thinking (or nurturing
analytical thinking skills) should precede the adoption of business analytics.

The following summarizes ways to develop analytical thinking in a
systematic manner:

1. Fact-finding and checking through thought experiments: Thought
experiments involve hypothesizing “what-if” scenarios in the imaginary
world and allow us to see the outcome (what will happen) if we select a
certain decision. If the repeated experiments result in the same
outcome, the pattern emerging from those experiments can be a basis
for facts. Also, we can map out the individual outcome resulting from
each choice of the decision.

2. Raising the correct line of reasoning for what you read, learned,
and wrote in the past: Unless we verify the validity of information
sources (e.g., books, published articles, digital media), we can be sold
on logical fallacies and may end up making wrong decisions. To obviate
such mistakes, we should be aware of common logical fallacies based on
the faulty reasoning. For example, a premise based on the inverse
reasoning stating “If you do not reduce product price, it will not affect
product value and thus will not hurt the potential sales of that
product” can lead to no business action when the product at the current
price is not selling well in the market. Instead, we should have
developed the proper reasoning stating “If you reduce product price, it
will improve product value for potential customers and thus increase its
sales.” Other potential sources of logical fallacies may include the hasty
generalization of one-time instance or limited anecdotal incidents, the
unconditional belief in the high authority’s opinions, and the false
association (e.g., a turtle brings good luck to the individual who owns it
as a pet, although it is more likely to contain deadly salmonella bacteria
than other pets).


3. Building a habit of thinking without preconceived notions or
biases: To avoid bias traps, one should bring various perspectives
(viewpoints) in a broad spectrum and figure out what we are taking for
granted. The thought process originating from many different angles will
lower the chance of getting trapped in one flat point of view reflecting
one’s bias or partial facts.

4. Structural thinking: Analytical thinking should guide us to develop meaningful inferences out of gathered data. However, it would be hard for us to make meaningful inferences and draw conclusions without structured, step-by-step thought processes. These processes may include the following steps:

a. Setting purposes of solving the given problems emanating from
particular events, practices, and behaviors.

b. Raising questions about the nature of the given problems.

c. Gathering data and facts associated with the observed problems.

d. Utilizing well-established concepts, theories, axioms, laws, principles,
and models to dissect data and extract meaningful information.

e. Making inferences and drawing conclusions under proper
assumptions.

f. Understanding and generating implications of new problem solutions
found from inferences and conclusions.


1.4 COMPONENTS OF BUSINESS ANALYTICS

Business analytics solutions share a set of common components, though implementations differ in a few key ways. The components of business analytics include:

• Data Aggregation

Before data can be analyzed, it must be collected, centralized, cleaned to avoid duplication, and filtered to remove inaccurate, incomplete, and unusable records. Data can be aggregated from:

• Transactional records: Records that are part of a large dataset shared
by an organization or by an authorized third party (banking records,
sales records, and shipping records).

• Volunteered data: Data supplied via a paper or digital form that is
shared by the consumer directly or by an authorized third party (usually
personal information).

The best example of data aggregation in practice comes from the travel industry.

Data aggregation can be used for a wide range of purposes in the travel
industry. These include competitive price monitoring, competitor research,
gaining market intelligence, customer sentiment analysis, and capturing
images and descriptions for the services on their online travel sites.
Competition in the online travel industry is fierce, so data aggregation or
the lack thereof can make or break a travel company.

Travel companies need to keep up with the ever-changing travel costs and
property availability. They also need to know which destinations are
trending and which audiences they should target with their travel offers.
The data needed to gain these insights is spread across many places on the
internet, making it difficult to gather manually.
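To make the aggregation step concrete, here is a minimal Python sketch using pandas. The two source tables and their columns (booking_id, city, price) are illustrative assumptions, standing in for data pulled from real booking systems.

import pandas as pd

# In practice these would come from pd.read_csv or an API; built inline here
site_a = pd.DataFrame({"booking_id": [1, 2, 3],
                       "city": ["Goa", "Pune", "Goa"],
                       "price": [120.0, 80.0, None]})
site_b = pd.DataFrame({"booking_id": [3, 4],
                       "city": ["Goa", "Delhi"],
                       "price": [115.0, 95.0]})

# Centralize: combine the sources into one table
bookings = pd.concat([site_a, site_b], ignore_index=True)

# Clean: drop duplicate bookings, remove incomplete or unusable rows
bookings = bookings.drop_duplicates(subset="booking_id")
bookings = bookings.dropna(subset=["price", "city"])
bookings = bookings[bookings["price"] > 0]

# Aggregate: average price per destination, ready for analysis
print(bookings.groupby("city")["price"].mean())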


• Data Mining

In the search to reveal and identify previously unrecognized trends and patterns, models can be created by mining through vast amounts of data. Data mining employs several statistical techniques, including:

• Classification: Used when variables such as demographics are known
and can be used to sort and group data

• Regression: A function used to predict continuous numeric values,
based on extrapolating historical patterns

• Clustering: Used when factors used to classify data are unavailable,
meaning patterns must be identified to determine what variables exist
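As a minimal sketch of these three techniques, the following snippet uses scikit-learn on a small synthetic dataset; the numbers and variable names are illustrative, not drawn from the text.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # two numeric features
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)    # known group labels
y_value = 3.0 * X[:, 0] + rng.normal(size=200)   # continuous target

# Classification: sort records into known groups
clf = LogisticRegression().fit(X, y_class)

# Regression: predict a continuous numeric value from historical patterns
reg = LinearRegression().fit(X, y_value)

# Clustering: discover groups when class labels are unavailable
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

print(clf.predict(X[:3]), reg.predict(X[:3]), clusters[:3])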

Let us see in the following two examples how data mining works in practice:

1. Banking - Banks use data mining to better understand market risks. It is commonly applied to credit ratings and to intelligent anti-fraud systems that analyse account transactions, card transactions, purchasing patterns and customer financial data. Data mining also allows banks to learn more about customers' online preferences or habits to optimise the return on their marketing campaigns, study the performance of sales channels or manage regulatory compliance obligations.

2. Medicine - Data mining enables more accurate diagnostics. Having all
of the patient's information, such as medical records, physical
examinations, and treatment patterns, allows more effective treatments
to be prescribed. It also enables more effective, efficient and cost-
effective management of health resources by identifying risks,
predicting illnesses in certain segments of the population or forecasting
the length of hospital admission. Detecting fraud and irregularities, and
strengthening ties with patients with an enhanced knowledge of their
needs are also advantages of using data mining in medicine.


• Association and Sequence Identification

In many cases, consumers perform similar actions at the same time or
perform predictable actions sequentially. This data can reveal patterns such
as:

❖ Association: For example, two different items frequently being purchased in the same transaction, such as multiple books in a series or a toothbrush and toothpaste (see the market-basket sketch at the end of this section).

❖ Sequencing: For example, a consumer requesting a credit report
followed by asking for a loan or booking an airline ticket, followed by
booking a hotel room or reserving a car.

❖ Text Mining: Companies can also collect textual information from social
media sites, blog comments, and call center scripts to extract meaningful
relationship indicators. This data can be used to:

- Develop in-demand new products
- Improve customer service and experience
- Review competitor performance

❖ Forecasting: A forecast of future events or behaviors based on historical
data can be created by analyzing processes that occur during a specific
period or season. For example:

- Energy demands for a city with a static population in any given month
or quarter
- Retail sales for holiday merchandise, including biggest sales days for
both physical and digital stores
- Spikes in internet searches related to a specific recurring event, such
as the Super Bowl or the Olympics

❖ Predictive Analytics: Companies can create, deploy, and manage
predictive scoring models, proactively addressing events such as:

- Customer churn with specificity narrowed down to customer age
bracket, income level, lifetime of existing account, and availability of
promotions


- Equipment failure, especially in anticipated times of heavy use or if
subject to extraordinary temperature/humidity-related stressors
- Market trends including those taking place entirely online, as well as
patterns which may be seasonal or event-related

❖ Optimization

Companies can identify best-case scenarios and next best actions by
developing and engaging simulation techniques, including:

- Peak sales pricing and using demand spikes to scale production and
maintain a steady revenue flow
- Inventory stocking and shipping options that optimize delivery
schedules and customer satisfaction without sacrificing warehouse
space
- Prime opportunity windows for sales, promotions, new products, and
spin-offs to maximize profits and pave the way for future opportunities

❖ Data Visualization

Information and insights drawn from data can be presented with highly
interactive graphics to show:

- Exploratory data analysis
- Modeling output
- Statistical predictions

These data visualization components allow organizations to leverage their
data to inform and drive new goals for the business, increase revenues,
and improve consumer relations.
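As promised under the Association item above, here is a minimal market-basket sketch using the open-source mlxtend library; the four transactions are tiny illustrative examples, not real data.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["toothbrush", "toothpaste"],
    ["toothbrush", "toothpaste", "floss"],
    ["book1", "book2"],
    ["toothpaste", "floss"],
]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
basket = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Find itemsets frequently bought together, then derive association rules
itemsets = apriori(basket, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])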


1.5 TYPES OF BUSINESS ANALYTICS

There are four types of business analytics, each increasingly complex and
closer to achieving real-time and future situation insight application. These
analytics types are usually implemented in stages, starting with the
simplest, though one type is not more important than another as all are
interrelated.

The following brief descriptions provide insight into the role of each type in the analytics process. By leveraging these four types of analytics, big data can be dissected, absorbed, and used to create solutions for many of the biggest challenges facing businesses today.


1. Descriptive Analytics

Descriptive analytics describes or summarizes a business’s existing data to
get a picture of what has happened in the past or is happening currently. It
is the simplest form of analytics and employs data aggregation and mining
techniques. This type of business analytics applies descriptive statistics to
existing data to make it more accessible to members of an organization,
from investors and shareholders to marketing executives and sales
managers.

Descriptive analytics can help identify strengths and weaknesses and
provide insight into customer behavior. Strategies can then be developed
and deployed in the areas of targeted marketing and service improvement;
albeit at a more basic level than if more complex diagnostic procedures
were used. The most common physical product of descriptive analysis is a
report heavy with visual statistical aids.

Examples of descriptive analytics:

Many Learning Management Systems (LMS) platforms and learning systems offer descriptive analytical reporting with the aim of helping businesses and institutions measure learner performance, to ensure that training goals and targets are met.

The findings from descriptive analytics can quickly identify areas that
require improvement - whether that be improving learner engagement or
the effectiveness of course delivery.

Here are some examples of how descriptive analytics is being used
in the field of learning analytics:

• Tracking course enrollments and course compliance rates
• Recording which learning resources are accessed and how often
• Summarizing the number of times a learner posts in a discussion board
• Tracking assignment and assessment grades
• Comparing pre-test and post-test assessments
• Analyzing course completion rates by learner or by course


• Collating course survey results
• Identifying length of time that learners took to complete a course
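A minimal sketch of this kind of descriptive reporting, using pandas on hypothetical LMS records (the column names and values are illustrative assumptions):

import pandas as pd

records = pd.DataFrame({
    "learner":   ["a", "b", "c", "a", "b", "c"],
    "course":    ["101", "101", "101", "102", "102", "102"],
    "completed": [True, True, False, True, False, False],
    "score":     [78, 85, None, 92, 60, None],
})

# Completion rate by course (descriptive: what has happened)
print(records.groupby("course")["completed"].mean())

# Summary statistics for assessment scores
print(records["score"].describe())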

2. Diagnostic Analytics

Diagnostic analytics shifts from the “what” of past and current events to
“how” and “why,” focusing on past performance to determine which factors
influence trends. This type of business analytics employs techniques such
as drill-down, data discovery, data mining, and correlations to uncover the
root causes of events.

Diagnostic analytics uses probabilities, likelihoods, and the distribution of
outcomes to understand why events may occur and employs techniques
including attribute importance, sensitivity analysis, and training algorithms
for classification and regression. However, diagnostic analysis has limited
ability to provide actionable insights, delivering correlation results as
opposed to confirmed causation. The most common physical product of
diagnostic analysis is a business dashboard.

A good example of diagnostic analytics is healthcare data analytics with real-time alerting.

In hospitals, Clinical Decision Support (CDS) software analyzes medical data on the spot, providing health practitioners with advice as they make prescriptive decisions. However, doctors want patients to stay away from hospitals to avoid costly in-house treatments. Analytics, already trending as one of the business intelligence buzzwords in 2019, has the potential to become part of a new strategy: wearables will collect patients' health data continuously and send this data to the cloud.

Additionally, this information will be added to a database on the state of health of the general public, which will allow doctors to compare this data in its socioeconomic context and modify delivery strategies accordingly. Institutions and care managers will use sophisticated tools to monitor this massive data stream and react whenever the results are alarming.

For example, if a patient’s blood pressure increases alarmingly, the system
will send an alert in real time to the doctor who will then take action to
reach the patient and administer measures to lower the pressure.
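A minimal sketch of this alerting logic in Python; the threshold and the simulated readings are purely illustrative assumptions:

SYSTOLIC_LIMIT = 180  # illustrative alert threshold, mmHg

def check_reading(patient_id, systolic):
    # In a real system this would notify the doctor; here we just print
    if systolic >= SYSTOLIC_LIMIT:
        print(f"ALERT: patient {patient_id} systolic reading {systolic} mmHg")

# Simulated stream of incoming readings
for patient_id, systolic in [("p1", 120), ("p2", 185), ("p1", 150)]:
    check_reading(patient_id, systolic)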


Another example is that of Asthmapolis, which has started to use inhalers with GPS-enabled trackers in order to identify asthma trends, both at the individual level and across larger populations. This data is being used in conjunction with data from the CDC in order to develop better treatment plans for asthmatics.

3. Predictive Analytics

Predictive analytics forecasts the possibility of future events using
statistical models and machine learning techniques. This type of business
analytics builds on descriptive analytics results to devise models that can
extrapolate the likelihood of select outcomes. Machine learning experts and
trained data scientists are typically employed to run predictive analysis
using learning algorithms and statistical models, enabling a higher level of
predictive accuracy than is achievable by business intelligence alone.

A common application of predictive analytics is sentiment analysis. Existing
text data can be collected from social media to provide a comprehensive
picture of opinions held by a user. This data can be analyzed to predict
their sentiment towards a new subject (positive, negative, neutral). The
most common physical product of predictive analysis is a detailed report
used to support complex forecasts in sales and marketing.
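A minimal sketch of such a sentiment model, using a TF-IDF plus logistic regression pipeline from scikit-learn; the four training texts are illustrative stand-ins for collected social media data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["love this product", "terrible service", "great value", "awful quality"]
labels = ["positive", "negative", "positive", "negative"]

# Vectorize the text, then fit a classifier that predicts sentiment
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the service was great"]))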

Each industry and sector puts predictive analytics to work in
different ways. We break them down by industry and use case as
examples.

• Retail: Probably the largest sector to use predictive analytics, retail is
always looking to improve its sales position and forge better relations
with customers. One of the most ubiquitous examples is Amazon’s
recommendations. When you make a purchase, it puts up a list of other
similar items that other buyers purchased. Much of this is in the pre-sale
area – with things like sales forecasting and market analysis, customer
segmentation, revisions to business models, aligning IT to business units,
managing inventory to account for seasonality, and finding best retail
locations. But it also acts post-sale, working to reduce returns, bring the customer back, and extend warranty sales.


• Health: One early attempt at this was Google Flu Trends (GFT). By
monitoring millions of users’ health tracking behaviors online and
comparing it to a historic baseline level of influenza activity for a
corresponding region, Google hoped to predict flu patterns. But its
numbers proved to be way overstated, owing to less than ideal
information from users. But there are other uses, such as predicting
epidemics or public health issues based on the probability of a person
suffering the same ailment again, or predicting the chances that a person with a known illness will end up in intensive care due to changes in environmental conditions. It can also predict when and why patients are readmitted, and when a patient needs behavioral health care.

• Sports: The most famous example is Bing Predicts, a prediction system by Microsoft's Bing search engine. It has scored in the 80th percentile for singing contests like American Idol, in the high 90s for U.S. House and Senate races, and went 15 for 15 in the 2014 World Cup. It
uses statistics and social media sentiment to make its assessments.

Another example is what's known as "Moneyball," based on a book about how the Oakland Athletics baseball team used analytics and evidence-based data to assemble a competitive team. It abandoned old predictors of success, such as runs batted in, for overlooked ones, like on-base percentage. This approach took the Athletics to two consecutive playoffs.

• Weather: Weather forecasting has improved by leaps and bounds
thanks to predictive analytics models. Today’s five-day forecast is as
accurate as a one-day forecast from the 1980s. Forecasts as long as nine
to 10 days are now possible, and more important, 72-hour predictions of
hurricane tracks are more accurate than 24-hour forecasts from 40 years
ago. All of this is done thanks to satellites monitoring the land and
atmosphere. They feed that data into models that better represent our
atmospheric and physical systems.

• Insurance/Risk Assessment: Despite some awful disasters in 2017,
insurance firms lessened losses within risk tolerances, thanks to
predictive analytics. It helped them set competitive prices in
underwriting, analyze and estimate future losses, catch fraudulent
claims, plan marketing campaigns, and provide better insights into risk
selection.


• Financial modeling: Predictive modeling for financial services helps optimize overall business strategy, revenue generation, resource allocation, and sales. Automated financial services analytics can allow firms to run thousands of models simultaneously and deliver faster results than traditional modeling.

It does this by analyzing strategic business investments, improving daily operations, increasing productivity, and predicting changes to the current and future marketplace. The most common form of predictive analytics in financial services is the credit scoring system used to approve or deny loans, often within minutes.

• Energy: Analytics in power plants can reduce unexpected equipment
failures by predicting when a component might fail, thus helping reduce
maintenance costs and improve power availability.

Utilities can also predict when customers might get a high bill and send out alerts to warn customers that they are running up a large bill that month. Smart meters allow utilities to warn customers of spikes at certain times of the day, helping them know when to cut back on power use.

• Social Media Analysis: Online social media represents a fundamental shift in how information is produced, particularly as it relates to businesses. Tracking user comments on social media outlets enables companies to gain immediate feedback and the chance to respond quickly; nothing makes a merchant respond faster than a bad review on Amazon. This means collecting and sorting through massive amounts of social media data and creating the right models to extract the useful data.

• Alerting and Monitoring: This covers a wide range. In transportation alone, modern automobiles have more than 100 sensors, and some are rapidly approaching 200. This gives a much more accurate report than the old generic Check Engine light. Modern aircraft have close to 6,000 sensors generating more than 2TB of data per day, which cannot be analyzed by human beings with any expedience. Machine learning that recognizes normal behavior, as well as the signs leading up to failure, can help predict a failure long before it happens.


Predictive analytics is needed to help sort what is coming in, weed out useless data, and find what you need in order to take intelligent actions. In one example, Cisco and Rockwell Automation helped a Japanese automation equipment maker reduce downtime of its manufacturing robots to near zero by applying predictive analytics to operational data.

4. Prescriptive Analytics

Prescriptive analytics goes a step beyond predictive analytics, providing
recommendations for next best actions and allowing potential manipulation
of events to drive better outcomes. This type of business analytics is
capable of not only suggesting all favorable outcomes according to a
specified course of action, but recommending specific actions to deliver the
most desired result. Prescriptive analytics relies on a strong feedback
system and constant iterative analysis and testing to continually learn
more about the relationships between different actions and outcomes.

One of the most common uses of prescriptive analytics is the creation of
recommendation engines, which strive to match options to a consumer’s
real-time needs. The key to effective prescriptive analysis is the emergence
of deep learning and complex neural networks, which can micro-segment
data across multiple parameters and timelines simultaneously. The most
common physical product of prescriptive analysis is a focused
recommendation for next best actions, which can be applied to clearly
identified business goals.

These four different types of analytics may be implemented sequentially,
but there is no mandate. In many scenarios, organizations may jump
directly from descriptive to prescriptive analytics thanks to artificial
intelligence, which streamlines the process.

Example: A Training Manager uses predictive analysis to discover that
most learners without a particular skill will not complete the newly
launched course. What could be done? Now prescriptive analytics can be of
assistance on the matter and help determine options for action. Perhaps an
algorithm can detect the learners who require that new course, but lack
that particular skill, and send an automated recommendation that they
take an additional training resource to acquire the missing skill.
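A minimal sketch of that recommendation rule in Python; the learner names and sets are illustrative assumptions:

enrolled_in_course = {"ana", "ben", "cara"}   # learners in the new course
has_prerequisite = {"ana"}                    # learners with the needed skill

def next_best_action(learner):
    # Prescriptive step: recommend an action, not just predict non-completion
    if learner in enrolled_in_course and learner not in has_prerequisite:
        return "recommend prerequisite training module"
    return "no action needed"

for learner in sorted(enrolled_in_course):
    print(learner, "->", next_best_action(learner))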


The accuracy of a generated decision or recommendation, however, is only
as good as the quality of data and the algorithmic models developed. What
may work for one company’s training needs may not make sense when put
into practice in another company’s training department. Models are
generally recommended to be tailored for each unique situation and need.

1.6 ANALYTICAL MODELS:

Analytical models are mathematical models that have a closed-form solution, i.e., the solution to the equations used to describe changes in the system can be expressed as an analytic mathematical function.

An analytical model is simply a mathematical equation that describes the relationship among variables in a historical data set. The equation either estimates or classifies data values. In essence, a model draws a line through a set of data points that can be used to predict outcomes. The analytical model consists of:

• Estimation Model: Algorithms that create analytical models (or equations) come in all shapes and sizes. Classification algorithms such as neural networks, decision trees, clustering and logistic regression use a variety of techniques to create formulas that segregate data values into groups. Online retailers often use these algorithms to create target market segments or determine which products to recommend to buyers based on their past and current purchases.
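To make "drawing a line through a set of data points" concrete, here is a minimal estimation sketch with scikit-learn; the spend and sales figures are synthetic and purely illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression

spend = np.array([[1.0], [2.0], [3.0], [4.0]])  # historical input variable
sales = np.array([2.1, 3.9, 6.2, 8.1])          # observed outcomes

# Fit the line through the historical data points
model = LinearRegression().fit(spend, sales)
print(model.coef_[0], model.intercept_)

# Use the fitted equation to estimate the outcome for a new input
print(model.predict(np.array([[5.0]])))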

• Classification Models:

❖ Trusting Models: Some models are more opaque than others, i.e., it is hard to understand the logic the model used to identify relevant patterns and relationships in the data. The problem with these "black box" models is that business people often have a hard time trusting them until they see quantitative results, such as reduced cost or higher revenue. Getting business users to understand and trust the output of analytical models is perhaps the biggest challenge in data mining.

❖ Modelling Process: Given the power of analytical models, it is important that analytical modellers take a disciplined approach. They need to adhere to a methodology to work productively and generate accurate models. The modelling process consists of six distinct tasks:

a. Define the project

b. Explore the data

c. Prepare the data

d. Create the Model

e. Deploy the model

f. Manage the model

Let us understand in brief these 6 tasks:

i. Project definition and objectives: This is the most critical task in the process. Modellers who do not know explicitly what they are trying to accomplish will not be able to create useful analytical models. Thus, before they start, good analytical modellers spend a lot of time defining objectives, impact and scope.

The project objective consists of the assumptions or hypotheses that the model will evaluate. Often, it helps to brainstorm hypotheses and then prioritise them based on business requirements. Project impact defines the model output (report or chart), how the business will use the output, and the projected return on investment. Project scope defines the who, what, where, when, why and how of the project, including timelines and staff assignments.

ii. Data exploration: Data exploration/discovery involves sifting through various sources of data to find the data sets that best fit the project. During this phase the analytical modeller will document each potential data set with the following items:


a. Access method: source system, data interface, machine formats, access rights and data availability.

b. Data characteristics: field names, field lengths, content, format, granularity and statistics (count, mean, mode, etc.).

c. Business rules: referential integrity rules, defaults and other business rules.

d. Data pollution: data entry errors, misused fields and bogus data.

e. Data completeness: empty and missing values and sparsely populated fields.

f. Data consistency: labels and definitions.

A data warehouse with well-documented data can greatly accelerate the data exploration phase because it already maintains much of this information.

iii. Data preparation: Once analytical modellers document and select the data sets, they must standardise and enrich the data. This means correcting any errors that exist in the data and standardising machine formats, followed by merging and flattening the data into a single wide table, which may consist of hundreds of variables. After this, analytical modellers transform the data as required by the chosen modelling technique.

iv. Create the model (analytical modelling): Analytical modelling is as much art as science. Much of the craft involves knowing which data sets and variables to select and how to format and transform the data for a specific model. Modellers may also use historical data that has enough of the "answers" built into it with a minimal amount of noise. Finally, the modeller must choose the right analytical technique and algorithm, or combination of techniques, to apply to the given hypothesis.
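A minimal sketch of this task: split the historical data, train a candidate technique, and check its accuracy before trusting the model (the data is synthetic and the choice of a decision tree is illustrative):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # historical data with "answers" built in

# Hold out part of the data to test the model on unseen records
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))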


• Analytical models: summary table

Classification
- Use: Assigns new records to a predefined class based on features; used to predict an outcome (yes/no, high/medium/low).
- Techniques: logistic regression, decision trees, neural networks, link analysis.

Forecasting
- Use: Techniques for predicting a numerical outcome.
- Techniques: linear regression, neural networks.

Prediction
- Use: Uses estimation or classification to predict the future behaviour of values.
- Techniques: neural networks, decision trees, link analysis, genetic algorithms, market basket analysis.

Affinity grouping
- Use: Finds rules that define which items go together; good for market basket, cross-selling and root cause analysis.
- Techniques: market basket analysis, memory-based reasoning, link analysis.

Clustering
- Use: Finds natural groupings of things that are more like each other than like the members of other clusters.
- Techniques: neural networks, decision trees, cluster detection, market basket analysis, memory-based reasoning.


• Analytical techniques explained:

Neural Networks
- Strengths: flexible; mimics the interaction of neurons in the human brain; can handle time-based inputs; can model multiple variables at once.
- Considerations: models are not easily explained; values must be between 0 and 1 with no nulls; not great for categorical variables or for lots of variables.

Decision Trees
- Strengths: models are easy to explain; good for categorical and numerical data; good for creating a subset of fields as input to other techniques.
- Considerations: models can get "bushy" with sparse data and have to be "pruned".

Memory-based Reasoning
- Strengths: finds the values that most resemble the variables to make a prediction; requires little preparation; adapts to new inputs without retraining; works with text.
- Considerations: does not work with numeric variables, only categorical ones; does not work well with lots of variables.

Market Basket Analysis
- Strengths: a form of clustering that creates rules about which items are purchased together.
- Considerations: does not work with numeric variables, only categorical ones.

Genetic Algorithms
- Strengths: uses natural selection, testing each prediction against the others to determine the best one.
- Considerations: not suitable for classification.

Clustering
- Strengths: undirected learning; finds natural groups; a good way to start an analysis.
- Considerations: not predictive.

v. Deploy the model: Model deployment takes many forms. Executives can simply look at the model, absorb its insights, and use it to guide their strategic and operational planning. Models can also be operationalised; the most basic way to operationalise a model is to embed it in an operational report.


vi. Manage the model: Once a model is built and deployed, it must be maintained. Models become obsolete over time as the market or environment in which they operate changes. This is particularly true in volatile environments, such as customer marketing or risk management.

1.7 BUSINESS ANALYTICS, REPORTING AND INTELLIGENCE:

Business analytics and business intelligence tools are being integrated with ERP (Enterprise Resource Planning) systems to facilitate better, more accurate and quicker decision making. Companies have realised that, to maximise the value of the information stored in their ERP systems, it is necessary to extend the ERP architecture to include more advanced reporting, analytical and decision support capabilities. This is best accomplished through the application of data warehousing, data mining and other analysis, reporting and business intelligence tools and techniques. Several factors drive this need:

• The requirement to integrate information not contained in the ERP system.
• ERP databases and files have been designed to optimise the performance of getting data into the application and therefore lack the constructs required for multi-dimensional analysis.
• Most ERP solutions lack the advanced functionality of today's reporting and analytical tools, and for those ERP systems that do have the functionality, companies are hesitant to use it because of the performance impact on their operational systems.

Many ERP solution providers are providing enhanced data extraction functionality as an integral component of their analytic reporting application solutions. Further, there has been an explosion in the number of companies focussed on delivering information extraction functionality optimised for specific ERP solutions. There are a number of unique challenges associated with this information integration, but such systems can significantly extend the value of ERP to the organisation if implemented effectively.


1.8 FUTURE OF BUSINESS ANALYTICS AND EMERGING TRENDS:

The future of analytics lies in its application to unstructured data, such as weblog data and the semantic web. Applications of blog data analysis on the internet have already been reported, and there is ample scope for bloggers and marketers to use analytics themselves and gain insight from the huge data available. There will be a convergence of structured and unstructured data analysis. If partner companies can share private data automatically on a real-time basis, analytics can be applied to it for insight. Through web services, companies can avail themselves of business analytics via software as a service: they need not buy the analytics software or worry about upgrading it, and they pay on a per-user basis. The analytics service provider delivers the data and analytic software online directly to users' computers. The user gets complete information from different sources, both internal and external, and can take the decision. Users need to be empowered to analyse the data rather than taking help from an analytics expert, so analytics models must be simple, easy to understand and open to visual interpretation.

Companies have realised that business processes can be a key differentiator, so demand for embedding analytics in business processes will increase. This will help automate operational decisions, which can be recorded, monitored, audited and reused, and organisations can address performance management and compliance issues. There will be a transition from offline business analytics to real-time analytics; the business processes that can take advantage of real-time analytics include yield management, fraud detection and personalisation. Corporate performance management, business process management and business activity monitoring will become more effective and efficient through the use of advanced analytics.

There will also be an emergence of more open source business analytics tools. Currently, vendors such as Pentaho, Jaspersoft and Actuate BIRT provide open source analytics tools. The war between Google and Microsoft for dominance in the business analytics domain is intense: both companies are trying to leverage users' online data analysis potential for insights. Microsoft offers Analysis Services as its flagship business analytics product, which utilises the power of Excel. To counteract Microsoft, Google offers Google Analytics as free software.


In future, there will be more use of predictive and prescriptive analytics rather than just finding past trends and using dashboards. Visual analytics is another trend in the field. There is a mismatch between the speed at which application data is generated, acquired and analysed; visual representation of complex, dynamic information helps humans take decisions without difficulty, and analytical reasoning is facilitated by interactive visual interfaces. Visual analytics draws on visualisation technologies, data mining, statistics and other disciplines.

Analytics must be part of the overall system in an organisation and hence must be designed accordingly, considering all related issues. The objective of many analytics initiatives is to develop customer-centric applications; organisations should go beyond this and apply analytics in sales, marketing and other key business process areas. A myth exists in companies that analytics must lead to perfect decisions. This is not true: if appropriate analytics are used and the pitfalls of using analytics are avoided, analytics will support better decisions.

1.9 FUTURE OF CONSULTING AND ANALYTICS:

Companies hire consultants to solve complex business challenges: issues that have become increasingly entrenched, requiring the manipulation of large amounts of data to draw tangible insights. Firms want answers to their questions rooted in facts and hard numbers. It is no wonder that analytics has become a critical part of the consulting industry and is poised to be the most desirable skill consultants can have in today's market. Large consulting firms are spending large amounts of money hiring talent with a broad range of data science skills. Several years ago, data scientists may have served primarily as analysts in limited internal roles, solving specific data problems. Now firms are looking to consultants with data science skill sets for a wide variety of consulting projects. Thus, data scientists now wear a large number of hats in the consulting world.

The intersection between data science and consulting is growing due to two major large-scale transformations happening in today's enterprise information environment.


1.10 INTRODUCTION TO NEW ANALYTICAL ARCHITECTURE

With increases in the volume, velocity and variety of data, more tools are available to deal with the "Big Data" problem of capturing and analysing large data sets to create value. Larger firms, specifically, are now faced with the problem of integrating these new tools into their already complex information architecture environments. Traditional relational databases, such as Oracle databases, are no longer enough to keep up with the new types of data that companies are looking to analyse, such as Twitter feeds or call centre recordings. Data storage capabilities in the form of the Hadoop Distributed File System (HDFS) are becoming a staple of companies' information architecture platforms. Other competitors, like Teradata, are also becoming essential in firms' data toolboxes.

On top of these data storage tools, new analytics tools are being deployed to draw analytic insight, with capabilities to extract and source data from the new storage environments. Some tools, such as Python and R, focus on performing advanced analytics techniques, while other tools that provide quick and easy data manipulation in a more intuitive way are also gaining traction. Tools like Tableau and Spotfire give business analysts visual insight through a wide variety of graphs and infographics in the form of visual dashboards. They can source from Big Data storage environments, traditional data warehouses or even simple text and Excel files, and are introducing more integration with languages such as R to increase their analytic capabilities. According to Tableau, a 2013 report by the Aberdeen Group found that in organisations that used visual discovery tools, 48 per cent of BI users could find the information they needed without the help of IT staff.

Without visual discovery, the rate drops to a mere 23 per cent. These are incredibly telling statistics because, while data science is needed in the market, there are still many business analysts who do not necessarily have the development skills to analyse data straight from the source.

These platforms allow analysts to analyse data interactively, giving users a better understanding of the findings than summary booklets of results presented in a flat format. Analysts can quickly change what data they are looking at, as different questions are answered by different data, as well as change how they look at the data. Different views and charts of the same data source can be used to tell stories about the data, allowing viewers to see a fluid relationship between different variables as they see fit: they simply point and click on the visualisation they wish to use. Data scientists can use analytic techniques to present the data in the format needed by a tool such as Tableau, which can then render the output in a clean visual format for the end user. This becomes part of an interwoven information environment in which the downstream deliverables are greatly affected by the data science work stream.

In many cases, companies invest in the tools above to keep up with the pace of data growth and to invest in the future of their data capacity and analysis. These tools are new, and subject matter experts are needed not only to develop relevant applications with them but also to build out strategic plans for deploying the tools across the information environment. Consultants are brought in for both types of information architecture projects for their knowledge of Big Data architecture and their development capabilities.

Consultant data scientists are hired not just for their knowledge of writing code for these tools, but for their understanding of the process flow of the system, its use cases, and plans for deployment. The data scientists doing the grunt work (the coders) are also usually the best people to help create timelines for project plans and strategic proposals on why and how a firm should use Big Data tools to get the most value out of them.

1.11 GROWTH OF ADVANCED ANALYTICAL TECHNIQUES:

Analytics techniques are growing in the Big Data environment for several key reasons. Firstly, because companies can now capture and store new types of data, there is more new data to analyse. Some of the most important and popular data analytics currently enabled by Big Data are:

• Customer analytics
• Marketing analytics
• Web analytics
• Text and speech to text analytics
• Pricing and sales analytics
• Workforce (Human resources) analytics


Because of the way the data for these types of analytics is aggregated, stored, and used, nuanced ways to track, measure, and draw value from the data are needed. As a result, such techniques are being introduced rapidly into today's market.

Secondly, further academic research is bringing to the forefront new types of predictive modelling techniques that have recently been developed and are becoming easier to implement with today's technology, e.g. machine learning algorithms. Machine learning explores “the construction and study of algorithms that can learn from and make predictions on data”. Examples of machine learning techniques include neural networks, clustering analysis, and decision trees.
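As an illustration, the short sketch below, using the open-source scikit-learn library and its bundled Iris dataset (our choice of example, not one from this chapter), trains a decision tree, one of the machine learning techniques named above, and reports its accuracy on held-out data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset and hold out a test portion.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a shallow decision tree and measure how well it predicts unseen data.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))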

Consultants are in an ever-evolving business where new tools and projects continually come to the forefront. Data science is now not just a role for internal analysts and coders; companies are investing more time and resources into bringing these skill sets into their consulting practices to help their clients.

One would be hard pressed to find a real-world, large-scale consulting project that does not require some analytics work to propose tangible solutions based on factual data insight. While analytics is indeed a very broad term as it is currently used in the consulting market, given the way the consulting market is evolving, there will be much larger demand for the data science variety of analytics and a smaller focus on solely strategic projects. New products will come to the market to meet clients' particular needs ever more exactly, and data science will be interwoven tightly with consulting. Consulting firms will compete against each other to capture the influx of analytics projects coming to the market and to hire the data science talent that can fulfil this demand. The tools and techniques for consultant data scientists are ever growing, and today is the best time to enter the data science market as a consultant and grow with it.


1.12 BUSINESS ANALYTICS EXAMPLES

When it comes to business analytics, success often depends on whether or not all parties of an organization fully support adoption and execution. Successful BA examples, and subsequent deployments of new predictive-based initiatives, include:

• Predictive Maintenance: Shell

Royal Dutch Shell PLC recently implemented predictive maintenance driven by artificial intelligence to cut down on time lost to machine failure. The AI-powered tools predict when maintenance is needed on compressors, valves, and other equipment; can autonomously analyze data to help steer drill bits through shale deposits; and will soon be able to identify and alert station employees to dangerous behavior by customers, reducing risks from the drilling platform to the gas pump.

The systems can anticipate when and where more than 3,000 different oil
drilling machine parts might fail, keep Shell informed about the location of
parts at their worldwide facilities, and plan when to make purchases of
machine parts. These systems also determine where to place inventory
items and how long to keep parts before putting them into rotation or
replacing/returning them. Shell has since reduced inventory analysis from
over 48 hours to less than 45 minutes, saving millions of dollars each year
thanks to reduced costs of moving and reallocating inventory.

• Predictive Deliveries: Pitt Ohio

Pitt Ohio, a $700 million freight company, was significantly impacted by Amazon's same-day delivery initiative, which ramped up customer expectations. Customers also became more demanding, requesting up-to-the-minute tracking and estimated times of delivery that were much narrower than formerly acceptable windows. The company turned to data analysis to find a way to improve customer experiences.

A cross-departmental project involving market research, sales operations, and IT was launched internally, leveraging data that was previously unused. Historical data, predictive analytics, and algorithms that calculated freight weight, driving distance, and several other factors in real time allowed Pitt Ohio to estimate delivery times at a 99 percent accuracy rate. The company estimates that repeat orders increased its revenue by $50,000 per year, and customer churn reduction equaled retained revenues of $60,000 per year.

• Predictive Banking: Axis Bank

Axis Bank, the third-largest private sector bank in India, implemented robotic process automation and deep learning to identify customer behavioral patterns and recommend next best actions to prevent customer churn. This included streamlining document processing, identifying “events” after which customers were more likely to leave, and preemptively offering special promotions targeted to those segmented audiences.

For a better customer experience, 125 “customer journeys” were identified, analyzed, and retooled, and the time spent verifying customer-provided data across multiple documents in the back office dropped from 15 minutes to 2-3 minutes. Axis is now developing a chat-bot to speed customer interactions and reduce wait times for service at busy branches and during peak hours.


1.13 SUMMARY

The word analytics has come into the foreground in the last decade or so. The proliferation of the internet and information technology has made analytics very relevant in the current age. Analytics is a field which combines data, information technology, statistical analysis, quantitative methods, and computer-based models into one. All of these are combined to provide decision makers with all the possible scenarios needed to make a well-thought-out and researched decision. Computer-based models ensure that decision makers are able to see the performance of a decision under various scenarios.

Business analytics has a wide range of applications, from customer relationship management, financial management, marketing, supply-chain management, human-resource management, and pricing, to team game strategies in sports.

Business analytics is important in the present world for several reasons:

• Business analytics is a methodology and toolset for making sound commercial decisions. It impacts the functioning of the whole organization; therefore, business analytics can help improve the profitability of the business, increase market share and revenue, and provide better returns to shareholders.

• It facilitates a better understanding of available primary and secondary data, which in turn affects the operational efficiency of several departments.

• It provides a competitive advantage to companies. In this digital age, the flow of information is almost equal for all players; it is how this information is utilized that makes a company competitive. Business analytics combines available data with various well-thought-out models to improve business decisions.

• It converts available data into valuable information. This information can be presented in any required format that is comfortable for the decision maker.

Business analytics has a wide range of applications and usages. It can be used for descriptive analysis, in which data is utilized to understand the past and present situation. This kind of descriptive analysis is used to assess the current market position of the company and the effectiveness of previous business decisions. It is used for predictive analysis, which typically uses past business performance to anticipate future outcomes. Business analytics is also used for prescriptive analysis, which is utilized to formulate optimization techniques for stronger business performance. For example, business analytics is used to determine the pricing of various products in a departmental store based on past and present sets of information.

Business analytics uses data from multiple sources for the construction of a business model. It uses business data such as annual reports, financial ratios, and marketing research, and it uses databases containing various computer files and information coming from data analysis.

There are four types of business analytics, each increasingly complex and closer to achieving real-time and future-situation insight. These types are usually implemented in stages, starting with the simplest, though no one type is more important than another, as all are interrelated.

Similarly, there are different types of data analytics techniques briefed in this chapter. In addition, several important models of business analytics are explained in brief.

With respect to the future of business analytics, it is said that the future of analytics lies in its application to unstructured data, such as weblog data and the semantic web. Applications to blog data on the internet have already been reported. There is ample scope for bloggers and marketers to use analytics themselves and gain insight from the huge volume of data available. There will be a convergence of structured and unstructured data analysis. If partner companies can share private data automatically on a real-time basis, analytics can be applied to it for gaining insight. Through web services, companies can avail themselves of business analytics via software as a service.


Finally, with the increase in the volume, velocity, and variety of data, more tools are available to deal with the “Big Data” problem of capturing and analysing large datasets to create value. Traditional relational databases are no longer enough to keep up with the new types of data that companies want to analyse, such as Twitter feeds or call centre recordings, and storage platforms such as the Hadoop Distributed File System (HDFS) and Teradata are becoming staples of companies' information architecture.

The following chapters provide more insight into the various topics of business analytics.

1.14 SELF ASSESSMENT QUESTIONS:

1. What is business analytics? Explain.

2. What are the different components of business analytics? Describe.

3. Write a short note on business analytics models.

4. Explain the future of business analytics and its emerging trends.

5. Write a short note on the growth of advanced analytical techniques.


1.15 MULTIPLE CHOICE QUESTIONS:

1. To reach decision-making insights, an organization must use ------------------------ to connect the data from multiple sources, analyze the data, and communicate the results.
a. Business analytics tools and techniques
b. Business intelligence knowledge
c. Technical expert
d. Only Business analytics techniques

2. Descriptive analytics describes or summarizes a business’s existing data


to get a picture of -------------
a. what has happened in the past.
b. what is happening currently
c. what has happened in the past or is happening currently
d. What will happen in future.

3. Statistical techniques like classification, regression, and clustering are used in the process of --------------------
a. Sequencing
b. Datamining
c. Visualisation
d. Optimisation

4. Which type of analytics is providing recommendations for next best


actions and allowing potential manipulation of events to drive better
outcomes in business analytics?
a. Descriptive analytics
b. Diagnostic analytics
c. Predictive analytics
d. Prescriptive analytics

5. What type of tools are being integrated with the ERP (Enterprise
Resource Planning) system to facilitate better, accurate and quicker
decision making?
a. Business intelligence
b. Business analytics
c. Both business analytics and business intelligence
d. Enhanced data extraction functionality


Answers: 1.(a), 2.(c), 3.(b), 4. (d), 5.(c)




Chapter 2
Components of Business Analytics
Objectives:

On completion of this chapter, you will understand the overall concept of Business Analytics, including the need for Business Analytics and its components, types, and techniques, considering the following:

Structure:

2.1 Introduction

2.2 Data aggregation

2.3 Data mining

2.4 Association and Sequence Identification

2.5 Text Mining

2.6 Forecasting

2.7 Predictive analytics

2.8 Optimisation

2.9 Visualisation

2.10 Summary

2.11 Self Assessment Questions

2.12 Multiple Choice Questions


2.1 INTRODUCTION

Business analytics has been used in business since the management exercises that were put into place in the late 19th century. At that time, analysts measured the time of each component in the newly established assembly lines. Analytics began to command more attention in the late 1960s, when computers were used in decision support systems. Since then, analytics has changed and developed with the growth of enterprise resource planning (ERP) systems, data warehouses, and a large number of other software tools and processes.

In later years, business analytics exploded with the widespread adoption of computers. This change has brought analytics to a whole new level and has opened up endless possibilities.

Business analytics depends on sufficient volumes of high-quality data. The difficulty in ensuring data quality lies in integrating and reconciling data across different systems, and then deciding what subsets of data to make available.

Earlier, analytics was considered an after-the-fact method of forecasting consumer behaviour by examining the number of units sold in the last quarter or the last year. This type of data warehousing required a lot more storage space than speed. Now business analytics is becoming a tool that can influence the outcome of customer interactions. When a specific customer type is considering a purchase, an analytics-enabled enterprise can modify the sales pitch to appeal to that consumer. This means the storage for all that data must react extremely fast to provide the necessary data in real time.

There are multiple definitions available, but as our focus is on simplified analytics, the one below will help you understand better. Business analytics is the use of statistical tools and technologies to:

• Find patterns in your data for further analysis, e.g. product association
• Find outliers among huge numbers of data points, e.g. fraud detection (a short sketch follows this list)
• Identify relationships within the key data variables for further prediction, e.g. the next likely purchase from a customer
• Provide insights as to what will happen next, e.g. which customers are leaving us
• Gain a competitive advantage.
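As a toy illustration of the outlier point above, the following Python sketch flags a transaction amount that sits far from the rest, using a simple z-score rule of our own choosing; real fraud detection is, of course, far more involved.

import numpy as np

# Hypothetical transaction amounts; one of them is suspicious.
amounts = np.array([50.0, 42.0, 55.0, 48.0, 61.0, 47.0, 900.0, 52.0])

# Standardise and flag anything more than 2 standard deviations away.
z = (amounts - amounts.mean()) / amounts.std()
print(amounts[np.abs(z) > 2])   # -> [900.] flagged as an outlier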

A more detailed comparison of business analytics with business intelligence will help you understand better.

Business Intelligence vs. Business Analytics

What does it do?
• Business Intelligence: Reports on what happened in the past or what is happening now, in current time.
• Business Analytics: Investigates why it happened and predicts what may happen in the future.

How is it achieved?
• Business Intelligence: Basic querying and reporting; OLAP cubes, slice and dice, drill-down; interactive display options such as dashboards, scorecards, charts, graphs, and alerts.
• Business Analytics: Applying statistical and mathematical techniques; identifying relationships between key data variables; revealing hidden patterns in the data.

What does your business gain?
• Business Intelligence: Dashboards with “how are we doing” information; standard reports and pre-set KPIs; alert mechanisms when something goes wrong.
• Business Analytics: Responses to “what do we do next?”; proactive and planned solutions for unknown circumstances; the ability to adapt and respond to changes and challenges.

Now that you know the difference between BI and BA, let us discuss the typical components in analytics. The following are the major components, or categories, in any analytics solution.


2.2 DATA AGGREGATION

Before data can be analysed, it must be collected, centralized, cleaned to avoid duplication, and filtered to remove inaccurate, incomplete, and unusable data. Data can be aggregated from:

• Transactional records: Records that are part of a large dataset shared by an organization or by an authorized third party (banking records, sales records, and shipping records).

• Volunteered data: Data supplied via a paper or digital form that is shared by the consumer directly or by an authorized third party (usually personal information).

Data aggregation, therefore, is the process of gathering data and presenting it in a summarized format. The data may be gathered from multiple data sources with the intent of combining these sources into a summary for data analysis. This is a crucial step, since the accuracy of insights from data analysis depends heavily on the amount and quality of data used. It is important to gather high-quality, accurate data in a large enough volume to create relevant results. Data aggregation is useful for everything from finance and business strategy decisions to product, pricing, operations, and marketing strategies.
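A minimal sketch of the idea, using pandas on invented transaction records (the column names and values below are our own illustration, not a prescribed schema), aggregates raw rows into a per-region summary:

import pandas as pd

# Hypothetical transactional records gathered from several sources.
sales = pd.DataFrame({
    "region":       ["East", "East", "West", "West"],
    "customer_age": [34, 45, 29, 51],
    "amount":       [120.0, 80.0, 200.0, 60.0],
})

# Aggregate into a summarized format: average age and total amount per region.
summary = sales.groupby("region").agg(
    avg_age=("customer_age", "mean"),
    total_amount=("amount", "sum"),
)
print(summary)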

Here is an example of aggregate data in business:

Companies often collect data on their online customers and website visitors. The aggregate data would include statistics on customer demographics and behaviour metrics, such as average age or number of transactions. This aggregated data can be used by the marketing team to personalize messaging, offers, and more in the user's digital experience with the brand. It can also be used by the product team to learn which products are successful and which are not. Furthermore, the data can be used by company executives and finance teams to help them choose how to allocate budget towards marketing or product development strategies.


• Data aggregation in the financial and investing sectors

Finance and investment firms are increasingly basing their recommendations on alternative data. A large portion of that data comes from the news, since investors need to stay up to date on industry and company financial trends. Financial firms can use data aggregation to gather headlines and article copy and use that data for predictive analytics, to find trends, events, and shifting views that could affect the finances of the companies and products they are tracking.

This market information is available on news websites for free, but it is spread across hundreds of websites. Combing through each individual website manually is time-consuming and may produce unreliable datasets due to missing data.

• Data aggregation in the retail industry

The retail and ecommerce industries have many possible uses for data aggregation. One is competitive price monitoring. Competitive research is necessary to be successful in the ecommerce and retail space: companies have to know what they are up against, so they must always be gathering new information about their competitors' product offerings, promotions, and prices. This data can be pulled from competitors' websites or from other sites their products are listed on. In order to get accurate information, the data needs to be aggregated from every single relevant source. That is a tall order for manual web data analysis.

Another way retail and ecommerce companies use data aggregation is to gather images and product descriptions to use on their sites. These often come from manufacturers, and it is much easier to reuse the existing images and descriptions than to craft your own. Manually gathering product listings or competitor prices is time-consuming and makes it almost impossible to keep the data constantly up to date.


• Data aggregation in the travel industry

Data aggregation can be used for a wide range of purposes in the travel
industry. These include competitive price monitoring, competitor research,
gaining market intelligence, customer sentiment analysis, and capturing
images and descriptions for the services on their online travel sites.
Competition in the online travel industry is fierce, so data aggregation or
the lack thereof can make or break a travel company.

Travel companies need to keep up with ever-changing travel costs and property availability. They also need to know which destinations are trending and which audiences they should target with their travel offers. The data needed to gain these insights is spread across many places on the internet, making it difficult to gather manually. That is where data extraction and aggregation services such as Web Data Integration come in.

• Data Aggregation with Web Data Integration

Web Data Integration (WDI) is a solution to the time-consuming nature of web data mining. WDI can extract data from any website your organization needs to reach. Applied to the use cases previously discussed, or to any field, Web Data Integration can cut the time it takes to aggregate data down to minutes and increase accuracy by eradicating human error in the data aggregation process. This allows companies to get the data they need, when they need it, from wherever they need it, with built-in quality control to ensure accuracy.

WDI not only extracts and aggregates the data you need; it also prepares and cleans the data and delivers it in a consumable format for integration, discovery, and analysis. So, if a company needs accurate, up-to-date data from the web, Web Data Integration is the right choice.


2.3 DATA MINING

Data mining is looking for hidden, valid, and potentially useful patterns in
huge data sets. Data Mining is all about discovering unsuspected/
previously unknown relationships amongst the data. It is a multi-
disciplinary skill that uses machine learning, statistics, and database
technology. The insights derived via Data Mining can be used for
marketing, fraud detection, and scientific discovery, etc.

Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, information harvesting, etc.

Types of Data

Data mining can be performed on the following types of data:


• Relational databases
• Data warehouses
• Advanced DB and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming database
• Text databases
• Text mining and Web mining

Data Mining Implementation Process

Let us look into the data mining implementation process in detail.


I. Business understanding:

In this phase, business and data-mining goals are established.


• First, you need to understand business and client objectives. You need to
define what your client wants (which many times even they do not know
themselves)
• Take stock of the current data mining scenario. Factor resources, assumptions, constraints, and other significant factors into your assessment.
• Using business objectives and current scenario, define your data mining
goals.
• A good data mining plan is very detailed and should be developed to
accomplish both business and data mining goals.

II. Data understanding:

In this phase, a sanity check is performed on the data to see whether it is appropriate for the data mining goals.
• First, data is collected from the multiple data sources available in the organization.
• These data sources may include multiple databases, flat files, or data cubes. Issues like object matching and schema integration can arise during the data integration process. This is a quite complex and tricky process, as data from various sources is unlikely to match easily. For example, table A contains an entity named cust-No, whereas another table B contains an entity named cust-id.
• It is therefore quite difficult to ensure that these two objects refer to the same value. Metadata should be used here to reduce errors in the data integration process.
• Next, the step is to search for properties of the acquired data. A good way to explore the data is to answer the data mining questions (decided in the business phase) using query, reporting, and visualization tools.
• Based on the results of the queries, the data quality should be ascertained. Missing data, if any, should be acquired.


III. Data preparation:

In this phase, data is made production-ready.
• The data preparation process consumes about 90% of the time of the project.
• The data from different sources should be selected, cleaned, transformed, formatted, anonymized, and constructed (if required).
• Data cleaning is a process of “cleaning” the data by smoothing noisy data and filling in missing values.

For example, in a customer demographics profile, age data may be missing. The data is incomplete and should be filled in. In some cases, there could be data outliers; for instance, age has a value of 300. Data could also be inconsistent; for instance, the name of the customer is different in different tables.

Data transformation operations change the data to make it useful in data mining. The following transformations can be applied.

IV. Data transformation:

Data transformation operations contribute toward the success of the mining process.
• Smoothing: Helps to remove noise from the data.
• Aggregation: Summary or aggregation operations are applied to the data; e.g., weekly sales data is aggregated to calculate monthly and yearly totals.
• Generalization: Low-level data is replaced by higher-level concepts with the help of concept hierarchies. For example, the city is replaced by the county.
• Normalization: Performed when the attribute data is scaled up or down, for example so that the data falls in the range -2.0 to 2.0 post-normalization (a short sketch follows this list).
• Attribute construction: New attributes are constructed from the given set of attributes and included where helpful for data mining.
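As referenced in the normalization item above, here is a minimal sketch rescaling an attribute into the range -2.0 to 2.0 with a simple min-max formula (the data values are invented for illustration):

import numpy as np

# Raw attribute values (hypothetical customer ages).
ages = np.array([18.0, 25.0, 40.0, 65.0, 90.0])

# Min-max normalization into the target range [-2.0, 2.0].
lo, hi = -2.0, 2.0
scaled = lo + (ages - ages.min()) * (hi - lo) / (ages.max() - ages.min())
print(scaled)   # every value now falls between -2.0 and 2.0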


The result of this process is a final data set that can be used in modelling.

V. Modelling

In this phase, mathematical models are used to determine data patterns.


• Based on the business objectives, suitable modelling techniques should
be selected for the prepared dataset.
• Create a scenario to test and check the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that model
can meet data mining objectives.

VI.Evaluation:

In this phase, patterns identified are evaluated against the business


objectives.
• Results generated by the data mining model should be evaluated against
the business objectives.
• Gaining business understanding is an iterative process; new business requirements may be raised because of what data mining uncovers.
• A go or no-go decision is taken to move the model in the deployment
phase.


VII.Deployment:

In the deployment phase, you ship your data mining discoveries to


everyday business operations.

• The knowledge or information discovered during data mining process


should be made easy to understand for non-technical stakeholders.
• A detailed deployment plan, for shipping, maintenance, and monitoring
of data mining discoveries is created.
• A final project report is created with lessons learned and key experiences
during the project. This helps to improve the organization's business
policy.

2.3.1 Data Mining Techniques:

1. Classification:

This analysis is used to retrieve important and relevant information about data and metadata. This data mining method helps to classify data into different classes.

2. Clustering:

Clustering analysis is a data mining technique used to identify data that are similar to each other. This process helps to understand the differences and similarities between the data (a short sketch follows this list of techniques).

3. Regression:

Regression analysis is the data mining method of identifying and analysing


the relationship between variables. It is used to identify the likelihood of a
specific variable, given the presence of other variables.

4. Association Rules:

This data mining technique helps to find the association between two or
more Items. It discovers a hidden pattern in the data set.


5. Outlier detection:

This type of data mining technique refers to the observation of data items in the dataset which do not match an expected pattern or expected behaviour. This technique can be used in a variety of domains, such as intrusion detection, fraud or fault detection, etc. Outlier detection is also called outlier analysis or outlier mining.

6. Sequential Patterns:

This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.

7. Prediction:

Prediction uses a combination of the other data mining techniques, such as trend analysis, sequential patterns, clustering, and classification. It analyses past events or instances in the right sequence to predict a future event.
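As promised under the clustering technique above, here is a minimal sketch using scikit-learn's KMeans on invented customer data (annual spend and visit counts are our own illustrative features, not ones prescribed by this chapter):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers described by annual spend and number of visits.
X = np.array([[500, 4], [520, 5], [80, 1], [90, 2], [300, 10], [310, 12]])

# Group the customers into three clusters of similar behaviour.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignment for each customer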

2.3.2 Challenges of Implementing Data Mining:

• Skilled experts are needed to formulate the data mining queries.
• Overfitting: due to a small training database, a model may not fit future states.
• Data mining needs large databases, which can sometimes be difficult to manage.
• Business practices may need to be modified to make use of the information uncovered.
• If the data set is not diverse, data mining results may not be accurate.
• Integrating information from heterogeneous databases and global information systems can be complex.


2.3.3 Data mining Examples:

Example 1:

Consider the marketing head of a telecom services provider who wants to increase revenues from long-distance services. For a high ROI on his sales and marketing efforts, customer profiling is important. He has a vast data pool of customer information like age, gender, income, and credit history, but it is impossible to determine the characteristics of people who prefer long-distance calls through manual analysis. Using data mining techniques, he may uncover patterns between heavy long-distance users and their characteristics.

For example, he might learn that his best customers are married females between the ages of 45 and 54 who make more than $80,000 per year. Marketing efforts can then be targeted at this demographic.

Example 2:

A bank wants to search for new ways to increase revenues from its credit card operations. It wants to check whether usage would double if fees were halved.

The bank has multiple years of records on average credit card balances, payment amounts, credit limit usage, and other key parameters. It creates a model to check the impact of the proposed new business policy. The data results show that cutting fees in half for a targeted customer base could increase revenues by $10 million.


The following are some more industry-specific examples of data mining:

• Marketing. Data mining is used to explore increasingly large databases and to improve market segmentation. By analysing the relationships between parameters such as customer age, gender, and tastes, it is possible to guess customers' behaviour in order to direct personalised loyalty campaigns. Data mining in marketing also predicts which users are likely to unsubscribe from a service, what interests them based on their searches, or what a mailing list should include to achieve a higher response rate.

• Retail. Supermarkets, for example, use joint purchasing patterns to identify product associations and decide how to place them in the aisles and on the shelves. Data mining also detects which offers are most valued by customers or increase sales at the checkout queue.

• Banking. Banks use data mining to better understand market risks. It is commonly applied to credit ratings and to intelligent anti-fraud systems to analyse transactions, card transactions, purchasing patterns, and customer financial data. Data mining also allows banks to learn more about customers' online preferences or habits to optimise the return on their marketing campaigns, study the performance of sales channels, or manage regulatory compliance obligations.

• Medicine. Data mining enables more accurate diagnostics. Having all of the patient's information, such as medical records, physical examinations, and treatment patterns, allows more effective treatments to be prescribed. It also enables more effective, efficient, and cost-effective management of health resources by identifying risks, predicting illnesses in certain segments of the population, or forecasting the length of hospital admission. Detecting fraud and irregularities, and strengthening ties with patients through an enhanced knowledge of their needs, are also advantages of using data mining in medicine.

• Television and radio. There are networks that apply real-time data mining to measure their online television (IPTV) and radio audiences. These systems collect and analyse, on the fly, anonymous information from channel views, broadcasts, and programming. Data mining allows networks to make personalised recommendations to radio listeners and TV viewers, as well as get to know their interests and activities in real time and better understand their behaviour. Networks also gain valuable knowledge for their advertisers, who use this data to target their potential customers more accurately.

2.3.4 Data Mining Tools

The following are two popular data mining tools widely used in industry.

1) R language:

R is an open-source tool for statistical computing and graphics. It offers a wide variety of statistical techniques, including classical statistical tests, time-series analysis, classification, and graphical techniques, along with effective data handling and storage facilities.

2) Oracle Data Mining:

Oracle Data Mining, popularly known as ODM, is a module of the Oracle Advanced Analytics Database. This data mining tool allows data analysts to generate detailed insights and make predictions. It helps predict customer behaviour, develop customer profiles, and identify cross-selling opportunities.

2.3.5 Benefits of Data Mining:

• Data mining helps companies to get knowledge-based information.
• Data mining helps organizations to make profitable adjustments in operations and production.
• Data mining is a cost-effective and efficient solution compared to other statistical data applications.
• Data mining helps with the decision-making process.
• It facilitates automated prediction of trends and behaviours as well as automated discovery of hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a speedy process which makes it easy for users to analyse huge amounts of data in less time.

2.3.6 Disadvantages of Data Mining

• There is a chance that companies may sell useful information about their customers to other companies for money. For example, American Express has sold credit card purchase data of its customers to other companies.
• Much data mining analytics software is difficult to operate and requires advance training to work on.
• Different data mining tools work in different manners due to the different algorithms employed in their design. Therefore, the selection of the correct data mining tool is a very difficult task.
• Data mining techniques are not always accurate, and so they can cause serious consequences in certain conditions.

2.3.7 Data Mining Applications

• Communications: Data mining techniques are used in the communication sector to predict customer behaviour and offer highly targeted and relevant campaigns.
• Insurance: Data mining helps insurance companies to price their products profitably and promote new offers to their new or existing customers.
• Education: Data mining helps educators to access student data, predict achievement levels, and find students or groups of students who need extra attention, for example students who are weak in mathematics.
• Manufacturing: With the help of data mining, manufacturers can predict wear and tear of production assets and anticipate maintenance needs, which helps them minimize downtime.
• Banking: Data mining helps the finance sector to get a view of market risks and manage regulatory compliance. It helps banks to identify probable defaulters when deciding whether to issue credit cards, loans, etc.
• Retail: Data mining techniques help retail malls and grocery stores identify and arrange the most sellable items in the most attention-grabbing positions. They help store owners come up with offers which encourage customers to increase their spending.
• Service providers: Service providers like mobile phone and utility companies use data mining to predict the reasons a customer may leave. They analyse billing details, customer service interactions, and complaints made to the company to assign each customer a probability score and offer incentives.
• E-commerce: E-commerce websites use data mining to offer cross-sells and up-sells through their websites. One of the most famous names is Amazon, which uses data mining techniques to draw more customers into its e-commerce store.
• Supermarkets: Data mining allows supermarkets to develop rules to predict, for instance, whether shoppers are likely to be expecting a baby. By evaluating buying patterns, they can find women customers who are most likely pregnant and start targeting products like baby powder, baby soap, diapers, and so on.
• Crime investigation: Data mining helps crime investigation agencies to deploy the police workforce (where is a crime most likely to happen, and when?), decide whom to search at a border crossing, etc.
• Bioinformatics: Data mining helps to mine biological data from the massive datasets gathered in biology and medicine.

2.4 ASSOCIATION AND SEQUENCE IDENTIFICATION

In many cases, consumers perform similar actions at the same time or perform predictable actions sequentially. This data can reveal patterns such as:

• Association: For example, two different items frequently being purchased in the same transaction, such as multiple books in a series or a toothbrush and toothpaste.

• Sequencing: For example, a consumer requesting a credit report followed by asking for a loan, or booking an airline ticket followed by booking a hotel room or reserving a car.


Association rules are an important class of regularities in data, and mining association rules is a fundamental data mining task. It is perhaps the most important model invented and extensively studied by the database and data mining community. Its objective is to find all co-occurrence relationships, called associations, among data items. Many efficient algorithms, extensions, and applications have been reported. The classic application of association rule mining is market basket analysis, which aims to discover how items purchased by customers in a supermarket (or a store) are associated.

An example association rule is:

Cheese → Beer [support = 10%, confidence = 80%]

The rule says that 10% of customers buy Cheese and Beer together, and that those who buy Cheese also buy Beer 80% of the time. Support and confidence are two measures of rule strength.

Association rule mining, however, does not consider the sequence in which the items are purchased. Sequential pattern mining takes care of that. An example of a sequential pattern is “5% of customers buy a bed first, then a mattress, and then pillows”. Sequential rule mining is a data mining technique which consists of discovering rules in sequences. It has many applications, for example analysing the behaviour of customers in supermarkets, users on a website, or passengers at an airport.

Discovering sequential patterns in sequences

An important data mining problem is to design algorithms for discovering hidden patterns in sequences. To understand the patterns in a sequence dataset, a range of sequential rule mining algorithms and pattern mining techniques can be implemented. There has been a lot of research on this topic in the field of data mining, and various algorithms have been proposed.


A sequential pattern is a subsequence that appears in several sequences of a dataset. For example, consider a dataset in which the sequential pattern <{a}{c}{e}> appears in the first two sequences. This pattern is quite interesting: it indicates that customers who bought {a} often bought {c} afterwards, followed by {e}.

Such a pattern is said to have a support of two sequences because it appears in two sequences of the dataset. Several algorithms have been proposed for finding all sequential patterns in a dataset, such as Apriori, SPADE, PrefixSpan, and GSP. These algorithms take as input a sequence dataset and a minimum support threshold (min-sup). They then output all sequential patterns having a support no less than min-sup; these are said to be the frequent sequential patterns.
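The sketch below is not a full implementation of any of those algorithms; it is a minimal illustration, on an invented sequence database, of the core support count they all rely on: checking whether a candidate pattern's itemsets occur in order within each sequence.

def contains(sequence, pattern):
    """True if the itemsets of `pattern` occur, in order, within `sequence`."""
    pos = 0
    for itemset in pattern:
        # Advance until an itemset of the sequence contains this pattern itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

# A toy sequence database: each sequence is a list of itemsets.
database = [
    [{"a"}, {"b", "c"}, {"e"}],
    [{"a"}, {"c"}, {"d"}, {"e"}],
    [{"b"}, {"d"}],
]
pattern = [{"a"}, {"c"}, {"e"}]
support = sum(contains(seq, pattern) for seq in database)
print(support)   # 2 -- the pattern appears in the first two sequences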


Association Analysis

There are a couple of terms used in association analysis that are important to understand. Association rules are normally written like this: {Diapers} -> {Beer}, which means that there is a strong relationship between customers who purchased diapers and also purchased beer in the same transaction. In the above example, {Diapers} is the antecedent and {Beer} is the consequent. Both antecedents and consequents can have multiple items; in other words, {Diapers, Gum} -> {Beer, Chips} is a valid rule.

Support is the relative frequency with which the rule shows up. In many instances, you may want to look for high support in order to make sure it is a useful relationship. However, there may be instances where low support is useful, if you are trying to find “hidden” relationships.

Confidence is a measure of the reliability of the rule. A confidence of 0.5 in the above example would mean that in 50% of the cases where Diapers and Gum were purchased, the purchase also included Beer and Chips. For product recommendation, a 50% confidence may be perfectly acceptable, but in a medical situation, this level may not be high enough.

Lift is the ratio of the observed support to that expected if the two sides of the rule were independent. The basic rule of thumb is that a lift value close to 1 means the antecedent and consequent are completely independent. Lift values greater than 1 are generally more “interesting” and could be indicative of a useful rule pattern.
So, there are three important parameters: support, confidence, and lift. Suppose there is a set of transactions and a rule item1 → item2. The support of item1 is defined as n(item1) / n(total transactions), while confidence is defined as n(item1 & item2) / n(item1). Confidence tells us the strength of the association, and support tells us the relevance of the rule, because we do not want to include rules about items that are seldom bought, i.e., that have low support. Lift is confidence divided by the support of item2 (the consequent): the higher the lift, the more significant the rule found when applying an algorithm such as Apriori.
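A short worked sketch in Python, with an invented five-transaction basket, makes the three measures concrete for the {Diapers} -> {Beer} rule used earlier:

transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "gum"},
    {"beer", "chips"},
    {"milk", "bread"},
]
n = len(transactions)
n_diapers = sum("diapers" in t for t in transactions)           # 3
n_both = sum({"diapers", "beer"} <= t for t in transactions)    # 2
n_beer = sum("beer" in t for t in transactions)                 # 3

support = n_both / n                # 0.4
confidence = n_both / n_diapers     # ~0.67
lift = confidence / (n_beer / n)    # ~1.11; > 1 suggests a positive association
print(support, confidence, lift)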



2.5 TEXT MINING

Text mining discovers and extracts meaningful patterns and relationships from text collections, e.g. understanding the sentiments of customers on social media sites like Twitter, Facebook, and blogs, or in call centre scripts. These insights are used to improve the product or customer service, or to understand how competitors are doing.

Text mining is one of the most critical ways of analysing and processing unstructured data, which forms nearly 80% of the world's data. Today, a majority of organizations and institutions gather and store massive amounts of data in data warehouses and cloud platforms, and this data continues to grow exponentially by the minute as new data comes pouring in from multiple sources. As a result, it becomes a challenge for companies and organizations to store, process, and analyse vast amounts of textual data with traditional tools. This is where text mining applications, tools, and techniques come in.


2.5.1 What is Text Mining?

According to Wikipedia, “Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text.” The definition strikes at the primary chord of text mining: delving into unstructured data to extract the meaningful patterns and insights required for exploring textual data sources.

Text mining incorporates and integrates the tools of information retrieval, data mining, machine learning, statistics, and computational linguistics, and hence it is nothing short of a multidisciplinary field. Text mining deals with natural language texts stored in either semi-structured or unstructured formats.


The five fundamental steps involved in text mining are:

• Gathering unstructured data from multiple data sources, such as plain text, web pages, PDF files, emails, and blogs, to name a few.
• Detecting and removing anomalies from the data by conducting pre-processing and cleansing operations. Data cleansing allows you to extract and retain the valuable information hidden within the data and helps identify the roots of specific words. A number of text mining tools and applications are available for this step (a short sketch follows this list).
• Converting all the relevant information extracted from unstructured data into structured formats.
• Analyzing the patterns within the data via a Management Information System (MIS).
• Storing all the valuable information in a secure database to drive trend analysis and enhance the decision-making process of the organization.
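As referenced in the pre-processing step above, here is a minimal cleansing sketch in plain Python; the sample sentence and the tiny stop-word list are our own illustration, not a standard resource.

import re
from collections import Counter

raw = "The service was great. GREAT support team; very fast, very helpful!"
stop_words = {"the", "was", "very", "a", "an"}

# Lowercase the text, strip punctuation, and drop stop words.
tokens = re.findall(r"[a-z]+", raw.lower())
cleaned = [t for t in tokens if t not in stop_words]

print(Counter(cleaned).most_common(3))   # e.g. [('great', 2), ('service', 1), ...]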


2.5.2 Text Mining Techniques

Text mining techniques can be understood as the processes that go into mining text and discovering insights from it. These techniques generally employ different text mining tools and applications for their execution. Let us now look at the most common techniques used in text mining:

1. Information Extraction
This is the most famous text mining technique. Information extraction refers to the process of extracting meaningful information from vast chunks of textual data. This text mining technique focuses on identifying and extracting entities, attributes, and their relationships from semi-structured or unstructured texts. Whatever information is extracted is then stored in a database for future access and retrieval. The efficacy and relevancy of the outcomes are checked and evaluated using precision and recall measures.

2. Information Retrieval
Information Retrieval (IR) refers to the process of extracting relevant and
associated patterns based on a specific set of words or phrases. In this text
mining technique, IR systems make use of different algorithms to track and
monitor user behaviours and discover relevant data accordingly. Google
and Yahoo search engines are the two most renowned IR systems.


3. Categorization
This is one of those text mining techniques that is a form of “supervised” learning, wherein natural language texts are assigned to a predefined set of topics depending upon their content. Thus, categorization, or rather Natural Language Processing (NLP), is a process of gathering text documents and processing and analyzing them to uncover the right topics or indexes for each document. The co-referencing method is commonly used as a part of NLP to extract relevant synonyms and abbreviations from textual data. Today, NLP has become an automated process used in a host of contexts, ranging from personalized commercial delivery to spam filtering and categorizing web pages under hierarchical definitions, and much more.
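A minimal sketch of such supervised categorization, using scikit-learn's CountVectorizer and naive Bayes (one possible toolchain, not one prescribed by this chapter) on four invented training documents:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labelled corpus: classify messages as spam or ham.
docs = [
    "cheap pills buy now",
    "meeting agenda for monday",
    "limited offer buy cheap",
    "project status report",
]
labels = ["spam", "ham", "spam", "ham"]

# Turn texts into word-count vectors and fit a naive Bayes classifier.
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(docs), labels)

print(clf.predict(vec.transform(["buy cheap pills"])))   # likely ['spam']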

4. Clustering
Clustering is one of the most crucial text mining techniques. It seeks to
identify intrinsic structures in textual information and organize them into
relevant subgroups or ‘clusters’ for further analysis. A significant challenge
in the clustering process is to form meaningful clusters from the unlabelled
textual data without having any prior information on them. Cluster
analysis is a standard text mining tool that assists in data distribution or
acts as a pre-processing step for other text mining algorithms running on
detected clusters.

5. Summarisation
Text summarisation refers to the process of automatically generating a
compressed version of a specific text that holds valuable information for
the end-user. The aim of this text mining technique is to browse through
multiple text sources to craft summaries of texts containing a considerable
proportion of information in a concise format, keeping the overall meaning
and intent of the original documents essentially the same. Text
summarisation integrates and combines the various methods that employ
text categorization like decision trees, neural networks, regression models,
and swarm intelligence.


2.5.3 Text Mining Applications

Text mining techniques and text mining tools are rapidly penetrating the
industry, right from academia and healthcare to businesses and social
media platforms. This is giving rise to a number of text mining
applications. Here are a few text mining applications used across the globe
today:

1. Risk Management
One of the primary causes of failure in the business sector is the lack of
proper or insufficient risk analysis. Adopting and integrating risk
management software powered by text mining technologies such as SAS
Text Miner can help businesses to stay updated with all the current trends
in the business market and boost their abilities to mitigate potential risks.
Since text mining tools and technologies can gather relevant information
from across thousands of text data sources and create links between the
extracted insights, it allows companies to access the right information at
the right moment, thereby enhancing the entire risk management process.


2. Customer Care Service

Text mining techniques are finding increasing importance in the field of customer care. Companies are investing in text analytics software to enhance their overall customer experience by accessing textual data from varied sources such as surveys, customer feedback, and customer calls. Text analysis aims to reduce the response time of the company and help address the grievances of customers speedily and efficiently.

3. Fraud Detection
Text analytics backed by text mining techniques provides a tremendous
opportunity for domains that gather a majority of data in the text format.
Insurance and finance companies are harnessing this opportunity. By combining the outcomes of text analyses with relevant structured data, these companies are now able to process claims swiftly as well as to detect and prevent fraud.

4. Business Intelligence
Organizations and business firms have started to leverage text mining
techniques as part of their business intelligence. Apart from providing
profound insights into customer behaviour and trends, text mining
techniques also help companies to analyse the strengths and weaknesses
of their rivals, thus, giving them a competitive advantage in the market.
Text mining tools such as Cogito Intelligence Platform and IBM text
analytics provide insights on the performance of marketing strategies,
latest customer and market trends, and so on.

5. Social Media Analysis


There are many text mining tools designed exclusively for analyzing the
performance of social media platforms. These help to track and interpret
the texts generated online from the news, blogs, emails, etc. Furthermore,
text mining tools can efficiently analyse the number of posts, likes, and
followers of your brand on social media, thereby allowing you to
understand the reaction of people who are interacting with your brand and
online content. The analysis will enable you to understand ‘what’s hot and
what’s not’ for your target audience.


Here are two case examples where text mining has transformed real-world data into real-world evidence.

Case 1: Evidence landscape from literature for drug economics

Understanding the potential for market access is essential for all pharma
companies, and information to characterize the burden of disease and local
standard of care in different countries across the globe is critical for any
new drug launch. Companies need an assessment of the landscape of
epidemiological data, health economics and outcomes information to
inform the optimal commercial strategy.

Valuable data is published every month in scientific journals, abstracts, and


conferences. One of Linguamatics’ Top 10 pharma customers decided to
utilize text mining to extract, normalize, and visualize these data. They
then used this structured data to generate a comprehensive understanding
of the available evidence, thus establishing the market “gaps” they could
address. Focusing on a particular therapeutic area of immunological
diseases, the organization was able to develop precise searches with
increased recall across these different data sources, including full-text
literature.

Linguamatics “I2E” enables the use of ontologies to improve disease


coverage, and to incorporate domain knowledge to increase the
identification of particular geographical regions (for example, enabling the
use of the adjectival form of the country, e.g. French as well as France, and
cities, e.g. Paris, Toulouse). I2E also extracts and normalizes numbers,
which is useful to standardize epidemiological reports for incidence and
prevalence of disease. Searching within full-text papers can be noisy, and
I2E allows search to be specific, and to exclude certain parts of the
document from a search, such as the references.

I2E can provide the starting point for efficiently performing evidence based
systematic reviews over very large sets of scientific literature, enabling
researchers to answer questions around commercial business decisions.


Case 2: Gaining insights from medical science liaison professionals

Conversations between medical science liaison (MSL) professionals


and patients or healthcare professionals (HCPs) can lead to valuable
insights. The role of the MSL is to ensure the effective use, and success, of
a pharmaceutical company’s drug. MSLs act as the therapy area experts for
internal colleagues, and maintain good relationships with external experts,
such as leading physicians, to educate and inform on new drugs and
therapeutics.

Top pharma company Novo Nordisk uses text mining to gain clinical
insights from MSL interactions with HCPs. These interactions may be broad
ranging, covering topics such as safety and efficacy, dosing, cost, special
populations, indication, comparisons, competitor products, etc. MSLs may
use approved slide decks, package inserts (PIs), factsheets, studies or
publications to answer HCP questions. Linguamatics’ text mining
platform I2E is used to structure these source files with custom ontologies
(e.g. for material types, product, disease terminology variation, topics).

This analysis enables Novo Nordisk to better address what support HCPs
may need in their interactions with patients, insurance providers, and other
clinicians and invest in resource development appropriately.

2.6 FORECASTING

It is not unusual to hear a company's management speak about forecasts: "Our sales did not meet the forecasted numbers," or "we
feel confident in the forecasted economic growth and expect to exceed our
targets." In the end, all financial forecasts, whether about the specifics of a
business, like sales growth, or predictions about the economy as a whole,
are informed guesses. In this section, we'll look at some of the methods
behind financial forecasts, as well as the process, and some of the risks
that crop up when we seek to predict the future.

Financial Forecasting Methods

There are several different methods by which a business forecast can be made. All the methods fall into one of two overarching
approaches: qualitative and quantitative.


a) Qualitative Models

Qualitative models have typically been successful with short-term predictions, where the scope of the forecast was limited. Qualitative forecasts can be thought of as expert-driven, in that they depend on market mavens or the market as a whole to weigh in with an informed consensus. Qualitative models can be useful in predicting the short-term success of companies, products, and services, but have limitations due to their reliance on opinion over measurable data. Qualitative models include:

• Market Research: Polling a large number of people on a specific product or service to predict how many people will buy or use it once launched.

• Delphi Method: Asking field experts for general opinions and then
compiling them into a forecast. (For more on qualitative modeling, read
"Qualitative Analysis: What Makes a Company Great?")

b) Quantitative Models

Quantitative models discount the expert factor and try to remove the
human element from the analysis. These approaches are concerned solely
with data and avoid the fickleness of the people underlying the numbers.
They also try to predict where variables like sales, gross domestic product,
housing prices, and so on, will be in the long-term, measured in months or
years. Quantitative models include:

• The Indicator Approach: The indicator approach depends on the relationship between certain indicators, for example, GDP
and unemployment rates, remaining relatively unchanged over time. By
following the relationships and then following indicators that are leading,
you can estimate the performance of the lagging indicators, by using
the leading indicator data.

• Econometric Modeling: This is a more mathematically rigorous version of the indicator approach. Instead of assuming that relationships stay the
same, econometric modeling tests the internal consistency of datasets
over time and the significance or strength of the relationship between
data sets. Econometric modeling is sometimes used to create custom
indicators that can be used for a more accurate indicator approach.
However, the econometric models are more often used in academic fields to evaluate economic policies. (For a basic explanation on applying econometric models, read "Regression Basics for Business Analysis.")

• Time Series Methods: This refers to a collection of different methodologies that use past data to predict future events. The difference between the time series methodologies is usually in fine details, like giving more recent data more weight or discounting certain outlier points. By tracking what happened in the past, the forecaster hopes to be able to give a better than average prediction about the future. This is the most common type of business forecasting because it's inexpensive and no better or worse than other methods.

How Does Forecasting Work?

There is a lot of variation on a practical level when it comes to business forecasting. However, on a conceptual level, all forecasts follow the same process.

1. A problem or data point is chosen. This can be something like "will people buy a high-end coffee maker?" or "what will our sales be in March next year?"

2. Theoretical variables and an ideal data set are chosen. This is where
the forecaster identifies the relevant variables that need to be
considered and decides how to collect the data.

3. Assumption time. To cut down the time and data needed to make a
forecast, the forecaster makes some explicit assumptions to simplify the
process.

4. A model is chosen. The forecaster picks the model that fits the dataset,
selected variables, and assumptions.

5. Analysis. Using the model, the data is analysed and a forecast made
from the analysis.

6. Verification. The forecaster compares the forecast to what actually happens in order to tweak the process, identify problems or, in the rare case of an accurate forecast, pat himself on the back.


Problems With Forecasting

Business forecasting is very useful for businesses, as it allows them to plan production, financing, and so on. However, there are three problems with relying on forecasts:

1. The data is always going to be old. Historical data is all we have to go on, and there is no guarantee that the conditions in the past will continue in the future.

2. It is impossible to factor in unique or unexpected events, or externalities. Assumptions are dangerous, such as the assumption that banks were properly screening borrowers prior to the subprime meltdown. And black swan events have become more common as our dependence on forecasts has grown.

3. Forecasts can't integrate their own impact. By having forecasts, accurate or inaccurate, the actions of businesses are influenced by a factor that can't be included as a variable. This is a conceptual knot. In a worst-case scenario, management becomes a slave to historical data and trends rather than worrying about what the business is doing now.

The Bottom Line

Forecasting can be a dangerous art, because the forecasts become a focus for companies and governments, mentally limiting their range of actions, by presenting the short to long-term future as already being determined. Moreover, forecasts can easily break down due to random elements that can't be incorporated into a model, or they can be just plain wrong from the start.

The negatives aside, business forecasting isn't going anywhere. Appropriately used, forecasting allows businesses to plan ahead of their needs, raising their chances of staying healthy through all markets. That's one function of business forecasting that all investors can appreciate.


2.7 PREDICTIVE ANALYTICS

This analytics is used to create, manage and deploy predictive scoring models, e.g. customer churn and retention, credit scoring, and predicting failure in shop floor machinery.

• Predictive analytics is the art and science of creating predictive systems and models. These models, with tuning over time, can then predict an outcome with a far higher statistical probability than mere guesswork.

• Often, though, predictive analytics is used as an umbrella term that also embraces related types of advanced analytics. These include descriptive analytics, which provides insights into what has happened in the past; and prescriptive analytics, used to improve the effectiveness of decisions about what to do in the future.

• Starting the Predictive Analytics Modelling Process

• Each predictive analytics model is composed of several predictors, or variables, that will impact the probability of various results. Before launching a predictive modelling process, it's important to identify the business objectives, scope of the project, expected outcomes, and data sets to be used.

Data Collection and Mining

• Prior to the development of predictive analytics models, data mining is typically performed to help determine which variables and patterns to consider in building the model.

• Prior to that, relevant data is collected and cleaned. Data from multiple
sources may be combined into a common source. Data relevant to the
analysis is selected, retrieved, and transformed into forms that will work
with data mining procedures.


• List of Predictive Analytics Techniques

Some predictive analytics techniques, such as decision trees, can be used with both numerical and non-numerical data, while others, such as multiple linear regression, are designed for quantified data. As its name implies, text analysis is designed strictly for analyzing text.

Decision Trees
• Decision tree techniques, also based on machine learning (ML), use classification algorithms from data mining to determine the possible risks and rewards of pursuing several different courses of action. Potential outcomes are then presented as a flowchart which helps humans to visualize the data through a tree-like structure.
• A decision tree has three major parts: a root node, which is the starting point, along with leaf nodes and branches. The root node and the intermediate nodes pose the questions, while the leaf nodes hold the final answers.
• The branches connect the nodes, depicting the flow from questions to answers. Generally, each node has multiple additional nodes extending from it, representing possible answers. The answers can be as simple as "yes" and "no."
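To make the flowchart idea concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the churn records and the feature names are invented for illustration:

    # Toy decision tree: predict customer churn from two variables
    # (months as a customer, support tickets raised last quarter).
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[2, 5], [3, 4], [24, 0], [36, 1], [5, 3], [48, 0]]   # invented records
    y = [1, 1, 0, 0, 1, 0]                                    # 1 = churned

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["tenure_months", "tickets"]))
    print(tree.predict([[4, 4]]))   # classify a short-tenure, high-ticket customer

The printed rules read top-down from the root node through the branches, mirroring the tree-like structure described above.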

Text Analytics

Much enterprise data is still stored neatly in easily queryable relational database management systems (RDBMS). However, the big data boom has ushered in an explosion in the availability of unstructured and semi-structured data from sources such as emails, social media, web pages, and call centre logs.

To find answers in this text data, organizations are now experimenting with
new advanced analytics techniques such as topic modeling and sentiment
analysis. Text analytics uses ML, statistical, and linguistics techniques.

• Topic modeling is already proving itself to be very effective at examining large clusters of text to determine the probability that specific topics are covered in a specific document.

• To predict the topics of a given document, it examines the words used in the document. For instance, words such as hospital, doctor, and patient would result in "healthcare." A law firm might use topic modeling, for instance, to find case law pertaining to a specific subject.

• One predictive analytics technique leveraged in topic modeling, probabilistic latent semantic indexing (PLSI), uses probability to model co-occurrence data, a term referring to an above-chance frequency of occurrence of two terms next to each other in a certain order.
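A minimal sketch of topic modeling follows, using latent Dirichlet allocation (LDA), a widely used relative of PLSI, from scikit-learn; the four-document corpus is invented:

    # LDA infers topics from word co-occurrence across documents.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "hospital doctor patient treatment",          # healthcare-flavoured
        "doctor patient diagnosis hospital ward",
        "court judge case law verdict",               # legal-flavoured
        "law firm case court appeal",
    ]
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    print(lda.fit_transform(counts))   # per-document topic probabilities

Each row of the output gives the probability that the corresponding document covers each of the two inferred topics.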

Sentiment Analysis

• This technique, also known as opinion mining, is an advanced analytics technique still in the earlier phases of development.
• Through sentiment analysis, data scientists seek to identify and
categorize people's feelings and opinions. Reactions expressed in social
media, Amazon product reviews, and other pieces of text can be
analysed to assess and make decisions about attitudes toward a specific
product, company, or brand. Through sentiment analysis, for example,
Expedia Canada decided to fix a marketing campaign featuring a
screeching violin that consumers were complaining about loudly online.
• One technique used in sentiment analysis, dubbed polarity analysis, tells whether the tone of the text is negative or positive. Categorization can then be used to home in further on the writer's attitude and emotions. Finally, a person's emotions can be placed on a scale, with 0 meaning "sad" and 10 signifying "happy."
• Sentiment analysis, though, has its limits. According to Matthew Russell,
CTO at Digital Reasoning and principal at Zaffra, it's critical to use a large
and relevant data sample when measuring sentiment. That's because
sentiment is inherently subjective as well as likely to change over time
due to factors running the gamut from a consumer's mood that day to
the impacts of world events.
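As a rough illustration of polarity analysis, here is a minimal lexicon-based sketch; production systems rely on far larger lexicons or trained models, and these word lists are invented:

    # Hand-rolled polarity scoring: count positive and negative words.
    POSITIVE = {"great", "love", "excellent", "happy"}
    NEGATIVE = {"terrible", "hate", "awful", "screeching", "annoying"}

    def polarity(text):
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(polarity("I hate that screeching violin"))   # negative
    print(polarity("great ad and I love it"))          # positive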


Simple Statistical Modeling

Statistical techniques in predictive analytics modeling can range all the way
from simple traditional mathematical equations to complex deep machine
learning processes running on sophisticated neural networks. Multiple
linear regression is the most commonly used simple statistical method.

• In predictive analytics modeling, multiple linear regression models the relationship between two or more independent variables and one continuous dependent variable by fitting a linear equation to observed data.

• Each value of the independent variable x is associated with a value of the dependent variable y. Let's say, for example, that data analysts want to answer the question of whether age and IQ scores effectively predict grade point average (GPA). In this case, GPA is the dependent variable and the independent variables are age and IQ scores.

• Multiple linear regression can be used to build models which either identify the strength of the effect of independent variables on the dependent variable, predict future trends, or forecast the impact of changes. For instance, a predictive analytics model could be built which forecasts the amount by which GPA is expected to increase (or decrease) for every one-point increase (or decrease) in intelligence quotient.
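A minimal sketch of that GPA model with scikit-learn follows; the ages, IQ scores, and GPAs are invented for illustration:

    # Multiple linear regression: fit GPA as a linear function of age and IQ.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[18, 100], [19, 110], [20, 105], [21, 120], [22, 115]])
    y = np.array([2.8, 3.2, 3.0, 3.7, 3.4])    # observed GPAs

    model = LinearRegression().fit(X, y)
    print(model.coef_)                 # estimated effect of age and of IQ on GPA
    print(model.predict([[20, 121]]))  # predicted GPA for a new student

The second coefficient is the model's estimate of how much GPA changes for every one-point change in IQ, holding age constant.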

Neural Networks

The traditional ML-based predictive analytics techniques like multiple linear regression aren't always good at handling big data. For instance, big data
analysis often requires an understanding of the sequence or timing of
events. Neural networking techniques are much more adept at dealing with
sequence and internal time orderings. Neural networks can make better
predictions on time series information like weather data, for instance. Yet
although neural networking excels at some types of statistical analysis, its
applications range much further than that.

In a recent study by TDWI, respondents were asked to name the most useful applications of Hadoop if their companies were to implement it. Each respondent was allowed up to four responses. A total of 36 percent named a "queryable archive for non-traditional data," while 33 percent chose a "computational platform and sandbox for advanced analytics." In comparison, 46 percent named "warehouse extensions." Also showing up on the list was "archiving traditional data," at 19 percent.

• For its part, non-traditional data extends way beyond text data such as social media tweets and emails. For data input such as maps, audio, video, and medical images, deep learning techniques are also required. These techniques create layer upon layer of neural networks to analyze complex data shapes and patterns, improving their accuracy rates by being trained on representative data sets.

• Deep learning techniques are already used in classification applications such as voice and facial recognition and in predictive analytics techniques based on those methods. For instance, to monitor viewers' reactions to TV show trailers and decide which TV programs to run in various world markets, BBC Worldwide has developed an emotion detection application. The application leverages an offshoot of facial recognition called face tracking, which analyzes facial movements. The point is to predict the emotions that viewers would experience when watching the actual TV shows.

Example: The (Future) Brains Behind Self-Driving Cars

Much research is now focused on self-driving cars, another deep learning application which uses predictive analytics and other types of advanced analytics. For instance, to be safe enough to drive on a real roadway, autonomous vehicles need to predict when to slow down or stop because a pedestrian is about to cross the street.

Beyond issues related to the development of adequate machine vision cameras, building and training neural networks which can produce the needed degree of accuracy presents a set of unique challenges.

• Clearly, a representative data set would have to include an adequate amount of driving, weather, and simulation patterns. This data has yet to be collected, however, partly due to the expense of the endeavour, according to Carl Gutierrez of consultancy and professional services company Altoros.


• Other barriers that come into play include the levels of complexity and
computational powers of today's neural networks. Neural networks need
to obtain either enough parameters or a more sophisticated architecture
to train on, learn from, and be aware of lessons learned in autonomous
vehicle applications. Additional engineering challenges are posed by
scaling the data set to a massive size.

2.8 OPTIMIZATION

Use of simulation techniques to identify scenarios which will produce the best results, e.g. sale price optimization, or identifying the optimal inventory for maximum fulfilment while avoiding stock-outs.

An optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics. More generally, optimization includes finding "best available" values of some objective function given a defined domain (or input), including a variety of different types of objective functions and different types of domains.

Optimization is an important tool in making decisions and in analyzing physical systems. In mathematical terms, an optimization problem is the problem of finding the best solution from among the set of all feasible solutions.

1. Constructing a Model

The first step in the optimization process is constructing an appropriate model; modeling is the process of identifying and expressing in mathematical terms the objective, the variables, and the constraints of the problem.

• An objective is a quantitative measure of the performance of the system that we want to minimize or maximize. In manufacturing, we may want to maximize the profits or minimize the cost of production, whereas in fitting experimental data to a model, we may want to minimize the total deviation of the observed data from the predicted data.


• The variables or the unknowns are the components of the system for
which we want to find values. In manufacturing, the variables may be the
amount of each resource consumed or the time spent on each activity,
whereas in data fitting, the variables would be the parameters of the
model.

The constraints are the functions that describe the relationships among the
variables and that define the allowable values for the variables. In
manufacturing, the amount of a resource consumed cannot exceed the
available amount.
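Putting the three pieces together, here is a minimal sketch of a production-planning model expressed as a linear program using scipy's linprog; all profit figures and resource limits are invented:

    # Objective: maximize 40x + 30y (profit per unit of products x and y).
    # Variables: production quantities x and y.
    # Constraints: limited machine hours and labour hours.
    from scipy.optimize import linprog

    profit = [-40, -30]              # linprog minimizes, so negate the profits
    A_ub = [[1, 2],                  # machine hours used per unit of x and y
            [3, 1]]                  # labour hours used per unit of x and y
    b_ub = [100, 150]                # hours available of each resource

    result = linprog(profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(result.x, -result.fun)     # optimal quantities and the maximum profit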

2. Determining the Problem Type

The second step in the optimization process is determining in which category of optimization your model belongs. The page Types of Optimization Problems provides some guidance to help you classify your optimization model; for the various optimization problem types, there is a linked page with some basic information, links to algorithms and software, and online and print resources. For an alphabetical listing of all of the optimization problem types, see Optimization Problem Types: Alphabetical Listing.

3. Selecting Software

The third step in the optimization process is selecting software appropriate for the type of optimization problem that you are solving. Optimization software comes in two related but very different kinds of packages:

• Solver software is concerned with finding a solution to a specific instance of an optimization model. The solver takes an instance of a model as input, applies one or more solution methods, and returns the results.

• Modeling software is designed to help people formulate optimization models and analyze their solutions. A modeling system takes as input a description of an optimization problem in a symbolic form and allows the solution output to be viewed in similar terms; conversion to the forms required by the algorithm(s) is done internally. Modeling systems vary in the extent to which they support importing data, invoking solvers, processing results, and integrating with larger applications. Modeling systems are typically built around a modeling language for representing the problem in symbolic form. The modeling language may be specific to the system or adapted from an existing programming or scripting language.

What is optimization and how it improves planning outcomes

Ask yourself a simple question: when did you last use car navigation? You will have no problem answering. Your car navigation system is an example of optimization.

Optimization models are built to achieve a goal while considering constraints and variables. With your car navigation example, the goal is the destination, the constraints are the limited roadways, and the variables might be traffic or road closures. After considering traffic and various routes, your car navigation suggests the best path forward.

Optimization has a range of applications from investment strategy -- what is the right portfolio mix for my retirement? -- to planning a vacation -- what should our schedule be at Disneyland? It is best applied when you're deciding among a large number of alternatives and more valuable when those alternatives can lead to significantly different outcomes.

With car navigation it's easy to see how optimization may help you get to
the office on time, but now let's explore how you can use it when you get
there.

One of the areas where optimization can have significant impact is planning. Today, organizations face a range of complex planning questions which require blending top-down (strategic) and bottom-up (tactical) planning data and expertise from across their business units. These planning decisions can have significant short and long term impacts, putting pressure on organizations to go from a plan to the right plan.

Take production planning, for example. Production planning involves questions such as what product to produce, when to produce it, how much to produce, and where to produce it.


Businesses need to respond to demand with optimal production cost, speed, and flexibility in order to maximize profit. To answer these planning questions, businesses face a series of hurdles. First, they need to assemble data from sales, marketing, operations, and finance into a single view in order to get a true picture of their business. Second, to make the right decisions they need an easy way to evaluate options by exploring trade-offs and asking "what if" questions. Lastly, they need to act, which requires identifying the best path forward. Underpinning these activities is the need to be responsive and fast.

Planning Analytics is the planning, budgeting, and forecasting backbone for more than half of the global Fortune 500. It unifies planning data from across the business, linking operational drivers to financial outcomes. This provides insight into business drivers and opportunities to adapt plans to changing business conditions.

The planner no longer has the tedious task of a trial-and-error approach, instead exploring scenarios with a click of a button in a fraction of the time, accelerating planning cycles and improving plan quality.
Organizations can capture more value in the marketplace by improving
operations, save money by managing resources more effectively, and
mitigate risk by gaining insight into how decisions can impact their
business.

2.9 VISUALIZATION

Enhanced exploratory data analysis and output of modeling results with highly interactive statistical graphics.

Visualization techniques have been used by successful people to visualize their desired outcomes for ages. The practice has even given some high achievers what seems like super-powers, helping them create their dream lives by accomplishing one goal or task at a time with hyper focus and complete confidence. In fact, we all have this awesome power, but most of us have never been taught to use it effectively.

Elite athletes use it. The super-rich use it. And peak performers in all fields
now use it. That power is called visualization.


The daily practice of visualizing your dreams as already complete can rapidly accelerate your achievement of those dreams, goals, and ambitions. Using visualization techniques to focus on your goals and desires yields four very important benefits.

1) It activates your creative subconscious which will start generating creative ideas to achieve your goal.

2) It programs your brain to more readily perceive and recognize the resources you will need to achieve your dreams.

3) It activates the law of attraction, thereby drawing into your life the
people, resources, and circumstances you will need to achieve your goals.

4) It builds your internal motivation to take the necessary actions to achieve your dreams.

Visualization is really quite simple. You sit in a comfortable position, close your eyes and imagine — in as vivid detail as you can — what you would be looking at if the dream you have were already realized. Imagine being inside of yourself, looking out through your eyes at the ideal result.

Visualize with the ‘Mental Rehearsal’ Technique

For athletes, the visualization process is called “mental rehearsal,” and they have been using these exercises since the 1960s, when we learned about it from the Russians.

All you have to do is set aside a few minutes a day. The best times are
when you first wake up, after meditation or prayer, and right before you go
to bed. These are the times you are most relaxed.


Go through the following three steps:

• STEP 1. Imagine sitting in a movie theatre, the lights dim, and then the
movie starts. It is a movie of you doing perfectly whatever it is that you
want to do better. See as much detail as you can create, including your
clothing, the expression on your face, small body movements, the
environment and any other people that might be around. Add in any
sounds you would be hearing — traffic, music, other people talking,
cheering. And finally, recreate in your body any feelings you think you
would be experiencing as you engage in this activity.

• STEP 2. Get out of your chair, walk up to the screen, open a door in the
screen and enter into the movie. Now experience the whole thing again
from inside of yourself, looking out through your eyes. This is called an
“embodied image” rather than a “distant image.” It will deepen the
impact of the experience. Again, see everything in vivid detail, hear the
sounds you would hear, and feel the feelings you would feel.

• STEP 3. Finally, walk back out of the screen that is still showing the
picture of you performing perfectly, return to your seat in the theatre,
reach out and grab the screen and shrink it down to the size of a cracker.
Then, bring this miniature screen up to your mouth, chew it up and
swallow it. Imagine that each tiny piece — just like a hologram —
contains the full picture of you performing well. Imagine all these little
screens traveling down into your stomach and out through the
bloodstream into every cell of your body. Then imagine that every cell of
your body is lit up with a movie of you performing perfectly. It’s like one
of those appliance store windows where 50 televisions are all tuned to
the same channel.

When you have finished this process — it should take less than five
minutes — you can open your eyes and go about your business. If you
make this part of your daily routine, you will be amazed at how much
improvement you will see in your life.


• Create Goal Pictures

Another powerful visualization technique is to create a photograph or picture of yourself with your goal, as if it were already completed. If one of
your goals is to own a new car, take your camera down to your local auto
dealer and have a picture taken of yourself sitting behind the wheel of your
dream car. If your goal is to visit Paris, find a picture or poster of the Eiffel
Tower and cut out a picture of yourself and place it into the picture.

• Create a Visual Picture and an Affirmation for Each Goal


It is recommended that you find or create a picture of every aspect of your
dream life. Create a picture or a visual representation for every goal you
have — financial, career, recreation, new skills and abilities, things you
want to purchase, and so on.

Example: When we were writing the very first Chicken Soup for the Soul book, we took a copy of the New York Times best seller list, scanned it into our computer, and using the same font as the newspaper, typed Chicken Soup for the Soul into the number one position in the “Paperback Advice, How-To and Miscellaneous” category. We printed several copies and hung them up around the office. Less than two years later, our book was the number one book in that category and stayed there for over a year. Now that’s a pretty solid example of a successful visualization technique!


2.10 SUMMARY

The components of business analytics include:

• Data Aggregation

Before data can be analysed, it must be collected, centralized, cleaned to avoid duplication, and filtered to remove inaccurate, incomplete, and unusable data. Data can be aggregated from:

Transactional records: Records that are part of a large dataset shared by an organization or by an authorized third party (banking records, sales records, and shipping records).

Volunteered data: Data supplied via a paper or digital form that is shared
by the consumer directly or by an authorized third party (usually personal
information).

• Data Mining

In the search to reveal and identify previously unrecognized trends and patterns, models can be created by mining through vast amounts of data. Data mining employs several statistical techniques to achieve clarification, including:

Classification: Used when variables such as demographics are known and can be used to sort and group data

Regression: A function used to predict continuous numeric values, based on extrapolating historical patterns

Clustering: Used when factors used to classify data are unavailable, meaning patterns must be identified to determine what variables exist
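As a small illustration of clustering, here is a sketch with scikit-learn's k-means; the customer records (annual spend, visits per month) are invented:

    # k-means groups unlabelled records into segments by similarity.
    from sklearn.cluster import KMeans

    customers = [[200, 1], [220, 2], [2400, 12], [2600, 10], [250, 1], [2500, 11]]
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print(kmeans.labels_)            # segment assigned to each customer
    print(kmeans.cluster_centers_)   # the discovered segment profiles

No labels are supplied in advance; the algorithm itself identifies the pattern separating low-spend from high-spend customers.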


• Association and Sequence Identification

In many cases, consumers perform similar actions at the same time or


perform predictable actions sequentially. This data can reveal patterns such
as:

Association: For example, two different items frequently being purchased in the same transaction, such as multiple books in a series or a toothbrush and toothpaste.

Sequencing: For example, a consumer requesting a credit report followed by asking for a loan, or booking an airline ticket followed by booking a hotel room or reserving a car.

• Text Mining

Companies can also collect textual information from social media sites,
blog comments, and call center scripts to extract meaningful relationship
indicators. This data can be used to:

Develop in-demand new products
Improve customer service and experience
Review competitor performance

• Forecasting

A forecast of future events or behaviours based on historical data can be created by analyzing processes that occur during a specific period or season. For example:

Energy demands for a city with a static population in any given month or
quarter

Retail sales for holiday merchandise, including biggest sales days for both
physical and digital stores

Spikes in internet searches related to a specific recurring event, such as the Super Bowl or the Olympics


• Predictive Analytics

Companies can create, deploy, and manage predictive scoring models, proactively addressing events such as:

Customer churn, with specificity narrowed down to customer age bracket, income level, lifetime of existing account, and availability of promotions

Equipment failure, especially in anticipated times of heavy use or if subject to extraordinary temperature/humidity-related stressors

Market trends, including those taking place entirely online, as well as patterns which may be seasonal or event-related

• Optimization

Companies can identify best-case scenarios and next best actions by developing and engaging simulation techniques, including:

Peak sales pricing and using demand spikes to scale production and
maintain a steady revenue flow
Inventory stocking and shipping options that optimize delivery schedules
and customer satisfaction without sacrificing warehouse space

Prime opportunity windows for sales, promotions, new products, and spin-
offs to maximize profits and pave the way for future opportunities

• Data Visualization

Information and insights drawn from data can be presented with highly
interactive graphics to show:

Exploratory data analysis
Modeling output
Statistical predictions

These data visualization components allow organizations to leverage their data to inform and drive new goals for the business, increase revenues, and improve consumer relations.


2.11 SELF ASSESSMENT QUESTIONS:


1. What are the main components in business analytics? Explain

2. Write a short note on: Association and Sequence Identification.

3. Explain in short: The text mining process

4. What is predictive analytics?

5. Explain: Forecasting

2.12 MULTIPLE CHOICE QUESTIONS:

1. The process of gathering data and presenting it in a summarized format is called ……………..
a. Data analysis
b. Data aggregation
c. Optimisation
d. Data visualisation

2. Two different items frequently being purchased in the same transaction, such as multiple books in a series or a toothbrush and toothpaste: the data collected forms part of a process called…………….
a. Association and Sequence Identification
b. Sequence Identification
c. Association Identification
d. Forecasting

3. What is the use of Optimization as one of the tools in data analytics?
a. Making decisions
b. Analyzing physical systems
c. Data segregation and clubbing
d. Making decisions and in analyzing physical systems


4. Prior to the development of predictive analytics models, ------------------ is typically performed to help determine which variables and patterns to consider in building the model.
a. Data mining
b. Data collection
c. Data segregation
d. Cleaning of data collected

5. What is a significant challenge in the clustering process?
a. To form meaningful clusters from the unlabelled textual data
without having any prior information on them.
b. To extract relevant and associated patterns based on a specific set of
words or phrases.
c. To extract meaningful information from vast chunks of textual data.
d. To automatically generate a compressed version of a specific text that
holds valuable information for the end-user

Answers: 1. (b), 2. (c), 3. (d), 4. (a), 5. (a)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2


Chapter 3
Digital Data and its Types
Objectives:

On completion of this chapter, you will understand what digital data is, the types of digital data, their sources and storage, the characteristics of structured, unstructured and semi-structured data, and OLAP v/s OLTP with the data models related to them, considering the following:

Structure:

3.1 Introduction

3.2 Digital media and it’s different forms

3.3 Data types for digital Media products

3.4 Digital data sources

3.5 Digital data storage

3.6 Characteristics of Structured, Unstructured and semi-structured data

3.7 OLAP (online analytical processing)

3.8 OLTP (Online Transaction Processing)

3.9 Summary

3.10 Self Assessment Questions

3.11 Multiple Choice Questions


3.1 INTRODUCTION:

Digital data is data that represents other forms of data using specific machine language systems that can be interpreted by various technologies. The most fundamental of these systems is the binary system, which simply stores complex audio, video or text information as a series of binary characters, traditionally ones and zeros, or “on” and “off” values.

One of the biggest strengths of digital data is that all sorts of very complex analog input can be represented with the binary system. Along with smaller microprocessors and larger data storage centres, this model of information capture has helped parties like businesses and government agencies to explore new frontiers of data collection and to represent more impressive simulations through a digital interface.

From the earliest primitive digital data designs to today's highly sophisticated and massive volumes of binary data, digital data seeks to capture elements of the physical world and simulate them for technological use. This is done in many different ways, but with specific techniques for capturing real-world events and converting them into digital form. One simple example is the conversion of a physical scene into a digital image. In this, new digital data is somewhat similar to older data systems that converted a physical view or scene to chemical film. One of the major differences is that digital data records visual information in a bitmap, or pixelated map, which stores the particular colour property for each bit on a precise and sophisticated grid. Through this straightforward system of data transfer, the digital image is created. Similar techniques are used to record audio streams in digital form.
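A minimal sketch of this binary principle in Python, showing how text and a pixel's colour reduce to ones and zeros (the values are arbitrary examples):

    # Any text reduces to a series of binary characters.
    text = "Hi"
    print(" ".join(format(byte, "08b") for byte in text.encode("utf-8")))
    # prints: 01001000 01101001

    # A pixel's colour property can be stored the same way, e.g. pure red in RGB:
    red_pixel = (255, 0, 0)
    print(" ".join(format(channel, "08b") for channel in red_pixel))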


3.2 DIGITAL MEDIA AND ITS DIFFERENT FORMS:

Digital data can be defined in short as data that represents other forms of data using specific machine language systems that can be interpreted by various technologies. The most fundamental of these systems is a binary system, which simply stores complex audio, video or text information in a series of binary characters, traditionally ones and zeros, or "on" and "off" values.

Digital media is digitised content that can be transmitted over the internet or a computer network. This can include text, audio, video and graphics. Following are the forms of digital media:

i. E-Music is audio accessed from the internet. It has been made possible using compressed file formats such as MP3. E-music has allowed people to easily download music from the internet and copy music from a CD to a magnetic disc. However, it has also allowed people around the world to make illegal copies of music. E-music can be played back using a media player on the computer or using an MP3 playback device.

ii. Digital newspapers provide information on stories of special interest from the internet. They provide the latest news, as stories are being constantly updated. Digital newspapers can e-mail subscribers a page of news headlines on the areas they nominate. Each item of the text is linked to the full story on the website.

iii. Digital Television provides interactive television. It merges communication, television and computer technologies. Digital television offers more channel choices, a higher resolution screen and greater control over the programs the user is watching. For example, a user watching a football match could choose to watch a particular football player rather than rely on the decisions made by the producer.

iv. Electronic games are played using a game machine displayed on a television or using gaming software on a computer. The latest games have different levels and are becoming more realistic. Games are also designed for multiple players, who can interact and compete with each other. The internet is commonly used to play games.


Thus, audio, text, video and graphics in digital format can be stored on a server, hard drive or mobile device. One of the main advantages of this is that it reduces cost. This includes the cost to produce, deliver and store the physical formats that contain movies, TV shows and music. The production cost is reduced by eliminating the factories that manufacture the discs that our media is stored on today. These costs will be replaced by the cost to host downloads of the content or stream it from the cloud. While there is cost involved in hosting the content, it is far less than the cost to build the factories, train workers, and ship in the raw material to make the discs. There is also cost involved in shipping the discs to relatives and friends. With digital media, a corrupted file can simply be redistributed with no extra cost. Another advantage is that digital media is compatible with different pieces of hardware, while physical media are limited to just a few that are compatible. This means that there is more freedom of choice for customers on how they view media content, whether it is from a computer, TV or mobile device. There is much flexibility with digital formats over physical ones.

3.3 DATA TYPES FOR DIGITAL MEDIA PRODUCTS:

Following are the data types that are used for digital media products:

• Text and hypertext – of all the data types, text requires the least
amount of storage and processing power in a computer. Common text
formats include Word documents, PDF documents, HTML, and Text
documents. Hypertext is text that contains a link to other information or
files (such as a web link).

• Audio – can be stored in many different formats that each have their own advantages and disadvantages. Digitised sound takes small pieces or ‘samples’ of a sound and stores them all digitally. The quality of the sound depends on how fast the samples are taken (sample rate) and the bits available for storage (sample size). A music CD is sampled at 44 kHz, which means it has 44,100 samples per second, and it has a sample size of 16 bits (a worked storage calculation follows this list). MIDI files are smaller than normal audio files as they do not actually record human speech or other sounds but simply store information about an instrument as well as the pitch, timing, and duration of notes. Common file formats for digitised sound include MP3, WAV, and WMA. Some file formats such as MP3 can compress sound to make files smaller in size but can reduce the quality of the sound. Audio file formats that lose quality when compressed are called lossy audio files (such as MP3 and AAC). Audio file formats that retain quality when compressed are called lossless audio files (such as WAV and FLAC).

• Graphics – there are two main types of graphics: bitmap and vector. For each type of graphics there are also different formats. For example, digital cameras use bitmapped images and store photos in the JPEG format. Drawing programs such as Illustrator can store vector images in formats such as SVG. Other programs such as Photoshop, Paint and SketchUp also have their own graphics formats.

• Video – there are many different video file formats available today and each one has its own advantages and disadvantages for different applications. Some video formats may be more suitable for web streaming while others may be more suitable for televisions or mobile devices. Common video formats include MP4, AVI, WMV, FLV and QuickTime (MOV).

• Animation – there are several animation file formats that are used by
different animation authoring programs such as the SWF format used by
Adobe Flash. However, animations can be exported to file formats that
are also used for video such as MP4 and FLV.

• Video productions – created using video clips (or video sequences) from a digital video camera and video editing software. It involves adding video clips, audio or pictures to a timeline or storyline. Each video clip is made up of frames that contain individual images. When the video is played, the frames are displayed in sequence. Some common file formats for video are MPEG, AVI, DivX, WMV and MOV.
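To make the audio figures above concrete, here is a worked calculation of the storage one minute of CD-quality stereo sound requires, using the sample rate and sample size quoted in the audio item:

    # One minute of CD audio: 44,100 samples/s, 16 bits per sample, 2 channels.
    sample_rate = 44_100      # samples per second
    sample_size = 16          # bits per sample
    channels = 2              # stereo
    seconds = 60

    bits = sample_rate * sample_size * channels * seconds
    print(bits / 8 / 1_000_000, "MB per minute")   # roughly 10.6 MB

This footprint is why compressed, lossy formats such as MP3 matter: they cut the size by roughly an order of magnitude at the cost of some quality.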


How data types combine to produce and enhance a digital media product:

Audio plays a big role in many different applications, from movies/videos to games. Adding audio to computer games makes them more enjoyable; even if it is only background music, it can make the game more appealing to play.

Animation software can be the most complicated and difficult software to learn and use. With animation, you can create anything from GIF icons for webpages to feature-length animated movies such as Shrek and Finding Nemo. Selected animations, even simple ones, can add impact to a webpage or a presentation.

Video software allows you to cut, copy, and paste video and audio
sequences. It can add many effects such as titles, fades and wipes
between scenes. Most digital video media is intended for storage and
distribution on CDs or DVDs, or even for broadcasting on TV. Distributing
video media over the internet is very difficult as only users who have high-
speed broadband internet connections can hope to receive high quality
images.

3.4 DIGITAL DATA SOURCES:

What would happen to your digital business if the data that feeds it were suddenly unavailable? Let’s look at the various sources you may use, and how to secure them.

Your data-driven business is fuelled by insights obtained from the processing and analysis of data from various sources. Have you considered what would happen if one -- or several -- of the data sources that feed this digital engine were to dry up? If you were suddenly unable to access the critical data that makes your business run?

Let's look at where your data comes from, and consider which concrete
actions you can take to secure its supply.


i. Internal transactional data

Internal data, and especially data whose primary purpose is distinct from the use your digital business makes of it, is both the easiest and the trickiest to secure. It's easy because you don't have to negotiate a formal contract with a third party, and if there is executive buy-in for what you do, then getting the data owner to provide access should not be a problem. But it's also tricky precisely because of this lack of formal contract, because people change, because priorities shift. Whether accidental or not, you may find your access cut off overnight, and the restoration of this access may not be a top priority for the data owner. Or data schemas may change and require that you rebuild your entire collection processes.

Action: make sure the proper processes and SLAs are in place, and closely follow organization and staff movements to inform new stakeholders of why your access to data must remain safe.

ii. Connected objects data

If you process data from the Internet of Things, and especially consumer connected devices, your challenge in securing access is primarily legal. There are two questions you need to consider:

• Who owns the data? Does it belong to the owner of the device, the
account holder, or to your organization?

• What can you do with the data? Surely, you can use it to render a
service to your subscriber, but can you aggregate it with data from other
subscribers? Can you resell this data (anonymized or not)? Can you
derive insights, and resell this insight?

Action: review your terms of use and ensure these questions are being
addressed. Also consider whether privacy laws and customs in various
countries or regions may have an impact.

iii. Syndicated data

Syndicated data is usually the easiest to control. Because you are paying a service provider to deliver data to you, you have a contract with this provider. This contract will cover service level agreements, licensing and
usage limitations, and should ensure continued access.

However, you still need to consider what will happen if the service provider
goes out of business, or changes its business model (like Twitter's recent
announcement that they are shutting down their firehose to better control
their supply chain).

Action: review if alternate sources are available, and keep these options at
hand in case you need them.

iv. Trading partners data

The case of trading partners data is very similar to that of syndicated data, except that the data is usually not provided as a standalone service but as part of a broader relationship -- for example between a retailer and a manufacturer. Enforcing service level agreements can become tricky if it puts at risk an otherwise profitable relationship.

Action: like you do for syndicated data, always have in mind alternate
sources, if applicable.

v. Open data

The good news with open data is that it's free -- but that's also the bad news. Assuming you study carefully the terms of use and licensing agreement for the data, you should be safe legally. But there is no guarantee that this service will be provided in the long run, or that it will be provided consistently. The risk of changes in the data structures and the access methods provided is very high. And if the service is not responding, you have no recourse.

Action: find multiple sources, and do not build your business on the
assumption that open data feeds will remain available in the long run.


vi. Harvested data

Harvesting data from web sites (screen scraping) or public APIs is common
practice, but it is also the least secure source of data you can consider.

From the legal standpoint, this practice is often borderline since there is no
licensing agreement that permits you to use the data harvested in such
ways.

From the data availability standpoint, web sites change all the time, and
your scraping routines will become obsolete in no time.

Action: stay away from data harvesting! And if data harvesting is your
only option, be prepared to suffer outages, and to have to redevelop your
routines all the time. And maybe get a lawyer.

3.5 DIGITAL DATA STORAGE:

Digital Data Storage (DDS) is a format for storing and backing up computer
data on tape that evolved from the Digital Audio Tape (DAT) technology.
DAT was created for CD-quality audio recording. In 1989, Sony and
Hewlett Packard defined the DDS format for data storage using DAT tape
cartridges. Tapes conforming to the DDS format can be played by either
DAT or DDS tape drives. However, DDS tape drives cannot play DAT tapes
since they can't pick up the audio on the DAT tape.

DDS uses a 4-mm tape. A DDS tape drive uses helical scanning for
recording, the same process used by a video recorder (VCR). There are two
read heads and two write heads. The read heads verify the data that has
been written (recorded). If errors are present, the write heads rewrite the
data. When restoring a backed-up file, the restoring software reads the
directory of files located at the beginning of the tape, winds the tape to the
location of the file, verifies the file, and writes the file onto the hard drive.
DDS cannot update a backed-up file in the same place it was originally
recorded. In general, DDS requires special software for managing the
storage and retrieval of data from DDS tape drives.


There are four types of DDS drives:

1. DDS-1: Stores up to 2 gigabytes of uncompressed data on a 120-minute cartridge.

2. DDS-2: Stores up to 8 GB of data in compressed format on a 120-minute cartridge. DDS-2 is ideal for small network servers.

3. DDS-3: Stores up to 24 GB of data on a 125-minute cartridge. The DDS-3 drive is ideal for medium-sized servers. DDS-3 uses PRML (Partial Response Maximum Likelihood). PRML eliminates electronic noise for a cleaner data recording.

4. DDS-4: The newest DDS drive, DDS-4 stores up to 40 GB of data on a 125-minute cartridge. Small to mid-size businesses benefit from the DDS-4 drive.

A DDS cartridge needs to be retired after 2,000 passes or 100 full backups.
You should clean your DDS tape drive every 24 hours with a cleaning
cartridge and discard the cleaning cartridge after 30 cleanings. DDS tapes
have an expected life of at least 10 years.

3.6 CHARACTERISTICS OF STRUCTURED, UNSTRUCTURED AND SEMI-STRUCTURED DATA:

Big Data includes huge volume, high velocity, and an extensible variety of data. It comes in 3 types: Structured data, Semi-structured data, and Unstructured data. In computer science, a data structure is a particular way of organising and storing data in a computer such that it can be accessed and modified efficiently. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

For the analysis of data, it is important to understand that there are three common types of data structures:


1. Structured data –
Structured data is data whose elements are addressable for effective analysis. It has been organized into a formatted repository that is typically a database. It concerns all data which can be stored in an SQL database in a table with rows and columns. Such data has relational keys and can easily be mapped into pre-designed fields. Today, this is the form of data most processed in application development and the simplest way to manage information. Example: Relational data.

In other words, structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyse. Structured data conforms to a tabular format with relationships between the different rows and columns. Common examples of structured data are Excel files or SQL databases. Each of these has structured rows and columns that can be sorted.

Structured data depends on the existence of a data model – a model of how data can be stored, processed and accessed. Because of a data model, each field is discrete and can be accessed separately or jointly along with data from other fields. This makes structured data extremely powerful: it is possible to quickly aggregate data from various locations in the database.

Structured data is considered the most ‘traditional’ form of data storage, since the earliest versions of database management systems (DBMS) were able to store, process and access structured data.
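A minimal sketch of structured data in practice, using Python's built-in sqlite3 module; the table and its values are invented:

    # Rows and columns under a pre-defined schema, queried with SQL.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("East", 1200.0), ("West", 950.5), ("East", 430.0)])
    for row in con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
        print(row)   # aggregation is quick because every field is discrete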


2. Semi-Structured data –

Semi-structured data is information that does not reside in a relational database but that has some organizational properties that make it easier to analyze. With some processing, you can store it in a relational database (though this could be very hard for some kinds of semi-structured data), but semi-structured forms exist to ease that effort. Example: XML data.

Thus, semi-structured data is a form of structured data that does not conform with the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as a self-describing structure. JSON and XML are common examples of semi-structured data.

The reason that this third category exists (between structured and unstructured data) is because semi-structured data is considerably easier to analyse than unstructured data. Many Big Data solutions and tools have the ability to ‘read’ and process either JSON or XML. This reduces the complexity of analysing semi-structured data, compared to unstructured data.
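A minimal sketch of reading a semi-structured JSON record in Python; the record itself is invented:

    # The record carries its own tags, so fields can be read without
    # a fixed relational schema.
    import json

    record = '{"customer": "A101", "orders": [{"sku": "X1", "qty": 2}]}'
    data = json.loads(record)
    print(data["customer"])         # navigate by tag name
    for order in data["orders"]:    # nested, hierarchical records
        print(order["sku"], order["qty"])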

3. Unstructured data –

Unstructured data is data that is not organized in a pre-defined manner or does not have a pre-defined data model, and thus it is not a good fit for a mainstream relational database. So for unstructured data, there are alternative platforms for storing and managing it; it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Example: Word, PDF, Text, Media logs.

In short, unstructured data is information that either does not have a
predefined data model or is not organised in a pre-defined manner.
Unstructured information is typically text-heavy, but may contain data such
as dates, numbers, and facts as well. This results in irregularities and
ambiguities that make it difficult to understand using traditional programs,
as compared to data stored in structured databases. Common examples of
unstructured data include audio and video files or NoSQL databases.


The ability to store and process unstructured data has grown greatly in
recent years, with many new technologies and tools coming to the market
that are able to store specialised types of unstructured data. MongoDB, for
example, is optimised to store documents; Apache Giraph, as an opposite
example, is optimised for storing relationships between nodes.

The ability to analyse unstructured data is especially relevant in the
context of Big Data, since a large part of the data in organisations is
unstructured. Think about pictures, videos or PDF documents. The ability to
extract value from unstructured data is one of the main drivers behind the
quick growth of Big Data.

Differences between Structured, Semi-structured and Unstructured data:

• Technology: Structured data is based on relational database tables;
  semi-structured data is based on XML/RDF; unstructured data is based on
  character and binary data.
• Transaction management: Structured data has matured transaction
  management and various concurrency techniques; for semi-structured data,
  transactions are adapted from the DBMS and are not matured; unstructured
  data has no transaction management and no concurrency.
• Version management: Structured data supports versioning over tuples,
  rows and tables; for semi-structured data, versioning over tuples or
  graphs is possible; unstructured data is versioned as a whole.
• Flexibility: Structured data is schema-dependent and less flexible;
  semi-structured data is more flexible than structured data but less
  flexible than unstructured data; unstructured data is very flexible, as
  there is no schema at all.
• Scalability: Scaling a structured database schema is very difficult;
  scaling semi-structured data is simpler; unstructured data is very
  scalable.
• Robustness: Structured data technology is very robust; semi-structured
  technology is newer and not yet widespread; robustness is not defined for
  unstructured data.
• Query performance: Structured queries allow complex joins; for
  semi-structured data, queries over anonymous nodes are possible; for
  unstructured data, only textual queries are possible.

Metadata – Data about Data

A final category of data type is metadata. From a technical point of view,
this is not a separate data structure, but it is one of the most important
elements for Big Data analysis and Big Data solutions. Metadata is data
about data: it provides additional information about a specific set of data.

In a set of photographs, for example, metadata could describe when and
where the photos were taken. The metadata then provides fields for dates
and locations which, by themselves, can be considered structured data. For
this reason, metadata is frequently used by Big Data solutions for initial
analysis.
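A tiny sketch of this idea in Python (the file names, field names and values are hypothetical): the photos themselves are unstructured, but their metadata behaves like structured data and can be filtered directly, before any image content is analysed.

```python
# Hypothetical photo metadata: structured fields attached to
# unstructured files, so the collection can be filtered immediately.
photos = [
    {"file": "img_001.jpg", "taken": "2019-07-04", "place": "Florida"},
    {"file": "img_002.jpg", "taken": "2019-09-12", "place": "Iowa"},
]
july = [p["file"] for p in photos if p["taken"].startswith("2019-07")]
print(july)   # ['img_001.jpg']
```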

A. Different shapes of data:

Data manifests itself in many different shapes. Each shape of the data may
hold much value to the business; in some shapes this is easier to extract
than in others. Different shapes of data require different storage solutions
and should therefore be dealt with in different ways. We can distinguish
between three shapes of data, as under:

1. Unstructured data:
Unstructured data is the rawest form of data. It can be any type of file,
e.g. texts, pictures, sounds or videos. This data is often stored in a
repository of files; think of this as a very well organised directory on
your computer hard drive. Extracting value out of this shape of data is
often the hardest, since you first need to extract structured features that
describe or abstract the data. For example, to use a text you might want
to extract its topics, and whether the text is positive or negative about
them.


2. Structured data:
Structured data is tabular data (rows and columns) which is very well
defined, meaning that we know which columns are there and what kind of
data each contains. Such data is often stored in a database. In databases,
we can use the power of SQL to answer queries about the data and easily
create data sets to use in data science solutions.

3. Semi-structured data:

Semi-structured data sits anywhere between structured and unstructured
data. A consistent format is defined, but the structure is not very strict:
it is not necessarily tabular, and parts of the data may be incomplete or
of different types. Semi-structured data is often stored as files. However,
some kinds of semi-structured data (like JSON or XML) can be stored in a
document-oriented database, and such databases allow you to query the
semi-structured data.

You can find these three shapes of data within the organisation, but you
can also find them in external data sources like the internet. You may also
combine shapes of data from different sources into a single source.


B. Data sources within the organisation (internal):

The first place to look for data is within the organisation. Most
organisations have ERP, CRM and workflow management systems. These
systems often use a database to store the data in a structured way. These
databases contain huge amounts of data from which you can extract value.
For example, from the workflow management system you can easily get
insight into bottlenecks in business processes, or by using data from the
ERP system you can make sales predictions.

So far we have looked into structured data sources within the organisation,
but what about unstructured data? Many organisations receive and send a lot
of documents, pictures, sounds or videos. You can probably imagine that,
for example, an insurance company receives a large number of claims on
paper or in PDF, possibly with pictures attached. These files are manually
transformed into a more structured format before processing; however, in
this transformation some information is lost. When you are trying to
improve your data science, you could use these files to extract additional
data, such as a situational sketch – for example, fraudulent-claim
detection might be improved using this additional data.

• External Data Source:



The real fun starts when we enrich the organisation's data with external
data sources. We distinguish four kinds of external sources. The most
obvious are publicly available data sets: government organisations often
release demographic and economic data sets every year. An example of such
data is population per km² per region.

Second, there are companies that have made it their core business to
collect, estimate and sell data. We have worked with data sets from such
companies, and they contain information such as the net income at an
address, the size of the house and even the probability that the household
has a dog. We can use this data to enrich the organisation's data and
improve customer profiles. Could we use this data to predict the credit
risk of our customers?

Third, many websites these days provide APIs which allow programmers to
build interactive apps on their platform, e.g. Twitter, Facebook, LinkedIn,
etc. However, such APIs can also be used to collect data. In the case of
Twitter, you can request all tweets which contain a certain hashtag.
Customer support software is often able to extract social media feeds using
these APIs and perform sentiment analysis. Sentiment analysis is a method
to determine whether a text is positive or negative about a topic. Using
this method, a customer support division can efficiently focus on
unsatisfied customers.
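A hedged sketch of the idea in Python follows. The URL, the response shape and the word list are all hypothetical stand-ins (no real social-media API is named here), and the sentiment scoring is a deliberately naive illustration, not a production method.

```python
import requests

# Hypothetical API: assume it returns a JSON list of {"text": ...} posts
# matching a hashtag. The endpoint and parameters are invented.
resp = requests.get("https://api.example.com/posts",
                    params={"hashtag": "#ourbrand"}, timeout=10)
posts = resp.json()

# Naive sentiment flagging: mark posts containing negative words so the
# customer support division can focus on unsatisfied customers.
NEGATIVE = {"bad", "broken", "refund", "angry", "worst"}
unhappy = [p["text"] for p in posts
           if NEGATIVE & set(p["text"].lower().split())]
print(f"{len(unhappy)} posts flagged for customer support follow-up")
```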

Last but not least is scraping. With scraping you extract relevant data
from an unstructured data source; you are able to extract anything you can
see on a website.
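For illustration, here is a minimal scraping sketch in Python using the requests and BeautifulSoup libraries (the URL is a placeholder, and which elements you extract depends entirely on the page you are scraping):

```python
import requests
from bs4 import BeautifulSoup   # pip install beautifulsoup4

# Fetch a page (placeholder URL) and parse its HTML.
html = requests.get("https://example.com/products", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Extract whatever is visible on the page, e.g. all link texts and targets.
for a in soup.find_all("a"):
    print(a.get_text(strip=True), "->", a.get("href"))
```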


3.7 OLAP (ONLINE ANALYTICAL PROCESSING)

OLAP (online analytical processing) is a computing method that enables
users to easily and selectively extract and query data in order to analyze
it from different points of view. OLAP business intelligence queries often
aid in trend analysis, financial reporting, sales forecasting, budgeting
and other planning purposes.

For example, a user can request that data be analysed to display a
spreadsheet showing all of a company's beach ball products sold in Florida
in the month of July, compare revenue figures with those for the same
products in September and then see a comparison of other product sales in
Florida in the same time period.

• How OLAP systems work

To facilitate this kind of analysis, data is collected from multiple data
sources, stored in data warehouses, then cleansed and organized into data
cubes. Each OLAP cube contains data categorized by dimensions (such as
customers, geographic sales region and time period) derived from dimension
tables in the data warehouses. Dimensions are then populated by members
(such as customer names, countries and months) that are organized
hierarchically. OLAP cubes are often pre-summarized across dimensions to
drastically improve query time over relational databases.

Analysts can then perform five types of OLAP analytical operations against
these multidimensional databases:

1. Roll-up. Also known as consolidation or drill-up, this operation
summarizes the data along a dimension.

2. Drill-down. This allows analysts to navigate deeper among the
dimensions of data, for example drilling down from "time period" to
"years" and "months" to chart sales growth for a product.

3. Slice. This enables an analyst to take one level of information for
display, such as "sales in 2017."


4. Dice. This allows an analyst to select data from multiple dimensions to
analyze, such as "sales of blue beach balls in Iowa in 2017."

5. Pivot. Analysts can gain a new view of data by rotating the data axes of
the cube.
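The five operations above map naturally onto simple data-frame manipulations. Below is a hedged sketch in Python using pandas (an assumption of this example, not a tool named in the text), with a made-up mini-cube of year, state and product dimensions and a single sales measure:

```python
import pandas as pd

# An illustrative mini-cube with three dimensions and one measure;
# all figures are invented.
cube = pd.DataFrame({
    "year":    [2017, 2017, 2017, 2018],
    "state":   ["Iowa", "Iowa", "Florida", "Florida"],
    "product": ["blue beach ball", "red beach ball",
                "blue beach ball", "blue beach ball"],
    "sales":   [100, 80, 250, 300],
})

# Roll-up: summarize along a dimension.
print(cube.groupby("year")["sales"].sum())
# Slice: one level of one dimension, e.g. "sales in 2017".
print(cube[cube["year"] == 2017])
# Dice: select across multiple dimensions at once.
print(cube[(cube["year"] == 2017) &
           (cube["state"] == "Iowa") &
           (cube["product"] == "blue beach ball")])
# Pivot: rotate the data axes of the view.
print(cube.pivot_table(index="state", columns="year",
                       values="sales", aggfunc="sum"))
```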

OLAP software then locates the intersection of dimensions, such as all
products sold in the Eastern region above a certain price during a certain
time period, and displays them. The result is the "measure"; each OLAP
cube has at least one, and perhaps hundreds, of measures, which are derived
from information stored in fact tables in the data warehouse.

• OLAP Server:

There are three types of OLAP servers:-

1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)

1. Relational OLAP (ROLAP)

Relational Online Analytical Processing (ROLAP) works mainly with data
that resides in a relational database, where the base data and dimension
tables are stored as relational tables. ROLAP servers are placed between
the relational back-end server and the client front-end tools. ROLAP
servers use an RDBMS to store and manage warehouse data, and OLAP
middleware to support the missing pieces.

• Advantages of ROLAP

1. ROLAP can handle large amounts of data.
2. It can be used with data warehouse and OLTP systems.

• Disadvantages of ROLAP

1. Limited by SQL functionalities.
2. Hard to maintain aggregate tables.


2. Multidimensional OLAP (MOLAP)

Multidimensional Online Analytical Processing (MOLAP) supports
multidimensional views of data through array-based multidimensional
storage engines. With multidimensional data stores, the storage utilization
may be low if the data set is sparse.

• Advantages of MOLAP

1. Optimal for slice and dice operations.
2. Performs better than ROLAP when data is dense.
3. Can perform complex calculations.

• Disadvantages of MOLAP

1. Difficult to change dimensions without re-aggregation.
2. MOLAP can handle only a limited amount of data.

3. Hybrid OLAP (HOLAP)

Hybrid Online Analytical Processing (HOLAP) is a combination of ROLAP and
MOLAP. HOLAP provides the greater scalability of ROLAP and the faster
computation of MOLAP.

• Advantages of HOLAP

1. HOLAP provides the advantages of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.

• Disadvantages of HOLAP
1. HOLAP architecture is very complex because it supports both MOLAP and
ROLAP servers.

• The OLAP process


OLAP begins with data accumulated from multiple sources and stored in a
data warehouse. The data is then cleansed and stored in OLAP cubes,
which users run queries against.

Historically, many OLAP products used MOLAP storage, which led to
inflexible cube databases, more complicated data management, and
limitations on the data volume and level of detail that could be analysed.

Every BI deployment needs an OLAP component. Not only is it necessary to
facilitate analysis, but it can also significantly reduce the number of
reports that either IT developers or business users have to create.

With OLAP, a report is just a starting view, say sales for 2015 by country:
a summarised starting point. As the user clicks, drills and pivots, the end
result might be sales, unit price and volume for one quarter, for two
products, in a particular city: a detailed, focused end point. In a
strictly relational reporting world, the starting view and the end result
would be two entirely separate reports, with dozens of iterations in
between.

• OLAP viewers:
Microsoft Excel is one of the most popular interfaces to OLAP data. In
fact, for three of the leading OLAP products (Oracle's Hyperion Essbase,
Microsoft Analysis Services, SAP Business Explorer), the spreadsheet was
initially the only interface. Users would open a spreadsheet and could
immediately begin drilling within cells and Excel pivot tables to retrieve
and explore their data.

Today, Excel continues to be an important OLAP interface, but in addition
users can explore the data via OLAP viewers. These OLAP viewers may be
web-based and have advanced charting and navigation capabilities. In
addition, business query tools and production reporting tools may also be
able to work as OLAP viewers.

3.8 OLTP (ONLINE TRANSACTION PROCESSING)

What is OLTP:-

Online transaction processing, or OLTP, refers to a class of systems that
facilitate and manage transaction-oriented applications, typically for data
entry and retrieval transaction processing. The term is somewhat
ambiguous; some understand a "transaction" in the context of computer or
database transactions, while others (such as the Transaction Processing
Performance Council) define it in terms of business or commercial
transactions. OLTP has also been used to refer to processing in which the
system responds immediately to user requests. An automatic teller machine
(ATM) for a bank is an example of a commercial transaction processing
application.

The technology is used in a number of industries, including banking,
airlines, mail-order, supermarkets, and manufacturing. Applications include
electronic banking, order processing, employee time clock systems,
e-commerce, and e-trading. The most widely used OLTP system is probably
IBM's CICS.


In computer science, transaction processing is information processing that
is divided into individual, indivisible operations, called transactions.
Each transaction must succeed or fail as a complete unit; it cannot remain
in an intermediate state.

Definition:

Databases must often allow the real-time processing of SQL transactions to
support e-commerce and other time-critical applications. This type of
processing is known as online transaction processing (OLTP).

OLTP (online transaction processing) is a class of program that facilitates
and manages transaction-oriented applications, typically for data entry and
retrieval transactions in a number of industries, including banking,
airlines, mail-order, supermarkets, and manufacturing. Probably the most
widely installed OLTP product is IBM's CICS (Customer Information Control
System).


OLTP typically runs on a system of interconnected computers that share a
central storage system and various peripheral devices such as a printer,
scanner, or router. Each computer connected to the system can operate
independently, but has the ability to communicate with other external
devices and computers.

• Requirements:-

Online transaction processing increasingly requires support for
transactions that span a network and may include more than one company.
For this reason, new OLTP software uses client/server processing and
brokering software that allows transactions to run on different computer
platforms in a network.

In large applications, efficient OLTP may depend on sophisticated
transaction management software (such as CICS) and/or database
optimization tactics to facilitate the processing of large numbers of
concurrent updates to an OLTP-oriented database.

For even more demanding decentralized database systems, OLTP brokering
programs can distribute transaction processing among multiple computers on
a network. OLTP is often integrated into service-oriented architecture
(SOA) and web services.

• Benefits:-

Online transaction processing has two key benefits: simplicity and
efficiency. Reduced paper trails and faster, more accurate forecasts for
revenues and expenses are both examples of how OLTP makes things simpler
for businesses.

ADVANTAGES OF OLTP:-

Online Transaction Processing (OLTP) has the following advantages:

• It provides faster and more accurate forecasts for revenues and expenses.
• It provides a concrete foundation for a stable organization because of
timely modification of all transactions.
• It makes the transactions much easier on behalf of the customers by
allowing them to make the payments according to their choice.
• It broadens the customer base for an organization by simplifying and
speeding up individual processes.


DISADVANTAGES OF OLTP:-

As with any information processing system, security and reliability are
considerations. Online transaction systems are generally more susceptible
to direct attack and abuse than their offline counterparts. When
organizations choose to rely on OLTP, operations can be severely impacted
if the transaction system or database is unavailable due to data
corruption, systems failure, or network availability issues. Additionally,
like many modern online information technology solutions, some systems
require offline maintenance, which further affects the cost-benefit
analysis. Online Transaction Processing (OLTP) has the following
disadvantages:
• It makes the database much more susceptible to intruders and hackers
  because it makes the database available worldwide.
• For B2B (business-to-business) transactions, businesses must go offline
  to complete certain steps of an individual process, causing buyers and
  suppliers to miss out on some of the efficiency benefits that the system
  provides.
• As simple as OLTP is, the slightest disruption in the system has the
  potential to cause a great deal of problems, wasting both time and money.
• It can lead to server failure, which may cause delays or even wipe out
  large amounts of data from the database.

OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP).
In general, we can assume that OLTP systems provide source data to data
warehouses, whereas OLAP systems help to analyze it.


• OLTP (Online Transaction Processing) is characterized by a large number
of short online transactions (INSERT, UPDATE, DELETE). The main emphasis
for OLTP systems is on very fast query processing, maintaining data
integrity in multi-access environments, and effectiveness measured by the
number of transactions per second. An OLTP database holds detailed and
current data, and the schema used to store transactional databases is the
entity model (usually 3NF).

• OLAP (Online Analytical Processing) is characterized by a relatively low
volume of transactions. Queries are often very complex and involve
aggregations. For OLAP systems, response time is the effectiveness
measure. OLAP applications are widely used with data mining techniques. An
OLAP database holds aggregated, historical data, stored in
multi-dimensional schemas (usually the star schema).

The following table summarizes the major differences between an OLTP
(operational) system and an OLAP system.

• DIFFERENCE BETWEEN OLAP & OLTP:-

1. OLTP holds current data; OLAP holds current and historical data.
2. OLTP uses short database transactions; OLAP uses long database
   transactions.
3. OLTP uses online update/insert/delete; OLAP uses batch
   update/insert/delete.
4. OLTP promotes normalization; OLAP promotes denormalization.
5. OLTP handles high-volume transactions; OLAP handles low-volume
   transactions.
6. In OLTP, transaction recovery is necessary; in OLAP, transaction
   recovery is not necessary.

• HOW OLTP WORKS:-

TRANSACTION PROCESSING SYSTEM:

A Transaction Processing System, or Transaction Processing Monitor, is a
set of software that processes data transactions in a database system and
monitors transaction programs (a special kind of program). The essence of
a transaction program is that it manages data that must be left in a
consistent state. E.g. if an electronic payment is made, the amount must
be both withdrawn from one account and added to the other; it cannot
complete only one of those steps. Either both must occur, or neither. In
case of a failure preventing transaction completion, the partially
executed transaction must be 'rolled back' by the TPS. While this type of
integrity must also be provided for batch transaction processing, it is
particularly important for online processing: if, for example, an airline
seat reservation system is accessed by multiple operators, then after an
empty-seat inquiry, the seat reservation data must be locked until the
reservation is made; otherwise another user may get the impression a seat
is still free while it is actually being booked. Without proper
transaction monitoring, double bookings may occur. Other transaction
monitor functions include deadlock detection and resolution (deadlocks may
be inevitable in certain cases of cross-dependence on data), and
transaction logging (in 'journals') for 'forward recovery' in case of
massive failures.
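The all-or-nothing rule for the electronic payment described above can be demonstrated with a minimal sketch in Python using sqlite3. The account names, balances and the simulated failure are invented for illustration; the point is that a failure between the debit and the credit rolls both steps back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 500.0), ("B", 100.0)])

def transfer(amount, fail_midway=False):
    # 'with conn' opens a transaction: it commits on success and
    # rolls back as a complete unit if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = 'A'",
            (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = 'B'",
            (amount,))

try:
    transfer(50.0, fail_midway=True)
except RuntimeError:
    pass  # the partial debit was rolled back; neither step took effect

print(list(conn.execute("SELECT * FROM accounts")))
# [('A', 500.0), ('B', 100.0)] -- balances unchanged
```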


TYPES:-

Contrasted with batch processing:

Batch processing is not transaction processing. Batch processing involves
processing several transactions at the same time, and the results of each
transaction are not immediately available when the transaction is being
entered; there is a time delay. Transactions are accumulated over a certain
period (say, a day) and the updates are then made, typically after working
hours.

Real-time and batch processing

There are a number of differences between real-time and batch processing.
These are outlined below:
• Each transaction in real-time processing is unique. It is not part of a
group of transactions, even though those transactions are processed in
the same manner. Transactions in real-time processing are stand-alone
both in the entry to the system and also in the handling of output.
• Real-time processing requires the master file to be available more often
for updating and reference than batch processing. The database is not
accessible all of the time for batch processing.
• Real-time processing has fewer errors than batch processing, as
transaction data is validated and entered immediately. With batch
processing, the data is organised and stored before the master file is
updated. Errors can occur during these steps.
• Infrequent errors may occur in real-time processing; however, they are
often tolerated. It is not practical to shut down the system for infrequent
errors.
• More computer operators are required in real-time processing, as the
  operations are not centralised. It is more difficult to maintain a
  real-time processing system than a batch processing system.


Features

• Rapid response
Fast performance with a rapid response time is critical. Businesses cannot
afford to have customers waiting for a TPS to respond; the turnaround time
from the input of the transaction to the production of the output must be
a few seconds or less.

• Reliability
Many organizations rely heavily on their TPS; a breakdown will disrupt
operations or even stop the business. For a TPS to be effective, its
failure rate must be very low. If a TPS does fail, then quick and accurate
recovery must be possible. This makes well-designed backup and recovery
procedures essential.

• Inflexibility
A TPS wants every transaction to be processed in the same way regardless
of the user, the customer or the time of day. If a TPS were flexible, there
would be too many opportunities for non-standard operations. For example,
a commercial airline needs to consistently accept airline reservations from
a range of travel agents; accepting different transaction data from
different travel agents would be a problem.

• Controlled processing
The processing in a TPS must support an organization's operations. For
example, if an organization allocates roles and responsibilities to
particular employees, then the TPS should enforce and maintain this
requirement. Example: an ATM transaction.

• Consistency
A transaction is a correct transformation of the state. The actions taken
as a group do not violate any of the integrity constraints associated with
the state. This requires that the transaction be a correct program!

• Isolation
Even though transactions execute concurrently, it appears to each
transaction T that the others executed either before T or after T, but not
both.

• Durability
Once a transaction completes successfully (commits), its changes to the
state survive failures.

121
DIGITAL DATA AND ITS TYPES

• Concurrency
Ensures that two users cannot change the same data at the same time.
That is, one user cannot change a piece of data before another user has
finished with it. For example, if an airline ticket agent starts to reserve the
last seat on a flight, then another agent cannot tell another passenger that
a seat is available.
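The seat-locking behaviour just described can be sketched with SQLite's write lock. This is a hedged illustration, not a real reservation system: the database file, table and seat number are invented, and a production system would use row-level locks in a server DBMS. Note a file-based database is used so the two connections share one database.

```python
import sqlite3

db = "flights.db"   # file-based so both connections see the same DB
agent1 = sqlite3.connect(db, timeout=1)
agent2 = sqlite3.connect(db, timeout=1)
agent1.execute("CREATE TABLE IF NOT EXISTS seats (no INTEGER, taken INTEGER)")
agent1.commit()

agent1.execute("BEGIN IMMEDIATE")          # agent 1 takes the write lock
agent1.execute("UPDATE seats SET taken = 1 WHERE no = 42")
try:
    agent2.execute("BEGIN IMMEDIATE")      # agent 2 cannot write yet
except sqlite3.OperationalError as e:
    print("agent 2 must wait:", e)         # 'database is locked'
agent1.commit()                            # lock released; agent 2 may proceed
```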

• Storing and retrieving
Storing and retrieving information from a TPS must be efficient and
effective. The data are stored in warehouses or other databases, and the
system must be well designed for its backup and recovery procedures.

• Databases and files
The storage and retrieval of data must be accurate, as data is used many
times throughout the day. A database is a neatly organized collection of
data which stores the accounting and operational records. Databases are
always protective of their delicate data, so they usually give a restricted
view of certain data. Databases are designed using hierarchical, network or
relational structures; each structure is effective in its own sense.
• Hierarchical structure: organizes data in a series of levels, hence the
  name. Its top-to-bottom, tree-like structure consists of nodes and
  branches; each child node has branches and is linked to only one
  higher-level parent node.
• Network structure: similar to hierarchical, a network structure also
  organizes data using nodes and branches. But, unlike hierarchical, each
  child node can be linked to multiple higher-level parent nodes.
• Relational structure: unlike network and hierarchical, a relational
  database organizes its data in a series of related tables. This gives
  flexibility, as relationships between the tables are built.


The following features are included in real-time transaction processing
systems:
• Good data placement: the database should be designed for the access
  patterns of many simultaneous users.
• Short transactions: short transactions enable quick processing and
  reduce concurrency contention, keeping the system responsive.
• Real-time backup: backup should be scheduled during low-activity periods
  to prevent lag on the server.
• High normalization: this lowers redundant information to increase speed
  and improve concurrency; it also improves backups.
• Archiving of historical data: uncommonly used data are moved into other
  databases or backed-up tables. This keeps tables small and also improves
  backup times.
• Good hardware configuration: hardware must be able to handle many users
  and provide quick response times.

In a TPS, there are five different types of files. The TPS uses the files
to store and organize its transaction data:

• Master file: Contains information about an organization’s business
situation. Most transactions and databases are stored in the master file.
• Transaction file: It is the collection of transaction records. It helps to
update the master file and also serves as audit trails and transaction
history.
• Report file: Contains data that has been formatted for presentation to a
user.
• Work file: Temporary files in the system used during the processing.
• Program file: Contains the instructions for the processing of data.


Data warehouse

A data warehouse is a database that collects information from different
sources. Data gathered from real-time transactions can be analysed
efficiently when it is stored in a data warehouse. It provides data that
are consolidated, subject-oriented, historical and read-only:
• Consolidated: Data are organised with consistent naming conventions,
measurements, attributes and semantics. It allows data from a data
warehouse from across the organization to be effectively used in a
consistent manner.
• Subject-oriented: Large amounts of data are stored across an
organization, some data could be irrelevant for reports and makes
querying the data difficult. It organizes only key business information
from operational sources so that it's available for analysis.
• Historical: Real-time TPS represent the current value at any time, an
example could be stock levels. If past data are kept, querying the
database could return a different response. It stores series of snapshots
for an organisation's operational data generated over a period of time.
• Read-only: Once data are moved into a data warehouse, it becomes
read-only, unless it was incorrect. Since it represents a snapshot of a
certain time, it must never be updated. Only operations which occur in a
data warehouse are loading and querying data.

Backup procedures

(Figure: a dataflow diagram of backup and recovery procedures.)

Since business organizations have become very dependent on TPSs, a
breakdown in a TPS may interrupt the business' regular routines and stop
its operation for a certain amount of time. In order to prevent data loss
and minimize disruptions when a TPS breaks down, a well-designed backup
and recovery procedure is put into use. The recovery process can rebuild
the system when it goes down.

Recovery process
A TPS may fail for many reasons, including system failure, human error,
hardware failure, incorrect or invalid data, computer viruses, software
application errors, or natural or man-made disasters. As it's not possible
to prevent all TPS failures, a TPS must be able to cope with failures. The
TPS must be able to detect and correct errors when they occur. A TPS will
go through a recovery of the database to cope when the system fails; this
involves the backup, journal, checkpoint, and recovery manager:
• Journal: A journal maintains an audit trail of transactions and database
  changes. Transaction logs and database change logs are used; a
  transaction log records all the essential data for each transaction,
  including data values, time of transaction and terminal number. A
  database change log contains before and after copies of records that
  have been modified by transactions.
• Checkpoint: A checkpoint record contains necessary information to
restart the system. These should be taken frequently, such as several
times an hour. It is possible to resume processing from the most-recent
checkpoint when a failure occurs with only a few minutes of processing
work that needs to be repeated.
• Recovery Manager: A recovery manager is a program which restores the
database to a correct condition which can restart the transaction
processing.

Depending on how the system failed, two different recovery procedures can
be used. Generally, the procedure involves restoring data that has been
collected from a backup device and then running the transaction processing
again. The two types of recovery are backward recovery and forward
recovery:

• Backward recovery: used to undo unwanted changes to the database. It
  reverses the changes made by transactions which have been aborted. It
  involves the logic of reprocessing each transaction, which is very
  time-consuming.

• Forward recovery: it starts with a backup copy of the database. The
  transactions are then reprocessed according to the transaction journal
  recorded between the time the backup was made and the present time. It's
  much faster and more accurate.


Types of back-up procedures

There are two main types of back-up procedures: grandfather-father-son and
partial backups.

A. Grandfather-father-son

This procedure refers to at least three generations of backup master
files; thus, the most recent backup is the son and the oldest backup is
the grandfather. It's commonly used for a batch transaction processing
system with a magnetic tape. If the system fails during a batch run, the
master file is recreated by using the son backup and then restarting the
batch. However, if the son backup fails, is corrupted or destroyed, then
the next generation up (the father) is required. Likewise, if that fails,
then the generation above that (the grandfather) is required. Of course,
the older the generation, the more the data may be out of date.
Organizations can have up to twenty generations of backup.

B. Partial backups

This occurs when only parts of the master file are backed up. The master
file is usually backed up to magnetic tape at regular times; this could be
daily, weekly or monthly. Completed transactions since the last backup are
stored separately and are called journals, or journal files. The master
file can be recreated from the journal files on the backup tape if the
system fails.

Updating in a batch

This is used when transactions are recorded on paper (such as bills and
invoices) or stored on a magnetic tape. Transactions are collected and
updated as a batch when it's convenient or economical to process them.
Historically, this was the most common method, as the information
technology did not exist to allow real-time processing.


The two stages in batch processing are:

• Collecting and storage of the transaction data into a transaction file - this
involves sorting the data into sequential order.
• Processing the data by updating the master file - which can be
  difficult; this may involve data additions, updates and deletions that
  may need to happen in a certain order. If an error occurs, then the
  entire batch fails.

Updating in batch requires sequential access - since it uses a magnetic
tape, this is the only way to access data. A batch will start at the
beginning of the tape and read it in the order it was stored; it's very
time-consuming to locate specific transactions. The information technology
used includes a secondary storage medium which can store large quantities
of data inexpensively (thus the common choice of a magnetic tape). The
software used to collect data does not have to be online - it doesn't even
need a user interface.

Updating in real-time
This is the immediate processing of data. It provides instant confirmation
of a transaction. This involves a large amount of users who are
simultaneously performing transactions to change data. Because of
advances in technology (such as the increase in the speed of data
transmission and larger bandwidth), real-time updating is now possible.

Steps in a real-time update involve the sending of transaction data to an
online database in a master file. The person providing the information is
usually able to help with error correction and receives confirmation of
the transaction completion.

Updating in real-time uses direct access of data. This occurs when data are
accessed without accessing previous data items. The storage device stores
data in a particular location based on a mathematical procedure. This will
then be calculated to find an approximate location of the data. If data are
not found at this location, it will search through successive locations until
it's found.
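The "mathematical procedure" described above is essentially hashing. Below is a toy sketch in Python of hash-based direct access with probing of successive locations; the slot count and keys are invented, and a real storage device would map to physical addresses rather than a list.

```python
SLOTS = 8
store = [None] * SLOTS   # toy stand-in for fixed storage locations

def put(key, value):
    i = hash(key) % SLOTS            # approximate location from the key
    while store[i] is not None:      # probe successive locations if taken
        i = (i + 1) % SLOTS
    store[i] = (key, value)

def get(key):
    i = hash(key) % SLOTS
    while store[i] is not None:      # search until an empty slot is hit
        if store[i][0] == key:
            return store[i][1]
        i = (i + 1) % SLOTS
    return None                      # not found

put("ticket-42", "confirmed")
print(get("ticket-42"))              # 'confirmed'
```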


The information technology used could be a secondary storage medium that
can store large amounts of data and provide quick access (thus the common
choice of a magnetic disk). It requires a user-friendly interface, as
rapid response time is important.

Reservation systems
Reservation systems are used for any type of business where a service or a
product is set aside for a customer to use at a future time.

MERCHANT PAYMENT SYSTEM:-

Electronic payment systems exist in a variety of forms which can be
divided into two groups: wholesale payment systems and retail payment
systems. Wholesale payment systems exist for non-consumer transactions -
transactions initiated among and between banks, corporations, governments,
and other financial service firms. High-value wholesale payments flow
through the three major interbank funds transfer systems: the Clearing
House Interbank Payment Systems (CHIPS), the Society for Worldwide
Interbank Financial Telecommunications (SWIFT), and Fedwire. Electronic
transfers utilizing these types of payment systems are beyond the scope of
this note.

Retail electronic payment systems encompass those transactions involving
consumers. These transactions involve the use of such payment mechanisms
as credit cards, automated teller machines (ATMs), debit cards,
point-of-sale (POS) terminals, home banking, and telephone bill-paying
services. Payments through these mechanisms are conducted online and flow
through the check truncation system and the ACH. Electronic transfers
involving these types of payment mechanisms and payment systems are also
beyond the scope of this note.

The distinction between wholesale and retail electronic payment systems
parallels the distinction that has evolved in regulating these systems.

The term 'electronic payment' is a collective phrase for the many different
kinds of electronic payment methods available (also meaning online
payment), and the processing of transactions and their application within
online merchants and ecommerce websites.


It is essential for all online businesses to be able to accept and process
electronic payments in a fast and secure way. Businesses can gain a
significant advantage over their competitors by providing an instant
electronic payment service, as it lets customers pay by their preferred
credit or debit card.

Electronic payment systems can also increase your cash flow, reduce
administrative costs and labour, and provide yet another way for your
customers to pay. Care must be taken when choosing an electronic payment
solution, as it will need to fit within the constraints of your particular
online business and integrate seamlessly with your website.


3.9 SUMMARY:

Audio, text, video and graphics in digital format can be stored on
servers, hard drives and mobile devices. One of the main advantages of
this is reduced cost: the cost to produce, deliver and store the physical
formats that contain movies, TV shows and music. Production cost is
reduced by eliminating the factories that manufacture the discs our media
is stored on today. These costs are replaced by the cost to host downloads
of the content or stream it from the cloud. While there is a cost involved
in hosting the content, it is far less than the cost to build factories,
train workers, and ship in the raw material to make the discs. There is
also a cost involved in shipping the discs to relatives and friends. With
digital media, a corrupted file can simply be redistributed at no extra
cost. Another advantage is that digital media is compatible with many
different pieces of hardware, while physical media are limited to just a
few compatible devices. This means there is more freedom of choice for
customers in how they view media content, whether from a computer, TV or
mobile device. There is much more flexibility with digital formats than
with physical ones.

Following are the data types that are used for digital media products:
• Text and hypertext
• Audio
• Graphics
• Video
• Animation
• Video productions

Audio plays a big role in many different applications, from movies and
videos to games. Adding audio to computer games makes them more enjoyable;
even if it is only background music, it can make a game more appealing to
play.

Animation software can be the most complicated and difficult software to
learn and use. With animation, you can create anything from GIF icons for
webpages to feature-length animated movies such as Shrek and Finding Nemo.
Selected animations, even simple ones, can add impact to a webpage or a
presentation.


Video software allows you to cut, copy, and paste video and audio
sequences. It can add many effects such as titles, fades and wipes
between scenes. Most digital video media is intended for storage and
distribution on CDs or DVDs, or even for broadcasting on TV. Distributing
video media over the internet is very difficult as only users who have high-
speed broadband internet connections can hope to receive high quality
images.

Your data-driven business is fuelled by insights obtained from the
processing and analysis of data from various sources. Have you considered
what would happen if one - or several - of the data sources that feed this
digital engine were to dry up? If you were suddenly unable to access the
critical data that makes your business run? Data is sourced from internal
transactional data, connected-object data, syndicated data, trading
partner data, open data, harvested data, etc.

Digital Data Storage (DDS) is a format for storing and backing up computer
data on tape that evolved from the Digital Audio Tape (DAT) technology.
DAT was created for CD-quality audio recording. In 1989, Sony and
Hewlett Packard defined the DDS format for data storage using DAT tape
cartridges. Tapes conforming to the DDS format can be played by either
DAT or DDS tape drives. However, DDS tape drives cannot play DAT tapes
since they can't pick up the audio on the DAT tape.

DDS uses a 4-mm tape. A DDS tape drive uses helical scanning for
recording, the same process used by a video recorder (VCR). There are two
read heads and two write heads. The read heads verify the data that has
been written (recorded). If errors are present, the write heads rewrite the
data. When restoring a backed-up file, the restoring software reads the
directory of files located at the beginning of the tape, winds the tape to the
location of the file, verifies the file, and writes the file onto the hard drive.
DDS cannot update a backed-up file in the same place it was originally
recorded. In general, DDS requires special software for managing the
storage and retrieval of data from DDS tape drives.

Big Data includes huge volume, high velocity, and an extensible variety of
data. It comes in three types: structured data, semi-structured data, and
unstructured data. In computer science, a data structure is a particular
way of organising and storing data in a computer such that it can be
accessed and modified efficiently. More precisely, a data structure is a
collection of data values, the relationships among them, and the functions
or operations that can be applied to the data.

Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository, typically a
database. It covers all data that can be stored in an SQL database in a
table with rows and columns. Such data has relational keys and can easily
be mapped into pre-designed fields. Today, structured data is the most
processed kind in application development and the simplest to manage.
Example: relational data.

Semi-structured data is information that does not reside in a relational
database but has some organizational properties that make it easier to
analyze. With some processing you can store it in a relational database
(though this can be very hard for some kinds of semi-structured data).
Example: XML data.

Unstructured data is data that is not organized in a pre-defined manner or
does not have a pre-defined data model, and is therefore not a good fit
for a mainstream relational database. There are alternative platforms for
storing and managing unstructured data; it is increasingly prevalent in IT
systems and is used by organizations in a variety of business intelligence
and analytics applications. Examples: Word documents, PDFs, text, media
logs.

OLAP (online analytical processing) is a computing method that enables
users to easily and selectively extract and query data in order to analyze
it from different points of view. OLAP business intelligence queries often
aid in trend analysis, financial reporting, sales forecasting, budgeting
and other planning purposes.

For example, a user can request that data be analysed to display a
spreadsheet showing all of a company's beach ball products sold in Florida
in the month of July, compare revenue figures with those for the same
products in September and then see a comparison of other product sales in
Florida in the same time period.


Online transaction processing, or OLTP, refers to a class of systems that
facilitate and manage transaction-oriented applications, typically for
data entry and retrieval transaction processing. The term is somewhat
ambiguous; some understand a "transaction" in the context of computer or
database transactions, while others (such as the Transaction Processing
Performance Council) define it in terms of business or commercial
transactions. OLTP has also been used to refer to processing in which the
system responds immediately to user requests. An automatic teller machine
(ATM) for a bank is an example of a commercial transaction processing
application.

3.10 SELF ASSESSMENT QUESTIONS:

1. Define and explain digital data.

2. Explain the storage of digital data.

3. Describe various sources of data.

4. What is structured, unstructured and semi-structured data?

5. The two types of IT systems are transactional and analytical. Describe
   OLTP and OLAP, as well as the points of difference between them.


3.11 MULTIPLE CHOICE QUESTIONS:

1. Structured data is data whose elements are addressable for effective
   analysis. It has been organized into a formatted repository that is
   typically a database. Therefore, all such data can be stored in an SQL
   database in -----------------
   a. a table in any form
   b. a table only in the form of rows
   c. a table with rows and columns
   d. a table only in the form of columns

2. Unstructured data is data which is not organized in a ----------------
   a. pre-defined manner
   b. pre-defined data model
   c. good fit manner
   d. pre-defined data model, and which is not a good fit for a mainstream
      relational database

3. Analysts who perform the five operations roll-up, drill-down, slice,
   dice and pivot are carrying out ------------- analytical operations
   against a multidimensional database.
   a. OLTP
   b. OLAP

4. OLAP manages large amounts of historical data, and can summarise,
   aggregate, store and manage information at various granular levels,
   whereas OLTP manages …………………
   a. Current and highly detailed data which cannot be used for making
      decisions
   b. Current data
   c. Highly detailed data
   d. Data which is not useful for decision making

5. Digital Data Storage (DDS), which evolved from Digital Audio Tape (DAT)
   technology, is a -----------------
   a. format for storing computer data on tape
   b. format for backing up computer data
   c. format for storing and backing up computer data on tape
   d. Not a format for storing and backing up


Answers: 1.(c), 2. (d), 3.(b), 4. (a), 5.(c)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2


Chapter 4
Business Intelligence
Objectives:

On completion of this chapter, you will understand the definition and
examples of business intelligence, data mining, big data, web and social
media analytics, machine learning, data science and the data lake; various
perspectives on managing data; and the need for, features and use of
business intelligence, considering the following:

Structure:

4.1 Introduction

4.2 Definition and importance of BI

4.3 Business Intelligence Examples

4.4 Business Intelligence Tools

4.5 Business Intelligence Techniques

4.6 A Case study: Business intelligence for sales analysis and reporting

4.7 Data mining in Business Intelligence and process

4.8 Advantages and disadvantages of Business Intelligence

4.9 Trends in Business Intelligence

4.10 Summary

4.11 Self Assessment Questions

4.12 Multiple Choice Questions


4.1 INTRODUCTION:

Business intelligence (BI) leverages software and services to transform
data into actionable insights that inform an organization’s strategic and
tactical business decisions. BI tools access and analyze data sets and
present analytical findings in reports, summaries, dashboards, graphs,
charts and maps to provide users with detailed intelligence about the state
of the business. The term business intelligence often also refers to a
range of tools that provide quick, easy-to-digest access to insights about
an organization's current state, based on available data.

BI has a direct impact on an organization's strategic, tactical and
operational business decisions. BI supports fact-based decision making
using historical data rather than assumptions and gut feeling.

BI tools perform data analysis and create reports, summaries, dashboards,
maps, graphs, and charts to provide users with detailed intelligence about
the nature of the business.

Business intelligence technologies provide historical, current and
predictive views of business operations. The common functions of business
intelligence technologies include reporting, online analytical processing,
analytics, data mining, process mining, complex event processing, business
performance management, benchmarking, text mining, predictive analytics
and prescriptive analytics.

BI technologies can handle large amounts of structured and sometimes
unstructured data to help identify, develop and otherwise create new
strategic business opportunities. They aim to allow the easy
interpretation of these big data. Identifying new opportunities and
implementing an effective strategy based on insight can provide a business
with a competitive market advantage and long-term stability.

Business intelligence can be used by business enterprises to support a
wide range of business decisions, ranging from operational to strategic.
Basic operating decisions include product positioning or pricing.
Strategic business decisions involve priorities, goals and directions at
the broadest level. In all cases, BI is most effective when it combines
data derived from the market in which the company operates (external data)
with data from company sources internal to the business, such as financial
and operational data (internal data). When combined, external and internal
data can provide a complete picture which, in effect, creates an
intelligence that cannot be derived from any singular set of data. BI
tools empower organisations to gain insight into new markets, to assess
the demand and suitability of products and services for different market
segments, and to gauge the impact of marketing efforts.

4.2 DEFINITION AND IMPORTANCE OF BI:

BI (Business Intelligence) is a set of processes, architectures, and
technologies that convert raw data into meaningful information that drives
profitable business actions. It is a suite of software and services to
transform data into actionable intelligence and knowledge.

Therefore, business intelligence is about intelligence, and intelligence
is the ability to:
• Learn and understand from experience
• Respond quickly and successfully to a new situation

Why is BI important?

Business intelligence is important in respect of:

• Measurement: creating KPIs (Key Performance Indicators) based on
  historic data.
• Identifying and setting benchmarks for varied processes.
• With BI systems, organizations can identify market trends and spot
  business problems that need to be addressed.
• BI helps with data visualization, which enhances data quality and
  thereby the quality of decision making.
• BI systems can be used not just by large enterprises but also by SMEs
  (Small and Medium Enterprises).


How are Business Intelligence systems implemented?

The following steps are followed in implementing business intelligence:

• Step 1) Raw data is extracted from corporate databases. The data could
  be spread across multiple heterogeneous systems.
• Step 2) The data is cleaned and transformed into the data warehouse.
  Tables can be linked, and data cubes are formed.
• Step 3) Using the BI system, the user can ask queries, request ad-hoc
  reports or conduct any other analysis.
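As a hedged sketch of these three steps in miniature, the Python/pandas snippet below extracts raw data, cleans it, and answers an ad-hoc query. The file name and column names are hypothetical assumptions for illustration; a real deployment would extract from multiple source systems and load into a proper warehouse.

```python
import pandas as pd

# Step 1: extract raw data from a (hypothetical) source system export.
raw = pd.read_csv("crm_export.csv")          # assumed columns: region, revenue

# Step 2: clean and transform before loading into the warehouse.
raw = raw.dropna(subset=["revenue"])         # drop incomplete records
raw["region"] = raw["region"].str.strip().str.title()

# Step 3: answer an ad-hoc query, e.g. total revenue per region.
report = raw.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(report)
```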

There are four types of BI users. They are the four key players who use a
Business Intelligence system:

1. The professional data analyst:
The data analyst is a statistician who always needs to drill deep down
into data. The BI system helps them to get fresh insights to develop
unique business strategies.

2. The IT user:
The IT user plays a dominant role in maintaining the BI infrastructure.

3. The head of the company:
A CEO or CXO can increase the profit of their business by improving
operational efficiency in their business.

4. The business users:
Business intelligence users can be found across the organization. There
are mainly two types of business users:
1. The casual business intelligence user
2. The power user

The difference between the two is that a power user has the capability of
working with complex data sets, while the casual user will use dashboards
to evaluate predefined sets of data.


4.3 BUSINESS INTELLIGENCE EXAMPLES:

There are many different sectors where Business Intelligence is used.
Business Intelligence (BI) provides the user with the data and tools to
answer the questions which are important to running the business, or part
of the business. In short, business intelligence is used for reporting the
specified data of any business, which is very important and from which the
higher management of the business will take decisions for the growth of
the business.

Different real-life Business Intelligence examples:

Nowadays business intelligence is at the heart of many sectors, whether
retail or e-commerce, and BI is used in other sectors as well. Everybody
wants to know the history of their business and take further steps to grow
it; Business Intelligence helps the business grow, and lets it examine the
different trends in the business and act accordingly. Following are some
real-life Business Intelligence examples:

• Example 1: Hotel industry

The hotel industry always uses statistical information to set up different
rates for room booking. The Trivago app is one of the best examples of a
Business Intelligence application: Trivago compares the rates of hotels
across different websites and gives a list of the best hotel rates by
website.

A hotel owner also uses BI analytical applications to gather statistical
information regarding average occupancy and room rate, which helps to find
the aggregate revenue generated per room. By analyzing the different
trends of the business, it is easy for the hotel owner to set different
discounts on rooms so as to grow the business.

• Example 2: Weather Forecast


Business Intelligence applications are also used in weather forecasting. To forecast the weather, scientists need both historical and current weather data; using the historical data, they study how the current weather differs from past patterns.


One UK company reportedly benefited from a Business Intelligence application: they linked their BI tool to the MET Office to predict changes in weather, a move which produced great results for them in the following UK summer. Business Intelligence is thus widely used for forecasting the weather and for comparing historical weather data with current conditions.

• Example 3: Food Industry


Business intelligence is widely used in the food industry to analyse customer requirements. Applications like Foodpanda and Zomato collect customer data, including each customer's food preferences. If person 'A' has ordered Chinese food three days a week, the system learns that this person likes Chinese food, and the application recommends the best Chinese restaurants on the dashboard. The user thus gets a great experience of the system's intelligence.

• Example 4: Bank
Business Intelligence is widely used in banks. A bank gives branch managers access to BI applications, which help the branch manager determine who the most profitable customers are and which customers they should work on. The use of BI tools frees information technology staff from the task of generating analytical reports for the departments. It also gives department personnel access to a richer data source.

Business Intelligence applications are also used to check customer credit scores. Credit-score apps compare customer data and check whether the customer has paid loan EMIs and credit-card bills on time. These apps help the loan manager decide whether to issue a credit card or grant a loan to the customer. They are also useful to customers, who can check their own credit score and take steps to improve it.

• Example 5: E-Ticket reservation systems


Business Intelligence is widely used in bus, train and flight e-ticket reservation systems, where historical data has many uses. The first use is checking customers' departure points so that buses, trains or flights can be arranged accordingly. For example, if 300 people have booked tickets to Mumbai for Friday and the waiting list is very large, the historical data shows that extra buses to Mumbai are needed on Friday. By analysing the data with a business intelligence tool, buses can be arranged according to the need.

The second use of Business Intelligence is analysing customers' historical data to decide ticket prices. If a customer books a ticket every Friday, the travel agency can offer him or her a useful discount. Business Intelligence reports also help the user check ticket availability.

• Example 6: Stock optimization


Sectors with a pronounced seasonal business cycle often find it very difficult to optimize their stock. For example, if sales of a particular product shoot up in summer or at Christmas, it is a big challenge to store the right amount of stock in order to maximize profit.

To address this issue, some companies in the canning, preserving and general food sector have been able to increase profitability by nearly 10% using BI techniques based on:

• The adoption of a decision support system (DSS).
• The exhaustive analysis of historical sales and stocktaking data for warehouse products.

In many cases, the results obtained have made possible a much more
efficient and profitable redesign of the entire logistical and productive
warehousing process.


4.4 BUSINESS INTELLIGENCE TOOLS:

Business Intelligence (BI) is a set of tools supporting the transformation of raw data into useful information which can support decision making. Business Intelligence provides reporting functionality, tools for identifying data clusters, support for data mining techniques, business performance management and predictive analysis.

The aim of Business Intelligence is to support decision making. In fact, BI tools are often called Decision Support Systems (DSS) or fact-based support systems, as they provide business users with tools to analyze their data and extract information.

Business Intelligence tools often source the data from data warehouses.
The reason is straightforward: a data warehouse already has data from
various production systems within an enterprise; the data is cleansed,
consolidated, conformed and stored in one location. Because of this BI
tools are able to concentrate on analyzing the data.

BI makes large amounts of raw data more accessible, understandable and useful by providing computer-based tools for users to directly process, organise, manipulate, integrate and analyse the data. BI tools help users turn basic data into information and knowledge.

• Dashboards:
Software that provides real-time digital visual indicators of how well predetermined aspects of an organisation are working. Think of the gauges on a car's dashboard.

• Data Mart:
This is a subset of a data warehouse that focuses on a particular aspect of an organisation's activities.

• Data warehouse:
This is a comprehensive database containing information that has been extracted, cleaned up, filtered, organised and integrated from several electronic data sources. At Berkeley, for example, natural history museums have contributed extracted data, formatted according to the Darwin Core data standard, to an online data warehouse that lets users query the holdings of all the museums through a portal.


• ETL:
Extract, Transform and Load: the software and processes needed to find, cleanse and load data into a data warehouse, data mart, or other integrated database or system.

• Portal:
A website that provides access to a structured set of online resources, such as a search engine, a news service, a company home page or other online services that a user wants to access on a daily basis.

• Scorecard:
Software that provides visual digital measurement of the factors identified by an organisation as critical to its success.

The items above are the components from which data is actually picked up for processing and decision making. Some of the techniques used in business intelligence are as under:

4.5 BUSINESS INTELLIGENCE TECHNIQUES

Following are some major business intelligence techniques:

• Data Visualization
When data is stored as a set or matrix of numbers, it is precise but difficult
to interpret. For example, are sales going up, down or holding steady?
When looking at more than one dimension of the data, this becomes even
harder. Hence the visualization of data in charts is a convenient way to
immediately understand how to interpret the data.
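As a small illustration of this point, the sketch below plots monthly sales as a line chart with matplotlib; the sales figures are invented for demonstration.

```python
# Plotting a small (made-up) sales series so the trend is visible at a glance.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 158]  # hard to judge as raw numbers alone

plt.plot(months, sales, marker="o")     # the upward trend is obvious as a chart
plt.title("Monthly sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()
```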

• Data Mining
Data mining is a computer supported method to reveal previously unknown
or unnoticed relations among data entities. Data mining techniques are
used in a myriad of ways: shopping basket analysis, measurement of
products consumers buy together in order to promote other products; in
the banking sector, client risk assessment is used to evaluate whether the
client is likely to pay back the loan based on historical data; in the
insurance sector, fraud detection based on behavioral and historical data;
in medicine and health, analysis of complications and/or common diseases
may help to reduce the risk of cross infections.
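As one concrete illustration, a toy version of shopping-basket analysis can be sketched as below, simply counting how often pairs of products appear in the same basket; the baskets are made-up examples (real systems use association-rule algorithms over millions of transactions).

```python
# A toy shopping-basket analysis: count which product pairs are bought together.
from collections import Counter
from itertools import combinations

baskets = [  # each inner list is one customer's basket (made-up data)
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs that co-occur most often are candidates for joint promotion.
print(pair_counts.most_common(3))
```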


• Reporting
The design, scheduling and generation of performance, sales, reconciliation and savings reports is an area where BI tools help business users. Reports output by BI tools efficiently gather and present information to support the management, planning and decision-making process. Once a report is designed, it can be automatically sent to a predefined distribution list in the required form, presenting daily/weekly/monthly statistics.

• Time-series Analysis (Including Predictive Techniques)


Nearly all data warehouses and all enterprise data have a time dimension.
For example, product sales, phone calls, patient hospitalizations, etc. It is
extremely important to reveal the changes in user behaviour in time,
relation between products, or changes in sale contracts based on
marketing promotion. Based on the historical data, we may also endeavour
to predict future trends or outcomes.
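As a minimal, hypothetical illustration of revealing a trend over time, the sketch below smooths a made-up daily sales series with a rolling average using pandas.

```python
# Smoothing a (made-up) daily sales series with a rolling average.
import pandas as pd

daily_sales = pd.Series(
    [10, 12, 9, 14, 15, 13, 18, 20, 17, 22],
    index=pd.date_range("2020-01-01", periods=10, freq="D"),
)

trend = daily_sales.rolling(window=3).mean()  # 3-day moving average
print(trend)

# A naive forecast could simply carry the latest trend value forward.
print("Next-day estimate:", trend.iloc[-1])
```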

• On-line Analytical Processing (OLAP)


OLAP is best known for the OLAP-cubes which provide a visualization of
multidimensional data. OLAP cubes display dimensions on the cube edges
(e.g. time, product, customer type, customer age etc.). The values in the
cube represent the measured facts (e.g. value of contracts, number of sold
products etc.). The user can navigate through OLAP cubes using drill-up, drill-down and drill-across features. Drill-up lets the user zoom out to a more coarse-grained view; conversely, drill-down displays the information in more detail. Finally, drilling across means that the user can navigate to another OLAP cube to see the relations along other dimensions. All this functionality is provided in real time.
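A rough feel for drill-up and drill-down can be had with a pandas pivot table, as in the hypothetical sketch below; real OLAP engines precompute and index such aggregates, and the fact rows here are invented.

```python
# Approximating a cube slice and a drill-down with a pandas pivot table.
import pandas as pd

sales = pd.DataFrame({          # made-up fact rows
    "year":    [2019, 2019, 2020, 2020, 2020],
    "month":   [1, 2, 1, 1, 2],
    "product": ["A", "B", "A", "B", "A"],
    "amount":  [100, 80, 120, 90, 130],
})

# Coarse-grained ("drilled-up") view: total amount by year and product.
print(pd.pivot_table(sales, values="amount", index="year",
                     columns="product", aggfunc="sum"))

# Drill-down: add the month dimension for finer detail.
print(pd.pivot_table(sales, values="amount", index=["year", "month"],
                     columns="product", aggfunc="sum"))
```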

• Statistical Analysis
Statistical analysis uses mathematical foundations to qualify the significance and reliability of the observed relations. The most useful features are distribution analysis and confidence intervals (for example, for changes in user behaviour). Statistical analysis is also used for devising data mining experiments and analysing their results.
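For example, a 95% confidence interval for a sample mean can be computed as sketched below with SciPy; the sample values are invented for illustration.

```python
# A 95% confidence interval for the mean of a small, made-up sample.
import numpy as np
from scipy import stats

sample = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.2, 4.7])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
# t-distribution interval with n - 1 degrees of freedom:
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```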


Based on the above-mentioned techniques, here are some popular Business Intelligence tools available in the market:

• Oracle Enterprise BI Server
• SAP Business Objects Enterprise
• SAP NetWeaver BI
• SAS Enterprise BI Server
• Microsoft BI platform
• IBM Cognos Series 8
• Board Management Intelligence Toolkit
• BizzScore Suite
• WebFOCUS
• QlikView
• MicroStrategy
• Oracle Hyperion System
• Actuate

4.6 A CASE STUDY: BUSINESS INTELLIGENCE FOR SALES ANALYSIS AND REPORTING

• Client background:
The client is involved in developing and manufacturing performance materials for industry. These materials are available in 100 countries and are used to produce high-performing, environmentally friendly products. The client's customers include the leading manufacturers in their respective industries.

• Business requirement:

The client requires an analytic tool to:

1. Provide a detailed analytical view of the performance of the organisation from the year 2000 till date

2. Cleanse and mine the data captured in different database resources at different locations

3. Create dashboards, performance charts and analysis charts


4. Generate intelligence and analysis reports to help achieve the business goals of top management, the accounts department, managers, the delivery department and sales representatives.

Solution:
The data were fetched from various sources such as FoxPro, MySQL, Oracle and MS SQL Server, then cleansed and integrated using SQL Server Integration Services (SSIS) to create a relational data warehouse. The data cleaning included record matching, deduplication and column segmentation on the SSIS platform, along with the processes of schema extraction and translation, schema matching and integration, and schema implementation.

The advantage of using SSIS is its pipeline architecture, which allows data to be retrieved from various sources, manipulated in memory and stored in the destination database without being written to disk at an intermediate location, which has considerable performance benefits. The manipulations that can be performed in the pipeline include data validation, key lookups, derivation of new columns, aggregation, sorting and many more. The relational engine of SQL Server 2005 provides the benefit of storing long-term information that can be used for other reporting and analysis requirements. Various report formats such as XLS and PDF were used.

The solution was built to support large volumes of data and dynamic business rules, and included batch-window processing for daily incremental loads and monthly restatements.

BI best practices were adopted to create BI applications using Microsoft SQL Server 2005. The client's sample data was used to replicate the issues faced during deployment. Some of them were:

• Relational and Analysis Services schemas
• Implementation of the data extraction, transformation and loading (ETL) process
• Design and development of the client front-end system, which also includes reporting services and interactive analysis


• System management and maintenance on an ongoing basis, including bulk incremental updates of data cubes

Other Examples of Business Intelligence Systems used in Practice:

Example 1:

In an Online Transaction Processing (OLTP) system, the information fed into the product database could be:
• add a product line
• change a product price

Correspondingly, in a Business Intelligence system, a query executed for the product subject area could be: did the addition of the new product line or the change in product price increase revenues?

In the advertising database of an OLTP system, the transactions could be:
• change advertisement options
• increase the radio budget

Correspondingly, in the BI system, a query that could be executed would be: how many new clients were added due to the change in the radio budget?


In an OLTP system dealing with customer demographic databases, the data fed in could be:
• increase a customer's credit limit
• change a customer's salary level

Correspondingly, in the OLAP system, a query that could be executed would be: can the customer profile changes support a higher product price?

Example 2:
A hotel owner uses BI analytical applications to gather statistical
information regarding average occupancy and room rate. It helps to find
aggregate revenue generated per room.

It also collects statistics on market share and data from customer surveys from each hotel to decide its competitive position in various markets.

Analyzing these trends year by year, month by month and day by day helps management to offer discounts on room rentals.

Example 3:
A bank gives branch managers access to BI applications. It helps branch
manager to determine who are the most profitable customers and which
customers they should work on.

The use of BI tools frees information technology staff from the task of
generating analytical reports for the departments. It also gives department
personnel access to a richer data source.


4.7 DATA MINING IN BUSINESS INTELLIGENCE AND ITS PROCESS:

The data mining process is the discovery, within large data sets, of patterns, relationships and insights that guide enterprises in measuring and managing where they are and in predicting where they will be in the future.

Large amounts of data can come from various data sources and may be stored in different data warehouses, and data mining techniques such as machine learning, artificial intelligence (AI) and predictive modeling can be involved.

The data mining process requires commitment. But experts agree that, across all industries, the data mining process is the same and should follow a prescribed path. Here are the six essential steps of the data mining process.


1. Business understanding

In the business understanding phase:

First, the business objectives must be understood clearly and the business's needs identified.

Next, assess the current situation by finding the resources, assumptions, constraints and other important factors which should be considered.

Then, from the business objectives and current situations, create data
mining goals to achieve the business objectives within the current
situation.

Finally, a good data mining plan has to be established to achieve both business and data mining goals. The plan should be as detailed as possible.

2. Data understanding

The data understanding phase starts with initial data collection from the available data sources, to help get familiar with the data. Some important activities must be performed, including data loading and data integration, for the collection to succeed.

Next, the “gross” or “surface” properties of the acquired data need to be examined carefully and reported.

Then, the data needs to be explored by tackling the data mining questions,
which can be addressed using querying, reporting, and visualization.

Finally, the data quality must be examined by answering some important questions such as “Is the acquired data complete?” and “Are there any missing values in the acquired data?”
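A quick, hypothetical sketch of such a completeness check in pandas (the input file and its columns are assumed):

```python
# Checking acquired data for completeness and missing values (assumed file).
import pandas as pd

df = pd.read_csv("customers.csv")   # assumed input extract

print(df.shape)                     # how many rows and columns did we get?
print(df.isna().sum())              # missing values per column
print(df.duplicated().sum())        # duplicate records that may need attention
```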

3. Data preparation

The data preparation phase typically consumes about 90% of the project's time, and its outcome is the final data set. Once the available data sources are identified, they need to be selected, cleaned, constructed and formatted into the desired form. The data exploration task may be carried out at greater depth during this phase to notice patterns based on the business understanding.
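A minimal, hypothetical sketch of typical preparation steps (selection, cleaning, construction, formatting) in pandas; the raw file and column names are assumptions for illustration:

```python
# Typical data-preparation steps on an assumed raw extract.
import pandas as pd

raw = pd.read_csv("orders_raw.csv")

# Select: keep only the columns relevant to the mining goal.
data = raw[["customer_id", "order_date", "amount"]].copy()

# Clean: drop incomplete rows and fix obvious type problems.
data = data.dropna(subset=["customer_id", "amount"])
data["order_date"] = pd.to_datetime(data["order_date"], errors="coerce")

# Construct: derive a new feature from existing ones.
data["order_month"] = data["order_date"].dt.month

# Format: write the final data set for the modeling phase.
data.to_csv("orders_prepared.csv", index=False)
```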

4. Modeling

First, the modeling techniques to be used on the prepared data set have to be selected. Next, a test scenario must be generated to validate the quality and validity of the model. Then, one or more models are created on the prepared data set. Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business initiatives.
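For illustration, here is a hedged sketch of this select/validate/create/assess cycle using scikit-learn; the data set, feature columns and target column are assumptions for demonstration, not part of any case study.

```python
# Select a technique, validate on held-out data, create and assess the model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("customers_prepared.csv")   # assumed prepared data set
X = data[["total_spend", "num_orders"]]        # assumed feature columns
y = data["churned"]                            # assumed target column

# Test scenario: hold out 25% of the rows to validate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(max_depth=4)    # the chosen modeling technique
model.fit(X_train, y_train)

# Assessment: a metric that stakeholders can compare to business targets.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```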

5. Evaluation

In the evaluation phase, the model results must be evaluated in the context of the business objectives set in the first phase. New business requirements may be raised in this phase due to new patterns discovered in the model results or due to other factors. Gaining business understanding is an iterative process in data mining. The go or no-go decision must be made in this step before moving to the deployment phase.

6. Deployment

The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, the deployment phase could be as simple as creating a report or as complex as a repeatable data mining process across the organization. In the deployment phase, the plans for deployment, maintenance and monitoring have to be created for implementation and future support. From the project point of view, the final report needs to summarise the project experience and review the project to identify what needs improvement and to capture the lessons learned.

These six steps describe the Cross-Industry Standard Process for Data Mining, known as CRISP-DM. It is an open standard process model that describes common approaches used by data mining experts, and it is the most widely used analytics model.


4.8 ADVANTAGES AND DISADVANTAGES OF BUSINESS INTELLIGENCE

A. Advantages:
Here are some of the advantages of using Business Intelligence System:

1. Boost productivity
With a BI program, it is possible for businesses to create reports with a single click, which saves a lot of time and resources. It also allows employees to be more productive at their tasks.

2. To improve visibility
BI also helps to improve the visibility of business processes and makes it possible to identify any areas that need attention.

3. Fix Accountability
A BI system assigns accountability in the organization, as there must be someone who owns accountability and ownership for the organization's performance against its set goals.

4. It gives a bird's eye view:


A BI system also helps organizations, as decision makers get an overall bird's-eye view through typical BI features like dashboards and scorecards.

5. It streamlines business processes:


BI takes out all complexity associated with business processes. It also
automates analytics by offering predictive analysis, computer modeling,
benchmarking and other methodologies.

6. It allows for easy analytics:

BI software has democratized analytics, allowing even nontechnical or non-analyst users to collect and process data quickly. This puts the power of analytics into the hands of many people.


B. Disadvantages

1. Cost:
Business intelligence can prove costly for small as well as medium-sized enterprises, and such a system may be too expensive for routine business transactions.

2. Complexity:
Another drawback of BI is the complexity of implementing the data warehouse. It can be so complex that it makes business processes rigid and hard to deal with.

3. Limited use
Like many new technologies, BI was first built with the buying capacity of rich firms in mind. Therefore, BI systems are still not affordable for many small and medium-sized companies.

4. Time Consuming Implementation


It can take almost one and a half years for a data warehousing system to be completely implemented. Therefore, it is a time-consuming process.

4.9 TRENDS IN BUSINESS INTELLIGENCE

The following are some business intelligence and analytics trends that you should be aware of:

• Artificial Intelligence: Gartner's report indicates that AI and machine learning now take on complex tasks once done by human intelligence. This capability is being leveraged for real-time data analysis and dashboard reporting.

• Collaborative BI: BI software combined with collaboration tools, including social media and other newer technologies, enhances how teams work and share for collaborative decision making.


• Embedded BI: Embedded BI allows the integration of BI software or some of its features into another business application for enhancing and extending its reporting functionality.

• Cloud Analytics: BI applications are increasingly offered in the cloud, and more businesses will be shifting to this technology. As per industry predictions, within a couple of years spending on cloud-based analytics will grow 4.5 times faster than other analytics spending.


4.10 SUMMARY

Maximizing decision impact through business intelligence (BI) increases enterprise effectiveness at all levels, contributing to mission or growth goals by enabling workers and managers to direct business or mission decisions toward desired outcomes. Business Intelligence consists of a variety of tools and methods that allow businesses to collect, observe, and present data. Businesses are able to source data from external and internal systems, assemble the data for analysis, and generate and execute queries against the collected data to create reports and dashboards for decision makers. Visualizations within these reports and dashboards are powerful statistical tools that empower decision makers to act on data more quickly. These visual aids can present data in many ways, which allows for deeper dives into different data resources.
Business intelligence is important for the following:
• Measurement: creating KPIs (Key Performance Indicators) based on historic data.
• Benchmarking: identifying and setting benchmarks for varied processes.
• With BI systems, organizations can identify market trends and spot business problems that need to be addressed.
• BI helps with data visualization, which enhances data quality and thereby the quality of decision making.
• BI systems can be used not just by large enterprises but also by SMEs (Small and Medium Enterprises).


There are four types of BI users. They are the key players who use a Business Intelligence system:

1. The Professional Data Analyst:


The data analyst is a statistician who always needs to drill deep down into
data. BI system helps them to get fresh insights to develop unique
business strategies.

2. The IT users:
The IT user also plays a dominant role in maintaining the BI infrastructure.

3. The head of the company:


CEO or CXO can increase the profit of their business by improving
operational efficiency in their business.

4. The Business Users"


Business intelligence users can be found across the organization. There are mainly two types of business users: the casual business intelligence user and the power user. The difference between the two is that a power user can work with complex data sets, while the casual user relies on dashboards to evaluate predefined sets of data.

Business Intelligence is used across many different sectors. BI provides the user with the data and tools to answer the questions that are important to running the business or a part of it. In short, business intelligence is used for reporting the key data of a business, on the basis of which higher management takes decisions for the growth of the business. This chapter gave some basic as well as advanced-level Business Intelligence examples to help the reader understand the concept of Business Intelligence.

Business Intelligence (BI) is a set of tools supporting the transformation of raw data into useful information which can support decision making. Business Intelligence provides reporting functionality, tools for identifying data clusters, support for data mining techniques, business performance management and predictive analysis.


The data mining process is the discovery, within large data sets, of patterns, relationships and insights that guide enterprises in measuring and managing where they are and in predicting where they will be in the future.

Large amounts of data can come from various data sources and may be stored in different data warehouses, and data mining techniques such as machine learning, artificial intelligence (AI) and predictive modeling can be involved.

Last but not least, to summarise:

• BI is a set of processes, architectures, and technologies that convert raw data into meaningful information that drives profitable business actions.
• BI systems help businesses to identify market trends and spot business problems that need to be addressed.
• BI technology can be used by data analysts, IT people, business users and the head of the company.
• A BI system helps an organization improve visibility and productivity and fix accountability.
• The drawbacks of BI are that it is a time-consuming, costly and very complex process.

4.11 SELF ASSESSMENT QUESTIONS:

1. Define business intelligence and illustrate with some examples where BI is used.

2. Explain data mining.

3. Describe the data mining parameters.

4. Write short note on Business intelligence techniques.

5. What are the advantages and disadvantages of business intelligence? Explain.


4.12 MULTIPLE CHOICE QUESTIONS:

1. BI (Business Intelligence) is a set of processes, architectures, and technologies that convert ------------------ into meaningful information that drives profitable business actions.
a. Big data
b. Raw data
c. Structured data
d. Semi-structured data

2. The data analyst is a statistician who always needs to drill deep down into data. The BI system helps them get fresh insights to develop unique business strategies. This user is called -------------------
a. The IT users
b. The Professional Data Analyst
c. The head of the company
d. The Business Users"

3. While preparing a data set, at what stage does the work need to be assessed carefully, involving stakeholders, to make sure that what is created will meet the business initiatives?
a. Data understanding
b. Data preparation
c. Data modelling
d. Evaluation

4. The knowledge or information gained through the data mining process needs to be presented in such a way that stakeholders can use it when they want it. Based on the business requirements, which phase could be as simple as creating a report or as complex as a repeatable data mining process?
a. Evaluation
b. Data preparation
c. Deployment phase
d. Data understanding


5. Business Intelligence (BI) is a set of tools supporting the transformation of raw data into useful information which can support ……………...
a. Reporting functionality
b. Tool for identification of data clusters
c. Business performance
d. Decision making

Answers: 1. (b), 2. (b), 3. (c), 4. (c), 5. (d)



Chapter 5
Big Data
Objectives:

On completion of this chapter, you will understand big data in data analytics, considering the following:

Structure:

5.1 Introduction

5.2 Definition

5.3 Big Data and History of Big Data

5.4 Big Data: Types of Data Used in Analytics

5.5 Big data Analytic Applications

5.6 The Three Vs of Big Data

5.7 Breaking down of 3 Vs of Big data

5.8 Some More Big Data Use Cases

5.9 Big Data Challenges

5.10 How Big Data Works and Best Practices

5.11 Big data infrastructure demand

5.12 Big data Case study: Royal Bank of Scotland (RBS)

5.13 Summary

5.14 Self Assessment Questions

5.15 Multiple Choice Questions


5.1 INTRODUCTION:

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or
complex to be dealt with by traditional data-processing application
software. Data with many cases (rows) offer greater statistical power, while
data with higher complexity (more attributes or columns) may lead to a
higher false discovery rate. Big data challenges include capturing
data, data storage, data analysis, search, sharing and transfer, as well as visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety,
and velocity.

When we handle big data, we may not sample but simply observe and
track what happens. Therefore, big data often includes data with sizes that
exceed the capacity of traditional software to process within an acceptable
time and value.

Current usage of the term big data tends to refer to the use of predictive
analytics, user behavioural analytics or certain other advanced data
analytics methods that extract value from data, and seldom to a particular
size of data set. "There is little doubt that the quantities of data now
available are indeed large, but that's not the most relevant characteristic of
this new data ecosystem." Analysis of data sets can find new correlations
to "spot business trends, prevent diseases, combat crime and so on."
Scientists, business executives, practitioners of medicine, advertising
and governments alike regularly meet difficulties with large data-sets in
areas including Internet searches, fintech, urban informatics, and business
informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.

Data sets grow rapidly, to a certain extent because they are increasingly
gathered by cheap and numerous information-sensing Internet of
things devices such as mobile devices, aerial (remote sensing), software
logs, cameras, microphones, radio-frequency identification (RFID) readers
and wireless sensor networks. One question for large enterprises is
determining who should own big-data initiatives that affect the entire
organization.


Relational database management systems, and the desktop statistics and software packages used to visualize data, often have difficulty handling big data. The
work may require "massively parallel software running on tens, hundreds,
or even thousands of servers". What qualifies as being "big data" varies
depending on the capabilities of the users and their tools, and expanding
capabilities make big data a moving target. "For some organizations, facing
hundreds of gigabytes of data for the first time may trigger a need to
reconsider data management options. For others, it may take tens or
hundreds of terabytes before data size becomes a significant
consideration."

5.2 DEFINITION:

To understand big data, it’s important to know some historical background.


Here is Gartner’s definition, circa 2001 (which is still the go-to definition):
Big data is data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. This is known as the three “Vs”.

These data sets are so voluminous that traditional data processing software
just can’t manage them. But these massive volumes of data can be used to
address business problems that you would not have been able to tackle
before.

Other definitions are also worth considering:

1. Big data is a term used to describe data that is of high volume, high velocity and/or high variety, that requires new technologies and techniques to capture, store and analyse, and that is used to enhance decision making, provide insight and discovery, and support and optimise processes. In other words, big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

2. Big data is a term that describes the large volumes of high-velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information.


3. A collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications.

5.3 BIG DATA AND HISTORY OF BIG DATA

The emergence of new data sources and the need to analyse everything from live data streams in real time to huge amounts of unstructured content have made many businesses realise that they are now in an era where the spectrum of analytical workloads is so broad that it cannot all be dealt with using a single enterprise data warehouse. While data warehouses are very much part of the analytical landscape, business requirements now dictate that a new, more complex analytical environment is needed to support a range of analytical workloads that cannot easily be supported in a traditional environment.

The new environment includes multiple underlying technology platforms in addition to the data warehouse, each of which is optimised for specific analytical workloads. Further, it should be possible to make use of these platforms independently for specific workloads, and also together to solve business problems. The objective now is to cater for the complete spectrum of analytical workloads, including both traditional and new "big data" analytical workloads.

Big data is therefore a term associated with the new type of workload and
underlying technologies needed to solve the business problems that we
could not previously support due to technology limitations, prohibitive cost
or both.

Big data analytics is about analytical workloads that are associated with some combination of data volume, data velocity and data variety, and that may include complex analytics and complex data types.

For this reason, big data analytics can include the traditional data warehouse environment, because some analytical workloads may need both traditional and workload-optimised platforms to solve the business problems. The new enterprise analytical environment encompasses traditional data warehousing and other analytical platforms best suited to certain analytical workloads. Big data does not replace a data warehouse.


On the contrary, the data warehouse is an integral part of the extended analytical environment.

Analytical requirements and data characteristics will dictate the technology deployed in a big data environment. For this reason, a big data solution may be implemented on a range of technology platforms, including stream processing engines, relational DBMSs, analytical DBMSs or non-relational data management platforms such as a commercial Hadoop platform or a specialised NoSQL data store (e.g. a graph database). More importantly, it could be a combination of all of these that is needed to support the business requirements.

History of Big Data

Although the concept of big data itself is relatively new, the origins of large
data sets go back to the 1960s and '70s when the world of data was just
getting started with the first data centres and the development of the
relational database.

Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services. Hadoop (an open-
source framework created specifically to store and analyze big data sets)
was developed that same year. NoSQL also began to gain popularity during
this time.

The development of open-source frameworks, such as Hadoop (and more recently, Spark) was essential for the growth of big data because they
make big data easier to work with and cheaper to store. In the years since
then, the volume of big data has skyrocketed. Users are still generating
huge amounts of data—but it’s not just humans who are doing it.

With the advent of the Internet of Things (IoT), more objects and devices
are connected to the internet, gathering data on customer usage patterns
and product performance. The emergence of machine learning has
produced still more data.

While big data has come far, its usefulness is only just beginning. Cloud
computing has expanded big data possibilities even further. The cloud
offers truly elastic scalability, where developers can simply spin up ad hoc
clusters to test a subset of data.


Benefits of Big Data and Data Analytics:

• Big data makes it possible for you to gain more complete answers
because you have more information.

• More complete answers mean more confidence in the data—which means a completely different approach to tackling problems.

5.4 BIG DATA: TYPES OF DATA USED IN ANALYTICS

Data types involved in Big Data analytics are many: structured, unstructured, geographic, real-time media, natural language, time series,
event, network and linked. It is necessary here to distinguish between
human-generated data and device-generated data since human data is
often less trustworthy, noisy and unclean.

A brief description of each type is given below.


1. Structured data: data stored in rows and columns, mostly numerical, where the meaning of each data item is defined. This type of data constitutes about 10% of today's total data and is accessible
through database management systems. Example sources of structured
(or traditional) data include official registers that are created by
governmental institutions to store data on individuals, enterprises and
real estates; and sensors in industries that collect data about the
processes. Today, sensor data is one of the fastest growing areas, particularly as sensors are installed in plants to monitor movement, temperature, location, light, vibration, pressure, liquid and flow.

2. Unstructured data: data of different forms, e.g. text, image, video, document, etc. It can also be in the form of customer complaints,
contracts, or internal emails. This type of data accounts for about 90%
of the data created in this century. In fact, the volcanic growth of social
media (e.g. Facebook and Twitter), since the middle of the last decade,
is responsible for the major part of the unstructured data that we have
today. Unstructured data cannot be stored using traditional relational
databases. Storing data with such a variety and complexity requires the
use of adequate storage systems, commonly referred to as NoSQL
databases, like e.g. MongoDB and CouchDB. The importance of
unstructured data is located in the embedded interrelationships that
may not be discovered if other types of data are considered. What
makes data generated in social media different from other types of data
is that data in social media has a personal taste.

3. Geographic data: data related to roads, buildings, lakes, addresses, people, workplaces, and transportation routes, generated from
geographic information systems. These data link between place, time,
and attributes (i.e. descriptive information). Geographic data, which is digital, has huge benefits over traditional sources such as paper maps, written reports from explorers, and spoken accounts, in that digital data is easy to copy, store, and transmit. More importantly, it is easy to transform, process, and analyze. Such
data is useful in urban planning and for monitoring environmental
effects. A branch of statistics that is involved in spatial or
spatiotemporal data is called Geostatistics.


4. Real-time media: real-time streaming of live or stored media data. A special characteristic of real-time media is the sheer amount of data being produced, which will become increasingly challenging to store and process in the future. One of the main sources of media data is services like YouTube, Flickr, and Vimeo, which produce a huge amount of video, pictures, and audio. Another important source of real-time media is video conferencing (or visual collaboration), which allows two or more locations to communicate simultaneously in two-way video and audio transmission.

5. Natural language data: human-generated data, particularly in verbal form. Such data differ in terms of the level of abstraction and the level of editorial quality. The sources of natural language data include speech capture devices, landline phones, mobile phones, and Internet of Things devices that generate large amounts of text-like communication.

6. Time series: a sequence of data points (or observations), typically consisting of successive measurements made over a time interval. The
goal is to detect trends and anomalies, identify context and external
influences, and compare individual against the group or compare
individual at different times. There are two kinds of time series data: (i) continuous, where we have an observation at every instant of time, and (ii) discrete, where we have observations at (usually regularly) spaced intervals. Examples of such data include ocean tides, counts of
sunspots, the daily closing value of the Dow Jones Industrial Average,
and measuring the level of unemployment each month of the year.

7. Event data: data generated from matching external events with time series. This requires distinguishing important events from unimportant ones. For example, information related to
vehicle crashes or accidents can be collected and analysed to help
understand what the vehicles were doing before, during and after the
event. The data in this example is generated by sensors fixed in
different places of the vehicle body. Event data consists of three main pieces of information: (i) action, which is the event itself, (ii) timestamp, the time when this event happened, and (iii) state, which describes all other information relevant to this event. Event data is usually described as rich, denormalized, nested and schemaless (a small sketch of this structure appears after this list).


8. Network data: data concerning very large networks, such as social networks (e.g. Facebook and Twitter), information networks (e.g. the
World Wide Web), biological networks (e.g. biochemical, ecological and
neural networks), and technological networks (e.g. the Internet,
telephone and transportation networks). Network data is represented as
nodes connected via one or more types of relationship. In social
networks, nodes typically represent people. In information networks,
nodes represent data items (e.g. webpages). In technological networks,
nodes may represent Internet devices (e.g. routers and hubs) or
telephone switches. In biological networks, nodes may represent neural
cells. Much of the interesting work here is on network structure and
connections between network nodes.

9. Linked data: data that is built upon standard Web technologies such as
HTTP, RDF, SPARQL and URIs to share information that can be
semantically queried by computers (rather than serving human needs).
This allows data from different sources to be connected and read. The
term was coined by Tim Berners-Lee, director of the World Wide Web
Consortium, in a design note about the Semantic Web project. This project allowed the Web to connect related data that wasn't linked in the past, by providing the mechanisms and lowering the barriers to linking data that is not currently linked. Examples of repositories for linked data
include (i) DBpedia, a dataset containing extracted data from Wikipedia,
(ii) GeoNames, RDF descriptions of more than 7,500,000 geographical
features worldwide, (iii) UMBEL, a lightweight reference structure of
20,000 subject concept classes and their relationships derived from
OpenCyc, and (iv) FOAF, friend of a friend, a dataset describing persons,
their properties and relationships. Linked open data is another project
that targets linked data with open content.
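As a small illustration of the event-data structure described in item 7 above, here is a hedged Python sketch of an action/timestamp/state record; the field names and values are invented for demonstration.

```python
# Modelling the action/timestamp/state shape of an event record.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict

@dataclass
class Event:
    action: str                  # the event itself, e.g. a hard-braking event
    timestamp: datetime          # when the event happened
    state: Dict[str, Any] = field(default_factory=dict)  # all other context

crash_warning = Event(
    action="hard_braking",
    timestamp=datetime(2020, 6, 1, 14, 32, 5),
    state={"speed_kmh": 87, "abs_engaged": True, "sensor": "front-left"},
)
print(crash_warning)
```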

Finally, each data type has different requirements for analysis and poses
different challenges. In principle, the interpretation of data is known but in
practice, nobody has the full picture.


5.5 BIG DATA ANALYTIC APPLICATIONS

Many analytic applications have emerged around structured and multi-structured data. The following table shows some examples of industry use cases for big data analytics.

Industry: Use case

• Financial Services: Improved risk decisions, "know your customer", 360° customer insight, fraud detection, programmatic trading
• Insurance: Driver behaviour analysis (smart box); broker document analysis to deepen insight on insured risk and improve risk management
• Health care: Medical record analytics to understand why patients are being re-admitted; disease surveillance; genomics
• Manufacturing: "Smart" product usage and health monitoring; improved customer service by analysing service records; field service optimisation; production and distribution optimisation by relating reported service problems to detect early warnings in product quality and by analysing sensor data
• Oil and gas: Sensor data analysis in wells, rigs and pipelines for health and safety, risk and cost management, and production optimisation
• Telecommunications: Network analytics and optimisation from device, sensor and GPS inputs to enhance social networking and promotion opportunities
• Utilities: Smart meter data analysis; grid optimisation; customer insight from social networks

Web data, sensor data and text data have emerged as popular data sources for big data analytical projects.

Big data is often characterised by three "V"s: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data does not equate to any specific volume of data, the term is often used to describe the terabytes, petabytes and even exabytes of data captured over time.

5.6 THE THREE VS OF BIG DATA

Every business, big or small, is managing a considerable amount of data generated through its various data points and business processes. At
times, businesses are able to handle these data using Excel sheets, Access databases or other similar tools. However, when data cannot fit into such
tools, and human error instances increase above acceptable limits due to
intensive manual processing, it is time to think about Big Data and
analytics.

In this section, we discuss how Big Data can be defined using the famous 3 Vs: Volume, Velocity and Variety.

1. Volume: The amount of data matters. With big data, you’ll have to
process high volumes of low-density, unstructured data. This can be
data of unknown value, such as Twitter data feeds, clickstreams on a
webpage or a mobile app, or sensor-enabled equipment. For some
organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes. Volume, in short, is the amount of data being captured by enterprises.

Within the Social Media space for example, Volume refers to the amount
of data generated through websites, portals and online applications.
Especially for B2C companies, Volume encompasses the available data
that are out there and need to be assessed for relevance. Consider the following: Facebook has 2 billion users, YouTube 1 billion users, Twitter 350 million users and Instagram 700 million users. Every day, these users contribute billions of images, posts, videos, tweets etc. You can now imagine the insanely large amount, or Volume, of data that is generated every minute and every hour.

2. Velocity: Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action. Thus, Velocity is nothing but the rate at which data is being generated.


With Velocity we refer to the speed with which data are being generated.
Staying with our social media example, every day 900 million photos are
uploaded on Facebook, 500 million tweets are posted on Twitter, 0.4
million hours of video are uploaded on YouTube and 3.5 billion searches
are performed in Google. This is like a nuclear data explosion. Big Data
helps the company to hold this explosion, accept the incoming flow of
data and at the same time process it fast so that it does not create
bottlenecks.

3. Variety: Variety refers to the many types of data that are available.
Traditional data types were structured and fit neatly in a relational
database. With the rise of big data, data comes in new unstructured
data types. Unstructured and semi-structured data types, such as text, audio, and video, require additional pre-processing to derive meaning and to support metadata. All of this data is captured by enterprises.

Variety in Big Data refers to all the structured and unstructured data that
has the possibility of getting generated either by humans or by
machines. The most commonly added data are structured: texts, tweets, pictures and videos. However, unstructured data like emails, voicemails, hand-written text, ECG readings, audio recordings etc. are also important elements under Variety. Variety is all about the ability to classify the incoming data into various categories.

Two more Vs have emerged over the past few years: value and veracity. Of these, value is already well understood by enterprises, whereas veracity is nothing but the trustworthiness of the data.

Data has intrinsic value. But it’s of no use until that value is discovered.
Equally important: How truthful is your data—and how much can you rely
on it?

Today, big data has become capital. Think of some of the world’s biggest
tech companies. A large part of the value they offer comes from their data,
which they’re constantly analyzing to produce more efficiency and develop
new products.


Recent technological breakthroughs have exponentially reduced the cost of data storage and compute, making it easier and less expensive to store
more data than ever before. With an increased volume of big data now
cheaper and more accessible, you can make more accurate and precise
business decisions.

Finding value in big data isn’t only about analyzing it (which is a whole
other benefit). It’s an entire discovery process that requires insightful
analysts, business users, and executives who ask the right questions,
recognize patterns, make informed assumptions, and predict behaviour.


5.7 BREAKING DOWN OF 3 VS OF BIG DATA

Such voluminous data can come from myriad different sources, such as
business sales record, the collected result of scientific experiments or real
time sensors used in internet of things. Data may be raw or pre-processed
using separate software tools before analytics are applied.

Data may also exist in a wide variety of file types, including structured data, such as SQL database stores; unstructured data, such as document files; or streaming data from sensors. Further, big data may involve multiple, simultaneous data sources which may not otherwise be integrated. For example, a big data analytics project may attempt to gauge a product's success and future sales by correlating past sales data, return data and online buyer review data for the product.

Finally, velocity refers to the speed at which big data must be analysed. Every big data analytics project will ingest, correlate and analyse the data sources and then render an answer or result based on an overarching query. This means human analysts must have a detailed understanding of the available data and possess some sense of what answer they are looking for.

Velocity is also meaningful as big data analysis expands into fields like machine learning and artificial intelligence, where analytical processes mimic perception by finding and using patterns in collected data.


5.8 SOME MORE BIG DATA USE CASES

Big data can help you address a range of business activities, from customer
experience to analytics. Here are just a few.

1. Product Development
Companies like Netflix and Procter & Gamble use big data to anticipate
customer demand. They build predictive models for new products and
services by classifying key attributes of past and current products or
services and modeling the relationship between those attributes and the
commercial success of the offerings.

In addition, P&G uses data and analytics from focus groups, social media,
test markets, and early store rollouts to plan, produce, and launch new
products.

2. Predictive Maintenance
Factors that can predict mechanical failures may be deeply buried in
structured data, such as the year, make, and model of equipment, as well
as in unstructured data that covers millions of log entries, sensor data,
error messages, and engine temperature. By analyzing these indications of
potential issues before the problems happen, organizations can deploy
maintenance more cost effectively and maximize parts and equipment
uptime.

3. Customer Experience
The race for customers is on. A clearer view of customer experience is
more possible now than ever before. Big data enables you to gather data
from social media, web visits, call logs, and other sources to improve the
interaction experience and maximize the value delivered. Start delivering
personalized offers, reduce customer churn, and handle issues proactively.

4. Fraud and Compliance


When it comes to security, it’s not just a few rogue hackers—you’re up
against entire expert teams. Security landscapes and compliance
requirements are constantly evolving. Big data helps you identify patterns
in data that indicate fraud and aggregate large volumes of information to
make regulatory reporting much faster.


5. Machine Learning
Machine learning is a hot topic right now. And data—specifically big data—
is one of the reasons why. We are now able to teach machines instead of
program them. The availability of big data to train machine learning models
makes that possible.

6. Operational efficiency
Operational efficiency may not always make the news, but it’s an area in
which big data is having the most impact. With big data, you can analyze
and assess production, customer feedback and returns, and other factors
to reduce outages and anticipate future demands. Big data can also be
used to improve decision-making in line with current market demand.

7. Drive Innovation
Big data can help you innovate by studying interdependencies among
humans, institutions, entities, and process and then determining new ways
to use those insights. Use data insights to improve decisions about financial
and planning considerations. Examine trends and what customers want to
deliver new products and services. Implement dynamic pricing. There are
endless possibilities.

5.9 BIG DATA CHALLENGES

While big data holds a lot of promise, it is not without its challenges.
• First, big data is…big. Although new technologies have been developed
for data storage, data volumes are doubling in size about every two
years. Organizations still struggle to keep pace with their data and find
ways to effectively store it.
• But it’s not enough to just store the data. Data must be used to be
valuable and that depends on curation. Clean data, or data that’s
relevant to the client and organized in a way that enables meaningful
analysis, requires a lot of work. Data scientists spend 50 to 80 percent of
their time curating and preparing data before it can actually be used.
• Finally, big data technology is changing at a rapid pace. A few years ago,
Apache Hadoop was the popular technology used to handle big data.
Then Apache Spark was introduced in 2014. Today, a combination of the
two frameworks appears to be the best approach. Keeping up with big
data technology is an ongoing challenge.


Some of the most common of those big data challenges include the
following:

1. Dealing with data growth

The most obvious challenge associated with big data is simply storing and
analyzing all that information. In its Digital Universe report, IDC estimates
that the amount of information stored in the world's IT systems is doubling
about every two years. By 2020, the total amount will be enough to fill a
stack of tablets that reaches from the earth to the moon 6.6 times. And
enterprises have responsibility or liability for about 85 percent of that
information.

Much of that data is unstructured, meaning that it doesn't reside in a database. Documents, photos, audio, videos and other unstructured data
can be difficult to search and analyze. In order to deal with data growth,
organizations are turning to a number of different technologies. When it
comes to storage, converged and hyperconverged infrastructure and
software-defined storage can make it easier for companies to scale their
hardware. And technologies like compression, deduplication and tiering can
reduce the amount of space and the costs associated with big data storage.

On the management and analysis side, enterprises are using tools like
NoSQL databases, Hadoop, Spark, big data analytics software, business
intelligence applications, artificial intelligence and machine learning to help
them comb through their big data stores to find the insights their
companies need.
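
To make this concrete, here is a minimal Python sketch of how a distributed framework such as Apache Spark can comb through a large store of log data. It assumes a running Spark cluster; the file path, field names and query are hypothetical, for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark spreads the scan of a large log store across many machines.
spark = SparkSession.builder.appName("log-insights").getOrCreate()

# Hypothetical location and schema: a large collection of JSON log files.
logs = spark.read.json("hdfs:///data/app_logs/*.json")

# Count error messages per service to surface potential problem areas.
(logs.filter(F.col("level") == "ERROR")
     .groupBy("service")
     .count()
     .orderBy(F.desc("count"))
     .show(10))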

2. Generating insights in a timely manner

Of course, organizations don't just want to store their big data — they want
to use that big data to achieve business goals. According to the
NewVantage Partners survey, the most common goals associated with big
data projects included the following:

1. Decreasing expenses through operational cost efficiencies

2. Establishing a data-driven culture


3. Creating new avenues for innovation and disruption


4. Accelerating the speed with which new capabilities and services are
deployed

5. Launching new product and service offerings

All of those goals can help organizations become more competitive — but
only if they can extract insights from their big data and then act on those
insights quickly. PwC's Global Data and Analytics Survey 2016 found,
"Everyone wants decision-making to be faster, especially in banking,
insurance, and healthcare."

To achieve that speed, some organizations are looking to a new generation of ETL and analytics tools that dramatically reduce the time it takes to
generate reports. They are investing in software with real-time analytics
capabilities that allows them to respond to developments in the
marketplace immediately.

3. Recruiting and retaining big data talent

But in order to develop, manage and run those applications that generate
insights, organizations need professionals with big data skills. That has
driven up demand for big data experts.

In order to deal with talent shortages, organizations have a couple of options. First, many are increasing their budgets and their recruitment and
retention efforts. Second, they are offering more training opportunities to
their current staff members in an attempt to develop the talent they need
from within. Third, many organizations are looking to technology. They are
buying analytics solutions with self-service and/or machine learning
capabilities. Designed to be used by professionals without a data science
degree, these tools may help organizations achieve their big data goals
even if they do not have a lot of big data experts on staff.


4. Integrating disparate data sources

The variety associated with big data leads to challenges in data integration.
Big data comes from a lot of different places — enterprise applications,
social media streams, email systems, employee-created documents, etc.
Combining all that data and reconciling it so that it can be used to create
reports can be incredibly difficult. Vendors offer a variety of ETL and data
integration tools designed to make the process easier, but many
enterprises say that they have not solved the data integration problem yet.

In response, many enterprises are turning to new technology solutions. In the IDG report, 89 percent of those surveyed said that their companies
planned to invest in new big data tools in the next 12 to 18 months. When
asked which kind of tools they were planning to purchase, integration
technology was second on the list, behind data analytics software.

5. Validating data

Closely related to the idea of data integration is the idea of data validation.
Often organizations are getting similar pieces of data from different
systems, and the data in those different systems doesn't always agree. For
example, the ecommerce system may show daily sales at a certain level
while the enterprise resource planning (ERP) system has a slightly different
number. Or a hospital's electronic health record (EHR) system may have
one address for a patient, while a partner pharmacy has a different address
on record.

The process of getting those records to agree, as well as making sure the
records are accurate, usable and secure, is called data governance. And in
the AtScale 2016 Big Data Maturity Survey, the fastest-growing area of
concern cited by respondents was data governance.

Solving data governance challenges is very complex and usually requires a combination of policy changes and technology. Organizations often set up
a group of people to oversee data governance and write a set of policies
and procedures. They may also invest in data management solutions
designed to simplify data governance and help ensure the accuracy of big
data stores — and the insights derived from them.
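
As a simple illustration of the kind of validation check that can be automated, the Python sketch below (using the pandas library; the system names and figures are hypothetical) compares daily sales reported by an ecommerce system against the ERP system and flags the rows on which the two disagree.

import pandas as pd

# Hypothetical daily sales figures from two systems that should agree.
ecommerce = pd.DataFrame({"date": ["2020-06-01", "2020-06-02"],
                          "sales": [10500, 9800]})
erp = pd.DataFrame({"date": ["2020-06-01", "2020-06-02"],
                    "sales": [10500, 9650]})

# Join the two sources on date and flag disagreements.
merged = ecommerce.merge(erp, on="date", suffixes=("_ecom", "_erp"))
merged["mismatch"] = merged["sales_ecom"] != merged["sales_erp"]

# Flagged rows become candidates for review under the governance policy.
print(merged[merged["mismatch"]])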


6. Securing big data

Security is also a big concern for organizations with big data stores. After
all, some big data stores can be attractive targets for hackers or advanced
persistent threats (APTs).

However, most organizations seem to believe that their existing data security methods are sufficient for their big data needs as well. In the IDG survey, less than half of those surveyed (39 percent) said that they were using additional security measures for their big data repositories or analyses. Among those who do use additional measures, the most popular include identity and access control (59 percent), data encryption (52 percent) and data segregation (42 percent).

7. Organizational resistance

It is not only the technological aspects of big data that can be challenging
— people can be an issue too.

In the NewVantage Partners survey, 85.5 percent of those surveyed said that their firms were committed to creating a data-driven culture, but only
37.1 percent said they had been successful with those efforts. When asked
about the impediments to that culture shift, respondents pointed to three
big obstacles within their organizations:

• Insufficient organizational alignment (4.6 percent)


• Lack of middle management adoption and understanding (41.0 percent)
• Business resistance or lack of understanding (41.0 percent)

In order for organizations to capitalize on the opportunities offered by big data, they are going to have to do some things differently. And that sort of
change can be tremendously difficult for large organizations.

The PwC report recommended, "To improve decision-making capabilities at your company, you should continue to invest in strong leaders who
understand data’s possibilities and who will challenge the business."


One way to establish that sort of leadership is to appoint a chief data officer, a step that NewVantage Partners said 55.9 percent of Fortune 1000 companies have taken. But with or without a chief data officer, enterprises need executives, directors and managers who are going to commit to overcoming their big data challenges if they want to remain competitive in the increasingly data-driven economy.

5.10 HOW BIG DATA WORKS AND BEST PRACTICES

Big data gives you new insights that open up new opportunities and
business models. Getting started involves three key actions:

1. Integrate
Big data brings together data from many disparate sources and
applications. Traditional data integration mechanisms, such as ETL (extract, transform, and load), generally aren't up to the task. Analyzing big data sets at terabyte, or even petabyte, scale requires new strategies and technologies.
During integration, you need to bring in the data, process it, and make
sure it’s formatted and available in a form that your business analysts can
get started with.
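
A toy end-to-end example of this integrate step, written in Python with the pandas library (the file, column and table names are hypothetical), might look like the following.

import sqlite3
import pandas as pd

# Extract: read raw data from a source system export.
raw = pd.read_csv("sales_export.csv")

# Transform: normalise formats so downstream analysis is consistent.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw["region"] = raw["region"].str.strip().str.upper()

# Load: land the curated table where business analysts can get started.
with sqlite3.connect("analytics.db") as conn:
    raw.to_sql("sales_clean", conn, if_exists="replace", index=False)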

2. Manage
Big data requires storage. Your storage solution can be in the cloud, on
premises, or both. You can store your data in any form you want and bring
your desired processing requirements and necessary process engines to
those data sets on an on-demand basis. Many people choose their storage
solution according to where their data is currently residing. The cloud is
gradually gaining popularity because it supports your current compute
requirements and enables you to spin up resources as needed.

3. Analyse
Your investment in big data pays off when you analyze and act on your
data. Get new clarity with a visual analysis of your varied data sets.
Explore the data further to make new discoveries. Share your findings with
others. Build data models with machine learning and artificial intelligence.
Put your data to work.


Best Practices
To help you on your big data journey, we’ve put together some key best
practices for you to keep in mind. Here are our guidelines for building a
successful big data foundation.

1. Align Big Data with Specific Business Goals

More extensive data sets enable you to make new discoveries. To that end,
it is important to base new investments in skills, organization, or
infrastructure with a strong business-driven context to guarantee ongoing
project investments and funding. To determine if you are on the right
track, ask how big data supports and enables your top business and IT
priorities. Examples include understanding how to filter web logs to
understand ecommerce behaviour, deriving sentiment from social media
and customer support interactions, and understanding statistical
correlation methods and their relevance for customer, product,
manufacturing, and engineering data.

2. Ease Skills Shortage with Standards and Governance

One of the biggest obstacles to benefiting from your investment in big data
is a skills shortage. You can mitigate this risk by ensuring that big data
technologies, considerations, and decisions are added to your IT
governance program. Standardizing your approach will allow you to
manage costs and leverage resources. Organizations implementing big
data solutions and strategies should assess their skill requirements early
and often and should proactively identify any potential skill gaps. These
can be addressed by training/cross-training existing resources, hiring new
resources, and leveraging consulting firms.

3. Optimize Knowledge Transfer with a Centre of Excellence

Use a centre of excellence approach to share knowledge, control oversight, and manage project communications. Whether big data is a new or
expanding investment, the soft and hard costs can be shared across the
enterprise. Leveraging this approach can help increase big data capabilities
and overall information architecture maturity in a more structured and
systematic way.


4. Top Payoff Is Aligning Unstructured with Structured Data

It is certainly valuable to analyze big data on its own. But you can bring
even greater business insights by connecting and integrating low density
big data with the structured data you are already using today.

Whether you are capturing customer, product, equipment, or environmental big data, the goal is to add more relevant data points to
your core master and analytical summaries, leading to better conclusions.
For example, there is a difference in distinguishing all customer sentiment from that of only your best customers, which is why many see big data as an integral extension of their existing business intelligence capabilities, data warehousing platform, and information architecture.

Keep in mind that the big data analytical processes and models can be both
human- and machine-based. Big data analytical capabilities include
statistics, spatial analysis, semantics, interactive discovery, and
visualization. Using analytical models, you can correlate different types and
sources of data to make associations and meaningful discoveries.

5. Plan Your Discovery Lab for Performance

Discovering meaning in your data is not always straightforward. Sometimes we don't even know what we're looking for. That's expected. Management and IT need to support this "lack of direction" or "lack of clear requirement."

At the same time, it’s important for analysts and data scientists to work
closely with the business to understand key business knowledge gaps and
requirements. To accommodate the interactive exploration of data and the
experimentation of statistical algorithms, you need high-performance work
areas. Be sure that sandbox environments have the support they need—
and are properly governed.

6. Align with the Cloud Operating Model

Big data processes and users require access to a broad array of resources
for both iterative experimentation and running production jobs. A big data
solution includes all data realms including transactions, master data,
reference data, and summarized data. Analytical sandboxes should be created on demand. Resource management is critical to ensure control of the entire data flow including pre- and post-processing, integration, in-
database summarization, and analytical modeling. A well-planned private
and public cloud provisioning and security strategy plays an integral role in
supporting these changing requirements.

5.11 BIG DATA INFRASTRUCTURE DEMAND:

The need for big data velocity imposes unique demands on the underlying compute infrastructure. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster. Organisations must apply adequate compute power to big data tasks to achieve the desired velocity. This can potentially demand hundreds or thousands of servers that can distribute the work and operate collaboratively.

Achieving such velocity in a cost-effective manner is also a headache. Many enterprise leaders are reticent to invest in an extensive server and storage infrastructure that might only be used occasionally to complete big data tasks. As a result, public cloud computing has emerged as a primary vehicle for hosting big data analytics projects. A public cloud provider can store petabytes of data and scale up thousands of servers just long enough to accomplish a big data project. The business only pays for the storage and compute time actually used, and the cloud instances can be turned off until they are needed again.

To improve the service level further, some public cloud providers offer big data capabilities such as highly distributed Hadoop compute instances, data warehouses, databases and other related cloud services. Amazon Web Services Elastic MapReduce is one example of big data services in a public cloud.

Human side of big data analytics:

Ultimately, the value and effectiveness of big data depend on the human operators tasked with understanding the data and formulating the proper queries to direct big data projects. Some big data tools meet specialised niches and allow less technical users to make various predictions from everyday business data. Still other tools are appearing, such as Hadoop appliances, to help businesses implement a suitable compute infrastructure to tackle big data projects while minimising the need for hardware and distributed compute software know-how.

But these tools only address limited use cases. Many other big data tasks, such as determining the effectiveness of a new drug, can require substantial scientific and computational expertise from analytical staff. There is currently a shortage of data scientists and other analysts who have experience working with big data in a distributed, open source environment.

Big data can be contrasted with small data, another evolving term that is often used to describe data whose volume and format can be easily used for self-service analytics. A commonly quoted axiom is that “big data is for machines, small data is for people.”

5.12 BIG DATA CASE STUDY: ROYAL BANK OF SCOTLAND (RBS)

“USING BIG DATA TO MAKE CUSTOMER SERVICE MORE PERSONAL”

Background:
Prior to the 2008 financial crisis, RBS was at one point the largest bank in the world. When its exposure to the subprime mortgage market threatened to collapse the business, the UK government stepped in, at one time holding 84% of the company's shares.

Currently undergoing a process of re-privatisation, the bank has chosen improving customer service as its strategy to fight for its share of the retail banking market.

Big data analysis has a key part to play in this plan. The bank has recently announced a £100 million investment in data analytics technology, and has named one of its first initiatives “Personology”, emphasizing a focus on the customer rather than the financial product.


• What problem is big data helping to solve?

During the 1970s and 80s, says RBS Head of Analytics Christian Nelissen, banks became detached from their customers. The focus was on pushing products and hitting sales targets, without regard as to whether they were providing their customers with the services they needed.

“In the seventies,” says Nelissen, “banks, through the agency of their branch staff and managers, knew their customers individually. They knew who they were and how they fitted in – who their family were and what they were trying to do.”

At some point in the eighties, he says, this personal relationship was lost as retail banking transitioned from helping customers look after their finances to pushing all manner of financial and insurance services in search of new streams of revenue.

Whereas before they would have concentrated on meeting customer expectations, the focus shifted to “getting products out of the door”, in Nelissen's words. Banks would have a target of selling a particular number of balance transfers or credit cards, and that's what they would try to sell to customers who came through the door, whether or not that's what they wanted or needed.

• How is the Big data used in practice?

RBS is attempting to use analytics and machines to restore a level of personal service, which at first may seem counterintuitive. But its analytics team has developed a philosophy they call “Personology” in order to better understand their customers and meet their needs.

Our banks have enormous amounts of information about us available to them. Records of how we spend our money and manage our finances can give an incredibly detailed picture of how we live our lives: when and where we take our vacations, when we get married or feel unwell and, if we are lucky enough to have any, what sort of things we spend our excess income on.

Nellisen says: If you look at someone like Amazon, they know relatively
little about heir customers compared to us. But they make very good use
of the data they do have.


“We have traditionally been in the opposite position – we have a huge amount of data about our customers but we were just starting to make use of it. There is huge richness in what we have and we are only just starting to get to the potential of it.”

A very simple and straightforward example, which makes a nice starting point, is congratulating customers personally when they contact the branch on their birthday. That's not exactly big data analytics, but it's in line with the concept of Personology.

Systems have also been developed to let customers know individually how they would benefit from deals and promotions being offered. While in the past, logging in to an online account or telephoning customer service would have been an opportunity for the bank to offer whichever services it could most profitably offload, now customers receive personalised recommendations showing exactly how much they would save by taking up a particular offer.

Additionally, transactional data is analysed to pinpoint occurrences of customers paying twice for financial products, for example paying for insurance or breakdown assistance that is already provided as part of a packaged bank account.

• What were the results?

Even though it is early days, Nelissen is able to report some initial results. For example, every single customer contacted regarding duplicate financial products they were paying for opted to cancel the third-party product rather than the RBS product.

Nelissen says, “We are very excited about the stuff we're doing. We are seeing significantly improved response rates and more engagement.”

Computer Weekly reported that one octogenarian customer was reduced to tears (as were members of the bank staff) when he was wished a happy birthday: no one else had remembered. While looking at an isolated example may seem counter to the philosophy of big data, it's immensely important to remember that ultimately what matters is the way in which strategies such as this affect people on an individual basis.


• What data was used?

RBS uses data on its customers, including their account transaction history and personal information, to determine what products or services would be most useful.

• What are the technical details?

The bank uses analytics-based CRM software developed by Pegasystems to make real-time recommendations to staff in branches and call centres about how to help specific customers. It has also built its own dashboards using SAS, and uses open source technology including Hadoop (supplied by Cloudera) and Cassandra.

• Any challenges that had to be overcome?

According to Nelissen, getting staff on board was one of the major challenges faced at the start. “We are at the point where the staff feel like they are having valuable conversations with their customers.

“They are at the point where they understand what the data is trying to do and feel it helps them have good conversations – and that's the big shift from where we were before.

“Staff engagement is critical – the ideas that work best, and that have the best resonance with customers, are the ones that we either got from the front line or worked really closely with the front line to develop.”

• What are the key learning points and takeaways?

In sales and marketing terms, data is useless if it does not tell us something that we don't already know about our customers.

By understanding customers better, organisations can position themselves to better meet their needs.

Engaging with staff and other stakeholders is essential. They must fully understand the reason that data analytics is being used in customer-facing situations if they are going to make the most effective use of the insights being uncovered.


5.13 SUMMARY

In order to understand 'Big Data', you first need to know what data is: the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Big Data is also data but with a huge size. Big Data is a term used to
describe a collection of data that is huge in size and yet growing
exponentially with time. In short such data is so large and complex that
none of the traditional data management tools are able to store it or
process it efficiently.

Examples Of Big Data

Following are some of the examples of Big Data:

• The New York Stock Exchange generates about one terabyte of new
trade data per day.
• Social Media

Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated in terms of photo and video uploads, message exchanges, comments, etc.

• A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousand flights per day, data generation reaches many petabytes.

Types Of Big Data

Big Data can be found in three forms:

1. Structured

2. Unstructured

3. Semi-structured


Structured

Any data that can be stored, accessed and processed in the form of a fixed format is termed 'structured' data. Over the period of time, talent in computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and in deriving value out of it. However, nowadays we are foreseeing issues when the size of such data grows to a huge extent; typical sizes are in the range of multiple zettabytes.

An example of structured data is an 'Employee' table in a database.

Unstructured
Any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing for deriving value out of it. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data available to them but, unfortunately, they don't know how to derive value out of it, since this data is in its raw, unstructured format.

An example of unstructured data is the output returned by 'Google Search'.

Semi-structured
Semi-structured data can contain both forms of data. We can see semi-structured data as structured in form, but it is actually not defined with, for example, a table definition as in a relational DBMS. An example of semi-structured data is data represented in an XML file.

Data Growth over the years


Please note that web application data, which is unstructured, consists of
log files, transaction history files etc. OLTP systems are built to work with
structured data wherein data is stored in relations (tables).


Characteristics Of Big Data


(i) Volume – The name Big Data itself is related to a size which is
enormous. Size of data plays a very crucial role in determining value out of
data. Also, whether a particular data can actually be considered as a Big
Data or not, is dependent upon the volume of data. Hence, 'Volume' is
one characteristic which needs to be considered while dealing with Big
Data.

(ii) Variety – The next aspect of Big Data is its variety.


Variety refers to heterogeneous sources and the nature of data, both
structured and unstructured. During earlier days, spreadsheets and
databases were the only sources of data considered by most of the
applications. Nowadays, data in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc. are also being considered in the
analysis applications. This variety of unstructured data poses certain issues
for storage, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data is generated and processed to meet the demands determines the real potential in the data.

Big Data Velocity deals with the speed at which data flows in from sources
like business processes, application logs, networks, and social media sites,
sensors, Mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the
data at times, thus hampering the process of being able to handle and
manage the data effectively.

Benefits of Big Data Processing

Ability to process Big Data brings in multiple benefits, such as-


• Businesses can utilize outside intelligence while taking decisions
• Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their business strategies.
• Improved customer service


Traditional customer feedback systems are getting replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are being used to read and evaluate consumer responses.
• Early identification of risk to the product/services, if any
• Better operational efficiency

Big Data technologies can be used for creating a staging area or landing
zone for new data before identifying what data should be moved to the
data warehouse. In addition, such integration of Big Data technologies and
data warehouse helps an organization to offload infrequently accessed
data.

Thus,
• Big Data is defined as data that is huge in size. Bigdata is a term used to
describe a collection of data that is huge in size and yet growing
exponentially with time.
• Examples of Big Data generation includes stock exchanges, social media
sites, jet engines, etc.
• Big Data could be 1) Structured, 2) Unstructured, 3) Semi-structured
• Volume, Variety, Velocity, and Variability are few Characteristics of
Bigdata
• Improved customer service, better operational efficiency, Better Decision
Making are few advantages of Bigdata

5.14 SELF ASSESSMENT QUESTIONS

1. Define Big Data.

2. What are the types of Big Data? Explain.

3. Write short notes on the 3 'V's in big data analytics.

4. Explain the challenges in big data analytics.

5. What are the best practices in working with Big Data? Describe.


5.15 MULTIPLE CHOICE QUESTIONS

1. Big data analytics is about analytical workloads that are associated with
some combination of -------------that may include complex analytics and
complex data types.
a. Data volume, data velocity and data variety
b. Data velocity and data variety
c. Varieties of data
d. Speed of the data with varieties

2. Data types involved in Big Data analytics are many: structured,


unstructured, geographic, real-time media, natural language, time
series, event, network and linked. It is necessary here to distinguish
between -------------------
a. Human-generated data
b. Device-generated data
c. Natural data
d. Human-generated and device-generated data

3. Finding value in big data isn’t only about analyzing it but it’s an entire
discovery process that requires --------------
a. insightful analysts, business users, and executives who ask the right
questions, recognize patterns, make informed assumptions, and
predict behaviour
b. insightful analysts, business users, and executives
c. The people who ask the right questions,
d. Staff who recognize patterns, make informed assumptions, and
predict behaviour

4. Companies like Netflix and Procter & Gamble use big data to anticipate
customer demand. Which use case does this illustrate?
a. Operational efficiency
b. Product development
c. Drive innovation
d. Understand customer experience


5. What are the advantages of Bigdata?


a. Improved customer service
b. Better operational efficiency
c. Better Decision Making
d. All of the above – improved customer service, better operational
efficiency and better decision making

Answers: 1. (a), 2. (d), 3. (a), 4. (b), 5. (d)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2

Video Lecture - Part 3


Chapter 6
Data Mining
Objectives:

On completion of this chapter, you will understand the importance of data mining in business analytics, considering the following points:

Structure:

6.1 Introduction

6.2 Definition

6.3 Data Mining Process

6.4 Challenges of Data Mining

6.5 Types of Data Mining

6.6 Data Mining Trends

6.7 Data Mining Tools

6.8 Data Mining Techniques

6.9 Advantages of Data Mining

6.10 How Data Mining Works

6.11 Requirement of Technological Infrastructure

6.12 Recent Success Story of Usage of Data Mining

6.13 Summary

6.14 Self Assessment Questions

6.15 Multiple Choice Questions


6.1 INTRODUCTION:

Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.

The process of digging through data to discover hidden connections and predict future trends has a long history. Sometimes referred to as
"knowledge discovery in databases," the term "data mining" wasn’t coined
until the 1990s. But its foundation comprises three intertwined scientific
disciplines: statistics (the numeric study of data relationships), artificial
intelligence (human-like intelligence displayed by software and/or
machines) and machine learning (algorithms that can learn from data to
make predictions). What was old is new again, as data mining technology
keeps evolving to keep pace with the limitless potential of big data and
affordable computing power.

Over the last decade, advances in processing power and speed have
enabled us to move beyond manual, tedious and time-consuming practices
to quick, easy and automated data analysis. The more complex the data
sets collected, the more potential there is to uncover relevant insights.
Retailers, banks, manufacturers, telecommunications providers and
insurers, among others, are using data mining to discover relationships
among everything from pricing, promotions and demographics to how the
economy, risk, competition and social media are affecting their business
models, revenues, operations and customer relationships.

6.2 DEFINITION:

In simple words, data mining is defined as a process used to extract usable data from a larger set of any raw data. It implies analysing data patterns in
large batches of data using one or more software. Data mining has
applications in multiple fields, like science and research. As an application
of data mining, businesses can learn more about their customers and
develop more effective strategies related to various business functions and
in turn leverage resources in a more optimal and insightful manner. This
helps businesses be closer to their objective and make better decisions.
Data mining involves effective data collection and warehousing as well as
computer processing. For segmenting the data and evaluating the probability of future events, data mining uses sophisticated mathematical


algorithms. Data mining is also known as Knowledge Discovery in Data
(KDD).

Key features of data mining:


• Automatic pattern predictions based on trend and behaviour analysis.
• Prediction based on likely outcomes.
• Creation of decision-oriented information.
• Focus on large data sets and databases for analysis.
• Clustering based on finding and visually documenting groups of facts not previously known.

The Data Mining Process – Technological Infrastructure Required:

1. Database Size: The more data that must be processed and maintained, the more powerful the system required.

2. Query Complexity: The more complex the queries and the greater the number of queries, the more powerful the system required.

Uses:

1. Data mining techniques are useful in many research projects, including mathematics, cybernetics, genetics and marketing.

2. With data mining, a retailer could manage and use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. The retailer could also develop products and promotions to appeal to specific customer segments based on mining demographic data from comment or warranty cards.


6.3 DATA MINING PROCESS:

How to do Data Mining: The accepted data mining process involves six
steps:

1. Business understanding
The first step is establishing what the goals of the project are and how data mining can help you reach them. A plan should be developed at this stage to include timelines, actions, and role assignments.

2. Data understanding
Data is collected from all applicable data sources in this step. Data
visualization tools are often used in this stage to explore the properties of
the data to ensure it will help achieve the business goals.

3. Data preparation
Data is then cleansed, and missing data is included to ensure it is ready to
be mined. Data processing can take enormous amounts of time depending
on the amount of data analyzed and the number of data sources.
Therefore, distributed systems are used in modern database management
systems (DBMS) to improve the speed of the data mining process rather
than burden a single system. They’re also more secure than having all an
organization’s data in a single data warehouse. It’s important to include
failsafe measures in the data manipulation stage so data is not
permanently lost.

4. Data Modeling
Mathematical models are then used to find patterns in the data using
sophisticated data tools.

5. Evaluation
The findings are evaluated and compared to business objectives to
determine if they should be deployed across the organization.

6. Deployment
In the final stage, the data mining findings are shared across everyday
business operations. An enterprise business intelligence platform can be
used to provide a single source of the truth for self-service data discovery.
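
The compact Python sketch below, using the scikit-learn library, walks through steps 2 to 5 of this process on a hypothetical customer data set; the file and column names are illustrative only.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Data understanding and preparation: load the data, drop incomplete rows.
df = pd.read_csv("customers.csv").dropna()
X = df[["age", "income", "visits_per_month"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Modeling: fit a simple probabilistic classifier to find patterns.
model = GaussianNB().fit(X_train, y_train)

# Evaluation: check the findings against held-out data before deployment.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))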


6.4 CHALLENGES OF DATA MINING

While a powerful process, data mining is hindered by the increasing quantity and complexity of big data. With exabytes of data collected by firms every day, decision-makers need ways to extract, analyze, and gain insight from their abundant repositories of data.

• Big Data

The challenges of big data are prolific and penetrate every field that
collects, stores, and analyzes data. Big data is characterized by four major
challenges: volume, variety, veracity, and velocity. The goal of data mining
is to mediate these challenges and unlock the data’s value.

Volume describes the challenge of storing and processing the enormous quantity of data collected by organizations. This enormous amount of data
presents two major challenges: first, it is more difficult to find the correct
data, and second, it slows down the processing speed of data mining
tools.

Variety encompasses the many different types of data collected and stored.
Data mining tools must be equipped to simultaneously process a wide
array of data formats. Failing to focus an analysis on both structured and
unstructured data inhibits the value added by data mining.

Velocity details the increasing speed at which new data is created, collected, and stored. While volume refers to increasing storage
requirement and variety refers to the increasing types of data, velocity is
the challenge associated with the rapidly increasing rate of data
generation.


Finally, veracity acknowledges that not all data is equally accurate. Data can be messy, incomplete, improperly collected, and even biased. As with anything, the quicker data is collected, the more errors manifest within the data. The challenge of veracity is to balance the quantity of data with its quality.

• Over-Fitting Models

Over-fitting occurs when a model explains the natural errors within the sample instead of the underlying trends of the population. Over-fitted models are often overly complex and utilize an excess of independent variables to generate a prediction. Therefore, the risk of over-fitting is heightened by the increase in volume and variety of data. Too few variables make the model irrelevant, whereas too many variables restrict the model to the known sample data. The challenge is to moderate the number of variables used in data mining models and balance their predictive power with accuracy.
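
The effect is easy to demonstrate. In the Python sketch below (synthetic data, scikit-learn), a degree-15 polynomial model scores almost perfectly on the training sample yet clearly worse on held-out data than a simple degree-1 model.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = 2 * X.ravel() + rng.normal(scale=0.3, size=40)  # linear trend plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # a sensible model versus an overly complex one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))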


• Cost of Scale
As data velocity continues to increase data’s volume and variety, firms
must scale these models and apply them across the entire organization.
Unlocking the full benefits of data mining with these models requires
significant investment in computing infrastructure and processing power. To
reach scale, organizations must purchase and maintain powerful
computers, servers, and software designed to handle the firm’s large
quantity and variety of data.

• Privacy and Security


The increased storage requirement of data has forced many firms to turn
toward cloud computing and storage. While the cloud has empowered
many modern advances in data mining, the nature of the service creates
significant privacy and security threats. Organizations must protect their
data from malicious figures to maintain the trust of their partners and
customers.

With data privacy comes the need for organizations to develop internal
rules and constraints on the use and implementation of a customer’s data.
Data mining is a powerful tool that provides businesses with compelling
insights into their consumers. However, at what point do these insights
infringe on an individual’s privacy? Organizations must weigh this
relationship with their customers, develop policies to benefit consumers,
and communicate these policies to the consumers to maintain a
trustworthy relationship.


6.5 TYPES OF DATA MINING

Data mining has two primary processes: supervised and unsupervised learning.

• Supervised Learning

The goal of supervised learning is prediction or classification. The easiest way to conceptualize this process is to look for a single output variable. A
process is considered supervised learning if the goal of the model is to
predict the value of an observation. One example is spam filters, which use
supervised learning to classify incoming emails as unwanted content and
automatically remove these messages from your inbox.

Common analytical models used in supervised data mining approaches are:

• Linear Regressions
Linear regressions predict the value of a continuous variable using one or
more independent inputs. Realtors use linear regressions to predict the
value of a house based on square footage, bed-to-bath ratio, year built,
and zip code.
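
A minimal scikit-learn version of the realtor example (all figures invented for illustration) looks like this:

from sklearn.linear_model import LinearRegression

# Features: square footage, bed-to-bath ratio, year built (synthetic data).
X = [[1500, 1.5, 1995], [2200, 2.0, 2005], [1100, 1.0, 1980], [2800, 2.5, 2015]]
y = [210000, 320000, 150000, 420000]  # sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[1800, 1.5, 2000]]))  # estimated price for a new listing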

• Logistic Regressions
Logistic regressions predict the probability of a categorical variable using
one or more independent inputs. Banks use logistic regressions to predict
the probability that a loan applicant will default based on credit score,
household income, age, and other personal factors.
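
A comparable sketch for the loan-default example (again with invented figures; income is in thousands) is:

from sklearn.linear_model import LogisticRegression

# Features: credit score, household income (thousands), age.
X = [[720, 85, 45], [580, 32, 29], [690, 60, 38], [540, 28, 24]]
y = [0, 1, 0, 1]  # 1 = applicant defaulted

model = LogisticRegression().fit(X, y)
# predict_proba returns [P(no default), P(default)] for a new applicant.
print(model.predict_proba([[650, 50, 33]]))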

• Time Series
Time series models are forecasting tools which use time as the primary
independent variable. Retailers, such as Macy’s, deploy time series models
to predict the demand for products as a function of time and use the
forecast to accurately plan and stock stores with the required level of
inventory.
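
As a toy illustration, the sketch below forecasts next month's demand with a simple moving average in pandas; production retail forecasts use far richer seasonal models.

import pandas as pd

# Eight months of synthetic demand figures.
demand = pd.Series([120, 135, 150, 160, 155, 170, 180, 175],
                   index=pd.date_range("2020-01-31", periods=8, freq="M"))

# Forecast = mean of the last three observed months.
forecast = demand.rolling(window=3).mean().iloc[-1]
print(f"Forecast for next month: {forecast:.0f} units")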

• Classification or Regression Trees


Classification Trees are a predictive modeling technique that can be used to
predict the value of both categorical and continuous target variables. Based
on the data, the model will create sets of binary rules to split and group
the highest proportion of similar target variables together. Following those rules, the group that a new observation falls into will become its predicted
value.
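
The sketch below fits a small decision tree with scikit-learn and prints the binary splitting rules it learns; the customer data is synthetic.

from sklearn.tree import DecisionTreeClassifier, export_text

# Features: age and income; target: whether the customer buys.
X = [[25, 40000], [47, 95000], [35, 60000], [52, 110000], [23, 35000]]
y = ["no", "yes", "no", "yes", "no"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # the learned rules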

• Neural Networks
A neural network is an analytical model inspired by the structure of the brain, its neurons, and their connections. These models were originally created in the 1940s but have only recently gained popularity with statisticians and data scientists. Neural networks use inputs and, based on their magnitude, will “fire” or “not fire” a node based on its threshold requirement. This signal, or lack thereof, is then combined with the other “fired” signals in the hidden layers of the network, where the process repeats itself until an output is created. Since one of the benefits of neural networks is near-instant output, self-driving cars deploy these models to accurately and efficiently process data and autonomously make critical decisions.
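
For a feel of the mechanics, the sketch below trains a small multi-layer perceptron using scikit-learn's MLPClassifier on the classic iris data set.

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# One hidden layer of 16 nodes; each node fires according to its inputs.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X, y)
print("training accuracy:", net.score(X, y))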

• K-Nearest Neighbor
The K-nearest neighbour method is used to categorize a new observation
based on past observations. Unlike the previous methods, k-nearest
neighbour is data-driven, not model-driven. This method makes no
underlying assumptions about the data nor does it employ complex
processes to interpret its inputs. The basic idea of the k-nearest neighbour
model is that it classifies new observations by identifying its closest K
neighbours and assigning it the majority’s value. Many recommender
systems nest this method to identify and classify similar content which will
later be pulled by the greater algorithm.
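
A minimal k-nearest neighbour sketch (synthetic feature values and illustrative labels) shows the majority-vote idea:

from sklearn.neighbors import KNeighborsClassifier

# Two numeric features per item and a content category for each.
X = [[1.0, 1.1], [1.2, 0.9], [3.0, 3.2], [3.1, 2.9]]
y = ["comedy", "comedy", "drama", "drama"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.1, 1.0]]))  # classified by its nearest neighbours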


• Unsupervised Learning
Unsupervised tasks focus on understanding and describing data to reveal
underlying patterns within it. Recommendation systems employ
unsupervised learning to track user patterns and provide them with
personalized recommendations to enhance their customer experience.
Common analytical models used in unsupervised data mining approaches
are:

• Clustering
Clustering models group similar data together. They are best employed
with complex data sets describing a single entity. One example is lookalike
modeling, to group similarities between segments, identify clusters, and
target new groups who look like an existing group.
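
A minimal k-means sketch (synthetic spend and visit figures) groups similar customers together without any labels:

from sklearn.cluster import KMeans

# Each row is one customer: annual spend and monthly visits.
X = [[200, 2], [220, 3], [800, 12], [780, 11], [210, 2]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster assignment for each customer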

• Association Analysis
Association analysis is also known as market basket analysis and is used to
identify items that frequently occur together. Supermarkets commonly use
this tool to identify paired products and spread them out in the store to
encourage customers to pass by more merchandise and increase their
purchases.
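
At its simplest, association analysis starts from co-occurrence counts. The bare-bones Python sketch below counts item pairs across a few toy transactions; real analyses build on such counts with support, confidence and lift metrics.

from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))  # frequently co-purchased pairs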

• Principal Component Analysis


Principal component analysis is used to illustrate hidden correlations
between input variables and create new variables, called principal
components, which capture the same information contained in the original
data, but with less variables. By reducing the number of variables used to
convey the same level information, analysts can increase the utility and
accuracy of supervised data mining models.
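
The short scikit-learn sketch below compresses four correlated synthetic inputs into two principal components and reports how much of the original variance each retains.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
# Four input variables, three of which are strongly correlated.
X = np.hstack([base,
               base * 2 + rng.normal(scale=0.1, size=(100, 1)),
               rng.normal(size=(100, 1)),
               base + rng.normal(scale=0.1, size=(100, 1))])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # variance captured per component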

• Supervised and Unsupervised Approaches in Practice


While you can use each approach independently, it is quite common to use
both during an analysis. Each approach has unique advantages and
combine to increase the robustness, stability, and overall utility of data
mining models. Supervised models can benefit from nesting variables
derived from unsupervised methods. For example, a cluster variable within
a regression model allows analysts to eliminate redundant variables from
the model and improve its accuracy. Because unsupervised approaches
reveal the underlying relationships within data, analysts should use the
insights from unsupervised learning to springboard their supervised
analysis.


6.6 DATA MINING TRENDS

• Language Standardization
Similar to the way that SQL evolved to become the preeminent language
for databases, users are beginning to demand a standardization among
data mining. This push allows users to conveniently interact with many
different mining platforms while only learning one standard language.
While developers are hesitant to make this change, as more users continue
to support it, we can expect a standard language to be developed within
the next few years.

• Scientific Mining
With its proven success in the business world, data mining is being
implemented in scientific and academic research. Psychologists now use
association analysis to track and identify broader patterns in human
behavior to support their research. Economists similarly employ forecasting
algorithms to predict future market changes based on present-day
variables.

• Complex Data Objects


As data mining expands to influence other departments and fields, new
methods are being developed to analyze increasingly varied and complex
data. Google experimented with a visual search tool, whereby users can
conduct a search using a picture as input in place of text. Data mining tools
can no longer just accommodate text and numbers, they must have the
capacity to process and analyze a variety of complex data types.

• Increased Computing Speed


As data size, complexity, and variety increase, data mining tools require
faster computers and more efficient methods of analyzing data. Each new
observation adds an extra computation cycle to an analysis. As the
quantity of data increases exponentially, so do the number of cycles
needed to process the data. Statistical techniques, such as clustering, were
built to efficiently handle a few thousand observations with a dozen
variables. However, with organizations collecting millions of new
observations with hundreds of variables, the calculations can become too
complex for many computers to handle. As the size of data continues to
grow, faster computers and more efficient methods are needed to match
the required computing power for analysis.


• Web mining
With the expansion of the internet, uncovering patterns and trends in
usage is a great value to organizations. Web mining uses the same
techniques as data mining and applies them directly on the internet. The
three major types of web mining are content mining, structure mining, and
usage mining. Online retailers, such as Amazon, use web mining to
understand how customers navigate their webpage. These insights allow
Amazon to restructure their platform to improve customer experience and
increase purchases.

The proliferation of web content was the catalyst for the World Wide Web
Consortium (W3C) to introduce standards for the Semantic Web. This
provides a standardized method to use common data formats and
exchange protocols on the web. This makes data more easily shared,
reused, and applied across regions and systems. This standardization
makes it easier to mine large quantities of data for analysis.

6.7 DATA MINING TOOLS

Data mining solutions have proliferated, so it's important to thoroughly understand your specific goals and match these with the right tools and platforms.

• RapidMiner

RapidMiner is an open source software written in Java. RapidMiner is one of the best platforms to conduct predictive analyses and offers integrated
environments for deep learning, text mining, and machine learning. The
platform can utilize either on-premise or cloud-based servers and has been
implemented across a diverse array of organizations. RapidMiner offers a
great balance of custom coding features and a user-friendly interface,
which allow the platform to be leveraged most effectively by those with a
solid foundation in coding and data mining.


• Orange

Orange is an open source component-based software written in Python. Orange boasts painless data pre-processing features and is one of the best
platforms for basic data mining analyses. Orange takes a user-oriented
approach to data mining with a unique and user-friendly interface.
However, one of the major drawbacks is its limited set of external data
connectors. Orange is perfect for organizations looking for user-friendly
data mining and who use on-premise storage.

• Mahout

Developed by the Apache Foundation, Mahout is an open source platform which focuses on the unsupervised learning process. The software excels at
creating machine learning algorithms for clustering, classification, and
collaborative filtering. Mahout is catered toward individuals with more
advanced backgrounds. The program allows mathematicians, statisticians,
and data scientists to create, test, and implement their own algorithms.
While Mahout does include several turn-key algorithms, such as a
recommender, which organizations can deploy with minimal effort, the
larger platform does require a more specialized background to leverage its
full capabilities.


• MicroStrategy

MicroStrategy is business intelligence and data analytics software that complements all data mining models. With a wide array of native gateways
and drivers, the platform can connect to any enterprise resource and
analyze its data. MicroStrategy excels at transforming complex data into
accessible visualizations to be distributed across an organization. The
software can track and analyze the performance of all data mining models
in real time and clearly display these insights for decision makers. Pairing
MicroStrategy with a data mining tool enables users to create advanced
data mining models, deploy them across the organization, and make
decisions from its insights and performance in the market.

• WEKA:
This is a Java-based customization tool, which is free to use. It includes visualization and predictive analysis and modeling techniques, clustering, association, regression and classification.

• R-Programming Tool:
This is written in C and FORTRAN, and allows the data miners to write
scripts just like a programming language/platform. Hence, it is used to
make statistical and analytical software for data mining. It supports
graphical analysis, both linear and nonlinear modeling, classification,
clustering and time-based data analysis.

• KNIME:
Primarily used for data pre-processing – i.e. data extraction, transformation and loading – KNIME is a powerful tool with a GUI that shows the network of data nodes. Popular amongst financial data analysts, it has modular data pipelining and leverages machine learning and data mining concepts liberally for building business intelligence reports.

Data mining tools and techniques are now more important than ever for all businesses, big or small, if they would like to leverage their existing data stores to make business decisions that will give them a competitive edge. Such actions based on data evidence and advanced analytics have better chances of increasing sales and facilitating growth. Adopting well-established techniques and tools and availing the help of data mining experts will assist companies in utilizing relevant and powerful data mining concepts to their fullest potential.

6.8 DATA MINING TECHNIQUES

The art of data mining has been constantly evolving. There are a number
of innovative and intuitive techniques that have emerged that fine-tune
data mining concepts in a bid to give companies more comprehensive
insight into their own data with useful future trends. Many techniques are
employed by the data mining experts, some of which are listed below:

1. Seeking Out Incomplete Data:


Data mining relies on the actual data present; hence, if data is incomplete, the results would be completely off-mark. It is therefore imperative to have the intelligence to sniff out incomplete data where possible. Techniques such as Self-Organizing Maps (SOMs) help to map missing data by visualizing the model of multi-dimensional complex data. Multi-task learning for missing inputs, in which one existing and valid data set along with its procedures is compared with another compatible but incomplete data set, is one way to seek out such data. Multi-dimensional perceptrons using intelligent algorithms to build imputation techniques can address incomplete attributes of data.
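
One simple imputation strategy among those described above can be sketched with scikit-learn's SimpleImputer, which here fills each missing value with its column mean (synthetic data):

import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical records of age and income with missing entries.
X = np.array([[25.0, 50000.0],
              [np.nan, 62000.0],
              [31.0, np.nan],
              [40.0, 75000.0]])

imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))  # NaNs replaced by per-column means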

2. Dynamic Data Dashboards:


This is a scoreboard on a manager's or supervisor's computer, fed in real time with data as it flows in and out of various databases within the company's environment. Data mining techniques are applied to give stakeholders live insight into, and monitoring of, the data.

3. Database Analysis:
Databases hold key data in a structured format, so algorithms built using
their own language (such as SQL macros) to find hidden patterns within
organized data is most useful. These algorithms are sometimes inbuilt into
the data flows, e.g. tightly coupled with user-defined functions, and the
findings presented in a ready-to-refer-to report with meaningful analysis.

A good technique is to take a snapshot dump of data from a large database
into a cache file at any time and then analyze it further. Similarly,
data mining algorithms must be able to pull out data from multiple,
heterogeneous databases and predict changing trends.
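As a minimal sketch of this snapshot-and-analyze approach (the database file, table and column names below are hypothetical), a script can cache an aggregate extract from a relational store and scan it for simple patterns:

import sqlite3

# Connect to the operational database (a local SQLite file in this sketch;
# in practice this could be any relational source).
conn = sqlite3.connect("sales.db")

# Snapshot an aggregate view rather than the raw table, so that further
# analysis runs against a small, stable extract.
rows = conn.execute(
    """SELECT product_id, COUNT(*) AS n_sales, SUM(amount) AS revenue
       FROM sales GROUP BY product_id"""
).fetchall()
conn.close()

# Simple pattern search on the cached snapshot: flag products whose
# share of total revenue is unusually high.
total_revenue = sum(r[2] for r in rows) or 1
for product_id, n_sales, revenue in rows:
    if revenue / total_revenue > 0.10:  # arbitrary 10% threshold
        print(f"product {product_id}: {n_sales} sales, {revenue:.2f} revenue")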


4. Text Analysis:
This concept is very helpful for automatically finding patterns within the
text embedded in hordes of text files, word-processed files, PDFs, and
presentation files. Text-processing algorithms can, for instance, find
repeated extracts of data, which is quite useful in the publishing business
or in universities for tracing plagiarism.
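A minimal sketch of such a plagiarism check, assuming two plain-text files and an arbitrary eight-word window, is to compare the documents' shared word n-grams:

def ngrams(text, n=8):
    """Return the set of n-word sequences occurring in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_extracts(doc_a, doc_b, n=8):
    """Word n-grams appearing verbatim in both documents."""
    return ngrams(doc_a, n) & ngrams(doc_b, n)

# Usage: any 8-word run present in both texts is flagged for review.
with open("essay1.txt") as f1, open("essay2.txt") as f2:
    overlap = shared_extracts(f1.read(), f2.read())
for extract in sorted(overlap):
    print("repeated extract:", extract)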

5. Efficient Handling of Complex and Relational Data:


A data warehouse or large data store must be supported with interactive
and query-based data mining for all sorts of data mining functions, such as
classification, clustering, association and prediction. OLAP (Online Analytical
Processing) is one such useful methodology. Other concepts that facilitate
interactive data mining are analyzing graphs, aggregate querying, image
classification, meta-rule guided mining, swap randomization, and
multidimensional statistical analysis.

6. Relevance and Scalability of Chosen Data Mining Algorithms:


While selecting data mining algorithms, it is imperative that enterprises
keep in mind the business relevance of the predictions and the scalability
needed to reduce costs in future. It should be possible to execute multiple
algorithms in parallel for time efficiency, independently and without
interfering with transactional business applications, especially time-
critical ones. There should also be support for running support vector
machines (SVMs) at larger scale.


6.9 ADVANTAGES OF DATA MINING:

Data mining has many advantages, some of which are shown below:

1. Marketing/Retails
Marketing companies use data mining to create models based on historical
data that forecast who will respond to new marketing campaigns such as
direct mail or online marketing. This means that marketers can sell
profitable products to targeted customers.

2. Finance/Banking
Data extraction provides financial institutions with information on loans
and credit reports; by building a model from historical customer data, they
can determine good or bad credit risks. It also helps banks detect
fraudulent credit card transactions, protecting the credit card's owner.


3. Researchers
Data mining can help researchers accelerate their data analysis, leaving
them more time for other projects. Shopping behaviours can also be
detected: new problems often surface while modelling certain shopping
patterns, and data mining is used to solve them. Mining methods can surface
all the information on these shopping patterns, including the unexpected
ones, and identifying such patterns can be highly beneficial.

4. Determining Customer Groups


Data mining is used to gauge customers' responses to marketing campaigns.
It also provides information for identifying customer groups; surveys can
be used to seed these new customer groups, and such investigations are
themselves one form of data mining.

5. Increases Brand Loyalty


Mining techniques are used in marketing campaigns to understand customers'
needs and habits, so that customers keep choosing the brand. The insight
this technique provides also supports better-informed decisions.

6. Helps in Decision Making


People use data mining techniques to help them make decisions in marketing
or in business. With this technology, the relevant information can be
determined, including facts that were previously unknown or unexpected.

7. Increase Company Revenue


Data mining is a technology-driven process. Collecting information on goods
sold online eventually reduces the cost of products and services, which is
one of the benefits of data mining.

8. To Predict Future Trends


Data mining systems draw on all the information factors within a working
system. They can help predict future trends, which is quite possible with
this technology, and people can adapt their behaviour accordingly.


9. Increases Website Optimization


Data mining is used to find all kinds of otherwise unseen information about
user behaviour, and this insight helps to optimize a website.

In addition to above following are some more benefits of Data mining:

• Automated Decision-Making
Data Mining allows organizations to continually analyze data and automate
both routine and critical decisions without the delay of human judgment.
Banks can instantly detect fraudulent transactions, request verification, and
even secure personal information to protect customers against identity
theft. Deployed within a firm’s operational algorithms, these models can
collect, analyze, and act on data independently to streamline decision
making and enhance the daily processes of an organization.

• Accurate Prediction and Forecasting


Planning is a critical process within every organization. Data mining
facilitates planning and provides managers with reliable forecasts based on
past trends and current conditions. Macy’s implements demand forecasting
models to predict the demand for each clothing category at each store and
route the appropriate inventory to efficiently meet the market’s needs.

• Cost Reduction
Data mining allows for more efficient use and allocation of resources.
Organizations can plan and make automated decisions with accurate
forecasts that will result in maximum cost reduction. Delta embedded RFID
chips in passengers' checked baggage and deployed data mining models to
identify holes in their process and reduce the number of mishandled bags.
This process improvement increases passenger satisfaction and decreases
the cost of searching for and re-routing lost baggage.

• Customer Insights
Firms deploy data mining models from customer data to uncover key
characteristics and differences among their customers. Data mining can be
used to create personas and personalize each touchpoint to improve overall
customer experience. In 2017, Disney invested over one billion dollars to
create and implement “Magic Bands.” These bands have a symbiotic
relationship with consumers, working to increase their overall experience at
the resort while simultaneously collecting data on their activities for Disney
to analyze to further enhance their customer experience.

What can data mining do? Examples:


Data mining is primarily used today by companies with a strong consumer
focus, such as retail, consumer goods, financial and marketing
organisations. It enables these companies to determine relationships among
internal factors such as economic indicators, competition and customer
demographics; to determine the impact on sales, customer satisfaction and
corporate profit; and, finally, to drill down into summary information to
view detailed transaction data.

With data mining, a retailer can use point-of-sale records of customer
purchases to run targeted promotions based on an individual's purchase
history. By mining demographic data from comment or warranty cards, the
retailer could develop products and promotions that appeal to specific
customer segments.

For example, Blockbuster Entertainment mines its video rental history
database to recommend rentals to individual customers. American Express
can suggest products to its cardholders based on analysis of their monthly
expenditures.

Walmart has pioneered massive data mining to transform its supplier


relationships. Walmart captures point-of-sale transactions from over 2,900
stores in 6 countries and continuously transmits this data to its massive
7.5-terabyte Teradata data warehouse. Walmart allows more than 3,500
suppliers to access data on their products and perform data analyses. These
suppliers use the data to identify customer buying patterns at the store
display level. They use this information to manage local store inventory
and identify new merchandising opportunities. Walmart's computers process
over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining
application that can be used in conjunction with image recordings of
basketball games. The Advanced Scout software analyses the movements of
players to help coaches orchestrate plays and strategies.


6.10 HOW DATA MINING WORKS

How is data mining able to tell you important things that you didn't know
or what is going to happen next? The technique used to perform these feats
is called modeling. Modeling is simply the act of building a
model (a set of examples or a mathematical relationship) based on data
from situations where the answer is known and then applying the model to
other situations where the answers aren't known. Modeling techniques
have been around for centuries, of course, but it is only recently that data
storage and communication capabilities required to collect and store huge
amounts of data, and the computational power to automate modeling
techniques to work directly on the data, have been available.

As a simple example of building a model, consider the director of


marketing for a telecommunications company. He would like to focus his
marketing and sales efforts on segments of the population most likely to
become big users of long distance services. He knows a lot about his
customers, but it is impossible to discern the common characteristics of his
best customers because there are so many variables. From his existing
database of customers, which contains information such as age, sex, credit
history, income, zip code, occupation, etc., he can use data mining tools,
such as neural networks, to identify the characteristics of those customers
who make lots of long distance calls. For instance, he might learn that his
best customers are unmarried females between the age of 34 and 42 who
make in excess of $60,000 per year. This, then, is his model for high value
customers, and he would budget his marketing efforts accordingly.

Data mining consists of 5 major elements:


• Extract, transform and load transaction data into the data warehouse
system
• Store and manage the data in a multidimensional database system
• Provide data access to business analysts and IT professionals
• Analyse the data with application software
• Present the data in a useful format, such as a graph or table
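On a small scale, these five elements can be sketched in a few lines (the file name and column names here are hypothetical, and a flat file stands in for the warehouse):

import pandas as pd

# 1. Extract, transform and load transaction data.
tx = pd.read_csv("transactions.csv")   # assumed columns: date, store, amount
tx["month"] = pd.to_datetime(tx["date"]).dt.to_period("M")

# 2. Store and manage the data in a multidimensional structure.
cube = tx.pivot_table(index="month", columns="store",
                      values="amount", aggfunc="sum")

# 3. and 4. Provide access and analyse with application software.
monthly_totals = cube.sum(axis=1)

# 5. Present the data in a useful format, such as a table.
print(cube)
print(monthly_totals)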


Level of Analysis:

• Artificial neural network:


This is a non-linear predictive model that learns through training and
resembles a biological neural network in structure.

• Genetic Algorithms:
Optimisation techniques that use processes such as genetic combination,
mutation and natural selection, with designs based on the concept of
natural evolution.

• Decision Tree:
A tree-shaped structure that represents sets of decisions. These decisions
generate rules for the classification of a data set.

• Nearest neighbourhood method:


A technique that classifies each record in a data set based on a
combination of the classes of the records most similar to it in a
historical data set; sometimes called the k-nearest neighbour technique.
A minimal sketch appears after this list.

• Rule induction:
The extraction of useful if-then rules from data based on statistical
significance.

• Data visualisation:
The visual interpretation of complex relationships in multidimensional data.
Graphic tools are used to illustrate data relationships.
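A minimal sketch of the nearest neighbour idea (the historical records, features and value of k below are hypothetical):

import math
from collections import Counter

def knn_classify(history, new_point, k=3):
    """Classify a new record by majority vote among the k most
    similar records (Euclidean distance) in the historical data."""
    neighbours = sorted(history, key=lambda rec: math.dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Usage: each historical record is (features, class label).
history = [((25, 40000), "low"), ((45, 90000), "high"),
           ((35, 60000), "low"), ((50, 120000), "high")]
print(knn_classify(history, (48, 100000)))  # -> "high"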


6.11 REQUIREMENT OF TECHNOLOGICAL INFRASTRUCTURE:

There are 2 critical technological drivers:

1. Size of the database:


The more data being processed and maintained, the more powerful a system
is required.

2. Query complexity:
The more complex the queries and the greater the number of queries being
processed, the more powerful a system is required.

Relational database storage and management technology is adequate for many
data mining applications involving less than 50 gigabytes of data. However,
this infrastructure needs to be significantly enhanced to support larger
applications. Some vendors have added extensive indexing capabilities to
improve query performance. Others use new hardware architectures, such as
massively parallel processing (MPP), to achieve order-of-magnitude
improvements in query time. For example, MPP systems from NCR link hundreds
of high-speed Pentium processors to achieve performance levels exceeding
those of the largest supercomputers.

• Data mining engine:


The data mining engine is essential to the data mining system. It consists
of a set of functional modules that perform characterisation, association
and correlation analysis, classification, prediction, cluster analysis,
outlier analysis and evolution analysis.

• Knowledge base:
This is the domain knowledge, which is used to guide the search or to
evaluate the interestingness of the resulting patterns.

• Knowledge Discovery:
The steps involved in the knowledge discovery process are: data cleaning,
data integration, data selection, data transformation, data mining, pattern
evaluation and knowledge presentation.


• User Interface:
It is the module of the data mining system that enables communication
between users and the data mining system. The user interface allows the
following functionalities:
• Interact with the system by specifying a data mining query task
• Provide information to help focus the search
• Mine based on intermediate data mining results
• Browse database and data warehouse schemas or data structures
• Evaluate mined patterns
• Visualise the patterns in different forms

• Data cleaning
Data cleaning is the technique applied to remove noisy data and correct
inconsistencies in the data. It involves transformations to correct wrong
data, and is performed as a data preprocessing step while preparing the
data for a data warehouse. (A combined sketch of cleaning and
transformation appears after these definitions.)

• Data Integration:
Data integration is the processing technique that merges data from multiple
heterogeneous data sources into a coherent data store. Data integration may
involve inconsistent data, and therefore needs data cleaning.

• Data selection:
Data selection is the process where data relevant to the analysis task are
retrieved from the database. Sometimes, data transformation and
consolidation are performed before the data selection process.

• Clusters:
A cluster refers to a group of similar objects. Cluster analysis refers to
forming groups of objects that are very similar to each other but highly
different from the objects in other clusters.

• Data transformation
The data is transformed or consolidated in to forms appropriate for mining,
by performing summary or aggregation operations.
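A combined sketch of data cleaning and data transformation, using the pandas library on a hypothetical raw extract:

import pandas as pd

# Hypothetical raw extract with noisy and inconsistent values.
raw = pd.DataFrame({
    "region": ["North", "north ", None, "South"],
    "sales":  [120.0, -5.0, 80.0, None],
})

# Data cleaning: normalise inconsistent labels, drop noisy records,
# and fill missing values.
clean = raw.assign(region=raw["region"].str.strip().str.title())
clean = clean[clean["sales"].fillna(0) >= 0].copy()   # remove noisy negatives
clean["sales"] = clean["sales"].fillna(clean["sales"].mean())

# Data transformation: consolidate into a form appropriate for mining
# via a summary/aggregation operation.
summary = clean.groupby("region", dropna=True)["sales"].agg(["count", "sum"])
print(summary)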


6.12 RECENT SUCCESS STORY OF DATA MINING USAGE

• Malls Go Data Mining to Get Shoppers What They Want: Malls


access data from shops to learn buying patterns and cater to the
needs (The Economic Times dated 23 Dec 2019)

Pacific Mall in West Delhi figured out through algorithms that 65% of the
customers at its food-court preferred vegetarian food. That prompted the
mall to add a Haldiram's outlet, and sales at the food-court went up by ₹50
lakh a month.

In Bengaluru, Orion Mall found that most of its customers are “trendier”
young crowds who mostly purchased fashion and electronics, prompting it
to ramp up those verticals.

Taking a leaf out of the ecommerce textbook, malls have started, albeit in
a small way, mining customer data and using algorithms to drive sales.
Prominent malls in India for years had revenue-sharing agreements with
retailers and the shopping centres would receive daily or real-time sales
data from brands through a common technological platform. Now such
platforms are evolving to capture various other information on buying
patterns and preferences of consumers to help malls drive sales and
footfalls.

“We have built a platform which gives them insights into what are the
areas they need to concentrate. We have an AI (artificial intelligence)
platform and through data science we forecast their revenue and trends,
and we tell them the buying habits of the consumers,” said AM Navail, an
assistant vice president at tech firm Pathfinder.

According to the company, it offers technology services to more than 100


malls and is helping dozens of them in mining consumer data to drive sales
and footfalls.

Malls these days have a host of tech at their disposal to help them not only
drive sales but also enhance the overall consumer experiences.


For example, high-definition CCTV cameras not only capture pictures but
also generate heatmaps of visitors around the mall that help mall owners
assign facilities and manpower. Such cameras are also used to analyse
gender and age brackets of customers and the stores they are entering. “If
the customers are thronging to the sports area, we can figure out with the
heatmap technology and tally with the conversion rates with those retailers
and realise we need more brands in that category,” said Deepak Zutshi, the
centre head at New Delhi’s Select Citywalk Mall.

West Delhi’s Pacific Mall has installed a technology that can track the
duration of cars parked in the parking lot. “That way we are getting
average three hours of dwell time of cars that are coming into the mall,”
said Abhishek Bansal, its executive director. Rajneesh Mahajan, the CEO of
InOrbit Malls, which operates shopping centres in several cities, said
churning consumer data was in its infancy in India due to the limited and
non-uniform data available from retailers to mall owners.

“We are still at an initial stage and this will evolve and people will get
unified platforms to get the data,” he said. “Unless everybody comes on
board and data in a certain manner and the KPIs (key performance
indicators) are defined, it won’t be that meaningful.”


6.13 SUMMARY:

There is a huge amount of data available in the Information Industry. This


data is of no use until it is converted into useful information. It is necessary
to analyze this huge amount of data and extract useful information from it.
Extraction of information is not the only process we need to perform; data
mining also involves other processes such as Data Cleaning, Data
Integration, Data Transformation, Data Mining, Pattern Evaluation and Data
Presentation. Once all these processes are over, we would be able to use
this information in many applications such as Fraud Detection, Market
Analysis, Production Control, Science Exploration, etc.

Data Mining is defined as extracting information from huge sets of data. In


other words, we can say that data mining is the procedure of mining
knowledge from data. The information or knowledge thus extracted can be
used for any of the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Data Mining Applications

Data mining is highly useful in the following domains −

a. Market Analysis and Management

b. Corporate Analysis & Risk Management

c. Fraud Detection

Apart from these, data mining can also be used in the areas of production
control, customer retention, science exploration, sports, astrology, and
Internet Web Surf-Aid


Market Analysis and Management


Listed below are the various fields of market where data mining is used −
• Customer Profiling − Data mining helps determine what kind of people
buy what kind of products.
• Identifying Customer Requirements − Data mining helps in identifying
the best products for different customers. It uses prediction to find the
factors that may attract new customers.
• Cross Market Analysis − Data mining performs Association/correlations
between product sales.
• Target Marketing − Data mining helps to find clusters of model customers
who share the same characteristics such as interests, spending habits,
income, etc.
• Determining Customer purchasing pattern − Data mining helps in
determining customer purchasing pattern.
• Providing Summary Information − Data mining provides us various
multidimensional summary reports.

Corporate Analysis and Risk Management

Data mining is used in the following fields of the Corporate Sector −

• Finance Planning and Asset Evaluation − It involves cash flow analysis


and prediction, contingent claim analysis to evaluate assets.
• Resource Planning − It involves summarizing and comparing the
resources and spending.
• Competition − It involves monitoring competitors and market directions.

Fraud Detection
Data mining is also used in the fields of credit card services and
telecommunication to detect frauds. In fraud telephone calls, it helps to
find the destination of the call, duration of the call, time of the day or
week, etc. It also analyzes the patterns that deviate from expected norms.


6.14 SELF ASSESSMENT QUESTIONS:


1. Define the Data Mining and explain the key features of data mining.

2. Explain the data mining process in short.

3. What are the benefits of data mining? Describe

4. Write short notes on : Data mining tools.

5. Explain how data mining works?

6.15 MULTIPLE CHOICE QUESTIONS:

1. As an application of data mining, businesses can learn more about their


------------and develop more effective strategies related to various
business functions and in turn leverage resources in a more optimal and
insightful manner.
a. Customers
b. Systems
c. Volume of data
d. Choice of individual

2. Data mining is highly useful in the following domains --------------


a. Risk Management
b. Corporate Analysis
c. Fraud Detection , Market Analysis and Management
d. Retail and corporate analysis and risk management including fraud
detection

3. The set of functional modules that perform characterisation,
association and correlation analysis, classification, prediction, cluster
analysis, outlier analysis and evolution analysis forms the essential
part of -------------
a. User interface
b. Data cleaning
c. Data mining engine
d. Data Integration


4. The increased storage requirement of data has forced many firms to


turn toward cloud computing and storage. While the cloud has
empowered many modern advances in data mining, the nature of the
service creates significant ---------------------Organizations must protect
their data from malicious figures to maintain the trust of their partners
and customers.
a. Data mining
b. Privacy and security threats
c. Data storage
d. Cost of scale

5. What happens in data integration?


a. The data from multiple heterogenous data sources in to coherent
data store.
b. The Data integration may involve inconsistent data only
c. Data integration needs data cleaning.
d. Collect & select heterogenous data in to data store and clean it.

Answers: 1.(a), 2.(d), 3.(c), 4.(b), 5.(d)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2


Chapter 7
Descriptive Analytics
Objectives:

On completion of this chapter, you will understand descriptive analytics
and its importance in data analytics, considering the following aspects:

Structure:

7.1 Introduction

7.2 Definition

7.3 Basic Principles of Descriptive Analytics

7.4 How does descriptive analytics work?

7.5 Descriptive analytics -Tools

7.6 Examples of descriptive analytics

7.7 Advantages of descriptive analytics

7.8 Case Study: Coca - Cola Enterprises (CCE)

7.9 The Role of Descriptive Analytics in Future Data Analysis

7.10 Summary

7.11 Self Assessment Questions

7.12 Multiple Choice Questions


7.1 INTRODUCTION

In business analytics, huge amounts of data are processed at various
stages. Depending on the stage of the workflow and the
requirement of data analysis, there are four main kinds of analytics –
descriptive, diagnostic, predictive and prescriptive. These four types
together answer everything a company needs to know, from what is going on
in the company to what solutions should be adopted for optimising its
functions.

The four types of analytics are usually implemented in stages and no one
type of analytics is said to be better than the other. They are interrelated
and each of these offers a different insight. With data being important to so
many diverse sectors, from manufacturing to energy grids, most companies
rely on one or all of these types of analytics. With the right
choice of analytical techniques, big data can deliver richer insights for the
companies. Before diving deeper into each of these, let’s define the four
types of analytics:

1) Descriptive Analytics: Describing or summarising the existing data


using existing business intelligence tools to better understand what is
going on or what has happened.

2) Diagnostic Analytics: Focus on past performance to determine what


happened and why. The result of the analysis is often an analytic
dashboard.

3) Predictive Analytics: Emphasizes on predicting the possible outcome


using statistical models and machine learning techniques.

4) Prescriptive Analytics: It is a type of predictive analytics that is used


to recommend one or more course of action on analyzing the data.


In this chapter we are going to discuss descriptive analytics in detail.

Descriptive analyses or statistics do precisely what the phrase implies; they


“describe”. Descriptive analysis summarizes raw data and makes that data
easily deciphered. It describes the past, where the past refers to any point
of time when an event has occurred, whether it was one minute ago, or
one year ago. The technique uses data aggregation and data mining to
provide insight into the past and answer the question, “What has
happened?” This, in turn, helps us understand how the past might
influence future outcomes.

Descriptive statistics are valuable to describe items like total stock in


inventory, average dollars spent per customer, and year-over-year change
in sales. Common instances of descriptive analytics are reports that
provide historical insights into a company's production, financials,
operations, sales, inventory, and clients. Descriptive analytics can
be utilized when we have to comprehend, at an aggregate level, what is
happening in the organization, and when we want to outline and portray
different aspects of the business.

Descriptive analysis is an appropriate way to understand attributes of


particular data. Deeper analysis provides the following:
• It estimates and outlines the data by organizing it in tables and graphs to
help meet targets
• It provides information about the fluctuation or vulnerability of the data
• It provides indications of unexpected patterns and perceptions that
should be considered when doing formal analysis


7.2 DEFINITION

Descriptive Analytics is a statistical method that is used to search and


summarise historical data in order to identify the pattern or meaning. For
learning analytics, this is a reflective analysis of learner data and is meant
to provide insight into historical patterns of behaviours and performance in
online learning environments.

For example, in an online learning course with a discussion board,


descriptive analytics could determine how many students participated in
the discussion, or how many times a particular student posted in the
discussion forum.

Descriptive Analytics, the conventional form of Business Intelligence and


data analysis, seeks to provide a depiction or “summary view” of facts and
figures in an understandable format, to either inform or prepare data for
further analysis. It uses two primary techniques, namely data aggregation
and data mining to report past events. It presents past data in an easily
digestible format for the benefit of a wide business audience.

A common example of Descriptive Analytics are company reports that


simply provide a historic review of an organization’s operations, sales,
financials, customers, and stakeholders. It is relevant to note that in the
Big Data world, the “simple nuggets of information” provided by
Descriptive Analytics become prepared inputs for more advanced Predictive
or Prescriptive Analytics that deliver real-time insights for business decision
making.

Descriptive Analytics helps to describe and present data in a format which


can be easily understood by a wide variety of business readers. Descriptive
Analytics rarely attempts to investigate or establish cause and effect
relationships. As this form of analytics doesn’t usually probes beyond
surface analysis, the validity of results is more easily implemented. Some
common methods employed in Descriptive Analytics are observations, case
studies, and surveys. Thus, collection and interpretation of large amount of
data may be involved in this type of analytics.


7.3 BASIC PRINCIPLES OF DESCRIPTIVE ANALYTICS

Data given by descriptive analytics end up as prepared inputs for further


developed predictive or prescriptive analytics that deliver real-time insights
for business decision making. Descriptive analytics seldom endeavours to
explore or establish cause-and-effect relationships.

Some of the common methods employed in descriptive analytics are


observations, case studies, and surveys. Accumulation and translation of a
substantial amount of data is involved in this type of analytics, with most
statistical calculations generally being applied to descriptive analytics.

Descriptive Analytics Illustrations


Some of the common applications of descriptive analytics are as under:


• Summarizing past events such as territorial customer attrition, sales, or
success of marketing campaigns.
• Tabulation of social media metrics such as Facebook preferences, tweets,
or followers.

In an analytics study conducted by McKinsey in 2016, the US retail (40%)


industry and GPS-based services (60%) showed rapid adoption of
descriptive analytics to track teams, customers, and assets across locations
to capture enhanced insights for operational efficiency. McKinsey also
claimed that in today’s business climate, the three most critical barriers to
data analytics are lack of organizational strategy, lack of involved
management, and lack of available talent. Another report suggests that
descriptive analytics has made great strides in Supply Chain Mapping
(SCM), manufacturing plant sensors, and GPS vehicle tracking, to gather,
organize, and view past events.

Investors and brokers perform analytical and empirical analysis on their


investments, which helps them in settling on better investment decisions in
the future.

Descriptive analysis can also be called post-mortem analysis. It is utilized


for almost all administration reporting, such as marketing, sales, finance,
and operations. To gain the competitive edge, organizations utilize
advanced analytics, which likewise underpins them in estimating future
trends. The forecasting allows companies to make optimized decisions,
thus increasing their profitability.

Descriptive analytics can be utilized in future analysis as data-driven


organizations keep on using the outcomes from descriptive analytics to
optimize their supply chains and improve their decision-making power.
Data analytics will now, however, move further away from predictive
analytics toward prescriptive analytics, or rather, towards a “blend of
forecasts, simulations, and optimization.”


7.4 HOW DOES DESCRIPTIVE ANALYTICS WORK?

Data aggregation and data mining are two techniques used in


descriptive analytics to discover historical data. Data is first gathered and
sorted by data aggregation in order to make the datasets more
manageable by analysts.

Data mining describes the next step of the analysis and involves a search
of the data to identify patterns and meaning. Identified patterns are
analysed to discover the specific ways that learners interacted with the
learning content and within the learning environment.
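As a minimal sketch of these two steps on a hypothetical LMS event log (the learner names and column names are invented for illustration):

import pandas as pd

# Hypothetical LMS event log: one row per learner interaction.
events = pd.DataFrame({
    "learner": ["ann", "ann", "ben", "ben", "cara"],
    "action":  ["post", "view", "post", "post", "view"],
})

# Data aggregation: condense raw events into a manageable summary.
summary = events.groupby(["learner", "action"]).size().unstack(fill_value=0)

# Data mining, in its simplest form: search the summary for a pattern,
# e.g. learners who viewed content but never posted in the forum.
lurkers = summary[(summary["view"] > 0) & (summary["post"] == 0)]
print(summary)
print(list(lurkers.index))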

The kind of information that descriptive analytics can provide depends on


the learning analytic capability of the learning management system (LMS)
being used and what the system is reporting on specifically.

Some common indicators that can be identified include learner


engagement and learner performance. With learner engagement, analysts
can detect the participation level of learners in the course and how and
when course resources were accessed.

Performance data provides analysts with insight into how well learners
succeeded on the course; this information could come from data taken
from assessments or assignments. It’s important to note that insights
learned from descriptive analysis are not used for making inferences or
predictions about a learner’s future performance.

The analytical method is meant to provide strategic insight into where


learners, or a specific learner, may have needed more support. It can also
help course designers improve the design of learning by providing insight
into what went well and what did not go well on the course.


7.5 DESCRIPTIVE ANALYTICS-TOOLS

There are three major types of measures / tools of central tendency in


descriptive analytics.

1. Mean
The mean, or average, is probably the most commonly used method of
describing central tendency. The mean represents the centre of gravity of a
distribution, and each score in a distribution contributes to its
determination. Also known as the arithmetic average, the mean is the
average of all values in a distribution.

To compute the mean, all the values are added and the sum is divided by the
total number of values; it is the ratio of the summation of all scores to
the total number of scores. Using the mean, one can compare different
groups, and it also helps in computing further statistics. Since this
method involves handling large numbers and entails tedious calculations,
the researcher used the data analysis tools available in the Microsoft®
Office suite (Excel 2007) to calculate the mean.

2. Median

The median is the positional average that divides a distribution into two
equal parts so that one half of items falls above it and the other half below
it.

In other words, the midpoint of a distribution of values is called the


median. It is the point, below and above which 50% of the population lies.
The Median is the score found in the exact middle of the set of values. One
way to compute the median is to list all scores in numerical order, and then
locate the scores in the centre of the sample. If there is an even number of
numbers in the set, then the median calculates the average of the two
numbers in the middle.

The researcher used data analysis tools available in the simple Microsoft®
Office suite (Excel 2007) to calculate the median. The function path is
Formulas/More Functions/Statistical/MEDIAN.


3. Mode
The mode is the most frequently occurring value in the set of scores. The
mode can also be estimated indirectly from the mean and median. It is a
quick and approximate measure of central tendency.

For a moderately skewed distribution, the mode can be estimated from the
mean and median using the following empirical formula:

Mode ≈ 3 (Median) – 2 (Mean)

The researcher used data analysis tools available in the simple Microsoft®
Office suite (Excel 2007) to calculate the mode. The function path is
Formulas/More Functions/Statistical/MODE.
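The same three measures can also be computed outside Excel; a minimal sketch using Python's standard statistics module on a hypothetical score list:

import statistics

scores = [56, 62, 62, 70, 74, 81, 88]

print(statistics.mean(scores))    # arithmetic average
print(statistics.median(scores))  # midpoint of the ordered scores
print(statistics.mode(scores))    # most frequently occurring score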

• Measures of variability
The measures of central tendency indicate the central value of the
distribution. However, the central value alone is not sufficient to fully
describe the distribution.

In addition to the measures of centrality, we require a measure of the


spread of the actual scores. The extent of such spread may vary from one
distribution to another. The extent of such variability is measured by the
measures of variability.

Variability describes the way the scores are distributed and how they vary;
examples are the range and the standard deviation. The technique employed
in the present study is the standard deviation. The range is simply the
highest value minus the lowest value, while the standard deviation is a
more accurate and detailed measure of dispersion.

Standard Deviation

The standard deviation shows the relation that set of scores has with the
mean of the sample. Standard deviation is expressed as the positive
square root of the sum of the squared deviations from the mean divided by
the number of scores minus one. It is the average difference between
observed values and the mean. The standard deviation is used when
expressing dispersion in the same unit as the original measurement. It is
designated as σ.


The researcher used data analysis tools available in the simple Microsoft®
Office suite (Excel 2007) to calculate the standard deviation. The function
path is Formulas/More Functions/Statistical/STDEV.
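In symbols, the sample standard deviation is s = √( Σ(xᵢ − x̄)² / (n − 1) ). A minimal sketch computing the standard deviation and the range in Python, on the same hypothetical scores:

import statistics

scores = [56, 62, 62, 70, 74, 81, 88]

# Sample standard deviation: square root of the sum of squared deviations
# from the mean, divided by (n - 1).
print(statistics.stdev(scores))

# Range: the highest value minus the lowest value.
print(max(scores) - min(scores))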

• Measure of Divergence from Normality


An important aspect of the “description” of a variable is the shape of its
distribution, which tells the frequency of values across different ranges
of the variable. A researcher is interested in how well the distribution
can be approximated by the normal distribution. Simple descriptive
statistics can provide some information relevant to this issue. The two
measures used to determine the shape of a distribution are skewness and
kurtosis.

Skewness: Many times it is seen that the mean, median and mode of the
distribution don’t fall at the same place, i.e. the scores may extend much
farther in one direction than the other. Such a distribution is called a
skewed distribution.

Positively skewed distribution: The distribution is positively skewed


when most of the scores pile up at the low end (or left) of the distribution
and spreads out more gradually towards the high end of it. In a positively
skewed distribution, the mean falls on the right side of the median.

Negatively skewed distribution: The distribution is negatively skewed if


the scores are concentrated towards the upper value and it is positively
skewed if they cluster towards lower value. The mean of the distribution is
higher than the median in positive skewness whereas the median value is
greater than the mean in negative skewness.
Skewness = (Mean – Mode) / SD

For the present study, skewness was calculated using Microsoft Excel 2007.
The function path is Formulas/More Functions/Statistical/SKEW.

Kurtosis: The term “kurtosis” refers to the “peakedness” or flatness of a


frequency distribution as compared with the normal. A frequency
distribution more peaked than the normal is said to be Leptokurtic and a
frequency distribution flatter than the normal is called Platykurtic. A normal
curve is also termed as Mesokurtic.

Positive kurtosis indicates a relatively peaked (leptokurtic) distribution,
and negative kurtosis indicates a relatively flat (platykurtic)
distribution.


The researcher used data analysis tools available in the simple Microsoft®
Office suite (Excel 2007) to calculate the kurtosis. The function path is
Formulas/More Functions/Statistical/KURT.
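A minimal sketch of moment-based skewness and excess kurtosis in Python (Excel's SKEW and KURT apply small-sample corrections, so their values differ slightly):

import statistics

scores = [56, 62, 62, 70, 74, 81, 88]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD for the moment formulas

# Skewness: third standardised moment (positive -> long right tail).
skew = sum((x - mean) ** 3 for x in scores) / (len(scores) * sd ** 3)

# Excess kurtosis: fourth standardised moment minus 3
# (positive -> leptokurtic, negative -> platykurtic).
kurt = sum((x - mean) ** 4 for x in scores) / (len(scores) * sd ** 4) - 3

print(round(skew, 3), round(kurt, 3))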

• Measures of Probability (fiduciary limits)

In order to estimate the population mean or the probable variability, it is


necessary to set up limits for a given degree of confidence which will
embrace the mean or the standard deviation, since these limits define the
confidence interval.

Estimation of population parameters (fiduciary limits):


The limits of the confidence intervals of parameters are called fiduciary
limits.

Describing data is an essential part of statistical analysis aiming to provide


a complete picture of the data before moving to advanced methods. The
statistical methods used for this purpose are called descriptive
statistics. They include both numerical (e.g. mean, mode, variance…) and
graphical tools (e.g. histogram, boxplot…) which allow one to summarize a
set of data and extract important information such as
central tendencies and dispersion. Moreover, you can use them to describe
the association between several variables.

In order to choose the right descriptive statistics tool, one needs to
consider the types and the number of variables available, as well as the
objective of the analysis. Based on these three criteria, a grid can be
generated that helps decide which tool to use in each situation.

The first column of the grid refers to data types:

• Quantitative: containing variables that describe quantities of the


objects of interest. The values are numbers. The weight of an infant is an
example of a quantitative variable.

• Qualitative: containing variables that describe qualities of the objects of


interest. These values are called categories, also referred to as levels or
modalities. The gender of an infant is an example of a qualitative
variable. The possible values are the categories male and female.


• Mixed: containing both types of variables.

The second column indicates the number of variables. The proposed tools
can handle either the description of one (univariate analysis) or the
description of the relationships between two (bivariate analysis) or several
variables. The grid also includes a column with an example for each
situation.

Grid
Please note that the list below is not exhaustive. However, it contains the
most commonly used descriptive statistics, all available in XLSTAT.

Quantitative data

One variable (univariate analysis):
• Estimate a frequency distribution. Example: how many people per age class
attended this event? (Here the investigated variable is age, in quantitative
form.) Numerical tool: frequency table. Graphical tool: histogram.
• Measure the central tendency of one sample. Example: what is the average
grade in a classroom? Numerical tools: mean, median, mode. Graphical tools:
box plot, scattergram, strip plot.
• Measure the dispersion of one sample. Example: how widely or narrowly are
the grades dispersed around the mean grade in a classroom? Numerical tools:
range, standard deviation, variance, coefficient of variation, quartiles.
Graphical tools: box plot, scattergram, strip plot.
• Characterize the shape of a distribution. Example: is the employee wage
distribution in a company symmetric? Numerical tools: skewness and kurtosis
coefficients. Graphical tool: histogram.
• Visually check whether a sample follows a given distribution. Example:
what is the theoretical percentage of students who obtained a better note
than a given threshold? Graphical tool: probability plot.
• Measure the position of a value within a sample. Example: what data point
can be used to split the sample into 95% of low values and 5% of high
values? Numerical tools: quantiles or percentiles. Graphical tool: box plot.
• Detect extreme values. Example: is a height of 184 cm an extreme value in
this group of students? Graphical tool: box plot.

Two variables (bivariate analysis):
• Describe the association between two variables. Example: does plant
biomass increase or decrease with soil Pb content? Numerical tools:
correlation coefficients. Graphical tools: correlation map, scatterplot.

Several variables:
• Describe the association between multiple variables. Example: what is the
evolution of the life expectancy, the fertility rate and the size of the
population over the last 10 years in this country? Numerical tools:
correlation coefficients. Graphical tools: motion charts (up to 3 variables
to describe, over time), scatterplot or 3D scatterplot (up to 3 variables
to describe).
• Describe the association between three variables under specific
conditions. Example: how to visualize the proportions of three ice cream
ingredients in several ice cream samples? Graphical tool: ternary diagram.

Two matrices of several variables:
• Describe the association between two matrices. Example: does the
evaluation of a series of products differ from one panel to another?
Numerical tool: RV coefficient.

Qualitative data

One variable (univariate analysis):
• Compute the frequencies of different categories. Example: how many
clients said they are satisfied by the service and how many said they were
not? Numerical tool: frequency table. Graphical tools: bar chart, pie chart.
• Detect the most frequent category. Example: which is the most frequent
hair colour in this country? Numerical tool: mode. Graphical tools: bar
chart, pie chart.

Two variables (bivariate analysis):
• Measure the association between two variables. Example: does the presence
of a trace element change according to the presence of another trace
element? Numerical tool: contingency table (or cross-tab). Graphical tools:
3D graph of the contingency table, stacked or clustered bars.

Mixed data (quantitative & qualitative)

Two variables (bivariate analysis):
• Describe the relationship between a binary and a continuous variable.
Example: is the concentration of a molecule in rats linked to the rats' sex
(M/F)? Numerical tool: biserial correlation. Graphical tool: box plot.
• Describe the relationship between a categorical and a continuous
variable. Example: does sepal length differ between three flower species?
Numerical tool: univariate descriptive statistics for the quantitative
variable within each category of the qualitative variable. Graphical tool:
box plot.

Several variables:
• Describe the relationship between one categorical and two quantitative
variables. Example: does the amount of money spent on this commercial
website change according to the age class and the salary of the customers?
Graphical tool: scatterplot (with groups).
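As a small illustration of the first row of the grid (estimating a frequency distribution for a quantitative variable), a hypothetical list of attendee ages can be tabulated into 10-year classes:

from collections import Counter

ages = [23, 27, 31, 35, 36, 41, 44, 47, 52, 58]

# Frequency table over 10-year age classes (the numerical twin
# of a histogram).
classes = Counter((age // 10) * 10 for age in ages)
for lower in sorted(classes):
    print(f"{lower}-{lower + 9}: {classes[lower]}")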

7.6 EXAMPLES OF DESCRIPTIVE ANALYTICS

Here are some common applications of Descriptive Analytics:


• Summarizing past events such as regional sales, customer attrition, or
success of marketing campaigns.
• Tabulation of social metrics such as Facebook likes, Tweets, or followers.
• Reporting of general trends like hot travel destinations or news trends.
• Every modern business needs to build its Data Analytics framework,
where the latest data technologies like Big Data play a crucial role.
• Data and technology should be made available at every corner of an
enterprise to develop and nurture a widespread data-driven culture.
• If data and analytics are aligned with overall business goals, then day-to-
day business decisions will be more driven by data-driven insights.
• As people drive businesses, the manpower engaged in Data Analytics
must be competent and adequately trained to support enterprise goals.
• A centrally managed team must lead the analytics production and
consumption efforts in the enterprise to bring behavioral change towards
a data culture.
• The concept of Data Analytics must be spread through both formal data
centres and informal social networks for an inclusive growth.

Many LMS platforms and learning systems offer descriptive analytical


reporting with the aim of helping businesses and institutions measure learner
performance to ensure that training goals and targets are met.


The findings from descriptive analytics can quickly identify areas that
require improvement - whether that be improving learner engagement or
the effectiveness of course delivery.

Here are some examples of how descriptive analytics is being used in the
field of learning analytics:
• Tracking course enrolments and course compliance rates
• Recording which learning resources are accessed and how often
• Summarizing the number of times a learner posts in a discussion board
• Tracking assignment and assessment grades
• Comparing pre-test and post-test assessments
• Analyzing course completion rates by learner or by course
• Collating course survey results
• Identifying length of time that learners took to complete a course

7.7 ADVANTAGES OF DESCRIPTIVE ANALYTICS:

When learners engage in online learning, they leave a digital trace behind
with every interaction they have in the learning environment. This means
that descriptive analytics in online learning can gain insight into behaviours
and performance indicators that would otherwise not be known.

Here are some advantages to utilizing this information:


• Quickly and easily report on the Return on Investment (ROI) by showing
how performance achieved business or target goals.
• Identify gaps and performance issues early - before they become
problems.
• Identify specific learners who require additional support, regardless of
how many students or employees there are.
• Identify successful learners in order to offer positive feedback or
additional resources.


• Analyze the value and impact of course design and learning resources.

Thus, Descriptive Analytics is focused solely on historical data.

7.8 CASE STUDY: COCA-COLA ENTERPRISES (CCE)

Case Study: The Thirst for HR Analytics Grows.

• Data analytics journey

The HR analytics journey within Coca-Cola Enterprises (CCE) really began
in 2010. Given the complexity of the CCE operation, its global footprint and
various business units, a team was needed which was able to provide a
centralised HR reporting and analytics service to the business. This led to
the formation of a HR analytics team serving 8 countries. As a new team
they had the opportunity to work closely with the HR function to
understand their needs and build a team not only capable of delivering
those requirements but also of challenging the status quo.

"When I first joined Coca-Cola Enterprises in 2010, it was very early on in


their transformation programme and reporting was transitioned from North
America to Europe. At that point we did not have a huge suite of reports
and there was limited structure in place. We had a number of scheduled
reports to run each month, but not really an offering of scorecards or
anything more advanced."

The first step was to establish strong foundations for the new data
analytics programme.

It was imperative to get the basics right, enhance credibility, and automate
as many of the basic descriptive reports as possible. The sheer number of
requests the team received was preventing them from adding value and
providing more sophisticated reports and scorecards.

CCE initiated a project to reduce the volume of scheduled reports sent to


customers, which enabled them to decrease the hours per month taken to
run the reports by 70%. This was a game changer in CCE’s journey. Many
of the remaining basic, low-value reports were then automated, which
allowed the team to move onwards in their journey and look more at the
effectiveness of the HR function by developing key measures. The analytics
team was soon able to focus on more “value-adding” analytics, instead of
being overwhelmed with numerous transactional requests which consumed
resources.

‘’In the early stages requests were very basic. For example, how many
people am I supporting? How many people have started or left? How many
promotions have there been in my part of the organisation? The majority
of requests were therefore very descriptive in their nature. There was an
obvious need to automate as much as we could, because if we could not
free ourselves of that kind of transactional reporting, there was no way we
were going to add any value with analytics.’’

• Standardising and reporting: towards a basic scorecard

The team soon found that the more they provided reports, the more
internal recognition they received. This ultimately created a thirst within
HR for more data and metrics for measuring the performance of the
organisation from a HR perspective. The HR analytics function knew this
was an important next step but it wasn’t where they wanted the journey to
end. They looked for technology that would allow them to automate as
many of these metrics as possible whilst having the capability to combine
multiple HR systems and data sources.

A breakthrough, and the next key milestone in the journey for CCE, was
when they invested in an "out of the box" system which provided them
with standard metrics and measures, and enabled quick and simple
descriptive analytics.

Instead of building a new set of standards from scratch, CCE piloted pre-
existing measures within the application and applied these to their data.
The result was that the capability to deliver more sophisticated descriptive
analytics was realised quicker and began delivering results sooner than
CCE business customers had expected.

‘‘We were able to segment tasks based on the skill set of the team. This
created a natural talent development pipeline and ensured the right skill
set was dedicated to the appropriate task. This freed up time for some of
the team to focus on workforce analytics. We implemented a solution that
combines data from various sources, whether it is our HR system, the case
management system for the service centre, or our on-boarding / recruitment
tools. We brought all that data into one central area and developed a lot
of ratios and measures. That really took it to the next
level.’’

As with any major transformation, the evolution from transactional to more


advanced reporting took time, resource and commitment from the
business, and there were many challenges for the team to overcome.

‘’There were a lot of lessons. With the workforce analytics implementation


we probably underestimated the resource and the time needed. Sometimes
less is more and we provided too many metrics at first. The key was to
really collaborate with our HR leaders and understand what the key metrics
were.’’

With the standards in place CCE then turned to establishing a basic


scorecard approach to illustrate the data. Scorecards are a common
instrument used by many organisations to provide an overview of the
performance of a function. Typically they consist of clear targets illustrated
in a dashboard fashion and are utilised by senior management to guide
their leadership of the organisation. The leadership team's familiarity with
the scorecard methodology meant that the analytics team could simply fit
into a standard reporting process. But for CCE to create its HR dashboard
it was apparent that a clear purpose and objective for the analytics was
needed, and that the development of future scorecards should be as
automated as possible.

• Consulting to the business: HR as a centre of "people expertise"

At CCE it’s clear that HR analytics, insights, and combining HR and


business data is an illustration of the value that HR can add to the
business. CCE has developed a partnership approach which demonstrates
the power that high quality analytics can deliver, and its value as a
springboard to more effective HR practices in the organisation. By acting in
a consultative capacity, HR is able to better understand what makes CCE
effective at delivering against its objectives; HR also ensures both
parties within the partnership use the data which is extracted, and find
value in the insights which HR is developing.


‘’To be a consultant in this area, you have to understand the business


you’re working in. If you understand the business problem then you can
help with your understanding of HR, together with your understanding of
all the data sets you have available.

You can really help by extracting the right questions. If you have the right
question, then the analysis you are going to complete will be meaningful
and insightful.’’

• Moving from descriptive reporting towards correlation analysis

There are numerous examples where the HR reporting and analytics team
have partnered with the HR function and provided insights that have
helped to develop more impactful HR processes and deliver greater
outcomes for the business. As with many organisations it is the
engagement data with which the majority of HR insight is created.
Developing further insight beyond standard survey outputs has meant that
CCE has begun to increase the level of insights developed through the
method, and by using longitudinal data they have started to track
sentiment in the organisation. Tracking sentiment alongside other
measures provides leaders with a good indicator for sense-checking the
power of HR initiatives and general business processes. The question is
whether the relationship between engagement and business results is
causal or correlative. For CCE this point is important when explaining the
implications of HR data insights to the rest of the business.

‘’There have definitely been a number of examples where we are starting


to share insights that are being acted upon. One example is our
engagement survey that is run every couple of years. Within the survey
there are three questions related to communication.

The business was keen to understand if there was a correlation between


how an employee scores a manager, in terms of communication, and key
performance indicators across our sites.

We demonstrated that across all of our sites there was a positive
correlation between how leaders communicate and business outcomes.
That is great but it is not implying causation. There is something there to
explore further, but we cannot go and say, good communication causes
better business performance.’’


• Building analytics capability within HR at CCE

For CCE's analytics team one of the most important next steps is to share
the experience and knowledge gained from developing the analytics
function with their colleagues, and build capability across HR.

‘’We are also reviewing the learning and development curriculum for HR to
see what skills and competencies we need to build. One of the
competencies that we have introduced is HR professionals being data
analysers.

For me, it is not only understanding a spreadsheet or how to do a pivot
table, it is more understanding what a ratio is, or understanding what their
business problems are, or how data can really help them in their quest to
find an intervention that is going to add value and shape business
outcomes.”

• Barriers

As with any long journey the analytics team at CCE have faced numerous
barriers. The challenges they list are common to most HR professionals
attempting to establish a significant new process, but it is the task of
establishing new capability and embedding fit-for-purpose technologies
which has proved the greatest challenge at CCE.

‘’In terms of barriers, technology is one. For example, having the right data
warehouse in place that allows you to extract the data very quickly. From
an HR perspective we are well placed; however, extracting data from the
rest of the business is a challenge. At CCE HR is trying to branch out and
get the data from other parts of the business, which is probably quite unusual.
People probably do not expect HR to be that kind of driving force.’’

CCE recognises a recruitment challenge centred on sourcing the capabilities
to develop high-impact HR analytics, which includes hiring individuals with
the ability to analyse data, develop insights and the communication know-
how to share across the business. One challenge for HR is to sell the
profession as suitable for analytical high-potentials to build their broader
business acumen: to move away from the traditional view of transactional
HR with little or no analytical capability, to a function based around high-
quality data and business insights. For CCE this represents a significant
opportunity: high-calibre analysts must see HR as a profession in which
they're able to build a lasting career.

‘’At conferences I have listened to major firms who have PhD students in
their business intelligence teams, who appear to be very good at not only
analytics but also presenting information. They are few and far between and
I believe that people who have that skill set would not naturally go into HR.
If I reference the recent big data conference I went to, and the projects
that some of these companies were doing outside of HR with customer
data, Twitter data, really what I would call ‘big data,’ it may seem a lot
more appetising and appealing than HR analytics. If I was a PhD student, I
am not sure I would consider HR as a place to go to develop my career and
also, whether I would see any longevity in it. As a function we need to
change that.’’

• Utilising predictive analytics: CCE's approach

For organisations like CCE the natural progression in analytics is towards
mature data processes that utilise the predictive value of HR and business
data. For most organisations this can too often remain an objective that
exists in the far future, and one which without significant investment may
never be realised. Alongside the resource challenges in building capability
there also exists the need to understand exactly how data may provide
value, and the importance of objective and critical assessment as to how
data can be exploited. Without appreciation for methodological challenges,
data complexity and nuances in analysis, it may be that organisations use
data without fully understanding the exact story the data is telling.

‘’Predictive analytics is difficult. We are very much in the early stages as we
are only starting to explore what predictive analytics might enable us to
do, and what insights it could enable us to have. If we can develop some
success stories, it will grow. If we go down this route and start to look at
some predictive analytics and actually, there is not the appetite in the
business, or they do not believe it is the right thing to do, it might not take
off.

If you think about the 2020 workplace, the issues that we have around
leadership development, multi-generational workforces, people not staying
with companies for as long as they have done in the past, there are a lot of
challenges out there for HR. These are all areas where the use of HR
analytics can provide the business with valuable insights.’’

For CCE it appears that analytics and HR insight are gaining significant
traction within the organisation. Leaders are engaging at all levels and the
HR function is increasingly sharing insights across business boundaries.
This hasn't been without its challenges: CCE faces HR's perennial issues of
technology and the perceived lack of analytics capability. However, their
approach of creating quality data sets and automated reporting processes
has provided them with the foundations and opportunity to begin to
develop real centres of expertise capable of providing high quality insight
to the organisation. It is clear CCE remains focused on continuing its HR
analytical journey.

7.9 THE ROLE OF DESCRIPTIVE ANALYTICS IN FUTURE DATA ANALYSIS

As data-driven businesses continue to use the results from Descriptive
Analytics to optimize their supply chains and enhance their decision-
making powers, Data Analytics will move further away from Predictive
Analytics toward Prescriptive Analytics or rather towards a “mash-up of
predictions, simulations, and optimization.”

The future of Data Analytics lies in not only describing what has happened,
but in accurately predicting what might happen in the future. This claim is
explained in the article titled The Future of Analytics Is Prescriptive, Not
Predictive. This article cites a GPS navigation system, where Descriptive
Analytics is used to provide directional cues. However, such analysis is
reinforced by “Predictive Analytics” offering important details about the
journey like the time duration. Now, if the GPS system is further powered
by Prescriptive Analytics, then the navigation system will not only provide
directions and time, but also the quickest way to reach the destination. The
best part of such a super-charged navigation system is that it can even
compare several traveling routes and recommend the best solution.

As Data Mining and Machine Learning jointly offer solutions to predict
customer segments and marketing ROIs, the future Predictive Analytics
techniques will continue to evolve into Prescriptive Analytics, creating a
mash-up of “predictions, simulations, and optimization.”


7.10 SUMMARY

Descriptive Analytics, the conventional form of Business Intelligence and
data analysis, seeks to provide a depiction or “summary view” of facts and
figures in an understandable format, to either inform or prepare data for
further analysis. It uses two primary techniques, namely data aggregation
and data mining to report past events. It presents past data in an easily
digestible format for the benefit of a wide business audience.

A common example of Descriptive Analytics is company reports that
simply provide a historic review of an organization’s operations, sales,
financials, customers, and stakeholders. It is relevant to note that in the
Big Data world, the “simple nuggets of information” provided by
Descriptive Analytics become prepared inputs for more advanced Predictive
or Prescriptive Analytics that deliver real-time insights for business decision
making.

Descriptive Analytics helps to describe and present data in a format which
can be easily understood by a wide variety of business readers. Descriptive
Analytics rarely attempts to investigate or establish cause and effect
relationships. As this form of analytics doesn’t usually probe beyond
surface analysis, its results are easier to validate. Some common methods
employed in Descriptive Analytics are observations, case studies, and
surveys. Thus, collection and interpretation of large amounts of data may
be involved in this type of analytics.

The descriptive analyst simply offers the existing data in a more
understandable format without any further investigation. Thus, Descriptive
Analytics is more suited for a historical account or a summary of past data.
Most common statistical calculations are applied as part of Descriptive
Analytics.


Some common applications of Descriptive Analytics include:


• Summarizing past events such as regional sales, customer attrition, or
success of marketing campaigns.
• Tabulation of social metrics such as Facebook likes, Tweets, or followers.
• Reporting of general trends like hot travel destinations or news trends.

In addition, the chapter highlighted these points about building a
data-driven culture:

• Every modern business needs to build its Data Analytics framework,
where the latest data technologies like Big Data play a crucial role.
• Data and technology should be made available at every corner of an
enterprise to develop and nurture a widespread data-driven culture.
• If data and analytics are aligned with overall business goals, then day-to-
day business decisions will be more driven by data-driven insights.
• As people drive businesses, the manpower engaged in Data Analytics
must be competent and adequately trained to support enterprise goals.
• A centrally managed team must lead the analytics production and
consumption efforts in the enterprise to bring behavioural change
towards a data culture.
• The concept of Data Analytics must be spread through both formal data
centres and informal social networks for an inclusive growth.

The future of Data Analytics lies in not only describing what has happened,
but in accurately predicting what might happen in the future. This claim is
explained in the article titled The Future of Analytics Is Prescriptive, Not
Predictive. This article cites a GPS navigation system, where Descriptive
Analytics is used to provide directional cues. However, such analysis is
reinforced by “Predictive Analytics” offering important details about the
journey like the time duration. Now, if the GPS system is further powered
by Prescriptive Analytics, then the navigation system will not only provide
directions and time, but also the quickest way to reach the destination. The
best part of such a super-charged navigation system is that it can even
compare several traveling routes and recommend the best solution.


7.11 SELF ASSESSMENT QUESTIONS:

1. Define Descriptive Analytics.

2. What are the basic principles of descriptive analytics? Explain

3. Write short notes on: Descriptive analytic tools.

4. Give a few examples of descriptive analytics.

5. What are the advantages of descriptive analytics? Explain

7.12 MULTIPLE CHOICE QUESTIONS:

1. The main characteristic features of descriptive analytics consist of
-----------------------
a. Describing or summarising the existing data using existing business
intelligence tools to better understand what is going on or what has
happened.
b. Focus on past performance to determine what happened and why.
The result of the analysis is often an analytic dashboard.
c. Emphasizes on predicting the possible outcome using statistical
models and machine learning techniques.
d. Data is used to recommend one or more course of action on
analyzing the data.

2. Descriptive Analytics uses primary techniques, namely --------------- to
report past events. It presents past data in an easily digestible format
for the benefit of a wide business audience.
a. data aggregation
b. data mining
c. data aggregation and data mining
d. data storage

3. The measures/tools of central tendency in descriptive analytics
consist of ---------------------------
a. Skewness
b. Mean, median and mode
c. Standard deviation
d. Kurtosis


4. In order to choose the right descriptive statistics tool, one needs to
consider the --------------. Based on these criteria, a grid can be
generated that helps to decide which tool to use according to the
situation.
a. Types of variables
b. Number of variables available
c. Objective of the analysis
d. Types and the number of variables available as well as the objective
of the analysis.

5. As data-driven businesses continue to use the results from Descriptive
Analytics to optimize their supply chains and enhance their decision-
making powers, Data Analytics will move further away from ------------
toward Prescriptive Analytics or rather towards a “mash-up of
predictions, simulations, and optimization.”
a. Predictive Analytics
b. Descriptive Analytics
c. Big data analytics
d. Data mining

Answers: 1.(a), 2.(c), 3.(b), 4.(d), 5.(a)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2


Chapter 8
Diagnostic Analytics
Objectives:

On completion of this chapter, you will understand diagnostic analytics
and its importance in data analytics, considering the following aspects:

Structure:

8.1 Introduction

8.2 Definition

8.3 Diagnostic Analytics process & Tools

8.4 Benefits of Diagnostic Analytics

8.5 Future - diagnostic analytics

8.6 An example of diagnostic analytics

8.7 Summary

8.8 Self Assessment Questions

8.9 Multiple Choice Questions


8.1 INTRODUCTION

Diagnostic Analytics stands in contrast to descriptive analytics. Diagnostic
Analytics is less focused on what has occurred and more focused on why
something happened. In general, these analytics look at the processes and
causes, instead of the results. Diagnostic analytics is therefore a form of
advanced analytics that examines data or content to answer the question,
“Why did it happen?” It is characterized by techniques such as drill-down,
data discovery, data mining and correlations.

Diagnostic analytics fits into a spectrum of analysis that runs from the
basic to the more complex. It is the most abstract of the phases of
analysis, and it answers the question of “why”: why are things happening?
What is driving things to go up, or down, or anything along those lines?

To understand this, let's imagine that we are grocery store owners, and we
want to know what's causing our revenue for any given product to go up or
go down, so that we know how to stock our shelves. We look for
correlations in the data: we want to know what's driving the revenue and
what's causing it to go up or down. So we pick out a set of features that
we think might impact that revenue, and then for each one we compute a
number that tells us whether it is highly or only weakly correlated with
revenue.

All these kinds of questions - why are things happening? what's causing
things to go on? - are answered by diagnostic analytics, as the sketch
below illustrates.
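
To make the correlation idea concrete, here is a minimal sketch in Python.
The file weekly_sales.csv and its column names are hypothetical stand-ins
for whatever candidate drivers a store actually tracks alongside revenue.

import pandas as pd

# Hypothetical weekly data for one product: revenue plus candidate drivers.
# Columns assumed: revenue, shelf_facings, discount_pct, foot_traffic, ad_spend
sales = pd.read_csv("weekly_sales.csv")

# Correlation of every candidate feature with revenue, strongest first.
correlations = (
    sales.corr(numeric_only=True)["revenue"]
    .drop("revenue")
    .sort_values(key=abs, ascending=False)
)
print(correlations)

A value near +1 or -1 flags a feature worth drilling into further; a value
near 0 suggests that feature is not driving revenue.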


8.2 DEFINITION

Diagnostic analytics is a form of advanced analytics that examines data or
content to answer the question, “Why did it happen?” It is characterized by
techniques such as drill-down, data discovery, data mining and
correlations.

In contrast to descriptive analytics, diagnostic analytics is less focused on
what has occurred and more focused on why something happened. In
general, these analytics look at the processes and causes, instead of
the results.

We have descriptive analytics as a start, where we wrangle our data: we
clean it, we relate it, we visualize it. We start to get a feel for what's going
on in the business, and then we get to the diagnostic and data science
work: we do some drill-downs and some data discovery, we find some
correlations, and we do other things of that nature. That is diagnostic
analytics.

Diagnostic analytics is a form of advanced analytics which examines data
or content to answer the question “Why did it happen?”, and is characterized
by techniques such as drill-down, data discovery, data mining and
correlations. Diagnostic analytics takes a deeper look at data to attempt to
understand the causes of events and behaviours.

8.3 DIAGNOSTIC ANALYTICS PROCESS & TOOLS:

For the purpose of better understanding, let us take the example of
calculations and payments of indirect tax matters.

First, one needs to understand what information is relevant in diagnostic
analytics, and how it should be interpreted and used. Enterprise
intelligence (EI) can help an organization optimize its performance by
identifying relevant information, analyzing it in a way that produces
insights, thereby allowing the organization to act.


• Analytics – making sense of “big data” to identify gaps and opportunities

Indirect tax analytics is a wide term used to describe the identification and
analysis of indirect tax issues through data interrogation. Approaches
range from one-off reports for specific issues, to custom-made tools
developed for in-house use, to continuous monitoring by third-party
providers.

The complexity of indirect tax reporting means that many — and possibly
most — multinational companies have significant indirect tax exposures. At
the same time, many are becoming aware of the cost of indirect taxes —
including high duty rates, unclaimed input VAT/GST, penalties and the costs
of financing indirect tax payments.

Tax and customs administrations are also becoming more active and more
sophisticated in their methods of auditing large companies’ indirect tax
affairs. But, until recently, most companies have not been able to identify
the major indirect tax risks they carry, nor have they been in a position to
optimize their working capital and cash flow on a global basis.


Many are now turning to technology to diagnose underlying issues and
weaknesses in indirect tax reporting and to identify opportunities to
improve performance. The service delivery models may be broadly
categorized as follows:

• Analytics as an outsourced one-off service — the company provides
data to a third-party service provider to analyze and report back findings
for a one-off project or specific purpose
• Analytics as an ongoing outsourced service — the company provides
data on a regular basis to a third party and gets the results back online
• Analytics as in-house software — the company buys or commissions
software to use in-house, often supported by a third-party service
provider for the implementation or design
• Differentiate your company through enterprise intelligence
• EI (enterprise intelligence) is how companies manage and exploit big
data. Using information helps businesses sharpen their performance,
differentiate their offerings, identify new revenue and innovation
opportunities, minimize their exposure to risk, improve organizational
efficiency, and navigate the uncertainty of a volatile global economy


• Properly utilizing the information they store and matching it from
different sources is fast becoming a competitive differentiator for
forward-leaning companies
• Tax reporting begins and ends with data
• Data is the starting point and the end deliverable of every tax task. If
companies do not seize the challenge to manage their tax data
effectively, tax and customs administrations will. Tax administrations are
becoming smarter, faster and more efficient at using data analytic tools
to obtain, analyze and assess underpaid tax and duty amounts. In-depth
reviews that once took from three months to two years to complete can
now be done on a data-driven basis in a matter of weeks

• As companies begin to outsource tax compliance and run their own data
warehousing and dash-boarding solutions, their analysis of tax and trade
data is becoming much more proactive. And as companies use data
analysis tools more effectively, and their understanding improves,
processes become more streamlined, response times fall, opportunities
increase and the number of unpleasant tax surprises drops considerably


• Diagnostic tools and data analytics


Data analysis can help companies look into the future as well as into the
past. By bringing together information from a range of corporate functions
and external sources, companies can simulate “what if” scenarios and
identify where risks or opportunities could arise and where future resources
should be focused.

8.4 BENEFITS OF DIAGNOSTIC ANALYTICS

Diagnostic analytics lets you understand your data faster to answer critical
workforce questions. Cornerstone View provides the fastest and simplest
way for organizations to gain more meaningful insight into their employees
and solve complex workforce issues. Interactive data visualization tools
allow managers to easily search, filter and compare people by centralizing
information from across the Cornerstone unified talent management suite.

For example, users can find the right candidate to fill a position, select high
potential employees for succession, and quickly compare succession
metrics and performance reviews across select employees to reveal
meaningful insights about talent pools. Filters also allow for a snapshot of
employees across multiple categories such as location, division,
performance and tenure.


The functions of diagnostic analytics fall broadly into three categories (a
minimal code sketch follows the list):

1. Identify anomalies: Based on the results of descriptive analysis,
analysts must identify areas that require further study because they
raise questions that cannot be answered simply by looking at the data.
These could include questions like why sales have increased in a region
where there was no change in marketing, or why there was a sudden
change in traffic to a website without an obvious cause.

2. Drill into the analytics (discovery): Analysts must identify the data
sources that will help them explain these anomalies. Often, this step
requires analysts to look for patterns outside the existing data sets, and
it might require pulling in data from external sources to identify
correlations and determine if any of them are causal in nature.

3. Determine causal relationships: Hidden relationships are uncovered
by looking at events that might have resulted in the identified
anomalies. Probability theory, regression analysis, filtering, and time-
series data analytics can all be useful for uncovering hidden stories in
the data.
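
As a minimal illustration of these three steps, the sketch below works
through them in Python; the file daily_kpis.csv and its columns (sessions,
ad_spend) are hypothetical stand-ins for whatever KPI and candidate
drivers an analyst has to hand.

import pandas as pd

# Hypothetical daily KPI data with columns: date, sessions, ad_spend
kpis = pd.read_csv("daily_kpis.csv", parse_dates=["date"]).set_index("date")

# 1. Identify anomalies: flag days where sessions sit more than three
#    rolling standard deviations away from the 30-day rolling mean.
rolling = kpis["sessions"].rolling(window=30)
zscore = (kpis["sessions"] - rolling.mean()) / rolling.std()
anomalies = kpis[zscore.abs() > 3]
print("Days needing explanation:", list(anomalies.index.date))

# 2. Drill into the data (discovery): pull the candidate driver for the
#    weeks around the first anomaly (assumes at least one was flagged).
window = kpis.loc[anomalies.index.min() - pd.Timedelta(days=14):]

# 3. Check for a causal candidate: a simple correlation between the KPI
#    and the driver. Correlation alone does not prove causation; it only
#    nominates a relationship worth a proper regression or experiment.
print("Correlation:", window["sessions"].corr(window["ad_spend"]))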

In the past, all of these functions would be completely manual; they would
rely on the abilities of an analyst to identify anomalies, detect patterns, and
determine relationships. In that setting, a few of the most experienced
analysts would outperform their peers. However, even those top analysts
wouldn’t be able to guarantee consistency or results. As data volume,
variety, and velocity have increased, such purely manual efforts for
diagnostic analytics are no longer feasible.

Modern solutions for diagnostic analytics must employ machine learning
techniques to augment the analysts. Machines are far more capable
of recognising patterns, detecting anomalies, surfacing ‘unusual’ events,
and identifying drivers of KPIs. The latter capability requires application of
different analytical techniques, chosen from a portfolio of algorithms, to
determine causation and identify independent variables that enterprises
can adjust to effect positive change.

Enabled by machine learning, diagnostic analytics serves an important
function in reducing unintentional bias and misinterpretation of correlation
as causation. Yet today’s diagnostic analytics must still be governed by
people. Just as machines can be used to help reduce the bias in human
decision making, so should people be used to contextualize the outputs of
machine decision making.

The prediction is that, by 2021, 25% of large enterprises will have
supplemented data scientists with data ethnographers to provide
contextual interpretations of data by using qualitative research methods
that uncover people’s emotions, stories, and perceptions of their world.

Thus, diagnostic analytics lets you understand your data faster to answer
critical workforce questions, as the Cornerstone View example above
illustrates.

8.5 FUTURE - DIAGNOSTIC ANALYTICS

The world revolves around data, and every industry uses analytics to make
informed decisions. However, lack of understanding of what advanced
analytics is, and is not, dissuades organizations from examining their
potential to improve supply chain processes.

The analytics value chain typically starts with gathering data from all
possible relevant sources. These are analysed in real time to answer the
“What” question. For each “What” reply, the related historical data is then
dissected further to understand the reasons why it was happening. This is
what we call “Diagnostic Analytics”.

The results of this stage are then investigated to predict what are the
(desired) outcomes that can be potentially created. All this distilled
information is used to arrive at actionable insights that create business
value.


The process starts at the descriptive analytics phase and moves into the
predictive analytics stage.

Diagnostic analytics commonly relates to data discovery tools and
visualization, where data is layered for the determination of trends. It
enables easy manipulation of data and uses past performance/data to
perform complete root-cause analysis and uncover patterns in business
processes.

It helps in identifying the factors that directly or indirectly affect the
bottom line. Smarter decisions can be made, and, for example, questions
like why the company did not have the right amount of inventory can be
answered using this methodology.

It is also applied in marketing analytics to assess the number of posts,
shares, mentions and fan interactions and to figure out what worked in
past campaigns and what did not. Key performance indicators provide
quantitative goals for performance, but why specific performance was
“good” or “bad” can be analysed through diagnostic analytics.

Location or geospatial intelligence is among the most useful applications
of diagnostic analytics. Geospatial algorithms look at time and space and
help in computing the shortest or most optimal routes, keeping in mind
variable data like weather, demographics and consumer behaviour.

• Diagnostic analytics overview

Previously, we had discussed how descriptive analytics will tell
you what just happened. To understand why, however, you need to do
some more work. You need to perform diagnostic analytics.

In many cases, when there is a single ‘root cause’ of the situation,
diagnostic analytics can be quick and simple – you just need to find that
root cause. But, if no root cause is apparent, then you need to use
diagnostic techniques to discover a causal relationship between two or
more data sets.

The analyst also needs to make it clear which data is relevant to the
analysis, so that the relationship between the two data sets is apparent.


8.6 AN EXAMPLE OF DIAGNOSTIC ANALYTICS

Suppose a descriptive report shows that website revenue is down 8% from
the same quarter last year. In an attempt to get ahead of your boss’s
questions, you conduct diagnostic analytics to find out why.

First, you look for a root cause. Perhaps there was a change in ad spend, a
rise in cart abandonments, or even a change in Google’s algorithm which
has affected your web traffic.

Finding nothing, you then look at the data sets which contribute to
revenue: impressions, clicks, conversions, and new customer sign-ups.

You discover from the data that changes in revenue closely track changes
in new customer sign-ups, and so you isolate these two data series in a
graph showing the relationship. This then leaves you, or one of your
colleagues, to conduct diagnostic analysis on user registrations to find out
why they are down.


• The distinguishing features of diagnostic analytics

❖ Like descriptive analytics, diagnostics requires past ‘owned’ data but,
unlike descriptive analytics, diagnostic analytics will often include outside
information if it helps determine what happened.
❖ From the example above, it’s clear that domain knowledge is also more
important with diagnostic analytics. External information from a wide
range of sources should be considered in root cause analysis.
❖ And, when comparing data sets looking for a relationship, statistical
analysis may be required for a diagnosis, specifically regression analysis
(see point 2 below).
❖ Finally, with diagnostic analytics you are trying to tell a story which isn’t
apparent in the data and so the analyst needs to go ‘out on a limb’ and
offer an opinion.

• How to do diagnostic analytics:

1. Identify something worth investigating

The first step in doing diagnostic analytics is to find something that is
worth investigating. Typically this is something bad, like a fall in
revenue or clicks, but it could also be an unexpected performance
boost.

Regardless, the change you’re looking to diagnose should be rare, as
analysing volatile data is a pointless exercise.

2. Do the analysis

As shown in the example above, diagnostic analytics may be as
straightforward as finding a single root cause – i.e. revenue dropped last
month because new customer sign-ups were down.

More complex analyses, however, may require multiple data sets and the
search for a correlation using regression analysis.

How to carry out regression analysis is beyond the scope of this chapter.


What you are trying to accomplish in this step is to find a statistically valid
relationship between two data sets, where the rise (or fall) in one causes a
rise (or fall) in another.

More advanced techniques in this area include data mining and principal
component analysis, but straightforward regression analysis is a great
place to get started.
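
As a minimal illustration of that starting point, the sketch below regresses
revenue on new customer sign-ups with scipy; the file monthly_metrics.csv
and its columns are hypothetical, and a real diagnosis would take more
care over sample size, confounders and trends.

import pandas as pd
from scipy import stats

# Hypothetical monthly data with columns: month, revenue, new_signups
df = pd.read_csv("monthly_metrics.csv")

# Ordinary least-squares regression of revenue on new customer sign-ups.
result = stats.linregress(df["new_signups"], df["revenue"])

print(f"slope:     {result.slope:.2f}")        # revenue change per extra sign-up
print(f"r-squared: {result.rvalue ** 2:.3f}")  # share of variance explained
print(f"p-value:   {result.pvalue:.4f}")       # statistical validity of the link

A small p-value together with a meaningful r-squared supports reporting
sign-ups as the influential factor; it still does not, on its own, prove
causation.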

3. Selectively filter your diagnoses

While it may be interesting that a variety of factors contributed to a change
in performance, it’s not helpful to list every possible cause in a report.
Instead an analyst should aim to discover the single, or at most two, most
influential factor(s) in the issue being diagnosed.

4. State your conclusion clearly

Finally, a diagnostic report must come to a conclusion and make a very
clear case for it.

It does not have to include all of the background work, but you should:

- identify the issue you’re diagnosing,
- state why you think it happened, and
- provide your supporting evidence.

• Diagnostic analytics best practices

Here are a few more things to keep in mind when doing diagnostic
analytics.

Correlation does not prove causation. Correlation will tell you when
two variables (say clicks and conversions) move in sync with one
another.

While it’s tempting to draw conclusions from that fact, the correlation
must also make sense before it can be considered as causal evidence.


Currently, analytics seems to be largely focused on describing data through
reports. The potential for the practice, however, is far greater than
displaying data and letting the audience make conclusions.

Analysts can do better, though. They can provide further insights into the
data by using diagnostic analytics to try and explain why certain things
happen.

Ideally, marketing reports should contain both: descriptive charts and
graphs to keep people informed about the systems and results which
concern them, and separate diagnostic reports which aim to explain a
significant phenomenon such as a decline in new business or a change in
web browsing behaviour.

Not only will this help the user to understand why some decisions have
been made, but it also provides evidence that the report writer
understands the data and the point of collecting it. That is, we collect data
so that we can make better-informed decisions through analytics.

Diagnostic analytics plays a key role in supply chain risk management as
well. It helps in the identification of exposure, weak links and points of
failure by mapping the flow of services, products, valuables (cash or
equivalent) and, most importantly, information. This helps in devising
contingency plans to mitigate risk and infuses the much-needed resilience
into today’s supply chain.

Further, diagnostic analytics helps an organization benchmark itself against
the best in the industry when it comes to critical parameters like time to
market and stock outs. The current parameters of the organization are
compared with industry best practices and improvement areas are
identified. It also helps improve performance by correlating the
improvement areas with the priorities and the capabilities of the
organization.

In a nutshell, being right with diagnostic analytics really matters for an
enterprise as it provides enough value levers to not only bridge the gap
with its competitors in key performance areas but also to ensure business
continuity and risk mitigation.


8.7 SUMMARY

Diagnostic analytics is a form of advanced analytics that examines data or
content to answer the question, “Why did it happen?” It is characterized by
techniques such as drill-down, data discovery, data mining and
correlations.

In contrast to descriptive analytics, diagnostic analytics is less focused on
what has occurred and more focused on why something happened. In
general, these analytics look at the processes and causes, instead of
the results.

First, one needs to understand what information is relevant in diagnostic
analytics, and how it should be interpreted and used. Enterprise
intelligence (EI) can help an organization optimize its performance by
identifying relevant information, analyzing it in a way that produces
insights, thereby allowing the organization to act.

The world revolves around data, and every industry uses analytics to make
informed decisions. However, lack of understanding of what advanced
analytics is, and is not, dissuades organizations from examining their
potential to improve supply chain processes.

The analytics value chain typically starts with gathering data from all
possible relevant sources. These are analysed in real time to answer the
“What” question. For each “What” reply, the related historical data is then
dissected further to understand the reasons why it was happening. This is
what we call “Diagnostic Analytics”.

The results of this stage are then investigated to predict what are the
(desired) outcomes that can be potentially created. All this distilled
information is used to arrive at actionable insights that create business
value.

The process starts at the descriptive analytics phase and moves into the
predictive analytics stage.

Diagnostic analytics commonly relates to data discovery tools and
visualization, where data is layered for the determination of trends. It
enables easy manipulation of data and uses past performance/data to
perform complete root-cause analysis and uncover patterns in business
processes.

It helps in identifying the factors that directly or indirectly affect the
bottom line. Smarter decisions can be made, and, for example, questions
like why the company did not have the right amount of inventory can be
answered using this methodology.

Currently, analytics seems to be largely focused on describing data through
reports. The potential for the practice, however, is far greater than
displaying data and letting the audience make conclusions.

Analysts can do better, though. They can provide further insights into the
data by using diagnostic analytics to try and explain why certain things
happen.

Diagnostic analytics plays a key role in supply chain risk management as
well. It helps in the identification of exposure, weak links and points of
failure by mapping the flow of services, products, valuables (cash or
equivalent) and, most importantly, information. This helps in devising
contingency plans to mitigate risk and infuses the much-needed resilience
into today’s supply chain. In a nutshell, being right with diagnostic
analytics really matters for an enterprise as it provides enough value levers
to not only bridge the gap with its competitors in key performance areas
but also to ensure business continuity and risk mitigation.

8.8 SELF ASSESSMENT QUESTIONS

1. What is diagnostic Analytics? Define.

2. Describe the process and tools used in diagnostic analytics.

3. Explain the benefits of diagnostic analytics.

4. It is stated that the functions of diagnostic analytics fall broadly into
three categories; explain these categories.

5. Explain best practices in diagnostic analytics.


8.9 MULTIPLE CHOICE QUESTIONS

1. Diagnostic analytics is a form of advanced analytics that examines data
or content to answer the question, -------------- It is characterized by
techniques such as drill-down, data discovery, data mining and
correlations.
a. Why did it happen?
b. What happened?
c. What will happen?
d. How did it happen?

2. Using information helps businesses to sharpen their performance,
differentiate their offerings, identify new revenue and innovation
opportunities, minimize their exposure to risk, improve organizational
efficiency, and navigate the uncertainty of a volatile global economy.
The technique that is used by companies falls under ……………
a. Business intelligence
b. Enterprise Intelligence
c. Data intelligence
d. Big data analysis

3. The functions of diagnostic analytics fall broadly into which of the
following categories?
a. Identify anomalies
b. Drill into the analytics (discovery)
c. Determine causal relationships
d. All of the above: identifying anomalies, drilling into the analytics,
and determining causal relationships in the data

4. Enabled by machine learning, diagnostic analytics serves an important
function in ------------------------ as causation.
a. reducing unintentional misinterpretation of correlation
b. reducing unintentional bias decision
c. reducing unintentional bias and misinterpretation of correlation
d. maintains transparency in data output


5. Diagnostic analytics plays a key role in supply chain risk management
as well. This helps in devising contingency plans to ------------------ into
today’s supply chain.
a. Mitigate the risk
b. Infuse resilience
c. mitigate risk and infuses the much-needed resilience
d. measure the risk in advance

Answers: 1.(a), 2.(b), 3.(d), 4.(c), 5.(c)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture


Chapter 9
Predictive Analytics
Objectives:

On completion of this chapter, you will understand Predictive Analytics
and its importance in data analytics, considering the following aspects:

Structure:

9.1 Introduction

9.2 Definition

9.3 Predictive analytics workflow

9.4 Benefits of predictive analytics

9.5 Predictive analytics examples

9.6 Predictive analytics tools

9.7 Predictive modeling - working and techniques

9.8 Why is predictive analytics important?

9.9 Case Study

9.10 Summary

9.11 Self Assessment Questions

9.12 Multiple Choice Questions


9.1 INTRODUCTION

Predictive analytics is the process of using data analytics to make
predictions based on data. This process uses data along with analysis,
statistics, and machine learning techniques to create a predictive model for
forecasting future events.

The term “predictive analytics” describes the application of a statistical or
machine learning technique to create a quantitative prediction about the
future. Frequently, supervised machine learning techniques are used to
predict a future value (How long can this machine run before requiring
maintenance?) or to estimate a probability (How likely is this customer to
default on a loan?).

Predictive analytics starts with a business goal to use data to reduce waste,
save time, or cut costs. The process harnesses heterogeneous, often
massive, data sets into models that can generate clear, actionable
outcomes to support achieving that goal, such as less material waste, less
stocked inventory, and manufactured product that meets specifications.

Predictive analytics draws its power from a wide range of
methods and technologies, including big data, data mining, statistical
modeling, machine learning and assorted mathematical processes.
Organizations use predictive analytics to sift through current and historical
data to detect trends and forecast events and conditions that should occur
at a specific time, based on supplied parameters.

With predictive analytics, organizations can find and exploit patterns
contained within data in order to detect risks and opportunities. Models can
be designed, for instance, to discover relationships between various
behaviour factors. Such models enable the assessment of either the
promise or risk presented by a particular set of conditions, guiding
informed decision-making across various categories of supply chain and
procurement events.


Rise of Big Data

Predictive analytics is often discussed in the context of big data.
Engineering data, for example, comes from sensors, instruments,
and connected systems out in the world. Business system data at a
company might include transaction data, sales results, customer
complaints, and marketing information. Increasingly, businesses make
data-driven decisions based on this valuable trove of information.

Increasing Competition
With increased competition, businesses seek an edge in bringing products
and services to crowded markets. Data-driven predictive models can help
companies solve long-standing problems in new ways.

Equipment manufacturers, for example, can find it hard to innovate in
hardware alone. Product developers can add predictive capabilities to
existing solutions to increase value to the customer. Using predictive
analytics for equipment maintenance, or predictive maintenance, can
anticipate equipment failures, forecast energy needs, and reduce operating
costs. For example, sensors that measure vibrations in automotive parts
can signal the need for maintenance before the vehicle fails on the road.

Companies also use predictive analytics to create more accurate forecasts,
such as forecasting the demand for electricity on the electrical grid. These
forecasts enable resource planning (for example, scheduling of various
power plants), to be done more effectively.

Cutting-Edge Technologies for Big Data and Machine Learning

To extract value from big data, businesses apply algorithms to large data
sets using tools such as Hadoop and Spark. The data sources might consist
of transactional databases, equipment log files, images, video, audio,
sensor, or other types of data. Innovation often comes from combining
data from several sources.

With all this data, tools are necessary to extract insights and
trends. Machine learning techniques are used to find patterns in data and
to build models that predict future outcomes. A variety of machine learning
algorithms are available, including linear and nonlinear regression, neural
networks, support vector machines, decision trees, and other algorithms.


9.2 DEFINITION

Predictive analytics is a category of data analytics aimed at making
predictions about future outcomes based on historical data and analytics
techniques such as statistical modeling and machine learning. The science
of predictive analytics can generate future insights with a significant degree
of precision. With the help of sophisticated predictive analytics tools and
models, any organization can now use past and current data to reliably
forecast trends and behaviours milliseconds, days, or years into the future.

Thus, Predictive analytics is the practice of extracting information from
existing data sets in order to determine patterns and predict future
outcomes and trends. Predictive analytics does not tell you what will
happen in the future. Instead, it forecasts what might happen in the future
with an acceptable level of reliability, and includes what-if scenarios and
risk assessment.

Predictive analytics uses historical data to predict future events. Typically,
historical data is used to build a mathematical model that captures
important trends. That predictive model is then used on current data to
predict what will happen next, or to suggest actions to take for optimal
outcomes.
Predictive analytics has received a lot of attention in recent years due to
advances in supporting technology, particularly in the areas of big data and
machine learning. The sections that follow therefore cover the two things
you need to know about predictive analytics: why it matters, and how it
works.


9.3 PREDICTIVE ANALYTICS WORKFLOW

Let us presume that we are all familiar with predictive models for weather
forecasting. A vital industry application of predictive models relates to
energy load forecasting to predict energy demand.

In this case, energy producers, grid operators, and traders need accurate
forecasts of energy load to make decisions for managing loads in the
electric grid. Vast amounts of data are available, and using predictive
analytics, grid operators can turn this information into actionable insights.
Now let us understand a step-by-step workflow for predicting energy
loads (a minimal code sketch follows the list). Typically, the workflow for
a predictive analytics application follows these basic steps:

1. Import data from varied sources, such as web archives, databases,
and spreadsheets.
Data sources include energy load data in a CSV file and national
weather data showing temperature and dew point.

2. Clean the data by removing outliers and combining data sources.
Identify data spikes, missing data, or anomalous points to remove from
the data. Then aggregate different data sources together – in this case,
creating a single table including energy load, temperature, and dew
point.

3. Develop an accurate predictive model based on the aggregated
data using statistics, curve fitting tools, or machine learning. Energy
forecasting is a complex process with many variables, so you might
choose to use neural networks to build and train a predictive model.
Iterate through your training data set to try different approaches. When
the training is complete, you can try the model against new data to see
how well it performs.

4. Integrate the model into a load forecasting system in a production
environment.
Once you find a model that accurately forecasts the load, you can move
it into your production system, making the analytics available to
software programs or devices, including web apps, servers, or mobile
devices.
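
Here is a minimal sketch of steps 1 to 3 in Python with pandas and
scikit-learn; the file names, column names, and the choice of a small
neural network are illustrative assumptions, not a production forecasting
system.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# 1. Import: hypothetical hourly energy load and weather files.
load = pd.read_csv("energy_load.csv", parse_dates=["timestamp"])
weather = pd.read_csv("weather.csv", parse_dates=["timestamp"])

# 2. Clean and aggregate: drop missing rows, clip obvious spikes, and
#    merge into a single table of load, temperature, and dew point.
data = load.merge(weather, on="timestamp").dropna()
data = data[data["load_mw"].between(0, data["load_mw"].quantile(0.999))]

X = data[["temperature", "dew_point"]]
y = data["load_mw"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 3. Develop the model: train a small neural network, then check it
#    against data it has never seen.
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000)
model.fit(X_train, y_train)
print("Mean absolute error (MW):",
      mean_absolute_error(y_test, model.predict(X_test)))

Step 4, integration, would then wrap model.predict behind whatever
interface the production load forecasting system expects; in practice one
would also scale the inputs and compare several model types before
deploying.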


Developing predictive models is an important task: aggregated data tells
a complex story, and to extract the insights it holds you need an accurate
predictive model.

Predictive modeling uses mathematical and computational methods to
predict an event or outcome. These models forecast an outcome at some
future state or time based upon changes to the model inputs. Using an
iterative process, you develop the model using a training data set and then
test and validate it to determine its accuracy for making predictions. One
can also try out different machine learning approaches to find the most
effective model.


9.4 BENEFITS OF PREDICTIVE ANALYTICS

Predictive analytics makes looking into the future more accurate and
reliable than previous tools. As such it can help adopters find ways to save
and earn money. Retailers often use predictive models to forecast
inventory requirements, manage shipping schedules and configure store
layouts to maximize sales. Airlines frequently use predictive analytics to set
ticket prices reflecting past travel trends. Hotels, restaurants and other
hospitality industry players can use the technology to forecast the number
of guests on any given night in order to maximize occupancy and revenue.

By optimizing marketing campaigns with predictive analytics, organizations
can also generate new customer responses or purchases, as well as
promote cross-sell opportunities. Predictive models can help businesses
attract, retain and nurture their most valued customers.

Predictive analytics can also be used to detect and halt various types of
criminal behaviour before any serious damage is inflicted. By using
predictive analytics to study user behaviours and actions, an organization
can detect activities that are out of the ordinary, ranging from credit card
fraud to corporate spying to cyberattacks.

9.5 PREDICTIVE ANALYTICS EXAMPLES

Organizations today use predictive analytics in a virtually endless number
of ways. The technology helps adopters in fields as diverse as finance,
healthcare, retailing, hospitality, pharmaceuticals, automotive, aerospace
and manufacturing.
Any industry can use predictive analytics to reduce risks, optimize
operations and increase revenue. Here are a few examples.

• Banking & Financial Services

The financial industry, with huge amounts of data and money at stake, has
long embraced predictive analytics to detect and reduce fraud, measure
credit risk, maximize cross-sell/up-sell opportunities and retain valuable
customers. Commonwealth Bank uses analytics to predict the likelihood of
fraud activity for any given transaction before it is authorized – within 40
milliseconds of the transaction initiation.


Predictive analytics is also used in developing credit risk models: financial
institutions use machine learning techniques and quantitative tools to
predict credit risk, forecast financial market trends, and predict the impact
of new policies, laws and regulations on businesses and markets.

• Retail
Since the now infamous study that showed men who buy diapers often buy
beer at the same time, retailers everywhere are using predictive analytics
for merchandise planning and price optimization, to analyze the
effectiveness of promotional events and to determine which offers are most
appropriate for consumers. Staples gained customer insight by analyzing
behaviour, providing a complete picture of their customers, and realizing a
137 percent ROI.

• Oil, Gas & Utilities


Whether it is predicting equipment failures and future resource needs,
mitigating safety and reliability risks, or improving overall performance, the
energy industry has embraced predictive analytics with vigor. Salt River
Project is the second-largest public power utility in the US and one of
Arizona's largest water suppliers. Analyses of machine sensor data predicts
when power-generating turbines need maintenance.

• Governments & the Public Sector


Governments have been key players in the advancement of computer
technologies. The US Census Bureau has been analyzing data to
understand population trends for decades. Governments now use
predictive analytics like many other industries – to improve service and
performance; detect and prevent fraud; and better understand consumer
behaviour. They also use predictive analytics to enhance cybersecurity.

• Health Insurance
In addition to detecting claims fraud, the health insurance industry is
taking steps to identify patients most at risk of chronic disease and find
what interventions are best. Express Scripts, a large pharmacy benefits
company, uses analytics to identify those not adhering to prescribed
treatments, resulting in a savings of $1,500 to $9,000 per patient.


• Manufacturing
For manufacturers it's very important to identify factors leading to reduced
quality and production failures, as well as to optimize parts, service
resources and distribution. Lenovo is just one manufacturer that has used
predictive analytics to better understand warranty claims – an initiative
that led to a 10 to 15 percent reduction in warranty costs.

• Automotive
Breaking new ground with autonomous vehicles: companies developing
driver assistance technology and new autonomous vehicles use predictive
analytics to analyze sensor data from connected vehicles and to build
driver assistance algorithms.

Incorporate records of component durability and failure into upcoming
vehicle manufacturing plans. Study driver behaviour to develop better
driver assistance technologies and, eventually, autonomous vehicles.

• Aerospace
Monitoring aircraft engine health to improve aircraft up-time and reduce
maintenance costs, an engine manufacturer created a real-time analytics
application to predict subsystem performance for oil, fuel, lift-off,
mechanical health, and controls.

Predict the impact of specific maintenance operations on aircraft
reliability, fuel use, availability and uptime.

• Energy Production – Forecasting electricity price and demand
Sophisticated forecasting apps use models that monitor plant availability,
historical trends, seasonality, and weather.

• Industrial Automation and Machinery – Predicting machine failures
A plastic and thin film producer saves 50,000 Euros monthly using a
health monitoring and predictive maintenance application that reduces
downtime and minimizes waste.

• Medical Devices – Using pattern-detection algorithms to spot asthma
and COPD
An asthma management device records and analyzes patients' breathing
sounds and provides instant feedback via a smartphone app to help
patients manage asthma and COPD.


• Manufacturing: Predict the location and rate of machine failures.
Optimize raw material deliveries based on projected future demands.

• Law enforcement: Use crime trend data to define neighbourhoods that
may need additional protection at certain times of the year.

9.6 PREDICTIVE ANALYTICS TOOLS

Predictive analytics tools give users deep, real-time insights into an almost
endless array of business activities. Tools can be used to predict various
types of behaviour and patterns, such as how to allocate resources at
particular times, when to replenish stock or the best moment to launch a
marketing campaign, basing predictions on an analysis of data collected
over a period of time.

Virtually all predictive analytics adopters use tools provided by one or more
external developers. Many such tools are tailored to meet the needs of
specific enterprises and departments.

Major predictive analytics software and service providers include:


• Acxiom
• IBM
• Information Builders
• Microsoft
• SAP
• SAS Institute
• Tableau Software
• Teradata
• TIBCO Software


• Predictive analytics models

Models are the foundation of predictive analytics: the templates that
allow users to turn past and current data into actionable insights, creating
positive long-term results. Some typical types of predictive models include:
• Customer Lifetime Value Model: Pinpoint customers who are most likely
to invest more in products and services.
• Customer Segmentation Model: Group customers based on similar
characteristics and purchasing behaviors.
• Predictive Maintenance Model: Forecast the chances of essential
equipment breaking down.
• Quality Assurance Model: Spot and prevent defects to avoid
disappointments and extra costs when providing products or services to
customers.

9.7 PREDICTIVE MODELLING - WORKING AND TECHNIQUES

Predictive models use known results to develop (or train) a model that can
be used to predict values for different or new data. Modeling provides
results in the form of predictions that represent a probability of the target
variable (for example, revenue) based on estimated significance from a set
of input variables.

This is different from descriptive models that help you understand what
happened, or diagnostic models that help you understand key relationships
and determine why something happened. Entire books are devoted to
analytical methods and techniques. Complete college curriculums delve
deeply into this subject. But for starters, here are a few basics.


There are two types of predictive models.

1. Classification models predict class membership. For instance, you try to classify whether someone is likely to leave, whether he will respond to a solicitation, whether he's a good or bad credit risk, etc. Usually, the model results are in the form of 0 or 1, with 1 being the event you are targeting.

2. Regression models predict a number – for example, how much revenue a customer will generate over the next year or the number of months before a component will fail on a machine.

Three of the most widely used predictive modeling techniques are decision
trees, regression and neural networks.

I. Regression (linear and logistic) is one of the most popular methods in statistics. Regression analysis estimates relationships among variables. Intended for continuous data that can be assumed to follow a normal distribution, it finds key patterns in large data sets and is often used to determine how much specific factors, such as price, influence the movement of an asset. With regression analysis, we want to predict a number, called the response or Y variable. With linear regression, one independent variable is used to explain and/or predict the outcome of Y. Multiple regression uses two or more independent variables to predict the outcome. With logistic regression, unknown values of a discrete variable are predicted based on the known values of other variables. The response variable is categorical, meaning it can assume only a limited number of values. With binary logistic regression, the response variable has only two values, such as 0 or 1. In multiple logistic regression, the response variable can have several levels, such as low, medium and high, or 1, 2 and 3.

Regression techniques are often used in banking, investing and other finance-oriented models. Regression helps users forecast asset values and comprehend the relationships between variables, such as commodities and stock prices.
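To make this concrete, here is a minimal sketch of both techniques in R (the language of this book's case study), on a small synthetic data set; every variable name and number below is illustrative, not taken from the text:

    # Minimal sketch of linear and logistic regression in base R,
    # using synthetic data (all names and values are illustrative).
    set.seed(42)
    n <- 200
    price   <- runif(n, 10, 100)                       # independent variable
    revenue <- 50 + 3 * price + rnorm(n, sd = 20)      # continuous response
    churned <- rbinom(n, 1, plogis(-2 + 0.03 * price)) # binary 0/1 response

    # Linear regression: predict a number (revenue) from price
    lin_fit <- lm(revenue ~ price)
    summary(lin_fit)

    # Binary logistic regression: predict a 0/1 outcome (churn) from price
    log_fit <- glm(churned ~ price, family = binomial)
    summary(log_fit)

    # Predicted probability of churn at a new price point
    predict(log_fit, newdata = data.frame(price = 75), type = "response")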


II. Decision trees are classification models that partition data into
subsets based on categories of input variables. This helps you
understand someone's path of decisions. A decision tree looks like a tree
with each branch representing a choice between a number of
alternatives, and each leaf representing a classification or decision. This
model looks at the data and tries to find the one variable that splits the
data into logical groups that are the most different. Decision trees are
popular because they are easy to understand and interpret. They also
handle missing values well and are useful for preliminary variable
selection. So, if you have a lot of missing values or want a quick and
easily interpretable answer, you can start with a tree.

Decision trees, one of the most popular techniques, rely on a schematic, tree-shaped diagram that's used to determine a course of action or to show a statistical probability. The branching method can also show every possible outcome of a particular decision and how one choice may lead to the next.
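As a small illustration, the sketch below fits a classification tree in R with the rpart package; the built-in iris data set merely stands in for real business data:

    # Minimal decision-tree sketch using the rpart package; each split
    # partitions the data on the input variable that best separates the classes.
    library(rpart)

    tree_fit <- rpart(Species ~ ., data = iris, method = "class")

    # Print the tree: each branch is a choice, each leaf a classification
    print(tree_fit)

    # Classify new observations
    predict(tree_fit, newdata = head(iris), type = "class")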

III. Neural networks are sophisticated techniques capable of modeling extremely complex relationships. They're popular because they're powerful and flexible. The power comes in their ability to handle nonlinear relationships in data, which is increasingly common as we collect more data. They are often used to confirm findings from simple techniques like regression and decision trees. Neural networks are based on pattern recognition and some AI processes that graphically "model" parameters. They work well when no mathematical formula is known that relates inputs to outputs, prediction is more important than explanation, or there is a lot of training data. Artificial neural networks were originally developed by researchers who were trying to mimic the neurophysiology of the human brain.
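As a hedged, minimal example, the nnet package in R fits a single-hidden-layer network; again the iris data set stands in for real training data:

    # Minimal neural-network sketch using the nnet package.
    library(nnet)

    set.seed(1)
    # size = number of hidden units; decay adds regularization
    nn_fit <- nnet(Species ~ ., data = iris, size = 5, decay = 0.01,
                   maxit = 200, trace = FALSE)

    # Predict class labels for new data
    predict(nn_fit, newdata = head(iris), type = "class")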

Predictive analytics adopters have easy access to a wide range of statistical, data-mining and machine-learning algorithms designed for use in predictive analysis models. Algorithms are generally designed to solve a specific business problem or series of problems, enhance an existing algorithm or supply some type of unique capability.

Clustering algorithms, for example, are well suited for customer segmentation, community detection and other social-related tasks. To improve customer retention, or to develop a recommendation system, classification algorithms are typically used. A regression algorithm is typically selected to create a credit scoring system or to predict the outcome of many time-driven events.
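For instance, here is a minimal customer-segmentation sketch with k-means clustering in base R; the two customer variables are synthetic and purely illustrative:

    # Minimal k-means segmentation sketch on synthetic customer data.
    set.seed(7)
    customers <- data.frame(
      annual_spend    = c(rnorm(50, 200, 30), rnorm(50, 800, 80)),
      visits_per_year = c(rnorm(50, 5, 1),   rnorm(50, 20, 3))
    )

    # Ask for two segments; k-means groups similar customers together
    segments <- kmeans(scale(customers), centers = 2)

    # Segment membership counts
    table(segments$cluster)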

For example, consider predictive analytics in healthcare. Healthcare organizations have become some of the most enthusiastic predictive analytics adopters for a very simple reason: the technology is helping them save money.

Healthcare organizations use predictive analytics in several different ways, including intelligently allocating facility resources based on past trends, optimizing staff schedules, identifying patients at risk for a costly near-term readmission and adding intelligence to pharmaceutical and supply acquisition and management.

A 2017 Society of Actuaries report on healthcare industry trends in predictive analytics found that over half of healthcare executives (57 percent) at organizations already using predictive analytics believe the technology will allow them to save 15 percent or more of their total budget over the next five years. An additional 26 percent predicted savings of 25 percent or more. The study also revealed that most healthcare executives (89 percent) belong to organizations that are either now using predictive analytics or planning to do so within the next five years. An impressive 93 percent of healthcare executives stated that predictive analytics is important to their business's future.

While getting started in predictive analytics isn't exactly a snap, it's a task
that virtually any business can handle as long as one remains committed to
the approach and is willing to invest the time and funds necessary to get
the project moving. Beginning with a limited-scale pilot project in a critical
business area is an excellent way to cap start-up costs while minimizing
the time before financial rewards begin rolling in. Once a model is put into
action, it generally requires little upkeep as it continues to grind out
actionable insights for many years.


9.8 WHY IS PREDICTIVE ANALYTICS IMPORTANT?

Organizations are turning to predictive analytics to help solve difficult problems and uncover new opportunities. Common uses include detecting fraud, optimizing marketing campaigns, improving operations and reducing risk:

• Detecting fraud. Combining multiple analytics methods can improve pattern detection and prevent criminal behaviour. As cyber security becomes a growing concern, high-performance behavioural analytics examines all actions on a network in real time to spot abnormalities that may indicate fraud, zero-day vulnerabilities and advanced persistent threats.

• Optimizing marketing campaigns. Predictive analytics are used to determine customer responses or purchases, as well as promote cross-sell opportunities. Predictive models help businesses attract, retain and grow their most profitable customers.

• Improving operations. Many companies use predictive models to forecast inventory and manage resources. Airlines use predictive analytics to set ticket prices. Hotels try to predict the number of guests for any given night to maximize occupancy and increase revenue. Predictive analytics enables organizations to function more efficiently.


• Reducing risk. Credit scores are used to assess a buyer's likelihood of default for purchases and are a well-known example of predictive analytics. A credit score is a number generated by a predictive model that incorporates all data relevant to a person's creditworthiness. Other risk-related uses include insurance claims and collections.

9.9 CASE STUDY:

This case study presents interesting examples of predictive analytics with MATLAB, in which organizations applied predictive analytics in innovative ways to create new products and services, and to solve long-standing problems in new ways.

These examples illustrate predictive analytics in action:

1. Baker Hughes Develops Predictive Maintenance Software for Gas and Oil Extraction Equipment Using Data Analytics and Machine Learning

Baker Hughes trucks are equipped with positive displacement pumps that
inject a mixture of water and sand deep into drilled wells. With pumps
accounting for about $100,000 of the $1.5 million total cost of the truck,
Baker Hughes needed to determine when a pump was about to fail. They
processed and analysed up to a terabyte of data collected at 50,000
samples per second from sensors installed on 10 trucks operating in the
field, and trained a neural network to use sensor data to predict pump
failures. The software reduced maintenance costs by 30–40%—or more
than $10 million.

2. Building IQ Develops Proactive Algorithms for HVAC Energy Optimization in Large-Scale Buildings

Heating, ventilation, and air-conditioning (HVAC) systems in large-scale commercial buildings are often inefficient because they do not take into account changing weather patterns, variable energy costs, or the building's thermal properties.

Building IQ's cloud-based software platform uses advanced algorithms to continuously process gigabytes of information from power meters, thermometers, and HVAC pressure sensors.


Machine learning is used to segment data and determine the relative contributions of gas, electric, steam, and solar power to heating and cooling processes.

Optimization is used to determine the best schedule for heating and cooling
each building throughout the day. The Building IQ platform reduces HVAC
energy consumption in large-scale commercial buildings by 10–25% during
normal operation.

3. Developing Detection Algorithms to Reduce False Alarms in Intensive Care Units

False alarms from electrocardiographs and other patient monitoring devices are a serious problem in intensive care units (ICUs). Noise from false alarms disturbs patients' sleep, and frequent false alarms desensitize clinical staff to genuine warnings.

Competitors in the PhysioNet/Computing in Cardiology Challenge were tasked with developing algorithms that could distinguish between true and false alarms in signals recorded by ICU monitoring devices.

Czech Academy of Sciences researchers won first place in the real-time category of the challenge with MATLAB algorithms that can detect QRS complexes, distinguish between normal and ventricular heartbeats, and filter out false QRS complexes caused by cardiac pacemaker stimuli. The algorithms produced a true positive rate (TPR) and true negative rate (TNR) of 92% and 88%, respectively.

Predictive Analytics with MATLAB

To unlock the value of business and engineering data to make informed decisions, teams developing predictive analytics applications increasingly turn to MATLAB.

Using MATLAB tools and functions, one can perform predictive analytics with engineering, scientific, and field data, as well as business and transactional data. With MATLAB, you can deploy predictive applications to large-scale production systems and embedded systems.


Why is MATLAB used for Predictive Analytics?

1. MATLAB analytics work with both business and engineering data.
MATLAB has native support for sensor, image, video, telemetry, binary, and other real-time formats. Explore this data using MATLAB tall arrays for Hadoop and Spark, and by connecting interfaces to ODBC/JDBC databases.

2. MATLAB lets engineers do data science themselves.
Enable your domain experts to do data science, with powerful tools to help them do machine learning, deep learning, statistics, optimization, signal analysis, and image processing.

3. MATLAB analytics run in embedded systems.
Develop analytics to run on embedded platforms by creating portable C and C++ code from MATLAB code.

4. MATLAB analytics deploy to enterprise IT systems.
MATLAB integrates into enterprise systems, clusters, and clouds, with a royalty-free deployable runtime.

Your Data + MATLAB = Success with Predictive Analytics

In a simplified view, engineering data arrives from sensors, instruments, and connected systems out in the world. The data is collected and stored in a file system either in-house or in the cloud. This data is combined with data sourced from traditional business systems such as cost data, sales results, customer complaints, and marketing information.

After this, the analytics are developed by an engineer or domain expert using MATLAB. Pre-processing is almost always required to deal with missing data, outliers, or other unforeseen data quality issues. Following that, analytics methods such as statistics and machine learning are used to produce an "analytic" – a predictive model of your system.

To be useful, that predictive model is then deployed – either in a production IT environment feeding a real-time transactional or IT system such as an e-commerce site, or to an embedded device – a sensor, a controller, or a smart system in the real world such as an autonomous vehicle.


9.10 SUMMARY

Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to providing a best assessment of what will happen in the future.

Though predictive analytics has been around for decades, it's a technology whose time has come. More and more organizations are turning to predictive analytics to increase their bottom line and competitive advantage, driven by growing volumes and types of data and by rising interest in using data to produce valuable insights.

Predictive analytics tools give users deep, real-time insights into an almost
endless array of business activities. Tools can be used to predict various
types of behaviour and patterns, such as how to allocate resources at
particular times, when to replenish stock or the best moment to launch a
marketing campaign, basing predictions on an analysis of data collected
over a period of time.

Virtually all predictive analytics adopters use tools provided by one or more
external developers. Many such tools are tailored to meet the needs of
specific enterprises and departments.

Any industry can use predictive analytics to reduce risks, optimize operations and increase revenue.

The financial industry, with huge amounts of data and money at stake, has long embraced predictive analytics to detect and reduce fraud, measure credit risk, maximize cross-sell/up-sell opportunities and retain valuable customers. Commonwealth Bank, for example, uses analytics to predict the likelihood of fraud for any given transaction within 40 milliseconds of its initiation, before the transaction is authorized.

In the oil, gas and utilities industry, whether it is predicting equipment failures and future resource needs, mitigating safety and reliability risks, or improving overall performance, the energy sector has embraced predictive analytics with vigour.


Governments have been key players in the advancement of computer technologies. They also use predictive analytics to enhance cyber security.

In addition to detecting claims fraud, the health insurance industry is taking steps to identify patients most at risk of chronic disease and find what interventions are best.

For manufacturers it's very important to identify factors leading to reduced quality and production failures, as well as to optimize parts, service resources and distribution.

Predictive models use known results to develop (or train) a model that can
be used to predict values for different or new data. Modeling provides
results in the form of predictions that represent a probability of the target
variable (for example, revenue) based on estimated significance from a set
of input variables.

This is different from descriptive models that help you understand what
happened, or diagnostic models that help you understand key relationships
and determine why something happened.

There are two types of predictive models.

Classification models predict class membership. For instance, you try to classify whether someone is likely to leave, whether he will respond to a solicitation, whether he's a good or bad credit risk, etc. Usually, the model results are in the form of 0 or 1, with 1 being the event you are targeting.

Regression models predict a number – for example, how much revenue a customer will generate over the next year or the number of months before a component will fail on a machine. Three of the most widely used predictive modeling techniques are decision trees, regression and neural networks.


9.11 SELF ASSESSMENT QUESTIONS


1. What is Predictive Analytics? Define.

2. Explain the predictive analytics workflow.

3. Write a short note on: Predictive models.

4. What are the benefits of predictive analytics? Describe them.

5. Why is predictive analytics important? Explain.

9.12 MULTIPLE CHOICE QUESTIONS

1. Predictive analytics starts with a ……………… to use data to reduce waste, save time, or cut costs.
a. Business goal
b. MIS
c. Data mining
d. Business achievements

2. Predictive modeling uses ------------------------------ methods to predict an event or outcome.
a. mathematical
b. computational
c. mathematical and computational both
d. as per model compatibility

3. Clustering algorithms are well suited for -------------------- to improve customer retention, or to develop a recommendation system.
a. Customer Identification
b. Community segmentation
c. Only for social-related tasks
d. Customer segmentation, community detection and other social-related tasks.


4. Decision trees are one of the most popular techniques, relying on a ---------------------- that's used to determine a course of action or to show a statistical probability.
a. Tree-shaped diagram
b. Schematic and tree-shaped diagram
c. data on multiple Decisions required
d. multiple actionable tasks data

5. Regression models predict ------------- such as how much revenue a customer will generate over the next year or the number of months before a component will fail on a machine.
a. Numbers
b. Events
c. Past records
d. Future learning

Answers: 1.(a), 2.(c), 3.(d), 4.(b), 5.(a)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture


Chapter 10
Prescriptive Analytics
Objectives:

On completion of this chapter, you will understand Prescriptive Analytics and its importance in data analytics, considering the following aspects:

Structure:

10.1 Introduction

10.2 Understanding and selecting the right Descriptive, Predictive, and Prescriptive Analytics

10.3 Use of Prescriptive Analytics

10.4 Difference between Predictive and prescriptive analytics

10.5 Content and working of prescriptive analytics

10.6 Usage of prescriptive analytics -Examples

10.7 A Practical Introduction to Prescriptive Analytics (with Case Study in R)

10.8 Summary

10.9 Self Assessment Questions

10.10 Multiple Choice Questions


10.1 INTRODUCTION

Prescriptive Analytics is the area of data analytics that focuses on finding the best course of action in a scenario given the available data. It's related to both descriptive analytics and predictive analytics but emphasizes actionable insights instead of data monitoring.

Whereas descriptive analytics offers business intelligence (BI) insights into what has happened, and predictive analytics focuses on forecasting possible outcomes, prescriptive analytics aims to find the best solution given a variety of choices. Additionally, the field also empowers companies to make decisions based on optimizing the result of future events or risks, and provides a model to study them.

Prescriptive analytics gathers data from a variety of both descriptive and predictive sources for its models and applies them to the process of decision-making. This includes combining existing conditions and possible decisions to determine how each would impact the future. Moreover, it can measure the impact of a decision based on different possible future scenarios.

The field borrows heavily from mathematics and computer science, using a
variety of statistical methods to create and re-create possible decision
patterns that could affect an organization in different ways.

Prescriptive analytics is the third and final phase of business analytics, which also includes descriptive and predictive analytics.

Referred to as the "final frontier of analytic capabilities," prescriptive analytics entails the application of mathematical and computational sciences and suggests decision options to take advantage of the results of descriptive and predictive analytics. The first stage of business analytics is descriptive analytics, which still accounts for the majority of all business analytics today. Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure. Most management reporting – such as sales, marketing, operations, and finance – uses this type of post-mortem analysis.

Therefore, prescriptive analytics is the final step of business analytics.


10.2 UNDERSTANDING AND SELECTING THE RIGHT DESCRIPTIVE, PREDICTIVE, AND PRESCRIPTIVE ANALYTICS

With the flood of data available to businesses regarding their supply chain these days, companies are turning to analytics solutions to extract meaning from the huge volumes of data to help improve decision making.

Companies that are attempting to optimize their efforts need capabilities to analyze historical data and forecast what might happen in the future. The promise of doing it right and becoming a data-driven organization is great. Huge ROIs can be enjoyed, as evidenced by companies that have optimized their supply chain, lowered operating costs, increased revenues, or improved their customer service and product mix.

Looking at all the analytic options can be a daunting task. However, luckily
these analytic options can be categorized at a high level into three distinct
types. No one type of analytic is better than another, and in fact they co-
exist with, and complement, each other.

For a business to have a holistic view of the market, and of how it competes efficiently within that market, it needs a robust analytic environment which includes:

• Descriptive Analytics, which use data aggregation and data mining to provide insight into the past and answer: "What has happened?"
• Predictive Analytics, which use statistical models and forecasting techniques to understand the future and answer: "What could happen?"
• Prescriptive Analytics, which use optimization and simulation algorithms to advise on possible outcomes and answer: "What should we do?"


1. Descriptive Analytics: Insight into the past

Descriptive analysis or statistics does exactly what the name implies: it "describes", or summarizes, raw data and makes it interpretable by humans. These are analytics that describe the past. The past refers to any point of time that an event has occurred, whether it is one minute ago or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviours, and understand how they might influence future outcomes.

The vast majority of the statistics we use fall into this category. (Think
basic arithmetic like sums, averages, percent changes.) Usually, the
underlying data is a count, or aggregate of a filtered column of data to
which basic math is applied.

For all practical purposes, there are an infinite number of these statistics.
Descriptive statistics are useful to show things like total stock in inventory,
average dollars spent per customer and year-over-year change in sales.
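As a tiny base R illustration of such descriptive statistics (the monthly sales figures below are made up):

    # Descriptive statistics in base R over illustrative monthly sales figures.
    sales_2023 <- c(120, 135, 150, 160, 155, 170)
    sales_2024 <- c(140, 150, 165, 180, 175, 190)

    sum(sales_2024)                  # total sales
    mean(sales_2024)                 # average sales per month
    (sum(sales_2024) - sum(sales_2023)) / sum(sales_2023) * 100  # YoY % change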

Common examples of descriptive analytics are reports that provide historical insights regarding the company's production, financials, operations, sales, inventory and customers.

Therefore, use Descriptive Analytics when you need to understand at an aggregate level what is going on in your company, and when you want to summarize and describe different aspects of your business.

2. Predictive Analytics: Understanding the future

Predictive analytics has its roots in the ability to "predict" what might happen. These analytics are about understanding the future and provide companies with actionable insights based on data. Predictive analytics provides estimates about the likelihood of a future outcome. It is important to remember that no statistical algorithm can "predict" the future with 100% certainty, because the foundation of predictive analytics is probabilities. Companies use these statistics to forecast what might happen in the future.


These statistics try to take the data that you have, and fill in the missing
data with best guesses. They combine historical data found in ERP, CRM,
HR and POS systems to identify patterns in the data and apply statistical
models and algorithms to capture relationships between various data sets.
Companies use predictive statistics and analytics anytime they want to look
into the future. Predictive analytics can be used throughout the
organization, from forecasting customer behaviour and purchasing patterns
to identifying trends in sales activities. They also help forecast demand
for inputs from the supply chain, operations and inventory.

One common application most people are familiar with is the use of
predictive analytics to produce a credit score. These scores are used by
financial services to determine the probability of customers making future
credit payments on time. Typical business uses include understanding how
sales might close at the end of the year, predicting what items customers
will purchase together, or forecasting inventory levels based upon a myriad
of variables.

Therefore, use Predictive Analytics any time you need to know something about the future, or to fill in information that you do not have.

3. Prescriptive Analytics: Advise on possible outcomes

The relatively new field of prescriptive analytics allows users to "prescribe" a number of different possible actions and guides them towards a solution. In a nutshell, these analytics are all about providing advice. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on possible outcomes before the decisions are actually made. At their best, prescriptive analytics predicts not only what will happen, but also why it will happen, providing recommendations regarding actions that will take advantage of the predictions.

These analytics go beyond descriptive and predictive analytics by recommending one or more possible courses of action. Essentially they predict multiple futures and allow companies to assess a number of possible outcomes based upon their actions. Prescriptive analytics use a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. These techniques are applied against input from many different data sets including historical and transactional data, real-time data feeds, and big data.

Prescriptive analytics are relatively complex to administer, and most companies are not yet using them in their daily course of business. When implemented correctly, they can have a large impact on how businesses make decisions, and on the company's bottom line. Larger companies are successfully using prescriptive analytics to optimize production, scheduling and inventory in the supply chain to make sure they are delivering the right products at the right time and optimizing the customer experience.

Therefore, use Prescriptive Analytics anytime you need to provide users with advice on what action to take.

Prescriptive Analytics extends beyond predictive analytics by specifying both the actions necessary to achieve predicted outcomes, and the interrelated effects of each decision.

Thus, prescriptive analytics attempts to quantify the effect of future decisions in order to advise on possible outcomes before the decisions are actually made, using a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures.


10.3 USE OF PRESCRIPTIVE ANALYTICS

Most modern BI tools have prescriptive analytics built in and provide users with actionable results that empower them to make better decisions. One of the more interesting applications of prescriptive analytics is in oil and gas management, where prices fluctuate almost by the second based on ever-changing political, environmental, and demand conditions.

For manufacturers, the ability to model prices on a variety of factors allows them to make better decisions about production, storage, and new discoveries. Furthermore, the field is useful for managing equipment and maintenance, as well as making better decisions regarding drilling and exploration locations.

In healthcare business intelligence, prescriptive analytics is applied across the industry, both in patient care and healthcare administration. For practitioners and care providers, prescriptive analytics helps improve clinical care and provide more satisfactory service to patients.

Insurers also use prescriptive analytics in their risk assessment models to provide pricing and premium information for clients. For pharmaceutical companies, prescriptive analytics helps identify the best testing and patient cohorts for clinical trials. This reduces the costs of testing to eventually help expedite drug development and possible approval.


10.4 DIFFERENCE BETWEEN PREDICTIVE AND PRESCRIPTIVE ANALYTICS

Big Data gets a lot of buzz in the business world. It's true that data
analytics can give you deep, useful insights into your business and its
customers, but only if you use those insights to their full potential.

There are three main components to business analytics: descriptive, predictive and prescriptive.

Descriptive analytics — the "simplest class of analytics" — is your raw data in summarized form. It's your social engagement counts, sales numbers, customer statistics and other metrics that show you what's happening in your business in an easy-to-understand way.

Predictive and prescriptive analytics are the next steps that help you turn descriptive metrics into insights and decisions. But you shouldn't rely on just one or the other; when used in conjunction, both types of analytics can help you create the strongest, most effective business strategy possible.

"Predictive by itself is not enough to keep up with the increasingly competitive landscape. Prescriptive analytics provide intelligent recommendations for the optimal next steps for almost any application or business process to drive desired outcomes or accelerate results."

"Predictive analytics forecasts what will happen in the future. Prescriptive analytics can help companies alter the future. They're both necessary to improve decision-making and business outcomes."


Analytics in action

Both types of analytics inform your business strategies based on collected data. But the major difference between predictive and prescriptive is that the former forecasts potential future outcomes, while the latter helps you draw up specific recommendations.

"Prescriptive analytics builds on [predictive] by informing decision makers about different decision choices with their anticipated impact on specific key performance indicators. Think of [traffic navigation app] Waze. Pick an origin and a destination — a multitude of factors get mashed together, and [it advises] you on different route choices, each with a predicted ETA. This is everyday prescriptive analytics at work."

10.5 CONTENT AND WORKING OF PRESCRIPTIVE ANALYTICS

Predictive analytics answers the question of what is likely to happen. This is when historical data is combined with rules, algorithms, and occasionally external data to determine the probable future outcome of an event or the likelihood of a situation occurring. The final phase is prescriptive analytics, which goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the implications of each decision option.

Prescriptive analytics not only anticipates what will happen and when it will
happen, but also why it will happen. Further, prescriptive analytics
suggests decision options on how to take advantage of a future opportunity
or mitigate a future risk and shows the implication of each decision option.
Prescriptive analytics can continually take in new data to re-predict and re-
prescribe, thus automatically improving prediction accuracy and prescribing
better decision options. Prescriptive analytics ingests hybrid data, a
combination of structured (numbers, categories) and unstructured data
(videos, images, sounds, texts), and business rules to predict what lies
ahead and to prescribe how to take advantage of this predicted future
without compromising other priorities.


Prescriptive analytics draws on several scientific disciplines. All three phases of analytics can be performed through professional services or technology or a combination. In order to scale, prescriptive analytics technologies need to be adaptive to take into account the growing volume, velocity, and variety of data that most mission critical processes and their environments may produce.

One criticism of prescriptive analytics is that its distinction from predictive analytics is ill-defined and therefore ill-conceived.


10.6 USAGE OF PRESCRIPTIVE ANALYTICS - EXAMPLES

Prescriptive analytics incorporates both structured and unstructured data, and uses a combination of advanced analytic techniques and disciplines to predict, prescribe, and adapt. While the term prescriptive analytics was first coined by IBM and later trademarked by Ayata, the underlying concepts have been around for hundreds of years. The technology behind prescriptive analytics synergistically combines hybrid data and business rules with mathematical models and computational models. The data inputs to prescriptive analytics may come from multiple sources: internal, such as inside a corporation; and external, also known as environmental data. The data may be structured, which includes numbers and categories, as well as unstructured data, such as texts, images, sounds, and videos. Unstructured data differs from structured data in that its format varies widely and cannot be stored in traditional relational databases without significant effort at data transformation. More than 80% of the world's data today is unstructured, according to IBM.

In addition to this variety of data types and growing data volume, incoming data can also evolve with respect to velocity, that is, more data being generated at a faster or a variable pace. Business rules define the business process and include objectives, constraints, preferences, policies, best practices, and boundaries. Mathematical models and computational models are techniques derived from the mathematical sciences, computer science and related disciplines such as applied statistics, machine learning, operations research, natural language processing, computer vision, pattern recognition, image processing, speech recognition, and signal processing. The correct application of all these methods and the verification of their results implies the need for resources on a massive scale – human, computational and temporal – for every prescriptive analytics project. In order to spare the expense of dozens of people, high-performance machines and weeks of work, one must consider a reduction of resources and therefore a reduction in the accuracy or reliability of the outcome. The preferable route is a reduction that produces a probabilistic result within acceptable limits.


A. Example-1

Application in Oil and Gas

Key questions Prescriptive Analytics software answers for oil and gas producers:

Energy is the largest industry in the world ($6 trillion in size). The
processes and decisions related to oil and natural gas exploration,
development and production generate large amounts of data. Many types
of captured data are used to create models and images of the Earth’s
structure and layers 5,000 - 35,000 feet below the surface and to describe
activities around the wells themselves, such as depositional characteristics,
machinery performance, oil flow rates, reservoir temperatures and
pressures. Prescriptive analytics software can help with both locating and
producing hydrocarbons by taking in seismic data, well log data, production
data, and other related data sets to prescribe specific recipes for how and
where to drill, complete, and produce wells in order to optimize recovery,
minimize cost, and reduce environmental footprint.


• Unconventional Resource Development

Examples of structured and unstructured data sets generated by oil and gas companies and their ecosystem of service providers can be analysed together using Prescriptive Analytics software.

With the value of the end product determined by global commodity economics, the basis of competition for operators in upstream E&P is the ability to effectively deploy capital to locate and extract resources more efficiently, effectively, predictably, and safely than their peers. In unconventional resource plays, operational efficiency and effectiveness are diminished by reservoir inconsistencies, and decision-making is impaired by high degrees of uncertainty. These challenges manifest themselves in the form of low recovery factors and wide performance variations.

Prescriptive Analytics software can accurately predict production and prescribe optimal configurations of controllable drilling, completion, and production variables by modeling numerous internal and external variables simultaneously, regardless of source, structure, size, or format. Prescriptive analytics software can also provide decision options and show the impact of each decision option, so that operations managers can proactively take appropriate actions, on time, to guarantee future exploration and production performance, and maximize the economic value of assets at every point over the course of their serviceable lifetimes.

• Oilfield Equipment Maintenance

In the realm of oilfield equipment maintenance, Prescriptive Analytics can optimize configuration, anticipate and prevent unplanned downtime, optimize field scheduling, and improve maintenance planning. According to General Electric, there are more than 130,000 electric submersible pumps (ESPs) installed globally, accounting for 60% of the world's oil production. Prescriptive Analytics has been deployed to predict when and why an ESP will fail, and to recommend the necessary actions to prevent the failure.

In the area of Health, Safety, and Environment, prescriptive analytics can predict and pre-empt incidents that can lead to reputational and financial loss for oil and gas companies.

• Pricing

Pricing is another area of focus. Natural gas prices fluctuate dramatically depending upon supply, demand, econometrics, geopolitics, and weather conditions. Gas producers, pipeline transmission companies and utility firms have a keen interest in more accurately predicting gas prices so that they can lock in favourable terms while hedging downside risk. Prescriptive analytics software can accurately predict prices by modeling internal and external variables simultaneously and also provide decision options and show the impact of each decision option.


B. Example-2

Application in Healthcare:

Multiple factors are driving healthcare providers to dramatically improve business processes and operations as the United States healthcare industry embarks on the necessary migration from a largely fee-for-service, volume-based system to a fee-for-performance, value-based system. Prescriptive analytics is playing a key role in helping improve performance in a number of areas involving various stakeholders: payers, providers and pharmaceutical companies.

Prescriptive analytics can help providers improve the effectiveness of their clinical care delivery to the population they manage and, in the process, achieve better patient satisfaction and retention. Providers can do better population health management by identifying appropriate intervention models for a risk-stratified population, combining data from in-facility care episodes and home-based telehealth.

Prescriptive analytics can also benefit healthcare providers in their capacity planning, by using analytics to leverage operational and usage data combined with data on external factors such as economic data, population demographic trends and population health trends. This allows them to more accurately plan for future capital investments such as new facilities and equipment utilization, as well as understand the trade-offs between adding additional beds and expanding an existing facility versus building a new one.[20]

Prescriptive analytics can help pharmaceutical companies to expedite their drug development by identifying patient cohorts that are most suitable for clinical trials worldwide – patients who are expected to be compliant and will not drop out of the trial due to complications. Analytics can tell companies how much time and money they can save if they choose one patient cohort in a specific country vs. another.

In provider-payer negotiation, providers can improve their negotiating position with health insurers by developing a robust understanding of future service utilisation. By accurately predicting utilisation, providers can also better allocate personnel.


10.7 A PRACTICAL INTRODUCTION TO PRESCRIPTIVE ANALYTICS (WITH CASE STUDY IN R)

“What are the different branches of analytics?” Most of us, when we’re
starting out on our analytics journey, are taught that there are two types –
descriptive analytics and predictive analytics. There’s actually a third
branch which is often overlooked – prescriptive analytics.

Prescriptive analytics is the most powerful branch among the three. Let us
understand with an example.

Recently, a deadly cyclone hit Odisha, India, but thankfully most people
had already been evacuated. The Odisha meteorological department had
already predicted the arrival of the monstrous cyclone and made the life-
saving decision to evacuate the potentially prone regions.

Contrast that with 1999, when more than 10,000 people died because of a
similar cyclone. They were caught unaware since there was no prediction
about the coming storm. So what changed?

The government of Odisha was a beneficiary of prescriptive analytics. They were able to utilize the services of the meteorological department's accurate prediction of cyclones – their path, strength, and timing. They used this to make decisions about when and what needed to be done to prevent any loss of life.


• Setting up our Problem Statement

The process starts with setting up the problem statement:

The senior management in a telecom provider organization is worried about rising customer attrition levels. Additionally, a recent independent survey has suggested that the industry as a whole will face increasing churn rates and decreasing ARPU (average revenue per unit).

The effort to retain customers so far has been very reactive: action is taken only when the customer calls to close their account. The management team is keen to take more proactive measures on this front.

Data scientists are tasked with analyzing their data, deriving insights,
predicting the potential behaviour of customers, and then recommending
steps to improve performance.

• Hypothesis Generation

Generating a hypothesis is the key to unlocking any data science or analytics project. We should first list down what it is we are trying to achieve through our approach and then proceed from there.


Customer churn is being driven by the below factors (according to the independent industry survey):
• Cost and billing
• Network and service quality
• Data usage connectivity issues

You then have to test the same hypotheses for the telecom provider. Typically, you should encourage the company to come up with an exhaustive set of hypotheses so as not to leave out any variables or major points.

• Laying Down our Model Building Approach

Now that you have the data set, the problem statement and the hypotheses to test, it's time to see what insights can be drawn. The approach is to go through the standard model building steps. Note that here we remove variables with more than 30% missing values, though you can take your own call on this threshold.


The next step is to find the variables with more than 30% missing values and remove them from the dataset before summarizing what remains.
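A minimal sketch of this step in R, assuming the data sits in a data frame named telecom (a hypothetical name), could look like this:

    # Sketch: drop variables with more than 30% missing values.
    # 'telecom' is a hypothetical data frame holding the churn data set.
    missing_share <- colMeans(is.na(telecom))

    # Variables breaching the 30% threshold
    names(telecom)[missing_share > 0.3]

    # Keep only the variables below the threshold, then summarize the rest
    telecom <- telecom[, missing_share <= 0.3]
    summary(telecom)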

• Data Visualization and Data Preparation — Descriptive Analytics

Let's do a univariate, bivariate and multivariate analysis of various independent variables along with the target variable. This should give you an idea of what affects churn. Let's start by drawing up three plots.

First, we will analyze the mean minutes of usage, revenue range, mean total monthly recurring charge and the mean number of dropped or blocked calls against the target variable – churn:

Similarly, we shall analyze the mean number of dropped (failed) voice calls,
the total number of calls over the life of the customer, the range of the
number of outbound wireless to wireless voice calls and the mean number
of call waiting against the churn variable:

Changing things up a bit, you can use the faceting functionality in the awesome "ggplot2" package to plot the months of usage, credit class code, call drops and the number of days of current equipment against the churn variable:
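A minimal sketch of such a faceted plot, assuming illustrative column names (months, crclscod for the credit class code, and churn) in the telecom data frame:

    # Faceted histogram of months of usage by credit class, split by churn.
    library(ggplot2)

    ggplot(telecom, aes(x = months, fill = factor(churn))) +
      geom_histogram(bins = 30, position = "dodge") +
      facet_wrap(~ crclscod) +
      labs(fill = "Churn")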

You will then analyze the numeric variables separately to see if there are any features that have high degrees of collinearity. The presence of collinear variables tends to reduce the model's performance, since they introduce bias into the model.

You can then handle the collinearity problem. There are many ways of dealing with it, such as variable transformation and dimensionality reduction using principal component analysis (PCA). Here, let us simply remove the highly correlated variables:
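One common way to do this, sketched below, uses the findCorrelation() helper from the caret package; the 0.8 cutoff is a judgment call, and telecom remains our assumed data frame:

    # Drop numeric variables whose pairwise correlation exceeds the cutoff.
    library(caret)

    numeric_vars <- telecom[, sapply(telecom, is.numeric)]
    cor_matrix   <- cor(numeric_vars, use = "pairwise.complete.obs")

    high_cor <- findCorrelation(cor_matrix, cutoff = 0.8)  # column indices
    telecom  <- telecom[, !(names(telecom) %in% colnames(numeric_vars)[high_cor])]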


• Prediction of Customer Behaviour — Predictive Analytics

This is the part most of you may be familiar with – building models on
the training data. You can build a number of models so that you can
compare their performance across the spectrum.

It is generally a good practice to train multiple models, ranging from simple linear models to complex non-parametric and non-linear ones. The performance of models varies depending on how the dependent and independent variables are related. If the relationship is linear, the simpler models give good results (plus they're easier to interpret).

Alternatively, if the relationship is non-linear, complex models generally give better results. As the complexity of the model increases, the bias introduced by the model reduces and the variance increases. The models that can be built are (a minimal training sketch follows this list):

• Simple models like Logistic Regression and Discriminant Analysis with different thresholds for classification
• Random Forest, after balancing the dataset using the Synthetic Minority Oversampling Technique (SMOTE)
• An ensemble of five individual models, predicting the output by averaging the individual output probabilities
• The XGBoost algorithm
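The sketch below trains two of the listed models (logistic regression and a random forest) on a simple train/test split; telecom and churn are assumed names, and tuning and SMOTE balancing are omitted for brevity:

    # Train/test split, then fit two of the candidate models.
    library(randomForest)

    set.seed(123)
    train_idx <- sample(nrow(telecom), 0.7 * nrow(telecom))
    train <- telecom[train_idx, ]
    test  <- telecom[-train_idx, ]

    # Logistic regression with a 0.5 classification threshold
    log_fit  <- glm(churn ~ ., data = train, family = binomial)
    log_pred <- as.integer(predict(log_fit, test, type = "response") > 0.5)

    # Random forest on the same split
    rf_fit  <- randomForest(factor(churn) ~ ., data = train, ntree = 200)
    rf_pred <- predict(rf_fit, test)

    # Compare simple accuracy on the held-out data
    mean(log_pred == test$churn)
    mean(rf_pred == factor(test$churn))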

• Recommendations to improve performance — Prescriptive Analytics

And now comes the part we've been waiting for – prescriptive analytics! Let's see what recommendations we can come up with to improve performance.

Variables with an odds ratio greater than 1 have a more than 50% probability of changing the decision of the customer for every 1 unit change in the respective independent variable. This insight was generated from the logistic regression model, which is essentially a relationship between the log of the odds of the dependent variable and the independent variables.


So, if you calculate the exponential of the model's coefficients, you will get the odds, and from the odds you can get the probability (using the formula Probability = Odds/(1 + Odds)) of customer behaviour changing for a one unit change in the independent variable.
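In R, that conversion is a one-liner on the fitted model (log_fit is the assumed glm object from the modeling step):

    # Odds ratios and probabilities from logistic-regression coefficients.
    odds_ratios   <- exp(coef(log_fit))               # odds per 1-unit increase
    probabilities <- odds_ratios / (1 + odds_ratios)  # Probability = Odds/(1+Odds)

    round(cbind(odds_ratios, probabilities), 3)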

The summary statistics from the logistic model show that:

• Variables impacting cost and billing are highly significant
• Adjmou has one of the top 5 odds ratios
• The mean total monthly recurring charge (totmrc_Mean), Revenue (charge amount), Range (rev_Range), adjmou (billing adjustments), etc. are found to be highly significant. This suggests that cost and billing impact customer behaviour
• Similarly, network and service quality variables like drop_bkl_Mean (mean number of dropped and blocked calls) are highly significant. Datovr_Range (range of revenue of data overage) is not found to be significant but has an odds ratio of more than 1, indicating that a 1 unit change in its value has more than a 50% chance of changing the customer behaviour from one level to the other. Perhaps we need to pay attention to it
• Additionally, the intercept is significant. This constitutes the effects of levels of categorical variables that were removed by the model


10.8 SUMMARY:

Prescriptive analytics is the third and final phase of business analytics, which also includes descriptive and predictive analytics.

Referred to as the "final frontier of analytic capabilities," prescriptive analytics entails the application of mathematical and computational sciences and suggests decision options to take advantage of the results of descriptive and predictive analytics. The first stage of business analytics is descriptive analytics, which still accounts for the majority of all business analytics today. Descriptive analytics looks at past performance and understands that performance by mining historical data to look for the reasons behind past success or failure. Most management reporting – such as sales, marketing, operations, and finance – uses this type of post-mortem analysis.

The relatively new field of prescriptive analytics allows users to "prescribe" a number of different possible actions and guides them towards a solution. In a nutshell, these analytics are all about providing advice. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on possible outcomes before the decisions are actually made. At their best, prescriptive analytics predicts not only what will happen, but also why it will happen, providing recommendations regarding actions that will take advantage of the predictions.

These analytics go beyond descriptive and predictive analytics by recommending one or more possible courses of action. Essentially they predict multiple futures and allow companies to assess a number of possible outcomes based upon their actions. Prescriptive analytics use a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. These techniques are applied against input from many different data sets including historical and transactional data, real-time data feeds, and big data.

Prescriptive analytics are relatively complex to administer, and most companies are not yet using them in their daily course of business. When implemented correctly, they can have a large impact on how businesses make decisions, and on the company's bottom line. Larger companies are successfully using prescriptive analytics to optimize production, scheduling and inventory in the supply chain to make sure they are delivering the right products at the right time and optimizing the customer experience.

Most modern BI tools have prescriptive analytics built in and provide users with actionable results that empower them to make better decisions. One of the more interesting applications of prescriptive analytics is in oil and gas management, where prices fluctuate almost by the second based on ever-changing political, environmental, and demand conditions.

For manufacturers, the ability to model prices on a variety of factors allows them to make better decisions about production, storage, and new discoveries. Furthermore, the field is useful for managing equipment and maintenance, as well as making better decisions regarding drilling and exploration locations.

In healthcare business intelligence, prescriptive analytics is applied across the industry, both in patient care and healthcare administration. For practitioners and care providers, prescriptive analytics helps improve clinical care and provide more satisfactory service to patients.

Insurers also use prescriptive analytics in their risk assessment models to provide pricing and premium information for clients. For pharmaceutical companies, prescriptive analytics helps identify the best testing and patient cohorts for clinical trials. This reduces the costs of testing to eventually help expedite drug development and possible approval.

Prescriptive analytics incorporates both structured and unstructured data, and uses a combination of advanced analytic techniques and disciplines to predict, prescribe, and adapt. While the term prescriptive analytics was first coined by IBM and later trademarked by Ayata, the underlying concepts have been around for hundreds of years. The technology behind prescriptive analytics synergistically combines hybrid data and business rules with mathematical models and computational models. The data inputs to prescriptive analytics may come from multiple sources: internal, such as inside a corporation; and external, also known as environmental data. The data may be structured, which includes numbers and categories, as well as unstructured data, such as texts, images, sounds, and videos. Unstructured data differs from structured data in that its format varies widely and cannot be stored in traditional relational databases without significant effort at data transformation. More than 80% of the world's data today is unstructured, according to IBM.

What the self-driving car will deliver is a fundamental change in the car driving experience. Likewise, the impact of prescriptive analytics, AI, and ML in the workplace will change the work experience and redefine jobs and roles. In organizations and business, we will see the growing presence of augmented decision making through more informed, prescriptive analytics that helps and guides decision-makers to examine and determine the best course of action. Focusing prescriptive analytics, AI, and ML on use cases that add value to people's capabilities and performance, as well as process value, is essential to successful organizational adoption.

The technological revolution that is underway will fundamentally change the way we live and work. AI and ML are already becoming more relevant in our day-to-day lives (beyond the driving assistance in cars). Alexa, Cortana, and Siri, as AI Assistants, are now commonly used and referred to. One important aspect of AI Assistant adoption has been the User Experience (UX) developed to make them natural and normal to use.

10.9 SELF ASSESSMENT QUESTIONS:

1. Define Prescriptive Analytics.

2. Compare descriptive, predictive and prescriptive analytics so that a business can form a holistic view.

3. How is prescriptive analytics used? Explain.

4. Explain the working of Prescriptive Analytics.

5. Write a short note on the use/application of prescriptive analytics in the Oil and Gas sector.


10.10 MULTIPLE CHOICE QUESTIONS:

1. What type of analytics is used when you need to understand, at an aggregate level, what is going on in your company, and when you want to summarize and describe different aspects of your business?
a. Prescriptive Analytics
b. Descriptive Analytics
c. Predictive Analytics
d. Diagnostic Analytics

2. In the financial sector, the type of analytics that can be used throughout the organization, from forecasting customer behaviour and purchasing patterns to identifying trends in sales activities, is --------------, which also helps forecast demand for inputs from the supply chain, operations and inventory.
a. Prescriptive Analytics
b. Descriptive Analytics
c. Predictive Analytics
d. Diagnostic Analytics

3. Which analytics uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
a. Prescriptive Analytics
b. Descriptive Analytics
c. Predictive Analytics
d. Diagnostic Analytics

4. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on possible outcomes ---------------------------. Prescriptive analytics uses a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures.
a. before the decisions are actually made
b. after the decisions are actually made
c. which may not come out unless modelling process is completed
d. at any time during the modelling process


5. In the area of --------------------------- prescriptive analytics can predict and pre-empt incidents that can lead to reputational and financial loss for oil and gas companies.
a. Health
b. Environment
c. Safety
d. Health, Safety, and Environment

Answers: 1.(b), 2.(c), 3.(a), 4.(a), 5.(d)


Chapter 11
Business Analytics Process

Objectives:

On completion of this chapter, you will be able to understand the business analytics process, taking into consideration the following points:

Structure:

11.1 Introduction

11.2 Business Analytics Process

11.3 Relationship of BA Process and Organization Decision-Making Process

11.4 The 8-important steps of Business Analytics process

11.5 Summary

11.6 Self Assessment Questions

11.7 Multiple Choice Questions


11.1 INTRODUCTION:

Business analytics begins with a data set (a simple collection of data or a
data file) or commonly with a database (a collection of data files that
contain information on people, locations, and so on). As databases grow,
they need to be stored somewhere. Technologies such as computer
clouds (hardware and software used for data remote storage, retrieval, and
computational functions) and data warehousing (a collection of databases
used for reporting and data analysis) store data. Database storage areas
have become so large that a new term was devised to describe them. Big
data describes the collection of data sets that are so large and complex
that software systems are hardly able to process them. Little data
describes the smaller data segments or files that help individual businesses
keep track of customers. As a means of sorting through data to find useful
information, the application of analytics has found new purpose.

Three terms in business literature are often related to one another:
analytics, business analytics, and business intelligence.

Analytics can be defined as a process that involves the use of statistical
techniques (measures of central tendency, graphs, and so on), information
system software (data mining, sorting routines), and operations research
methodologies (linear programming) to explore, visualize, discover and
communicate patterns or trends in data. Simply, analytics convert data into
useful information. Analytics is an older term commonly applied to all
disciplines, not just business. A typical example of the use of analytics is
the weather measurements collected and converted into statistics, which in
turn predict weather patterns.

There are many types of analytics, and there is a need to organize these
types to understand their uses. These types of analytics can be viewed
independently. For example, some firms may only use descriptive analytics
to provide information on decisions they face. Others may use a
combination of analytic types to glean insightful information needed to plan
and make decisions. This is summarised as under:


Type of Analytics and Definition:

Descriptive: The application of simple statistical techniques that describes what is contained in a data set or database. Example: An age bar chart is used to depict retail shoppers for a department store that wants to target advertising to customers by age.

Predictive: An application of advanced statistical, information software, or operations research methods to identify predictive variables and build predictive models to identify trends and relationships not readily observed in a descriptive analysis. Example: Multiple regression is used to show the relationship (or lack of relationship) between age, weight, and exercise on diet food sales. Knowing that relationships exist helps explain why one set of independent variables influences dependent variables such as business performance.

Prescriptive: An application of decision science, management science, and operations research methodologies (applied mathematical techniques) to make best use of allocable resources. Example: A department store has a limited advertising budget to target customers. Linear programming models can be used to optimally allocate the budget to various advertising media.

The purposes and methodologies used for each of the three types of
analytics differ, as can be seen in Table below. It is these differences that
distinguish analytics from business analytics. Whereas analytics is focused
on generating insightful information from data sources, business analytics
goes the extra step to leverage analytics to create an improvement in
measurable business performance. Whereas the process of analytics can
involve any one of the three types of analytics, the major components of
business analytics include all three used in combination to generate new,
unique, and valuable information that can aid business organization
decision-making. In addition, the three types of analytics are applied
sequentially (descriptive, then predictive, then prescriptive).
Therefore, business analytics (BA) can be defined as a process beginning
with business-related data collection and consisting of sequential
application of descriptive, predictive, and prescriptive major analytic
components, the outcome of which supports and demonstrates business
decision-making and organizational performance.


Type of Analytics, Purpose and Examples of Methodologies:

Descriptive: To identify possible trends in large data sets or databases. The purpose is to get a rough picture of what generally the data looks like and what criteria might have potential for identifying trends or future business behavior. Methodologies: Descriptive statistics, including measures of central tendency (mean, median, mode), measures of dispersion (standard deviation), charts, graphs, sorting methods, frequency distributions, probability distributions, and sampling methods.

Predictive: To build predictive models designed to identify and predict future trends. Methodologies: Statistical methods like multiple regression and ANOVA; information system methods like data mining and sorting; operations research methods like forecasting models.

Prescriptive: To allocate resources optimally to take advantage of predicted trends or future opportunities. Methodologies: Operations research methodologies like linear programming and decision theory.

Business intelligence (BI) can be defined as a set of processes and
technologies that convert data into meaningful and useful information for
business purposes. While some believe that BI is a broad subject that
encompasses analytics, business analytics, and information systems, some
believe it is mainly focused on collecting, storing, and exploring large
database organizations for information useful to decision-making and
planning. One function that is generally accepted as a major component of
BI involves storing an organization’s data in computer cloud storage or in
data warehouses. Data warehousing is not an analytics or business
analytics function, although the data can be used for analysis. In
application, BI is focused on querying and reporting, but it can include
reported information from a BA analysis. BI seeks to answer questions
such as what is happening now and where, and also what business actions
are needed based on prior experience. BA, on the other hand, can answer
questions like why something is happening, what new trends may exist,
what will happen next, and what is the best course for the future.

Thus, BA includes the same procedures as in plain analytics but has the
additional requirement that the outcome of the analytic analysis must
make a measurable impact on business performance. BA includes reporting
results like BI but seeks to explain why the results occur based on the
analysis rather than just reporting and storing the results, as is the case
with BI. The table below summarises the characteristics of analytics, business
analytics, and business intelligence so that they can be compared with one
another.

Characteristics of Analytics, Business Analytics (BA) and Business Intelligence (BI):

Business performance and planning role:
  Analytics: What is happening, and what will be happening?
  BA: What is happening now, what will be happening, and what is the best strategy to deal with it?
  BI: What is happening now, and what have we done in the past to deal with it?

Use of descriptive analytics as a major component of analysis:
  Analytics: Yes; BA: Yes; BI: Yes

Use of predictive analytics as a major component of analysis:
  Analytics: Yes; BA: Yes; BI: No (only historically)

Use of prescriptive analytics as a major component of analysis:
  Analytics: Yes; BA: Yes; BI: No (only historically)

Use of all three in combination:
  Analytics: No; BA: Yes; BI: No

Business focus:
  Analytics: Maybe; BA: Yes; BI: Yes

Focus on storing and maintaining data:
  Analytics: No; BA: No; BI: Yes

Required focus on improving business value and performance:
  Analytics: No; BA: Yes; BI: No


11.2. BUSINESS ANALYTICS PROCESS

The complete business analytic process involves the three major
component steps applied sequentially to a source of data. The outcome of
the business analytic process must relate to business and seek to improve
business performance in some way.


Business analytic process

The logic of the BA process as indicated in above Figure is initially based on
a question: What valuable or problem-solving information is locked up in
the sources of data that an organization has available? At each of the three
steps that make up the BA process, additional questions need to be
answered, as shown in above Figure. Answering all these questions
requires mining the information out of the data via the three steps of
analysis that comprise the BA process. The analogy of digging in a mine is
appropriate for the BA process because finding new, unique, and valuable
information that can lead to a successful strategy is just as good as finding
gold in a mine. Many firms routinely undertake BA to solve specific
problems, while other firms undertake BA to explore and discover new
knowledge to guide organizational planning and decision-making to
improve business performance.

The size of some data sources can be unmanageable, overly complex, and
generally confusing. Sorting out data and trying to make sense of its
informational value requires the application of descriptive analytics as a
first step in the BA process. One might begin simply by sorting the data
into groups using the four possible classifications presented in below Table.
Also, incorporating some of the data into spreadsheets like Excel and
preparing cross tabulations and contingency tables are means of restricting
the data into a more manageable data structure. Simple measures of
central tendency and dispersion might be computed to try to capture
possible opportunities for business improvement. Other descriptive analytic
summarization methods, including charting, plotting, and graphing, can
help decision makers visualize the data to better understand content
opportunities. The types of Data Measurement Classification Scales are as
under:


Type of Data Measurement Scale and Description:

Categorical Data: Data that is grouped by one or more characteristics. Categorical data usually involves cardinal numbers counted or expressed as percentages. Example 1: Product markets that can be characterized by categories of “high-end” products or “low-income” products, based on dollar sales. It is common to use this term to apply to data sets that contain items identified by categories as well as observations summarized in cross-tabulations or contingency tables.

Ordinal Data: Data that is ranked or ordered to show relational preference. Example 1: Football team rankings not based on points scored but on wins. Example 2: Ranking of business firms based on product quality.

Interval Data: Data that is arranged along a scale where each value is equally distant from others. It is ordinal data. Example 1: A temperature gauge. Example 2: A survey instrument using a Likert scale (that is, 1, 2, 3, 4, 5, 6, 7), where 1 to 2 is perceived as equidistant to the interval from 2 to 3, and so on. Note: In ordinal data, the ranking of firms might vary greatly from first place to second, but in interval data, they would have to be relationally proportional.

Ratio Data: Data expressed as a ratio on a continuous scale. Example 1: The ratio of firms with green manufacturing programs is twice that of firms without such a program.
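
As a concrete illustration of the descriptive summarisation discussed above, the short Python sketch below uses the pandas library; the column names and figures are invented purely for the example, not taken from the text.

```python
import pandas as pd

# Invented retail records; in practice these come from a file or database.
sales = pd.DataFrame({
    "age_group": ["18-25", "26-40", "41-60", "18-25", "26-40", "41-60"],
    "segment":   ["high-end", "low-income", "high-end",
                  "low-income", "high-end", "low-income"],
    "spend":     [120.0, 45.0, 310.0, 60.0, 250.0, 80.0],
})

# Measures of central tendency and dispersion for the spend column.
print(sales["spend"].describe())

# Cross-tabulation: shoppers per age group / segment cell.
print(pd.crosstab(sales["age_group"], sales["segment"]))
```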

From Step 1 in the Descriptive Analytic analysis, some patterns or
variables of business behaviour should be identified representing targets of
business opportunities and possible future trend behaviour. Additional
effort (more mining) might be required, such as the generation of detailed
statistical reports narrowly focused on the data related to targets of
business opportunities to explain what is taking place in the data (what
happened in the past). This is like a statistical search for predictive
variables in data that may lead to patterns of behaviour a firm might take
advantage of if the patterns of behaviour occur in the future. For example,
a firm might find in its general sales information that during economic
downtimes, certain products are sold to customers of a particular income
level if certain advertising is undertaken. The sales, customers, and
advertising variables may be in the form of any of the measurable scales
for data, but they have to meet the three conditions of BA previously
mentioned: clear relevancy to business, an implementable resulting
insight, and performance and value measurement capabilities.


To determine whether observed trends and behaviour found in the
relationships of the descriptive analysis of Step 1 actually exist or hold true
and can be used to forecast or predict the future, more advanced analysis
is undertaken in Step 2, Predictive Analytic analysis, of the BA process.
There are many methods that can be used in this step of the BA process. A
commonly used methodology is multiple regression. This methodology is
ideal for establishing whether a statistical relationship exists between the
predictive variables found in the descriptive analysis. The relationship
might be to show that a dependent variable is predictively associated with
business value or performance of some kind. For example, a firm might
want to determine which of several promotion efforts (independent
variables measured and represented in the model by dollars in TV ads,
radio ads, personal selling, and/or magazine ads) is most efficient in
generating customer sale dollars (the dependent variable and a measure of
business performance). Care would have to be taken to ensure the multiple
regression model was used in a valid and reliable way which is why other
statistical confirmatory analyses are used to support the model
development. Exploring a database using advanced statistical procedures
to verify and confirm the best predictive variables is an important part of
this step in the BA process. This answers the questions of what is currently
happening and why it happened between the variables in the model.

A single or multiple regression model can often forecast a trend line into
the future. When regression is not practical, other forecasting methods
(exponential smoothing, smoothing averages) can be applied as predictive
analytics to develop needed forecasts of business trends. The identification
of future trends is the main output of Step 2 and the predictive analytics
used to find them. This helps answer the question of what will happen.
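
As a hedged sketch of this predictive step, the example below fits a multiple regression with Python's scikit-learn; the promotion-spend figures and sales numbers are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented promotion spend (TV, radio, personal selling, in $000s) and the
# customer sales each mix generated; real data would come from the firm.
X = np.array([[230, 38, 69], [44, 39, 45], [17, 46, 69],
              [151, 41, 58], [180, 11, 58], [8, 49, 75]])
y = np.array([22100, 10400, 9300, 18500, 12900, 7200])

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)   # contribution of each promotion channel
print("intercept:", model.intercept_)

# Forecast sales for a proposed promotion mix ("what will happen?").
print("forecast:", model.predict([[100, 30, 60]]))
```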

If a firm knows where the future lies by forecasting trends as they would in
Step 2 of the BA process, it can then take advantage of any possible
opportunities predicted in that future state. In Step 3, Prescriptive
Analytics analysis, operations research methodologies can be used to
optimally allocate a firm’s limited resources to take best advantage of the
opportunities it found in the predicted future trends. Limits on human,
technology, and financial resources prevent any firm from going after all
opportunities they may have available at any one time. Using prescriptive
analytics allows the firm to allocate limited resources to optimally achieve
objectives as fully as possible. For example, linear programming (a
constrained optimization methodology) has been used to maximize the
profit in the design of supply chains. This third step in the BA process
answers the question of how best to allocate and manage decision-making
in the future.
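
A minimal sketch of the prescriptive step, using linear programming via scipy: allocate a limited advertising budget between two media to maximise expected sales. The payoff rates and budget limits below are assumptions made up for the example.

```python
from scipy.optimize import linprog

# Maximise expected sales 5*tv + 4*radio (linprog minimises, so negate),
# subject to tv + radio <= 100 (budget, $000s), tv <= 70, radio <= 60.
result = linprog(c=[-5, -4],
                 A_ub=[[1, 1]], b_ub=[100],
                 bounds=[(0, 70), (0, 60)])
tv, radio = result.x
print(f"spend {tv:.0f} on TV and {radio:.0f} on radio; "
      f"expected sales {-result.fun:.0f}")
```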

In summary, the three major components of descriptive, predictive, and
prescriptive analytics arranged as steps in the BA process can help a firm
find opportunities in data, predict trends that forecast future opportunities,
and aid in selecting a course of action that optimizes the firm’s allocation of
resources to maximize value and performance.

11.3. RELATIONSHIP OF BA PROCESS AND ORGANIZATION DECISION-MAKING PROCESS

The BA process can solve problems and identify opportunities to improve
business performance. In the process, organizations may also determine
strategies to guide operations and help achieve competitive advantages.
Typically, solving problems and identifying strategic opportunities to follow
are organization decision-making tasks. The latter, identifying
opportunities, can be viewed as a problem of strategy choice requiring a
solution. It should come as no surprise that the BA process described above
closely parallels classic organization decision-making processes. As
depicted in below Figure, the business analytic process has an inherent
relationship to the steps in typical organization decision-making processes.


Comparison of business analytics and organization decision-making processes

The organization decision-making process (ODMP) presented in above
Figure is focused on decision making to solve problems but could also be
applied to finding opportunities in data and deciding what is the best
course of action to take advantage of them. The five-step ODMP begins
with the perception of disequilibrium, or the awareness that a problem
exists that needs a decision. Similarly, in the BA process, the first step is to
recognize that databases may contain information that could both solve
problems and find opportunities to improve business performance. Then in
Step 2 of the ODMP, an exploration of the problem to determine its size,
impact, and other factors is undertaken to diagnose what the problem is.
Likewise, the BA descriptive analytic analysis explores factors that might
prove useful in solving problems and offering opportunities. The ODMP
problem statement step is similarly structured to the BA predictive analysis
to find strategies, paths, or trends that clearly define a problem or
opportunity for an organization to solve problems. Finally, the ODMP’s last
steps of strategy selection and implementation involve the same kinds of
tasks that the BA process requires in the final prescriptive step (make an
optimal selection of resource allocations that can be implemented for the
betterment of the organization).

The decision-making foundation that has served ODMP for many decades
parallels the BA process. The same logic serves both processes and
supports organization decision-making skills and capacities.

To summarise, in the above discussion you must have understood
important terminology and defined business analytics in terms of a unique
process useful in securing information on which decisions can be made and
business opportunities seized. Data classification measurement scales were
also briefly introduced to aid in understanding the types of measures that
can be employed in BA. The relationship of the BA process and the
organization decision-making process was explained in terms of how they
complement each other.

11.4 THE 8-IMPORTANT STEPS OF BUSINESS ANALYTICS PROCESS

Following are the 8 business analysis process steps that you can apply
whether you are in an agile environment or a traditional one, whether you
are purchasing off-the-shelf software or building custom code, and whether
you are responsible for a multi-million dollar project or a one-week project.

Depending on the size and complexity of your project, you can go through
these steps quickly or slowly, but to get to a successful outcome you must
go through them.

First, take a look at this process flow below which shows how the 8 steps
fit together and how you might iterate through them on a typical business
analyst project.


Now let’s look at each of the 8 steps in more detail.

Step 1 – Get Oriented

Often as business analysts, we are expected to dive into a project and start
contributing as quickly as possible to make a positive impact. Sometimes
the project is already underway. Other times there are vague notions about
what the project is or why it exists. We face a lot of ambiguity as business
analysts and it’s our job to clarify the scope, requirements, and business
objectives as quickly as possible.

But that doesn’t mean that it makes sense to get ourselves knee-deep into
the detailed requirements right away. Doing so very likely means a quick
start in the wrong direction.

Taking some time, whether that’s a few hours, few days, or at the very
most a few weeks, to get oriented will ensure you are not only moving
quickly but also able to be an effective and confident contributor on
the project.

Your key responsibilities in this step include:

• Clarifying your role as the business analyst so that you are sure to create
deliverables that meet stakeholder needs.
• Determining the primary stakeholders to engage in defining the project’s
business objectives and scope, as well as any subject matter experts, to
be consulted early in the project.
• Understanding the project history so that you don’t inadvertently repeat
work that’s already been done or rehash previously made decisions.
• Understanding the existing systems and business processes so you have
a reasonably clear picture of the current state that needs to change.

This is where you learn how to learn what you don’t know, so to speak.
This step gets you the information you need to be successful and effective
in the context of this particular project.

Step 2 – Discover the Primary Business Objectives

It’s very common for business analysts and project managers to jump right
in to defining the scope of the project. However, this can lead to
unnecessary headaches. Uncovering and getting agreement on the
business needs early in a project and before scope is defined is the
quickest path forward to a successful project.

Your key responsibilities in this step include:


• Discovering expectations from your primary stakeholders – essentially
discovering the “why” behind the project.
• Reconciling conflicting expectations so that the business community
begins the project with a shared understanding of the business objectives,
one that is not unique to any single person’s perspective.
• Ensuring the business objectives are clear and actionable to provide the
project team with momentum and context while defining scope and, later
on, the detailed requirements.


Discovering the primary business objectives sets the stage for defining
scope, ensuring that you don’t end up with a solution that solves the wrong
problem or, even worse, with a solution that no one can even determine is
successful or not.

Step 3 – Define Scope

A clear and complete statement of scope provides your project team the
go-forward concept to realize the business needs. Scope makes the
business needs tangible in such a way that multiple project team
participants can envision their contribution to the project and the
implementation.

Your key responsibilities in this step include:


• Defining a solution approach to determine the nature and extent of
technology and business process changes to be made as part of
implementing the solution to the primary business objectives.
• Drafting a scope statement and reviewing it with your key business and
technology stakeholders until they are prepared to sign-off or buy-in to
the document.
• Confirming the business case to ensure that it still makes sense for your
organization to invest in the project.

Scope is not an implementation plan, but it is a touchstone guiding all of
the subsequent steps of the business analysis process and tasks by other
project participants.


Step 4 – Formulate Your Business Analysis Plan

Your business analysis plan will bring clarity to the business analysis
process that will be used to successfully define the detailed requirements
for this project. Your business analysis plan is going to answer many
questions for you and your project team.

Your key responsibilities in this step include:


• Choosing the most appropriate types of business analysis deliverables,
given the project scope, project methodology, and other key aspects of
the project context.
• Defining the specific list of business analysis deliverables that will
completely cover the scope of the project and identifying the
stakeholders who will be part of the creation and validation of each
deliverable.
• Identifying the timelines for completing the business analysis
deliverables.

In the absence of defining a credible and realistic plan, a set of
expectations may be defined for you, and often those expectations are
unrealistic as they do not fully appreciate everything that goes into
defining detailed requirements.

Step 5 – Define the Detailed Requirements

Detailed requirements provide your implementation team with the
information they need to implement the solution. They make scope
implementable. Without clear, concise, and actionable detailed
requirements, implementation teams often flounder and fail to
connect the dots in such a way that delivers on the original business case
for the project.


Your key responsibilities in this step include:


• Eliciting the information necessary to understand what the business
community wants from a specific feature or process change.
• Analyzing the information you’ve discovered and using it to create a first
draft of one or more business analysis deliverables containing the
detailed requirements for the project.
• Reviewing and validating each deliverable with appropriate business and
technology stakeholders and asking questions to fill in any gaps.

Effective business analysts consciously sequence their deliverables to be as
effective as possible in driving the momentum of the project forward.
Paying attention to the project’s critical path, reducing ambiguity and
complexity, and generating quick wins are all factors to consider when
sequencing your deliverables.

Step 6 – Support the Technical Implementation

On a typical project employing a business analyst, a significant part of the
solution involves a technical implementation team building, customizing,
and/or deploying software. During the technical implementation, there are
many worthwhile support tasks for you to engage in that will help drive the
success of the project and ensure the business objectives are met.

Your key responsibilities in this step include:


• Reviewing the solution design to ensure it fulfils all of the requirements
and looking for opportunities to meet additional business needs without
increasing the technical scope of the project.
• Updating and/or repackaging requirements documentation to make it
useful for the technology design and implementation process.
• Engaging with quality assurance professionals to ensure they understand
the business context for the technical requirements. This responsibility
may include reviewing test plans and/or test cases to ensure they
represent a clear understanding of the functional requirements.
• Making yourself available to answer questions and help resolve any
issues that surface during the technical design, technical implementation,
or testing phases of the project.

• Managing requirements changes to ensure that everyone is working from
up-to-date documentation and that appropriate stakeholders are involved
in all decisions about change.
• When appropriate, leading user acceptance testing efforts completed by
the business community to ensure that the software implementation
meets the needs of business end users.

All of these efforts help the implementation team fulfil the intended
benefits of the project and ensure the investment made realizes a
positive return.

Step 7 – Help the Business Implement the Solution

Your technology team can deliver a beautiful shiny new solution that
theoretically meets the business objectives, but if your business users
don’t use it as intended and go back to business-as-usual, your
project won’t have delivered on the original objectives. Business
analysts are increasingly getting involved in this final phase of the project
to support the business.

Your key responsibilities in this step may include:


• Analyzing and developing interim and future state business process
documentation that articulates exactly what changes need to be made to
the business process.
• Training end users to ensure they understand all process and procedural
changes or collaborating with training staff so they can create
appropriate training materials and deliver the training.
• Collaborating with business users to update other organizational assets
impacted by the business process and technology changes.

This step is all about ensuring all members of the business community are
prepared to embrace the changes that have been specified as part of the
project.


Step 8 – Assess Value Created by the Solution

A lot happens throughout the course of a project. Business outcomes are
discussed. Details are worked through. Problems, big and small, are
solved. Relationships are built. Change is managed. Technology is
implemented. Business users are trained to change the way they work.

In this flurry of activity and a focus on delivery, it’s easy to lose track of the
big picture. Why are we making all these changes and what value do they
deliver for the organization? And even more importantly, are we still on
track? Meaning, is the solution we’re delivering actually delivering the value
we originally anticipated?

Nothing creates more positive momentum within an organization
than a track record of successful projects. But if we don’t stop and
assess the value created by the solution, how do we know if we are
actually operating from a track record of success?

Your key responsibilities in this step may include:


• Evaluating the actual progress made against the business objectives for
the project to show the extent to which the original objectives have been
fulfilled.
• Communicating the results to the project sponsor, and if appropriate, to
the project team and all members of the organization.
• Suggesting follow-up projects and initiatives to fully realize the intended
business objectives of the project or to solve new problems that are
discovered while evaluating the impact of this project.

After completing this step, it’s likely you’ll uncover more opportunities to
improve the business which will lead you to additional projects. And so the
cycle begins again!


11.5 SUMMARY

Real-time analysis is an emerging business tool that is changing the
traditional ways enterprises do business. More and more organisations are
today exploiting business analytics to enable proactive decision making; in
other words, they are switching from reacting to situations to anticipating
them.

One of the reasons for the flourishing of business analytics as a tool is that
it can be applied in any industry where data is captured and accessible.
This data can be used for a variety of reasons, ranging from improving
customer service and the organisation’s capability to predict fraud, to
offering valuable insights on online and digital information.

However business analytics is applied, the key outcome is the same: The
solving of business problems using the relevant data and turning it into
insights, providing the enterprise with the knowledge it needs to
proactively make decisions. In this way the enterprise will gain a
competitive advantage in the marketplace. Essentially, business analytics is
a 7-8 step process, outlined below.

• Defining the business needs


The first stage in the business analytics process involves understanding
what the business would like to improve on or the problem it wants solved.
Sometimes, the goal is broken down into smaller goals. Relevant data
needed to solve these business goals are decided upon by the business
stakeholders, business users with the domain knowledge and the business
analyst. At this stage, key questions such as, “what data is available”, “how
can we use it”, “do we have sufficient data” must be answered.

• Explore the data


This stage involves cleaning the data, making computations for missing
data, removing outliers, and transforming combinations of variables to
form new variables. Time series graphs are plotted as they are able to
indicate any patterns or outliers. The removal of outliers from the dataset
is a very important task as outliers often affect the accuracy of the model if
they are allowed to remain in the data set. As the saying goes: Garbage in,
garbage out (GIGO)!


Once the data has been cleaned, the analyst will try to make better sense
of the data. The analyst will plot the data using scatter plots (to identify
possible correlation or non-linearity). He will visually check all possible
slices of data and summarise the data using appropriate visualisation and
descriptive statistics (such as mean, standard deviation, range, mode,
median) that will help provide a basic understanding of the data. At this
stage, the analyst is already looking for general patterns and actionable
insights that can be derived to achieve the business goal.
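
A small pandas sketch of this exploration stage, under invented data and a simple 2-standard-deviation outlier rule (all figures are made up for the example): fill missing values, drop outliers, and summarise.

```python
import pandas as pd

# Invented monthly records standing in for a real extract.
df = pd.DataFrame({
    "ad_spend": [10, 12, 9, 15, 13, 11, 300, 10],     # 300 is an outlier
    "revenue":  [102, 110, 95, 130, 118, None, 105, 99],
})

# Fill the missing revenue figure with the median.
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Drop rows whose ad_spend lies more than 2 standard deviations from the mean.
z = (df["ad_spend"] - df["ad_spend"].mean()) / df["ad_spend"].std()
df = df[z.abs() <= 2]

# Summarise and eyeball the relationship (the scatter plot needs matplotlib).
print(df.describe())
df.plot.scatter(x="ad_spend", y="revenue")
```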

• Analyse the data


At this stage, using statistical analysis methods such as correlation analysis
and hypothesis testing, the analyst will find all factors that are related to
the target variable. The analyst will also perform simple regression analysis
to see whether simple predictions can be made. In addition, different
groups are compared using different assumptions and these are tested
using hypothesis testing. Often, it is at this stage that the data is cut,
sliced and diced and different comparisons are made while trying to derive
actionable insights from the data.
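
The sketch below shows, with scipy, the kind of correlation analysis and hypothesis test described here; the sample figures are invented for illustration.

```python
import numpy as np
from scipy import stats

# Invented monthly sales before and after a process change.
before = np.array([102, 98, 95, 110, 105, 99, 101, 97])
after  = np.array([110, 108, 104, 118, 112, 109, 111, 106])

# Correlation between ad spend and sales (invented paired figures).
ad_spend = np.array([10, 12, 9, 15, 13, 11, 12, 10])
r, p_corr = stats.pearsonr(ad_spend, before)
print(f"correlation r={r:.2f} (p={p_corr:.3f})")

# Hypothesis test: did mean sales differ between the two groups?
t, p = stats.ttest_ind(after, before)
print(f"t={t:.2f}, p={p:.4f}")
```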

• Predict what is likely to happen


Business analytics is about being proactive in decision making. At this
stage, the analyst will model the data using predictive techniques that
include decision trees, neural networks and logistic regression. These
techniques will uncover insights and patterns that highlight relationships
and ‘hidden evidences’ of the most influential variables. The analyst will
then compare the predictive values with the actual values and compute the
predictive errors. Usually, several predictive models are run and the best
performing model is selected based on model accuracy and outcomes.
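
A hedged scikit-learn sketch of running several candidate models and keeping the best performer on held-out data; synthetic data stands in for real customer records.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for customer records with a binary outcome to predict.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
# Fit each candidate and compare accuracy on held-out data; the best
# performer would be the one carried forward.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```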

• Optimise (find the best solution)


At this stage the analyst will apply the predictive model coefficients and
outcomes to run ‘what-if’ scenarios, using targets set by managers to
determine the best solution, with the given constraints and limitations. The
analyst will select the optimal solution and model based on the lowest
error, management targets and his intuitive recognition of the model
coefficients that are most aligned to the organisation’s strategic goal.


• Make a decision and measure the outcome


The analyst will then make decisions and take action based on the insights
derived from the model and the organisational goals. After an appropriate
period of time, the outcome of the action is measured.

• Update the system with the results of the decision


Finally, the results of the decision and action and the new insights derived
from the model are recorded and updated in the database. Information
such as, ‘was the decision and action effective?’, ‘how did the treatment
group compare with the control group?’ and ‘what was the return on
investment?’ are uploaded into the database. The result is an evolving
database that is continuously updated as soon as new insights and
knowledge are derived.

11.6 SELF ASSESSMENT QUESTIONS

1. Describe the major types of analytics involved in the business analytics process.

2. “The outcome of the business analytic process must relate to business and seek to improve business performance.” Explain how this is achieved.

3. What are the important steps involved in the business analytics process? Explain in brief.

4. Do you agree that the BA process and the organization decision-making process are interdependent? Explain.

5. What is the responsibility of the business analyst in the process of discovering the primary business objectives?


11.7 MULTIPLE CHOICE QUESTIONS

1. At what stage, key questions such as, “what data is available”, “how can
we use it”, “do we have sufficient data” must be answered?
a. Defining the business need
b. Exploring the data
c. Analysing the data
d. Optimise the data

2. What is actually not an implementation plan, but a touchstone guiding all of the subsequent steps of the business analysis process and tasks by other project participants?
a. Explore the data
b. Define scope of the data
c. Analyse the data
d. Optimise the data

3. Effective business analysts consciously sequence their deliverables to be as effective as possible in driving the momentum of the project forward. Paying attention to the project’s --------------------- are all factors to consider when sequencing the deliverables.
a. critical path
b. reducing ambiguity and complexity,
c. critical path, reducing ambiguity and complexity, and generating
quick wins
d. generating quick wins

4. What is your key responsibility to help the business while implementing the solution?
a. Analyzing and developing interim and future state business process
documentation that articulates exactly what changes need to be
made to the business process.
b. Training end users to ensure they understand all process and
procedural changes or collaborating with training staff so they can
create appropriate training materials and deliver the training.
c. Collaborating with business users to update other organizational
assets impacted by the business process and technology changes.
d. All the above steps. These steps ensure all members of the business
community are prepared to embrace the changes that have been
specified as part of the project.


5. On a typical project employing a business analyst, a significant part of


the solution involves……………..
a. technical implementation team building
b. customizing the data
c. identifying the software
d. a technical implementation team building, customizing, and/or
deploying software

Answers: 1.(a), 2.(b), 3.(c), 4.(d), 5.(d)


Chapter 12
Business Analytics Applications

Objectives:

On completion of this chapter, you will be able to understand the requirements and usage of applications in various enterprises, the process and prerequisites of developing applications for use in business analytics, and other related aspects, taking into consideration the following points:

Structure:

12.1 Introduction

12.2 Features of spreadsheet

12.3 Creating the data display

12.4 The importance of reporting all results

12.5 Application of Analytics

12.6 Analytical application development

12.7 Widely used applications in Analytics

12.8 Recommendation systems

12.9 Anatomy of Recommendation systems

12.10 Components of recommendation systems

12.11 Summary

12.12 Self Assessment Questions

12.13 Multiple Choice Questions


12.1 INTRODUCTION

Analysing the data is an important skill for any professional to possess. The
existence of data in its raw collected state has very little use without some
sort of processing.

Several methods of data analysis in Excel will be examined, including the
sort function and the pivot table. The sort function is best used for
relatively small databases, while the pivot table is helpful for analysing
large data sets and quickly grouping items.

A spreadsheet is a large sheet having data and information arranged in
rows and columns in Excel. As you know, Excel is one of the most widely
used spreadsheet applications. It is a part of the Microsoft Office suite. A
spreadsheet is quite useful in entering, editing, analysing and storing data.
Arithmetic operations with numerical data such as addition, subtraction,
multiplication and division can be done using Excel. You can sort numbers/
characters according to some given criteria (like ascending, descending, etc.).

12.2 FEATURES OF SPREADSHEETS

There are a number of features that are available in Excel to make your
task easier. Some of the main features are:

1. AutoSum - helps you to add the contents of a cluster of adjacent cells.

2. List AutoFill - automatically extends cell formatting when a new item is
added to the end of a list.

3. AutoFill - allows you to quickly fill cells with repetitive or sequential
data such as chronological dates or numbers, and repeated text. AutoFill
can also be used to copy functions. You can also alter text and numbers
with this feature.

4. AutoShapes toolbar will allow you to draw a number of geometrical
shapes, arrows, flowchart elements, stars and more. With these shapes
you can draw your own graphs.

5. Wizard - guides you to work effectively while you work by displaying
various helpful tips and techniques based on what you are doing.


6. Drag and Drop - it will help you to reposition the data and text by
simply dragging the data with the help of mouse.

7. Charts - it will help you in presenting a graphical representation of your
data in the form of Pie, Bar, Line charts and more.

8. PivotTable - it flips and sums data in seconds and allows you to
perform data analysis and generating reports like periodic financial
statements, statistical reports, etc. You can also analyse complex data
relationships graphically.

9. Shortcut Menus - the commands that are appropriate to the task that
you are doing will appear by clicking the right mouse button.

12.2.1 Hint for analysing the data

Before using the sort function or pivot tables, the data must be cleaned.
This means that the first step in data analysis is to go through the data
and ensure that the style of data entry is consistent within the columns. In
this case, for diagnosis it is important to make sure that only one word,
phrase or abbreviation is used to describe each diagnosis. If multiple words
are used to describe the same thing, the analysis will be more difficult, so
it is best to choose one term and use it consistently. It may be necessary to
change the terminology used in the data set in order to be consistent
throughout; such changes ought to be made at a preliminary stage. If
there are multiple diagnoses for a single subject, it is important to list the
diagnoses separately. It may be necessary to create additional columns
labelled “Diagnosis 2”, “Diagnosis 3” and so on, listing one diagnosis in
each column. It is then possible to create multiple pivot tables and manually
add the results together.

12.2.2 Using the sort function in Excel

Using Excel 2016 for Windows, first select the data (Ctrl+A to select all).
At the top of the Excel toolbar, choose the “Data” tab, then click on the
sort function. In the window that pops up, click sort by “Diagnosis”. To sort
by an additional column, click the button in the upper left corner of the
window that says “Add Level”, choose the column, and then click the OK
button.


Sorting is a great tool to identify trends and to analyse a small amount
of data. Once the data is sorted by diagnosis and then by another
column, simply count the number with each diagnosis and record the
breakdown numbers either manually or by using Excel’s “COUNTIFS”
formula.

The formula is an excellent way to count specific data if it is too time-
consuming to count manually. Simply alter the range and criteria in the
formula to examine different subgroups.
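
For readers who prefer scripting, the same conditional counting that COUNTIFS performs can be sketched in Python with pandas; the diagnosis data below is invented for illustration.

```python
import pandas as pd

# Invented patient records.
records = pd.DataFrame({
    "diagnosis": ["Cataract", "Glaucoma", "Cataract", "Cataract", "Glaucoma"],
    "age":       [67, 54, 72, 58, 61],
})

# Count of each diagnosis (what a sorted sheet lets you count by eye).
print(records["diagnosis"].value_counts())

# Conditional count, analogous to COUNTIFS with two criteria.
print(((records["diagnosis"] == "Cataract") & (records["age"] > 60)).sum())
```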

12.2.3 Using Pivot tables in Excel

With a large data set, manually counting or using a formula to count
can be tedious and create opportunities for error. A pivot table will
automatically sort the data and list values, producing efficient and
accurate information. To create a pivot table, select the data, click on the
Insert tab, then select “Pivot Table” (for Macs, click on the “Data” tab,
followed by “Pivot Table”).

Drag the column to be sorted to the box with the heading “Rows”, and
Excel provides an automatic breakdown, which can be used to calculate
percentages and to create graphs. Alternatively, to sort first by the named
column and then by diagnosis, switch the order in the “Rows” box. Fields can
be added or removed as necessary. It may be helpful to practice dragging
different fields to different categories in order to develop an understanding
of how a pivot table works.
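
The pandas equivalent of dragging fields into the “Rows” box might look like the following sketch (columns invented for illustration); grouping and counting produces the same breakdown a pivot table shows.

```python
import pandas as pd

# Invented patient records; "clinic" plays the role of the first Rows field
# and "diagnosis" the second, as in the pivot-table "Rows" box.
records = pd.DataFrame({
    "clinic":    ["North", "North", "South", "South", "North"],
    "diagnosis": ["Cataract", "Glaucoma", "Cataract", "Cataract", "Cataract"],
})

# Group and count: swapping the two column names changes the sort order,
# just like dragging fields between boxes.
print(records.groupby(["clinic", "diagnosis"]).size())
```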

12.3 CREATING THE DATA DISPLAY

Once the data is analysed, it is often used to create a display so that others
can quickly and easily understand the results. One way to do this is to
create a chart using Excel. First create another table to more easily show
the breakdown of the numbers with a certain diagnosis.

Next, click Pivot Chart under the “Analyze” tab and select the “Stacked
Column” option. This shows the desired chart; add axis titles and labels,
format the colour scheme, and hide the field settings.

Once the chart is customised, it clearly displays important trends in the
data.
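
A matplotlib/pandas sketch of the stacked-column display described above, with invented counts:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented breakdown: diagnosis counts per clinic.
counts = pd.DataFrame({"Cataract": [12, 8], "Glaucoma": [5, 9]},
                      index=["North", "South"])

ax = counts.plot.bar(stacked=True)   # the stacked column chart
ax.set_xlabel("Clinic")
ax.set_ylabel("Number of patients")
ax.set_title("Diagnoses by clinic")
plt.tight_layout()
plt.show()
```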


12.4 THE IMPORTANCE OF REPORTING ALL RESULTS

When analysing the data, it is critical to report all results, even if they
seem insignificant. It is also essential not to lump data analyses together
and make generalisations. For example, a researcher conducting a study on
the effectiveness of a visual aid in increasing knowledge of cataracts
administers a 10-question survey to patients before and after showing them
the visual aid. The researcher finds that the visual aid increases the overall
number of questions answered correctly. This is a good start, but it is not
enough. It is critical that the researcher analyses the result of each
individual question. Just knowing that the intervention increases overall
knowledge provides little information about the strengths and weaknesses of
the intervention. Perhaps the intervention caused a significant increase in the
number of people understanding what a cataract is, but not in the number of
people understanding proper post-operative procedures. This is important
to know because the intervention can then be modified to better convey
the necessary information.

12.5 APPLICATION OF ANALYTICS

How is analytics used in business from different perspectives?

1. How can analytics help decision making in business support or business enablement functions such as HR, Finance, IT, procurement, marketing, etc.?

2. What are the common areas of analytics deployment in different industries like Banking, Insurance, Telecom, Healthcare, etc.?

3. How does analytics provide competitive advantage in business functions by focusing on customer-facing functions like:
• Understanding the customer or market segment
• Customising the products/services to the customer and market segment
• Continuously listening to customer wants and needs

4. Learn how social media analytics and recommendation systems are built.


12.5.1 Analytics in business support functions

1. Human capital Analytics

Every enterprise has strategic and secure information about its human
capital. The internal data sources may range from employee profiles,
compensation and benefits, employee performance, employee productivity
and so on, stored in a variety of technologies like ERP systems, OLTP
RDBMS, Hadoop ecosystems, spread marts, data marts and data
warehouses. Some of the external data sources may include compensation
benchmarks, employee sentiment, thought leadership contributions, etc.
Enterprises have started gaining benefits in the following areas by
deploying analytics:
• Workforce planning analytics to acquire talent at the right time for the right
positions. Human capital analytics will lead to identification of the positions
that drive business results and the critical competencies needed for those
positions. Gujarat Chemicals has developed a custom modelling tool
that predicts future hiring needs for each business unit and can
adjust its predictions based on industry trends (Acquisition).
• Workforce talent development analytics aligned to business goals
(Development).
• Workforce sentiment analytics for enhancing employee engagement,
including the ability to estimate the business impact of employee attrition
(Engagement).
• Workforce utilisation analytics to ensure the optimised deployment of the
right talent in the right functions. This helps to connect employee
performance to business results (Optimisation). Retail companies use
analytics to predict incoming call centre volume and release hourly
employees early if it is expected to drop.
• Workforce compensation analytics helps to optimise benefits using big
data sources including performance and benchmarks (Pay).
• Compliance analytics helps to detect any anomalies relating to enterprise
compliance policies and initiate proactive corrective actions
(Compliance).


2. IT Analytics

Enterprises use IT as a business enabler. They invest in a variety of IT
resources like data networks, servers, data centre/cloud services,
software licences, maintenance software, end-user support and many
productivity tools. Due to multiple technology platforms, outsourcing
partners, complex user demands and the geographic spread of operations,
enterprise IT operations are becoming complex.

Investing in the right IT resources for business results is certainly a strategic
decision.

IT infrastructure analytics provides insight into the availability and
performance of IT infrastructure. Service desks can prevent major outages
by using predictive analytics.
• Data network and storage utilisation analytics leads to optimisation of
servers and bandwidth.
• Security analytics provides vital clues about potential information
security threats and alerts teams.
• Service quality analytics provides insights into the root causes of SLA
deviations and triggers process improvement initiatives.
• IT asset analytics supports optimal investment forecasts.
• IT policy compliance analytics can report policy enforcement deviations
and trigger corrective actions.

3. Sales and Marketing Analytics

Enterprises leverage IT for many marketing activities. In its most basic
form, marketing managers study reports relating to customer segments,
revenue share, revenue mix, marketing expense trends, the sales pipeline,
marketing campaign performance and so on. Enterprises use BI solutions
to slice and dice customer data, understand buyer behaviour in various
market segments and generate alerts against preset thresholds; this is
termed descriptive analytics.


Managers are aware of what has happened and what is happening in the
business. Analytics enables the enterprise to move further.

1. Exploratory analytics allow knowledge workers to find new business
opportunities by discovering hidden data patterns.

2. Mature organisations use predictive analytics to influence future
business.

3. Enterprises embark on prescriptive analytics to use the power of
algorithms, machine learning and AI techniques to make routine
decisions almost instantaneously, thereby reducing cycle times
dramatically.

These are common applications of analytics in sales and marketing
functions of an enterprise.

• Customer behaviour Analytics:
Customer behaviour data, which tells decision makers what the customer
does and where he/she chooses to do it, sits in multiple transaction
systems across the company.

• Customer attitude data:
This tells decision makers why a customer behaves in a certain manner or
how he/she feels about the product; it comes from surveys, social media
and call centre reports. Using analytics to mine new patterns, conducting
proof-of-concept analytics to find areas of business impact, and developing
predictive analytics solutions are some of the application areas.

• Customer segmentation:
Customer segmentation allows an enterprise to define newer and sizable
groups of target prospects using analytics. This enables enterprises to
customise products and services to new segments and position them
for competitive advantage. It can be more strategic, such as behaviour-
based profiling, predictive modelling or customer event-based
segmentation.

• Modeling for pricing automation:
Deep machine learning applications such as price elasticity modelling,
channel affinity modelling and customer life event modelling.


• Recommendation systems:
Next-best-offer models can leverage the behaviour of similar buyers to
suggest the next best product or service, or proactively recommend the
perfect solution.
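
A minimal sketch of the next-best-offer idea under simple assumptions: score candidate products by how often they co-occur with what the customer already owns among similar buyers' baskets. All product names and baskets below are invented.

```python
from collections import Counter

# Invented purchase histories of similar buyers.
baskets = [
    {"router", "modem", "mesh_extender"},
    {"router", "mesh_extender"},
    {"router", "modem"},
    {"modem", "mesh_extender"},
]

def next_best_offer(owned):
    """Score candidate products by how often they co-occur with items the
    customer already owns, and return the highest-scoring one."""
    scores = Counter()
    for basket in baskets:
        if owned & basket:                 # basket overlaps the customer's items
            for item in basket - owned:
                scores[item] += 1
    return scores.most_common(1)[0][0]

print(next_best_offer({"router"}))         # e.g. 'modem' or 'mesh_extender'
```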

12.5.2 Analytics in Industries

Here we discuss how different industries apply analytics for business
benefits. One needs to have an understanding of industry domain basics,
trends and common current challenges in order to develop industry-specific
analytics solutions.

1. Analytics in Telecom
People and devices generate data 24x7 globally in the telecom industry.
Whether you are speaking to a friend, browsing a website, streaming a video,
playing games with friends or making an in-app purchase, user activity
generates data about our needs, preferences, spending, complaints and
so on. Communication service providers (CSPs) traditionally have
leveraged the data they generate to make decisions in the areas of
improving financial performance, increasing operational efficiency or
managing the subscriber relationship. They have adopted advanced
reporting and BI tools to bring facts and trends to decision makers.
Strategic focus areas for deploying analytics in the CSP business are:

• Customer retention / improving customer loyalty:
With stiff competition between the numerous players in the industry,
customer retention is of top-notch importance. Telecom is now much more
than making calls, and analytical tools can help firms identify cross-selling
opportunities and take crucial decisions to retain customers. Analytics can
help in identifying trends in customer behaviour to predict customer churn
and apprise the decision maker so that suitable action can be taken to
prevent it. When dealing with a large customer base, marketing across the
board would be expensive and ineffective. Hence analytics can help in
better channelling marketing efforts, such as identifying the target group
and/or region in which to launch pilot projects, so that the firm gets a
better return on its marketing investment.


• Network optimisation:
It is crucial for telecom operators to ensure that all their customers are
able to avail of their products at all times. The firms also need to be frugal
when allocating resources to the network, because any unused capacity is
a waste of resources. Analytics helps in better monitoring of traffic and in
facilitating capacity planning decisions. Analytical tools leverage data
collected through day-to-day transactions and help in both short-term
optimisation decisions and long-term strategic decision making.

• Predictive analytics:
With the use of predictive analytics, telecom operators can predict the
approximate success rate of new schemes based on the past preferences
of customers. This provides telecom operators with a great strategic
advantage. Predictive analytics helps in targeting the right customer at the
right time based on past behaviour and choices. It helps boost revenues
through proper planning and reduces operational cost in the long term.

• Social analytics:
The branding of telecom operators on social media plays a very crucial
part in customer acquisition and retention. Data generated through social
media can be interpreted into meaningful insights using social analytical
tools. Customer sentiment, customer experience and the positioning of the
company can be analysed to make the customer experience richer and
smoother. The data generated through such platforms is also diverse both
geographically and demographically, and hence helps in gaining
closer-to-reality customer information.

• Subscriber Acquisition:
CSPs study customer behaviour to identify the most suitable channels and
sales strategy for each product.

• Fraud detection analytics:
It helps to detect billing fraud, device theft, cloned SIMs and related
issues, as well as misuse of credentials.

• Churn analytics:
Helps CSPs not only to model loyalty programmes, but also to predict
churn and destination CSPs (a minimal churn-model sketch follows this
list).


• Financial analytics:

a. Infrastructure analytics:
CSPs study CAPEX, optimise investment in infrastructure and save money
by considering utilisation options.

b. Product portfolio analytics:
This area provides information on profitable products and services and
helps in exiting loss-making products.

c. Channel analytics:
Helps CSPs optimise commercial terms with partners and optimise
distributor margins.

d. Cost reduction:
This area focuses on reducing service management costs, operations costs
and compliance-risk-related costs.
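
As referenced in the churn analytics item above, here is a minimal
churn-prediction sketch, assuming Python with scikit-learn and logistic
regression. The feature names (monthly_minutes, complaints,
tenure_months) and the tiny dataset are hypothetical illustrations.

```python
# Minimal churn-prediction sketch with logistic regression (scikit-learn).
# Features and labels are hypothetical toy data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = pd.DataFrame({
    "monthly_minutes": [320, 80, 450, 60, 500, 90, 410, 70],
    "complaints":      [0, 3, 0, 4, 1, 2, 0, 5],
    "tenure_months":   [36, 4, 48, 2, 60, 6, 40, 3],
    "churned":         [0, 1, 0, 1, 0, 1, 0, 1],
})

X = data.drop(columns="churned")
y = data["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit on historical customers, then score the held-out customers
model = LogisticRegression().fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In practice the predicted churn probabilities, not just the labels, would be
passed to retention teams so that the highest-risk subscribers are
contacted first.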

2. Analytics in Retail:

The retail industry is a lucky vertical, having greater access to data around
the consumer, the products they buy and use, and the different channels
that sell and service products. Data coupled with insights is at the heart of
what drives the retail business.

Technologies like POS, CRM, SCM, big data, mobility and social media
offer a means of understanding shoppers via numerous digital touch
points, ranging from their online purchases to their presence on social
networks to their visits to brick-and-mortar stores, as well as tweets,
images and more. Even today, retailers are grappling with how to
meaningfully leverage and ultimately monetise the hidden insight in the
huge amounts of structured and unstructured data about the consumer.
The value of analytics comes from three sources:


a. Gaining insight to improve process and resource optimisation

b. Personalising and localising the offers

c. Creating community for branding and customer engagement.

I. Gaining insight to improve process and resource optimisation:

a. Supply chain analytics:
Every retailer needs to optimise product vendors, cost and quality. They
need to constantly track the performance of the supply chain and initiate
proactive action for competitive advantage.

b. Pricing analytics:
It helps retailers optimise product pricing, special offers, merchandising,
loyalty programmes and campaigns that attract the maximum number of
consumers, from both physical-store and online-store perspectives.

c. Buying experience analytics:
Retailers can gain insight into the path taken to purchase, complaints
registered, help provided by store personnel, store layout, item search
time, availability of product details, pricing, etc., and use it to enhance the
buying experience and train personnel to strengthen consumer loyalty.

II. Personalising and localising the offers

a. Inventory analytics:
Retailers aim to fulfil consumer demand by optimising stock and the ability
to replenish when consumer demand increases due to seasonal effects or
as a result of a powerful campaign. This area of analytics alerts store
managers to the potential need to stock up on fast-moving items and
reduce slow-moving items.

b. Consumer analytics:
Every region has people with different tastes in goods and service levels.
The purpose of consumer analytics is to equip store managers with insight
to customise their products and services to the local consumer profile.


c. Campaign analytics:
All retailers have digital marketing programmes to entice consumers with
value offers. Retailers invest in this area of analytics to design the most
effective campaigns that convert the maximum number of consumers into
buyers.

d. Fraud detection:
All retailers strive to eliminate fraud relating to payments, shipping and
price-tag switching. Analytics can study transactions in real time to detect
fraud and alert store personnel or the online commerce team.

III. Creating community for branding and customer engagement

a. Web analytics:
Here the different perspectives of each consumer's online behaviour, such
as surfing traffic, visitor and conversion trends, the location of smart
devices and access to kiosks, are analysed to recommend the best sales
approach in response to each consumer's real-time actions.

b. Market basket analytics:
The promotion, price, offer and loyalty dimensions of shopping behaviour
are used to understand sales patterns, customer preferences and buying
patterns in order to create targeted and profitable product promotions,
customer offers and shelf arrangements (a minimal association-rule
sketch follows this list).

c. Social media analytics:
Listening to and learning from the social community dimension of each
consumer's online behaviour is the scope of this area of analytics. Here
the store taps into consumer-generated content with sentiment and
behavioural analysis to answer key merchandising, service and marketing
strategy questions.

d. Consumer behavioural analytics:
The focus is on areas of consumer preference such as channels,
categories, brands and product attributes, return and exchange patterns,
usage levels of service programmes and participation in loyalty
programmes.
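
As referenced in the market basket item above, the following is a minimal
Python sketch of association-rule style analysis, computing support and
confidence for item pairs over toy transactions. The item names are
hypothetical.

```python
# Minimal market-basket sketch: support and confidence for item pairs.
# Transactions and item names are hypothetical.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2))

# Rule A -> B: support = P(A and B); confidence = P(B given A)
for (a, b), count in pair_counts.items():
    print(f"{a} -> {b}: support={count / n:.2f}, "
          f"confidence={count / item_counts[a]:.2f}")
```

Pairs with high support and confidence are natural candidates for joint
promotions and adjacent shelf placement.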


3. Analytics in Healthcare:

Healthcare is a complex ecosystem of multiple industries interconnected
to achieve healthcare goals. The entities include healthcare providers,
physicians, insurance companies, pharmaceutical companies, laboratories,
healthcare volunteers, regulatory bodies, retail medicine distributors and
so on, all centred on patients. Imagine the complexity, variety, volume
and velocity of data that gets generated in each of these independent
enterprises and in the multitude of interconnected, heterogeneous IT
applications. Analytics is applicable to all these enterprises, viz. insurance
companies, pharmaceutical manufacturers, hospitals, etc. Here we discuss
how hospitals, that is, healthcare providers, leverage analytics for goals
like:

a. Hospital management analytics:
It focuses on cost reduction, enhancing quality of care, improving patient
satisfaction, improving outcomes (performance of diagnosis, testing and
treatment) and providing secure access to patient data (electronic health
records, EHR). Analytics here supports fact-based decisions in areas such
as reducing medical errors, managing diseases, understanding physician
performance and retaining patients.

b. Compliance analytics:
Provides healthcare compliance metrics to regulatory authorities and
benchmarks against world-class hospitals using the Baldrige criteria.
Wider use of digital data will support audits and analytics and will improve
the hospital processes needed for regulatory compliance.

c. Financial analytics:
This area of analytics leads to enhanced ROI (return on investment),
improved utilisation of hospital infrastructure and human resources,
optimised capital management, an optimised supply chain and reduced
fraud.

d. Predictive models:
These help healthcare professionals go beyond traditional search and
analysis of unstructured data by applying predictive root-cause analysis,
natural language processing and built-in medical terminology support to
identify trends and patterns and achieve clinical and operational insight.
Healthcare predictive analytics can help healthcare organisations get to
know their patients better, so that they understand individual patients'
needs while delivering quality, cost-effective, life-saving services.

e. Social analytics:
Helps hospitals listen to patient sentiment, requirements, affordability and
insurance needs in order to model care and wellness programmes,
customising services through localisation of needs.

f. Clinical analytics:
A number of other critical clinical situations can be detected by analytics
applied to EHRs, such as:
❖ Detecting post-operative complications
❖ Predicting the 30-day risk of re-admission
❖ Risk-adjusting hospital mortality rates

12.6 ANALYTICAL APPLICATION DEVELOPMENT

The design, development and deployment steps of analytical application
development are:

Step-1: Defining the problem

What business questions are you trying to answer? Once you understand
this, you need to think about what data is available to answer these
questions.

1. Is the data directly related to the question?

2. Is the data you need available within the enterprise or elsewhere?

3. What measures of accuracy and granularity are you going to use? Is
that level of summarisation good enough for business users?

4. What criteria are you going to use to determine success or failure?
Determine upfront how you are going to measure the result.


Step-2: Setting up the technical environment and processing the data

Collect the data and perform basic data quality checks to ensure accuracy
and consistency. While this may end up taking the most time, it is critical:
erroneous data will create erroneous results. You may need to transform
the data to make it conducive to analysis. Pick the analytics approach, the
possible choice of algorithm and the visualisation requirements (a minimal
data-cleaning sketch follows).
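
The following is a minimal sketch, assuming Python and pandas, of the
kind of basic quality checks Step-2 describes; the columns and values are
a hypothetical toy dataset.

```python
# Minimal data-quality sketch with pandas: missing values, duplicate
# rows and inconsistent labels. The dataset is a hypothetical toy.
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "North ", "south", "south", None],
    "sales":  [100, 100, 250, 250, 80],
})

print(df.isna().sum())                  # count missing values per column
df = df.dropna(subset=["region"])       # drop rows missing a key field
df["region"] = df["region"].str.strip().str.title()  # normalise labels
df = df.drop_duplicates()               # remove exact duplicate rows
print(df)
```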

Step-3: Run the initial analysis or model

Split the data set into a test data set and a validation data set. Choose the
method by which you want to build the model and process the data. As
you become more familiar with predictive modelling and with your own
data, you will find that certain types of problems align with certain types
of modelling approaches or algorithms (a minimal sketch follows).
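
Here is a minimal sketch of Steps 3 to 6, assuming Python with
scikit-learn and a synthetic dataset: one split is used to build the model
and the held-out validation split is only scored at the end.

```python
# Minimal sketch of splitting the data and running an initial model
# (scikit-learn). The dataset is synthetic, not business data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Hold out a validation set, used only for the final test (Step-6)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(X_train, y_train)

print("training score:  ", model.score(X_train, y_train))
print("validation score:", model.score(X_val, y_val))
```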

Step-4: Evaluate the initial results

Are the results in line with what you were expecting to see? Are you able
to interpret the results? Do they answer the business questions you are
trying to answer? If the answer is yes, then move on to the next step; if
the answer is no, then consider the following:
❖ Try using a different model/algorithm
❖ Consider collecting more or different data
❖ Consider redefining or reframing the problem, changing the question
and the means to an answer as you better understand your data and
your environment

Step-5: Select the final model

Try a number of different models and then, when you are satisfied with
the results, choose the best one. Run the selected model or analysis and
re-examine the results.

Step-6: Test the final model

Test the final model by running it against the validation data set and
assess the results. Do not tweak or change the model in any way at this
stage, as that will invalidate any comparison to the initial results. If the
results are similar and you are satisfied with them, you can move on to
the final stage. If you are not, then go back to Step-3 to reassess the
model and the data, make any necessary or desired changes and try
running the model again.


Step-7: Apply the model and validate by usage

There could be some data exceptions that the analysis model may not
handle. You may need to keep checking for the impact of such data on
results. You may consider adding incremental features so that the model
remains scalable and provides consistent results for decision making.

12.7 WIDELY USED APPLICATIONS OF ANALYTICS

Most industries tend to look at applying analytics in the following areas:

1. Processing social media data for business benefits: telecom CSPs stand
to understand the voice of their subscribers, HR can understand the
sentiments of employees and partners, hospitals can discover unmet
needs of patients, and IT functions can understand business users'
challenges and service-level expectations. Social media analytics, web
analytics and digital analytics serve these needs across enterprises.

2. All product and service enterprises, without exception, strive to acquire
new customers. Each one likes to customise and personalise offers so
that prospects see value and make buying decisions. One of the most
common approaches enterprises take is the recommendation system or
engine to predict the most likely buyers. The recommendation engine, a
common analytics paradigm, is used to recommend books, gift items
for various occasions, doctors in local areas, household items for online
purchase: the list is endless.

12.8 RECOMMENDATION SYSTEMS:

Have you ever wondered what algorithm Google uses to maximise its
targeted ads revenue? What about e-commerce websites which guide you
through options such as "people who bought this also bought this"? Or
how Facebook automatically suggests friends to tag in pictures?

The answer is recommendation systems. With the growing amount of
information on the WWW and the significant rise in the number of users,
it becomes increasingly important for companies to search, map and
provide users with the relevant chunk of information according to their
preferences and tastes.


Companies nowadays are building smart and intelligent recommendation
systems by studying the past behaviours of their users, and hence provide
them with recommendations and choices of interest, such as "relevant job
postings", "movies of interest", "suggested videos", "Facebook friends you
may know", "people who bought this also bought this" and so on.

Recommender systems are algorithms which aim to provide the most
relevant and accurate items to the user by filtering useful items from a
huge pool of information. Recommendation systems discover patterns in
the data set by learning consumer choices and produce outcomes that
correlate with their needs and interests.

12.9 ANATOMY OF RECOMMENDATION SYSTEMS:

Recommendation systems are not totally new; they take the results from
market basket analysis of business data in an advanced analytics system
and suggest the next best offer or next best action to a specific customer.
They are also very popular for making suggestions or recommendations.

The online store visitor starts by looking at a product or service. Amazon
is probably the most popular website that uses recommendation engine
analytics. In the past, all types of recommendations were quite difficult to
automate, as data storage, pre-processing, model creation, visualisation
and integration were complex and generally needed multiple IT systems
working together.

Today, we can see recommendation systems being used in a variety of
scenarios, such as:
• Restaurants to dine in at a new location you are visiting
• Music you may want to listen to next
• Newspaper or magazine articles to read next
• The right doctor in your neighbourhood for treatment
• The best auto insurance policy to buy
• The next vacation travel spot
• The best online store for your groceries, and so on


The application of recommendation systems has transcended customer
shopping experiences. Market research has shown that recommendation
systems bring in anything between 10 per cent and 30 per cent of
additional revenue for a company. Early adopters of recommendation
engine technology such as Amazon and Netflix have outperformed their
competitors by leveraging the unparalleled customer insight generated by
their proprietary recommendation systems.

Both recommendations and pricing are classic topics for advanced
analytics modelling, and both offer possibilities for real-time scoring. The
more accurate the models we build and train with real-life data, the more
accurate the recommendations and prices the company can offer. It is a
great advantage for retailers to change prices dynamically to acquire more
customers: when an item is desired, there is more willingness to pay a
high price, and when it is less desired, the customer will pay a lower price.
This plays an important role in the decision-making process. Discounts are
a way of helping customers not only choose a particular supplier, but also
move to the desired state of purchase commitment. At the same time,
discounts are expensive: they eat into the profit that the company makes.
In an ideal world, we would make discount decisions based on the
customer's desire to close the deal immediately (a minimal
price-elasticity sketch follows).
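
As a minimal sketch of the dynamic-pricing idea above, the following
Python snippet fits a constant-elasticity demand model to synthetic
price/demand pairs; the numbers are illustrative only.

```python
# Minimal price-elasticity sketch: fit log(demand) = log(a) + b*log(price)
# with least squares; b is the estimated elasticity. Data is synthetic.
import numpy as np

price  = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 13.0])
demand = np.array([520, 470, 420, 380, 350, 320])

slope, intercept = np.polyfit(np.log(price), np.log(demand), 1)
print(f"estimated price elasticity: {slope:.2f}")
```

An elasticity near -1 suggests revenue is roughly insensitive to small price
changes; values further below -1 argue for price cuts or discounts, while
values closer to zero suggest higher prices would raise revenue.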

Recommendation systems predict customer needs based on previous
purchase history, online searches, social media interactions, content,
ratings users have provided for a product/service, reviews of
products/services and other personalised attributes captured. Such
recommendations need to track each customer interaction, such as login,
price comparison, product selection, actual purchase, and comments or
ratings posted.

There are many commercial e-commerce platforms, such as Baynote,
Omniture and RichRelevance, that have multiple approaches to
determining the most appropriate product or service for recommendation.
The common algorithms or mechanisms supported by these platforms
include rule engines and recommendations based on social media traffic
from platforms such as Facebook, Twitter, etc.

Other mechanisms include recommendations based on reviews and
ratings awarded, internet search terms used, syndicated
recommendations and collaborative recommendations.


Features built into recommendation systems include item-to-item
association, many-items-to-many-items association, person-to-person
association, person-to-item association and user behavioural analytics.

12.10 COMPONENTS OF RECOMMENDATION SYSTEMS:

Recommender systems are based on several different techniques, such as
collaborative filtering, content filtering, social filtering or various hybrid
approaches that use a combination of these. Though the designs of these
systems vary in their details, it is possible to abstract their characteristics
and behaviour to arrive at a common structure:

1. Tracking user actions and behaviour:

Users' behaviour as they use the application is observed to know the
items they may be looking for, their specifications, and the preferences,
experience and feedback of the user, and so on. The techniques used to
track user behaviour include capturing and analysing click sequences,
tracking eye movement, tracking navigation sequences and measuring the
time spent in specific sections of the application, the items searched, the
attributes specified in searches, the number of similar items typically
viewed, etc. These help in identifying and/or deducing user interests and
preferences.

2. Capturing, cleansing, normalising and storing data:

The user's actions result in data that represents interests, preferences,
feedback on the information displayed, etc., which helps to build a model
of the user and to compute similarities with other users. This helps to
profile the user, to personalise the information presented to the user and
to improve the recommendations made. The collection of the data may be
based on observing the user's implicit behaviour, such as the time spent
looking at certain items, navigation patterns, bookmarking, etc. As data is
collected across users and over a period of time, it is essential to
eliminate any contradictions that may have crept in and keep the data
consistent.

The data store also typically includes one or more indexes created based
on a full-text search of the items, their descriptions and other metadata.
The indexes are usually used to improve the real-time performance of the
recommendation system. Basic anomalies, such as the undue dominance
of certain parameters, are compensated for by applying techniques such
as term frequency-inverse document frequency (TF-IDF). The process of
index creation may also involve pruning frequently appearing terms and
coalescing variants of terms with an equivalent term. Such steps are often
collectively employed.
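
To make the TF-IDF step concrete, here is a minimal sketch assuming
Python and scikit-learn; the documents are toy item descriptions.

```python
# Minimal TF-IDF sketch (scikit-learn): terms that appear in many
# documents get lower weights than distinctive terms. Toy documents.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "red cotton shirt",
    "blue cotton shirt",
    "green cotton shirt",
    "red leather shoes",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# In doc 0, 'red' (rare) outweighs 'cotton' and 'shirt' (common)
for term, col in sorted(vectorizer.vocabulary_.items()):
    print(f"{term:8s} doc-0 weight: {tfidf[0, col]:.2f}")
```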

The components employed in the implementation of this module may
include search engines like Lucene or Solr. Very high-volume and volatile
data, together with stringent scalability requirements, may call for the use
of a distributed search technology such as Elasticsearch to sustain
performance in the face of heavy user loads.

3. Prediction of relevant items and their ratings:

The current actions and the data input from the user are overlaid on the
information in the data store to generate the predictions. As mentioned, a
variety of approaches may be used, such as collaborative filtering,
item-to-item filtering, content filtering and their hybrids, with algorithms
ranging from primitive Euclidean-distance-based similarity measures to
sophisticated ones built on statistical and machine learning techniques,
depending on the sophistication of the implementation. Comparison and
similarity analysis of user models is especially important in collaborative
filtering scenarios, which are particularly suited to content that is not
amenable to machine analysis, such as images and videos (a minimal
similarity sketch follows).
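
The following is a minimal item-to-item similarity sketch, assuming Python
with NumPy and a synthetic user-item rating matrix; real systems work at
far larger scale with the big data stacks described next.

```python
# Minimal item-to-item sketch: cosine similarity between item columns
# of a small user-item rating matrix. Ratings are synthetic; 0 = unrated.
import numpy as np

ratings = np.array([        # rows = users, columns = items
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

n_items = ratings.shape[1]
similarity = np.array([[cosine(ratings[:, i], ratings[:, j])
                        for j in range(n_items)] for i in range(n_items)])

# For item 0, the most similar other item ([-1] is item 0 itself)
print("item most similar to item 0:", np.argsort(similarity[0])[-2])
```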

The technologies used in the implementation of these modules typically
consist of big data tools like Hadoop, MapReduce and Spark, leveraging
the whole array of horizontally scalable NoSQL big data stores such as
HBase, Cassandra and Neo4j. Machine learning technologies specially built
for big data, like Mahout with its several built-in algorithms for
predictions, are gaining popularity for generating real-time results while
serving millions of simultaneous user requests.


4. Recommendations based on the predictions:

This module consists of logic to generate user-friendly and valuable
recommendations based on the ranked or weighted predictions from the
previous step. The predictions are stored and combined with some
application context and user choices to generate a set of
recommendations catering to the user's interests. For example, the
recommendations may be items that have a similarity index above a
certain threshold, determined by the algorithm or specified by the user, or
the top "n" similar items. Recommendations are often based on user
"neighbourhood" considerations, which confine the "distance" between
users to a certain computed or configured limit specified in terms of
similarity parameters. Considering the potentially very large number of
combinations of diverse factors on which recommendations can be based,
this module will typically allow extensive configuration at several levels
(administrators, business users and consumers) for filtering, formatting
and presenting results.

The above description outlines the essential components of a basic
recommendation system. As mentioned earlier, real-world
implementations may vary in their complexity and sophistication. In
addition to the components described, these implementations may have
components for caching data such as user profiles and user models, item
data, computed similarity metrics, etc., for real-time performance
optimisation. The design may be further fine-tuned to maximise online
performance by moving data store maintenance offline or to asynchronous
operation. A recent innovation in recommendation systems is to improve
the quality of recommendations by factoring in the user's location and
other types of context information that mobile devices are capable of
delivering. In such cases, the recommendation system may include
additional components to cater to such scenarios.


12.11 SUMMARY

There are a number of features available in Excel to make your task
easier. A spreadsheet is a large sheet with data and information arranged
in rows and columns. As you know, Excel is one of the most widely used
spreadsheet applications; it is a part of the Microsoft Office suite. A
spreadsheet is quite useful for entering, editing, analysing and storing
data. Arithmetic operations with numerical data, such as addition,
subtraction, multiplication and division, can be done using Excel. You can
sort numbers/characters according to given criteria (ascending,
descending, etc.).

Before using the sort function or pivot tables, the data must be cleaned.
This means that the first step in data analysis is to go through the data
and ensure that the style of data entry is consistent within the columns. In
this case, for diagnosis, it is important to make sure that only one word,
phrase or abbreviation is used to describe each diagnosis. If multiple
words are used to describe the same thing, the analysis will be more
difficult, so it is best to choose one term and use it consistently. It may be
necessary to change the terminology used in the data set in order to be
consistent throughout; such changes ought to be made at a preliminary
stage.

Drag the column to be sorted to the box with the heading "Rows", and
Excel provides an automatic breakdown which can be used to calculate
percentages and to create graphs. Alternatively, to sort first by another
column and then by diagnosis, switch the order in the "Rows" box. Fields
can be added or removed as necessary. It may be helpful to practise
dragging different fields to different categories in order to develop an
understanding of how pivot tables work (a programmatic equivalent is
sketched below).
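
For readers who prefer a programmatic route, here is a minimal pandas
equivalent of the pivot-table breakdown described above; the ward and
diagnosis records are hypothetical.

```python
# Minimal pandas equivalent of an Excel pivot-table count breakdown.
# The ward/diagnosis records are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "ward":      ["A", "A", "B", "B", "B", "A"],
    "diagnosis": ["flu", "asthma", "flu", "flu", "asthma", "flu"],
})

# Count patients per ward and diagnosis, like dragging fields to "Rows"
pivot = pd.crosstab(records["ward"], records["diagnosis"])
print(pivot)

# Percentage breakdown within each ward
print(pivot.div(pivot.sum(axis=1), axis=0).round(2))
```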

Once the data is analysed, it is often used to create a display so that
others can quickly and easily understand the results. One way to do this is
to create a chart using Excel. First create another table to more easily
show the breakdown of the numbers with a certain diagnosis.

When analysing the data, it is critical to report all results, even if they
seem insignificant. It is also essential not to lump data analyses together
and make generalisations.


Every enterprise has strategic and secure information about its human
capital. The internal data sources may range from employee profiles,
compensation and benefits, employee performance, employee
productivity and so on, stored in a variety of technologies like ERP
systems, OLTP RDBMSs, Hadoop ecosystems, spread marts, data marts
and data warehouses. Some of the external data sources may include
compensation benchmarks, employee sentiments, thought leadership
contributions, etc. Enterprises have started gaining benefits by applying
analytics to this data.

Enterprises use IT as a business enabler. They invest in a variety of IT
resources like data networks, servers, data centre/cloud services,
software licences, maintenance software, end-user support and many
productivity tools. Due to multiple technology platforms, outsourcing
partners, the complex demands of users and the geographic spread of
operations, enterprise IT operations are becoming complex.

Enterprises leverage IT for many marketing activities. In its most basic
form, marketing managers study reports relating to customer segments,
revenue share, revenue mix, marketing expense trends, the sales pipeline,
marketing campaign performance and so on. Enterprises use BI solutions
to slice and dice customer data, understand buyer behaviour in various
market segments and generate alerts against preset thresholds; this is
termed descriptive analytics.

In respect of industries, one needs an understanding of industry domain
basics, trends and common current challenges in order to develop
industry-specific analytics solutions.

The design, development and deployment steps of analytical application
development are:

1. Defining the problem

2. Setting up the technical environment and processing the data

3. Running the initial analysis/model

4. Evaluating the initial results

5. Selecting the final model

6. Testing the final model

7. Applying the model and validating by usage

Most industries tend to look at applying analytics in the following areas:

• Processing social media data for business benefits: telecom CSPs stand
to understand the voice of their subscribers, HR can understand the
sentiments of employees and partners, hospitals can discover unmet
needs of patients, and IT functions can understand business users'
challenges and service-level expectations. Social media analytics, web
analytics and digital analytics serve these needs across enterprises.

• All product and service enterprises, without exception, strive to acquire
new customers. Each one likes to customise and personalise offers so
that prospects see value and make buying decisions. One of the most
common approaches enterprises take is the recommendation system or
engine to predict the most likely buyers. The recommendation engine, a
common analytics paradigm, is used to recommend books, gift items
for various occasions, doctors in local areas, household items for online
purchase: the list is endless.

Today, we can see recommendation systems being used in a variety of
scenarios, such as:

• Restaurants to dine in at a new location you are visiting
• Music you may want to listen to next
• Newspaper or magazine articles to read next
• The right doctor in your neighbourhood for treatment
• The best auto insurance policy to buy
• The next vacation travel spot
• The best online store for your groceries, and so on

Recommender systems are based on several different techniques, such as
collaborative filtering, content filtering, social filtering or various hybrid
approaches that use a combination of these.


12.12 SELF ASSESSMENT QUESTIONS

1. What is a spreadsheet? Describe the features of a spreadsheet.

2. When is a pivot table used? Why? Explain.

3. Write a short note on how analytics is used in industries.

4. Explain the important steps in analytical application development.

5. Explain the anatomy of recommendation systems.

12.13 MULTIPLE CHOICE QUESTIONS

1. Before using the sort function or pivot tables, the data must be
-------------------
a. Cleaned
b. Sorted
c. Analysed
d. Arranged

2. In marketing analytics, exploratory analytics allows knowledge workers
to find new business opportunities by discovering -------------------
a. Available data
b. Hidden data patterns
c. Data on past performance
d. Past trend analysis of given data

3. From which sources does the value of analytics come in retail
analytics?
a. Gaining insight to improve process and resource optimisation
b. Personalising and localising the offers
c. Creating community for branding and customer engagement
d. All of the above three sources


4. While developing the analytical application, if the results are not in line
with what you were expecting, what will you do?
a. Try using a different model/algorithm
b. Consider collecting more or different data
c. Consider redefining or reframing the problem, changing the question
and the means to an answer as you better understand your data and
your environment
d. Use all of the above 3 steps in a sequential manner

5. The recommendations-based-on-predictions module consists of logic to
generate user-friendly and valuable recommendations based on
………………..
a. Ranked predictions only
b. Weighted predictions only
c. Either ranked or weighted predictions from the previous step
d. All the previous steps used

Answers: 1.(a), 2.(b), 3.(d), 4.(d), 5.(c)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture


Chapter 13
Programming Languages & Softwares Used
in Data Analytics
Objectives:

This chapter consists of two parts: the first covers the programming
languages used in data analytics, and the second covers the software
most used in data analytics. On completion of this chapter, you will be
able to understand the most used programming languages, as well as the
software developed using those languages and used by entrepreneurs in
their businesses. The following points will help us understand this chapter
better.

Structure:

13.1 Introduction

13.2 Programming Languages in data Analytics

13.3 Types of Programming Languages

13.4 Programming Languages for Data Science

13.5 Software used in data analytics

13.6 Other Recommended solutions

13.7 Maximising data analytics software

13.8 Summary

13.9 Self Assessment Questions

13.10 Multiple Choice Questions


13.1 INTRODUCTION

Programming forms the backbone of software development. Data science
is an agglomeration of several fields, including computer science. It
involves the use of scientific processes and methods to analyze and draw
conclusions from data. Specific programming languages designed for this
role carry out these methods. While most languages cater to the
development of software, programming for data science differs in the
sense that it helps the user to pre-process, analyze and generate
predictions from data. These data-centric programming languages are
able to carry out algorithms suited to the specifics of data science.
Therefore, in order to become a proficient data scientist, you must master
one of the following data science programming languages. In today's
highly competitive market, which is anticipated to intensify further, data
science aspirants are left with no option but to upskill and upgrade
themselves as per industry demands. The prevailing situation underlines
the mismatch between the demand for and supply of data scientists and
other data professionals in the market, which makes this a great time to
grab better and more progressive opportunities. Knowledge and
application of the programming languages that amplify the data science
industry are must-haves.

We will discuss a few languages used in data analytics in this chapter.
Similarly, data scientists have developed software tools that are used in
business analytics by various entrepreneurs, depending upon the type of
their activities and needs. Gone are the days when a data analyst knew or
worked on just one tool. Anyone who works with data these days is well
versed in multiple software tools. But are there any tools that are
essential for any data analyst? Of course there are! There are some tools
that a data analyst has to know to make work and life that much easier
and more efficient. These software tools are also discussed briefly in this
chapter.


13.2 PROGRAMMING LANGUAGES IN DATA ANALYTICS

With 256 programming languages available today, choosing which
language to learn can be overwhelming and difficult. Some languages
work better for building games, while others work better for software
engineering, and others work better for data science.

13.3 TYPES OF PROGRAMMING LANGUAGES

A low-level programming language is the most understandable language
used by a computer to perform its operations. Examples of this are
assembly language and machine language. Assembly language is used for
direct hardware manipulation, to access specialized processor
instructions, or to address performance issues. A machine language
consists of binaries that can be directly read and executed by the
computer. Assembly languages require assembler software to be
converted into machine code. Low-level languages are faster and more
memory-efficient than high-level languages.


A high-level programming language has a strong abstraction from the
details of the computer, unlike low-level programming languages. This
enables the programmer to create code that is independent of the type of
computer. These languages are much closer to human language than a
low-level programming language and are also converted into machine
language behind the scenes by either the interpreter or compiler. These are
more familiar to most of us. Some examples include Python, Java, Ruby,
and many more. These languages are typically portable and the
programmer does not need to think as much about the procedure of the
program, keeping their focus on the problem at hand. Many programmers
today use high-level programming languages, including data scientists.

13.4 PROGRAMMING LANGUAGES FOR DATA SCIENCE


13.4.1 Python

In a recent worldwide survey, it was found that 83% of almost 24,000
data professionals used Python. Data scientists and programmers like
Python because it is a general-purpose and dynamic programming
language. Python seems to be preferred for data science over R because it
ends up being faster than R for loops with fewer than 1000 iterations. It is
also said to be better than R for data manipulation. The language also
contains good packages for natural language processing and machine
learning and is inherently object-oriented (a minimal example follows).
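
As a minimal illustration of everyday Python data analysis, here is a short
pandas sketch; the sales figures are hypothetical.

```python
# Minimal pandas sketch: build a small table, filter it and aggregate.
# The sales figures are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "South", "North", "South", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "revenue": [100, 150, 200, 130, 90],
})

# Keep only the larger deals, then total revenue by region
big_deals = sales[sales["revenue"] > 100]
print(big_deals.groupby("region")["revenue"].sum())
```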

13.4.2 R

R is better for ad hoc analysis and exploring datasets than Python. It is an
open-source language and software environment for statistical computing
and graphics. It is not an easy language to learn, and most people find
that Python is easier to get the hang of. With loops that have more than
1000 iterations, R actually beats Python using the lapply function. This
may leave some wondering if R is better for performing data science on
big datasets; however, R was built by statisticians and reflects this in its
operations. Data science applications feel more natural in Python.

13.4.3 Java

Java is yet another general-purpose, object-oriented language. The
language is very versatile, being used in embedded electronics, web
applications and desktop applications. It may seem that a data scientist
would not need Java; however, frameworks such as Hadoop run on the
JVM, and these frameworks constitute much of the big data stack.


13.4.4 Hadoop

Hadoop is a processing framework that manages data processing and
storage for big data applications running in clustered systems. This allows
storage of massive amounts of data and enables more processing power,
with the ability to handle virtually limitless tasks at once. Additionally,
Java actually does have a number of libraries and tools for machine
learning and data science, it is easily scalable for larger applications, and
it is fast (a toy map/reduce illustration follows).
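
To illustrate the map/reduce idea that Hadoop implements at cluster
scale, here is a toy single-process sketch in Python; it is only an analogy,
not Hadoop code.

```python
# Toy illustration of the MapReduce word-count pattern in one process.
# Real Hadoop distributes the map and reduce phases across a cluster.
from functools import reduce
from collections import Counter

docs = ["big data big insight", "data drives decisions"]

# Map phase: each document is turned into partial word counts
mapped = [Counter(doc.split()) for doc in docs]

# Reduce phase: partial counts are merged into a global count
totals = reduce(lambda a, b: a + b, mapped, Counter())
print(totals)
```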

13.4.5 SQL

SQL (Structured Query Language) is a domain-specific language used for
managing data in a relational database management system. SQL is
somewhat like Hadoop in that it manages data; however, the way the
data is stored is much different. SQL tables and SQL queries are critical
for every data scientist to know and be comfortable with. While SQL
cannot be used exclusively for data science, it is imperative that a data
scientist knows how to work with data in database management systems
(a minimal example follows).
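
Here is a minimal SQL sketch using Python's built-in sqlite3 module; the
table and its rows are hypothetical.

```python
# Minimal SQL sketch with Python's built-in sqlite3 module.
# The orders table and its rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")     # throwaway in-memory database
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("North", 120.0), ("South", 80.0), ("North", 200.0)])

# A typical analyst query: total order amount per region
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(region, total)
conn.close()
```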

13.4.6 Julia

Julia is another high-level programming language and was designed for
high-performance numerical analysis and computational science. It has a
very wide range of uses, such as web programming for both the front end
and back end. Julia can be embedded in programs through its API and
supports metaprogramming. The language is said to be faster than
Python because it was designed to quickly implement mathematical
concepts like linear algebra and deals with matrices better. Julia provides
the speedy development of Python or R while producing programs that
run as fast as C or Fortran programs would.


13.4.7 Scala

Scala is a general programming language that provides support for
functional programming, object-oriented programming, a strong static
type system, and concurrent and synchronized processing. Scala was
designed to address many issues that Java has. Once again, this language
has many different uses, from web applications to machine learning;
however, in web development this language only covers the front end.
The language is known for being scalable and good for handling big data,
as the name itself is a blend of "scalable" and "language". Scala paired
with Apache Spark allows parallel processing to be performed on a large
scale. Furthermore, there are many popular, high-performance data
science frameworks written on top of Hadoop that can be used from
Scala or Java.

In conclusion, Python seems to be the most widely used programming
language for data scientists today. The language allows the integration of
SQL, TensorFlow and many other useful functions and libraries for data
science and machine learning. With over 70,000 Python libraries, the
possibilities within this language seem endless. Python also allows a
programmer to create CSV output so data can easily be read in a
spreadsheet. My recommendation to newly aspiring data scientists is to
first learn and master Python and SQL data science implementations
before looking at other programming languages. It is also apparent that it
is imperative for a data scientist to have some knowledge of Hadoop.

13.5 SOFTWARE USED IN DATA ANALYTICS:

The best data analytics software for 2020 is Sisense because of its simple
yet powerful functionalities that let you aggregate, visualize, and analyze
data quickly. Moreover, this platform has a scalable architecture that allows
it to handle a wide range of data volumes, making it great for small and
large businesses alike.

The digital age has made it easier for professionals to access data that
can help you optimize your business performance. However, to leverage
this information, you will need data analytics software that can provide
you with tools for data mining, organization, analysis, and visualization.
Moreover, it should be equipped with AI and advanced algorithms to
transform your raw data into valuable insights instantly. This way, you
can keep up with business trends, and even find ways to further improve
your overall operations.

However, there are plenty of factors involved in finding the right analytics
tool for a particular business. From checking its performance to figuring out
how well it plays with other systems, the research process can be
overwhelming. So, to help you, we have compiled the leading products on
the market and assessed their functionalities and usability. This way, it will
be easier for you to determine the best possible data analytics platform for
your operations.

Handling data from collection to visualization is a challenge in itself, even
more so as the amount of data you process grows. Furthermore, a
scalable system is a requirement as data grows in volume each day.
Hence, you'd want an application equipped with the architecture and
technology that can support your data analytics processes.

In Kaggle's survey on the State of Data Science and Machine Learning,
more than 16,000 data professionals from 171 countries and territories
revealed the challenges they encountered in the field. Respondents were
asked to choose all the factors that applied to the difficulties they faced.

13.5.1 Challenges Encountered by Data Professionals


• Dirty Data-35.9%
• Lack of Data Science Talent-30.2%
• Lack of Management or Financial Support-27%
• Lack of Clear Questions to Answer-22.1%
• Data Inaccessibility-22%
• Data Science Results Not Used by Decision-Makers-17.7%
• Explaining Data Science to Others-16%
• Privacy Issues-14.4%
• Lack of Significant Domain Expert Input-14.2%


• Organization Cannot Afford a Data Science Team-13%

Results showed that the top challenge for data scientists is dirty
data (36%). Next comes the lack of data science talent (30%), company
politics (27%), not having clear questions (22%), inaccessible data (22%),
and results not used by decision-makers (18%).

There are also problems with the difficulty of explaining data science to
others (16%) and privacy issues (14%). Meanwhile, 13% of data
professionals revealed that their small organizations couldn't afford a data
science team.

These issues point to the importance of maximizing the capabilities of
technology to do much more than recognize your organization's
bottlenecks. Today's data analytics tools are easier to use and more
affordable for companies of all sizes. This is vital, especially when
considering the value they can bring to your organization.

For instance, content companies can use a data analytics tool to keep
their audiences clicking on and watching their content. Another example
is gaming companies getting their hands on relevant data to keep players
active in the game by providing rewards.

Of course, it's nothing new for companies to try to be data-driven in their
decision-making processes. According to a report by NVP, 85% of
companies are transitioning to become more data-driven; however, only
37% of them succeed.


13.5.2 Choosing the right data analytics software

To ensure effective implementation, one of the first things to prioritize is
choosing the right data analytics software. An excellent place to start is
getting to know the leading products in the niche by checking out our list
of the 20 best data analytics software. This way, you can further assess
exactly how you can leverage analytics to refine your workflows and
boost your products and services.

1. Sisense:

Sisense offers a robust data analytics system that brings analytics not just
to data scientists, but to all business users as well. It simplifies business
data analytics even to non-technical users through its set of tools and
features. Insights are extracted instantly by any user using self-service
analytics without hard coding and aggregating modeling. Some of its top
features that enable you to do so include its personalized dashboards,
interactive visualizations, and analytical capabilities.

Its dashboard is one of its top features that enable you to filter, explore,
and mine data in just a few clicks to get instant answers to your questions.
With its in-chip technology, data analytics can be performed faster with
richer insights. Furthermore, it provides you with advanced analytics
through an improved, advanced BI reporting and predictive analytics by
integrating R functions in your formulas.

It is best to test the features and functionalities of the tool first so that
you'll know if it matches your requirements. To do so, you may sign up
for a Sisense free demo.


Why choose Sisense?

• NLG technology. With NLG (natural language generation) technology,
you can easily detect trends and patterns as you interact with every
single widget on the dashboard.
• Data visualization. It has a rich widget library with a wide collection of
predesigned data visualization widgets. You also have the option to
submit your own open-source designs or receive recommendations on
how to best view your data.
• Anomaly detection. If there are any anomalies in your data, the
system uses machine learning to instantly detect them and alert you to
any potential issues.

2. Looker
Looker is a data analytics platform that allows anyone to ask sophisticated
questions using familiar business terms. It delivers data directly to the
tools and applications used by your team, including custom ones.

In simple terms, the platform gathers and extracts data from various
sources and then loads it into an SQL database. From there, it undergoes
the platform’s agile modeling layer for custom business logic and, finally,
makes it available for all users through dashboards, shared insights, and
explorations.

As a browser-based solution, data is conveniently accessible in your
existing systems and easily shareable with everyone in your team.
Exporting can also be done both locally and directly to platforms such as
Google Drive and Dropbox. If you are interested in getting to know this
solution's features better, the Looker free demo can be a great way to get
started.

Why choose Looker?


• Accessible data. The app doesn’t lock your data in your data analytics
tool. Instead, it can be accessed through systems such as Slack and
Salesforce.
• Data scheduling. Anyone in the team can schedule the delivery of data to
FTP, S3, chat, webhooks, and email, among others.
• Web integration. This provides you with a responsive mobile design, a
solution that’s embeddable with SSO, and a full RESTful API.


3. Zoho Analytics

Formerly Zoho Reports, the business data analytics platform Zoho
Analytics hosts different components. These include KPI widgets, pivot
tables, and tabular view components. Thus, it can generate reports that
offer very useful insights. The system promotes collaborative review and
analysis, allowing users to work with colleagues on report development
and decision-making.

What's good about the system is that businesses can embed just about
any report or dashboard in their blogs, websites, and apps. The system
even has state-of-the-art security practices that include connection
encryption. It can also be used by ISVs and developers in building and
integrating analytical and reporting functionalities into their systems.
Zoho Analytics offers a free trial, so you can test drive its features at no
cost.

Why choose Zoho Analytics?

a. Insightful reports. The solution makes use of a variety of


components, allowing it to generate insightful reports, which can be
used as a basis for data-driven decisions.

b. Highly-secure system. It makes use of only the finest security


measures, including encrypted connections.

c. Collaboration. Zoho Analytics promotes collaboration, allowing


colleagues to jointly develop reports and decisions.

4. Yellowfin

Yellowfin is an end-to-end business intelligence solution created to help
companies make better sense of their data. Equipped with comprehensive
analytics features, market-leading collaboration tools, and machine
learning capabilities, it is great for getting actionable insights from your
company's performance. It allows you to get data-driven predictions that
can be used to make smarter business decisions. The software even has
multiple data visualization options so you can present your data however
you prefer.


Accessible via desktop and mobile devices, Yellowfin also comes with a web
API that lets it integrate with a wide variety of business systems, add-ons,
and widgets. This means you can easily extend its functionalities depending
on the changing needs of your business. Alternatively, you can merge it
with your existing software solutions to streamline your workflow. The
vendor has an appealing free trial where you can tinker with the features
at no cost.

Why choose Yellowfin?


• Fully integrated BI platform. Yellowfin was created to eliminate having
to invest in different BI solutions. With this, you can address multiple
analytics problems and consolidate all of your data discovery, reporting,
and analytics in one platform.
• Data storytelling. Traditional graphs and charts can be boring and
difficult to understand. Using Yellowfin’s data storytelling capability, you
can create interactive presentations using different immersive analytics.
This way, you can simplify the data visualization process and let your
numbers do the talking.
• Customizable notifications. On top of presenting you with
comprehensive reports, Yellowfin also makes sure you are in-the-loop at
all times. Offering custom notifications, you will be alerted whenever
there are notable changes in your data flow.

5. Periscope Data

Periscope Data by Sisense combines business intelligence and analytics
into a single end-to-end platform, providing modern businesses with a
solution for collecting data, analyzing it and sharing the insights garnered
with the team and their customers.

It aggregates distinct data sources into one single source of truth and then
utilizes advanced analytics and BI reporting to make the most out of them.
The platform offers a wide variety of visualizations and charts to choose
from, and you can even create your very own if that’s the more beneficial
course of action to take for your organization.

Its predictive analysis feature, together with its Natural Language
Processing capability, which can streamline the analysis of unstructured
data, helps keep your business competitive and proactive. Working with
data is also made simpler and more efficient thanks to the integrated
Python, R and SQL environments.

Why choose Periscope Data?


• Data Engine. It comes with a data engine that lets you spend less time
managing several tech stacks so that query performance is at its fastest.
When paired with data ingestion at the most optimal speed, this virtually
guarantees that you’ll have ample time analyzing data regardless of its
complexity, size, and concurrency.
• Powerful SQL Editor. Periscope Data comes with advanced
functionalities, such as autocomplete, query revision history and snippets
mode. Moreover, it’s an environment that supports Python, R, and SQL.
This means that preparing datasets, performing analysis and creating
interactive visualizations are all easier to do in the language of your
choice.
• Insights sharing. Live dashboards can be seamlessly and quickly
shared with the team either using email, direct link or via Slack. This
keeps everyone on the same page. Dashboards can even be embedded
directly into your website or product, so end users can easily explore
data whenever convenient to them.

6. Domo

Domo is a data analytics solution that aims to provide a digitally
connected environment for your data, people, and systems. Your
business's data is put to work for all employees in your organization
through real-time data refresh and drag-and-drop data preparation
capabilities. Furthermore, partners outside of your organization also get
to engage with your data to increase productivity and the ability to act on
it.

Using a holistic view of your system, you can make more informed
actions with the tool's 7 platform components working together. You are
notified with predictive alerts that bring crucial matters and issues to your
attention with enough time before they make an impact on your
organization.


Why choose Domo?

a. Connected data. Bring your data together with over 500 data connectors from third-party sources, including cloud, on-premise, and proprietary systems.

b. Instant data-driven charts. It has more than 300 interactive charts and dashboards, both for desktop and mobile use.

c. Native mobile apps. Manage responsibilities on Android and iOS apps that are intuitive, real-time, and designed for on-the-go usage.

7. Qlik Sense

Qlik Sense is powered by the associative engine to deeply extract insights commonly missed by other query-based data analytics tools. It does so by
indexing each possible relationship between data and combining them from
various data sources into a centralized view. The cloud-based data
analytics platform provides you with flexibility in providing the right
solution for various cases for analysts, teams, and global enterprises.

The absence of pre-aggregated data from standard query-based tools paves the way for asking new questions and generating analytics even
without waiting for the help of experts in building new queries. Sharing of
insights is made with ease regardless of your organization’s size as the
system enables work collaboration in a secure, unified hub.

Why choose Qlik Sense?


• Smart visualization. Find insights visually with the platform’s fully interactive interface, which enables you to pan, zoom, and make selections to explore and pinpoint your data effectively.
• Interactive analysis. Interactive selection and global search let you
explore data and ask any question with no limits to exploration. Each
click also updates analytics instantly to ensure the most up-to-date
version is available.
• Flexible for any device. Touch interaction and a responsive mobile design ensure you can work on any device: build the analytics app once, then explore, collaborate, and create analytics anywhere.


8. GoodData

GoodData is an end-to-end, secure cloud data analytics system that caters to your entire data pipeline—from the moment you take in data to the time you deliver the insights you generated. It is available not only to enterprises but to partnerships and software companies as well. The product
is mostly used in industries such as insurance, retail, financial services, and
ISV.

This smart business application integrates insights directly into your point
of work to expedite the decision-making process. Improvements are also
automated over time as it learns from user actions and is capable of
making data-driven predictions. On top of that, the tool ensures enterprise-grade security and compliance with HIPAA, GDPR, SOC 2, and ISO 27001, among others.

Why choose GoodData?


• Industry-specific solutions. The tool has solutions specifically built to
cater to the needs of ISV, retail, financial services, and insurance
industries.
• Quick implementation. Deployment is quick, so you can start using the system within 8 to 10 weeks.
• Embedded analytics. Analytics is embedded in your application so you
can extend it for any use case such as machine learning, benchmarking,
basic reports, and advanced analytics.

9. Birst

Birst is a solution that utilizes data analytics in a network that connects your insights for making smarter business decisions. These networked
analytics solutions combine the speed, agility, and usability of consumer-
grade desktop tools with the needs of IT specialists for data governance
and scalability. It has a multi-tenant cloud architecture that enables the
expansion of data analytics across departments, product lines, and regions.

Its specialty lies in its 2-tier approach for end-user data visualization,
querying, and production-oriented business intelligence. You can extract
data and maximize connectivity options in various databases and cloud or on-premise applications. Developers can build their connections, or they may use its numerous integrations with third-party systems.

Why choose Birst?


• User data tier. This lets you enable data as a service for data
governance by both centralized and decentralized teams. It is used for
aggregating and governing a complex mix of enterprise data with agility
and speed.
• Adaptive user experience. Users get a broad choice in how they
interact with data that adapts to modern work styles. Regardless of the
device used, you will see the same consistent interface with the tools
that you prefer.
• Multi-tenant cloud architecture. It connects everyone in a single,
networked view of data through the virtualization of the entire data
analytics ecosystem.

10. IBM Analytics

IBM Analytics is a data analytics tool specializing in evidence-based insights to support crucial decision-making for your business. It simplifies how you collect, organize, and analyze your data, allowing for optimized procurement, management, and scaling. You even get freedom in how you collect all types of data from various data sources.

The tool also lets you build a secure foundation for your analytics and
organize your data in a business-aligned, single source of truth.
Furthermore, it enables you to scale your insights by incorporating
evidence-based insights into your decisions that were previously
unobtainable. This can help you analyze data more smartly.


Why choose IBM Analytics?


• Machine learning. Expedite the deployment of data science projects by embedding machine intelligence into your enterprise applications.
• Prescriptive analytics. Consider business constraints and the optimization of business trade-offs to determine the best course of action for planning, design, scheduling, and configuration.
• Predictive analytics. This brings together data mining, text analytics,
predictive modeling, ad-hoc statistical analytics, and advanced analytics
capabilities, among others, to spot patterns in data and anticipate what
is most likely to happen next.

11. IBM Cognos

IBM Cognos is the solution to consider when you’re looking to make business decisions quickly with the use of smart self-service capabilities.
This tool provides IT with a solution to deploy in the cloud or on-premise
according to the architecture they require. Additionally, it caters to
business users who want to create and configure dashboards and reports
on their own.

One of its top features is its self-service functionality, which enables users to interact with and access reports on mobile devices, both online and offline. When it comes to analytics, the tool also offers a wide selection of analysis methods, including trend analysis, analytical reporting, and what-if analysis.

Why choose IBM Cognos?


• Smart self-service. As an integrated solution, you can efficiently deliver
mission-critical analytics and generate insights from data in an
impressive presentation and visualization.
• Robust automation. To increase productivity across teams,
ecosystems, and organizations, it uses smart technology to automate the
analytics process, offer recommendations, and predict user intent.
• Complete cloud-based experience. The user experience remains
consistent whether you are using the tool via desktop or mobile device as
it doesn’t require any desktop tool. This also eliminates the need for you
to transfer data as it lives in the cloud.


12. IBM Watson

IBM Watson is an analytics platform that streamlines leveraging interactions, predicting disruptions, and accelerating research through the use of artificial intelligence. This advanced data analysis and visualization solution lives in the cloud and provides a reliable guide to users over the discovery and analysis of their data.

Independently unravel patterns and meaning in your data through guided data discovery and automated predictive analytics. Even without the help
of a professional data analyst, you can interact with data and gather
answers that you can understand using the tool’s cognitive capabilities like
natural language dialogue. This means any business user can immediately
determine a trend and visualize the data report in the dashboard for an
effective presentation.

Why choose IBM Watson?


• Smart data discovery. Using your own words, you can easily type a
question that will add or connect to data for you to gather
understandable insights on the go. Whether you’re on desktop or iPad,
you immediately get a roster of starting points.
• Analysis of trusted data. Since data analytics comes in many forms, the tool helps you stay in sync when exploring, predicting, and assembling data for a trusted insight.
• Simplified analysis. You can be prepared to act with confidence when
you identify patterns and factors that can potentially drive business
outcomes.

13. MATLAB

MATLAB is a data analytics platform commonly used by engineering and IT teams to support their big data analytics processes. It enables you to
access data from various sources and formats such as IoT devices, OPC
servers, File I/O, databases, data warehouses, and distributed file systems
(Hadoop) in a single, integrated environment.

Before the development of predictive models, the system empowers you to pre-process and prepare your data by automating tasks such as cleaning data, handling missing data, and removing noise from sensor
data. You can then directly forecast and predict outcomes by building
predictive models and prototypes. Furthermore, the system lets you
integrate the tool with production IT environments even without recoding
or building a custom infrastructure.
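Those preparation steps are tool-agnostic, so a rough Python sketch can make them concrete; the snippet below fills gaps, clips an outlier spike, and smooths a made-up sensor series with pandas (the numbers are purely illustrative, and this is not MATLAB code).

    import numpy as np
    import pandas as pd

    # A hypothetical sensor series with a missing sample and a noise spike.
    readings = pd.Series([10.1, np.nan, 10.4, 55.0, 10.3, 10.2, np.nan, 10.5])

    clean = readings.interpolate()  # fill in the missing samples
    # Replace outlier spikes with the median (a crude rule, for illustration).
    clean = clean.mask((clean - clean.median()).abs() > 2 * clean.std(),
                       clean.median())
    # Smooth residual noise with a centered rolling mean.
    smooth = clean.rolling(window=3, center=True, min_periods=1).mean()
    print(smooth)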

Why choose MATLAB?


• Machine learning. Offering a full set of statistics and ML functionalities,
it provides you with advanced methods like system identification, prebuilt
algorithms, financial modeling, and nonlinear optimization.
• Online deployment. The tool integrates with enterprise systems,
clouds, and clusters. Additionally, it can be targeted for real-time
embedded hardware.
• Physical-world data. It provides native support for binary, image,
sensor, telemetry, video, and other real-time formats.

14. Google Analytics

Google Analytics is one of the most popular and widely used data analytics tools, summarizing data on high-level dashboards and supporting a variety of funnel visualization techniques. At its core, it is a web
analytics service that is used for tracking and reporting about website
traffic. The freemium product provides an analysis of poorly performing
pages using various data-driven techniques.

Furthermore, this tool provides you with data that you can transform into
actionable insights for businesses of all sizes to garner a stronger result
across their websites, applications, and offline channels. Specializing in one
of the most important aspects of data analysis, this tool is essential for
building a tight data analysis framework for your organization.


Why choose Google Analytics?


• Data collection and management. You get a single, comprehensive
view of your customer that’s easily customizable according to your
business needs. Sharing this across your organization is streamlined as
well.
• Data activation. Used in marketing, data can be activated to leverage
marketing campaigns and explore new content and channels.
• Data analysis. Reporting and analysis tools are available to help you
segment and filter data according to your needs to have a better
understanding of your customer’s lifecycle.

15. Apache Hadoop

If you are looking for an open-source platform, Apache Hadoop is a good place to start for distributed storage and processing of large datasets. In
addition, it offers services for data access, governance, security, and
operations. It is a collection of utilities that facilitates using a network of multiple computers to solve problems over large data sets, on clusters built from commodity hardware.

This solution is fundamentally resilient to support large computing clusters. Failure of individual nodes in the cluster is rarely an issue; when a node does fail, the system automatically re-replicates its data and redirects work to the remaining nodes in the cluster. It is a highly scalable platform that stores, handles, and
analyzes data at a petabyte scale.

Why choose Apache Hadoop?


• Low cost. Since it is an open-source platform, it runs on low-cost
commodity hardware, making it a more affordable solution compared to
proprietary software.
• Flexible platform. Data can be stored in any format and parsed with a schema applied only when it is read (schema-on-read). Since structured schemas are not required before storing data, you may even store data in semi-structured and unstructured formats.
• Data access and analysis. Data analysts have the option to choose their preferred tools, as they can interact with data in the platform seamlessly using batch or interactive SQL, or low-latency access with NoSQL (a word-count sketch follows this list).
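To make Hadoop’s programming model concrete, here is the classic word-count example written for Hadoop Streaming, which lets you express the map and reduce steps as plain Python scripts that read stdin and write stdout; treat the file names and the launch command as illustrative, since the streaming jar’s name and location vary by distribution.

    #!/usr/bin/env python3
    # mapper.py: emit "word<TAB>1" for every word in the input split.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

    #!/usr/bin/env python3
    # reducer.py: sum counts per word (Hadoop delivers input sorted by key).
    import sys

    current_word, running_total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(current_word + "\t" + str(running_total))
            current_word, running_total = word, 0
        running_total += int(count)
    if current_word is not None:
        print(current_word + "\t" + str(running_total))

A job like this is typically launched with something along the lines of "hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper mapper.py -reducer reducer.py", where the jar name and paths are placeholders.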

16. Apache Spark

Apache Spark is a developer-friendly big data analytics platform that supports large-scale SQL, stream processing, and batch processing. Like Apache
Hadoop, it is an open-source platform in data processing that supports a
unified analytics engine for machine learning and big data.

To maximize this solution, you can run it on Hadoop to create applications that will leverage its power, derive deeper insights, and improve data science workloads in a single and shared database.

Consistent levels of response and service are expected with its Hadoop YARN-based architecture, which makes the tool one of the data access engines that work with YARN in HDP (Hortonworks Data Platform). This means the solution, along with other applications, can share a common dataset and cluster with ease.

Why choose Apache Spark?


• Unified solution. It caters to creating and combining complex workflows
as it comes packaged with support for SQL queries, graph processing,
machine learning, and higher-level libraries.
• Data processing engine. Data analysts can execute streaming, machine learning, and SQL workloads through development APIs that need fast access to datasets (see the sketch after this list).
• Easy-to-use APIs. The tool is easy to use as it offers a collection of
over 100 operators for data transformation and familiar data frame APIs
for semi-structured data manipulation.
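As a minimal sketch of those APIs (assuming pyspark is installed and a local Spark runtime is available; the file and column names are hypothetical), the same aggregation can be expressed both as a SQL query and through the DataFrame API:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sales-demo").getOrCreate()

    # Read a CSV into a DataFrame and register it with the SQL engine.
    sales = spark.read.csv("sales.csv", header=True, inferSchema=True)
    sales.createOrReplaceTempView("sales")

    # SQL workload and the equivalent DataFrame expression.
    spark.sql(
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    ).show()
    sales.groupBy("region").agg(F.sum("amount").alias("total")).show()

    spark.stop()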


17. SAP Business Intelligence Platform

SAP Business Intelligence Platform is a data analytics tool for monitoring key metrics and gaining valuable insight into customer behaviour while eliminating the guesswork in the process. At its core, it serves as a BI solution for delivering actionable information within your reach. It is available both in the cloud and on-premise to suit your requirements.

The platform offers various tools, including SAP Analytics Cloud and the SAP BusinessObjects BI Suite. They are used for solving specific business needs and leveraging decision-making. By supporting the collective IQ of your business, this tool is reliable in providing a high standard for enterprise data analytics and BI.

Why choose the SAP Business Intelligence Platform?


• Dashboards and apps. Dashboards are compelling and insightful, driving effective BI adoption across your organization.
• Data Visualization. To present the data in a way that’s understandable
for everyone involved, the tool helps business users in various skill levels
to easily understand and make use of data when making business
decisions.
• Self-service. By having access to data anytime and anywhere, decision-
making is made faster and more informed through real-time business
data.

18. Minitab

Minitab provides smart data analysis for businesses to improve quality and drive efficiency in their performance. The flagship product, called
Minitab Statistical Software, is used by companies to graph and analyze
their data. Meanwhile, Minitab 18 is a solution that provides robust
statistical software for all business users to find meaningful solutions to
your tough business problems.

Through these tools, you don’t need to be a statistics expert to understand and gain insights from your data, since Minitab’s Assistant will guide you along the way. Its other modules include Quality Trainer, for learning how to analyze data, and Companion, for centralizing the tools you need in reporting and analysis, such as process mapping, brainstorming tools, and quality function deployment.

Why choose Minitab?


• Cost-effective e-learning. To fully maximize the tool, you get e-
learning materials on how to analyze data and statistics online.
• Project Roadmaps. This feature streamlines the execution, replication,
and sharing of projects across your organization. Meanwhile, it also has
process mapping for constructing high-level and detailed flow charts.
• Assistant feature. This walks you through data analysis and helps you
interpret results with better accuracy. It can improve the quality of the
product and process as well as boost efficiency.

19. Stata

Stata is a data analytics and statistics software for obtaining, exploring, and manipulating data. It enables you to visualize, model, and curate
results in reproducible reports. What’s more, the platform is developed by
researchers for researchers to effectively support the needs of their fellow
professional software developers. It is a complete and integrated software
package for all tools needed in data management, analysis, and graphics.

The tool is known for being fast, easy, and secure. It has an intuitive
command syntax and a point-and-click interface that streamlines how
analyses are reproduced and documented for review and publication.
Regardless of when they were written, version control ensures the analysis scripts remain accurate and up-to-date, showing the same results.

Why choose Stata?


• Data management. You gain complete control over all types of data, as it works with byte, integer, long, float, double, and string variables. It can also combine and reshape datasets, monitor variables, and gather statistics across replicates or groups.
• Statistical tools. It packs hundreds of statistical features ranging from standard methods (basic tabulation and summaries, linear regression, and choice modeling) to advanced techniques (multilevel models, survey data, and structural equation modeling); a minimal regression sketch in Python follows this list.


• Publication-quality graphics. Have the option to write scripts for producing a wide variety of graphs in a reproducible manner or use point
and click to create your custom graph. This enables you to build a
publication-quality and distinctly-styled graph.
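As flagged above, here is a minimal Python sketch of one of those standard methods, ordinary least-squares linear regression, using statsmodels on made-up data (this illustrates the technique generically, not Stata’s own syntax):

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: hours studied versus exam score.
    hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    score = np.array([52, 55, 61, 64, 70, 73, 79, 83])

    X = sm.add_constant(hours)      # add the intercept term
    model = sm.OLS(score, X).fit()  # fit score = b0 + b1 * hours
    print(model.params)             # estimated coefficients
    print(model.summary())          # full regression table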

20. Visitor Analytics

Visitor Analytics is a simple-to-use analytics platform that allows users to quickly see and retrieve insightful information about their visitors’
experience. This user-friendly data analytics software displays
comprehensive metrics and charts for easy identification of website issues, income-generating factors, website visitor paths, and other information that can prove valuable to improve your business.

With Visitor Analytics, you have access to detailed dashboards that have all
the key information you need at a glance. Basically, the software provides
you with all the key data you need on a silver platter. You can also view all
information on your mobile devices, which means you can whip out your
phone to check on traffic stats, page performance, conversion rates, and
other metrics.

Why choose Visitor Analytics?


• Easily comprehensible data. Visitor Analytics removes the guesswork
and complexities of analyzing complex data and statistics. The software
provides you with all the key data and metrics you need for
understanding the experience of your visitors.
• Insightful information for business planning. Use the data to your
advantage by improving the layout for optimal visitor paths and
conversions.
• A wealth of visitor experience data. Capture data on devices and
browsers used, geographical location, page performance, behaviour
history, referrals, and others.


13.6 OTHER RECOMMENDED SOLUTIONS

1. Actian. This platform features data management, an analytics database solution, and data integration that let you process queries of various data types, such as SQL, embedded, relational, object, and NoSQL.
Moreover, it integrates with a variety of apps and databases in both on-
premise and cloud deployment model and is ideal for large and medium
enterprises.

2. Analyse-it. An Excel-based data visualization and statistical analysis tool, this software suits users who are familiar or comfortable with spreadsheets. Non-technical users with limited statistics skills will also find it intuitive, especially since files are saved as Excel workbooks, making collaboration simple.

3. Pyramid Analytics. A user-friendly high-end analytics platform, this comes with a self-service BI portal. Furthermore, it is also compatible
with any browser or device and features tools to generate various data
modeling that helps you understand complex business issues.

4. Arcadia Data. A BI platform with real-time capabilities that can scale to a variety of data modeling demands, Arcadia Data is an ideal tool to
help you understand complex statistics fast. Additionally, the system
has a strong orientation towards data security and aims to set the
standard on preventing data leaks.

5. SigmaPlot. A data analysis tool to help you create graphs fast, even for
non-technical users. Besides, this software can also integrate with Excel
for data organization and PowerPoint for presenting outputs.

6. dxFeed Bookmap. A revolutionary data visualization solution, dxFeed Bookmap provides users with a crystal clear view of the market to equip
them with essential data to make smarter business decisions. In
addition, the platform is primarily designed for traders who are looking
to track the dynamics of Depth of Market and explore the evolution of
the order book.

7. AtomLynx Insights Engine. This business intelligence solution and data analytics platform gathers all important metrics and information from disparate sources and places them on a single platform for fast and efficient data analysis. Moreover, it leverages AI to provide you with the relevant metrics you need.

8. Displayr. An intuitive BI system for beginners yet powerful for pros, Displayr features tools for visualization, exploration, modeling, and
reporting. Furthermore, it utilizes drag-and-drop variable manipulation,
full R Language support, and statistical testing.

9. Statistix. Featuring a menu-driven interface and easy navigation for non-statisticians, Statistix features powerful data manipulation
capabilities. The vendor also offers free technical support.

13.7 MAXIMISING DATA ANALYTICS SOFTWARE

The process of analyzing big data to improve your business operations doesn’t stop at purchasing data analytics software. If you want to achieve your goals, you should also put some elbow grease into learning the ins and outs of your new data analytics system. Moreover, it is pertinent that you have a clear-cut idea of how to use it to your advantage. To help you, we have listed some ways to maximize this type of technology.

• Define what you want to achieve using data analytics. Even before purchasing your data analytics program, try to list down what you plan to use it for. That is to say, you need to understand what aspects of your operations will benefit from using data, specify the type of insights you would want the platform to provide, and determine what you want to measure. By doing so, you will be able to specify your KPIs and other metrics as well as establish a good framework for how you can use your new investment.

• Determine where to source your data. Most analytics platforms have the capability to source data from multiple internal and external systems for a more convenient data collection and synchronization process. However, this doesn’t mean you should simply integrate your platform with all your existing business systems and call it a day. Be selective about where you source your data, choosing only those sources that will help you move your data analysis efforts forward. This way, you not only generate more targeted reports but also prevent your database from being cluttered with information that you won’t even use.

• Regularly assess your data models. You wouldn’t use outdated information to perform analysis, so why should you utilize outdated data models? To prevent old data models from having a negative effect on your data analytics efforts, make it a point to assess these models now and then. Check whether you are ignoring certain data sources or have overlooked how certain fields could affect your model. Perhaps certain data sources contain poor naming standards that are affecting your data analytics model. By taking this extra step, you can ensure that you are generating accurate reports that can drive your business forward.

By taking advantage of these tips as you implement your data analytics software, you are only a few steps away from reaping all the benefits that this technology has to offer. Hopefully, our list of the 20 best data analytics tools was able to guide you in finding the right platform for your operations.

To sum it up, we highly recommend choosing Sisense. This is because it offers a code-free self-service analytics system that is great for both tech-averse and tech-savvy users. Furthermore, it offers highly customizable dashboards, allowing it to easily adapt to your business’ data analytics and visualization needs.


13.8 SUMMARY

Data Science is a dynamic field with ever-growing technologies and tools. Since Data Science is a vast field, you must select a specific problem to tackle and choose the programming language best suited for it. The programming languages mentioned above focus on several key areas of Data Science, and one must always be willing to experiment with new languages based on the requirements.

There are two types of programming languages. A low-level programming language is the most understandable language for a computer to perform its operations. Assembly language is used for direct hardware manipulation, to access specialized processor instructions, or to address performance issues; a machine language consists of binaries that can be directly read and executed by the computer, while assembly languages require an assembler to be converted into machine code. Low-level languages are faster and more memory-efficient than high-level languages.

The other type, the high-level programming language, has a strong abstraction from the details of the computer, unlike low-level programming languages. This enables the programmer to create code that is independent of the type of computer. These languages are much closer to human language than a low-level programming language and are converted into machine language behind the scenes by either an interpreter or a compiler.
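As a small illustration of that behind-the-scenes translation, Python’s standard dis module can display the lower-level bytecode the interpreter generates from high-level source:

    import dis

    def add(a, b):
        return a + b

    # Disassemble the function to see the interpreter's bytecode,
    # e.g. LOAD_FAST instructions followed by a binary-add operation.
    dis.dis(add)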

Python, R, Java, Hadoop, etc. are some of the most used languages and frameworks in the development of software. Each has specific characteristics, and data scientists use them accordingly.

The digital age has made it easier for professionals to access data that
would allow you to optimize your business performance. However, to
leverage this information, you will need data analytics software that can
provide you with tools for data mining, organization, analysis, and
visualization. Moreover, it should be equipped with AI and advanced
algorithms to transform your raw data into valuable insights instantly. This
way, you can keep up with business trends, and even find ways to further
improve your overall operations.


However, there are plenty of factors involved in finding the right analytics
tool for a particular business. From checking its performance to figuring out
how well it plays with other systems, the research process can be
overwhelming. So, to help you, we have compiled the leading products on
the market and assessed their functionalities and usability. This way, it will
be easier for you to determine the best possible data analytics platform for
your operations.

The best data analytics software for 2020 is Sisense because of its simple
yet powerful functionalities that let you aggregate, visualize, and analyze
data quickly. Moreover, this platform has a scalable architecture that allows
it to handle a wide range of data volumes, making it great for small and
large businesses alike.

Data professionals encounter various challenges in their work; the list below shows each challenge together with the percentage of professionals reporting it:
• Dirty data: 35.9%
• Lack of data science talent: 30.2%
• Lack of management or financial support: 27%
• Lack of clear questions to answer: 22.1%
• Data inaccessibility: 22%
• Data science results not used by decision-makers: 17.7%
• Explaining data science to others: 16%
• Privacy issues: 14.4%
• Lack of significant domain expert input: 14.2%
• Organization cannot afford a data science team: 13%

The above information shows that the top challenge for data scientists is dirty data (36%). Next come the lack of data science talent (30%), lack of management or financial support (27%), not having clear questions to answer (22%), inaccessible data (22%), and results not used by decision-makers (18%).


These issues underscore the importance of maximizing the capabilities of technology to do much more than simply recognize your organization’s bottlenecks. Today’s data analytics tools are easier to use and more affordable for companies of all sizes. This is vital, especially when considering the value they can bring to your organization.

To ensure effective implementation, one of the first things to prioritize is choosing the right data analytics software. An excellent place to start is getting to know the leading products in the niche by checking out our list of the 20 best data analytics software in this chapter. This way, you can further assess exactly how you can leverage analytics to refine your workflows and boost your products and services.

By taking advantage of some of the tips provided at the end of the chapter as you implement your data analytics software, you may be only a few steps away from reaping all the benefits that this technology has to offer. The list of the 20 best data analytics tools should guide you in finding the right platform for your operations.

To sum it up, we highly recommend choosing Sisense. This is because it offers a code-free self-service analytics system that is great for both tech-averse and tech-savvy users. Furthermore, it offers highly customizable dashboards, allowing it to easily adapt to your business’ data analytics and visualization needs.

13.9 SELF ASSESSMENT QUESTIONS

1. What are the programming languages used in data analytics? Describe them.

2. Name any 3 programming languages which, according to you, are best to use in data analytics, and explain each in short.

3. What are the characteristic features of software used in data analytics? Explain in short.

4. Write short notes on: Sisense software.

5. What process steps are you required to take into consideration while maximising the scope of data analytics software?


13.10 MULTIPLE CHOICE QUESTIONS

1. A ---------------------- is the most understandable language used by a computer to perform its operations.
a. Low-level programming language
b. High-level programming language
c. Assembly language
d. Machine language

2. Which of the languages are much closer to human language and are
also converted into machine language behind the scenes by either the
interpreter or compiler?
a. Low-level programming language
b. High-level programming language
c. Assembly language
d. Program language

3. What programming language is mostly used by data scientists?


a. R
b. SQL
c. Java
d. Python

4. Sisense offers a robust data analytics system that brings analytics not just to data scientists but to all business users as well. Why?
a. NLG technology, Data Visualization and Anomaly detection
b. Accessible data, data scheduling and web integration
c. Insightful report, Highly-secure system and Collaboration
d. Data story telling Highly-secure system and Collaboration

5. Which data analytics platform is commonly used by engineering and IT teams to support their big data analytics processes?
a. ZOHO
b. MATLAB
c. Apache Hadoop
d. IBM Analytics

Answers: 1.(a), 2.(b), 3.(d), 4. (a), 5.(b)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture


Chapter 14
Business Analytics and Digital
Transformation
Objectives:

On completion of this chapter, you will come to know how existing business processes, culture, and customer experiences are re-imagined to meet changing business and market requirements. This re-imagining of business in the digital age is digital transformation. We will consider the following points in our discussion to understand the subject better in the context of current scenarios.

Structure:

14.1 Introduction

14.2 Definition

14.3 Difference between digitization, digitalization, and digital

transformation

14.4 What’s possible with digital transformation

14.5 Why Are Businesses Going Through Digital Transformations?

14.6 Why do businesses need to transform in the digital era?

14.7 Examples of Digital Transformation

14.8 How to Digitally Transform Your Business

14.9 Summary

14.10 Self Assessment Questions

14.11 Multiple Choice Questions


14.1 INTRODUCTION

The utilization of data analytics remains a common aspect of digital transformation. An increasing number of enterprises view data as a
commodity, which explains how this became a major area of technology
investment across several industries. Organizations are willing to invest
vast resources into gathering, creating, and analyzing data. It should also
be noted that there are different types of analytics for enterprises. Some of
the most common types of analytics include:

• Descriptive analytics: This includes the elementary reporting and business intelligence conducted by most enterprises.

• Prescriptive analytics: This involves technology that provides suggestions for human action.

• Predictive analytics: This includes using data insights to anticipate human action in the future. The predictions commonly come from a recommendation engine (a minimal sketch contrasting these types follows this list).
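As promised above, here is a minimal Python sketch, using scikit-learn and made-up monthly sales figures, that contrasts a descriptive summary with a predictive forecast; prescriptive logic would then recommend an action based on the forecast.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up monthly sales figures, purely for illustration.
    months = np.arange(1, 13).reshape(-1, 1)
    sales = np.array([10, 12, 13, 15, 14, 16, 18, 19, 21, 22, 24, 25])

    # Descriptive analytics: report what has already happened.
    print("average monthly sales:", sales.mean())

    # Predictive analytics: fit the historical trend, anticipate month 13.
    model = LinearRegression().fit(months, sales)
    print("forecast for month 13:", model.predict(np.array([[13]]))[0])

    # Prescriptive analytics would turn the forecast into a suggested action,
    # e.g. a reorder recommendation (business rule omitted here).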

The collection and organization of data analytics is a common aspect that is often tied to other digital transformation components, such as cloud computing and the Internet of Things (IoT). Data analytics is often the driving force for those organizations embarking on a digital transformation.
Numerous departments within an enterprise could have a technological
need for data analytics, such as sales departments that have customer
relationship management (CRM) software. One of the sectors that have
embraced data analytics is the financial services industry, which uses the
technology to help with the detection of fraud.

Across the board, enterprises in all industries —and of different sizes —can
greatly benefit from data analytics. Data analytics can also aid enterprise
automation processes for numerous applications, such as providing insight
about when a machine or a system will fail. Overall, enterprises that
embrace data analytics will see improved productivity, which will enhance
important business decisions.

Although there are many benefits associated with harnessing data analytics, there are also some risks to keep in mind. On the surface, data
might seem useful, but data by itself really does not provide immediate assistance. In order for that data to be used correctly, it has to be organized by employees. In some enterprises, that additional data is only
beneficial if it is acquired, organized, and shared in real-time.

14.2 DEFINITION:

Digital transformation is the process of using digital technologies to create new — or modify existing — business processes, culture, and customer
experiences to meet changing business and market requirements. This
reimagining of business in the digital age is digital transformation.

It transcends traditional roles like sales, marketing, and customer service. Instead, digital transformation begins and ends with how you think about,
and engage with, customers. As you move from paper to spreadsheets to
smart applications for managing your business, you have a chance to
reimagine how to do business — how to engage customers — with digital
technology on your side.

For small businesses just getting started, there’s no need to set up your
business processes and transform them later. You can future-proof your
organization from the word go. Building a 21st-century business on stickies
and handwritten ledgers just isn’t sustainable. Thinking, planning, and
building digitally sets you up to be agile, flexible, and ready to grow.

As they embark on digital transformation, many companies are taking a step back to ask whether they are really doing the right things. Before
looking at the how’s and what’s of transforming your business, you first
need to answer a fundamental question: How did you get from paper and
pencil record-keeping to world-changing businesses built on the backs of
digital technologies?


14.3 DIFFERENCE BETWEEN DIGITIZATION, DIGITALIZATION, AND DIGITAL TRANSFORMATION:

• Digitization is the move from analog to digital.

Not so long ago, businesses kept records on paper. Whether handwritten in ledgers or typed into documents, business data was analog. If you wanted
to gather or share information, you dealt with physical documents —
papers and binders, xeroxes, and faxes.

Then computers went mainstream, and most businesses started converting all of those ink-on-paper records to digital computer files. This is called
digitization: the process of converting information from analog to digital.

Finding and sharing information became much easier once it had been
digitized, but the ways in which businesses used their new digital records
largely mimicked the old analog methods. Computer operating systems
were even designed around icons of file folders to feel familiar and less
intimidating to new users. Digital data was exponentially more efficient for
businesses than analog had been, but business systems and processes
were still largely designed around analog-era ideas about how to find,
share, and use information.

• Digitalization is using digital data to simplify how you work.

The process of using digitized information to make established ways of working simpler and more efficient is called digitalization. Note the word
established in that definition: Digitalization isn’t about changing how you
do business, or creating new types of businesses. It’s about keeping on
keeping on, but faster and better now that your data is instantly accessible
and not trapped in a file cabinet somewhere in a dusty archive.

Think of customer service, whether in retail, field ops, or a call centre. Digitalization changed service forever by making customer records easily
and quickly retrievable via computer. The basic methodology of customer
service didn’t change, but the process of fielding an inquiry, looking up the
relevant data, and offering a resolution became much more efficient when
searching paper ledgers was replaced by entering a few keystrokes on a
computer screen or mobile device.


As digital technology evolved, people started generating ideas for using business technology in new ways, and not just to do the old things faster.
This is when the idea of digital transformation began to take shape. With
new technologies, new things — and new ways of doing them — were
suddenly possible.

• Digital transformation adds value to every customer interaction.

Digital transformation is changing the way business gets done and, in some
cases, creating entirely new classes of businesses. With digital
transformation, companies are taking a step back and revisiting everything
they do, from internal systems to customer interactions both online and in
person. They’re asking big questions like “Can we change our processes in
a way that will enable better decision-making, game-changing efficiencies,
or a better customer experience with more personalization?”

Now we’re firmly entrenched in the digital age, and businesses of all sorts
are creating clever, effective, and disruptive ways of leveraging technology.
Netflix is a great example. It started out as a mail order service and
disrupted the brick-and-mortar video rental business. Then digital
innovations made wide-scale streaming video possible. Today, Netflix takes
on traditional broadcast and cable television networks and production
studios all at once by offering a growing library of on-demand content at
ultracompetitive prices.

Digitization gave Netflix the ability not only to stream video content directly
to customers, but also to gain unprecedented insight into viewing habits
and preferences. It uses that data to inform everything from the design of
its user experience to the development of first-run shows and movies at in-
house studios. That’s digital transformation in action: taking advantage of
available technologies to inform how a business runs.


14.4 WHAT’S POSSIBLE WITH DIGITAL TRANSFORMATION

A key element of digital transformation is understanding the potential of your technology. Again, that doesn’t mean asking “How much faster can
we do things the same way?” It means asking “What is our technology
really capable of, and how can we adapt our business and processes to
make the most of our technology investments?”

Before Netflix, people chose movies to rent by going to stores and combing
through shelves of tapes and discs in search of something that looked
good. Now, libraries of digital content are served up on personal devices,
complete with recommendations and reviews based on user preferences.

Streaming subscription-based content directly to people’s TVs, computers, and mobile devices was an obvious disruption to the brick-and-mortar
video rental business. Embracing streaming also led to Netflix looking at
what else it could do with the available technology. That led to innovations
like a content recommendation system driven by artificial intelligence. Talk
about making the most out of your IT department!

• Adapt your business to leverage digital transformation.

Similarly, digital transformations have reshaped how companies approach customer service. The old model was to wait for customers to come find
you, whether in person or by calling an 800 number. But the rise of social
media has changed service much like it’s changed advertising, marketing,
and even sales and customer service. Progressive companies embrace
social media as a chance to extend their service offerings by meeting
customers on their platforms of choice. Making call centres and in-store
service desks run more efficiently with digital technology is of course great.
But real transformation comes when you look at all available technologies
and consider how adapting your business to them can give customers a
better experience. Social media wasn’t invented to take the place of call
centres, but it’s become an additional channel (and opportunity) to offer
better customer service. Adapting your service offerings to embrace social
media is another good example of a digital transformation.


As mentioned earlier, digital transformation encourages businesses to reconsider everything, including traditional ideas of teams and
departments. That doesn’t necessarily mean tapping your service reps to
run marketing campaigns, but it can mean knocking down walls between
departments. Your social media presence can encompass service and
marketing, tied together by a digital platform that captures customer
information, creates personalized journeys, and routes customer queries to
your service agents.

14.5 WHY ARE BUSINESSES GOING THROUGH DIGITAL TRANSFORMATIONS?

As digital technology advances and plays an ever-bigger part in our daily lives, businesses have to keep up with the times. From a broad
perspective, it’s simple: Keep up or fall behind. Understanding what digital
transformation means to your business requires a bit more exploration,
however.

• Drivers in digitalization and digital transformation

The root of any change in business starts with customers. It has to:
Customer happiness is how you win in business.

Modern customer expectations are driven largely by digital technology and digital innovations. The always-connected customer is
always seeing new possibilities. When they see new things elsewhere, they
want them from you, too. And if you can’t offer them, they’ll find someone
else who can. The digitally connected world makes it easier than ever for
customers to comparison shop and move from one brand to another, often
with minimal effort required. Gain insights from consumers and business
buyers on the intersection of experience, technology, and trust.

• Digital innovation shapes business across all industries.

Digital transformation impacts every industry. Whether your business generates revenue through client services, digital media, or physical goods,
technological innovations can transform your means of production,
distribution, and customer service.


Depending on your business, your customer could be a consumer or a business-to-business (B2B) client. Let’s extend our perspective to also
include your employees. As we’ll talk about in a moment, employee
expectations are being driven by their own consumer experiences,
particularly when it comes to digital innovation in the workplace.

• Customers expect digital technology and innovation.

Today’s customers are connected and empowered by the digital era. They’re connected 24/7, and increasingly want and expect that
same around-the-clock access to the companies they do business with. The
key drivers behind this change in consumer behaviour? Mobile devices and
social media.

Over half of customers surveyed for Salesforce’s report “State of the Connected Customer” (first edition) said that technology has significantly
changed their expectations of how companies should interact with them.
More specifically, 73% of customers prefer to do business with brands that
personalize their shopping experience, according to the Harvard Business
Review.

Salesforce’s research also reports that 57% of consumers said it’s absolutely critical or very important for companies they purchase from to
be innovative. Otherwise, they might just look for new companies to buy
from: 70% of respondents said new technologies have made it easier for
them to take their business elsewhere.

• Employee empowerment drives digital solutions.

The Apple iPhone is often mentioned as a key driver in the adoption of consumer technology in the workplace. The iPhone wasn’t originally
marketed to businesses, but it quickly became popular, to the point that
corporate IT departments had to accommodate employees wanting to use
iPhones in lieu of other devices. Once a few big employers opened their
doors, acceptance of iPhones in the enterprise spread quickly.


The iPhone disrupted the status quo for technology adoption in the
workplace. Instead of IT leaders telling employees which approved devices
to use, enough workers asked for iPhones that IT departments eventually
acquiesced. This trend continues today, with more “consumer-grade”
technologies making their way into the workplace. Maybe even more
noteworthy is the flip side of the trend: Enterprise software has started
taking design and functionality cues from the consumer world. Long live
ease of use!

• Digital-first employees are connected employees.

Millennials — more than any other subset of the workforce — are proponents of the digital-first mentality. Having come of age on PCs,
consumer electronics, and phone apps, millennials expect to enjoy the
same powerful, easy-to-use digital tools in the workplace as they do in the
rest of their lives.

Digital transformations apply this digital-first state of mind to empower all your employees. In the same way that consumers look for businesses
ready and willing to connect with them 24/7 via social media and other
digital channels, today’s employees thrive in environments that make it
easy to collaborate, access information, and work anytime and from
anywhere. Digitalization is a powerful ally of the empowered employee.

For small businesses, the upside to building a digital business can be game-changing. Not only is digitalization key to meeting customer
expectations and empowering employees, but it can also help small
businesses do more with less. The efficiencies afforded by going digital —
having one comprehensive database shared across your entire business,
leveraging customer data to create personalized messaging and service
strategies, enabling employee connectivity from mobile devices, for
example — can free small teams up to spend more time winning and
keeping new customers.

Bonus: When you build digitally from the beginning, it’s much easier to
scale systems as your business grows.


• Digital innovations are transforming industries.

Employees aren’t the only ones benefiting from easy-to-use, always-on access to information in the workplace. Machines themselves are getting
smarter, too. Artificial intelligence (AI), the Internet of Things (IoT), cloud
analytics, and sensors of all sizes and capabilities are transforming
manufacturing, production, research — virtually all facets of business across all industries.

The examples are never ending. Digital innovations like AI and the IoT are
driving all manner of advancements in the production of everything from
consumer goods to cars and trucks. Optimized manufacturing processes
adapt to changing consumer demand. Cloud-based software affords real-
time visibility into supply chain logistics. Customer experience mapping
powered by machine learning surfaces key insights to help product
planners, marketers, and budget makers alike do their jobs better.
Together, these and many more innovations like them are changing the
way we do business, from every conceivable angle.

14.6 WHY DO BUSINESSES NEED TO TRANSFORM IN THE DIGITAL ERA?

Digital transformation is business transformation. It’s a transformation that’s being driven by the basic desire to make work better for everyone, from employees to customers. The drivers we just walked through are some
of the biggest reasons behind the massive changes rippling through the
business world right now. Add to that the need every business has to
compete for — and win — customers. If your competitors are leveraging
digital transformation to streamline production, expand distribution, build a
better workplace for employees, and improve the overall customer
experience, you’d better up your game, too.

But how are these changes taking shape? What does digital transformation
look like in practice, across different parts of an organization? Let’s take a
look at some examples.


14.7 EXAMPLES OF DIGITAL TRANSFORMATION

What does digital transformation look like in practice, and how has it
already changed the way we do business? Let’s take a look at examples of
digital innovations in marketing, sales, and service that build closer
customer relationships and empower employees across all industries.

A. Examples of digital transformation in marketing.

At a high level, the goal of digital transformation in marketing is to find more customers while spending less money. More specifically, awesome
digital marketing generates more quality leads and helps you get closer to
all of your customers, whether they’re new to your brand or long-time
loyalists.

The shift from analog to digital marketing materials helps these efforts in
two key ways. First, digital materials are generally cheaper to produce and
distribute than analog media. Email, in particular, is far less expensive than
print-and-mail campaigns. Second, digital marketing opens the door
to marketing automation, analytics tracking, and dialogue with customers
in ways that analog never could.

Instead of planning a one-size-fits-all trip down the funnel, marketers can build 1-to-1 journeys that observe customer behaviours and shape the
experience along the way to best suit each individual buyer. And instead of
going on instinct and gut feelings alone, marketers now have data-driven
insights at hand to help craft those journeys.

B. Digital transformation helps marketers connect with individual customers.

In “Welcome to Marketing in the Age of the Customer,” we take a close look at the most popular digital tools and how marketers can leverage them
across the entire customer lifecycle. The entire post is well worth a read,
and serves as a great primer on how technologies — ranging from cloud
computing to artificial intelligence — can help you get closer to customers.

Let’s look at some examples from that article that detail how digitally
transforming your messaging strategy can increase customer engagement
and reduce your costs.


Traditional marketing channel → Digital marketing channel: Transformational impact

• Print materials → Digital materials: Reduced cost of print and distribution; ability to score/grade prospects based on digital interactions.
• Print mail campaigns → Email campaigns: Reduced cost of print and postage; greater scale and personalization.
• Print/billboard advertising → Social media advertising: Personalized targeting; lookalike audience targeting.
• Brick-and-mortar storefront → Website/ecommerce site: Eliminates rent/utilities; accessibility and scale; opportunity to nurture prospects at scale.
• Loyalty club card → Mobile app: Reduced signup friction; reduced cost of printing cards; ability to personalize promotions and trigger offers in real time; opportunity to push offers and messaging out to customers.

C. Examples of digital transformation in sales.

There’s a good reason that the traditional roles of marketing and sales are
being redefined in the digital age. It’s all about the data.

The ability to collect large amounts of precise data on consumer behaviour lets marketing and sales teams, in particular, approach their work in ways
never before possible. Looking at consumers as individuals, and studying
their behaviour from the first touchpoint all the way through the buying
journey, brings to light the natural bond between marketing and sales.
Nurture that bond, and magic happens when these historically separate
groups work together.


• Data makes every sales rep productive.

Salespeople particularly benefit from access to more and better data. When
marketing and sales teams share information across a CRM, and individual
sales reps enter sales activity and keep their pipelines up to date on the
platform, information flows freely throughout an entire organization.

From there, two big things happen. First, more eyes on the same
information means more opportunities to share intelligence across your
entire business. Maybe someone from marketing ops sees a sales rep’s
note about a prospect in the CRM, and shares marketing campaign
activities related to the prospect that help move the deal along.

Second, as information flows and gathers within your company, you set
yourself up to leverage cutting-edge digital innovations like artificial
intelligence.

• Digital transformation creates AI-driven sales techniques.

Artificial intelligence systems can be incredibly helpful in their ability to
comb through vast amounts of data in search of useful patterns and other
insights. As AI services evolve, they’re studying sales and marketing data
not only from the end-consumer standpoint, but also to determine the
effectiveness of sales techniques and strategies themselves. In addition to
surfacing insights around, say, which demographics are more likely to buy
at what times of the year, AI can shed light on which sales strategies have
proven most effective over time, or what promotions and product bundles
bucked long-term trends to move the revenue needle.

With more and more datasets available from external sources, AI systems
can mine marketplace information as well as your own sales history. From
there, the systems look for correlations, patterns, and even anomalies to
give your teams a competitive edge when going after accounts. Combining
AI-driven insights with the tribal knowledge of your teams is perhaps the
ultimate realization of digital transformation for sales.


• Social selling strategies are a key component of digital transformations.

Social media is everywhere, mashing up news, entertainment, and brand
interactions alongside interpersonal connections. PricewaterhouseCoopers
recently found that 78% of consumers were in some way influenced by
social media during their buying process. And nearly half of consumers said
their buying behavior was directly affected by reviews and comments they
came across on social.

Consumer participation in social media has changed the buying process, so
any successful digital transformation needs to incorporate a social selling
strategy. This uniquely digital medium is full of opportunity for the savvy
salesperson to connect and build relationships with prospects and long-
time customers. As the Digital Marketing Institute aptly said, “Successful
social sellers can be regarded as thought leaders, or even trusted
consultants, by prospective customers as they provide value through
industry insights, sharing expertise and offering solutions to common
consumer questions through creating or sharing insightful content.”

Customer service, and our ideas around where service begins and ends,
are being upended by the digital era as much as any other part of
business. Maybe more so.

The “on-demand economy” has quickly grown from a few upstart apps that
hire errand runners and hail cars for busy urbanites to a global movement
to, as Forbes put it, “Uberize the entire economy.” A combination of
smartphone ubiquity, electronic payment systems, and apps designed to
match demand (consumers) to supply (gig workers) in real time has
created a world in which nearly anything you might want is just a swipe
and tap away, around the clock.

Talk about digital transformation! With everything from pizza delivery to
child care now available at their fingertips, customers are expecting more
and more companies and industries to embrace digital as their primary
means of doing business. For service departments, that means greater
expectations for 24/7 problem-solving on the customer’s channel of choice.
But it also means greater opportunities to delight buyers and win more
business.


• Social media is the new customer service desk.

Listening and responding to customers across all social media channels
sounds pretty daunting if you’re just getting started with the Twitter and
LinkedIn apps on your own phone. But a host of tools designed for social
service makes it easy to highlight customer needs, integrate social
channels into your service workflows, and start measuring brand sentiment
and activity across social media.

Meeting your customers where they already are is a big part of winning
business in our digital world. Approaching social service with a digital
transformation mindset can really spell the difference between struggling
to keep up with customer needs and turning service calls into opportunities
to grow your brand.

Collaboration across the different parts of your business is key. The
Salesforce “State of the Connected Customer” report made that clear: 84%
of high-performing marketing leaders say that service collaborates with
marketing to manage and respond to social inquiries and issues, while just
37% of underperformers say the same. When information is freed from
silos, teams collaborate more, and businesses perform better.

• Self-service is a service agent’s best friend.

Remember the days when everything from canned goods to kitchen
appliances came with a toll-free customer service number, and that 800
number was your only avenue for everything from product questions to
warranty claims? Call centers aren’t quite a thing of the past, but the
digital age brings so much more flexibility when it comes to finding the
right medium for serving customers in different ways.

The self-service portal is a great example. These user-facing tools offer
features like password reset, self-logging of incidents, service requests,
and knowledge base searches. They can also include more interactive
services like collaborative spaces, chat services, and embedded social
media feeds that are relevant to service issues.

User-friendly design, including search fields that offer suggestions, and
user profiles that leverage customers’ purchase and service histories, can
go a long way toward personalizing self-service for your customers. A good
self-service portal can reduce the demands on your service agents.

And customers like self-service: 59% of consumers and 71% of business
buyers say self-service availability impacts their loyalty, according to our
research.

• AI plays a key role in the digital transformation of service.

Bringing artificial intelligence into your service organization is a prime
example of the power of digital transformation. AI-powered chatbots that
answer simple customer inquiries serve as a welcoming presence on your
website, reducing the time customers have to wait to reach an agent.

Deploying chatbots to handle level one inquiries also frees up service
personnel to spend time on more sensitive cases. AI-powered bots can
serve as the entry point into intelligent case routing systems. When a
customer’s query is too complex for the chatbot to handle, natural
language processing helps map the question to the best available expert to
resolve the situation.
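
To make the routing idea concrete, here is a deliberately simplified sketch.
A production system would use a trained natural language model rather than
keyword matching, and the queue names and rules below are hypothetical:

# Hypothetical keyword rules standing in for a trained NLP intent model.
ROUTING_RULES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "technical": {"error", "crash", "install", "login"},
}

def route_query(query):
    # Send the query to the expert queue whose keywords overlap it most.
    words = set(query.lower().split())
    best_queue, best_overlap = "general", 0
    for queue, keywords in ROUTING_RULES.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_queue, best_overlap = queue, overlap
    return best_queue

print(route_query("please refund the duplicate payment"))  # billing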

14.7 EXAMPLES OF DIGITAL TRANSFORMATION

1. Examples of digital transformation across industries.

We’ve talked a lot in this chapter about specific examples of digital
transformation in marketing, sales, and service. All digital transformations
start with the move from analog to digital — that is, taking information off
of paper and putting it into the digital realm. From there, these basic ideas
apply to all businesses and industries:

• Meet customers in the digital channels they already frequent
• Leverage data to better understand your customers and the marketplace
as a whole
• Free your data and share intelligence across your entire business
• Encourage once-separate groups like marketing, sales, and service to
collaborate

Digital transformation is helping many industries. Let’s look at how these
ideas are being applied in a few specific ones.


2. Examples of digital transformation in banking.

Banking has been radically transformed by digital technologies in ways that
have greatly benefited many consumers. Not so long ago, the majority of
transactions were handled in person by bank tellers. Automated teller
machines (ATMs) came along and streamlined the basic transaction
process, extending business hours and reducing wait times and
dependencies on human employees for cash withdrawals and other popular
transactions. Over time, ATM technology has evolved to accommodate cash
and check deposits, more secure transactions, and support for multiple
accounts, including credit cards and mortgages.

More recently, PCs and mobile devices have given rise to online and mobile
banking, and cashless payment systems. Consumers now conduct more
and more bank business via the web, including paying bills and sending
funds directly to friends and family. Mobile banking apps let users take
snapshots of paper checks to make remote deposits, and a new wave of
payment systems, including PayPal and Apple Pay, let consumers pay for
everyday purchases with accounts linked directly to their phones, no cash
or plastic card required.

3. Examples of digital transformation in retail.

Retail has also been radically transformed in the digital era. Digital
transformation has both impacted the in-store retail experience and
ushered in the age of ecommerce.

Digital technologies have improved the retail experience for consumers and
proprietors alike, enabling everything from loyalty cards and e-coupons to
automated inventory and retail analytics systems. Shoppers who used to
clip coupons from newspapers and magazines now just show their phones
at checkout to access in-store discounts and deals. When they do this,
their purchases are tallied by digital systems that track consumer
behaviour trends, tie into inventory and purchasing systems, and trigger
individualized customer journey events like email and SMS messaging.
Additional personalization of the in-store experience can be enabled by
digital beacons that link to mobile apps to sense when particular shoppers
enter the store. From there, anything from a phone alert to a personal
concierge can be deployed to enhance the retail experience.


Retailers are now even experimenting with subscription-style sales using
Internet of Things technology. Amazon, for example, has Dash Buttons:
IoT-enabled devices with buttons that trigger automated reordering of an
item. Branded Dash Buttons are available for a growing number of
household goods and other items regularly in need of replenishment. Just
click the button when you’re running low and a refill — billed to your
Amazon Prime account, naturally — will be dispatched right away, just like
that.

4. Examples of digital transformation in insurance.

The impact of digital transformation in the insurance industry is similar to
our other examples in that consumer expectations are driving change.
Web- and app-based self-service portals make it easy for consumers to
comparison shop, enrol in coverage, use multiple agents and carriers for
different types of insurance (home, car, life, and so on), and file claims. In
fact, much of this is now possible without the need to actually speak to an
agent, which saves time for consumers and money for the insurance
companies.

What’s notable about digital transformation in insurance is the role the
Internet of Things is playing in revamping the industry. Inexpensive, IoT-
enabled sensors are giving insurers access to a wealth of data that’s
informing industry forecasting and claim reviews alike. Take auto insurance
as an example: In-vehicle sensors monitor actual driving habits, rewarding
consumers who routinely drive safely under the speed limit or log fewer-
than-average miles. Sensors connected to phones could also be used to
deter texting while driving by disabling a driver’s messaging apps while
their car is in motion. Connecting vehicles to wearable devices with blood
alcohol measurement capabilities could help prevent drunk driving by
temporarily disabling the engine, cutting risk for insurance carriers while
also making roads safer for everyone.


14.8 HOW TO DIGITALLY TRANSFORM YOUR BUSINESS

A digital transformation is a complete business transformation. It’s crucial
to keep this in mind if you’re seriously considering transforming your
business. It’s not just about updating IT systems and apps. It’s a cultural
shift, and a reimagining of all of your company’s processes and ways of
doing things.

As discussed, small businesses — even those just getting off the ground —
can leverage a digital transformation mindset to build digital first into their
company culture. What better way to imagine how digital innovation can
benefit customers than by being a digital native yourself in all aspects of
growing and running a business?

Before we get into how to build a framework for your digital
transformation, let’s first go through some of the signs that your business
is, in fact, in need of transforming.

• Signs that a business needs a digital transformation.

Signs that your business is in need of a digital transformation can appear
across different parts of your organization. They may not scream “It’s time
to go digital!” or “Why aren’t you on Instagram?” Instead, they could
manifest as a diverse set of business problems.

If one or more of the items on our checklist rings true, it might be time to
think seriously about developing a digital transformation strategy.
• You’re not getting the referrals that you used to get. More and more
referrals are now shared online, via social media, apps, email, and
messaging. If your business doesn’t have a strong, easy-to-share digital
presence, you could be missing out on referrals.
• Repeat business isn’t repeating like it used to. Customers not coming
back to do business with you again isn’t necessarily a sign that your
products and services aren’t measuring up. Losing repeat business could
be due to competitors’ promotions, lack of follow-up communication on
your part, or any number of other reasons. A digital transformation of
your messaging strategy could shed light on why your repeats have been
dwindling.


• Tried-and-true promotions are no longer generating leads. Why aren’t
your killer promotions effective anymore? Are you measuring their
impact? It’s hard to pinpoint the impact of print campaigns, and even last
year’s best digital strategies may no longer be effective. If your
promotions aren’t bringing in leads, it may be time for a new, bottom-up
approach to marketing.
• Cross-departmental complaints are mounting about a lack of
collaboration and information sharing, teams operating in silos, and so
on. The idea that sales and marketing just don’t get along has gone the
way of the dinosaurs. Collaboration is the operative word in today’s
progressive business cultures, and getting your data out of silos and in
front of whoever needs it is key. At the core of every digital foundation is
a plan to make business data accessible and useful across departments.
• Your technology systems feel old — employees are asking for features
they’re used to from consumer apps. Spreadsheets are great, but you
shouldn’t be using them for everything. Modern business apps that serve
specific needs, integrate with one another for data sharing, and offer
user-friendly experiences across desktop and mobile are where it’s at. If
your current technology doesn’t offer employees most, if not all, of the
above, maybe it’s time to look at a technology platform that can.

Digging past the surface to understand the root causes of these problems
often leads to the realization that you don’t have the proper visibility into
business data necessary to make good decisions. Many SMBs are built on a
patchwork of applications that don’t talk to each other. Fixing your
technology infrastructure to facilitate sharing and analyzing data across
your business is a key step toward better, more informed decision-making.

• A digital transformation strategy is a business transformation strategy.

Remember that just as digital transformations are about business first, and
digital second, problems with your business data may be signals to look
more closely at how your company is doing business generally. Laurie
McCabe, Co-Founder and Partner at SMB Group, said it well: “In fact, it's
usually situations like these that make you realize you don't have great
visibility into your own business data or, even worse, have lost touch with
what your customers want and need.”


If you’re seeing red flags and realizing that your business data isn’t
centralized, accessible, and working for you, what’s next? It’s time to craft
a digital transformation strategy.

• How small business leaders can think about a digital transformation
strategy.

Start with an internal assessment to identify gaps, problems, and areas
where you may experience difficulties. What’s your biggest problem?
What’s the key to your survival? For very small and very new businesses,
the answers may be short and sweet: We need customers and sales. We
need a few key processes and systems we can run with. It’s important to
involve everyone at your company. All will be part of your digital
transformation over time, and you may have more stakeholders than you
think.

Even if your company is small and new, and the path to digital
transformation seems clear now, remember that you’re building for the
future. And future you will be bigger. Whether that means more
employees, more revenue, or both, your business will grow. Flexibility and
the ability to stay nimble as your business evolves should be built right into
your digital transformation strategy. Connecting with a Salesforce
MVP online or in person can be a great — and free — resource as you start
thinking about your small business digital transformation strategy.

• Consider outside help in mapping a digital transformation strategy.

Working with consultants, partners, and tech vendors can be great for
SMBs because they have the depth of experience and knowledge to help
you figure out the best paths to success. Experienced partners have likely
helped other companies in similar situations, and so can help you find the
most direct paths to meaningful transformation.

Many small business leaders hear the word “consultant” and instinctively
flinch while reaching a hand to guard their wallets. Don’t assume that
getting help is always too expensive — that’s simply not true. Many large
companies offer free advice or trainings for SMBs. Beyond free offerings,
there are all sorts of ways to get advice without spending a lot.


• You don’t have to create your digital transformation roadmap alone.

Remember that the point of hiring or partnering with an external group to
craft your digital transformation strategy is to draw upon their expertise.
They bring something to the table that you don’t have — experience and
industry expertise across many different clients — and can provide value
and best practices. Your short-term investment in their time is designed to
help your business reap bigger benefits over the long haul.

Tapping the right partner to consult on your transformation strategy lets
you come up with a better plan than you could on your own, while also
letting you stay focused on your core business. It will also help you avoid
some of the rookie mistakes that inevitably happen when you go it alone.

• Collaborate on technology decisions and investments when leading a
digital transformation in your organization.

If you’re leading a digital transformation in your organization, keep this
rule of thumb in mind as you consider decisions and investments: Be
collaborative. If you have 10 employees, all 10 will be affected by the
change, so you need to get them on board.

Don’t make decisions in a vacuum. The changes brought by digital
transformation will impact everyone’s daily workflow, and are meant to
empower employees. Get everyone involved early and solicit ideas. Not
only will you get better buy-in, you’ll get a better outcome, too.

• Avoid common mistakes in your digital transformation framework.

Technology integration is key. It’s perhaps the number one area SMBs
should be investing in.

One of the biggest, easiest-to-make mistakes that businesses make is
investing in a bunch of different technologies that don’t integrate.
Unfortunately, it’s hard to unwind the resultant snarl of information when
your platforms and apps don’t work together.


SMBs need to stay focused on getting the capabilities they need now in a
way that will scale as their businesses grow. Today’s business ecosystems
and platforms make it easy for vendors and developers to build apps
tailored to helping SMBs grow. Adopting a scalable platform will help
ensure that the processes and information in your company can flow as
easily as possible. That’s the foundation upon which everything else can be
built.

• Build bridges to connect your data, employees, and customers.

You don’t need to scrap everything and start over when beginning a digital
transformation, even if you’re transitioning from a snarl of apps that don’t
talk to each other. In fact, the most effective solution is to bridge data
silos, and pull all information into a central space — rather than completely
starting over.

The second part of the process is to unify your data, with the aim of
creating a single, unified view of the customer. Once you’ve built bridges
between fragmented information, you’ll be able to surface useful insights
into customer behaviour and maximize the potential of new technologies
like AI. Looking at your business anew with the benefit of new insights and
tools is what digital transformations are all about.
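
As a minimal illustration of bridging silos into a single, unified view of the
customer (the silo contents, column names and figures below are entirely
hypothetical), fragmented records can be joined on a shared customer key:

import pandas as pd

# Three hypothetical silos: CRM records, order history and service tickets.
crm = pd.DataFrame({"customer_id": [1, 2], "name": ["Asha", "Ben"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [120, 80, 45]})
tickets = pd.DataFrame({"customer_id": [2], "open_cases": [1]})

# Aggregate each silo to one row per customer, then join on the shared key.
spend = orders.groupby("customer_id", as_index=False)["amount"].sum()
unified = (crm.merge(spend, on="customer_id", how="left")
              .merge(tickets, on="customer_id", how="left")
              .fillna({"open_cases": 0}))
print(unified)  # one row per customer: name, total spend, open cases

Real projects also need identity matching and de-duplication before a join
like this is safe, but the principle is the same: one key, one view.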


14.9 SUMMARY

The collection and organization of data analytics is a common aspect that is
often tied to other digital transformation components, such as cloud
computing and the Internet of Things (IoT). Data analytics is often the
driving force for those organizations embarking on a digital transformation.

Information is a critical enterprise asset, but treating it as one is still in
the early adoption phase. As businesses focus on digital transformation,
data and analytics become strategic priorities.

And they are the key accelerants of an organization’s digitization and
transformation efforts. Yet less than 50% of documented corporate
strategies mention data and analytics as fundamental components for
delivering enterprise value.

Businesses do realize that. But they are struggling to make the cultural
shift or commit to the necessary information management and advanced
analytics skills and technology investments.

Finding and sharing information became much easier once it had been
digitized, but the ways in which businesses used their new digital records
largely mimicked the old analog methods. Computer operating systems
were even designed around icons of file folders to feel familiar and less
intimidating to new users. Digital data was exponentially more efficient for
businesses than analog had been, but business systems and processes
were still largely designed around analog-era ideas about how to find,
share, and use information.

As digital technology evolved, people started generating ideas for using
business technology in new ways, and not just to do the old things faster.
This is when the idea of digital transformation began to take shape. With
new technologies, new things — and new ways of doing them — were
suddenly possible.

A key element of digital transformation is understanding the potential of
your technology. Again, that doesn’t mean asking “How much faster can
we do things the same way?” It means asking “What is our technology
really capable of, and how can we adapt our business and processes to
make the most of our technology investments?”


Digital transformation is business transformation. It’s a transformation
that’s being driven by the basic desire to make work better for everyone,
from employees to customers. The drivers we just walked through are some
of the biggest reasons behind the massive changes rippling through the
business world right now. Add to that the need every business has to
compete for — and win — customers. If your competitors are leveraging
digital transformation to streamline production, expand distribution, build a
better workplace for employees, and improve the overall customer
experience, you’d better up your game, too.

All digital transformations start with the move from analog to digital — that
is, taking information off of paper and putting it into the digital realm.
From there, these basic ideas apply to all businesses and industries:

• Meet customers in the digital channels they already frequent
• Leverage data to better understand your customers and the marketplace
as a whole
• Free your data and share intelligence across your entire business
• Encourage once-separate groups like marketing, sales, and service to
collaborate

Thus, digital transformation is a complete business transformation. It’s
crucial to keep this in mind if you’re seriously considering transforming
your business. It’s not just about updating IT systems and apps. It’s a
cultural shift, and a reimagining of all of your company’s processes and
ways of doing things.


14.10 SELF ASSESSMENT QUESTIONS

1. How is business analytics important in digital transformation? Explain.

2. What is the difference between digitization, digitalization, and digital
transformation?

3. Why are businesses going through digital transformations?

4. Explain, with an example, digital transformation in retail.

5. List the steps involved in digitally transforming a business.

14.11 MULTIPLE CHOICE QUESTIONS

1. The --------------------------- is a common aspect that is often tied to
other digital transformation components, such as cloud computing and
the Internet of Things.
a. collection of data
b. organization of data analytics
c. impact of data collected while implementing
d. collection and organization of data analytics

2. What is digital transformation?
a. It is the process of using digital technologies to create new business
processes
b. It is the process of modifying existing business processes
c. It is the process of using digital technologies to create new processes
or modify existing business processes, culture, and customer
experiences to meet changing business and market requirements
d. It is the process of changing the culture and customer experiences
only to meet changing business requirements

3. The process of using digitized information to make established ways of
working simpler and more efficient is called -------------------
a. Digitalization
b. Digitization
c. Digital workflow
d. Optimising the digitalisation

437
BUSINESS ANALYTICS AND DIGITAL TRANSFORMATION

4. What’s possible with digital transformation?
a. Understanding the potential of your technology
b. Asking how much faster we can do things the same way in business
c. Asking what the company’s technology is really capable of, and how
the business and its processes can adapt to make the most of
technology investments
d. Asking how much the business can invest in technology

5. A digital transformation is ---------------------. It’s crucial to keep this
in mind if you’re seriously considering transforming your business. It’s
not just about updating IT systems and apps. It’s a cultural shift, and a
reimagining of all of your company’s processes and ways of doing things.
a. updating IT systems and apps
b. a complete business transformation
c. only a cultural shift
d. only to re-imagining of all of processes

Answers: 1.(d), 2.(c), 3.(a), 4.(c), 5.(b)




Chapter 15
Case – Studies in Business Analytics

Objectives:
On completion of this chapter, you will come to know the big success
stories of companies that used big data analytics to deliver extraordinary
results. These success stories are presented as case studies to help you
understand the subject matter clearly. The case studies are as under:

Structure:

15.1 Introduction

15.2 Case Study-1: Walmart: How Big Data is used to drive supermarket
performance

15.3 Case Study-2: US Olympic women’s cycling team: How big data
analytics is used to optimise athletes’ performance

15.4 Case Study-3: US Immigration and Customs: How Big Data is used to
keep passengers safe and prevent terrorism

15.5 Case Study-4: Zynga: Big data in the gaming industry

15.6 Case Study-5: Uber: How big data is at the centre of Uber’s
transportation business

15.7 Case Study-6: Amazon: How predictive analytics are used to get a
360-degree view of consumers

15.8 Summary

15.9 Self Assessment Questions

15.10 Multiple Choice Questions


15.1 INTRODUCTION:

In this chapter we attempt to showcase the current state of the art in big
data and provide an overview of how some companies and organisations
across different industries are using big data to deliver value in diverse
areas. The case studies cover areas including how retailers use big data to
predict trends and consumer behaviour, how governments use big data to
foil terrorist plots, and even how a tiny family butcher or a zoo can use big
data to improve performance, as well as the use of big data in cities,
telecoms, sports, gambling, fashion, manufacturing, motor racing, video
gaming and more.

Out of the above-named activities, we have selected a few success stories
to help you better understand the big data and business analytics subject.

15.2 CASE STUDY-1: WALMART: HOW BIG DATA IS USED TO DRIVE
SUPERMARKET PERFORMANCE

Walmart are the largest retailer in the world and the world’s largest
company by revenue, with over 2 million employees and 20,000 stores in
28 countries. With operations on this scale it is no surprise that they have
long seen the value in data analytics. In 2004, as Hurricane Frances
approached the US, they found that unexpected insights could come to
light when data was studied as a whole, rather than as isolated individual
sets. Attempting to forecast demand for emergency supplies in the face of
the approaching hurricane, CIO Linda Dillman turned up some surprising
statistics. As well as flashlights and emergency equipment, expected bad
weather had led to an upsurge in sales of strawberry Pop-Tarts in several
locations. Extra supplies of these were dispatched to stores in Hurricane
Frances’s path and sold extremely well.

Walmart have grown their big data and analytics department considerably
since then, continuously staying on the cutting edge. In 2015, the company
announced they were in the process of creating the world’s largest private
data cloud, to enable the processing of 2.5 petabytes of information every
hour.


What problem is big data helping to solve?

Supermarkets sell millions of products to millions of people every day. It’s
a fiercely competitive industry, one which a large proportion of people
living in the developed world count on to provide them with day-to-day
essentials. Supermarkets compete not just on price but also on customer
service and, vitally, convenience. Having the right products in the right
place at the right time, so the right people can buy them, presents huge
logistical problems. Products have to be efficiently priced to the cent to
stay competitive. And if customers find they can’t get everything they need
under one roof, they will look elsewhere for somewhere to shop that is a
better fit for their busy schedule.

How is big data used in practice?

In 2011, with growing awareness of how data could be used to understand
their customers’ needs and provide them with the products they wanted to
buy, Walmart established @WalmartLabs and their first big data team to
research and deploy new data-led initiatives across the business.

The culmination of this strategy was referred to as the Data Café: a
state-of-the-art analytics hub at their Bentonville, Arkansas headquarters.
At the Café, the analytics team can monitor 200 streams of internal and
external data in real time, including a 40-petabyte database of all the
sales transactions from the previous weeks.

Timely analysis of real-time data is seen as key to driving business
performance. As Walmart Senior Statistical Analyst Naveen Peddamail puts
it: “If you can’t get the insight until you have analysed your sales for a
week or a month, then you have lost the sales within that time. Our goal is
always to get information to our business partners as fast as we can, so
they can take action and cut down the turnaround time. It is proactive and
reactive analytics.”

Teams from any part of the business are invited to visit the Café with their
data problems and work with the analysts to devise a solution. There is
also a system which monitors performance indicators across the company
and triggers automated alerts when they hit a certain level, inviting the
teams responsible for them to talk to the data team about possible
solutions.
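
A KPI-threshold alert of this kind can be sketched very simply. This is
illustrative only, not Walmart’s actual system; the metric names,
thresholds and notify() stub below are hypothetical:

# Hypothetical thresholds: alert when sales drop below target or errors spike.
THRESHOLDS = {
    "hourly_sales": ("below", 10_000),
    "checkout_errors": ("above", 50),
}

def notify(team, metric, value):
    # Stub: a real system would page the team or post to its dashboard.
    print(f"ALERT -> {team}: {metric} at {value}")

def check_kpis(latest, owners):
    # Compare the latest KPI readings against thresholds and alert owners.
    for metric, (direction, limit) in THRESHOLDS.items():
        value = latest.get(metric)
        if value is None:
            continue
        if (direction == "below" and value < limit) or \
           (direction == "above" and value > limit):
            notify(owners[metric], metric, value)

check_kpis({"hourly_sales": 7_500, "checkout_errors": 12},
           {"hourly_sales": "merchandising", "checkout_errors": "platform"})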


Peddamail gives an example of a grocery team struggling to understand
why sales of a particular produce line were unexpectedly declining. Once
their data was in the hands of the Café analysts, it was established very
quickly that the decline was directly attributable to a pricing error. The
error was immediately rectified and sales recovered within days.

Sales across different stores in different geographical areas can also be
monitored in real time. One Halloween, Peddamail recalls, sales figures for
novelty cookies were being monitored when analysts saw that there were
several locations where they were not selling at all. This enabled them to
trigger an alert to the merchandising teams responsible for those stores,
who quickly realised that the product had not even been put on the
shelves. Not exactly a complex algorithm, but it would not have been
possible without real-time analytics.

Another initiative is Walmart’s Social Genome Project, which monitors
public social media conversations and attempts to predict what people will
buy based on those conversations. They also run the Shopycat service,
which predicts how people’s shopping habits are influenced by their friends
(using social media data again), and have developed their own search
engine, named Polaris, to allow them to analyse search terms entered by
customers on their website.

What were the results?

Walmart report that the Data Café system has led to a reduction in the
time it takes from a problem being spotted in the numbers to a solution
being proposed, from an average of two or three weeks down to around 20
minutes.

What data was used?

The Data Café uses a constantly refreshed database consisting of 200
billion rows of transactional data, and that only represents the most recent
few weeks of business!

On top of that, it pulls in data from 200 other sources, including
meteorological data, economic data, telecoms data, social media data, gas
prices and a database of events taking place in the vicinity of Walmart
stores.


What are the technical details?

Walmart’s real-time transactional database consists of 40 petabytes of
data. Huge though this volume of transactional data is, it only includes
the most recent weeks’ data, as this is where the value, as far as real-time
analysis goes, is to be found. Data from across the chain’s stores, online
divisions and corporate units is stored centrally on Hadoop (a distributed
data storage and data management system).

CTO Jeremy King has described the approach as “data democracy”, as the
aim is to make data available to anyone in the business who can make use
of it. At some point after the adoption of the distributed framework in
2011, analysts became concerned that the volume of data was growing at
a rate that could hamper their ability to analyse it. As a result, a policy of
“intelligently managing” data collection was adopted, which involved
setting up several systems designed to refine and categorise the data
before it was stored. Other technologies in use include Spark and
Cassandra, and languages including R and SAS are used to develop
analytical applications.
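
As an illustration of the kind of aggregation such a stack supports (this is
not Walmart’s actual code; the storage path and column names are
hypothetical), a Spark job rolling up a week of transactions by store might
look like this:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("weekly-sales-rollup").getOrCreate()

# Hypothetical path and schema: one row per transaction, with store and amount.
transactions = spark.read.parquet("hdfs:///data/transactions/latest_week")

weekly_sales = (transactions
                .groupBy("store_id")
                .agg(F.sum("amount").alias("total_sales"),
                     F.count("*").alias("num_transactions"))
                .orderBy(F.desc("total_sales")))

weekly_sales.show(10)  # top 10 stores by sales for the week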

Any challenges that had to be overcome?

With analytics operations as ambitious as the ones planned by Walmart,
the rapid expansion required a large intake of new staff, and finding the
right people with the right skills proved difficult. This problem is far from
restricted to Walmart: a recent survey by researchers Gartner found that
more than half of businesses feel their ability to carry out big data
analytics is hampered by difficulty in hiring the appropriate talent.

One of the approaches Walmart took to solving this was to turn to the
crowdsourced data science competition website Kaggle.

Kaggle set users of the website a challenge involving predicting how
promotional and seasonal events, such as stock clearance sales and
holidays, would influence sales of a number of different products. Those
who came up with models that most closely matched the real-life data
gathered by Walmart were invited to apply for positions on the data
science team. In fact, one of those who found himself working for Walmart
after taking part in the competition was Naveen Peddamail, whose
thoughts are included in this case study.


Once a new analyst starts at Walmart, they are put through an analytics
rotation program. This sees them moved through each team with
responsibility for analytical work, allowing them to gain a broad overview
of how analytics is used across the business.

Walmart’s senior recruiter for its information systems operations, Mandar
Thakur, said: “The Kaggle competition created a buzz about Walmart and
our analytics organisation. People always knew that Walmart generates
and has a lot of data, but the best part was that this let people see how we
are using it strategically.”

What are the key learning points and takeaways?

Supermarkets are big, fast, constantly changing businesses: complex
organisms consisting of many individual subsystems. This makes them an
ideal setting in which to apply big data analytics.

Success in business is driven by competition. Walmart have always taken
a lead in data-driven initiatives, such as loyalty and reward programs, and
by wholeheartedly committing themselves to the latest advances in
real-time, responsive analytics they have shown they plan to remain
competitive.

Bricks ’n’ mortar retail may be seen as “low tech”, almost stone age, in
fact, compared to flashy online rivals, but Walmart have shown that
cutting-edge big data is just as relevant to them as it is to Amazon or
Alibaba. Despite the seemingly more convenient options on offer, it
appears that customers, whether through habit or preference, are still
willing to get in their cars and travel to shops to buy things in person. This
means there is still a huge market out there for the taking, and businesses
that make the best use of analytics to drive efficiency and improve their
customers’ experience are set to prosper.


15.3 CASE STUDY-2: US OLYMPIC WOMEN’S CYCLING TEAM: HOW BIG
DATA ANALYTICS IS USED TO OPTIMISE ATHLETES’ PERFORMANCE

Background:
We are aware that at various points sports and data analytics are becoming
fast friends. This is the story of how the US women’s cycling team went
from underdogs to silver medallists at the 2012 London Olympics, thanks
to the power of data analytics.

The team were struggling when they turned to their friends, family and
community for help. A diverse group of volunteers was formed, made up of
individuals from the sports and digital health communities, led by Sky
Christopherson. Christopherson was an Olympic cyclist and the world
record holder for the 200-metre velodrome sprint in the 35+ category. He
had achieved this using a training regime he designed himself, based on
data analytics and originally inspired by the work of cardiologist Dr Eric
Topol.

What problem is big data helping to solve?

Christopherson formed his OAthlete project after becoming disillusioned
with doping in the sport. This was in the wake of the Lance Armstrong drug
scandal, dubbed “the greatest fraud in American sports”. The idea behind
OAthlete was to help athletes optimise their performance and health in a
sustainable way, without the use of performance-enhancing drugs. As a
result, the philosophy “data not drugs” was born.

How is big data used in practice?

Working with the women’s cycling team, Christopherson put together a set
of sophisticated data-capture and monitoring techniques to record every
aspect affecting the athletes’ performance, including diet, sleep patterns,
environment and training intensity. These were monitored to spot patterns
related to the athletes’ performance, so that changes could be made to
their training programs.
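
A tiny sketch of the underlying analysis, using entirely hypothetical
numbers: once daily measurements and a performance score sit in one
table, simple correlations can surface candidate patterns such as the
sleep-temperature insight described below.

import pandas as pd

# Hypothetical daily log: measured factors alongside a performance score.
days = pd.DataFrame({
    "sleep_hours":   [7.5, 6.0, 8.0, 5.5, 7.0],
    "room_temp_c":   [17, 21, 16, 22, 18],
    "training_load": [80, 95, 70, 90, 85],
    "performance":   [8.1, 6.4, 8.6, 5.9, 7.5],
})

# Correlation of each measured factor with performance.
print(days.corr()["performance"].drop("performance").sort_values())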


What were the results?

As Christopherson says, by measuring the various aspects (such as sleep
and diet) and understanding how they relate, you can create a
“breakthrough in performance”.

In this case the depth of analytics meant that Christopherson was able to
drill right down to what he calls “individual optimal zones”. With this
information, tailored programs could be tweaked for each athlete to get
the best out of every team member. For example, one insight which came
up was that the cyclist Jenny Reed performed much better in training if
she had slept at a lower temperature the night before. So she was
provided with a water-cooled mattress to keep her body at an exact
temperature throughout the night. “This had the effect of giving her better
deep sleep, which is when the body releases human growth hormone and
testosterone naturally,” says Christopherson. In the case of Sarah
Hammer, the data revealed a vitamin D deficiency, so they made changes
to her diet and daily routine (including getting more sunshine). This
resulted in a measurable difference in her performance.

There is another benefit: helping athletes avoid injury. In Christopherson’s
opinion, the leading temptation for athletes to use the performance-
enhancing drugs that have blighted cycling is the need to train hard while
avoiding the dangers of injury and illness. Big data enables high-
performance sports teams to quantify the many factors that influence
performance, such as training loads, recovery and how the human body
regenerates. This means teams can finally measure all these elements and
establish early warning signals that, for example, stop them from pushing
athletes into overtraining, which often results in injury and illness.
According to Christopherson, the key is finding balance during training:
“It’s manipulating the training based on the data you have recorded so that
you are never pushing into that danger zone, but also never backing off
and underutilising your talents. It’s a very fine line and that’s what big
data is enabling us to finally do.” When used accurately and efficiently, it is
thought big data could vastly extend the careers of professional athletes
and sportsmen and women well beyond the typical retirement age of 30,
with the right balance of diet and exercise and avoiding injury through
overexertion.


Christopherson’s system has not been put through rigorous scientific
testing, but it has worked well in terms of his personal success and that of
the US women’s cycling team, as demonstrated by that incredible silver
medal win.

What data was used?

Christopherson worked with internal and external, structured and
unstructured data. For example, time-series data, like measurements of
physical parameters such as blood sugar, skin parameters and pulse, was
captured using sensors attached to the body. These also captured noise
and sunlight exposure data. Environmental data, such as temperature,
time of day and weather, was also considered, using publicly available
information. Video analysis was also carried out, and athletes’ sleeping
patterns were measured using direct EEG.

What are the technical details?

To implement the program, Christopherson partnered with San
Francisco-based data analytics and visualisation specialist Datameer. The
data was stored in the cloud in a Hadoop environment, with Datameer
analysing the data. Datameer’s infographic widgets visualised the results.

Any challenges that had to be overcome?

The challenge with data exploration is that it often lacks a specific
hypothesis. But as Olympic athletes, the team were able to draw upon
their experience and constant self-experimentation to guide the data
exploration. This experience, combined with Datameer’s spreadsheet
approach, helped the team cope with the vast amount of data involved.
The spreadsheet approach easily integrated the different types, sizes and
sources of data, making it much easier to extract insights.

What are the key learning points and takeaways?

This case study highlights the importance of finding a partner that
understands the unique challenges related to the field. In this case,
Datameer’s CEO, Stefan Groschupf, was a former competitive swimmer at
national level in Germany. With his background and prior knowledge,
Groschupf immediately saw the potential of the project. Christopherson
was delighted with their input: “They came back with some really exciting
results, some connections that we had not seen before: how diet, training
and environment all influence each other. Everything is interconnected and
you can really see that in the data.”


It also highlights the importance of spotting patterns in data. So, it’s not
just about the amount of data you collect or how you analyse it; it’s about
looking for patterns across different datasets and combining that
knowledge to improve performance. This applies to sports teams and
businesses alike.

15.4 CASE STUDY-3: US IMMIGRATION AND CUSTOMS: HOW BIG DATA IS
USED TO KEEP PASSENGERS SAFE AND PREVENT TERRORISM

Background:
People move back and forth across US borders at a rate of nearly 100
million crossings a year. The Department of Homeland Security (DHS)
have the unenviable task of screening each one of those crossings to make
sure that they are not being made with ill intent, and that travellers pose
no threat to national security.

Federal agencies have spent many millions of dollars since 11 September
2001 in the hope that they can prevent terrorists entering the country and
carrying out further attacks on domestic soil. While formerly airport
security measures focussed on detecting the transportation of dangerous
items such as drugs or bombs, the emphasis has shifted towards
identifying dangerous people.

Working together with researchers at the University of Arizona, the DHS
have developed a system which they call the Automated Virtual Agent for
Truth Assessment in Real Time (AVATAR).

What problem is big data helping to solve?

Since 9/11, the US has become increasingly aware that among the millions
of people crossing its borders every year are some who are travelling with
the intention of doing harm. Security has been massively stepped up at
airports and other points of entry, and generally this relies on one-to-one
screening carried out by human agents, face-to-face with travellers.

This of course leaves the system open to human fallibility. Immigration
and Customs service officers are highly trained to notice inconsistencies
and tell-tale signs that a person may not be being honest about their
reasons for entering the country, and what they intend to do when they
get


there. However, of course, as with anything involving humans, mistakes
will happen.

Research has shown that there is no foolproof way for a human to tell
whether another human is lying simply by speaking to and watching them,
despite what many believe about “giveaway signs”. Compounding this
problem, humans inevitably get tired, bored and distracted, meaning their
level of vigilance can drop.

Of course, none of this is a problem for a computer. It will examine the
final traveller of the day with the same vigilance and alertness as it did
when it started work in the morning.

How is big data used in practice?

The AVATAR system uses sensors that scan the person’s face and body
language, picking up the slightest variations of movement or cues which
could suggest something suspicious is going on. In addition, a
computerised agent with a virtual human face and voice asks several
questions in spoken English. The subject of inspection answers, and their
responses are monitored to detect fluctuations in tone of voice as well as
the content of what exactly was said.

This data is then compared against the ever-growing and constantly
updated big database collected by AVATAR, and matched against
suspicious profiles which experience has shown can indicate that someone
has something to hide or is not being honest about their intentions in
travelling.

Should it match a suspicious profile, the subject is highlighted for further
inspection, this time carried out by a human agent.

The data is fed back to the human agent via tablets and smartphones,
which gives them a probabilistic assessment of whether a particular
subject is likely to be honest. Each aspect of their profile is coded red,
amber or green, depending on how likely AVATAR believes it is that they
are being truthful. If too many reds or ambers flash up, that subject will be
investigated in more depth.
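
The red/amber/green coding can be sketched as a simple thresholding of
per-aspect truthfulness probabilities. The aspects, cut-offs and referral
rule below are hypothetical, included only to show the logic:

def colour(p_truthful):
    # Map a per-aspect probability of truthfulness to a colour code.
    if p_truthful >= 0.8:
        return "green"
    if p_truthful >= 0.5:
        return "amber"
    return "red"

def assess(profile, max_flags=2):
    # Code each aspect, then refer the traveller if too many flags appear.
    codes = {aspect: colour(p) for aspect, p in profile.items()}
    flags = sum(1 for c in codes.values() if c != "green")
    return {"codes": codes, "refer_to_agent": flags > max_flags}

print(assess({"eye_movement": 0.9, "voice_pitch": 0.6,
              "body_language": 0.4, "answer_content": 0.7}))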


As well as on the US-Mexico border, the AVATAR system has been trialled
on European borders, including at Bucharest’s main airport.

What were the results?

Field tests of the AVATAR system were carried out by the National Center
for Border Security and Immigration in Nogales, Arizona, which concluded
that the machine was capable of carrying out the task it had been designed
for. As a result, it has been cleared to be put into operation in several
jurisdictions in the US and Europe.

What data was used?

The AVATAR system relies on three sensors built into the cabinet to make
probabilistic judgements about whether a person is telling the truth. The
first is an infrared camera that records data on eye movements and pupil
dilation at 250 frames per second. A video camera monitors body language
for suspicious twitches or habitual movements of the body that fit profiles
which people tend to adopt when they are hiding something. Lastly, a
microphone records voice data to listen for subtle changes in the pitch and
tone of voice.

What are the technical details?

The system combines audio and video data-capture devices with a
database of human cues that can give insight into whether the interviewee
is acting in a suspicious manner. It is a kiosk-based system, with
everything needed for operation included in one unit, making it simple to
set up and move to different points of immigration around the world,
where fast data networks may not be available. The infrared eye-
movement sensor collects images at 250 frames per second to catch tiny
movements that would never be visible to another human.

Any challenges that had to be overcome?

Working out whether human beings are lying is notoriously difficult.
Despite having existed since the early 20th century in some form, no lie
detector (polygraph) has ever been shown to be 100 per cent reliable, and
courts have never accepted that they are accurate enough for their
findings to be presented as evidence in the US or Europe.


AVATAR aims to overcome this through a process similar to the predictive
modelling techniques used in many big data projects. As it examines more
people, it learns more about the facial, vocal and contextual indicators
likely to be present when a person is being dishonest. While the traditional
lie detector relies on human interpreters to match these signals to what
they feel, based on their experience and the limited amount of reference
data they have access to, by interviewing millions of people every year
AVATAR should build up a far more reliable set of reference data which it
can use to flag suspicious travellers.

What are the key learning points and takeaways?

Levels of migration both in and out of the US are constantly increasing,
and systems like AVATAR can relieve the manpower burden required to
carry out essential safety screening of travellers.

Machines have the capability to detect whether humans are lying or acting
deceptively far more accurately than people themselves can, if they are
given the right data and algorithms.

Humans respect authority: lab tests on the AVATAR system found that
interviewees were more likely to answer truthfully when AVATAR was given
a serious, authoritative tone and face than when it was programmed to
speak and appear friendly and informal.

It will always be possible for humans to cheat the system by adopting
deceptive strategies. However, the chances of being able to do so
successfully will reduce as technology like AVATAR becomes more efficient
and widely deployed.


15.5 CASE STUDY-4: ZYNGA: BIG DATA IN THE GAMING INDUSTRY

Background:
Big data is big in the gaming industry. Take Zynga, the company behind
FarmVille, Words with Friends and Zynga Poker. Zynga position themselves
as makers of social games, which are generally played on social media
platforms and take advantage of the connectivity with other users that
those platforms offer. Their games are also built to take advantage of the
big data those platforms enable them to collect. At the company’s peak, as
many as 2 million players were playing their games at any point during the
day, and every second their servers processed 650 hands of Zynga Poker.

What problem is big data helping to solve?

Zynga have leveraged data to provide gamers (or bored office workers)
with novel, compulsive distractions, and, of course, to make money.

How is big data used in practice?

Zynga’s games, and hundreds of others that work on the same principle
(for example, the hugely popular Candy Crush Saga), use a business model
which has become known as “freemium”. Players do not have to hand over
cash upfront to play, although the games often charge small amounts
(microtransactions) for enhancements that will give players an advantage
over other players, or make the game more fun. For example, in
FarmVille, which simulates running a farm, you can buy extra livestock for
your virtual agriculture enterprise. Arrangements are also in place with a
range of partners, from credit card companies to on-demand movie
services, allowing players to earn credit to spend in game by taking up
their offers.

This ties into Zynga’s second revenue stream: advertising. While playing,
you will periodically see adverts, just like while watching TV or reading a
magazine. Here, the data that they pull from Facebook is used to offer
marketers a precise demographic target for their segmented online
campaigns.

Big data also plays a part in designing the games. Zynga’s smartest big
data insight was to realise the importance of giving their users what they
wanted, and to this end they monitored and recorded how their games
were being played, using the data gained to tweak gameplay according to
what was working well. For example, animals, which played mostly a
background role in early versions, were made a more prominent part of
later games when the data revealed how popular they were with gamers.
In short, Zynga use data to understand what gamers like and don’t like
about their games.

Game developers are more aware than ever of the huge amount of data that can be gained when every joystick twitch can be analysed to provide feedback on how gamers play games and what they enjoy. Once a game has been released, this feedback can be analysed to find out if, for example, players are getting frustrated at a certain point, and a live update can be deployed to make it slightly easier. The idea is to provide the player with a challenge that remains entertaining without becoming annoying. The ultimate aim is always to keep players gaming for as long as possible – either so that they feel they are getting value for money, if it was a game they paid for, or so that they can be served plenty of adverts, if it is a free game.
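
To make this concrete, here is a minimal, hypothetical sketch of the kind of analysis described above. The event data, field names and thresholds are invented for illustration and are not Zynga's actual telemetry pipeline; the idea is simply to flag levels where players need many attempts and often give up, making them candidates for a live difficulty update.

    # Minimal sketch: flag game levels where players appear frustrated.
    # All event data, field names and thresholds here are illustrative.
    from collections import defaultdict

    # Each telemetry event: (player_id, level, attempts, completed)
    events = [
        ("p1", 3, 1, True), ("p2", 3, 2, True), ("p3", 4, 7, False),
        ("p4", 4, 9, False), ("p5", 4, 8, True), ("p6", 5, 2, True),
    ]

    stats = defaultdict(lambda: {"attempts": 0, "plays": 0, "abandons": 0})
    for player, level, attempts, completed in events:
        s = stats[level]
        s["attempts"] += attempts
        s["plays"] += 1
        s["abandons"] += 0 if completed else 1

    for level, s in sorted(stats.items()):
        avg_attempts = s["attempts"] / s["plays"]
        abandon_rate = s["abandons"] / s["plays"]
        # Flag a level for a difficulty tweak if players need many tries
        # and a large share of them give up entirely.
        if avg_attempts > 5 and abandon_rate > 0.5:
            print(f"Level {level}: candidate for a live difficulty update "
                  f"(avg attempts {avg_attempts:.1f}, "
                  f"abandon rate {abandon_rate:.0%})")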

Zynga make their data available to all employees, so they can see what has proved popular in their games. Even a FarmVille product manager can see the Poker data and find out, for example, how many people have performed a particular game action. This transparency helps foster a data-driven culture and encourages data experimentation across the company. Indeed, Yuko Yamazaki, head of analytics at Zynga, says that the company were running over 1,000 experiments on live products at the time of writing, continually testing features and personalising game behaviours for their players. Zynga's analytics team also run "data hackathons" using their data and use cases, and they host many analytics and data meet-ups on site. All this helps encourage innovation and strengthens the data-driven culture.
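
To give a flavour of what one of those live experiments might involve, the sketch below shows how an A/B test could be evaluated: comparing day-7 retention between a control group and a group shown a new feature, using a two-proportion z-test. The player counts are invented and this is not Zynga's actual experimentation tooling.

    # Minimal A/B test evaluation sketch: is day-7 retention higher for
    # players who saw the new feature? The counts below are invented.
    from math import sqrt, erf

    control_players, control_retained = 50_000, 9_800   # 19.6% retained
    variant_players, variant_retained = 50_000, 10_400  # 20.8% retained

    p1 = control_retained / control_players
    p2 = variant_retained / variant_players
    p_pool = ((control_retained + variant_retained)
              / (control_players + variant_players))

    # Two-proportion z-test under the pooled null hypothesis p1 == p2.
    se = sqrt(p_pool * (1 - p_pool)
              * (1 / control_players + 1 / variant_players))
    z = (p2 - p1) / se
    # Two-sided p-value via the normal approximation.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    print(f"lift = {p2 - p1:+.2%}, z = {z:.2f}, p-value = {p_value:.4f}")
    # A small p-value (e.g. < 0.05) would support rolling the feature out.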

Elsewhere in the gaming industry, it has even been suggested that Microsoft's $2.5 billion acquisition of Minecraft last year was because of the game's integrated data-mining capabilities, which Microsoft could use in other products. Minecraft, the extremely popular world-building game, is based around a huge database containing thousands of individual items and objects that make up each world. By playing the game, the player is essentially manipulating that data to create their desired outcome in the game. Minecraft, in Microsoft's opinion, provides an ideal introduction for children to the principles of structuring and manipulating digital data to build models that relate in some way to the real world.


What were the results?


Zynga measure success on two factors: internal adoption of systems and external player retention. Looking at the internal metric first, Zynga have 2,000 employees, all of whom have access to the company's data-visualisation tool. At least 1,000 employees use the tool on a daily basis, demonstrating that the company have a really strong culture of data-based decision making. Externally, user numbers are around 20-25 million daily active users, a long way from their peak of 72 million daily active users in 2012. A number of factors are at play in this decline, including the end of Zynga's special relationship with Facebook in 2012 and their historical focus on browser-based games (as opposed to mobile-based games). But in 2014 Zynga acquired mobile specialists NaturalMotion, perhaps signalling a change of focus for the future.

"Compared to web gaming," Yamazaki explains, "mobile gaming has its own challenges, such as anonymous play activities, more genres of game and more concentrated session activities." Particularly in mobile games, session length can be more important than the number of users, and longer sessions mean greater opportunities for Zynga. This is because in mobile sessions players are usually paying attention the whole time (whereas in browser-based sessions they may just have the page open in an inactive tab). So, though the number of daily active users is down, a stronger focus on mobile games will provide Zynga with the potential for greater reach and higher revenue.

What data was used?


Zynga capture structured data on everything that happens in their games: almost every single play is tracked, amounting to around 30-50 billion rows of data a day.

What are the technical details?


At the time of writing, Zynga are in the process of replacing their MySQL database technology with MemSQL on SSD, running on Amazon Web Services. Their Vertica data warehouse is the world's largest to run on Amazon.

In terms of future developments, the company are exploring real-time analytics and cloud analytics. Zynga have also started investing more in machine learning and, in addition to the technology mentioned above, they now have a Hadoop/MapReduce environment for advanced machine-learning capabilities, focussed on predictions, look-alikes, social graph and clustering analytics.

Any challenges that had to be overcome?


Zynga's marketing and sometimes intrusive presence on our social media screens has certainly come in for criticism, and it's fair to say the company's fortunes have declined in recent years – partly because of the ending of their close relationship with Facebook and partly because, in the tech world, there is always something new, shiny and often also free popping up to draw users elsewhere. The challenge for Zynga is more about where to go from here, although rising mobile user numbers and new game launches offer some bright rays of hope.

What are the key learning points and takeaways?


Zynga serve as a good example of a business built on the innovative use of data from the ground up, and they heralded the arrival of big data as a force for change in the gaming industry. Their culture of data-based decision making is admirable – something a lot of companies can learn from – and will hopefully stand them in good stead for the challenges ahead. As Yamazaki says: "Social gaming continues to evolve from the way players play games to what features are available on devices… Zynga has killer infrastructure and insane data collection, holding records on billions of installs since the company's launch. Big data has always been Zynga's secret sauce to launch it ahead of the competition, and will be key to Zynga's continued leadership in the space."


15.6 CASE STUDY-5: UBER: HOW BIG DATA IS AT THE CENTRE OF UBER'S TRANSPORTATION BUSINESS

Background
Uber is a smartphone app-based taxi booking service which connects users who need to get somewhere with drivers willing to give them a ride. The service has been hugely popular. Since being launched to serve San Francisco in 2009, it has expanded to many major cities on every continent except Antarctica, and the company are now valued at $41 billion. The business is rooted firmly in big data, and leveraging this data in a more effective way than traditional taxi firms has played a huge part in their success.

What problem is big data helping to solve?


Uber's entire business model is based on the very big data principle of crowdsourcing: anyone with a car who is willing to help someone get to where they want to go can offer to take them there. This gives greater choice to those who live in areas with little public transport, and helps to cut the number of cars on busy streets by pooling journeys.

How is big data used in practice?


Uber store and monitor data on every journey their users take, and use it to determine demand, allocate resources and set fares. The company also carry out in-depth analysis of public transport networks in the cities they serve, so that they can focus coverage on poorly served areas and provide links to buses and trains.

Uber hold a huge database of drivers in all the cities they cover, so when a passenger asks for a ride they can instantly match them with the most suitable drivers. The company have developed algorithms to monitor traffic conditions and journey times in real time, meaning prices can be adjusted as demand for rides changes and as traffic conditions mean journeys are likely to take longer. This encourages more drivers to get behind the wheel when they are needed – and to stay at home when demand is low. The company have applied for a patent for this method of big data-informed pricing, which they call "surge pricing". It is an implementation of "dynamic pricing", similar to that used by hotel chains and airlines to adjust price to meet demand, although rather than simply increasing prices at weekends or during public holidays it uses predictive modelling to estimate demand in real time.
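
As a rough illustration of the dynamic-pricing idea, the sketch below scales a base fare by the ratio of open ride requests to available drivers in an area, capped to keep prices within bounds. This is a simplified assumption for illustration only, not Uber's patented surge-pricing algorithm.

    # Simplified dynamic-pricing sketch: scale fares by a demand/supply
    # ratio. An illustrative model, not Uber's actual surge algorithm.

    def surge_multiplier(open_requests: int, available_drivers: int,
                         cap: float = 3.0) -> float:
        """Return a fare multiplier >= 1.0 based on demand vs. supply."""
        if available_drivers == 0:
            return cap  # extreme scarcity: apply the maximum multiplier
        ratio = open_requests / available_drivers
        # No surge while supply covers demand; scale up (capped) otherwise.
        return min(cap, max(1.0, ratio))

    base_fare = 12.50  # dollars, illustrative
    for requests, drivers in [(40, 50), (90, 60), (200, 50)]:
        m = surge_multiplier(requests, drivers)
        print(f"{requests} requests / {drivers} drivers -> "
              f"x{m:.2f} surge, fare ${base_fare * m:.2f}")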


Data also drives the company's UberPool service, which allows users to find others near to them who, according to Uber's data, often make similar journeys at similar times, so that they can share the ride. According to Uber's blog, introducing this service became a no-brainer when their data told them that the vast majority of trips have a look-alike trip: one "that starts near, ends near and is happening around the same time as another trip". Other initiatives, either trialled or due to launch in the future, include UberChopper, offering helicopter rides to the wealthy, UberFresh grocery deliveries and UberRush, a package courier service.

Uber rely on a detailed rating system – users can rate drivers, and vice versa – to build up trust and allow both parties to make informed decisions about who they want to share a car with. Drivers in particular have to be very conscious of keeping their standards high, as falling below a certain threshold could result in their not being offered any more work. They have another metric to worry about, too: their "acceptance rate". This is the number of jobs they accept versus those they decline. Drivers have apparently been told that they should aim to keep this above 80% in order to provide a consistently available service to passengers.
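
The acceptance rate itself is simple arithmetic: jobs accepted divided by total job offers received. A quick illustrative check against the reported 80% target (the counts here are invented):

    # Acceptance rate = jobs accepted / total job offers received.
    accepted, declined = 42, 8
    acceptance_rate = accepted / (accepted + declined)  # 42 / 50 = 0.84
    print(f"Acceptance rate: {acceptance_rate:.0%}")    # 84%, above 80%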

What were the results?


Data is at the very heart of everything Uber do, meaning this case is less about short-term results and more about the long-term development of a data-driven business model. But it's fair to say that without their clever use of data the company would not have grown into the phenomenon they are.

There is a bigger-picture benefit to all this data that goes way beyond changing the way we book taxis or get ourselves to the office each day. Uber CEO Travis Kalanick has claimed that the service will also cut the number of private, owner-operated automobiles on the roads of the world's most congested cities. For instance, he hopes UberPool alone could help cut traffic on the streets of London by a third. Services like Uber could revolutionise the way we travel around our crowded cities. There are certainly environmental as well as economic reasons why this could be a good thing.


What data was used?


The company use a mixture of internal and external data. For example, Uber calculate fares automatically using GPS, traffic data and the company's own algorithms, which make adjustments based on the time the journey is likely to take. The company also analyse external data, such as public transport routes, to plan services.

What are the technical details?


It has proven tricky to get any great detail on Uber's big data infrastructure, but it appears all their data is collected into a Hadoop data lake and they use Apache Spark and Hadoop to process the data.
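
As a flavour of what processing trip records with Spark over a Hadoop data lake might look like, here is a minimal PySpark sketch. The file path, column names and aggregation are illustrative assumptions, not Uber's actual pipeline.

    # Illustrative PySpark job: aggregate raw trip records stored in a
    # Hadoop-backed data lake. Paths and column names are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("trip-aggregation").getOrCreate()

    # Read trip records (e.g. Parquet files landed in HDFS by ingestion).
    trips = spark.read.parquet("hdfs:///datalake/trips/2015/*")

    # Per-city demand and pricing summary.
    summary = (trips
               .groupBy("city")
               .agg(F.count("*").alias("trips"),
                    F.avg("fare_usd").alias("avg_fare"),
                    F.avg("duration_min").alias("avg_duration_min"))
               .orderBy(F.desc("trips")))

    summary.show(20)
    spark.stop()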

Any challenges that had to be overcome?


The company's algorithm-based approach to surge pricing has occasionally caused problems at busy times – a Forbes article noted how one five-mile journey on New Year's Eve 2014 that would normally cost an average of less than $20 ended up costing $122. This was because of the number of drivers on the road and the extra time taken to complete the journey. Plenty of people would argue that's simple economics: it's normal to pay more for a product or service in times of peak demand. But it hasn't stopped the company coming under fire for their pricing policy.

There have been other controversies – most notably regular taxi drivers claiming Uber is destroying their livelihoods, and concerns over the lack of regulation of the company's drivers. Uber's response to protests by taxi drivers has been to attempt to co-opt them by adding a new category to their fleet: their UberTaxi service means you can be picked up by a licensed taxi driver in a registered private hire vehicle.

It is fair to say there are still some legal hurdles to overcome: the service is currently banned in a handful of jurisdictions, including Brussels and parts of India, and is receiving intense scrutiny in many other parts of the world. There have been several court cases in the US regarding the company's compliance with regulatory procedures – some of which have been dismissed and some of which are still ongoing. But given their popularity, there is a huge financial incentive for the company to press ahead with plans to transform private travel.


What are the key learning points and takeaways?


Uber demonstrate how an entire business model can be based on big data – with outstanding results. And Uber are not alone in this realisation. They have competitors offering similar services on a smaller scale, such as Lyft, Sidecar and Haxi. Provided the regulation issues can be overcome, competition among these upstarts is likely to be very fierce. The most successful company is likely to be the one that makes the best use of the data available to them to improve the service they provide to customers.

15.7 CASE STUDY-6: AMAZON: HOW PREDICTIVE ANALYTICS ARE USED TO GET A 360-DEGREE VIEW OF CONSUMERS

Background:
Amazon long ago outgrew their original business model of an online bookshop. They are now one of the world's largest retailers of physical goods, of virtual goods such as e-books and streaming video and, more recently, of web services.

Much of this has been built on top of their pioneering use of "recommendation engine" technology – systems designed to predict what we want, when we want it, and, of course, to offer us the chance to give them money for it.

With this ethos in mind, Amazon have also moved into being a producer of goods and services, rather than just a retailer. As well as commissioning films and TV shows, they build and market electronics, including tablets, TV boxes and streaming hardware.

Even more recently, they have moved to take on food supermarkets head-on by offering fresh produce and far quicker delivery through their Amazon Now service.

What problem is big data helping to solve?


Information overload is a very real problem, and retailers have more to lose from it than most of us. Online retailing relies on making as large a number of products or services available as possible, to increase the probability of making sales. Companies like Amazon and Walmart have thrived by adopting an "everything under one roof" supermarket model.


The problem here is that a customer can often feel overwhelmed when presented with a huge range of possible options. Psychologically, worries about suffering from "buyer's remorse" – wasting money by making ill-informed purchasing decisions – can lead to our putting off spending money until we are certain we have done sufficient research. The confusing number of options may even cause us to change our minds entirely about the fact that we need a $2,000 ultra-HD television set, and decide to go on vacation instead.

It is the same problem that plagues many projects involving large amounts of information. Customers can become data rich, with a great many options, but insight poor, with little idea about which purchasing decision would best meet their needs and desires.

How is big data used in practice?


Essentially, Amazon have used big data gathered from customers while they browse the site to build and fine-tune their recommendation engine.

Amazon probably did not invent the recommendation engine, but they introduced it to widespread public use. The theory is that the more they know about you, the more likely they are to be able to predict what you want to buy. Once they have done that, they can streamline the process of persuading you to buy it by cutting out the need for you to search through their catalogue.

Amazon's recommendation engine is based on collaborative filtering. This means that it decides what it thinks you want by working out who you are, then offering you items that people with profiles similar to yours have purchased.

Unlike with content-based filtering – as seen, for example, in Netflix's recommendation engine – this means the system does not actually have to know anything about the unstructured data within the products it sells. All it needs is metadata: the name of the product, how much it costs, who else has bought it and similar information.
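
The collaborative-filtering idea can be sketched in a few lines. The toy example below (invented purchase data, not Amazon's production system) recommends items by finding customers whose purchase histories overlap with yours and weighting the items they bought that you have not. Note that it uses only metadata about who bought what, never the content of the products themselves.

    # Toy user-based collaborative filtering on purchase metadata only.
    # Purchase histories below are invented for illustration.
    purchases = {
        "alice": {"book_a", "book_b", "camera"},
        "bob":   {"book_a", "book_b", "headphones"},
        "carol": {"book_b", "camera", "tripod"},
        "dave":  {"novel_x"},
    }

    def jaccard(a: set, b: set) -> float:
        """Similarity of two purchase histories by overlap."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def recommend(user: str, k: int = 3) -> list:
        mine = purchases[user]
        scores = {}
        for other, theirs in purchases.items():
            if other == user:
                continue
            sim = jaccard(mine, theirs)
            if sim == 0:
                continue  # ignore customers with no overlap at all
            # Weight each unseen item by how similar its buyer is to us.
            for item in theirs - mine:
                scores[item] = scores.get(item, 0.0) + sim
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(recommend("alice"))  # -> ['headphones', 'tripod']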

Amazon gather data on every one of their more than a quarter of a billion customers while they use their services. As well as what you buy, they monitor what you look at, your shipping address to determine demographic data (they can take a good stab at guessing your income level by knowing what neighbourhood you live in) and whether you leave customer reviews and feedback.

They also look at the time of day you are browsing, to determine your habitual behaviour and match your data with others who follow similar patterns.

If you use their streaming services, such as Amazon Prime streaming video or e-book rental, they can also tell how much of your time you devote to watching movies or reading books.

All of this data is used to build up a "360-degree view" of you as an individual customer. Based on this, Amazon can find other people who they think fit into the same precisely refined consumer niche (employed males between 18 and 45, living in a rented house, with an income over $30,000, who enjoy foreign films, for example) and make recommendations based on what those people like.
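
As a trivial illustration of slicing customers into such a niche, the snippet below filters a hypothetical customer table on the attributes from the example above. The field names and records are entirely invented.

    # Hypothetical illustration: filter customers into a refined niche.
    import pandas as pd

    customers = pd.DataFrame([
        {"id": 1, "gender": "M", "age": 34, "employed": True,
         "housing": "rented", "income": 42_000, "likes_foreign_films": True},
        {"id": 2, "gender": "F", "age": 29, "employed": True,
         "housing": "owned", "income": 55_000, "likes_foreign_films": True},
        {"id": 3, "gender": "M", "age": 51, "employed": True,
         "housing": "rented", "income": 38_000, "likes_foreign_films": False},
    ])

    niche = customers[
        (customers["gender"] == "M")
        & customers["age"].between(18, 45)
        & customers["employed"]
        & (customers["housing"] == "rented")
        & (customers["income"] > 30_000)
        & customers["likes_foreign_films"]
    ]
    print(niche["id"].tolist())  # -> [1]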

In 2013, Amazon began selling this data to advertisers, to allow them to launch their own big data-driven marketing campaigns. This put them in competition with Google and Facebook, which also sell anonymized access to user data to advertisers.

What were the results?


Amazon have grown to become the largest online retailer in the US, based on their customer-focussed approach to recommendation technology. Last year they took in nearly $90 billion from worldwide sales.

Revenue for their cloud-based web services business, Amazon Web Services, has grown by 81% in the last year, to $1.8 billion.

In addition, Amazon's approach to big data-driven shopping and customer service has made them a globally recognised brand.

What data was used?


Amazon collect data from users as they browse the site – monitoring everything from the time they spend on each page to the language used in the reviews they leave. Additionally, they use external data sets, such as census information, to establish demographic details. If you use their mobile apps on your GPS-enabled smartphone or tablet, they can gather your location data and information about other apps and services you use on your phone. Using Amazon's streaming content services, such as Amazon Prime and Audible, provides them with more detailed information on where, when and how you watch and listen to TV, film and audio.

What are the technical details?


Amazon's core business is handled in their central data warehouse, which consists of Hewlett-Packard servers running Oracle on Linux. It handles their 187 million unique monthly website visitors and over two million third-party Amazon Marketplace sellers.

Any challenges that had to be overcome?


In the early days, by far the biggest challenge for Amazon and all e-retailers was getting the public to put their faith in online commercial activity. These days, thanks to enhanced security and legislative pressure (and in spite of ever-increasing incidences of data theft), most of us are no more wary of giving our card details to an online retailer than we are to a bricks-and-mortar one. Amazon use Netscape Secure Commerce Server systems and SSL to store sensitive information in an encrypted database.

What are the key learning points and takeaways?


Diversity of consumer choice is a great thing, but too much choice and too little guidance can confuse customers and put them off making purchasing decisions.

Big data recommendation engines simplify the task of predicting what a customer wants by profiling them and looking at the purchase history of people who fit into similar niches.

The more a business knows about a customer, the better it can sell to them. Developing a 360-degree view of each customer as an individual is a foundation of big data-driven marketing and customer service.

Privacy and information security are an absolute priority. One large-scale data breach or theft can destroy consumer confidence in a business overnight.


15.8 SUMMARY

The six cases above attempt to provide a comprehensive overview of the current state of play in big data. In such a fast-moving environment, however, that is difficult: the field is developing so quickly that it is impossible to capture everything that is happening. Nevertheless, the six case study examples in this chapter give a good overview of what is happening currently. They hopefully show that big data is very real, and that companies are using it every day to improve what they do and how they do it.

The next few years will see companies who ignore big data overtaken by those who don't. Any organisation without a big data strategy, and without plans in place to start using big data to improve performance, will be left behind.

It is impossible to predict the future of big data, but one can foresee the term itself disappearing: it will no longer be needed to emphasize a new phenomenon, and it over-emphasizes the size of the data rather than its variety and what we do with it.

Smart application of big data starts with your strategy, in order to identify the areas in which data can make the biggest difference to performance and decision making. Only once you are clear about the strategic questions big data could help to answer should you start to collect and analyse data to answer those questions and transform the organisation. These six case studies clearly show how these principles are applied well. In practice, however, there are a lot of companies that get lost in the big data opportunities and end up hoarding data in the mistaken belief that it will, some day, become useful.

So it is always better to start with the right strategy and identify the big challenges and the areas in which data will make the biggest difference. Only then collect and analyse the data that will help you to meet those challenges. Don't fall into the trap of collecting and analysing everything you can.

In addition, there are huge innovations in related fields, such as the Internet of Things, machine learning and artificial intelligence. These will affect the development of big data and make the phenomenon even more important.

If one were to look into a crystal ball, one would see an increasing move to real-time analytics, where large volumes of data (structured and unstructured) are analysed in almost real time to inform decision making and to feed machine-learning algorithms.

There is no doubt that big data will give us many innovations and improvements, but it will also challenge us in areas such as data privacy and data protection. The ability to analyse everything, in the wrong hands, can cause unthinkable harm. It will be up to all of us to ensure the right legal frameworks are in place to protect against the misuse of big data.

15.9 SELF ASSESSMENT QUESTIONS

1. How is big data used in practice by Walmart in the case study given?

2. How is big data analytics used to optimise athletes' performance in the case of the US Olympic women's cycling team? Explain in brief.

3. Explain in brief the technical details used by US Immigration and Customs in the given case study.

4. What challenges had to be overcome by Uber in the case study explained?

5. In the Amazon case study discussed in this chapter, what was the problem that big data helped to solve?


15.10 MULTIPLE CHOICE QUESTIONS

1. Walmart's Shopycat service predicts how people's shopping habits are influenced by their friends, and the company have developed their own search engine, named Polaris, to allow them to --------------------- entered by customers on their website.
a. analyse search terms
b. consider the price
c. review
d. suggestions

2. In the case study of the US women's cycling team, analysts working with the team put together --------------------- to record every aspect affecting the athletes' performance, including diet, sleep patterns, environment and training intensity.
a. sophisticated data-capture
b. monitoring techniques
c. a set of sophisticated data-capture and monitoring techniques
d. data on lifestyle

3. In terms of future developments, Zynga are exploring real-time analytics and cloud analytics. Zynga have also started investing more in --------------- and, in addition to the technology mentioned above, they now have a Hadoop/MapReduce environment focussed on predictions, look-alikes, social graph and clustering analytics.
a. data-capture tools
b. machine learning
c. data analysis
d. data spreadsheets

4. In the Uber case study, it has proven tricky to get any great detail on Uber's big data infrastructure, but it appears all their data is collected into a Hadoop data lake and they use ------------ to process the data.
a. Apache Spark
b. Hadoop
c. Metadata
d. Both Apache Spark and Hadoop


5. Amazon's core business is handled in their central data warehouse, which consists of --------------- to handle their 187 million unique monthly website visitors and over two million third-party Amazon Marketplace sellers.
a. Hewlett-Packard servers running Oracle on Linux
b. Microsoft servers
c. Google servers
d. Amazon's own servers

Answers: 1. (a), 2. (c), 3. (b), 4. (d), 5. (a)


REFERENCE MATERIAL
Click on the links below to view additional reference material for this
chapter

Summary

PPT

MCQ

Video Lecture - Part 1

Video Lecture - Part 2

