

Business Analytics

SPCAM – MBA
SPEC Campus
Bakrol – Anand

Dr. Vishal Patidar


Director


Business Analytics

Module - I

Introduction:

The emergence of new technologies, applications, and social phenomena creates novel business
models, which in turn drive new ways of interacting and conducting business. The rapid growth of
e-commerce and the pervasive use of information technology in business have drawn wide attention.
Social networking sites such as Facebook, LinkedIn and Twitter, combined with mobile devices,
provide tools for easy community building, collaboration, and knowledge creation based on social
networks. These changes are causing e-mail communication to be supplemented, and in some cases
replaced, by social network communications, text messages and tweets. The communities that form
may be based on professional interests, business interests, or social factors.

Three terms in business literature are often related to one another: analytics, business analytics, and
business intelligence.

Analytics

Analytics can be defined as a process that involves the use of statistical techniques (measures of
central tendency, graphs, and so on), information system software (data mining, sorting routines),
and operations research methodologies (linear programming) to explore, visualize, discover and
communicate patterns or trends in data. Simply, analytics convert data into useful information.
Analytics is an older term commonly applied to all disciplines, not just business. A typical example
of the use of analytics is the weather measurements collected and converted into statistics, which in
turn predict weather patterns.
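The idea that analytics converts raw data into summary information can be sketched with Python's standard library. The temperature readings below are illustrative values, not real measurements:

```python
import statistics

# Daily high temperatures (illustrative sample, degrees Celsius)
temps = [31, 29, 34, 30, 29, 33, 32]

mean_temp = statistics.mean(temps)      # measure of central tendency
median_temp = statistics.median(temps)  # robust to outliers
spread = statistics.stdev(temps)        # measure of dispersion

print(f"mean={mean_temp:.1f}, median={median_temp}, stdev={spread:.2f}")
```

The same three summaries (mean, median, standard deviation) reappear below as the core methodologies of descriptive analytics.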

Business Analytics

Business analytics (BA) refers to the skills, technologies, and practices for continuous iterative
exploration and investigation of past business performance to gain insight and drive business
planning. Business analytics focuses on developing new insights and understanding of business
performance based on data and statistical methods. In contrast, business intelligence traditionally
focuses on using a consistent set of metrics to both measure past performance and guide business
planning, which is also based on data and statistical methods.

Business analytics makes extensive use of statistical analysis, including explanatory and predictive
modelling, and fact-based management to drive decision making. It is therefore closely related to
management science.

Business Intelligence

Business intelligence (BI) can be defined as a set of processes and technologies that convert data into
meaningful and useful information for business purposes. While some believe that BI is a broad
subject that encompasses analytics, business analytics, and information systems (Bartlett, 2013, p.4),
others believe it is mainly focused on collecting, storing, and exploring large database organizations
for information useful to decision-making and planning (Negash, 2004). One function that is
generally accepted as a major component of BI involves storing an organization’s data in computer
cloud storage or in data warehouses. Data warehousing is not an analytics or business analytics
function, although the data can be used for analysis.
In application, Business Intelligence is focused on querying and reporting, but it can include
reported information from a BA analysis. BI seeks to answer questions such as: what is happening
now and where, and what business actions are needed based on prior experience?
Business Analytics, on the other hand, can answer questions such as: why is something happening,
what new trends may exist, what will happen next, and what is the best course for the future?

Types of Analytics

There are many types of analytics, and there is a need to organize these types to understand their uses.
Broadly, we divide analytics into three categories:
1. Descriptive analytics
2. Predictive analytics
3. Prescriptive analytics


1. Descriptive analytics: gains insight from historical data with reporting, scorecards, clustering,
etc. It applies simple statistical techniques to describe what is contained in a data set or
database. Example: an airline that wants to target advertising to customers by income can use an
income bar chart to profile its customers, just as a department store that wants to target
advertising by age can use an age bar chart to profile its shoppers.
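A rough sketch of this kind of descriptive profiling, using made-up shopper ages and a text-mode "bar chart" built with Python's standard library:

```python
from collections import Counter

# Illustrative shopper ages from a hypothetical loyalty-card file
ages = [23, 35, 41, 19, 52, 33, 28, 45, 61, 37, 24, 58]

def age_band(age):
    """Bucket an age into a decade band, e.g. 35 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

counts = Counter(age_band(a) for a in ages)
for band in sorted(counts):
    # A text 'bar chart': one mark per shopper in the band
    print(f"{band}: {'#' * counts[band]}")
```

A real descriptive analysis would render the same counts as a bar chart in a spreadsheet or BI tool; the grouping logic is identical.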

2. Predictive analytics: employs predictive modelling using statistical and machine learning
techniques. It applies advanced statistical, information system, or operations research
methods to identify predictive variables and build predictive models that reveal trends and
relationships not readily observed in a descriptive analysis. Example: multiple regression is used
to show the relationship of age, income and profession to investment decisions.
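A minimal sketch of such a multiple regression, fitting synthetic (made-up) customer records by ordinary least squares with NumPy; the column names and figures are assumptions for illustration only:

```python
import numpy as np

# Hypothetical records: age, annual income (thousands), years in profession
X = np.array([
    [25, 40, 2],
    [32, 55, 7],
    [41, 80, 15],
    [29, 48, 5],
    [50, 95, 24],
    [36, 62, 10],
], dtype=float)
# Amount invested (thousands) -- synthetic response variable
y = np.array([5, 9, 16, 7, 21, 11], dtype=float)

# Add an intercept column and fit by ordinary least squares
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("intercept and coefficients:", coef.round(3))
# Predict investment for a new (hypothetical) customer
new = np.array([1, 38, 70, 12])
print("predicted investment:", float(new @ coef))
```

In practice the fitted coefficients would be checked for statistical significance before being used for prediction.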
3. Prescriptive analytics: recommends decisions using optimization, simulation, etc. It applies
decision science, management science, and operations research methodologies to make the best
use of allocable resources. Example: a department store has a limited advertising budget with
which to target customers; linear programming models can be used to allocate the budget
optimally across various advertising media.
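The allocation idea can be sketched without a dedicated solver by brute-force search over coarse spending levels; a real model would hand the same objective and constraints to a linear-programming routine. All figures (reach rates, caps, budget) are hypothetical:

```python
from itertools import product

# Hypothetical data: expected reach per rupee spent on each medium,
# a per-medium spending cap, and a total budget (all figures made up).
media = {"TV": 1.8, "Online": 2.4, "Print": 1.1}   # reach per unit spend
caps = {"TV": 60, "Online": 50, "Print": 40}       # max spend per medium
budget = 100

# Brute-force search in steps of 10 -- a stand-in for a real LP solver
best_reach, best_plan = -1.0, None
step = 10
for tv, online, pr in product(range(0, caps["TV"] + 1, step),
                              range(0, caps["Online"] + 1, step),
                              range(0, caps["Print"] + 1, step)):
    if tv + online + pr <= budget:                 # budget constraint
        reach = tv * media["TV"] + online * media["Online"] + pr * media["Print"]
        if reach > best_reach:
            best_reach, best_plan = reach, (tv, online, pr)

print("best plan (TV, Online, Print):", best_plan, "reach:", best_reach)
```

The search confirms the intuitive answer: spend up to the cap on the medium with the highest reach per rupee, then on the next best, until the budget is exhausted.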

Analytics Purposes and Tools

Descriptive
  Purpose: To identify possible trends in large data sets or databases; to get a rough picture of
  what the data generally looks like and what criteria might have potential for identifying trends
  or future business behaviour.
  Methodologies: Descriptive statistics, including measures of central tendency (mean, median,
  mode), measures of dispersion (standard deviation), charts, graphs, sorting methods, frequency
  distributions, and sampling methods.

Predictive
  Purpose: To build predictive models designed to identify and predict future trends.
  Methodologies: Statistical methods such as multiple regression and ANOVA; information system
  methods such as data mining and sorting; operations research methods such as forecasting models.

Prescriptive
  Purpose: To allocate resources optimally to take advantage of predicted trends or future
  opportunities.
  Methodologies: Operations research methodologies such as linear programming and decision theory.

Characteristics of Analytics, Business Analytics, and Business Intelligence


Business performance planning role
  Analytics: What is happening, and what will be happening?
  Business Analytics (BA): What is happening now, what will be happening, and what is the best
  strategy to deal with it?
  Business Intelligence (BI): What is happening now, and what have we done in the past to deal
  with it?

Use of descriptive analytics as a major component of analysis
  Analytics: Yes | BA: Yes | BI: Yes

Use of predictive analytics as a major component of analysis
  Analytics: Yes | BA: Yes | BI: No (only historically)

Use of prescriptive analytics as a major component of analysis
  Analytics: Yes | BA: Yes | BI: No (only historically)

Use of all three in combination
  Analytics: No | BA: Yes | BI: No

Focus on storing and maintaining data
  Analytics: No | BA: No | BI: Yes

Required focus on improving business value and performance
  Analytics: No | BA: Yes | BI: No


1.2 Business Analytics Process

The complete business analytics process involves the three major component steps (descriptive,
predictive, and prescriptive analytics) applied sequentially to a source of data. The outcome of
the business analytics process must relate to business and seek to improve business performance
in some way. The logic of the BA process starts from a question: what valuable or problem-solving
information is locked up in the sources of data that an organization has available?

Figure: the Business Analytics process (figure not reproduced)


Business Intelligence

Definitions of Business Intelligence

Business Intelligence is the art of gaining a business advantage from data by answering fundamental
questions, such as: how do various customers rank, how is the business doing now and where will it
be if it continues on its current path, and which clinical trials should be continued and which
should stop receiving funding?

Today, with strong Business Intelligence, companies can support decisions with more than just a
gut feeling. Creating a fact-based decision framework via a strong computer system provides
confidence in the decisions made.

Business intelligence is a relatively new term, coined in the early 1990s by Howard Dresner (Watson
& Wixom, 2007).

1. Business intelligence can be defined as "a broad collection of software platforms,
applications, and technologies that aim to help decision makers perform more effectively and
efficiently" (Arnott, Gibson, & Jagielska, 2004, p. 295).

2. “The processes, technologies and tools needed to turn data into information and information
into knowledge and knowledge into plans that drive profitable business action. BI
encompasses data warehousing, business analytics and knowledge management.” The Data
Warehouse Institute, Q4/2002

3. Business intelligence systems by definition are used to create knowledge to enable business
decision-making (Olszak & Ziemba, 2006)
4. Business Intelligence is defined as "knowledge gained about a business through the use of
various hardware/software technologies which enable organizations to turn data into
information”. Data Management Review
5. Business intelligence systems combine operational data with analytical tools to present
complex and competitive information to planners and decision makers, in order to improve
the timeliness and quality of the decision-making process (Negash, 2004).
6. Business Intelligence (BI) can be described as a value proposition that helps organizations in
their decision-making processes (Muntean, M., 2012).
7. Stackowiak et al. (2007) define Business intelligence as the process of taking large amounts
of data, analysing that data, and presenting a high-level set of reports that condense the
essence of that data into the basis of business actions, enabling management to make
fundamental daily business decisions.
8. Zeng et al. (2006) define BI as "the process of collection, treatment and diffusion of
information that has an objective, the reduction of uncertainty in the making of all strategic
decisions." Experts describe Business intelligence as a "business management term used to
describe applications and technologies which are used to gather, provide access to, and analyse
data and information about an enterprise, in order to help users make better-informed business
decisions."

What is Business Intelligence?

Business intelligence (BI) is a broad category of application programs and technologies for gathering,
storing, analysing, and providing access to data to help enterprise users make better business
decisions. BI applications support the activities of decision support, query and reporting, online
analytical processing (OLAP), statistical analysis, forecasting, and data mining. BI includes a set of
concepts and methods to improve business decision making by using fact-based support systems.

The concept of Business Intelligence (BI) was brought up by the Gartner Group in 1996. It is
defined as the application of a set of methodologies and technologies, such as J2EE, .NET, Web
Services, XML, data warehouses, OLAP, data mining and presentation technologies, to improve the
effectiveness of enterprise operations and support management decisions in order to achieve
competitive advantage. Business Intelligence today is not so much a new technology as an integrated
solution for companies, in which the business requirement is the key factor driving technology
innovation. How to identify and creatively address key business issues is therefore always the
major challenge of a BI application seeking real business impact.

Business intelligence (BI) is about creating value for our organizations based on data or, more
precisely, facts. It may seem like another buzzword for what successful entrepreneurs have been
doing for years, if not centuries: using business common sense. From a modern business-value
perspective, however, corporations use BI to enhance decision-making capabilities for managerial
processes (e.g., planning, budgeting, controlling, assessing, measuring, and monitoring) and to
ensure that critical information is exploited in a timely manner; computer systems are the tools
that help us do that better, faster, and with more reliability.

History of Business Intelligence

Business intelligence is not just a modern idea. In his famous treatise The Art of War, Sun Tzu says,

“…what enables the wise commander to strike and conquer, and achieve things beyond the reach of
ordinary men, is foreknowledge. Now this foreknowledge cannot be elicited from spirits…” (Giles,
1994)

While Sun Tzu is not the father of business intelligence, his concept that foreknowledge breeds
success applies directly to BI.

Modern Business Intelligence uses computers to gain foreknowledge by processing and analysing
information in support of business decisions.

The term “business intelligence” has been around for decades, but it was first used as it is today by
Howard Dresner in 1988. Dresner defined business intelligence as the “concepts and methods to
improve business decision making by using fact-based support systems.” Today, business
intelligence is defined by Forrester as “a set of methodologies, processes, architectures, and
technologies that transform raw data into meaningful and useful information used to enable more
effective strategic, tactical, and operational insights and decision-making.”

Business intelligence allows managers to make informed and intelligent decisions regarding the
functioning of their organization. Informed decisions lead to better, more efficient processes in the
actual work environment, and help create a powerful competitive advantage.

In the first stages of business intelligence, IT teams ran reports and queries for the business side,
though today’s systems are focused more on enabling self-service intelligence for business users. As
with any technology, the offerings from vendors have evolved over time and continue to do so. As
core features like reporting and analytics are becoming commoditized, vendors are looking at other
features to differentiate themselves. Likewise, as the business environment changes, so do the
requirements organizations have for their business intelligence applications.


Business Intelligence Framework

COMPONENTS OF BI


There are four main components of Business Intelligence Systems

1. ETL tools,

2. Data warehouses,

3. OLAP techniques, and

4. Data-mining.

1. ETL Tools

ETL = Extract – Transform – Load

In computing, extract, transform and load (ETL) refers to a process in database usage, and
especially in data warehousing, that involves:
• Extracting data from outside sources
• Transforming it to fit operational needs (which can include quality checks)
• Loading it into the end target (a database; more specifically an operational data store,
  data mart or data warehouse)

ETL is data-integration software for building a data warehouse: it pulls large volumes of data
from different sources, in different formats, restructures them, and loads them into the warehouse.

A variety of tools is available from:
• major database vendors (IBM, Microsoft, Oracle)
• independent companies (Informatica, currently among the market leaders)
• open source projects (e.g. CloverETL)

Extract
The first part of an ETL process involves extracting the data from the source systems. In
many cases this is the most challenging aspect of ETL, as extracting data correctly sets the
stage for how subsequent processes will go. Most data warehousing projects consolidate data
from different source systems, and each separate system may use a different data organization
or format. Common data source formats are relational databases and flat files, but sources may
also include non-relational database structures such as Information Management System (IMS),
other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access
Method (ISAM), or even outside sources fetched through web spidering or screen-scraping.
Streaming the extracted data and loading it on the fly into the destination database is another
way of performing ETL when no intermediate data storage is required. In general, the goal of the
extraction phase is to convert the data into a single format appropriate for transformation
processing. An intrinsic part of extraction is parsing the extracted data to check whether it
meets an expected pattern or structure; if not, the data may be rejected entirely or in part.
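A toy sketch of the extraction step, assuming two hypothetical feeds (one CSV, one JSON) that must be pulled into a single common record format and validated; the field names and values are made up:

```python
import csv
import io
import json

# Two hypothetical source systems delivering the same kind of entity
csv_feed = "id,name,salary\n1,Asha,52000\n2,Ravi,61000\n"
json_feed = '[{"id": 3, "name": "Meera", "salary": 58000}]'

def extract():
    """Pull rows from both feeds into one common record format."""
    rows = []
    for r in csv.DictReader(io.StringIO(csv_feed)):
        rows.append({"id": int(r["id"]), "name": r["name"],
                     "salary": int(r["salary"])})
    for r in json.loads(json_feed):
        rows.append({"id": r["id"], "name": r["name"],
                     "salary": r["salary"]})
    # Parse/validate: reject records that do not meet the expected pattern
    return [r for r in rows if r["salary"] > 0]

print(extract())
```

In a real project the feeds would be files, database queries or API responses, but the goal is the same: one common format handed on to the transform stage.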

Transform
The transform stage applies to a series of rules or functions to the extracted data from the
source to derive the data for loading into the end target. Some data sources will require very
little or even no manipulation of data. In other cases, one or more of the following
transformation types may be required to meet the business and technical needs of the target
database:
• Selecting only certain columns to load (or selecting null columns not to load). For example, if the
source data has three columns (also called attributes), for example roll_no, age, and salary, then
the extraction may take only roll_no and salary. Similarly, the extraction mechanism may ignore
all those records where salary is not present (salary = null).
• Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the
warehouse stores M for male and F for female)
• Encoding free-form values (e.g., mapping "Male" to "1")
• Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
• Sorting
• Joining data from multiple sources (e.g., lookup, merge) and deduplicating the data
• Aggregation (for example, rollup — summarizing multiple rows of data — total sales for each
store, and for each region, etc.)
• Generating surrogate-key values
• Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
• Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string
in one column as individual values in different columns)
• Disaggregation of repeating columns into a separate detail table (e.g., moving a series of
addresses in one record into single addresses in a set of records in a linked address table)


• Lookup and validate the relevant data from tables or referential files for slowly changing
dimensions.
• Applying any form of simple or complex data validation. If validation fails, it may result in a full,
partial or no rejection of the data, and thus none, some or all the data is handed over to the next
step, depending on the rule design and exception handling. Many of the above transformations
may result in exceptions, for example, when a code translation parses an unknown code in the
extracted data.
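A few of the transformations above (column selection, code translation, a derived value, and rejection of records with a missing salary) can be sketched in Python; the field names mirror the examples in the bullets and the data is made up:

```python
# Hypothetical extracted records (column names follow the bullet examples)
extracted = [
    {"roll_no": 101, "age": 34, "salary": 52000, "gender": 1,
     "qty": 3, "unit_price": 20.0},
    {"roll_no": 102, "age": 29, "salary": None, "gender": 2,
     "qty": 5, "unit_price": 12.5},
]

GENDER_CODES = {1: "M", 2: "F"}  # translate coded values

def transform(records):
    out = []
    for r in records:
        if r["salary"] is None:        # reject records where salary is null
            continue
        out.append({
            "roll_no": r["roll_no"],                    # select only needed columns
            "salary": r["salary"],
            "gender": GENDER_CODES[r["gender"]],        # 1/2 -> M/F
            "sale_amount": r["qty"] * r["unit_price"],  # derived calculated value
        })
    return out

print(transform(extracted))
```

Real ETL tools express the same rules declaratively, but each rule ultimately reduces to a mapping like this one.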

Load
The load phase loads the data into the end target, usually the data warehouse (DW). Depending
on the requirements of the organization, this process varies widely. Some data warehouses
overwrite existing information with cumulative information; refreshing the extracted data is
frequently done on a daily, weekly or monthly basis. Other DWs (or even other parts of the same
DW) may add new data in a historicized form, for example hourly. To understand this, consider a
DW that is required to maintain sales records for the last year: the DW will overwrite any data
older than a year with newer data, but entries within the one-year window are kept in historicized
form. The timing and scope of replacing or appending data are strategic design choices that depend
on the time available and the business needs. More complex systems can maintain a history and
audit trail of all changes to the data loaded into the DW.

As the load phase interacts with a database, the constraints defined in the database schema — as
well as in triggers activated upon data load — apply (for example, uniqueness, referential
integrity, mandatory fields), which also contribute to the overall data quality performance of the
ETL process.
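A small sketch of constraint checking at load time, using an in-memory SQLite table as a stand-in warehouse; the table, rows and constraint are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.execute("""CREATE TABLE sales (
    store TEXT, day TEXT, amount REAL,
    UNIQUE (store, day)             -- schema constraint checked at load time
)""")

rows = [("S1", "2024-01-01", 120.0), ("S1", "2024-01-02", 95.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# A duplicate key violates the uniqueness constraint and is rejected
try:
    conn.execute("INSERT INTO sales VALUES ('S1', '2024-01-01', 999.0)")
except sqlite3.IntegrityError:
    print("duplicate load rejected by schema constraint")

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print("loaded total:", total)
```

This is the mechanism the paragraph describes: the schema itself (uniqueness, referential integrity, mandatory fields) acts as a final data-quality gate during the load.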
• For example, a financial institution might have information on a customer in several
departments and each department might have that customer's information listed in a different
way. The membership department might list the customer by name, whereas the accounting
department might list the customer by number. ETL can bundle all this data and consolidate
it into a uniform presentation, such as for storing in a database or data warehouse.
• Another way that companies use ETL is to move information to another application
permanently. For instance, the new application might use another database vendor and most
likely a very different database schema. ETL can be used to transform the data into a format
suitable for the new application to use.


• An example of this would be an Expense and Cost Recovery System (ECRS) such as used by
accountancies, consultancies and lawyers. The data usually ends up in the time and billing
system, although some businesses may also utilize the raw data for employee productivity
reports to Human Resources (personnel dept.) or equipment usage reports to Facilities
Management.

2. Data warehouse

Data warehousing lets business leaders sift through subsets of data and examine interrelated
components that can help drive business. The multi-dimensional data warehouse is the core of
the business intelligence environment. Basically, it is a large database containing all the data
needed for performance management. The modelling techniques used to build this database are
crucial to the functioning of the BI solution.

Typical characteristics of the data warehouse are that:

• It contains time-invariant data; in other words, a report run two months ago can be
  reproduced today (if launched with the same parameters).
• It contains integrated data, where integrated means that the same business definitions
  are used throughout the data warehouse.
• It contains atomic data, not aggregated data as is often assumed, because users may
  (and often do) need the lowest level of detail.

To get a good understanding of what a multidimensional data warehouse is, it is important to
understand the multidimensional modelling techniques that assure the above characteristics.

A data warehouse is the main repository of the organization's historical data, its corporate
memory. For example, an organization would use the information stored in its data warehouse to
find out on which day of the week it sold the most widgets in May 1992, or how employee sick
leave in the week before the winter break differed between California and New York from 2001 to
2005. In other words, the data warehouse contains the raw material for management's decision
support system.
The critical factor leading to the use of a data warehouse is that a data analyst can perform
complex queries and analysis (such as data mining) on the information without slowing down the
operational systems.


While operational systems are optimized for simplicity and speed of modification (online
transaction processing, or OLTP) through heavy use of database normalization and an
entity-relationship model, the data warehouse is optimized for reporting and analysis (online
analytical processing, or OLAP). Frequently, data in data warehouses is heavily denormalized,
summarized and/or stored in a dimension-based model, but this is not always required to
achieve acceptable query response times.
More formally, Bill Inmon (one of the earliest and most influential practitioners) defined a
data warehouse as follows:
• Subject-oriented, meaning that the data in the database is organized so that all the data
elements relating to the same real-world event or object are linked together;
• Time-variant, meaning that the changes to the data in the database are tracked and recorded
so that reports can be produced showing changes over time;
• Non-volatile, meaning that data in the database is never over-written or deleted; once
committed, the data is static, read-only, and retained for future reporting;
• Integrated, meaning that the database contains data from most or all of an organization's
operational applications, and that this data is made consistent.

3. OLAP (On-line analytical processing): It refers to the way in which business users can slice
and dice their way through data using sophisticated tools that allow for the navigation of
dimensions such as time or hierarchies. Online Analytical Processing or OLAP provides
multidimensional, summarized views of business data and is used for reporting, analysis,
modeling and planning for optimizing the business. OLAP techniques and tools can be used
to work with data warehouses or data marts designed for sophisticated enterprise intelligence
systems. These systems process queries required to discover trends and analyze critical
factors. Reporting software generates aggregated views of data to keep management
informed about the state of the business. Other BI tools are used to store and analyze data:
data mining and data warehouses; decision support systems and forecasting; document warehouses
and document management; knowledge management; mapping, information visualization and
dashboarding; management information systems; geographic information systems; trend analysis;
and Software as a Service (SaaS).
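The slice-and-dice navigation OLAP provides can be sketched over a tiny hypothetical fact table; a real OLAP tool performs the same operations over a cube with many more dimensions and rows:

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, sales)
facts = [
    ("East", "Q1", "TV", 100), ("East", "Q2", "TV", 120),
    ("West", "Q1", "TV", 80), ("West", "Q1", "Radio", 40),
    ("East", "Q1", "Radio", 60), ("West", "Q2", "Radio", 50),
]
DIMS = {"region": 0, "quarter": 1, "product": 2}

def rollup(dimension):
    """Summarize sales along one dimension of the cube."""
    idx = DIMS[dimension]
    totals = defaultdict(int)
    for row in facts:
        totals[row[idx]] += row[3]
    return dict(totals)

def slice_(dimension, value):
    """'Slice': fix one dimension to a single value."""
    idx = DIMS[dimension]
    return [row for row in facts if row[idx] == value]

print(rollup("region"))          # multidimensional summary by region
print(slice_("quarter", "Q1"))   # only the Q1 facts
```

Switching the dimension argument is the "dice"; fixing a value is the "slice"; the rollup is the summarized view the reporting layer presents.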

4. Data mining


Data mining is the automated process of discovering previously unknown, useful patterns in
structured data. The data warehouse is therefore a perfect environment in which to conduct
data-mining exercises. To a certain extent, online analytical processing, in which users slice,
dice, pivot, sort and filter data to see patterns, is a form of human, visual data mining.
However, the human eye can only see a limited number of dimensions (mostly three) at the same
time and therefore cannot discover more complex relationships. Discovering relationships between
different attributes of dimensions is also a time-consuming exercise. The field of automated
pattern detection has gained a lot of popularity in recent years. Successful implementations of
data mining applications are, however, clearly limited in functional scope, and data mining on a
large scale is not expected before 2010.
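A minimal flavour of automated pattern discovery: counting item pairs that frequently occur together in hypothetical transaction baskets, a much-simplified form of association-rule mining:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction baskets from a point-of-sale feed
baskets = [
    {"bread", "milk"}, {"bread", "butter"}, {"milk", "butter"},
    {"bread", "milk", "butter"}, {"bread", "milk"},
]

# Count how often each pair of items is bought together
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# 'Mine' the pairs that appear in at least 3 of the 5 baskets
frequent = {p: c for p, c in pair_counts.items() if c >= 3}
print(frequent)
```

Real data-mining tools scale the same idea to millions of transactions and longer item combinations, which is exactly the kind of pattern the human eye cannot spot by slicing and dicing alone.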

Business intelligence architecture

The architecture of a typical business intelligence system is depicted in the figure below.

Figure: typical business intelligence architecture (figure not reproduced)

Need of the Business Intelligence

Business intelligence has come a long way, from assistance with report generation to self-service
platforms for discovery and analytical insight. As technological capabilities and business aptitude
with information continue to advance, the next generation of BI will be even more capable and
valuable to the enterprise.

With the amount of data stored by companies growing exponentially, it is no surprise that finding
the right data management solution continues to show up on the priority list of Chief Information
Officers (CIOs). Data has to be secure and distributed efficiently to support important,
up-to-date business decisions.
Companies need to translate data into information to plan for future business strategies. For most
companies, valuable data is stored in massive spreadsheets or servers. Ideally, this data should
provide you with information on sales trends, consumer behaviour and resources allocation.
Company data can indicate the viability of your product and help in the planning of your future
growth. Hence data can help maximize revenues and reduce costs.
Business Intelligence (BI) is one of the fastest-growing software sectors, and software vendors are
rapidly developing multiple BI tools to support the growing data-analysis needs of organisations.
In order to be sustainable in a rapidly changing, chaotic environment, organisations need access
to information about their operational performance. BI tools play a vital role in supporting
decision makers at different organisational levels. As these tools become critical in decision
making, BI has become not only an information technology concern but also a management concern.

Business Intelligence enables organizations to make well-informed business decisions and thus can
be a source of competitive advantage. This is especially true when firms are able to extrapolate
information from indicators in the external environment and make accurate forecasts about future
trends or economic conditions. Once business intelligence is gathered effectively and used
proactively, firms can make decisions that benefit them.

The ultimate objective of business intelligence is to improve the timeliness and quality of
information. Timely and good quality information is like having a crystal ball that can give an
indication of what's the best course to take.

Business intelligence reveals:

• The position of the firm as in comparison to its competitors

• Changes in customer behaviour and spending patterns

• The capabilities of the firm

• Market conditions, future trends, demographic and economic information

• The social, regulatory, and political environment



Goals of Business Intelligence

Why do companies use business intelligence? The primary goal is to stay ahead of the competition
and make the right decision at the right time. Those decisions can be made around pretty much any
aspect of running a business, such as:

• Figuring out how to increase the effectiveness of marketing campaigns
• Deciding whether and when to enter new markets
• Improving products and services to better meet customers' needs

One of the key aspects of business intelligence is that it is designed to put information in the
hands of business users. Organizations are required to make decisions at an increasingly faster
pace, so today's business intelligence tools help decision makers access the information they need
without having to first go through the IT department or specifically designated data scientists.
Rather than request a report and then wait for it to be created, the user can log into the business
intelligence application and view all the critical information presented in a way that doesn't
take a specialist to understand.

Features of Business Intelligence

Business intelligence (BI) allows organizations to analyse data for patterns, insights, and
competitive advantage. At one time, dedicated BI technology was mostly restricted to large
companies that could afford the expensive, complex software required.

Today, BI tools exist that are well within the reach of small and medium-sized businesses.

Different organizations will invest in a Business Intelligence (BI) solution for different reasons
depending on their specific circumstances and industry.

It’s like watching scary movies. Some of us put ourselves through the ordeal because we enjoy the
thrill. Others will just participate because it’s what everyone else is doing. And some go along looking
for an excuse to squeeze that special someone extra tight during those conveniently frequent moments
of terror.

Regardless of individual agenda, here are ten features you should insist upon in any BI solution, no
matter the circumstances of its application:

1. Ranking Reports

Ranking reports let you easily view the best- and worst-performing facets of your business, from
products to marketing campaigns to salespeople. You can view rankings across multiple dimensions
and specify various criteria to focus your results.

2. What-If Analysis

If you’re curious about how a future decision will affect your business, you can run a “what-if”
analysis using past data to predict the potential impacts. Tools for what-if analyses give you an
objective view of the risks and rewards involved in each potential decision, and allow you to plan
better for the future.

3. Executive Dashboards

Executive dashboards give your organization's leaders a real-time overview of your business in the
form of graphs, charts, summaries and other information reports. They allow your company's
executives to make smarter, faster and better decisions.

4. Interactive Reports

Interactive reports allow users to condense the massive amounts of collected data into a wide variety
of possible views. Users can take advantage of features like statistical analysis and regression to
identify trends, anomalies and outliers in the data.

5. Geospatial Mapping

Applications using location intelligence can take your information and transform it into graphical and
cartographic representations, simplifying your geographical data. At a glance, judging which regions
are performing better than others — and which ones need particular attention — becomes much
easier.

6. Operational Reports

At the end of each day, your BI platform can provide your organization’s executives with a detailed
summary of the daily events, giving them the information they need to make critical decisions.


7. Pivot Tables

Pivot tables can automatically extract significant features from a large, messy set of data. They can
perform calculations such as sorting, counting or averaging the data stored in one table, and show the
summarized results in another table. Pivot tables are essential tools for analysing information and
uncovering hidden trends.
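The aggregation behind a pivot table can be sketched in a few lines of Python. The sales records here are made up, and real BI tools add interactive row/column selection on top of this idea:

```python
from collections import defaultdict

# Hand-rolled pivot: total sales by region (rows) and product (columns).
# The records below are invented for illustration.

records = [
    ("North", "Widget", 100),
    ("North", "Gadget", 250),
    ("South", "Widget", 300),
    ("South", "Widget", 50),
]

pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in records:
    pivot[region][product] += amount  # aggregate (here: sum) each cell

print(dict(pivot["South"]))  # {'Widget': 350}
```

Swapping `sum` for a count or an average gives the other aggregations pivot tables typically offer.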

8. Ad-Hoc Reports

Instead of burdening your IT department with requests for detailed reports, ad-hoc reports are a BI
feature that let your non-technical end-users generate their own reports on the fly. Users can pick and
choose the elements that they wish to be included in the report, emphasizing only those aspects that
are relevant to their query.

9. User-Specific Security

If you need to restrict certain users’ access to particular data sets, your BI platform should allow you
to personalize your features and applications to individuals or groups of users. Some solutions
provide user-specific data sources, where a single application pulls from different sources of data
depending on who’s using the application.

10. Open Integration

Smart BI platforms will be able to access not only your organization’s own data, but information
from email, social media, websites and more. For example, instead of only providing your internal
sales data, your BI platform could accompany that information with reviews and comments about
your products.

With so many data formats and so many applications to pull from, it’s important that your BI
platform is able to integrate as many different types of data as possible under a single roof, seamlessly
combining disparate forms of information into an actionable report.

Use of Business Intelligence

BI analysis provides corporate decision makers with actionable information to pursue both
operational and strategic goals. As such, BI technologies are designed to handle and interpret big


data, whether structured or unstructured, in order to identify and develop new business
opportunities. Likewise, BI can go over and analyse financial records and operational statistics to
identify areas that need improvement. Moreover, BI has a most useful goal management function
that allows managers to program data according to goals set on a day-to-day basis. These goals
could be financial objectives, sales targets, marketing aims or productivity measures.
Additionally, BI can integrate some forms of advanced analytics (e.g. data mining, predictive
analytics) but mostly these are managed by separate teams of data scientists, predictive modelers
and statisticians. On the other hand, BI programs are handled by IT teams for typical data collection,
analysis and query as well as creation of reports. Lately, the availability of easy-to-use business
intelligence platforms allowed end users like corporate executives, business professionals and
employees to use BI programs directly for themselves.
The question “what is the use of BI” can best be answered by its many functionalities and
capabilities that include:

1. Analytics – the cornerstone of BI is the discovery, interpretation and communication of


relevant and useful patterns in data. Analytics may rely on descriptive, prescriptive and
predictive approaches as well as statistics, computer programming and other methods to
come up with meaningful data that can describe, predict, quantify and improve business
performance.
2. Reporting – one of the hallmarks of BI programs is the rich visualization of the resulting
insights and reports after data has undergone analysis. These reports can be charts, graphs,
presentations and info-graphics, all designed to provide the user essential information that
can be acted upon or decided on.
3. Complex event processing – this combines data from various sources to identify
significant events or things that happen across the organization. Whether they are threats
or opportunities, CEP gives businesses the chance to respond accordingly.
4. Data mining – BI supports data mining techniques in order to transform raw data into
useful information. Oftentimes, BI programs source data from data warehouses which
already store consolidated information. BI comes into focus on analysing the data.
5. Process mining – special algorithms are applied on event logs recorded by an
information system to identify trends and details in order to improve process efficiency.
6. Text mining – involves extracting pertinent patterns and deriving high quality
information from text sources which are then analysed by a BI program.


7. Benchmarking – BI applies this method to measure business performance through a


specific indicator (time, quality or cost) the result of which is a performance metrics that
is compared or benchmarked with the best practices of other companies or the level of
standards in a certain industry.
8. Business performance management – BPM activities especially in large businesses
and organizations involve large data that require collation, analysis and reporting, areas
that BI programs are built to handle.

Business Intelligence Tools

▪ Dashboards and business activity monitoring

▪ Dashboards: Shows key business performance indicators in a single integrated view

▪ Portals: Integrate data using web browser from multiple sources into a single webpage

▪ Data analysis and reporting tools

▪ Data-mining tools

▪ Data warehouses (DW)

▪ OLAP tools and data visualization

Key functionalities of BI tools

Data consolidation
➢ Integration of data from both in-house and external sources
➢ Simplified extraction, transformation and loading of data through graphical interfaces
➢ Elimination of unwanted and unrelated data

Data quality
➢ Sanitise and prepare data to improve the accuracy of decisions

Reporting
➢ User-defined as well as standard reports can be generated to serve employees at different levels
➢ Personalized reports to cater to different individuals and functional units

Forecasting and modelling
➢ Support in creating forecasts and making comparisons between historical data and real-time data

Tracking of real-time data
➢ Monitor current progress against defined objectives through KPIs or expected outcomes
➢ Prioritize scarce resources

Data visualisation
➢ Interactive reports with visualizations to understand relationships easily
➢ Scorecards to improve communication

Data analysis
➢ What-if analysis
➢ Sensitivity analysis
➢ Goal seeking analysis
➢ Market basket analysis

Mobility
➢ Portable applications can be installed on mobile devices such as mobile phones and tablet computers to support executives and sales staff while travelling

Rapid insight
➢ Drill-down features allow users to dig deeper into data
➢ Through dashboards it is possible to identify and correct negative trends, monitor the impact of newly made decisions, and measure and improve overall business performance

Report delivery and shareability
➢ Deliver reports to view in commonly used office applications such as Microsoft Office (Word, Excel and so forth)
➢ Email reports in different formats

Ready-to-use applications
➢ Pre-built metadata with mappings defined considering performance and security needs
➢ Pre-built reports and alerts to support management in real time

Language support
➢ Multiple language support

Applications in an Enterprise


Business intelligence can be applied to the following business purposes, in order to drive business
value.
1. Measurement – program that creates a hierarchy of performance metrics and benchmarking
that informs business leaders about progress towards business goals (business process
management).
2. Analytics – program that builds quantitative processes for a business to arrive at optimal
decisions and to perform business knowledge discovery. Frequently involves: data mining,
process mining, statistical analysis, predictive analytics, predictive modelling, business
process modelling, data lineage, complex event processing and prescriptive analytics.
3. Reporting/enterprise reporting – program that builds infrastructure for strategic reporting
to serve the strategic management of a business, not operational reporting. Frequently
involves data visualization, executive information system and OLAP.
4. Collaboration/collaboration platform – program that gets different areas (both inside and
outside the business) to work together through data sharing and electronic data interchange.
5. Knowledge management – program to make the company data-driven through strategies and
practices to identify, create, represent, distribute, and enable adoption of insights and
experiences that are true business knowledge. Knowledge management leads to learning
management and regulatory compliance.
Business Intelligence vs. Advanced Analytics

Answer the questions:

▪ Business Intelligence: What happened? When? Who? How many?

▪ Advanced Analytics: Why did it happen? Will it happen again? What will happen if we change x? What else does the data tell us that we never thought to ask?

Inclusions:

▪ Business Intelligence: Reporting (KPIs, metrics); automated monitoring and alerting (thresholds); dashboards; scorecards; OLAP (cubes, slice and dice, drilling); ad-hoc query; operational and real-time BI

▪ Advanced Analytics: Statistical or quantitative analysis; data mining; predictive modelling; multivariate testing; big data analytics; text analytics

Key Success Factors for Business Intelligence Implementation

Following are the key factors for successful implementation of Business Intelligence

▪ Business-driven methodology and project management

▪ Clear vision and planning

▪ Committed management support & sponsorship

▪ Data management and quality

▪ Mapping solutions to user requirements

▪ Performance considerations of the Business Intelligence system

▪ Robust and expandable framework

Figure: The Business Intelligence Cycle [Source: Thomas Jr. (2001)]


Key Business Intelligence applications include:

a) Data Mining and Advanced Analysis


b) Visual and OLAP analysis
c) Enterprise Reporting
d) Dashboards and Scorecards
e) Mobile Apps and Alerts

Users of Business Intelligence

The following are the four key players who use a Business Intelligence system:

1. The Professional Data Analyst:

The data analyst is a statistician who always needs to drill deep down into data. A BI system helps
them gain fresh insights to develop unique business strategies.

2. The IT users:

The IT user also plays a dominant role in maintaining the BI infrastructure.

3. The head of the company:

CEO can increase the profit of their business by improving operational efficiency in their business.

4. The Business Users:

Business intelligence users can be found from across the organization. There are mainly two types of
business users

1. Casual business intelligence user


2. The power user.

The difference between them is that a power user can work with complex data sets, while a casual
user relies on dashboards to evaluate predefined sets of data.

Business Intelligence for big data

BI platforms are increasingly being used as front-end interfaces for big data systems. Modern BI
software typically offers flexible back ends, enabling it to connect to a range of data sources. This,
along with simple user interfaces, makes the tools a good fit for big data architectures. Users can
connect to a range of data sources, including Hadoop systems, NoSQL databases, cloud platforms
and more conventional data warehouses, and can develop a unified view of their diverse data.

Because the tools are typically fairly simple, using BI as a big data front end enables a broad number
of potential users to get involved rather than the typical approach of highly specialized data architects
being the only ones with visibility into data.

There has been growing corporate interest in business intelligence (BI) as a path to reduced costs,
improved service quality, and better decision-making processes. However, while BI has existed for
years, it has difficulties reaching what specialists in the field consider its full potential.

Most of the work done throughout this period was focused on technologies, standards, processes and
tools to support the collection, storage rationalization and retrieval of data and the creation of reports.
Data warehouses, data marts, data dictionaries, and extract, transform, load (ETL) processes became
ubiquitous. Think about this stage as the beginnings of transforming data into information and the
use of information to help drive (primarily operational) decision making.

Advantages of Business Intelligence

With the help of business intelligence solutions, organizations can implement corrections and take
necessary measures to improve efficiency in various areas of their operations. An organization may
also identify new business opportunities and expand accordingly to accommodate its best interests.
Business intelligence software tools are highly dependent on rapidly evolving technologies like big
data, predictive analytics and data mining. Many technology consultants provide specialized business
intelligence tools, technology consulting, and implementation support with extensive industry
expertise to help organizations assess their business intelligence needs. Highlighted below are key
advantages of using business intelligence tools:

▪ Faster Decision Making

Key executives are involved in making decisions that guide business direction and strategy. In the
absence of business intelligence solutions, this decision-making process often involves making a
considerable amount of presumptions. Without the availability of detailed reports and analysis,
executives may have to make decisions based on limited information like sales figures and market
demand. Business intelligence eliminates this guesswork and presents new information like real-time
production stats and customer feedback for various product lines that is backed by hard data. Some
predictive BI techniques also allow for "what if" analysis to see how a decision would affect the
company in the future. All this information provides key insights and a wider perspective, which
enables faster decision making at the right time.

▪ Real-time Performance Measurement

Business intelligence tools continuously monitor large amounts of data generated by an organization
and carefully analyse it for several performance metrics such as efficiency, sales figures and
marketing costs related to the business - in real time. This helps keep top management informed about
the status and performance of various critical components within the organization and the
collaboration between business units. It also enables business executives to detect market
opportunities and take advantage of them.

▪ Improved Reporting Speed

BI users can access large amounts of unprocessed data in the form of organized and readable reports
that present information in an interactive manner within a short amount of time. This eliminates the
need to sift through loads of data and printing a pile of various reports.

▪ Greater Insights into Customer Behaviour

BI can analyse sales figures and customer feedback to represent facts that tell a business a great deal
about their customer's preferences and needs. Using IT products as an example, logged customer
information can be sent back to a company's servers to be analysed to get an idea of how the customers
are responding to the design of a particular software product. Products such as Google Chrome,
Microsoft Windows and others are continuously monitored and updated to keep up with the demands
of customers. Analysing this information can also help a company detect what the customer is buying


and what his/her needs are enabling decisions that allow the company to retain or grow their customer
base.

▪ Identify New Business Opportunities

If a business has numerous products, BI can help detect customer touch points where a customer
buys multiple products produced by the same company on an individual basis. Such touch points can
provide a company with new business opportunities to sell a group of products together as a single
integrated package to retain and grow a particular customer base. Thus, by using business
intelligence, opportunities which were previously undetected can be used to maximize profits.

▪ Boost productivity

With a BI program, businesses can create reports with a single click, saving a great deal of time and
resources. It also allows employees to be more productive on their tasks.

▪ To improve visibility

BI also helps to improve the visibility of business processes, making it possible to identify any areas
which need attention.

▪ Fix Accountability

A BI system helps fix accountability in the organization, since someone must own responsibility for
the organization's performance against its set goals.

Disadvantages of Business Intelligence

▪ Costly

Business intelligence can prove costly for small as well as medium-sized enterprises. Using such a
system for routine business transactions may be expensive.

▪ Complexity


Another drawback of Business Intelligence is the complexity of implementing a data warehouse. It
can be so complex that it makes business processes rigid to deal with.

▪ Limited use
Like many new technologies, Business Intelligence was first built with the buying power of large
firms in mind. A BI system is therefore still not affordable for many small and medium-sized
companies.

▪ Time Consuming Implementation


A data warehousing system can take almost a year and a half to implement completely, making BI a
time-consuming process.

Trends in Business Intelligence

The following are some business intelligence and analytics trends that you should be aware of.
▪ Artificial Intelligence: Gartner's report indicates that artificial intelligence and machine
learning now take on complex tasks once done by human intelligence. This capability is being
leveraged for real-time data analysis and dashboard reporting.

▪ Collaborative Business Intelligence: Business Intelligence software combined with


collaboration tools, including social media, and other latest technologies enhance the working
and sharing by teams for collaborative decision making.

▪ Embedded Business Intelligence: Embedded Business Intelligence allows the integration of
Business Intelligence software, or some of its features, into another business application to
enhance and extend its reporting functionality.

▪ Cloud Analytics: Business Intelligence applications will soon be offered in the cloud, and
more businesses will shift to this technology. Predictions suggest that within a couple of
years, spending on cloud-based analytics will grow 4.5 times faster than spending on
on-premises analytics.



Business Analytics

Business analytics refers to the generation and use of knowledge and intelligence to apply data-based
decision making to support an organization’s strategic and tactical business objectives (Goes, 2014;
Stubbs, 2011).
Business analytics includes “decision management, content analytics, planning and forecasting,
discovery and exploration, business intelligence, predictive analytics, data and content management,
stream computing, data warehousing, information integration and governance” (IBM, 2013, p. 4).


Business analytics aims to generate knowledge, understanding and learning – collectively referred to
as ‘insight’ – to support evidence-based decision making and performance management.

Business analytics refers to the skills, technologies, applications and practices for continuous iterative
exploration and investigation of past business performance to provide actionable insights.

Business analytics focuses on developing new insights and understanding of business performance
based on data and statistical methods.

The Gartner IT Glossary defines business analytics as:

“Business analytics is comprised of solutions used to build analysis models and simulations to create
scenarios, understand realities and predict future states.”

Decision making in businesses today is moving to the point where accepted practice is about first
understanding the numbers and what they are revealing, and then using this insight to drive intelligent
business decisions. This replaces the approach where people take the action that feels right and then
examine the numbers afterwards to see if it worked. Insight, therefore, should drive decision making.
But insight also has a broader role to play in the landscape of organisations.

The types of questions that can be addressed by analytics initiatives

1. WHAT happened (descriptive)?

This question seeks information describing a situation, event or the status of an asset or product (such
as location or temperature) to set out what has happened. For a consultancy firm, for example, this
might involve reporting client revenue for the last quarter. ‘What’ questions are usually answered in
canned (or pre-defined) reports.

2. WHY did it happen (diagnostic)?

This aims to enable understanding of the reasons why an observed event actually took place. It might
necessitate undertaking some root-cause analysis or using data to test a hypothesis. For example, if a


consultancy firm is experiencing reduced billings with a particular client, it is about understanding
the reasons in order to work out how to reverse the decline (i.e., to make a decision).

3. WHEN might it happen (predictive)?

The task here is to understand how to predict when a future event is likely to happen. This will
generally require building a model. First, the component parts will need to be identified, before
determining from historical data how they all fit together. Historical data can then be used to see if
the model is a good predictor of outcomes that have already been observed. For example, Jaguar
Land Rover has collected petabytes (a petabyte is a million billion bytes) of telemetry data on the
performance of its Ingenium engines. It can now examine this data to predict the likelihood of certain
components failing and schedule maintenance accordingly. On a smaller scale, a retailer might seek
to predict the additional sales generated by particular types of promotions.
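On the retailer example, a predictive model can be as simple as a least-squares trend line fitted to past promotions. The promotion data below is invented for illustration; a real model would be validated against held-out history first:

```python
# Ordinary least-squares fit of a straight line, by the textbook formulas:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

discounts = [5, 10, 15, 20]          # promotion depth (%), invented history
extra_sales = [120, 210, 330, 400]   # observed additional units sold

slope, intercept = fit_line(discounts, extra_sales)
predicted = slope * 25 + intercept   # expected lift for a 25% promotion
```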

4. HOW can I make it happen (prescriptive)?

The main challenge in predicting events is often in creating the mechanism through which people or
events might be influenced. This is usually achieved through experimentation. For example, online
retailers might do A/B testing (comparing the performance of two versions of a web page) to
determine which design is most likely to convert visits to real sales. A mobile phone operator might
wish to nudge customers towards using channels that cost less to service. Or a tax authority might
want to find out whether a particular form of words in a tax demand more effectively influences
taxpayers to pay their outstanding liabilities on time.
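The A/B test mentioned above is commonly evaluated with a two-proportion z-test: did version B of the page convert significantly better than version A? A sketch with hypothetical visit and conversion counts:

```python
import math

# Two-proportion z-test for an A/B experiment. The counts are invented.

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = two_proportion_z(conv_a=200, n_a=5_000, conv_b=260, n_b=5_000)
# |z| > 1.96 is significant at roughly the 5% level (two-sided)
print(round(z, 2))
```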

Analytic Purposes and Tools

Descriptive
Purpose: To identify trends in large data sets or databases, in order to get a rough picture of what the data generally look like and what criteria might have potential for identifying trends or future business behaviour.
Example methodologies: Descriptive statistics, including measures of central tendency (mean, median, mode), measures of dispersion (standard deviation), charts, graphs, sorting methods, frequency distributions, probability distributions, and sampling methods.

Predictive
Purpose: To build predictive models designed to identify and predict future trends.
Example methodologies: Statistical methods, including multiple regression and ANOVA; information system methods like data mining and sorting; operations research methods like forecasting models.

Prescriptive
Purpose: To allocate resources optimally to take advantage of predicted trends or future opportunities.
Example methodologies: Operations research methodologies like linear programming and decision theory.
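The descriptive measures listed in the table can be computed directly with Python's standard statistics module; the daily sales figures below are invented:

```python
import statistics

# Descriptive analytics in miniature: summarize a week of daily sales
# with measures of central tendency and dispersion. Data is illustrative.

daily_sales = [12, 15, 15, 18, 20, 22, 25]

summary = {
    "mean":   statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode":   statistics.mode(daily_sales),
    "stdev":  statistics.stdev(daily_sales),  # sample standard deviation
}
```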


Characteristics of Analytics, Business Analytics, and Business Intelligence


Business performance planning role:
▪ Analytics: What is happening and what will be happening?
▪ Business Analytics (BA): What is happening now, what will be happening, and what is the best strategy to deal with it?
▪ Business Intelligence (BI): What is happening now, and what have we done in the past to deal with it?

Use of descriptive analytics as a major component of analysis: Yes for Analytics, BA and BI.

Use of predictive analytics as a major component of analysis: Yes for Analytics and BA; for BI, no (only historically).

Use of prescriptive analytics as a major component of analysis: Yes for Analytics and BA; for BI, no (only historically).

Use of all three in combination: No for Analytics; yes for BA; no for BI.

Business focus: Maybe for Analytics; yes for BA and BI.

Focus on storing and maintaining data: No for Analytics and BA; yes for BI.

Required focus on improving business value and performance: No for Analytics; yes for BA; no for BI.

Business Analytics Process

Comparison of Business Analytics and Organization Decision-making Process

Business Intelligence vs. Business Analytics

Definition:
▪ Business Intelligence: Analyses past and present data to drive current business needs.
▪ Business Analytics: Analyses past data to drive the current business.

Usage:
▪ Business Intelligence: To run current business operations.
▪ Business Analytics: To change business operations and improve productivity.

Ease of operations:
▪ Business Intelligence: For current business operations.
▪ Business Analytics: For future business operations.

Strategy:
▪ Business Intelligence: Improves the current strategy with knowledge of past results.
▪ Business Analytics: Improves the strategy for moving forward with predictive analysis.

Focus:
▪ Business Intelligence: Identify current problems and determine how to resolve them.
▪ Business Analytics: Identify potential issues and determine how to avoid them.

Tools:
▪ Business Intelligence: SAP BusinessObjects, QlikSense, TIBCO, Power BI, etc.
▪ Business Analytics: Word processing, Google Docs, MS Visio, MS Office tools, etc.

Applications:
▪ Business Intelligence: Applies to all large-scale companies to run current business operations.
▪ Business Analytics: Applies to companies where future growth and productivity are the goal.

Field:
▪ Business Intelligence: Comes under the broader field of business analytics.
▪ Business Analytics: Contains data warehousing, information management, etc.

Future of Business Analytics

Business analytics is one of the fastest growing areas in enterprise. As technology becomes more
interconnected, data must be able to move freely as it is sent and received throughout your
organization. This collection of data from inside and outside your company is commonly referred to
as big data. Big data provides a source for discovery and analysis.

Big data continues to impact the way data is consumed and reported. By thoroughly understanding
big data in your company, you can improve day-to-day operations. Plus, the insights gained from
analysing big data pave the way for more strategic planning in the future.

Online Transaction Processing (OLTP)

OLTP applications are characterized by many users creating, updating, or retrieving individual
records. Therefore, OLTP databases are optimized for transaction updating. The main emphasis for
OLTP systems is put on very fast query processing, maintaining data integrity in multi-access
environments, and effectiveness measured by the number of transactions per second. An OLTP
database holds detailed, current data, and the schema used to store transactional databases is the entity model.


Typically, OLTP systems are used for order entry, financial transactions, customer relationship
management (CRM) and retail sales.
Examples of OLTP transactions include:

• Online banking

• Purchasing a book online

• Booking an airline ticket

• Sending a text message

• Order entry

• Telemarketers entering telephone survey results

• Call centre staff viewing and updating customers’ details

Characteristics of OLTP

OLTP transactions are usually very specific in the task that they perform, and they usually involve a
single record or a small selection of records.

For example, an online banking customer might send money from his account to his wife’s account.
In this case, the transaction only involves two accounts – his account and his wife’s. It does not
involve the other bank customers.
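The transfer example above can be sketched as a single atomic transaction: both updates succeed together or neither is applied. Here Python's built-in sqlite3 module stands in for a production OLTP database, with invented accounts and amounts:

```python
import sqlite3

# OLTP-style money transfer as one transaction on an in-memory database.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("husband", 500), ("wife", 100)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 200 "
                     "WHERE id = 'husband'")
        conn.execute("UPDATE accounts SET balance = balance + 200 "
                     "WHERE id = 'wife'")
except sqlite3.Error:
    pass  # on failure the database is left exactly as it was

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)
```

Note the transaction touches only the two accounts involved, which is the hallmark of OLTP workloads described above.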

BENEFITS:
Online Transaction Processing has two key benefits: simplicity and efficiency. Reduced paper trails
and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP
makes things simpler for businesses.

What is OLAP?

OLAP (online analytical processing) is computer processing that enables a user to easily and
selectively extract and view data from different points of view. OLAP allows users to analyse
database information from multiple database systems at one time. OLAP data is stored in
multidimensional databases. It gives business users access to multidimensional data from a data
warehouse or data marts, without concerns about how or where the data are stored. An OLAP
server is thus a high-capacity, multi-user data manipulation engine specifically designed to support
and operate on multi-dimensional data structures. The key feature is "multidimensional": the ability
to analyse metrics across different dimensions such as time, geography, gender, product, etc.

An OLAP server is based on the multidimensional data model. It allows managers and analysts to
gain insight into information through fast, consistent, and interactive access.
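The multidimensional idea can be illustrated without any OLAP server: a set of sales facts can be "rolled up" along any chosen dimension, or "sliced" to one member of a dimension. The facts and dimension names below are invented:

```python
from collections import defaultdict

# A tiny "cube": sales facts with three dimensions (year, region, product).

facts = [
    {"year": 2023, "region": "East", "product": "Phone",  "sales": 400},
    {"year": 2023, "region": "West", "product": "Phone",  "sales": 300},
    {"year": 2024, "region": "East", "product": "Laptop", "sales": 700},
    {"year": 2024, "region": "East", "product": "Phone",  "sales": 500},
]

def roll_up(rows, dimension):
    """Aggregate (sum) the sales measure along one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["sales"]
    return dict(totals)

by_year = roll_up(facts, "year")                          # roll-up on time
east_slice = [r for r in facts if r["region"] == "East"]  # slice on one member
by_product_east = roll_up(east_slice, "product")          # dice within a slice
```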

Types of OLAP Servers

There are four types of OLAP servers:

1. ROLAP (Relational OLAP)


2. MOLAP (Multidimensional OLAP)
3. Hybrid OLAP (HOLAP)
4. Specialized SQL Servers

1. ROLAP (Relational OLAP)

ROLAP servers are placed between relational back-end server and client front-end tools. To store
and manage warehouse data, ROLAP uses relational or extended-relational DBMS. The source data
are entered into a relational database, generally in a star or snowflake schema, which aids in fast
retrieval times. The server provides a multidimensional model of the data, via optimized SQL queries.
There are a number of reasons to choose a relational database for storage as opposed to a
multidimensional database. RDBs are a well-established technology that has had plenty of
opportunities for optimization. Real world use has led to a more robust product. Additionally, RDBs
support larger amounts of data than MDDBs do. They are designed for large amounts of data.
ROLAP includes the following:
• Implementation of aggregation navigation logic.
• Optimization for each DBMS back end.
• Additional tools and services.


Pros and Cons of Relational Databases (RDBs)


Pros:
• Ideal for large amounts of data
• Proven, optimized technology

Cons:
• SQL is not optimal for complex queries
• Determining an optimal data-storage scheme is more important and difficult

2. Multidimensional OLAP (MOLAP)

MOLAP stands for Multidimensional On-Line Analytical Processing. MOLAP uses array-based
multidimensional storage engines for multidimensional views of data. With multidimensional data
stores, the storage utilization may be low if the data set is sparse. Therefore, many MOLAP servers
use two levels of data storage representation to handle dense and sparse data sets.
The purpose for using an MDDB is fairly straightforward. It can efficiently store data that are by
nature multidimensional, providing a means of fast querying of the database. Data are transferred
from a data source (as described above) into the multidimensional database, and then the database is
aggregated. This pre-calculation is what allows OLAP queries to be faster, since the calculation of
summary data is already done. The query time becomes a function solely of the time required to
access one piece of data, as opposed to the time to access many pieces of data and performing the
calculation. The approach also supports the philosophy of doing the work once, and using the results
over and over.
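The pre-calculation idea can be illustrated with a minimal Python sketch; the sales figures and the quarter-to-year hierarchy are invented for illustration:

```python
# MOLAP-style pre-aggregation sketch: summary totals are computed once
# at load time, so a later query becomes a single dictionary lookup
# instead of a scan-and-sum over the detail records.
detail = [
    ("2023", "Q1", 100), ("2023", "Q2", 150),
    ("2024", "Q1", 120), ("2024", "Q2", 180),
]

# "Aggregate the cube" once, up the time hierarchy (quarter -> year).
summary = {}
for year, quarter, sales in detail:
    summary[year] = summary.get(year, 0) + sales

# Query time: one lookup, no recalculation ("do the work once").
sales_2024 = summary["2024"]
```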

Pros and Cons of Multidimensional Databases (MDDBs)


Pros:
• Accurately models business data
• Fast access times with no SQL
• Pre-calculated summary data

Cons:
• Generally does not handle VLDBs gracefully
• New technology that is not yet optimized
• Risk of database explosion


3. Hybrid OLAP (HOLAP)

Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers higher scalability of ROLAP
and faster computation of MOLAP. HOLAP servers allow storing the large data volumes of detailed
information. The aggregations are stored separately in MOLAP store.

4. Specialized SQL Servers

Specialized SQL servers provide advanced query language and query processing support for SQL
queries over star and snowflake schemas in a read-only environment.

ROLAP v/s MOLAP

Schema: ROLAP uses a star schema, and additional dimensions can be added dynamically; MOLAP uses
data cubes, and additional dimensions require recreation of the data cube.
Database size: ROLAP medium to large; MOLAP small to medium.
Architecture: client/server for both.
Access: ROLAP supports ad-hoc requests; MOLAP is limited to pre-defined dimensions.
Resources: ROLAP high; MOLAP very high.
Flexibility: ROLAP high; MOLAP low.
Scalability: ROLAP high; MOLAP low.
Speed: ROLAP is good with small data sets and average for medium to large data sets; MOLAP is
faster for small to medium data sets and average for large data sets.

OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP).


In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP
systems help to analyse it.


Difference between OLAP & OLTP

OLTP OLAP
Operational Database Data Warehouse
1. Involves day-to-day processing 1. Involves historical processing of information
2. Dynamic data 2. Static data
3. Short database transactions 3. Long database transactions
4. Normalization is promoted 4. Denormalization is promoted
5. High volume of transactions 5. Low volume of transactions
6. Transaction recovery is necessary 6. Transaction recovery is not necessary
7. OLTP systems are used by clerks, database 7. OLAP systems are used by knowledge
administrators or database professionals workers such as executives, managers and
analysts
8. Useful in running the business 8. Useful in analysing the business
9. Focuses on data in 9. Focuses on information out
10. Provides primitive and highly detailed data 10. Provides summarized and consolidated data
11. Number of users is in thousands 11. Number of users is in hundreds


Major differences between OLTP and OLAP system design


OLTP System OLAP System
Online Transaction Processing Online Analytical Processing
(Operational System) (Data Warehouse)
Source of Data Operational data; OLTPs are the Consolidation data; OLAP data comes
original source of the data from the various OLTP Databases
Purpose of Data To control and run fundamental To help with planning, problem solving,
business tasks and decision support
What the Data Reveals a snapshot of ongoing Multi-dimensional views of various
business process kinds of business activities
Inserts and Short and fast inserts and updates Periodic long-running batch jobs refresh
Updates initiated by end users the data
Queries Relatively standardized and simple Often complex queries involving
queries aggregations
Processing Typically very fast Depends on the amount of data
Speed involved; batch data refreshes and
complex queries may take many hours;
query speed can be improved by creating
indexes
Space Can be relatively small if historical Larger due to the existence of
Requirements data is archived aggregated structures and history data;
requires more indexes then OLTP
Database Highly normalized with many tables Typically de-normalized with fewer
Design tables use star and / or snowflake
schemas
Backup and Backup religiously; operational data Instead of regular backups, some
Recovery is critical to run the business, data environments may consider simply
loss is likely to entail significant reloading the OLTP data as a recovery
monetary loss and legal liability method.


OLAP Operations:

OLAP provides a user-friendly environment for interactive data analysis. One of the most popular
front-end applications for OLAP is a PC spreadsheet program. OLAP allows multiple perspectives and
several levels of detail to be materialized by exploiting dimensions and their hierarchies,
providing an interactive data analysis environment.
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations
on multidimensional data.
Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)

• Roll-up

The roll-up operation performs aggregation on a data cube either by climbing up the hierarchy or by
dimension reduction.
Consider an example:
Location Medal
Delhi 3
New York 5
Mumbai 5
Washington D.C. 2

Delhi, New York, Mumbai and Washington D.C. win 3, 5, 5 and 2 medals respectively. In this
example, we roll up on location from the level of cities to the level of countries.

Location Medal

India 8
America 7

Roll-up

More detailed data to less detailed data
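The roll-up above can be sketched in Python; the city-to-country mapping is taken from the example:

```python
# Roll-up sketch: aggregate the Medal measure up the Location hierarchy,
# from city level to country level.
city_medals = {"Delhi": 3, "New York": 5, "Mumbai": 5, "Washington D.C.": 2}
city_to_country = {
    "Delhi": "India", "Mumbai": "India",
    "New York": "America", "Washington D.C.": "America",
}

country_medals = {}
for city, medals in city_medals.items():
    country = city_to_country[city]
    country_medals[country] = country_medals.get(country, 0) + medals
```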

• Drill-down

Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension

Consider an example.
Location Medal

India 8
America 7

Location Medal
Delhi 3
New York 5
Mumbai 5
Washington D.C. 2
Drill-down
Less detailed data to more detailed data

• Slice

The slice operation selects one particular dimension from a given cube and provides a new sub-cube.
Consider an example.
If we want to make a selection where Medal = 5
Location Medal

New-York 5
Mumbai 5


• Dice

Dice selects two or more dimensions from a given cube and provides a new sub-cube.
For example, if we want to make a selection where Medal = 3 or Location = Washington D.C.
Location Medal

Delhi 3
Washington D.C. 2
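Both operations can be sketched as simple selections over the example rows; representing the cube as a list of records is an illustrative simplification:

```python
# Slice and dice sketch: both are selections that yield a sub-cube.
rows = [
    {"Location": "Delhi", "Medal": 3},
    {"Location": "New York", "Medal": 5},
    {"Location": "Mumbai", "Medal": 5},
    {"Location": "Washington D.C.", "Medal": 2},
]

# Slice: one condition on a single dimension (Medal = 5).
slice_result = [r for r in rows if r["Medal"] == 5]

# Dice: conditions over two or more dimensions
# (Medal = 3 or Location = "Washington D.C.").
dice_result = [r for r in rows
               if r["Medal"] == 3 or r["Location"] == "Washington D.C."]
```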

• Pivot

Pivot is also known as rotate. It rotates the data axis to view the data from different perspectives.

[Figure: Pivot. A Location x Medal table is rotated so that the Location and Medal axes are
interchanged.]
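A pivot can also be sketched as swapping the axes of a small two-dimensional cube; the locations, years, and figures here are invented for illustration:

```python
# Pivot (rotate) sketch: swap the row and column axes of a small
# Location x Year sales cube.
cube = {
    "Delhi":  {"2023": 10, "2024": 12},
    "Mumbai": {"2023": 8,  "2024": 15},
}

pivoted = {}
for location, by_year in cube.items():
    for year, value in by_year.items():
        # After the pivot, years index the rows and locations the columns.
        pivoted.setdefault(year, {})[location] = value
```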

What is Entity Relationship Model?

The Entity-Relationship (E/R) Model is a widely used conceptual-level data model proposed by Peter
P. Chen in the 1970s.

Entity Relationship Modelling (ER Modelling) is a graphical approach to database design. It uses
Entity/Relationship to represent real world objects.

The E/R model describes the database system at the requirements-collection stage:

• High level description.


• Easy to understand for the enterprise managers.
• Rigorous enough to be used for system building.

Concepts available in the model

• Entities and attributes of entities.


• Relationships between entities.
• Diagrammatic notation.

An entity is a thing or object in the real world that is distinguishable from the surrounding
environment. For example, each employee of an organization is a separate entity.

Let's consider an example.

An employee of an organization is an entity. If "Sam" is a programmer (an employee) at TCS, he can
have attributes (properties) like name, age, weight, height, etc. These attributes hold values
relevant to him.

Each attribute can have values. In most cases a single attribute has one value, but it is possible
for an attribute to have multiple values. For example, Sam's age has a single value, but his
"phone numbers" attribute can have multiple values.

Entities can have relationships with each other. Consider a simple example: assume that each TCS
programmer is given a computer. Clearly, Sam's computer is also an entity. Sam uses that computer,
and that computer is used by Sam. In other words, there is a mutual relationship between Sam and
his computer.

In Entity Relationship Modelling, we model entities, their attributes and relationships among
entities.
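The Sam example can be sketched with Python dataclasses; the class and field names are illustrative, not a standard E/R notation:

```python
# Sketch of the Sam/Computer example: entities with attributes and a
# relationship between them. All names and values are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Computer:                # an entity
    serial_no: str             # primary-key attribute

@dataclass
class Employee:                # an entity
    name: str                  # attribute
    age: int                   # single-valued attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued attribute
    uses: Optional[Computer] = None  # relationship: Employee USES Computer

# Sam, a programmer at TCS, and the computer assigned to him.
sam = Employee(name="Sam", age=30,
               phone_numbers=["111-2222", "333-4444"],
               uses=Computer(serial_no="TCS-0042"))
```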


Example -1

Example -2


Example -3

Example - 4


• Entity – rectangle
• Attribute – ellipses
• Relationship – diamond
• Link – line
• Primary key – underline
• Double Ellipses – multi-valued attributes

Star and Snowflake Schema

Multidimensional schemas are specially designed to model data warehouse systems. They address the
unique needs of very large databases designed for analytical purposes (OLAP).


Types of Data Warehouse Schema:

The following are the three chief types of multidimensional schema, each with its own advantages.

• Star Schema
• Snowflake Schema
• Galaxy Schema

In most database environments, users perform two basic types of tasks: modification (inserting,
updating, and deleting records) and retrieval (queries). Modifying records is generally known as
online transaction processing (OLTP). Data retrieval is referred to as online analytical processing
(OLAP) or decision support, because the information is often used to make business decisions. This
section describes these data models and their structural requirements.

When database records are modified, the most important requirements are update performance and
data integrity. These needs are addressed by the entity relation model of organizing data. Entity
relation schemas are highly normalized. This means that data redundancy is eliminated by separating
the data into multiple tables. The process of normalization results in a complex schema with many
tables and join paths.

When database records are retrieved, the most important requirements are query performance and
schema simplicity. These needs are best addressed by the dimensional model. Another name for the
dimensional model is the star schema.

Star Schema

The star schema is the simplest type of data warehouse schema. It is known as a star schema
because its structure resembles a star: the centre of the star holds one fact table, surrounded by
a number of associated dimension tables. It is also known as the star join schema and is optimized
for querying large data sets. The following figure is a sample star schema.


Diagram: Star Schema
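A star schema can be sketched as one fact table whose foreign keys reference dimension tables; the tables and figures below are invented for illustration:

```python
# Star schema sketch as plain tables: a central fact table whose foreign
# keys point at two dimension tables.
dim_product = {1: {"name": "Laptop"}, 2: {"name": "Phone"}}
dim_store   = {10: {"city": "Anand"}, 11: {"city": "Mumbai"}}

fact_sales = [  # each fact row: foreign keys plus a numeric measure
    {"product_id": 1, "store_id": 10, "amount": 500},
    {"product_id": 2, "store_id": 10, "amount": 300},
    {"product_id": 1, "store_id": 11, "amount": 700},
]

# A typical star-join query: total laptop sales per city. Each dimension
# is reached through a single join path via the fact table.
laptop_by_city = {}
for row in fact_sales:
    if dim_product[row["product_id"]]["name"] == "Laptop":
        city = dim_store[row["store_id"]]["city"]
        laptop_by_city[city] = laptop_by_city.get(city, 0) + row["amount"]
```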

Star Schema Advantages

Star schemas are easy for end users and applications to understand and navigate. With a well-designed
schema, users can quickly analyze large, multidimensional data sets. The main advantages of star
schemas in a decision-support environment are:

• Query performance

Because a star schema database has a small number of tables and clear join paths, queries run
faster than they do against an OLTP system. Small single-table queries, usually of dimension
tables, are almost instantaneous. Large join queries that involve multiple tables take only
seconds or minutes to run.

In a star schema database design, the dimensions are linked only through the central fact table.
When two dimension tables are used in a query, only one join path, intersecting the fact table,
exists between those two tables. This design feature enforces accurate and consistent query
results.

• Load performance and administration

Structural simplicity also reduces the time required to load large batches of data into a star
schema database. By defining facts and dimensions and separating them into different tables,
the impact of a load operation is reduced. Dimension tables can be populated once and
occasionally refreshed. You can add new facts regularly and selectively by appending records
to a fact table.

• Built-in referential integrity

A star schema has referential integrity built in when data is loaded. Referential integrity is
enforced because each record in a dimension table has a unique primary key, and all keys in
the fact tables are legitimate foreign keys drawn from the dimension tables. A record in the
fact table that is not related correctly to a dimension cannot be given the correct key value to
be retrieved.

• Easily understood

A star schema is easy to understand and navigate, with dimensions joined only through the
fact table. These joins are more significant to the end user, because they represent the
fundamental relationship between parts of the underlying business. Users can also browse
dimension table attributes before constructing a query.

Snowflake Schema

The snowflake schema architecture is a more complex variation of the star schema used in a data
warehouse, because the tables which describe the dimensions are normalized.

A snowflake schema consists of a fact table surrounded by multiple dimension tables, which can be
connected to further dimension tables via many-to-one relationships. The snowflake schema is a
kind of star schema, but it is more complex in terms of the data model. Because this structure
resembles a snowflake, it is called the snowflake schema.

A snowflake schema is designed from star schema by further normalizing dimension tables to
eliminate data redundancy. Therefore in the snowflake schema, instead of having big dimension
tables connected to a fact table, we have a group of multiple dimension tables. In the snowflake
schema, dimension tables are normally in the third normal form (3NF). The snowflake schema helps
save storage; however, it increases the number of dimension tables.
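The normalization step can be sketched as follows; the product and category tables are invented for illustration:

```python
# Snowflaking sketch: the product dimension of a star schema is
# normalized by moving the repeated category name into its own table.

# Star-style (denormalized) dimension: the category text repeats per row.
star_dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone",  "category": "Electronics"},
}

# Snowflake-style: the category is stored once and referenced by key.
dim_category = {100: {"category": "Electronics"}}
snow_dim_product = {
    1: {"name": "Laptop", "category_id": 100},
    2: {"name": "Phone",  "category_id": 100},
}

# Resolving a product's category now needs one extra join/lookup,
# which is the query-time cost of the storage saved.
category_of_1 = dim_category[snow_dim_product[1]["category_id"]]["category"]
```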


The following figure is a sample Snowflake Schema

Comparison of Star Schema vs. Snowflake Schema


Star Schema Snowflake Schema
Ease of maintenance / Has redundant data and hence less No redundancy, so snowflake
change easy to maintain/change schemas are easier to maintain
and change
Ease of use Lower query complexity and easy More complex queries and hence
to understand less easy to understand
Query performance Less number of foreign keys and More foreign keys and hence
hence shorter query execution longer query execution time
time (faster) (slower)
Type of Data warehouse Good for data marts with simple Good to use for data warehouse
relationships (1:1 or 1:many) core to simplify complex
relationships (many: many)
Joins Fewer Joins Higher number of Joins


Dimension table A star schema contains only A snowflake schema may have
single dimension table for each more than one dimension table
dimension for each dimension
When to use When the dimension table contains When the dimension table is
fewer rows, we can relatively big in size,
choose star schema snowflaking is better as it
reduces space
Normalization / Both Dimension and Fact Tables Dimension Tables are in
De-normalization are in De-Normalized form Normalized form but Fact Table
is in De-Normalized form.
Data model Top down approach Bottom up approach


Module - II
Digital Data

Definitions of data

“Data: A representation of facts, concepts or instructions in a formalised manner suitable for


communication, interpretation, or processing by humans or by automatic means.” (Hicks [1993: 668]
quoted by Checkland and Holwell [1998])

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are
accumulating vast and growing amounts of data in different formats and different databases. This
includes:
• Operational or transactional data such as, sales, cost, inventory, payroll, and accounting
• Non-operational data, such as industry sales, forecast data, and macro economic data
• Meta data - data about the data itself, such as logical database design or data dictionary definitions

One of the greatest benefits and drivers of the digital revolution is the ability to gather and analyze
data like never before. From business operations to health care to law enforcement, data is
transforming the way we work and live.

Types of Digital Data


Structured data:

Structured data consists mainly of text and is easily processed: it can be readily entered, stored,
and analysed. Structured data is stored in the form of rows and columns, which are easily managed
with the language called "Structured Query Language" (SQL). The relational model is a data model
that supports structured data, manages it in the form of rows and tables, and processes the content
of those tables easily. XML also supports structured data, and much of the content of web pages is
in XML form. Companies like Google use structured data found on the web to understand the content
of a page; in this way, much of Google search works with the help of structured data. Since the
start of the database revolution, the network, hierarchical, relational, and object-relational
data models have dealt with structured data.

Characteristics of Structured Data

1. Structured data has various data types: date, name, number, characters, and address.
2. These data are arranged in a defined way.
3. Structured data is handled through SQL.
4. Structured data depends on a schema; it is schema-based.
5. These data can easily interact with computers.
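These characteristics can be demonstrated with Python's built-in sqlite3 module; the table and rows are invented for illustration:

```python
# Structured data sketch: rows and columns managed through SQL, using
# the sqlite3 module from the Python standard library.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Asha", "Anand", 120.0), ("Ravi", "Mumbai", 80.0)],
)

# Schema-based storage makes querying straightforward.
rows = conn.execute(
    "SELECT name FROM customer WHERE balance > 100 ORDER BY name"
).fetchall()
conn.close()
```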


Semi-Structured Data
Semi-structured data includes e-mails, XML and JSON. Semi-structured data does not fit the
relational model; it is expressed with the help of edges, labels and tree structures. It is
represented by trees and graphs, and it has attributes and labels. These are schema-less data.
Graph-based data models can store semi-structured data; MongoDB is a NoSQL database that supports
JSON (semi-structured data).
Data that consist of tags and are self-describing are generally semi-structured. They are
different from structured and unstructured data. The Data Object Model, the Object Exchange Model,
and DataGuide are well-known models that express semi-structured data. Concepts in the
semi-structured data model include document instances, document schemas, elements, attributes, and
element relationship sets.


Characteristics of Semi-structured Data

1. It is not based on a schema
2. It is represented through labels and edges
3. It is generated from various web pages
4. It has multiple attributes
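A small example using Python's json module shows the schema-less, self-describing nature of such data; the records are invented for illustration:

```python
# Semi-structured data sketch: a JSON document is self-describing
# (labelled fields) and schema-less, so records may differ in attributes.
import json

doc = json.loads("""
[
  {"name": "Sam", "age": 30, "phones": ["111", "222"]},
  {"name": "Ria", "email": "ria@example.com"}
]
""")

# No fixed schema: each record carries only the attributes it has.
attrs = [sorted(record.keys()) for record in doc]
```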

Unstructured Data

Unstructured data includes videos, images, and audio. Today, 90% of the growing data in our digital
universe is unstructured. This data does not fit the relational database model, and in order to
store it, NoSQL databases emerged. Today there are four families of NoSQL databases: key-value,
column-oriented, graph-oriented, and document-oriented. Most of the famous organizations today
(Amazon, LinkedIn, Facebook, Google, YouTube) deal with such data and have replaced their
conventional databases with NoSQL databases.

Characteristics of Unstructured Data

1. It is not based on a schema
2. It is not suitable for a relational database
3. About 90% of the data growing today is unstructured
4. It includes digital media files, Word documents, and PDF files
5. It is stored in NoSQL databases


Let’s compare structures and unstructured data


Structured Data Unstructured Data
Characteristics • Pre-defined data models • No pre-defined data model
• Usually, text only • May be text, images, sound, video
• Easy to search or other formats
• Difficult to search
Resides In • Relational databases • Applications
• Data warehouses • NoSQL databases
• Data warehouses
• Data lakes
Generated by • Humans or machines • Humans or machines
Typical • Airline reservation systems • Word processing
applications • Inventory control • Presentation software
• CRM systems • Email clients
• ERP • Tools for viewing or editing
media
Examples • Dates • Text files
• Phone numbers • Reports
• Social security numbers • Email messages
• Credit card numbers • Audio files
• Customers names • Video files
• Addresses • Images
• Product names and numbers • Surveillance imagery
• Transaction information


Data Warehouse

Introduction

Data that is captured and stored during the processing of transactions can be used to produce
valuable information for end-users, especially management. This management information can be used to plan,
monitor, and control business operations (Whitten et al, 1998:53). In order to exploit this data asset,
organizations began creating data warehouse to satisfy these demands for decision-making (Gates,
1999:250)

The data warehouse is a significant component of business intelligence. It is subject-oriented,
integrated, time-variant, and non-volatile. It supports the physical propagation of data by
handling the numerous enterprise records for integration, cleansing, aggregation and query tasks.
It can also contain operational data, which can be defined as an updateable set of integrated data
used for enterprise-wide tactical decision-making on a particular subject area; this operational
data is live data, not snapshots, and retains minimal history. Data sources can be operational
databases, historical data, external data (for example, from market research companies or from the
Internet), or information from an already existing data warehouse environment. The data sources
can be relational databases or any other data structure that supports the line-of-business
applications. They can also reside on many different platforms and can contain structured
information, such as tables or spreadsheets, or unstructured information, such as plain-text files
or pictures and other multimedia information.

Definition of Data Warehouse

• “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection


of data in support of management’s decision-making process.” — W. H. Inmon, the father
of the term data warehouse.
• According to Ralph Kimball, Data Warehouse is a transaction data specifically structured
for query and analysis.
• “A data warehouse is simply a single, complete, and consistent store of data obtained from a
variety of sources and made available to end users in a way they can understand and use it in
a business context.” -- Barry Devlin, IBM Consultant


What Is a Data Warehouse?

A process of transforming data into information and making it available to users in a timely enough
manner to make a difference. [Forrester Research, April 1996]

A data warehouse is a database with a unique data structure that allows relatively quick and easy
performance of complex queries over large amounts of data. A data warehouse is a relational or
multidimensional database that is designed for query and analysis. Data warehouses are not optimized
for transaction processing, which is the domain of OLTP systems. Data warehouses usually
consolidate historical and analytic data derived from multiple sources. Data warehouses separate
analysis workload from transaction workload and enable an organization to consolidate data from
several sources.

A data warehouse usually stores many months or years of data to support historical analysis. The data
in a data warehouse is typically loaded through an extraction, transformation, and loading (ETL)
process from one or more data sources such as OLTP applications, mainframe applications, or
external data providers.
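The ETL process can be sketched in miniature; the field names and transformation rules are invented for illustration:

```python
# Tiny ETL sketch: extract raw OLTP rows, transform them (clean names,
# convert currency strings to numbers), and load them into a warehouse
# list that stands in for the warehouse's presentation area.
raw_oltp_rows = [
    {"cust": " asha ", "sale": "120.50"},
    {"cust": "RAVI",   "sale": "80.00"},
]

warehouse = []
for row in raw_oltp_rows:                          # extract
    cleaned = {
        "customer": row["cust"].strip().title(),   # transform: tidy the name
        "sale_amount": float(row["sale"]),         # transform: string to number
    }
    warehouse.append(cleaned)                      # load
```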

Users of the data warehouse perform data analyses that are often time-related. Examples include
consolidation of last year's sales figures, inventory analysis, and profit by product and by customer.
More sophisticated analyses include trend analyses and data mining, which use existing data to
forecast trends or predict futures. The data warehouse typically provides the foundation for a business
intelligence environment.

The Key Characteristics of a Data Warehouse

The key characteristics of a data warehouse are as follows:


• Subject Oriented
• Integrated
• Non-Volatile
• Time Variant
• Subject Oriented

Data is categorized and stored by business subject rather than by application.

• Organized around major subjects, such as customer, product, sales.


• Focusing on the modelling and analysis of data for decision makers, not on daily operations
or transaction processing.
• Provide a simple and concise view around particular subject issues by excluding data that are
not useful in the decision support process.
• Integrated

Data on a given subject is defined and stored once


Constructed by integrating multiple, heterogeneous data sources

– relational databases, flat files, on-line transaction records

• One set of consistent, accurate, quality information

• Standardization

– Naming conventions
– Coding structures
– Data attributes
– Measures

Data cleaning and data integration techniques are applied.

– Ensure consistency in naming conventions, encoding structures, attribute measures,


etc. among different data sources
– When data is moved to the warehouse, it is converted

• Non-Volatile

Typically data in the data warehouse is not deleted


A physically separate store of data transformed from the operational environment

• Operational update of data does not occur in the data warehouse environment

– Does not require transaction processing, recovery, and concurrency control


mechanisms
– Requires only two operations in data accessing:

• Initial loading of data and access of data

• Time Variant

Data is stored as a series of snapshots, each representing a period of time

The time horizon for the data warehouse is significantly longer than that of operational systems

– Operational database: current value data


– Data warehouse data: provide information from a historical perspective (e.g., past 5-
10 years)

Every key structure in the data warehouse contains an element of time, explicitly or implicitly,
but the key of operational data may or may not contain a time element.

In general, fast query performance with high data throughput is the key to a successful data
warehouse.

Basic components of a data warehouse

A data warehouse stores data that is extracted from data stores and external sources. The data records
within the warehouse must contain details to make it searchable and useful to business users.

• Data sources from operational systems, such as Excel, ERP, CRM or financial applications;

• A data staging area where data is cleaned and ordered; and

• A presentation area where data is warehoused.

Data analysis tools, such as business intelligence software, access the data within the warehouse.
Data warehouses can also feed data marts, which are decentralized systems in which data from the
warehouse is organized and made available to specific business groups, such as sales or inventory
teams.

In addition, Hadoop has become an important extension of data warehouses for many enterprises
because the data processing platform can improve components of the data warehouse architecture --
from data ingestion to analytics processing to data archiving.

Data warehouse design methods

In addition to Inmon's top-down approach to data warehouses and Kimball's bottom-up method, some
organizations have also adopted hybrid options.


• Top-down method: Inmon's method calls for building the data warehouse first. Data is
extracted from operational and possibly third-party external systems and may be validated in
a staging area before being integrated into a normalized data model. Data marts are created
from the data stored in the data warehouse.

• Bottom-up method: Kimball's data warehousing architecture calls for dimensional data
marts to be created first. Data is extracted from operational systems, moved to a staging area
and modelled into a star schema design, with one or more fact tables connected to one or more
dimensional tables. The data is then processed and loaded into data marts, each of which
focuses on a specific business process. Data marts are integrated using data warehouse bus
architecture to form an enterprise data warehouse.

• Hybrid method: Hybrid approaches to data warehouse design include aspects from both the
top-down and bottom-up methods. Organizations often seek to combine the speed of the
bottom-up approach with the integration achieved in a top-down design.

Advantages of Data Warehouse

• Lowers cost of information access


• Improves customer responsiveness
• Identifies hidden business opportunities
• Strategic decision making

Data Marts:

In 1997, White (1997:7) stated that there is "considerable debate and confusion in the industry
about the role and value of a data mart". Subsequent research in this field and evidence found in
the literature define data marts as:
• “... the term data mart, which has been used variously to describe a small standalone data
warehouse or a distributed secondary level of storage and distribution in conjunction with a
data warehouse”. (Dodge and Gorman, 1998:11)
• “A data mart, often referred to as a subject-oriented data warehouse, represents a subset of
the data warehouse comprised of relevant data for a particular business function (e.g.
marketing, sales, finance etc.)” (Poe et al , 1998:18)


• “The term data mart is used to refer to a small-capacity data warehouse designed for use by a
business unit or department of a corporation” (Gray and Watson, 1998:103)
• “A scaled-down version of a data warehouse that is tailored to contain information likely to
be used only by the target group” (Gates, 1999:493).
• “A data mart is simply a smaller data warehouse. Usually the data in a mart is a subset of
data that is found in an enterprise-wide warehouse, as follows:
o A data warehouse is for data throughout the enterprise.
o A data mart is specific to a particular department (Microsoft Corporation, 1999:3).
• “A data mart contains a subset of corporate-wide data that is of value to a specific group of
users. The scope is confined to specific selected subjects” (Han and Kamber, 2001:67)
• “A data mart is smaller, less expensive, and more focused than a large-scale data warehouse.
Data marts can be a substitution for a data warehouse, or they can be used in addition to it”
(Turban et al, 2001:759)
• Inmon (1999) describes a data mart as “a collection of subject areas organized for decision
support based on the needs of a given department. Finance has their data mart, marketing has
theirs, and sales have theirs and so on”

From these definitions in literature, the following can be concluded:

1. A data mart is smaller in scope and/or size than a data warehouse;


2. A data mart appears to be less expensive to build than a data warehouse; and
3. There can be multiple data marts inside an enterprise. A data mart can support a particular
business function, business process or business unit.

Why do we need Data Mart?

• A data mart helps to enhance user response time, due to the reduced volume of data.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement than a corporate data warehouse, and the cost of
implementing a data mart is certainly lower than that of implementing a full data warehouse.
• Compared to a data warehouse, a data mart is agile: if the model changes, a data mart can be
rebuilt more quickly because of its smaller size.


• A data mart is defined by a single subject matter expert (SME), whereas a data warehouse is
defined by interdisciplinary SMEs from a variety of domains. Hence, a data mart is more open
to change than a data warehouse.
• Data is partitioned and allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.

Types of Data Mart

There are three main types of data marts:

1. Dependent: Dependent data marts are created by drawing data from an existing central data
warehouse.
2. Independent: Independent data marts are created without a central data warehouse, drawing
data directly from operational sources, external sources, or both.
3. Hybrid: This type of data mart can take data from both data warehouses and operational systems.

Dependent Data Mart

A dependent data mart sources an organization's data from a single data warehouse. It offers
the benefit of centralization. If you need to develop one or more physical data marts, you should
configure them as dependent data marts.

Dependent data marts can be built in two different ways: either users can access both the data
mart and the data warehouse, depending on need, or access is limited to the data mart alone. The
second approach is not optimal, as it produces what is sometimes referred to as a data junkyard: all
data begins from a common source, but it is scrapped and mostly junked.


Independent Data Mart

An independent data mart is created without the use of a central data warehouse. This kind of data
mart is an ideal option for smaller groups within an organization.

An independent data mart has neither a relationship with the enterprise data warehouse nor with any
other data mart. In an independent data mart, data is input separately, and its analyses are performed
autonomously.

Implementing many independent data marts is antithetical to the motivation for building a data
warehouse, namely providing a consistent, centralized store of enterprise data that can be analyzed
by multiple users with different interests who want widely varying information.


Hybrid data Mart:

A hybrid data mart combines input from a data warehouse with input from other sources, such as
operational systems. This can be helpful when you want ad-hoc integration, for example after a new
group or product is added to the organization.

It is best suited for multiple-database environments and for organizations that need a fast
implementation turnaround. It also requires the least data-cleansing effort, supports large storage
structures, and is flexible enough to suit smaller data-centric applications.


Steps in Implementing a Data Mart

Implementing a Data Mart is a rewarding but complex procedure. Here are the detailed steps to
implement a Data Mart:

Designing

Designing is the first phase of data mart implementation. It covers all the tasks from initiating the
request for a data mart through gathering information about the requirements. Finally, we create the
logical and physical design of the data mart.


The design step involves the following tasks:

• Gathering the business and technical requirements and identifying data sources.
• Selecting the appropriate subset of data.
• Designing the logical and physical structure of the data mart.

Data could be partitioned based on the following criteria:

• Date
• Business or Functional Unit
• Geography
• Any combination of above

Data can be partitioned at the application or DBMS level, though partitioning at the application
level is recommended, as it allows the data model to change from year to year as the business
environment changes.
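The partitioning options above can be sketched in a few lines of Python; the record layout and the `partition_by` helper are purely illustrative, not part of any particular product:

```python
from collections import defaultdict

def partition_by(records, key_fn):
    """Group records into partitions keyed by key_fn (e.g., year or unit)."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[key_fn(rec)].append(rec)
    return dict(partitions)

# Hypothetical sales records for a departmental data mart
sales = [
    {"date": "2023-04-01", "unit": "Sales", "amount": 120},
    {"date": "2023-07-15", "unit": "Marketing", "amount": 80},
    {"date": "2024-01-10", "unit": "Sales", "amount": 200},
]

by_year = partition_by(sales, lambda r: r["date"][:4])                # by date
by_unit = partition_by(sales, lambda r: r["unit"])                    # by business unit
by_both = partition_by(sales, lambda r: (r["date"][:4], r["unit"]))   # combination
```

The same key-function idea applies whether the partitions live in one DBMS or in separate application-level stores.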

What Products and Technologies Do You Need?

Pen and paper can suffice, though tools that help you create UML or ER diagrams can also embed
metadata into your logical and physical designs.

Constructing

This is the second phase of implementation. It involves creating the physical database and the logical
structures.

This step involves the following tasks:

• Implementing the physical database designed in the earlier phase; for instance, database
schema objects like tables, indexes, and views are created.
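As a sketch of this construction task, the snippet below uses Python's built-in sqlite3 module to create a hypothetical sales table, index, and summary view for a data mart (all object names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data mart database
cur = conn.cursor()

# Physical table for the subject area
cur.execute("""
    CREATE TABLE sales_fact (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT NOT NULL,
        region    TEXT NOT NULL,
        amount    REAL NOT NULL
    )
""")

# Index to speed up frequent date-based queries
cur.execute("CREATE INDEX idx_sales_date ON sales_fact (sale_date)")

# View exposing a pre-aggregated business perspective
cur.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM sales_fact
    GROUP BY region
""")
conn.commit()
```

A production data mart would use a full RDBMS rather than an in-memory database, but the DDL statements are of the same kind.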

What Products and Technologies Do You Need?

You need a relational database management system to construct a data mart. An RDBMS has several
features that are required for the success of a data mart:


• Storage management: An RDBMS stores and manages data, providing operations to create,
add, and delete data.
• Fast data access: With a SQL query you can easily access data based on certain
conditions/filters.
• Data protection: The RDBMS offers a way to recover from system failures such as power
failures. It also allows restoring data from backups in case of disk failure.
• Multiuser support: The data management system offers concurrent access: the ability for
multiple users to access and modify data without interfering with or overwriting changes made
by another user.
• Security: The RDBMS also provides a way to regulate access by users to objects and certain
types of operations.

Populating:

In the third phase, data is populated in the data mart.

The populating step involves the following tasks:

• Mapping source data to target data
• Extraction of source data
• Cleaning and transformation operations on the data
• Loading data into the data mart
• Creating and storing metadata

What Products and Technologies Do You Need?

You accomplish these population tasks using an ETL (Extract, Transform, Load) tool. This tool allows
you to look at the data sources, perform source-to-target mapping, extract the data, transform and
cleanse it, and load it into the data mart.

In the process, the tool also creates some metadata relating to things like where the data came from,
how recent it is, what type of changes were made to the data, and what level of summarization was
done.
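A toy extract-transform-load pass, with made-up source rows, cleansing rules, and metadata fields, might look like this:

```python
import datetime

# Extract: rows as they arrive from a hypothetical source system
source_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": None},        # dirty row: missing amount
    {"customer": "carol", "amount": "80"},
]

def transform(row):
    """Cleanse and standardize one source row; return None to drop it."""
    if row["amount"] is None:
        return None                             # drop incomplete rows
    return {"customer": row["customer"].strip().title(),
            "amount": float(row["amount"])}

# Load: write cleaned rows into the data mart (a list stands in for a table)
data_mart = [t for t in (transform(r) for r in source_rows) if t is not None]

# Metadata captured during the load, as the text describes
load_metadata = {
    "source": "crm_extract",                    # hypothetical source name
    "loaded_at": datetime.datetime.now().isoformat(),
    "rows_in": len(source_rows),
    "rows_loaded": len(data_mart),
}
```

Real ETL tools do the same three steps at scale, with the metadata persisted alongside the mart rather than in a local dictionary.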


Accessing

Accessing is the fourth step, which involves putting the data to use: querying the data, creating
reports and charts, and publishing them. End users submit queries to the database and display the
results of those queries.

The accessing step needs to perform the following tasks:

• Set up a meta layer that translates database structures and object names into business terms.
This helps non-technical users access the data mart easily.
• Set up and maintain database structures.
• Set up API and interfaces if required
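The meta layer in the first task can be as simple as a dictionary that maps physical column names to business terms; the names below are hypothetical:

```python
# Hypothetical mapping from physical column names to business terms
meta_layer = {
    "cust_nm": "Customer Name",
    "ord_dt":  "Order Date",
    "tot_amt": "Total Amount",
}

def business_view(row):
    """Re-label a raw database row with business-friendly names."""
    return {meta_layer.get(col, col): value for col, value in row.items()}

raw_row = {"cust_nm": "Acme Ltd", "ord_dt": "2024-03-01", "tot_amt": 950.0}
friendly = business_view(raw_row)
```

BI tools implement the same idea as a "semantic layer", so that reports show "Customer Name" rather than "cust_nm".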

What Products and Technologies Do You Need?

You can access the data mart using the command line or GUI. GUI is preferred as it can easily
generate graphs and is user-friendly compared to the command line.

Managing

This is the last step of the Data Mart implementation process. It covers management tasks such as:

• Ongoing user access management.


• System optimization and fine-tuning to achieve enhanced performance.
• Adding and managing fresh data in the data mart.
• Planning recovery scenarios and ensuring system availability in case the system fails.

Best practices for Implementing Data Marts

Following are the best practices that you need to follow while in the Data Mart Implementation
process:

• The source of a Data Mart should be departmentally structured


• The implementation cycle of a Data Mart should be measured in short periods of time, i.e., in
weeks instead of months or years.


• It is important to involve all stakeholders in the planning and designing phase, as the data
mart implementation can be complex.
• Data mart hardware/software, networking, and implementation costs should be accurately
budgeted in your plan.
• Even if the data mart is created on the same hardware as the data warehouse, it may need
different software to handle user queries. Additional processing power and disk storage
requirements should be evaluated for fast user response.
• A data mart may be in a different location from the data warehouse, so it is important to
ensure there is enough networking capacity to handle the data volumes needed to transfer
data to the data mart.
• The implementation budget should account for the time taken by the data mart loading
process; load time increases with the complexity of the transformations.

Advantages and Disadvantages of a Data Mart

Advantages

• Data marts contain a subset of organization-wide data that is valuable to a specific group of
people in an organization.
• A data mart is a cost-effective alternative to a data warehouse, which can be costly to build.
• A data mart allows faster access to data.
• A data mart is easy to use, as it is specifically designed for the needs of its users; thus a data
mart can accelerate business processes.
• Data marts need less implementation time compared to data warehouse systems, since you
only need to concentrate on a subset of the data.
• It contains historical data, which enables analysts to determine data trends.

Disadvantages

• Enterprises often create too many disparate and unrelated data marts without much benefit,
and they can become a big hurdle to maintain.
• A data mart cannot provide company-wide data analysis, as its data set is limited.


Comparison of data warehouse and types of data marts


Data Lake

Companies have embraced the concept of the data lake or data hub to serve their data storage and
data-driven application needs. However, gaps remain in the maturity and capability of the Hadoop
stack, leaving organizations struggling with how to reap the benefits of these data lakes and how to
create analytic applications that deliver value to end users. For data lakes to succeed, organizations
need to learn and understand the differences between these big data scenarios:

I. Data discovery and exploratory analysis

II. Analytic applications and operationalizing analytics across the enterprise

What is Data Lake?

The concept of a data lake is emerging as a popular way to organize and build the next generation of
systems to master new big data challenges.

Data Lake Concept:

A data lake is a large storage repository that holds a vast amount of raw data in its original
format until it is needed. Every data element in a data lake is given a unique identifier and
tagged with a set of extended metadata tags. It offers a wide variety of analytic capabilities.
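The identifier-and-tags idea can be sketched as follows; the tag names and payload are illustrative, not tied to any particular product:

```python
import uuid

def ingest(raw_payload, tags):
    """Store a raw data element with a unique id and extended metadata tags."""
    return {
        "id": str(uuid.uuid4()),   # unique identifier for the element
        "raw": raw_payload,        # data kept in its original, untouched form
        "tags": tags,              # extended metadata for later discovery
    }

element = ingest(b'{"sensor": 7, "temp": 21.4}',
                 tags={"source": "iot-gateway", "format": "json"})
```

The tags are what later make the element findable during data discovery, without ever altering the raw payload.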

It is a storage repository that can store large amounts of structured, semi-structured, and unstructured
data. It is a place to store every type of data in its native format, with no fixed limits on account or
file size. It offers high data quantity to increase analytic performance and native integration.

A data lake is like a large container, very similar to a real lake fed by rivers. Just as a lake has
multiple tributaries coming in, a data lake has structured data, unstructured data, machine-to-machine
data, and logs flowing through in real time.

The data lake concept centres on landing all analyzable data sets of any kind in raw or only lightly
processed form into the easily expandable scale-out Hadoop infrastructure to ensure that the fidelity
of the data is preserved. Instead of forcing data into a static schema and running an ETL (Extract,
Transform, Load) process to fit it into a structured database, a Hadoop-first approach enhances agility
by storing data in its raw form. As a result, data is available at a more granular level without losing
its details, and schemas are created at a later point. This process is also referred to as ‘schema-on-read.’
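Schema-on-read can be illustrated in a few lines of Python: raw records are landed untouched, and a schema is applied only when the data is read. The field names and schema here are assumptions for illustration:

```python
import json

# Landing zone: raw events stored as-is, in their native JSON form
landing_zone = [
    '{"user": "u1", "page": "/home", "ms": "135"}',
    '{"user": "u2", "page": "/cart", "ms": "402", "extra": "ignored"}',
]

def read_with_schema(raw_records, schema):
    """Apply a schema only at read time (schema-on-read)."""
    for raw in raw_records:
        rec = json.loads(raw)
        # Keep only the fields the schema asks for, cast to the desired types
        yield {field: cast(rec[field]) for field, cast in schema.items()}

# The schema is defined later, when an analysis actually needs it
clickstream_schema = {"user": str, "page": str, "ms": int}
rows = list(read_with_schema(landing_zone, clickstream_schema))
```

Contrast this with schema-on-write, where the cast and field selection would happen in an ETL job before the data is ever stored.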

Key Data Lake Concepts

The following are key data lake concepts that one needs to understand in order to fully grasp the
data lake architecture.

• Data Ingestion

Data ingestion allows connectors to get data from different data sources and load it into the data
lake.

Data Ingestion supports:

• All types of Structured, Semi-Structured, and Unstructured data.


• Multiple ingestions like Batch, Real-Time, One-time load.
• Many types of data sources like Databases, Webservers, Emails, IoT, and FTP.

• Data Storage

Data storage should be scalable, offer cost-effective storage, and allow fast access for data exploration.
It should support various data formats.

• Data Governance

Data governance is the process of managing the availability, usability, security, and integrity of data
used in an organization.

• Security

Security needs to be implemented in every layer of the data lake, starting with storage and extending
through discovery and consumption. The basic need is to stop access by unauthorized users. The
platform should support different tools to access data, with easy-to-navigate GUIs and dashboards.


Authentication, Accounting and Data Protection are some important features of data lake security.

• Data Quality:

Data quality is an essential component of data lake architecture. Data is used to extract business
value, and extracting insights from poor-quality data will lead to poor-quality insights.

• Data Discovery

Data discovery is another important stage before you can begin preparing data for analysis. In this
stage, a tagging technique is used to express the data understanding, by organizing and interpreting
the data ingested into the data lake.

• Data Auditing

The two major data auditing tasks are:

1. Tracking changes to important dataset elements
2. Capturing how, when, and by whom these elements were changed

Data auditing helps to evaluate risk and compliance.

• Data Lineage

This component deals with the data's origins: where it moves over time and what happens to it. It
eases error correction in a data analytics process, from origin to destination.

• Data Exploration

It is the beginning stage of data analysis, helping to identify the right dataset before the analysis
starts.

All of these components need to work together, each playing an important part, so that the data lake
can evolve easily and support exploration of the environment.


Important tiers in Data Lake Architecture

1. Ingestion tier: The tiers on the left side depict the data sources. The data can be loaded
into the data lake in batches or in real time.
2. Insights tier: The tiers on the right represent the research side, where insights from the
system are used. SQL queries, NoSQL queries, or even Excel can be used for data analysis.
3. HDFS is a cost-effective solution for both structured and unstructured data. It is a landing
zone for all data that is at rest in the system.
4. The distillation tier takes data from the storage tier and converts it to structured data for
easier analysis.
5. The processing tier runs analytical algorithms and user queries in varying modes (real-time,
interactive, batch) to generate structured data for easier analysis.
6. The unified operations tier governs system management and monitoring. It includes auditing
and proficiency management, data management, and workflow management.

Need of Data Lake

The data lake arose because new types of data needed to be captured and exploited by the enterprise.
As this data became increasingly available, early adopters discovered that they could extract insight
through new applications built to serve the business.

The data lake supports the following capabilities:

• To capture and store raw-data at scale for a low cost

• To store many types of data in the same repository

• To perform transformations on the data

• To define the structure of the data at the time it is used, referred to as schema on read.

• To perform new types of data processing

• To perform single subject analytics based on very specific use cases

The first examples of data lake implementations were created to handle web data at organizations
like Google, Yahoo, and other web-scale companies. Then many other sorts of big data followed suit:

• Clickstream data

• Server logs

• Social media

• Geolocation coordinates

• Machine and sensor data

For each of these data types, the data lake created a value chain through which new types of business
value emerged. Data lakes are created to store historical, micro-transactional event data, but most
enterprises must bring to bear the operational intelligence aspects of a data lake. While data discovery
tools give you a head start in identifying the gaps in your data or creating one-off analyses,
operationalizing big data requires further data management and analytic engines. To maximize the
value of data lakes, organizations must think ahead architecturally and balance experimentation and
the use of pure data discovery activities with creating enterprise applications that add context,
consumability, and availability of data to the entire enterprise.

• Using data lakes for web data increased the speed and quality of web search

• Using data lakes for Clickstream data supported more effective methods of web advertising

• Using data lakes for cross-channel analysis of customer interactions and behaviours provided
a more complete view of the customer

The data going into a lake might consist of machine-generated logs and sensor data (e.g., Internet of
Things or IoT), customer behaviour (e.g., web Clickstream), social media, documents (e.g., e-mails),
geo-location trails, images, video and audio, and structured enterprise data sets such as transactional
data from relational sources and systems such as ERP, CRM or SCM.

The first companies that created data lakes were web-scale companies focused on big data. The
challenge was to handle the scale of that data and to perform new types of transformations and
analytics on the data to support key applications such as indexing the Web or enabling ad targeting.
But as the wave of big data kept coming, companies that had invested years in creating enterprise
data warehouses began creating data lakes to complement their enterprise data warehouses. The data
lake and the enterprise data warehouse must both do what they do best and work together as
components of a logical data warehouse.


Data Swamps

A poorly governed data lake degenerates into a data swamp, in which:

• Data stays raw and uncurated

• Users can't find or use the data

• Access can't be opened without protecting sensitive data


Maturity stages of Data Lake

The definition of data lake maturity stages differs from one textbook to another, though the crux
remains the same. The following stage definitions take a layman's point of view.

Stage 1: Handle and ingest data at scale

This first stage of data maturity involves improving the ability to handle and ingest data at scale.
Here, business owners need to find the tools, according to their skill set, for obtaining more data and
building analytical applications.

Stage 2: Building the analytical muscle

This second stage involves improving the ability to transform and analyze data. In this stage,
companies use the tools most appropriate to their skill set. They start acquiring more data and
building applications. Here, capabilities of the enterprise data warehouse and the data lake are
used together.

Stage 3: EDW (Enterprise data warehouse) and Data Lake work in unison

This stage involves getting data and analytics into the hands of as many people as possible. In this
stage, the data lake and the enterprise data warehouse start to work in unison, both playing their
part in analytics.

Stage 4: Enterprise capability in the lake

In this maturity stage of the data lake, enterprise capabilities are added to the data lake: adoption
of information governance, information lifecycle management capabilities, and metadata
management. Very few organizations have reached this level of maturity, but the tally will increase
in the future.

Best practices for Data Lake Implementation:

• Architectural components, their interactions, and the chosen products should support native
data types.
• The design of the data lake should be driven by what is available rather than what is required;
the schema and data requirements are not defined until the data is queried.
• The design should be guided by disposable components integrated via service APIs.
• Data discovery, ingestion, storage, administration, quality, transformation, and visualization
should be managed independently.
• The data lake architecture should be tailored to a specific industry, ensuring that the
capabilities necessary for that domain are an inherent part of the design.
• Faster onboarding of newly discovered data sources is important.
• The data lake should allow customized management so that maximum value can be extracted.
• The data lake should support existing enterprise data management techniques and methods.

Challenges of building a data lake:

• In a data lake, data volume is higher, so processes must rely more on programmatic
administration.
• It is difficult to deal with sparse, incomplete, volatile data.
• The wider scope of datasets and sources requires greater data governance and support.


Difference between Data Lakes and Data Warehouse

Parameters        Data Lakes                                      Data Warehouse

Data              Data lakes store everything                     Focuses only on business processes

Processing        Data is mainly unprocessed                      Highly processed data

Types of data     Can be unstructured, semi-structured            Mainly in tabular form and structured
                  and structured

Task              Shared data stewardship                         Optimized for data retrieval

Agility           Highly agile; configure and                     Less agile; fixed configuration
                  reconfigure as needed

Users             Mostly used by data scientists                  Widely used by business professionals

Storage           Designed for low-cost storage                   Expensive storage that gives fast
                                                                  response times

Security          Offers lesser control                           Allows better control of the data

Schema            Schema on read (no predefined                   Schema on write (predefined schema)
                  schemas)

Data processing   Helps faster ingestion of new                   Time-consuming to introduce new
                  data/content                                    content

Tools             Can use open source tools like                  Mostly commercial tools
                  Hadoop/MapReduce


Benefits and Risks of using Data Lake:

Here are some major benefits in using a Data Lake:

• Helps fully with productionizing and advanced analytics


• Offers cost-effective scalability and flexibility
• Offers value from unlimited data types
• Reduces the long-term cost of ownership
• Allows economic storage of files
• Quickly adaptable to changes
• The main advantage of a data lake is the centralization of different content sources
• Users from various departments, who may be scattered around the globe, can have flexible
access to the data

Risk of Using Data Lake:

• After some time, the data lake may lose relevance and momentum
• There is a larger amount of risk involved in designing a data lake
• Unstructured data may lead to ungoverned chaos, unusable data, and disparate, complex
tools, making enterprise-wide collaboration and a unified, consistent view hard to achieve
• It also increases storage and compute costs
• There may be no way to get insights from others who have worked with the data, because
there is no account of the lineage of findings by previous analysts
• The biggest risk of data lakes is security and access control; sometimes data can be placed
into a lake without any oversight, even though some of the data may have privacy and
regulatory needs


Business Reporting, Visual Analytics

Report

Any communication artefact prepared to convey specific information.

A report can fulfil many functions

• To ensure proper departmental functioning

• To provide information

• To provide the results of an analysis

• To persuade others to act

• To create an organizational memory…

What is a Business Report?

A written document that contains information regarding business matters.


• Purpose: to improve managerial decisions
• Source: data from inside and outside the organization (via the use of ETL)
• Format: text + tables + graphs/charts
• Distribution: in-print, email, portal/intranet

Data acquisition → Information generation → Decision making → Process management

Visual Analytics

Visual analytics enables business users to interact with data and engage in analytical processes
through visual representations. It helps users become more productive with data by using software to
integrate data analysis capabilities with modern, graphical ways of expressing information.

Visual analytics, which includes visual data discovery, is part of a larger trend in Business
Intelligence toward greater self-service data access and analysis.


Cloud computing and software as a service (SaaS) for BI, analytics, and data warehousing are
maturing, giving organizations more options for servicing demand for BI and visual analytics. The
options also give business functions choices for addressing growing demand for data without adding
significantly to the IT function; this is particularly important for businesses that fear missing
opportunities if they wait for on-premises systems to be built to handle their BI and analytics needs.
With cloud options, companies can focus on the business reasons for analytics rather than on
configuring their data infrastructure.

A visual analytics tool supports:

♦ Showing different views of data: from raw data to data abstractions

♦ Representing large quantities of information in a small space

♦ Finding patterns in data: similarities, anomalies, relationships and events

♦ Simulation, prediction, and hypothesis testing

♦ Data retrieval, browsing and exploration

♦ Information extraction and distillation.

Data visualization

Data visualization is a general term that describes any effort to help people understand the
significance of data by placing it in a visual context. Patterns, trends and correlations that might go
undetected in text-based data can be exposed and recognized more easily with data visualization
software.

Today's data visualization tools go beyond the standard charts and graphs used in Microsoft
Excel spreadsheets, displaying data in more sophisticated ways such as infographics, dials and
gauges, geographic maps, sparklines, heat maps, and detailed bar, pie and fever charts. The images
may include interactive capabilities, enabling users to manipulate them or drill into the data for
querying and analysis. Indicators designed to alert users when data has been updated or predefined
conditions occur can also be included.


Importance of data visualization

Data visualization has become the de facto standard for modern business intelligence (BI). The
success of the two leading vendors in the BI space, Tableau and Qlik -- both of which heavily
emphasize visualization -- has moved other vendors toward a more visual approach in their software.
Virtually all BI software has strong data visualization functionality.

Data visualization tools have been important in democratizing data and analytics and making data-
driven insights available to workers throughout an organization. They are typically easier to operate
than traditional statistical analysis software or earlier versions of BI software. This has led to a rise
in lines of business implementing data visualization tools on their own, without support from IT.

Data visualization software also plays an important role in big data and advanced analytics projects.
As businesses accumulated massive troves of data during the early years of the big data trend, they
needed a way to quickly and easily get an overview of their data. Visualization tools were a natural
fit.

Visualization is central to advanced analytics for similar reasons. When a data scientist is writing
advanced predictive analytics or machine learning algorithms, it becomes important to visualize the
outputs to monitor results and ensure that models are performing as intended. This is because
visualizations of complex algorithms are generally easier to interpret than numerical outputs.

Examples of data visualization

Data visualization tools can be used in a variety of ways. The most common use today is as a BI
reporting tool. Users can set up visualization tools to generate automatic dashboards that track
company performance across key performance indicators and visually interpret the results.

Many business departments implement data visualization software to track their own initiatives. For
example, a marketing team might implement the software to monitor the performance of an email
campaign, tracking metrics like open rate, click-through rate and conversion rate.
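The campaign metrics mentioned above reduce to simple ratios. The counts below are invented for illustration, and note that click-through rate is sometimes computed against sends rather than opens:

```python
# Hypothetical email campaign counts
sent, opened, clicked, converted = 10_000, 2_500, 500, 50

open_rate = opened / sent              # share of recipients who opened
click_through_rate = clicked / opened  # share of openers who clicked
conversion_rate = converted / clicked  # share of clickers who converted
```

These are the numbers a dashboard would render as gauges or trend lines over the life of the campaign.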

As data visualization vendors extend the functionality of these tools, they are increasingly being used
as front ends for more sophisticated big data environments. In this setting, data visualization software
helps data engineers and scientists keep track of data sources and do basic exploratory analysis of
data sets prior to or after more detailed advanced analyses.

How data visualization works

Most of today's data visualization tools come with connectors to popular data sources, including the
most common relational databases, Hadoop and a variety of cloud storage platforms. The
visualization software pulls in data from these sources and applies a graphic type to the data.

Data visualization software allows the user to select the best way of presenting the data, but,
increasingly, software automates this step. Some tools automatically interpret the shape of the data
and detect correlations between certain variables and then place these discoveries into the chart type
that the software determines is optimal.
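Detecting that two variables are correlated, as such software does automatically, comes down to computing a statistic such as Pearson's r. A minimal hand-rolled version, with invented figures:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical ad spend vs. revenue over five periods
ad_spend = [10, 20, 30, 40, 50]
revenue = [12, 24, 33, 41, 55]

r = pearson_r(ad_spend, revenue)
# r near +1 means the variables rise together, a relationship a
# visualization tool might surface as a scatter plot with a trend line
```

A tool that finds r close to ±1 for a pair of columns would typically propose a scatter chart; an r near 0 would not be flagged.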

Typically, data visualization software has a dashboard component that allows users to pull multiple
visualizations of analyses into a single interface, generally a web portal.

Data visualization is the presentation of data in a pictorial or graphical format. It enables decision
makers to see analytics presented visually, so they can grasp difficult concepts or identify new
patterns. With interactive visualization, you can take the concept a step further by using technology
to drill down into charts and graphs for more detail, interactively changing what data you see and
how it’s processed.

History of Data Visualization

The concept of using pictures to understand data has been around for centuries, from maps and graphs
in the 17th century to the invention of the pie chart in the early 1800s. Several decades later, one of
the most cited examples of statistical graphics occurred when Charles Minard mapped Napoleon’s
invasion of Russia. The map depicted the size of the army as well as the path of Napoleon’s retreat
from Moscow – and tied that information to temperature and time scales for a more in-depth
understanding of the event.

It’s technology, however, that truly lit the fire under data visualization. Computers made it possible
to process large amounts of data at lightning-fast speeds. Today, data visualization has become a
rapidly evolving blend of science and art that is certain to change the corporate landscape over the
next few years.

Why is data visualization important?

Because of the way the human brain processes information, using charts or graphs to visualize large
amounts of complex data is easier than poring over spreadsheets or reports. Data visualization is a
quick, easy way to convey concepts in a universal manner – and you can experiment with different
scenarios by making slight adjustments.

Data visualization can also:

• Identify areas that need attention or improvement.

• Clarify which factors influence customer behaviour.

• Help you understand which products to place where.

• Predict sales volumes.

Data visualization is going to change the way our analysts work with data. They’re going to be
expected to respond to issues more rapidly. And they’ll need to be able to dig for more insights –
look at data differently, more imaginatively. Data visualization will promote that creative data
exploration.

Simon Samuel, Head of Customer Value Modelling for a large bank in the UK

How Is It Being Used?

Regardless of industry or size, all types of businesses are using data visualization to help make sense
of their data. Here’s how.

Comprehend information quickly


By using graphical representations of business information, businesses are able to see large amounts
of data in clear, cohesive ways – and draw conclusions from that information. And since it’s
significantly faster to analyze information in graphical format (as opposed to analyzing information
in spreadsheets), businesses can address problems or answer questions in a more timely manner.

Identify relationships and patterns

Even extensive amounts of complicated data start to make sense when presented graphically;
businesses can recognize parameters that are highly correlated. Some of the correlations will be
obvious, but others won’t. Identifying those relationships helps organizations focus on areas most
likely to influence their most important goals.

Pinpoint emerging trends

Using data visualization to discover trends – both in the business and in the market – can give
businesses an edge over the competition, and ultimately affect the bottom line. It’s easy to spot
outliers that affect product quality or customer churn, and address issues before they become bigger
problems.

Communicate the story to others

Once a business has uncovered new insights from visual analytics, the next step is to communicate
those insights to others. Using charts, graphs or other visually impactful representations of data is
important in this step because it’s engaging and gets the message across quickly.

Laying the groundwork for data visualization

Before implementing new technology, there are some steps you need to take. Not only do you need
to have a solid grasp on your data, you also need to understand your goals, needs and audience.
Preparing your organization for data visualization technology requires that you first:

• Understand the data you’re trying to visualize, including its size and cardinality (the uniqueness
of data values in a column).


• Determine what you’re trying to visualize and what kind of information you want to
communicate.

• Know your audience and understand how it processes visual information.

• Use a visual that conveys the information in the best and simplest form for your audience.

Once you've answered those initial questions about the type of data you have and the audience who'll
be consuming the information, you need to prepare for the amount of data you'll be working with.
Big data brings new challenges to visualization because large volumes, different varieties and varying
velocities must be taken into account. Plus, data is often generated faster than it can be managed and
analyzed.

There are factors you should consider, such as the cardinality of columns you’re trying to visualize.
High cardinality means there’s a large percentage of unique values (e.g., bank account numbers,
because each item should be unique). Low cardinality means a column of data contains a large
percentage of repeat values (as might be seen in a “gender” column).
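To make the idea concrete, a column's cardinality can be profiled with a short sketch (the data, the 0.5 cutoff and all names below are invented for illustration, not taken from any particular tool):

```python
def cardinality_profile(rows, column):
    """Return (unique_count, ratio of unique values) for one column of a row-oriented table."""
    values = [row[column] for row in rows]
    unique = len(set(values))
    return unique, unique / len(values)

def cardinality_label(ratio, threshold=0.5):
    """Label a column high- or low-cardinality (the 0.5 threshold is an arbitrary assumption)."""
    return "high" if ratio > threshold else "low"

# Hypothetical customer rows: account numbers are all unique, gender repeats.
accounts = [
    {"account_no": "A1001", "gender": "F"},
    {"account_no": "A1002", "gender": "M"},
    {"account_no": "A1003", "gender": "F"},
    {"account_no": "A1004", "gender": "F"},
]

unique_acc, ratio_acc = cardinality_profile(accounts, "account_no")  # 4, 1.0
unique_gen, ratio_gen = cardinality_profile(accounts, "gender")      # 2, 0.5
```

A column such as account_no, where every value is unique, is flagged as high-cardinality and is usually a poor candidate for a bar or pie chart, while a low-cardinality column such as gender plots well.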

Discover the insights hidden in your data, with rich, interactive visuals. Oracle Data Visualization is
easy to use, yet powerful enough to perform advanced calculations.

• Automatically visualize data as you drag and drop attributes, charts, and graphs
• Change layouts to present new insights
• Answer questions quickly with online search and guided navigation
• Empower everyone in your organization to uncover the value in the data

Craft Visual Data Stories

Visual data storytelling makes complex ideas engaging, meaningful, and easy to understand. With
just a click, new users can join in, add or remove content, and share new insights.

• Capture insights as visual stories


• Save story points (snapshots of the analytical moment-in-time)
• Add comments to highlight key discoveries
• Transform discussions with secure, dynamic collaboration


• Improve decision-making and promote faster action

Charts and graphs

Well-presented charts and graphs can greatly enhance the readability of a research report. Charts and
graphs simplify data and display key findings in pictorial form. They are visual representations of
data that can present complex information quickly and clearly, and assist the reader in seeing
patterns and trends in the data.

Generally speaking, a graph is an effective way of communicating data when:

• Precise numeric details are not required

• A trend or comparison can be demonstrated

• There are relationships between data values

1) Column Chart (Vertical Bar)

A column chart (column graph) is a chart with vertically-arranged columns - the height of which
represents the value. It is best for comparing means or percentages between 2 to 7 different groups.
As you can see from the example below, each column is separated by blank space. For this reason,
the x-axis should be based on a scale that has mutually exclusive categories. Categories that are based
on a continuous scale are better suited for a histogram.

2) Horizontal Bar Chart

A bar chart (bar graph) is a chart with horizontally-arranged bars (rectangular or cylinder) - the
lengths of which are proportional to the values that they represent. This kind of chart is used when
comparing the mean or percentages of 8 or more different groups. Similar to the column chart, the
horizontal bar chart should only be used when comparing categories that are mutually exclusive.


3) Pie Charts

A pie chart (circle graph) is a circular chart that is divided into slices to illustrate proportion. Pie
charts are perfect for illustrating a sample breakdown in a single dimension. In other words, it is best
to use pie charts when you want to show differences between parts based on one variable. It is
important to remember that pie charts should only be used with a group of categories that are the
parts of a whole.

4) Line Charts

A line chart is made up of a series of data points that are connected by a line. Line charts provide a
clear demonstration of trends over time. This is done most often to measure the long-term progression
of sales, or any other empirical statistic valued by businesses or organizations. A line chart can also
be used to compare two different variables over a period of time.

5) Scatter Plot

A scatter plot, scatter chart, or scatter graph is a type of mathematical diagram using Cartesian
coordinates to display values for two variables for a set of data. Scatter plots are used to depict how
different objects settle around a mean based on 2 to 3 different dimensions. This allows for quick and
easy comparisons between competing variables. Through such visuals, viewers can quickly see
the difference between two objects or their relation to the average.


Types of data and choice of graph

There are many different types of graph, each suitable for different types of data. The type of graph
selected for use depends on the type of data being represented.

Categorical data are data which fall into one of two or more discrete categories, but with no intrinsic
ordering to the categories. For example, sex is a categorical variable with two categories (male,
female). Hair colour is another categorical variable, but with a number of categories (blonde, brown,
auburn, black etc). For purely categorical data, the variables do not have a clear order. Graphs suitable
for categorical data include bar graphs (both horizontal and vertical), clustered bar graphs and stacked
column charts.

Ordinal data are similar to categorical data, except there is a clear ordering to the variables. While
ordinal data has a definite ordering, the degree of difference between categories is not always
consistent or measurable. For example, highest level of education might be (very simplistically)
classified into primary school, secondary school, some tertiary study, completed tertiary study. There
is likely to be a bigger difference between respondents having educational experience at the primary
school level versus the secondary school level, than there is between those undertaking some tertiary
study, and those who have completed tertiary study. Bar graphs and histograms are suitable for
presenting ordinal data.

Continuous data are data measured on a numeric scale. For example, people's scores between 1 and
100 on an examination would constitute continuous data. Other examples of continuous data include
height, time, mass, distance and dollar values. Graphs suitable for continuous data include line graphs
and scatter plots.

Interval data are data that have both an order, and an equal spacing between categories. For example,
if a survey asks respondents to nominate how much money they have saved in the bank for an
emergency, and the available response categories are ‘$0-$4,999’, ‘$5,000 to $9,999’, ‘$10,000 to
$14,999’ and so on, then the data are interval data. Sometimes, continuous data are converted to
interval data for reporting purposes. Graphs suitable for interval data include bar graphs, histograms
and box-whisker plots
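The pairings above can be captured in a small lookup table, sketched here in Python (the function name and structure are purely illustrative, not a standard library):

```python
# Chart types the text recommends for each data type.
SUITABLE_CHARTS = {
    "categorical": ["bar graph", "clustered bar graph", "stacked column chart"],
    "ordinal": ["bar graph", "histogram"],
    "continuous": ["line graph", "scatter plot"],
    "interval": ["bar graph", "histogram", "box-whisker plot"],
}

def suggest_charts(data_type):
    """Return the chart types suitable for the given type of data."""
    try:
        return SUITABLE_CHARTS[data_type.lower()]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type}")

suggest_charts("ordinal")  # ['bar graph', 'histogram']
```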


Module - III

Data Mining
Data mining has attracted a great deal of attention in the information industry and in society as a
whole in recent years, owing to the wide availability of huge amounts of data and the imminent need
for turning such data into useful information and knowledge. Data mining can be viewed as a result
of the natural evolution of information technology. Since the 1960s, database and information
technology has been evolving systematically from primitive file processing systems to sophisticated
and powerful database systems.

What Is Data Mining?

Data mining is the practice of automatically searching large stores of data to discover patterns and
trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to
segment the data and evaluate the probability of future events. Data mining is also called Knowledge
Discovery in Databases (KDD).

Data mining is the extraction of useful patterns from data sources, e.g. databases, texts, the web,
images, etc. It is a process that helps users discover patterns in huge data sets; the process must be
automatic or semi-automatic. The patterns discovered must be meaningful and help in better decision
making. It aids management in discovering hidden, valid, and potentially useful patterns in huge data
sets. Data mining is all about discovering unsuspected, previously unknown relationships amongst
the data. Data mining is an emerging multi-disciplinary field drawing on statistics, machine learning,
database technology, information science, visualization, etc. Data mining can be used for marketing,
fraud detection, scientific discovery, and so on.

The key properties of data mining are:

• Automatic discovery of patterns


• Prediction of likely outcomes
• Creation of actionable information
• Focus on large data sets and databases


Data mining can answer questions that cannot be addressed through simple query and reporting
techniques.

The Evolution of Data Mining

Data mining is a natural development of the increased use of computerized databases to store data
and provide answers to business analysts.

Evolutionary steps, the business questions they answered, and their enabling technologies:

• Data Collection (1960s). Business question: "What was my total revenue in the last five years?"
Enabling technology: computers, tapes, disks.

• Data Access (1980s). Business question: "What were unit sales in New England last March?"
Enabling technology: faster and cheaper computers with more storage; relational databases.

• Data Warehousing and Decision Support. Business question: "What were unit sales in New
England last March? Drill down to Boston." Enabling technology: faster and cheaper computers
with more storage; on-line analytical processing (OLAP); multidimensional databases; data
warehouses.

• Data Mining. Business question: "What's likely to happen to Boston unit sales next month?
Why?" Enabling technology: faster and cheaper computers with more storage; advanced computer
algorithms.

Data mining as a process of knowledge discovery


The Knowledge Discovery in Databases process comprises a few steps leading from raw data
collections to some form of new knowledge. The iterative process consists of the following steps:

• Data cleaning: also known as data cleansing, it is a phase in which noisy data and irrelevant data
are removed from the collection.

• Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in a
common source.

• Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the
data collection.

• Data transformation: also known as data consolidation, it is a phase in which the selected data is
transformed into forms appropriate for the mining procedure.

• Data mining: it is the crucial step in which clever techniques are applied to extract potentially
useful patterns.

• Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified
based on given measures.

• Knowledge representation: is the final phase in which the discovered knowledge is visually
represented to the user. This essential step uses visualization techniques to help users understand and
interpret the data mining results.
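The first four steps above can be sketched on toy records as follows (all field names and values here are invented purely to illustrate the flow):

```python
# Hypothetical raw records from two heterogeneous sources.
crm_rows = [{"id": 1, "spend": "120"}, {"id": 2, "spend": None}, {"id": 3, "spend": "80"}]
web_rows = [{"id": 1, "visits": 5}, {"id": 3, "visits": 2}]

# Data cleaning: remove rows with missing (noisy) values.
clean = [r for r in crm_rows if r["spend"] is not None]

# Data integration: combine both sources on the common key "id".
visits_by_id = {r["id"]: r["visits"] for r in web_rows}
integrated = [{**r, "visits": visits_by_id.get(r["id"], 0)} for r in clean]

# Data selection: keep only the attributes relevant to the analysis.
selected = [{"id": r["id"], "spend": r["spend"], "visits": r["visits"]} for r in integrated]

# Data transformation: consolidate values into forms the mining step can use.
transformed = [{**r, "spend": float(r["spend"])} for r in selected]
```

The mining, pattern-evaluation and knowledge-representation steps would then operate on `transformed`.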


Architecture of a data mining system

[Figure: Architecture of a typical data mining system]

Data mining tasks can be classified into two categories:

Descriptive and predictive:

Descriptive mining tasks characterize the general properties of the data in the database.


Predictive mining tasks perform inference on the current data in order to make predictions

Data mining can be performed on the following types of data:

• Relational databases
• Data warehouses
• Advanced DB and information repositories
• Object-oriented and object-relational databases
• Transactional and Spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming database
• Text databases
• Text mining and Web mining


Data Mining Techniques

1. Classification:

This analysis is used to retrieve important and relevant information about data and metadata. This
data mining method helps to classify data into different classes.

2. Clustering:

Clustering analysis is a data mining technique used to identify data items that are similar to each
other. This process helps in understanding the differences and similarities between the data.

3. Regression:

Regression analysis is the data mining method of identifying and analyzing the relationship between
variables. It is used to identify the likelihood of a specific variable, given the presence of other
variables.

4. Association Rules:

This data mining technique helps to find associations between two or more items. It discovers hidden
patterns in the data set.

5. Outlier detection:

This type of data mining technique refers to the observation of data items in the dataset which do not
match an expected pattern or expected behaviour. This technique can be used in a variety of domains,
such as intrusion detection, fraud or fault detection, etc. Outlier detection is also called outlier
analysis or outlier mining.
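A minimal sketch of outlier analysis using a simple z-score rule (the transaction figures and the 2.0 threshold are invented; real fraud-detection systems use far more sophisticated models):

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Flag values whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# A run of ordinary daily transaction counts with one suspicious spike.
daily_transactions = [102, 98, 101, 99, 100, 97, 103, 500]
zscore_outliers(daily_transactions, threshold=2.0)  # [500]
```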

6. Sequential Patterns:

This data mining technique helps to discover or identify similar patterns or trends in transaction data
for a certain period.

7. Prediction:

Prediction uses a combination of the other data mining techniques, such as trend analysis, sequential
patterns, clustering, classification, etc. It analyzes past events or instances in the right sequence to
predict a future event.

Several major data mining techniques have been developed and used in data mining projects recently,
including association, classification, clustering, prediction, sequential patterns and decision trees.
We will briefly examine these data mining techniques in the following sections.

Association

Association is one of the best-known data mining techniques. In association, a pattern is discovered
based on a relationship between items in the same transaction. That is the reason why the association
technique is also known as the relation technique. The association technique is used in market basket
analysis to identify a set of products that customers frequently purchase together.

Retailers are using the association technique to research customers' buying habits. Based on historical
sales data, retailers might find out that customers often buy crisps when they buy beer and,
therefore, they can put beer and crisps next to each other to save time for the customer and increase
sales.
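The beer-and-crisps idea can be sketched as a simple pair-frequency count (a toy stand-in for full association-rule mining such as Apriori; the baskets and the 0.5 support threshold below are invented):

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count how often each item pair is bought together; keep pairs meeting min_support."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets.
baskets = [
    {"beer", "crisps", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"beer", "crisps", "bread"},
]
frequent_pairs(baskets, min_support=0.5)  # {('beer', 'crisps'): 0.75}
```

Beer and crisps appear together in 3 of 4 baskets (support 0.75), which is the signal a retailer would act on.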

Classification

Classification is a classic data mining technique based on machine learning. Basically, classification
is used to classify each item in a set of data into one of a predefined set of classes or groups.
Classification method makes use of mathematical techniques such as decision trees, linear
programming, neural network, and statistics. In classification, we develop the software that can learn
how to classify the data items into groups. For example, we can apply classification in the application
that “given all records of employees who left the company, predict who will probably leave the
company in a future period.” In this case, we divide the records of employees into two groups named
“leave” and “stay”. We can then ask our data mining software to classify the employees into the
two groups.
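As a toy sketch of the leave/stay example, the snippet below classifies a new employee by the label of the most similar past record, a 1-nearest-neighbour rule (real systems would use decision trees, neural networks or similar; the features and records are invented):

```python
def classify_1nn(train, employee):
    """Assign the label of the most similar past record (1-nearest-neighbour)."""
    def dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda rec: dist(rec[0], employee))
    return label

# Hypothetical (tenure_years, salary_k) records labelled "leave" or "stay".
history = [
    ((1, 40), "leave"),
    ((2, 45), "leave"),
    ((8, 90), "stay"),
    ((10, 95), "stay"),
]
classify_1nn(history, (1.5, 42))  # "leave"
```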

Clustering

Clustering is a data mining technique that automatically groups objects with similar characteristics
into meaningful or useful clusters. The clustering technique defines the classes and puts objects into
each class, while in the classification technique, objects are assigned to predefined classes. To make
the concept clearer, we can take book management in a library as an example. In a library, there is a
wide range of books on various topics available. The challenge is how to keep those books in a way
that readers can take several books on a particular topic without hassle. By using the clustering
technique, we can keep books that have some kind of similarity in one cluster, or on one shelf, and
label it with a meaningful name. If readers want books on that topic, they only have to go to that
shelf instead of searching the entire library.
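The idea of letting the technique define the groups can be sketched with a tiny one-dimensional k-means (illustrative only; the spend figures and starting centroids are invented):

```python
def kmeans_1d(values, centroids, iterations=10):
    """A tiny 1-D k-means: repeatedly assign points to the nearest centroid,
    then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical yearly customer spend; two natural groups emerge on their own.
spend = [10, 12, 11, 90, 95, 92]
centroids, clusters = kmeans_1d(spend, centroids=[0, 100])
```

No class labels were supplied: the two groups (low spenders around 11, high spenders around 92) are discovered by the algorithm itself, which is exactly what distinguishes clustering from classification.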

Prediction

Prediction, as its name implies, is a data mining technique that discovers the relationship between
dependent and independent variables. For instance, the prediction technique can be used in sales to
predict future profit if we consider sales as the independent variable and profit as the dependent
variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is
used for profit prediction.
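The fitted regression curve can be sketched with a hand-rolled least-squares line (the sales and profit figures are invented and deliberately lie on a straight line):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historical sales (independent) and profit (dependent) figures.
sales  = [100, 200, 300, 400]
profit = [12,  22,  32,  42]
a, b = fit_line(sales, profit)           # a = 2.0, b = 0.1
predicted_profit = a + b * 500           # forecast profit at sales of 500 -> 52.0
```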


Sequential Patterns

Sequential patterns analysis is a data mining technique that seeks to discover or identify similar
patterns, regular events or trends in transaction data over a business period.

In sales, with historical transaction data, businesses can identify sets of items that customers buy
together at different times of the year. Businesses can then use this information to recommend those
items with better deals, based on customers' purchasing frequency in the past.
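A toy sketch of the idea: count how often "item A is bought, then item B later" occurs across customer histories (the purchase histories and threshold are invented; production systems use dedicated sequence-mining algorithms such as GSP or PrefixSpan):

```python
from collections import Counter

def frequent_sequences(histories, min_count):
    """Count ordered 'A then B' purchase patterns across customer histories."""
    counts = Counter()
    for history in histories:
        seen = set()  # count each ordered pair at most once per customer
        for i, earlier in enumerate(history):
            for later in history[i + 1:]:
                if earlier != later:
                    seen.add((earlier, later))
        counts.update(seen)
    return {seq: c for seq, c in counts.items() if c >= min_count}

# Hypothetical purchase order per customer over a year.
histories = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["phone", "case"],
]
frequent_sequences(histories, min_count=2)  # {('laptop', 'mouse'): 2}
```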

Decision trees

A decision tree is one of the most commonly used data mining techniques because its model is easy
for users to understand. In the decision tree technique, the root of the decision tree is a simple question
or condition that has multiple answers. Each answer then leads to a further set of questions or
conditions that narrow down the data so that we can make the final decision based on it. For example,
we use the following decision tree to determine whether or not to play tennis:

Starting at the root node, if the outlook is overcast then we should definitely play tennis. If it is rainy,
we should only play tennis if the wind is weak. And if it is sunny, then we should play tennis only if
the humidity is normal.
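The tree described above translates directly into nested conditions:

```python
def play_tennis(outlook, humidity, wind):
    """Encode the play-tennis decision tree as nested conditions."""
    if outlook == "overcast":
        return True                      # overcast: always play
    if outlook == "rainy":
        return wind == "weak"            # rainy: play only if the wind is weak
    if outlook == "sunny":
        return humidity == "normal"      # sunny: play only if humidity is normal
    raise ValueError(f"unknown outlook: {outlook}")

play_tennis("rainy", "high", "weak")     # True
play_tennis("sunny", "high", "weak")     # False
```

In practice the tree itself would be learned from historical data rather than written by hand; this sketch only shows how a learned tree is read.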

We often combine two or more of those data mining techniques together to form an appropriate
process that meets the business needs.


Data mining Examples:

Example 1:

Consider the marketing head of a telecom service provider who wants to increase revenues from long
distance services. For a high ROI on his sales and marketing efforts, customer profiling is important.
He has a vast data pool of customer information like age, gender, income, credit history, etc. But it is
impossible to determine the characteristics of people who prefer long distance calls with manual
analysis. Using data mining techniques, he may uncover patterns between high long distance call
users and their characteristics.

For example, he might learn that his best customers are married females between the ages of 45 and
54 who make more than $80,000 per year. Marketing efforts can be targeted to such a demographic.

Example 2:

A bank wants to search for new ways to increase revenues from its credit card operations. It wants to
check whether usage would double if fees were halved.

The bank has multiple years of records on average credit card balances, payment amounts, credit limit
usage, and other key parameters. It creates a model to check the impact of the proposed new
business policy. The results show that cutting fees in half for a targeted customer base could
increase revenues by $10 million.

Data Mining Tools

Following are two popular data mining tools widely used in industry:

R-language:

R is an open-source language and environment for statistical computing and graphics. R offers a
wide variety of statistical techniques (classical statistical tests, time-series analysis, classification)
and graphical techniques. It also offers an effective data handling and storage facility.


Oracle Data Mining:

Oracle Data Mining, popularly known as ODM, is a module of the Oracle Advanced Analytics
Database. This data mining tool allows data analysts to generate detailed insights and make
predictions. It helps predict customer behaviour, develop customer profiles, and identify cross-selling
opportunities.

Benefits of Data Mining:

• Data mining techniques help companies to get knowledge-based information.
• Data mining helps organizations to make profitable adjustments in operation and production.
• Data mining is a cost-effective and efficient solution compared to other statistical data
applications.
• Data mining helps with the decision-making process.
• It facilitates automated prediction of trends and behaviours as well as automated discovery of
hidden patterns.
• It can be implemented in new systems as well as existing platforms.
• It is a speedy process which makes it easy for users to analyze huge amounts of data in less
time.

Disadvantages of Data Mining

• There is a chance that companies may sell useful information about their customers to other
companies for money. For example, American Express has sold credit card purchase records of
its customers to other companies.
• Much data mining analytics software is difficult to operate and requires advanced training to
work on.
• Different data mining tools work in different manners due to the different algorithms employed
in their design. Therefore, the selection of the correct data mining tool is a very difficult task.
• Data mining techniques are not always accurate, and so can cause serious consequences in
certain conditions.


Data Mining Applications

Applications and usage

Communications: Data mining techniques are used in the communication sector to predict customer
behaviour and to offer highly targeted and relevant campaigns.

Insurance: Data mining helps insurance companies to price their products profitably and to promote
new offers to their new or existing customers.

Education: Data mining benefits educators by enabling them to access student data, predict
achievement levels and find students or groups of students who need extra attention, for example,
students who are weak in mathematics.

Manufacturing: With the help of data mining, manufacturers can predict wear and tear of
production assets. They can anticipate maintenance, which helps them minimize downtime.

Banking: Data mining helps the finance sector to get a view of market risks and manage regulatory
compliance. It helps banks to identify probable defaulters and decide whether to issue credit cards,
loans, etc.

Retail: Data mining techniques help retail malls and grocery stores identify and arrange the best-
selling items in the most prominent positions. They help store owners come up with offers that
encourage customers to increase their spending.

Service Providers: Service providers such as mobile phone and utility companies use data mining to
predict the reasons why a customer leaves the company. They analyze billing details, customer
service interactions and complaints made to the company to assign each customer a probability score
and offer incentives.


E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells through their
websites. One of the most famous names is Amazon, which uses data mining techniques to get more
customers into its e-commerce store.

Supermarkets: Data mining allows supermarkets to develop rules to predict whether their shoppers
are likely to be expecting a baby. By evaluating buying patterns, they can find customers who are
most likely pregnant, and can then start targeting products like baby powder, baby soap, diapers and
so on.

Crime Investigation: Data mining helps crime investigation agencies to deploy the police workforce
(where is a crime most likely to happen, and when?) and to decide whom to search at a border
crossing, etc.

Bioinformatics: Data mining helps to mine biological knowledge from the massive datasets gathered
in biology and medicine.

Summary:

• Data mining is all about explaining the past and predicting the future through analysis.
• Data mining helps to extract information from huge sets of data. It is the procedure of mining
knowledge from data.
• The data mining process includes business understanding, data understanding, data preparation,
modelling, evaluation and deployment.
• Important data mining techniques are classification, clustering, regression, association rules,
outlier detection, sequential patterns and prediction.
• R-language and Oracle Data Mining are prominent data mining tools.
• Data mining techniques help companies to get knowledge-based information.
• The main drawback of data mining is that much analytics software is difficult to operate and
requires advanced training to work on.
• Data mining is used in diverse industries such as communications, insurance, education,
manufacturing, banking, retail, service providers, e-commerce, supermarkets and
bioinformatics.

Data Mining Applications in Sales/Marketing

Data mining enables businesses to understand the hidden patterns inside historical purchasing
transaction data, thus helping in planning and launching new marketing campaigns in a prompt and
cost-effective way. The following illustrates several data mining applications in sales and marketing.

• Data mining is used for market basket analysis to provide information on what product
combinations were purchased together, when they were bought and in what sequence. This
information helps businesses promote their most profitable products and maximize profit. In
addition, it encourages customers to purchase related products that they may otherwise have
missed or overlooked.
• Retail companies use data mining to identify customers' buying behaviour patterns.

Data Mining Applications in Banking / Finance

• Several data mining techniques, e.g. distributed data mining, have been researched, modelled
and developed to help with credit card fraud detection.
• Data mining is used to identify customer loyalty by analyzing the data of customers' purchasing
activities, such as the frequency of purchases in a period of time, the total monetary value of
all purchases, and when the last purchase was made. After analyzing those dimensions, a
relative loyalty measure is generated for each customer. The higher the score, the more loyal
the customer is.
• To help banks retain credit card customers, data mining is applied. By analyzing past data, data
mining can help banks predict customers who are likely to change their credit card affiliation,
so that they can plan and launch special offers to retain those customers.
• Credit card spending by customer groups can be identified by using data mining.
• The hidden correlations between different financial indicators can be discovered by using data
mining.
• From historical market data, data mining makes it possible to identify stock trading rules.
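The loyalty idea above can be sketched as a simple recency/frequency/monetary (RFM) score (the weights, dates and amounts are invented; real scoring models are calibrated on actual customer behaviour):

```python
from datetime import date

def loyalty_score(purchases, today, weights=(1, 1, 1)):
    """Score a customer on recency, frequency and monetary value (RFM).
    `purchases` is a list of (purchase_date, amount); weights are illustrative."""
    recency = (today - max(d for d, _ in purchases)).days
    frequency = len(purchases)
    monetary = sum(amount for _, amount in purchases)
    w_r, w_f, w_m = weights
    # Lower recency (more recent purchase) is better, so it enters negatively.
    return -w_r * recency + w_f * frequency + w_m * monetary

today = date(2024, 1, 31)
loyal   = [(date(2024, 1, 25), 120.0), (date(2024, 1, 10), 80.0)]
dormant = [(date(2023, 6, 1), 50.0)]
loyalty_score(loyal, today) > loyalty_score(dormant, today)  # True
```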

Data Mining Applications in Health Care and Insurance

The growth of the insurance industry entirely depends on the ability to convert data into knowledge,
information or intelligence about customers, competitors, and its markets. Data mining has been
applied in the insurance industry only lately, but it has brought tremendous competitive advantages
to the companies that have implemented it successfully. The data mining applications in the insurance
industry are listed below:

• Data mining is applied in claims analysis such as identifying which medical procedures are
claimed together.
• Data mining makes it possible to forecast which customers will potentially purchase new policies.
• Data mining allows insurance companies to detect risky customers’ behavior patterns.
• Data mining helps detect fraudulent behaviour.

Data Mining Applications in Transportation

• Data mining helps determine the distribution schedules among warehouses and outlets and
analyze loading patterns.

Data Mining Applications in Medicine

• Data mining makes it possible to characterize patient activities and anticipate incoming office visits.
• Data mining helps identify the patterns of successful medical therapies for different illnesses.

Data mining applications are continuously developing in various industries to provide more hidden
knowledge that increases business efficiency and grows businesses.


Data Mining Vs Data Warehouse: Key Differences

Data Mining: Data mining is the process of analysing data for unknown patterns.
Data Warehouse: A data warehouse is a database system which is designed for analytical rather than
transactional work.

Data Mining: Data mining is a method of comparing large amounts of data to find the right patterns.
Data Warehouse: Data warehousing is a method of centralizing data from different sources into one
common repository.

Data Mining: Data mining is usually done by business users with the assistance of engineers.
Data Warehouse: Data warehousing is a process which needs to occur before any data mining can
take place.

Data Mining: Data mining is considered a process of extracting data from large data sets.
Data Warehouse: Data warehousing is the process of pooling all relevant data together.

Data Mining: One of the most important benefits of data mining techniques is the detection and
identification of errors in the system.
Data Warehouse: One of the pros of a data warehouse is its ability to update consistently, which
makes it ideal for business owners who want the best and latest features.

Data Mining: Data mining helps to create suggestive patterns of important factors, like the buying
habits of customers, products and sales, so that companies can make the necessary adjustments in
operation and production.
Data Warehouse: A data warehouse adds extra value to operational business systems like CRM
systems when the warehouse is integrated.

Data Mining: Data mining techniques are never 100% accurate and may cause serious consequences
in certain conditions.
Data Warehouse: In a data warehouse, there is a great chance that the data required for analysis by
the organization may not be integrated into the warehouse; this can easily lead to loss of information.

Data Mining: The information gathered through data mining by organizations can be misused against
a group of people.
Data Warehouse: Data warehouses are created for huge IT projects; they therefore involve high-
maintenance systems which can impact the revenue of medium to small-scale organizations.

Data Mining: After successful initial queries, users may ask more complicated queries, which would
increase the workload.
Data Warehouse: A data warehouse is complicated to implement and maintain.

Data Mining: Organizations can benefit from this analytical tool by equipping themselves with
pertinent and usable knowledge-based information.
Data Warehouse: A data warehouse stores a large amount of historical data, which helps users to
analyze different time periods and trends and make future predictions.

Data Mining: Organizations need to spend a lot of resources on training and implementation.
Moreover, data mining tools work in different manners due to the different algorithms employed in
their design.
Data Warehouse: In a data warehouse, data is pooled from multiple sources; the data needs to be
cleaned and transformed, which can be a challenge.

Data Mining: Data mining methods are cost-effective and efficient compared to other statistical data
applications.
Data Warehouse: A data warehouse's responsibility is to simplify every type of business data; most
of the work on the user's part is inputting the raw data.

Data Mining: Another critical benefit of data mining techniques is the identification of errors which
can lead to losses; the generated data can be used to detect a drop in sales.
Data Warehouse: A data warehouse allows users to access critical data from a number of sources in
a single place, saving the user's time in retrieving data from multiple sources.

Data Mining: Data mining helps to generate actionable strategies built on data insights.
Data Warehouse: Once you input information into a data warehouse system, you are unlikely to lose
track of this data again; a quick search helps you find the right statistical information.

117
`

Why use Data mining?

Some most important reasons for using Data mining are:

• Establish relevance and relationships amongst data, and use this information to generate
profitable insights.
• Businesses can make informed decisions quickly.
• Helps to find unusual shopping patterns in grocery stores.
• Optimize website business by providing customized offers to each visitor.
• Helps to measure customers' response rates in business marketing.
• Create and maintain new customer groups for marketing purposes.
• Predict customer defections, i.e. which customers are more likely to switch to another
supplier in the near future.
• Differentiate between profitable and unprofitable customers.
• Identify all kinds of suspicious behaviour, as part of a fraud detection process.
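Several of these uses, such as spotting shopping patterns in grocery stores, rest on association analysis: counting how often items occur together and turning the counts into support and confidence measures. A minimal sketch on invented basket data (the products and the `support` helper are illustrative, not taken from any particular mining tool):

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale transactions (each set is one basket).
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    hits = sum(1 for b in baskets if itemset <= b)
    return hits / len(baskets)

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Confidence of the rule bread -> milk: P(milk | basket contains bread).
conf_bread_milk = (support({"bread", "milk"}, transactions)
                   / support({"bread"}, transactions))
```

On this toy data every basket with bread also contains milk, so the rule's confidence is 1.0; real miners such as Apriori prune the pair space using minimum-support thresholds before computing confidence.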

118
`

119
`

What is ETL?

ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts data
from different RDBMS source systems, transforms it by applying calculations, concatenations and
so on, and then loads it into the data warehouse system.

In ETL, data flows from the source to the target, and the transformation engine takes care of any
data changes.
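The extract-transform-load sequence can be sketched in a few lines. This is a toy illustration, not a production pipeline: the CSV source, the 10% uplift, and the `sales_fact` table are all invented, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Extract: in a real pipeline this would be an RDBMS query or file export.
raw_csv = io.StringIO("first,last,amount\nasha,rao,100\nben,cole,250\n")
rows = list(csv.DictReader(raw_csv))

# Transform: concatenate the name fields and apply a calculation (a 10%
# uplift), mirroring the "calculations, concatenations" mentioned above.
transformed = [
    (f"{r['first'].title()} {r['last'].title()}",
     round(float(r["amount"]) * 1.10, 2))
    for r in rows
]

# Load: write the transformed rows into the warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer TEXT, gross_amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", transformed)
loaded = conn.execute(
    "SELECT COUNT(*), SUM(gross_amount) FROM sales_fact").fetchone()
```

Note that the transformation happens in the pipeline before the load; under ELT, the raw rows would be inserted first and the uplift computed by SQL inside the target database.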

What is ELT?

ELT is a different way of approaching data movement. Instead of transforming the data before it is
written, ELT lets the target system do the transformation: the data is first copied to the target and
then transformed in place.

ELT is usually used with NoSQL-style platforms such as a Hadoop cluster, a data appliance or a cloud
installation.


Difference between ETL vs. ELT

ETL and ELT differ on the following parameters:

Process
• ETL: Data is transformed at the staging server and then transferred to the data warehouse database.
• ELT: Data remains in the database of the data warehouse.

Code Usage
• ETL: Used for compute-intensive transformations and small amounts of data.
• ELT: Used for high amounts of data.

Transformation
• ETL: Transformations are done in the ETL server / staging area.
• ELT: Transformations are performed in the target system.

Time (Load)
• ETL: Data is first loaded into staging and later loaded into the target system; time intensive.
• ELT: Data is loaded into the target system only once; faster.

Time (Transformation)
• ETL: The process must wait for the transformation to complete; as data size grows, transformation time increases.
• ELT: Speed does not depend on the size of the data.

Time (Maintenance)
• ETL: Needs high maintenance, as you must select the data to load and transform.
• ELT: Low maintenance, as the data is always available.

Implementation Complexity
• ETL: Easier to implement at an early stage.
• ELT: To implement the ELT process, an organisation should have deep knowledge of the tools and expert skills.

Support for Data Warehouse
• ETL: The ETL model is used for on-premises, relational, structured data.
• ELT: Used in scalable cloud infrastructure, which supports structured and unstructured data sources.

Data Lake Support
• ETL: Does not support data lakes.
• ELT: Allows use of a data lake with unstructured data.

Complexity
• ETL: Loads only the important data, as identified at design time.
• ELT: Involves developing from the output backwards and loading only relevant data.

Cost
• ETL: High costs for small and medium businesses.
• ELT: Low entry costs using online Software-as-a-Service platforms.

Lookups
• ETL: Both facts and dimensions need to be available in the staging area.
• ELT: All data is available, because extract and load occur in one single action.

Aggregations
• ETL: Complexity increases with the amount of data in the dataset.
• ELT: The power of the target platform can process significant amounts of data quickly.

Calculations
• ETL: Overwrites an existing column, or the dataset must be appended and pushed to the target platform.
• ELT: Easily adds the calculated column to the existing table.

Maturity
• ETL: In use for over two decades; well documented, with best practices easily available.
• ELT: A relatively new concept, and complex to implement.

Hardware
• ETL: Most tools have unique hardware requirements that are expensive.
• ELT: Being SaaS-based, hardware cost is not an issue.

Support for Unstructured Data
• ETL: Mostly supports relational data.
• ELT: Support for unstructured data is readily available.

Summary:

• ETL stands for Extract, Transform and Load, while ELT stands for Extract, Load, Transform.
• In ETL, data flows from the source to staging to the target.
• ELT lets the target system do the transformation; no staging system is involved.
• ELT addresses many of the challenges of ETL, but it is expensive and requires niche skills to
implement and maintain.


Text & Web Mining

Web Mining:

Web mining is the process of applying data mining techniques to extract knowledge from web data,
which is categorized as web content, web structure and web usage data. It is the discovery of useful,
previously unknown information from web data.

Web mining can be classified based on the following categories:

1. Web Content
2. Web Structure
3. Web Usage

Web Content Mining:

Web content mining is defined as the process of converting raw data into useful information using
the content of the web pages of a specified website.

The process starts with the extraction of structured data or information from web pages, followed by
the integration of similar data. Web content includes text, audio, video and other media; when the
content mined is text, the process is called text mining.

Text mining uses natural language processing and information retrieval techniques for the mining
process.

Web Structure Mining:

A web graph has a typical structure in which web pages act as nodes and hyperlinks are treated as
edges connecting them. Web structure mining is the process of discovering structural information
from the web.


This category of mining can be performed either at document level or hyperlink level. The research
activity which involves hyperlink level is called hyperlink analysis.

Terminologies associated with Web structure:

1. Web graph: A directed graph which represents the web.
2. Node: Each web page is a node of the web graph.
3. Link: A hyperlink is a directed edge of the web graph.
4. In-degree: The number of distinct links that point to a specified node.
5. Out-degree: The number of distinct links originating at a node that point to other nodes.
6. Directed path: A sequence of links, starting from a specified node, that can be followed to reach
another node.
7. Shortest path: Of all the directed paths between two nodes p and q, the one of minimum length.
8. Diameter: The maximum of the shortest-path lengths between p and q, taken over all pairs of
nodes p and q in the web graph.
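These terms can be made concrete on a toy web graph. The four-page graph and helper functions below are illustrative; shortest paths are found with a breadth-first search over the directed links:

```python
from collections import deque

# A tiny hypothetical web graph: page -> set of pages it links to.
web_graph = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "D": {"C"},
}

def out_degree(node):
    """Number of distinct links originating at `node`."""
    return len(web_graph.get(node, set()))

def in_degree(node):
    """Number of distinct links pointing to `node`."""
    return sum(1 for links in web_graph.values() if node in links)

def shortest_path_length(src, dst):
    """BFS over directed links; returns the number of links on the
    shortest path from src to dst, or None if dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in web_graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

Here page C has in-degree 3 (linked from A, B and D), the shortest path from D to A has length 2 (D to C to A), and D is unreachable from any other page, so some shortest paths are undefined.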

Web Usage Mining:

The web is a collection of interrelated files on one or more web servers. Web usage mining is the
discovery of meaningful patterns in the data generated by client-server transactions.

The typical sources of data are:

1. Data generated automatically and stored in server access logs, referrer logs, agent logs and
client-side cookies.
2. Information from user profiles.
3. Metadata, which includes page attributes and content attributes.

Web server log:

Logs created by the web server record all of its activity. Every page request forwarded to the web
server carries basic information, including the requested URL.
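As an illustration, a log line in the widely used Common Log Format can be split into its fields with a regular expression. The line and pattern below are invented for illustration; real servers vary in the fields they record.

```python
import re

# One invented line in the Common Log Format:
# host ident user [timestamp] "method url protocol" status size
log_line = ('203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] '
            '"GET /products/index.html HTTP/1.1" 200 2326')

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

match = LOG_PATTERN.match(log_line)
record = match.groupdict()  # field name -> extracted string
```

Once each line is reduced to a record like this, usage-mining questions (most visited URLs, error rates, sessions per host) become simple aggregations over the records.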

Text Mining:


The objective of text mining is to exploit the information contained in textual documents, in the form
of patterns and trends such as associations among entities and predictive rules.
The results are used for:
1. The analysis of a collection
2. Providing information for intelligent navigation and browsing methods

Data mining and Text Mining:

1. Both processes seek novel and useful patterns.
2. Both data mining and text mining are semi-automated processes.
3. The basic difference is the nature of the data: structured data comes from databases, while
unstructured data includes Word documents, PDF and XML files.
4. Text mining first imposes a structure on the specified data and then mines it.

Technology premise of Text Mining:

1. Summarization: The process of creating a summary of a document that condenses a large amount
of information while preserving the theme or main idea of the document.
2. Information Extraction: The process of identifying relations within text; it relies on pattern
matching.
3. Categorization: A supervised learning technique which assigns a document to a class according to
its content. Document categorization is widely used in libraries.
4. Visualization: The use of computer graphics to represent information and visualize relationships;
it helps depict the output more clearly.
5. Clustering: An unsupervised technique that groups documents by textual similarity, dividing the
texts into mutually exclusive groups.
6. Question Answering: Handling natural language queries by finding an appropriate answer from a
list of candidate patterns.
7. Sentiment Analysis: Also known as opinion mining; it classifies text by the writer's emotion into
categories such as positive, negative, neutral and mixed. It is used to gauge people's views of and
attitudes towards services and products.
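Sentiment analysis in its simplest lexicon-based form can be sketched as word counting. The word lists below are invented and far smaller than any real lexicon; production systems use trained models.

```python
# Toy sentiment lexicon (illustrative only).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def sentiment(text):
    """Score a text by counting lexicon hits: positive words add one,
    negative words subtract one; the sign of the total is the label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, a review containing "excellent" and "love" scores +2 and is labelled positive, while one containing "terrible" and "slow" is labelled negative; text matching no lexicon entry falls back to neutral.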

Conclusion:
Text mining and data mining are complementary techniques required for efficient business
management, and tools for both have secured a firm place in the marketplace. Natural language
processing, a subset of text mining, is used to define accurate and complete domain-specific
taxonomies, which supports effective metadata association. Text mining is more mature and efficient
in comparison with the data mining process, and roughly 80 percent of enterprise information exists
as text.


Big Data Analysis: Challenges and Opportunities

Nowadays, big data has been attracting increasing attention from academia, industry and government.

Big data is defined as a dataset whose size is beyond the processing ability of traditional databases or
computers.

Four elements are emphasized in the definition, which are capture, storage, management, and analysis.

The focus of the four elements is the last stage, the big data analytics, which is about automatic extraction
of knowledge from a large amount of data.

Big data analysis can be seen as the mining or processing of massive data, thereby extracting
information from large datasets [29]. Big data analytics can be characterized by several properties,
such as large volume, a variety of different sources, and fast increasing speed (velocity) [26]. It is
of great interest to investigate the role of evolutionary computation (EC) techniques, including
evolutionary algorithms and swarm intelligence, in the optimization and learning involved in big
data, in particular the ability of EC techniques to solve large-scale, dynamic, and sometimes
multiobjective big data analytics problems. Traditional methods for data analysis are based mainly
on mathematical models, and data is then collected to fit the models. With the growth of the variety
of temporal data, these mathematical models may become ineffective in solving problems. The
paradigm should shift from the model-driven to the data-driven approach. The data-driven approach
not only focuses on predicting what is going to happen, but also concentrates on what is happening
right now and how to be prepared for future events. With the amount of data growing constantly and
exponentially, current data processing tasks are beyond the computing ability of traditional
computational models. Data science, or more specifically big data analytics, has received more and
more attention from researchers. Data is easily generated and gathered, and its volume is increasing
very quickly; it exceeds the capacity of current systems to validate, analyze, visualize, store, and
extract information. Analyzing such massive data presents several kinds of difficulty, such as the
large volume of data, dynamic changes in the data, data noise, and so on. New and efficient
algorithms should be designed to handle massive data analytics problems.

Over the past few years, big data analytics has received increasing attention.

With the rapid growth of emerging applications like social networks, the semantic web, sensor
networks and LBS (Location Based Service) applications, the variety of data to be processed
continues to increase quickly. Effective management and processing of large-scale data poses an
interesting but critical challenge. Recently, big data has attracted a lot of attention from academia
and industry as well as government. This section introduces several big data processing techniques
from the system and application perspectives. First, from the view of cloud data management and
big data processing mechanisms, it presents the key issues of big data processing, including the
definition of big data, big data management platforms, big data service models, distributed file
systems, data storage, data virtualization platforms and distributed applications. Following the
MapReduce parallel processing framework, it introduces some MapReduce optimization strategies
reported in the literature. Finally, it discusses the open issues and challenges and explores future
research directions for big data processing in cloud computing environments.

Big data analytics is the often complex process of examining large and varied data sets,
or big data, to uncover information -- such as hidden patterns, unknown correlations,
market trends and customer preferences -- that can help organizations make informed
business decisions.


On a broad scale, data analytics technologies and techniques provide a means to analyze
data sets and draw conclusions about them which help organizations make informed
business decisions. Business intelligence (BI) queries answer basic questions about
business operations and performance.

Big data analytics is a form of advanced analytics, which involves complex applications
with elements such as predictive models, statistical algorithms and what-if analysis
powered by high-performance analytics systems.

The importance of big data analytics

Driven by specialized analytics systems and software, as well as high-powered computing
systems, big data analytics offers various business benefits, including:

• New revenue opportunities

• More effective marketing

• Better customer service

• Improved operational efficiency

• Competitive advantages over rivals

Big data analytics applications enable big data analysts, data scientists, predictive
modelers, statisticians and other analytics professionals to analyze growing volumes of
structured transaction data, plus other forms of data that are often left untapped by
conventional BI and analytics programs. This encompasses a mix of semi-
structured and unstructured data -- for example, internet clickstream data, web server logs,
social media content, text from customer emails and survey responses, mobile phone
records, and machine data captured by sensors connected to the internet of things (IoT).



Big data analytics technologies and tools

Unstructured and semi-structured data types typically don't fit well in traditional data
warehouses that are based on relational databases oriented to structured data sets.
Further, data warehouses may not be able to handle the processing demands posed by sets
of big data that need to be updated frequently or even continually, as in the case of real-
time data on stock trading, the online activities of website visitors or the performance of
mobile applications.

As a result, many of the organizations that collect, process and analyze big data turn
to NoSQL databases, as well as Hadoop and its companion data analytics tools, including:

• YARN: a cluster management technology and one of the key features in second-generation
Hadoop.

• MapReduce: a software framework that allows developers to write programs that process
massive amounts of unstructured data in parallel across a distributed cluster of processors or
stand-alone computers.

• Spark: an open source, parallel processing framework that enables users to run large-scale
data analytics applications across clustered systems.

• HBase: a column-oriented key/value data store built to run on top of the Hadoop Distributed
File System (HDFS).

• Hive: an open source data warehouse system for querying and analyzing large data sets
stored in Hadoop files.

• Kafka: a distributed publish/subscribe messaging system designed to replace traditional
message brokers.

• Pig: an open source technology that offers a high-level mechanism for the parallel
programming of MapReduce jobs executed on Hadoop clusters.
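MapReduce's map-shuffle-reduce structure can be mimicked in ordinary Python for the classic word-count example. This sketch only imitates the programming model; a real Hadoop job distributes these phases across a cluster.

```python
from collections import defaultdict

documents = ["big data needs big tools", "data tools for big data"]

def map_phase(doc):
    """Map: emit an intermediate (word, 1) pair for every word in a split."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group intermediate pairs by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the list of values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

intermediate = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(intermediate))
```

The same three-phase decomposition is what lets the real framework parallelise the work: map tasks run independently per input split, and reduce tasks run independently per key group.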


How big data analytics works

In some cases, Hadoop clusters and NoSQL systems are used primarily as
landing pads and staging areas for data before it gets loaded into a data
warehouse or analytical database for analysis -- usually in a summarized form
that is more conducive to relational structures.

More frequently, however, big data analytics users are adopting the concept of a
Hadoop data lake that serves as the primary repository for incoming streams
of raw data. In such architectures, data can be analyzed directly in a Hadoop
cluster or run through a processing engine like Spark. As in data warehousing,
sound data management is a crucial first step in the big data analytics process.
Data being stored in the HDFS must be organized, configured and partitioned
properly to get good performance out of both extract, transform and load (ETL)
integration jobs and analytical queries.

Once the data is ready, it can be analyzed with the software commonly used
for advanced analytics processes. That includes tools for:

• data mining, which sifts through data sets in search of patterns and relationships;

• predictive analytics, which builds models to forecast customer behavior and other future
developments;

• machine learning, which taps algorithms to analyze large data sets; and

• deep learning, a more advanced offshoot of machine learning.
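Predictive analytics in its simplest form fits a model to history and extrapolates. A minimal sketch: an ordinary-least-squares trend line over five invented monthly values, used to forecast month 6 (the data and the forecast target are illustrative only).

```python
# Hypothetical monthly order volumes for months 1..5.
x = [1, 2, 3, 4, 5]
y = [10.0, 12.0, 13.5, 16.0, 18.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for y = a + b*x:
# slope b = cov(x, y) / var(x), intercept a = mean_y - b * mean_x.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

forecast_month_6 = a + b * 6
```

On this series the fitted trend adds about two units per month, so the month-6 forecast sits one step above the last observation; real predictive models add seasonality, more predictors and an error estimate.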

Text mining and statistical analysis software can also play a role in the big data
analytics process, as can mainstream business intelligence software and data
visualization tools. For both ETL and analytics applications, queries can be written
in MapReduce, with programming languages such as R, Python, Scala, and SQL,

the standard languages for relational databases that are supported via SQL-on-
Hadoop technologies.

Big data analytics uses and challenges

Big data analytics applications often include data from both internal systems and
external sources, such as weather data or demographic data on consumers
compiled by third-party information services providers. In addition, streaming
analytics applications are becoming common in big data environments as users
look to perform real-time analytics on data fed into Hadoop systems through
stream processing engines, such as Spark, Flink and Storm.


Early big data systems were mostly deployed on premises, particularly in large
organizations that collected, organized and analyzed massive amounts of data.
But cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft,
have made it easier to set up and manage Hadoop clusters in the cloud, as have
Hadoop suppliers such as Cloudera-Hortonworks, which supports the distribution
of the big data framework on the AWS and Microsoft Azure clouds. Users can
now spin up clusters in the cloud, run them for as long as they need and then take
them offline with usage-based pricing that doesn't require ongoing software
licenses.

Big data has become increasingly beneficial in supply chain analytics. Big
supply chain analytics utilizes big data and quantitative methods to enhance
decision making processes across the supply chain. Specifically, big supply
chain analytics expands datasets for increased analysis that goes beyond the

traditional internal data found on enterprise resource planning (ERP) and supply
chain management (SCM) systems. Also, big supply chain analytics implements
highly effective statistical methods on new and existing data sources. The
insights gathered facilitate better informed and more effective decisions that
benefit and improve the supply chain.

Potential pitfalls of big data analytics initiatives include a lack of internal analytics skills and the
high cost of hiring experienced data scientists and data engineers to fill the gaps.

Big data analytics involves analyzing structured and unstructured data.

Emergence and growth of big data analytics

The term big data was first used to refer to increasing data volumes in the mid-
1990s. In 2001, Doug Laney, then an analyst at consultancy Meta Group Inc.,
expanded the notion of big data to also include increases in the variety of data
being generated by organizations and the velocity at which that data was being
created and updated. Those three factors -- volume, velocity and variety --
became known as the 3Vs of big data, a concept Gartner popularized after
acquiring Meta Group and hiring Laney in 2005.

Separately, the Hadoop distributed processing framework was launched as an Apache open
source project in 2006, planting the seeds for a clustered platform built on top of commodity
hardware and geared to run big data applications. By 2011, big data analytics began to take
a firm hold in organizations and the public eye, along with Hadoop and various related big
data technologies that had sprung up around it.

Initially, as the Hadoop ecosystem took shape and started to mature, big data
applications were primarily the province of large internet and e-commerce companies
such as Yahoo, Google and Facebook, as well as analytics and marketing services
providers. In the ensuing years, though, big data analytics has increasingly been
embraced by retailers, financial services firms, insurers, healthcare organizations,
manufacturers, energy companies and other enterprises.



Module – IV

Performance Management Cycle

Performance management involves much more than just assigning ratings. It is a
continuous cycle that involves:

• Planning work in advance so that expectations and goals can be set;
• Monitoring progress and performance continually;
• Developing the employee's ability to perform through training and work assignments;
• Rating periodically to summarize performance; and
• Rewarding good performance.

Business Performance Management Cycle


Sales and Marketing Analytics

At a time when competition is intensifying, and businesses require greater accountability


from their investments, it is imperative that marketers know what they are getting from their
marketing expenditures.
By enabling our clients to look at marketing performance both historically (actual) as well as
in the future (projections) our clients have the capability to:
Retrospective Analysis:
• Determine marketing / non-marketing business drivers
• Understand carry-over effects of advertising
• Identify synergies between media
• Estimate impacts of different marketing elements
• Apportion non-attributable business results
• Assess cost effectiveness (ROI) by media type

Prospective Analysis:
• Forecast impacts of marketing plans and other activities
• Understand the "possibility set" of potential business outcomes
• Evaluate risks associated with market uncertainties
• Run "what if" scenarios for planning purposes
• Optimize spending and media mix
• Assess likelihood of meeting business goals

Strategic capabilities and services

Marketing Mix Modeling (MMM) / Attribution Modeling
A statistical technique used to understand the individual and combined contributions of each multichannel marketing investment to business results, and then adjust plans accordingly. Primary sales drivers are identified, and the return on investment for all advertising and marketing expenditures is determined. Strategic decision making is fueled by true insight into every component of a campaign. Clients can determine the effect and shelf-life of different creative vehicles (TV spots or online ads, for example), and how placement impacts results.

Lift Modeling
Identifies the most promotion-sensitive customers – ideal for optimizing targeted marketing applications. With the marketer's budget constraints as a factor, lift modeling can help determine the audience that will generate maximum revenue gains.

ROI Modeling
Determines the return on investment (ROI) for all advertising and marketing expenditures.

Campaign Analytics
Allows a business to effectively assess the value of marketing investments at the product or regional level, and examine sub-populations within a market and how they are responding to advertising or marketing campaigns. Customer acquisition and retention efforts have a more directed path and a clearer understanding of what approaches are working best.

Offer Optimization
Marketers can move away from a "one size fits all" offer strategy, and tailor incentives / approach plans for better results from


each customer segment. Marketing programs can be refined from multiple perspectives;
such as maximizing sales for a given budget or maximizing profit for a given sales target.
Forecasting and Scenario Simulation Predictive tools that help companies to assess the
performance of an offer or program before making major media investments. In conjunction
with other analytics tools (such as ROI and Lift modelling, for example), clients can chart a
more effective sales and marketing roadmap.
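As an illustration of the idea behind marketing mix modeling, the sketch below fits an ordinary least squares regression of weekly sales on per-channel spend. All figures and channel names are hypothetical (sales are generated from a known formula so the fit is exact); real MMMs additionally model adstock, saturation, and seasonality effects.

```python
import numpy as np

# Hypothetical weekly data ($000s): spend on two channels and resulting sales.
# Sales are generated here as 30 + 2*tv + 1*online, so the fit is exact.
tv_spend     = np.array([10, 12,  8, 15, 11,  9, 14, 13])
online_spend = np.array([ 5,  4,  6,  3,  5,  6,  2,  4])
sales        = np.array([55, 58, 52, 63, 57, 54, 60, 60])

# Design matrix with an intercept column for baseline (non-marketing) sales.
X = np.column_stack([np.ones_like(tv_spend), tv_spend, online_spend])

# Ordinary least squares: sales ~ baseline + b_tv*tv + b_online*online.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
baseline, b_tv, b_online = coef

# Per-channel contribution to total sales over the period.
tv_contribution = b_tv * tv_spend.sum()              # 2 * 92 = 184
online_contribution = b_online * online_spend.sum()  # 1 * 35 = 35
```

The fitted coefficients separate baseline sales from each channel's contribution, which is exactly the "apportioning" step described above.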

Marketing Analytics - What it is and why it matters

Marketing analytics comprises the processes and technologies that enable marketers to
evaluate the success of their marketing initiatives. This is accomplished by measuring
performance (e.g., blogging versus social media versus channel communications).
Marketing analytics uses important business metrics, such as ROI, marketing attribution and
overall marketing effectiveness. In other words, it tells you how your marketing programs
are really performing.
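One of those metrics, ROI, is straightforward to compute once incremental revenue has been attributed to a campaign. A minimal sketch with hypothetical figures:

```python
# Hypothetical campaign figures: incremental revenue is what a lift or
# attribution model credits to the campaign; cost is total campaign spend.
def marketing_roi(incremental_revenue, campaign_cost):
    """Net return per dollar spent: (revenue - cost) / cost."""
    return (incremental_revenue - campaign_cost) / campaign_cost

roi = marketing_roi(incremental_revenue=150_000, campaign_cost=100_000)
# roi == 0.5: every dollar spent returned $1.50 of attributed revenue.
```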

Marketing analytics gathers data from across all marketing channels and consolidates it into
a common marketing view. From this common view, you can extract analytical results that
can provide invaluable assistance in driving your marketing efforts forward.


Why marketing analytics is important

Over the years, as businesses expanded into new marketing categories, new technologies
were adopted to support them. Because each new technology was typically deployed in
isolation, the result was a hodgepodge of disconnected data environments.

Consequently, marketers often make decisions based on data from individual channels
(website metrics, for example), not taking into account the entire marketing picture. Social
media data alone is not enough. Web analytics data alone is not enough. And tools that look
at just a snapshot in time for a single channel are woefully inadequate. Marketing analytics,
by contrast, considers all marketing efforts across all channels over a span of time – which
is essential for sound decision making and effective, efficient program execution.

What you can do with marketing analytics

With marketing analytics, you can answer questions like these:

• How are our marketing initiatives performing today? How about in the long run? What can
we do to improve them?

• How do our marketing activities compare with our competitors’? Where are they spending
their time and money? Are they using channels that we aren’t using?

• What should we do next? Are our marketing resources properly allocated? Are we devoting
time and money to the right channels? How should we prioritize our investments for next
year?

Three steps to marketing analytics success

To reap the greatest rewards from marketing analytics, follow these three steps:

1. Use a balanced assortment of analytic techniques.

2. Assess your analytic capabilities, and fill in the gaps.

3. Act on what you learn.


Use a balanced assortment of analytic techniques

To get the most benefit from marketing analytics, you need an analytic assortment that is
balanced – that is, one that combines techniques for:

• Reporting on the past. By using marketing analytics to report on the past, you can answer
such questions as: Which campaign elements generated the most revenue last quarter?
How did email campaign A perform against direct mail campaign B? How many leads did
we generate from blog post C versus social media campaign D?

• Analysing the present. Marketing analytics enables you to determine how your marketing
initiatives are performing right now by answering questions like: How are our customers
engaging with us? Which channels do our most profitable customers prefer? Who is talking
about our brand on social media sites, and what are they saying?

• Predicting and/or influencing the future. Marketing analytics can also deliver data-driven
predictions that you can use to influence the future by answering such questions as: How
can we turn short-term wins into loyalty and ongoing engagement? How will adding 10 more
sales people in under-performing regions affect revenue? Which cities should we target next
using our current portfolio?

Assess your analytic capabilities, and fill in the gaps

Marketing organizations have access to a lot of different analytic capabilities in support of various marketing goals, but if you're like most, you probably don't have all your bases covered. Assessing your current analytic capabilities is a good next step. After all, it's important to know where you stand along the analytic spectrum, so you can identify where the gaps are and start developing a strategy for filling them in.

For example, a marketing organization may already be collecting data from online and POS
transactions, but what about all the unstructured information from social media sources or
call-center logs? Such sources are a gold mine of information, and the technology for
converting unstructured data into actual insights that marketers can use exists today. As
such, a marketing organization may choose to plan and budget for adding analytic
capabilities that can fill that particular gap. Of course, if you’re not quite sure where to start,

well, that’s easy. Start where your needs are greatest, and fill in the gaps over time as new
needs arise.

Act on what you learn

There is absolutely no real value in all the information marketing analytics can give you
– unless you act on it. In a constant process of testing and learning, marketing analytics
enables you to improve your overall marketing program performance by, for example:

• Identifying channel deficiencies.

• Adjusting strategies and tactics as needed.

• Optimizing processes.

• Gaining customer insight.

Without the ability to test and evaluate the success of your marketing programs, you would
have no idea what was working and what wasn’t, when or if things needed to change, or
how. By the same token, if you use marketing analytics to evaluate success, but you do
nothing with that insight, then what is the point?

Applied holistically, marketing analytics allows for better, more successful marketing by
enabling you to close the loop as it relates to your marketing efforts and investments. For
example, marketing analytics can lead to better supply and demand planning, price
optimization, as well as robust lead nurturing and management, all of which leads to more
revenue and greater profitability. By more effectively managing leads and being able to tie
those leads to sales – which is known as closed-loop marketing analytics – you can see
which specific marketing initiatives are contributing to your bottom line.


HR Analytics

Definition: What is HR analytics?

HR analytics is the application of statistics, modeling, and analysis of employee-related factors to improve business outcomes.

HR analytics is also often referred to as:

• People analytics
• Talent analytics
• Workforce analytics

The graph below, provided by Google Trends, shows search interest in these terms since 2004. Both HR analytics and people analytics have grown in popularity and continue to gain interest.

These terms are often used interchangeably, although some debate their differences.
Definitions of HR analytics tend to encompass a broader scope of data, while people
analytics and talent analytics refer to data points specific to people and their behavior. Some
prefer the term workforce analytics because of the growing tendency to automate tasks with
robots, which may be considered part of the workforce.

Overview

HR analytics enables HR professionals to make data-driven decisions to attract, manage, and retain employees, which improves ROI. It helps leaders make decisions to create better work environments and maximize employee productivity. When used effectively, it has a major impact on the bottom line.

HR professionals gather data points across the organization from sources like:

• Employee surveys
• Telemetric Data
• Attendance records


• Multi-rater reviews
• Salary and promotion history
• Employee work history
• Demographic data
• Personality/temperament data
• Recruitment process
• Employee databases

HR leaders must align HR data and initiatives to the organization’s strategic goals. For
example, a tech company may want to improve collaboration across departments to
increase the number of innovative ideas built into their software. HR initiatives like shared
workspaces, company events, collaborative tools, and employee challenges can be
implemented to achieve this goal. To determine how successful initiatives are, HR analytics
can be utilized to examine correlations between initiatives and strategic goals.
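The correlation check described above can be sketched with a simple Pearson correlation; the initiative and outcome figures below are hypothetical:

```python
import numpy as np

# Hypothetical monthly figures: participation rate (%) in collaboration
# initiatives, and cross-team ideas submitted (the strategic outcome).
participation   = np.array([20, 35, 40, 55, 60, 75, 80, 90])
ideas_submitted = np.array([ 3,  5,  6,  9, 10, 13, 14, 16])

# Pearson correlation between initiative uptake and the outcome metric.
correlation = np.corrcoef(participation, ideas_submitted)[0, 1]
# A value near +1 suggests the initiative tracks the goal (not causation).
```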

Once data is gathered, HR analysts feed workforce data into sophisticated data models,
algorithms, and tools to gain actionable insights. These tools provide insights in the form of
dashboards, visualizations, and reports. An ongoing process should be put in place to
ensure continued improvement:

• Benchmark analysis
• Data-gathering
• Data-cleansing
• Analysis
• Evaluate goals and KPIs
• Create action plan based on analysis (continuously test new ideas)
• Execute on plan
• Streamline process

Financial Analytics

There is an increasing use of analytics in many organizations these days. Today's businesses need timely information that helps the business people to take important

decisions in business. Finance plays an important role in increasing the value of a business. It is establishing itself as an important business function and overlaps with analytics in many areas. Financial executives are finding new ways in the field of finance to increase the value of their organization.

Financial analytics is a concept that provides different views of a business' financial data. It helps build in-depth knowledge and supports strategic actions to improve the business' overall performance. Financial analytics is a subset of BI & EPM and has an impact on every aspect of a business. It plays a crucial role in calculating business profit, helps answer every question related to the business, and lets you forecast the future of your business.

What is Financial Analytics

Financial analytics is a field that gives different views of a company's financial data. It helps to gain in-depth knowledge and take action to improve the performance of your business. Financial analytics affects all parts of your business and plays a very important role in calculating business profit. It helps you answer all questions related to your business and also lets you forecast the future of your business.

Why Financial Analytics is important

• Today's businesses need timely information that helps business people take important decisions.
• Every business should have sound financial planning and forecasting to leverage the business.
• The emergence of new business models, the changing needs of the traditional financial department, and advancements in technology have all led to the need for financial analytics.
• Financial analytics helps shape tomorrow's business goals and improves the decision-making strategies of a business.
• Financial analytics focuses on measuring and managing the tangible assets of an organization, such as cash and machinery.

• It gives deeper insight into the financial status of a business and improves profitability, cash flow, and business value.
• Financial analytics helps in making smart decisions to increase business revenue and minimize waste.
• Accounting, tax, and other areas of finance maintain data warehouses that, combined with analytics, help run the business effectively and achieve goals faster.

There are five main reasons why financial analytics is becoming more important these days. They are listed below.

1. Business Models

There are three new business models which form the basis of financial analytics

• Business to Business
• Business to Consumer
• Business to Employee

2. Changing role of the financial department

Most finance functions are automated and require only a few resources to manage. This enables finance executives to concentrate more on business goals rather than just processing and reconciling transactions.

3. Business Processes

Businesses are becoming more complex these days due to the advancement of technologies, and many questions arise in the minds of business people. Analytics provides the answers to these questions. Financial analytics lets managers and executives in an organization access more accurate and detailed financial information about the organization. This strengthens relationships among employees inside the organization.

Here are few questions for which financial analytics can give you an answer

• What are the risks to which the business is exposed?


• How to enhance and extend the business processes to make them work more
effectively?
• Are the investments made in the right path?
• How is the profit of the product across different sales channels and customers?
• Which segment of the market is expected to bring more profit to the business in the
future?
• What are the factors that could affect the business in the future?

4. Integrated Analytics

These days, companies use integrated financial analytics to face the competition in the financial analytics marketplace. By using such integrated financial analytics, companies are able to analyze and share information with sources inside and outside the organization. Organizations should use integrated financial analytics to survive in the new economy.

5. Role of the Data Warehouse

The data warehousing solutions mainly focus on important analytical components like data
stores, data marts and reporting applications. Data warehousing in the future will require
rich analytical capabilities. Smart decisions are easily made when the data and business
processes are integrated across all business functions in an organization.

Uses of Financial Analytics

Financial analytics helps a business to

• Understand the performance of an organization
• Measure and manage the value of tangible and intangible assets of an organization
• Manage the investments of the company
• Forecast the variations in the market
• Increase the functionalities of information systems
• Improve the business processes and profits


Importance of Financial Analytics

• Today's businesses require timely information for decision-making purposes.
• Every company needs prudent financial planning and forecasting.
• The diverse needs of the traditional financial department, and advancements in technology, all point to the need for financial analytics.
• Financial analytics can help shape the business' future goals and improve its decision-making strategies.
• Financial analytics can help you focus on measuring and managing your business' tangible assets, such as cash and equipment.
• It provides an in-depth insight into the organization's financial status and improves the cash flow, profitability, and business value.

Important financial analytics you need to know

In today’s data-driven world, analytics is critical for any business that wants to remain
competitive. Financial analytics can help you understand your business’ past and present
performance and make strategic decisions. Here are some of the critical financial analytics
that any company, size notwithstanding, should be implementing.

1. Predictive sales analytics

Sales revenue is critical for every business. As such, accurate sales projection has essential strategic and technical implications for the organization. Predictive sales analytics involves coming up with an informed sales forecast. There are many approaches to predicting sales, such as correlation analysis or the use of past trends to forecast sales. Predictive sales analytics can help you plan for and manage your business' peaks and troughs.
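A minimal sketch of the past-trends approach, fitting a linear trend to hypothetical quarterly sales and projecting it one quarter ahead:

```python
import numpy as np

# Hypothetical quarterly sales history (units), quarters indexed 1..8.
quarters = np.arange(1, 9)
sales = np.array([100, 104, 109, 113, 118, 121, 126, 130])

# Fit a straight-line trend to past sales (degree-1 polynomial).
slope, intercept = np.polyfit(quarters, sales, 1)

# Project the fitted trend one quarter ahead.
forecast_q9 = slope * 9 + intercept
```

A real forecast would also account for seasonality and uncertainty bands, but the trend projection captures the core idea.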

2. Client profitability analytics

Every business needs to differentiate between clients that make them money and clients that lose them money. Customer profitability typically follows the 80/20 rule, where 20 percent of clients account for 80 percent of profits, and 20 percent of clients account for 80 percent of customer-related expenses. Understanding which clients are which is vital.

By understanding your customers' profitability, you will be able to analyze every client group and gain useful insights. However, the greatest challenge to customer profitability analytics arises when you fail to analyze each client's contribution to the organization.
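The 80/20 pattern can be checked directly from a client-level profit table. A sketch with hypothetical figures:

```python
# Hypothetical annual profit contribution per client ($000s).
client_profit = {
    "A": 820, "B": 640, "C": 95, "D": 60, "E": 45,
    "F": 40, "G": 35, "H": 30, "I": 20, "J": 15,
}

# Rank clients by profit and measure the share earned by the top 20%.
ranked = sorted(client_profit.values(), reverse=True)
top_n = max(1, len(ranked) // 5)               # top 20% of clients
top_share = sum(ranked[:top_n]) / sum(ranked)  # ~0.81 for this data
```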

3. Product profitability analytics

For organizations to remain competitive within an industry, they need to know where they are making, and losing, money. Product profitability analytics can help you establish the profitability of every product rather than analyzing the business as a whole. To do this, you need to assess each product individually. Product profitability analytics can also help you establish profitability insights across the product range, so you can make better decisions and protect your profit and growth over time.

4. Cash flow analytics

You need a certain amount of cash to run the organization on a day-to-day basis. Cash
flow is the lifeblood of your business. Understanding cash flow is crucial for gauging the
health of the business. Cash flow analytics involves the use of real-time indicators like the
Working Capital Ratio and Cash Conversion Cycle. You can also predict cash flow using
tools like regression analysis. Besides helping with cash flow management and ensuring


that you have enough money for day-to-day operations, cash flow analytics can also help
you support a range of business functions.
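The two indicators named above can be computed directly from balance-sheet and activity figures; the numbers below are hypothetical:

```python
# Hypothetical balance-sheet and working-capital activity figures.
current_assets = 500_000
current_liabilities = 250_000

days_inventory_outstanding = 45  # average days to sell inventory
days_sales_outstanding = 30      # average days to collect receivables
days_payables_outstanding = 40   # average days taken to pay suppliers

# Working Capital Ratio: ability to cover short-term obligations.
working_capital_ratio = current_assets / current_liabilities  # 2.0

# Cash Conversion Cycle: days cash stays tied up in operations.
cash_conversion_cycle = (days_inventory_outstanding
                         + days_sales_outstanding
                         - days_payables_outstanding)          # 35 days
```

A shorter cash conversion cycle means cash returns to the business faster and less day-to-day funding is needed.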

5. Value-driven analytics

Most organizations have a sense of where they are going and what they are hoping to achieve. These goals can be formalized on a strategy map that pinpoints the business' value drivers. These value drivers are the vital levers that the organization needs to pull to realize its strategic goals. Value driver analytics assesses these levers to ensure that they can deliver the expected outcome.

6. Shareholder value analytics

The profits and losses, and their interpretation by analysts, investors, and the media can
influence your business’ performance on the stock market. Shareholder value analytics
calculates the value of the company by looking at the returns it is providing to shareholders.
In other words, it measures the financial repercussions of a strategy and reports how much
value the strategy in question is delivering to the shareholders. Shareholder value analytics
is used concurrently with profit and revenue analytics. You can use tools like Economic
Value Added (EVA) to measure the shareholder value analytics.
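A minimal EVA calculation with hypothetical inputs (EVA = NOPAT − WACC × invested capital):

```python
# Hypothetical inputs for Economic Value Added.
nopat = 1_200_000            # net operating profit after tax
invested_capital = 8_000_000
wacc = 0.10                  # weighted average cost of capital

# EVA = NOPAT - capital charge; a positive EVA means the strategy
# returned more than the cost of the capital it consumed.
economic_value_added = nopat - wacc * invested_capital  # 400_000
```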

Conclusion

Financial analytics is a valuable tool that every organization, small and large, should use to
manage and measure its progress. Done right, it can help the organization adapt to the
trends that affect its operations.

Oracle Financial Analytics Software

One example of financial analytics software is Oracle Financial Analytics, one of the most popular financial analytics programs on the market.

Oracle Financial Analytics helps to improve the financial performance through proper
information about the expenses and revenue of all the departments in the organization. It
increases the cash flow through proper maintenance of receivables, payables and inventory
management. It gives you timely financial reports which will help you to determine the

performance of your business. It also helps you to have a future forecast and plan your
budget well. Oracle Financial Analytics software will help to improve the financial health of
the business.

Features

This software has a number of features, including the following:

• Fixed Assets Analytics – Manages and measures the assets life cycle
• Budgetary Control Analytics – Helps prevent overspending through effective monitoring of budgets and spending
• General Ledger Analytics – Manage the financial performance of the company
through various factors
• Profitability Analytics – Helps in identifying what type of customers and which
channels drive more profit to the company
• Payables Analytics – Manage and monitor the cash of the payables department
• Receivables Analytics – Manage collections and have a check on the cash cycles
• Proactive Intelligence – This feature can send a signal about the issue to the
managers and executives of the organization which helps them to take immediate
action and solve the issue
• Pre-built data models and metrics – Oracle Financial Analytics has more than 100
metrics and models
• Out-of-the-box integration with ERP systems – Enables easy integration with ERP systems with lower risk, cost, and effort
• Oracle Financial Analytics for Oracle Fusion Applications – It helps you to learn
about the company’s past, present and future performance and will let you take smart
decisions.
• Powered by Oracle Business Intelligence Foundation – Produces high quality
reports and has a good dashboard and is highly scalable
• Exalytics Ready – It goes beyond the values of traditional data analytics and gives
deeper knowledge about the huge volume of data at the speed of thought

Documents used in Financial Analysis

Finance is the language of a business. The goals of a business are always defined in terms
of finance and the output is also measured in financial terms. Financial analytics involves

analyzing the data in financial statements. In this way, it provides useful information to business owners and lets them make better decisions.

Operational Analytics

High-performing companies will embed analytics directly into decision and operational
processes, and take advantage of machine-learning and other technologies to generate
insights in the millions per second rather than an “insight a week or month.”

The full benefit of analytics

Many organizations employ ad hoc analytics projects that use predictive analytics to help
them find meaningful insights from vast volumes of data. Some have dedicated teams of
data scientists conducting ongoing manual analytics. Chances are, these analytic projects
are discovering previously unknown and valuable insights about the organization and its
business. However, many companies are struggling to see the impact of these analytic
projects on widespread organizational outcomes.

This gap between analytic projects and business impact is driven less by the quality of the analytic methods than by the inherent business ecosystem, cultural resistance to change, and suboptimal processes supporting the integration of insights into business operations and applications. It is no longer sufficient to produce robust analytics. Organizations that want
to see measurable business results from analytics must focus on embedding analytics and
insights into day-to-day operations to enable analytically driven decision-making — fast,
automated, and operational.


The details on Operational Analytics

Operational Analytics is the interoperation of multiple disciplines that support the seamless
flow from initial analytic discovery to embedding predictive analytics into business
operations, applications, and machines. The impacts of these analytics are then measured,
monitored, and further analyzed to circle back to new analytic discoveries in a continuous
improvement loop, much like a fully matured industrial process.


However, the analytics field has not seen this type of industrial rigor around moving analytics
into business operations. Organizations that wish to achieve competitive advantage through
analytics need to cross the chasm between traditional ad hoc analytics and Operational
Analytics.


Analytics in Telecom Industry

The telecom industry not only has a large customer base, but one whose needs and desires are constantly evolving and shifting. On top of this, telecom firms face cut-throat competition, making it a highly dynamic and challenging industry. In such a scenario, every decision a telecom firm takes becomes all the more crucial. It is hence imperative for the firm to take decisions based on extensive data analytics so as to ensure efficient and effective use of business resources. Although analytics can be instrumental in the telecom industry in many ways, some of the major applications include:

Customer retention/improving customer loyalty:


With neck-and-neck competition between the numerous players in this industry, customer retention is of top-notch importance. Telecom is now much more than making calls, and analytical tools can help firms identify cross-selling opportunities and take crucial decisions to retain customers. Analytics can also help in identifying trends in customer behavior to predict customer churn and apprise decision makers so they can take suitable action to prevent it. When dealing with a large customer base, marketing across the board would be expensive and ineffective. Hence, analytics can help better channelize marketing efforts, such as identifying the target group and/or region for launching pilot projects, so that the firm has a better return on its marketing investment.
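A churn-propensity score of the kind described is often produced by a logistic model. The sketch below hard-codes illustrative weights and feature names; in practice both would be fitted from historical churn data:

```python
import math

# Illustrative hand-set weights; a real model would fit them with
# logistic regression on historical churn outcomes.
def churn_score(calls_drop_pct, support_tickets, months_on_plan):
    """Logistic score in (0, 1): higher means higher churn risk."""
    z = (0.04 * calls_drop_pct      # falling usage raises risk
         + 0.5 * support_tickets    # complaints raise risk
         - 0.1 * months_on_plan     # tenure lowers risk
         - 1.0)                     # intercept
    return 1 / (1 + math.exp(-z))

at_risk = churn_score(calls_drop_pct=50, support_tickets=3, months_on_plan=6)
loyal   = churn_score(calls_drop_pct=5,  support_tickets=0, months_on_plan=36)
```

Customers scoring above a chosen threshold can then be routed to retention offers before they churn.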

Network optimization:
It is very crucial for telecom operators to ensure that all customers are able to avail themselves of its products at all times. At the same time, firms need to be frugal when allocating resources to the network, because any unused capacity is a waste of resources. Analytics helps in better monitoring of traffic and in facilitating capacity-planning decisions. Analytical tools leverage data collected through day-to-day transactions and help in both short-term optimization decisions and long-term strategic decision making.

Predictive analytics:
The traditional method of initiating something new based on gut feeling or intuition has been
long outdated. With the use of predictive analytics in all their business departments - telecom


operators can predict the approximate success rate of a new scheme based on the past preferences of customers. This in turn provides telecom operators with a great strategic advantage. Predictive analytics helps in targeting the right customer at the right time based on their past behavior and choices. It also helps in boosting revenue through proper planning and in reducing operational costs in the long term. Lastly, staying ahead always gives a competitive edge, which is of utmost importance in this highly competitive telecom industry.

Social Analytics:
The branding of telecom operators on social media plays a crucial part in customer acquisition and retention. Data generated through social media can be interpreted into meaningful insights using social analytical tools. Customer sentiment, customer experience, and the positioning of the company can be analyzed to make the customer experience richer and smoother. Also, data generated through such platforms is more diverse, both geographically and demographically, and hence helps in building a closer-to-reality picture of customers.

We thus see how powerful analytical tools can act as a catalyst for business growth, providing a layout for proper planning and optimization of resources. Also, with spectrum prices touching the sky, analytics can help in developing a sustainable business model with limited resources. Not only present but also future trends can be seen through analytics. Hence, such powerful tools should be leveraged now to gain a long-term uplift in business profit.


Data Analytics in Retail Industry

With the retail market getting more competitive by the day, nothing has been more important than the ability to optimize business processes to satisfy the expectations of customers. Channelizing and managing data so that it works in favor of the customer, as well as generating profits, is very significant for survival.

For big retail players all over the world, data analytics is now applied at all stages of the retail process – keeping track of emerging popular products, forecasting sales and future demand via predictive simulation, optimizing placements of products and offers through heat-mapping of customers, and many others. In addition, identifying customers who are likely to be interested in certain products based on their past purchases, finding the most suitable way to reach them via targeted marketing strategies, and then deciding what to sell next is what data analytics deals with.

Strategic Areas in Data Analytics for Retailers

There are some strategic areas where retail players find a ready use for data analytics. Here are a few of those areas:

1.) Price Optimization

Of course, data analytics plays a very important role in price determination. Algorithms perform several functions, like tracking demand, inventory levels, and competitors' activities, and respond automatically to market challenges in real time, allowing actions to be taken based on insights in a safe manner. Price optimization helps determine when prices should be dropped, which is popularly known as 'markdown optimization.' Before analytics was used, retailers would simply bring down prices after the buying season for a certain product line ended, when demand was diminishing. Analytics, however, shows that a gradual price reduction from the moment demand starts sagging leads to increased revenues. The US retailer Stage Stores found this out by performing experiments, backed by a predictive approach for determining the rise and fall of demand for a certain product, which beat the conventional end-of-season sale.
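The markdown claim can be illustrated with a toy linear demand model: under the assumed price sensitivity below, stepping prices down gradually earns more revenue than one deep end-of-season cut. All numbers are hypothetical:

```python
# Toy linear demand model: at the $50 list price the product sells
# `base_demand` units a week, and each $1 of discount adds
# `sensitivity` extra units sold that week.
def revenue(prices, base_demand=100, sensitivity=4):
    return sum(p * (base_demand + sensitivity * (50 - p)) for p in prices)

# One deep cut in the final week versus stepping prices down early.
end_of_season = revenue([50, 50, 50, 30])  # 20,400
gradual       = revenue([50, 45, 40, 35])  # 21,600
```

Under these assumptions the gradual schedule earns more because the discount starts selling extra units while prices are still relatively high.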


Retail giants like Walmart spend millions on their real-time merchandising systems, with the aim of building the world's largest private cloud so as to track millions of transactions as they happen daily. As stated earlier, algorithms perform this function and others.

2.) Future performance prediction

This is another important area for data analytics in the retail industry, since every
customer interaction has a large impact on both potential and existing relationships.
Rolling out an idea to the full sales force can be risky, because a wrong decision could
result in an immediate or prolonged loss. Instead, top organizations have found that the
best way to capture the cause-and-effect relationship between key performance indicators
and strategic shifts is a test-and-learn approach: the performance of a test group of
customers or reps is compared with that of a well-matched control group. This comparison
is the data science behind the study.
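A minimal version of the test-and-learn comparison can be sketched as follows; the store-level sales figures are hypothetical:

```python
# Minimal test-and-learn comparison using made-up store-level sales.
# The "test" stores received the strategic change; the matched
# "control" stores did not. We measure the lift of test over control.
from statistics import mean

test_sales    = [112.0, 108.0, 121.0, 117.0, 109.0, 115.0]  # after change
control_sales = [101.0, 104.0,  99.0, 106.0, 102.0, 100.0]  # no change

def percent_lift(test, control):
    """Relative difference of the test-group mean over the control mean."""
    return (mean(test) - mean(control)) / mean(control) * 100

lift = percent_lift(test_sales, control_sales)
print(f"Test mean:    {mean(test_sales):.1f}")
print(f"Control mean: {mean(control_sales):.1f}")
print(f"Lift:         {lift:.1f}%")
```

In practice the control group must be carefully matched on store size, region and seasonality before such a lift figure is trusted.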

3.) To accommodate small-scale retailers

Data analytics matters for small-scale retailers too, who can get assistance from
platforms that provide such services. In addition, there are organizations, mainly start-
ups, that offer social analytics to create product awareness on social media. Small-scale
businesses can therefore take advantage of retail data analytics without spending so much
that it hurts their finances.

4.) Demand prediction

The moment retailers gain a real understanding of customer buying trends, they can focus
on the areas that will see high demand. This involves gathering seasonal, demographic and
occasion-led data along with economic indicators to build a clear picture of purchase
behavior across the target market. This is very useful for inventory management.
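One simple way to turn seasonal data into a demand forecast is a seasonal-index model. The monthly unit sales below are invented for illustration:

```python
# A simple seasonal-index forecast: estimate next year's monthly demand
# from two years of (hypothetical) monthly unit sales.
from statistics import mean

year1 = [80, 75, 90, 100, 110, 130, 140, 135, 120, 105, 150, 200]
year2 = [84, 80, 95, 104, 116, 138, 150, 142, 126, 110, 160, 214]

overall = mean(year1 + year2)
# Seasonal index: how each calendar month compares to the overall average.
seasonal_index = [mean([a, b]) / overall for a, b in zip(year1, year2)]

# Assume modest trend growth, estimated from the two yearly totals.
growth = sum(year2) / sum(year1)
base_next_year = overall * growth

forecast = [round(base_next_year * idx) for idx in seasonal_index]
print("Seasonal indices:", [round(i, 2) for i in seasonal_index])
print("Next-year forecast:", forecast)
```

The indices make the December holiday peak explicit, which is exactly the kind of signal inventory planners need.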

5.) Pick out the highest Return on Investment (ROI) Opportunities

After gaining a good understanding of their potential and existing customer base,
retailers use data-driven intelligence and predictive risk filters to model expected
responses to marketing campaigns, measured by each customer’s propensity or likelihood
to buy.

6.) Forecasting trends

Retailers nowadays have several advanced tools at their disposal for understanding
current trends. Trend-forecasting algorithms go through buying data to determine what
marketing departments should promote and what they should not.

7.) Identifying customers

This is also important in retail data analytics: when choosing which customers are likely
to want a certain product, data analytics is the best way to go about it. Because of this,
most retailers rely heavily on online recommendation-engine technology and on data drawn
from transactional records and loyalty programs, both online and offline. Companies like
Amazon may not yet ship products to customers before they order, but they are looking in
that direction. Demand is forecast for individual geographic areas based on the
demographics retailers hold on their customers, which means that when orders arrive they
can be fulfilled more quickly and efficiently, while data on how customers make contact
with retailers is used to decide the best path for getting their attention onto a certain
product or service.

Role of Data Analytics in Retail Industry

Other key areas where data analytics play a key role are:

a.) Discount Efficiency

Almost 95% of shoppers admit to using a coupon code when shopping. For retailers to gain
from offers, they need first to ask how valuable such a deal would be to their business.
Promotional deals will certainly bring customers rushing in, but they may not be an
effective strategy for sustaining long-term customer loyalty. Instead, retailers can
analyze historical data and use it in predictive modeling to determine the long-term
impact such offers would have. For instance, a team of data analysts and scientists can
model the events that would have occurred had there been no discount, then compare this
with what actually happened when there were discounts, to better understand the
effectiveness of each discount. Armed with this knowledge, the retailer can readjust the
discount strategy, increasing the number of discounts on some categories and removing
less profitable deals. This can significantly boost average monthly revenue.

b.) Churn Rate Reduction

Creating customer loyalty is a main priority for all brands, because attracting a new
customer costs more than six times as much as retaining an existing one. Churn rate can
be expressed in various ways: the percentage of customers lost, the number of customers
lost, the percentage of recurring value lost, or the value of recurring business lost.
With insights from big data analytics, such as which customers are likely to churn,
retailers can more easily determine the best way to adjust their subscription offerings
to prevent such losses. For instance, a retailer can analyze customer data from a monthly
subscription box and use it to attract new subscribers who are likely to become long-term
customers. This can significantly reduce monthly churn, and lets brands calculate
lifetime value and recoup steep marketing costs.
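The four churn representations mentioned above can be computed from a hypothetical month of subscription data:

```python
# Four ways of expressing churn, from one invented month of data.

start_customers = 2_000
lost_customers  = 120
monthly_fee     = 25.0          # flat fee, for simplicity

churn_count = lost_customers
churn_rate  = lost_customers / start_customers * 100
value_lost  = lost_customers * monthly_fee
value_rate  = value_lost / (start_customers * monthly_fee) * 100

print(f"Customers lost:        {churn_count}")
print(f"Churn rate:            {churn_rate:.1f}%")
print(f"Recurring value lost:  ${value_lost:,.0f}/month")
print(f"Percent of value lost: {value_rate:.1f}%")
```

With a flat fee the two percentages coincide; with tiered pricing they diverge, which is why value-based churn is often the more revealing measure.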

c.) Product Sell-through rate

Retailers can analyze product-related data to find which pricing, visuals and terminology
resonate with potential and existing customers. By altering the product showcase based on
the analyzed data sets, retailers obtain an improved sell-through rate. Take Uber, for
instance: its whole business model depends on big data analytics for crowd-sourcing and
sell-through. With customers’ personal data, Uber matches them with the most suitable
drivers based on location and customer rating. Because of this personalized experience,
customers prefer Uber’s personalized offers over those of its competitors or regular
taxis.

Getting the right customers into stores is very important too, as a US department store
giant recently discovered. Because its analytics showed a dearth of the vital
“millennial” demographic, it opened its One Below basement at its New York flagship
store, offering promotions such as “selfie walls” and while-you-wait customized
3D-printed smartphone cases. All of these were ideas for attracting young customers to
the store and giving them a memorable experience.

Opportunities in Retail Analytics

Also, there are several opportunities in retail analytics:

1.) The promise of big data

Retail data grows exponentially every year in variety, volume, value and velocity. Smart
retailers know that each interaction holds potential for profit, and a great deal of it.

A 2011 report states that retailers who use big data analytics could increase their
operating margins by as much as 60 percent. This has created the need for data
scientists, whose job is to make sense of big data (structured or unstructured, internal
or external) and help retailers take actions that increase sales while reducing costs.

2.) Marketing

• Online behavioral analysis and web analytics that create tailored offers.
• Personalized and location-based offers on mobile devices.
• Targeted campaigns that use analytics for segmenting consumers, identifying the best
channels and eventually achieving an optimal return on investment.
• Real-time pricing driven by “second by second” metrics.

3.) Customer Experience

• Multi-level reward programs and personalized recommendations that depend on online
purchase preference data, smartphone apps, etc.
• Sentiment analysis of call center records, social media streams, product reviews and
many others for market insights and customer feedbacks.
• Predictive analytics for customer experience enhancement on all devices and channels,
online and offline.


4.) Merchandizing

• Detailed market basket analysis that yields more rapid growth in revenue.
• Identifying shopping trends and cross-selling opportunities with the aid of video data
analysis.
• Rise in daily profits via a combination of external and internal data such as seasonal and
holiday trends, economic forecasts, traffic and weather reports.

5.) Omni-Experience

The main aim is a streamlined, seamless experience for everyone involved. From the moment
the product leaves the manufacturer, to the store floor or warehouse, to the point of
purchase, the retailer wants maximum efficiency in every department.

It is no longer news that the retail industry has undergone major operational changes
over the years because of data analytics, and big data analytics solutions have played an
important role in bringing these changes about. Adoption of these solutions is therefore
growing rapidly, with more retailers working to enhance supply chain operations, improve
marketing campaigns and raise customer satisfaction, thereby achieving a high success
rate in retailing.

Challenges

A number of issues must be acknowledged before data analytics can be used to its full
capacity in the retail industry. Factors such as security, privacy, liability policies
and intellectual property must be handled stringently wherever analytics is involved.
Analytics and big data are inter-related, so specially trained professionals need to be
included in the team to operationalize and utilize big data analytics.

Companies will also find it pertinent to incorporate information from various data
sources, mainly third parties, and to support such an environment with efficient data
deployment.

Finally, companies often make the mistake of falling into short-sightedness and failing
to implement the insights gained from analytics. This can be fixed by continuously
adapting retail practices, with a dedicated team tasked with organizing and implementing
insights.

Conclusion

Retailing has become a platform for data-driven disruption, because data gathered from
several sources, such as social network conversations, internet purchases and
location-specific interactions from smartphones, has been transformed into the basis for
digital transactions.

The benefits organizations reap from utilizing data analytics include better risk
management, improved performance, and the ability to discover insights that might
otherwise stay hidden.

With the big return on investment data analytics delivers in the retail industry, most
retailers will continue to adopt these solutions to sustain customer loyalty, boost brand
perception and improve promoter scores. Retail data analytics lets retailers and
organizations gather information on their customers, on how to reach them, and on how
meeting their needs can drive sales. As technology continues to dominate the retail
industry, one thing is certain – data analytics is here to stay!

What Is Healthcare Analytics?

Healthcare analytics is the branch of analysis that focuses on offering insights into hospital
management, patient records, costs, diagnoses, and more. The field covers a broad swath
of the healthcare industry, offering insights on both the macro and micro level.

When combined with business intelligence suites and data visualization tools, healthcare
analytics helps managers operate better by providing real-time information that can support
decisions and deliver actionable insights.

For hospital and healthcare managers, healthcare analytics provides a combination of
financial and administrative data alongside information that can aid patient care
efforts, improve services, and refine existing procedures.


Healthcare BI suites tend to emphasize broad categories of data for collection and parsing:
costs and claims, research and development, clinical data alongside patient behavior and
sentiment.

How Does It Help?


Digitization in the healthcare industry is allowing organizations to cater to the needs
of patients judiciously and with more attention. Significant benefits stand to be
realized: patient-centric treatment and medication, earlier detection of diseases,
guaranteed patient satisfaction, and reduced costs both for healthcare management and for
the patient.

The healthcare industry holds a huge amount of data relating to patient details, disease
types, legal paperwork and compliance documents. This treasure forms the base for
descriptive and predictive analysis, from which solution providers create the clinical
data required.

ROLE OF ANALYTICS IN THE HEALTH CARE SECTOR

The health care industry is one of the world’s fastest-growing industries, consuming
over 10 percent of gross domestic product (GDP) in most developed nations. Health care
can form an enormous part of a country’s economy. The World Health Organization
estimates there are 9.2 million physicians, 19.4 million nurses and midwives, 1.9
million dentists and other dentistry personnel, 2.6 million pharmacists and other
pharmaceutical personnel, and over 1.3 million community health workers worldwide,
making health care one of the largest segments of the workforce. Such an industry
naturally also creates voluminous data on patients, diseases, diagnoses, medicines,
research and more.

The Indian healthcare industry generates data on the zettabyte (10²¹ bytes) scale every
day by capturing patient care records, prescriptions, diagnostic tests, insurance
claims, equipment-generated data for monitoring vital signs and, most importantly,
medical research.


According to an industry report, California-based managed care consortium Kaiser
Permanente is believed to have between 26.5 and 44 petabytes (one petabyte is 1,000,000
gigabytes) of potentially rich data from EHRs.

In the healthcare industry, information can have a profound impact on improving patient
care and quality of life, while controlling escalating costs.

Some of the key issues that healthcare companies face in gaining competitive advantage
in the marketplace are as follows:

• What are the best ways to contain costs without reducing the level and quality of care?
• Can the companies reduce operating costs, perform dynamic budgeting and
forecasting, and improve overall profitability?
• How can doctors, patients, providers and payers work together to improve outcomes?
• Are there ways to perform faster diagnoses and understand different treatment
patterns to save and improve more lives?
• Are patients satisfied with their level of care?
• How do the companies compel individuals to become advocates for their own health?

Using data analytics provides the means to find answers to the issues facing the
health care industry. Following are some of the services that form the underlying
layer of analytics in the Healthcare Industry:

• Clinical Data Management (CDM) Services – that addresses the need for records of
patient history, treatment responses, number of patients and income from each of
them, mapping of symptoms and drugs etc.
• Promotional Spend Compliance – that addresses the customers’ need to adhere to
national and sub national healthcare regulations and similar laws worldwide. This
solution is strengthened with analytics capability for sales per spend analysis,
competitive returns bench-marking and spend fraud pattern detection capability.
• Social Media Intelligence/Analytics – that helps identify deep business insights on
treatment effectiveness, peer physician preferences, key opinion leaders’
recommendations, disease spread and concentration etc., using data and opinion from
social media. Among the end results are enhanced treatment effectiveness and improved
clinical trial results.
• Service Analytics for Medical Devices – that ensures monitoring and tracking of all
after sales and service related processes, offers insights into supplies management
and ensures proactive as well as preventive maintenance. The solution is aimed at the
effective management of the services business of medical devices.
• Predictive Asset Analytics of Medical Devices – that enables smoother operations
by reducing the risk of medical device failure and by ensuring that the medical devices
run at optimal performance.
• Trade Promotion Optimization – that enables trade promotion managers to model,
optimize, forecast, budget, execute, manage and measure the trade promotion spends
for consumer, health and OTC products.

Predictive Analytics Solutions

Defining Predictive Analytics in Healthcare

Predictive analytics and machine learning in healthcare are rapidly becoming some of the
most-discussed, perhaps most-hyped topics in healthcare analytics. Machine learning is a
well-studied discipline with a long history of success in many industries. Healthcare can
learn valuable lessons from this previous success to jumpstart the utility of predictive
analytics for improving patient care, chronic disease management, hospital administration,
and supply chain efficiencies. The opportunity that currently exists for healthcare systems
is to define what “predictive analytics” means to them and how it can be used most
effectively to make improvements.

However, predictions made solely for the sake of making a prediction are a waste of time
and money. In healthcare and other industries, prediction is most useful when that
knowledge can be transferred into action. The willingness to intervene is the key to
harnessing the power of historical and real-time data. Importantly, to best gauge efficacy
and value, both the predictor and the intervention must be integrated within the same system
and workflow where the trend occurs.


The Health Catalyst® paper, “Using Predictive Analytics in Healthcare: Technology Hype
vs Reality,” is a good summary of both the hype and the hope of predictive analytics in
healthcare.

How To Get Started With Predictive Analytics and Machine Learning

Given the many pitfalls to avoid in healthcare predictive analytics, where do you get
started? The most important starting point is to establish a fundamental data and
analytic infrastructure upon which to build. Deliberately but quickly move your
organization up the levels of the Healthcare Analytics Adoption Model. This model draws
upon lessons learned from the HIMSS EHR Adoption Model and describes a similar approach
for assessing the adoption of analytics in healthcare. It starts with a level 1
foundation of an integrated, enterprise data warehouse combined with a basic set of
foundational and discovery analytic applications.

1. Start With an Integrated Data Warehouse and Analytics Platform

Enterprise Data Warehouse

You need data across the entire continuum of care to manage patient populations. This
requires an enterprise data warehouse (EDW) platform. An EDW is the central platform
upon which you can build a scalable analytics approach to systematically integrate and
make sense of the data.

Health Catalyst® deploys a unique Late-Binding™ Data Warehouse that enables healthcare
organizations to automate the extraction, aggregation, and integration of clinical,
financial, administrative, patient experience, and other relevant data, and to apply
advanced analytics to organize and measure clinical, patient safety, cost, and patient
satisfaction processes and outcomes.

2. Use the Three Basic Steps of Predictive Modeling


The following is a simple schematic of the predictive modeling process. For predictive
analytics to be effective, Lean practitioners must truly “live the process” to best understand
the type of data, the actual workflow, the target audience and what action will be prompted
by knowing the prediction.

1. The first step is to carefully define the problem you want to address, then gather the initial
data necessary and evaluate several different algorithm approaches.
2. Step two refines this process by selecting one of the best performing models and testing
with a separate data set to validate the approach.
3. The final step is to run the model in a real world setting.
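The three steps above can be sketched in miniature. The toy readmission-risk data, candidate "models" (here just risk thresholds on prior admissions, so the train / validate / deploy shape is visible without any ML library) and outcomes below are all invented for illustration:

```python
# A pure-Python sketch of the three predictive-modeling steps, on a toy
# readmission-risk problem. Each record is (prior admissions, readmitted).

train = [(0, False), (1, False), (2, False), (3, True), (4, True),
         (1, False), (5, True), (2, True), (0, False), (6, True)]
holdout = [(1, False), (4, True), (0, False), (3, True), (5, True)]

def accuracy(threshold, data):
    """Fraction of cases where (admissions >= threshold) matches outcome."""
    hits = sum((x >= threshold) == y for x, y in data)
    return hits / len(data)

# Step 1: define the problem, gather data, evaluate several candidates.
candidates = [1, 2, 3, 4]
best = max(candidates, key=lambda t: accuracy(t, train))

# Step 2: validate the chosen model on a separate data set.
val_acc = accuracy(best, holdout)

# Step 3: run the model in a "real world" setting on a new patient.
new_patient_admissions = 4
flagged = new_patient_admissions >= best
print(f"Best threshold: {best}, holdout accuracy: {val_acc:.0%}, flag: {flagged}")
```

A real project would substitute a proper learning algorithm at step 1 and much larger data sets, but the separation of training, independent validation, and live scoring is the part that carries over.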

The more specific term is prescriptive analytics, which includes evidence, recommendations
and actions for each predicted category or outcome. Specifically, prediction should link
carefully to clinical priorities and measurable events such as cost effectiveness, clinical
protocols or patient outcomes. Finally, these predictor-intervention sets are best evaluated
within that same data warehouse environment.

Many options exist for developing predictive algorithms or stratifying patient risk.
This presents a daunting challenge to healthcare personnel tasked with sorting through
all the buzzwords and marketing noise. Healthcare providers need to partner with groups
that have a keen understanding of the leading academic and commercial tools, and the
expertise to develop appropriate prediction models.


Follow 4 Key Lessons Learned for Adopting Predictive Analytics and Machine
Learning in Healthcare

Given that predictive analytics sits at level 7 of the 8 levels of the Healthcare
Analytics Adoption Model, many pitfalls can occur at that level if you are not properly
prepared. Fortunately for healthcare, numerous existing models from other industries can
be combined with past healthcare examples to ease some of the potential pains and
pitfalls. Highlights of some of those key lessons include:

1. Don’t confuse more data with more insight: Simply gathering larger volumes of data does
not, by itself, yield deeper understanding of clinical outcomes.
2. Don’t confuse insight with value: While many solid scientific findings may be interesting,
they do little to significantly improve current clinical outcomes.
3. Don’t overestimate the ability to interpret the data: Sometimes even the best data may afford
only limited insight into clinical health outcomes.
4. Don’t underestimate the challenge of implementation: Leveraging large data sets
successfully requires a health system to be prepared to embrace new methodologies; this,
however, may require a significant investment of time and capital and alignment of economic
interests.

The Health Catalyst Executive Report, “4 Essential Lessons for Adopting Predictive
Analytics in Healthcare,” expands on each of these 4 lessons in more detail.

In order to be successful, we feel that clinical event prediction and subsequent intervention
should be both content driven and clinician driven. Importantly, the underlying data
warehouse platform is key to gathering rich data sets necessary for training and
implementing predictors. Notably, prediction should be used in the context of when and
where needed—with clinical leaders that have the willingness to act on appropriate
intervention measures.

In the end, the overall goal is to leverage historical patient data to improve current patient
outcomes. Predictive analytics is a powerful tool in this regard.


Health Catalyst Predictive Analytics and Machine Learning Solutions

Health Catalyst not only has the expertise to develop machine learning models, but our
underlying healthcare analytics platform is key to gathering the rich data sets necessary for
training and implementing predictors. Notably, our prediction is only used “in context”—
meaning when and where needed, with clinical leaders that have the willingness to act on
appropriate intervention measures. Most important, however, these predictor-intervention
sets can best be monitored and measured within that same data warehouse environment
where otherwise not possible. Health Catalyst’s new machine learning solution makes
machine learning in healthcare routine, actionable, and pervasive through three avenues:

• catalyst.ai™—our machine learning models and strategy for building machine learning into
all Health Catalyst products.
• healthcare.ai™—our way of stimulating the adoption of machine learning in healthcare
through free, open-source machine learning software that democratizes machine learning
by lowering barriers to entry.
• Healthcare analytics platform—the second-to-none backbone (foundation) for machine
learning.

Within Health Catalyst, data modeling and algorithm development is performed using
industry leading tools for data mining and supervised machine learning via our open-
source R and Python packages. Ongoing efforts include classification models for a
generalized predictor of hospital readmissions, heart failure, length of stay, and clustering
of patient outcomes to historical cohorts at time of admit. Most importantly, we have internal
access to millions of de-identified hospital records in both the inpatient and outpatient
settings and adult and pediatric populations. This training data is crucial to addressing the
predictive analytics and machine learning demands of clients and site customization.

We have a number of analytic applications that can be used in predictive analytics and
machine learning initiatives, including CLABSI, Labor Management
Explorer, COPD, Patient Flow Explorer. So, when your request comes—whether it involves
classification or clustering or feature selection—Health Catalyst has the tools and the data
and the expertise to successfully deliver top performing predictive analytics. If you have
interest or questions on any of these applications, feel free to contact us or schedule a
demo by filling out our online form.

Business Analytics - Module IV

Performance Management Cycle

Performance management involves much more than just assigning ratings. It is a
continuous cycle that involves:

• Planning work in advance so that expectations and goals can be set;
• Monitoring progress and performance continually;
• Developing the employee's ability to perform through training and work assignments;
• Rating periodically to summarize performance; and
• Rewarding good performance.

Business Performance Management Cycle


Sales and Marketing Analytics

At a time when competition is intensifying and businesses require greater accountability
from their investments, it is imperative that marketers know what they are getting from
their marketing expenditures.
By looking at marketing performance both historically (actuals) and in the future
(projections), clients gain the capability to:
Retrospective Analysis:
• Determine marketing / non-marketing business drivers
• Understand carry-over effects of advertising
• Identify synergies between media
• Estimate impacts of different marketing elements
• Apportion non-attributable business results
• Assess cost effectiveness (ROI) by media type

Prospective Analysis:
• Forecast impacts of marketing plans and other activities
• Understand the “possibility set” of potential business outcomes
• Evaluate risks associated with market uncertainties
• Run “what if” scenarios for planning purposes
• Optimize spending and media mix
• Assess the likelihood of meeting business goals

Strategic capabilities and services

Mix and Attribution Modeling / Marketing Mix Modeling (MMM)
A statistical technique used to understand the individual and combined contributions of
each multichannel marketing investment to business results, and then adjust plans
accordingly. Primary sales drivers are identified, and the return on investment for all
advertising and marketing expenditures is determined. Strategic decision making is fueled
by true insight into every component of a campaign. Clients can determine the effect and
shelf-life of different creative vehicles (TV spots or online ads, for example), and how
placement impacts results.

Lift Modeling
Identifies the most promotion-sensitive customers – ideal for the optimization of
targeted marketing applications. With the marketer’s budget constraints as a factor,
lift modeling can help determine the audience that will generate maximum revenue gains.

ROI Modeling
Determines the return on investment (ROI) for all advertising and marketing
expenditures.

Campaign Analytics
Allows a business to effectively assess the value of marketing investments at the
product or regional level, and to examine sub-populations within a market and how they
respond to advertising or marketing campaigns. Customer acquisition and retention
efforts gain a more directed path and a clearer understanding of which approaches are
working best.

Offer Optimization
Marketers can move away from a “one size fits all” offer strategy and tailor incentives
and approach plans for better results from each customer segment. Marketing programs can
be refined from multiple perspectives, such as maximizing sales for a given budget or
maximizing profit for a given sales target.

Forecasting and Scenario Simulation
Predictive tools that help companies assess the performance of an offer or program
before making major media investments. In conjunction with other analytics tools (such
as ROI and lift modeling), clients can chart a more effective sales and marketing
roadmap.

Marketing Analytics - What it is and why it matters

Marketing analytics comprises the processes and technologies that enable marketers to
evaluate the success of their marketing initiatives. This is accomplished by measuring
performance (e.g., blogging versus social media versus channel communications).
Marketing analytics uses important business metrics, such as ROI, marketing attribution and
overall marketing effectiveness. In other words, it tells you how your marketing programs
are really performing.

Marketing analytics gathers data from across all marketing channels and consolidates it into
a common marketing view. From this common view, you can extract analytical results that
can provide invaluable assistance in driving your marketing efforts forward.


Why marketing analytics is important

Over the years, as businesses expanded into new marketing categories, new technologies
were adopted to support them. Because each new technology was typically deployed in
isolation, the result was a hodgepodge of disconnected data environments.

Consequently, marketers often make decisions based on data from individual channels
(website metrics, for example), not taking into account the entire marketing picture. Social
media data alone is not enough. Web analytics data alone is not enough. And tools that look
at just a snapshot in time for a single channel are woefully inadequate. Marketing analytics,
by contrast, considers all marketing efforts across all channels over a span of time – which
is essential for sound decision making and effective, efficient program execution.

What you can do with marketing analytics

With marketing analytics, you can answer questions like these:

• How are our marketing initiatives performing today? How about in the long run? What can
we do to improve them?

• How do our marketing activities compare with our competitors’? Where are they spending
their time and money? Are they using channels that we aren’t using?

• What should we do next? Are our marketing resources properly allocated? Are we devoting
time and money to the right channels? How should we prioritize our investments for next
year?

Three steps to marketing analytics success

To reap the greatest rewards from marketing analytics, follow these three steps:

1. Use a balanced assortment of analytic techniques.

2. Assess your analytic capabilities, and fill in the gaps.

3. Act on what you learn.


Use a balanced assortment of analytic techniques

To get the most benefit from marketing analytics, you need an analytic assortment that is
balanced – that is, one that combines techniques for:

• Reporting on the past. By using marketing analytics to report on the past, you can answer
such questions as: Which campaign elements generated the most revenue last quarter?
How did email campaign A perform against direct mail campaign B? How many leads did
we generate from blog post C versus social media campaign D?

• Analyzing the present. Marketing analytics enables you to determine how your marketing
initiatives are performing right now by answering questions like: How are our customers
engaging with us? Which channels do our most profitable customers prefer? Who is talking
about our brand on social media sites, and what are they saying?

• Predicting and/or influencing the future. Marketing analytics can also deliver data-driven
predictions that you can use to influence the future by answering such questions as: How
can we turn short-term wins into loyalty and ongoing engagement? How will adding 10 more
sales people in under-performing regions affect revenue? Which cities should we target next
using our current portfolio?

Assess your analytic capabilities, and fill in the gaps

Marketing organizations have access to a lot of different analytic capabilities in support of
various marketing goals, but if you’re like most, you probably don’t have all your bases
covered. Assessing your current analytic capabilities is a good next step. After all, it’s
important to know where you stand along the analytic spectrum, so you can identify where
the gaps are and start developing a strategy for filling them in.

For example, a marketing organization may already be collecting data from online and POS
transactions, but what about all the unstructured information from social media sources or
call-center logs? Such sources are a gold mine of information, and the technology for
converting unstructured data into actual insights that marketers can use exists today. As
such, a marketing organization may choose to plan and budget for adding analytic
capabilities that can fill that particular gap. Of course, if you’re not quite sure where to start,

well, that’s easy. Start where your needs are greatest, and fill in the gaps over time as new
needs arise.

Act on what you learn

All the information marketing analytics can give you has no real value unless you act on
it. In a constant process of testing and learning, marketing analytics
enables you to improve your overall marketing program performance by, for example:

• Identifying channel deficiencies.

• Adjusting strategies and tactics as needed.

• Optimizing processes.

• Gaining customer insight.

Without the ability to test and evaluate the success of your marketing programs, you would
have no idea what was working and what wasn’t, when or if things needed to change, or
how. By the same token, if you use marketing analytics to evaluate success, but you do
nothing with that insight, then what is the point?

Applied holistically, marketing analytics allows for better, more successful marketing by
enabling you to close the loop as it relates to your marketing efforts and investments. For
example, marketing analytics can lead to better supply and demand planning, price
optimization, as well as robust lead nurturing and management, all of which leads to more
revenue and greater profitability. By more effectively managing leads and being able to tie
those leads to sales – which is known as closed-loop marketing analytics – you can see
which specific marketing initiatives are contributing to your bottom line.


HR Analytics

Definition: What is HR analytics?

HR analytics is the application of statistics, modeling, and analysis of employee-related
factors to improve business outcomes.

HR analytics is also often referred to as:

• People analytics
• Talent analytics
• Workforce analytics

Google Trends data shows that search interest in these terms has grown since 2004; both
HR analytics and people analytics have gained popularity and continue to attract interest.

These terms are often used interchangeably, although some debate their differences.
Definitions of HR analytics tend to encompass a broader scope of data, while people
analytics and talent analytics refer to data points specific to people and their behavior. Some
prefer the term workforce analytics because of the growing tendency to automate tasks with
robots, which may be considered part of the workforce.

Overview

HR analytics enables HR professionals to make data-driven decisions to attract, manage,
and retain employees, which improves ROI. It helps leaders make decisions to create better
work environments and maximize employee productivity. It has a major impact on the
bottom line when used effectively.

HR professionals gather data points across the organization from sources like:

• Employee surveys
• Telemetric data
• Attendance records


• Multi-rater reviews
• Salary and promotion history
• Employee work history
• Demographic data
• Personality/temperament data
• Recruitment process
• Employee databases

HR leaders must align HR data and initiatives to the organization’s strategic goals. For
example, a tech company may want to improve collaboration across departments to
increase the number of innovative ideas built into their software. HR initiatives like shared
workspaces, company events, collaborative tools, and employee challenges can be
implemented to achieve this goal. To determine how successful initiatives are, HR analytics
can be utilized to examine correlations between initiatives and strategic goals.

Once data is gathered, HR analysts feed workforce data into sophisticated data models,
algorithms, and tools to gain actionable insights. These tools provide insights in the form of
dashboards, visualizations, and reports. An ongoing process should be put in place to
ensure continued improvement:

• Benchmark analysis
• Data-gathering
• Data-cleansing
• Analysis
• Evaluate goals and KPIs
• Create action plan based on analysis (continuously test new ideas)
• Execute on plan
• Streamline process
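As a minimal sketch of the data-gathering and analysis steps above, the following Python snippet computes attrition rates per department from a handful of invented employee records (all departments and figures are hypothetical):

```python
from collections import defaultdict

def attrition_by_department(records):
    """Return the fraction of employees in each department who left."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for rec in records:
        totals[rec["dept"]] += 1
        if rec["left"]:
            leavers[rec["dept"]] += 1
    return {dept: leavers[dept] / totals[dept] for dept in totals}

# Hypothetical records, as might be pulled from an employee database.
employees = [
    {"dept": "Sales", "left": True},
    {"dept": "Sales", "left": False},
    {"dept": "Sales", "left": False},
    {"dept": "Sales", "left": True},
    {"dept": "Engineering", "left": False},
    {"dept": "Engineering", "left": True},
    {"dept": "Engineering", "left": False},
    {"dept": "Engineering", "left": False},
]

print(attrition_by_department(employees))  # {'Sales': 0.5, 'Engineering': 0.25}
```

In practice the same calculation would be run against the full employee database and tracked against KPIs over time.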


Financial Analytics

Organizations today make increasing use of analytics. Businesses need timely information
that helps decision-makers take important decisions. Finance plays an important role in
increasing the value of a business; it is an essential business function that overlaps with
analytics in many areas, and financial executives are finding new ways to increase the
value of their organizations.

Financial analytics is a concept that provides different views of a business’ financial data.
It gives in-depth knowledge and supports strategic actions to improve the business’ overall
performance. Financial analytics is a subset of BI & EPM and has an impact on every
aspect of the business. It plays a crucial role in calculating profit, helps answer every
question related to the business, and lets you forecast its future.

What is Financial Analytics

Financial analytics is a field that gives different views of a company’s financial data. It helps
you gain in-depth knowledge and act on it to improve the performance of your business.
Financial analytics affects all parts of your business, plays a very important role in
calculating profit, helps you answer all questions related to your business, and lets you
forecast its future.

Why Financial Analytics is important

• Today’s businesses need timely information that helps business people take
important decisions.
• Every business needs sound financial planning and forecasting to leverage the
business.
• The emergence of new business models, the changing needs of the traditional
financial department, and advances in technology have all led to the need for
financial analytics.

• Financial analytics helps in shaping tomorrow’s business goals and improving the
decision-making strategies of your business.
• Financial analytics focuses on measuring and managing the tangible assets of an
organization, such as cash and machinery.
• It gives deeper insight into the financial status of your business and improves its
profitability, cash flow, and value.
• Financial analytics helps in making smart decisions to increase business revenue
and minimize waste.
• Accounting, tax, and other areas of finance have data warehouses which, combined
with analytics, help run the business effectively and achieve goals faster.

There are five main reasons why financial analytics is becoming more important these days.
They are listed below

1. Business Models

There are three new business models which form the basis of financial analytics

• Business to Business
• Business to Consumer
• Business to Employee

2. Changing role of the financial department

Most finance functions are automated and require fewer resources to manage. This
enables finance executives to concentrate on business goals rather than on processing
and reconciling transactions.

3. Business Processes

Businesses are becoming more complex due to advances in technology, and many
questions arise in the minds of business people; analytics provides the answers. Financial
analytics gives managers and executives access to more accurate and detailed financial
information about the organization, which strengthens relationships inside the organization.


Here are few questions for which financial analytics can give you an answer

• What are the risks to which the business is exposed?
• How can business processes be enhanced and extended to work more effectively?
• Are the investments made on the right path?
• What is the profitability of each product across different sales channels and
customers?
• Which segment of the market is expected to bring more profit to the business in the
future?
• What are the factors that could affect the business in the future?

4. Integrated Analytics

These days companies use integrated financial analytics to face competition in the
marketplace. Integrated financial analytics lets companies analyze information and share
it with sources inside and outside the organization, and organizations should use it to
survive in the new economy.

5. Role of the Data Warehouse

The data warehousing solutions mainly focus on important analytical components like data
stores, data marts and reporting applications. Data warehousing in the future will require
rich analytical capabilities. Smart decisions are easily made when the data and business
processes are integrated across all business functions in an organization.

Uses of Financial Analytics

Financial analytics helps a business to

• Understand the performance of an organization
• Measure and manage the value of tangible and intangible assets of an organization
• Manage the investments of the company
• Forecast the variations in the market
• Increase the functionalities of information systems
• Improve the business processes and profits

Importance of Financial Analytics

• Today’s businesses require timely information for decision-making purposes
• Every company needs prudent financial planning and forecasting
• The diverse needs of the traditional financial department, and advancements in technology,
all point to the need for financial analytics.
• Financial analytics can help shape up the business’ future goals. It can help you improve
the decision-making strategies for your business.
• Financial analytics can help you focus on measuring and managing your business’ tangible
assets such as cash and equipment.
• It provides an in-depth insight into the organization’s financial status and improves the cash
flow, profitability, and business value.


Important financial analytics you need to know

In today’s data-driven world, analytics is critical for any business that wants to remain
competitive. Financial analytics can help you understand your business’ past and present
performance and make strategic decisions. Here are some of the critical financial analytics
that any company, size notwithstanding, should be implementing.

1. Predictive sales analytics

Sales revenue is critical for every business. As such, accurate sales projection has essential
strategic and technical implications for the organization. Predictive sales analytics involves
producing an informed sales forecast. There are many approaches to predicting sales, such
as correlation analysis or the use of past trends. Predictive sales analytics can help you plan
for and manage your business’ peaks and troughs.
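As an illustration of the past-trends approach, the sketch below fits a simple least-squares trend line to hypothetical quarterly sales and projects it forward. The figures are invented; a real forecast would account for seasonality and far more data:

```python
def fit_trend(sales):
    """Ordinary least-squares line through a sales series: returns (slope, intercept)."""
    n = len(sales)
    mean_x = (n - 1) / 2
    mean_y = sum(sales) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(sales)) / \
            sum((x - mean_x) ** 2 for x in range(n))
    return slope, mean_y - slope * mean_x

def forecast(sales, periods_ahead):
    """Project the fitted trend `periods_ahead` steps past the last observation."""
    slope, intercept = fit_trend(sales)
    return intercept + slope * (len(sales) - 1 + periods_ahead)

quarterly_sales = [100, 110, 120, 130]  # hypothetical revenue per quarter
print(forecast(quarterly_sales, 1))  # 140.0
```

The same fitted line can be extrapolated several periods ahead to sketch the peaks and troughs the text mentions.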


2. Client profitability analytics

Every business needs to differentiate between clients that make them money and clients
that lose them money. Customer profitability typically falls within the 80/20 rule, where 20
percent of the clients account for 80 percent of the profits and 20 percent of the clients
account for 80 percent of customer-related expenses. Knowing which is which is vital.

By understanding your customers’ profitability, you can analyze every client group and gain
useful insight. The greatest challenge to customer profitability analytics arises when you
fail to analyze the client’s full contribution to the organization.
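A first step in client profitability analytics is a simple Pareto check: rank clients by profit and see what share the top 20 percent contribute. The sketch below uses invented per-client profit figures (and, for simplicity, assumes all profits are positive):

```python
def top_share(profits, fraction=0.2):
    """Share of total profit contributed by the top `fraction` of clients.
    Assumes positive profit figures for simplicity."""
    ranked = sorted(profits, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

client_profits = [500, 40, 30, 300, 25, 20, 15, 10, 50, 10]  # hypothetical
print(top_share(client_profits))  # 0.8: the top 2 of 10 clients drive 80% of profit
```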

3. Product profitability analytics

To remain competitive within an industry, organizations need to know where they are
making and losing money. Product profitability analytics can help you establish the
profitability of every product rather than analyzing the business as a whole. To do this, you
need to assess each product individually. Product profitability analytics can also help you
establish profitability insights across the product range so you can make better decisions
and protect your profit and growth over time.

4. Cash flow analytics

You need a certain amount of cash to run the organization on a day-to-day basis. Cash
flow is the lifeblood of your business. Understanding cash flow is crucial for gauging the
health of the business. Cash flow analytics involves the use of real-time indicators like the
Working Capital Ratio and Cash Conversion Cycle. You can also predict cash flow using
tools like regression analysis. Besides helping with cash flow management and ensuring
that you have enough money for day-to-day operations, cash flow analytics can also help
you support a range of business functions.
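The two indicators named above have straightforward definitions, shown in the quick sketch below with invented balance-sheet figures:

```python
def working_capital_ratio(current_assets, current_liabilities):
    """Current assets divided by current liabilities."""
    return current_assets / current_liabilities

def cash_conversion_cycle(dio, dso, dpo):
    """Days inventory outstanding + days sales outstanding - days payable outstanding."""
    return dio + dso - dpo

# Hypothetical figures for illustration only.
print(working_capital_ratio(150_000, 100_000))  # 1.5
print(cash_conversion_cycle(45, 30, 25))        # 50 days to turn outlays into receipts
```

A ratio comfortably above 1 and a shrinking cash conversion cycle are the kinds of real-time signals the text describes monitoring.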

5. Value-driven analytics

Most organizations have a sense of where they are going and what they hope to achieve.
These goals can be formalized on a strategy map that pinpoints the business’ value
drivers: the vital levers that the organization needs to pull to

realize its strategic goals. Value driver analytics assesses these levers to ensure that they
can deliver the expected outcome.

6. Shareholder value analytics

The profits and losses, and their interpretation by analysts, investors, and the media can
influence your business’ performance on the stock market. Shareholder value analytics
calculates the value of the company by looking at the returns it is providing to shareholders.
In other words, it measures the financial repercussions of a strategy and reports how much
value the strategy in question is delivering to the shareholders. Shareholder value analytics
is used concurrently with profit and revenue analytics. You can use measures like Economic
Value Added (EVA) to quantify shareholder value.
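EVA itself is a one-line formula: net operating profit after taxes (NOPAT) minus a charge for the capital employed to earn it. A sketch with invented figures:

```python
def economic_value_added(nopat, wacc, invested_capital):
    """EVA = NOPAT minus the cost of the capital employed to earn it."""
    return nopat - wacc * invested_capital

# Hypothetical: $1.2M NOPAT, 12.5% weighted average cost of capital, $8M invested.
print(economic_value_added(1_200_000, 0.125, 8_000_000))  # 200000.0
```

A positive EVA means the strategy is creating value for shareholders over and above the cost of capital; a negative EVA means it is destroying value.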

Conclusion

Financial analytics is a valuable tool that every organization, small and large, should use to
manage and measure its progress. Done right, it can help the organization adapt to the
trends that affect its operations.

Oracle Financial Analytics Software

One example of financial analytics software is Oracle Financial Analytics, one of the more
popular financial analytics programs on the market.

Oracle Financial Analytics helps improve financial performance through proper information
about the expenses and revenue of all departments in the organization. It increases cash
flow through proper management of receivables, payables, and inventory. It gives you
timely financial reports that help you determine the performance of your business, forecast
the future, and plan your budget well. Oracle Financial Analytics software helps improve
the financial health of the business.


Features

This software includes the following features:

• Fixed Assets Analytics – Manages and measures the assets life cycle
• Budgetary Control Analytics – Helps prevent overspending through effective
monitoring of budgets and spending
• General Ledger Analytics – Manage the financial performance of the company
through various factors
• Profitability Analytics – Helps in identifying what type of customers and which
channels drive more profit to the company
• Payables Analytics – Manage and monitor the cash of the payables department
• Receivables Analytics – Manage collections and have a check on the cash cycles
• Proactive Intelligence – Sends signals about issues to the organization’s managers
and executives, helping them take immediate action and solve the issue
• Pre-built data models and metrics – Oracle Financial Analytics has more than 100
metrics and models
• Out-of-the-box integration with ERP systems – Enables easy integration with ERP
systems at lower risk, cost, and effort
• Oracle Financial Analytics for Oracle Fusion Applications – It helps you to learn
about the company’s past, present and future performance and will let you take smart
decisions.
• Powered by Oracle Business Intelligence Foundation – Produces high quality
reports and has a good dashboard and is highly scalable
• Exalytics Ready – It goes beyond the values of traditional data analytics and gives
deeper knowledge about the huge volume of data at the speed of thought

Documents used in Financial Analysis

Finance is the language of a business. The goals of a business are always defined in terms
of finance, and the output is also measured in financial terms. Financial analytics involves
analyzing the data in financial statements; in this way it provides useful information to
business owners and lets them take better decisions.


Operational Analytics

High-performing companies will embed analytics directly into decision and operational
processes, and take advantage of machine-learning and other technologies to generate
insights in the millions per second rather than an “insight a week or month.”

The full benefit of analytics

Many organizations employ ad hoc analytics projects that use predictive analytics to help
them find meaningful insights from vast volumes of data. Some have dedicated teams of
data scientists conducting ongoing manual analytics. Chances are, these analytic projects
are discovering previously unknown and valuable insights about the organization and its
business. However, many companies are struggling to see the impact of these analytic
projects on widespread organizational outcomes.

This gap between analytic projects and business impact is driven less by the quality of the
analytic methods than by the business ecosystem itself: cultural resistance to change and
suboptimal processes for integrating insights into business operations and applications. It
is no longer sufficient to produce robust analytics. Organizations that want
to see measurable business results from analytics must focus on embedding analytics and
insights into day-to-day operations to enable analytically driven decision-making — fast,
automated, and operational.


The details on Operational Analytics

Operational Analytics is the interoperation of multiple disciplines that support the seamless
flow from initial analytic discovery to embedding predictive analytics into business
operations, applications, and machines. The impacts of these analytics are then measured,
monitored, and further analyzed to circle back to new analytic discoveries in a continuous
improvement loop, much like a fully matured industrial process.

However, the analytics field has not seen this type of industrial rigor around moving analytics
into business operations. Organizations that wish to achieve competitive advantage through
analytics need to cross the chasm between traditional ad hoc analytics and Operational
Analytics.


Analytics in Telecom Industry

The telecom industry not only has a large customer base but one whose needs and desires
are constantly evolving. On top of this, telecom firms face cutthroat competition, making it
a highly dynamic and challenging industry. In such a scenario, each decision a telecom firm
takes becomes all the more crucial. It is hence imperative for firms to base decisions on
extensive data analytics so as to ensure efficient and effective use of business resources.
Although analytics can be instrumental in the telecom industry in many ways, some of the
major applications include:

Customer retention/improving customer loyalty:


With neck-and-neck competition between the numerous players in this industry, customer
retention is of the utmost importance. Telecom is now much more than making calls, and
analytical tools can help firms identify cross-selling opportunities and take crucial decisions
to retain customers. Analytics can also help identify trends in customer behavior to predict
customer churn and apprise decision makers so they can take suitable action to prevent it.
When dealing with a large customer base, marketing across the board would be expensive
and ineffective. Hence, analytics can help better channel marketing efforts, such as
identifying a target group and/or region for pilot projects, so that the firm earns a better
return on its marketing investment.

Network optimization:
It is very crucial for telecom operators to ensure that all customers can avail themselves of
its products at all times. At the same time, firms need to be frugal when allocating resources
to the network, because any unused capacity is a waste of resources. Analytics helps in
better monitoring of traffic and in facilitating capacity-planning decisions. Analytical tools
leverage data collected through day-to-day transactions and support both short-term
optimization decisions and long-term strategic decision making.

Predictive analytics:
The traditional method of initiating something new based on gut feeling or intuition is long
outdated. By using predictive analytics in all their business departments, telecom


operators can predict the approximate success rate of a new scheme based on the past
preferences of customers. This will in turn provide telecom operators with great strategic
advantage. Predictive analytics helps in targeting the right customer at the right time based
on their past behavior and choices. It also helps in boosting revenue by proper planning and
reducing operational costs in the long term. Lastly, staying ahead always gives a
competitive edge, which is of the utmost importance in this highly competitive telecom industry.

Social Analytics:
The branding of telecom operators on social media plays a crucial part in customer
acquisition and retention. Data generated through social media can be interpreted into
meaningful insights using social analytical tools. Customer sentiment, customer experience,
and the positioning of the company can be analyzed to make the customer experience
richer and smoother. Data generated through such platforms is also more diverse, both
geographically and demographically, and hence helps in building a closer-to-reality picture
of customers.

We thus see how powerful analytical tools can act as a catalyst for business growth,
providing a layout for proper planning and optimization of resources. With spectrum prices
soaring, analytics can also help develop a sustainable business model with limited
resources. Analytics reveals not only present but also future trends. Hence, such powerful
tools should be leveraged now to gain a long-term uplift in business profit.


Data Analytics in Retail Industry

With the retail market getting more competitive by the day, nothing is more important than
the ability to optimize business processes to satisfy customer expectations. Channeling
and managing data to work in favor of the customer, as well as to generate profits, is vital
for survival.

For big retail players all over the world, data analytics is applied at all stages of the retail
process these days: keeping track of emerging popular products, forecasting sales and
future demand via predictive simulation, optimizing product and offer placements through
heat-mapping of customers, and much more. Beyond this, data analytics deals with
identifying customers likely to be interested in certain products based on their past
purchases, finding the most suitable way to reach them via targeted marketing strategies,
and then deciding what to sell next.

Strategic Areas in Data Analytics for Retailers

There are some strategic areas where retail players find a ready use for data analytics.
Here are a few of those areas:

1.) Price Optimization

Of course, data analytics plays a very important role in price determination. Algorithms
perform several functions, like tracking demand, inventory levels, and competitors’
activities, and respond automatically to market challenges in real time, allowing actions to
be taken based on insights in a safe manner. Price optimization helps determine when
prices should be dropped, popularly known as ‘markdown optimization.’ Before analytics
was used, retailers would simply bring down prices after the buying season for a certain
product line ended, when demand was diminishing. Analytics, however, shows that a
gradual price reduction from the moment demand starts sagging leads to increased
revenues. The US retailer Stage Stores found this out through experiments backed by a
predictive approach for determining the rise and fall of demand for a product, which beat
the conventional end-of-season sale.
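The gradual-markdown claim can be illustrated with a toy revenue comparison. The prices and demand responses below are entirely invented; a real markdown-optimization model would estimate demand elasticity from historical data:

```python
def season_revenue(prices, units_sold):
    """Total revenue over the tail of a season, week by week."""
    return sum(p * u for p, u in zip(prices, units_sold))

# Strategy A: hold price, then one deep end-of-season cut.
end_of_season = season_revenue([100, 100, 100, 60], [50, 40, 30, 80])
# Strategy B: gradual markdowns from the moment demand starts to sag.
gradual = season_revenue([100, 90, 80, 70], [50, 55, 58, 60])

print(end_of_season, gradual)  # 16800 18790
```

Under these assumed demand responses, the gradual markdown earns more over the season than the single deep cut, which is the pattern the text attributes to markdown optimization.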


Retail giants like Walmart spend millions on their real-time merchandising systems, building
among the world’s largest private clouds to track millions of transactions as they happen
daily. As stated earlier, algorithms perform this function and others.

2.) Future performance prediction

This is another important area of data analytics in the retail industry, since every customer
interaction has a big impact on both potential and existing relationships. Rolling out an
untested idea to the full sales force might be risky, because a wrong decision could result
in an immediate or prolonged loss. Instead, top business organizations have discovered
that the best way to capture the cause-and-effect relationship between key performance
indicators and strategic shifts is a test-and-learn approach. This is carried out by comparing
the performance of a test group of customers or reps to the performance of a well-matched
control group. This is the data science behind the study.
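The core of a test-and-learn comparison is measuring the lift of the test group over its well-matched control group. A minimal sketch with invented per-store sales figures:

```python
from statistics import mean

def lift(test_group, control_group):
    """Relative improvement of the test group's mean over the control group's mean."""
    return (mean(test_group) - mean(control_group)) / mean(control_group)

test_sales = [120, 130, 125, 135]     # stores where the change was rolled out
control_sales = [100, 105, 95, 100]   # matched stores left unchanged
print(lift(test_sales, control_sales))  # 0.275, i.e. a 27.5% lift
```

A real test-and-learn program would also check whether the observed lift is statistically significant before rolling the change out to the full network.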

3.) To accommodate small-scale retailers

Data analytics in retail is important for small-scale retailers, who can get assistance from
platforms that provide these services. Apart from this, there are organizations, mainly start-
ups, that offer social analytics to create product awareness on social media. Small-scale
businesses can therefore take advantage of retail data analytics without spending so much
that it hurts their finances.

4.) Demand prediction

The moment retailers gain a real understanding of customer buying trends, they can focus
on areas that will see high demand. This involves gathering seasonal, demographic, and
occasion-led data along with economic indicators to build a good picture of purchase
behavior across the target market. This is very useful for inventory management.

5.) Pick out the highest Return on Investment (ROI) Opportunities

After gaining a good understanding of their potential and existing customer base, retailers
use data-driven intelligence and predictive risk filters to model expected

responses to marketing campaigns, measured by a propensity to buy.

6.) Forecasting trends

Retailers nowadays have several advanced tools at their disposal for understanding current
trends. Algorithms that forecast trends go through buying data to determine what marketing
departments need to promote and what they do not.

7.) Identifying customers

This is also important in retail data analytics, because data analytics is the best way to
choose which customers would likely desire a certain product. For this reason, most
retailers rely heavily on recommendation engine technology online and on data obtained
through transactional records and loyalty programs, both online and offline. Companies
like Amazon may not yet be ready to ship products to customers before they order, but
they are looking in that direction. Demand is forecast for individual geographic areas based
on the demographics retailers have on their customers, which means that when orders
come in, they can be fulfilled more quickly and efficiently, while data on how customers
make contact with retailers is used to decide the best path for getting their attention on a
certain product.

Role of Data Analytics in Retail Industry

Other key areas where data analytics plays a key role are:

a.) Discount Efficiency

Almost 95% of shoppers admit to using coupon codes when shopping. For retailers to gain
from offers, they first need to ask how valuable a given deal would be for their business.
Promotional deals will certainly get customers rushing in, but they may not be an effective
strategy for sustaining long-term customer loyalty. Instead, retailers can analyze historical
data and use it in predictive modeling to determine the long-term impact of such offers. For
instance, a team of data analysts and scientists can model the history of events that would
have occurred had there been no discount. They then compare this with the actual events
when there were

discounts, to better understand the effectiveness of each discount. With this knowledge,
the retailer can readjust the discount strategy, increasing the number of discounts in
various categories and removing less profitable deals. This would certainly boost average
monthly revenue.
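The comparison described above reduces to measuring incremental revenue against the modeled no-discount baseline, net of what the discount cost. A sketch with invented figures:

```python
def discount_impact(actual_revenue, baseline_revenue, discount_cost):
    """Incremental revenue attributable to a discount, net of its cost.
    `baseline_revenue` is the modeled revenue had no discount been offered."""
    return actual_revenue - baseline_revenue - discount_cost

print(discount_impact(55_000, 48_000, 4_000))  # 3000: the deal added value
print(discount_impact(50_000, 48_000, 4_000))  # -2000: the deal lost money
```

Running this comparison per category is what lets a retailer keep the profitable deals and drop the rest.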

b.) Churn Rate Reduction

Creating customer loyalty is a top priority for all brands, because attracting a new customer can cost more than six times as much as retaining an existing one. Churn rate can be expressed in various ways: the percentage of customers lost, the number of customers lost, the percentage of recurring value lost, or the value of recurring business lost. With the help of big data analytics and insights into which customers are likely to churn, retailers can more easily determine how to alter their subscription offerings to prevent such scenarios. For instance, a retailer can analyze customer data from a monthly subscription box and use it to target new subscribers who are likely to become long-term customers. This can significantly decrease monthly churn and lets brands calculate customer lifetime value and recoup steep marketing costs.
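The different expressions of churn listed above can be computed directly. A minimal sketch, with all subscription figures invented for illustration:

```python
# Hypothetical monthly subscription figures (invented for illustration).
customers_start = 2000     # subscribers at the start of the month
customers_lost = 120       # subscribers who cancelled during the month
avg_monthly_value = 25.0   # average recurring revenue per subscriber

# The common expressions of churn mentioned above:
churn_count = customers_lost                               # number of customers lost
churn_rate = customers_lost / customers_start              # percent of customers lost
recurring_value_lost = customers_lost * avg_monthly_value  # value of recurring business lost
recurring_value_rate = recurring_value_lost / (customers_start * avg_monthly_value)

print(f"churn rate {churn_rate:.1%}, recurring value lost {recurring_value_lost:.2f}")
```

Tracking these month over month shows whether changes to the subscription offering are actually reducing churn.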

c.) Product Sell-through rate

Retailers can analyze product-related data to find what pricing, visuals, and terminology will resonate with potential and existing customers. By altering the product showcase based on the data sets analyzed, retailers can improve their sell-through rate. Take Uber, for instance: its whole business model depends on big data analytics for crowdsourcing and for the sell-through of its service. With customers' personal data, Uber can match them with the most suitable drivers based on location and rating. Because of such a personalized experience, customers prefer Uber's offers over those of its competitors or regular taxis.
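Sell-through rate is commonly computed as units sold divided by units received over a period. A small sketch, with hypothetical figures, comparing two versions of a product showcase:

```python
# Hypothetical figures for two versions of a product showcase.
showcases = {
    "original": {"received": 500, "sold": 310},
    "reworked": {"received": 500, "sold": 380},
}

# Sell-through rate = units sold / units received in the period.
rates = {name: s["sold"] / s["received"] for name, s in showcases.items()}
best = max(rates, key=rates.get)

print(rates, "->", best)
```

The showcase with the higher rate is the one resonating better with customers, which is the comparison the paragraph above describes.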

Getting the right customers into stores is very important too, as a US department store giant recently discovered. Because its analytics showed a dearth of shoppers in the vital "millennials" demographic, it opened its One Below basement at its New York flagship store. Promotions such as "selfie walls" and while-you-wait customized 3D-printed

smartphone cases were offered. All of these were ideas for attracting young customers to the store with the aim of giving them a great experience.

Opportunities in Retail Analytics

Also, there are several opportunities in retail analytics:

1.) The promise of big data

Every year, retail data increases exponentially in variety, volume, value, and velocity. Smart retailers know that each interaction holds the potential for profit, and a great deal of it.

A 2011 report states that retailers who use big data analytics could increase their operating margins by as much as 60 percent. This has created the need for data scientists, whose job is to make sense of big data (structured or unstructured, external or internal) and help retailers take actions that increase sales while reducing costs.

2.) Marketing

• Online behavioral analysis and web analytics that create tailored offers.
• Personalized and location-based offers on mobile devices.
• Targeted campaigns that use analytics for segmenting consumers, identifying the best
channels and eventually achieving an optimal return on investment.
• Real-time pricing driven by second-by-second metrics.

3.) Customer Experience

• Multi-level reward programs and personalized recommendations based on online purchase preference data, smartphone apps, etc.
• Sentiment analysis of call center records, social media streams, product reviews and more, for market insights and customer feedback.
• Predictive analytics for customer experience enhancement on all devices and channels,
online and offline.
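As a rough illustration of the sentiment analysis mentioned above, here is a toy keyword-based scorer. The word lists are invented for illustration; production systems rely on trained models rather than fixed lists:

```python
# A toy keyword-based sentiment scorer. Word lists are illustrative only;
# real systems use trained models, not fixed vocabularies.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "refund"}

def sentiment_score(review: str) -> int:
    """Positive score for favourable reviews, negative for unfavourable ones."""
    words = review.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great product and fast delivery",
    "Terrible quality so I want a refund",
]
scores = [sentiment_score(r) for r in reviews]
print(scores)
```

Even this crude approach separates clearly positive from clearly negative feedback, which is the signal retailers aggregate across channels for market insight.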


4.) Merchandizing

• Detailed market basket analysis that yields more rapid growth in revenue.
• Identifying shopping trends and cross-selling opportunities with the aid of video data
analysis.
• Rise in daily profits via a combination of external and internal data such as seasonal and
holiday trends, economic forecasts, traffic and weather reports.
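Market basket analysis at its simplest counts how often items co-occur in transactions. The sketch below, using invented baskets, computes pair support, the building block of association-rule mining:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction baskets (item names invented for illustration).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count how often each pair of items appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of all baskets that contain the pair.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

High-support pairs point directly at cross-selling opportunities: items worth shelving, bundling, or promoting together.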

5.) Omni-Experience

The main aim is a streamlined, seamless experience for everyone involved. From the moment the product leaves the manufacturer, to the store floor or warehouse, to the point of purchase, the retailer wants maximum efficiency in every department.

It is no longer news that the retail industry has gone through many operational changes over the years due to data analytics, and big data analytics solutions have played an important role in bringing these changes about. Adoption of these solutions is therefore growing rapidly, with more retailers working tirelessly to enhance supply chain operations, improve marketing campaigns, raise customer satisfaction, and achieve a higher success rate in retailing.

Challenges

A number of issues must be addressed before data analytics in the retail industry can be used to full capacity. Policies on security, privacy, liability, and intellectual property have to be stringent where analytics is concerned. Analytics and big data are interrelated, so specially trained professionals need to be included in the team to operationalize and utilize big data analytics.

Companies will also find it pertinent to incorporate information from various data sources, mainly third parties, and to support such an environment by deploying data efficiently.

Finally, companies often make the mistake of short-sightedness, failing to implement the insights gained from analytics. This can be fixed by continuously adapting retail practices, with a particular team given the task of organizing and implementing insights.

Conclusion

Retailing has become a platform for data-driven disruption, because quality data from sources such as social network conversations, internet purchases, and location-specific smartphone interactions has turned retail into a digitally transacted business.

The benefits organizations reap from data analytics include better risk management, improved performance, and the ability to discover insights that might otherwise have stayed hidden.

With the big return on investment data analytics delivers in the retail industry, most retailers will continue to adopt these solutions to sustain customer loyalty, boost the perception of their brand and improve promoter scores. Retail data analytics allows retailers and organizations to gather information on their customers, how to reach them and how their needs can be used to drive sales. As technology continues to dominate the retail industry, one thing is certain: data analytics is here to stay!

What Is Healthcare Analytics?

Healthcare analytics is the branch of analysis that focuses on offering insights into hospital
management, patient records, costs, diagnoses, and more. The field covers a broad swath
of the healthcare industry, offering insights on both the macro and micro level.

When combined with business intelligence suites and data visualization tools, healthcare
analytics helps managers operate better by providing real-time information that can support
decisions and deliver actionable insights.

For hospital and healthcare managers, healthcare analytics provides a combination of financial and administrative data alongside information that can aid patient care efforts, improve services, and refine existing procedures.


Healthcare BI suites tend to emphasize broad categories of data for collection and parsing: costs and claims, research and development, and clinical data, alongside patient behavior and sentiment.

How Does It Help?


Digitization in the healthcare industry is allowing organizations to cater to patients' needs judiciously and with more attention. Significant benefits stand to be realized, such as patient-centric treatment and medication, earlier detection of diseases, guaranteed patient satisfaction, and reduced costs both for healthcare management and for the patient.

The healthcare industry holds a huge amount of data relating to patient details, disease types, legal paperwork, and compliance documents. This treasure forms the base for descriptive and predictive analysis, from which solution providers create the clinical data required.

ROLE OF ANALYTICS IN THE HEALTH CARE SECTOR

The health care industry is one of the world's fastest-growing industries, consuming over 10 percent of gross domestic product (GDP) in most developed nations. Health care can form an enormous part of a country's economy. The World Health Organization estimates there are 9.2 million physicians, 19.4 million nurses and midwives, 1.9 million dentists and other dentistry personnel, 2.6 million pharmacists and other pharmaceutical personnel, and over 1.3 million community health workers worldwide, making the health care industry one of the largest segments of the workforce. Such an industry naturally also creates voluminous data in respect of patients, diseases, diagnoses, medicines, research, etc.

The Indian healthcare industry generates zettabytes (10²¹ bytes) of data every day by capturing patient care records, prescriptions, diagnostic tests, insurance claims, equipment-generated data for monitoring vital signs and, most importantly, medical research.


According to an industry report, the California-based managed care consortium Kaiser Permanente is believed to have between 26.5 and 44 petabytes (a petabyte is 1,000,000 gigabytes) of potentially rich data from EHRs.

In the healthcare industry, information can have a profound impact on improving patient care and quality of life, while controlling escalating costs.

Some of the key issues that healthcare companies face in gaining competitive advantage in the marketplace are as follows:

• What are the best ways to contain costs without reducing the level and quality of care?
• Can the companies reduce operating costs, perform dynamic budgeting and
forecasting, and improve overall profitability?
• How can doctors, patients, providers and payers work together to improve outcomes?
• Are there ways to perform faster diagnoses and understand different treatment
patterns to save and improve more lives?
• Are patients satisfied with their level of care?
• How do the companies compel individuals to become advocates for their own health?

Using data analytics provides the means to find answers to the issues facing the
health care industry. Following are some of the services that form the underlying
layer of analytics in the Healthcare Industry:

• Clinical Data Management (CDM) Services – that address the need for records of patient history, treatment responses, the number of patients and the income from each of them, the mapping of symptoms to drugs, etc.
• Promotional Spend Compliance – that addresses the customers' need to adhere to national and sub-national healthcare regulations and similar laws worldwide. This solution is strengthened with analytics capabilities for sales-per-spend analysis, competitive returns benchmarking and spend fraud pattern detection.
• Social Media Intelligence/Analytics – that helps identify deep business insights on treatment effectiveness, peer physician preferences, key opinion leaders' recommendations, disease spread and concentration, etc., using data and opinion from

social media. Among the end results are enhanced treatment effectiveness and improved clinical trial results.
• Service Analytics for Medical Devices – that ensures monitoring and tracking of all after-sales and service-related processes, offers insights into supplies management and ensures proactive as well as preventive maintenance. The solution is aimed at the effective management of the services business of medical devices.
• Predictive Asset Analytics of Medical Devices – that enables smoother operations
by reducing the risk of medical device failure and by ensuring that the medical devices
run at optimal performance.
• Trade Promotion Optimization – that enables trade promotion managers to model,
optimize, forecast, budget, execute, manage and measure the trade promotion spends
for consumer, health and OTC products.

Predictive Analytics Solutions

Defining Predictive Analytics in Healthcare

Predictive analytics and machine learning in healthcare are rapidly becoming some of the
most-discussed, perhaps most-hyped topics in healthcare analytics. Machine learning is a
well-studied discipline with a long history of success in many industries. Healthcare can
learn valuable lessons from this previous success to jumpstart the utility of predictive
analytics for improving patient care, chronic disease management, hospital administration,
and supply chain efficiencies. The opportunity that currently exists for healthcare systems is to define what “predictive analytics” means to them and how it can be used most effectively to make improvements.

However, predictions made solely for the sake of making a prediction are a waste of time
and money. In healthcare and other industries, prediction is most useful when that
knowledge can be transferred into action. The willingness to intervene is the key to
harnessing the power of historical and real-time data. Importantly, to best gauge efficacy
and value, both the predictor and the intervention must be integrated within the same system
and workflow where the trend occurs.
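As a toy illustration of turning a prediction into an intervention trigger, the sketch below scores a patient's readmission risk with a hand-set logistic model. Weights, features, and threshold are all invented for illustration; a real model would be trained on historical clinical data and validated before use:

```python
import math

# Toy readmission-risk model: a hand-set weighted score passed through a
# logistic function. All weights, features, and the threshold are invented.
WEIGHTS = {"age_over_65": 1.2, "prior_admissions": 0.8, "chronic_conditions": 0.6}
BIAS = -3.0

def readmission_risk(patient: dict) -> float:
    score = BIAS + sum(WEIGHTS[k] * patient.get(k, 0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))  # squash the score into (0, 1)

patient = {"age_over_65": 1, "prior_admissions": 2, "chronic_conditions": 1}
risk = readmission_risk(patient)

# The prediction only matters if it triggers an intervention in the workflow.
needs_followup = risk > 0.5
print(f"risk {risk:.2f}, follow up: {needs_followup}")
```

The final flag is the point the paragraph above makes: the predictor and the intervention it triggers must live in the same system and workflow.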


The following Health Catalyst® paper, “Using Predictive Analytics in Healthcare: Technology Hype vs Reality,” is a good summary of both the hype and the hope of predictive analytics in healthcare.

Important Questions

Business Analytics (4529201) Subject Faculty: Dr. Vishal Patidar

Module: 1

Q.1 Briefly explain the information users and their requirements.

Q.2 Define Business Intelligence and explain the importance of business intelligence with suitable
examples.

Q.3 Explain the Features of Business Intelligence

Q.4 Explain the Use of Business Intelligence

Q.5 Explain the main components of Business Intelligence

Q.6 Explain Business Intelligence Tools

Q.7 Explain the key functionalities of Business Intelligence

Q.8 What are the Key Success Factors for Business Intelligence Implementation?

Q.9 Explain the advantages and disadvantages of Business Intelligence

Q.10 Define Business Analytics and types of Business Analytics

Q.11 Differentiate Business Intelligence versus Business Analytics

Q.12 Explain the On-Line Analytical Processing system with suitable examples

Q.13 Explain the On-Line Transaction Processing system with a suitable example

Q.14 Differentiate OLTP and OLAP

Q.15 Types of OLAP Servers

Q.16 Explain OLAP Operations in detail

Q.17 Difference between ROLAP versus MOLAP

Q.18 What is Entity- Relationship Model with suitable schema?

Q.19 Explain the types of Data warehouse Schema

Q.20 Differentiate Star Schema versus Snowflake Schema


Module: II

Q.1 Explain the Types of Digital Data

Q.2 Compare and Contrast Structured, Semi-Structured and Unstructured data

Q.3 What is Data Mart?

Q.4 Explain the Goals of a Data Warehouse

Ans:
• Information Accessibility
• Information Credibility
• Flexible to Change
• Support for more fact-based decision making
• Support for the data security
• Information consistency

Q.5 What is Data Warehouse? Explain

Q.6 Explain the Key Characteristics of Data Warehouse

Q.7 Explain the Advantages of Data Warehouse

Q.8 What is Data Lake? Explain the key Data Lake concept?

Q.9 Explain the important tiers in Data Lake Architecture

Q.10 Explain the need of Data Lake

Q.11 Explain the difference between Data Lake and Data Warehouse

Module: III

Q.1 What Is Data Mining? Explain the goals of Data Mining.

Q.2 Explain Data Mining Process in detail

Q.3 Explain Data Mining in Business Intelligence Context

Q.4 Explain the difference between Data Mining & Data Warehouse

Q.5 Explain some of the important reasons for using Data Mining

Q.6 Explain the difference Between SQL, OLAP and Data Mining

Q.7 Explain Text Mining; also state the reasons for using text mining (i.e., the motivation for text mining)

Q.8 Explain Text Mining Applications with suitable examples

Q.9 Explain Text Mining Process

Q.10 Explain the challenges in Text Mining

Q.11 Explain Text Mining Vs. Data Mining

Q.12 Explain Web Mining and types of web mining techniques

Q.13 Explain Common Mining Techniques

Q.14 Explain the Applications of Web Mining with suitable examples

Q.15 What are the challenges in Web Mining?


Q.16 What is Social Media Analytics? Explain some of the uses of social media analytics with suitable
examples.

Q.17 Explain Social Media Analytics benefits with suitable examples

Q.18 What is Sentiment? Explain techniques of sentiment classification

Q.19 What are the applications of sentiment analysis in business?

Q.20 What are the challenges in sentiment analysis?

Q.21 What is Big Data? Explain the technology drivers for Big Data Analytics

Q.22 Explain the use of technology in Big Data Analytics

Q.23 Explain the Business Benefits of Big Data Analytics

Q.24 Explain Big Data Analytics and Predictive Analytics with suitable examples.

Module: IV

Q.1 Explain Business Performance Management and steps to carry out effective Business Process
Management.

Q.2 Explain Business Process Management Methodology for enhancing business performance.

Q.3 What are Key Performance Indicators? Explain a few key factors for a commercial airline company.

Q.4 Sales and Marketing Analytics: what it is and why it matters.

Q.5 Define HR Analytics. Explain how HR professionals gather data points across the organization from
different sources.

Q.6 What is Financial Analytics? Why is Financial Analytics important?

Q.7 What is the use of Financial Analytics in Business?

Q.8 Explain the Importance of Financial Analytics.

Q.9 Explain the critical financial analytics used by businesses.

Q.10 What is Operational Analytics? Explain the Key Process Areas of Operational Analysis.

Q.11 Explain the area of Analytics in Telecom Industries

Q.12 Explain the Strategic areas of Data Analytics for Retailers

Q.13 Role of Data Analytics in Retail Industry

Q.14 What Is Healthcare Analytics? Role of Analytics in the Health Care Sector.

