Business Analytics Concepts by IBM

Module 5 - Business Data Analytics
Disclaimer: The content is curated for educational purposes only.

© Edunet Foundation. All rights reserved.
After going through this module, students will be able to
● Understand business analytics and develop business intelligence.

● Analyze data using statistical and data mining techniques for business
intelligence.
● Understand case studies for predictive models.
● Develop case studies for predictive analytical models.

Understand business
analytics and develop
business intelligence.

In this section, we will discuss:
● Introduction to business analytics and Concepts of business analytics

● Trends in business analytics
● Descriptive analytics
● Introduction to statistics
● Types of data
● Measure of Central Tendency
● Arithmetic mean
● Geometric Mean
● Harmonic Mean

● Median in Raw and Grouped Data

● Mode in Raw and Grouped Data
● Standard Deviation
● Variance
● Properties of Variance and standard deviation
● Usage of variance in business analytics
● OLAP Concept
● OLTP Concept

Introduction to business
analytics and Concepts of
business analytics
What is Business Analytics?
● Business analytics (BA) is the iterative,

methodical exploration of an
organization's data, with an emphasis on
statistical analysis.
● Business analytics is used by
companies that are committed to making
data-driven decisions.
Image Source: https://www.businessanalytics.com/

What is Business Analytics?(Contd)
● Business Analytics is "the study of data through statistical and operations analysis, the formation of predictive models, application of optimization techniques, and the communication of these results to customers, business partners, and colleague executives."
Image Source:
https://www.proschoolonline.com/certification-business-analytics-course/what-is-b
What is Business Analytics?(Contd)
● It adopts quantitative methods and requires evidence from data to build models for businesses and make profitable decisions. Thus, Business Analytics depends heavily on Big Data (large volumes of data).
Image Source:
https://www.altudo.co/resources/blogs/business-analytics-vs-marketing-analytics-
Understanding Business Analytics
● Business Analytics is the procedure through which information about past performance and issues is analyzed to devise a successful plan for the future.
● Big Data, or large amounts of data, is used to derive solutions.
Image Source:
https://www.indiaeducation.net/management/streams/business-analytics.html
Understanding Business
Analytics(Contd)
● This method of going about a business

or this outlook towards building and
sustaining a business is vital to the
economy and industries that thrive in the
economy.
Image Source: https://www.martinsights.com/?p=1049

Components of Business Analytics
● Define Objective
● Data Aggregation
● Data Cleaning
● Analytical Methodology
● Evaluation and Validation
● Reporting and Data Visualisation
Image Source: https://www.analytixlabs.co.in/blog/what-is-business-analytics/

Types of Business Analytics Methods
● Descriptive Analytics
● Diagnostic Analytics
● Predictive Analytics
● Prescriptive Analytics
Uses and Benefits of Business
Analytics
● To carry out data mining and exploring

new data to find new patterns and
relationships.
● To carry out statistical and quantitative
analysis to provide explanations for
certain occurrences.
Image Source: https://www.analytixlabs.co.in/blog/business-analytics-career/

Uses and Benefits of Business
Analytics
● To test previous decisions with the help of A/B testing and multivariate testing.
● To deploy predictive modeling to predict future outcomes.
Image Source:
https://www.datapine.com/blog/benefits-of-business-intelligence-and-business-an
Business Analytics Tools
● SQL
● Tableau/ QlikView/ Power BI
● Birt
● Python
● R
● MS Excel
● Sisense
● Clear Analytics
● Pentaho BI
● MicroStrategy
Image Source: https://sigma4sap.com/?page_id=466
Applications of Business Analytics
● Marketing
● Finance
● Human Resources
● Manufacturing
Trends in Business
Analytics
Business Analytics Trends For 2020
● Data Quality Management

● Data Discovery/Visualization
● Artificial Intelligence
● Predictive and Prescriptive Analytics
Tools
● Collaborative Business Intelligence
● Data-driven Culture
Image Source: https://www.datapine.com/blog/business-intelligence-trends/

Business Analytics Trends For
2020(Contd)
● Augmented Analytics
● Mobile BI
● Data Automation
● Embedded Analytics
● Natural language processing
Image Source: https://codeit.us/blog/top-data-and-analytics-trends

Descriptive analytics
What is Descriptive Analytics?
● Descriptive analytics is a statistical

method that is used to search and
summarize historical data in order to
identify patterns or meaning.
● Descriptive analytics are based on
standard aggregate functions in
databases
Image Source:
https://www.dezyre.com/article/types-of-analytics-descriptive-predictive-prescriptiv
What is Descriptive Analytics?
(Contd)
● For example, in an online learning

course with a discussion board,
descriptive analytics could determine
how many students participated in the
discussion, or how many times a
particular student posted in the
discussion forum.
Image Source: https://www.valamis.com/hub/descriptive-analytics
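The discussion-board example can be sketched in a few lines of Python. This is only an illustration: the `posts` list and the student names are hypothetical data, not taken from the module.

```python
from collections import Counter

# Hypothetical discussion-board log: one entry per post, holding the author's name.
posts = ["asha", "ben", "asha", "chen", "asha", "ben"]

# Descriptive analytics here is plain counting and aggregation.
posts_per_student = Counter(posts)       # how many times each student posted
participants = len(posts_per_student)    # how many distinct students took part

print(participants)               # 3
print(posts_per_student["asha"])  # 3
```

The same counts could come from an aggregate query such as COUNT with GROUP BY in a database.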
How does descriptive analytics work?
● Data aggregation and data mining are

two techniques used in descriptive
analytics to discover historical data.
● Data is first gathered and sorted by data
aggregation in order to make the
datasets more manageable by analysts.
Image Source: https://www.dataversity.net/fundamentals-descriptive-analytics/

How does descriptive analytics work?
(Contd)
● Data mining describes the next step of
the analysis and involves a search of the
data to identify patterns and meaning.
● Identified patterns are analyzed to
discover the specific ways that learners
interacted with the learning content and
within the learning environment.
Image Source: https://www.sisense.com/glossary/descriptive-analytics/

Examples of descriptive analytics
● Tracking course enrollments and course compliance rates
● Recording which learning resources are
accessed and how often
● Summarizing the number of times a
learner posts in a discussion board
● Tracking assignment and assessment
grades
Image Source:
https://www.vertical-leap.uk/blog/data-science-for-marketers-part-2-descriptive-v-
Examples of descriptive
analytics(Contd)
● Comparing pre-test and post-test

assessments
● Analyzing course completion rates by
learner or by course
● Collating course survey results
● Identifying length of time that learners
took to complete a course
Image Source:
https://www.vectorstock.com/royalty-free-vector/data-analytics-icons-flat-pack-vec
Advantages of descriptive analytics
● Quickly and easily report on the Return

on Investment (ROI) by showing how
performance achieved business or
target goals.
● Identify gaps and performance issues
early - before they become problems.
Image Source:
https://forums.bsdinsight.com/threads/descriptive-predictive-and-prescriptive-anal
Advantages of descriptive
analytics(Contd)
● Identify specific learners who require

additional support, regardless of how
many students or employees there are
● Identify successful learners in order to
offer positive feedback or additional
resources.
● Analyze the value and impact of course
design and learning resources.
Image Source:
https://econsultancy.com/analytics-approaches-every-marketer-should-know-1-de
Introduction to statistics
Introduction to Statistics
● It is a branch of mathematics that deals with the collection, organization, presentation, analysis and interpretation of numerical data.
Image Source: https://www.youtube.com/watch?v=7rKQBKQOIQw

Types of Statistics
● Descriptive statistics
● Inferential statistics
Image Source: https://slideplayer.com/slide/6642532/

Types of Statistics
Descriptive statistics
● It is used to describe the basic features

of data in a study.
● Descriptive statistics deals with the
processing of data without attempting to
draw any inferences from it.
● The data are presented in the form of
tables and graphs.
Image Source: https://data-flair.training/blogs/stat-descriptive-statistics/

● The characteristics of the data are

described in simple terms.
● Events that are dealt with include
everyday happenings such as accidents,
prices of goods, business, incomes,
epidemics, sports data, population data.
Image Source: https://data-flair.training/blogs/stat-descriptive-statistics/
Types of Statistics
Inferential statistics
● Inferential statistics is a scientific

discipline that uses mathematical tools
to make forecasts and projections by
analyzing the given data.
● This is of use to people employed in
such fields as engineering, economics,
biology, the social sciences, business,
agriculture and communications.
Image Source:
https://mahritaharahap.wordpress.com/teaching-areas/inferential-statistics/
Types of data
Qualitative or Categorical Data
● Qualitative data is also known as categorical data.
● It describes data that fits into categories.
● Qualitative data are not numerical.
● The categorical information involves categorical variables that describe features such as a person's gender, hometown etc.
Image Source:
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and
● Categorical measures are defined in

terms of natural language specifications,
but not in terms of numbers.
● Sometimes categorical data can hold
numerical values (quantitative value)
● But those values do not have
mathematical sense
● Here, the birthdate and school postcode

hold the quantitative value
● But it does not give numerical meaning.
Nominal Data:
● Nominal data is one of the types of
qualitative information which helps to
label the variables without providing the
numerical value.
● Nominal data is also called the nominal
scale. It cannot be ordered and
measured.
● But sometimes, the data can be

qualitative and quantitative
● Examples of nominal data are letters,
symbols, words, gender etc.
● The nominal data are examined using
the grouping method.
● In this method, the data are grouped into

categories, and then the frequency or
the percentage of the data can be
calculated.
● These data are visually represented
using the pie charts.
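A minimal sketch of the grouping method, assuming a small hypothetical list of blood-group labels (the data is illustrative only): each label is counted, and the count is turned into a percentage, which is the figure a pie chart would display.

```python
from collections import Counter

# Hypothetical nominal observations: labels only, with no natural order.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A", "O"]

frequency = Counter(blood_groups)   # group the data into categories and count them
total = len(blood_groups)
percentage = {label: 100 * count / total for label, count in frequency.items()}

print(frequency["O"])    # 4
print(percentage["O"])   # 50.0
```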
Ordinal Data:
● Ordinal data/variable is a type of data
which follows a natural order.
● The significant feature of ordinal data is that the differences between the data values are not determined.
● This variable is mostly found in surveys, finance, economics, questionnaires, and so on.
● The ordinal data is commonly

represented using a bar chart.
● These data are investigated and
interpreted through many visualisation
tools.
● The information may be expressed
using tables in which each row in the
table shows the distinct category.
Binary Data:
● Binary data has only 2 values/states.
● For Example yes or no, affected or
unaffected, true or false.
i) Symmetric : Both values are equally
important (Gender).
ii) Asymmetric : Both values are not equally
important (Result).
Advantages:
● Better understanding- Qualitative data
gives a better understanding of the
perspectives and needs of participants.
● Provides Explanation- Qualitative data along with quantitative data can explain the result of the survey and can measure the correctness of the quantitative data.
● Better identification of behavior patterns- Qualitative data can provide detailed information that is useful for identifying behavioral patterns.
Disadvantages:
● Lesser reachability- Being subjective in
nature, small population is generally
covered to represent the large
population.
● Time Consuming- Qualitative data is
time consuming as large data is to be
understood.
● Possibility of Bias- Because the analysis is subjective, evaluator bias is quite possible.
Quantitative or Numerical Data
● Quantitative data is also known as

numerical data which represents the
numerical value (i.e., how much, how
often, how many).
● Numerical data gives information about
the quantities of a specific thing.
● Some of the examples of numerical data
are height, length, size, weight, and so
on.
● The quantitative data can be classified

into two different types based on the
data sets.
● The two different classifications of
numerical data are discrete data and
continuous data.
Discrete Data:
● Discrete data can take only discrete
values.
● Discrete information contains only a
finite number of possible values.
● Those values cannot be subdivided

meaningfully.
● Here, things can be counted in the
whole numbers.
● Example: Number of students in the
class
Continuous Data:
● Continuous data is data that can be measured.
● It has an infinite number of probable
values that can be selected within a
given specific range.
● Example: Temperature range
Advantages:
● Specific- Quantitative data is clear and

specific to the survey conducted.
● High Reliability- If collected properly,
quantitative data is normally accurate
and hence highly reliable.
● Easy communication- Quantitative

data is easy to communicate and
elaborate using charts, graphs etc.
● Existing support- Many large datasets
may be already present that can be
analyzed to check the relevance of the
survey.
Disadvantages:
● Limited Options- Respondents are

required to choose from limited options.
● High Complexity- Quantitative data may need complex procedures to get the correct sample.
● Require Expertise- Analysis of quantitative data requires certain expertise in statistical analysis.
Measure of Central
Tendency
Definition
● A measure of central tendency is a

summary statistic that represents the
center point or typical value of a dataset.
● These measures indicate where most
values in a distribution fall and are also
referred to as the central location of a
distribution.
Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
● We can think of it as the tendency of

data to cluster around a middle value.
● In statistics the three most common
measures of central tendency are the
mean, median and mode.
● Each of these measures calculates the

location of the central point using a
different method.
● Choosing the best measure of central
tendency depends on the type of data
we have.
Mean
● The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with.
● Calculating the mean is very simple.
Image Source:
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
● We just add up all of the values and divide by the number of observations in your dataset.
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
● The calculation of the mean

incorporates all values in the data.
● If you change any value, the mean
changes.
● However, the mean doesn’t always
locate the center of the data accurately.
● In a symmetric distribution, the mean

locates the center accurately.
● However, in a skewed distribution, the

mean can miss the mark.
● This problem occurs because outliers
have a substantial impact on the mean.
● Extreme values in an extended tail pull
the mean away from the center.
● As the distribution becomes more
skewed, the mean is drawn further away
from the center.
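The effect of an extreme value on the mean can be seen with a small sketch. The two lists below are hypothetical data chosen for illustration; `statistics.mean` and `statistics.median` are from the Python standard library.

```python
import statistics

symmetric = [10, 20, 30, 40, 50]
skewed = [10, 20, 30, 40, 500]   # same data, but with one extreme value in the tail

mean_symmetric = statistics.mean(symmetric)      # 30
median_symmetric = statistics.median(symmetric)  # 30
mean_skewed = statistics.mean(skewed)            # 120
median_skewed = statistics.median(skewed)        # 30

print(mean_symmetric, median_symmetric)
print(mean_skewed, median_skewed)
```

In the skewed list the single outlier pulls the mean from 30 to 120, while the middle value stays at 30.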
Median
● The median is the middle value.

● It is the value that splits the dataset in
half.
● To find the median, order your data from
smallest to largest, and then find the
data point that has an equal amount of
values above it and below it.
● The method for locating the median

varies slightly depending on whether
your dataset has an even or odd number
of values.
● In the dataset with the odd number of

observations, notice how the number 12
has six values above it and six below it.
● Therefore, 12 is the median of this
dataset.
● When there is an even number of

values, you count in to the two
innermost values and then take the
average.
● The average of 27 and 29 is 28.
Consequently, 28 is the median of this
dataset.
Mode
● The mode is the value that occurs the

most frequently in your data set.
● On a bar chart, the mode is the highest
bar.
● If the data have multiple values that are
tied for occurring the most frequently,
you have a multimodal distribution.
● If no value repeats, the data do not have
a mode.
● In the dataset, the value 5 occurs most

frequently, which makes it the mode.
● These data might represent a 5-point
Likert scale.
Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode
● Typically, you use the mode with

categorical, ordinal, and discrete data.
● In fact, the mode is the only measure of
central tendency that you can use with
categorical data—such as the most
preferred flavor of ice cream.
● However, with categorical data, there
isn’t a central value because you can’t
order the groups.
● With ordinal and discrete data, the mode

can be a value that is not in the center.
● Again, the mode represents the most
common value.
Arithmetic mean
Definition
● Arithmetic Mean is the most common and easily understood measure of central tendency.
● We can define the mean as the value obtained by dividing the sum of measurements by the number of measurements contained in the data set. It is denoted by the symbol $\bar{x}$.
Arithmetic Mean for three types of series
● Individual Data Series
● Discrete Data Series
● Continuous Data Series
Individual Data Series
● When data is given on an individual basis. Following is an example of an individual series:
Items: 5 10 20 30 40 50 60 70
● For individual series, the Arithmetic Mean can be calculated using the following formula.
Formula:
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
● Alternatively, we can write the same formula as follows:
$\bar{x} = \frac{\sum x}{n}$
Where $x_1, x_2, \dots, x_n$ are the individual observations and $n$ is the number of observations.
Image Source:
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm
Example:
Problem Statement:
● Calculate the Arithmetic Mean for the following individual data:
Items: 14 36 45 70 105
Solution:
● Based on the above-mentioned formula, the Arithmetic Mean $\bar{x}$ will be:
$\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} = \frac{270}{5} = 54$
● The Arithmetic Mean of the given numbers is 54.
Image Source: https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm
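The individual-series calculation can be checked with a short Python sketch. The variable names are illustrative; the data is the example's own.

```python
# Items from the individual data series example.
items = [14, 36, 45, 70, 105]

# Arithmetic mean = sum of the observations / number of observations.
arithmetic_mean = sum(items) / len(items)

print(arithmetic_mean)  # 54.0
```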
Discrete Data Series
● When data is given along with their frequencies. Following is an example of a discrete series:
Items: 5 10 20 30 40 50 60 70
Frequency: 2 5 1 3 12 0 5 7
● For discrete series, the Arithmetic Mean can be calculated using the following formula.
Formula:
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{N}$
● Alternatively, we can write the same formula as follows:
$\bar{x} = \frac{\sum f x}{\sum f}$
Where $x_i$ are the items, $f_i$ are their frequencies, and $N = \sum f$ is the total frequency.
Example:
Problem Statement:
● Calculate the Arithmetic Mean for the following discrete data:
Items: 14 36 45 70
Frequency: 2 5 1 3
Solution:
Based on the given data, we have:
Items (x): 14 36 45 70
Frequency (f): 2 5 1 3
fx: 28 180 45 210
$N = \sum f = 11$, $\sum f x = 463$
$\bar{x} = \frac{463}{11} \approx 42.09$
● The Arithmetic Mean of the given numbers is 42.09.
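A minimal sketch of the discrete-series calculation, using the example's items and frequencies: each item is weighted by its frequency and the total is divided by the sum of the frequencies. The variable names are illustrative.

```python
# Items and frequencies from the discrete data series example.
items = [14, 36, 45, 70]
frequencies = [2, 5, 1, 3]

total_fx = sum(f * x for x, f in zip(items, frequencies))  # sum of f*x = 463
total_f = sum(frequencies)                                 # N = 11

arithmetic_mean = total_fx / total_f

print(round(arithmetic_mean, 2))  # 42.09
```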
Continuous Data Series
● When data is given based on ranges along with their frequencies. Following is an example of a continuous series:
Items: 0-5 5-10 10-20 20-30 30-40
Frequency: 2 5 1 3 12
● In the case of continuous series, a mid point is computed as (lower limit + upper limit)/2 and the Arithmetic Mean is computed using the following formula.
Formula:
$\bar{x} = \frac{\sum f m}{N}$
Where $m$ is the mid point of each class, $f$ is its frequency, and $N = \sum f$ is the total frequency.
Example:
Problem Statement:
Let's calculate the Arithmetic Mean for the following continuous data:
Items: 0-10 10-20 20-30 30-40
Frequency: 2 5 1 3
Solution:
Based on the given data, we have:
Mid point (m): 5 15 25 35
Frequency (f): 2 5 1 3
fm: 10 75 25 105
$N = \sum f = 11$, $\sum f m = 215$
$\bar{x} = \frac{215}{11} = 19.5454\ldots$
The Arithmetic Mean of the given numbers is 19.54 (truncated to two decimal places).
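The continuous-series calculation can be sketched the same way, replacing each class by its mid point. The data is the example's own; the variable names are illustrative. Note that 215/11 = 19.5454..., which the slides report truncated as 19.54.

```python
# Class intervals and frequencies from the continuous data series example.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]

# Mid point of each class = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

total_fm = sum(f * m for m, f in zip(midpoints, frequencies))  # sum of f*m = 215
total_f = sum(frequencies)                                     # N = 11

arithmetic_mean = total_fm / total_f

print(midpoints)        # [5.0, 15.0, 25.0, 35.0]
print(arithmetic_mean)  # about 19.545
```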
Geometric mean
● Geometric mean of n numbers is defined as the nth root of the product of n numbers.
Formula:
$GM = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$
Image Source : https://www.tutorialspoint.com/statistics/geometric_mean.htm

Example:
Problem Statement:
● Determine the geometric mean of
following set of numbers.
1 3 9 27 81

Solution:
Here n = 5
$GM = \sqrt[5]{1 \times 3 \times 9 \times 27 \times 81} = \sqrt[5]{3^{10}} = 3^2 = 9$
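As a check on this example, the geometric mean can be computed directly from its definition (the nth root of the product). This is a small sketch using `math.prod` from the Python standard library (Python 3.8 or later); the result is a floating-point value very close to 9.

```python
import math

values = [1, 3, 9, 27, 81]
n = len(values)

product = math.prod(values)           # 1 * 3 * 9 * 27 * 81 = 59049 = 3**10
geometric_mean = product ** (1 / n)   # nth root of the product

print(product)                   # 59049
print(round(geometric_mean, 6))  # 9.0
```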

Harmonic Mean
What is Harmonic Mean?
● Harmonic mean is a type of average that is calculated by dividing the number of values in a data series by the sum of the reciprocals (1/x_i) of each value in the data series.
● A harmonic mean is one of the three
Pythagorean means (the other two are
arithmetic mean and geometric mean).
The harmonic mean always shows the
lowest value among the Pythagorean
means.
Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
Formula for Harmonic Mean
● The general formula for calculating a harmonic mean is:
● Harmonic mean = n / (∑1/x_i)
● Where:
● n – the number of the values in a
dataset
● x_i – the point in a dataset
● The weighted harmonic mean can be
calculated using the following formula:
● Weighted Harmonic Mean = (∑w_i ) /
(∑w_i/x_i)
● Where:
● w_i – the weight of the data point
● x_i – the point in a dataset
Example of Harmonic Mean
● You are a stock analyst in an investment bank.
● Your manager asked you to determine
the P/E ratio of the index of the stocks of
Company A and Company B.
● Company A reports a market
capitalization of $1 billion and earnings
of $20 million, while Company B reports
a market capitalization of $20 billion and
earnings of $5 billion.
● The index consists of 40% of Company
A and 60% of Company B.
● Firstly, we need to find the P/E ratios of

each company. Remember that the P/E
ratio is essentially the market
capitalization divided by the earnings.
● P/E (Company A) = ($1 billion) / ($20
million) = 50
● P/E (Company B) = ($20 billion) / ($5
billion) = 4
● We must use the weighted harmonic mean to calculate the P/E ratio of the index. Using the formula for the weighted harmonic mean, the P/E ratio of the index can be found in the following way:
● P/E (Index) = (0.4+0.6) / (0.4/50 + 0.6/4)
= 6.33
● Note that if we calculate the P/E ratio of
the index using the weighted arithmetic
mean, it would be significantly
overstated:
● P/E (Index) = 0.4×50 + 0.6×4 = 22.4
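The index calculation can be reproduced with a short sketch that applies the weighted harmonic mean formula, (∑w_i) / (∑w_i/x_i), and compares it with the weighted arithmetic mean. The variable names are illustrative; the weights and P/E ratios are those of the example.

```python
weights = [0.4, 0.6]     # index weights of Company A and Company B
pe_ratios = [50, 4]      # P/E of Company A and Company B

# Weighted harmonic mean = (sum of weights) / (sum of weight / value).
weighted_harmonic = sum(weights) / sum(w / x for w, x in zip(weights, pe_ratios))

# Weighted arithmetic mean, shown only for comparison.
weighted_arithmetic = sum(w * x for w, x in zip(weights, pe_ratios))

print(round(weighted_harmonic, 2))    # 6.33
print(round(weighted_arithmetic, 2))  # 22.4
```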
Median in Raw and
Grouped Data
Median in Raw Data
● The median of raw data is the number

which divides the observations when
arranged in an order (ascending or
descending) in two equal parts.
Image Source: https://www.math-only-math.com/images/median-of-raw-data.png

Method of finding median
● Take the following steps to find the

median of raw data.
● Step I: Arrange the raw data in
ascending or descending order.
● Step II: Observe the number of variates
in the data. Let the number of variates in
the data be n. Then find the median as
following.
● (i) If n is odd then the $\left(\frac{n+1}{2}\right)$th variate is the median.
● (ii) If n is even then the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th variates is the median, i.e.,
● median = $\frac{1}{2}\left\{\left(\frac{n}{2}\right)\text{th variate} + \left(\frac{n}{2} + 1\right)\text{th variate}\right\}$.

Solved Examples on Median of Raw
Data
● Find the median of the ungrouped data.
● 15, 18, 10, 6, 14
● Solution:
● Arranging variates in ascending order,
we get
● 6, 10, 14, 15, 18.
● The number of variates = 5, which is
odd.
● Therefore, median = $\left(\frac{5+1}{2}\right)$th variate
● = 3rd variate
● = 14
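The two-step method for raw data can be sketched as a small function. The name `median_raw` is hypothetical; the rule it implements is the standard one: for odd n take the ((n+1)/2)th variate, and for even n take the mean of the (n/2)th and (n/2 + 1)th variates.

```python
def median_raw(values):
    """Median of ungrouped data using the odd/even rule."""
    ordered = sorted(values)   # Step I: arrange the data in ascending order
    n = len(ordered)           # Step II: number of variates
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th variate (1-based), i.e. index (n + 1) // 2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th variates.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

result = median_raw([15, 18, 10, 6, 14])
print(result)  # 14
```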
Finding Median for Grouped Data
● Median is the value which occupies the

middle position when all the
observations are arranged in an
ascending or descending order. It is a
positional average.
● (i) Construct the cumulative frequency
distribution.
● (ii) Find (N/2)th term
● (iii) The class that contains the
cumulative frequency N/2 is called the
median class.
● (iv) Find the median by using the formula:
$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c$
● Where l = Lower limit of the median class,
● f = Frequency of the median class
● c = Width of the median class,
● N = The total frequency (∑f)
● m = cumulative frequency of the class preceding the median class
Image Source: https://www.onlinemath4all.com/images/formulaformedianiofgroupeddata.png
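A minimal sketch of the grouped-data procedure, assuming the standard formula Median = l + ((N/2 - m) / f) × c with the symbols defined above. The function name `grouped_median` is hypothetical, and the sample data is borrowed from the continuous series example earlier in this module (classes 0-10, 10-20, 20-30, 30-40 with frequencies 2, 5, 1, 3).

```python
def grouped_median(classes, frequencies):
    """classes: (lower, upper) limits in ascending order; frequencies: class counts."""
    N = sum(frequencies)
    if N == 0:
        raise ValueError("no observations")
    cumulative = 0
    for (lower, upper), f in zip(classes, frequencies):
        # The median class is the first class whose cumulative frequency reaches N/2.
        if cumulative + f >= N / 2:
            m = cumulative        # cumulative frequency before the median class
            c = upper - lower     # width of the median class
            return lower + (N / 2 - m) * c / f
        cumulative += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]
median = grouped_median(classes, frequencies)
print(median)  # 17.0
```

Here N = 11, so N/2 = 5.5 falls in the class 10-20 (cumulative frequency 7), giving 10 + (5.5 - 2) × 10 / 5 = 17.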
Solved Examples on Median of Grouped Data
● A researcher studying the behavior of mice has recorded the time (in seconds) taken by each mouse to locate its food, considering 13 different mice: 31, 33, 63, 33, 28, 29, 33, 27, 27, 34, 35, 28, 32. Find the median time that the mice spent searching for food.
● Ascending order of the given data is
● 27, 27, 28, 28, 29, 31, 32, 33, 33, 33, 34, 35, 63
● Middle value is the 7th observation, so the median is 32 seconds.
Mode in Raw and Grouped
Data
Finding the Mode in Raw Data
● To find the mode, or modal value, it is

best to put the numbers in order. Then
count how many of each number. A
number that appears most often is the
mode.
● 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14,
12, 56, 23, 29
● In order these numbers are:

● 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23,

29, 39, 40, 56
● This makes it easy to see which
numbers appear most often.
● In this case the mode is 23.

Finding the Mode in Grouped Data
● In some cases (such as when all values

appear the same number of times) the
mode is not useful. But we can group
the values to see if one group has more
than the others.
● Example: {4, 7, 11, 16, 20, 22, 25, 26,
33}
● Each value occurs once, so let us try to
group them.
● We can try groups of 10:

● 0-9: 2 values (4 and 7)
● 10-19: 2 values (11 and 16)
● 20-29: 4 values (20, 22, 25 and 26)
● 30-39: 1 value (33)
● In groups of 10, the "20s" appear most
often, so we could choose 25 (the
middle of the 20s group) as the mode.
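The grouping step can be sketched as follows, using the example's values and a group width of 10. Integer division by the width assigns each value to its group (0 for 0-9, 1 for 10-19, and so on); the variable names are illustrative.

```python
from collections import Counter

values = [4, 7, 11, 16, 20, 22, 25, 26, 33]
width = 10

# Count how many values fall into each group of 10.
counts = Counter(v // width for v in values)

# The modal group is the one with the most values.
modal_group, modal_count = counts.most_common(1)[0]
lower = modal_group * width
midpoint = lower + width // 2   # the "middle of the 20s group"

print(lower, modal_count)  # 20 4
print(midpoint)            # 25
```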
Standard Deviation
Standard Deviation Formulas
● The Standard Deviation is a measure of

how spread out numbers are.
● Here we explain the formulas.
● The symbol for Standard Deviation is σ
(the Greek letter sigma).

● This is the formula for Standard Deviation:
$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
Image Source:
https://www.mathsisfun.com/data/images/standard-deviation-formula.gif
Standard Deviation
Steps for Standard Deviation
● Say we have a bunch of numbers like 9,

2, 5, 4, 12, 7, 8, 11.
● To calculate the standard deviation of
those numbers:
● 1. Work out the Mean (the simple
average of the numbers)
● 2. Then for each number: subtract the
Mean and square the result
● 3. Then work out the mean of those
squared differences.
● 4. Take the square root of that and we are done!
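The four steps can be followed line by line in a short sketch, using the numbers from the example. This computes the population standard deviation (dividing by the number of values), which is what the steps describe.

```python
import math

numbers = [9, 2, 5, 4, 12, 7, 8, 11]

# Step 1: work out the mean.
mean = sum(numbers) / len(numbers)                    # 7.25

# Step 2: for each number, subtract the mean and square the result.
squared_diffs = [(x - mean) ** 2 for x in numbers]

# Step 3: work out the mean of those squared differences (the variance).
variance = sum(squared_diffs) / len(squared_diffs)    # 10.4375

# Step 4: take the square root.
std_dev = math.sqrt(variance)

print(mean, variance)     # 7.25 10.4375
print(round(std_dev, 2))  # 3.23
```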
Variance
What is Variance?
● Variance is the expected value of the squared deviation of a random variable from its mean.
● In short, it is the measurement of the distance of a
set of random numbers from their collective
average value.
● Variance is used in statistics as a way of better
understanding a data set's distribution.
Image Source:
https://365datascience.com/wp-content/uploads/2018/09/image7.jpg
How does Variance work?
● Variance is the square of the standard deviation of a variable, and it equals the covariance of the variable with itself.
$\sigma^2 = \frac{\sum (x - u)^2}{N}$
● In the formula above, u represents the mean of the data points, x is the value of an individual data point, and N is the total number of data points.
Image Source: https://images.deepai.org/glossary-terms/variance-6302132.jpg

How to Calculate Variance?
● Steps to Calculate Variance:
1. List the elements of the data set. The following are the ages of students pursuing a Master's degree:
Data set 1: 28, 25, 26, 27, 31, 32, 24
2. Calculate the mean.
● (28 + 25 + 26 + 27 + 31 + 32 + 24) / 7 = 27.57
Image Source:
https://www.onlinemathlearning.com/image-files/population-mean.png
● Find the deviation from the mean for each data point.
Image Source:
https://s3.amazonaws.com/acadgildsite/wordpress_images/Data+Science/varianc
● Square each deviation.
How to Calculate Variance? (Continued)
● The average of all squared differences is the variance. To find it, add all squared differences and divide the sum by the number of elements in the data set (n).
⇒ (0.1849 + 6.6049 + 2.4649 + 0.3249 + 11.7649 + 19.6249 + 12.7449) / 7
⇒ 53.7143 / 7 = 7.6735
⇒ Variance = 7.6735
● To find the standard deviation in ages of students pursuing a Master's, we calculate the square root of the variance.
⇒ Standard Deviation = sqrt(Variance) = sqrt(7.6735) ≈ 2.77
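The ages example can be verified with the Python standard library. `statistics.pvariance` and `statistics.pstdev` compute the population variance and standard deviation (dividing by n), which matches the steps described here. The exact variance is 376/49, about 7.6735.

```python
import statistics

ages = [28, 25, 26, 27, 31, 32, 24]

mean_age = statistics.mean(ages)        # 193 / 7, about 27.57
variance = statistics.pvariance(ages)   # mean of the squared deviations
std_dev = statistics.pstdev(ages)       # square root of the variance

print(round(mean_age, 2))   # 27.57
print(round(variance, 4))   # 7.6735
print(round(std_dev, 2))    # 2.77
```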
Applications of Variance
● Variance plays a major role in interpreting data in statistics.
● The most common application of
variance is in polls.
● For opinion polls, the data gathering
agencies cannot invest in collecting data
from the entire population.
● They set criteria for sampling the
population based on ethnicity, income
group, regions, education level, salary
and religion, so that the population is
Image Source:
https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uplo
Properties of Variance and
standard deviation
● Properties
Variance is a ofnumerical
Variancevalue that
describes the variability of observations
from its arithmetic mean.
● Variance is nothing but an average of
squared deviations.
● Variance is denoted by sigma-squared (σ²).
● Variance is expressed in square units
which are usually larger than the values
in the given dataset.
● Variance measures how far individuals
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
standard deviation
Properties of Variance
(Continued)
● In statistics, variance is defined as the measure of variability that represents how far members of a group are spread out.
● It finds out the average degree to which
each observation varies from the mean.
● When the variance of a data set is small, the data points lie close to the mean, whereas a larger variance indicates that the observations are very dispersed around the mean.
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
standard deviation
Properties of Standard Deviation
● Standard deviation is a measure that

quantifies the amount of dispersion of
the observations in a dataset.
● A low standard deviation indicates that the scores are close to the arithmetic mean, and a high standard deviation indicates that the scores are dispersed over a wider range of values.
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
standard deviation
Properties of Standard Deviation
(Continued)
● Standard deviation is a measure of the dispersion of observations within a data set relative to their mean.
● The standard deviation is the root mean square deviation.
● Standard deviation is denoted by sigma (σ).
● Standard deviation is expressed in the same units as the values in the data set.
● Standard Deviation measures how much
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
standard deviation
Example : To find Standard Deviation

and Variance
● Marks scored by a student in five
subjects are 60, 75, 46, 58 and 80
respectively.
● You have to find out the standard
deviation and variance.
● First of all, you have to find out the
mean,
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
standard deviation
Example : To find Standard Deviation

and Variance
● Now calculate the variance
● Where, X = Observations
● A = Arithmetic Mean
● Both variance and standard deviation are never negative.
● If all the observations in a data set are
identical, then the standard deviation
and variance will be zero.
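The marks example can be reproduced with Python's standard `statistics` module. This is a minimal sketch assuming the population formulas (the squared deviations of X from A are averaged over all five marks); a sample-based calculation that divides by n - 1 would give larger values.

```python
import statistics

marks = [60, 75, 46, 58, 80]  # marks in five subjects, from the example

mean = statistics.mean(marks)            # A, the arithmetic mean
variance = statistics.pvariance(marks)   # average of (X - A)^2 over all marks
std_dev = statistics.pstdev(marks)       # square root of the variance

print(mean, round(variance, 2), round(std_dev, 2))  # 63.8 150.56 12.27

# If all observations are identical, both measures are zero.
identical = [50, 50, 50]
zero_variance = statistics.pvariance(identical)
zero_std_dev = statistics.pstdev(identical)
```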
Image Source:
standard deviation
Difference between Standard

Deviation and Variance
Image Source:
OLAP Concept
What is OLAP?
● Online Analytical Processing (OLAP) is
a category of software that allows users
to analyze information from multiple
database systems at the same time.
● It is a technology that enables analysts
to extract and view business data from
different points of view.
● Analysts frequently need to group,
aggregate and join data.
● These operations in relational databases
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
OLAP Concept
OLAP Cube
● OLAP databases are divided into one or more cubes.
● The cubes are designed in such a way
that creating and viewing reports
become easy. The OLAP cube is a data
structure optimized for very quick data
analysis.
● The OLAP Cube consists of numeric
facts called measures which are
categorized by dimensions.
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
OLAP Concept
OLAP Cube
(Continued)
How it Works?
● A data warehouse extracts information from multiple data sources and formats such as text files, Excel sheets, multimedia files, etc.
● The extracted data is cleaned and transformed. Data is loaded into an OLAP server (or OLAP cube) where information is pre-calculated in advance
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
OLAP Concept
Basic analytical operations of OLAP
Four types of analytical operations in OLAP

are:
● Roll-up
● Drill-down
● Slice and dice
● Pivot (rotate)
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
OLAP Concept
Rollup
• Roll-up is also known as "consolidation"

or "aggregation." The Roll-up operation
can be performed in 2 ways
1. Reducing dimensions
2. Climbing up the concept hierarchy. A concept hierarchy is a system of grouping things based on their order or level.
Image Source:
OLAP Concept
Rollup
(Continued)
• In this example, the cities New Jersey and Los Angeles are rolled up into the country USA.
• The sales figures of New Jersey and Los Angeles are 440 and 1560 respectively. They become 2000 after roll-up.
• In this aggregation process, the location hierarchy moves up from city to country.
• In the roll-up process, one or more dimensions need to be removed.
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
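The roll-up in this example can be sketched in a few lines. The sales figures are the ones from the slide; the dictionary layout and the `roll_up` function are hypothetical names used for illustration, not the API of any OLAP product.

```python
from collections import defaultdict

# City-level sales from the slide.
city_sales = {"New Jersey": 440, "Los Angeles": 1560}

# Concept hierarchy: each city belongs to a country.
city_to_country = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, hierarchy):
    """Aggregate a measure one level up the concept hierarchy."""
    totals = defaultdict(int)
    for member, amount in sales.items():
        totals[hierarchy[member]] += amount
    return dict(totals)


country_sales = roll_up(city_sales, city_to_country)
print(country_sales)  # {'USA': 2000}
```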
OLAP Concept
Drilldown
• In drill-down, data is fragmented into smaller parts. It is the opposite of the roll-up process. It can be done via:
1. Moving down the concept hierarchy
2. Increasing a dimension
Image Source:
OLAP Concept
Drill Down
(Continued)
Consider the diagram :
1. Quarter Q1 is drilled down to the months January, February, and March. The corresponding sales are also registered.
2. In this example, the dimension months is added.
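Drill-down can be sketched as the reverse of aggregation: the quarter total is broken out by month. The monthly figures below are hypothetical, since the values in the slide's diagram are not reproduced in the text, and `drill_down` is an illustrative helper rather than a real OLAP call.

```python
# Hypothetical monthly sales keyed by (quarter, month).
monthly_sales = {
    ("Q1", "January"): 150,
    ("Q1", "February"): 100,
    ("Q1", "March"): 200,
    ("Q2", "April"): 120,
}


def drill_down(facts, quarter):
    """Move down the time hierarchy: show one quarter at month level."""
    return {month: amount for (q, month), amount in facts.items() if q == quarter}


q1_by_month = drill_down(monthly_sales, "Q1")
q1_total = sum(q1_by_month.values())

print(q1_by_month)  # {'January': 150, 'February': 100, 'March': 200}
print(q1_total)     # 450
```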
Image Source:
OLAP Concept
Slice
• In a slice, one dimension of the cube is fixed to a single value, which results in a new sub-cube with one dimension fewer.
Image Source: https://www.guru99.com/online-analytical-processing.html

OLAP Concept
Slice
(Continued)
Consider the diagram :
● Dimension Time is Sliced with Q1 as the

filter.
● A new cube is created altogether.
Image Source: https://www.guru99.com/online-analytical-processing.html

OLAP Concept
Dice
• This operation is similar to a slice. The

difference in dice is you select 2 or more
dimensions that result in the creation of
a sub-cube.
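The difference between slice and dice can be sketched with a small in-memory cube. The records, dimension names, and helper functions below are hypothetical and only illustrate the idea: a slice fixes one dimension to one value, while a dice restricts two or more dimensions.

```python
# Hypothetical fact records: three dimensions (time, location, item)
# and one measure (sales).
cube = [
    {"time": "Q1", "location": "Toronto", "item": "Mobile", "sales": 605},
    {"time": "Q1", "location": "Vancouver", "item": "Modem", "sales": 825},
    {"time": "Q2", "location": "Toronto", "item": "Mobile", "sales": 680},
    {"time": "Q2", "location": "Chicago", "item": "Phone", "sales": 952},
]


def slice_cube(facts, dimension, value):
    """Slice: fix a single dimension to a single value."""
    return [row for row in facts if row[dimension] == value]


def dice_cube(facts, criteria):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return [
        row
        for row in facts
        if all(row[dim] in allowed for dim, allowed in criteria.items())
    ]


q1_slice = slice_cube(cube, "time", "Q1")
sub_cube = dice_cube(cube, {"time": {"Q1", "Q2"}, "location": {"Toronto"}})

print([row["sales"] for row in q1_slice])  # [605, 825]
print([row["sales"] for row in sub_cube])  # [605, 680]
```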
Image Source: https://www.guru99.com/online-analytical-processing.html

OLAP Concept
Pivot
● In Pivot, you rotate the data axes to

provide a substitute presentation of
data.
● In the following example, the pivot is
based on item types.
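A pivot can be sketched as swapping the row and column axes of a two-dimensional view. The figures and the `pivot` helper are hypothetical; the measure values stay the same and only the presentation changes.

```python
# Hypothetical view: rows are locations, columns are item types.
by_location = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 818, "Modem": 746},
}


def pivot(view):
    """Rotate the axes so that former columns become rows."""
    rotated = {}
    for row_key, columns in view.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_item = pivot(by_location)
print(by_item["Mobile"])  # {'Toronto': 605, 'Vancouver': 818}
```

Applying `pivot` twice returns the original view, which is one way to see that no data is added or lost by the rotation.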

OLAP Concept
Types of OLAP systems

● ROLAP
● MOLAP
● HOLAP
● WOLAP
● DOLAP
● SOLAP
OLAP Concept
What is ROLAP?
● Relational Online Analytical Processing (ROLAP).
● ROLAP is an extended RDBMS along
with multidimensional data mapping to
perform the standard relational
operation.
● ROLAP works with data that exist in a
relational database.
● Facts and dimension tables are stored as relational tables. It also allows
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
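The idea that ROLAP keeps facts and dimensions in ordinary relational tables can be sketched with Python's built-in `sqlite3` module. The table and column names are hypothetical, and SQLite only stands in for whatever RDBMS a real ROLAP server would use; the point is that a multidimensional question becomes a standard SQL join with GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table and one fact table, both plain relational tables.
conn.executescript(
    """
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        city TEXT,
        country TEXT
    );
    CREATE TABLE fact_sales (
        location_id INTEGER REFERENCES dim_location(location_id),
        amount INTEGER
    );
    INSERT INTO dim_location VALUES (1, 'New Jersey', 'USA');
    INSERT INTO dim_location VALUES (2, 'Los Angeles', 'USA');
    INSERT INTO fact_sales VALUES (1, 440);
    INSERT INTO fact_sales VALUES (2, 1560);
    """
)

# Sales by country, expressed as a relational join and aggregation.
rows = conn.execute(
    """
    SELECT d.country, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_location AS d ON d.location_id = f.location_id
    GROUP BY d.country
    """
).fetchall()
conn.close()

print(rows)  # [('USA', 2000)]
```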
OLAP Concept
Advantages of ROLAP
● High data efficiency. It offers high data

efficiency because query performance
and access language are optimized
particularly for the multidimensional data
analysis.
● Scalability. This type of OLAP system offers scalability for managing large volumes of data, even when the data is steadily increasing.
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
OLAP Concept
Disadvantages of ROLAP
● Demand for higher resources: ROLAP
needs high utilization of manpower,
software, and hardware resources.
● Aggregate data limitations. ROLAP
tools use SQL for all calculation of
aggregate data. However, there are no
set limits to the for handling
computations.
● Slow query performance. Query performance in this model is slow when
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
OLAP Concept
What is MOLAP?
● MOLAP uses array-based

multidimensional storage engines to
display multidimensional views of data.
Basically, they use an OLAP cube.
Image Source::
OLAP Concept
What is Hybrid OLAP?
● Hybrid OLAP is a mixture of both

ROLAP and MOLAP.
● It offers fast computation of MOLAP and
higher scalability of ROLAP. HOLAP
uses two databases.
● Aggregated or computed data is stored
in a multidimensional OLAP cube
● Detailed information is stored in a relational database.
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
OLAP Concept
Benefits of Hybrid OLAP
● This kind of OLAP helps to economize disk space, and it also remains compact, which helps to avoid issues related to access speed and convenience.
● HOLAP uses cube technology, which allows faster performance for all types of data.
● ROLAP data is updated instantly, and HOLAP users have access to this real-time data. MOLAP
Image Source:
OLAP Concept
OLAP tools
● OLAP tools include IBM Cognos, MicroStrategy, Palo OLAP Server, Apache Kylin, Oracle OLAP, icCube, Pentaho BI, JsHypercube, etc.
● We can apply security restrictions on users and objects using OLAP tools.
● It creates a single platform for planning, forecasting, reporting, and analysis.
Image Source: https://www.educba.com/olap-tools/?source=leftnav
OLAP Concept
Advantages of OLAP
● OLAP is a platform for all types of business needs, including planning, budgeting, reporting, and analysis.
● Information and calculations are
consistent in an OLAP cube. This is a
crucial benefit.
● Quickly create and analyze "What if"
scenarios
● Easily search OLAP database for broad
or specific terms.
● OLAP provides the building blocks for
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
OLAP Concept
Advantages of OLAP
(Continued)
● Allows users to slice and dice cube data by various dimensions, measures, and filters.
● It is good for analyzing time series.
● Finding some clusters and outliers is
easy with OLAP.
● It is a powerful online analytical processing and visualization system that provides faster response times
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
OLAP Concept
Disadvantages of OLAP
● OLAP requires organizing data into a
star or snowflake schema. These
schemas are complicated to implement
and administer
● You cannot have a large number of dimensions in a single OLAP cube.
● Transactional data cannot be accessed with an OLAP system.
● Any modification in an OLAP cube needs a full update of the cube. This is a
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
OLTP Concept
Overview of OLTP
● OLTP or Online Transaction Processing

is a type of data processing approach,
where the transactions play the major
role for data manipulation in the
database.
● This type of data processing is known for its high performance, faster accessibility, and reliable and consistent data.
Image Source:
https://www.opentextbooks.org.hk/system/files/resource/25/25212/25291/media/image58.JPG
OLTP Concept
Understanding OLTP
● In the case of online airline booking, we need to book a flight, which corresponds to an insertion in the database.
● OLTP ensures availability in the cart and handles concurrency when a large number of users are accessing the same website at the same time.
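The booking example can be sketched as a short transaction using Python's built-in `sqlite3` module. The `bookings` table, its columns, and the `book_seat` helper are hypothetical; SQLite only stands in for a production OLTP database. The primary key on (flight, seat) lets the database itself reject a second booking of the same seat, and the transaction either commits fully or is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " flight TEXT, seat TEXT, passenger TEXT,"
    " PRIMARY KEY (flight, seat))"
)


def book_seat(connection, flight, seat, passenger):
    """Insert one booking atomically; return False if the seat is taken."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "INSERT INTO bookings VALUES (?, ?, ?)",
                (flight, seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = book_seat(conn, "AI101", "12A", "Asha")
second = book_seat(conn, "AI101", "12A", "Ravi")  # same seat, rejected
booked = conn.execute("SELECT passenger FROM bookings").fetchall()

print(first, second)  # True False
print(booked)         # [('Asha',)]
```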

OLTP Concept
Characteristics of OLTP
● 3NF databases
● Predefined operations
● Updating of databases is directly

accessible to end users.
● A small number of records

Image Source: https://www.educba.com/what-is-oltp/
● Maintaining historical data
OLTP Concept
How does OLTP make working so

easy
● Online transaction processing is concerned with concurrency and atomicity.
● OLTP stores less historical data, which makes it efficient.
● It maintains the consistency and concurrency of the data in the databases.
OLTP Concept
What can you do with OLTP?
● Its goals are availability, speed, concurrency, and recoverability.
● A large number of users can conduct
short transactions using OLTP systems.
● We can design systems for operations whose database queries are usually simple, require sub-second response times, and return comparatively few records.
OLTP Concept
Working with OLTP
● It involves gathering information as

input, processing the data according to
needs and updating data to reflect the
processing information.
● For various decentralized database systems, OLTP brokering programs distribute transaction processing among multiple computers on a network.
● OLTP is also carried into the service-oriented architecture (SOA) and Web services.
OLTP Concept
OLTP Advantages
● Concurrency
● ACID compliance
● Availability
● Integrity

OLTP Concept
OLTP Disadvantages
● To provide such concurrency, availability, and fast transactions, OLTP often requires support for transactions that span the networks of many companies.
● Thus in today’s era, we require a more
decentralized system.

OLTP Concept
Why should we use OLTP?
● To use less paper and make a faster,

more accurate prediction of revenues
and expenses.
● The system that requires offline
maintenance makes a good requirement
for online transaction processing.
● Availability, concurrency, and atomicity
of data are much more important.

OLTP Concept
Why do we need OLTP?
● OLTP to perform the tasks

● Maintains normalized databases
● Decentralized system
● Business intelligence tasks

Analyze data using
statistical and data mining
techniques for business
intelligence.

● BI component framework
● business intelligence for management
● operational BI
● BI for process and performance improvement
● Role of Business Intelligence in Improving customer experience
● business intelligence role and responsibilities
● Popular BI tools in the market.

BI component framework
Architecture
● Architecture and components of a BI

system
Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

Architecture Components
Data Warehouse
● Data warehouse is the core of the BI

system.
● A data warehouse is a database built for
the purpose of data analysis and
reporting
Image Source:

Extract Transform Load
● It is very likely that more than one system acts as the source of data required for the BI system.
● Data is extracted from these sources, transformed into a consistent form, and finally loaded into the data warehouse; this process is called Extract Transform Load (ETL).
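The three ETL stages can be sketched end to end with the standard library. Everything below is a hypothetical illustration: the two "source systems" are an inline CSV string and a list of records, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical source systems with different layouts and units.
crm_csv = "customer,amount\nalice, 120.50\nbob,80\n"
shop_rows = [{"name": "ALICE", "total_cents": 9950}]


def extract():
    """Extract: read raw records from every source."""
    for row in csv.DictReader(io.StringIO(crm_csv)):
        yield {"customer": row["customer"], "amount": row["amount"]}
    for row in shop_rows:
        yield {"customer": row["name"], "amount": row["total_cents"] / 100}


def transform(record):
    """Transform: trim and lower-case names, convert amounts to numbers."""
    return (record["customer"].strip().lower(), float(record["amount"]))


def load(connection, rows):
    """Load: write the cleaned rows into the warehouse table."""
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
load(warehouse, [transform(record) for record in extract()])

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 220.0), ('bob', 80.0)]
```

After loading, the two spellings of the same customer are reported as one row, which is the kind of consistency the transform step is meant to provide.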
Image Source:

Data model – BISM
● This layer, which we call the data model,

contains a file-based or memory-based
model of the data for producing very
quick responses to reports.
Image Source:

Data visualization
● The frontend of a BI system is data
visualization. In other words, data
visualization is a part of the BI system
that users can see.
● There are different methods for visualizing information, such as strategic and tactical dashboards, Key Performance Indicators (KPIs), and detailed or consolidated reports.
Image Source:
-and-its-architecture.html

Master Data Management
● Master Data Management (MDM) is the

process of maintaining the single
version of truth for master data entities
through multiple systems.
Image Source:

Data Quality Services
● The quality of data is different in each

operational system, especially when we
deal with legacy systems or systems
that have a high dependence on user
inputs.
Image Source:

Business intelligence for
management
BI Management
● BI Management ensures the

management and steering of business
intelligence and of the organizational
units involved as well as the integration
into an existing expert, technical and
organizational BI environment.
Image Source:
https://www.cubeserv.com/en/services/business-intelligence-ma
nagement/

management
What does Business Intelligence
Management include?
● Four components: analysts, data

solutions, decision making, and
oversight.
Image Source: https://www.betterbuys.com/bi/business-intelligence-management-optimizing-bi/

management
BI Governance
● BI governance defines the rules

according to which business intelligence
is steered, organized, implemented, and
developed further.
Image Source:
nagement/

management
BI Awareness
● BI awareness describes the

company-wide understanding of BI.
Uniform and consistent BI
understanding forms the basis for
successful BI projects.
Image Source:
nagement/

management
BI Strategy
● A BI strategy must be developed,

adapted, and updated continuously.
● This means identifying the ambitions of the BI sponsors and, based on this, defining the strategy-relevant initial situation from which concrete goals can be derived.
Image Source:
https://www.cubeserv.com/en/services/business-intelligence-ma
nagement/

management
BI Organization
● The BI Competence Centre is an

organizational unit that, ideally, will be a
service-providing division that is part of
a certain management field.
Image Source:
nagement/

management
BI Requirements Engineering
● Identification of BI-specific requirements

and their distinction from other projects
Image Source:
nagement/

Operational BI
Introduction
● Operational business intelligence (OBI)

systems provide an intermediate step
toward satisfying the strategic needs
that data warehouses address as well
as the tactical decision-making that
enterprise application integration (EAI)
addresses

Operational BI
Business Operations
● Hourly/daily minibatches of transactions are sent to the OBI system, which first logs the transactions in a transaction database and then processes changes in a data-mining engine. From this data, the OBI system runs its rules-based detection system and generates a suspected fraud report.
Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/o
perational-bi/
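The minibatch flow described above can be sketched as follows. The transactions, the two rules, and all names are hypothetical and are not taken from any vendor's system; a real OBI deployment would use a transaction database and a data-mining engine rather than an in-memory list.

```python
# Hypothetical minibatch of card transactions.
minibatch = [
    {"card": "A", "amount": 25.0, "country": "IN"},
    {"card": "A", "amount": 9800.0, "country": "IN"},
    {"card": "B", "amount": 40.0, "country": "IN"},
    {"card": "B", "amount": 35.0, "country": "BR"},
]

transaction_log = []  # stands in for the transaction database

# Hypothetical detection rules: each takes a transaction and the log so far.
RULES = {
    "large_amount": lambda txn, history: txn["amount"] > 5000,
    "country_change": lambda txn, history: any(
        prev["card"] == txn["card"] and prev["country"] != txn["country"]
        for prev in history
    ),
}


def process_minibatch(batch):
    """Log each transaction, run the rules, and build a suspected fraud report."""
    report = []
    for txn in batch:
        transaction_log.append(txn)  # step 1: log the transaction
        hits = [name for name, rule in RULES.items() if rule(txn, transaction_log)]
        if hits:  # step 2: rules-based detection
            report.append((txn["card"], txn["amount"], hits))
    return report


suspected = process_minibatch(minibatch)
print(suspected)
# [('A', 9800.0, ['large_amount']), ('B', 35.0, ['country_change'])]
```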

Operational BI
Business Operations(Contd..)
● Business intelligence (BI) that helps drive and optimize business operations on a daily basis, and is sometimes used for intra-day decision-making, is called operational business intelligence.
Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/o
perational-bi/

Operational BI
Business Operations(Contd..)
● Conceptually, OBI systems are thought

of as a data mart that is updated
frequently (daily, every few hours, or
even every few minutes or seconds)
with minibatches.
● OBI systems are similar to data marts because they generally focus on a specific task rather than on enterprise-wide functions.
Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/o
perational-bi/

Operational BI
Case Study : Real-Time Credit and
Debit Card Fraud Detection, an
HPE Shadowbase
● A complex suspicious or fraudulent activity determination is made and action taken while a transaction is in the process of being gathered, routed, authorized, and returned to the origination point, or shortly thereafter, typically far sooner than otherwise achievable.
Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/operational-bi/

BI for process and
performance improvement
What is Business Intelligence
● Business intelligence (BI) is software

and services that take raw data and turn
it into relevant and practical insights that
companies can use to strengthen their
positions and business decisions.
● BI tools analyze large sets of data based
on queries that are written to fetch
specific types of information.
Image
Source:https://www.digitalvidya.com/wp-content/uploads/2018/05/What-is-Busine
BI for process and
What is Business Intelligence
● The results are then formatted and

displayed as summaries, graphs,
reports, charts, and maps for further
analysis for decision making.
● There are many company benefits when
it comes to gathering customer and
competitor data. Let's explore the
benefits of using BI for improving
internal workflows.
Image Source:
https://www.digitalvidya.com/wp-content/uploads/2018/05/What-is-Business-Intelli
BI for process and
Benefits of using business intelligence
for internal process improvement
● Change in your industry and even global

changes will affect business processes
at some point, and this means your
business will need to evolve its
processes to remain competitive. Using
BI to gather workflow-based information
offers many benefits, including these:
Image Source:
https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-of-bi-tool
BI for process and
● Reducing the time it takes to make

decisions
● Optimizing internal processes to help
your employees focus on higher-value
work
● Reducing the time it takes to get your
product or service to market.
Image Source:
BI for process and
● Increasing customer satisfaction through

improved efficiencies and better service
● Freeing up more time to focus on other
things, like quality and customer
retention initiatives
● Overall improved operational efficiency
and agility
Image
Source:https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-o
Role of Business
Intelligence in Improving
customer experience
Visualizing with Big Data
● As the old saying goes, ‘a picture is
worth a thousand words,’ and the same
can be said for understanding data.
● Visualization tools help organize
extremely large, fast-moving data sets in
real-time to understand the current state
of the customer experience.
● “[Data imaging] tools are very helpful in drawing attention to critical points in the customer journey and experience, and pulling out some actionable insights,” advises
Image Source:
https://www.scnsoft.com/blog-pictures/business-intelligence/big-data-visualization
Role of Business
customer experience
Enabling self-service
● Jerry Leisure, head of customer experience at mobile-gaming company Kabam, says the gaming industry has a well-developed player community that altruistically wants to help fellow gamers.
● Traditionally, this kind of crowd-sourced
self-support has lived in player forums
unaffiliated with the game maker itself.
Image
Source:https://www.bigmountainanalytics.com/wp-content/uploads/2019/04/worki
Role of Business
customer experience
Enabling self-service
● Companies can also use BI to

collaborate with influential customers.
● “We [can] provide helpful information to
content creators on YouTube, for
example, who then share that
information with other players,” says
Leisure.
Image
Source:https://www.bigmountainanalytics.com/wp-content/uploads/2019/04/worki
Role of Business
customer experience
Leveraging artificial Intelligence
● AI is already helping to improve the

traveler experience at various
touchpoints, explains Erica Ellington,
director of projects and support for
Southwest Airlines.
● “Business intelligence enables us to
leverage both structured and
unstructured data to help us make more
informed decisions to improve the
customer experience.”
Image Source:
https://emerj.com/wp-content/uploads/2017/05/Artificial-Intelligence-in-Business-I
Role of Business
customer experience
Leveraging artificial Intelligence
● McCallister of CX University adds that in

order for AI and BI to effectively discover
customer insights together, more data
needs to become structured (i.e.,
well-organized, uniform information).
● To transform data from unstructured to
structured requires capturing, tagging
and classifying as much data as
possible.
Image
Source:https://emerj.com/wp-content/uploads/2017/05/Artificial-Intelligence-in-Bus
Business Intelligence Role
and Responsibilities
The Role of Business Intelligence
● Business intelligence, or BI, is a type of

software that can harness the power of
data within an organization.
● It offers a better way to sort, compare,
and review data in order for companies
to make smart decisions.
Image
Source:https://expert360.com/sites/default/files/media_embed/2019-02/blog.1.png
● Companies adopting business

intelligence solutions can turn business
data into insights and take plausible
action.
● These insights can help companies
make strategic business decisions that
increase productivity, improve revenues,
and enhance growth
Image
BI lives Up to Its Reputation
● There are many reasons why

companies choose business intelligence
solutions.
○ Better planning and analysis
○ Increased accuracy
○ Helped considerably with sales
forecasting
○ Improved pricing and offers
Image Source:
https://specials-images.forbesimg.com/imageserve/5ed9c9ad6d83330007c8079d/
Better planning and analysis
● Companies felt that BI systems helped

them the most with faster reporting,
planning, and analysis.
● 64% of responding companies ranked
their ability to report, plan and analyze
data as “good” after implementing a
business intelligence suite.
Image Source:
https://thebusinessanalystjobdescription.com/wp-content/uploads/2015/01/Busine
Increased Accuracy
● Among the companies surveyed, 56%

felt that business intelligence data
increased the accuracy of their business
analysis and planning.
Image
Source:https://cdn.datafloq.com/cache/blog_pictures/878x531/6-data-analytics-bu
Helped considerably with sales
forecasting
● Among the many tasks that companies

felt that business intelligence data
helped with, 57% ranked sales
forecasting and planning as the area
receiving the most benefit from BI data.
● Other areas where they felt that BI data provided assistance were customer behavior analysis (40%) and a unified view of customers (32%).
Image Source:
https://s32519.pcdn.co/wp-content/uploads/2019/12/measuring-forecast-accuracy
Improved pricing and offers
● Pricing and offer optimization benefited

somewhat from the implementation of a
BI system.
● 27% of respondents felt that the
additional data derived from their BI
system helped them improve their
pricing structure to become more
competitive, as well as improve the
attractiveness of their offers.
Image Source
https://images.techhive.com/images/article/2016/02/bi-business-intelligence-ts-10
Popular BI tools in the
market
What are BI Tools
● BI tools are types of software used to

gather, process, analyze, and visualize
large volumes of past, current, and
future data in order to generate
actionable business insights, create
interactive reports, and simplify the
decision-making processes.
Image
Source:https://www.infomazeelite.com/wp-content/uploads/2020/01/Business-intel
market
What are BI Tools
● These tools include key features such

as data visualization, visual analytics,
interactive dashboarding and KPI
scorecards.
● Additionally, they enable users to utilize
automated reporting and predictive
analytics features based on self-service.
Image
Source:https://www.infomazeelite.com/wp-content/uploads/2020/01/Business-intel
market
The benefit of BI tools
● Professional software and tools offer

various prominent benefits, here we will
focus on the most invaluable ones:
○ They bring together all relevant
data
○ Their true self-service analytics
approaches unlock data access
○ Users can take advantage of
predictions
○ They eliminate manual tasks
○ They reduce business costs
Image Source:
market
● The BI tools listed here are leaders in the business intelligence community, are often mentioned in industry articles, and have favorable user reviews on Capterra.
● The order of the tools is random and
doesn't represent a grading or ranking
system in any form.
Image Source:
market
● DATAPINE
● SAS Business Intelligence
● Clear Analytics
● SAP Business Objects
● DOMO
● MicroStrategy
● GoodData
● IBM Cognos Analytics
● QlikView
● Yellowfin BI
Image
Source:https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-o
market
DATAPINE
● Datapine is a BI software that lets you

connect your data from various sources
and analyze with advanced analytics
features (including predictive).
● With your analysis, you can create a
powerful business dashboard (or
several), generate standard or
customized reports or incorporate
intelligent alerts to get notified of
anomalies and targets.
Image Source: https://www.datapine.com/images/datapine-bi-tool.png
market
DATAPINE
● This tool, rated with outstanding 4.6

stars on Capterra, is a powerful solution
for businesses of all sizes since
datapine can be implemented for
various industries, functions, and
platforms, no matter the size.
Image Source: https://www.datapine.com/images/datapine-bi-tool.png

market
DATAPINE
● Key Feature of DATAPINE

○ Intuitive drag-and-drop interface
○ Easy-to-use predictive analytics
○ Many interactive dashboard
features
○ Multiple reporting options
○ Smart insights and alarms based
on artificial intelligence
Image Source:https://www.datapine.com/images/datapine-bi-tool.png
market
SAS BUSINESS INTELLIGENCE
● SAS Business Intelligence is a software

solution offering numerous products and
technologies for data scientists, text
analysts, data engineers, forecasting
analysts, econometricians, and
optimization modelers, among others.
Image Source:https://www.datapine.com/images/sas-business-intelligence.png
market
● Founded in the 70s, SAS Business

Intelligence enjoys a long tradition in the
market, building and expanding its
products every year.
● With a Capterra rating of 4.5*, this
software enjoys a high level of users’
trust and satisfaction.
market
● Key features of SAS Business

Intelligence
○ Data exploration supported by
machine learning
○ Text analytics capabilities
○ Reports and dashboards across
devices
○ Integration with other applications
market
CLEAR ANALYTICS
● Clear Analytics is a tool that

consolidates data from internal systems,
cloud, accounting, CRM, and allows you
to drag-and-drop that data into Excel.
● It works with Microsoft Power BI, using
Power Query and Power Pivot to clean
and model different datasets.
● Capterra gives a high user review of 4.5
stars making this tool also one of the
highest-rated on our list.
Image Source:https://www.datapine.com/images/clear-analytics.png
market
CLEAR ANALYTICS
● Key Features of Clear Analytics:

○ Reports delivered to Power BI
○ Connected with Excel
○ Sharing on mobile devices
○ A full audit trail
○ Fetch data elements with a
semantic layer
Image Source:https://www.datapine.com/images/clear-analytics.png
market
SAP BUSINESS OBJECTS
● SAP BusinessObjects is a business

intelligence suite designed for
comprehensive reporting, analysis, and
data visualization.
● They provide Office integrations with
Excel and PowerPoint where you can
create live presentations and hybrid
analytics that connects to their
on-premise and cloud SAP systems.
Image Source:
https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
market
● They’re focused on business categories

such as CRM and customer experience,
ERP and digital core, HR and people
engagement, digital supply chain, and
many more.
● To be accurate, more than 170M users
leverage SAP across the world, making
it one of the largest software suppliers in
the world.
Image
Source:https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
market
● Key features of SAP Business Objects:

○ A BI enterprise reporting system
○ Self-service, role-based
dashboards
○ Cross-enterprise sharing
○ Connection with SAP Warehouse
and HANA
○ Integration with Office
Image
Source:https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
market
DOMO
● Domo is a BI solution comprised of

multiple systems that are featured in this
platform, starting with connecting the
data, and finishing with extending data
with pre-built and custom apps from the
Domo Appstore.
● You can use Domo also for your data
lakes, warehouses, and ETL tools,
alongside with R or Python scripts to
prepare data for predictive modeling.
Image Source:
https://salestechstar.com/wp-content/uploads/2020/01/Domo-Ranked-Top-Platfor
market
DOMO
● Key features of DOMO:

○ Numerous pre-built cloud
connectors
○ Magic ETL feature
○ Automatically suggested
visualizations
○ Mr. Roboto as an AI engine
○ Domo Appstore
Image Source: https://salestechstar.com/wp-content/uploads/2020/01/Domo-Ranked-Top-
MICROSTRATEGY
● MicroStrategy is an enterprise analytics

and mobility platform focused on hyper
intelligence, federated analytics, and
cloud solutions.
● Their mobile dossiers enable users to
build interactive books of analytics that
render on iOS or Android devices, with
the possibility to extend the
MicroStrategy content into their apps by
using Xcode or JavaScript.
Image Source:https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg
MICROSTRATEGY
● Capterra users gave it a solid 4-star
rating, making it one of our examples of
business intelligence tools with strong
references in the BI market.
Image Source: https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg
MICROSTRATEGY
● Key features of MicroStrategy:

○ Hyperintelligence pulls your data
○ Federated analytics
○ Mobile deployment
○ Integration with voice technology
○ Cloud technology
Image Source: https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg
GOODDATA
● GoodData is a business analytics

software that provides the tools for data
ingestion, storage, analytic queries,
visualizations, and application
integration.
● You can embed their analytics into your
website, desktop or mobile application
or create dashboards and reports for
your daily activities, without the need to
obtain a Ph.D., as stated on their
website.
Image Source: https://images.g2crowd.com/uploads/attachment/file/108698/1.png
GOODDATA
● Key features of GoodData:

○ Customers can publish their own
reports
○ A modular data pipeline
○ A platform for developers
○ Additional support
○ 4 Data centers
Image Source:https://images.g2crowd.com/uploads/attachment/file/108698/1.png
IBM COGNOS ANALYTICS
● Part of the IBM family, IBM Cognos
Analytics is cloud-based business
intelligence software that uses AI
recommendations when you create
dashboards and reports, offers
geospatial capabilities to overlay your
data with the physical world, and lets
you ask questions in plain English to
communicate with the software.
Image Source
https://software-advice.imgix.net/managed/products/screenshots/screenshot_144
● A robust solution from one of the
industry leaders in software
development, IBM Cognos Analytics
received a solid 4-star rating on
Capterra.
● Key features of IBM Cognos Analytics:

○ Search mechanism
○ A single data module
○ Interactive data visualization
○ AI assistant
○ Extensive knowledge center
○ Integration with other applications
QLIKVIEW
● QlikView is one of the BI applications
offered by Qlik as part of its data
analytics platform, focused on rapid
development and guided analytics
applications and dashboards.
● It’s built on an Associative Engine that
allows data discovery without the need
to use query-based tools, eliminating the
risk of data loss and inaccurate results.
Image Source:https://cdn.buttercms.com/q4iN12vOSZKTsjOWdxbB
QLIKVIEW
● With a high rating of 4.5 stars on
Capterra, users are clearly satisfied with
this product and its features, making it
one of the top BI tools on our list.
QLIKVIEW
● Key features of QLIKVIEW:

○ Associative exploration
○ Visually highlighted dashboards
○ Associative Engine
○ A dual-use strategy
○ Developer’s platform
YELLOWFIN BI
● A suite of products consisting of
dashboards, signals, stories, data
discovery and data prep, this BI
analytics tool offers numerous features,
including a mobile app available for both
Android and iOS devices.
● Capterra users gave it a strong rating
of 4.5 stars, so it makes sense to take a
closer look at what it has to offer.
Image Source: https://www.channelfutures.com/files/2018/07/Cloud-Analytics-2018.jpg
YELLOWFIN BI
● Key features of YELLOWFIN BI:

○ Yellowfin signals via smartphone
○ Persuasive data stories
○ Smart tasks
Image Source: https://www.channelfutures.com/files/2018/07/Cloud-Analytics-2018.jpg
Understand case studies for
predictive models

● Concept of data mining techniques

● Concepts of data mining model with its development and deployment in
business scenario
● Data mining models
● CRISP-DM model
● Understanding of data and its preparation techniques for better model
building
● Introduction to sampling and data partitioning in data mining projects

Concept of Data Mining
Techniques
Data Mining concept
● Data mining processes structured

information through the application of
artificial intelligence, neural networks,
and advanced statistical tools in order to
detect patterns and summarize data into
a format that can be understood.
● It allows corporations to anticipate future
trends, uncover new opportunities, and
most importantly improve overall
performance.
Image Source: https://slideplayer.com/slide/10943778/
Data Mining concept (Contd.)

● Data mining is the term used to describe
the process of extracting value from a
database.
● Data mining involves the use of
sophisticated data analysis tools to
discover previously unknown, valid
patterns and relationships in large
data sets.
● Data mining consists of more than
collecting and managing data; it also
includes analysis and prediction.
Image Source:
https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#DMCO
Data mining techniques
– Tracking patterns
● One of the most elementary techniques
in data mining is learning to recognize
patterns in your data sets.
● This is typically the recognition of some
deviation in your data that occurs at
regular intervals, or of a variation in a
certain variable over time.
Image Source: https://www.guru99.com/data-mining-tutorial.html
– Classification
● It is a classic data mining technique

based on machine learning.
● It is used to classify each item in a set of
data into one of a predefined set of
classes or groups.
● Classification methods make use of
mathematical techniques such as
decision trees, linear programming,
neural networks, and statistics.
Image Source: https://www.guru99.com/data-mining-tutorial.html
– Association
● Association data mining notices

recurring themes in databases,
recognizes relations between them and
develops a pattern of these relations.
● It will then use these patterns as a
reference to predict future behaviour.
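As a hedged illustration of how such a recurring relation can be quantified, the sketch below computes two commonly used measures, support and confidence, for a rule such as "bread implies milk". The basket data and the function names are hypothetical and are not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical shopping baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # how often bread buyers also buy milk
```

A rule with high support and high confidence is a candidate pattern to use as a reference for predicting future behaviour.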

– Outlier Detection
● An outlier is defined as an observation
that deviates too much from other
observations. The identification of
outliers can lead to the discovery of
useful and meaningful knowledge.
● In many cases, simply identifying the
overall pattern cannot give you a clear
understanding of your data set. You also
need to be able to identify
irregularities or outliers in your data.
Image Source: https://www.guru99.com/data-mining-tutorial.html
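One simple way to flag observations that deviate too much from the others is a z-score rule. The sketch below is only an illustration under stated assumptions: the threshold of 2 standard deviations and the sample figures are hypothetical choices, not values from the slides.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:  # all values identical, so nothing can deviate
        return []
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [10, 12, 11, 13, 12, 11, 50]
print(find_outliers(daily_orders))
```

Other rules (for example, ones based on quartiles) are equally valid; the right choice depends on the data.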
– Clustering
● Clustering is a data mining technique
that automatically groups objects with
similar characteristics into meaningful
or useful clusters.
● The clustering technique defines the
classes and puts objects in each
class, while in classification
techniques, objects are assigned to
predefined classes.
– Regression
● Regression, used mainly as a form
of planning and modelling, is used to
estimate the likely value of a certain
variable, given the presence of other
variables.
● Regression and classification are both
used in predictive analysis, but
regression predicts a numeric or
continuous value while classification
assigns data to discrete categories.
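To make the idea of predicting a numeric value concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The advertising and sales figures, and the function names, are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted numeric value for a new input x."""
    return slope * x + intercept


# Hypothetical data: advertising spend versus sales.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept, predict(5, slope, intercept))
```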
– Prediction
● Prediction is a data mining technique
that learns the relationships among
independent variables and between
dependent and independent variables.
● Prediction derives the relationship
between a thing you know and a thing
you need to predict for future reference.
Concepts of data mining
model with its development
and deployment in business
scenario
Phases and Tasks
● It is a step by step procedure for

implementation of data mining in a
business scenario.
● The phases and tasks include –
Business understanding, Data
understanding, Data preparation,
Modelling, Evaluation, Deployment.

Business understanding
● Data mining goals are defined.

● The fundamental requirement is to
understand client and business
objectives.
● The current data mining scenario
should be assessed, taking resources,
constraints and assumptions into
account.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
Data understanding
● In this stage, a sanity check is
conducted on the data to understand
whether it is appropriate for the data
mining goals.
● The data is collected from various
sources within the organization.
● It is a highly complex process, since data
and processes from various sources are
unlikely to match easily.
Data preparation
● The data is made production ready in
this stage.
● The data from diverse sources
should be selected, cleaned,
transformed, formatted, anonymized,
and constructed.
● Data cleaning is a process to "clean" the
data by smoothing noisy data and
filling in missing values.
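The two cleaning operations named above, filling in missing values and smoothing noisy data, can be sketched as follows. Mean imputation and a centred moving average are just one possible choice for each step, and the function names are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    average = sum(known) / len(known)
    return [average if v is None else v for v in values]


def smooth(values, window=3):
    """Centred moving average; windows are truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical sensor readings with a gap and a noisy spike.
print(fill_missing([10, None, 14]))
print(smooth([1, 10, 1]))
```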
Modelling
● In this stage, mathematical models

are used to determine the data
patterns.
● Suitable modelling techniques need to
be chosen for the prepared data set.
● After that, create a scenario to validate
the model. Then run the model on the
prepared data set.
Evaluation
● In this stage, patterns recognized are

examined against business objectives.
● A go or no-go decision should be taken
on moving the model into the deployment
phase.
Deployment
● In this stage, you ship your data mining
discoveries to everyday business
operations.
● A thorough deployment plan for
shipping, maintenance, and monitoring
of data mining discoveries is created.
Data mining models
Types of models
● Data mining models can be broadly

classified into two categories:
○ Predictive Model
○ Descriptive Model
Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
Predictive model
● The predictive model makes a forecast
about unknown data values by
using known values.
● Forecasting is the process of
examining the current and past
states of an attribute in order to predict
its future state.
● The techniques that fall under this
category are classification,
regression and time-series analysis.
Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
Descriptive model
● It identifies the patterns or
relationships in data and discovers
the properties of the data studied.
● These descriptive data mining
techniques are used to obtain
information on the regularity of the data
by using raw data as input and to
discover important patterns.
● Examples include clustering,
summarization, and association rules.
Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
CRISP-DM model
Basic concepts
● It stands for Cross-Industry Standard

Process for Data Mining, an
industry-proven way to guide your data
mining efforts.
● As a methodology, it includes
descriptions of the typical phases of a
project, the tasks involved with each
phase, and an explanation of the
relationships between these tasks.
● As a process model, CRISP-DM
provides an overview of the data mining
life cycle.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
Introduction to sampling and
data partitioning in data
mining projects
Why is it necessary to Partition ?
● For easy management

● To assist backup/recovery
● To enhance performance
Image Source: https://dev.to/alibayatgh/what-is-data-partitioning-171o

Understanding of data and
its preparation techniques
for the better model building
What is data preparation ?
● Data preparation is the process of

cleaning and transforming raw data prior
to processing and analysis.
● It is an important step prior to
processing and often involves
reformatting data, making corrections to
data and the combining of data sets to
enrich data.
Image Source:
https://bigdataanalyticsnews.com/data-preparation-why-is-it-important/
Why prepare data ?
● Data need to be formatted for a given

software tool.
● Data need to be made adequate for a
given method.
● Data in the real world is dirty:
○ Incomplete
○ Noisy (contains errors)
○ Inconsistent
Major tasks in Data preparation
● Data discretization
● Data cleaning
● Data integration
● Data transformation
● Data reduction
CRISP-DM (Data Understanding)
● Collect Data
● Describe data
● Explore data
● Verify data quality
Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
CRISP-DM (Data preparation)
● Select data
○ Reconsider data selection criteria.
○ Decide which dataset will be used.
○ Collect appropriate additional data
(internal or external).
○ Consider use of sampling
techniques.
○ Explain why certain data was
included or excluded.
● Clean data
○ Correct, remove or ignore noise.
○ Decide how to deal with special
values and their meaning
○ Aggregation level, missing values,
etc.
○ Outliers?
● Construct data
○ Derived attributes.
○ Background knowledge.
○ How can missing attributes be
constructed or imputed?
● Integrate data
○ Integrate sources and store result
(new tables and records).
● Format data
○ Rearranging attributes
○ Reordering records (Perhaps the
modelling tool requires that the
records be sorted according to the
value of the outcome attribute).
○ Reformatting within values (purely
syntactic changes made to satisfy the
requirements of the specific modelling
tool, e.g. removing illegal characters,
changing uppercase/lowercase).
Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
What is data sampling ?
● Data sampling is a statistical analysis

technique used to select, manipulate
and analyze a representative subset of
data points to identify patterns and
trends in the larger dataset being
examined.
Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
What is data sampling ?
● It enables data scientists, predictive

modelers and other data analysts to
work with a small, manageable amount
of data about a statistical population to
build and run analytical models more
quickly, while still producing accurate
findings.
Advantages of sampling
● Sampling can be particularly useful with

data sets that are too large to efficiently
analyze in full.
● Identifying and analyzing a
representative sample is more efficient
and cost-effective than surveying the
entirety of the data or population.
Image Source: https://www.dreamstime.com/illustration/advantages.html

Steps involved in sampling
● Identify and define Target population

● Select sampling frame
● Choose sampling methods
● Determine Sample size
● Collect the required data
Types of data sampling methods
(Probability)
● Simple random sampling

● Stratified sampling
● Cluster sampling
● Systematic sampling
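The four methods listed above can be sketched with Python's standard `random` module. This is an illustrative sketch only: the function names, the fixed seeds, and the sample data are assumptions made for the example.

```python
import random


def simple_random_sample(population, k, seed=0):
    """Every item has the same chance of being selected."""
    return random.Random(seed).sample(population, k)


def stratified_sample(population, key, fraction, seed=0):
    """Sample the same fraction from each stratum (group defined by `key`)."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep all of their members."""
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


def systematic_sample(population, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return population[start::step]


# Hypothetical customer records: (region, customer id).
customers = [("north", i) for i in range(8)] + [("south", i) for i in range(2)]
print(simple_random_sample(customers, 3))
print(stratified_sample(customers, key=lambda c: c[0], fraction=0.5))
print(systematic_sample(list(range(10)), 3))
```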
Types of data sampling methods
(Nonprobability)
● Convenience sampling
● Snowball sampling
● Purposive or judgmental sampling
● Quota sampling
Data Partitioning
● The simplest and most fundamental

version of cluster analysis is partitioning,
which organizes the objects of a set into
several exclusive groups or clusters.
● Given a data set, D, of n objects, and k,
the number of clusters to form, a
partitioning algorithm organizes the
objects into k partitions (k ≤ n), where
each partition represents a cluster.
Image Source: https://dev.to/alibayatgh/what-is-data-partitioning-171o
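A partitioning algorithm of this kind can be sketched with k-means, one well-known method that organizes n objects into k exclusive clusters. The slides do not name a specific algorithm, so k-means is an assumption here, as are the sample points and the starting centroids.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def assign(points, centroids):
    """Put each point into the partition of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical 2-D objects and k = 2 starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, centroids = kmeans(points, [(0, 0), (10, 10)])
print(clusters)
```

Each object ends up in exactly one of the k partitions, which is the exclusivity property described above.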
Able to develop case
studies for predictive
analytical models

● Concepts of machine learning

● Approach for data mining using decision tree inductive concept
● Conceptual cluster
● Attribute oriented induction
● Iterative database scanning
● Attribute focusing
● Neural networks
● Rough sets
● Visualization
● Concepts of odds
● Concepts of odds ratio
Concepts of Machine
Learning
What is Machine Learning ?
● The term machine learning was first

introduced by Arthur Samuel in 1959.
● Machine learning is a subset of
artificial intelligence that is mainly
concerned with the development of
algorithms which allow a computer to
learn from data and past
experience on its own.
Image Source: https://expertsystem.com/machine-learning-definition/
What is Machine Learning ?
● We can define it in a summarized way
as:
Machine learning enables a machine to
automatically learn from data, improve
performance from experience, and
predict things without being explicitly
programmed.
Image Source: https://www.javatpoint.com/machine-learning

How does Machine Learning work ?
● A Machine Learning system learns from

historical data, builds the prediction
models, and whenever it receives new
data, predicts the output for it.
● The accuracy of the predicted output
depends on the amount of data, as
a large amount of data helps to build a
better model, which predicts the output
more accurately.
Image Source: https://www.javatpoint.com/machine-learning
Features of Machine Learning
● Machine Learning uses data to detect

various patterns in a given dataset.
● It can learn from past data and improve
automatically.
● It is a data-driven technology.
● Machine learning is very similar to data
mining, as it also deals with huge
amounts of data.
Image Source:
https://sarvosys.com/electronic-logging-devices/attachment/cwtype-features/
Importance of Machine Learning
● Rapid increase in the production of
data
● Solving complex problems that are
difficult for a human
● Decision making in various sectors,
including finance
● Finding hidden patterns and extracting
useful information from data.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/

Classification of Machine Learning
(Supervised Learning)
● Supervised learning is a type of machine

learning method in which we provide
sample labeled data to the machine
learning system in order to train it, and
on that basis, it predicts the output.
● The goal of supervised learning is to
map input data with the output data.
(Supervised Learning)
● Supervised learning is based on
supervision, in the same way that
a student learns under the
supervision of a teacher. An example
of supervised learning is spam filtering.
● Supervised learning can be further
grouped into two categories of algorithms:
○ Classification
○ Regression
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
(Unsupervised Learning)
● Unsupervised learning is a learning

method in which a machine learns
without any supervision.
● The goal of unsupervised learning is to
restructure the input data into new
features or a group of objects with
similar patterns.
(Unsupervised Learning)
● In Unsupervised learning, we don't have

a predetermined result.
● The machine tries to find useful insights
from a large amount of data. Unsupervised
learning can be further classified into
two categories of algorithms:
● Clustering
● Association
(Reinforcement Learning)
● Reinforcement learning is a
feedback-based learning method, in
which a learning agent gets a reward for
each right action and gets a penalty for
each wrong action.
● The agent learns automatically from
this feedback and improves its
performance.
(Reinforcement Learning)
● In reinforcement learning, the agent

interacts with the environment and
explores it.
● The goal of an agent is to get the most
reward points, and hence, it improves its
performance.
● A robotic dog that automatically
learns the movement of its limbs is an
example of reinforcement learning.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
Approach for data mining
using decision tree inductive
concept
What is Data Mining ?
● Data mining is the process of
extracting information from huge sets of
data to identify patterns, trends, and
useful data that allow a business to
make data-driven decisions.
● Data mining is also called Knowledge
Discovery in Data (KDD).
Image Source: https://www1.cmc.edu/pages/faculty/BHunter/datamining.html
What is Decision Tree ?
● A decision tree is a supervised learning
method used in data mining for
classification and regression.
● It is a tree that helps us in
decision-making.
● The decision tree creates classification
or regression models as a tree structure.
● Decision trees can deal with both
categorical and numerical data.
Image Source: https://www.educba.com/decision-tree-in-data-mining/
What is Decision Tree ?
● A decision tree is a structure that

includes a root node, branches, and leaf
nodes. Each internal node denotes a
test on an attribute, each branch
denotes the outcome of a test, and each
leaf node holds a class label. The
topmost node in the tree is the root
node.
Image Source:
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees1.html
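The structure described above (internal nodes that test an attribute, branches for each outcome, and leaves that hold a class label) can be sketched as a nested dictionary. The weather-style attributes and labels below are hypothetical, as is the representation itself.

```python
# Internal nodes are dicts that test one attribute; leaves are class labels.
tree = {
    "attribute": "outlook",            # root node
    "branches": {
        "sunny": {
            "attribute": "humidity",   # internal node
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",             # leaf node
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))
```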
Key Factors
● Entropy : Entropy refers to a common

way to measure impurity. In the decision
tree, it measures the randomness or
impurity in data sets.
Image Source: https://www.javatpoint.com/decision-tree-induction
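As a small illustration, entropy is commonly computed with the Shannon formula, the sum over classes of -p * log2(p), where p is the share of each class label. The slides do not give the formula, so its use here is an assumption, and the labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())


print(entropy(["yes", "yes", "no", "no"]))    # evenly mixed: maximum impurity
print(entropy(["yes", "yes", "yes", "yes"]))  # pure: no impurity
```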

Key Factors
● Information Gain : Information gain
refers to the decline in entropy after the
dataset is split. It is also called entropy
reduction. Building a decision tree is all
about discovering the attributes that
return the highest information gain.
Image Source: https://www.javatpoint.com/decision-tree-induction
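The decline in entropy after a split can be sketched as the parent entropy minus the size-weighted entropy of the groups produced by the split. The records, attribute names, and helper functions below are hypothetical, and Shannon entropy is assumed as the impurity measure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical training records.
rows = [
    {"windy": False, "temp": "hot", "play": "yes"},
    {"windy": False, "temp": "cool", "play": "yes"},
    {"windy": True, "temp": "hot", "play": "no"},
    {"windy": True, "temp": "cool", "play": "no"},
]

best = max(["windy", "temp"], key=lambda a: information_gain(rows, a, "play"))
print(best)  # attribute chosen for the split
```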

Why are Decision Trees useful
● It enables us to analyze the possible

consequences of a decision thoroughly.
● It provides us a framework to measure
the values of outcomes and the
probability of accomplishing them.
● It helps us to make the best decisions
based on existing data and best
speculations.
Image Source: https://www.empathyrooms.com/why-not-to-use-the-word-why/
Advantages of using Decision Trees
● A decision tree does not need scaling of
data.
● Decision trees need less effort for
data preparation during pre-processing.
● It is intuitive and simple to explain to
the technical team as well as
stakeholders.
● A decision tree does not require
standardization of data.
Image Source: https://www.dreamstime.com/illustration/advantages.html
Conceptual Cluster Learning
Introduction
● Conceptual clustering is a machine

learning paradigm for unsupervised
classification developed mainly during
the 1980s.
● It is distinguished from ordinary data

clustering by generating a concept
description for each generated class.
https://image.slideserve.com/1450644/types-of-clusters-conceptual-clusters-l.jpg
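To illustrate what a concept description for a generated class might look like, the sketch below describes a cluster by the attribute values that all of its members share (a simple conjunctive description). This is only one possible description language, chosen for the example; the fruit data is hypothetical.

```python
def describe_cluster(members):
    """Attribute values shared by every member of the cluster."""
    first, *rest = members
    return {attr: value for attr, value in first.items()
            if all(m.get(attr) == value for m in rest)}


# Hypothetical exemplars grouped into one cluster.
cluster = [
    {"color": "purple", "type": "grape", "ripe": True},
    {"color": "red", "type": "apple", "ripe": True},
]
print(describe_cluster(cluster))
```

An ordinary clustering method would stop at the grouping; a conceptual method also reports a description such as this one.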
Introduction (Continue)
● Most conceptual clustering methods are

capable of generating hierarchical
category structures.
● Conceptual clustering is closely related

to formal concept analysis, decision tree
learning, and mixture model learning.
https://www.researchgate.net/publication/267363337/figure/fig3/AS:66904603492
3528@1536524413930/A-conceptual-cluster-Some-intersected-points-in-both-the
● Under the first view, the conceptual
part of the process lies in how the
exemplars are agglomerated/divided
rather than in how the clusters are
described (i.e., the cluster forming
mechanism need not maintain any
cluster descriptions).
● The second view is that of concept
formation, with exemplars as the
catalyst.
● Under this view clusters are formed

according to their conceptual
descriptions, i.e., the system must
constantly maintain conceptual
descriptions of clusters and cluster
membership is constrained by the
concepts available to describe the
results.
● Following the terminology of psychology,

the first view will here be called
conceptual sorting. The second view will
be called concept discovery. Each in its
own way can be said to involve
conceptual clustering.

Conceptual Clustering vs. Data

Clustering
● Conceptual clustering is obviously

closely related to data clustering.
● However, in conceptual clustering it is
not only the inherent structure of the
data that drives cluster formation, but
also the description language which is
available to the learner.
https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRvzdyEAHlrBDDCI8-php1EbhwKjvd1fTEhzg&usqp=CAU
Conceptual Clustering vs. Data

Clustering (Continue)
● Thus, a statistically strong grouping in

the data may fail to be extracted by the
learner if the prevailing concept
description language is incapable of
describing that particular regularity.

Conceptual Clustering as Concept

Sorting
● One view of conceptual clustering

proposes to produce interesting
groupings and then provide them with a
conceptual interpretation.
● That is, to build extensionally defined
categories (by enumerating their
members) and then find a conceptual
interpretation.
https://i.ytimg.com/vi/XpUk3vhC9AA/maxresdefault.jpg

Sorting (Continue)
● A major reason independently rendered
clusters can have rather unappealing
conceptual interpretations is that they
use no concept-related similarity
measurement.


Sorting (Continue)
There are two points to be made here:
1. The similarity metric used defines a
gradient over the feature space that
possesses none of the conceptual
irregularities that underlie the domain.
2. The similarity metric views all attributes
with a fixed relevance to the problem,
without any way to determine attribute
relevancy from patterns in the data.
https://blog.maketaketeach.com/wp-content/uploads/2016/03/Sort-WOW-border.jpg

Discovery
● Concept discovery systems focus on the

determination of concepts (according to
some concept representation system) to
describe each category that is formed.
● Indeed, categories are formed such that
their descriptions are as desired by the
applied biases (including
representational constraints) and a
concept-based cluster quality measure.
https://www.researchgate.net/profile/Chung-Hsien_Wu/publication/224202721/figure/fig10/AS:668839448682505@1536475159358/Overview-of-textual-concept-di

Discovery (Continue)
● It is the category descriptions that are

constantly monitored, generalized,
specialized, and evaluated by the
concept-based quality measure.
● These systems incorporate mechanisms

to propose multi-relation (polythetic)
concepts as category descriptions.

● The availability of concepts is governed

by the biases of the system and the
background knowledge that is applied.
● For example, the grape and apple differ
in color and type-of-fruit but are both
ripe; the orange and apple differ in color,
type, and ripeness.

● Without background knowledge, the

concept-based approach reverts to the
attribute-based one.
● It is background knowledge that makes

the feature space and concept space
rough and irregular so that the fit of the
data to the irregularities can be used to
help confirm a candidate conceptual
interpretation.
Knowledge-Based Conceptual
Clustering
● Discovering concepts by conceptual

clustering is not purely an inductive
inference process.
● A portion of the process involves
deductive inference to determine, from
background knowledge, latent attributes
for exemplars and appropriate concepts
to ready as candidate category
descriptions.
http://www.inf.ufrgs.br/~engel/data/media/file/Aprendizagem/Cobweb.pdf
● A system equipped with sizable

background knowledge and a deductive
mechanism for accessing and applying it
can make a wide variety of appropriate
transformations of exemplars that will
greatly aid concept formation.
http://www.inf.ufrgs.br/~engel/data/media/file/Aprendizagem/Cobweb.pdf
● For example, an inference rule could
suggest the construction of an attribute
whose values report the number of other
attributes (from a subset of other
attributes) having values that differ from
the most frequent attribute values.
● Such a derived attribute supports
polymorphic concepts like "2 of the 3
attributes A, B, and C have target values
of x, y, and z, respectively".
● Since the system knows the definition of

the attribute (from background
knowledge) it is able to state
polymorphic concepts in easily
understood terms.
● The point is that additional knowledge

applied during clustering can have a
great effect on the types of categories
formed.
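The derived attribute described above, a count of how many attributes differ from their most frequent values, can be sketched as follows. The attribute names A, B and C echo the slide's example; the exemplar values and function names are hypothetical.

```python
from collections import Counter


def most_frequent_values(exemplars, attributes):
    """Most common value of each attribute across all exemplars."""
    return {a: Counter(e[a] for e in exemplars).most_common(1)[0][0]
            for a in attributes}


def add_mismatch_count(exemplars, attributes, name="n_differs"):
    """Add a derived attribute counting values that differ from the most frequent ones."""
    modal = most_frequent_values(exemplars, attributes)
    return [{**e, name: sum(e[a] != modal[a] for a in attributes)}
            for e in exemplars]


# Hypothetical exemplars over the attributes A, B and C.
exemplars = [
    {"A": "x", "B": "y", "C": "z"},
    {"A": "x", "B": "y", "C": "q"},
    {"A": "p", "B": "y", "C": "q"},
]
for row in add_mismatch_count(exemplars, ["A", "B", "C"]):
    print(row)
```

With such a count available, a concept like "at most one attribute departs from the typical values" becomes easy to state.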
Type Levels
● Type-0: Statistic-based quality measure;

no conceptual interpretation.
● Type-1: Statistic-based quality measure;

conceptual interpretation after-the-fact.
● Type-2: Attribute-based quality

measure; no conceptual interpretation.
● Type-3: Attribute-based quality
measure; conceptual interpretation
https://www.analyticsvidhya.com/wp-content/uploads/2016/11/clustering-6.png
Type Levels (Continue)
● Type-4: Concept-based quality
measure; no background knowledge.
● Type-5: Concept-based quality
measure; background knowledge.
● Type-6: Concept-based quality
measure; background knowledge;
structured exemplars.
Application Areas
● Biology
● Medicine
● Psychology
● Climate
● Business
● Information Retrieval

Attribute Oriented Induction
Introduction
● The Attribute Oriented Induction (AOI)
method, first proposed in 1989,
integrates a machine learning paradigm
(especially learning-from-examples
techniques) with database operations,
extracts generalized rules from an
interesting set of data, and discovers
high-level data regularities.
https://d3i71xaburhd42.cloudfront.net/7fdc538e50ab4d4a0e4c531ae180ea84c639ba8c/3-Figure2-1.png
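A rough sketch of the core generalization step in AOI: each attribute value is replaced by a higher-level concept from a concept hierarchy, and tuples that become identical are merged while a vote count records how many original tuples each generalized tuple covers. The hierarchies, student records, and function name below are hypothetical, and full AOI involves further steps (such as attribute removal and generalization thresholds) that are not shown.

```python
from collections import Counter


def generalize(rows, attributes, hierarchies):
    """Climb one level in each concept hierarchy and merge identical tuples."""
    votes = Counter()
    for row in rows:
        key = tuple(hierarchies.get(a, {}).get(row[a], row[a]) for a in attributes)
        votes[key] += 1  # the vote counts the original tuples covered
    return votes


# Hypothetical concept hierarchies (low-level value -> higher-level concept).
hierarchies = {
    "major": {"physics": "science", "biology": "science", "literature": "arts"},
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "foreign"},
}

students = [
    {"major": "physics", "city": "Vancouver"},
    {"major": "biology", "city": "Toronto"},
    {"major": "literature", "city": "Chicago"},
    {"major": "physics", "city": "Toronto"},
]

for rule, count in generalize(students, ["major", "city"], hierarchies).items():
    print(rule, count)
```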
● AOI provides an efficient and effective

mechanism for discovering various kinds
of knowledge rules from datasets or
databases.
● AOI approach is developed for learning

different kinds of knowledge rules such
as characteristic rules, discrimination
rules, classification rules, data evolution
regularities, association rules and
Characteristic Rule
● A characteristic rule is an assertion which
characterizes the concepts that are
satisfied by all of the data stored in
the database.
● This rule provides generalized concepts
about a property, which can help people
recognize the common features of the
data in a class.
● For example the symptom of the specific

Discriminant Rule
● Discriminant rule is an assertion which

discriminates the concepts of one
(target) class from another (contrasting).
● This rule gives a discriminant criterion
which can be used to predict the class
membership of new data.
● For example to distinguish one disease

from the other
Classification Rule
● A classification rule is a set of rules which
classifies the set of relevant data
according to one or more specific
attributes.
● For example, classifying diseases into
classes and providing the symptoms of
each
https://ars.els-cdn.com/content/image/1-s2.0-S0957417404000053-gr1.gif
Association Rule
● An association rule describes association
relationships among the set of relevant
data.
● For example, discovering a set of

symptoms frequently occurring together

Data Evolution Regularities Rule
● Data evolution regularities rule is a

general evolution behavior of a set of
the relevant data (valid only in
time-related/temporal data).
● For example, describing the major

factors that influence the fluctuations of
stock values through time.
Cluster Description Rule
● A cluster description rule is used to cluster
data according to data semantics.
● For example, clustering university
students based on different attribute(s).

Quantitative and Qualitative Rules in

AOI
● A quantitative rule is a rule that is
associated with quantitative information,
such as statistical information that
assesses the representativeness of the rule
in the database.
● There are three types of quantitative rule:
the quantitative characteristic rule, the
quantitative discriminative rule, and the
quantitative characteristic and
discriminative rule.

AOI (Continue)
● A quantitative characteristic rule attaches
quantitative information to a
characteristic rule; each rule in the final
generalization is measured with the
t-weight in formula 1.


AOI (Continue)
● t-weight = percentage of each rule in the

final generalized relation.
● Votes(qa) = number of tuples in rule qa
of the final generalized relation,
where qa is in {q1,...,qN}.
● N = number of rules in the final

generalized relation.
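Formula 1 itself is not reproduced in this text. Reading it from the definitions above, the t-weight of rule qa is Votes(qa) divided by the sum of Votes(qi) for i = 1..N, expressed as a percentage. The sketch below assumes that reading; the rule names and vote counts are hypothetical.

```python
def t_weights(votes):
    """Return the t-weight (in percent) of each rule in the final generalized relation.

    votes maps a rule name to the number of tuples it covers (Votes(qa)).
    """
    total = sum(votes.values())
    if total == 0:
        raise ValueError("the generalized relation has no tuples")
    return {rule: 100.0 * count / total for rule, count in votes.items()}


# Hypothetical vote counts for N = 3 generalized rules.
votes = {"q1": 50, "q2": 30, "q3": 20}
weights = t_weights(votes)
print(weights)
```

By construction, the t-weights of all rules in one generalized relation add up to 100%.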

AOI (Continue)
● A quantitative discriminative rule is a
discrimination rule that uses quantitative
information. Each rule in the target class
is discriminated against the same rule in the
contrasting class and is measured with the
d-weight in formula 2.


AOI (Continue)
● d-weight = ratio, as a percentage, of the
number of tuples for a rule in the target
class to the total number of tuples for the
same rule in the target class and the
contrasting class.
● Votes(qa) = number of tuples in each
rule in the target class Cj.
● Cj is in {C1,...,CK}.
● K = total number of the target and
contrasting classes.
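Formula 2 is likewise missing from this text. Under the definitions above, the d-weight of rule qa in target class Cj can be read as Votes(qa in Cj) divided by the sum of Votes(qa in Ci) over all K classes, as a percentage. The sketch below assumes that reading; the class names and counts are hypothetical.

```python
def d_weight(votes_by_class, rule, target):
    """Return the d-weight (in percent) of `rule` for the class `target`.

    votes_by_class maps a class name to {rule name: number of tuples}.
    """
    total = sum(cls_votes.get(rule, 0) for cls_votes in votes_by_class.values())
    if total == 0:
        raise ValueError(f"rule {rule!r} covers no tuples in any class")
    return 100.0 * votes_by_class[target].get(rule, 0) / total


# Hypothetical counts for one target class and one contrasting class (K = 2).
votes_by_class = {
    "target": {"q1": 30, "q2": 10},
    "contrasting": {"q1": 10, "q2": 40},
}
dw_q1 = d_weight(votes_by_class, "q1", "target")
dw_q2 = d_weight(votes_by_class, "q2", "target")
print(dw_q1, dw_q2)
```

A high d-weight means the rule mostly describes the target class; a low d-weight means the same rule is mainly found in the contrasting class.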


AOI (Continue)
● A quantitative characteristic and
discriminative rule combines the
quantitative information of a
characteristic rule and a discriminative
rule, so the same rule carries both a
t-weight and a d-weight.
● Each rule is measured with the t-weight in
formula 1 for the characteristic rule and the
d-weight in formula 2 for the discriminative
rule.

AOI (Continue)
● A qualitative rule can be obtained
with the same learning process as
its quantitative counterpart, but
without attaching the
quantitative attribute to the generalized
relations.

Concept Hierarchies
● One advantage of AOI is that it uses a
concept hierarchy as background
knowledge, which can be provided by
knowledge engineers or domain experts.
● A concept hierarchy stored as a relation in
the database provides essential
background knowledge for data
generalization and multiple-level data
mining.
https://ars.els-cdn.com/content/image/3-s2.0-B9780123814791000046-f04-09-9780123814791.jpg
Concept Hierarchies (Continue)
● A concept hierarchy represents a
taxonomy of concepts for the attribute
domain values.
● A concept hierarchy can be specified
based on the relationships among
database attributes or by set groupings,
and can be stored in the form of relations in
the same database.
https://ars.els-cdn.com/content/image/3-s2.0-B9780123814791000034-f03-13-9780123814791.jpg
Concept Hierarchies (Continue)
● Concept hierarchy can be adjusted

dynamically based on the distribution of
the set of data relevant to the data
mining tasks.
● The hierarchies for numerical attributes

can be constructed automatically based
on data distribution analysis.
A concept hierarchy tree for attribute workclass in adult dataset
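As a rough illustration, a concept hierarchy can be stored as a child-to-parent mapping and used to replace a specific value with a more general concept. The groupings below for the workclass attribute are an assumed example, not the hierarchy from the figure.

```python
# Hypothetical hierarchy for "workclass": each value points to its parent concept.
WORKCLASS_PARENT = {
    "Federal-gov": "Government",
    "State-gov": "Government",
    "Local-gov": "Government",
    "Self-emp-inc": "Self-employed",
    "Self-emp-not-inc": "Self-employed",
    "Government": "ANY",
    "Self-employed": "ANY",
    "Private": "ANY",
}


def generalize(value, parent, levels=1):
    """Climb up to `levels` steps in the hierarchy, stopping at the root."""
    for _ in range(levels):
        if value not in parent:
            break  # already at the most general concept
        value = parent[value]
    return value


print(generalize("State-gov", WORKCLASS_PARENT))      # one level up
print(generalize("State-gov", WORKCLASS_PARENT, 2))   # two levels up
```

Storing the mapping as (child, parent) pairs mirrors the idea of keeping the hierarchy as a relation in the same database.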
AOI Prototype
● The AOI method was implemented in a
data mining system prototype called
DBMINER, previously called
DBLearn, which has been tested successfully
against large relational databases.
● DBLearn is a prototype data mining
system that was developed at Simon
Fraser University.
AOI Prototype (Continue)
● DBMINER was developed by integrating
database, OLAP and data mining
technologies and has the following features:
1. Incorporating several data mining

techniques like attribute oriented
induction, statistical analysis,
progressive deepening for mining
multiple-level rules and meta-rule
guided knowledge mining data cube and
2. Mining new kinds of rules from large

databases including multiple level
association rules, classification rules,
cluster description rules and prediction.
3. Automatic generation of numeric

hierarchies and refinement of concept
hierarchies.
4. High-level SQL-like and graphical data
mining interfaces.
5. Client server architecture and

performance improvements for larger
application.
6. SQL-like data mining query language

DMQL and Graphical user interfaces
have been enhanced for interactive
knowledge mining.
7. Perform roll-up and drill-down at

multiple concept levels with multiple
AOI Algorithms
● AOI can be implemented with the
architecture design shown in the figure,
where characteristic rules (LCHR) and
classification rules (LCLR) can be learned
directly from the transactional database
(OLTP) or data warehouse (OLAP) with
the help of the concept hierarchy as the
knowledge generalization. The concept
hierarchy can be created from the OLTP
database as a direct resource.
AOI architecture
AOI Algorithms (Continue)
From a database we can identify two types
of learning:
1. Positive learning as the target class,
where the data are tuples in the
database that are consistent with the
learning concepts. The positive
learning/target class is built when
learning a characteristic rule.
AOI Algorithms (Continue)
2. Negative learning as the contrasting
class, in which the data do not belong to
the target class. The negative
learning/contrasting class is built
when learning a discrimination or
classification rule.

AOI Characteristic Rule Algorithm
● The AOI characteristic rule algorithm
implements steps one to seven
of the generalization strategy.
● The algorithm has two sub-processes:
controlling the number of distinct attributes
and controlling the number of tuples.
AOI characteristic rule algorithm
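The figure with the algorithm is not reproduced here. The following is only a simplified sketch of the general idea, not the algorithm from the figure: each attribute is generalized through its concept hierarchy until it has no more distinct values than an assumed attribute threshold, and identical generalized tuples are then merged while their votes are counted. The tuple-threshold sub-process is omitted, and all data and hierarchies are hypothetical.

```python
from collections import Counter


def aoi_characteristic(rows, hierarchies, attr_threshold):
    """Generalize rows attribute by attribute, then merge identical tuples.

    rows: list of dicts (one per tuple); hierarchies: attr -> {child: parent}.
    Returns a list of (generalized tuple as dict, votes).
    """
    rows = [dict(r) for r in rows]  # work on a copy
    for attr, parent in hierarchies.items():
        # Climb the hierarchy while the attribute has too many distinct values.
        while len({r[attr] for r in rows}) > attr_threshold:
            climbed = False
            for r in rows:
                if r[attr] in parent:
                    r[attr] = parent[r[attr]]
                    climbed = True
            if not climbed:
                break  # hierarchy exhausted, cannot generalize further
    # Merge identical generalized tuples and accumulate their votes.
    votes = Counter(tuple(sorted(r.items())) for r in rows)
    return [(dict(key), count) for key, count in votes.items()]


rows = [
    {"workclass": "Federal-gov", "city": "Vancouver"},
    {"workclass": "State-gov", "city": "Burnaby"},
    {"workclass": "Self-emp-inc", "city": "Vancouver"},
    {"workclass": "Local-gov", "city": "Victoria"},
]
hierarchies = {
    "workclass": {"Federal-gov": "Government", "State-gov": "Government",
                  "Local-gov": "Government", "Self-emp-inc": "Self-employed"},
    "city": {"Vancouver": "BC", "Burnaby": "BC", "Victoria": "BC"},
}
result = aoi_characteristic(rows, hierarchies, attr_threshold=2)
for generalized, count in result:
    print(generalized, "votes =", count)
```

The vote counts produced here are the quantities that the t-weight and d-weight measures are computed from.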
AOI Advantages
● AOI provides additional flexibility over

many machine learning algorithms.
● AOI can learn knowledge rules in

different conjunctive and disjunctive
forms and provides more choices for the
experts and users.

AOI Advantages (Continue)
● AOI can use traditional relational
database facilities such as
selection, join and projection, whereas most
learning algorithms suffer from
inefficiency problems in a large
database environment.
● AOI can learn qualitative rules with
quantitative information, while many
machine learning algorithms can only
learn qualitative rules.
AOI Advantages (Continue)
● AOI can handle noisy data and

exceptional cases elegantly by
incorporating statistical techniques in the
learning process, whereas some learning
systems can only work in a ‘noise free’
environment.

AOI Disadvantages
● AOI can only provide a snapshot of the

generalized knowledge and not a global
picture. Yet, the global picture can be
revealed by trying different thresholds
repeatedly.
● Adjusting different thresholds will result

in different sets of generalized tuples.
However, using different thresholds
repeatedly is a time consuming and
AOI Disadvantages (Continue)
● There is a problem in selecting the
best generalized rules between a large
and a small threshold. A large
threshold value leads to a relatively
complex rule with many disjuncts, and
the results may not be fully generalized.
On the other hand, a small threshold
value leads to a simple rule with few
disjuncts, and the results may
overgeneralize the rule, with a risk of losing
information.
Iterative Database Scanning
Introduction
● An iterative search starts just the same
as a non-iterative search: the query
sequence is compared to the database,
and the score list and the pairwise and multiple
alignment outputs are reported.
● The multiple alignment is then used to
create a query “profile” that contains
information about the types of amino
acid seen at each position in the alignment.
https://www.researchgate.net/profile/Teresa_Attwood3/publication/11160012/figure/fig2/AS:277320108658689@1443129674824/Overview-of-the-iterative-process-
● This profile is then searched against the
database; a score list and pairwise and
multiple alignments are output, and the
process is then repeated.
● The iterations stop either when the
maximum number of iterations has been reached
or when two successive iterations find
exactly the same sequences.

● Iterative searching will normally be able

to find more remote similarities to the
query sequence than a single sequence
search.
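The loop described above can be sketched with a toy model. This is not a real sequence-search tool: the "profile" here is simply the set of residues seen at each position among the current hits, the score is the fraction of positions that agree with the profile, and the database, threshold and sequences are made-up assumptions.

```python
def build_profile(seqs):
    """Record which residues occur at each position (equal-length sequences)."""
    return [set(column) for column in zip(*seqs)]


def search(profile, database, min_score):
    """Return database sequences whose fraction of matching positions is high enough."""
    hits = []
    for seq in database:
        matches = sum(ch in allowed for ch, allowed in zip(seq, profile))
        if matches / len(profile) >= min_score:
            hits.append(seq)
    return hits


def iterative_search(query, database, min_score=0.75, max_iterations=5):
    """Repeat the search with a profile rebuilt from the hits of the previous round."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    profile = build_profile([query])
    previous = None
    for iteration in range(1, max_iterations + 1):
        hits = search(profile, database, min_score)
        if hits == previous:
            break  # two successive iterations found exactly the same sequences
        previous = hits
        profile = build_profile([query] + hits)
    return hits, iteration


DATABASE = ["ACDE", "ACDF", "ACGF", "WXYZ"]
single_pass = search(build_profile(["ACDE"]), DATABASE, 0.75)
hits, rounds = iterative_search("ACDE", DATABASE)
print(single_pass)
print(hits, rounds)
```

In this toy run the single search misses "ACGF", while the iterative search reaches it through the intermediate hit "ACDF", which illustrates how iteration can find more remote similarities.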

Applications
● Iterative K-Means algorithm: creates
clusters of related data through iterative
database scans and minimization of the
clustering error, namely the root
mean square error.
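A minimal one-dimensional sketch of this idea follows. Each pass scans all points, assigns them to the nearest center, and recomputes the centers; the loop stops when the centers no longer move. The data points and starting centers are hypothetical.

```python
def kmeans_1d(points, centers, max_iterations=100):
    """Cluster 1-D points; return final centers, clusters and the root mean square error."""
    centers = list(centers)
    for _ in range(max_iterations):
        # Scan every point and assign it to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: another scan would change nothing
        centers = new_centers
    squared_error = sum((p - centers[i]) ** 2
                        for i, c in enumerate(clusters) for p in c)
    rmse = (squared_error / len(points)) ** 0.5
    return centers, clusters, rmse


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters, rmse = kmeans_1d(points, centers=[1.0, 12.0])
print(centers, clusters, round(rmse, 4))
```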

Applications (Continue)
● Matching protein sequences through
iterative scanning of a protein database
and finding the best match as
per protein generics.
https://www.researchgate.net/profile/Teresa_Attwood3/publication/11160012/figure/fig2/AS:277320108658689@1443129674824/Overview-of-the-iterative-process-
Applications (Continue)
● Finding and predicting lung cancer
through iterative scanning of a database
of image samples of lung scans.

Advantages
● Iterative information retrieval
● Mining more useful information through

iterations
● Matching patterns
● Iterative querying allows web based

database application to integrate results
Disadvantages
● Comparatively slow process
● Resource Intensive
● Complex to design and implement
● Availability of more advanced

methodologies

Attribute Focusing
Introduction
● Attribute Focusing is a technique

designed for detecting interesting
attribute values, in the sense that the
values differ from an expected value.
Bhandari (1993), Bhandari and Biyani
(1994) proposed two methods for
detecting interesting attribute values.

Attribute Focusing
Introduction
● The first method consists of finding

interesting values of a given attribute by
comparing the observed frequency of
that value with its expected frequency
assuming a uniform probability
distribution.
● Since this is a one-dimensional method,

analyzing just one attribute at a time, it
involves no attribute interaction and so
Attribute Focusing
Introduction
● Since the goal of data mining is to

discover knowledge that is not only
accurate but also comprehensible for
human decision makers, the field of
cognitive psychology is clearly relevant
for data mining.
● In the classical view, categories are

defined by a small set of attributes.
Attribute Focusing
● By contrast, in the natural view of

concepts, highly correlated
(non-independent) attributes are the
rule, not the exception.
● To summarize, in the natural view of
concepts, which is currently much more
accepted in psychology than the
classical view, attribute interaction is the
rule, and not the exception.
Large degree of attribute interaction makes a concept harder to learn
Attribute Focusing
● It is also increasingly likely that data

pertaining to their professional activity is
available in a database.
● Clearly, a machine-assisted method

which allows them to learn more about
their domain from such data should be a
powerful knowledge discovery technique
since it could help a lot of people
improve at their jobs rapidly.
Attribute Focusing
The Importance of Attribute Focusing

in Data Mining
● Evidence for this natural view of

concepts is provided, in the context of
data mining, by projects that found a
significant degree of attribute interaction
in real-world data sets.
● An example is the large number of small

disjuncts found by Provost & Danyluk
(1993) in telecommunications data.
Attribute Focusing
The Importance of Attribute Focusing

in Data Mining (Continue)
● Another example is the several

instances of Simpson’s paradox
discovered in real-world data sets by
Fabris & Freitas (1999)
● Yet another example is the existence of

strong attribute interactions in a typical
financial data set, as discussed by Dhar
et al. (2000)
Attribute Focusing
The Influence of Attribute Interaction

on Concept Hardness
● There are, of course, many factors that

make a concept (class description)
difficult to be learned, including
unbalanced class distributions, noise,
missing relevant attributes, etc.
● However, in some cases even if all

relevant information for class separation
is included in the data - i.e. all relevant
attributes are present, there is little
Attribute Focusing
Interestingness Function
● An interestingness function I2 is used to
detect an interesting pair of attribute
values, where each of the values belongs
to a different attribute of a given pair of
attributes.
● The function I2 measures how much the
observed joint frequency of a pair of
attribute values deviates from the
expected frequency assuming that the
two attributes are statistically
independent.
Attribute Focusing
Interestingness Function (Continue)
● Hence, the essence of Attribute

Focusing (using the interestingness
function I2) is precisely to detect
attribute values whose interactions
produce unexpected observed joint
frequency.
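The exact formula for I2 is not given in this text. One natural reading, assumed in the sketch below, is the absolute difference between the observed joint relative frequency of a value pair and the product of the two marginal relative frequencies (the frequency expected under independence). The records and attribute names are hypothetical.

```python
from collections import Counter


def i2_scores(records, attr_a, attr_b):
    """Score each value pair by |observed joint frequency - expected under independence|."""
    n = len(records)
    count_a = Counter(r[attr_a] for r in records)
    count_b = Counter(r[attr_b] for r in records)
    count_ab = Counter((r[attr_a], r[attr_b]) for r in records)
    scores = {}
    for a in count_a:
        for b in count_b:
            observed = count_ab[(a, b)] / n
            expected = (count_a[a] / n) * (count_b[b] / n)
            scores[(a, b)] = abs(observed - expected)
    return scores


# Two attributes that always occur together: strong interaction.
dependent = [
    {"severity": "high", "phase": "test"},
    {"severity": "high", "phase": "test"},
    {"severity": "low", "phase": "design"},
    {"severity": "low", "phase": "design"},
]
# Every combination equally frequent: no interaction.
independent = [
    {"severity": "high", "phase": "test"},
    {"severity": "high", "phase": "design"},
    {"severity": "low", "phase": "test"},
    {"severity": "low", "phase": "design"},
]
dep_scores = i2_scores(dependent, "severity", "phase")
ind_scores = i2_scores(independent, "severity", "phase")
print(dep_scores)
print(ind_scores)
```

Pairs with the largest scores are the ones an analyst would be asked to look at first.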
https://www.researchgate.net/profile/Edgar_Reehuis/publication/259214544/figure/fig1/AS:650799420043265@1532174081474/Novelty-vs-Interestingness-Interest
Attribute Focusing
● Goil and Choudhary (1997) have

extended Attribute Focusing for
multidimensional databases (data
cubes). A contribution of this work was
to introduce a parallel algorithm to
compute the above-discussed
interestingness function I2.
● This research addressed the problem of

making Attribute Focusing more
computationally efficient, which is
Attribute Focusing
● However, it did not adapt Attribute

Focusing to one of the major
characteristics of data cubes, namely
the fact that dimensions contain
hierarchical attributes.
● This characteristic of data cubes

introduces new opportunities and
requirements for adapting the
computation of the interestingness
function I2.
Attribute Focusing
Advantages
● Attribute Focusing has been

successfully deployed to discover
hitherto unknown knowledge in a
real-life, commercial setting.
● It actually helps people do their jobs

better. That kind of practical success has
not been demonstrated even for
advanced knowledge discovery
techniques.
Attribute Focusing
Advantages (Continue)
● There are three possible areas where

Attribute Focusing may enjoy an
advantage over other methods: superior
mathematical algorithms, ability to
process more data, the use of the
analyst.
● Interactive systems will provide,

perhaps, the best opportunity for
discovery in the near term. In such
systems, a knowledge analyst is
Attribute Focusing
Characteristics
● The Attribute Focusing approach uses an
explicit model, with filtering functions
and a model of interpretation.
● Attribute Focusing represents a means of

deriving immediate and significant
practical advantages by combining the
results of existing research on
knowledge discovery with models based
on human factors and cognitive science.
Attribute Focusing
Future Work
● Information-theoretic, entropy-based
measures and statistical measures of
association/correlation may be used to
evolve new instances of interestingness
functions.
● Similarly, new instances of filtering

functions may be evolved by considering
human factors issues.
https://i.stack.imgur.com/v8RVc.png
Introduction to neural
networks
Definition
● The neural network is a technology

based on the structure of the neurons
inside a human brain.
Image Source: https://miro.medium.com/max/700/1*BQ0pIVk56WHyqigI9adDLw.gif
Definition
● A neural network algorithm tries to
create a function that maps your input to
your desired output.
Image Source:
https://miro.medium.com/max/1400/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg
Definition
● In artificial neural networks, dendrites
correspond to inputs, the cell nucleus to
nodes, synapses to weights, and the axon
to the output.
Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
Biological Neural Network
Vs artificial neural network
Biological Neural Network | Artificial Neural Network
Dendrites | Inputs
Cell nucleus | Nodes
Synapse | Weights
Axon | Output
The architecture of an artificial
neural network
● Input Layer:
As the name suggests, it accepts inputs in
several different formats provided by the
programmer.
● Hidden Layer:
The hidden layer lies between the input
and output layers. It performs all the
calculations to find hidden features and
patterns.
● Output Layer:
The input goes through a series of
transformations using the hidden layer,
which finally results in output that is
conveyed using this layer.
● The artificial neural network takes input

and computes the weighted sum of the
inputs and includes a bias. This
computation is represented in the form
of a transfer function.
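A single artificial neuron can be sketched as follows. The weighted sum plus bias is what the slide describes; passing it through a sigmoid activation is an assumption made here for illustration, and the input values, weights and bias are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus a bias, then apply a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and weights: 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0
output = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(output)
```

A full network stacks many such neurons in layers, with the outputs of one layer serving as the inputs of the next.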
Advantages of artificial neural
network
● Parallel processing capability

● Storing data on the entire network
● Capability to work with incomplete
knowledge
● Having a memory distribution
● Having fault tolerance
Disadvantages of artificial neural
network
● Assurance of proper network structure

● Unrecognized behavior of the network
● Hardware dependence
● Difficulty of showing the issue to the
network
● The duration of the network is unknown
How artificial neural networks work
● An artificial neural network can be best
represented as a weighted directed
graph, where the artificial neurons form
the nodes.
● The association between the neuron
outputs and neuron inputs can be
viewed as the directed edges with
weights.
● The artificial neural network receives
the input signal from an external source
in the form of a pattern or image,
represented as a vector.
● These inputs are then mathematically
denoted by the notation x(n) for each
of the n inputs.
● Afterward, each input is multiplied
by its corresponding weight (these
weights are the details utilized by the
artificial neural network to solve a
specific problem).
Types of Artificial Neural Network
● Feedback ANN:
In this type of ANN, the output returns into
the network to accomplish the best-evolved
results internally.
● Feed-Forward ANN:
A feed-forward network is a basic neural
network comprising an input layer, an
output layer, and at least one layer of
neurons.
Model Types
● Neural networks use information in the

form of data to generate knowledge in
the form of models.
● A model can be defined as a description
of a real-world system or process using
mathematical concepts.
● It is usually represented as a mapping
between input and output variables.
Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
Neural network models belong to the
following types:
● Approximation (or function

regression)
An approximation can be regarded as the
problem of fitting a function from data.
● Classification.
Classification can be stated as the process
whereby a received pattern, characterized
by a distinct set of features, is assigned to
one of a prescribed number of classes.
Approximation (or function
regression) Examples
● Model the strength of high performance

concretes.
● Predict the noise generated by airfoil
blades.
● Predict the residuary resistance of
sailing yachts.
● Predict the vascular adhesion of
nanoparticles.
Approximation (or function
regression) Examples (Continue)
● Predict the electricity generated by
combined cycle power plants.
● Forecast the power generated by a solar
plant.
● Model wine preferences from
physicochemical properties.
Classification (or pattern recognition)
Examples
● We can distinguish between two types of
classification models:
● Binary classification examples
1. Diagnose breast cancer from
fine-needle aspirate images.
2. Detect malfunctions in liquid ultrasonic
flowmeters.
3. Detect forged banknotes.
4. Reduce employee attrition.
5. Increase the conversion rate of
telemarketing campaigns in banks.
Image Source: https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
● Multiple classification examples
● Classify iris flowers from sepal and petal
dimensions
● Recognize human activity from
smartphone signals
Classification neural networks
● A classification model usually requires a

scaling layer, one or several perceptron
layers, and a probabilistic layer. It might
also contain a principal component
layer.
Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
Data Set
● The data set contains information for

creating our model. It is a collection of
data structured as a table, in rows and
columns.
● We can identify the following concepts in a
dataset:
● Data source.
● Variables.
● Instances.
● Missing values.
● Data set tasks.
Image Source: https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
Rough Set theory
Introduction
● A rough set, first described by Polish
computer scientist Zdzisław I. Pawlak, is
a formal approximation of a crisp set
(i.e., conventional set) in terms of a pair
of sets which give the lower and the
upper approximation of the original set.
Rough Set theory
Introduction
● In rough set theory, the data is
collected in a table, called a decision
table.
● Rows of a decision table correspond to
objects, and columns correspond to
features.
Rough Set theory
Introduction
● RST can be defined using lower and
upper approximations.
Lower approximation and positive
region
● The lower approximation is the union of
all equivalence classes that are
contained in (i.e., are subsets
of) the target set.
Rough Set theory
Upper approximation and

negative region
● The upper approximation is the union of
all equivalence classes that have a
non-empty intersection with the target
set.
Rough Set theory
The boundary region
● The boundary region, given by the set
difference between the upper and lower
approximations, consists of those
objects that can neither be ruled in nor
ruled out as members of the target set.
Rough Set theory
The rough set
● The tuple composed of the lower and upper
approximations is called a rough set.
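The lower approximation, upper approximation and boundary region can be computed directly from a decision table. The sketch below assumes a small hypothetical table with four objects and a target set of objects diagnosed with flu; the attribute names and values are made up for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that have identical values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of the target set."""
    lower, upper = set(), set()
    for eq_class in equivalence_classes(table, attributes):
        if eq_class <= target:   # class entirely inside the target set
            lower |= eq_class
        if eq_class & target:    # class overlaps the target set
            upper |= eq_class
    return lower, upper


# Hypothetical decision table: rows are objects, columns are features.
table = {
    "o1": {"headache": "yes", "temperature": "high"},
    "o2": {"headache": "yes", "temperature": "high"},
    "o3": {"headache": "no", "temperature": "high"},
    "o4": {"headache": "no", "temperature": "normal"},
}
flu = {"o1", "o3"}  # target set
lower, upper = approximations(table, ["headache", "temperature"], flu)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

Objects o1 and o2 are indistinguishable on the chosen features but only o1 is in the target set, so both end up in the boundary region.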
Rough Set theory
Objective analysis
● Rough set theory is one of many

methods that can be employed to
analyze uncertain (including vague)
systems, although less common than
more traditional methods of probability,
statistics, entropy and Dempster–Shafer
theory.
Rough Set theory
Reduct and core
● An interesting question is whether there
are attributes in the information system
(attribute-value table) which are more
important to the knowledge represented
in the equivalence class structure than
other attributes.
● Often, we wonder whether there is a
subset of attributes which can, by itself,
fully characterize the knowledge in the
database; such an attribute set is called
a reduct.
Rough Set theory
Core
● The set of attributes which is common to

all reducts is called the core: the core is
the set of attributes which is possessed
by every reduct, and therefore consists
of attributes which cannot be removed
from the information system without
causing collapse of the
equivalence-class structure.
Rough Set theory
Decision rules
1. The decision rules not only capture
patterns hidden in the data but can
also be used to classify new unseen
objects.
2. Rules represent dependencies in the
dataset, and represent extracted
knowledge which can be used when
classifying new objects not in the
original information system.
Rough Set theory
Decision rules
3. When the reducts were found, the job of

creating definite rules for the value of
the decision feature of the information
system was practically done.
4. To transform a reduct into a rule, one
only has to bind the condition feature
values of the object class from which
the reduct originated to the
corresponding features of the reduct.
Rough Set theory
Decision rules
5. Then, to complete the rule, a decision

part comprising the resulting part of the
rule is added.
6. This is done in the same way as for the
condition features.
7. To classify objects that have never
been seen before, rules generated from
a training set are used. These rules
represent the actual classifier. This
classifier is used to predict to which
classes new objects are attached.
Rough Set theory
Decision rules
8. The nearest matching rule is determined

as the one whose condition part differs
from the feature vector of re-image by
the minimum number of features.
9. When there is more than one matching
rule, we use a voting mechanism to
choose the decision value. Every
matched rule contributes votes to its
decision value, equal to the
number of objects matched by the
rule.
Rough Set theory
Decision rules
10. The votes are added and the decision

with the largest number of votes is
chosen as the correct class.
11. Quality measures associated with
decision rules can be used to eliminate
some of the decision rules.
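The voting mechanism in steps 9 and 10 can be sketched as follows. Each rule is assumed to be a triple of condition, decision and support, where the support is the number of training objects matched by the rule; the rules and objects below are hypothetical.

```python
from collections import Counter


def classify_by_voting(obj, rules):
    """Return the decision with the most votes among matching rules, or None.

    rules: list of (condition dict, decision, support) triples.
    """
    votes = Counter()
    for condition, decision, support in rules:
        if all(obj.get(attr) == value for attr, value in condition.items()):
            votes[decision] += support  # each matching rule votes with its support
    if not votes:
        return None  # no rule matches this object
    return votes.most_common(1)[0][0]


# Hypothetical rules derived from a training set.
rules = [
    ({"temperature": "high"}, "flu", 5),
    ({"headache": "no"}, "healthy", 3),
    ({"temperature": "high", "cough": "yes"}, "flu", 2),
]
patient = {"temperature": "high", "headache": "no", "cough": "yes"}
decision = classify_by_voting(patient, rules)
print(decision)
```

Here all three rules match: "flu" collects 5 + 2 = 7 votes against 3 for "healthy", so "flu" is chosen.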
Rough Set theory
Rough Sets Data Analysis

Techniques
● Preprocessing stage
● Includes tasks such as data cleaning,
completeness, correctness, attribute
creation, attribute selection and
discretization.
● Processing stage
● Includes the generation of
preliminary knowledge, such as
computation of object reducts from data,
derivation of rules from reducts, and
classification processes.
Rough Set theory
Preprocessing stage
● In order to successfully analyze data

with rough sets, a decision table must
be created.
● This is done with data preparation.
● The data preparation task includes data
conversion, data cleansing, data
completion checks, conditional attribute
creation, decision attribute generation,
discretization of attributes, and data
splitting into analysis and validation
subsets.
Image Source: https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
Rough Set theory
Data completion and discretization of

continuous-valued attributes
● Discretization is a data
transformation procedure that involves
finding cuts in the data sets which
divide the data into intervals.
● Values lying within an interval are then
mapped to the same value.
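Once the cuts are known, mapping a continuous value to its interval is a simple lookup. The sketch below assumes the cuts are already given and sorted; the body temperature cut points are hypothetical, and finding good cuts is a separate problem not shown here.

```python
from bisect import bisect_right


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls into.

    cuts must be sorted; a value equal to a cut goes to the interval above it.
    """
    return bisect_right(cuts, value)


# Hypothetical cuts for body temperature: below 37.0, 37.0 to below 38.5, 38.5 and above.
cuts = [37.0, 38.5]
labels = ["normal", "elevated", "high"]
readings = [36.6, 37.0, 38.0, 39.2]
codes = [discretize(r, cuts) for r in readings]
print([labels[c] for c in codes])
```

All readings inside the same interval receive the same code, which shrinks the attribute's value set.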
Rough Set theory
● This process reduces
the size of the attribute value set and
ensures that the rules that are mined are
not too specific.
Rough Set theory
Processing stage
● The processing stage includes generating
preliminary knowledge, such as
computation of object reducts from data,
derivation of rules from reducts, and
classification processes.
● These stages lead towards the final goal
of generating rules from an information or
decision system.
Rough Set theory
Rule generation and classification
● The generated reducts are used to

generate decision rules.
● The decision rule, at its left side, is a
combination of values of attributes such
that the set of (almost) all objects
matching this combination have the
decision value given at the rule’s right
side.
Rough Set theory
Rule generation and classification
● The rules derived from reducts can be
used to classify the data.
● The set of rules is referred to as a
classifier and can be used to classify
new and unseen data.
Data Visualization
Introduction
● Data Visualization is used to

communicate information clearly and
efficiently to users by the usage of
information graphics such as tables and
charts.
● It helps users in analyzing a large
amount of data in a simpler way. It
makes complex data more accessible,
understandable, and usable.
Image Source:
https://previews.customer.envatousercontent.com/h264-video-previews/81d7b3f3-
Data Visualization
What makes Data Visualization

Effective?
● Effective data visualizations are created
where communication, data science, and
design collide.
Image Source:
https://static.javatpoint.com/tutorial/tableau/images/data-visualization.png
Data Visualization
Importance of Data Visualization
● Data visualization can identify areas that

need improvement or modifications.
● Data visualization can clarify which
factors influence customer behavior.
● Data visualization helps you to
understand which products to place
where.
● Data visualization can predict sales
volumes.
Data Visualization
Why Use Data Visualization?
● To make data easier to understand and
remember.
● To discover unknown facts, outliers, and
trends.
● To visualize relationships and patterns
quickly.
● To ask better questions and make
better decisions.
● To perform competitive analysis.
● To improve insights.
Data Visualization
Data Visualization Tools
● IBM Cognos
● Tableau
● Infogram
● Chartblocks
● Datawrapper
● Plotly
● Visual.ly, etc.
Data Visualization
Data Visualization Steps/Process
1. Develop your research question

2. Get or create your data
3. Clean your data
4. Choose a chart type
5. Choose your tool
6. Prepare data
7. Create report graph
Data Visualization
1. Develop your research question
1. It is important to have a clear

understanding of the goal of your
research.
2. This will determine what sort of data is
needed, the type of analysis necessary,
and the types of visualizations that
would be most effective to communicate
your explorations or findings.
Data Visualization
2. Get or create your data
● access to a large collection of numerical,

statistical and geospatial data. There is
also a great wealth of open data freely
available for download on the web.
Data Visualization
● advice and technical assistance with the

design, creation, and dissemination of
surveys using the Qualtrics web
survey platform to assist you in
collecting your own data.
Data Visualization
3. Clean your data
● Removing unnecessary variables

● Deleting duplicate rows/observations
● Addressing outliers or invalid data
● Dealing with missing values
● Standardizing or categorizing values
● Correcting typographical errors
Data Visualization
4. Choose a chart type
● Showing how variables compare to each

other?
● Showing relationships between
variables?
● Showing patterns in the data?
● Showing how the whole dataset can be
broken down into smaller parts?
Data Visualization
5. Choose your tool
● Tableau
● Excel
● Google Sheets
● Python
● R
● Gephi
Data Visualization
6. Prepare data
● Typical data preparation tasks include:

● Formatting columns appropriately
(numbers are treated as numbers, dates
as dates)
● Convert values into appropriate units
● Filter your data to focus on the specific
data that interests you.
Data Visualization
6. Prepare data
● Grouping data and creating aggregate values for groups
(Counts, Min, Max, Mean, Median, Mode)
● Extracting values from complex columns
● Combining variables to create new columns
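The grouping, extraction, and combination steps can be sketched with the Python standard library. The data, the `customer` column that packs a segment and an id together, and the derived `revenue` column are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Hypothetical rows; "customer" combines a segment name and an id.
sales = [
    {"customer": "retail-001", "units": 4, "unit_price": 10.0},
    {"customer": "retail-002", "units": 4, "unit_price": 12.5},
    {"customer": "retail-003", "units": 7, "unit_price": 10.0},
    {"customer": "wholesale-001", "units": 50, "unit_price": 8.0},
    {"customer": "wholesale-002", "units": 20, "unit_price": 8.0},
    {"customer": "wholesale-003", "units": 20, "unit_price": 8.0},
]

groups = defaultdict(list)
for row in sales:
    # Extract a value from a complex column.
    segment = row["customer"].split("-")[0]
    # Combine variables to create a new column.
    row["revenue"] = row["units"] * row["unit_price"]
    groups[segment].append(row)

# Group the data and create aggregate values for each group.
summary = {}
for segment, rows in groups.items():
    units = [r["units"] for r in rows]
    summary[segment] = {
        "count": len(units),
        "min": min(units),
        "max": max(units),
        "mean": mean(units),
        "median": median(units),
        "mode": mode(units),
        "total_revenue": sum(r["revenue"] for r in rows),
    }
```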
Data Visualization
7. Create report graph
1. Import data into the software.
2. Select the chart type you wish to create.
3. Evaluate the effectiveness of the chart.
4. Refine by applying design principles. The way you design your chart
can have a big impact on its effectiveness.
Odds Ratio
Introduction
● An odds ratio (OR) is a statistic that quantifies the strength of the
association between two events, A and B.
● The odds ratio compares the odds derived from two probabilities
(or proportions), P1 and P2.
Image Source:
http://hihg.med.miami.edu/code/http/modules/education/Design/images/Slide4050
Odds Ratio
Introduction
● The odds ratio is defined as the ratio of the odds of A in the presence of B
to the odds of A in the absence of B,
● or equivalently (due to symmetry), the ratio of the odds of B in the
presence of A to the odds of B in the absence of A.
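Written with conditional probabilities, the definition and its symmetric form look as follows. The count notation in the second line (n11 for observations with both A and B, n10 for A without B, n01 for B without A, n00 for neither) is an assumption introduced here for illustration.

```latex
\mathrm{OR}
  = \frac{P(A \mid B)\,/\,\bigl(1 - P(A \mid B)\bigr)}
         {P(A \mid \lnot B)\,/\,\bigl(1 - P(A \mid \lnot B)\bigr)}
  = \frac{P(B \mid A)\,/\,\bigl(1 - P(B \mid A)\bigr)}
         {P(B \mid \lnot A)\,/\,\bigl(1 - P(B \mid \lnot A)\bigr)}

\mathrm{OR} = \frac{n_{11}/n_{01}}{n_{10}/n_{00}} = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}}
```

The count form makes the symmetry visible: swapping the roles of A and B exchanges n10 and n01, which leaves the product in the denominator unchanged.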
Image Source: http://hihg.med.miami.edu/code/http/modules/education/Design/images/Sli
Odds Ratio
Introduction
● Two events are independent if and only if the OR equals 1, i.e., the odds of
one event are the same in either the presence or absence of the other event.
Odds Ratio
Introduction
● If the OR is greater than 1, then A and B are associated (correlated) in the
sense that, compared to the absence of B, the presence of B raises the odds
of A, and symmetrically the presence of A raises the odds of B.
Odds Ratio
Introduction
● Conversely, if the OR is less than 1, then A and B are negatively correlated,
and the presence of one event reduces the odds of the other event.
● Note that the odds ratio is symmetric in the two events, and no causal
direction is implied.
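The three cases (OR equal to 1, greater than 1, less than 1) and the symmetry property can be checked on small 2x2 tables. This is a sketch under stated assumptions: the counts are made up, and the table layout (rows for A present/absent, columns for B present/absent) is a convention chosen for this example.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table of counts.

    table[i][j]: i = 0 when A is present, 1 when absent;
                 j = 0 when B is present, 1 when absent.
    """
    (a, b), (c, d) = table
    # Odds of A given B are a/c; odds of A given not-B are b/d.
    return (a * d) / (b * c)


# Hypothetical counts.
positive = [[40, 10], [20, 30]]        # presence of B raises the odds of A
independent = [[150, 150], [350, 350]]  # odds of A are the same with or without B
negative = [[10, 40], [30, 20]]        # presence of B lowers the odds of A

or_positive = odds_ratio(positive)
or_independent = odds_ratio(independent)
or_negative = odds_ratio(negative)

# Symmetry: swapping the roles of A and B transposes the table.
swapped = [list(column) for column in zip(*positive)]
or_swapped = odds_ratio(swapped)
```

Here `or_positive` is above 1, `or_independent` equals 1, `or_negative` is below 1, and `or_swapped` matches `or_positive`.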
Odds Ratio
Example - to find out the odds of
customer default with low income
versus high income.
● The odds ratio is defined by the following formula:
OR = [Px / (1-Px)] / [Py / (1-Py)]
● Where Px is the probability of default with low income and (1-Px) is the
probability of non-default with low income.
Image Source: https://miro.medium.com/max/162/1*aGAfFHFLs9XrUozRa3ejWg.png
Odds Ratio
Example (Contd)
● Py is the probability of default with high income, and (1-Py) is the
probability of non-default with high income.
● Thus, the odds ratio gives the odds of a customer defaulting with low income
relative to the odds of a customer defaulting with high income.
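As a worked illustration, the odds for low income, Px / (1-Px), can be divided by the odds for high income, Py / (1-Py). The probabilities below are hypothetical values chosen only to show the arithmetic; they do not come from the source.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1 - p)


# Hypothetical default probabilities, chosen only for illustration.
px = 0.20  # probability of default with low income
py = 0.05  # probability of default with high income

default_odds_ratio = odds(px) / odds(py)
print(round(default_odds_ratio, 2))
```

With these assumed inputs the odds are 0.25 for low income and 1/19 for high income, so the ratio is 4.75: the odds of default are 4.75 times higher for the low-income group. This describes an association only and implies no causal direction.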
Image Source: https://miro.medium.com/max/162/1*aGAfFHFLs9XrUozRa3ejWg.png