
Module 5 - Business Data


Disclaimer: The content is curated for educational purposes only.

© Edunet Foundation. All rights reserved.
After going through this module, students will be able to

● Understand business analytics and develop business intelligence.

● Analyze data using statistical and data mining techniques for business
● Understand case studies for predictive models.
● Develop case studies for predictive analytical models.


Understand business
analytics and develop
business intelligence.

In this section, we will discuss:

● Introduction to business analytics and Concepts of business analytics

● Trends in business analytics
● Descriptive analytics
● Introduction to statistics
● Types of data
● Measure of Central Tendency
● Arithmetic mean
● Geometric Mean
● Harmonic Mean


● Median in Raw and Grouped Data

● Mode in Raw and Grouped Data
● Standard Deviation
● Variance
● Properties of Variance and standard deviation
● Usage of variance in business analytics
● OLAP Concept
● OLTP Concept


Introduction to business
analytics and Concepts of
business analytics
What is Business Analytics?

● Business analytics (BA) is the iterative,

methodical exploration of an
organization's data, with an emphasis on
statistical analysis.
● Business analytics is used by
companies that are committed to making
data-driven decisions.

What is Business Analytics?(Contd)

● Business Analytics is "the study of data

through statistical and operations
analysis, the formation of predictive
models, application of optimization
techniques, and the communication of
these results to customers, business
partners, and colleague executives."

What is Business Analytics?(Contd)

● It adopts quantitative methods, and
evidence from data is required to build
certain models for businesses and make
profitable decisions. Thus, Business
Analytics depends heavily on
Big Data (large volumes of data).

Understanding Business Analytics

● Business Analytics is the procedure

through which information is dissected
after studying past performances and
issues, to devise a successful plan for
the future.
● Big Data or large amounts of data is
used to derive solutions.

Understanding Business Analytics

● This method of going about a business

or this outlook towards building and
sustaining a business is vital to the
economy and industries that thrive in the

Components of Business Analytics

● Define Objective
● Data Aggregation
● Data Cleaning
● Analytical Methodology
● Evaluation and Validation
● Reporting and Data Visualisation

Types of Business Analytics Methods

● Descriptive Analytics
● Diagnostic Analytics
● Predictive Analytics
● Prescriptive Analytics

Uses and Benefits of Business Analytics

● To carry out data mining and exploring

new data to find new patterns and
● To carry out statistical and quantitative
analysis to provide explanations for
certain occurrences.

Uses and Benefits of Business Analytics

● Test previous decisions with
the help of A/B testing and multivariate
testing.
● Deploy predictive modeling to predict
future outcomes.

Business Analytics Tools

● Tableau/ QlikView/ Power BI
● Birt
● Python
● R
● MS Excel
● Sisense
● Clear Analytics
● Pentaho BI
● MicroStrategy
Applications of Business Analytics

● Marketing
● Finance
● Human Resources
● Manufacturing

Trends in business analytics

Business Analytics Trends For 2020

● Data Quality Management

● Data Discovery/Visualization
● Artificial Intelligence
● Predictive and Prescriptive Analytics
● Collaborative Business Intelligence
● Data-driven Culture


● Augmented Analytics
● Mobile BI
● Data Automation
● Embedded Analytics
● Natural language processing

Descriptive analytics

What is Descriptive Analytics?

● Descriptive analytics is a statistical

method that is used to search and
summarize historical data in order to
identify patterns or meaning.
● Descriptive analytics are based on
standard aggregate functions in


What is Descriptive Analytics?


● For example, in an online learning

course with a discussion board,
descriptive analytics could determine
how many students participated in the
discussion, or how many times a
particular student posted in the
discussion forum.
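
The discussion-board example can be sketched in a few lines of Python. This is only an illustration: the post log and student names below are hypothetical, and a real course platform would supply this data in its own format.

```python
from collections import Counter

# Hypothetical discussion-board log: one entry per post, holding the author's name.
posts = ["asha", "ben", "asha", "chen", "asha", "ben"]

posts_per_student = Counter(posts)      # how many times each student posted
participants = len(posts_per_student)   # how many distinct students participated

print(participants)                     # students who posted at least once
print(posts_per_student["asha"])        # posts by one particular student
```

Both results are simple counts over historical data, which is what makes them descriptive rather than predictive.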

How does descriptive analytics work?

● Data aggregation and data mining are

two techniques used in descriptive
analytics to discover historical data.
● Data is first gathered and sorted by data
aggregation in order to make the
datasets more manageable by analysts.


How does descriptive analytics work?

● Data mining describes the next step of
the analysis and involves a search of the
data to identify patterns and meaning.
● Identified patterns are analyzed to
discover the specific ways that learners
interacted with the learning content and
within the learning environment.


Examples of descriptive analytics

● Tracking course enrollments, course

compliance rates,
● Recording which learning resources are
accessed and how often
● Summarizing the number of times a
learner posts in a discussion board
● Tracking assignment and assessment

Examples of descriptive analytics

● Comparing pre-test and post-test

● Analyzing course completion rates by
learner or by course
● Collating course survey results
● Identifying length of time that learners
took to complete a course


Advantages of descriptive analytics

● Quickly and easily report on the Return

on Investment (ROI) by showing how
performance achieved business or
target goals.
● Identify gaps and performance issues
early - before they become problems.


Advantages of descriptive analytics

● Identify specific learners who require

additional support, regardless of how
many students or employees there are
● Identify successful learners in order to
offer positive feedback or additional
● Analyze the value and impact of course
design and learning resources.
Introduction to statistics

Introduction to Statistics

● It is a branch of mathematics that
deals with the collection, organization,
presentation, analysis and interpretation
of numerical data.


Types of Statistics

● Descriptive statistics
● Inferential statistics

Types of Statistics

Descriptive statistics

● It is used to describe the basic features

of data in a study.
● Descriptive statistics deals with the
processing of data without attempting to
draw any inferences from it.
● The data are presented in the form of
tables and graphs.


Descriptive statistics

● The characteristics of the data are

described in simple terms.
● Events that are dealt with include
everyday happenings such as accidents,
prices of goods, business, incomes,
epidemics, sports data, population data.


Inferential statistics

● Inferential statistics is a scientific

discipline that uses mathematical tools
to make forecasts and projections by
analyzing the given data.
● This is of use to people employed in
such fields as engineering, economics,
biology, the social sciences, business,
agriculture and communications.
Types of data

Qualitative or Categorical Data

● Qualitative data is also known as
categorical data.
● It describes data that fits into
categories.
● Qualitative data are not numerical.
● The categorical information involves
categorical variables that describe the
features such as a person’s gender,
hometown, etc.

Qualitative or Categorical Data


● Categorical measures are defined in

terms of natural language specifications,
but not in terms of numbers.
● Sometimes categorical data can hold
numerical values (quantitative value)
● But those values do not have
mathematical sense

Qualitative or Categorical Data


● Here, the birthdate and school postcode

hold the quantitative value
● But it does not give numerical meaning.


Qualitative or Categorical Data


Nominal Data:
● Nominal data is one of the types of
qualitative information which helps to
label the variables without providing the
numerical value.
● Nominal data is also called the nominal
scale. It cannot be ordered and

Qualitative or Categorical Data


● But sometimes, the data can be

qualitative and quantitative
● Examples of nominal data are letters,
symbols, words, gender etc.
● The nominal data are examined using
the grouping method.


Qualitative or Categorical Data


● In this method, the data are grouped into
categories, and then the frequency or
the percentage of the data can be
calculated.
● These data are visually represented
using pie charts.
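
A minimal sketch of the grouping method for nominal data follows. The survey responses are hypothetical, chosen only to show how frequencies and percentages per category are obtained before they are drawn as a pie chart.

```python
from collections import Counter

# Hypothetical nominal data: preferred payment method of six customers.
responses = ["card", "cash", "card", "upi", "card", "cash"]

frequency = Counter(responses)          # count of responses per category
total = len(responses)
percentage = {category: 100 * count / total
              for category, count in frequency.items()}

# Report each category with its frequency and share of the total.
for category, count in frequency.most_common():
    print(f"{category}: {count} ({percentage[category]:.1f}%)")
```

The percentages sum to 100, so they map directly onto the slices of a pie chart.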


Qualitative or Categorical Data


Ordinal Data:
● Ordinal data/variable is a type of data
which follows a natural order.
● The significant feature of ordinal
data is that the differences between the
data values are not determined.
● This variable is mostly found in surveys,
finance, economics, questionnaires, and
so on.

Qualitative or Categorical Data


● The ordinal data is commonly

represented using a bar chart.
● These data are investigated and
interpreted through many visualisation
● The information may be expressed
using tables in which each row in the
table shows the distinct category.

Qualitative or Categorical Data


Binary Data:
● Binary data has only 2 values/states.
● For Example yes or no, affected or
unaffected, true or false.
i) Symmetric : Both values are equally
important (Gender).
ii) Asymmetric : Both values are not equally
important (Result).

Qualitative or Categorical Data


● Better understanding- Qualitative data
gives a better understanding of the
perspectives and needs of participants.
● Provides Explanation- Qualitative data
along with quantitative data can explain
the result of the survey and can
measure the correction of the
quantitative data.

Qualitative or Categorical Data


● Better identification of behavior
patterns- Qualitative data can provide
detailed information which can prove
itself useful in identification of behavioral


Qualitative or Categorical Data

● Lesser reachability- Being subjective in
nature, small population is generally
covered to represent the large
● Time Consuming- Qualitative data is
time consuming as large data is to be
● Possibility of Bias- Because the analysis is
subjective, evaluator bias is quite possible.

Quantitative or Numerical Data


● Quantitative data is also known as

numerical data which represents the
numerical value (i.e., how much, how
often, how many).
● Numerical data gives information about
the quantities of a specific thing.
● Some of the examples of numerical data
are height, length, size, weight, and so
on.

Quantitative or Numerical Data


● The quantitative data can be classified

into two different types based on the
data sets.
● The two different classifications of
numerical data are discrete data and
continuous data.


Quantitative or Numerical Data


Discrete Data:
● Discrete data can take only discrete
values.
● Discrete information contains only a
finite number of possible values.


Quantitative or Numerical Data


● Those values cannot be subdivided

● Here, things can be counted in the
whole numbers.
● Example: Number of students in the


Quantitative or Numerical Data


Continuous Data:

● Continuous data is data that can be

● It has an infinite number of probable
values that can be selected within a
given specific range.
● Example: Temperature range

Quantitative or Numerical Data



● Specific- Quantitative data is clear and

specific to the survey conducted.
● High Reliability- If collected properly,
quantitative data is normally accurate
and hence highly reliable.


Quantitative or Numerical Data


● Easy communication- Quantitative

data is easy to communicate and
elaborate using charts, graphs etc.
● Existing support- Many large datasets
may be already present that can be
analyzed to check the relevance of the

Quantitative or Numerical Data



● Limited Options- Respondents are

required to choose from limited options.
● High Complexity- Quantitative data may
need complex procedures to get correct
results.
● Require Expertise- Analysis of
quantitative data requires certain
expertise in statistical analysis.
Measure of Central Tendency


● A measure of central tendency is a

summary statistic that represents the
center point or typical value of a dataset.
● These measures indicate where most
values in a distribution fall and are also
referred to as the central location of a
distribution.

● We can think of it as the tendency of

data to cluster around a middle value.
● In statistics the three most common
measures of central tendency are the
mean, median and mode.


● Each of these measures calculates the

location of the central point using a
different method.
● Choosing the best measure of central
tendency depends on the type of data
we have.


● The mean is the arithmetic average, and

it is probably the measure of central
tendency that you are most familiar with.
● Calculating the mean is very simple.


● We just add up all of the values and

divide by the number of observations in
your dataset.


● The calculation of the mean

incorporates all values in the data.
● If you change any value, the mean
changes.
● However, the mean doesn’t always
locate the center of the data accurately.


● In a symmetric distribution, the mean

locates the center accurately.


● However, in a skewed distribution, the

mean can miss the mark.
● This problem occurs because outliers
have a substantial impact on the mean.
● Extreme values in an extended tail pull
the mean away from the center.
● As the distribution becomes more
skewed, the mean is drawn further away
from the center.
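
The effect of an extended tail on the mean can be seen in a small sketch. The two salary lists are hypothetical; the second differs from the first only in its last value, which acts as an outlier. The median (the middle value of the sorted data) is printed alongside for comparison.

```python
import statistics

# Hypothetical monthly salaries (in thousands).
symmetric = [30, 32, 34, 36, 38]
skewed = [30, 32, 34, 36, 200]      # same data, but the last value is an outlier

# In the symmetric data the mean and the middle value coincide.
print(statistics.mean(symmetric), statistics.median(symmetric))

# The single outlier pulls the mean far above the middle value.
print(statistics.mean(skewed), statistics.median(skewed))
```

Only one value changed, yet the mean moved from 34 to 66.4 while the middle value stayed at 34.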


● The median is the middle value.

● It is the value that splits the dataset in
half.
● To find the median, order your data from
smallest to largest, and then find the
data point that has an equal amount of
values above it and below it.


● The method for locating the median

varies slightly depending on whether
your dataset has an even or odd number
of values.


● In the dataset with the odd number of

observations, notice how the number 12
has six values above it and six below it.
● Therefore, 12 is the median of this
dataset.

● When there is an even number of

values, you count in to the two
innermost values and then take the
average.
● The average of 27 and 29 is 28.
Consequently, 28 is the median of this
dataset.


● The mode is the value that occurs the

most frequently in your data set.
● On a bar chart, the mode is the highest
bar.
● If the data have multiple values that are
tied for occurring the most frequently,
you have a multimodal distribution.
● If no value repeats, the data do not have
a mode.

● In the dataset, the value 5 occurs most

frequently, which makes it the mode.
● These data might represent a 5-point
Likert scale.


● Typically, you use the mode with

categorical, ordinal, and discrete data.
● In fact, the mode is the only measure of
central tendency that you can use with
categorical data—such as the most
preferred flavor of ice cream.
● However, with categorical data, there
isn’t a central value because you can’t
order the groups.

● With ordinal and discrete data, the mode

can be a value that is not in the center.
● Again, the mode represents the most
common value.

Arithmetic mean


● Arithmetic Mean is the most common

and easily understood measure of
central tendency.
● We can define mean as the value
obtained by dividing the sum of
measurements with the number of
measurements contained in the data
set and is denoted by the symbol
$\bar{x}$.

Arithmetic Mean for three types of series


● Individual Data Series

● Discrete Data Series
● Continuous Data Series


Individual Data Series

● When data is given on individual basis.

Following is an example of individual series:
5 10 20 30 40 50 60 70


Individual Data Series


● For individual series, the Arithmetic Mean can
be calculated using the following formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

● Alternatively, we can write the same formula
as follows:

$$\bar{x} = \frac{\sum x_i}{n}$$

where $x_1, \dots, x_n$ are the individual observations and $n$ is the number of observations.

Individual Data Series


Problem Statement:
● Calculate Arithmetic Mean for the
following individual data:
14 36 45 70 105


Individual Data Series


● Based on the above-mentioned
formula, Arithmetic Mean $\bar{x}$ will be:

$$\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} = \frac{270}{5} = 54$$

● The Arithmetic Mean of the given
numbers is 54.

Discrete Data Series

● When data is given along with their

frequencies. Following is an example of
discrete series:
Items : 5 10 20 30 40 50 60 70
Frequency: 2 5 1 3 12 0 5 7


Discrete Data Series


● For discrete series, the Arithmetic Mean
can be calculated using the following
formula:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{N}$$

● Alternatively, we can write the same formula
as follows:

$$\bar{x} = \frac{\sum f x}{N}$$

where $x$ is an item, $f$ is its frequency and $N = \sum f$ is the total frequency.

Discrete Data Series


Problem Statement:
● Calculate Arithmetic Mean for the
following discrete data:
Items: 14 36 45 70
Frequency: 2 5 1 3


Discrete Data Series


Based on the given data, we have:

Items (x):       14   36   45   70
Frequency (f):    2    5    1    3
f·x:             28  180   45  210

N = ∑f = 11, ∑fx = 463

Discrete Data Series


● Based on the above-mentioned
formula, Arithmetic Mean $\bar{x}$ will be:

$$\bar{x} = \frac{\sum f x}{N} = \frac{463}{11} \approx 42.09$$

● The Arithmetic Mean of the given
numbers is 42.09.

Continuous Data Series

● When data is given based on ranges

along with their frequencies. Following
is an example of continuous series:
Items: 0-5 5-10 10-20 20-30 30-40
Frequency: 2 5 1 3 12


Continuous Data Series


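● In case of continuous series, a mid
point is computed as
(lower limit + upper limit)/2 and the
Arithmetic Mean is computed using the
following formula:

$$\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \dots + f_n m_n}{N}$$

where $m_i$ is the mid point of the $i$-th class, $f_i$ is its frequency and $N = \sum f$.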

Continuous Data Series


Problem Statement:
Let's calculate Arithmetic Mean for the
following continuous data:
Items: 0-10 10-20 20-30 30-40
Frequency: 2 5 1 3


Continuous Data Series


Based on the given data, we have:

Items:           0-10  10-20  20-30  30-40
Mid point (m):      5     15     25     35
Frequency (f):      2      5      1      3
f·m:               10     75     25    105

N = ∑f = 11, ∑fm = 215

Continuous Data Series


● Based on the above-mentioned
formula, Arithmetic Mean $\bar{x}$ will be:

$$\bar{x} = \frac{\sum f m}{N} = \frac{215}{11} \approx 19.55$$

The Arithmetic Mean of the given numbers is
approximately 19.55.
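
The three worked examples can be checked with a short sketch. The function names are illustrative, not part of any library; the data are the values used in the problem statements for the individual, discrete and continuous series.

```python
def mean_individual(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def mean_discrete(items, frequencies):
    """Sum of f*x divided by the total frequency N."""
    total = sum(f * x for x, f in zip(items, frequencies))
    return total / sum(frequencies)


def mean_continuous(class_limits, frequencies):
    """Replace each class by its mid point, then proceed as for a discrete series."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    return mean_discrete(midpoints, frequencies)


print(mean_individual([14, 36, 45, 70, 105]))                    # 54.0
print(round(mean_discrete([14, 36, 45, 70], [2, 5, 1, 3]), 2))   # 42.09
print(round(mean_continuous([(0, 10), (10, 20), (20, 30), (30, 40)],
                            [2, 5, 1, 3]), 2))                   # 19.55
```

The continuous case reuses the discrete formula, which mirrors how the mid point stands in for every value in its class.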
Geometric mean

Geometric mean

● Geometric mean of n numbers is
defined as the nth root of the product
of n numbers.

$$GM = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$$

Problem Statement:
● Determine the geometric mean of
following set of numbers.
1 3 9 27 81

Here n = 5

$$GM = \sqrt[5]{1 \times 3 \times 9 \times 27 \times 81} = \sqrt[5]{59049} = 9$$
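
As a quick check of this problem, the geometric mean can be computed both from the definition (nth root of the product) and with Python's standard library. This is a sketch for verification only; floating-point results are approximately, not exactly, 9.

```python
import math
import statistics

values = [1, 3, 9, 27, 81]

product = math.prod(values)                        # 1 * 3 * 9 * 27 * 81
gm_by_definition = product ** (1 / len(values))    # nth root of the product
gm_library = statistics.geometric_mean(values)     # standard-library equivalent

print(product)
print(gm_by_definition, gm_library)                # both approximately 9
```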

Harmonic Mean

What is Harmonic Mean?

● Harmonic mean is a type of average that
is calculated by dividing the number of
values in a data series by the sum of the
reciprocals (1/x_i) of each value in the
data series.
● A harmonic mean is one of the three
Pythagorean means (the other two are
arithmetic mean and geometric mean).
The harmonic mean always shows the
lowest value among the Pythagorean
means.
Formula for Harmonic Mean

● The general formula for calculating a
harmonic mean is:

$$\text{Harmonic mean} = \frac{n}{\sum \frac{1}{x_i}}$$

● Where:
● n – the number of values in a dataset
● x_i – the point in a dataset
● The weighted harmonic mean can be
calculated using the following formula:

$$\text{Weighted harmonic mean} = \frac{\sum w_i}{\sum \frac{w_i}{x_i}}$$

● Where:
● w_i – the weight of the data point
● x_i – the point in a dataset

Example of Harmonic Mean

● You are a stock analyst at an investment firm.
● Your manager asked you to determine
the P/E ratio of the index of the stocks of
Company A and Company B.
● Company A reports a market
capitalization of $1 billion and earnings
of $20 million, while Company B reports
a market capitalization of $20 billion and
earnings of $5 billion.
● The index consists of 40% of Company
A and 60% of Company B.
● Firstly, we need to find the P/E ratios of

each company. Remember that the P/E
ratio is essentially the market
capitalization divided by the earnings.
● P/E (Company A) = ($1 billion) / ($20
million) = 50
● P/E (Company B) = ($20 billion) / ($5
billion) = 4

● We must use the weighted harmonic
mean to calculate the P/E ratio of the
index. Using the formula for the
weighted harmonic mean, the P/E ratio
of the index can be found in the
following way:
● P/E (Index) = (0.4+0.6) / (0.4/50 + 0.6/4)
= 6.33
● Note that if we calculate the P/E ratio of
the index using the weighted arithmetic
mean, it would be significantly
higher:
● P/E (Index) = 0.4×50 + 0.6×4 = 22.4
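
The index calculation can be reproduced with a small sketch. The two helper functions are illustrative names, not library calls; the P/E ratios and weights are those of Company A and Company B in the example.

```python
def weighted_harmonic_mean(values, weights):
    """Sum of the weights divided by the sum of weight/value."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


def weighted_arithmetic_mean(values, weights):
    """Sum of weight*value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


pe_ratios = [50, 4]      # P/E of Company A and Company B
weights = [0.4, 0.6]     # share of each company in the index

print(round(weighted_harmonic_mean(pe_ratios, weights), 2))     # 6.33
print(round(weighted_arithmetic_mean(pe_ratios, weights), 2))   # 22.4
```

Because a P/E ratio has earnings in the denominator, averaging the reciprocals is what keeps the high-P/E company from dominating the index figure.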
Median in Raw and
Grouped Data

Median in Raw Data

● The median of raw data is the number

which divides the observations when
arranged in an order (ascending or
descending) in two equal parts.

Method of finding median

● Take the following steps to find the

median of raw data.
● Step I: Arrange the raw data in
ascending or descending order.
● Step II: Observe the number of variates
in the data. Let the number of variates in
the data be n. Then find the median as
follows.
● (i) If n is odd, then the $\left(\frac{n+1}{2}\right)$th
variate is the median.
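● (ii) If n is even, then the mean of the
$\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th variates is the
median, i.e.,

$$\text{median} = \frac{1}{2}\left\{\left(\frac{n}{2}\right)\text{th variate} + \left(\frac{n}{2} + 1\right)\text{th variate}\right\}$$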

Solved Examples on Median of Raw Data
● Find the median of the ungrouped data.
● 15, 18, 10, 6, 14
● Solution:
● Arranging variates in ascending order,
we get
● 6, 10, 14, 15, 18.
● The number of variates = 5, which is
odd.
● Therefore, median = $\left(\frac{5+1}{2}\right)$th
variate
● = 3rd variate
● = 14
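
The two cases of the method (odd and even n) can be sketched as follows. The function name is illustrative; the first call uses the data of the solved example, and the second uses a hypothetical four-value list to exercise the even case.

```python
def median_raw(values):
    """Median of ungrouped data, following the odd/even rule for n variates."""
    ordered = sorted(values)                 # Step I: arrange in ascending order
    n = len(ordered)                         # Step II: number of variates
    if n % 2 == 1:
        # ((n + 1) / 2)th variate; subtract 1 because Python lists are 0-based
        return ordered[(n + 1) // 2 - 1]
    # mean of the (n/2)th and (n/2 + 1)th variates
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_raw([15, 18, 10, 6, 14]))   # 14
print(median_raw([15, 10, 6, 14]))       # 12.0 (hypothetical even-sized data)
```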
Finding Median for Grouped Data

● Median is the value which occupies the

middle position when all the
observations are arranged in an
ascending or descending order. It is a
positional average.
● (i) Construct the cumulative frequency
distribution.
● (ii) Find (N/2)th term
● (iii) The class that contains the
cumulative frequency N/2 is called the
median class.
● (iv) Find the median by using the
formula:

$$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c$$

● Where l = Lower limit of the median
class
● f = Frequency of the median class
● c = Width of the median class,
● N = The total frequency (∑f)
● m = cumulative frequency of the class
preceding the median class
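
Steps (i) to (iv) can be sketched as a single function, assuming the classes are given as (lower limit, upper limit) pairs in ascending order. The function name is illustrative, and the data reuse the continuous series from the arithmetic mean problem (classes 0-10 to 30-40 with frequencies 2, 5, 1, 3) purely as an example.

```python
def median_grouped(classes, frequencies):
    """Median of grouped data: l + ((N/2 - m) / f) * c."""
    total = sum(frequencies)          # N, the total frequency
    half = total / 2                  # the (N/2)th term
    cumulative = 0                    # m, cumulative frequency before the class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= half:    # this class contains the (N/2)th term
            width = upper - lower     # c, width of the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]
print(round(median_grouped(classes, frequencies), 2))   # 17.0
```

Here N = 11, so N/2 = 5.5 falls in the class 10-20, giving l = 10, m = 2, f = 5 and c = 10.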
Solved Examples on Median of
Grouped Data
● A researcher studying the behavior of
mice has recorded the time (in seconds)
taken by each mouse to locate its food
by considering 13 different mice as 31,
33, 63, 33, 28, 29, 33, 27, 27, 34, 35,
28, 32. Find the median time that the mice
spent searching for food.
● Ascending order of given data is
● 27, 27, 28, 28, 29, 31, 32, 33, 33, 33,
34, 35, 63
● There are 13 observations, so the middle value
is the 7th observation, and the median is 32 seconds.
Mode in Raw and Grouped Data
Finding the Mode in Raw Data

● To find the mode, or modal value, it is
best to put the numbers in order. Then
count how many of each number. A
number that appears most often is the
mode.
● 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14,
12, 56, 23, 29
● In order these numbers are:
● 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23,

29, 39, 40, 56
● This makes it easy to see which
numbers appear most often.
● In this case the mode is 23.


Finding the Mode in Grouped Data

● In some cases (such as when all values

appear the same number of times) the
mode is not useful. But we can group
the values to see if one group has more
than the others.
● Example: {4, 7, 11, 16, 20, 22, 25, 26,
33}
● Each value occurs once, so let us try to
group them.
● We can try groups of 10:

● 0-9: 2 values (4 and 7)
● 10-19: 2 values (11 and 16)
● 20-29: 4 values (20, 22, 25 and 26)
● 30-39: 1 value (33)
● In groups of 10, the "20s" appear most
often, so we could choose 25 (the
middle of the 20s group) as the mode.
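
Both the raw and the grouped mode can be sketched in Python. The data are the two examples used in this section; the group width of 10 and the choice of the group's start plus half the width as its middle follow the slides' convention and are otherwise assumptions of this sketch.

```python
from collections import Counter
import statistics

# Raw data: the most frequent value is the mode.
raw = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]
print(statistics.multimode(raw))          # [23]

# Grouped data: every value occurs once, so count values per group of 10.
values = [4, 7, 11, 16, 20, 22, 25, 26, 33]
width = 10
groups = Counter((v // width) * width for v in values)   # key = start of group

start, count = groups.most_common(1)[0]   # group holding the most values
modal_midpoint = start + width / 2        # middle of the modal group
print(start, count, modal_midpoint)       # 20 4 25.0
```

`statistics.multimode` returns a list, so it also covers the multimodal case where several values are tied.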
Standard Deviation

Standard Deviation Formulas

● The Standard Deviation is a measure of

how spread out numbers are.
● Here we explain the formulas.
● The symbol for Standard Deviation is σ
(the Greek letter sigma).

● This is the formula for Standard
Deviation:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where $\mu$ is the mean of the values and $N$ is the number of values.

Steps for Standard Deviation

● Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.
● To calculate the standard deviation of those numbers:
1. Work out the Mean (the simple average of the numbers).
2. Then for each number: subtract the Mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
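
The four steps can be followed literally in Python. This is a minimal sketch for the eight numbers on the slide; it divides by the number of values (the population form), which is what the steps describe.

```python
import math

data = [9, 2, 5, 4, 12, 7, 8, 11]

mean = sum(data) / len(data)                      # step 1: the mean
squared_diffs = [(x - mean) ** 2 for x in data]   # step 2: squared differences
variance = sum(squared_diffs) / len(data)         # step 3: mean of the squares
std_dev = math.sqrt(variance)                     # step 4: square root

print(f"mean = {mean}")
print(f"variance = {variance}")
print(f"standard deviation = {std_dev:.4f}")
```

For these numbers the mean is 7.25, the mean of the squared differences is 10.4375, and the standard deviation is about 3.23.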

What is Variance?

● Variance is the expected value of the squared
deviation of a random variable from its mean.
● In short, it is the measurement of the distance of a
set of random numbers from their collective
average value.
● Variance is used in statistics as a way of better
understanding a data set's distribution.


How does Variance work?

● Variance is the square of the standard deviation of a
variable; equivalently, it is the
covariance of the variable with itself.
● In the formula $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, $\mu$ represents the mean of
the data points, $x_i$ is the value of an individual data
point, and $N$ is the total number of data points.


How to Calculate Variance?

● Steps to Calculate Variance:
1. List the elements of the data set. The following are the ages of
students pursuing a Master's degree:
Data set 1: 28, 25, 26, 27, 31, 32, 24
2. Calculate the mean.
● (28 + 25 + 26 + 27 + 31 + 32 + 24) / 7 = 193 / 7 ≈ 27.57


How to Calculate Variance?

● (Continued)
3. Find the deviation from the mean for each data point.

How to Calculate Variance?


4. Square each deviation.

How to Calculate Variance?

(Continued)

● The average of all squared differences is
the variance. To find it, add all squared
deviations and divide the sum by the
number of elements in the data set (n).
⇒ (0.1849 + 6.6049 + 2.4649 + 0.3249 + 11.7649 + 19.6249 + 12.7449) / 7
⇒ 53.7143 / 7 ≈ 7.6735
⇒ Variance ≈ 7.6735
● To find the standard deviation in ages of
students pursuing Master's, we calculate
the square root of the variance.
⇒ Standard Deviation = √7.6735 ≈ 2.77
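
A hand calculation like this is easy to verify with Python's statistics module. The sketch below uses pvariance and pstdev, which divide by n as the slide does; the printed table of deviations is only there to make each intermediate value visible.

```python
import statistics

ages = [28, 25, 26, 27, 31, 32, 24]

mean = statistics.mean(ages)
variance = statistics.pvariance(ages)   # population variance: divides by n
std_dev = statistics.pstdev(ages)       # square root of the population variance

for age in ages:
    deviation = age - mean
    print(f"{age:>3}  deviation = {deviation:+.2f}  squared = {deviation ** 2:.4f}")

print(f"mean = {mean:.2f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

The exact population variance of these seven ages is 376/49, roughly 7.6735, so any hand-computed figure can be compared against it.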
Applications of Variance
● Variance plays a major role in
interpreting data in statistics.
● The most common application of
variance is in polls.
● For opinion polls, the data gathering
agencies cannot invest in collecting data
from the entire population.
● They set criteria for sampling the
population based on ethnicity, income
group, regions, education level, salary
and religion, so that the population is
Properties of Variance and
standard deviation

Properties of Variance
● Variance is a numerical value that
describes the variability of observations
from its arithmetic mean.
● Variance is nothing but an average of
squared deviations.
● Variance is denoted by sigma-squared (σ²).
● Variance is expressed in square units
which are usually larger than the values
in the given dataset.
● Variance measures how far individuals
Properties of Variance and
standard deviation
Properties of Variance
● In statistics, variance is defined as the
measure of variability that represents
how far members of a group are spread out.
● It finds out the average degree to which
each observation varies from the mean.
● When the variance of a data set is small,
the data points lie close to the mean, whereas
a larger variance indicates that the
observations are widely dispersed around the mean.
Properties of Variance and
standard deviation

Properties of Standard Deviation

● Standard deviation is a measure that

quantifies the amount of dispersion of
the observations in a dataset.
● A low standard deviation indicates that
the scores are close to the arithmetic mean,
while a high standard deviation indicates that
the scores are dispersed over a wider
range of values.
Properties of Variance and
standard deviation
Properties of Standard Deviation
● Standard deviation is a measure of the
dispersion of observations within a data
set relative to their mean.
● The standard deviation is the root mean
square deviation.
● Standard deviation is denoted by sigma (σ).
● Standard deviation is expressed in
the same units as the values in the set
of data.
● Standard Deviation measures how
Properties of Variance and
standard deviation

Example : To find Standard Deviation

and Variance
● Marks scored by a student in five
subjects are 60, 75, 46, 58 and 80
● You have to find out the standard
deviation and variance.
● First of all, you have to find out the mean: (60 + 75 + 46 + 58 + 80) / 5 = 63.8
Properties of Variance and
standard deviation

Example : To find Standard Deviation

and Variance
● Now calculate the variance: $\sigma^2 = \frac{\sum (X - A)^2}{N}$
● Where, X = Observations
● A = Arithmetic Mean
● N = Number of observations
● Both variance and standard deviation
are never negative.
● If all the observations in a data set are
identical, then the standard deviation
and variance will be zero.
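
The marks example and the last two properties can be checked together. This is a sketch that assumes the population form (dividing by the number of observations); the list named identical is a made-up data set used only to illustrate the zero-variance case.

```python
import statistics

marks = [60, 75, 46, 58, 80]

mean = statistics.mean(marks)
variance = statistics.pvariance(marks)   # population variance, divides by N
std_dev = statistics.pstdev(marks)

print(f"mean = {mean}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")

# Hypothetical data set in which every observation is the same value.
identical = [50, 50, 50, 50]
identical_variance = statistics.pvariance(identical)
identical_std_dev = statistics.pstdev(identical)
print(f"identical observations: variance = {identical_variance}, "
      f"standard deviation = {identical_std_dev}")
```

Because every term in the sum is a square, neither measure can fall below zero, and both reach zero only when all observations are equal.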
Properties of Variance and
standard deviation

Difference between Standard

Deviation and Variance

OLAP Concept

What is OLAP?
● Online Analytical Processing (OLAP) is
a category of software that allows users
to analyze information from multiple
database systems at the same time.
● It is a technology that enables analysts
to extract and view business data from
different points of view.
● Analysts frequently need to group,
aggregate and join data.
● These operations in relational databases
OLAP Concept

OLAP Cube
● OLAP databases are divided into one or
more cubes.
● The cubes are designed in such a way
that creating and viewing reports
becomes easy. The OLAP cube is a data
structure optimized for very quick data analysis.
● The OLAP Cube consists of numeric
facts called measures which are
categorized by dimensions.
OLAP Concept

How it Works?
● A data warehouse extracts
information from multiple data sources
and formats like text files, Excel sheets,
multimedia files, etc.
● The extracted data is cleaned and
transformed. Data is loaded into an
OLAP server (or OLAP cube) where
information is pre-calculated in advance
OLAP Concept

Basic analytical operations of OLAP

Four types of analytical operations in OLAP

● Roll-up
● Drill-down
● Slice and dice
● Pivot (rotate)
OLAP Concept

Basic analytical operations of OLAP

• Roll-up is also known as "consolidation"
or "aggregation." The Roll-up operation
can be performed in 2 ways:
• 1. Reducing dimensions
• 2. Climbing up the concept hierarchy.
Concept hierarchy is a system of
grouping things based on their order or
level.
OLAP Concept
Basic analytical operations of OLAP
• In this example, the cities New Jersey and
Los Angeles are rolled up into country
• The sales figures of New Jersey and Los
Angeles are 440 and 1560 respectively.
They become 2000 after roll-up.
• In this aggregation process, the
location hierarchy moves up from city to
country.
• In the roll-up process, one or
more dimensions need to be removed.
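
The roll-up in this example can be sketched with plain Python. The two sales figures come from the slide; the country label "USA" and the roll_up helper are assumptions added for illustration, not part of any OLAP product's API.

```python
from collections import defaultdict

# (city, country, sales). The country label "USA" is an assumption;
# the sales figures are the ones quoted on the slide.
city_sales = [
    ("New Jersey", "USA", 440),
    ("Los Angeles", "USA", 1560),
]


def roll_up(rows):
    """Climb the location hierarchy from city to country by summing sales."""
    totals = defaultdict(int)
    for _city, country, sales in rows:
        totals[country] += sales
    return dict(totals)


country_sales = roll_up(city_sales)
print(country_sales)
```

The city level disappears from the result, which is what "climbing up the concept hierarchy" means in practice.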
OLAP Concept

Basic analytical operations of OLAP

• In drill-down, data is fragmented into
smaller parts. It is the opposite of the
roll-up process. It can be done via
1. Moving down the concept hierarchy
2. Increasing a dimension
OLAP Concept
Drill Down
Basic analytical operations of OLAP

Consider the diagram :

1. Quarter Q1 is drilled down to the months
January, February, and March.
Corresponding sales are also registered.
2. In this example, dimension months are
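
Drill-down is the reverse lookup: a quarter is expanded into its months. The sketch below uses made-up monthly sales figures, since the diagram's numbers are not reproduced in the text; drill_down is a hypothetical helper.

```python
# (quarter, month, sales). All sales figures here are hypothetical.
monthly_sales = [
    ("Q1", "January", 120),
    ("Q1", "February", 150),
    ("Q1", "March", 170),
    ("Q2", "April", 90),
]


def drill_down(rows, quarter):
    """Move down the time hierarchy: show one quarter month by month."""
    return {month: sales for q, month, sales in rows if q == quarter}


q1_total = sum(sales for q, _month, sales in monthly_sales if q == "Q1")
q1_by_month = drill_down(monthly_sales, "Q1")

print(f"Q1 total: {q1_total}")
print(f"Q1 by month: {q1_by_month}")
```

The monthly figures always add back up to the quarterly total, which is a quick consistency check after any drill-down.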
OLAP Concept
Basic analytical operations of OLAP

Consider the diagram :

● Dimension Time is Sliced with Q1 as the

● A new cube is created altogether.

OLAP Concept

Basic analytical operations of OLAP

• Dice is similar to slice. The
difference is that in dice you select 2 or more
dimensions, which results in the creation of
a sub-cube.
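
Slice and dice can be compared side by side on a tiny cube. This is a sketch in which the cube is a list of cells; every city, item, and sales figure is hypothetical, and slice_cube and dice_cube are illustrative helpers rather than functions of a real OLAP tool.

```python
# Each cell has three dimensions (time, location, item) and one measure (sales).
# All values are hypothetical.
cube = [
    {"time": "Q1", "location": "Toronto", "item": "Mobile", "sales": 605},
    {"time": "Q1", "location": "Vancouver", "item": "Modem", "sales": 825},
    {"time": "Q2", "location": "Toronto", "item": "Mobile", "sales": 680},
    {"time": "Q2", "location": "Vancouver", "item": "Phone", "sales": 512},
    {"time": "Q3", "location": "Toronto", "item": "Modem", "sales": 400},
]


def slice_cube(cells, dimension, value):
    """Fix one dimension to a single value and drop it from the result."""
    return [
        {k: v for k, v in cell.items() if k != dimension}
        for cell in cells
        if cell[dimension] == value
    ]


def dice_cube(cells, criteria):
    """Keep cells whose values fall inside the allowed set for every listed dimension."""
    return [
        cell
        for cell in cells
        if all(cell[dim] in allowed for dim, allowed in criteria.items())
    ]


q1_slice = slice_cube(cube, "time", "Q1")
sub_cube = dice_cube(
    cube,
    {"time": {"Q1", "Q2"}, "location": {"Toronto"}, "item": {"Mobile", "Modem"}},
)

print(q1_slice)
print(sub_cube)
```

The slice result has one dimension fewer than the original cube, while the dice result keeps all dimensions but covers a smaller range of each.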

OLAP Concept

Basic analytical operations of OLAP

● In Pivot, you rotate the data axes to
provide a substitute presentation of data.
● In the following example, the pivot is
based on item types.
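
A pivot changes which dimension runs along the rows and which along the columns, without changing any value. The sketch below assumes a small two-dimensional view with hypothetical sales figures; pivot is an illustrative helper.

```python
# Hypothetical cells of a two-dimensional view: location by item type.
cells = [
    {"location": "Toronto", "item": "Mobile", "sales": 605},
    {"location": "Toronto", "item": "Modem", "sales": 400},
    {"location": "Vancouver", "item": "Mobile", "sales": 680},
    {"location": "Vancouver", "item": "Modem", "sales": 825},
]


def pivot(rows, row_dim, col_dim, measure="sales"):
    """Arrange cells as a nested dict: table[row value][column value] -> measure."""
    table = {}
    for cell in rows:
        row = table.setdefault(cell[row_dim], {})
        row[cell[col_dim]] = row.get(cell[col_dim], 0) + cell[measure]
    return table


by_location = pivot(cells, "location", "item")  # locations as rows
by_item = pivot(cells, "item", "location")      # rotated: item types as rows

print(by_location)
print(by_item)
```

Both tables hold the same four numbers; only the orientation differs.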

OLAP Concept

Types of OLAP systems

● Types of OLAP Systems

OLAP Concept

What is ROLAP?
● ROLAP stands for Relational Online Analytical Processing.
● ROLAP is an extended RDBMS along
with multidimensional data mapping to
perform the standard relational
● ROLAP works with data that exist in a
relational database.
● Facts and dimension tables are stored
as relational tables. It also allows
OLAP Concept

Advantages of ROLAP

● High data efficiency. It offers high data

efficiency because query performance
and access language are optimized
particularly for the multidimensional data
● Scalability. This type of OLAP system
offers scalability for managing large
volumes of data, and even when the
data is steadily increasing.
OLAP Concept

Disadvantages of ROLAP
● Demand for higher resources: ROLAP
needs high utilization of manpower,
software, and hardware resources.
● Aggregate data limitations. ROLAP
tools use SQL for all calculation of
aggregate data. However, there are no
set limits to the for handling
● Slow query performance. Query
performance in this model is slow
OLAP Concept

What is MOLAP?

● MOLAP uses array-based

multidimensional storage engines to
display multidimensional views of data.
Basically, they use an OLAP cube.

OLAP Concept

What is Hybrid OLAP?

● Hybrid OLAP is a mixture of both
MOLAP and ROLAP.
● It offers fast computation of MOLAP and
higher scalability of ROLAP. HOLAP
uses two databases.
● Aggregated or computed data is stored
in a multidimensional OLAP cube.
● Detailed information is stored in a
relational database.
OLAP Concept

Advantages of Hybrid OLAP
● This kind of OLAP helps to economize
disk space, and it also remains
compact, which helps to avoid issues
related to access speed and
● HOLAP uses cube technology,
which allows faster performance for all
types of data.
● ROLAP data is instantly updated, and
HOLAP users have access to this
real-time, instantly updated data. MOLAP
OLAP Concept

OLAP tools

● OLAP business analytics tools include IBM
Cognos, MicroStrategy, Palo OLAP
Server, Apache Kylin, Oracle OLAP,
icCube, Pentaho BI, JsHypercube, etc.
● We can apply security restrictions on
users and objects using OLAP tools.
● It creates a single platform for planning,
forecasting, reporting, and analysis.
OLAP Concept

Advantages of OLAP
● OLAP is a platform for all types of
business activity, including planning, budgeting,
reporting, and analysis.
● Information and calculations are
consistent in an OLAP cube. This is a
crucial benefit.
● Quickly create and analyze "What if" scenarios.
● Easily search OLAP database for broad
or specific terms.
● OLAP provides the building blocks for
OLAP Concept

Advantages of OLAP
● Allows users to slice and dice cube
data by various dimensions,
measures, and filters.
● It is good for analyzing time series.
● Finding some clusters and outliers is
easy with OLAP.
● It is a powerful online
analytical processing system for visualization, which
provides faster response times.
OLAP Concept

Disadvantages of OLAP
● OLAP requires organizing data into a
star or snowflake schema. These
schemas are complicated to implement
and administer
● You cannot have a large number of
dimensions in a single OLAP cube.
● Transactional data cannot be accessed
with an OLAP system.
● Any modification in an OLAP cube
needs a full update of the cube. This is a
OLTP Concept

Overview of OLTP

● OLTP or Online Transaction Processing

is a type of data processing approach,
where the transactions play the major
role for data manipulation in the
● This type of data processing is known
for its high performance, faster
accessibility and reliable & consistent
OLTP Concept

Understanding OLTP

● In the case of online airline booking,
booking a seat corresponds
to an insertion in the database.
● OLTP ensures availability in the cart
and concurrency in case a large number
of users are accessing the same
website at the same time.
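
The booking scenario can be sketched as a single atomic transaction. The example below uses Python's built-in sqlite3 module with an in-memory database; the table layout, flight number, and passenger names are hypothetical, and a production OLTP system would use a multi-user database server rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights ("
    " flight_no TEXT PRIMARY KEY,"
    " seats_left INTEGER NOT NULL CHECK (seats_left >= 0))"
)
conn.execute(
    "CREATE TABLE bookings ("
    " id INTEGER PRIMARY KEY,"
    " flight_no TEXT NOT NULL,"
    " passenger TEXT NOT NULL)"
)
conn.execute("INSERT INTO flights VALUES ('XY101', 1)")  # one seat left
conn.commit()


def book_seat(connection, flight_no, passenger):
    """Insert the booking and decrement the seat count as one atomic unit."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with connection:
            connection.execute(
                "INSERT INTO bookings (flight_no, passenger) VALUES (?, ?)",
                (flight_no, passenger),
            )
            connection.execute(
                "UPDATE flights SET seats_left = seats_left - 1 WHERE flight_no = ?",
                (flight_no,),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative seat count, so the
        # booking row inserted above has been rolled back as well.
        return False


first = book_seat(conn, "XY101", "Asha")
second = book_seat(conn, "XY101", "Ravi")

booking_count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
seats_left = conn.execute(
    "SELECT seats_left FROM flights WHERE flight_no = 'XY101'"
).fetchone()[0]
print(first, second, booking_count, seats_left)
```

The second booking fails as a whole: no booking row is left behind for a seat that does not exist, which is the atomicity that OLTP systems rely on.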


OLTP Concept

Characteristics of OLTP

● 3NF databases

● Predefined operations

● Updating of databases is directly

accessible to end users.

● A small number of records

● Maintaining historical data
OLTP Concept

How does OLTP make working so


● Online transaction processing is concerned
with concurrency and atomicity.
● OLTP stores less historical data, which
makes it efficient.
● It maintains the consistency and
concurrency of the data in the
OLTP Concept

What can you do with OLTP?

● Its goals are availability, speed,
concurrency, and recoverability.
● A large number of users can conduct
short transactions using OLTP systems.
● We can design such systems that help
in performing operations whose
database queries are usually simple,
require sub-second response times
and return comparatively few records.
OLTP Concept

Working with OLTP

● It involves gathering information as
input, processing the data according to
needs and updating data to reflect the
processed information.
● For various decentralized database
systems, OLTP brokering programs
distribute transaction processing among
multiple computers on a network.
● OLTP is also used in
service-oriented architecture (SOA) and
Web services.
OLTP Concept

OLTP Advantages

● Concurrency
● ACID compliance
● Availability
● Integrity


OLTP Concept

OLTP Disadvantages

● To provide such concurrency, availability and
fast transactions, OLTP often requires
support for transactions that span
many companies' networks.
● Thus in today’s era, we require a more
decentralized system.


OLTP Concept

Why should we use OLTP?

● To use less paper and make a faster,

more accurate prediction of revenues
and expenses.
● The system that requires offline
maintenance makes a good requirement
for online transaction processing.
● Availability, concurrency, and atomicity
of data are much more important.


OLTP Concept

Why do we need OLTP?

● OLTP to perform the tasks

● Maintains normalized databases
● Decentralized system
● Business intelligence tasks


Analyze data using
statistical and data mining
techniques for business

In this section, we will discuss:

● BI component framework
● Business intelligence for management
● Operational BI
● BI for process and performance improvement
● Role of business intelligence in improving customer experience
● Business intelligence role and responsibilities
● Popular BI tools in the market


BI component framework


● Architecture and components of a BI



BI component framework

Architecture Components

Data Warehouse

● Data warehouse is the core of the BI

● A data warehouse is a database built for
the purpose of data analysis and

BI component framework

Architecture Components

Extract Transform Load

● It is very likely that more than one

system acts as the source of data
required for the BI system.
● An ETL process extracts data from these
source systems, transforms it into a consistent
format, and finally loads it into the data warehouse;
this process is called Extract Transform
Load (ETL).
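
The three ETL stages can be sketched end to end with the standard library. In this example the two source systems, their CSV layouts, and the sales table are all hypothetical; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical source systems that export the same facts in different shapes.
crm_export = "customer,amount\nAlice, 120.50\nbob,80\n"
shop_export = "name;total\nCarol;42,00\n"


def extract(text, delimiter):
    """Extract: read the raw rows from one source export."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows, name_key, amount_key):
    """Transform: unify column names, capitalisation, and number formats."""
    cleaned = []
    for row in rows:
        name = row[name_key].strip().title()
        amount = float(row[amount_key].strip().replace(",", "."))
        cleaned.append((name, amount))
    return cleaned


def load(connection, rows):
    """Load: write the cleaned rows into the warehouse table."""
    connection.executemany(
        "INSERT INTO sales (customer, amount) VALUES (?, ?)", rows
    )
    connection.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")

load(warehouse, transform(extract(crm_export, ","), "customer", "amount"))
load(warehouse, transform(extract(shop_export, ";"), "name", "total"))

customers = [
    row[0]
    for row in warehouse.execute("SELECT customer FROM sales ORDER BY customer")
]
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(customers, total)
```

Keeping the three stages as separate functions makes it possible to add another source by writing only a new extract and transform call.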

BI component framework

Architecture Components

Data model – BISM

● This layer, which we call the data model,

contains a file-based or memory-based
model of the data for producing very
quick responses to reports.


BI component framework

Architecture Components

Data visualization
● The frontend of a BI system is data
visualization. In other words, data
visualization is a part of the BI system
that users can see.
● There are different methods for
visualizing information, such as strategic
and tactical dashboards, Key
Performance Indicators (KPIs), and
detailed or consolidated reports.

BI component framework

Architecture Components

Master Data Management

● Master Data Management (MDM) is the

process of maintaining the single
version of truth for master data entities
through multiple systems.


BI component framework

Architecture Components

Data Quality Services

● The quality of data is different in each

operational system, especially when we
deal with legacy systems or systems
that have a high dependence on user

Business intelligence for management

BI Management

● BI Management ensures the

management and steering of business
intelligence and of the organizational
units involved as well as the integration
into an existing expert, technical and
organizational BI environment.

Business intelligence for management
What does Business Intelligence
Management include?

● Four components: analysts, data

solutions, decision making, and

Business intelligence for management

BI Governance

● BI governance defines the rules

according to which business intelligence
is steered, organized, implemented, and
developed further.

Business intelligence for management

BI Awareness

● BI awareness describes the

company-wide understanding of BI.
Uniform and consistent BI
understanding forms the basis for
successful BI projects.

Business intelligence for management

BI Strategy

● A BI strategy must be developed,

adapted, and updated continuously.
● This means identifying the ambitions of
the BI sponsors and, based on this,
defining the strategy-relevant initial
situation from which concrete
goals can be derived.

Business intelligence for management

BI Organization

● The BI Competence Centre is an

organizational unit that, ideally, will be a
service-providing division that is part of
a certain management field.

Business intelligence for management

BI Requirements Engineering

● Identification of BI-specific requirements

and their distinction from other projects


Operational BI


● Operational business intelligence (OBI)

systems provide an intermediate step
toward satisfying the strategic needs
that data warehouses address as well
as the tactical decision-making that
enterprise application integration (EAI)


Operational BI

Business Operations

● Hourly/daily minibatches of transactions
are sent to the OBI system that first logs
the transactions in a transaction
database, and then processes changes
in a data-mining engine. From this data,
the OBI system runs its rules-based
detection system, and generates a
suspected fraud report.
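
The minibatch flow can be sketched as a loop that checks each transaction against a set of rules and then logs it. The two rules, the 5000 threshold, and the card data below are hypothetical; real fraud systems use far richer rules and models, and a dictionary stands in here for the transaction database.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card: str
    amount: float
    country: str


# Hypothetical rules: each takes the new transaction and the card's earlier
# transactions, and returns True when the transaction looks suspicious.
def large_amount(txn, history):
    return txn.amount > 5000


def country_change(txn, history):
    return bool(history) and history[-1].country != txn.country


RULES = {"large amount": large_amount, "country change": country_change}


def process_minibatch(batch, transaction_log):
    """Check every transaction in the batch, log it, and collect the suspects."""
    report = []
    for txn in batch:
        history = transaction_log.setdefault(txn.card, [])
        hits = [name for name, rule in RULES.items() if rule(txn, history)]
        if hits:
            report.append((txn, hits))
        history.append(txn)  # keep the log current for the next transaction
    return report


transaction_log = {}
batch = [
    Transaction("card-A", 40.0, "IN"),
    Transaction("card-A", 60.0, "FR"),
    Transaction("card-B", 9000.0, "IN"),
]
fraud_report = process_minibatch(batch, transaction_log)

for txn, reasons in fraud_report:
    print(txn.card, txn.amount, reasons)
```

Because the log persists between calls, a later minibatch is checked against everything seen so far, which is what lets frequent small batches approximate continuous monitoring.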


Operational BI

Business Operations(Contd..)

● Business intelligence (BI) that helps
drive and optimize business operations
on a daily basis, and is sometimes used for
intra-day decision-making, is called
operational business intelligence.

Operational BI

Business Operations(Contd..)

● Conceptually, OBI systems are thought

of as a data mart that is updated
frequently (daily, every few hours, or
even every few minutes or seconds)
with minibatches.
● OBI systems are similar to data marts
because they generally focus on a
specific task rather than on
enterprise-wide functions.

Operational BI
Case Study : Real-Time Credit and
Debit Card Fraud Detection, an
HPE Shadowbase

● A complex suspicious or fraudulent
activity determination is made and
action taken while a transaction is in the
process of being gathered, routed,
authorized, and returned to the
origination point, or shortly thereafter,
typically far sooner than otherwise

BI for process and
performance improvement

What is Business Intelligence

● Business intelligence (BI) is software

and services that take raw data and turn
it into relevant and practical insights that
companies can use to strengthen their
positions and business decisions.
● BI tools analyze large sets of data based
on queries that are written to fetch
specific types of information.

BI for process and
performance improvement

What is Business Intelligence

● The results are then formatted and

displayed as summaries, graphs,
reports, charts, and maps for further
analysis for decision making.
● There are many company benefits when
it comes to gathering customer and
competitor data. Let's explore the
benefits of using BI for improving
internal workflows.
BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Change in your industry and even global

changes will affect business processes
at some point, and this means your
business will need to evolve its
processes to remain competitive. Using
BI to gather workflow-based information
offers many benefits, including these:

BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Reducing the time it takes to make

● Optimizing internal processes to help
your employees focus on higher-value
● Reducing the time it takes to get your
product or service to market.

BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Increasing customer satisfaction through

improved efficiencies and better service
● Freeing up more time to focus on other
things, like quality and customer
retention initiatives
● Overall improved operational efficiency
and agility
Role of Business
Intelligence in Improving
customer experience
Visualizing with Big Data
● As the old saying goes, ‘a picture is
worth a thousand words,’ and the same
can be said for understanding data.
● Visualization tools help organize
extremely large, fast-moving data sets in
real-time to understand the current state
of the customer experience.
● “[Data imaging] tools are very helpful in
drawing attention to critical points in the
customer journey and experience, and
pulling out some actionable insights,”
Role of Business
Intelligence in Improving
customer experience
Enabling self-service

● Jerry Leisure, head of customer
experience at mobile-gaming company
Kabam, says the gaming industry has a
well-developed player community that
altruistically wants to help fellow
● Traditionally, this kind of crowd-sourced
self-support has lived in player forums
unaffiliated with the game maker itself.
Role of Business
Intelligence in Improving
customer experience
Enabling self-service

● Companies can also use BI to

collaborate with influential customers.
● “We [can] provide helpful information to
content creators on YouTube, for
example, who then share that
information with other players,” says

Role of Business
Intelligence in Improving
customer experience
Leveraging artificial Intelligence

● AI is already helping to improve the

traveler experience at various
touchpoints, explains Erica Ellington,
director of projects and support for
Southwest Airlines.
● “Business intelligence enables us to
leverage both structured and
unstructured data to help us make more
informed decisions to improve the
customer experience.”
Role of Business
Intelligence in Improving
customer experience
Leveraging artificial Intelligence

● McCallister of CX University adds that in

order for AI and BI to effectively discover
customer insights together, more data
needs to become structured (i.e.,
well-organized, uniform information).
● To transform data from unstructured to
structured requires capturing, tagging
and classifying as much data as
Business Intelligence Role
and Responsibilities

The Role of Business Intelligence

● Business intelligence, or BI, is a type of

software that can harness the power of
data within an organization.
● It offers a better way to sort, compare,
and review data in order for companies
to make smart decisions.

Business Intelligence Role
and Responsibilities

The Role of Business Intelligence

● Companies adopting business

intelligence solutions can turn business
data into insights and take plausible
● These insights can help companies
make strategic business decisions that
increase productivity, improve revenues,
and enhance growth
Business Intelligence Role
and Responsibilities

BI lives Up to Its Reputation

● There are many reasons why

companies choose business intelligence
○ Better planning and analysis
○ Increased accuracy
○ Helped considerably with sales
○ Improved pricing and offers
BI lives Up to Its Reputation

Better planning and analysis

● Companies felt that BI systems helped

them the most with faster reporting,
planning, and analysis.
● 64% of responding companies ranked
their ability to report, plan and analyze
data as “good” after implementing a
business intelligence suite.

BI lives Up to Its Reputation

Increased Accuracy

● Among the companies surveyed, 56%

felt that business intelligence data
increased the accuracy of their business
analysis and planning.

BI lives Up to Its Reputation
Helped considerably with sales

● Among the many tasks that companies

felt that business intelligence data
helped with, 57% ranked sales
forecasting and planning as the area
receiving the most benefit from BI data.
● Other areas where they felt that BI data
provided assistance were customer
behavior analysis (40%) and a unified
view of customers (32%).
BI lives Up to Its Reputation

Improved pricing and offers

● Pricing and offer optimization benefited

somewhat from the implementation of a
BI system.
● 27% of respondents felt that the
additional data derived from their BI
system helped them improve their
pricing structure to become more
competitive, as well as improve the
attractiveness of their offers.

Popular BI tools in the market
What are BI Tools

● BI tools are types of software used to

gather, process, analyze, and visualize
large volumes of past, current, and
future data in order to generate
actionable business insights, create
interactive reports, and simplify the
decision-making processes.

Popular BI tools in the market
What are BI Tools

● These tools include key features such

as data visualization, visual analytics,
interactive dashboarding and KPI
● Additionally, they enable users to utilize
automated reporting and predictive
analytics features based on self-service.

Popular BI tools in the market
The benefit of BI tools

● Professional software and tools offer

various prominent benefits, here we will
focus on the most invaluable ones:
○ They bring together all relevant
○ Their true self-service analytics
approaches unlock data access
○ Users can take advantage of
○ They eliminate manual tasks
○ They reduce business costs
Popular BI tools in the market
The benefit of BI tools

● BI tools that are leaders in the business

intelligence community, often mentioned
in industry articles, and obtain a favorable
level of user reviews on Capterra, as
● The order of the tools is random and
doesn't represent a grading or ranking
system in any form.

Popular BI tools in the market
The benefit of BI tools

● SAS Business Intelligence
● Clear Analytics
● SAP BusinessObjects
● MicroStrategy
● GoodData
● IBM Cognos Analytics
● QlikView
● Yellowfin BI
Popular BI tools in the market

● Datapine is a BI software that lets you

connect your data from various sources
and analyze with advanced analytics
features (including predictive).
● With your analysis, you can create a
powerful business dashboard (or
several), generate standard or
customized reports or incorporate
intelligent alerts to get notified of
anomalies and targets.
Popular BI tools in the market

● This tool, rated an outstanding 4.6
stars on Capterra, is a powerful solution
for businesses of all sizes, since
datapine can be implemented for
various industries, functions, and
platforms.

Popular BI tools in the market

● Key Features of datapine

○ Intuitive drag-and-drop interface
○ Easy-to-use predictive analytics
○ Many interactive dashboard
○ Multiple reporting options
○ Smart insights and alarms based
on artificial intelligence

Popular BI tools in the

● SAS Business Intelligence is a software

solution offering numerous products and
technologies for data scientists, text
analysts, data engineers, forecasting
analysts, econometricians, and
optimization modelers, among others.

Popular BI tools in the

● Founded in the 70s, SAS Business

Intelligence enjoys a long tradition in the
market, building and expanding its
products every year.
● With a Capterra rating of 4.5*, this
software enjoys a high level of users’
trust and satisfaction.

Popular BI tools in the

● Key features of SAS Business Intelligence:

○ Data exploration supported by
machine learning
○ Text analytics capabilities
○ Reports and dashboards across
○ Integration with other applications

Popular BI tools in the

● Clear Analytics is a tool that

consolidates data from internal systems,
cloud, accounting, CRM, and allows you
to drag-and-drop that data into Excel.
● It works with Microsoft Power BI, using
Power Query and Power Pivot to clean
and model different datasets.
● Capterra users give it a high rating of 4.5
stars, making this tool one of the
highest-rated on our list.
Popular BI tools in the

● Key Features of Clear Analytics:

○ Reports delivered to Power BI
○ Connected with Excel
○ Sharing on mobile devices
○ A full audit trail
○ Fetch data elements with a
semantic layer

Popular BI tools in the

● SAP BusinessObjects is a business

intelligence suite designed for
comprehensive reporting, analysis, and
data visualization.
● It provides Office integrations with
Excel and PowerPoint, where you can
create live presentations, and hybrid
analytics that connect to
on-premise and cloud SAP systems.
Popular BI tools in the

● They’re focused on business categories

such as CRM and customer experience,
ERP and digital core, HR and people
engagement, digital supply chain, and
many more.
● More than 170 million users
leverage SAP across the world, making
it one of the largest software suppliers in
the world.
Popular BI tools in the

● Key features of SAP Business Objects:

○ A BI enterprise reporting system
○ Self-service, role-based
○ Cross-enterprise sharing
○ Connection with SAP Warehouse
and HANA
○ Integration with Office

Popular BI tools in the

● Domo is a BI solution comprised of

multiple systems that are featured in this
platform, starting with connecting the
data, and finishing with extending data
with pre-built and custom apps from the
Domo Appstore.
● You can also use Domo with your data
lakes, warehouses, and ETL tools,
alongside R or Python scripts to
prepare data for predictive modeling.
Popular BI tools in the

● Key features of DOMO:

○ Numerous pre-built cloud
○ Magic ETL feature
○ Automatically suggested
○ Mr. Roboto as an AI engine
○ Domo Appstore

Popular BI tools in the

● MicroStrategy is an enterprise analytics

and mobility platform focused on hyper
intelligence, federated analytics, and
cloud solutions.
● Their mobile dossiers enable users to
build interactive books of analytics that
render on iOS or Android devices, with
the possibility to extend the
MicroStrategy content into their apps by
using Xcode or JavaScript.
Popular BI tools in the

● Capterra users gave it a solid 4-star rating,
making this one of our examples of
business intelligence tools with strong
references in the BI market.
Popular BI tools in the

● Key features of MicroStrategy:

○ Hyperintelligence pulls your data
○ Federated analytics
○ Mobile deployment
○ Integration with voice technology
○ Cloud technology

Popular BI tools in the

● GoodData is a business analytics

software that provides the tools for data
ingestion, storage, analytic queries,
visualizations, and application
● You can embed their analytics into your
website, desktop or mobile application
or create dashboards and reports for
your daily activities, without the need to
obtain a Ph.D., as stated on their
Popular BI tools in the

● Key features of GoodData:

○ Customers can publish their own
○ A modular data pipeline
○ A platform for developers
○ Additional support
○ 4 Data centers

Popular BI tools in the

● Part of the IBM family, IBM Cognos

Analytics is a cloud-based business
intelligence software that utilizes AI
recommendations when creating
dashboards and reports, geospatial
capabilities to overlay your data with the
physical world, and enables you to ask
questions in plain English to
communicate with the software.

Popular BI tools in the

● A robust solution from one of the

industry leaders in software
development, IBM Cognos Analytics
received a solid 4-star rating on
Capterra.
Popular BI tools in the

● Key features of IBM Cognos Analytics:

○ Search mechanism
○ A single data module
○ Interactive data visualization
○ AI assistant
○ Extensive knowledge center:
Integration with other applications

Popular BI tools in the

● QlikView is one of the BI applications

offered by Qlik as part of its data
analytics platform focused on rapid
development and guided analytics
applications and dashboards.
● It’s built on an Associative Engine that
allows data discovery without the need
to use query-based tools, eliminating the
risk of data loss and inaccurate results.

Popular BI tools in the

● With a high rating of 4.5 stars on Capterra,
users are quite satisfied with this
product and its features, making it one
of the top BI tools on our list.
Popular BI tools in the

● Key features of QLIKVIEW:

○ Associative exploration
○ Visually highlighted dashboards
○ Associative Engine
○ A dual-use strategy
○ Developer’s platform

Popular BI tools in the

● Yellowfin BI is a suite of products consisting of
dashboards, signals, stories, data
discovery and data prep. This BI
analytics tool offers numerous features,
including a mobile app available for both
Android and iOS devices.
● Capterra users gave it a strong rating of
4.5*, so it makes sense to take a
closer look at what it has on offer.
Popular BI tools in the

● Key features of YELLOWFIN BI:

○ Yellowfin signals via smartphone
○ Persuasive data stories
○ Smart tasks

Understand case studies for
predictive models

In this section, we will discuss:

● Concept of data mining techniques

● Concepts of data mining model with its development and deployment in
business scenario
● Data mining models
● CRISP-DM model
● Understanding of data and its preparation techniques for better model building
● Introduction to sampling and data partitioning in data mining projects

Concept of Data Mining

Data Mining concept

● Data mining processes structured

information through the application of
artificial intelligence, neural networks,
and advanced statistical tools in order to
detect patterns and summarize data into
a format that can be understood.
● It allows corporations to anticipate future
trends, uncover new opportunities, and
most importantly improve overall
Concept of Data Mining

Data Mining concept (Contd.)

● Data mining is the term used to describe
the process of extracting value from a
● Data mining involves the use of
sophisticated data analysis tools to
discover previously unknown, valid
patterns and relationships in large
data sets.
● Data mining consists of more than
collecting and managing data; it also
includes analysis and prediction.
Concept of Data Mining
Data mining techniques
– Tracking patterns

● One of the most elementary
techniques in data mining is learning to
identify patterns in your data sets.
● It is typically a recognition of some
deviation in your data occurring at regular
intervals, or a variation of a certain
variable over time.
Concept of Data Mining
Data mining techniques
– Classification

● It is a classic data mining technique

based on machine learning.
● It is used to classify each item in a set of
data into one of a predefined set of
classes or groups.
● The classification method makes use of
mathematical techniques such as
decision trees, linear programming,
neural networks, and statistics.
Concept of Data Mining
Data mining techniques
– Association

● Association data mining notices

recurring themes in databases,
recognizes relations between them and
develops a pattern of these relations.
● It will then use these patterns as a
reference to predict future behaviour.
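
The idea of finding relations between recurring items can be sketched with the two standard association measures, support and confidence. This is a minimal illustration only: the transactions and the helper names `support` and `confidence` are invented here and do not come from the slides.

```python
# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# How often do bread and milk appear together, and how often does
# a basket with bread also contain milk?
bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule such as "bread implies milk" would normally be kept only if both measures exceed thresholds chosen by the analyst.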

Concept of Data Mining
Data mining techniques
– Outlier Detection

● Outlier is defined as an observation that

deviates too much from other
observations. The identification of
outliers can lead to the discovery of
useful and meaningful knowledge.
● In many cases, simply identifying
the overall pattern cannot give
you a clear understanding of your data
set. You also need to be able to identify
irregularities or outliers in your data.
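
One common way to flag observations that deviate too much from the rest is the interquartile range (IQR) rule. The sketch below assumes that rule with the conventional multiplier of 1.5; the slides do not prescribe a specific method, and the data and the function name `find_outliers` are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; 95 is the planted anomaly.
daily_orders = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(daily_orders)
```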
Concept of Data Mining
Data mining techniques
– Clustering

● Clustering is a data mining technique

that makes a meaningful or useful
cluster of objects which have similar
characteristics using the automatic
● The clustering technique defines the
classes and puts objects in each
class, while in classification
techniques, objects are assigned to
predefined classes.
Concept of Data Mining
Data mining techniques
– Regression

● Regression, used mainly as a form
of planning and modelling, is used to
estimate the likelihood of a certain
variable, given the presence of other
variables.
● Regression and classification are both
used in prediction analysis, but
regression is used to predict a numeric
or continuous value, while classification
assigns data into discrete categories.
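
The difference between predicting a continuous value and assigning a category can be sketched with a least-squares line. This is an illustrative example under stated assumptions: the data points, the cut-off of 10 and the names `fit_line` and `predict` are invented for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Regression output: a numeric, continuous value."""
    return slope * x + intercept

# Hypothetical data: advertising spend (x) against sales (y).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(spend, sales)

forecast = predict(5, slope, intercept)          # regression: a number
category = "high" if forecast >= 10 else "low"   # classification: a discrete label
```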
Concept of Data Mining
Data mining techniques
– Prediction

● It is one of the data mining techniques
that learns the relationship between
independent variables and the
relationship between dependent and
independent variables.
● Prediction derives the relationship
between a thing you know and a thing
you need to predict for future reference.
Concepts of data mining
model with its development
and deployment in business
Phases and Tasks

● It is a step by step procedure for

implementation of data mining in a
business scenario.
● The phases and tasks include –
Business understanding, Data
understanding, Data preparation,
Modelling, Evaluation, Deployment.


Concepts of data mining
model with its development
and deployment in business
Business understanding

● Data mining goals are defined.

● The fundamental requirement is to
understand client and business
● Current data mining scenario, factors
in resources, constraints and
assumptions should be taken into the
Concepts of data mining
model with its development
and deployment in business
Data understanding

● In this stage, a sanity check is
conducted on the data to understand whether it is
appropriate for the data mining goals.
● The data is collected from various
sources within the organization.
● It is a highly complex process, since data
and processes from various sources are
unlikely to match easily.
Concepts of data mining
model with its development
and deployment in business
Data preparation

● The data is made production-ready in this
stage.
● The data from diverse sources
should be selected, cleaned,
transformed, formatted, anonymized,
and constructed.
● Data cleaning is a process to "clean" the
data by smoothing noisy data and
filling in missing values.
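
Two of the cleaning steps named above, filling in missing values and smoothing noisy data, can be sketched as follows. Mean imputation and a centred moving average are assumptions chosen for illustration; the slides do not say which techniques to use, and the function names are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def moving_average(values, window=3):
    """Smooth a series with a centred moving average (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical sensor readings with one gap.
raw = [1.0, None, 3.0]
filled = fill_missing(raw)
smoothed = moving_average(filled)
```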
Concepts of data mining
model with its development
and deployment in business
Modelling

● In this stage, mathematical models
are used to determine the data
patterns.
● Suitable modelling techniques need to
be chosen for the prepared data set.
● After that, create a scenario to validate
the model. Then run the model on the
prepared data set.
Concepts of data mining
model with its development
and deployment in business
Evaluation

● In this stage, the patterns identified are
examined against business objectives.
● A go or no-go decision should be taken
to move the model into the deployment
phase.
Concepts of data mining
model with its development
and deployment in business
Deployment

● In this stage, ship your data mining
discoveries to every business operation.
● A thorough deployment plan for
shipping, maintenance, and monitoring
of data mining discoveries is created.
Data mining models

Types of models

● Data mining models can be broadly

classified into two categories:

○ Predictive Model

○ Descriptive Model

Data mining models

Predictive model

● The predictive model makes a forecast
about unknown data values by
using the known values.
● Forecasting is the process of
analyzing the current and previous
states of an attribute to predict its
future state.
● The techniques that fall under this
category are classification,
regression and time-series analysis.
Data mining models

Descriptive model

● It identifies the patterns or
relationships in data and discovers
the properties of the data studied.
● These descriptive data mining
techniques are used to obtain
information on the regularity of the data
by using raw data as input and to
discover important patterns.
● Examples include clustering,
summarization, and association rules.
CRISP-DM model

Basic concepts

● It stands for Cross-Industry Standard

Process for Data Mining, an
industry-proven way to guide your data
mining efforts.
● As a methodology, it includes
descriptions of the typical phases of a
project, the tasks involved with each
phase, and an explanation of the
relationships between these tasks.
● As a process model, CRISP-DM
provides an overview of the data mining
life cycle.
Introduction to sampling and
data partitioning in data
mining projects
Why is it necessary to Partition ?

● For easy management

● To assist backup/recovery
● To enhance performance

Understanding of data and
its preparation techniques
for the better model building
What is data preparation ?

● Data preparation is the process of

cleaning and transforming raw data prior
to processing and analysis.
● It is an important step prior to
processing and often involves
reformatting data, making corrections to
data and the combining of data sets to
enrich data.

Understanding of data and
its preparation techniques
for the better model building
Why prepare data ?

● Data need to be formatted for a given

software tool.
● Data need to be made adequate for a
given method.
● Data in the real world is dirty:
○ Incomplete
○ Noisy (contains error)
○ Inconsistent

Understanding of data and
its preparation techniques
for the better model building
Major tasks in data preparation

● Data discretization
● Data cleaning
● Data integration
● Data transformation
● Data reduction

Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data Understanding)

● Collect Data
● Describe data
● Explore data
● Verify data quality

Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data preparation)

● Select data
○ Reconsider data selection criteria.
○ Decide which dataset will be used.
○ Collect appropriate additional data
(internal or external).
○ Consider use of sampling
○ Explain why certain data was
included or excluded.
Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data preparation)

● Clean data
○ Correct, remove or ignore noise.
○ Decide how to deal with special
values and their meaning
○ Aggregation level, missing values,
○ Outliers?

Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data preparation)

● Construct data
○ Derived attributes.
○ Background knowledge.
○ How can missing attributes be
constructed or imputed?

Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data preparation)

● Integrate data
○ Integrate sources and store result
(new tables and records).

Understanding of data and
its preparation techniques
for the better model building
Crisp-DM (Data preparation)

● Format data
○ Rearranging attributes
○ Reordering records (Perhaps the
modelling tool requires that the
records be sorted according to the
value of the outcome attribute).
○ Reformatting within values (these
are purely syntactic changes made to
satisfy the requirements of the
specific modelling tool, such as removing illegal
characters or changing uppercase to lowercase).
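
The formatting steps above (reordering records by the outcome attribute and making purely syntactic changes to values) can be sketched as below. Which characters count as "illegal" depends on the modelling tool, so the allowed set used here is an assumption, as are the record layout and the name `clean_value`.

```python
import re

def clean_value(text):
    """Trim, lowercase, and drop characters outside an assumed allowed set."""
    text = text.strip().lower()
    # Assumption: only letters, digits, space, underscore and hyphen are legal.
    return re.sub(r"[^a-z0-9 _-]", "", text)

# Hypothetical records with an outcome attribute.
records = [
    {"name": " Bob#Smith ", "outcome": 1},
    {"name": " Alice! ", "outcome": 0},
]

# Reformat each value, then sort the records by the outcome attribute.
formatted = sorted(
    ({"name": clean_value(r["name"]), "outcome": r["outcome"]} for r in records),
    key=lambda r: r["outcome"],
)
```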
Introduction to sampling and
data partitioning in data
mining projects
What is data sampling ?

● Data sampling is a statistical analysis

technique used to select, manipulate
and analyze a representative subset of
data points to identify patterns and
trends in the larger dataset being

Introduction to sampling and
data partitioning in data
mining projects
What is data sampling ?

● It enables data scientists, predictive

modelers and other data analysts to
work with a small, manageable amount
of data about a statistical population to
build and run analytical models more
quickly, while still producing accurate

Introduction to sampling and
data partitioning in data
mining projects
Advantages of sampling

● Sampling can be particularly useful with

data sets that are too large to efficiently
analyze in full.
● Identifying and analyzing a
representative sample is more efficient
and cost-effective than surveying the
entirety of the data or population.

Introduction to sampling and
data partitioning in data
mining projects
Steps involved in sampling

● Identify and define Target population

● Select sampling frame
● Choose sampling methods
● Determine Sample size
● Collect the required data

Introduction to sampling and
data partitioning in data
mining projects
Types of data sampling methods

● Simple random sampling

● Stratified sampling
● Cluster sampling
● Systematic sampling
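
Three of the probability sampling methods listed above can be sketched with Python's standard `random` module; cluster sampling is left out for brevity. The function names, the fixed seed and the rule of taking at least one item per stratum are choices made for this illustration only.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every item has the same chance of being picked."""
    return random.Random(seed).sample(population, k)

def systematic_sample(population, step, start=0):
    """Pick every `step`-th item, beginning at index `start`."""
    return population[start::step]

def stratified_sample(population, key, fraction, seed=0):
    """Sample the same fraction from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of 100 customer IDs.
population = list(range(100))
random_ids = simple_random_sample(population, 10)
every_tenth = systematic_sample(population, 10)
by_parity = stratified_sample(population, key=lambda i: i % 2, fraction=0.1)
```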

Introduction to sampling and
data partitioning in data
mining projects
Types of data sampling methods

● Convenience sampling
● Snowball sampling
● Purposive or judgmental sampling
● Quota sampling

Introduction to sampling and
data partitioning in data
mining projects
Data Partitioning

● The simplest and most fundamental

version of cluster analysis is partitioning,
which organizes the objects of a set into
several exclusive groups or clusters.
● Given a data set, D, of n objects, and k,
the number of clusters to form, a
partitioning algorithm organizes the
objects into k partitions (k ≤ n), where
each partition represents a cluster.
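
As an illustration of a partitioning algorithm that organizes n objects into k exclusive clusters, here is a minimal one-dimensional k-means sketch. k-means is one common choice and is an assumption here, since the slide does not name an algorithm; the data and starting centres are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Partition numbers into len(centers) clusters around moving centres."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest centre (exactly one cluster).
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centers

# n = 6 objects, k = 2 clusters, with hypothetical initial centres.
points = [1, 2, 3, 10, 11, 12]
clusters, centers = kmeans_1d(points, centers=[0, 5])
```

Each object ends up in exactly one cluster, which matches the "exclusive groups" wording above.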
Able to develop case
studies for predictive
analytical models

In this section, we will discuss:

● Concepts of machine learning

● Approach for data mining using decision tree inductive concept
● Conceptual cluster
● Attribute oriented induction
● Iterative database scanning
● Attribute focusing
● Neural networks
● Rough sets
● Visualization
● Concepts of odds
● Concepts of odds ratio
Concepts of Machine

What is Machine Learning ?

● The term machine learning was first

introduced by Arthur Samuel in 1959.
● Machine learning is a subset of
artificial intelligence that is mainly
concerned with the development of
algorithms which allow a computer to
learn from data and past
experience on its own.
Concepts of Machine

What is Machine Learning ?

● We can define it in a summarized way:

Machine Learning enables a machine to
automatically learn from data, improve
performance from experience, and
predict things without being explicitly
programmed.
Concepts of Machine

How does Machine Learning work ?

● A Machine Learning system learns from

historical data, builds the prediction
models, and whenever it receives new
data, predicts the output for it.
● The accuracy of the predicted output
depends on the amount of data, as
a large amount of data helps to build a
better model, which predicts the output
more accurately.
Concepts of Machine

Features of Machine Learning

● Machine Learning uses data to detect

various patterns in a given dataset.
● It can learn from past data and improve
● It is a data-driven technology.
● Machine learning is very similar to data
mining, as it also deals with huge
amounts of data.
Concepts of Machine

Importance of Machine Learning

● Rapid increase in the production of
data
● Solving complex problems that are
difficult for a human
● Decision making in various sectors,
including finance
● Finding hidden patterns and extracting
useful information from data.
Concepts of Machine
Classification of Machine Learning
(Supervised Learning)

● Supervised learning is a type of machine

learning method in which we provide
sample labeled data to the machine
learning system in order to train it, and
on that basis, it predicts the output.
● The goal of supervised learning is to
map input data with the output data.
Concepts of Machine
Classification of Machine Learning
(Supervised Learning)

● Supervised learning is based on
supervision, in the same way that
a student learns things under the
supervision of a teacher. An example
of supervised learning is spam filtering.
● Supervised learning can be grouped
further into two categories of algorithms:
○ Classification
○ Regression
Concepts of Machine
Classification of Machine Learning
(Unsupervised Learning)

● Unsupervised learning is a learning

method in which a machine learns
without any supervision.
● The goal of unsupervised learning is to
restructure the input data into new
features or a group of objects with
similar patterns.
Concepts of Machine
Classification of Machine Learning
(Unsupervised Learning)

● In Unsupervised learning, we don't have

a predetermined result.
● The machine tries to find useful insights
from the huge amount of data. It can be
further classified into two categories of
algorithms:
○ Clustering
○ Association
Concepts of Machine
Classification of Machine Learning
(Reinforcement Learning)

● Reinforcement learning is a
feedback-based learning method, in
which a learning agent gets a reward for
each right action and gets a penalty for
each wrong action.
● The agent learns automatically from
this feedback and improves its
performance.
Concepts of Machine
Classification of Machine Learning
(Reinforcement Learning)

● In reinforcement learning, the agent

interacts with the environment and
explores it.
● The goal of an agent is to get the most
reward points, and hence, it improves its
performance.
● A robotic dog that automatically
learns the movement of its limbs is an
example of reinforcement learning.
Approach for data mining
using decision tree inductive
What is Data Mining ?

● The process of extracting information to
identify patterns, trends, and useful data
that allow a business to make
data-driven decisions from huge sets of
data is called data mining.
● Data mining is also called Knowledge
Discovery in Databases (KDD).
Approach for data mining
using decision tree inductive
What is Decision Tree ?

● A decision tree is a supervised learning
method used in data mining for
classification and regression tasks.
● It is a tree that helps us in
decision making.
● The decision tree creates classification
or regression models as a tree structure.
● Decision trees can deal with both
categorical and numerical data.
Approach for data mining
using decision tree inductive
What is Decision Tree ?

● A decision tree is a structure that

includes a root node, branches, and leaf
nodes. Each internal node denotes a
test on an attribute, each branch
denotes the outcome of a test, and each
leaf node holds a class label. The
topmost node in the tree is the root
node.
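
The structure described above (internal nodes test an attribute, branches carry test outcomes, leaves hold class labels) can be sketched as a nested dictionary. The tree, its attributes and the `classify` helper are hypothetical and built by hand; no learning takes place in this sketch.

```python
# Hypothetical hand-built tree. Dicts are internal nodes; strings are leaf labels.
tree = {
    "attribute": "outlook",            # root node: test on "outlook"
    "branches": {
        "sunny": {
            "attribute": "windy",      # internal node: test on "windy"
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",            # leaf node holding a class label
        "overcast": "play",
    },
}

def classify(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

label = classify(tree, {"outlook": "sunny", "windy": False})
```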
Approach for data mining
using decision tree inductive
Key Factors

● Entropy : Entropy refers to a common

way to measure impurity. In the decision
tree, it measures the randomness or
impurity in data sets.

Approach for data mining
using decision tree inductive
Key Factors

● Information Gain : Information gain
refers to the decline in entropy after the
dataset is split. It is also called entropy
reduction. Building a decision tree is all
about discovering attributes that return
the highest information gain.
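
Entropy and the entropy reduction achieved by a split can be computed as in the sketch below. It uses the standard Shannon entropy in bits, H = -Σ p·log2(p), and weights each child by its share of the records; the labels and function names are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels before a split, and two candidate splits.
parent = ["yes", "yes", "no", "no"]
pure_split = [["yes", "yes"], ["no", "no"]]     # separates the classes fully
useless_split = [["yes", "no"], ["yes", "no"]]  # leaves both halves mixed

gain_pure = information_gain(parent, pure_split)
gain_useless = information_gain(parent, useless_split)
```

A tree builder would prefer the attribute whose split yields the larger gain.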

Approach for data mining
using decision tree inductive
Why are Decision Trees useful

● It enables us to analyze the possible

consequences of a decision thoroughly.
● It provides us a framework to measure
the values of outcomes and the
probability of accomplishing them.
● It helps us to make the best decisions
based on existing data and best
Approach for data mining
using decision tree inductive
Advantages of using Decision Trees

● A decision tree does not need scaling of
data.
● Decision trees need less effort for
data preparation during pre-processing.
● It is intuitive and simple to explain to
the technical team as well as
● A decision tree does not require
standardization of data.
Conceptual Cluster Learning


● Conceptual clustering is a machine

learning paradigm for unsupervised
classification developed mainly during
the 1980s.

● It is distinguished from ordinary data

clustering by generating a concept
description for each generated class.
Conceptual Cluster Learning

Introduction (Continue)

● Most conceptual clustering methods are

capable of generating hierarchical
category structures.

● Conceptual clustering is closely related

to formal concept analysis, decision tree
learning, and mixture model learning.
Conceptual Cluster Learning

Introduction (Continue)

● The conceptual part of the process lies

in how the exemplars are agglomerated/
divided rather than in how the clusters
are described (i.e., the cluster forming
mechanism need not maintain any
cluster descriptions).

● The second view is that of concept

formation, with exemplars as the
catalyst.
Conceptual Cluster Learning

Introduction (Continue)

● Under this view clusters are formed

according to their conceptual
descriptions, i.e., the system must
constantly maintain conceptual
descriptions of clusters and cluster
membership is constrained by the
concepts available to describe the
Conceptual Cluster Learning

Introduction (Continue)

● Following the terminology of psychology,

the first view will here be called
conceptual sorting. The second view will
be called concept discovery. Each in its
own way can be said to involve
conceptual clustering.


Conceptual Cluster Learning

Conceptual Clustering vs. Data


● Conceptual clustering is obviously

closely related to data clustering.

● However, in conceptual clustering it is

not only the inherent structure of the
data that drives cluster formation, but
also the description language which is
available to the learner.
Conceptual Cluster Learning

Conceptual Clustering vs. Data

Clustering (Continue)

● Thus, a statistically strong grouping in

the data may fail to be extracted by the
learner if the prevailing concept
description language is incapable of
describing that particular regularity.


Conceptual Cluster Learning

Conceptual Clustering as Concept


● One view of conceptual clustering

proposes to produce interesting
groupings and then provide them with a
conceptual interpretation.

● That is, to build extensionally defined

categories (by enumerating their
members) and then find a conceptual
Conceptual Cluster Learning

Conceptual Clustering as Concept

Sorting (Continue)

● A major reason independently rendered

clusters can have rather unappealing
conceptual interpretations is that they
employ no concept-related similarity


Conceptual Cluster Learning

Conceptual Clustering as Concept

Sorting (Continue)

There are two points to be made here:

1. The similarity metric used defines a
gradient over the feature space that
possesses none of the conceptual
irregularities underlying the domain.

2. The similarity metric views all attributes
with a fixed relevance to the problem
without any way to determine attribute
relevancy from patterns in the data.
Conceptual Cluster Learning

Conceptual Clustering as Concept Discovery


● Concept discovery systems focus on the

determination of concepts (according to
some concept representation system) to
describe each category that is formed.

● Indeed, categories are formed such that

their descriptions are as desired by the
applied biases (including
representational constraints) and a
concept-based cluster quality measure.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept

Discovery (Continue)

● It is the category descriptions that are

constantly monitored, generalized,
specialized, and evaluated by the
concept-based quality measure.

● These systems incorporate mechanisms

to propose multi-relation (polythetic)
concepts as category descriptions.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept

Discovery (Continue)

● The availability of concepts is governed

by the biases of the system and the
background knowledge that is applied.

● For example, the grape and apple differ

in color and type-of-fruit but are both
ripe; the orange and apple differ in color
and type and ripeness.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept

Discovery (Continue)

● Without background knowledge, the

concept-based approach reverts to the
attribute-based one.

● It is background knowledge that makes

the feature space and concept space
rough and irregular so that the fit of the
data to the irregularities can be used to
help confirm a candidate conceptual
interpretation.
Conceptual Cluster Learning

Knowledge-Based Conceptual Clustering

● Discovering concepts by conceptual

clustering is not purely an inductive
inference process.

● A portion of the process involves

deductive inference to determine from
background knowledge latent attributes
for exemplars and appropriate concepts
to ready as candidate category
descriptions.
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)

● A system equipped with sizable

background knowledge and a deductive
mechanism for accessing and applying it
can make a wide variety of appropriate
transformations of exemplars that will
greatly aid concept formation.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)
● For example, an inference rule could
suggest the construction of an attribute
whose values report the number of other
attributes (from a subset of other
attributes) having values that differ from
the most frequent attribute values.

● Such a derived attribute supports

polymorphic concepts like "2 of the 3
attributes A, B, and C have target values
of x, y, and z, respectively".
© Edunet Foundation. All rights reserved.
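The derived attribute described above can be sketched in a few lines of Python. This is only an illustration: the target values, the exemplars, and the reading of "2 of the 3" as "at least 2" are assumptions, not something the slides specify.

```python
def count_differences(row, targets):
    """Derived attribute: number of attributes whose value differs from its target."""
    return sum(1 for attr, value in targets.items() if row[attr] != value)


def satisfies_m_of_n(row, targets, m):
    """True when at least m of the targeted attributes hold their target value."""
    return len(targets) - count_differences(row, targets) >= m


# Hypothetical target values for attributes A, B, and C.
targets = {"A": "x", "B": "y", "C": "z"}

# Hypothetical exemplars.
exemplars = [
    {"A": "x", "B": "y", "C": "z"},  # 0 differences
    {"A": "x", "B": "y", "C": "q"},  # 1 difference
    {"A": "r", "B": "p", "C": "q"},  # 3 differences
]

derived = [count_differences(e, targets) for e in exemplars]
members = [satisfies_m_of_n(e, targets, m=2) for e in exemplars]
```

Once the count is available as an ordinary attribute, the "2 of the 3" concept becomes a simple threshold test on it.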
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)

● Since the system knows the definition of

the attribute (from background
knowledge) it is able to state
polymorphic concepts in easily
understood terms.

● The point is that additional knowledge

applied during clustering can have a
great effect on the types of categories
formed.
Conceptual Cluster Learning

Type Levels

● Type-0: Statistic-based quality measure;

no conceptual interpretation.

● Type-1: Statistic-based quality measure;

conceptual interpretation after-the-fact.

● Type-2: Attribute-based quality

measure; no conceptual interpretation.

● Type-3: Attribute-based quality

measure; conceptual interpretation after-the-fact.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Type Levels (Continue)

● Type-4: Concept-based quality

measure; no background knowledge.

● Type-5: Concept-based quality

measure; background knowledge.

● Type-6: Concept-based quality

measure; background knowledge;
structured exemplars.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Application Areas

● Biology

● Medicine

● Psychology

● Climate

● Business

● Information Retrieval

Attribute Oriented Induction

Introduction


● The Attribute Oriented Induction (AOI)
method, first proposed in 1989,
integrates a machine learning paradigm,
especially learning-from-examples
techniques, with database operations. It
extracts generalized rules from an
interesting set of data and discovers
high-level data regularities.
Attribute Oriented Induction

Introduction (Continue)

● AOI provides an efficient and effective

mechanism for discovering various kinds
of knowledge rules from datasets or

● The AOI approach was developed for learning

different kinds of knowledge rules such
as characteristic rules, discrimination
rules, classification rules, data evolution
regularities, association rules and
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Characteristic Rule

● A characteristic rule is an assertion which
characterizes the concepts which are
satisfied by all of the data stored in

● This rule provides generalized concepts

about a property which can help people
recognize the common features of the
data in a class.

● For example the symptom of the specific

© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Discriminant Rule

● A discriminant rule is an assertion which
discriminates the concepts of one
(target) class from another (contrasting).

● This rule gives a discriminant criterion

which can be used to predict the class
membership of new data.

● For example, to distinguish one disease
from another.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Classification Rule

● A classification rule is a set of rules which
classifies the set of relevant data
according to one or more specific

● For example, classifying diseases into
classes and providing the symptoms of
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Association Rule

● An association rule describes association
relationships among the set of relevant

● For example, discovering a set of

symptoms frequently occurring together

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

Data Evolution Regularities Rule

● A data evolution regularities rule describes
the general evolution behavior of a set of
the relevant data (valid only in
time-related/temporal data).

● For example, describing the major

factors that influence the fluctuations of
stock values through time.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Cluster Description Rule

● A cluster description rule is used to cluster
data according to data semantics.

● For example, clustering university
students based on different attribute(s).

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

Quantitative and Qualitative Rules in AOI


● A quantitative rule is a rule which is
associated with quantitative information,
such as statistical information which
assesses the representativeness of the rule
in the database.

● There are three types of quantitative rule,
i.e., the quantitative characteristic rule, the
quantitative discriminative rule, and the
quantitative characteristic and
discriminative rule.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● A quantitative characteristic rule attaches
quantitative information to a
characteristic rule: each rule in the final
generalization can be measured with the
t-weight in formula 1.

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● t-weight = percentage of each rule in the

final generalized relation.

● Votes(qa) = number of tuples in each
rule in the final generalized relation,
where Votes(qa) is in {Votes(q1),...,Votes(qN)}.

● N = number of rules in the final

generalized relation.
© Edunet Foundation. All rights reserved.
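Formula 1 itself did not survive in the slide text. A reconstruction consistent with the three definitions above, offered as a sketch rather than the original notation, is:

```latex
\[
  t\text{-weight}(q_a) \;=\; \frac{\mathrm{Votes}(q_a)}{\sum_{i=1}^{N} \mathrm{Votes}(q_i)} \times 100\%
\]
```

Here the denominator is the total number of tuples covered by all N rules in the final generalized relation, so the t-weights of the N rules sum to 100%.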
Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● A quantitative discriminative rule is a
discrimination rule that uses quantitative
information. Each rule in the target class
is discriminated against a rule in the
contrasting class and is measured with
the d-weight in formula 2.

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● d-weight = percentage ratio per rule in

the target class to the total number of
tuples in the target class and the
contrasting class for the same rule.

● Votes(qa) = number of tuples in each

rule in the target class Cj.

● Cj is in {C1,...,CK}.

● K = total number of the target and

© Edunet Foundation. All rights reserved.
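Formula 2 is likewise missing from the slide text. Reading the definitions above, the d-weight of a rule appears to be its vote count in the target class divided by its vote count summed over all K classes. The Python sketch below works through both weights under that assumption. The class names, rule names, and counts are invented for illustration.

```python
def t_weight(votes, rule):
    """Fraction of a class's generalized relation covered by one rule (formula 1)."""
    return votes[rule] / sum(votes.values())


def d_weight(votes_by_class, rule, target):
    """Fraction of a rule's tuples that fall in the target class (formula 2, as assumed)."""
    total = sum(votes.get(rule, 0) for votes in votes_by_class.values())
    return votes_by_class[target].get(rule, 0) / total


# Hypothetical vote counts: class -> generalized rule -> number of tuples.
votes_by_class = {
    "graduate": {"science": 30, "arts": 10},
    "undergraduate": {"science": 90, "arts": 70},
}

# 30 of the 40 graduate tuples are covered by the "science" rule.
t = t_weight(votes_by_class["graduate"], "science")

# 30 of the 120 "science" tuples belong to the graduate (target) class.
d = d_weight(votes_by_class, "science", "graduate")
```

Multiplying either value by 100 gives the percentage form used on the slides.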
Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● A quantitative characteristic and
discriminative rule combines the quantitative
information of a characteristic rule and a
discriminative rule, and so has both a
t-weight and a d-weight for the same

● Each rule is measured with t-weight in

formula 1 for characteristic rule and
d-weight in formula 2 for discriminative
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Quantitative and Qualitative Rules in

AOI (Continue)

● A qualitative rule can be obtained by

using the same process of learning
applied in its quantitative counterpart
without the association of the
quantitative attribute in the generalized

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

Concept Hierarchies

● One advantage of AOI is that it has

concept hierarchy as the background
knowledge which can be provided by the
knowledge engineers or domain experts.

● A concept hierarchy stored as a relation in
the database provides essential
background knowledge for data
generalization and multiple-level data
mining.
Attribute Oriented Induction

Concept Hierarchies (Continue)

● A concept hierarchy represents a
taxonomy of concepts for the attribute
domain values.

● A concept hierarchy can be specified
based on the relationships among
database attributes or by set groupings,
and can be stored in the form of relations in
the same database.
Attribute Oriented Induction

Concept Hierarchies (Continue)

● Concept hierarchy can be adjusted

dynamically based on the distribution of
the set of data relevant to the data
mining tasks.

● The hierarchies for numerical attributes

can be constructed automatically based
on data distribution analysis.
A concept hierarchy tree for attribute workclass in adult dataset
© Edunet Foundation. All rights reserved.
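A concept hierarchy can be held as a simple child-to-parent table, and a numeric hierarchy can be derived from the data range. The sketch below assumes a hypothetical grouping of `workclass` values (the intermediate concepts `Government` and `Self-employed` and the root `ANY` are illustrative, not taken from the figure) and uses equal-width binning as one possible form of distribution-based construction.

```python
# Hypothetical child -> parent links for the workclass attribute.
PARENT = {
    "Federal-gov": "Government",
    "State-gov": "Government",
    "Local-gov": "Government",
    "Self-emp-inc": "Self-employed",
    "Self-emp-not-inc": "Self-employed",
    "Government": "ANY",
    "Self-employed": "ANY",
    "Private": "ANY",
}


def generalize(value, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


def equal_width_bin(x, lo, hi, n_bins):
    """Index of the equal-width interval of [lo, hi] that contains x."""
    width = (hi - lo) / n_bins
    return min(int((x - lo) / width), n_bins - 1)


one_up = generalize("Federal-gov")         # "Government"
two_up = generalize("Federal-gov", 2)      # "ANY"
age_bin = equal_width_bin(37, 17, 77, 3)   # second of three 20-year intervals
```

Storing the hierarchy as (child, parent) pairs matches the idea of keeping it as a relation in the same database.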
Attribute Oriented Induction

AOI Prototype

● The AOI method was implemented in a
data mining system prototype called
DBMINER, previously called DBLearn,
which has been tested successfully
against large relational databases.

● DBLearn is a prototype data mining
system which was developed at Simon
Fraser University.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Prototype (Continue)

● DBMINER was developed by integrating
database, OLAP and data mining
technologies, and has the following features:

1. Incorporating several data mining

techniques like attribute oriented
induction, statistical analysis,
progressive deepening for mining
multiple-level rules and meta-rule
guided knowledge mining data cube and
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Prototype (Continue)

2. Mining new kinds of rules from large

databases including multiple level
association rules, classification rules,
cluster description rules and prediction.

3. Automatic generation of numeric

hierarchies and refinement of concept

4. High level SQL-like and graphical data

mining interfaces.
Attribute Oriented Induction

AOI Prototype (Continue)

5. Client server architecture and

performance improvements for larger

6. SQL-like data mining query language

DMQL and Graphical user interfaces
have been enhanced for interactive
knowledge mining.

7. Perform roll-up and drill-down at

multiple concept levels with multiple
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Algorithms

● AOI can be implemented with the
architecture design shown in the figure,
where the characteristic rule (LCHR) and
classification rule (LCLR) can be learned
directly from the transactional database
(OLTP) or data warehouse (OLAP) with
the help of the concept hierarchy as the
knowledge generalization. The concept
hierarchy can be created from the OLTP
database as a direct resource.
AOI architecture
Attribute Oriented Induction

AOI Algorithms (Continue)

From a database we can identify two types
of learning:

1. Positive learning as the target class,
where the data are tuples in the
database which are consistent with the
learning concepts. The positive
learning/target class is built when
learning a characteristic rule.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Algorithms (Continue)

2. Negative learning as the contrasting
class, in which the data do not belong to
the target class. The negative
learning/contrasting class is built
when learning a discrimination or
classification rule.

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

AOI Characteristic Rule Algorithm

● This AOI characteristic rule algorithm is

the implementation of step one to seven
of the generalization strategy steps.

● The algorithm shows two sub-processes,
i.e., controlling the number of distinct attributes
and controlling the number of tuples.
AOI characteristic rule algorithm
© Edunet Foundation. All rights reserved.
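The algorithm figure is not reproduced in the text, so the following Python sketch only illustrates the two sub-processes named above. It is not the slide's algorithm. The function names, thresholds, hierarchies, and data are all hypothetical, and the hierarchies are assumed to have uniform depth so that climbing one level affects every value of an attribute.

```python
from collections import Counter


def climb(rows, attr, parent):
    """Replace each value of `attr` by its parent concept; report whether anything changed."""
    changed = False
    for row in rows:
        if row[attr] in parent:
            row[attr] = parent[row[attr]]
            changed = True
    return changed


def distinct(rows, attr):
    return len({row[attr] for row in rows})


def merge(rows):
    """Merge identical generalized tuples and count their votes."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def aoi_characteristic(rows, hierarchies, attr_threshold, tuple_threshold):
    rows = [dict(row) for row in rows]  # work on a copy
    # Sub-process 1: control the number of distinct values per attribute.
    for attr, parent in hierarchies.items():
        while distinct(rows, attr) > attr_threshold and climb(rows, attr, parent):
            pass
    # Sub-process 2: control the number of generalized tuples.
    while len(merge(rows)) > tuple_threshold:
        order = sorted(hierarchies, key=lambda a: distinct(rows, a), reverse=True)
        if not any(climb(rows, a, hierarchies[a]) for a in order):
            break  # nothing left to generalize
    return merge(rows)


# Hypothetical hierarchies and student data.
hierarchies = {
    "major": {"physics": "science", "chemistry": "science",
              "history": "arts", "music": "arts",
              "science": "ANY", "arts": "ANY"},
    "city": {"Vancouver": "BC", "Burnaby": "BC", "Toronto": "Ontario",
             "BC": "Canada", "Ontario": "Canada"},
}
students = [
    {"major": "physics", "city": "Vancouver"},
    {"major": "chemistry", "city": "Burnaby"},
    {"major": "history", "city": "Toronto"},
    {"major": "music", "city": "Vancouver"},
]

rules = aoi_characteristic(students, hierarchies, attr_threshold=2, tuple_threshold=3)
```

Each key of `rules` is a generalized tuple and each value is its vote count, which is the input needed for the t-weight.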
Attribute Oriented Induction

AOI Advantages

● AOI provides additional flexibility over

many machine learning algorithms.

● AOI can learn knowledge rules in

different conjunctive and disjunctive
forms and provides more choices for the
experts and users.

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

AOI Advantages (Continue)

● AOI can use the facilities of a
traditional relational database, such as
selection, join, and projection, whereas most
learning algorithms suffer from
inefficiency problems in a large
database environment.

● AOI can learn qualitative rules with
quantitative information while many
machine learning algorithms can only
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Advantages (Continue)

● AOI can handle noisy data and

exceptional cases elegantly by
incorporating statistical techniques in the
learning process, whereas some learning
systems can only work in a ‘noise free’

© Edunet Foundation. All rights reserved.

Attribute Oriented Induction

AOI Disadvantages

● AOI can only provide a snapshot of the

generalized knowledge and not a global
picture. Yet, the global picture can be
revealed by trying different thresholds

● Adjusting different thresholds will result

in different sets of generalized tuples.
However, using different thresholds
repeatedly is a time consuming and
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

AOI Disadvantages (Continue)

● It is hard to select the best
generalized rules between a large
and a small threshold. A large
threshold value leads to a relatively
complex rule with many disjuncts, and
the results may not be fully generalized.
On the other hand, a small threshold
value leads to a simple rule with few
disjuncts, and the results may
overgeneralize the rule with a risk of losing
© Edunet Foundation. All rights reserved.
Iterative Database Scanning

Introduction


● An iterative search starts just the same
as a non-iterative search: the query
sequence is compared to the database
and the score list, pairwise and multiple
alignment outputs are reported.

● The multiple alignment is then used to

create a query “profile” that contains
information about the types of amino
acid seen at each position in the
© Edunet Foundation. All rights reserved.
Iterative Database Scanning

Introduction (Continue)

● This profile is then searched against the

database, a score list, pairwise and
multiple alignments are output and the
process is then repeated.

● The iterations stop either when the
maximum number of iterations has been reached,
or when two successive iterations find
exactly the same sequences.

© Edunet Foundation. All rights reserved.
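The search-profile-search loop and its two stopping conditions can be sketched as follows. This is a toy stand-in, not a real sequence search: the database, the distance-based `search`, and the set-based `build_profile` are assumptions made only to keep the example runnable.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


DATABASE = ["ACDE", "ACDF", "ACGF", "AHGF", "WXYZ"]  # toy sequences


def search(profile):
    """Toy search: an entry is a hit if it is within distance 1 of any profile member."""
    return frozenset(s for s in DATABASE if any(hamming(s, p) <= 1 for p in profile))


def build_profile(query, hits):
    """Toy profile: the query together with every sequence found so far."""
    return frozenset({query}) | hits


def iterative_search(query, max_iterations):
    profile = build_profile(query, frozenset())
    previous_hits = None
    for iteration in range(1, max_iterations + 1):
        hits = search(profile)
        if hits == previous_hits:       # two successive iterations agree
            return hits, iteration
        profile = build_profile(query, hits)
        previous_hits = hits
    return previous_hits, max_iterations  # iteration cap reached


hits, rounds = iterative_search("ACDE", max_iterations=10)
```

In this toy run the sequence `AHGF` is too far from the query to be found directly, but it is reached through intermediate hits, which is the sense in which iteration finds more remote similarities.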

Iterative Database Scanning

Introduction (Continue)

● Iterative searching will normally be able

to find more remote similarities to the
query sequence than a single sequence

© Edunet Foundation. All rights reserved.

Iterative Database Scanning

Applications


● The iterative K-Means algorithm creates
clusters of related data through iterative
database scans and minimization of the
group cluster system error, namely the root
mean square error.

© Edunet Foundation. All rights reserved.
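A minimal K-Means sketch in plain Python shows the repeated scan over the data and the root mean square error it reduces. The points and starting centroids are made up, and the starting centroids are passed in explicitly (rather than chosen at random) so that the run is repeatable.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment scans and centroid updates until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Scan the data: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if it has none).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Root mean square distance from each point to its nearest centroid.
    squared = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, math.sqrt(squared / len(points))


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, rmse = kmeans(points, centroids=[(1, 1), (9, 8)])
```

Each pass through the loop is one scan of the data set, which is why the method is grouped under iterative database scanning.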

Iterative Database Scanning

Applications (Continue)

● Matching protein sequences through
iterative scanning of a protein database
and finding the best match as
per protein generics.
© Edunet Foundation. All rights reserved.
Iterative Database Scanning

Applications (Continue)

● Finding and predicting lung cancer
through iterative scanning of a database
of image samples of lung scans.

© Edunet Foundation. All rights reserved.

Iterative Database Scanning


● Iterative information retrieval

● Mining more useful information through


● Matching patterns

● Iterative querying allows web-based
database applications to integrate results
© Edunet Foundation. All rights reserved.
Iterative Database Scanning


● Comparatively slow process

● Resource Intensive

● Complex to design and implement

● Availability of more advanced


© Edunet Foundation. All rights reserved.

Attribute Focusing


● Attribute Focusing is a technique

designed for detecting interesting
attribute values, in the sense that the
values differ from an expected value.
Bhandari (1993), Bhandari and Biyani
(1994) proposed two methods for
detecting interesting attribute values.

© Edunet Foundation. All rights reserved.

Attribute Focusing


● The first method consists of finding

interesting values of a given attribute by
comparing the observed frequency of
that value with its expected frequency
assuming a uniform probability

● Since this is a one-dimensional method,

analyzing just one attribute at a time, it
involves no attribute interaction and so
© Edunet Foundation. All rights reserved.
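The one-dimensional method can be sketched as a comparison between each value's observed relative frequency and the frequency expected under a uniform distribution. The exact formula is not given on the slide, so the absolute difference used here, the function name, and the sample data are assumptions. The uniform expectation is taken over the values that actually occur.

```python
from collections import Counter


def interestingness_1(values):
    """Score each value by |observed relative frequency - 1/k| for k distinct values."""
    counts = Counter(values)
    expected = 1 / len(counts)
    total = len(values)
    return {value: abs(count / total - expected) for value, count in counts.items()}


# Hypothetical severity attribute from a defect database.
severity = ["low"] * 2 + ["medium"] * 2 + ["high"] * 6
scores = interestingness_1(severity)
most_interesting = max(scores, key=scores.get)
```

Here `high` occurs in 60% of the records against an expected one third, so it is the value an analyst would be pointed to first.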
Attribute Focusing


● Since the goal of data mining is to

discover knowledge that is not only
accurate but also comprehensible for
human decision makers, the field of
cognitive psychology is clearly relevant
for data mining.

● In the classical view, categories are

defined by a small set of attributes.
© Edunet Foundation. All rights reserved.
Attribute Focusing

Introduction (Continue)

● By contrast, in the natural view of

concepts, highly correlated
(non-independent) attributes are the
rule, not the exception.

● To summarize, in the natural view of

concepts, which is currently much more
accepted in psychology than the
classical view, attribute interaction is the
rule, and not the exception.
Large degree of attribute interaction makes a concept harder to learn
Attribute Focusing

Introduction (Continue)

● It is also increasingly likely that data

pertaining to their professional activity is
available in a database.

● Clearly, a machine-assisted method

which allows them to learn more about
their domain from such data should be a
powerful knowledge discovery technique
since it could help a lot of people
improve at their jobs rapidly.
Attribute Focusing

The Importance of Attribute Focusing

in Data Mining

● Evidence for this natural view of

concepts is provided, in the context of
data mining, by projects that found a
significant degree of attribute interaction
in real-world data sets.

● An example is the large number of small

disjuncts found by Provost & Danyluk
(1993) in telecommunications data.
© Edunet Foundation. All rights reserved.
Attribute Focusing

The Importance of Attribute Focusing

in Data Mining (Continue)

● Another example is the several

instances of Simpson’s paradox
discovered in real-world data sets by
Fabris & Freitas (1999)

● Yet another example is the existence of

strong attribute interactions in a typical
financial data set, as discussed by Dhar
et al. (2000)
© Edunet Foundation. All rights reserved.
Attribute Focusing

The Influence of Attribute Interaction

on Concept Hardness

● There are, of course, many factors that

make a concept (class description)
difficult to learn, including
unbalanced class distributions, noise,
missing relevant attributes, etc.

● However, in some cases even if all

relevant information for class separation
is included in the data - i.e. all relevant
attributes are present, there is little
© Edunet Foundation. All rights reserved.
Attribute Focusing

Interestingness Function

● An interestingness function I2 is used to

detect an interesting pair of attribute
values, where each of the values belongs
to a different attribute of a given pair of

● The function I2 measures how much the

observed joint frequency of a pair of
attribute values deviates from the
expected frequency assuming that the
two attributes are statistically
© Edunet Foundation. All rights reserved.
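As a rough sketch, I2 can be computed as the gap between the observed joint relative frequency of a value pair and the product of the two marginal frequencies, which is what independence would predict. The slide does not give the formula, so the absolute difference, the function name, and the data below are assumptions.

```python
from collections import Counter


def interestingness_2(pairs):
    """Score each value pair by |P(a, b) - P(a) * P(b)|."""
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return {
        (a, b): abs(joint[(a, b)] / total - (left[a] / total) * (right[b] / total))
        for a in left
        for b in right
    }


# Hypothetical (phase, symptom) records from a defect database.
records = (
    [("design", "crash")] * 5
    + [("design", "hang")]
    + [("coding", "crash")]
    + [("coding", "hang")] * 3
)
scores = interestingness_2(records)

# Two attributes that are exactly independent score zero everywhere.
independent = interestingness_2([("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")])
```

In the sample data `design` and `crash` co-occur in 50% of the records against an expected 36%, giving a score of 0.14.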
Attribute Focusing

Interestingness Function (Continue)

● Hence, the essence of Attribute

Focusing (using the interestingness
function I2) is precisely to detect
attribute values whose interactions
produce unexpected observed joint
© Edunet Foundation. All rights reserved.
Attribute Focusing

Interestingness Function (Continue)

● Goil and Choudhary (1997) have

extended Attribute Focusing for
multidimensional databases (data
cubes). A contribution of this work was
to introduce a parallel algorithm to
compute the above-discussed
interestingness function I2.

● This research addressed the problem of

making Attribute Focusing more
computationally efficient, which is
© Edunet Foundation. All rights reserved.
Attribute Focusing

Interestingness Function (Continue)

● However, it did not adapt Attribute

Focusing to one of the major
characteristics of data cubes, namely
the fact that dimensions contain
hierarchical attributes.

● This characteristic of data cubes

introduces new opportunities and
requirements for adapting the
computation of the interestingness
function I2.
© Edunet Foundation. All rights reserved.
Attribute Focusing

Advantages


● Attribute Focusing has been

successfully deployed to discover
hitherto unknown knowledge in a
real-life, commercial setting.

● It actually helps people do their jobs

better. That kind of practical success has
not been demonstrated even for
advanced knowledge discovery
techniques.
Attribute Focusing

Advantages (Continue)

● There are three possible areas where

Attribute Focusing may enjoy an
advantage over other methods: superior
mathematical algorithms, ability to
process more data, the use of the

● Interactive systems will provide,

perhaps, the best opportunity for
discovery in the near term. In such
systems, a knowledge analyst is
© Edunet Foundation. All rights reserved.
Attribute Focusing


● The Attribute Focusing approach uses an
explicit model: it uses filtering functions
and a model of interpretation.

● Attribute Focusing represents a means of

deriving immediate and significant
practical advantages by combining the
results of existing research on
knowledge discovery with models based
on human factors and cognitive science.
© Edunet Foundation. All rights reserved.
Attribute Focusing

Future Work

● Information-theoretic, entropy-based
measures and statistical measures of
association/correlation may be used to
evolve new instances of interestingness

● Similarly, new instances of filtering

functions may be evolved by considering
human factors issues.
© Edunet Foundation. All rights reserved.
Introduction to neural networks


● The neural network is a technology

based on the structure of the neurons
inside a human brain.

© Edunet Foundation. All rights reserved.
Introduction to neural


● Neural network algorithm will try to

create a function to map your input to
your desired output.

© Edunet Foundation. All rights reserved.
Introduction to neural


● In artificial neural networks, the cell nucleus
corresponds to nodes, the synapse
to weights, and the axon to output.
© Edunet Foundation. All rights reserved.
Introduction to neural
Biological Neural Network
vs. Artificial Neural Network

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output
© Edunet Foundation. All rights reserved.
Introduction to neural
The architecture of an artificial
neural network

● Input Layer:
As the name suggests, it accepts inputs in
several different formats provided by the

© Edunet Foundation. All rights reserved.
Introduction to neural
The architecture of an artificial
neural network

● Hidden Layer:
The hidden layer sits between the input
and output layers. It performs all the
calculations to find hidden features and

© Edunet Foundation. All rights reserved.
Introduction to neural
The architecture of an artificial
neural network

● Output Layer:
The input goes through a series of
transformations using the hidden layer,
which finally results in output that is
conveyed using this layer.

© Edunet Foundation. All rights reserved.
Introduction to neural
The architecture of an artificial
neural network

● The artificial neural network takes input

and computes the weighted sum of the
inputs and includes a bias. This
computation is represented in the form
of a transfer function.
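The weighted sum plus bias, passed through a transfer function, fits in a few lines of Python. The sigmoid is just one common choice of transfer function and is an assumption here, as are the example inputs and weights.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid transfer function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))


# Hypothetical inputs and weights: 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
```

A weighted sum of zero sits at the midpoint of the sigmoid, so the output is 0.5. Larger positive sums push it toward 1 and negative sums toward 0.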

© Edunet Foundation. All rights reserved.
Introduction to neural
Advantages of artificial neural networks

● Parallel processing capability

● Storing data on the entire network
● Capability to work with incomplete
● Having a memory distribution
● Having fault tolerance
© Edunet Foundation. All rights reserved.
Introduction to neural
Disadvantages of artificial neural networks

● Assurance of proper network structure

● Unrecognized behavior of the network
● Hardware dependence
● Difficulty of showing the issue to the
● The duration of the network is unknown

© Edunet Foundation. All rights reserved.
Introduction to neural

How artificial neural networks work

● Artificial Neural Network can be best

represented as a weighted directed
graph, where the artificial neurons form
the nodes.
● The association between the neurons
outputs and neuron inputs can be
viewed as the directed edges with
© Edunet Foundation. All rights reserved.
Introduction to neural

How artificial neural networks work

● The Artificial Neural Network receives
the input signal from the external source
in the form of a pattern or image,
represented as a vector.
● These inputs are then denoted
mathematically by x(n) for each of the
n inputs.

© Edunet Foundation. All rights reserved.
Introduction to neural

How artificial neural networks work

● Afterward, each of the inputs is multiplied
by its corresponding weight (these
weights are the details utilized by the
artificial neural network to solve a
specific problem).

© Edunet Foundation. All rights reserved.
Introduction to neural

Types of Artificial Neural Network

● Feedback ANN:
In this type of ANN, the output returns into
the network to accomplish the best-evolved
results internally.
● Feed-Forward ANN:
A feed-forward network is a basic neural
network comprising an input layer, an
output layer, and at least one layer of a
© Edunet Foundation. All rights reserved.
Introduction to neural

Model Types

● Neural networks use information in the

form of data to generate knowledge in
the form of models.
● A model can be defined as a description
of a real-world system or process using
mathematical concepts.
● It is usually represented as a mapping
between input and output variables.

© Edunet Foundation. All rights reserved.
Introduction to neural
Neural network models belong to the
following types

● Approximation (or function regression).

An approximation can be regarded as the
problem of fitting a function from data.

● Classification.
Classification can be stated as the process
whereby a received pattern, characterized
by a distinct set of features, is assigned to
one of a prescribed number of classes.
© Edunet Foundation. All rights reserved.
Introduction to neural
Approximation (or function
regression) Examples

● Model the strength of high performance

● Predict the noise generated by airfoil
● Predict the residuary resistance of
sailing yachts.
● Predict the vascular adhesion of
© Edunet Foundation. All rights reserved.
Introduction to neural
Approximation (or function
regression) Examples (Continue)

● Predict the electricity generated by

combined cycle power plants.
● Forecast the power generated by a solar
● Model wine preferences from
physicochemical properties.

© Edunet Foundation. All rights reserved.
Introduction to neural
Classification (or pattern recognition)
Examples

● We can distinguish between two types of

classification models:
● Binary classification Examples
1. Diagnose breast cancer from
fine-needle aspirate images.
2. Detect malfunctions liquid ultrasonic
3. Detect forged banknotes.
4. Reduce employee attrition.
5. Increase the conversion rate of
telemarketing campaigns in banks.
© Edunet Foundation. All rights reserved.
Introduction to neural
Classification (or pattern recognition)
Examples (Continue)

● Multiple classification examples

● Classify iris flowers from sepal and petal
● Recognize human activity from
smartphone signals

© Edunet Foundation. All rights reserved.
Introduction to neural

Classification neural networks

● A classification model usually requires a

scaling layer, one or several perceptron
layers, and a probabilistic layer. It might
also contain a principal component

© Edunet Foundation. All rights reserved.
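The three layer types named above can be chained in a small forward pass. This is a sketch under stated assumptions: min-max scaling for the scaling layer, a single `tanh` perceptron layer, softmax for the probabilistic layer, and hand-picked weights instead of trained ones.

```python
import math


def scaling_layer(x, minimums, maximums):
    """Min-max scale each input to the range [0, 1]."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, minimums, maximums)]


def perceptron_layer(x, weights, biases):
    """One layer of neurons; weights[j] holds the input weights of neuron j."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def probabilistic_layer(z):
    """Softmax: turn the layer outputs into class probabilities that sum to 1."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical, untrained parameters for a two-input, two-class model.
WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
BIASES = [0.0, 0.0]


def classify(x):
    scaled = scaling_layer(x, minimums=[0.0, 0.0], maximums=[10.0, 10.0])
    hidden = perceptron_layer(scaled, WEIGHTS, BIASES)
    return probabilistic_layer(hidden)


balanced = classify([5.0, 5.0])   # equal inputs: both classes equally likely
skewed = classify([10.0, 0.0])    # first input dominates: class 0 favoured
```

The class with the highest probability is taken as the prediction. In practice the weights and biases would come from training rather than being set by hand.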
Introduction to neural

Data Set

● The data set contains information for

creating our model. It is a collection of
data structured as a table, in rows and
columns.
● We can identify the following concepts in a
data set:
● Data source.
● Variables.
● Instances.
● Missing values.
● Data set tasks.
© Edunet Foundation. All rights reserved.
Rough Set theory


● A rough set, first described by Polish computer scientist Zdzisław I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory


● In rough set theory, the data is collected in a table, called a decision table.
● Rows of a decision table correspond to objects, and columns correspond to attributes.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory


● RST can be defined using lower and upper approximations.
● Lower approximation and positive region
● The lower approximation, or positive region, is the union of all equivalence classes which are contained by (i.e., are subsets of) the target set.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

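Upper approximation and negative region

● The upper approximation is the union of all equivalence classes which have non-empty intersection with the target set.
● The negative region is the complement of the upper approximation: the objects that can be definitely ruled out as members of the target set.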

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

The boundary region

● The boundary region, given by the set difference between the upper and the lower approximation, consists of those objects that can neither be ruled in nor ruled out as members of the target set.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

The rough set

● The pair composed of the lower and upper approximation is called a rough set.
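
The lower approximation, upper approximation, boundary region and negative region can be computed directly from a small table. The sketch below is an illustration only: the table, its attribute names and the target set are invented for the example.

```python
from collections import defaultdict

def equivalence_classes(table, attributes):
    """Group object ids that share the same values on the given attributes."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj)
    return list(groups.values())

def approximations(table, attributes, target):
    """Return (lower, upper) approximation of `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:      # class is a subset of the target set
            lower |= cls
        if cls & target:       # class has non-empty intersection with the target
            upper |= cls
    return lower, upper

# Hypothetical data: six customers described by two attributes.
table = {
    "o1": {"income": "low", "history": "bad"},
    "o2": {"income": "low", "history": "bad"},
    "o3": {"income": "high", "history": "good"},
    "o4": {"income": "low", "history": "good"},
    "o5": {"income": "high", "history": "good"},
    "o6": {"income": "high", "history": "bad"},
}
target = {"o1", "o4", "o5"}  # e.g. the customers who defaulted

lower, upper = approximations(table, ["income", "history"], target)
boundary = upper - lower        # neither ruled in nor ruled out
negative = set(table) - upper   # definitely ruled out
```

Here only `o4` is certainly in the target set, `o6` is certainly outside it, and the remaining objects fall in the boundary region because they are indistinguishable from objects on the other side.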

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Objective analysis

● Rough set theory is one of many methods that can be employed to analyze uncertain (including vague) systems, although it is less common than more traditional methods of probability, statistics, entropy and Dempster–Shafer theory.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Reduct and core

● Some attributes in the information system (attribute-value table) may be more important to the knowledge represented in the equivalence class structure than other attributes.
● Often, we wonder whether there is a subset of attributes which can, by itself, fully characterize the knowledge in the database; such an attribute set is called a reduct.
Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory


● The set of attributes which is common to all reducts is called the core: the core is the set of attributes which is possessed by every reduct, and therefore consists of attributes which cannot be removed from the information system without causing collapse of the equivalence-class structure.
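
A brute-force sketch of reducts and the core for a tiny information system is shown below. It assumes a reduct is a minimal attribute subset that yields the same equivalence classes as the full attribute set; the table and attribute names are invented, and the exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations

def partition(table, attributes):
    """Equivalence classes induced by the given attributes."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attributes), set()).add(obj)
    return {frozenset(g) for g in groups.values()}

def reducts(table, attributes):
    """All minimal attribute subsets that preserve the full partition."""
    full = partition(table, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if partition(table, subset) == full:
                found.append(subset)
    return found

def core(table, attributes):
    """Attributes shared by every reduct."""
    return set.intersection(*(set(r) for r in reducts(table, attributes)))

# Hypothetical information system: attributes b and c carry the same information.
table = {
    "o1": {"a": 0, "b": 0, "c": 0},
    "o2": {"a": 0, "b": 1, "c": 1},
    "o3": {"a": 1, "b": 0, "c": 0},
    "o4": {"a": 1, "b": 1, "c": 1},
}
all_reducts = reducts(table, ["a", "b", "c"])
core_attributes = core(table, ["a", "b", "c"])
```

In this example either `b` or `c` can be dropped but `a` cannot, so `a` is the only attribute in the core.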
Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

1. Decision rules not only capture patterns hidden in the data; they can also be used to classify new, unseen objects.
2. Rules represent dependencies in the dataset, and represent extracted knowledge which can be used when classifying new objects not in the original information system.
Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

3. Once the reducts have been found, the job of creating definite rules for the value of the decision feature of the information system is practically done.
4. To transform a reduct into a rule, one only has to bind the condition feature values of the object class from which the reduct originated to the corresponding features of the reduct.
Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

5. Then, to complete the rule, a decision part comprising the resulting part of the rule is added.
6. This is done in the same way as for the condition features.
7. To classify objects which have never been seen before, rules generated from a training set are used. These rules represent the actual classifier. This classifier is used to predict the classes to which new objects belong.
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

8. The nearest matching rule is determined as the one whose condition part differs from the feature vector of the object being classified by the minimum number of features.
9. When there is more than one matching rule, we use a voting mechanism to choose the decision value. Every matched rule contributes votes to its decision value, equal to the number of objects matched by the rule.
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

10. The votes are added and the decision with the largest number of votes is chosen as the correct class.
11. Quality measures associated with decision rules can be used to eliminate some of the decision rules.
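
The matching and voting steps above can be sketched as follows. The rules, attribute names and support counts are invented for illustration, and falling back to the nearest rules only when no rule matches exactly is an assumption of this sketch.

```python
from collections import Counter

# Each rule: (conditions, decision, support), where support is the number of
# training objects matched by the rule. All values are made up.
rules = [
    ({"income": "low", "history": "bad"}, "default", 5),
    ({"income": "low"}, "no_default", 3),
    ({"history": "bad"}, "default", 2),
    ({"income": "high", "history": "good"}, "no_default", 7),
]

def mismatches(conditions, obj):
    """Number of condition features that differ from the object's features."""
    return sum(1 for attr, value in conditions.items() if obj.get(attr) != value)

def classify(obj, rules):
    matching = [r for r in rules if mismatches(r[0], obj) == 0]
    if not matching:
        # No exact match: use the nearest rule(s), i.e. fewest differing features.
        best = min(mismatches(r[0], obj) for r in rules)
        matching = [r for r in rules if mismatches(r[0], obj) == best]
    votes = Counter()
    for _, decision, support in matching:
        votes[decision] += support  # each rule votes with its support
    return votes.most_common(1)[0][0], dict(votes)

decision, votes = classify({"income": "low", "history": "bad"}, rules)
fallback_decision, fallback_votes = classify({"income": "medium", "history": "good"}, rules)
```

For the first object three rules match, giving 7 votes for `default` against 3 for `no_default`; the second object matches no rule exactly, so the nearest rules vote instead.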

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Rough Sets Data Analysis


● Preprocessing stage: includes tasks such as data cleaning, completeness and correctness checks, attribute creation, attribute selection and discretization.
● Processing stage: includes the generation of preliminary knowledge, such as computation of object reducts from data, derivation of rules from reducts, and classification processes.
© Edunet Foundation. All rights reserved.
Rough Set theory

Preprocessing stage

● In order to successfully analyze data with rough sets, a decision table must be created.
● This is done with data preparation.
● The data preparation task includes data conversion, data cleansing, data completion checks, conditional attribute creation, decision attribute generation, discretization of attributes, and data splitting into analysis and validation sets.
Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Data completion and discretization of

continuous-valued attributes

● Discretization is a data transformation procedure that involves finding cuts in the data set which divide the data into intervals.
● Values lying within an interval are then mapped to the same value.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Data completion and discretization of

continuous-valued attributes

● This process reduces the size of the attribute value sets and ensures that the rules that are mined are not too specific.
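
A minimal sketch of discretization with fixed cut points follows. The cut values and the sample ages are assumptions chosen for the example; in practice the cuts would be derived from the data.

```python
from bisect import bisect_right

def discretize(value, cuts):
    """Return the index of the interval that `value` falls into."""
    return bisect_right(cuts, value)

cuts = [30, 60]  # assumed cut points: below 30, 30 to below 60, 60 and above
ages = [18, 25, 30, 42, 59, 60, 75]
codes = [discretize(age, cuts) for age in ages]
```

Seven distinct ages are mapped to only three interval codes, which is the reduction of the attribute value set described above.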

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Processing stage

● The processing stage includes generating preliminary knowledge, such as computation of object reducts from data, derivation of rules from reducts, and classification processes.
● These stages lead towards the final goal of generating rules from an information or decision system.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Rule generation and classification

● The generated reducts are used to generate decision rules.
● The decision rule, at its left side, is a combination of values of attributes such that the set of (almost) all objects matching this combination have the decision value given at the rule's right side.

Image Source:
© Edunet Foundation. All rights reserved.
Rough Set theory

Rule generation and classification

● The rules derived from reducts can be used to classify the data.
● The set of rules is referred to as a classifier and can be used to classify new and unseen data.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization


● Data visualization is used to communicate information clearly and efficiently to users through information graphics such as tables and charts.
● It helps users analyze a large amount of data in a simpler way. It makes complex data more accessible, understandable, and usable.
Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

What makes Data Visualization Effective?

● Effective data visualizations are created where communication, data science, and design collide.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

Importance of Data Visualization

● Data visualization can identify areas that need improvement or modifications.
● Data visualization can clarify which factors influence customer behavior.
● Data visualization helps you to understand which products to place where.
● Data visualization can predict sales.
Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

Why Use Data Visualization?

● To make data easier to understand.
● To discover unknown facts and outliers.
● To visualize relationships and patterns.
● To ask better questions and make better decisions.
● To perform competitive analysis.
● To improve insights.
© Edunet Foundation. All rights reserved.
Data Visualization

Data Visualization Tools

● IBM Cognos
● Tableau
● Infogram
● Chartblocks
● Datawrapper
● Plotly
● and others.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

Data Visualization Steps/Process

1. Develop your research question

2. Get or create your data
3. Clean your data
4. Choose a chart type
5. Choose your tool
6. Prepare data
7. Create report graph

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

1. Develop your research question

1. It is important to have a clear understanding of the goal of your research.
2. This will determine what sort of data is needed, the type of analysis necessary, and the types of visualizations that would be most effective to communicate your explorations or findings.
Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

2. Get or create your data

● Access to a large collection of numerical, statistical and geospatial data may be available to you. There is also a great wealth of open data freely available for download on the web.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

2. Get or create your data

● Advice and technical assistance may be available for the design, creation, and dissemination of surveys using the Qualtrics web survey platform, to assist you in collecting your own data.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

3. Clean your data

● Removing unnecessary variables

● Deleting duplicate rows/observations
● Addressing outliers or invalid data
● Dealing with missing values
● Standardizing or categorizing values
● Correcting typographical errors
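
The cleaning tasks listed above can be sketched on a few hand-made rows. Everything in this example is an assumption made for illustration: the column names, the typo dictionary, the rule that negative sales are invalid, and the choice to fill missing values with the mean of the valid ones.

```python
rows = [
    {"id": 1, "city": "delhi", "sales": 120.0, "notes": "ok"},
    {"id": 1, "city": "delhi", "sales": 120.0, "notes": "ok"},       # duplicate row
    {"id": 2, "city": "Dehli", "sales": None, "notes": ""},          # typo + missing value
    {"id": 3, "city": "Mumbai ", "sales": -40.0, "notes": "check"},  # invalid value
    {"id": 4, "city": "mumbai", "sales": 95.0, "notes": ""},
]

CITY_FIXES = {"dehli": "delhi"}  # hypothetical typo corrections

def clean(rows):
    seen, cleaned = set(), []
    for row in rows:
        row = {k: v for k, v in row.items() if k != "notes"}  # remove unnecessary variable
        city = row["city"].strip().lower()                    # standardize values
        row["city"] = CITY_FIXES.get(city, city)              # correct typographical errors
        key = (row["id"], row["city"], row["sales"])
        if key in seen:
            continue                                          # delete duplicate rows
        seen.add(key)
        if row["sales"] is not None and row["sales"] < 0:
            row["sales"] = None                               # treat invalid data as missing
        cleaned.append(row)
    valid = [r["sales"] for r in cleaned if r["sales"] is not None]
    mean_sales = sum(valid) / len(valid)
    for r in cleaned:
        if r["sales"] is None:
            r["sales"] = mean_sales                           # deal with missing values
    return cleaned

cleaned = clean(rows)
```

Filling with the mean is only one option; dropping the affected rows or using the median may suit other data sets better.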

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

4. Choose a chart type

● Showing how variables compare to each other?
● Showing relationships between variables?
● Showing patterns in the data?
● Showing how the whole dataset can be broken down into smaller parts?

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

5. Choose your tool

● Tableau
● Excel
● Google Sheet
● Python
● R
● Gephi

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

6. Prepare data

● Typical data preparation tasks include:
● Formatting columns appropriately (numbers are treated as numbers, dates as dates)
● Converting values into appropriate units
● Filtering your data to focus on the specific data that interests you.

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

6. Prepare data

● Grouping data and creating aggregate values for groups (Counts, Min, Max, Mean, Median, Mode)
● Extracting values from complex columns
● Combining variables to create new ones
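
Grouping and aggregating can be sketched with the Python standard library. The records and region names below are invented for the example.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Hypothetical (region, amount) records.
records = [("north", 10), ("north", 30), ("north", 30),
           ("south", 5), ("south", 15), ("south", 15)]

# Group the amounts by region.
groups = defaultdict(list)
for region, amount in records:
    groups[region].append(amount)

# Compute the aggregate values named above for each group.
summary = {
    region: {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }
    for region, values in groups.items()
}
```

Each entry of `summary` holds one row of the aggregated table, ready to be charted.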

Image Source:
© Edunet Foundation. All rights reserved.
Data Visualization

7. Create report graph

1. Import data into the software

2. Select the chart type you wish to create
3. Evaluate the effectiveness of the chart.
4. Refine by applying design principles.
The way in which you design your chart
can have a big impact on the
effectiveness of the chart. Consider
these design principles.
Image Source:
© Edunet Foundation. All rights reserved.
Odds Ratio


● An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B.
● The odds ratio compares two probabilities (or proportions) P1 and P2 through their odds, P1/(1-P1) and P2/(1-P2).

Image Source:
© Edunet Foundation. All rights reserved.
Odds Ratio


● The odds ratio is defined as the ratio of

the odds of A in the presence of B and
the odds of A in the absence of B,
● or equivalently (due to symmetry), the
ratio of the odds of B in the presence of
A and the odds of B in the absence of A.
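
The symmetry stated above can be checked numerically. The joint counts in this sketch are hypothetical.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

# Hypothetical joint counts for two events A and B.
n_a_b, n_a_notb = 30, 20        # A occurred, with / without B
n_nota_b, n_nota_notb = 10, 40  # A did not occur, with / without B

# Odds of A in the presence of B versus in the absence of B.
or_a = (odds(n_a_b / (n_a_b + n_nota_b))
        / odds(n_a_notb / (n_a_notb + n_nota_notb)))

# Odds of B in the presence of A versus in the absence of A.
or_b = (odds(n_a_b / (n_a_b + n_a_notb))
        / odds(n_nota_b / (n_nota_b + n_nota_notb)))
```

Both directions give the same value, here 6, which also equals the cross-product (30 × 40) / (20 × 10).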

© Edunet Foundation. All rights reserved.
Odds Ratio


● Two events are independent if and only

if the OR equals 1, i.e., the odds of one
event are the same in either the
presence or absence of the other event.

© Edunet Foundation. All rights reserved.
Odds Ratio


● If the OR is greater than 1, then A and B

are associated (correlated) in the sense
that, compared to the absence of B, the
presence of B raises the odds of A, and
symmetrically the presence of A raises
the odds of B.

© Edunet Foundation. All rights reserved.
Odds Ratio


● Conversely, if the OR is less than 1, then

A and B are negatively correlated, and
the presence of one event reduces the
odds of the other event.
● Note that the odds ratio is symmetric in
the two events, and there is no causal
direction implied

© Edunet Foundation. All rights reserved.
Odds Ratio
Example - to find out the odds of
customer default with low income
versus high income.

● The odds ratio is defined by the following formula:
OR = [Px / (1-Px)] / [Py / (1-Py)]
● Where Px is the probability of default with low income and (1-Px) is the probability of non-default with low income.

© Edunet Foundation. All rights reserved.
Odds Ratio
Example - to find out the odds of
customer default with low income
versus high income.

● Py is the probability of default with high income and (1-Py) is the probability of non-default with high income.
● Thus, the above odds ratio will give the odds of a customer defaulting with low income over a customer defaulting with high income.
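
A small worked example of this odds ratio follows. The two default probabilities are assumed values chosen for illustration, not figures from any data set.

```python
def odds_ratio(px, py):
    """Odds of default with low income divided by odds of default with high income."""
    return (px / (1 - px)) / (py / (1 - py))

px = 0.20  # assumed probability of default with low income
py = 0.05  # assumed probability of default with high income

ratio = odds_ratio(px, py)
no_association = odds_ratio(0.30, 0.30)  # equal probabilities give an OR of 1
```

With these assumed values the odds of default are 0.25 for low income and about 0.053 for high income, so the odds ratio is 4.75.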

Image Source:
© Edunet Foundation. All rights reserved.