
Module 5 - Business Data Analytics

Disclaimer: The content is curated for educational purposes only.


© Edunet Foundation. All rights reserved.
After going through this module, students will be able to

● Understand business analytics and develop business intelligence.


● Analyze data using statistical and data mining techniques for business
intelligence.
● Understand case studies for predictive models.
● Develop case studies for predictive analytical models.



Understand business
analytics and develop
business intelligence.

In this section, we will discuss:

● Introduction to business analytics and Concepts of business analytics


● Trends in business analytics
● Descriptive analytics
● Introduction to statistics
● Types of data
● Measure of Central Tendency
● Arithmetic mean
● Geometric Mean
● Harmonic Mean


● Median in Raw and Grouped Data


● Mode in Raw and Grouped Data
● Standard Deviation
● Variance
● Properties of Variance and standard deviation
● Usage of variance in business analytics
● OLAP Concept
● OLTP Concept



Introduction to business
analytics and Concepts of
business analytics
What is Business Analytics?

● Business analytics (BA) is the iterative,
methodical exploration of an
organization's data, with an emphasis on
statistical analysis.
● Business analytics is used by
companies that are committed to making
data-driven decisions.

Image Source: https://www.businessanalytics.com/


What is Business Analytics?(Contd)

● Business Analytics is "the study of data
through statistical and operations
analysis, the formation of predictive
models, application of optimization
techniques, and the communication of
these results to customers, business
partners, and colleague executives."

Image Source:
https://www.proschoolonline.com/certification-business-analytics-course/what-is-b
What is Business Analytics?(Contd)

● It adopts quantitative methods and relies
on evidence from data to build models for
businesses and make profitable decisions.
Thus, Business Analytics depends heavily
on Big Data (large volumes of data).

Image Source:
https://www.altudo.co/resources/blogs/business-analytics-vs-marketing-analytics-
Understanding Business Analytics

● Business Analytics is the process through
which data on past performance and
issues is analyzed to devise a successful
plan for the future.
● Big Data, or large amounts of data, is
used to derive solutions.

Image Source:
https://www.indiaeducation.net/management/streams/business-analytics.html
Understanding Business
Analytics(Contd)

● This data-driven approach to building and
sustaining a business is vital to the
economy and to the industries that thrive
in it.

Image Source: https://www.martinsights.com/?p=1049


Components of Business Analytics

● Define Objective
● Data Aggregation
● Data Cleaning
● Analytical Methodology
● Evaluation and Validation
● Reporting and Data Visualisation

Image Source: https://www.analytixlabs.co.in/blog/what-is-business-analytics/


Types of Business Analytics Methods

● Descriptive Analytics
● Diagnostic Analytics
● Predictive Analytics
● Prescriptive Analytics

Image Source:
https://www.proschoolonline.com/certification-business-analytics-course/what-is-b
Uses and Benefits of Business
Analytics

● To carry out data mining and explore
new data to find new patterns and
relationships.
● To carry out statistical and quantitative
analysis to provide explanations for
certain occurrences.

Image Source: https://www.analytixlabs.co.in/blog/business-analytics-career/


Uses and Benefits of Business
Analytics

● Test previous decisions with the help of
A/B testing and multivariate testing.
● Deploy predictive modeling to predict
future outcomes.

Image Source:
https://www.datapine.com/blog/benefits-of-business-intelligence-and-business-an
Business Analytics Tools

● SQL
● Tableau/ QlikView/ Power BI
● BIRT
● Python
● R
● MS Excel
● Sisense
● Clear Analytics
● Pentaho BI
● MicroStrategy

Image Source: https://sigma4sap.com/?page_id=466
Applications of Business Analytics

● Marketing
● Finance
● Human Resources
● Manufacturing

Image Source:
https://www.proschoolonline.com/certification-business-analytics-course/what-is-b
Trends in Business
Analytics

Business Analytics Trends For 2020

● Data Quality Management


● Data Discovery/Visualization
● Artificial Intelligence
● Predictive and Prescriptive Analytics
Tools
● Collaborative Business Intelligence
● Data-driven Culture

Image Source: https://www.datapine.com/blog/business-intelligence-trends/


Business Analytics Trends For
2020(Contd)

● Augmented Analytics
● Mobile BI
● Data Automation
● Embedded Analytics
● Natural language processing

Image Source: https://codeit.us/blog/top-data-and-analytics-trends


Descriptive analytics

What is Descriptive Analytics?

● Descriptive analytics is a statistical
method that is used to search and
summarize historical data in order to
identify patterns or meaning.
● Descriptive analytics is based on
standard aggregate functions in
databases.

Image Source:
https://www.dezyre.com/article/types-of-analytics-descriptive-predictive-prescriptiv

What is Descriptive Analytics? (Contd)

● For example, in an online learning
course with a discussion board,
descriptive analytics could determine
how many students participated in the
discussion, or how many times a
particular student posted in the
discussion forum.
Image Source: https://www.valamis.com/hub/descriptive-analytics
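The discussion-board example above can be sketched in a few lines of Python. This is only an illustration: the `posts` list and the student names in it are hypothetical data, not taken from the module.

```python
from collections import Counter

# Hypothetical discussion-board log: one entry per post, holding the author's name.
posts = ["asha", "ravi", "asha", "meera", "ravi", "asha"]

# Aggregate: how many times did each student post?
posts_per_student = Counter(posts)

# Aggregate: how many distinct students participated?
participants = len(posts_per_student)

print("Students who participated:", participants)
print("Posts by asha:", posts_per_student["asha"])
```

Both results are simple counts, which is why descriptive analytics maps so directly onto aggregate functions.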

How does descriptive analytics work?

● Data aggregation and data mining are
two techniques used in descriptive
analytics to examine historical data.
● Data is first gathered and sorted by data
aggregation in order to make the
datasets more manageable by analysts.

Image Source: https://www.dataversity.net/fundamentals-descriptive-analytics/



How does descriptive analytics work? (Contd)
● Data mining describes the next step of
the analysis and involves a search of the
data to identify patterns and meaning.
● Identified patterns are analyzed to
discover the specific ways that learners
interacted with the learning content and
within the learning environment.

Image Source: https://www.sisense.com/glossary/descriptive-analytics/



Examples of descriptive analytics

● Tracking course enrollments and course
compliance rates
● Recording which learning resources are
accessed and how often
● Summarizing the number of times a
learner posts in a discussion board
● Tracking assignment and assessment
grades
Image Source:
https://www.vertical-leap.uk/blog/data-science-for-marketers-part-2-descriptive-v-

Examples of descriptive
analytics(Contd)

● Comparing pre-test and post-test
assessments
● Analyzing course completion rates by
learner or by course
● Collating course survey results
● Identifying length of time that learners
took to complete a course

Image Source:
https://www.vectorstock.com/royalty-free-vector/data-analytics-icons-flat-pack-vec

Advantages of descriptive analytics

● Quickly and easily report on the Return
on Investment (ROI) by showing how
performance achieved business or
target goals.
● Identify gaps and performance issues
early - before they become problems.

Image Source:
https://forums.bsdinsight.com/threads/descriptive-predictive-and-prescriptive-anal

Advantages of descriptive
analytics(Contd)

● Identify specific learners who require
additional support, regardless of how
many students or employees there are.
● Identify successful learners in order to
offer positive feedback or additional
resources.
● Analyze the value and impact of course
design and learning resources.
Image Source:
https://econsultancy.com/analytics-approaches-every-marketer-should-know-1-de
Introduction to statistics

Introduction to Statistics

● Statistics is a branch of mathematics that
deals with the collection, organization,
presentation, analysis and interpretation
of numerical data.

Image Source: https://www.youtube.com/watch?v=7rKQBKQOIQw



Types of Statistics

● Descriptive statistics
● Inferential statistics

Image Source: https://slideplayer.com/slide/6642532/


Types of Statistics

Descriptive statistics

● It is used to describe the basic features
of data in a study.
● Descriptive statistics deals with the
processing of data without attempting to
draw any inferences from it.
● The data are presented in the form of
tables and graphs.

Image Source: https://data-flair.training/blogs/stat-descriptive-statistics/



Descriptive statistics

● The characteristics of the data are
described in simple terms.
● Events that are dealt with include
everyday happenings such as accidents,
prices of goods, business, incomes,
epidemics, sports data, population data.

Image Source: https://data-flair.training/blogs/stat-descriptive-statistics/



Inferential statistics

● Inferential statistics is a scientific
discipline that uses mathematical tools
to make forecasts and projections by
analyzing the given data.
● This is of use to people employed in
such fields as engineering, economics,
biology, the social sciences, business,
agriculture and communications.
Image Source:
https://mahritaharahap.wordpress.com/teaching-areas/inferential-statistics/
Types of data

Qualitative or Categorical Data

● Qualitative data is also known as
categorical data.
● It describes data that fits into
categories.
● Qualitative data are not numerical.
● Categorical information involves
categorical variables that describe
features such as a person’s gender,
hometown, etc.
Image Source:
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● Categorical measures are defined in
terms of natural language specifications,
not in terms of numbers.
● Sometimes categorical data can hold
numerical values, but those values have
no mathematical meaning.
Image Source:
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● For example, a birthdate and a school
postcode are written with digits,
● but they do not carry numerical meaning.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

Nominal Data:
● Nominal data is a type of qualitative
information that labels variables
without assigning them a numerical value.
● Nominal data is also called the nominal
scale. It cannot be ordered or
measured.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● Sometimes, however, data can be both
qualitative and quantitative.
● Examples of nominal data are letters,
symbols, words, gender etc.
● The nominal data are examined using
the grouping method.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● In the grouping method, the data are
grouped into categories, and then the
frequency or the percentage of each
category can be calculated.
● These data are visually represented
using pie charts.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and
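A minimal sketch of the grouping method for nominal data, assuming a small hypothetical list of category labels (`channels` is illustrative, not from the module). It computes the frequency and the percentage of each category, which are the numbers a pie chart would display.

```python
from collections import Counter

# Hypothetical nominal data: preferred contact channel of eight customers.
channels = ["email", "phone", "email", "chat", "email", "phone", "chat", "email"]

# Group the values into categories and count each one.
frequency = Counter(channels)

# Convert each count into a percentage of all observations.
total = len(channels)
percentage = {category: 100 * count / total for category, count in frequency.items()}

for category in frequency:
    print(category, frequency[category], f"{percentage[category]:.1f}%")
```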

Qualitative or Categorical Data

Continued……..

Ordinal Data:
● Ordinal data/variable is a type of data
which follows a natural order.
● The significant feature of ordinal
data is that the differences between the
data values are not determined.
● This variable is mostly found in surveys,
finance, economics, questionnaires, and
so on.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● Ordinal data is commonly
represented using a bar chart.
● These data are investigated and
interpreted through many visualisation
tools.
● The information may be expressed
using tables in which each row in the
table shows the distinct category.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

Binary Data:
● Binary data has only 2 values/states.
● For example: yes or no, affected or
unaffected, true or false.
i) Symmetric: both values are equally
important (e.g., gender).
ii) Asymmetric: the two values are not
equally important (e.g., a test result).
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

Advantages:
● Better understanding - Qualitative data
gives a better understanding of the
perspectives and needs of participants.
● Provides explanation - Qualitative data
along with quantitative data can explain
the result of the survey and can help
check the correctness of the
quantitative data.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..

● Better identification of behavior
patterns - Qualitative data can provide
detailed information that is useful in
identifying behavioral patterns.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Qualitative or Categorical Data

Continued……..
Disadvantages:
● Lesser reachability - Being subjective in
nature, a small population is generally
covered to represent the large
population.
● Time consuming - Qualitative data is
time consuming as a large amount of data
has to be understood.
● Possibility of bias - Being a subjective
analysis, evaluator bias is quite possible.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

● Quantitative data is also known as
numerical data, which represents a
numerical value (i.e., how much, how
often, how many).
● Numerical data gives information about
the quantities of a specific thing.
● Some of the examples of numerical data
are height, length, size, weight, and so
on.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

● The quantitative data can be classified
into two different types based on the
data sets.
● The two different classifications of
numerical data are discrete data and
continuous data.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

Discrete Data:
● Discrete data can take only distinct,
separate values.
● Discrete data has a finite or countable
number of possible values.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

● Those values cannot be subdivided
meaningfully.
● Here, things can be counted in the
whole numbers.
● Example: Number of students in the
class

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

Continuous Data:

● Continuous data is data that can be
measured.
● It has an infinite number of possible
values that can be selected within a
given specific range.
● Example: Temperature range
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

Advantages:

● Specific - Quantitative data is clear and
specific to the survey conducted.
● High Reliability- If collected properly,
quantitative data is normally accurate
and hence highly reliable.

Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

● Easy communication - Quantitative
data is easy to communicate and
elaborate using charts, graphs etc.
● Existing support- Many large datasets
may be already present that can be
analyzed to check the relevance of the
survey.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and

Quantitative or Numerical Data

Continued……..

Disadvantages:

● Limited options - Respondents are
required to choose from limited options.
● High complexity - Quantitative data may
need complex procedures to get a correct
sample.
● Requires expertise - Analysis of
quantitative data requires certain
expertise in statistical analysis.
Image Source :
https://www.geeksforgeeks.org/understanding-data-attribute-types-qualitative-and
Measure of Central
Tendency

Definition

● A measure of central tendency is a
summary statistic that represents the
center point or typical value of a dataset.
● These measures indicate where most
values in a distribution fall and are also
referred to as the central location of a
distribution.
Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
Definition
Continued…...

● We can think of it as the tendency of
data to cluster around a middle value.
● In statistics the three most common
measures of central tendency are the
mean, median and mode.

Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
Definition
Continued…...

● Each of these measures calculates the
location of the central point using a
different method.
● Choosing the best measure of central
tendency depends on the type of data
we have.

Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
Mean
Continued….

● The mean is the arithmetic average, and
it is probably the measure of central
tendency that you are most familiar with.
● Calculating the mean is very simple.

Image Source:
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Mean
Continued….

● We just add up all of the values and
divide by the number of observations in
your dataset:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Mean
Continued….

● The calculation of the mean
incorporates all values in the data.
● If you change any value, the mean
changes.
● However, the mean doesn’t always
locate the center of the data accurately.

Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Mean
Continued….

● In a symmetric distribution, the mean
locates the center accurately.

Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Mean
Continued….

● However, in a skewed distribution, the
mean can miss the mark.
● This problem occurs because outliers
have a substantial impact on the mean.
● Extreme values in an extended tail pull
the mean away from the center.
● As the distribution becomes more
skewed, the mean is drawn further away
from the center.
Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
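The effect of an outlier on the mean can be seen with a small sketch. The two lists below are hypothetical; they differ only in their last value, so any change in the results comes from that one extreme observation.

```python
from statistics import mean, median

# Hypothetical data: the lists are identical except for the last value.
symmetric = [10, 20, 30, 40, 50]
skewed = [10, 20, 30, 40, 500]   # 500 is an outlier in the right tail

mean_symmetric = mean(symmetric)
median_symmetric = median(symmetric)
mean_skewed = mean(skewed)
median_skewed = median(skewed)

print("symmetric: mean =", mean_symmetric, "median =", median_symmetric)
print("skewed:    mean =", mean_skewed, "median =", median_skewed)
```

In the symmetric list the mean and median agree at 30. In the skewed list the median stays at 30 while the mean is pulled up to 120, away from where most of the values lie.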

Median

● The median is the middle value.


● It is the value that splits the dataset in
half.
● To find the median, order your data from
smallest to largest, and then find the
data point that has an equal number of
values above it and below it.

Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
Median
Continued….

● The method for locating the median
varies slightly depending on whether
your dataset has an even or odd number
of values.

Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
Median
Continued….

● In the dataset with the odd number of
observations, notice how the number 12
has six values above it and six below it.
● Therefore, 12 is the median of this
dataset.

Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
Median
Continued….

● When there is an even number of
values, you count in to the two
innermost values and then take the
average.
● The average of 27 and 29 is 28.
Consequently, 28 is the median of this
dataset.

Image Source :
statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
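The odd and even cases can be sketched as one small function. The function name `median_of` and both sample lists are illustrative assumptions; the slide's own datasets were shown as images and are not reproduced here.

```python
def median_of(values):
    """Return the median: the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # order the data from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle value exists
        return ordered[middle]
    # even count: average the two innermost values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_data = [7, 3, 12, 20, 15]         # hypothetical, 5 values
even_data = [31, 27, 25, 29]          # hypothetical, 4 values

print(median_of(odd_data))
print(median_of(even_data))
```

For the even-sized list the two innermost values are 27 and 29, so the function returns their average, 28.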

Mode

● The mode is the value that occurs the
most frequently in your data set.
● On a bar chart, the mode is the highest
bar.
● If the data have multiple values that are
tied for occurring the most frequently,
you have a multimodal distribution.
● If no value repeats, the data do not have
a mode.
Image Source :
http://www.brainkart.com/article/Various-measures-of-central-tendency_35079/
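A short sketch of the three cases described above: a single mode, a multimodal dataset, and no mode at all. The helper name `modes` and the sample lists are hypothetical, and the function assumes a non-empty input.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no value repeats, so there is no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3, 5, 5, 5]))   # one mode
print(modes([1, 1, 2, 2, 3]))         # two values tie: multimodal
print(modes([1, 2, 3]))               # no value repeats
```

Because the function only counts occurrences, it works equally well on category labels such as strings.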
Mode
Continued….

● In the dataset, the value 5 occurs most
frequently, which makes it the mode.
● These data might represent a 5-point
Likert scale.

Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode
Mode
Continued….

● Typically, you use the mode with
categorical, ordinal, and discrete data.
● In fact, the mode is the only measure of
central tendency that you can use with
categorical data—such as the most
preferred flavor of ice cream.
● However, with categorical data, there
isn’t a central value because you can’t
order the groups.
Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode
Mode
Continued….

● With ordinal and discrete data, the mode
can be a value that is not in the center.
● Again, the mode represents the most
common value.

Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode
Arithmetic mean

Definition

● Arithmetic Mean is the most common
and easily understood measure of
central tendency.
● We can define the mean as the value
obtained by dividing the sum of
measurements by the number of
measurements contained in the data
set. It is denoted by the symbol $\bar{x}$.
Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode

Arithmetic Mean for three types of series

● Individual Data Series


● Discrete Data Series
● Continuous Data Series

Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode

Individual Data Series

● An individual data series lists each
observation separately. Following is an
example of individual series:
Items: 5 10 20 30 40 50 60 70

Image Source :
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode

Individual Data Series


Continued….

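● For individual series, the Arithmetic Mean can
be calculated using the following formula.

Formula:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Where $x_1, \dots, x_n$ are the items and $n$ is the number of items.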

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Individual Data Series


Continued….

● Alternatively, we can write the same formula
as follows:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Individual Data Series


Continued….

Example:
Problem Statement:
● Calculate Arithmetic Mean for the
following individual data:
Items:
14 36 45 70 105

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Individual Data Series


Continued….

Solution:
● Based on the above mentioned
formula, Arithmetic Mean $\bar{x}$ will be:
$$\bar{x} = \frac{14 + 36 + 45 + 70 + 105}{5} = \frac{270}{5} = 54$$
● The Arithmetic Mean of the given
numbers is 54.
Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm
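As a quick check of this worked example, the same calculation can be written in Python. The variable names are illustrative; the items are the ones given in the problem statement.

```python
# Items from the individual-series example.
items = [14, 36, 45, 70, 105]

# Arithmetic mean: sum of the items divided by the number of items.
arithmetic_mean = sum(items) / len(items)

print(arithmetic_mean)
```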

Discrete Data Series

● A discrete data series gives each item
along with its frequency. Following is an
example of discrete series:
Items:     5 10 20 30 40 50 60 70
Frequency: 2  5  1  3 12  0  5  7

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Discrete Data Series


Continued….

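● For discrete series, the Arithmetic Mean
can be calculated using the following
formula.

Formula:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{N}$$
Where $f_i$ is the frequency of item $x_i$ and $N = f_1 + f_2 + \dots + f_n$ is the total number of observations.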

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Discrete Data Series


Continued….

● Alternatively, we can write the same formula
as follows:

Formula:
$$\bar{x} = \frac{\sum f x}{\sum f}$$

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Discrete Data Series


Continued….

Example:
Problem Statement:
● Calculate Arithmetic Mean for the
following discrete data:
Items: 14 36 45 70
Frequency: 2 5 1 3

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Discrete Data Series


Continued….

Solution:
Based on the given data, we have:

Items (x)   Frequency (f)   fx
14          2               28
36          5               180
45          1               45
70          3               210
Total       N = 11          ∑fx = 463

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Discrete Data Series


Continued….

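● Based on the above mentioned
formula, Arithmetic Mean $\bar{x}$ will be:
$$\bar{x} = \frac{2(14) + 5(36) + 1(45) + 3(70)}{2 + 5 + 1 + 3} = \frac{463}{11} \approx 42.09$$
● The Arithmetic Mean of the given
numbers is 42.09.
Image Source :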
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm
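The discrete-series calculation can be reproduced with a short sketch. The variable names are illustrative; the items and frequencies are those of the problem statement.

```python
# Items and their frequencies from the discrete-series example.
items = [14, 36, 45, 70]
frequencies = [2, 5, 1, 3]

# Sum of (frequency x item) and total number of observations.
total_fx = sum(f * x for x, f in zip(items, frequencies))
total_n = sum(frequencies)

# Frequency-weighted arithmetic mean.
discrete_mean = total_fx / total_n

print(total_fx, total_n, round(discrete_mean, 2))
```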

Continuous Data Series

● A continuous data series gives ranges
(class intervals) along with their
frequencies. Following is an example of
continuous series:
Items:     0-5 5-10 10-20 20-30 30-40
Frequency: 2   5    1     3     12

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Continuous Data Series


Continued….

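● In case of continuous series, the mid
point of each range is computed as
(lower limit + upper limit)/2 and the
Arithmetic Mean is computed using the
following formula.

Formula:
$$\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \dots + f_n m_n}{N}$$
Where $m_i$ is the mid point of the i-th range, $f_i$ is its frequency, and $N = f_1 + f_2 + \dots + f_n$.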

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Continuous Data Series


Continued….

Example:
Problem Statement:
Let's calculate Arithmetic Mean for the
following continuous data:
Items: 0-10 10-20 20-30 30-40
Frequency: 2 5 1 3

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Continuous Data Series


Continued….

Solution:
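Based on the given data, we have:

Items    Mid point (m)   Frequency (f)   fm
0-10     5               2               10
10-20    15              5               75
20-30    25              1               25
30-40    35              3               105
Total                    N = 11          ∑fm = 215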

Image Source :
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm

Continuous Data Series


Continued….

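● Based on the above mentioned
formula, Arithmetic Mean $\bar{x}$ will be:
$$\bar{x} = \frac{2(5) + 5(15) + 1(25) + 3(35)}{2 + 5 + 1 + 3} = \frac{215}{11} \approx 19.55$$
The Arithmetic Mean of the given numbers is
approximately 19.55.
Image Source :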
https://www.tutorialspoint.com/statistics/discrete_series_arithmetic_mean.htm
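The continuous-series calculation above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `grouped_mean` is an assumption, and the data is the worked example (ranges 0-10, 10-20, 20-30, 30-40 with frequencies 2, 5, 1, 3).

```python
# Class intervals and frequencies from the worked example.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]


def grouped_mean(intervals, frequencies):
    """Arithmetic mean of a continuous series using class mid points."""
    # Mid point of each range: (lower limit + upper limit) / 2
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    total = sum(frequencies)  # N, the total number of observations
    weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
    return weighted_sum / total


mean = grouped_mean(intervals, frequencies)
print(f"{mean:.4f}")  # 19.5455
```

The mid points are 5, 15, 25 and 35, so the weighted sum is 215 and N is 11. The slide reports the result as 19.54.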
© Edunet Foundation. All rights reserved.
Geometric mean

Geometric mean

● The geometric mean of n numbers is defined as the nth root of the product of the n numbers.

Formula: G.M. = (x_1 × x_2 × … × x_n)^(1/n)

Image Source : https://www.tutorialspoint.com/statistics/geometric_mean.htm


© Edunet Foundation. All rights reserved.
Geometric mean

Geometric mean
Continued….

Image Source : https://www.tutorialspoint.com/statistics/geometric_mean.htm


© Edunet Foundation. All rights reserved.
Geometric mean

Geometric mean
Continued….

Example:
Problem Statement:
● Determine the geometric mean of
following set of numbers.
1 3 9 27 81

Image Source : https://www.tutorialspoint.com/statistics/geometric_mean.htm


© Edunet Foundation. All rights reserved.
Geometric mean

Geometric mean
Continued….

Solution:
Here n = 5, so
G.M. = (1 × 3 × 9 × 27 × 81)^(1/5) = (3^10)^(1/5) = 3^2 = 9

Image Source : https://www.tutorialspoint.com/statistics/geometric_mean.htm
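As a quick check of the example, the definition (nth root of the product of n numbers) can be written directly in Python. The helper name `geometric_mean` is an assumption for illustration. The standard library also offers `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    n = len(values)
    return math.prod(values) ** (1 / n)


gm = geometric_mean([1, 3, 9, 27, 81])
print(round(gm, 6))  # 9.0
```

The product is 59049 = 3^10, so the fifth root is 9. The result is rounded because floating-point roots are not always exact.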


© Edunet Foundation. All rights reserved.
Harmonic Mean

What is Harmonic Mean?

● Harmonic mean is a type of average that is calculated by dividing the number of values in a data series by the sum of the reciprocals (1/x_i) of each value in the data series.
● The harmonic mean is one of the three Pythagorean means (the other two are the arithmetic mean and the geometric mean). For positive values, the harmonic mean is never larger than the other two, and it is strictly the lowest unless all values are equal.

Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
© Edunet Foundation. All rights reserved.
Harmonic Mean
Formula for Harmonic Mean
● The general formula for calculating a harmonic mean is:
● Harmonic mean = n / (∑1/x_i)
● Where:
● n – the number of values in a dataset
● x_i – the i-th data point in the dataset
● The weighted harmonic mean can be calculated using the following formula:
● Weighted Harmonic Mean = (∑w_i) / (∑w_i/x_i)
● Where:
● w_i – the weight of the i-th data point
● x_i – the i-th data point in the dataset
Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
© Edunet Foundation. All rights reserved.
Harmonic Mean

Example of Harmonic Mean

● You are a stock analyst in an investment bank.
● Your manager asked you to determine
the P/E ratio of the index of the stocks of
Company A and Company B.
● Company A reports a market
capitalization of $1 billion and earnings
of $20 million, while Company B reports
a market capitalization of $20 billion and
earnings of $5 billion.
● The index consists of 40% of Company
A and 60% of Company B.
Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
© Edunet Foundation. All rights reserved.
Harmonic Mean

Example of Harmonic Mean

● Firstly, we need to find the P/E ratio of each company. Remember that the P/E ratio is essentially the market capitalization divided by the earnings.
● P/E (Company A) = ($1 billion) / ($20
million) = 50
● P/E (Company B) = ($20 billion) / ($5
billion) = 4

Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
© Edunet Foundation. All rights reserved.
Harmonic Mean

Example of Harmonic Mean

● We must use the weighted harmonic mean to calculate the P/E ratio of the index. Using the formula for the weighted harmonic mean, the P/E ratio of the index can be found in the following way:
● P/E (Index) = (0.4 + 0.6) / (0.4/50 + 0.6/4) = 6.33
● Note that if we calculate the P/E ratio of the index using the weighted arithmetic mean, it would be significantly overstated:
● P/E (Index) = 0.4×50 + 0.6×4 = 22.4
Image Source:
https://www.google.com/url?sa=i&source=imgres&cd=&cad=rja&uact=8&ved=2ah
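The index example can be reproduced with a short Python sketch. The function names are illustrative assumptions. The figures (market capitalizations, earnings and the 40%/60% weights) come from the example.

```python
def weighted_harmonic_mean(values, weights):
    """(sum of w_i) / (sum of w_i / x_i)."""
    return sum(weights) / sum(w / x for w, x in zip(weights, values))


def weighted_arithmetic_mean(values, weights):
    """(sum of w_i * x_i) / (sum of w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# P/E ratio = market capitalization / earnings
pe_a = 1_000_000_000 / 20_000_000       # Company A: 50
pe_b = 20_000_000_000 / 5_000_000_000   # Company B: 4

weights = [0.4, 0.6]
pe_harmonic = weighted_harmonic_mean([pe_a, pe_b], weights)
pe_arithmetic = weighted_arithmetic_mean([pe_a, pe_b], weights)
print(round(pe_harmonic, 2), round(pe_arithmetic, 2))  # 6.33 22.4
```

The harmonic mean is the appropriate choice here because a P/E ratio has price in the numerator, so averaging the reciprocals (earnings per unit of price) and inverting gives the ratio of the combined holding.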
© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data

Median in Raw Data

● The median of raw data is the number that divides the observations, when arranged in ascending or descending order, into two equal parts.

Image Source: https://www.math-only-math.com/images/median-of-raw-data.png


© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Method of finding median

● Take the following steps to find the median of raw data.
● Step I: Arrange the raw data in ascending or descending order.
● Step II: Count the number of variates in the data. Let the number of variates be n. Then find the median as follows.
● (i) If n is odd, then the ((n + 1)/2)th variate is the median.
Image Source: https://www.math-only-math.com/images/median-of-raw-data.png
© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Method of finding median

● (ii) If n is even, then the mean of the (n/2)th and (n/2 + 1)th variates is the median, i.e.,
● median = ½ × {(n/2)th variate + (n/2 + 1)th variate}.

Image Source: https://www.math-only-math.com/images/median-of-raw-data.png


© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Solved Examples on Median of Raw
Data
● Find the median of the ungrouped data.
● 15, 18, 10, 6, 14
● Solution:
● Arranging variates in ascending order,
we get
● 6, 10, 14, 15, 18.
● The number of variates = 5, which is
odd.
● Therefore, median = ((5 + 1)/2)th variate
● = 3rd variate
● = 14
Image Source: https://www.math-only-math.com/images/median-of-raw-data.png
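The two cases (odd and even n) can be sketched as a small Python function. The name `median` and the six-value list are assumptions added for illustration. The five-value list is the solved example.

```python
def median(values):
    """Median of raw data: sort, then pick or average the middle variates."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th variate, which is index n // 2 (0-based)
        return ordered[middle]
    # Even n: mean of the (n/2)th and (n/2 + 1)th variates
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([15, 18, 10, 6, 14]))      # 14
print(median([15, 18, 10, 6, 14, 20]))  # 14.5
```

Python's standard library provides the same behaviour through `statistics.median`.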
© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Finding Median for Grouped Data

● Median is the value which occupies the middle position when all the observations are arranged in ascending or descending order. It is a positional average.
● To find the median of grouped data:
● (i) Construct the cumulative frequency distribution.
● (ii) Find the (N/2)th term.
● (iii) The first class whose cumulative frequency reaches or exceeds N/2 is called the median class.
© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Finding Median for Grouped Data

● (iv) Find the median by using the formula:

Median = l + ((N/2 − m) / f) × c

● Where l = Lower limit of the median class,
● f = Frequency of the median class
● c = Width of the median class,
● N = The total frequency (∑f)
● m = Cumulative frequency of the class preceding the median class
Image Source:
https://www.onlinemath4all.com/images/formulaformedianiofgroupeddata.png
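The grouped-data procedure, Median = l + ((N/2 − m) / f) × c, can be sketched as follows. The function name `grouped_median` is an assumption, and the frequency table is borrowed from the continuous-series arithmetic mean example purely for illustration. The slides do not work this case.

```python
def grouped_median(intervals, frequencies):
    """Median of grouped data: l + ((N/2 - m) / f) * c."""
    total = sum(frequencies)  # N
    if total == 0:
        raise ValueError("frequency table is empty")
    half = total / 2
    cumulative = 0  # m: cumulative frequency before the current class
    for (lower, upper), f in zip(intervals, frequencies):
        if cumulative + f >= half:
            # This is the median class: first class reaching N/2
            width = upper - lower  # c
            return lower + (half - cumulative) * width / f
        cumulative += f
    raise ValueError("frequencies must be non-negative")


# Illustrative table: ranges with frequencies 2, 5, 1, 3 (N = 11)
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 1, 3]
print(grouped_median(intervals, frequencies))  # 17.0
```

Here N/2 = 5.5, the cumulative frequencies are 2, 7, 8, 11, so the median class is 10-20 with l = 10, m = 2, f = 5 and c = 10.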
© Edunet Foundation. All rights reserved.
Median in Raw and
Grouped Data
Solved Examples on Median of
Grouped Data
● A researcher studying the behavior of mice has recorded the time (in seconds) taken by each of 13 mice to locate its food: 31, 33, 63, 33, 28, 29, 33, 27, 27, 34, 35, 28, 32. Find the median time the mice spent searching for food.
● The given data in ascending order is
● 27, 27, 28, 28, 29, 31, 32, 33, 33, 33, 34, 35, 63
● The number of observations is 13, which is odd, so the middle value is the 7th observation.
● Median = 32 seconds
© Edunet Foundation. All rights reserved.
Mode in Raw and Grouped
Data
Finding the Mode in Raw Data

● To find the mode, or modal value, it is best to put the numbers in order. Then count how many there are of each number. A number that appears most often is the mode.
● 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
● In order these numbers are:

© Edunet Foundation. All rights reserved.


Mode in Raw and Grouped
Data
Finding the Mode in Raw Data

● 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23, 29, 39, 40, 56
● This makes it easy to see which numbers appear most often.
● In this case the mode is 23.
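Counting occurrences by hand can be mirrored with `collections.Counter`. This is a small sketch, with the helper name `modes` as an assumption. It returns a list because a data set can have more than one mode.

```python
from collections import Counter

values = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]


def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(values))  # [23]
```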

© Edunet Foundation. All rights reserved.


Mode in Raw and Grouped
Data

Finding the Mode in Grouped Data

● In some cases (such as when all values appear the same number of times) the mode is not useful. But we can group the values to see if one group has more than the others.
● Example: {4, 7, 11, 16, 20, 22, 25, 26, 33}
● Each value occurs once, so let us try to group them.
© Edunet Foundation. All rights reserved.
Mode in Raw and Grouped
Data

Finding the Mode in Grouped Data

● We can try groups of 10:


● 0-9: 2 values (4 and 7)
● 10-19: 2 values (11 and 16)
● 20-29: 4 values (20, 22, 25 and 26)
● 30-39: 1 value (33)
● In groups of 10, the "20s" appear most
often, so we could choose 25 (the
middle of the 20s group) as the mode.
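The grouping step can be sketched in Python by binning each value with integer division. The function name `modal_group` is an assumption. The upper bound it returns is exclusive (20 up to, but not including, 30), which is why the middle of the group comes out as 25.

```python
from collections import Counter

values = [4, 7, 11, 16, 20, 22, 25, 26, 33]
width = 10  # group size


def modal_group(values, width):
    """Return (lower, upper, count) for the most populated group."""
    counts = Counter(v // width for v in values)
    index, count = max(counts.items(), key=lambda item: item[1])
    lower = index * width
    return lower, lower + width, count


lower, upper, count = modal_group(values, width)
midpoint = (lower + upper) / 2
print(lower, upper, count, midpoint)  # 20 30 4 25.0
```

If two groups tie for the highest count, this sketch simply reports the first one it encounters. A fuller version would report all of them.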
© Edunet Foundation. All rights reserved.
Standard Deviation

Standard Deviation Formulas

● The Standard Deviation is a measure of how spread out numbers are.
● The symbol for Standard Deviation is σ
(the Greek letter sigma).

© Edunet Foundation. All rights reserved.


Standard Deviation

Standard Deviation Formulas

● This is the formula for Standard Deviation:

σ = √( (1/N) × ∑ (x_i − μ)² )

where μ is the mean of the N values x_1, …, x_N.

Image Source:
https://www.mathsisfun.com/data/images/standard-deviation-formula.gif
© Edunet Foundation. All rights reserved.
Standard Deviation

Steps for Standard Deviation

● Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.
● To calculate the standard deviation of those numbers:
● 1. Work out the Mean (the simple average of the numbers)
● 2. Then for each number: subtract the Mean and square the result
● 3. Then work out the mean of those squared differences.
● 4. Take the square root of that and we are done!
© Edunet Foundation. All rights reserved.
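The four steps can be followed line by line in Python. This is a minimal sketch using the numbers from the slide. It computes the population standard deviation (dividing by the number of values), which is what the steps describe.

```python
import math

numbers = [9, 2, 5, 4, 12, 7, 8, 11]

mean = sum(numbers) / len(numbers)                   # step 1: the Mean
squared_diffs = [(x - mean) ** 2 for x in numbers]   # step 2: squared differences
variance = sum(squared_diffs) / len(squared_diffs)   # step 3: mean of those
std_dev = math.sqrt(variance)                        # step 4: square root

print(mean, variance, round(std_dev, 4))  # 7.25 10.4375 3.2307
```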
Variance

What is Variance?

● Variance is the expected value of the squared deviation of a random variable from its mean.
● In short, it measures how far a set of numbers is spread out from their average value.
● Variance is used in statistics as a way of better
understanding a data set's distribution.

Image Source:
https://365datascience.com/wp-content/uploads/2018/09/image7.jpg
© Edunet Foundation. All rights reserved.
Variance

How does Variance work?


● Variance is the square of the standard deviation of a variable. Equivalently, it is the covariance of the variable with itself.
● For a population of N data points, σ² = ∑(x − μ)² / N.
● In this formula, μ represents the mean of the data points, x is the value of an individual data point, and N is the total number of data points.

Image Source: https://images.deepai.org/glossary-terms/variance-6302132.jpg


© Edunet Foundation. All rights reserved.
Variance

How to Calculate Variance?


● Steps to Calculate Variance:
1. List the elements of the data set. The following are the ages of students pursuing a Master's degree:
Data set 1: 28, 25, 26, 27, 31, 32, 24
2. Calculate the mean.
● (28 + 25 + 26 + 27 + 31 + 32 + 24) / 7 = 193 / 7 ≈ 27.57

Image Source:
https://www.onlinemathlearning.com/image-files/population-mean.png
© Edunet Foundation. All rights reserved.
Variance

How to Calculate Variance?


(Continued)
3. Find the deviation from the mean for each data point.

Image Source:
https://s3.amazonaws.com/acadgildsite/wordpress_images/Data+Science/varianc
© Edunet Foundation. All rights reserved.
Variance

How to Calculate Variance?


(Continued)

4. Square each deviation.

Image Source:
https://s3.amazonaws.com/acadgildsite/wordpress_images/Data+Science/varianc
© Edunet Foundation. All rights reserved.
Variance

How to Calculate Variance?

(Continued)

● The average of all squared differences is the variance. To find it, add all squared deviations and divide the sum by the number of elements in the data set (n).
⇒ (0.1849 + 6.6049 + 2.4649 + 0.3249 + 11.7649 + 19.6249 + 12.7449) / 7
⇒ 53.7143 / 7 ≈ 7.6735
⇒ Variance ≈ 7.6735
● To find the standard deviation in ages of students pursuing a Master's, we calculate the square root of the variance.
⇒ Standard Deviation = √7.6735 ≈ 2.77
Image Source:
https://s3.amazonaws.com/acadgildsite/wordpress_images/Data+Science/varianc
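The same calculation can be cross-checked with Python's standard `statistics` module. This sketch assumes the ages are treated as a whole population, so it uses `pvariance` and `pstdev`, which divide by n as in the steps above. `statistics.variance` and `statistics.stdev` divide by n − 1 instead and are meant for samples.

```python
import statistics

ages = [28, 25, 26, 27, 31, 32, 24]

mean_age = statistics.mean(ages)
variance = statistics.pvariance(ages)  # population variance: divides by n
std_dev = statistics.pstdev(ages)      # square root of the population variance

print(round(mean_age, 2), round(variance, 4), round(std_dev, 2))
```

Because the module does not round the mean before squaring the deviations, the last digits can differ slightly from a calculation done by hand with rounded intermediate values.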
© Edunet Foundation. All rights reserved.
Variance
Applications of Variance
● Variance plays a major role in interpreting data in statistics.
● The most common application of variance is in polls.
● For opinion polls, the data gathering agencies cannot invest in collecting data from the entire population.
● They set criteria for sampling the population based on ethnicity, income group, regions, education level, salary and religion, so that the population is
Image Source:
https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uplo
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation

Properties of Variance

● Variance is a numerical value that describes the variability of observations from their arithmetic mean.
● Variance is the average of the squared deviations.
● Variance is denoted by sigma-squared (σ²).
● Variance is expressed in squared units, so its value is usually larger than the values in the given dataset.
● Variance measures how far individuals
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation
Properties of Variance
(Continued)
● In statistics, variance is defined as the measure of variability that represents how far members of a group are spread out.
● It finds out the average degree to which each observation varies from the mean.
● When the variance of a data set is small, it shows the closeness of the data points to the mean, whereas a greater value of variance indicates that the observations are very dispersed around
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation

Properties of Standard Deviation

● Standard deviation is a measure that quantifies the amount of dispersion of the observations in a dataset.
● A low standard deviation indicates that the scores are close to the arithmetic mean, and a high standard deviation indicates that the scores are dispersed over a wider range of values.
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation
Properties of Standard Deviation
(Continued)
● Standard deviation is a measure of the dispersion of observations within a data set relative to their mean.
● The standard deviation is the root mean square deviation.
● Standard deviation is denoted by sigma (σ).
● Standard deviation is expressed in the same units as the values in the set of data.
● Standard Deviation measures how much
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation

Example: To find Standard Deviation and Variance
● Marks scored by a student in five subjects are 60, 75, 46, 58 and 80 respectively.
● You have to find out the standard deviation and variance.
● First of all, you have to find out the mean:
Mean = (60 + 75 + 46 + 58 + 80) / 5 = 319 / 5 = 63.8

Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation

Example: To find Standard Deviation and Variance
● Now calculate the variance:
σ² = ∑(X − A)² / n
● Where, X = Observations
● A = Arithmetic Mean
● n = Number of observations
● Variance and standard deviation are never negative.
● If all the observations in a data set are identical, then the standard deviation and variance will be zero.
Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
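The marks example and the zero-spread property can be checked with a short Python sketch. It assumes the five marks are the whole population (division by n). The list `same` is a made-up data set used only to show the identical-observations case.

```python
import statistics

marks = [60, 75, 46, 58, 80]

mean = statistics.mean(marks)
variance = statistics.pvariance(marks)  # mean of squared deviations from the mean
std_dev = statistics.pstdev(marks)      # square root of the variance
print(mean, round(variance, 2), round(std_dev, 2))  # 63.8 150.56 12.27

# Identical observations have no spread at all.
same = [50, 50, 50, 50, 50]
zero_variance = statistics.pvariance(same)
zero_std_dev = statistics.pstdev(same)
```

The variance (150.56) is in squared marks, while the standard deviation (about 12.27) is in marks, the same unit as the data.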
© Edunet Foundation. All rights reserved.
Properties of Variance and
standard deviation

Difference between Standard


Deviation and Variance

Image Source:
https://keydifferences.com/difference-between-variance-and-standard-deviation.ht
© Edunet Foundation. All rights reserved.
OLAP Concept

What is OLAP?
● Online Analytical Processing (OLAP) is
a category of software that allows users
to analyze information from multiple
database systems at the same time.
● It is a technology that enables analysts
to extract and view business data from
different points of view.
● Analysts frequently need to group, aggregate and join data.
● These operations in relational databases
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
© Edunet Foundation. All rights reserved.
OLAP Concept

OLAP Cube
● OLAP databases are divided into one or more cubes.
● The cubes are designed in such a way that creating and viewing reports becomes easy. The OLAP cube is a data structure optimized for very quick data analysis.
● The OLAP Cube consists of numeric facts called measures which are categorized by dimensions.
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
© Edunet Foundation. All rights reserved.
OLAP Concept

OLAP Cube
(Continued)
How it Works?
● A data warehouse extracts information from multiple data sources and formats such as text files, Excel sheets, multimedia files, etc.
● The extracted data is cleaned and transformed. The data is then loaded into an OLAP server (or OLAP cube), where information is pre-calculated in advance
Image Source:
https://myventurepad.com/wp-content/uploads/2017/05/what-is-ola-analysis.png
© Edunet Foundation. All rights reserved.
OLAP Concept

Basic analytical operations of OLAP

Four types of analytical operations in OLAP are:
● Roll-up
● Drill-down
● Slice and dice
● Pivot (rotate)
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
© Edunet Foundation. All rights reserved.
OLAP Concept
Roll-up

Basic analytical operations of OLAP

• Roll-up is also known as "consolidation" or "aggregation." The roll-up operation can be performed in 2 ways:
• 1. Reducing dimensions
• 2. Climbing up a concept hierarchy. A concept hierarchy is a system of grouping things based on their order or level.
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
© Edunet Foundation. All rights reserved.
OLAP Concept
Roll-up
(Continued)
Basic analytical operations of OLAP
• In this example, the cities New Jersey and Los Angeles are rolled up into the country USA.
• The sales figures of New Jersey and Los Angeles are 440 and 1560 respectively. They become 2000 after roll-up.
• In this aggregation process, the location hierarchy moves up from city to country.
• In the roll-up process, one or more dimensions need to be removed.
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
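Roll-up along the location hierarchy can be sketched with plain Python dictionaries. The names `sales_by_city`, `country_of` and `roll_up` are assumptions for illustration. The figures are the ones from the example.

```python
from collections import defaultdict

# City-level sales from the example.
sales_by_city = {"New Jersey": 440, "Los Angeles": 1560}
# Concept hierarchy for the location dimension: city -> country.
country_of = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, hierarchy):
    """Aggregate a measure one level up a concept hierarchy."""
    totals = defaultdict(int)
    for member, amount in sales.items():
        totals[hierarchy[member]] += amount
    return dict(totals)


sales_by_country = roll_up(sales_by_city, country_of)
print(sales_by_country)  # {'USA': 2000}
```

A real OLAP server would pre-compute such aggregates. The sketch only shows what the operation does to the numbers.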
© Edunet Foundation. All rights reserved.
OLAP Concept
Drill-down

Basic analytical operations of OLAP

• In drill-down, data is fragmented into smaller parts. It is the opposite of the roll-up process. It can be done via:
1. Moving down the concept hierarchy
2. Increasing a dimension

Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
© Edunet Foundation. All rights reserved.
OLAP Concept
Drill-down
(Continued)
Basic analytical operations of OLAP

Consider the diagram:

1. Quarter Q1 is drilled down to the months January, February, and March. The corresponding sales are also registered.
2. In this example, the dimension months is added.
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/11/Operations-in-OLA
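Drill-down from quarter to month can be sketched the same way. The monthly figures below are hypothetical, since the slide does not list them, and the names `monthly_sales`, `quarter_of` and `drill_down` are assumptions.

```python
# Hypothetical monthly sales; the actual figures are only in the diagram.
monthly_sales = {"January": 150, "February": 100, "March": 150, "April": 200}
# Concept hierarchy for the time dimension: month -> quarter.
quarter_of = {"January": "Q1", "February": "Q1", "March": "Q1", "April": "Q2"}


def drill_down(quarter, sales, hierarchy):
    """Return the month-level figures that make up one quarter."""
    return {month: amount for month, amount in sales.items()
            if hierarchy[month] == quarter}


q1_detail = drill_down("Q1", monthly_sales, quarter_of)
print(q1_detail)                # {'January': 150, 'February': 100, 'March': 150}
print(sum(q1_detail.values()))  # 400
```

Summing the monthly detail gives back the quarterly total, which is the sense in which drill-down is the opposite of roll-up.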
© Edunet Foundation. All rights reserved.
OLAP Concept
Slice

Basic analytical operations of OLAP

• In slice, one dimension is selected and fixed to a single value, and a new sub-cube is created.

Image Source: https://www.guru99.com/online-analytical-processing.html


© Edunet Foundation. All rights reserved.
OLAP Concept
Slice
(Continued)
Basic analytical operations of OLAP

Consider the diagram :

● The dimension Time is sliced with Q1 as the filter.
● A new cube is created altogether.

Image Source: https://www.guru99.com/online-analytical-processing.html


© Edunet Foundation. All rights reserved.
OLAP Concept
Dice

Basic analytical operations of OLAP

• This operation is similar to a slice. The difference is that in dice you select 2 or more dimensions, which results in the creation of a sub-cube.

Image Source: https://www.guru99.com/online-analytical-processing.html
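Slice and dice can be compared side by side on a tiny fact table. Everything in this sketch is hypothetical: the rows, the dimension names (`quarter`, `city`, `item`) and the helper functions are made up for illustration and do not come from the diagram.

```python
# Hypothetical fact table: three dimensions and one measure (sales).
facts = [
    {"quarter": "Q1", "city": "New Jersey", "item": "Mobile", "sales": 100},
    {"quarter": "Q1", "city": "New Jersey", "item": "Modem", "sales": 40},
    {"quarter": "Q1", "city": "Los Angeles", "item": "Mobile", "sales": 300},
    {"quarter": "Q2", "city": "New Jersey", "item": "Mobile", "sales": 120},
    {"quarter": "Q2", "city": "Los Angeles", "item": "Modem", "sales": 80},
    {"quarter": "Q2", "city": "Los Angeles", "item": "Mobile", "sales": 310},
]


def slice_cube(facts, dimension, value):
    """Fix one dimension to a single value and drop it from the result."""
    return [{k: v for k, v in row.items() if k != dimension}
            for row in facts if row[dimension] == value]


def dice_cube(facts, **allowed):
    """Keep rows whose value is in the allowed set for every listed dimension."""
    return [row for row in facts
            if all(row[dim] in values for dim, values in allowed.items())]


q1_slice = slice_cube(facts, "quarter", "Q1")
la_mobile = dice_cube(facts, city={"Los Angeles"}, item={"Mobile"})
print(len(q1_slice), len(la_mobile))  # 3 2
```

The slice result has one fewer dimension (no `quarter` key), while the dice result keeps all dimensions but restricts two of them.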


© Edunet Foundation. All rights reserved.
OLAP Concept

Pivot
Basic analytical operations of OLAP

● In Pivot, you rotate the data axes to provide an alternative presentation of the data.
● In the following example, the pivot is based on item types.

Image Source: https://www.guru99.com/online-analytical-processing.html
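Rotating the axes can be sketched by building the same cross-tab twice with the row and column dimensions swapped. The fact rows and the `pivot` helper are hypothetical and are only meant to show the idea.

```python
# Hypothetical facts: two dimensions (city, item) and one measure (sales).
facts = [
    {"city": "New Jersey", "item": "Mobile", "sales": 100},
    {"city": "New Jersey", "item": "Modem", "sales": 40},
    {"city": "Los Angeles", "item": "Mobile", "sales": 300},
    {"city": "Los Angeles", "item": "Modem", "sales": 80},
]


def pivot(facts, row_dim, col_dim, measure):
    """Build a nested dict: table[row value][column value] = summed measure."""
    table = {}
    for row in facts:
        cells = table.setdefault(row[row_dim], {})
        cells[row[col_dim]] = cells.get(row[col_dim], 0) + row[measure]
    return table


by_city = pivot(facts, "city", "item", "sales")  # cities as rows
by_item = pivot(facts, "item", "city", "sales")  # rotated: item types as rows
print(by_city["Los Angeles"]["Mobile"], by_item["Mobile"]["Los Angeles"])  # 300 300
```

Both tables hold the same numbers. Only the orientation changes, which is what the pivot operation provides.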


© Edunet Foundation. All rights reserved.
OLAP Concept

Types of OLAP systems

● ROLAP
● MOLAP
● HOLAP
● WOLAP
● DOLAP
● SOLAP
Image Source: https://www.guru99.com/online-analytical-processing.html
© Edunet Foundation. All rights reserved.
OLAP Concept

What is ROLAP?
● ROLAP stands for Relational Online Analytical Processing.
● ROLAP is an extended RDBMS that uses multidimensional data mapping to perform standard relational operations.
● ROLAP works with data that exists in a relational database.
● Facts and dimension tables are stored as relational tables. It also allows
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

Advantages of ROLAP

● High data efficiency. It offers high data efficiency because query performance and the access language are optimized particularly for multidimensional data analysis.
● Scalability. This type of OLAP system offers scalability for managing large volumes of data, even when the data is steadily increasing.
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

Disadvantages of ROLAP
● Demand for higher resources. ROLAP needs high utilization of manpower, software, and hardware resources.
● Aggregate data limitations. ROLAP tools use SQL for all calculations of aggregate data. However, SQL sets no specific limits for handling these computations.
● Slow query performance. Query performance in this model is slow when
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

What is MOLAP?

● MOLAP uses array-based multidimensional storage engines to display multidimensional views of data. Basically, it uses an OLAP cube.

Image Source::
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

What is Hybrid OLAP?

● Hybrid OLAP (HOLAP) is a mixture of both ROLAP and MOLAP.
● It offers the fast computation of MOLAP and the higher scalability of ROLAP. HOLAP uses two databases:
● Aggregated or computed data is stored in a multidimensional OLAP cube.
● Detailed information is stored in a relational database.
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

Benefits of Hybrid OLAP

● This kind of OLAP helps to economize disk space, and it also remains compact, which helps to avoid issues related to access speed and convenience.
● HOLAP uses cube technology, which allows faster performance for all types of data.
● ROLAP data is instantly updated, and HOLAP users have access to this real-time, instantly updated data. MOLAP
Image Source:
https://static.javatpoint.com/tutorial/datawarehouse/images/data-warehouse-types
© Edunet Foundation. All rights reserved.
OLAP Concept

OLAP tools

● Business analytics (OLAP) tools include IBM Cognos, MicroStrategy, Palo OLAP Server, Apache Kylin, Oracle OLAP, icCube, Pentaho BI, JsHypercube, etc.
● We can apply security restrictions on users and objects using OLAP tools.
● OLAP creates a single platform for planning, forecasting, reporting, and analysis.
Image Source: https://www.educba.com/olap-tools/?source=leftnav
© Edunet Foundation. All rights reserved.
OLAP Concept

Advantages of OLAP

● OLAP is a platform for all types of business needs, including planning, budgeting, reporting, and analysis.
● Information and calculations are consistent in an OLAP cube. This is a crucial benefit.
● Quickly create and analyze "What if" scenarios.
● Easily search the OLAP database for broad or specific terms.
● OLAP provides the building blocks for
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
© Edunet Foundation. All rights reserved.
OLAP Concept

Advantages of OLAP
(Continued)
● Allows users to slice and dice cube data by various dimensions, measures, and filters.
● It is good for analyzing time series.
● Finding clusters and outliers is easy with OLAP.
● It is a powerful online analytical processing and visualization system which provides faster response times
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
© Edunet Foundation. All rights reserved.
OLAP Concept

Disadvantages of OLAP
● OLAP requires organizing data into a star or snowflake schema. These schemas are complicated to implement and administer.
● You cannot have a large number of dimensions in a single OLAP cube.
● Transactional data cannot be accessed with an OLAP system.
● Any modification in an OLAP cube needs a full update of the cube. This is a
Image Source:
https://cdn.educba.com/academy/wp-content/uploads/2019/04/OLAP-TOOLS.jpg
© Edunet Foundation. All rights reserved.
OLTP Concept

Overview of OLTP

● OLTP, or Online Transaction Processing, is a type of data processing in which transactions play the major role in data manipulation in the database.
● This type of data processing is known for its high performance, faster accessibility and reliable, consistent data.
Image Source:
https://www.opentextbooks.org.hk/system/files/resource/25/25212/25291/media/image58.JPG
© Edunet Foundation. All rights reserved.
OLTP Concept

Understanding OLTP

● In the case of online airline booking, booking a ticket corresponds to an insertion in the database.
● OLTP ensures availability (for example, of the item in the cart) and handles concurrency when a large number of users are accessing the same website at the same time.
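The booking scenario can be sketched as a small transaction using Python's built-in `sqlite3` module. This is only an illustration of atomic inserts, not a description of how any airline system works: the table names, columns, passenger names and the `book_seat` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, passenger TEXT)")
conn.execute(
    "CREATE TABLE payments (passenger TEXT, amount INTEGER CHECK (amount > 0))"
)
conn.commit()


def book_seat(conn, seat, passenger, amount):
    """Insert the seat and the payment together, or neither of them."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO seats VALUES (?, ?)", (seat, passenger))
            conn.execute("INSERT INTO payments VALUES (?, ?)", (passenger, amount))
        return True
    except sqlite3.IntegrityError:
        return False


first = book_seat(conn, "12A", "Asha", 5000)   # succeeds
second = book_seat(conn, "12A", "Ravi", 5000)  # fails: seat already taken
third = book_seat(conn, "12B", "Ravi", 0)      # fails: payment rejected
booked = conn.execute("SELECT COUNT(*) FROM seats").fetchone()[0]
print(first, second, third, booked)  # True False False 1
```

The third attempt shows atomicity: the seat insert for 12B is rolled back because the payment insert in the same transaction fails, so only one seat remains booked.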

© Edunet Foundation. All rights reserved.


OLTP Concept

Characteristics of OLTP

● 3NF databases
● Predefined operations
● Updating of databases is directly accessible to end users.
● A small number of records
● Maintaining historical data

Image Source: https://www.educba.com/what-is-oltp/
© Edunet Foundation. All rights reserved.
OLTP Concept

How does OLTP make working so easy?

● Online transaction processing is concerned with concurrency and atomicity.
● OLTP stores less historical data, which makes it efficient.
● It maintains the consistency and concurrency of the data in the databases.
© Edunet Foundation. All rights reserved.
OLTP Concept

What can you do with OLTP?

● Its goals are availability, speed, concurrency, and recoverability.
● A large number of users can conduct short transactions using OLTP systems.
● We can design systems that perform operations whose database queries are usually simple, require sub-second response times, and return comparatively few records.
© Edunet Foundation. All rights reserved.
OLTP Concept

Working with OLTP

● It involves gathering information as input, processing the data as needed, and updating the data to reflect the processed information.
● For various decentralized database systems, OLTP brokering programs distribute transaction processing among multiple computers on a network.
● OLTP is also carried into service-oriented architecture (SOA) and Web services.
© Edunet Foundation. All rights reserved.
OLTP Concept

OLTP Advantages

● Concurrency
● ACID compliance
● Availability
● Integrity

© Edunet Foundation. All rights reserved.


OLTP Concept

OLTP Disadvantages

● To achieve such concurrency, availability and fast transactions, OLTP often requires support for transactions that span many companies' networks.
● Thus, in today's era, we require a more decentralized system.

© Edunet Foundation. All rights reserved.


OLTP Concept

Why should we use OLTP?

● To use less paper and make faster, more accurate predictions of revenues and expenses.
● A system that currently requires offline maintenance is a good candidate for online transaction processing.
● Availability, concurrency, and atomicity of data are much more important.

© Edunet Foundation. All rights reserved.


OLTP Concept

Why do we need OLTP?

● We need OLTP to perform the following tasks:
● Maintaining normalized databases
● Supporting a decentralized system
● Business intelligence tasks

© Edunet Foundation. All rights reserved.


Analyze data using
statistical and data mining
techniques for business
intelligence.

Disclaimer: The content is curated for educational purposes only.


© Edunet Foundation. All rights reserved.
In this section, we will discuss:

● BI component framework
● business intelligence for management
● operational BI
● BI for process and performance improvement
● Role of Business Intelligence in Improving customer experience
● business intelligence role and responsibilities
● Popular BI tools in the market.

© Edunet Foundation. All rights reserved.


BI component framework

Architecture

● Architecture and components of a BI system

Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Data Warehouse

● The data warehouse is the core of the BI system.
● A data warehouse is a database built for the purpose of data analysis and reporting.
Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Extract Transform Load

● It is very likely that more than one system acts as the source of data required for the BI system.
● The data is extracted from these sources, transformed into a consistent format, and finally loaded into the data warehouse; this process is called Extract Transform Load (ETL).
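The three ETL stages can be sketched with Python's standard library. This is a toy illustration under stated assumptions: the two source exports, their column names, and the `sales` warehouse table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems with different conventions.
crm_export = "customer,amount\nacme, 120.50\nglobex,80\n"
shop_export = "CUSTOMER;AMOUNT\nACME;30\nInitech;15.25\n"


def extract(text, delimiter):
    """Extract: read raw rows from one source export."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows):
    """Transform: unify column names, customer spelling and number types."""
    cleaned = []
    for row in rows:
        record = {key.lower(): value.strip() for key, value in row.items()}
        cleaned.append((record["customer"].title(), float(record["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
for text, delimiter in ((crm_export, ","), (shop_export, ";")):
    load(warehouse, transform(extract(text, delimiter)))

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

After loading, both sources' records for the same customer sit in one table with one spelling, so a single query can report on them together.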
Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Data model – BISM

● This layer, which we call the data model, contains a file-based or memory-based model of the data for producing very quick responses to reports.

Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Data visualization
● The frontend of a BI system is data visualization. In other words, data visualization is the part of the BI system that users can see.
● There are different methods for visualizing information, such as strategic and tactical dashboards, Key Performance Indicators (KPIs), and detailed or consolidated reports.
Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Master Data Management

● Master Data Management (MDM) is the process of maintaining a single version of the truth for master data entities across multiple systems.

Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


BI component framework

Architecture Components

Data Quality Services

● The quality of data is different in each operational system, especially when we deal with legacy systems or systems that have a high dependence on user inputs.
Image Source:
http://www.myreadingroom.co.in/notes-and-studymaterial/65-dbms/560-business-intelligence
-and-its-architecture.html

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Management

● BI Management ensures the management and steering of business intelligence and of the organizational units involved, as well as the integration into an existing expert, technical and organizational BI environment.
Image Source:
https://www.cubeserv.com/en/services/business-intelligence-ma
nagement/

© Edunet Foundation. All rights reserved.


Business intelligence for
management
What does Business Intelligence
Management include?

● Four components: analysts, data
solutions, decision making, and
oversight.

Image Source: https://www.betterbuys.com/bi/business-intelligence-management-optimizing-bi/

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Governance

● BI governance defines the rules
according to which business intelligence
is steered, organized, implemented, and
developed further.

Image Source:
https://www.cubeserv.com/en/services/business-intelligence-management/

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Awareness

● BI awareness describes the
company-wide understanding of BI.
A uniform and consistent understanding
of BI forms the basis for
successful BI projects.

Image Source:
https://www.cubeserv.com/en/services/business-intelligence-management/

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Strategy

● A BI strategy must be developed,
adapted, and updated continuously.
● This involves identifying the ambitions of
the BI sponsors and, based on these,
defining the strategy-relevant initial
situation from which concrete
goals can be derived.

Image Source:
https://www.cubeserv.com/en/services/business-intelligence-management/

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Organization

● The BI Competence Centre is an
organizational unit that, ideally, will be a
service-providing division that is part of
a certain management field.

Image Source:
https://www.cubeserv.com/en/services/business-intelligence-management/

© Edunet Foundation. All rights reserved.


Business intelligence for
management

BI Requirements Engineering

● Identification of BI-specific requirements
and their distinction from other projects

Image Source:
https://www.cubeserv.com/en/services/business-intelligence-management/

© Edunet Foundation. All rights reserved.


Operational BI

Introduction

● Operational business intelligence (OBI)
systems provide an intermediate step
toward satisfying the strategic needs
that data warehouses address as well
as the tactical decision-making that
enterprise application integration (EAI)
addresses.

© Edunet Foundation. All rights reserved.


Operational BI

Business Operations

● Hourly/daily minibatches of transactions
are sent to the OBI system that first logs
the transactions in a transaction
database, and then processes changes
in a data-mining engine. From this data,
the OBI system runs its rules-based
detection system, and generates a
suspected fraud report.

Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/operational-bi/

© Edunet Foundation. All rights reserved.
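The mini-batch flow described on the previous slide can be sketched in Python. This is only an illustration under stated assumptions, not the Shadowbase implementation: the `Transaction` fields, the two rules, and their thresholds are hypothetical, and the data-mining step is left out so that only logging and rules-based detection are shown.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str
    amount: float
    country: str


# Hypothetical rule thresholds, chosen only for illustration.
MAX_AMOUNT = 5000.0
HOME_COUNTRY = "IN"


def process_minibatch(batch, transaction_log):
    """Log every transaction, then return the ones flagged by the rules."""
    suspected = []
    for tx in batch:
        transaction_log.append(tx)      # step 1: log the transaction
        reasons = []
        if tx.amount > MAX_AMOUNT:      # rule: unusually large amount
            reasons.append("large amount")
        if tx.country != HOME_COUNTRY:  # rule: card used abroad
            reasons.append("foreign country")
        if reasons:
            suspected.append((tx.card_id, reasons))
    return suspected


log = []
batch = [
    Transaction("A1", 120.0, "IN"),
    Transaction("B2", 9800.0, "IN"),
    Transaction("C3", 50.0, "US"),
]
report = process_minibatch(batch, log)
for card_id, reasons in report:
    print(card_id, ", ".join(reasons))
```

In a real OBI system the log would be a transaction database and the rules would be maintained separately from the code; the list and constants here only stand in for those parts.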


Operational BI

Business Operations(Contd..)

● Business intelligence (BI) that helps
drive and optimize business operations
on a daily basis, and is sometimes used for
intra-day decision-making, is called
operational business intelligence.

Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/operational-bi/

© Edunet Foundation. All rights reserved.


Operational BI

Business Operations(Contd..)

● Conceptually, OBI systems are thought
of as a data mart that is updated
frequently (daily, every few hours, or
even every few minutes or seconds)
with minibatches.
● OBI systems are similar to data marts
because they generally focus on a
specific task rather than on
enterprise-wide functions.

Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/operational-bi/

© Edunet Foundation. All rights reserved.


Operational BI
Case Study : Real-Time Credit and
Debit Card Fraud Detection, an
HPE Shadowbase

● A complex suspicious or fraudulent
activity determination is made and
action taken while a transaction is in the
process of being gathered, routed,
authorized, and returned to the
origination point, or shortly thereafter,
typically far sooner than otherwise
achievable.

Image Source:
https://www.shadowbasesoftware.com/solutions/application-integration/rtbi/operational-bi/

© Edunet Foundation. All rights reserved.


BI for process and
performance improvement

What is Business Intelligence

● Business intelligence (BI) is software
and services that take raw data and turn
it into relevant and practical insights that
companies can use to strengthen their
positions and business decisions.
● BI tools analyze large sets of data based
on queries that are written to fetch
specific types of information.

Image
Source:https://www.digitalvidya.com/wp-content/uploads/2018/05/What-is-Busine
© Edunet Foundation. All rights reserved.
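To make the idea of a query that fetches a specific type of information concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are assumptions made up for this example; a real BI tool would run similar queries against a data warehouse.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 250.0), ("North", 50.0)],
)


def revenue_by_region(connection):
    """Run an aggregate query and return (region, total) rows."""
    return connection.execute(
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()


summary = revenue_by_region(conn)
for region, total in summary:
    print(f"{region}: {total:.2f}")
```

The query result is a small summary table, which is the kind of output a BI tool would then render as a chart or report.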
BI for process and
performance improvement

What is Business Intelligence

● The results are then formatted and
displayed as summaries, graphs,
reports, charts, and maps for further
analysis for decision making.
● There are many company benefits when
it comes to gathering customer and
competitor data. Let's explore the
benefits of using BI for improving
internal workflows.
Image Source:
https://www.digitalvidya.com/wp-content/uploads/2018/05/What-is-Business-Intelli
© Edunet Foundation. All rights reserved.
BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Change in your industry and even global
changes will affect business processes
at some point, and this means your
business will need to evolve its
processes to remain competitive. Using
BI to gather workflow-based information
offers many benefits, including these:

Image Source:
https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-of-bi-tool
© Edunet Foundation. All rights reserved.
BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Reducing the time it takes to make
decisions
● Optimizing internal processes to help
your employees focus on higher-value
work
● Reducing the time it takes to get your
product or service to market.

Image Source:
https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-of-bi-tool
© Edunet Foundation. All rights reserved.
BI for process and
performance improvement
Benefits of using business intelligence
for internal process improvement

● Increasing customer satisfaction through
improved efficiencies and better service
● Freeing up more time to focus on other
things, like quality and customer
retention initiatives
● Overall improved operational efficiency
and agility
Image
Source:https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-o
© Edunet Foundation. All rights reserved.
Role of Business
Intelligence in Improving
customer experience
Visualizing with Big Data
● As the old saying goes, ‘a picture is
worth a thousand words,’ and the same
can be said for understanding data.
● Visualization tools help organize
extremely large, fast-moving data sets in
real-time to understand the current state
of the customer experience.
● “[Data imaging] tools are very helpful in
drawing attention to critical points in the
customer journey and experience, and
pulling out some actionable insights,”
advises
Image Source:
https://www.scnsoft.com/blog-pictures/business-intelligence/big-data-visualization
© Edunet Foundation. All rights reserved.
Role of Business
Intelligence in Improving
customer experience
Enabling self-service

● Jerry Leisure, head of customer
experience at mobile-gaming company
Kabam, says the gaming industry has a
well-developed player community that
altruistically wants to help fellow
gamers.
● Traditionally, this kind of crowd-sourced
self-support has lived in player forums
unaffiliated with the game maker itself.
Image
Source:https://www.bigmountainanalytics.com/wp-content/uploads/2019/04/worki
© Edunet Foundation. All rights reserved.
Role of Business
Intelligence in Improving
customer experience
Enabling self-service

● Companies can also use BI to
collaborate with influential customers.
● “We [can] provide helpful information to
content creators on YouTube, for
example, who then share that
information with other players,” says
Leisure.

Image
Source:https://www.bigmountainanalytics.com/wp-content/uploads/2019/04/worki
© Edunet Foundation. All rights reserved.
Role of Business
Intelligence in Improving
customer experience
Leveraging artificial Intelligence

● AI is already helping to improve the
traveler experience at various
touchpoints, explains Erica Ellington,
director of projects and support for
Southwest Airlines.
● “Business intelligence enables us to
leverage both structured and
unstructured data to help us make more
informed decisions to improve the
customer experience.”
Image Source:
https://emerj.com/wp-content/uploads/2017/05/Artificial-Intelligence-in-Business-I
© Edunet Foundation. All rights reserved.
Role of Business
Intelligence in Improving
customer experience
Leveraging artificial Intelligence

● McCallister of CX University adds that in
order for AI and BI to effectively discover
customer insights together, more data
needs to become structured (i.e.,
well-organized, uniform information).
● To transform data from unstructured to
structured requires capturing, tagging
and classifying as much data as
possible.
Image
Source:https://emerj.com/wp-content/uploads/2017/05/Artificial-Intelligence-in-Bus
© Edunet Foundation. All rights reserved.
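Capturing, tagging, and classifying free text can be illustrated with a very small keyword tagger. This is a hedged sketch only: the tag names and keyword lists in `TAG_KEYWORDS` are invented for the example, and real systems typically use far richer language models than keyword matching.

```python
import re

# Hypothetical tag vocabulary, assumed purely for illustration.
TAG_KEYWORDS = {
    "delay": ["late", "delayed", "waiting"],
    "baggage": ["bag", "luggage", "suitcase"],
    "praise": ["great", "thanks", "friendly"],
}


def tag_comment(text):
    """Turn a free-text comment into a structured record with tags."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = sorted(
        tag for tag, keywords in TAG_KEYWORDS.items()
        if words & set(keywords)
    )
    return {"text": text, "tags": tags}


record = tag_comment("My flight was delayed and my bag was lost.")
print(record["tags"])
```

Once each comment carries a consistent set of tags, it can be counted, filtered, and joined with other structured data like any other column.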
Business Intelligence Role
and Responsibilities

The Role of Business Intelligence

● Business intelligence, or BI, is a type of
software that can harness the power of
data within an organization.
● It offers a better way to sort, compare,
and review data in order for companies
to make smart decisions.

Image
Source:https://expert360.com/sites/default/files/media_embed/2019-02/blog.1.png
© Edunet Foundation. All rights reserved.
Business Intelligence Role
and Responsibilities

The Role of Business Intelligence

● Companies adopting business
intelligence solutions can turn business
data into insights and take plausible
action.
● These insights can help companies
make strategic business decisions that
increase productivity, improve revenues,
and enhance growth.
Image
Source:https://expert360.com/sites/default/files/media_embed/2019-02/blog.1.png
© Edunet Foundation. All rights reserved.
Business Intelligence Role
and Responsibilities

BI lives Up to Its Reputation

● There are many reasons why
companies choose business intelligence
solutions.
○ Better planning and analysis
○ Increased accuracy
○ Helped considerably with sales
forecasting
○ Improved pricing and offers
Image Source:
https://specials-images.forbesimg.com/imageserve/5ed9c9ad6d83330007c8079d/
© Edunet Foundation. All rights reserved.
BI lives Up to Its Reputation

Better planning and analysis

● Companies felt that BI systems helped
them the most with faster reporting,
planning, and analysis.
● 64% of responding companies ranked
their ability to report, plan and analyze
data as “good” after implementing a
business intelligence suite.

Image Source:
https://thebusinessanalystjobdescription.com/wp-content/uploads/2015/01/Busine
© Edunet Foundation. All rights reserved.
BI lives Up to Its Reputation

Increased Accuracy

● Among the companies surveyed, 56%
felt that business intelligence data
increased the accuracy of their business
analysis and planning.

Image
Source:https://cdn.datafloq.com/cache/blog_pictures/878x531/6-data-analytics-bu
© Edunet Foundation. All rights reserved.
BI lives Up to Its Reputation
Helped considerably with sales
forecasting

● Among the many tasks that companies
felt that business intelligence data
helped with, 57% ranked sales
forecasting and planning as the area
receiving the most benefit from BI data.
● Other areas where they felt that BI data
provided assistance were customer
behavior analysis (40%) and a unified
view of customers (32%).
Image Source:
https://s32519.pcdn.co/wp-content/uploads/2019/12/measuring-forecast-accuracy
© Edunet Foundation. All rights reserved.
BI lives Up to Its Reputation

Improved pricing and offers

● Pricing and offer optimization benefited
somewhat from the implementation of a
BI system.
● 27% of respondents felt that the
additional data derived from their BI
system helped them improve their
pricing structure to become more
competitive, as well as improve the
attractiveness of their offers.

Image Source
https://images.techhive.com/images/article/2016/02/bi-business-intelligence-ts-10
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
What are BI Tools

● BI tools are types of software used to
gather, process, analyze, and visualize
large volumes of past, current, and
future data in order to generate
actionable business insights, create
interactive reports, and simplify the
decision-making processes.

Image
Source:https://www.infomazeelite.com/wp-content/uploads/2020/01/Business-intel
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
What are BI Tools

● These tools include key features such
as data visualization, visual analytics,
interactive dashboarding and KPI
scorecards.
● Additionally, they enable users to utilize
automated reporting and predictive
analytics features based on self-service.

Image
Source:https://www.infomazeelite.com/wp-content/uploads/2020/01/Business-intel
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
The benefit of BI tools

● Professional software and tools offer
various prominent benefits; here we will
focus on the most valuable ones:
○ They bring together all relevant
data
○ Their true self-service analytics
approaches unlock data access
○ Users can take advantage of
predictions
○ They eliminate manual tasks
○ They reduce business costs
Image Source:
https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-of-bi-tool
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
The benefit of BI tools

● The BI tools listed here are leaders in the
business intelligence community, are often
mentioned in industry articles, and have
favorable user reviews on Capterra.
● The order of the tools is random and
doesn't represent a grading or ranking
system in any form.

Image Source:
https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-of-bi-tool
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
The benefit of BI tools

● DATAPINE
● SAS Business Intelligence
● Clear Analytics
● SAP Business Objects
● DOMO
● MicroStrategy
● GoodData
● IBM Cognos Analytics
● QlikView
● Yellowfin BI
Image
Source:https://deifpxeochufn.cloudfront.net/wp-content/uploads/2018/02/benifits-o
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
DATAPINE

● Datapine is a BI software that lets you
connect your data from various sources
and analyze with advanced analytics
features (including predictive).
● With your analysis, you can create a
powerful business dashboard (or
several), generate standard or
customized reports or incorporate
intelligent alerts to get notified of
anomalies and targets.
Image Source: https://www.datapine.com/images/datapine-bi-tool.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
DATAPINE

● This tool, rated an outstanding 4.6
stars on Capterra, is a powerful solution
for businesses of all sizes, since
datapine can be implemented across
various industries, functions, and
platforms.

Image Source: https://www.datapine.com/images/datapine-bi-tool.png


© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
DATAPINE

● Key features of DATAPINE:
○ Intuitive drag-and-drop interface
○ Easy-to-use predictive analytics
○ Many interactive dashboard
features
○ Multiple reporting options
○ Smart insights and alarms based
on artificial intelligence

Image Source:https://www.datapine.com/images/datapine-bi-tool.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAS BUSINESS INTELLIGENCE

● SAS Business Intelligence is a software
solution offering numerous products and
technologies for data scientists, text
analysts, data engineers, forecasting
analysts, econometricians, and
optimization modelers, among others.

Image Source:https://www.datapine.com/images/sas-business-intelligence.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAS BUSINESS INTELLIGENCE

● Founded in the 70s, SAS Business
Intelligence enjoys a long tradition in the
market, building and expanding its
products every year.
● With a Capterra rating of 4.5*, this
software enjoys a high level of users’
trust and satisfaction.

Image Source:https://www.datapine.com/images/sas-business-intelligence.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAS BUSINESS INTELLIGENCE

● Key features of SAS Business Intelligence:
○ Data exploration supported by
machine learning
○ Text analytics capabilities
○ Reports and dashboards across
devices
○ Integration with other applications

Image Source:https://www.datapine.com/images/sas-business-intelligence.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
CLEAR ANALYTICS

● Clear Analytics is a tool that
consolidates data from internal systems,
cloud, accounting, CRM, and allows you
to drag-and-drop that data into Excel.
● It works with Microsoft Power BI, using
Power Query and Power Pivot to clean
and model different datasets.
● Capterra users give it a high rating of 4.5
stars, making this tool one of the
highest rated on our list.
Image Source:https://www.datapine.com/images/clear-analytics.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
CLEAR ANALYTICS

● Key features of Clear Analytics:
○ Reports delivered to Power BI
○ Connected with Excel
○ Sharing on mobile devices
○ A full audit trail
○ Fetch data elements with a
semantic layer

Image Source:https://www.datapine.com/images/clear-analytics.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAP BUSINESS OBJECTS

● SAP BusinessObjects is a business
intelligence suite designed for
comprehensive reporting, analysis, and
data visualization.
● It provides Office integrations with
Excel and PowerPoint, where you can
create live presentations, and hybrid
analytics that connect to
on-premise and cloud SAP systems.

Image Source:
https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAP BUSINESS OBJECTS

● They’re focused on business categories
such as CRM and customer experience,
ERP and digital core, HR and people
engagement, digital supply chain, and
many more.
● To be accurate, more than 170M users
leverage SAP across the world, making
it one of the largest software suppliers in
the world.

Image
Source:https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
SAP BUSINESS OBJECTS

● Key features of SAP BusinessObjects:
○ A BI enterprise reporting system
○ Self-service, role-based
dashboards
○ Cross-enterprise sharing
○ Connection with SAP Warehouse
and HANA
○ Integration with Office

Image
Source:https://www.apprisia.com/blog/wp-content/uploads/2013/04/SAPBIBO.gif
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
DOMO

● Domo is a BI solution made up of
multiple systems on a single
platform, starting with connecting the
data and finishing with extending the data
with pre-built and custom apps from the
Domo Appstore.
● You can also use Domo for your data
lakes, warehouses, and ETL tools,
alongside R or Python scripts to
prepare data for predictive modeling.
Image Source:
https://salestechstar.com/wp-content/uploads/2020/01/Domo-Ranked-Top-Platfor
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
DOMO

● Key features of DOMO:
○ Numerous pre-built cloud
connectors
○ Magic ETL feature
○ Automatically suggested
visualizations
○ Mr. Roboto as an AI engine
○ Domo Appstore

Image Source: https://salestechstar.com/wp-content/uploads/2020/01/Domo-Ranked-Top-
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
MICROSTRATEGY

● MicroStrategy is an enterprise analytics
and mobility platform focused on hyper
intelligence, federated analytics, and
cloud solutions.
● Their mobile dossiers enable users to
build interactive books of analytics that
render on iOS or Android devices, with
the possibility to extend the
MicroStrategy content into their apps by
using Xcode or JavaScript.
Image Source:https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
MICROSTRATEGY

● Capterra users gave it a solid 4-star rating,
making it one of our examples of
business intelligence tools with strong
references in the BI market.

Image Source: https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
MICROSTRATEGY

● Key features of MicroStrategy:
○ Hyperintelligence pulls your data
○ Federated analytics
○ Mobile deployment
○ Integration with voice technology
○ Cloud technology

Image Source: https://i.ytimg.com/vi/LGrKsT76Es4/maxresdefault.jpg


© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
GOODDATA

● GoodData is a business analytics
software that provides the tools for data
ingestion, storage, analytic queries,
visualizations, and application
integration.
● You can embed their analytics into your
website, desktop or mobile application
or create dashboards and reports for
your daily activities, without the need to
obtain a Ph.D., as stated on their
website.
Image Source: https://images.g2crowd.com/uploads/attachment/file/108698/1.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
GOODDATA

● Key features of GoodData:
○ Customers can publish their own
reports
○ A modular data pipeline
○ A platform for developers
○ Additional support
○ 4 Data centers

Image Source:https://images.g2crowd.com/uploads/attachment/file/108698/1.png
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
IBM COGNOS ANALYTICS

● Part of the IBM product family, IBM Cognos
Analytics is a cloud-based business
intelligence software that utilizes AI
recommendations when creating
dashboards and reports, geospatial
capabilities to overlay your data with the
physical world, and enables you to ask
questions in plain English to
communicate with the software.

Image Source
https://software-advice.imgix.net/managed/products/screenshots/screenshot_144
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
IBM COGNOS ANALYTICS

● A robust solution from one of the
industry leaders in software
development, IBM Cognos Analytics
received a solid 4-star rating on
Capterra.

Image Source:
https://software-advice.imgix.net/managed/products/screenshots/screenshot_144
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
IBM COGNOS ANALYTICS

● Key features of IBM Cognos Analytics:
○ Search mechanism
○ A single data module
○ Interactive data visualization
○ AI assistant
○ Extensive knowledge center:
Integration with other applications

Image Source:
https://software-advice.imgix.net/managed/products/screenshots/screenshot_144
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
QLIKVIEW

● QlikView is one of the BI applications
offered by Qlik as part of its data
analytics platform focused on rapid
development and guided analytics
applications and dashboards.
● It’s built on an Associative Engine that
allows data discovery without the need
to use query-based tools, eliminating the
risk of data loss and inaccurate results.

Image Source:https://cdn.buttercms.com/q4iN12vOSZKTsjOWdxbB
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
QLIKVIEW

● QlikView has a high rating of 4.5 stars on
Capterra; users are quite satisfied with this
product and its features, making it one
of the top BI tools on our list.

Image Source:https://cdn.buttercms.com/q4iN12vOSZKTsjOWdxbB
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
QLIKVIEW

● Key features of QLIKVIEW:
○ Associative exploration
○ Visually highlighted dashboards
○ Associative Engine
○ A dual-use strategy
○ Developer’s platform

Image Source:https://cdn.buttercms.com/q4iN12vOSZKTsjOWdxbB
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
YELLOWFIN BI

● A suite of products consisting of
dashboards, signals, stories, data
discovery and data prep, this BI
analytics tool offers numerous features,
including a mobile app available both for
Android and iOS devices.
● Capterra users gave a strong rating of
4.5*, hence, it makes sense to take a
closer look at what they have on offer.

Image
Source:https://www.channelfutures.com/files/2018/07/Cloud-Analytics-2018.jpg
© Edunet Foundation. All rights reserved.
Popular BI tools in the
market
YELLOWFIN BI

● Key features of YELLOWFIN BI:
○ Yellowfin signals via smartphone
○ Persuasive data stories
○ Smart tasks

Image
Source:https://www.channelfutures.com/files/2018/07/Cloud-Analytics-2018.jpg
© Edunet Foundation. All rights reserved.
Understand case studies for
predictive models

Disclaimer: The content is curated for educational purposes only.


© Edunet Foundation. All rights reserved.
In this section, we will discuss:

● Concept of data mining techniques
● Concepts of data mining model with its development and deployment in
business scenario
● Data mining models
● CRISP-DM model
● Understanding of data and its preparation techniques for better model
building
● Introduction to sampling and data partitioning in a data mining project

© Edunet Foundation. All rights reserved.


Concept of Data Mining
Techniques

Data Mining concept

● Data mining processes structured
information through the application of
artificial intelligence, neural networks,
and advanced statistical tools in order to
detect patterns and summarize data into
a format that can be understood.
● It allows corporations to anticipate future
trends, uncover new opportunities, and
most importantly improve overall
performance.
Image Source: https://slideplayer.com/slide/10943778/
© Edunet Foundation. All rights reserved.
Concept of Data Mining
Techniques

Data Mining concept (Contd.)


● Data mining is the term used to describe
the process of extracting value from a
database.
● Data mining involves the use of
sophisticated data analysis tools to
discover previously unknown, valid
patterns and relationships in large
data sets.
● Data mining consists of more than
collecting and managing data; it also
includes analysis and prediction.
Image Source:
https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#DMCO
© Edunet Foundation. All rights reserved.
Concept of Data Mining
Techniques
Data mining techniques
– Tracking patterns

● One of the most elementary
techniques in data mining is learning to
recognize patterns in your data sets.
● This is typically the recognition of some
deviation in your data that occurs at regular
intervals, or a variation of a certain
variable over time.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
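A recurring deviation at regular intervals can be spotted by averaging values that share the same position in each period. The sketch below assumes two weeks of made-up daily sales and a hypothetical rule that a "peak" day is one whose average exceeds 1.5 times the overall mean; both are illustrative choices, not a standard.

```python
from statistics import mean


def average_by_position(values, period):
    """Average the values that share the same position in each period."""
    return [mean(values[i::period]) for i in range(period)]


# Two weeks of hypothetical daily sales, Monday first.
daily_sales = [10, 11, 9, 10, 12, 30, 28,
               11, 10, 10, 9, 13, 32, 29]

profile = average_by_position(daily_sales, 7)
overall = mean(daily_sales)

# Positions (0 = Monday) whose average is well above the overall mean.
peaks = [day for day, avg in enumerate(profile) if avg > 1.5 * overall]
print(peaks)
```

Here the weekly profile makes the weekend spike visible even though individual days fluctuate.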
Concept of Data Mining
Techniques
Data mining techniques
– Classification

● It is a classic data mining technique
based on machine learning.
● It is used to classify each item in a set of
data into one of a predefined set of
classes or groups.
● The classification method makes use of
mathematical techniques such as
decision trees, linear programming,
neural networks, and statistics.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
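As a small illustration of assigning items to predefined classes, the sketch below uses a 1-nearest-neighbour rule. Note that this is a simpler technique swapped in for brevity, not one of the methods named on the slide, and the customer features and class labels are invented for the example.

```python
def classify(sample, labelled):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(labelled, key=lambda pair: squared_distance(pair[0], sample))
    return label


# Hypothetical training data:
# (monthly_spend, visits_per_month) -> predefined customer class
training = [
    ((20, 1), "occasional"),
    ((25, 2), "occasional"),
    ((200, 8), "loyal"),
    ((180, 10), "loyal"),
]

print(classify((190, 9), training))
print(classify((30, 1), training))
```

The classes ("occasional", "loyal") are fixed in advance; the classifier only decides which one a new customer belongs to.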
Concept of Data Mining
Techniques
Data mining techniques
– Association

● Association data mining notices
recurring themes in databases,
recognizes relations between them and
develops a pattern of these relations.
● It will then use these patterns as a
reference to predict future behaviour.

Image Source: https://www.guru99.com/data-mining-tutorial.html


© Edunet Foundation. All rights reserved.
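Association patterns are commonly measured with support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). The following sketch computes both for a made-up set of shopping baskets; the basket contents are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

A rule such as "bread implies butter" is kept only when both measures are high enough for the business question at hand.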
Concept of Data Mining
Techniques
Data mining techniques
– Outlier Detection

● An outlier is defined as an observation that
deviates too much from other
observations. The identification of
outliers can lead to the discovery of
useful and meaningful knowledge.
● In many cases, simply identifying
the overarching pattern cannot give
you a clear understanding of your data
set. You also need to be able to identify
irregularities or outliers in your data.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
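One simple way to flag observations that deviate too much from the rest is a z-score test. The sketch below is a minimal version under stated assumptions: the threshold of 2.0 standard deviations and the sample order values are illustrative, and other rules (such as the interquartile range) are equally valid.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:          # all values identical: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical order values with one suspicious entry.
order_values = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(order_values))
```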
Concept of Data Mining
Techniques
Data mining techniques
– Clustering

● Clustering is a data mining technique
that automatically groups objects with
similar characteristics into meaningful
or useful clusters.
● The clustering technique defines the
classes and puts objects in each
class, while in classification
techniques, objects are assigned to
predefined classes.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
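The difference from classification is that clustering discovers the groups itself. A minimal one-dimensional k-means sketch shows this; the customer ages and the starting centers are assumptions chosen for illustration, and k-means is only one of several clustering algorithms.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around `centers`, refining the centers each round."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer ages; no class labels are given in advance.
ages = [18, 21, 20, 45, 47, 50]
centers, clusters = kmeans_1d(ages, centers=[18, 50])
print(clusters)
```

No labels were supplied: the two groups emerge from the data, after which an analyst can decide what each group means.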
Concept of Data Mining
Techniques
Data mining techniques
– Regression

● Regression, used mainly as a form
of planning and modelling, is used to
identify the likelihood of a certain
variable, given the presence of other
variables.
● Regression and classification are both
used in prediction analysis, but
regression is used to predict a numeric
or continuous value while classification
assigns data into discrete categories.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
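Predicting a numeric value can be illustrated with simple linear regression fitted by ordinary least squares. In this sketch the advertising and sales figures are invented and deliberately lie on a straight line, so the fitted slope and intercept are easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: sales are exactly 2 * ad_spend + 1.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5 + intercept   # forecast for an ad spend of 5
print(slope, intercept, predicted)
```

The output is a continuous number, in contrast to the discrete class label a classifier would return.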
Concept of Data Mining
Techniques
Data mining techniques
– Prediction

● Prediction is a data mining technique
that discovers the relationships among
independent variables and the
relationships between dependent and
independent variables.
● Prediction derives the relationship
between a thing you know and a thing
you need to predict for future reference.
Image Source: https://www.guru99.com/data-mining-tutorial.html
© Edunet Foundation. All rights reserved.
Concepts of data mining
model with its development
and deployment in business
scenario
Phases and Tasks

● It is a step-by-step procedure for
implementation of data mining in a
business scenario.
● The phases and tasks include –
Business understanding, Data
understanding, Data preparation,
Modelling, Evaluation, Deployment.

© Edunet Foundation. All rights reserved.


Concepts of data mining
model with its development
and deployment in business
scenario
Business understanding

● Data mining goals are defined.
● The fundamental requirement is to
understand client and business
objectives.
● The current data mining scenario should
be assessed, factoring in resources,
constraints, and assumptions.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Concepts of data mining
model with its development
and deployment in business
scenario
Data understanding

● In this stage, a sanity check is
conducted to understand whether the
data is appropriate for the data mining goals.
● The data is collected from various
sources within the organization.
● It is a highly complex process, since data
and processes from various sources are
unlikely to match easily.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Concepts of data mining
model with its development
and deployment in business
scenario
Data preparation

● The data is made production ready in
this stage.
● The data from diverse sources
should be selected, cleaned,
transformed, formatted, anonymized,
and constructed.
● Data cleaning is a process to "clean" the
data by smoothing noisy data and
filling in missing values.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
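The two cleaning operations named above, filling in missing values and smoothing noisy data, can be sketched as follows. Mean imputation and a moving average are only one possible choice for each step, and the sample values are made up for illustration.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def moving_average(values, window=3):
    """Smooth noisy data with a centred moving average.

    The window shrinks at both ends of the list.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbours = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


# Hypothetical measurements: None marks a missing value, 40.0 is a noisy spike.
raw = [10.0, None, 14.0, 40.0, 12.0, None]
filled = fill_missing_with_mean(raw)
smoothed = moving_average(filled)
print(filled)
print(smoothed)
```

Other strategies (median imputation, binning, dropping incomplete records) fit the same two-step pattern; which one is suitable depends on the data and the modelling technique.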
Concepts of data mining
model with its development
and deployment in business
scenario
Modelling

● In this stage, mathematical models


are used to determine the data
patterns.
● Suitable modelling techniques need to
be chosen for the prepared data set.
● After that, create a scenario to validate
the model. Then run the model on the
prepared data set.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Concepts of data mining
model with its development
and deployment in business
scenario
Evaluation

● In this stage, patterns recognized are


examined against business objectives.
● A go or no-go decision should be taken
on moving the model into the deployment
phase.

Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Concepts of data mining
model with its development
and deployment in business
scenario
Deployment

● In this stage, you ship your data mining
discoveries to everyday business operations.
● A thorough deployment plan for
shipping, maintenance, and monitoring
of data mining discoveries is created.

Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Data mining models

Types of models

● Data mining models can be broadly


classified into two categories:

○ Predictive Model

○ Descriptive Model

Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
© Edunet Foundation. All rights reserved.
Data mining models

Predictive model

● The predictive model makes a forecast
about unknown data values by
using the known values.
● Forecasting is the process of
investigating the existing and previous
states of an attribute and predicting its
forthcoming state.
● The techniques that fall under this
category are classification,
regression and time-series analysis.
Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
© Edunet Foundation. All rights reserved.
Data mining models

Descriptive model

● It identifies the patterns or
relationships in data and discovers
the properties of the data studied.
● Descriptive data mining
techniques take raw data as input to
obtain information on the regularity of
the data and to discover important
patterns.
● Examples include clustering,
summarization and association rules.
Image Source:
https://www.researchgate.net/figure/Basic-data-mining-models_fig3_308698620
© Edunet Foundation. All rights reserved.
CRISP-DM model

Basic concepts

● It stands for Cross-Industry Standard


Process for Data Mining, an
industry-proven way to guide your data
mining efforts.
● As a methodology, it includes
descriptions of the typical phases of a
project, the tasks involved with each
phase, and an explanation of the
relationships between these tasks.
● As a process model, CRISP-DM
provides an overview of the data mining
life cycle.
Image Source:
https://www.ibm.com/support/knowledgecenter/SS3RA7_sub/modeler_crispdm_d
© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
Why is it necessary to Partition ?

● For easy management


● To assist backup/recovery
● To enhance performance

Image Source: https://dev.to/alibayatgh/what-is-data-partitioning-171o


© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
What is data preparation ?

● Data preparation is the process of


cleaning and transforming raw data prior
to processing and analysis.
● It is an important step prior to
processing and often involves
reformatting data, making corrections to
data and the combining of data sets to
enrich data.

Image Source:
https://bigdataanalyticsnews.com/data-preparation-why-is-it-important/
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
Why prepare data ?

● Data need to be formatted for a given


software tool.
● Data need to be made adequate for a
given method.
● Data in the real world is dirty. It can be:
○ Incomplete
○ Noisy (contains errors)
○ Inconsistent

Image Source:
https://bigdataanalyticsnews.com/data-preparation-why-is-it-important/
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
Major tasks in data preparation

● Data discretization
● Data cleaning
● Data integration
● Data transformation
● Data reduction

Image Source:
https://bigdataanalyticsnews.com/data-preparation-why-is-it-important/
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data understanding)

● Collect Data
● Describe data
● Explore data
● Verify data quality

Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data preparation)

● Select data
○ Reconsider data selection criteria.
○ Decide which dataset will be used.
○ Collect appropriate additional data
(internal or external).
○ Consider use of sampling
techniques.
○ Explain why certain data was
included or excluded.
Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data preparation)

● Clean data
○ Correct, remove or ignore noise.
○ Decide how to deal with special
values and their meaning
○ Aggregation level, missing values,
etc.
○ Outliers?

Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data preparation)

● Construct data
○ Derived attributes.
○ Background knowledge.
○ How can missing attributes be
constructed or imputed?

Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data preparation)

● Integrate data
○ Integrate sources and store result
(new tables and records).

Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
Understanding of data and
its preparation techniques
for the better model building
CRISP-DM (Data preparation)

● Format data
○ Rearranging attributes.
○ Reordering records (perhaps the
modelling tool requires that the
records be sorted according to the
value of the outcome attribute).
○ Reformatting within values (purely
syntactic changes made to satisfy
the requirements of the specific
modelling tool, such as removing
illegal characters or changing
uppercase to lowercase).
Image Source:
https://paginas.fe.up.pt/~ec/files_1112/week_03_Data_Preparation.pdf
© Edunet Foundation. All rights reserved.
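The three formatting steps can be illustrated with a short sketch. The records, the field names and the rule for what counts as an "illegal" character are hypothetical; a real modelling tool would dictate its own requirements.

```python
import re


def clean_value(text):
    """Lowercase a value and drop characters other than letters, digits, spaces and hyphens."""
    return re.sub(r"[^a-z0-9 \-]", "", text.lower()).strip()


# Hypothetical raw records with the outcome attribute listed first.
records = [
    {"outcome": "Yes", "city": "  Pune# "},
    {"outcome": "No", "city": "DELHI!"},
]

formatted = sorted(
    # Rearranged attributes: outcome moved to the last position.
    ({"city": clean_value(r["city"]), "outcome": clean_value(r["outcome"])} for r in records),
    # Reordered records: sorted by the value of the outcome attribute.
    key=lambda r: r["outcome"],
)
print(formatted)
```

None of these steps changes the meaning of the data; they only change its syntax and ordering.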
Introduction to sampling and
data partitioning in data
mining projects
What is data sampling ?

● Data sampling is a statistical analysis


technique used to select, manipulate
and analyze a representative subset of
data points to identify patterns and
trends in the larger dataset being
examined.

Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
What is data sampling ?

● It enables data scientists, predictive


modelers and other data analysts to
work with a small, manageable amount
of data about a statistical population to
build and run analytical models more
quickly, while still producing accurate
findings.

Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
Advantages of sampling

● Sampling can be particularly useful with


data sets that are too large to efficiently
analyze in full.
● Identifying and analyzing a
representative sample is more efficient
and cost-effective than surveying the
entirety of the data or population.

Image Source: https://www.dreamstime.com/illustration/advantages.html


© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
Steps involved in sampling

● Identify and define Target population


● Select sampling frame
● Choose sampling methods
● Determine Sample size
● Collect the required data

Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
Types of data sampling methods

● Simple random sampling


● Stratified sampling
● Cluster sampling
● Systematic sampling

Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
© Edunet Foundation. All rights reserved.
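The four probability sampling methods listed above can be sketched with the Python standard library. The customer list, the "region" attribute and the function names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Every item has the same chance of being selected."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, step, start=0):
    """Take every `step`-th item, beginning at index `start`."""
    return population[start::step]


def stratified_sample(population, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(population, key, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in population:
        clusters[key(item)].append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]


# Hypothetical population: 20 customers, 5 in the north and 15 in the south.
customers = [{"id": i, "region": "north" if i % 4 == 0 else "south"} for i in range(20)]

by_region = lambda c: c["region"]
print(len(simple_random_sample(customers, 5)))
print([c["id"] for c in systematic_sample(customers, 5)])
print(len(stratified_sample(customers, by_region, 0.2)))
print({c["region"] for c in cluster_sample(customers, by_region, 1)})
```

Stratified sampling keeps the north/south proportions of the population, whereas cluster sampling keeps one whole region and discards the other.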
Introduction to sampling and
data partitioning in data
mining projects
Types of data sampling methods
(Nonprobability)

● Convenience sampling
● Snowball sampling
● Purposive or judgmental sampling
● Quota sampling

Image Source:
https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sa
© Edunet Foundation. All rights reserved.
Introduction to sampling and
data partitioning in data
mining projects
Data Partitioning

● The simplest and most fundamental


version of cluster analysis is partitioning,
which organizes the objects of a set into
several exclusive groups or clusters.
● Given a data set, D, of n objects, and k,
the number of clusters to form, a
partitioning algorithm organizes the
objects into k partitions (k ≤ n), where
each partition represents a cluster.
Image Source: https://dev.to/alibayatgh/what-is-data-partitioning-171o
© Edunet Foundation. All rights reserved.
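As a rough illustration of a partitioning algorithm, the sketch below uses k-means on one-dimensional data. The slide does not name a specific algorithm, so k-means is an assumed choice here, and the spending figures and the way the initial centres are picked are also assumptions.

```python
def k_means_1d(points, k, iterations=10):
    """Partition numeric points into k exclusive clusters (basic k-means sketch)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Deterministic start: spread the initial centres over the sorted data.
    ordered = sorted(points)
    centres = [ordered[i * (len(ordered) - 1) // max(1, k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins the cluster with the nearest centre.
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        # Update step: each centre becomes the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
    return clusters, centres


# Hypothetical monthly spend of n = 8 customers, partitioned into k = 3 groups.
spend = [2, 3, 4, 20, 22, 25, 60, 61]
clusters, centres = k_means_1d(spend, k=3)
print(clusters)
```

Each object ends up in exactly one of the k partitions, which is the "exclusive groups" property described above.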
Able to develop case
studies for predictive
analytical models

Disclaimer: The content is curated for educational purposes only.


© Edunet Foundation. All rights reserved.
In this section, we will discuss:

● Concepts of machine learning
● Approach for data mining using decision tree inductive concept
● Conceptual cluster
● Attribute oriented induction
● Iterative database scanning
● Attribute focusing
● Neural networks
● Rough sets
● Visualization
● Concepts of odds
● Concepts of odds ratio
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning

What is Machine Learning ?

● The term machine learning was first


introduced by Arthur Samuel in 1959.
● Machine learning is a subset of
artificial intelligence that is mainly
concerned with the development of
algorithms which allow a computer to
learn from data and past
experiences on its own.
Image Source: https://expertsystem.com/machine-learning-definition/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning

What is Machine Learning ?

● We can define it in a summarized way


as:

Machine learning enables a machine to
automatically learn from data, improve
performance from experience, and
make predictions without being explicitly
programmed.

Image Source: https://www.javatpoint.com/machine-learning


© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning

How does Machine Learning work ?

● A Machine Learning system learns from


historical data, builds the prediction
models, and whenever it receives new
data, predicts the output for it.
● The accuracy of the predicted output
depends in part on the amount of data,
as a larger amount of data generally
helps to build a better model, which
predicts the output more accurately.
Image Source: https://www.javatpoint.com/machine-learning
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning

Features of Machine Learning

● Machine Learning uses data to detect


various patterns in a given dataset.
● It can learn from past data and improve
automatically.
● It is a data-driven technology.
● Machine learning is quite similar to data
mining, as it also deals with huge
amounts of data.

Image Source:
https://sarvosys.com/electronic-logging-devices/attachment/cwtype-features/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning

Importance of Machine Learning

● Rapid increase in the production of
data
● Solving complex problems, which are
difficult for a human
● Decision making in various sectors,
including finance
● Finding hidden patterns and extracting
useful information from data.

Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/


© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Supervised Learning)

● Supervised learning is a type of machine


learning method in which we provide
sample labeled data to the machine
learning system in order to train it, and
on that basis, it predicts the output.
● The goal of supervised learning is to
map input data with the output data.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Supervised Learning)

● Supervised learning is based on
supervision, in the same way that
a student learns things under the
supervision of a teacher. An example
of supervised learning is spam filtering.
● Supervised learning can be grouped
further into two categories of algorithms:
○ Classification
○ Regression
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Unsupervised Learning)

● Unsupervised learning is a learning


method in which a machine learns
without any supervision.
● The goal of unsupervised learning is to
restructure the input data into new
features or a group of objects with
similar patterns.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Unsupervised Learning)

● In Unsupervised learning, we don't have


a predetermined result.
● The machine tries to find useful insights
from the huge amount of data. It can be
further classified into two categories of
algorithms:
● Clustering
● Association
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Reinforcement Learning)

● Reinforcement learning is a
feedback-based learning method, in
which a learning agent gets a reward for
each right action and gets a penalty for
each wrong action.
● The agent learns automatically from
this feedback and improves its
performance.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Concepts of Machine
Learning
Classification of Machine Learning
(Reinforcement Learning)

● In reinforcement learning, the agent


interacts with the environment and
explores it.
● The goal of an agent is to get the most
reward points, and hence, it improves its
performance.
● A robotic dog that automatically
learns the movement of its limbs is an
example of reinforcement learning.
Image Source: https://www.ahomtech.com/blog/importance-of-machine-learning/
© Edunet Foundation. All rights reserved.
Approach for data mining
using decision tree inductive
concept
What is Data Mining ?

● The process of extracting information
from huge sets of data to identify
patterns, trends, and useful data that
allow a business to make data-driven
decisions is called data mining.
● Data mining is also called Knowledge
Discovery in Databases (KDD).
Image Source: https://www1.cmc.edu/pages/faculty/BHunter/datamining.html
© Edunet Foundation. All rights reserved.
Approach for data mining
using decision tree inductive
concept
What is Decision Tree ?

● A decision tree is a supervised learning
method used in data mining for
classification and regression tasks.
● It is a tree that helps us in
decision making.
● The decision tree creates classification
or regression models as a tree structure.
● Decision trees can deal with both
categorical and numerical data.
Image Source: https://www.educba.com/decision-tree-in-data-mining/
© Edunet Foundation. All rights reserved.
Approach for data mining
using decision tree inductive
concept
What is Decision Tree ?

● A decision tree is a structure that


includes a root node, branches, and leaf
nodes. Each internal node denotes a
test on an attribute, each branch
denotes the outcome of a test, and each
leaf node holds a class label. The
topmost node in the tree is the root
node.
Image Source:
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees1.html
© Edunet Foundation. All rights reserved.
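The structure described above (root node, internal test nodes, branches and leaf nodes) can be represented directly as nested dictionaries. The attributes "outlook" and "windy" and the class labels are hypothetical.

```python
# Internal nodes test an attribute, branches are test outcomes,
# and leaves (plain strings here) hold a class label.
tree = {
    "attribute": "outlook",                      # root node
    "branches": {
        "sunny": {                               # internal node
            "attribute": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
        "rainy": "stay in",                      # leaf node
    },
}


def classify(node, example):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node


print(classify(tree, {"outlook": "sunny", "windy": "no"}))
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

Classifying an example is simply a walk from the root to a leaf, which is why decision trees are easy to explain.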
Approach for data mining
using decision tree inductive
concept
Key Factors

● Entropy : Entropy refers to a common


way to measure impurity. In the decision
tree, it measures the randomness or
impurity in data sets.

Image Source: https://www.javatpoint.com/decision-tree-induction


© Edunet Foundation. All rights reserved.
Approach for data mining
using decision tree inductive
concept
Key Factors

● Information Gain : Information gain
refers to the decline in entropy after the
dataset is split. It is also called entropy
reduction. Building a decision tree is all
about discovering attributes that return
the highest information gain.

Image Source: https://www.javatpoint.com/decision-tree-induction


© Edunet Foundation. All rights reserved.
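Entropy and information gain can be computed as in the sketch below, which uses the standard Shannon entropy with base-2 logarithms. The four-row data set and its attribute names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Hypothetical data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_outlook, gain_windy)
```

A tree builder would choose "outlook" as the first split here, because it yields the larger information gain.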
Approach for data mining
using decision tree inductive
concept
Why are Decision Trees useful

● It enables us to analyze the possible


consequences of a decision thoroughly.
● It provides us a framework to measure
the values of outcomes and the
probability of accomplishing them.
● It helps us to make the best decisions
based on existing data and best
speculations.
Image Source: https://www.empathyrooms.com/why-not-to-use-the-word-why/
© Edunet Foundation. All rights reserved.
Approach for data mining
using decision tree inductive
concept
Advantages of using Decision Trees

● A decision tree does not need scaling of
data.
● Decision trees need less effort for
data preparation during pre-processing.
● It is intuitive and simple to explain to
the technical team as well as
stakeholders.
● A decision tree does not require
standardization of data.
Image Source: https://www.dreamstime.com/illustration/advantages.html
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Introduction

● Conceptual clustering is a machine


learning paradigm for unsupervised
classification developed mainly during
the 1980s.

● It is distinguished from ordinary data


clustering by generating a concept
description for each generated class.
https://image.slideserve.com/1450644/types-of-clusters-conceptual-clusters-l.jpg
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Introduction (Continue)

● Most conceptual clustering methods are


capable of generating hierarchical
category structures.

● Conceptual clustering is closely related


to formal concept analysis, decision tree
learning, and mixture model learning.
https://www.researchgate.net/publication/267363337/figure/fig3/AS:66904603492
3528@1536524413930/A-conceptual-cluster-Some-intersected-points-in-both-the
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Introduction (Continue)

● In the first view, the conceptual part of
the process lies in how the exemplars are
agglomerated/divided rather than in how
the clusters are described (i.e., the
cluster forming mechanism need not
maintain any cluster descriptions).

● The second view is that of concept
formation, with exemplars as the
catalyst.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Introduction (Continue)

● Under this view clusters are formed


according to their conceptual
descriptions, i.e., the system must
constantly maintain conceptual
descriptions of clusters and cluster
membership is constrained by the
concepts available to describe the
results.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Introduction (Continue)

● Following the terminology of psychology,


the first view will here be called
conceptual sorting. The second view will
be called concept discovery. Each in its
own way can be said to involve
conceptual clustering.

© Edunet Foundation. All rights reserved.


Conceptual Cluster Learning

Conceptual Clustering vs. Data


Clustering

● Conceptual clustering is obviously


closely related to data clustering.

● However, in conceptual clustering it is
not only the inherent structure of the
data that drives cluster formation, but
also the description language which is
available to the learner.
https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRvzdyEAHlrBDDCI8-php1EbhwKjvd1fTEhzg&usqp=CAU
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering vs. Data


Clustering (Continue)

● Thus, a statistically strong grouping in


the data may fail to be extracted by the
learner if the prevailing concept
description language is incapable of
describing that particular regularity.

© Edunet Foundation. All rights reserved.


Conceptual Cluster Learning

Conceptual Clustering as Concept


Sorting

● One view of conceptual clustering


proposes to produce interesting
groupings and then provide them with a
conceptual interpretation.

● That is, to build extensionally defined
categories (by enumerating their
members) and then find a conceptual
interpretation.
https://i.ytimg.com/vi/XpUk3vhC9AA/maxresdefault.jpg
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept


Sorting (Continue)

● A major reason independently rendered
clusters can have rather unappealing
conceptual interpretations is that they
use no concept-related similarity
measure.

© Edunet Foundation. All rights reserved.


Conceptual Cluster Learning

Conceptual Clustering as Concept


Sorting (Continue)

There are two points to be made here:

1. The similarity metric used defines a
gradient over the feature space that
possesses none of the conceptual
irregularities that underlie the domain.

2. The similarity metric views all attributes
with a fixed relevance to the problem,
without any way to determine attribute
relevancy from patterns in the data.
https://blog.maketaketeach.com/wp-content/uploads/2016/03/Sort-WOW-border.jpg
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept


Discovery

● Concept discovery systems focus on the


determination of concepts (according to
some concept representation system) to
describe each category that is formed.

● Indeed, categories are formed such that
their descriptions are as desired by the
applied biases (including
representational constraints) and a
concept-based cluster quality measure.
https://www.researchgate.net/profile/Chung-Hsien_Wu/publication/224202721/figure/fig10/AS:668839448682505@1536475159358/Overview-of-textual-concept-di
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept


Discovery (Continue)

● It is the category descriptions that are


constantly monitored, generalized,
specialized, and evaluated by the
concept-based quality measure.

● These systems incorporate mechanisms


to propose multi-relation (polythetic)
concepts as category descriptions.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept


Discovery (Continue)

● The availability of concepts is governed


by the biases of the system and the
background knowledge that is applied.

● For example, the grape and apple differ
in color and type of fruit but are both
ripe; the orange and apple differ in color,
type and ripeness.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Conceptual Clustering as Concept


Discovery (Continue)

● Without background knowledge, the


concept-based approach reverts to the
attribute-based one.

● It is background knowledge that makes


the feature space and concept space
rough and irregular so that the fit of the
data to the irregularities can be used to
help confirm a candidate conceptual
interpretation.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering

● Discovering concepts by conceptual


clustering is not purely an inductive
inference process.

● A portion of the process involves
deductive inference to determine from
background knowledge latent attributes
for exemplars and appropriate concepts
to ready as candidate category
descriptions.
http://www.inf.ufrgs.br/~engel/data/media/file/Aprendizagem/Cobweb.pdf
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)

● A system equipped with sizable


background knowledge and a deductive
mechanism for accessing and applying it
can make a wide variety of appropriate
transformations of exemplars that will
greatly aid concept formation.

http://www.inf.ufrgs.br/~engel/data/media/file/Aprendizagem/Cobweb.pdf
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)
● For example, an inference rule could
suggest the construction of an attribute
whose values report the number of other
attributes (from a subset of other
attributes) having values that differ from
the most frequent attribute values.

● Such a derived attribute supports
polymorphic concepts like "2 of the 3
attributes A, B, and C have target values
of x, y, and z, respectively".
© Edunet Foundation. All rights reserved.
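The derived attribute described above can be sketched as a count of mismatches against the most frequent value of each attribute. The exemplars, the attribute names A, B, C and the function name are illustrative assumptions.

```python
from collections import Counter


def count_mismatches(exemplars, attributes):
    """For each exemplar, count the attributes whose value differs
    from that attribute's most frequent value across all exemplars."""
    modes = {
        a: Counter(e[a] for e in exemplars).most_common(1)[0][0]
        for a in attributes
    }
    return [sum(e[a] != modes[a] for a in attributes) for e in exemplars]


# Hypothetical exemplars; the most frequent values are A = x, B = y, C = z.
exemplars = [
    {"A": "x", "B": "y", "C": "z"},
    {"A": "x", "B": "y", "C": "q"},
    {"A": "x", "B": "p", "C": "z"},
    {"A": "r", "B": "y", "C": "z"},
]

mismatches = count_mismatches(exemplars, ["A", "B", "C"])
print(mismatches)
```

An exemplar with exactly one mismatch satisfies the concept "2 of the 3 attributes have their target values", so the derived attribute lets that concept be stated as a single simple test.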
Conceptual Cluster Learning

Knowledge-Based Conceptual
Clustering (Continue)

● Since the system knows the definition of


the attribute (from background
knowledge) it is able to state
polymorphic concepts in easily
understood terms.

● The point is that additional knowledge


applied during clustering can have a
great effect on the types of categories
formed.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Type Levels

● Type-0: Statistic-based quality measure;


no conceptual interpretation.

● Type-1: Statistic-based quality measure;


conceptual interpretation after-the-fact.

● Type-2: Attribute-based quality


measure; no conceptual interpretation.

● Type-3: Attribute-based quality
measure; conceptual interpretation
https://www.analyticsvidhya.com/wp-content/uploads/2016/11/clustering-6.png
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Type Levels (Continue)

● Type-4: Concept-based quality


measure; no background knowledge.

● Type-5: Concept-based quality


measure; background knowledge.

● Type-6: Concept-based quality


measure; background knowledge;
structured exemplars.
© Edunet Foundation. All rights reserved.
Conceptual Cluster Learning

Application Areas

● Biology

● Medicine

● Psychology

● Climate

● Business

● Information Retrieval
© Edunet Foundation. All rights reserved.


Attribute Oriented Induction

Introduction

● The Attribute Oriented Induction (AOI)
method, first proposed in 1989,
integrates a machine learning paradigm,
especially learning-from-examples
techniques, with database operations,
extracts generalized rules from an
interesting set of data and discovers
high level data regularities.
https://d3i71xaburhd42.cloudfront.net/7fdc538e50ab4d4a0e4c531ae180ea84c639ba8c/3-Figure2-1.png
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Introduction (Continue)

● AOI provides an efficient and effective


mechanism for discovering various kinds
of knowledge rules from datasets or
databases.

● AOI approach is developed for learning


different kinds of knowledge rules such
as characteristic rules, discrimination
rules, classification rules, data evolution
regularities, association rules and
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Characteristic Rule

● A characteristic rule is an assertion that
characterizes the concepts satisfied
by all of the data stored in the
database.

● This rule provides generalized concepts
about a property, which can help people
recognize the common features of the
data in a class.

● For example the symptom of the specific


© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Discriminant Rule

● A discriminant rule is an assertion that
discriminates the concepts of one
(target) class from another (contrasting)
class.

● This rule gives a discriminant criterion
which can be used to predict the class
membership of new data.

● For example, distinguishing one disease
from another.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Classification Rule

● A classification rule is a set of rules that
classifies the set of relevant data
according to one or more specific
attributes.

● For example, classifying diseases into
classes and providing the symptoms of
each.
https://ars.els-cdn.com/content/image/1-s2.0-S0957417404000053-gr1.gif
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Association Rule

● An association rule describes association
relationships among the set of relevant
data.

● For example, discovering a set of


symptoms frequently occurring together

© Edunet Foundation. All rights reserved.


Attribute Oriented Induction

Data Evolution Regularities Rule

● Data evolution regularities rule is a


general evolution behavior of a set of
the relevant data (valid only in
time-related/temporal data).

● For example, describing the major


factors that influence the fluctuations of
stock values through time.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Cluster Description Rule

● Cluster description rule is used to cluster


data according to data semantics.

● For example, clustering university
students based on different attributes.

© Edunet Foundation. All rights reserved.


Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI

● A quantitative rule is a rule which is
associated with quantitative information,
such as statistical information, which
assesses the representativeness of the
rule in the database.

● There are three types of quantitative
rule, i.e. quantitative characteristic rule,
quantitative discriminative rule and
quantitative characteristic and
discriminative rule.
© Edunet Foundation. All rights reserved.
Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● A quantitative characteristic rule carries the
quantitative information of a
characteristic rule; each rule in the final
generalization can be measured with the
t-weight in formula 1.


Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● t-weight = percentage of each rule in the
final generalized relation.

● Votes(qa) = number of tuples in each
rule in the final generalized relation,
where qa is in {q1,...,qN}.

● N = number of rules in the final
generalized relation.

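The definitions above suggest that the t-weight of a rule qa is Votes(qa) divided by the sum of Votes(q1) through Votes(qN), expressed as a percentage. Formula 1 itself is not reproduced in the text, so the following is only a minimal sketch under that assumption; the function name `t_weights` and the sample vote counts are hypothetical.

```python
def t_weights(votes):
    """Return the t-weight (in percent) of every rule in a generalized relation.

    `votes` maps a rule label to the number of tuples it covers.
    Assumption: t-weight(qa) = Votes(qa) / sum of Votes(qi) for i = 1..N.
    """
    total = sum(votes.values())
    if total == 0:
        raise ValueError("the generalized relation contains no tuples")
    return {rule: 100.0 * count / total for rule, count in votes.items()}


# Hypothetical generalized relation with N = 3 rules.
votes = {"q1": 30, "q2": 50, "q3": 20}
weights = t_weights(votes)
print(weights)
```

The t-weights of all rules in one generalized relation add up to 100 percent, which is a quick sanity check on the result.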
Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● A quantitative discriminative rule is a
discrimination rule that uses quantitative
information. Each rule in the target class
is discriminated against the same rule in the
contrasting class and is measured with the
d-weight in formula 2.


Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● d-weight = ratio (as a percentage) of the
number of tuples covered by a rule in
the target class to the total number of
tuples covered by the same rule in the
target class and the contrasting classes.

● Votes(qa) = number of tuples in each
rule in the target class Cj.

● Cj is in {C1,...,CK}.

● K = total number of the target and
contrasting classes.

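Read literally, the d-weight definition above divides the votes of a rule in the target class by the votes of the same rule summed over the target and contrasting classes. Formula 2 is not reproduced in the text, so this is a sketch under that reading; the function name `d_weight`, the class names and the counts are hypothetical.

```python
def d_weight(votes_by_class, rule, target_class):
    """Return the d-weight (in percent) of `rule` in `target_class`.

    `votes_by_class` maps each class (target and contrasting) to a dict
    of rule label -> number of tuples covered by that rule in the class.
    Assumption: d-weight = Votes(qa in Cj) / sum over all K classes of Votes(qa in Ci).
    """
    total = sum(votes.get(rule, 0) for votes in votes_by_class.values())
    if total == 0:
        raise ValueError("the rule covers no tuples in any class")
    return 100.0 * votes_by_class[target_class].get(rule, 0) / total


# Hypothetical example with one target class and one contrasting class (K = 2).
votes_by_class = {
    "graduate": {"q1": 90, "q2": 10},
    "undergraduate": {"q1": 210, "q2": 40},
}
q1_weight = d_weight(votes_by_class, "q1", "graduate")
q2_weight = d_weight(votes_by_class, "q2", "graduate")
print(q1_weight, q2_weight)
```

A d-weight close to 100 percent means the rule occurs almost only in the target class, so it discriminates well; a low value means the rule is mostly found in the contrasting classes.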
Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● A quantitative characteristic and
discriminative rule uses the quantitative
information of both the characteristic rule
and the discriminative rule, so it has both a
t-weight and a d-weight for the same
rule.

● Each rule is measured with the t-weight in
formula 1 for the characteristic rule and the
d-weight in formula 2 for the discriminative
rule.

Attribute Oriented Induction

Quantitative and Qualitative Rules in


AOI (Continue)

● A qualitative rule can be obtained by
using the same learning process
applied to its quantitative counterpart,
without associating the
quantitative attribute with the generalized
relations.


Attribute Oriented Induction

Concept Hierarchies

● One advantage of AOI is that it uses a
concept hierarchy as background
knowledge, which can be provided by
knowledge engineers or domain experts.

● A concept hierarchy stored as a relation in
the database provides essential
background knowledge for data
generalization and multiple-level data
mining.

Image Source: https://ars.els-cdn.com/content/image/3-s2.0-B9780123814791000046-f04-09-9780123814791.jpg

Attribute Oriented Induction

Concept Hierarchies (Continue)

● A concept hierarchy represents a
taxonomy of concepts of the attribute
domain values.

● A concept hierarchy can be specified
based on the relationships among
database attributes or by set groupings,
and can be stored in the form of relations in
the same database.

Image Source: https://ars.els-cdn.com/content/image/3-s2.0-B9780123814791000034-f03-13-9780123814791.jpg

Attribute Oriented Induction

Concept Hierarchies (Continue)

● A concept hierarchy can be adjusted
dynamically based on the distribution of
the set of data relevant to the data
mining task.

● The hierarchies for numerical attributes
can be constructed automatically based
on data distribution analysis.

Figure: A concept hierarchy tree for attribute workclass in adult dataset

Attribute Oriented Induction

AOI Prototype

● The AOI method was implemented in a
data mining system prototype called
DBMINER, previously called
DBLearn, which has been tested successfully
against large relational databases.

● DBLearn is a prototype data mining
system that was developed at Simon
Fraser University.

Attribute Oriented Induction

AOI Prototype (Continue)

● DBMINER was developed by integrating
database, OLAP and data mining
technologies and has the following features:

1. Incorporating several data mining
techniques such as attribute oriented
induction, statistical analysis,
progressive deepening for mining
multiple-level rules and meta-rule
guided knowledge mining data cube and

Attribute Oriented Induction

AOI Prototype (Continue)

2. Mining new kinds of rules from large
databases, including multiple-level
association rules, classification rules,
cluster description rules and prediction.

3. Automatic generation of numeric
hierarchies and refinement of concept
hierarchies.

4. High-level SQL-like and graphical data
mining interfaces.

Attribute Oriented Induction

AOI Prototype (Continue)

5. Client-server architecture and
performance improvements for larger
applications.

6. The SQL-like data mining query language
DMQL and graphical user interfaces
have been enhanced for interactive
knowledge mining.

7. Performing roll-up and drill-down at
multiple concept levels with multiple

Attribute Oriented Induction

AOI Algorithms

● AOI can be implemented with the
architecture design shown in the figure,
where the characteristic rule (LCHR) and
classification rule (LCLR) can be learned
directly from the transactional database
(OLTP) or data warehouse (OLAP) with
the help of the concept hierarchy for
knowledge generalization. The concept
hierarchy can be created from the OLTP
database as a direct resource.

Figure: AOI architecture

Attribute Oriented Induction

AOI Algorithms (Continue)

From a database we can identify two types
of learning:

1. Positive learning as the target class,
where the data are tuples in the
database that are consistent with the
learning concepts. The positive
learning/target class is built when
learning a characteristic rule.

Attribute Oriented Induction

AOI Algorithms (Continue)

2. Negative learning as the contrasting
class, in which the data do not belong to
the target class. The negative
learning/contrasting class is built
when learning a discrimination or
classification rule.


Attribute Oriented Induction

AOI Characteristic Rule Algorithm

● This AOI characteristic rule algorithm is
the implementation of steps one to seven
of the generalization strategy.

● The algorithm has two subprocesses:
controlling the number of distinct attributes
and controlling the number of tuples.

Figure: AOI characteristic rule algorithm

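The algorithm figure is not reproduced in the text, so the following is only a simplified sketch of attribute-oriented generalization, not the algorithm from the figure. It covers attribute removal, concept-tree climbing against an attribute threshold, and merging of identical tuples with vote counts; control of the number of tuples is left out. The hierarchy, the student rows and the name `generalize` are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
HIERARCHY = {
    "major": {"physics": "science", "biology": "science",
              "history": "arts", "music": "arts"},
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"},
}


def generalize(rows, attributes, hierarchy, attr_threshold):
    """Generalize each attribute until it has at most `attr_threshold` values."""
    rows = [dict(r) for r in rows]          # work on a copy
    kept = list(attributes)
    for attr in attributes:
        while len({r[attr] for r in rows}) > attr_threshold:
            parents = hierarchy.get(attr)
            current = [r[attr] for r in rows]
            climbed = [parents.get(v, v) for v in current] if parents else current
            if climbed == current:
                # No higher-level concept available: remove the attribute.
                kept.remove(attr)
                break
            for row, value in zip(rows, climbed):
                row[attr] = value           # climb one level of the concept tree
    # Merge identical generalized tuples and count their votes.
    votes = Counter(tuple(r[a] for a in kept) for r in rows)
    return kept, votes


students = [
    {"name": "Ann", "major": "physics", "city": "Vancouver"},
    {"name": "Bob", "major": "biology", "city": "Toronto"},
    {"name": "Cat", "major": "history", "city": "Seattle"},
    {"name": "Dan", "major": "music", "city": "Vancouver"},
]
kept, votes = generalize(students, ["name", "major", "city"], HIERARCHY, 2)
print(kept, dict(votes))
```

In this example `name` has no hierarchy and too many distinct values, so it is removed, while `major` and `city` are each climbed one level. The resulting vote counts are the input needed for the t-weight of each generalized tuple.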
Attribute Oriented Induction

AOI Advantages

● AOI provides additional flexibility over


many machine learning algorithms.

● AOI can learn knowledge rules in


different conjunctive and disjunctive
forms and provides more choices for the
experts and users.

© Edunet Foundation. All rights reserved.


Attribute Oriented Induction

AOI Advantages (Continue)

● AOI can use the facilities of a
traditional relational database, such as
selection, join and projection, whereas most
learning algorithms suffer from
inefficiency problems in a large
database environment.

● AOI can learn qualitative rules with
quantitative information, while many
machine learning algorithms can only

Attribute Oriented Induction

AOI Advantages (Continue)

● AOI can handle noisy data and
exceptional cases elegantly by
incorporating statistical techniques in the
learning process, whereas some learning
systems can only work in a ‘noise free’
environment.


Attribute Oriented Induction

AOI Disadvantages

● AOI can only provide a snapshot of the
generalized knowledge and not a global
picture. Yet the global picture can be
revealed by trying different thresholds
repeatedly.

● Adjusting the thresholds will result
in different sets of generalized tuples.
However, using different thresholds
repeatedly is a time-consuming and

Attribute Oriented Induction

AOI Disadvantages (Continue)

● There is a problem in selecting the
best generalized rules between a large
and a small threshold. A large
threshold value will lead to a relatively
complex rule with many disjuncts, and
the results may not be fully generalized.
On the other hand, a small threshold
value will lead to a simple rule with few
disjuncts, and the results may
overgeneralize the rule with a risk of losing

Iterative Database Scanning

Introduction

● An iterative search starts just the same
as a non-iterative search: the query
sequence is compared to the database,
and the score list, pairwise and multiple
alignment outputs are reported.

● The multiple alignment is then used to
create a query “profile” that contains
information about the types of amino
acid seen at each position in the

Image Source: https://www.researchgate.net/profile/Teresa_Attwood3/publication/11160012/figure/fig2/AS:277320108658689@1443129674824/Overview-of-the-iterative-process-

Iterative Database Scanning

Introduction (Continue)

● This profile is then searched against the
database; a score list, pairwise and
multiple alignments are output, and the
process is then repeated.

● The iterations stop either when the maximum
number of iterations has been reached
or when two successive iterations find
exactly the same sequences.


Iterative Database Scanning

Introduction (Continue)

● Iterative searching will normally be able


to find more remote similarities to the
query sequence than a single sequence
search.

© Edunet Foundation. All rights reserved.


Iterative Database Scanning

Applications

● The iterative K-Means algorithm creates
clusters of related data through iterative
database scans and minimization of the
clustering error, namely the root
mean square error.

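As an illustration of the iterative scan, here is a minimal pure-Python sketch of K-Means. Each pass assigns every point to its nearest centroid and then recomputes the centroids, stopping when they no longer change or when an assumed `max_iter` limit is reached. The sample points and starting centroids are made up for the example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # One scan over the data: assign each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)   # keep the centroid of an empty cluster
        if new_centroids == centroids:      # converged: nothing moved
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    squared = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    rmse = math.sqrt(squared / len(points))  # root mean square error of the clustering
    return centroids, labels, rmse


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, labels, rmse = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels, rmse)
```

The result depends on the starting centroids, so in practice the algorithm is often run several times and the clustering with the lowest error is kept.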


Iterative Database Scanning

Applications (Continue)

● Matching protein sequences through
iterative scanning of a protein database
and finding the best match as
per protein generics.

Image Source: https://www.researchgate.net/profile/Teresa_Attwood3/publication/11160012/figure/fig2/AS:277320108658689@1443129674824/Overview-of-the-iterative-process-

Iterative Database Scanning

Applications (Continue)

● Finding and predicting lung cancer
through iterative scanning of a database
of image samples of lung scans.


Iterative Database Scanning

Advantages

● Iterative information retrieval

● Mining more useful information through
iterations

● Matching patterns

● Iterative querying allows web-based
database applications to integrate results

Iterative Database Scanning

Disadvantages

● Comparatively slow process

● Resource intensive

● Complex to design and implement

● Availability of more advanced
methodologies


Attribute Focusing

Introduction

● Attribute Focusing is a technique
designed for detecting interesting
attribute values, in the sense that the
values differ from an expected value.
Bhandari (1993) and Bhandari and Biyani
(1994) proposed two methods for
detecting interesting attribute values.


Attribute Focusing

Introduction

● The first method consists of finding
interesting values of a given attribute by
comparing the observed frequency of
each value with its expected frequency,
assuming a uniform probability
distribution.

● Since this is a one-dimensional method,
analyzing just one attribute at a time, it
involves no attribute interaction and so

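A minimal sketch of this one-dimensional check follows. It assumes the score of a value is the absolute difference between its observed relative frequency and the uniform expectation 1/k, where k is the number of distinct values seen in the data; the slides do not give the exact formula, and the function name and sample data are hypothetical.

```python
from collections import Counter


def interestingness_1d(values):
    """Score each attribute value by |observed frequency - uniform expectation|."""
    counts = Counter(values)
    n = len(values)
    expected = 1.0 / len(counts)   # assumed uniform distribution over observed values
    return {value: abs(count / n - expected) for value, count in counts.items()}


# Hypothetical defect records described by a single attribute.
defect_type = ["a"] * 5 + ["b", "c", "d"]
scores = interestingness_1d(defect_type)
most_interesting = max(scores, key=scores.get)
print(scores, most_interesting)
```

Values with the largest scores deviate most from the uniform expectation and would be the first ones shown to the analyst.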
Attribute Focusing

Introduction

● Since the goal of data mining is to


discover knowledge that is not only
accurate but also comprehensible for
human decision makers, the field of
cognitive psychology is clearly relevant
for data mining.

● In the classical view, categories are


defined by a small set of attributes.
© Edunet Foundation. All rights reserved.
Attribute Focusing

Introduction (Continue)

● By contrast, in the natural view of
concepts, highly correlated
(non-independent) attributes are the
rule, not the exception.

● To summarize, in the natural view of
concepts, which is currently much more
accepted in psychology than the
classical view, attribute interaction is the
rule, and not the exception.

Figure: Large degree of attribute interaction makes a concept harder to learn

Attribute Focusing

Introduction (Continue)

● It is also increasingly likely that data
pertaining to people's professional activity is
available in a database.

● Clearly, a machine-assisted method
that allows them to learn more about
their domain from such data should be a
powerful knowledge discovery technique,
since it could help a lot of people
improve at their jobs rapidly.

Attribute Focusing

The Importance of Attribute Focusing


in Data Mining

● Evidence for this natural view of
concepts is provided, in the context of
data mining, by projects that found a
significant degree of attribute interaction
in real-world data sets.

● An example is the large number of small
disjuncts found by Provost & Danyluk
(1993) in telecommunications data.

Attribute Focusing

The Importance of Attribute Focusing


in Data Mining (Continue)

● Another example is the several
instances of Simpson’s paradox
discovered in real-world data sets by
Fabris & Freitas (1999).

● Yet another example is the existence of
strong attribute interactions in a typical
financial data set, as discussed by Dhar
et al. (2000).

Attribute Focusing

The Influence of Attribute Interaction


on Concept Hardness

● There are, of course, many factors that
make a concept (class description)
difficult to learn, including
unbalanced class distributions, noise,
missing relevant attributes, etc.

● However, in some cases even if all
relevant information for class separation
is included in the data - i.e. all relevant
attributes are present, there is little

Attribute Focusing

Interestingness Function

● An interestingness function I2 is used to
detect an interesting pair of attribute
values, where each of the values belongs
to a different attribute of a given pair of
attributes.

● The function I2 measures how much the
observed joint frequency of a pair of
attribute values deviates from the
expected frequency assuming that the
two attributes are statistically
independent.

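The idea behind I2 can be sketched as the absolute difference between the observed joint relative frequency of a value pair and the product of the two marginal frequencies, which is what independence would predict. The exact form used by the original authors is not given in the slides, so treat the function below and its sample data as assumptions.

```python
from collections import Counter


def interestingness_2d(pairs):
    """Score each observed (a, b) pair by |P(a, b) - P(a) * P(b)|."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return {
        (a, b): abs(joint[(a, b)] / n - (left[a] / n) * (right[b] / n))
        for (a, b) in joint
    }


# Hypothetical records: (workload, test outcome).
records = (
    [("high", "fail")] * 3 + [("high", "pass")] * 1
    + [("low", "fail")] * 1 + [("low", "pass")] * 3
)
pair_scores = interestingness_2d(records)
print(pair_scores)
```

Here each marginal value has probability 0.5, so independence predicts a joint frequency of 0.25 for every pair; the observed 0.375 for ("high", "fail") gives a deviation of 0.125 and flags an interaction between the two attributes.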
Attribute Focusing

Interestingness Function (Continue)

● Hence, the essence of Attribute
Focusing (using the interestingness
function I2) is precisely to detect
attribute values whose interactions
produce an unexpected observed joint
frequency.

Image Source: https://www.researchgate.net/profile/Edgar_Reehuis/publication/259214544/figure/fig1/AS:650799420043265@1532174081474/Novelty-vs-Interestingness-Interest

Attribute Focusing

Interestingness Function (Continue)

● Goil and Choudhary (1997) have


extended Attribute Focusing for
multidimensional databases (data
cubes). A contribution of this work was
to introduce a parallel algorithm to
compute the above-discussed
interestingness function I2.

● This research addressed the problem of


making Attribute Focusing more
computationally efficient, which is
© Edunet Foundation. All rights reserved.
Attribute Focusing

Interestingness Function (Continue)

● However, it did not adapt Attribute


Focusing to one of the major
characteristics of data cubes, namely
the fact that dimensions contain
hierarchical attributes.

● This characteristic of data cubes


introduces new opportunities and
requirements for adapting the
computation of the interestingness
function I2.
© Edunet Foundation. All rights reserved.
Attribute Focusing

Advantages

● Attribute Focusing has been


successfully deployed to discover
hitherto unknown knowledge in a
real-life, commercial setting.

● It actually helps people do their jobs
better. That kind of practical success has
not been demonstrated even for
advanced knowledge discovery
techniques.

Attribute Focusing

Advantages (Continue)

● There are three possible areas where
Attribute Focusing may enjoy an
advantage over other methods: superior
mathematical algorithms, the ability to
process more data, and the use of the
analyst.

● Interactive systems will provide,
perhaps, the best opportunity for
discovery in the near term. In such
systems, a knowledge analyst is

Attribute Focusing

Characteristics

● The Attribute Focusing approach uses an
explicit model: it uses filtering functions
and a model of interpretation.

● Attribute Focusing represents a means of
deriving immediate and significant
practical advantages by combining the
results of existing research on
knowledge discovery with models based
on human factors and cognitive science.

Attribute Focusing

Future Work

● Information-theoretic, entropy-based
measures and statistical measures of
association/correlation may be used to
evolve new instances of interestingness
functions.

● Similarly, new instances of filtering
functions may be evolved by considering
human factors issues.

Image Source: https://i.stack.imgur.com/v8RVc.png

Introduction to neural
networks

Definition

● The neural network is a technology


based on the structure of the neurons
inside a human brain.

Image Source:
https://miro.medium.com/max/700/1*BQ0pIVk56WHyqigI9adDLw.gif
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Definition

● A neural network algorithm tries to
learn a function that maps your input to
your desired output.

Image Source:
https://miro.medium.com/max/1400/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Definition

● In artificial neural networks, the cell nucleus
corresponds to nodes, the synapse corresponds to
weights, and the axon corresponds to the output.

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Biological Neural Network
vs. Artificial Neural Network

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
The architecture of an artificial
neural network

● Input Layer:
As the name suggests, it accepts inputs in
several different formats provided by the
programmer.

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
The architecture of an artificial
neural network

● Hidden Layer:
The hidden layer lies between the input
and output layers. It performs all the
calculations to find hidden features and
patterns.

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
The architecture of an artificial
neural network

● Output Layer:
The input goes through a series of
transformations using the hidden layer,
which finally results in output that is
conveyed using this layer.

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
The architecture of an artificial
neural network

● The artificial neural network takes the inputs,
computes their weighted sum,
and adds a bias. This
computation is represented in the form
of a transfer function.
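
A minimal sketch of this computation for a single artificial neuron is shown below. The transfer function is the weighted sum plus bias described above; the sigmoid activation applied afterwards is an assumption added for illustration, since the slide does not name an activation function.

```python
import math


def transfer(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def sigmoid(z):
    """Assumed activation: squashes the transfer value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical neuron with two inputs.
z = transfer([1.0, 2.0], [0.5, -0.25], 0.0)   # 1.0*0.5 + 2.0*(-0.25) + 0.0
output = sigmoid(z)
print(z, output)
```

Training a network amounts to adjusting the weights and the bias so that outputs like this one move closer to the desired targets.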

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Advantages of artificial neural
network

● Parallel processing capability


● Storing data on the entire network
● Capability to work with incomplete
knowledge
● Having a memory distribution
● Having fault tolerance
Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Disadvantages of artificial neural
network

● No assurance of a proper network structure


● Unexplained behavior of the network
● Hardware dependence
● Difficulty of presenting the problem to the
network
● The training duration of the network is unknown

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

How artificial neural networks work

● An Artificial Neural Network can be best
represented as a weighted directed
graph, where the artificial neurons form
the nodes.
● The associations between neuron
outputs and neuron inputs can be
viewed as the directed edges with
weights.
Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

How artificial neural networks work

● The Artificial Neural Network receives
the input signal from an external source
in the form of a pattern or an image,
represented as a vector.
● These inputs are then mathematically
denoted by the notation x(n) for each of
the n inputs.

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

How artificial neural networks work

● Afterward, each input is multiplied
by its corresponding weight (these
weights are the details utilized by the
artificial neural network to solve a
specific problem).

Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Types of Artificial Neural Network

● Feedback ANN:
In this type of ANN, the output is fed back into
the network to achieve the best-evolved
results internally.
● Feed-Forward ANN:
A feed-forward network is a basic neural
network comprising an input layer, an
output layer, and at least one layer of
neurons.
Image Source:
https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Model Types

● Neural networks use information in the


form of data to generate knowledge in
the form of models.
● A model can be defined as a description
of a real-world system or process using
mathematical concepts.
● It is usually represented as a mapping
between input and output variables.

Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Neural network models belong to the
following types

● Approximation (or function
regression)
An approximation can be regarded as the
problem of fitting a function from data.

● Classification
Classification can be stated as the process
whereby a received pattern, characterized
by a distinct set of features, is assigned to
one of a prescribed number of classes.

Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Approximation (or function
regression) Examples

● Model the strength of high performance


concretes.
● Predict the noise generated by airfoil
blades.
● Predict the residuary resistance of
sailing yachts.
● Predict the vascular adhesion of
nanoparticles.
Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Approximation (or function
regression) Examples (Continue)

● Predict the electricity generated by
combined cycle power plants.
● Forecast the power generated by a solar
plant.
● Model wine preferences from
physicochemical properties.

Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Classification (or pattern
recognition) Examples

● We can distinguish between two types of
classification models: binary and multiple
classification.
● Binary classification examples
1. Diagnose breast cancer from
fine-needle aspirate images.
2. Detect malfunctions in liquid ultrasonic
flowmeters.
3. Detect forged banknotes.
4. Reduce employee attrition.
5. Increase the conversion rate of
telemarketing campaigns in banks.

Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks
Classification (or pattern
recognition) Examples (Continue)

● Multiple classification examples
1. Classify iris flowers from sepal and petal
dimensions.
2. Recognize human activity from
smartphone signals.

Image Source:
https://www.neuraldesigner.com/images/activity-diagram-neural-network.svg
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Classification neural networks

● A classification model usually requires a


scaling layer, one or several perceptron
layers, and a probabilistic layer. It might
also contain a principal component
layer.
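
The layer sequence described above can be sketched in a few lines of plain Python. This is a simplified illustration and not the implementation of any particular tool: the scaling layer standardizes the inputs, each perceptron layer applies a weighted sum, bias and tanh activation (the activation is an assumption), and the probabilistic layer is approximated by a softmax. All parameter values are hypothetical, and the optional principal component layer is omitted.

```python
import math


def scaling_layer(x, means, stds):
    """Standardize each input with an assumed mean and standard deviation."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def perceptron_layer(x, weights, biases):
    """One layer of neurons: weighted sum plus bias, then tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def probabilistic_layer(z):
    """Softmax: turns the outputs into class probabilities that sum to 1."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, params):
    h = scaling_layer(x, params["means"], params["stds"])
    for weights, biases in params["layers"]:
        h = perceptron_layer(h, weights, biases)
    return probabilistic_layer(h)


# Hypothetical, untrained parameters for two inputs and two classes.
params = {
    "means": [5.0, 3.0],
    "stds": [2.0, 1.0],
    "layers": [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])],
}
probs = classify([7.0, 3.0], params)
print(probs)
```

The predicted class is the one with the highest probability; with trained weights the same pipeline would be used unchanged.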

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Introduction to neural
networks

Data Set

● The data set contains the information for
creating our model. It is a collection of
data structured as a table, in rows and
columns.
● We can identify the following concepts in a
data set:
● Data source.
● Variables.
● Instances.
● Missing values.
● Data set tasks.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Introduction

● A rough set, first described by Polish
computer scientist Zdzisław I. Pawlak, is
a formal approximation of a crisp set
(i.e., conventional set) in terms of a pair
of sets which give the lower and the
upper approximation of the original set.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Introduction

● In rough set theory, the data is
collected in a table, called a decision
table.
● Rows of a decision table correspond to
objects, and columns correspond to
features.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Introduction

● RST can be defined using lower and
upper approximations.
● Lower approximation and positive
region: the lower approximation
is the union of all equivalence classes
that are contained in (i.e., are subsets
of) the target set.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Upper approximation and


negative region

● The upper approximation is the union of
all equivalence classes that have a
non-empty intersection with the target
set.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

The boundary region

● The boundary region, given by the set
difference between the upper and lower
approximations, consists of those
objects that can neither be ruled in nor
ruled out as members of the target set.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

The rough set

● The pair composed of the lower and upper
approximations is called a rough set.
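
The lower approximation, upper approximation and boundary region can be computed directly from their definitions. The sketch below groups objects into equivalence classes by their attribute values and then tests each class against the target set; the patient table and the target set are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that have identical values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    lower, upper = set(), set()
    for eq_class in equivalence_classes(table, attributes):
        if eq_class <= target:      # fully contained in the target set
            lower |= eq_class
        if eq_class & target:       # non-empty intersection with the target set
            upper |= eq_class
    return lower, upper, upper - lower   # the boundary is the set difference


# Hypothetical table: rows are objects, columns are features.
table = {
    "o1": {"headache": "yes", "temp": "high"},
    "o2": {"headache": "yes", "temp": "high"},
    "o3": {"headache": "no", "temp": "high"},
    "o4": {"headache": "no", "temp": "normal"},
    "o5": {"headache": "yes", "temp": "normal"},
}
target = {"o1", "o3"}   # e.g. the objects known to have the disease
lower, upper, boundary = approximations(table, ["headache", "temp"], target)
print(lower, upper, boundary)
```

Objects o1 and o2 are indistinguishable on the two features but only o1 is in the target set, so both end up in the boundary region, while o3 is certainly in the target set.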

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Objective analysis

● Rough set theory is one of many


methods that can be employed to
analyze uncertain (including vague)
systems, although less common than
more traditional methods of probability,
statistics, entropy and Dempster–Shafer
theory.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Reduct and core

● Some attributes in the information system
(attribute-value table) may be more
important to the knowledge represented
in the equivalence class structure than
other attributes.
● Often, we wonder whether there is a
subset of attributes which can, by itself,
fully characterize the knowledge in the
database; such an attribute set is called
a reduct.
Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Core

● The set of attributes which is common to


all reducts is called the core: the core is
the set of attributes which is possessed
by every reduct, and therefore consists
of attributes which cannot be removed
from the information system without
causing collapse of the
equivalence-class structure.
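
For small tables, reducts and the core can be found by brute force. The sketch below treats a reduct as a minimal attribute subset that produces the same equivalence classes as the full attribute set, and the core as the intersection of all reducts. It enumerates every subset, so it is only practical for a handful of attributes; the table is hypothetical.

```python
from itertools import combinations


def partition(table, attributes):
    """Equivalence classes induced by `attributes`, as a set of frozensets."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attributes), set()).add(obj)
    return {frozenset(g) for g in groups.values()}


def reducts(table, attributes):
    """All minimal attribute subsets that preserve the full partition."""
    full = partition(table, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(r) <= set(subset) for r in found):
                continue                      # contains a smaller reduct: not minimal
            if partition(table, subset) == full:
                found.append(subset)
    return found


def core(table, attributes):
    """Attributes shared by every reduct."""
    return set.intersection(*(set(r) for r in reducts(table, attributes)))


# Hypothetical table in which attribute c duplicates attribute b.
table = {
    "o1": {"a": 0, "b": 0, "c": 0},
    "o2": {"a": 0, "b": 1, "c": 1},
    "o3": {"a": 1, "b": 0, "c": 0},
    "o4": {"a": 1, "b": 1, "c": 1},
}
all_reducts = reducts(table, ["a", "b", "c"])
core_attributes = core(table, ["a", "b", "c"])
print(all_reducts, core_attributes)
```

Because b and c carry the same information, either one can be dropped, which gives two reducts; attribute a appears in both and is therefore the core.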
Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

1. Decision rules not only capture
patterns hidden in the data but can
also be used to classify new unseen
objects.
2. Rules represent dependencies in the
dataset and represent extracted
knowledge which can be used when
classifying new objects not in the
original information system.
Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

3. Once the reducts are found, the job of
creating definite rules for the value of
the decision feature of the information
system is practically done.
4. To transform a reduct into a rule, one
only has to bind the condition feature
values of the object class from which
the reduct originated to the
corresponding features of the reduct.
Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

5. Then, to complete the rule, a decision
part comprising the resulting part of the
rule is added.
6. This is done in the same way as for the
condition features.
7. To classify objects that have never
been seen before, rules generated from
a training set are used. These rules
represent the actual classifier. This
classifier is used to predict the
classes to which new objects belong.

Image Source:
https://www.neuraldesigner.com/images/breast-cancer-neural-network.png
© Edunet Foundation. All rights reserved.
Rough Set theory

Decision rules

8. The nearest matching rule is the one whose condition part differs
   from the feature vector of the object being classified by the
   minimum number of features.
9. When there is more than one matching rule, a voting mechanism is used
   to choose the decision value. Every matched rule contributes votes
   to its decision value, equal to the number of objects matched by the
   rule.
Rough Set theory

Decision rules

10. The votes are added and the decision with the largest number of
    votes is chosen as the correct class.
11. Quality measures associated with decision rules can be used to
    eliminate some of the decision rules.

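The nearest-match and voting steps described above can be sketched as follows. The rule set `RULES`, the attribute names, and the support counts are hypothetical; each rule is assumed to carry the number of training objects it matched, which is used as its vote weight.

```python
from collections import Counter

# Hypothetical rules: (condition part, decision, number of training objects matched).
RULES = [
    ({"income": "low", "debt": "high"}, "default", 12),
    ({"income": "low"}, "no_default", 5),
    ({"income": "high", "debt": "low"}, "no_default", 20),
    ({"debt": "high"}, "default", 4),
]


def mismatches(conditions, obj):
    """Number of condition features on which the rule and the object differ."""
    return sum(1 for attr, value in conditions.items() if obj.get(attr) != value)


def classify(obj, rules):
    """Vote among the rules nearest to obj; exact matches have distance 0."""
    distances = [mismatches(conditions, obj) for conditions, _, _ in rules]
    best = min(distances)
    votes = Counter()
    for (_, decision, support), distance in zip(rules, distances):
        if distance == best:
            votes[decision] += support  # each rule votes with its support
    return votes.most_common(1)[0][0]


print(classify({"income": "low", "debt": "high"}, RULES))    # default
print(classify({"income": "medium", "debt": "low"}, RULES))  # no_default
```

For the first object three rules match exactly, and "default" wins by 16 votes to 5. The second object matches no rule exactly, so the rules that differ in only one feature vote instead.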
Rough Set theory

Rough Sets Data Analysis Techniques

● The preprocessing stage includes tasks such as data cleaning,
  completeness and correctness checks, attribute creation, attribute
  selection, and discretization.
● The processing stage includes the generation of preliminary
  knowledge, such as computation of object reducts from data,
  derivation of rules from reducts, and classification processes.
Rough Set theory

Preprocessing stage

● To analyze data successfully with rough sets, a decision table must
  be created.
● This is done through data preparation.
● The data preparation task includes data conversion, data cleansing,
  data completion checks, conditional attribute creation, decision
  attribute generation, discretization of attributes, and splitting the
  data into analysis and validation subsets.
Rough Set theory

Data completion and discretization of continuous-valued attributes

● Discretization is a data transformation procedure that involves
  finding cuts in the data set which divide the data into intervals.
● Values lying within an interval are then mapped to the same value.

Rough Set theory

Data completion and discretization of continuous-valued attributes

● This process reduces the size of the attribute value set and ensures
  that the rules that are mined are not too specific.

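A minimal sketch of discretization with fixed cuts is shown below. The cut points in `AGE_CUTS` and the sample ages are hypothetical; in practice the cuts would be chosen by a discretization algorithm rather than by hand.

```python
from bisect import bisect_right


def discretize(value, cuts):
    """Map a continuous value to the index of its interval.

    cuts must be sorted; a value equal to a cut falls in the interval above it.
    """
    return bisect_right(cuts, value)


# Hypothetical cuts: intervals are (-inf, 30), [30, 50) and [50, +inf).
AGE_CUTS = [30, 50]
ages = [22, 29.5, 30, 41, 50, 67]

codes = [discretize(age, AGE_CUTS) for age in ages]
print(codes)  # [0, 0, 1, 1, 2, 2]
```

Six distinct ages collapse into three interval codes, which is the reduction of the attribute value set that keeps mined rules from becoming too specific.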
Rough Set theory

Processing stage

● The processing stage includes generating preliminary knowledge, such
  as computation of object reducts from data, derivation of rules from
  reducts, and classification processes.
● These stages lead towards the final goal of generating rules from an
  information or decision system.

Rough Set theory

Rule generation and classification

● The generated reducts are used to generate decision rules.
● The left side of a decision rule is a combination of attribute values
  such that the set of (almost) all objects matching this combination
  have the decision value given on the rule's right side.

Rough Set theory

Rule generation and classification

● The rules derived from reducts can be used to classify the data.
● The set of rules is referred to as a classifier and can be used to
  classify new and unseen data.

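Rule generation from a reduct can be sketched as below. The decision table `DECISION_TABLE`, the reduct `("a", "b")`, and the `support` and `accuracy` fields are illustrative assumptions; the majority decision is used for the right side so that a rule may cover "almost all" matching objects rather than all of them.

```python
from collections import Counter, defaultdict

# Hypothetical decision table: condition attributes a, b and decision d.
DECISION_TABLE = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
]


def generate_rules(rows, reduct, decision="d"):
    """Build one rule per combination of reduct attribute values."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in reduct)
        groups[key][row[decision]] += 1

    rules = []
    for values, counts in groups.items():
        label, hits = counts.most_common(1)[0]  # majority decision
        total = sum(counts.values())
        rules.append({
            "if": dict(zip(reduct, values)),    # left side: condition values
            "then": label,                      # right side: decision value
            "support": total,                   # objects matching the left side
            "accuracy": hits / total,           # share that also match the right side
        })
    return rules


rules = generate_rules(DECISION_TABLE, ("a", "b"))
for rule in rules:
    print(rule)
```

The rule for `a = 1, b = 1` covers three objects, two of which have the decision "yes", so its accuracy is 2/3. Quality measures of this kind are what allow weak rules to be filtered out.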
Data Visualization

Introduction

● Data visualization is used to communicate information clearly and
  efficiently to users through information graphics such as tables and
  charts.
● It helps users analyze large amounts of data in a simpler way. It
  makes complex data more accessible, understandable, and usable.
Image Source:
https://previews.customer.envatousercontent.com/h264-video-previews/81d7b3f3-
Data Visualization

What makes Data Visualization Effective?

● Effective data visualizations are created where communication, data
  science, and design meet.

Image Source:
https://static.javatpoint.com/tutorial/tableau/images/data-visualization.png
Data Visualization

Importance of Data Visualization

● Data visualization can identify areas that need improvement or
  modification.
● Data visualization can clarify which factors influence customer
  behavior.
● Data visualization helps you understand which products to place
  where.
● Data visualization can help predict sales volumes.
Data Visualization

Why Use Data Visualization?

● To make data easier to understand and remember.
● To discover unknown facts, outliers, and trends.
● To visualize relationships and patterns quickly.
● To ask better questions and make better decisions.
● To perform competitive analysis.
● To improve insights.
Data Visualization

Data Visualization Tools

● IBM Cognos
● Tableau
● Infogram
● Chartblocks
● Datawrapper
● Plotly
● Visual.ly, and others

Data Visualization

Data Visualization Steps/Process

1. Develop your research question
2. Get or create your data
3. Clean your data
4. Choose a chart type
5. Choose your tool
6. Prepare data
7. Create report graph

Data Visualization

1. Develop your research question

1. It is important to have a clear understanding of the goal of your
   research.
2. This will determine what sort of data is needed, the type of
   analysis necessary, and the types of visualizations that would be
   most effective to communicate your explorations or findings.
Data Visualization

2. Get or create your data

● Obtain access to a large collection of numerical, statistical, and
  geospatial data. There is also a great wealth of open data freely
  available for download on the web.

Data Visualization

2. Get or create your data

● To collect your own data, seek advice and technical assistance with
  the design, creation, and dissemination of surveys, for example using
  the Qualtrics web survey platform.

Data Visualization

3. Clean your data

● Removing unnecessary variables
● Deleting duplicate rows/observations
● Addressing outliers or invalid data
● Dealing with missing values
● Standardizing or categorizing values
● Correcting typographical errors

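Several of the cleaning tasks above can be sketched in plain Python. The records in `RAW`, the column names, and the specific choices made here (dropping rows with missing sales, treating negative sales as invalid) are assumptions for illustration; a real project might impute missing values instead of dropping them.

```python
# Hypothetical raw records with typical quality problems.
RAW = [
    {"id": 1, "region": " north ", "sales": "120", "note": "x"},
    {"id": 1, "region": " north ", "sales": "120", "note": "x"},  # duplicate row
    {"id": 2, "region": "South", "sales": "", "note": "y"},       # missing value
    {"id": 3, "region": "NORTH", "sales": "-5", "note": "z"},     # invalid value
    {"id": 4, "region": "south", "sales": "80", "note": ""},
]


def clean(rows):
    """Return cleaned copies of rows; the input list is left unchanged."""
    cleaned, seen = [], set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:                   # delete duplicate rows
            continue
        seen.add(key)
        if row["sales"].strip() == "":    # deal with missing values (here: drop)
            continue
        sales = float(row["sales"])
        if sales < 0:                     # address invalid data
            continue
        cleaned.append({
            "id": row["id"],
            "region": row["region"].strip().lower(),  # standardize values
            "sales": sales,
            # the "note" column is dropped as an unnecessary variable
        })
    return cleaned


result = clean(RAW)
print(result)
```

Only the rows with `id` 1 and 4 survive, with `region` normalized to lowercase and `sales` converted to a number.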
Data Visualization

4. Choose a chart type

● Are you showing how variables compare to each other?
● Are you showing relationships between variables?
● Are you showing patterns in the data?
● Are you showing how the whole dataset can be broken down into smaller
  parts?

Data Visualization

5. Choose your tool

● Tableau
● Excel
● Google Sheets
● Python
● R
● Gephi

Data Visualization

6. Prepare data

Typical data preparation tasks include:

● Format columns appropriately (numbers are treated as numbers, dates
  as dates)
● Convert values into appropriate units
● Filter your data to focus on the specific data that interests you

Data Visualization

6. Prepare data

● Group data and create aggregate values for groups (Counts, Min, Max,
  Mean, Median, Mode)
● Extract values from complex columns
● Combine variables to create new columns

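Grouping and aggregation can be sketched with the Python standard library as follows. The `SALES` records and the region names are hypothetical; spreadsheet pivot tables or a data-frame library would do the same job on larger data.

```python
from collections import defaultdict
from statistics import mean, median, mode

# Hypothetical (region, sales) records.
SALES = [
    ("north", 120), ("north", 80), ("north", 80), ("north", 80),
    ("south", 50), ("south", 70), ("south", 70),
]


def summarize(records):
    """Group values by key and compute the aggregates listed above."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {
        group: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
            "mode": mode(values),
        }
        for group, values in groups.items()
    }


summary = summarize(SALES)
for region, stats in summary.items():
    print(region, stats)
```

For the "north" group this gives a count of 4, a minimum of 80, a maximum of 120, a mean of 90, and a median and mode of 80.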
Data Visualization

7. Create report graph

1. Import data into the software.
2. Select the chart type you wish to create.
3. Evaluate the effectiveness of the chart.
4. Refine by applying design principles. The way in which you design
   your chart can have a big impact on the effectiveness of the chart.
   Consider these design principles.
Odds Ratio

Introduction

● An odds ratio (OR) is a statistic that quantifies the strength of the
  association between two events, A and B.
● The odds ratio compares two probabilities (or proportions), P1 and
  P2.

Image Source:
http://hihg.med.miami.edu/code/http/modules/education/Design/images/Slide4050
Odds Ratio

Introduction

● The odds ratio is defined as the ratio of the odds of A in the
  presence of B to the odds of A in the absence of B,
● or equivalently (due to symmetry), the ratio of the odds of B in the
  presence of A to the odds of B in the absence of A.

Odds Ratio

Introduction

● Two events are independent if and only if the OR equals 1, i.e., the
  odds of one event are the same in either the presence or absence of
  the other event.

Odds Ratio

Introduction

● If the OR is greater than 1, then A and B are associated (correlated)
  in the sense that, compared to the absence of B, the presence of B
  raises the odds of A, and symmetrically the presence of A raises the
  odds of B.

Odds Ratio

Introduction

● Conversely, if the OR is less than 1, then A and B are negatively
  correlated, and the presence of one event reduces the odds of the
  other event.
● Note that the odds ratio is symmetric in the two events, and there is
  no causal direction implied.

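The definition, the symmetry property, and the independence case can be checked with a short sketch. The function name `odds_ratio` and the counts are hypothetical; the input is a 2x2 table of joint counts for events A and B, and all four counts are assumed to be non-zero.

```python
def odds_ratio(n_ab, n_a_only, n_b_only, n_neither):
    """Odds ratio from a 2x2 table of counts.

    n_ab:      A and B both occur
    n_a_only:  A occurs, B does not
    n_b_only:  B occurs, A does not
    n_neither: neither occurs
    Assumes every count is non-zero (otherwise a division by zero occurs).
    """
    odds_a_given_b = n_ab / n_b_only            # odds of A in the presence of B
    odds_a_given_not_b = n_a_only / n_neither   # odds of A in the absence of B
    return odds_a_given_b / odds_a_given_not_b


# Hypothetical counts where B raises the odds of A.
or_ab = odds_ratio(20, 10, 30, 40)

# Swapping the roles of A and B swaps the two off-diagonal counts.
or_ba = odds_ratio(20, 30, 10, 40)

# Hypothetical counts where A and B are independent.
or_independent = odds_ratio(10, 20, 30, 60)

print(or_ab, or_ba, or_independent)
```

Here `or_ab` and `or_ba` both equal 8/3 (about 2.67), illustrating the symmetry, while the independent table gives an odds ratio of 1.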
Odds Ratio
Example: finding the odds of customer default with low income versus
high income.

● The odds ratio is defined by the following formula:
  OR = [Px / (1-Px)] / [Py / (1-Py)]
● Here Px is the probability of default with low income and (1-Px) is
  the probability of non-default with low income.

Image
Source:https://miro.medium.com/max/162/1*aGAfFHFLs9XrUozRa3ejWg.png
Odds Ratio
Example: finding the odds of customer default with low income versus
high income.

● Py is the probability of default with high income and (1-Py) is the
  probability of non-default with high income.
● Thus, the odds ratio gives the odds of a customer defaulting with low
  income relative to the odds of a customer defaulting with high
  income.

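The default example can be worked through numerically. The probabilities below are hypothetical values chosen only for illustration, and the function names are not from any library.

```python
def odds(p):
    """Convert a probability (strictly between 0 and 1) into odds."""
    return p / (1 - p)


def odds_ratio_from_probs(px, py):
    """Odds of default with low income relative to odds with high income."""
    return odds(px) / odds(py)


px = 0.20  # hypothetical probability of default with low income
py = 0.05  # hypothetical probability of default with high income

default_or = odds_ratio_from_probs(px, py)
print(round(default_or, 2))  # 4.75
```

With these assumed values the odds of default are 0.25 for low-income customers and about 0.053 for high-income customers, so the odds of default are 4.75 times higher in the low-income group.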