
Mathematics for Data Mining

CSE402A
Data Mining
B. Tech. CSE, 2017

Course Leader:
Mohan Kumar K N
mohan.cs.et@msruas.ac.in

Faculty of Engineering & Technology © Ramaiah University of Applied Sciences
MEAN, MEDIAN, MODE IN DATA MINING

What is mean?
The mean is the average of a set of numbers: the sum of the values divided by the number of values.
Example:
3, 5, 6, 9, 8
Mean = sum of all values / total number of values
Mean = (3 + 5 + 6 + 9 + 8) / 5 = 31 / 5
Mean = 6.2
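
The arithmetic above can be checked with a few lines of Python. This is only a small sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

values = [3, 5, 6, 9, 8]

# Sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```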

How to calculate the mean for data with frequencies?

Mean = sum of (value × frequency) / sum of frequencies
Mean = 605 / 17
Mean ≈ 35.59
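
The frequency table for this slide is not reproduced in the text, so the sketch below uses a hypothetical table purely to show the method: multiply each value by its frequency, add the products, and divide by the total frequency. Only the final division 605 / 17 comes from the slide.

```python
def mean_with_frequencies(pairs):
    """Mean of data given as (value, frequency) pairs."""
    total = sum(value * freq for value, freq in pairs)
    count = sum(freq for _, freq in pairs)
    return total / count

# Hypothetical (value, frequency) pairs, NOT the table from the slide.
hypothetical_table = [(10, 2), (20, 3), (30, 5)]
hypothetical_mean = mean_with_frequencies(hypothetical_table)

# The division shown on the slide: sum of products / sum of frequencies.
slide_mean = 605 / 17

print(hypothetical_mean, round(slide_mean, 2))
```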

What is Median?
The median is the middle value once all values are arranged in order.

How to calculate median for odd number of values?


Example:
9, 8, 5, 6, 3
Arrange values in order
3, 5, 6, 8, 9
Median = 6

How to calculate median for even number of values?
Example:
9, 8, 5, 6, 3, 4
Arrange values in order
3, 4, 5, 6, 8, 9
Add the 2 middle values and calculate their mean.
Median = (5 + 6) / 2
Median = 5.5

What is Mode?
The mode is the most frequently occurring value.

How to calculate mode?
Example:
3, 6, 6, 8, 9
Mode = 6 (because 6 occurs 2 times and all other values occur only once)
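
The median and mode examples from this slide can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import median, mode

odd_values = [9, 8, 5, 6, 3]       # odd number of values
even_values = [9, 8, 5, 6, 3, 4]   # even number of values
mode_values = [3, 6, 6, 8, 9]

# median() sorts the data internally; for an even count it
# averages the two middle values.
odd_median = median(odd_values)
even_median = median(even_values)

# mode() returns the most frequently occurring value.
most_common = mode(mode_values)

print(odd_median, even_median, most_common)
```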

FINDING THE ESTIMATED MEAN, MEDIAN AND MODE FOR GROUPED DATA IN DATA MINING

How to calculate the estimated mean and estimated median of grouped data?

Estimated Mean = 665 / 21 = 31.67

Class intervals:
The groups 21 to 25, 26 to 30, 31 to 35 and 36 to 40 are the class intervals.

The mean is 31.67, which rounds to 32.

Estimated mean = 32
Median group = 31 to 35
Estimated Median = ?

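Estimated Median = L + ((TV / 2 – SBM) / FMG) × GW

where L is the lower boundary of the median group, TV is the total number of values, SBM is the sum of the frequencies before the median group, FMG is the frequency of the median group and GW is the group width.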
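
As a rough sketch, the grouped-median formula can be written as a small function. It reads the formula as L plus the fraction ((TV / 2 – SBM) / FMG) of the group width, which is the usual interpolation within the median group. The function name and all input values below are assumptions for illustration: only the total of 21 values appears on the slides, while the lower boundary of 30.5 and the frequencies 7 and 6 are hypothetical, because the frequency table itself is not reproduced in the text.

```python
def estimated_median(lower_boundary, total_values, sum_before_median,
                     freq_median_group, group_width):
    """Estimated median = L + ((TV / 2 - SBM) / FMG) * GW."""
    # How far into the median group the middle observation falls (0 to 1).
    fraction = (total_values / 2 - sum_before_median) / freq_median_group
    return lower_boundary + fraction * group_width

# Hypothetical inputs (assumed, not taken from the slide's table):
# L = 30.5, TV = 21, SBM = 7, FMG = 6, GW = 5
example = estimated_median(30.5, 21, 7, 6, 5)
print(round(example, 2))
```

With inputs like these the result stays inside the median group, which is a quick sanity check on any grouped-median calculation.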

Result: The median group is 31 to 35, and the estimated median of 33.4 does fall within that group.

How to calculate the estimated mode of the above grouped data?

Mode: The mode is the most frequently occurring value in the data.

WHAT ARE QUARTILES AND BOX PLOT IN DATA MINING

What is quartile?
Quartiles are the values that divide the ordered data into four equal groups.

How to find quartiles of odd length data set?


Example: Data = 8, 5, 2, 4, 8, 9, 5
Step 1:
Arrange the values in order.
After ordering the values: Data = 2, 4, 5, 5, 8, 8, 9

Step 2:
To divide this data into four equal parts,
we need three quartiles.
Q1: Lower quartile
Q2: Median of the data set
Q3: Upper quartile
Step 3:
Find the median of the data set and label it as Q2.
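
The steps above can be sketched in Python. The helper below is an illustrative assumption, not the slide's own procedure: it takes Q1 and Q3 as the medians of the lower and upper halves, leaving the middle value out of both halves when the data set has an odd length. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([8, 5, 2, 4, 8, 9, 5])
print(q1, q2, q3)
```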


How to find quartiles of even length data set?


Example: Data = 8, 5, 2, 4, 8, 9, 5, 7
Step 1:
Arrange the values in order.

After ordering the values: Data = 2, 4, 5, 5, 7, 8, 8, 9

Step 2:
To divide this data into four equal parts,
we need three quartiles.
Q1: Lower quartile
Q2: Median of the data set
Q3: Upper quartile

Step 3:
Find the median of the data set and label it as Q2.

BOX PLOT IN DATA MINING
What is box plot?
•A box plot is a box-shaped plot of the data that represents the five-number summary. The five-number summary is the minimum value, Quartile 1 (Q1), the median, Quartile 3 (Q3) and the maximum value.
•The ends of the box are at Q1 and Q3, so the length of the box is the inter-quartile range (IQR).
•IQR = Q3 – Q1
•The median is marked by a line within the box.
•A rectangle is drawn from Q1 to Q3 to represent the second and third quarters of the data, usually with a line inside to indicate the median value. The lowest and highest quarters are shown as lines on either side of the rectangle. These lines are the whiskers, and they extend to the minimum and maximum values.
Draw the box plot for the odd length data set?


Draw the box plot for the even length data set?
Data = 8, 5, 2, 4, 8, 9, 5, 7
First, arrange the values in order. The ♦ marks show where the quartiles fall.
2, 4 ♦ 5, 5 ♦ 7, 8 ♦ 8, 9
Minimum: 2
Q1: (4 + 5) / 2 = 4.5 – Lower quartile
Q2: (5 + 7) / 2 = 6 – Middle quartile (median)
Q3: (8 + 8) / 2 = 8 – Upper quartile
Maximum: 9
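
The five values needed for the box plot, together with the IQR, can be computed with a short helper. This is a sketch under the same median-of-halves assumption used in the worked example; the function name and dictionary keys are illustrative, and plotting libraries may use a different quartile convention.

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum and IQR (needs at least 2 values)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "max": ordered[-1],
        "iqr": q3 - q1,
    }

summary = five_number_summary([8, 5, 2, 4, 8, 9, 5, 7])
print(summary)
```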

HOW TO CALCULATE VARIANCE AND STANDARD DEVIATION OF DATA IN DATA MINING
What is data variance and standard deviation?
The values in a data set can be spread out around the mean. Variance tells us how far the values are from the mean: it is the average of the squared differences from the mean.
Standard deviation is the square root of variance.
A low standard deviation tells us that the values lie close to the mean.
A high standard deviation tells us that the values are spread far from the mean.

How to calculate variance and standard deviation of data?


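Marks = [8, 10, 15, 20]

Mean = (8 + 10 + 15 + 20) / 4 = 13.25
Variance = ((8 – 13.25)² + (10 – 13.25)² + (15 – 13.25)² + (20 – 13.25)²) / (4 – 1) = 86.75 / 3 ≈ 28.92 (sample variance, dividing by n – 1)
Standard deviation = √28.92 ≈ 5.38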
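
The calculation for the marks can be checked with Python's `statistics` module. The sketch below shows both conventions, since the result depends on the divisor: `variance` and `stdev` divide by n – 1 (sample), while `pvariance` and `pstdev` divide by n (population). For these marks, the sum of squared differences is 86.75, giving about 28.92 with n – 1 and about 21.69 with n.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

marks = [8, 10, 15, 20]

average = mean(marks)

# Sample statistics: divide the squared differences by n - 1.
sample_variance = variance(marks)
sample_sd = stdev(marks)

# Population statistics: divide the squared differences by n.
population_variance = pvariance(marks)
population_sd = pstdev(marks)

print(average, round(sample_variance, 2), round(sample_sd, 2))
print(round(population_variance, 2), round(population_sd, 2))
```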

DATA SKEWNESS IN DATA MINING

What is data skewness?


Data is called skewed when its values are not spread symmetrically around the median, so that one tail of the distribution is longer than the other.
Data can take any of the following shapes:
Symmetric: Mean, median and mode are at the same point.
Positively skewed: Most of the values are concentrated on the left and the tail extends to the right; the mean is typically greater than the median.
Negatively skewed: Most of the values are concentrated on the right and the tail extends to the left; the mean is typically less than the median.
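
One common way to put a number on the direction of skew, not given in the slides, is the moment coefficient of skewness: the average cubed deviation from the mean divided by the variance raised to the power 1.5. The sketch below assumes the population (divide by n) form, and the three data sets are hypothetical examples chosen only to show a positive, a negative and a zero result.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness (population form)."""
    n = len(data)
    centre = mean(data)
    second_moment = sum((x - centre) ** 2 for x in data) / n
    third_moment = sum((x - centre) ** 3 for x in data) / n
    return third_moment / second_moment ** 1.5

# Hypothetical data sets, for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 10]   # long tail to the right
left_tailed = [1, 8, 8, 8, 9, 9, 10]    # long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
print(skewness(symmetric))          # zero for symmetric data
```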
