Mathematics For Data Mining
CSE402A
Data Mining
B. Tech. CSE, 2017
Course Leader:
Mohan Kumar K N
mohan.cs.et@msruas.ac.in
Faculty of Engineering & Technology © Ramaiah University of Applied Sciences
MEAN, MEDIAN, MODE IN DATA MINING
What is mean?
The mean is the average of a set of numbers.
Example:
3, 5, 6, 9, 8
Mean = sum of all values / total number of values
Mean = (3 + 5 + 6 + 9 + 8) / 5
Mean = 31 / 5 = 6.2
For a data set of 17 values that sum to 605:
Mean = 605 / 17
Mean ≈ 35.59
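The mean calculation for the values 3, 5, 6, 9, 8 can be sketched in a few lines of Python. The function name `mean` is only an illustrative choice and is not taken from the course material.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [3, 5, 6, 9, 8]  # example values from the text
print(mean(data))       # 31 / 5
```

Python's standard library offers the same calculation as `statistics.mean`.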
What is Median?
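The median is the middle value once all values are arranged in order. If the number of values is even, the median is the average of the two middle values.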
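As a minimal sketch, the median can be found by sorting the values and picking the middle one, or averaging the two middle ones when the count is even. The function name `median` is illustrative. The two data sets are reused from the mean example and from the even-length box plot example in these slides.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


print(median([3, 5, 6, 9, 8]))           # sorted: 3, 5, 6, 8, 9
print(median([8, 5, 2, 4, 8, 9, 5, 7]))  # sorted: 2, 4, 5, 5, 7, 8, 8, 9
```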
FINDING THE ESTIMATED MEAN, MEDIAN AND MODE FOR GROUPED DATA IN DATA MINING
How to calculate the estimated mean and estimated median of grouped data?
Estimated mean = 32
Median group = 31 to 35
Estimated Median = ?
Result: The median group is 31 to 35, and the estimated median of 33.4 falls inside that group, as expected.
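The frequency table behind this result is not reproduced in the text, so the sketch below uses a hypothetical table and will not give the 32 and 33.4 quoted above. It assumes the commonly used grouped-data formulas: estimated mean = sum(midpoint × frequency) / n, and estimated median = L + ((n/2 − B) / G) × w, where L is the lower boundary of the median group, B is the cumulative frequency before that group, G is the frequency of the group and w is its width. For integer groups such as 31 to 35, the boundaries are taken as 30.5 and 35.5. All names and numbers in the code are illustrative assumptions.

```python
def grouped_mean(groups):
    """Estimated mean: each group is represented by its midpoint."""
    n = sum(freq for _, _, freq in groups)
    if n == 0:
        raise ValueError("no observations")
    return sum((lower + upper) / 2 * freq for lower, upper, freq in groups) / n


def grouped_median(groups):
    """Estimated median by linear interpolation inside the median group.

    groups: (lower_boundary, upper_boundary, frequency) tuples in ascending order.
    """
    n = sum(freq for _, _, freq in groups)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # B: observations in the groups before the current one
    for lower, upper, freq in groups:
        if freq > 0 and cumulative + freq >= half:
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq


# Hypothetical table (NOT the one from the slides): groups 21-25, 26-30, 31-35, 36-40
table = [(20.5, 25.5, 2), (25.5, 30.5, 4), (30.5, 35.5, 8), (35.5, 40.5, 6)]
print(grouped_mean(table))
print(grouped_median(table))
```

With this hypothetical table the median group is 31 to 35, and the interpolated value lies inside it, which is the same sanity check the result above applies.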
WHAT ARE QUARTILES AND BOX PLOT IN DATA MINING
What are quartiles?
Quartiles are the values that divide a sorted data set into four equal groups.
Step 2:
To divide the data into four equal parts, we need three quartiles.
Q1: Lower quartile
Q2: Median of the data set
Q3: Upper quartile
Step 3:
Find the median of the data set and label it as Q2.
BOX PLOT IN DATA MINING
What is a box plot?
•A box plot draws the data as a box with whiskers and represents the five-number summary: the minimum value, Quartile 1 (Q1), the median, Quartile 3 (Q3) and the maximum value.
•The ends of the box are at Q1 and Q3, so the length of the box is the inter-quartile range (IQR).
•IQR = Q3 – Q1
•The median is marked by a line within the box.
•The rectangle therefore covers the second and third quarters of the data (the middle 50%), usually with a vertical line inside to indicate the median value. Whiskers are the lines drawn from the ends of the box out to the minimum and maximum values.
Draw the box plot for an odd-length data set.
Draw the box plot for an even-length data set.
Data = 8, 5, 2, 4, 8, 9, 5, 7
First, arrange the values in ascending order. The ♦ marks show where the three quartiles fall.
2, 4 ♦ 5, 5 ♦ 7, 8 ♦ 8, 9
Minimum: 2
Q1: (4 + 5) / 2 = 4.5 – Lower quartile
Q2: (5 + 7) / 2 = 6 – Middle quartile (median)
Q3: (8 + 8) / 2 = 8 – Upper quartile
Maximum: 9
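The five-number summary for this data set can be sketched in Python as follows. The sketch assumes the "median of each half" method used in the worked example: Q1 is the median of the lower half and Q3 the median of the upper half, with the middle value left out of both halves when the count is odd. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median (Q2), Q3 and maximum, using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the middle value is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "q2": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


summary = five_number_summary([8, 5, 2, 4, 8, 9, 5, 7])
print(summary)
print("IQR =", summary["q3"] - summary["q1"])
```

Other quartile conventions exist, for example the interpolation methods in statistical libraries, and they can give slightly different values for Q1 and Q3 on small data sets.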
HOW TO CALCULATE VARIANCE AND STANDARD DEVIATION OF DATA IN DATA MINING
What are variance and standard deviation?
The values in a data set are spread around the mean. Variance tells us how far the values are from the mean: it is the average of the squared differences from the mean.
Standard deviation is the square root of variance.
A low standard deviation tells us that the values lie close to the mean.
A high standard deviation tells us that the values are spread far from the mean.
Mean = 13.25
Variance = 28.81
Standard deviation = 5.37
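The data set behind the figures above is not given in the text, so the sketch below reuses the box plot data (8, 5, 2, 4, 8, 9, 5, 7) and produces different numbers. It assumes the population form of variance, dividing by n. The sample form divides by n − 1, and the slides do not say which one is intended. The function names are illustrative.

```python
import math


def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(values))


data = [8, 5, 2, 4, 8, 9, 5, 7]    # reused from the box plot example
print(sum(data) / len(data))       # mean
print(population_variance(data))   # variance
print(population_std(data))        # standard deviation
```

The same relationship holds for the figures quoted above: the square root of 28.81 is about 5.37.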
DATA SKEWNESS IN DATA MINING