WEEK 3 - Central-Tendency-Variation-And-Shape


STATISTIKA LINGKUNGAN (ENVIRONMENTAL STATISTICS) - TL2102

CENTRAL TENDENCY,
VARIATIONS, AND SHAPE
MINDRIANY SYAFILA
ENVIRONMENTAL ENGINEERING STUDY PROGRAM
FACULTY OF CIVIL AND ENVIRONMENTAL ENGINEERING
INSTITUT TEKNOLOGI BANDUNG
MEASURES OF CENTER

CENTRAL TENDENCY
A single value
• that indicates the center of the distribution
• that serves as the representative for the set of data

MEAN
◉ add up all the numbers and divide by the number of numbers
◉ used to summarize interval or ratio data in situations when the distribution is symmetrical and unimodal

MEDIAN
◉ place all the numbers in order and select the middle number
◉ used to summarize ordinal or highly skewed interval or ratio scores

MODE
◉ is the number which appears most often
◉ can be determined for data measured on any scale of measurement: nominal, ordinal, interval, or ratio
◉ a data set can have one, more than one, or no mode
   Bimodal: two data values occur with the same greatest frequency
   Multimodal: more than two data values occur with the same greatest frequency
   No Mode: no data value is repeated
ARITHMETIC MEAN (MEAN)
NOTATIONS
Σ denotes the sum of a set of values.
x is the variable usually used to represent the individual data values.
n represents the number of values in a sample.
N represents the number of values in a population.

TO FIND THE MEAN
• Compute the sum of all the data values, Σx
• Divide the total by the number of data values: Σx / n for a sample, Σx / N for a population

MEDIAN
• the score at the 50th percentile (in the middle);
• used to summarize ordinal or highly skewed interval or ratio scores;
• when data are normally distributed, the median is the same score as the mode.

TO FIND THE MEDIAN
Order the data from smallest to largest.
For an odd number of data values in the distribution:
Median = Middle data value
For an even number of data values in the distribution:
Median = (Sum of middle two values)/2

Merits of Median
• can be used in case of frequency distribution with open-end classes.
• is not affected by extreme observations.
• can be determined graphically, whereas the value of the mean cannot be.
• is easy to calculate and understand.

Demerits of Median
• it is necessary to arrange the data in some order, ascending or descending.
• is a positional average; its value is not determined by all the observations in the series.
• is not capable of further algebraic calculations.
• the sampling stability of the median is less as compared to the mean.

MODE
[Figure: example distributions labeled UNIMODAL, BIMODAL, MULTIMODAL, and NO MODE]

Merits of Mode:
• is easy to calculate and simple to understand.
• is not affected by the extreme values.
• can be determined graphically.
• can be determined in case of open-end class interval.

Demerits of Mode:
• is not suitable for further mathematical treatments.
• cannot always be determined.
• is not based on each and every item of the series.
• is not rigidly defined.

RANGE
is the difference between the smallest and largest numbers

MIDRANGE
the value midway between the maximum and minimum values in the original data set
Midrange = (maximum value + minimum value) / 2

EXAMPLE: UNGROUPED DATA
SET OF DATA
5, 6, 2, 4, 7, 8, 3, 5, 6, 6

MEAN
mean = (5+6+2+4+7+8+3+5+6+6)/10 = 52/10 = 5.2

MEDIAN
place all the numbers in order
2, 3, 4, 5, 5, 6, 6, 6, 7, 8
As there are two middle numbers, 5 and 6,
median = (5+6)/2 = 5.5

MODE
From the list above 6 appears more than any other number, so
mode = 6

RANGE
the difference between the smallest and largest numbers,
range = 8 – 2 = 6

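The ungrouped example can be checked with a short Python sketch. It uses only the standard-library `statistics` module and the data set from the slide; the variable names are illustrative and not part of the lecture.

```python
import statistics

# Data set from the ungrouped example
data = [5, 6, 2, 4, 7, 8, 3, 5, 6, 6]

mean = statistics.mean(data)            # sum of the values divided by their count
median = statistics.median(data)        # middle value, or mean of the two middle values
modes = statistics.multimode(data)      # every value tied for the greatest frequency
data_range = max(data) - min(data)      # largest value minus smallest value
midrange = (max(data) + min(data)) / 2  # value midway between maximum and minimum

print(mean, median, modes, data_range, midrange)
# prints: 5.2 5.5 [6] 6 5.0
```

`multimode` returns a list, so a bimodal or multimodal data set shows up as a list with more than one entry.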
CENTER VALUE OF GROUPED DATA
MEAN
$\bar{x} = \dfrac{\sum f x}{\sum f}$
• x = class midpoint
• f = frequency
• Σf = n

MEDIAN
$\text{Estimated median} = L + \dfrac{(n/2) - B}{G} \times w$
• L is the lower class boundary of the group containing the median
• n is the total number of values
• B is the cumulative frequency of the groups before the median group
• G is the frequency of the median group
• w is the group width

MODE
$\text{Estimated mode} = L + \dfrac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$
• L is the lower class boundary of the modal group
• fm-1 is the frequency of the group before the modal group
• fm is the frequency of the modal group
• fm+1 is the frequency of the group after the modal group
• w is the group width

EXAMPLE: GROUPED DATA
Class boundary   Class        Frequency (f)   Cumulative freq.   Midpoint (x)   (f).(x)
474.5            475 - 504    57              57                 489.5          27901.5
503.5            >504 - 533   64              121                518.5          33184      (modal group)
532.5            >533 - 562   55              176                547.5          30112.5    (median group)
561.5            >562 - 591   51              227                576.5          29401.5
590.5            >591 - 620   43              270                605.5          26036.5
619.5            >620 - 649   18              288                634.5          11421
648.5            >649 - 678   12              300                663.5          7962
677.5            >678 - 707   6               306                692.5          4155
706.5            >707 - 736   4               310                721.5          2886
Σ                             310                                               173060

Estimated MEDIAN
L = 532.5, n = 310, B = 121, G = 55, w = 29
= 532.5 + [(310/2 - 121)/55] x 29 = 550.43

MEAN
= 173060/310 = 558.26

Estimated MODE
L = 503.5, fm-1 = 57, fm = 64, fm+1 = 55, w = 29
= 503.5 + [(64 - 57)/((64 - 57) + (64 - 55))] x 29 = 516.19

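A minimal Python sketch of the grouped-data estimates, using the frequency table from the example. The function names are hypothetical, and the median and mode follow the usual interpolation formulas, L + [(n/2 - B)/G] w and L + [(fm - fm-1)/((fm - fm-1) + (fm - fm+1))] w, with the symbols as defined on the slide.

```python
# Frequency table from the grouped-data example (class width w = 29)
lower_boundaries = [474.5, 503.5, 532.5, 561.5, 590.5, 619.5, 648.5, 677.5, 706.5]
midpoints = [489.5, 518.5, 547.5, 576.5, 605.5, 634.5, 663.5, 692.5, 721.5]
frequencies = [57, 64, 55, 51, 43, 18, 12, 6, 4]
width = 29

def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

def grouped_median(lowers, freqs, width):
    """Estimated median: L + ((n/2 - B) / G) * w."""
    n = sum(freqs)
    cumulative = 0  # B: cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if cumulative + f >= n / 2:  # first class whose cumulative count reaches n/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

def grouped_mode(lowers, freqs, width):
    """Estimated mode from the modal class and its two neighbours."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lowers[m] + d1 / (d1 + d2) * width  # assumes d1 + d2 > 0

mean = grouped_mean(midpoints, frequencies)
median = grouped_median(lower_boundaries, frequencies, width)
mode = grouped_mode(lower_boundaries, frequencies, width)
print(round(mean, 2), round(median, 2), round(mode, 2))
# prints: 558.26 550.43 516.19
```

The midpoints are passed in explicitly because the table computes them from the class limits (for example (475 + 504)/2 = 489.5) rather than from the class boundaries.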
CENTRAL TENDENCY AND
THE SHAPE OF
DISTRIBUTION
Symmetric
if the left half of its histogram is roughly a mirror of its right half.

Skewed
if it is not symmetric and if it extends more to one side than the other.

Measurement scale   Measures can be used   Best measure
Nominal             Mode                   Mode
Ordinal             Mode, Median           Median
Interval            Mode, Median, Mean     Symmetrical data: Mean; Skewed data: Median
Ratio               Mode, Median, Mean     Symmetrical data: Mean; Skewed data: Median
SYMMETRIC - SKEWNESS
[Figure: three distribution curves with the positions of mean, median, and mode marked]
Skewed to the Left (negatively): Mean < Median < Mode
Symmetric: Mode = Median = Mean
Skewed to the Right (positively): Mode < Median < Mean
MEASURES OF VARIATIONS

VARIABILITY
• to obtain a measure of how spread out the scores are in a distribution
• usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores
• serves as a descriptive and inferential statistic
   Descriptive: measures the degree to which the scores are spread out or clustered together in a distribution
   Inferential: provides a measure of how accurately any individual score or sample represents the entire population
• population variability could be small or large
   Small: all of the scores are clustered close together, so any individual score will necessarily provide a good representation of the entire set
   Large: scores are widely spread, so it is easy for one or two extreme scores to give a distorted picture of the general population
MEASURING VARIABILITY
is determined by measuring distance

THE RANGE
• is the total distance covered by the distribution, from the highest score to the lowest score
• tells the number of measurement categories.
• can be defined as the difference between the largest score and the smallest score

STANDARD DEVIATION
measures the standard (average) distance between a score and the mean
1. Compute the deviation (distance from the mean) for each score
2. Square each of the deviations
3. Compute the mean of the squared deviations (the variance)
4. Take the square root of the variance to obtain the standard deviation

THE SAMPLE VARIANCE AND SAMPLE STANDARD DEVIATION
VALUE: DESCRIPTION

$x$: Data value or outcome

Mean, $\bar{x} = \dfrac{\sum x}{n}$: the average of the data values

$x - \bar{x}$: the difference between what happened and what was expected to happen

$\sum (x - \bar{x})^2$: the sum of squares

$\sum x^2 - \dfrac{(\sum x)^2}{n}$: algebraic simplification of the sum of squares

Sample variance:
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ (defining formula)
$s^2 = \dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}$ (computation formula)

Sample standard deviation:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ (defining formula)
$s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$ (computation formula)

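The defining and computation formulas for the sample variance give the same number, which a short Python sketch can confirm. The function names are hypothetical, and the data set is the one from the ungrouped example earlier in these slides, reused here as an assumption for illustration.

```python
import math
import statistics

def sample_variance_defining(data):
    """Defining formula: sum of squared deviations from the mean, over n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_computation(data):
    """Computation formula: (sum of x^2 - (sum of x)^2 / n) over n - 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [5, 6, 2, 4, 7, 8, 3, 5, 6, 6]
s2_defining = sample_variance_defining(data)
s2_computation = sample_variance_computation(data)
s = math.sqrt(s2_defining)  # sample standard deviation

# For this data: sum of x = 52, sum of x^2 = 300, sum of squares = 29.6
print(round(s2_defining, 4), round(s2_computation, 4), round(s, 4))
```

Both results can also be compared against `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.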
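POPULATION PARAMETERS
Population mean: $\mu = \dfrac{\sum x}{N}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

STANDARD DEVIATION FROM A FREQUENCY TABLE
$s = \sqrt{\dfrac{n \sum (f \cdot x^2) - \left[\sum (f \cdot x)\right]^2}{n(n - 1)}}$
the x values → class midpoints
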
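A sketch of the standard deviation computed from a frequency table, treating every value in a class as equal to the class midpoint. The shortcut formula used here, s = sqrt([n Σ(f·x²) - (Σ(f·x))²] / [n(n - 1)]), is the form commonly given in introductory textbooks and is an assumption, since the slide's own formula is not reproduced in this text. The table is the grouped-data example from earlier in these slides; the function name is hypothetical.

```python
import math
import statistics

# Class midpoints and frequencies from the grouped-data example
midpoints = [489.5, 518.5, 547.5, 576.5, 605.5, 634.5, 663.5, 692.5, 721.5]
frequencies = [57, 64, 55, 51, 43, 18, 12, 6, 4]

def grouped_sample_std(midpoints, freqs):
    """Sample standard deviation from a frequency table (x = class midpoints)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

s = grouped_sample_std(midpoints, frequencies)

# Cross-check: repeat each midpoint f times and use the ordinary sample formula
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]
print(s, statistics.stdev(expanded))
```

The cross-check shows that the frequency-table formula is just the ordinary sample standard deviation applied to the midpoints, each counted f times.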
MEASURES OF LOCATIONS

QUARTILES
The quartiles divide the data set into four equal parts.

01 Data is placed in numerical order, lowest to highest
02 Measure of relative position
03 Data is divided to four equal parts
04 Evaluate quartiles

Median of lower half → Q1 = P25
Median → Q2 = P50
Median of above half → Q3 = P75

For grouped data:
$Q_i = l_1 + \dfrac{(i \cdot n / 4) - c}{f} \times (l_2 - l_1)$
l1 = lower limit of ith quartile class
l2 = upper limit of ith quartile class
c = cumulative frequency of the class preceding the ith quartile class
f = frequency of ith quartile class
QUARTILES: EXAMPLE – UNGROUPED DATA
SET DATA 1:
590, 654, 493, 649, 594, 579, 567
Arrange data in ascending form; n = 7, an odd number
493, 567, 579, 590, 594, 649, 654
q1 = (1/4) x n = (1/4) x 7 = 1.75 → (2) → Q1 = 567
q2 = (2/4) x n = (2/4) x 7 = 3.5 → (4) → Q2 = 590
q3 = (3/4) x n = (3/4) x 7 = 5.25 → (6) → Q3 = 649

SET DATA 2:
590, 654, 493, 649, 594, 579, 567, 478
Arrange data in ascending form; n = 8, an even number
478, 493, 567, 579, 590, 594, 649, 654
q1 = (1/4) x n = (1/4) x 8 = 2
q1 = mean of (2) and (3) → Q1 = (493 + 567)/2 = 530
q2 = (2/4) x n = (2/4) x 8 = 4
q2 = mean of (4) and (5) → Q2 = 584.5
q3 = (3/4) x n = (3/4) x 8 = 6
q3 = mean of (6) and (7) → Q3 = 621.5
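The positional rule used in the ungrouped examples can be sketched in Python. As read from the worked numbers, the position is (k/100) x n; a fractional position is rounded up to the next whole number, and a whole-number position takes the mean of that value and the next one. The function name `positional_percentile` is hypothetical, and k is assumed to be a whole number so the position can be computed exactly with integer arithmetic.

```python
def positional_percentile(data, k):
    """k-th percentile (whole number, 0 < k < 100) by the positional rule.

    Quartiles are k = 25, 50, 75; deciles are k = 10, 20, ..., 90.
    """
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    # Position k*n/100 split into whole part and remainder, with no rounding error
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # Whole-number position: mean of that value and the next (1-based positions)
        return (values[whole - 1] + values[whole]) / 2
    # Fractional position: round up, i.e. take the value at position whole + 1
    return values[whole]

set_1 = [590, 654, 493, 649, 594, 579, 567]
set_2 = [590, 654, 493, 649, 594, 579, 567, 478]

print([positional_percentile(set_1, k) for k in (25, 50, 75)])
# prints: [567, 590, 649]
print([positional_percentile(set_2, k) for k in (25, 50, 75)])
# prints: [530.0, 584.5, 621.5]
```

For data set 2 the first quartile position is exactly 2, so the rule takes the mean of the 2nd and 3rd ordered values, (493 + 567)/2 = 530. Other textbooks and software packages use different interpolation conventions, so their quartiles may differ slightly from these.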
QUARTILES: EXAMPLE – GROUPED DATA
Class         Frequency (f)   Cumulative freq.   Class boundaries
475 - 504     57              57                 474.5
>504 - 533    64              121                503.5    (Q1 class)
>533 - 562    55              176                532.5    (Q2 class)
>562 - 591    51              227                561.5
>591 - 620    43              270                590.5    (Q3 class)
>620 - 649    18              288                619.5
>649 - 678    12              300                648.5
>678 - 707    6               306                677.5
>707 - 736    3               309                706.5

q1 = (1/4) x n = (1/4) x 309 = 77.25 → 78
Q1 = 503.5 + [(77.25-57)/64] 29 = 512.68

q2 = (2/4) x n = (1/2) x 309 = 154.5 → 155
Q2 = 532.5 + [(154.5-121)/55] 29 = 550.16

q3 = (3/4) x n = (3/4) x 309 = 231.75 → 232
Q3 = 590.5 + [(231.75-227)/43] 29 = 593.7

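The grouped-data interpolation can be sketched as one Python function that takes the fraction of the data lying below the wanted point (1/4, 2/4, 3/4 for quartiles). The function name is hypothetical; the table is the one from the example, with class width 29.

```python
# Lower class boundaries and frequencies from the grouped example (n = 309)
lower_boundaries = [474.5, 503.5, 532.5, 561.5, 590.5, 619.5, 648.5, 677.5, 706.5]
frequencies = [57, 64, 55, 51, 43, 18, 12, 6, 3]
width = 29

def grouped_quantile(lowers, freqs, width, fraction):
    """Interpolated quantile: l1 + ((fraction * n - c) / f) * width."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    target = fraction * sum(freqs)  # position of the quantile, e.g. n/4
    cumulative = 0                  # c: cumulative frequency before the class
    for lower, f in zip(lowers, freqs):
        if cumulative + f >= target:  # first class that reaches the position
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

q1, q2, q3 = (grouped_quantile(lower_boundaries, frequencies, width, i / 4)
              for i in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
# prints: 512.68 550.16 593.7
```

The same function covers deciles and percentiles by passing i/10 or i/100 as the fraction.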
DECILES
The deciles divide the data set into ten equal parts.
Each of the ten parts holds 10% of the data; the nine dividing points are D1, D2, D3, D4, D5, D6, D7, D8, D9.

For grouped data:
$D_i = l_1 + \dfrac{(i \cdot n / 10) - c}{f} \times (l_2 - l_1)$
l1 = lower limit of ith decile class
l2 = upper limit of ith decile class
c = cumulative frequency of the class preceding the ith decile class
f = frequency of ith decile class

DECILES: EXAMPLE – UNGROUPED DATA
SET DATA 1:
590, 654, 493, 649, 594, 579, 567
Arrange data in ascending form; n = 7, an odd number
493, 567, 579, 590, 594, 649, 654
d1 = (1/10) x n = (1/10) x 7 = 0.7 → (1) → D1 = 493
d5 = (5/10) x n = (5/10) x 7 = 3.5 → (4) → D5 = 590
d8 = (8/10) x n = (8/10) x 7 = 5.6 → (6) → D8 = 649

SET DATA 2:
590, 654, 493, 649, 594, 579, 567, 478
Arrange data in ascending form; n = 8, an even number
478, 493, 567, 579, 590, 594, 649, 654
d1 = (1/10) x n = (1/10) x 8 = 0.8 → (1) → D1 = 478
d5 = (5/10) x n = (1/2) x 8 = 4
d5 = mean of (4) and (5) → D5 = 584.5
d8 = (8/10) x n = (4/5) x 8 = 6.4 → (7) → D8 = 649
DECILES: EXAMPLE – GROUPED DATA
Class         Frequency (f)   Cumulative freq.   Class boundaries
475 - 504     57              57                 474.5    (D1 class)
>504 - 533    64              121                503.5
>533 - 562    55              176                532.5    (D5 class)
>562 - 591    51              227                561.5
>591 - 620    43              270                590.5    (D8 class)
>620 - 649    18              288                619.5
>649 - 678    12              300                648.5
>678 - 707    6               306                677.5
>707 - 736    3               309                706.5

d1 = (1/10) x n = (1/10) x 309 = 30.9 → 31
D1 = 474.5 + [(30.9-0)/57] 29 = 490.22

d5 = (5/10) x n = (1/2) x 309 = 154.5 → 155
D5 = 532.5 + [(154.5-121)/55] 29 = 550.16

d8 = (8/10) x n = (4/5) x 309 = 247.2 → 248
D8 = 590.5 + [(247.2-227)/43] 29 = 604.12

PERCENTILES
Percentiles divide the data set into one hundred equal parts

Percentile of score x = (number of scores less than x / total number of scores) • 100

01 Data is placed in numerical order, lowest to highest
02 Measure of relative position
03 Data is divided to 100 equal parts → 99 designations
04 Evaluate percentiles

For grouped data:
$P_i = l_1 + \dfrac{(i \cdot n / 100) - c}{f} \times (l_2 - l_1)$
l1 = lower limit of ith percentile class
l2 = upper limit of ith percentile class
c = cumulative frequency of the class preceding the ith percentile class
f = frequency of ith percentile class

PERCENTILES: EXAMPLE – UNGROUPED DATA
SET DATA 1:
590, 654, 493, 649, 594, 579, 567
Arrange data in ascending form; n = 7, an odd number
493, 567, 579, 590, 594, 649, 654
p8 = (8/100) x n = (8/100) x 7 = 0.56 → (1) → P8 = 493
p50 = (50/100) x n = (5/10) x 7 = 3.5 → (4) → P50 = 590
p85 = (85/100) x n = (85/100) x 7 = 5.95 → (6) → P85 = 649

SET DATA 2:
590, 654, 493, 649, 594, 579, 567, 478
Arrange data in ascending form; n = 8, an even number
478, 493, 567, 579, 590, 594, 649, 654
p8 = (8/100) x n = (8/100) x 8 = 0.64 → (1) → P8 = 478
p50 = (50/100) x n = (5/10) x 8 = 4
p50 = mean of (4) and (5) → P50 = 584.5
p85 = (85/100) x n = (85/100) x 8 = 6.8 → (7) → P85 = 649
PERCENTILES: EXAMPLE – GROUPED DATA
Find the percentile position of the score 570 in the data.

Class         Frequency (f)   Cumulative freq.   Class boundaries
475 - 504     57              57                 474.5
>504 - 533    64              121                503.5
>533 - 562    55              176                532.5
>562 - 591    51              227                561.5    (class containing 570)
>591 - 620    43              270                590.5
>620 - 649    18              288                619.5
>649 - 678    12              300                648.5
>678 - 707    6               306                677.5
>707 - 736    3               309                706.5

P = 570
a = 561.5 (lower boundary of the class containing 570)
fp = 51 (frequency of that class)
n1 = 176 (cumulative frequency of the preceding class)
C = 29 (class width)

570 = 561.5 + [(X-176)/51] 29
X = 190.95
p = (190.95/309) 100% = 61.8%

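Finding the percentile position of a given score in grouped data is the interpolation formula solved in the other direction. The sketch below is a minimal Python version under that assumption; the function name is hypothetical. It uses the frequency of the class that contains the score (51 for the class holding 570), as the definition of f requires, together with the cumulative frequency of the preceding class (176).

```python
# Lower class boundaries and frequencies from the grouped example (n = 309)
lower_boundaries = [474.5, 503.5, 532.5, 561.5, 590.5, 619.5, 648.5, 677.5, 706.5]
frequencies = [57, 64, 55, 51, 43, 18, 12, 6, 3]
width = 29

def grouped_percentile_rank(lowers, freqs, width, score):
    """Percentage of values estimated to lie below `score`."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lowers, freqs):
        if lower <= score < lower + width:
            # Interpolate linearly inside the class that contains the score
            below = cumulative + (score - lower) / width * f
            return below / n * 100
        cumulative += f
    raise ValueError("score lies outside the class boundaries")

rank = grouped_percentile_rank(lower_boundaries, frequencies, width, 570)
print(round(rank, 1))
# prints: 61.8
```

A quick sanity check on any such result: the estimated count below 570 (about 190.95) must lie between 176 and 227, the cumulative frequencies at the two ends of the class.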
QUARTILES, DECILES AND PERCENTILES
MERITS
• These positional values can be directly determined in case of open end class intervals.
• These positional values can be calculated easily in absence of some data.
• These are helpful in the calculation of measures of skewness.
• These are not affected very much by the extreme items.
• These can be located graphically.

DEMERITS
• These values are not easily understood by a common man.
• These values are not based on all the observations of a series.
• These values cannot be computed if items are not given in ascending or descending order.
• These values have less sampling stability.

OTHER MEASURES

Interquartile Range (or IQR): Q3 - Q1

Semi-interquartile Range: (Q3 - Q1) / 2

Midquartile: (Q1 + Q3) / 2

10 - 90 Percentile Range: P90 - P10

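The quartile-based measures are simple arithmetic on Q1 and Q3, sketched below in Python. The function name is hypothetical, and the input values Q1 = 567 and Q3 = 649 are taken from data set 1 of the ungrouped quartile example as an illustration.

```python
def quartile_measures(q1, q3):
    """Interquartile range, semi-interquartile range, and midquartile."""
    iqr = q3 - q1
    return {
        "iqr": iqr,                    # Q3 - Q1
        "semi_iqr": iqr / 2,           # (Q3 - Q1) / 2
        "midquartile": (q1 + q3) / 2,  # (Q1 + Q3) / 2
    }

measures = quartile_measures(567, 649)
print(measures)
# prints: {'iqr': 82, 'semi_iqr': 41.0, 'midquartile': 608.0}
```

The 10 - 90 percentile range is computed the same way, as a plain difference P90 - P10.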
z-SCORE
the number of standard deviations that a given value x is above or below the mean

Sample: $z = \dfrac{x - \bar{x}}{s}$

Population: $z = \dfrac{x - \mu}{\sigma}$

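A z-score can be computed directly from its definition, (x - mean) / standard deviation. The sketch below uses the sample form with the data set from the ungrouped example earlier in these slides, reused here as an assumption for illustration; the function name is hypothetical.

```python
import statistics

def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

data = [5, 6, 2, 4, 7, 8, 3, 5, 6, 6]
x_bar = statistics.mean(data)  # sample mean, 5.2
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)

z_max = z_score(max(data), x_bar, s)  # largest value, 8
z_min = z_score(min(data), x_bar, s)  # smallest value, 2
print(round(z_max, 2), round(z_min, 2))
```

The largest value, 8, sits about 1.54 standard deviations above the mean, and the smallest value, 2, sits below it, so its z-score is negative.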
EXPLORATORY DATA
ANALYSIS

EXPLORING

Measures of center: mean, median, and mode

Measures of variation: standard deviation and range

Measures of spread and relative location: minimum value, maximum value, and quartiles

Unusual values: outliers

Distribution: histograms, stem-leaf plots, and boxplots

BOXPLOTS
(Box-and-Whisker Diagram)
Reveals the:
• center of the data
• spread of the data
• distribution of the data
• presence of outliers
Excellent for comparing two or more data sets

5 - number summary
• Minimum
• first quartile Q1
• Median (Q2)
• third quartile Q3
• Maximum

OUTLIERS
• a value located very far away from almost all of the other values
• an extreme value
• can have a dramatic effect on the mean, standard deviation, and on the scale of the histogram so that the true nature of the distribution is totally obscured

[Figure: boxplot (box-and-whisker diagram)]

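The 5-number summary behind a boxplot can be sketched in Python. The quartiles here follow the positional rule of the ungrouped quartile examples (round a fractional position up; average two neighbours at a whole-number position). The outlier check uses fences at 1.5 x IQR beyond the quartiles, which is a common convention and an assumption here, not a rule stated in these slides. All function names are hypothetical.

```python
def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) of already sorted values, positional rule."""
    whole, remainder = divmod(i * len(values), 4)  # position i*n/4, kept exact
    if remainder == 0:
        return (values[whole - 1] + values[whole]) / 2
    return values[whole]

def five_number_summary(data):
    """Minimum, Q1, median (Q2), Q3, maximum."""
    values = sorted(data)
    return (values[0], quartile(values, 1), quartile(values, 2),
            quartile(values, 3), values[-1])

def outliers_by_iqr_fences(data, k=1.5):
    """Values beyond Q1 - k*IQR or Q3 + k*IQR (assumed convention, k = 1.5)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

set_1 = [590, 654, 493, 649, 594, 579, 567]
print(five_number_summary(set_1))
# prints: (493, 567, 590, 649, 654)
print(outliers_by_iqr_fences(set_1))
# prints: []
```

With a hypothetical extreme value appended, for example `set_1 + [1200]`, the fence check flags 1200 as an outlier while the median moves only slightly.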
THANK YOU