02 - Descriptive Statistics
Mean
Also called the sample average or arithmetic mean.
Sensitive to extreme values: a single data point can change the sample mean greatly.
Add up the data, then divide by the sample size (n).
The sample size n is the number of observations.
The formula is:
$\bar{x} = \frac{\sum x}{n}$
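A minimal Python sketch of the mean formula, using a small hypothetical sample. The second call adds one assumed extreme value (100) to show how strongly a single data point pulls the mean.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by the sample size n."""
    n = len(values)  # sample size = number of observations
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n

data = [6, 3, 8, 5, 3]        # hypothetical sample
with_outlier = data + [100]   # same sample plus one extreme value

print(sample_mean(data))                     # 5.0
print(round(sample_mean(with_outlier), 2))   # 20.83
```

One extreme observation roughly quadruples the mean here, which is why the mean is described as sensitive to extreme values.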
Characteristics of the mean
Uniqueness: a data set has only one mean.
Simplicity: easy to calculate.
It is affected by extreme values (unlike the median).
Mode
The observation(s) that occur most frequently.
Less useful than the other measures of central tendency for describing data.
Can be used for continuous or ordinal data, and is sometimes used as the average for nominal data (the modal category).
There can be only one mode (unimodal distribution), two (bimodal distribution), or even more (multimodal distribution).
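A short sketch using Python's standard-library `statistics.multimode` (Python 3.8+), which returns every most-frequent value. The data sets are hypothetical and show a unimodal, a bimodal and a nominal (modal category) case.

```python
from statistics import multimode

unimodal = [2, 3, 3, 4, 5]
bimodal = [1, 2, 2, 3, 4, 4]
nominal = ["distal", "proximal", "distal", "both", "distal"]  # hypothetical categories

print(multimode(unimodal))  # [3]
print(multimode(bimodal))   # [2, 4]
print(multimode(nominal))   # ['distal']
```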
Characteristics of mode
Variance (for a sample)
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Steps:
Step 1: Compute the mean, x̄.
Step 2: Compute each deviation from the mean, (x − x̄).
Step 3: Square each deviation.
Step 4: Sum all the squares.
Step 5: Divide by the sample size minus one: n − 1.
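Worked example (n = 5)
x         (x − x̄)        (x − x̄)²
6            1              1
3           −2              4
8            3              9
5            0              0
3           −2              4
Σx = 25   Σ(x − x̄) = 0   Σ(x − x̄)² = 18

Step 1: $\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5$
Step 5: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{18}{4} = 4.5$
$s = \sqrt{s^2} = \sqrt{4.5} \approx 2.12$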
Standard deviation: the square root of the variance, $s = \sqrt{s^2}$.
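The five steps of the worked example can be sketched in Python as below; the data are the same five observations (6, 3, 8, 5, 3), and the function name is only illustrative.

```python
import math

def sample_variance(values):
    """Sample variance s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n                    # Step 1: the mean
    deviations = [x - mean for x in values]   # Step 2: each deviation
    squares = [d ** 2 for d in deviations]    # Step 3: square each deviation
    total = sum(squares)                      # Step 4: sum of squares (18 here)
    return total / (n - 1)                    # Step 5: divide by n - 1

data = [6, 3, 8, 5, 3]
s2 = sample_variance(data)
s = math.sqrt(s2)                             # standard deviation
print(s2)           # 4.5
print(round(s, 2))  # 2.12
```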
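Variance
Measures the amount of spread, or variability, of observations around their mean.
The sample variance ($s^2$) is essentially the average of the squared deviations from the sample mean, except that the sum is divided by $n - 1$ rather than $n$ (the population variance is denoted $\sigma^2$).
Rarely reported as a descriptive statistic, because its unit is the square of the data's unit and is difficult to interpret; the standard deviation, which is in the original unit, is reported instead.
Formula: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$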
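Python's standard-library `statistics` module provides both versions. This sketch contrasts the sample variance (divide by n − 1) with the population variance (divide by n) on the worked-example data.

```python
from statistics import pvariance, variance

data = [6, 3, 8, 5, 3]
print(variance(data))   # sample variance s^2: 18 / (5 - 1) = 4.5
print(pvariance(data))  # population variance sigma^2: 18 / 5 = 3.6
```

Dividing by n − 1 gives a slightly larger value, which compensates for measuring deviations from the sample mean rather than the unknown population mean.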
Interquartile range (IQR)
Formula: $\mathrm{IQR} = Q_3 - Q_1$
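A sketch of IQR = Q3 − Q1 using `statistics.quantiles`. Quartile definitions differ between methods and software packages (SPSS included), so the same data can give slightly different IQRs; the two methods below are the ones the Python module offers, and the data are hypothetical.

```python
from statistics import quantiles

def iqr(values, method="exclusive"):
    """Interquartile range Q3 - Q1 for the chosen quartile method."""
    q1, _median, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(iqr(data))                      # 5.0 (exclusive: Q1 = 2.5, Q3 = 7.5)
print(iqr(data, method="inclusive"))  # 4.0 (inclusive: Q1 = 3, Q3 = 7)
```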
The Coefficient of Variation
It is a measure of relative variation rather than absolute variation.
It expresses the standard deviation as a percentage of the mean.
The formula is: $CV = \frac{s}{\bar{x}} \times 100\%$
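A minimal sketch of the coefficient of variation, applied to the worked-example data (mean 5, s ≈ 2.12).

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = (s / mean) x 100: the sample SD as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

data = [6, 3, 8, 5, 3]
print(round(coefficient_of_variation(data), 1))  # 42.4
```

The same formula works from summary statistics: for blood pressure with mean 125 and SD 14 mm Hg, CV = 14 / 125 × 100 = 11.2%. Because the units cancel, CV can compare variability across variables measured on different scales.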
24 04/22/2020
Examples of Frequency Table_1
(SPSS output)
Gender distribution in a sample of 111 patients
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid  male          40        36.0        36.0               36.0
       female        71        64.0        64.0              100.0
       Total        111       100.0       100.0
stoneLocation
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid  proximal      46        41.4        41.4               41.4
       distal        62        55.9        55.9               97.3
       both           3         2.7         2.7              100.0
       Total        111       100.0       100.0
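The Percent and Cumulative Percent columns can be reproduced with a short sketch. The raw observations are rebuilt from the stoneLocation counts, and the sketch assumes there are no missing values, so Valid Percent equals Percent.

```python
from collections import Counter

def frequency_table(observations, order):
    """Return (category, frequency, percent, cumulative percent) rows and the total n."""
    counts = Counter(observations)
    n = sum(counts.values())
    rows, cumulative = [], 0
    for category in order:
        freq = counts[category]
        cumulative += freq
        percent = round(100 * freq / n, 1)
        cum_percent = round(100 * cumulative / n, 1)  # from raw counts, then rounded
        rows.append((category, freq, percent, cum_percent))
    return rows, n

# Rebuild the stone-location data from the counts in the table
stones = ["proximal"] * 46 + ["distal"] * 62 + ["both"] * 3
rows, total = frequency_table(stones, ["proximal", "distal", "both"])
for row in rows:
    print(*row)
print("Total", total)
```

Cumulative percents are computed from the running count and rounded afterwards, which is why 41.4 + 55.9 is shown as 97.3 rather than accumulating rounding errors.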
Examples of Frequency Table_2
(SPSS output)
Continuous data (age) is grouped and converted into ordinal data (age group).
age group
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid  20 and below       4         3.6         3.6                3.6
       21 - 30            6         5.4         5.4                9.0
       31 - 40           18        16.2        16.2               25.2
       41 - 50           30        27.0        27.0               52.3
       51 - 60           24        21.6        21.6               73.9
       61 - 70           17        15.3        15.3               89.2
       71 and above      12        10.8        10.8              100.0
       Total            111       100.0       100.0
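Grouping continuous ages into the table's classes can be sketched with a small helper. `age_group` is a hypothetical name, ages are assumed to be whole years, and the ages below are made up rather than taken from the 111 patients.

```python
from collections import Counter

def age_group(age):
    """Hypothetical helper: map a whole-number age to the classes of the age-group table."""
    if age <= 20:
        return "20 and below"
    if age >= 71:
        return "71 and above"
    lower = ((age - 1) // 10) * 10 + 1  # e.g. 21-30 -> 21, 31-40 -> 31
    return f"{lower} - {lower + 9}"

ages = [18, 25, 33, 47, 47, 52, 68, 75]  # hypothetical ages
counts = Counter(age_group(a) for a in ages)
for label in ["20 and below", "21 - 30", "31 - 40", "41 - 50",
              "51 - 60", "61 - 70", "71 and above"]:
    print(label, counts[label])
```

Once ages are grouped, the individual values are lost; only the ordered classes remain, which is what makes the new variable ordinal.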
Bar graph or chart
Graphical presentation of the frequency distribution of categorical data (nominal or ordinal).
[Figure 1: Gender distribution among 111 renal stone patients (bar chart: male, female)]
Y axis: frequency or relative frequency.
Bar height represents frequency or percent.
Bars are of equal width and separated by equal gaps.
[Figure: bar charts by quarter (1st, 2nd, 3rd Qtr) for Terengganu Daruliman, Kedah Darulaman and Kelantan Darulnaim, with a percent axis from 0% to 100%]
[Figure: pie chart of stone location: proximal 41.4%, distal 55.9%, both 2.7%]
Size of slice represents frequency or percent.
Characteristics of excellent graphs (Schmid, 1983)
Accuracy
data properly entered
not misleading, distorted, or susceptible to misinterpretation
Clarity
the ideas and concepts conveyed are clearly understood
Simplicity
straightforward; avoid gridlines or odd lettering
Appearance
should be appealing to the viewer
Well-designed structure
patterns highlighted; lettering horizontal
Organizing & Displaying Data of Numerical Variables
Graphs for quantitative data
Histogram
[Figure: histogram of age distribution among 111 cases (N = 111); x axis: AGE, 15.0 to 80.0]
Class intervals are adjacent, with no gap in between.
Normal Distribution
Measures of skewness or symmetry
We can use Pearson’s skewness coefficient.
The formula:
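The slide's formula did not survive extraction. As an assumption, this sketch uses one common form, Pearson's second (median) skewness coefficient, Sk = 3(x̄ − median) / s; another common form uses the mode instead of the median. Both data sets are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """Assumed form: Pearson's second skewness coefficient, 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

mean_equals_median = [6, 3, 8, 5, 3]   # mean = median = 5
right_skewed = [2, 3, 3, 4, 5, 6, 15]  # one large value pulls the mean above the median

print(pearson_median_skewness(mean_equals_median))      # 0.0
print(round(pearson_median_skewness(right_skewed), 2))  # positive: skewed to the right
```

A value near 0 suggests symmetry, a positive value a right (positive) skew, and a negative value a left (negative) skew.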
68-95-99.7 Rule for the Normal Distribution
[Figure: distribution of blood pressure in men; mean (SD) = 125 (14) mm Hg]
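Applying the rule to the blood-pressure example, assuming the distribution is approximately normal, gives the intervals mean ± 1, 2 and 3 SD.

```python
def empirical_rule_intervals(mu, sigma):
    """Intervals mu +/- k*sigma expected to hold about 68%, 95% and 99.7% of values."""
    return {pct: (mu - k * sigma, mu + k * sigma)
            for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%"))}

intervals = empirical_rule_intervals(125, 14)  # blood pressure in men, mm Hg
for pct, (low, high) in intervals.items():
    print(f"about {pct} of men: {low} to {high} mm Hg")
# about 68% of men: 111 to 139 mm Hg
# about 95% of men: 97 to 153 mm Hg
# about 99.7% of men: 83 to 167 mm Hg
```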
Histogram and Distribution
Polygon
• A frequency polygon is a graph that displays the data using
lines to connect points plotted for the frequencies.
• The frequencies represent the heights of the vertical bars in the histogram (the polygon can be superimposed on the histogram).
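• The line segments pass through the midpoints at the tops of the rectangles.
• The polygon is tied down at both ends: it meets the horizontal axis at the midpoints of the empty classes just before the first class and just after the last class.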
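A sketch of how the polygon's points can be computed from equal-width classes. The function name, class bounds and frequencies are hypothetical; the extra zero-frequency points at each end are what tie the polygon down.

```python
def polygon_points(lower_bounds, width, frequencies):
    """Return (midpoint, frequency) pairs for a frequency polygon, tied down at both ends."""
    midpoints = [lb + width / 2 for lb in lower_bounds]
    # Empty class before the first and after the last, so the line touches the x axis
    points = [(midpoints[0] - width, 0)]
    points += list(zip(midpoints, frequencies))
    points.append((midpoints[-1] + width, 0))
    return points

# hypothetical classes 20-30, 30-40, 40-50 with these counts
print(polygon_points([20, 30, 40], 10, [4, 9, 6]))
# [(15.0, 0), (25.0, 4), (35.0, 9), (45.0, 6), (55.0, 0)]
```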
‘Stem and leaf’ plot
Another tool for visually displaying continuous data
Very similar to a histogram
Allows for easier identification of individual values in the sample
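A minimal stem-and-leaf sketch, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf. The ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Text lines of a stem-and-leaf plot: stem = tens digit, leaf = units digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

ages = [23, 27, 31, 35, 35, 42, 48, 49, 51, 66]  # hypothetical ages
for line in stem_and_leaf(ages):
    print(line)
#  2 | 37
#  3 | 155
#  4 | 289
#  5 | 1
#  6 | 6
```

Turned on its side, the rows of leaves form a histogram with class width 10, yet every individual value can still be read off.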
Box plot
A graphical display that uses descriptive statistics based on percentiles.
Also called a ‘5 number summary plot’.
Provides information about the central tendency and the variability of the middle 50% of the distribution.
The ‘box’ represents the IQR: the 25th to the 75th percentile.
An outlier is an observation more than 1.5 times the IQR beyond the edges of the box (more than 3.0 times the IQR beyond is an extreme outlier).
The whiskers extend to the smallest and largest values that are not outliers.
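The outlier rules above can be sketched as follows. The quartile method (`"inclusive"` in Python's `statistics.quantiles`) is an assumption; packages such as SPSS may compute quartiles differently, so fences can shift slightly. The data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Box-plot quantities with 1.5 x IQR (outlier) and 3.0 x IQR (extreme) fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")  # assumed quartile method
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # extreme-outlier fences
    outliers = [x for x in values if x < inner_low or x > inner_high]
    extreme = [x for x in outliers if x < outer_low or x > outer_high]
    not_outliers = [x for x in values if inner_low <= x <= inner_high]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(not_outliers),   # smallest value that is not an outlier
        "whisker_high": max(not_outliers),  # largest value that is not an outlier
        "outliers": outliers,
        "extreme": extreme,
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]  # hypothetical values
summary = box_plot_summary(data)
print(summary)
```

Here Q1 = 3.5 and Q3 = 8.5, so IQR = 5: 20 lies beyond the 1.5 × IQR fence (16) and is an outlier, while 40 lies beyond the 3.0 × IQR fence (23.5) and is an extreme outlier. The upper whisker stops at 9.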
[Figure: box plots of age by sex (male, female), labelling the outliers, the largest and smallest values that are not outliers, the whiskers, the box, and the 25th, 50th (median) and 75th percentiles]
Sources of the outliers
Thank You