LO3 and LO4


LO3.

Compile, interpret and utilize health data

BY: Misganaw T.(BSC)

04/10/2024 By: Misganaw T.(BSc Nursing) 1


Methods of Data Organization
• Ordered array: A simple arrangement of individual
observations in order of magnitude.
– Is the first step in organizing quantitative data
• Frequency distribution:
– A table which involves a listing of all observed values of the variable
being studied and how many times each value is observed.
– The actual summarization and organization of data starts from
frequency distribution.
a) Qualitative variable:
– Count the number of cases in each category.
b) Quantitative variable:
– Select a set of continuous, non-overlapping intervals such that each
value in the set of observations can be placed in one, and only one of
the intervals
Methods…..
(A). Qualitative variable: Count the number of
cases in each category.
Example: The ICU type of 25 patients entering
intensive care unit at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other
Methods…..
ICU type    Frequency    Relative frequency
Medical     12           0.48
Surgical    6            0.24
Cardiac     5            0.20
Other       2            0.08
Total       25           1.00


Methods…..

(B) Quantitative variable:
• Select a set of continuous, non-overlapping intervals such that each value can be placed in one, and only one, of the intervals.
• The first consideration is how many intervals to include.


Example:
[Figure: the same data shown ungrouped and grouped]


Methods…..
• To determine the number of class intervals and the corresponding width, we use Sturges' rule:
$K = 1 + 3.322 \log_{10} n$
$W = \frac{L - S}{K}$
where
K = number of class intervals, n = number of observations,
W = width of the class interval, L = the largest value,
S = the smallest value
Methods…..
Example
The birth weights (in kilograms) of 30 children were recorded as follows:
2.0, 2.1, 2.3, 3.0, 2.7, 2.8, 3.5, 3.1, 3.7, 4.0, 2.3, 3.5, 4.2,
3.7, 3.2, 2.7, 2.5, 2.7, 3.8, 3.1, 3.0, 2.6, 2.8, 2.9, 3.5, 4.1,
3.9, 2.8, 2.2, 3.1.
$K = 1 + 3.322 \log_{10} 30 = 5.91 \approx 6$
$W = \frac{4.2 - 2.0}{5.91} = 0.37 \approx 0.4$
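
The calculation above can be checked with a short script. This is only a sketch: the function name `sturges` is made up for illustration, and the logarithm is assumed to be base 10, which is what reproduces the slide's value of 5.91.

```python
import math

# Birth weights (kg) of the 30 children from the example above.
weights = [2.0, 2.1, 2.3, 3.0, 2.7, 2.8, 3.5, 3.1, 3.7, 4.0, 2.3, 3.5, 4.2,
           3.7, 3.2, 2.7, 2.5, 2.7, 3.8, 3.1, 3.0, 2.6, 2.8, 2.9, 3.5, 4.1,
           3.9, 2.8, 2.2, 3.1]

def sturges(values):
    """Return (K, W): number of class intervals and class width."""
    n = len(values)
    k = 1 + 3.322 * math.log10(n)          # K = 1 + 3.322 log10(n)
    w = (max(values) - min(values)) / k    # W = (L - S) / K
    return k, w

k, w = sturges(weights)
print(f"K = {k:.2f}, W = {w:.2f}")
```

In practice K is rounded to a whole number of classes (6 here) and W to a convenient width (0.4 kg here).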



Methods…..
Frequency distribution of birth weight (grouped data)

Birth weight (kg)   Tally          Frequency (no. of children)   Relative frequency (%)   Cumulative frequency
2.0 - 2.3           |||||          5                             16.7                     5
2.4 - 2.7           |||||          5                             16.7                     10
2.8 - 3.1           ||||| ||||     9                             30.0                     19
3.2 - 3.5           ||||           4                             13.3                     23
3.6 - 3.9           ||||           4                             13.3                     27
4.0 - 4.3           |||            3                             10.0                     30
Total                              30                            100.0
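
The table above can be rebuilt from the raw birth weights. The sketch below assumes six classes of width 0.4 kg starting at 2.0 kg, as in the table; it works in tenths of a kilogram to avoid floating-point problems at the class limits. The variable names are illustrative.

```python
# Birth weights (kg) of the 30 children from the earlier example.
weights = [2.0, 2.1, 2.3, 3.0, 2.7, 2.8, 3.5, 3.1, 3.7, 4.0, 2.3, 3.5, 4.2,
           3.7, 3.2, 2.7, 2.5, 2.7, 3.8, 3.1, 3.0, 2.6, 2.8, 2.9, 3.5, 4.1,
           3.9, 2.8, 2.2, 3.1]

LOWER_TENTHS = 20   # lowest class limit, 2.0 kg, in tenths of a kg
WIDTH_TENTHS = 4    # class width, 0.4 kg, in tenths of a kg
N_CLASSES = 6

# Count how many observations fall in each class.
counts = [0] * N_CLASSES
for w in weights:
    index = (round(w * 10) - LOWER_TENTHS) // WIDTH_TENTHS
    counts[index] += 1

n = len(weights)
relative = [round(100 * c / n, 1) for c in counts]   # relative frequency, %

# Running total of the frequencies gives the cumulative frequency.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running)

for i in range(N_CLASSES):
    low = (LOWER_TENTHS + i * WIDTH_TENTHS) / 10
    high = low + (WIDTH_TENTHS - 1) / 10
    print(f"{low:.1f} - {high:.1f}  {counts[i]:3d}  {relative[i]:5.1f}  {cumulative[i]:3d}")
```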


Methods…..
• Cumulative frequency: the sum of the frequencies of a class and of all the classes below it.
• Cumulative relative frequency: the percentage of the total number of observations that have a value either in that interval or below it.
• Mid-point: the value that lies midway between the lower and the upper limits of a class.

Methods…..
• True limits (class boundaries): the limits that make the intervals of a continuous variable continuous in both directions.
• Used for smoothing the class intervals.
• Subtract 0.5 units of measurement from the lower limit and add 0.5 units to the upper limit.


Methods…..
Time (hours)   True limits    Mid-point   Frequency
10-14          9.5 - 14.5     12          5
15-19          14.5 - 19.5    17          11
20-24          19.5 - 24.5    22          12
25-29          24.5 - 29.5    27          7
30-34          29.5 - 34.5    32          3
35-39          34.5 - 39.5    37          2
Total                                     40
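
As a rough illustration of the rule, the true limits and mid-points in the table can be computed as below. The helper names `class_boundaries` and `midpoint` are made up for this sketch, and the unit of measurement is assumed to be 1 hour.

```python
def class_boundaries(lower, upper, unit=1):
    """True limits: subtract half a unit from the lower limit, add half a unit to the upper."""
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2

# Class limits (hours) from the table above.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

for low, high in classes:
    true_low, true_high = class_boundaries(low, high)
    print(f"{low}-{high}: true limits {true_low} - {true_high}, mid-point {midpoint(low, high)}")
```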
Methods of data presentation
Commonly, there are two ways of presenting statistical data:
1. Statistical tables
2. Graphs/Diagrams


Presentation of Results
• For data to be more easily appreciated and to allow quick comparisons, it is often useful to arrange the data in the form of a table, or in one of a number of different graphical forms.
• When analysing voluminous data collected from, say, a health centre's records, it is quite useful to put them into compact tables.
• Quite often, data are presented in a meaningful way by preparing a frequency distribution. If this is not done, the raw data will not convey any meaning, and any pattern in them may not be detected.

Choice of Tables vs Graphs
• Quantity of data: large or small
• Objectives: precise values, or a clear and easy overview
• Graphs are used where impact is more important than the fine detail of the values given.
• Graphs allow broad patterns within the data to be grasped much more quickly.


Statistical tables
• A statistical table is an orderly and systematic presentation of numerical data in rows and columns.
• Rows are horizontal arrangements and columns are vertical arrangements.
• Tables are often the best way to show small datasets. It can be very useful to see a table of the original ‘raw’ data from a study, because no information is lost in summarising.


Parts of a table
a) Title: explains
   - what the data are about
   - where the data were collected
   - the time period of the data
   - how the data are classified
b) Captions: the titles of the columns. If a column is sub-divided, there are sub-caption headings as well.
c) Stubs: the titles of the rows.
d) Body: contains the numerical data.
e) Head note: a statement below the title which clarifies the content of the table.
f) Foot note: a statement below the table which clarifies specific items in the table, e.g. it explains omissions.
g) Source: the source of the data should be stated.


Types of tables
1. Simple or one way table: involves only a
single variable
2. Two-way table: involves two characteristics
and is formed when either caption or the stub
is divided into two parts
3. Higher order table: three or more
characteristics in a single table



Graphical presentation of data
The main purpose of statistical methods is to reduce the size of statistical data and to render them easily intelligible. To attain this objective, the methods of classification, tabulation, averages and percentages are generally used.
• But the method of diagrammatic representation is probably simpler and more easily understandable.
• A graph is a method of showing quantitative data. When correctly drawn, it allows the reader to rapidly obtain an overall grasp of the material presented.


Diagrammatic……
Specific types of graphs include:
• Bar graph (nominal, ordinal data)
• Pie chart (nominal, ordinal data)
• Histogram (quantitative data)
• Stem-and-leaf plot (quantitative data)
• Box plot (quantitative data)
• Scatter plot (quantitative data)
• Line graph (quantitative data)
• Others

1. Bar charts/graphs
• Used to display a frequency distribution for nominal and ordinal data
• Categories are listed on the horizontal axis (X-axis)
• Frequencies or relative frequencies are represented on the vertical axis (Y-axis, ordinate)
• The height of each bar is proportional to the frequency or relative frequency of observations in that category

Bar charts/graphs cont…
There are different types of bar diagrams; the most important ones are:
i. Simple bar chart: a one-dimensional diagram in which each bar represents the whole of the magnitude. The height or length of each bar indicates the size (frequency) of the figure represented.


Example 1:
The most common causes of morbidity as reported by health centers in Amhara region, 1997
[Bar chart. Y-axis: number of sick persons who visited the health centers (0 to 70000). X-axis: type of disease (intestinal parasites, malaria, skin diseases, upper respiratory tract infections, pneumonia, gastritis, diarrhoea, STI, eye diseases, tuberculosis). Bar labels, largest to smallest: 61974, 48750, 42433, 42389, 32185, 28625, 25799, 21455, 15095, 11452.]

Example 2:
Bar chart for the type of ICU for 25 patients



Bar charts/graphs cont…
ii. Sub-divided bar chart
If different quantities form the sub-divisions of the totals, simple bars may be sub-divided in the ratio of the various sub-divisions to show the relationship of the parts to the whole. The order in which the components are shown in one bar is followed in all bars of the diagram. The components can be shown as actual values or as percentages.


Example: Plasmodium species distribution for confirmed malaria cases, Zeway, 2003
[Sub-divided bar chart. Y-axis: percent (0 to 100). X-axis: month in 2003 (August, October, December). Components: P. falciparum, P. vivax, mixed.]


Bar charts/graphs cont…
iii. Multiple bar graph
• Bar charts can be used to represent the relationships among more than two variables.
• The following figure shows the relationship between children’s reports of breathlessness and cigarette smoking by themselves and their parents.


Multiple….
Prevalence of self-reported breathlessness among school children, 1998
[Multiple bar chart. Y-axis: breathlessness, per cent (0 to 35). X-axis: parents smoking (neither, one, both). Bars: child never smoked, child smoked occasionally, child smoked one/week or more.]
We can quickly see from the graph that the prevalence of the symptom increases both with the child’s smoking and with that of their parents.

2. Pie chart
• Shows the relative frequency for each category by dividing a circle into sectors, the angles of which are proportional to the relative frequencies.
• Used for qualitative data and discrete quantitative data.
• Used for a single categorical variable and is best reserved for popular presentation of data with a few categories (groups).
• Uses percentage distributions.
• The “pie” diagram has pictorial appeal.

Steps to construct a pie-chart
• Construct a frequency table
• Change the frequencies into percentages (P)
• Change the percentages into degrees, where: degrees = (P / 100) × 360°
• Draw a circle and divide it accordingly

Example: Distribution of deaths for females in England and Wales, 1989.

Cause of death          No. of deaths
Circulatory system      100 000
Neoplasm                 70 000
Respiratory system       30 000
Injury and poisoning      6 000
Digestive system         10 000
Others                   20 000
Total                   236 000
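
The steps for constructing a pie chart can be sketched for this table as follows. The dictionary and variable names are illustrative; the counts are those given in the table above.

```python
# Number of deaths by cause, from the table above.
deaths = {
    "Circulatory system": 100_000,
    "Neoplasm": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

# Step 2: frequencies to percentages; step 3: percentages to degrees.
percent = {cause: 100 * count / total for cause, count in deaths.items()}
degrees = {cause: p / 100 * 360 for cause, p in percent.items()}

for cause in deaths:
    print(f"{cause:22s} {percent[cause]:5.1f}%  {degrees[cause]:6.1f} degrees")
```

The sector angles always add up to 360 degrees, which is a convenient check before drawing the circle.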



Pie….
Distribution of cause of death for females in England and Wales, 1989
[Pie chart. Circulatory system 42%, neoplasms 30%, respiratory system 13%, others 8%, digestive system 4%, injury and poisoning 3%.]


3. Histogram
• Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
• A histogram is a bar graph with no space between the bars, used to display a continuous distribution.
• A histogram displays only one variable.
• To construct a histogram, we draw the interval boundaries on a horizontal line and the frequencies on a vertical line.
• Non-overlapping intervals that cover all of the data values must be used.

Histogram….
• Bars are drawn over the intervals in such a way that the areas of the bars are proportional to their interval frequencies.


Example: Distribution of the age of women at the time of marriage

Age group   15-19   20-24   25-29   30-34   35-39   40-44   45-49
Number      11      36      28      13      7       3       2

[Histogram: age of women at the time of marriage. Y-axis: no. of women (0 to 40). X-axis: age group, with true limits 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5.]

Histogram for the ages of 2087 mothers with <5 children, Adam Tulu, 2003
[Histogram. X-axis: age of mother (N1AGEMOTH), 15.0 to 55.0. Y-axis: frequency (0 to 700). Mean = 27.6, Std. Dev = 6.13, N = 2087.]


6. Percentiles (Quartiles)
• Suppose that 50% of a cohort survived at least 4 years.
• This also means that 50% survived at most 4 years.
• We say 4 years is the median.
• The median is also called the 50th percentile.
• We write: P50 = 4 years.
Definition: The Pth percentile of a sample of n observations is the value of the variable with rank (P/100)(n + 1). If this rank is not an integer, it is rounded to the nearest half rank.

Similarly, we could speak of other percentiles:
• P0: the minimum
• P25 (the 25th percentile): 25% of the sample values are less than or equal to this value. 1st quartile
• P50: 50% of the sample values are less than or equal to this value. 2nd quartile
• P75: 75% of the sample values are less than or equal to this value. 3rd quartile
• P100: the maximum


7. Box and Whisker Plot
• It is a five-number summary of a distribution.
• It is another way to display information when the objective is to illustrate certain locations in the distribution and its skewness.
• Can be used to display a set of discrete or continuous observations using a single vertical axis; only certain summaries of the data are shown.

Box…..
• Graphic presentation using 5 measures
– The median (P50)
– The first quartile (P25)
– The third quartile (P75)
– The smallest value
– The largest value
• A plot that shows the centre, spread and skewness of a data set


Box…..
• Position of a percentile = p(n+1)th observation, where p = the required percentile expressed as a proportion
• Arrange the numbers in ascending order
A. 1st quartile = 0.25 (n+1)th
B. 2nd quartile = 0.5 (n+1)th
C. 3rd quartile = 0.75 (n+1)th
D. 20th percentile = 0.2 (n+1)th
E. 15th percentile = 0.15 (n+1)th


Box…..
• The pth percentile is a value that is greater than or equal to p% of the observations and less than or equal to the remaining (100 − p)%.
• The pth percentile is:
– The observation corresponding to the p(n+1)th position if p(n+1) is an integer
– The average of the kth and (k+1)th observations if p(n+1) is not an integer, where k is the largest integer less than p(n+1).
• If p(n+1) = 3.6, the percentile is the average of the 3rd and 4th observations


Box…..
• Given a sample of size n = 60, find the 10th percentile of the data set.
p(n+1) = 0.10(60+1) = 6.1
= average of the 6th and 7th observations
10% of the observations are less than or equal to this value and 90% of them are greater than or equal to it.
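
The p(n+1) rule from these slides can be written as a small function. This is a sketch of the slides' rule only (statistical packages often use other interpolation rules); the function name `percentile` and the handling of positions that fall outside 1..n are assumptions added for the example.

```python
import math

def percentile(values, p):
    """pth percentile using the p(n+1) position rule; p is a proportion, e.g. 0.10."""
    ordered = sorted(values)
    n = len(ordered)
    pos = p * (n + 1)

    # Assumption: positions outside 1..n are clamped to the smallest/largest value.
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]

    nearest = round(pos)
    if math.isclose(pos, nearest, abs_tol=1e-9):
        # p(n+1) is an integer: take that observation (1-based position).
        return ordered[nearest - 1]

    # Otherwise average the kth and (k+1)th observations.
    k = math.floor(pos)
    return (ordered[k - 1] + ordered[k]) / 2

# Illustrative data: the values 1..60 stand in for a sample of size n = 60.
sample = list(range(1, 61))
p10 = percentile(sample, 0.10)   # position 6.1, so the 6th and 7th values are averaged
print(p10)
```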



9. Scatter plot
• Most studies in medicine involve measuring more than one characteristic, and graphs displaying the relationship between two characteristics are common in the literature.
• When both variables are qualitative, we can use a multiple bar graph.
• When one of the characteristics is qualitative and the other is quantitative, the data can be displayed in box and whisker plots.


Scatter….
• For two quantitative variables we use bivariate plots (also called scatter plots or scatter diagrams).
• In the study on percentage saturation of bile, information was collected on the age of each patient to see whether a relationship existed between the two measures.
• A scatter diagram is constructed by drawing X- and Y-axes.
• Each point or dot represents a pair of values measured for a single study subject.
[Figure annotation: positive relation]


Scatter….
Age and percentage saturation of bile for women patients in hospital Z, 1998
[Scatter plot. X-axis: age (0 to 80). Y-axis: saturation of bile (0 to 160).]

10. Line graph
• Useful for assessing the trend of a particular situation over time.
• Helps in monitoring the trend of epidemics.
• The time, in weeks, months or years, is marked along the horizontal axis.
• Values of the quantity being studied are marked on the vertical axis.
• Values for each category are connected by a continuous line.
• Sometimes two or more lines are drawn on the same graph using the same scale so that the plotted lines are comparable.
• The distance of each plotted point above the base-line indicates its numerical value. The points are joined by a line.

No. of microscopically confirmed malaria cases by species and month at Zeway malaria control unit, 2003
[Line graph. Y-axis: no. of confirmed malaria cases (0 to 2100). X-axis: months (Jan to Dec). Lines: positive, P. falciparum, P. vivax.]

Line…..
• A line graph can also be used to depict the relationship between two continuous variables, like a scatter diagram.
• The following graph shows the level of zidovudine (AZT) in the blood of AIDS patients at several times after administration of the drug, for patients with normal fat absorption and patients with fat malabsorption.

Line…..
Response to administration of zidovudine in two groups of AIDS patients in hospital X, 1999
[Line graph. Y-axis: blood zidovudine concentration (0 to 8). X-axis: time since administration in minutes (10, 20, 70, 80, 100, 120, 170, 190, 250, 300, 360). Lines: fat malabsorption, normal fat absorption.]

Numerical Summary Measures
• Single numbers which quantify the characteristics of a distribution of values
– Measures of central tendency (location)
– Measures of dispersion


Numerical Summary Measures……
• A frequency distribution gives a general picture of the distribution of a variable.
• But it cannot indicate the average value or the spread of the values.


Measures of Central Tendency (MCT)
• On the scale of values of a variable there is a certain point at which the largest number of items tend to cluster.
• Since this point is usually in the centre of the distribution, the tendency of statistical data to concentrate at a certain value is called “central tendency”.
• The various methods of determining the point about which the observations tend to concentrate are called measures of central tendency (MCT).

Measures......
• The objective of calculating an MCT is to determine a single figure which may be used to represent the whole data set.
• In that sense it is an even more compact description of the statistical data than the frequency distribution.
• Since an MCT represents the entire data set, it facilitates comparison within one group or between groups of data.

Measures......
[Figure: histogram illustrating position (central location). X-axis: class intervals 0-9 to 90-99. Y-axis: frequency (0 to 20).]


Characteristics of a good MCT
An MCT is good or satisfactory if it possesses the following characteristics:
1. It should be based on all the observations
2. It should not be affected by extreme values
3. It should be as close to the maximum number of values as possible
4. It should have a definite value (the mode does not always meet this requirement)
5. It should not be subject to complicated and tedious calculations
6. It should be capable of further algebraic treatment
7. It should be stable with regard to sampling

Measures......
• The most common measures of central tendency include:
– Arithmetic mean
– Median
– Mode
– Others (geometric mean, harmonic mean, weighted mean)


1. Arithmetic Mean
a) Ungrouped data
• The arithmetic mean is the "average" of the data set and by far the most widely used measure of central location.
• It is the sum of all the observations divided by the total number of observations.


The Summation Notation
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$

Arithmetic…..
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$


Arithmetic…..
The heart rates for n=10 patients were as follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these patients?
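
The question above can be answered by summing the ten heart rates and dividing by 10. The sketch below does exactly that; the variable names are illustrative.

```python
# Heart rates (beats per minute) of the 10 patients above.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

total = sum(heart_rates)                 # sum of all observations
mean_rate = total / len(heart_rates)     # divided by the number of observations

print(f"Sum = {total}, n = {len(heart_rates)}, mean = {mean_rate} beats per minute")
```

Note that the single low value of 40 pulls the mean down, which is the sensitivity to extreme values discussed later in this section.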



Arithmetic…..
b) Grouped data
In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:
$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$
where
k = the number of class intervals
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval

Arithmetic…..
Example
Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   mifi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total                             169              5810.5

Mean = 5810.5/169 = 34.38 years
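
The grouped-data mean can be checked with a few lines of code. This is a minimal sketch using the mid-points and frequencies from the table; the variable names are illustrative. Evaluating 5810.5/169 gives about 34.38 years.

```python
# Mid-points and frequencies of the six age classes in the table above.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

n = sum(frequencies)                                          # total frequency
sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))   # sum of m_i * f_i
grouped_mean = sum_mf / n

print(f"n = {n}, sum of m*f = {sum_mf}, mean = {grouped_mean:.2f} years")
```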
The mean can be thought of as a “balancing point” or “center of gravity”.


Arithmetic…..
• When the data are skewed, the mean is “dragged” in the direction of the skewness.
• It is possible in extreme cases for all but one of the sample points to be on one side of the arithmetic mean. In this case, the mean is a poor measure of central location and does not reflect the center of the sample.


Properties of the Arithmetic Mean
• For a given set of data there is one and only one arithmetic mean (uniqueness).
• Easy to calculate and understand (simple).
• Influenced by each and every value in a data set.
• Greatly affected by extreme values.
• In the case of grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.

2. Median
Ungrouped data
• The median is the value which divides the data set into two equal parts.
• If the number of values is odd, the median is the middle value when all values are arranged in order of magnitude.
• When the number of observations is even, there is no single middle value but two middle observations.
• In this case the median is the mean of these two middle observations, when all observations have been arranged in order of magnitude.


The median is a better description (than the mean) of the majority when the distribution is skewed

• Example
– Data: 14, 89, 93, 95, 96
– Skewness is reflected in the outlying low value of 14
– The sample mean is 77.4
– The median is 93
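
The example can be reproduced with Python's standard `statistics` module. The second data set reuses the ten heart rates from the arithmetic mean example to show the even-n case, where the two middle values are averaged; the variable names are illustrative.

```python
import statistics

# Odd number of observations: the median is the middle value.
scores = [14, 89, 93, 95, 96]
mean_scores = statistics.mean(scores)
median_scores = statistics.median(scores)
print(f"mean = {mean_scores}, median = {median_scores}")

# Even number of observations: the median is the mean of the two middle values.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]
median_rate = statistics.median(heart_rates)
print(f"sorted = {sorted(heart_rates)}, median = {median_rate}")
```

In both data sets a single low outlier (14 and 40) pulls the mean below the median.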



Properties of the median
• There is only one median for a given set of data (uniqueness).
• The median is easy to calculate.
• The median is a positional average and hence it is insensitive to very large or very small values.
• The median can be calculated even in the case of open-ended intervals.
• It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness).
• It is not a good representative of the data if the number of items is small.

Quartiles
• Just as the median is the value above and below which half of the data lie, one can define measures above or below which other fractional parts of the data lie.
• The median divides the data into two equal parts.
• If the data are divided into four equal parts, we speak of quartiles.


Quartiles....
• The first quartile (Q1): 25% of all the ranked
observations are less than Q1.
• The second quartile (Q2): 50% of all the
ranked observations are less than Q2. The
second quartile is the median.
• The third quartile (Q3): 75% of all the ranked
observations are less than Q3.



Percentiles
• Percentiles divide the data into 100 equal parts.
• Percentiles are less sensitive to outliers and not greatly affected by the sample size (n).


3. Mode
• The mode is the most frequently occurring value among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.


Mode……
[Figure: frequency distribution with the mode marked. Y-axis: frequency (0 to 20).]

Ungrouped data
• It is the value which occurs most frequently in a set of values.
• If all the values are different, there is no mode. On the other hand, a set of values may have more than one mode.


Mode……
Example
1, 2, 3, 4, 4, 4, 4, 5, 5, 6
Mode is 4 “Unimodal”
Example
1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
There are two modes – 2 & 5
This distribution is said to be “bi-modal”
Example
2.62, 2.75, 2.76, 2.86, 3.05, 3.12
No mode, since all the values are different
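
The three examples can be checked with a small helper. The function name `modes` is made up for this sketch; it follows the convention used in these slides that a data set in which every value occurs only once has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)

unimodal = modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6])
bimodal = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
no_mode = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])

print(unimodal, bimodal, no_mode)
```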
Properties of mode
• It is not affected by extreme values
• It can be calculated for distributions with open-ended classes
• Often its value is not unique
• The main drawback of the mode is that often it does not exist


Symmetric and unimodal distribution
Mean, median, and mode should all be approximately the same.
[Figure: symmetric curve with the mean, median and mode marked at the centre]


Which......
Bimodal: the mean and median should be about the same, but may take a value that is unlikely to occur; reporting the two modes might be best.


Which.....
Skewed to the right (positively skewed): the mean is sensitive to extreme values, so the median might be more appropriate.
[Figure: right-skewed curve with the mode, median and mean marked]


Which......
Skewed to the left (negatively skewed): as in the previous case, the median might be more appropriate.
[Figure: left-skewed curve with the mode, median and mean marked]


Measures of Dispersion
Consider the following two sets of data:
A: 177 193 195 209 226 Mean = 200
B: 192 197 200 202 209 Mean = 200

Two or more sets may have the same mean and/or median but they may be
quite different.
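
To make the difference between the two sets concrete, the sketch below computes the mean together with two of the measures of dispersion introduced later in this section (range and sample standard deviation). The function name `summarize` is illustrative.

```python
import statistics

set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]

def summarize(values):
    """Mean, range and sample standard deviation of a list of values."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),   # sample SD, divisor n - 1
    }

summary_a = summarize(set_a)
summary_b = summarize(set_b)
print("A:", summary_a)
print("B:", summary_b)
```

Both sets have a mean of 200, but set A is spread over a much wider range than set B.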



These two distributions have the same mean,
median, and mode



Dispersion….
• MCT alone are not enough to give a clear understanding of the distribution of the data.
• We also need to know something about the variability or spread of the values: whether they tend to be clustered close together, or spread out over a broad range.


Measures of Dispersion
• Measures that quantify the variation or dispersion of a set of data from its central location.
• Dispersion refers to the variety exhibited by the values of the data.
• The amount of dispersion is small when the values are close together.
• If all the values are the same, there is no dispersion.


Dispersion……..
Other synonymous terms:
• “Measure of variation”
• “Measure of spread”
• “Measure of scatter”


Dispersion…..
Measures of dispersion include:
• Range
• Inter-quartile range
• Variance
• Standard deviation
• Coefficient of variation
• Standard error


1. Range (R)
• The difference between the largest and smallest observations in a sample.
• Range = Maximum value − Minimum value
Example
Data values: 5, 9, 12, 16, 23, 34, 37, 42
Range = 42 − 5 = 37


Properties of range
• It is the simplest, crude measure and can be easily understood.
• It takes into account only two values, which makes it a poor measure of dispersion.
• Very sensitive to extreme observations.
• The larger the sample size, the larger the range tends to be.


2. Interquartile range (IQR)
• Indicates the spread of the middle 50% of the observations, and is used with the median.
IQR = Q3 - Q1
Example: Suppose the first and third quartiles for weights of girls 12 months of age are 8.8 kg and 10.2 kg, respectively.
IQR = 10.2 kg − 8.8 kg = 1.4 kg
i.e., the middle 50% of the infant girls weigh between 8.8 and 10.2 kg.


Properties of IQR:
• It is a simple and versatile measure.
• It encloses the central 50% of the observations.
• It is not based on all observations but only on two specific values.
• It is important in selecting cut-off points in the formulation of clinical standards.
• Since it excludes the lowest and highest 25% of values, it is not affected by extreme values.
• Less sensitive to the size of the sample.

3. Variance (σ², s²)
• The main objection to the mean deviation, that the negative signs are ignored, is removed by taking the squares of the deviations from the mean.
• The variance is the average of the squares of the deviations taken from the mean.


Variance.....
• The deviations are squared because the sum of the deviations of the individual observations of a sample about the sample mean is always 0:
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
• The variance can be thought of as an average of squared deviations.


Variance.....
• Variance is used to measure the dispersion of values relative to the mean.
• When values are close to their mean (narrow range) the dispersion is less than when they are scattered over a wide range.
• Population variance = σ²
• Sample variance = S²


Ungrouped data
Let X1, X2, ..., XN be the measurements on N population units, then:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
where
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
is the population mean.


A sample variance is calculated for a sample of individual values (X1, X2, …, Xn) and uses the sample mean ($\bar{x}$) rather than the population mean µ:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1}$


Degrees of freedom
• In computing the variance there are (n-1) degrees of freedom because only (n-1) of the deviations are independent of each other.
• The last one can always be calculated from the others.
• This is because all of the values of (Xi - Mean) must add to zero.


Properties of Variance
• The main disadvantage of the variance is that its unit is the square of the unit of the original measurement values.
• The variance gives more weight to extreme values than to values near the mean, because the differences are squared.
• The drawbacks of the variance are overcome by the standard deviation.

4. Standard deviation (σ, s)
• It is the square root of the variance.
• This produces a measure having the same scale as that of the individual values.
$\sigma = \sqrt{\sigma^2}$ and $S = \sqrt{S^2}$


Standard deviation.....
 Following are the survival times of n=11
patients after heart transplant surgery.
 The survival time for the “ith” patient is
represented as Xi for i= 1, …, 11.
 Calculate the sample variance and SD





Example. Compute the variance and SD of the age of 169 subjects from the grouped data.

Class interval   mi     fi    (mi - Mean)   (mi - Mean)²   (mi - Mean)² fi
10-19            14.5   4     -19.88        395.2144       1580.8576
20-29            24.5   66    -9.88         97.6144        6442.5504
30-39            34.5   47    0.12          0.0144         0.6768
40-49            44.5   36    10.12         102.4144       3686.9184
50-59            54.5   12    20.12         404.8144       4857.7728
60-69            64.5   4     30.12         907.2144       3628.8576
Total                   169                 1907.2864      20197.6336

Mean = 5810.5/169 = 34.38 years
S² = 20197.63/(169-1) = 120.22
SD = √S² = √120.22 = 10.96 years

Properties of SD
• The SD has the advantage of being expressed in the same units of measurement as the mean.
• The SD is considered to be the best measure of dispersion and is used widely because of the properties of the theoretical normal curve.
• However, if the units of measurement of the variables in two data sets are not the same, their variability cannot be compared by comparing the values of the SD.

SD Vs Standard Error (SE)
• SD describes the variability among individual values in a given data set.
• SE describes the variability among separate sample means obtained from one sample to another.
• We interpret the SE of the mean to mean that another similarly conducted study may give a mean that lies within ± SE of the observed mean.


Standard Error
• SD is about the variability of individuals.
• SE is used to describe the variability in the means of repeated samples taken from the same population.
• For example, imagine 5,000 samples, each of the same size n=11. This would produce 5,000 sample means. This new collection has its own pattern of variability. We describe this new pattern of variability using the SE, not the SD.
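
The thought experiment above can be simulated. In this sketch the population is assumed, purely for illustration, to be normal with mean 50 and SD 10; the seed is fixed so the run is repeatable. The standard result SE = SD/√n is used for comparison, although the slides do not state this formula.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the simulation is repeatable

POP_MEAN, POP_SD = 50.0, 10.0       # hypothetical population, not from the slides
N_SAMPLES, SAMPLE_SIZE = 5000, 11   # 5,000 samples, each of size n = 11

# Draw the samples and keep only the mean of each one.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

# The spread of the 5,000 sample means is what the SE describes.
empirical_se = statistics.stdev(sample_means)
theoretical_se = POP_SD / math.sqrt(SAMPLE_SIZE)

print(f"SD of the sample means: {empirical_se:.2f}")
print(f"SD / sqrt(n):           {theoretical_se:.2f}")
```

The sample means vary much less than the individual values do, which is why the SE is smaller than the SD.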
5. Coefficient of variation (CV)
• When two data sets have different units of measurement, or their means differ sufficiently in size, the CV should be used as a measure of dispersion.
• It is the best measure to compare the variability of two sets of observations.
• Data with a smaller coefficient of variation are considered more consistent.

Coefficient.....
• CV is the ratio of the SD to the mean, multiplied by 100.
$CV = \frac{S}{\bar{x}} \times 100$

               SD          Mean         CV (%)
SBP            15 mmHg     130 mmHg     11.5
Cholesterol    40 mg/dl    200 mg/dl    20.0

“Cholesterol is more variable than systolic blood pressure”
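
The two CV values in the table can be verified as follows. The function name `coefficient_of_variation` is illustrative; the SD and mean values are those given above.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: the SD divided by the mean, multiplied by 100."""
    return 100 * sd / mean

cv_sbp = coefficient_of_variation(15, 130)           # systolic blood pressure
cv_cholesterol = coefficient_of_variation(40, 200)   # cholesterol

print(f"SBP: {cv_sbp:.1f}%, cholesterol: {cv_cholesterol:.1f}%")
```

Because the CV has no units, the two variables can be compared even though they are measured on different scales.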

