02 Measures of Central Tendency, Visual Presentation, Vital Statistics
Data Presentation
Vital Statistics
Emilio Q. Villanueva III, MD, MSPH (Biostat)(c), DPSP
Assistant Professor 3, School of Medical Technology, PWU-Manila
Measures of Central Tendency,
Dispersion, and Location
Summary Measures
• Methods of compressing a mass of data for better comprehension
and description of what it tends to portray
• Categorized into:
• Measures of central tendency
• Measures of dispersion
• Measures of location
Measures of Central Tendency
• Refers to the “center” of a distribution of observations
• Refers to “typical” values which may be utilized to represent a series
of observations
• Most common measures of central tendency include:
• Mean
• Median
• Mode
(Arithmetic) Mean
• Also known as the “average”
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
(Arithmetic) Mean
• Characteristics:
• Involves all observations in its composition
• Any change in the observation, even in just one value, will change the mean
• Sensitive to extreme observations
• Its unit is the same as that of the original set of observations from which it
was derived
• It can be calculated for any quantitative variable
• The sum of the deviations of the observations from the mean is equal to zero
– point of balance or center of gravity of the distribution
• Serves as the basis for the computation of higher statistical methods
• Easy to calculate manually if the observations are few
• SPSS can be used to compute the mean of a large set of observations
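The "point of balance" property in the list above follows directly from the definition of the mean. A short derivation, using the same symbols as the mean formula ($n$ observations $x_i$ with mean $\bar{x}$, so that $\sum x_i = n\bar{x}$):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```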
(Arithmetic) Mean
• Weight of a set of MSMT Students

Student   Weight (Kg)
      1            40
      2            50
      3            48
      4            48
      5            67
      6            70
      7           100
      8            58
      9            55
     10            54
     11            60

$$\bar{x} = \frac{40 + 50 + 48 + 48 + 67 + 70 + 100 + 58 + 55 + 54 + 60}{11} = \frac{650}{11} = 59.09 \text{ Kg}$$

INTERPRETATION: The average weight of the MSMT students is 59.09 Kg.
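For readers without SPSS, the same computation can be sketched in Python. This is only an illustration using the standard library; the variable names are chosen here and are not part of the lecture.

```python
from statistics import mean

# Weights (Kg) of the 11 MSMT students, in the order listed in the table
weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

total = sum(weights)   # numerator of the mean formula
n = len(weights)       # number of observations
x_bar = total / n      # arithmetic mean

print(f"sum = {total}, n = {n}, mean = {x_bar:.2f} Kg")

# statistics.mean should agree with the manual computation
assert abs(mean(weights) - x_bar) < 1e-9
```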
Median
• The middlemost value in a set of observations put in an array
• Position of the median in an array can be computed as $\frac{n+1}{2}$
Median
• Characteristics:
• Like the mean, it always exists and is unique
• Not influenced by outliers
• Does not make use of all the observations in its computation
• It can be calculated for quantitative variables and for variables on the ordinal scale
• Easily determined if the observations are few
• SPSS can be used to compute the median of a large set of observations
Median
• Weight of a set of MSMT Students, arranged in an array (i = position in the array)

Student   Weight (Kg)    i
      1            40    1
      3            48    2
      4            48    3
      2            50    4
     10            54    5
      9            55    6
      8            58    7
     11            60    8
      5            67    9
      6            70   10
      7           100   11

$$\text{Position of the Median} = \frac{n+1}{2} = \frac{11+1}{2} = 6\text{th}$$

The 6th observation in the array is 55 Kg.

INTERPRETATION: 50% of the MSMT students weigh 55 Kg or less, while the other half weighs more than 55 Kg.
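The position rule can be sketched in Python as follows. The handling of an even number of observations (averaging the two middle values) is a common convention assumed here; the slides only show the odd case.

```python
from statistics import median

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

array = sorted(weights)   # put the observations in an array
n = len(array)
position = (n + 1) / 2    # 1-based position of the median

if n % 2 == 1:
    # odd n: the position is a whole number, so take that observation
    med = array[int(position) - 1]
else:
    # even n (assumed convention): average the two middle observations
    med = (array[n // 2 - 1] + array[n // 2]) / 2

print(f"position = {position:g}, median = {med} Kg")

# cross-check against the standard library
assert med == median(weights)
```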
Mode
• The most frequently occurring value in a set of observations
• It is possible to have:
• No mode
• One mode – Unimodal
• Two modes – Bimodal
• More than 2 modes – Multimodal
• No calculation needed
• May be determined for any type of variable/ level of measurement
Mode
• Easily determined if the observations are few
• SPSS can be used to compute the mode of a large set of observations
Mode
• Weight of a set of MSMT Students, arranged in an array (i = position in the array)

Student   Weight (Kg)    i
      1            40    1
      3            48    2
      4            48    3
      2            50    4
     10            54    5
      9            55    6
      8            58    7
     11            60    8
      5            67    9
      6            70   10
      7           100   11

$$\text{Mode} = 48 \text{ Kg}$$

INTERPRETATION: The most usual weight of the MSMT students is 48 Kg.
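Although no calculation is needed, the mode can also be found by counting, which helps when the data set is large. The sketch below is one possible approach; treating "every value occurs once" as "no mode" is an assumption made here to match the slide's list of possibilities.

```python
from collections import Counter

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

counts = Counter(weights)        # frequency of each distinct value
highest = max(counts.values())   # the largest frequency

if highest == 1:
    modes = []                   # assumed convention: no value repeats, so no mode
else:
    modes = sorted(v for v, c in counts.items() if c == highest)

labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
kind = labels.get(len(modes), "multimodal")

print(f"mode(s) = {modes} ({kind})")
```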
Location of the Measures of Central Tendency
• Symmetrical: mean = median = mode
$$\bar{x} = \tilde{x} = \hat{x}$$
• Skewed to the right: mean > median > mode
$$\bar{x} > \tilde{x} > \hat{x}$$
• Skewed to the left: mean < median < mode
$$\bar{x} < \tilde{x} < \hat{x}$$
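As a rough illustration, the ordering of the three measures can be checked on the weight data from the earlier examples. This is a rule-of-thumb sketch only; the label "no clear pattern" is an addition made here for orderings the slides do not cover.

```python
from statistics import mean, median, mode

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

x_bar = mean(weights)      # mean
x_tilde = median(weights)  # median
x_hat = mode(weights)      # mode

if x_bar > x_tilde > x_hat:
    shape = "skewed to the right"
elif x_bar < x_tilde < x_hat:
    shape = "skewed to the left"
elif x_bar == x_tilde == x_hat:
    shape = "symmetrical"
else:
    shape = "no clear pattern"   # hypothetical fallback, not from the slides

print(f"mean = {x_bar:.2f}, median = {x_tilde}, mode = {x_hat}: {shape}")
```

For these weights the mean is pulled upward by the 100 Kg observation, which is consistent with the guideline that the median is preferable when outliers are present.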
Guidelines in Choosing the Measure of
Central Tendency to Use
• Nature of Distribution
• Normal distribution – any of the three
• If skewed distribution or with outliers – mean not desirable; may use median
instead
• Summary measure desired
• Depends on the objectives of the study
Measures of Dispersion
• Gives information as to the tendency of values to clump together
• Tools describing the variability of the observations
• Homogeneous – with little difference between adjacent observations
• Heterogeneous – observations are widely scattered around the mean
Measures of Dispersion
• May be used for quantitative variables only
• Most common measures of dispersion include
• Range
• Variance
• Standard deviation
• Coefficient of variation
Range
• The simplest measure of variability
• The difference between the highest and the lowest observation
• It does not tell anything about the observations between these two extreme observations
• May be used only for quantitative variables
• SPSS can be used to find the lowest and highest observation
Range
• Weight of a set of MSMT Students, arranged in an array (i = position in the array)

Student   Weight (Kg)    i
      1            40    1
      3            48    2
      4            48    3
      2            50    4
     10            54    5
      9            55    6
      8            58    7
     11            60    8
      5            67    9
      6            70   10
      7           100   11

$$\text{Range} = \text{Highest Observation} - \text{Lowest Observation}$$
$$\text{Range} = 100 - 40 = 60 \text{ Kg}$$
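A minimal Python sketch of the same computation, assuming the weights are held in a plain list:

```python
weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

highest = max(weights)            # highest observation
lowest = min(weights)             # lowest observation
value_range = highest - lowest    # range = highest - lowest

print(f"range = {highest} - {lowest} = {value_range} Kg")
```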
Variance
• A measure of variability that takes the mean as the reference point
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\text{sum of squared deviations}}{\text{number of observations} - 1}$$
Variance
• Characteristics
• Involves all observations in its computation
• Its unit is the squared unit of the original set of observations
• Hard to interpret and is abstract in many instances
• Easily determined if the observations are few
• SPSS can be used to compute the variance of a large set of observations
Variance
• Weight of a set of MSMT Students (mean = 59.09 Kg)

Student   Weight (Kg)   Deviation from the mean   Squared deviation from the mean
                        $(x_i - \bar{x})$         $(x_i - \bar{x})^2$
      1            40                    -19.09                            364.46
      2            50                     -9.09                             82.64
      3            48                    -11.09                            123.01
      4            48                    -11.09                            123.01
      5            67                      7.91                             62.55
      6            70                     10.91                            119.01
      7           100                     40.91                           1673.55
      8            58                     -1.09                              1.19
      9            55                     -4.09                             16.74
     10            54                     -5.09                             25.92
     11            60                      0.91                              0.83
  Total                                                                   2592.91

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{2592.91}{11 - 1} = 259.29 \text{ Kg}^2$$
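The deviation table can be reproduced step by step in Python. This sketch follows the sample variance formula with the n − 1 denominator; the names are illustrative.

```python
from statistics import variance

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

n = len(weights)
x_bar = sum(weights) / n

deviations = [x - x_bar for x in weights]   # (x_i - mean)
squared = [d ** 2 for d in deviations]      # (x_i - mean)^2
ss = sum(squared)                           # sum of squared deviations
s2 = ss / (n - 1)                           # sample variance

for x, d, sq in zip(weights, deviations, squared):
    print(f"{x:>4} {d:>8.2f} {sq:>9.2f}")
print(f"sum of squared deviations = {ss:.2f}")
print(f"variance = {s2:.2f} Kg^2")

# statistics.variance uses the same n - 1 denominator
assert abs(s2 - variance(weights)) < 1e-9
```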
Standard Deviation
• Square root of the variance
• Unit is the same as that of the original set of observations
• Can be better interpreted than the variance
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{s^2} = \text{square root of the variance}$$
Coefficient of Variation
$$cv = \frac{s}{\bar{x}} \times 100 = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
• Weight of a set of MSMT Students (variance = 259.29 Kg², mean = 59.09 Kg)
$$s = \sqrt{259.29} = 16.10 \text{ Kg}$$
$$cv = \frac{16.10}{59.09} \times 100 = 27\%$$
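A short sketch of the standard deviation and the coefficient of variation for the same weights, using Python's standard library (`stdev` uses the n − 1 denominator):

```python
from statistics import mean, stdev

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

x_bar = mean(weights)    # mean, in Kg
s = stdev(weights)       # sample standard deviation, in Kg
cv = s / x_bar * 100     # coefficient of variation, in percent (unitless)

print(f"s = {s:.2f} Kg, mean = {x_bar:.2f} Kg, cv = {cv:.0f}%")
```

Because the units cancel, the coefficient of variation can be used to compare the variability of variables measured in different units.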
General Interpretation of the Measures of
Dispersion
• If the value of the measure of dispersion is high or large, the
distribution of the observations is said to be heterogeneous
• If the value of the measure of dispersion is low or small, the
distribution of the observations is said to be homogeneous
Measures of Location
• Measures that aid in determining the relative position of a particular
value in an array of observations
• Provide more details about a part of the entire distribution of
observations in a given data
• May be used for both quantitative and qualitative variables
• Common measures of location include:
• Quartiles
• Deciles
• Percentiles
Quartiles
• Points in the distribution that divide the observations into four equal
parts
Deciles
• Points in the distribution that divide the observations into ten equal
parts
Percentiles
• Points in the distribution that divide the observations into one
hundred equal parts
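As an illustration, the cut points can be computed for the weight data with Python's `statistics.quantiles`. Statistical packages differ in how they interpolate between observations, so values from SPSS or other software may not match exactly; the default "exclusive" method is assumed here.

```python
from statistics import median, quantiles

weights = [40, 50, 48, 48, 67, 70, 100, 58, 55, 54, 60]

quartiles = quantiles(weights, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = quantiles(weights, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(weights, n=100)  # 99 cut points: P1 ... P99

q1, q2, q3 = quartiles
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")

# Q2, D5 and P50 all mark the middle of the distribution, i.e. the median
print(q2, deciles[4], percentiles[49], median(weights))
```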
Relationship of Different Measures of
Location
Determining the Value of a Measure of
Location Using Cumulative Percentage
• Cumulative percentage
• The sum of the percentages on the same row and previous rows
• The last value will always be equal to 100%
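The cumulative percentage column can be sketched as below. The class intervals are hypothetical groupings of the 11 student weights, chosen here only for illustration. The class in which the cumulative percentage first reaches 50% contains the median.

```python
from itertools import accumulate

# Hypothetical class intervals (Kg) and the number of students in each
classes = ["40-49", "50-59", "60-69", "70 and above"]
counts = [3, 4, 2, 2]

total = sum(counts)
cum_counts = list(accumulate(counts))                  # running total of counts
percents = [100 * c / total for c in counts]
cum_percents = [100 * c / total for c in cum_counts]   # same row + previous rows

for label, p, cp in zip(classes, percents, cum_percents):
    print(f"{label:<14}{p:>7.2f}{cp:>8.2f}")

# First class whose cumulative percentage reaches 50% holds the median
median_class = next(l for l, cp in zip(classes, cum_percents) if cp >= 50)
print("median class:", median_class)
```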
Presentation of Data
• Purposes:
• Display the data clearly and effectively
• Allow the viewer to think about what the data convey
• Avoid distortion of the data
• Encourage the reader to make comparisons
• Serve a reasonably clear purpose:
• description, exploration or tabulation
• Be closely related to the statistical and verbal descriptions of the data
set
Textual
• Uses statements with a few numbers to describe the data, chiefly to
draw attention to some significant figures
• Advantages
• Appropriate when there are few figures to be presented
• Gives emphasis to significant figures
• Disadvantages
• Data become hard to comprehend when a large amount of quantitative data is
included in the paragraph
• A paragraph involving many figures can be tiresome to most readers when the
same words are repeated many times
Tabular Presentation
• Data presented in a table is referred to as “tabular presentation”
• Table is an orderly arrangement of statistical data into rows and
columns with some predetermined aim or purpose
• The purpose of the table is to simplify presentation of related data
and make comparisons easy
Table
• Advantages
• Concise and easy to understand
• Facilitates analysis of categories of the given variable
• Presents data in more detail
• Aids in graph or chart construction
• Disadvantages
• Too many rows or columns could make it difficult for the reader to understand
the data
• Requires more time to construct
Prime Consideration in Table Construction
• Simplicity
• clean, professional and uniform look devoid of frills and unnecessary
markings
• Clarity
• Agrees with the textual discussion and does not appear out of place; may be
achieved by having clear, concise headings, uncluttered footnotes, a minimum
number of variables included, well-spaced columns or rows, etc.
• Directness
• Include only those that are necessary
Pointers to Achieve the Norms in Table
Construction
• Position of the table
• Position the table immediately after the text where it is first cited
• Uniformity of style
• Standardize a particular style for a single report to avoid unnecessary
confusion on the part of the reader
• Number of variables presented should be kept to a minimum (at most
3 variables in a table)
• Every table should be self-explanatory
Construction of Table
• Guidelines for aligning
• Align text in a table to the left
• Text that serves as a column label may be centered
• Numeric values should be aligned to the right
• If the numeric values contain decimals, they should be decimal-aligned
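The alignment guidelines can be illustrated with a small text table. The sketch below uses Python format specifiers (`<` for left alignment, `>` for right alignment, `.2f` for a fixed number of decimals so the decimal points line up). The rows reuse the summary measures computed for the student weights; the layout itself is only an example.

```python
rows = [
    ("Mean", 59.09),
    ("Median", 55.0),
    ("Mode", 48.0),
    ("Range", 60.0),
    ("Standard deviation", 16.10),
]

label_width = max(len(label) for label, _ in rows)

# Column labels: text label left-aligned, numeric label right-aligned
lines = [f"{'Measure':<{label_width}}  {'Value (Kg)':>10}"]
for label, value in rows:
    # text to the left; numbers to the right with two decimals (decimal-aligned)
    lines.append(f"{label:<{label_width}}  {value:>10.2f}")

print("\n".join(lines))
```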
Construction of Table
• Check the table to be sure that
• All sources are specified
• Headings are specific for every column and row
• Row and column totals are checked for accuracy
• Cells are not left blank. Enter “0” or “-”
• Categories are mutually exclusive and exhaustive
Essential Components of Table
• Number – Relative position of the table. It is placed on the same line as the opening of the title, separated from the title by a period
• Title – Brief statement about the table presented. The title should be as complete as possible and should clearly relate the content of the table: what, who, where, and when
• Box Head – Contains the captions or column headings
• Stub – The row captions. Items in the stub should be grouped to facilitate interpretation of the data
• Body – The main part of the table, which contains the information
• Footnotes – Any statement inserted at the bottom of the table
• Source – The exact reference to the source should be given if the data presented are not original
Types of Table
• Master Table
• Dummy Table
• Tables by number of variables presented
• One-way table
• Two-way table
• Multi-way table
Master Table
• A single table which shows the distribution of observations across
many variables of interest in a given study; each observation is cross-
classified across the variables which may be quantitative or
qualitative in nature
• Purpose:
• Way of storing information with an aim of presenting detailed statistical data
• Facilitate generation and tabulation of smaller table
Dummy Table
• Dummy tables are skeleton tables which do not contain figures but
give a preview of what outputs may be expected from the study (i.e.,
it shows how the data will be organized and displayed)
• Uses
• help researcher clarify instrument
• help protocol reviewer
• help computer programmer
Frequency Distribution
• Classifies data into groups
Common Health Indicators
• Fertility indicators
• Crude birth rate
• General fertility rate
• Morbidity indicators
• Incidence
• Prevalence
• Mortality indicators
• Crude death rate
• Specific mortality rate
• Cause-of-death rate
• Infant mortality rate
• Neonatal mortality rate
• Post-neonatal mortality rate
• Maternal mortality ratio
• Proportional mortality ratio
• Case fatality rate
Morbidity Measures
• Measure the occurrence of illnesses or conditions in a community
• Two types:
• Prevalence
• Incidence
Prevalence Proportion (Ratio)
• Measures the proportion of existing cases of a disease in the
population
• Reflects both incidence and the probability of surviving with disease
• More useful in describing chronic conditions such as congenital
defects, non-lethal degenerative diseases with no clear onset, mental
disease
• Assumptions:
• The population is stable, i.e., no drastic changes in the size and age structure of the
population; and,
• The rate is more or less constant
Fertility Indicators
• Crude Birth Rate
$$CBR = \frac{\text{\# of live births in a year}}{\text{Midyear population}} \times 1000$$
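A minimal sketch of the crude birth rate computation. The figures are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical figures for one locality and one calendar year
live_births = 4_000
midyear_population = 200_000

# CBR: live births per 1,000 midyear population
cbr = 1000 * live_births / midyear_population

print(f"CBR = {cbr:.1f} live births per 1,000 population")
```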
Morbidity Indicators
• Incidence Proportion
$$IP = \frac{\text{\# of new cases that developed during the period}}{\text{Population at risk of developing the disease during the same period}} \times 100$$
• Incidence Density
$$ID = \frac{\text{\# of new cases that developed during the period}}{\text{\# of person-time at risk}} \times 100$$
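The difference between the two incidence measures lies in the denominator: persons at risk versus person-time at risk. The sketch below uses a hypothetical cohort of eight people, each followed for a different number of years; all figures and names are illustrative.

```python
# Hypothetical follow-up time (years at risk) for each of 8 persons
follow_up_years = [2.0, 5.0, 5.0, 1.5, 5.0, 3.5, 5.0, 4.0]
new_cases = 2   # persons who developed the disease during follow-up

population_at_risk = len(follow_up_years)   # persons at risk at the start
person_time = sum(follow_up_years)          # total person-years at risk

incidence_proportion = 100 * new_cases / population_at_risk
incidence_density = 100 * new_cases / person_time

print(f"IP = {incidence_proportion:.1f} per 100 persons at risk")
print(f"ID = {incidence_density:.2f} per 100 person-years at risk")
```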
Mortality Indicators
• Crude Death Rate
$$CDR = \frac{\text{\# of deaths in a year}}{\text{Midyear population}} \times 1000$$
• Cause-of-Death Rate
$$CoDR = \frac{\text{\# of deaths from a certain cause in a year}}{\text{Midyear population}} \times F$$
Mortality Indicators
• Infant Mortality Rate
$$IMR = \frac{\text{\# of deaths under 1 year old in a year}}{\text{\# of live births in the same year}} \times 1000$$
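The mortality indicators share one pattern: events divided by a denominator, times a multiplier. The sketch below applies that pattern with hypothetical figures. The helper `rate` is an illustrative name, and the value F = 100,000 for the cause-of-death rate is an assumption, since the slides leave F unspecified.

```python
def rate(events, denominator, factor):
    """Events per `factor` units of the denominator."""
    return factor * events / denominator

# Hypothetical figures for one locality and one calendar year
midyear_population = 200_000
deaths_all_causes = 1_200
deaths_from_cause = 90      # deaths from one specific cause
live_births = 4_000
infant_deaths = 48          # deaths under 1 year old

cdr = rate(deaths_all_causes, midyear_population, 1000)
codr = rate(deaths_from_cause, midyear_population, 100_000)  # F assumed to be 100,000
imr = rate(infant_deaths, live_births, 1000)

print(f"CDR  = {cdr:.1f} per 1,000 population")
print(f"CoDR = {codr:.1f} per 100,000 population")
print(f"IMR  = {imr:.1f} per 1,000 live births")
```

Note that the infant mortality rate uses live births, not the midyear population, as its denominator.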