Biostatistics MLT 1-5
By Israel M.
10/02/2022 1
Objectives
• Define nominal, ordinal, discrete and continuous data
and describe the differences between these types of
data.
Lecture Topics
• Definition of terminologies/Introduction
• Measures of Dispersion
Introduction
• Statistics: A field of study concerned with:
Descriptive statistics:
Exploratory data analysis
Inferential statistics:
• Confirmatory data analysis
Using Statistics (Two Categories)
Descriptive Statistics:
• Organize
• Summarize
• Display
• Example: tables, graphs, numerical summary measures
Inferential Statistics:
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions
Uses of biostatistics
Assessment of health status
Resource allocation
Magnitude of association
– Strong vs weak association between exposure and outcome
• Assessing risk factors
– Cause & effect relationship
• Drawing inferences
– Information from sample to population
Data
Encompasses observations on one or more variables.
Set of data is a collection of observed values
representing one or more characteristics of some
objects or units.
Are numbers which can be measurements or can be
obtained by counting
Age and height of students in the class
Types of Data
1. Primary data: collected from the items or individual
respondents directly by the researcher for the
purpose of a study.
More reliable and relatively accurate.
More expensive and time consuming
Sources of data
• Routinely kept records
• Population Surveys
• Experiments
• Reports
• Literature
• Etc
Variable
• Variables can be broadly classified into:
Categorical variable:
Cannot be measured in quantitative form as we
measure height or weight but only sorted by name or
categories
Quantitative variable:
Quantitative variable is divided into two:
1. Discrete: Can only have a limited number of
discrete values (usually whole numbers).
2. Continuous variable:
It can have an infinite number of possible values in any given
interval.
Measurement Scales
At what level does the measurement take place?
Level of severity of injury:
1. Fatal injury
2. Severe
3. Moderate
4. Minor
These numbers serve only to indicate a pecking order of levels
of the variable; the differences between these numerical values
are meaningless.
Interval scale
• Measured on a continuum and differences between
any two numbers on a scale are of known size.
Has a zero point, but its location is arbitrary.
Hence ratios of interval scale values have no
meaning.
Ratio scale
Sample
Is a subset of the population
We use samples in making inferences about
populations
Parameter and Statistic
Parameter: A descriptive measure computed from
the data of a population.
Used to describe the attributes of populations
– E.g., the mean (µ) age of the target population
Descriptive Statistics
Techniques used to organize and summarize a set of
data in a concise way.
– Organization of data
– Summarization of data
– Presentation of data
Before summarization and organization, we need to
know the types of variables and measurement scales
of our data.
Descriptive statistics include:
Tables
Graphs
- Measures of variability
Frequency Distributions (Tables)
• The actual summarization and organization of data
starts from frequency distribution.
Example
Sex       Frequency   Relative frequency (%)
Male      400         66.7
Female    200         33.3
Total     600         100.0
Frequency distribution for Quantitative data
To determine the number of class intervals and the
corresponding width, we may use:
Sturges' rule: $K = 1 + 3.322 \log_{10} n$
Class width: $W = \frac{L - S}{K}$
Where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
Example: weights in pounds of 57 children at
a day-care center:
68 63 42 27 30 36 28 32 79 27 22 23 24 25 44 65 43
25 74 51 36 42 28 31 28 25 45 12 57 51 12 32 49 38
42 27 31 50 38 21 16 24 69 47 23 22 43 27 49 28 23
19 46 30 43 49 12
K = 1 + 3.322 (log 57) = 6.83 ≈ 7
Maximum value = 79 Minimum value = 12
Range = 79 - 12 = 67
Width = 67/7 = 9.57 ≈ 10
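The calculation can be checked with a short Python sketch. It uses the constant 3.322 from the rule as stated and base-10 logarithms; rounding K and the width up to whole numbers is an assumption about how the slide arrives at 7 and 10.

```python
import math

# Weights (lb) of the 57 children from the example above.
weights = [
    68, 63, 42, 27, 30, 36, 28, 32, 79, 27, 22, 23, 24, 25, 44, 65, 43,
    25, 74, 51, 36, 42, 28, 31, 28, 25, 45, 12, 57, 51, 12, 32, 49, 38,
    42, 27, 31, 50, 38, 21, 16, 24, 69, 47, 23, 22, 43, 27, 49, 28, 23,
    19, 46, 30, 43, 49, 12,
]

n = len(weights)
k_exact = 1 + 3.322 * math.log10(n)   # Sturges' rule
k = math.ceil(k_exact)                # assumed: round up to whole classes
data_range = max(weights) - min(weights)
width_exact = data_range / k
width = math.ceil(width_exact)        # assumed: round up to a convenient width

print(n, round(k_exact, 2), k, data_range, round(width_exact, 2), width)
```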
Determining the frequencies or the number of values
or measurements for each interval
• Cumulative frequencies: when frequencies of two or
more classes are added.
Weight interval    Frequency (f)    Relative frequency (%)    Cumulative frequency (cf)    Cumulative relative frequency (%)
59.6% of the children in the data set have a
weight of 39.5 lb or less
True limits: Are those limits that make an interval of a
continuous variable continuous in both directions
• A true boundary is the average of the upper limit of one
interval and the lower limit of the next-higher interval.
Weight interval   True limits   Mid-point   f
10-19             9.5-19.5      14.5        5
20-29             19.5-29.5     24.5        19
30-39             29.5-39.5     34.5        10
40-49             39.5-49.5     44.5        13
50-59             49.5-59.5     54.5        4
60-69             59.5-69.5     64.5        4
70-79             69.5-79.5     74.5        2
Total                                       57
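As a rough sketch, the table above can be rebuilt from the 57 raw weights, together with relative and cumulative frequencies. The dictionary keys are illustrative names, not part of the lecture; true limits are taken as half a unit either side of the class limits, as on the slide.

```python
# Weights (lb) of the 57 children from the day-care example.
weights = [
    68, 63, 42, 27, 30, 36, 28, 32, 79, 27, 22, 23, 24, 25, 44, 65, 43,
    25, 74, 51, 36, 42, 28, 31, 28, 25, 45, 12, 57, 51, 12, 32, 49, 38,
    42, 27, 31, 50, 38, 21, 16, 24, 69, 47, 23, 22, 43, 27, 49, 28, 23,
    19, 46, 30, 43, 49, 12,
]

classes = [(lo, lo + 9) for lo in range(10, 80, 10)]  # 10-19, ..., 70-79
n = len(weights)

rows = []
cumulative = 0
for lo, hi in classes:
    f = sum(lo <= w <= hi for w in weights)   # frequency of the class
    cumulative += f
    rows.append({
        "interval": f"{lo}-{hi}",
        "true_limits": (lo - 0.5, hi + 0.5),
        "midpoint": (lo + hi) / 2,
        "f": f,
        "rel_pct": round(100 * f / n, 1),
        "cf": cumulative,
        "cum_rel_pct": round(100 * cumulative / n, 1),
    })

for row in rows:
    print(row)
```

The third row gives a cumulative relative frequency of 59.6%, which matches the statement that 59.6% of the children weigh 39.5 lb or less.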
Simple or one-way table
Two-way table
Higher Order Table
Desired to represent three or more characteristics in a
single table.
Variable Frequency Percent
Sex
Male
Female
Occupation
Student
Farmer
Merchant
Marital Status
Single
Married
Guidelines for constructing tables
Should be as simple as possible
Should be self-explanatory
Clear title telling what, when and where, how
classified and placed above the table
Each row and column should be labeled
Numerical entries of zero should be explicitly written
State clearly the unit of measurement used,
Explain codes and abbreviations in the foot-note,
Show totals,
If data is not original, indicate the source in foot-note
Diagrammatic Representation
They have greater attraction than mere figures
Specific types of graphs include:
• Bar graph
categorical data
• Pie chart
• Histogram
• Frequency polygon
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
Bar charts (or graphs)
Suitable when there are several groups
Categories are listed on the horizontal axis (X-axis),
arranged: alphabetically, size of their proportions, or
on some other rational basis
Frequencies or relative frequencies are represented on
the Y-axis (ordinate)
The height of the bar represents the frequency or
relative frequency of occurrence of cases that belong to
particular category
The width of the bar has no meaning
Simple bar chart
[Figure: simple bar chart of number of patients (y-axis) by source of referral (x-axis: Other hospital, GP, OPD, Casualty, Other)]
Component (or Sub-divided) Bar chart:
Bars are sub-divided into component parts of the figure.
These sorts of diagrams are constructed when each total is
built up from two or more component figures.
[Figure: component bar chart; y-axis: percent; x-axis: August, October and December 2003]
Multiple Bar Charts
Component figures are shown as separate bars
adjoining each other.
[Figure: multiple bar chart, "Prevalence of self-reported breathlessness among school children, 1998"; y-axis: breathlessness (per cent); x-axis: parents smoking (Neither, One, Both)]
Method of constructing bar chart
The bars should be of equal width and should
be separated from one another so as not to
imply continuity.
Pie chart
Can be used for categorical and quantitative discrete
data
Steps to construct a pie-chart
• Construct a frequency table
Distribution of cause of death for females in England and Wales, 1989
(pie chart):
Circulatory system: 42%
Neoplasms: 30%
Respiratory system: 13%
Others: 8%
Digestive system: 4%
Injury and poisoning: 3%
Histograms
Quantitative data
The fraction of data in each class interval is represented
by a rectangle
Frequency polygon
To draw a frequency polygon, we connect the midpoints
of the tops of the histogram's rectangles with
straight lines
Steps to construct frequency polygon
Example: Distribution of weights of 57 children
Numerical summary statistics
To summarize data by means of just a few numerical
measures, particularly before inferences or
generalizations are drawn from the data.
Measures of Central Tendency (MCT)
• The objective of calculating MCT is to determine a
single figure which may be used to represent the
whole data set.
Characteristics of a good MCT
A MCT is good or satisfactory if it possesses the following
characteristics.
It should be based on all the observations
It should not be affected by the extreme values
It should be as close to the maximum number of
values as possible
It should have a definite value
It should not be subjected to complicated and tedious
calculations
It should be capable of further algebraic treatment
It should be stable with regard to sampling
The most common measures of central
tendency include:
– Arithmetic Mean
– Median
– Mode
Arithmetic Mean
Center (average) of data set
Ungrouped Data
Is the sum of the individual values in a data set
divided by the number of values in the data set
• The sample mean (x̄) is the sample analog of the mean
of a finite population (µ).
Example
The heart rates for n=10 patients were as follows
(beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these
patients?
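A minimal sketch of the calculation, using Python's standard `statistics` module. The median is included only for comparison; the single low reading of 40 pulls the mean below it.

```python
import statistics

# Heart rates (beats per minute) for the n = 10 patients above.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

total = sum(heart_rates)                       # sum of the individual values
mean_hr = total / len(heart_rates)             # arithmetic mean
median_hr = statistics.median(heart_rates)     # middle of the ordered values

print(total, mean_hr, median_hr)
```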
Grouped data
• Occasionally, data, especially secondhand data, are
presented in the grouped form of a frequency table.
When the data are skewed, the mean is
“dragged” in the direction of the skewness &
in this case, the mean is a poor measure of
central location or does not reflect the center
of the sample.
Properties of the Arithmetic Mean
For a given set of data there is one and only one
arithmetic mean (uniqueness).
Median
• The middle observation, which divides the set into equal
halves.
• One half of the sample has values lying below the median
and one half of the sample has values lying above the median
• Appropriate for discrete and continuous data, and can
also be used for ordinal data
If the number of observations n is odd, there will be a
unique median
If n is even, there is strictly no middle observation,
but the median is defined by convention as the
average of the two middle observations
Properties of the median
There is only one median for a given set of data
(uniqueness)
The median is easy to calculate
Median is a positional average and hence it is
insensitive to very large or very small values
It is determined mainly by the middle points and less
sensitive to the remaining data points (weakness).
Mode
The mode is the most frequently occurring value
among all the observations in a set of data.
Example
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 “Unimodal”
Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes – 2 & 5
• This distribution is said to be “bi-modal”
Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
Properties of mode
It is not affected by extreme values
Often its value is not unique
The main drawback of mode is that often it does not
exist
Skewness
If extremely low or extremely high observations are
present in a distribution, then the mean tends to shift
towards those scores.
Skewed to the right (positively skewed):
The upper, or right, tail of the distribution is longer
than the lower, or left, tail
mode < median < mean
Skewed to the left (negatively skewed):
The lower tail of the distribution is longer than the upper
tail
mean < median < mode
Symmetrical distribution
Symmetric (B) and skewed distributions: right skewed (A) and left
skewed (C).
(Source: Centers for Disease Control and Prevention (1992).
Principles of Epidemiology, 2nd Edition, Figure 3.5, p. 151.)
Measures of Dispersion
Measurements tend to be different from one another.
We need to know something about the
variability or spread of the values — whether
they tend to be clustered close together, or
spread out over a broad range
• Two samples of cholesterol measurements on a
given person with different techniques
– Method 1: 177, 193, 195, 209, 226
– Method 2: 192, 197, 200, 202, 209
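The two samples have the same mean but different spread, which a short sketch makes concrete. The helper name `summarize` is illustrative; the standard deviation is the sample version (divisor n - 1).

```python
import statistics

# Cholesterol readings for one person, taken from the two methods above.
method_1 = [177, 193, 195, 209, 226]
method_2 = [192, 197, 200, 202, 209]

def summarize(values):
    """Return (mean, range, sample standard deviation) for a list of numbers."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    return mean, value_range, sd

summary_1 = summarize(method_1)
summary_2 = summarize(method_2)
print("Method 1:", summary_1)
print("Method 2:", summary_2)
```

Both means are 200, yet Method 1 has a range of 49 against 17 for Method 2, so the mean alone does not describe the data.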
Measures of dispersion include:
Range
Inter-quartile range
Variance and Standard deviation
Coefficient of variation
Range
The difference between the highest and lowest
observations in a data set.
Example
– Data values: 45, 70, 95, 100, 125
– Range = 125 - 45 = 80
A data set with a higher range exhibits more
variability
Properties of range
The value of the range is determined by only two of
the original observations.
Percentiles
• Numerical values that divide an ordered data set into
100 pieces.
• Position of the pth percentile = p(n + 1), where p is the
required percentile expressed as a proportion
The pth percentile is:
Given a sample of size n = 60, find the 30th percentile
of the data set.
p(n+1) = 0.30(60+1) = 18.3
= Average of the 18th and 19th observations
– 30% of the observations are below this value and
70% of them are above it
Quartile
[Figure: the quartiles divide the ordered data into four parts, each containing 25% of the observations]
a) The first quartile (Q1): 25% of all the
ranked observations are less than Q1.
Example:
• Given the data set: 8, 9, 9, 10, 13, 15, 16,19, 20.
n= 9
Q1= 0.25 (9+1) = 2.5 position
Take the average of the second and the third values
Q1= 9
Interquartile range (IQR)
Indicates the spread of the middle 50% of the
observations, and used with median
IQR = Q3 - Q1
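The quartile example can be sketched in Python using the p(n + 1) position rule from these slides, averaging the two neighbouring observations when the position is fractional. That averaging convention follows the worked examples here; statistical software often interpolates differently, so results may not match other tools. The function names are illustrative, and the sketch assumes the position falls between 1 and n.

```python
import math

def value_at_position(ordered, position):
    """Value at a 1-based position; average the neighbours if it is fractional."""
    lower = math.floor(position)
    upper = math.ceil(position)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

def percentile(values, p):
    """pth percentile (p as a proportion) using the p(n + 1) position rule."""
    ordered = sorted(values)
    position = p * (len(ordered) + 1)
    return value_at_position(ordered, position)

data = [8, 9, 9, 10, 13, 15, 16, 19, 20]

q1 = percentile(data, 0.25)   # position 2.5: average of 2nd and 3rd values
q3 = percentile(data, 0.75)   # position 7.5: average of 7th and 8th values
iqr = q3 - q1

print(q1, q3, iqr)
```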
Properties of IQR:
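Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}$
Where
n is the sample size and $\bar{X}$ is the sample mean
N = the total number of elements in the
population and $\mu$ is the population mean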
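A small check that the definitional and shortcut forms of the sample variance agree. The data are the Method 1 cholesterol readings from the dispersion example earlier in these notes; the variable names are illustrative.

```python
import statistics

values = [177, 193, 195, 209, 226]   # Method 1 cholesterol readings
n = len(values)
mean = sum(values) / n

# Definitional form: squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in values) / (n - 1)

# Shortcut form: (sum of squares - (sum)^2 / n) / (n - 1).
var_shortcut = (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

# Population form, if the five values were the whole population (divisor N).
var_population = sum((x - mean) ** 2 for x in values) / n

sd = var_definition ** 0.5           # standard deviation = square root of variance
print(var_definition, var_shortcut, var_population, round(sd, 2))
```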
Standard deviation (σ, s)
• It is the square root of the variance.
• Describes the variability among individual values in a
given data set
• This produces a measure having the same scale as
that of the individual values.
$\sigma = \sqrt{\sigma^2}$ and $S = \sqrt{S^2}$
• The SD has the advantage of being expressed
in the same units of measurement as the mean
The standard deviation is the square root of the
average of the square of the deviations from the
sample mean.
• Example: Blood Cholesterol Measurements for a
Sample of 10 Persons
Coefficient of variation (CV)
CV is the ratio of the SD to the mean
multiplied by 100.
$CV = \frac{S}{\bar{x}} \times 100$
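As an illustration only, the CV can be computed for the two cholesterol samples from the dispersion example earlier in these notes. The function name is hypothetical, and the sample standard deviation (divisor n - 1) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / sample mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

method_1 = [177, 193, 195, 209, 226]
method_2 = [192, 197, 200, 202, 209]

cv_1 = coefficient_of_variation(method_1)
cv_2 = coefficient_of_variation(method_2)
print(round(cv_1, 2), round(cv_2, 2))
```

Because the CV is unit-free, it can also compare variables measured on different scales, which the SD alone cannot do.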
Example: suppose two samples of human males
yield the following results:
Data collection techniques
Observation
Interviewing
Observation
Involves systematically selecting, watching and
recording behavior and characteristics of living
things, objects or phenomena
Interviewing
A written questionnaire can be administered in
different ways, such as by:
Focus group discussions (FGDs)
FGDs allow a group of 8-12 informants to
freely discuss a certain subject with the
guidance of a facilitator.
Questionnaire Designing
A questionnaire is an instrument used to obtain
information about respondents in a sample survey.
Examples
In questionnaire design remember to:
Use familiar and appropriate language
Avoid abbreviations, double negatives, etc.
Avoid asking about two elements in one
question
Avoid embarrassing and painful questions
Avoid language that suggests a response
Start with simpler questions
For open ended questions, provide sufficient space
for the response
Arrange questions in logical sequence
Chapter II Probability and
Probability Distribution
Objectives
• At the end of the sequence of lectures, you will be able
to:
– Describe the union, intersection, complement and mutually
exclusive events, using a Venn diagram.
– Define relative frequency probability for practical use.
– Describe and implement the formula for finding the
probability of one outcome given another.
– Describe independence of factors, why it is important in a
study, and how it is used.
– Describe dependence of factors, why it is important in a
study, and its impact.
• Doubt is not a pleasant condition, but certainty
is absurd.
Voltaire (1694–1778)
What is Probability?
Determines the likelihood of occurrence of events
that are subject to chance.
Assumes a “stochastic” or “random” process: i.e.,
the outcome is not predetermined - there is an
element of chance
E.g.,
Probability that the head comes up on a coin toss
Probability that a sick patient who receives a new
medical treatment will survive for five or more
years
Some faces showed
up more frequently
than others
Why probability
Many events in life are uncertain.
Event
• An event is simply a set of descriptions; it is a
proposition
Combination of events
• We have introduced events, so now let us start
expanding on the grammar of events by combining
events to create new events
1. Intersection
• Given two events, say A and B, create a new event
that occurs if both A and B occur.
• The new event is denoted A∩B, read “A intersection B”.
Intersection
• Let A represent the event that a randomly
selected newborn is LBW, and B the event that
he or she is from a multiple birth
2. Union
The union of A and B, denoted A∪B (A or B), is
the event that occurs if either A or B, or both, occur. It
is not the exclusive “or”. It is either A, or B, or
both.
3. Complement
The complement of an event A, denoted by Ā or Aᶜ,
is the event that A does not occur
If the event A is “I live to be 25”, then the event Aᶜ is
“I do not live to be 25”, i.e., dead by 25.
Null event
• It is amazing how much we can do just with
those three operations.
Mutually Exclusive Events
Example
– A = “live to be 25”
– B = “die before 10th birthday”
What is probability of an event?
• Probability of an event is the relative frequency of
the set of outcomes over an indefinitely large
(infinite) number of trials.
What is probability of an event?
Theoretical probability models are constructed from
which probabilities of many different events can be
computed.
Classical Probability
• If an experiment is repeated n times under essentially
identical conditions and the event A occurs m times,
then as n gets large the ratio m/n approaches the
probability of A:
$P(A) = \frac{m}{n}$
• Probability is symmetric around a half. At the edges
we are certain—at zero we are certain it will not
happen, at one we are certain it will happen. We have
maximal uncertainty at the center, when p=1/2.
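The idea that m/n settles near the probability as n grows can be illustrated with a simulated fair coin. This is a sketch only: the function name, the seed and the choice of a coin with P(head) = 0.5 are assumptions made for the illustration.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return m / n, the share of heads."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Small n can give ratios well away from 0.5; the ratio for the largest n is typically very close to it.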
For any event A
Complement
Additive Rule
If A and B are mutually exclusive events, meaning
they cannot happen at the same time (A∩B = Ø),
then P(A∪B) = P(A) + P(B)
Additive rule cont…
When A and B are not mutually exclusive,
P(A∪B) = P(A) + P(B) − P(A∩B)
Conditional Probability
The chance a particular event happens depends on the
outcome of some other event
Conditional Probability cont…
E.g., Suppose in country X the chance that a person lives to age
25 is 0.95, whereas the chance that he lives to age 65 is .65.
• Suppose that the event B is that a person will live to be 65, and
the event A, is that a person is alive at age 25. Then the event B
given A, is that a 25-year-old person will be alive at 65.
Solution
B = “A person will be alive at 65”
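The rest of the solution does not survive on the slide. A hedged sketch, assuming the usual definition P(B | A) = P(A∩B) / P(A) and noting that anyone alive at 65 was also alive at 25, so A∩B = B:

```python
p_a = 0.95          # P(A): a person is alive at age 25
p_b = 0.65          # P(B): a person is alive at age 65

# Being alive at 65 implies having been alive at 25, so A∩B is just B.
p_a_and_b = p_b

p_b_given_a = p_a_and_b / p_a   # conditional probability P(B | A)
print(round(p_b_given_a, 3))
```

Under these assumptions, a 25-year-old has roughly a 68% chance of being alive at 65, higher than the unconditional 65%.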
Multiplicative law and Independence
Two events A and B are independent if occurrence or
nonoccurrence of one does not in any way affect the
occurrence or nonoccurrence of the other.
Knowing that A happens does not influence our
probability of B happening, so P(A∩B) = P(A) × P(B)
Exercise
• Suppose that we are conducting hypertensive screening
program in the home. Suppose that hypertensive status
of the mother doesn't depend at all on the hypertensive
status of the father. Let event A: mother’s DBP>95 and
event B: father’s DBP>95. pr(A)=0.1, pr(B)=0.2.
– Are the two events mutually exclusive
– What is the probability that both the mother and the
father are hypertensive
– What is the probability that either the mother or the
father, or both are hypertensive
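One way to check answers to this exercise is sketched below, assuming independence gives P(A∩B) = P(A) × P(B) and the general additive rule gives P(A∪B) = P(A) + P(B) − P(A∩B). Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

p_a = Fraction(1, 10)   # P(A): mother's DBP > 95
p_b = Fraction(2, 10)   # P(B): father's DBP > 95

p_both = p_a * p_b                  # independence: multiply the probabilities
p_either = p_a + p_b - p_both       # additive rule for non-exclusive events
mutually_exclusive = (p_both == 0)  # exclusive events can never occur together

print(float(p_both), float(p_either), mutually_exclusive)
```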
Properties of Probability
1. The numerical value of a probability always lies
between 0 and 1, inclusive.
0 ≤ P(E) ≤ 1
A value of 0 means the event cannot occur
A value 1 means the event definitely will occur
A value of 0.5 means that the probability that the
event will occur is the same as the probability
that it will not occur.
Properties of Probability
2. The sum of the probabilities of all mutually exclusive
outcomes is equal to 1.
P(E1) + P(E2 ) + .... + P(En ) = 1.
Clarification aid:
• If A and B are mutually exclusive then (Additive Law)
P(A∪B) = P(A) + P(B)
Probability Distributions
Probability models
• Now we are going to start applying what we learned
about probability, and we start by applying
probability to numbers and models
Probability Distributions
Describe the probability of events
A device used to describe the behaviour that a
random variable may have by applying the theory of
probability.
Parameters are characteristics of probability
distributions.
The statistics that we use to estimate parameters are
also random variables.
We are interested in the distributions of these
statistics and will use them to make inferences about
population parameters.
Random Variable
Any quantity or characteristic that is able to
assume a number of different values such that
any particular outcome is determined by
chance
Why random?
Discrete Probability Distributions
• For a discrete random variable, the probability
distribution specifies each of the possible outcomes
of the random variable along with the probability that
each will occur (probability mass function)
P(X = x)
The following data shows the number of diagnostic
services a patient receives
a. What is the probability that a patient receives
exactly 3 diagnostic services?
Answers
a. P(X=3) = 0.031
b. P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
The Expected Value of a Discrete RV
• Let X be a discrete random variable which takes the
values X1, . . . ,Xn.
Binomial Distribution
• Consider dichotomous (binary) random variable
Binomial Distribution cont…
Now a binomial random variable counts the number
of successes in n independent trials each associated
with a Bernoulli(p) random variable
Example:
We are interested in determining whether a newborn
infant will survive until his/her 70th birthday
Let Y represent the survival status of the child at age
70 years
Y = 1 if the child survives and Y = 0 if he/she does
not
Binomial Distribution cont…
• Are the outcomes mutually exclusive and
exhaustive?
Binomial assumptions
The experiment consist of n identical trials.
Notations
• n the number of fixed trials
Factorial
For any positive integer n, we define n
factorial as: n(n-1)(n-2)...(1).
We denote n factorial as n!.
Combinations
• The possible selections of r items from a group of n
items regardless of the order of selection. The
number of combinations is denoted $\binom{n}{r}$ and is read as n
choose r.
• An alternative notation is nCr.
• We define the number of combinations of r out of n
elements as
$\binom{n}{r} = {}_nC_r = \frac{n!}{r!\,(n - r)!}$
For example:
$\binom{6}{3} = {}_6C_3 = \frac{6!}{3!\,(6 - 3)!} = \frac{6!}{3!\,3!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1} = \frac{120}{6} = 20$
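The factorial definition can be checked in a few lines of Python. The helper `n_choose_r` is an illustrative name; `math.comb` is the standard-library equivalent (Python 3.8 or later).

```python
import math

def n_choose_r(n, r):
    """Number of combinations of r items from n, via the factorial formula."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

six_choose_three = n_choose_r(6, 3)
print(six_choose_three, math.comb(6, 3))
```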
Mean and Variance of Binomial distribution
• The mean of the binomial distribution is np and the
variance is np(1 - p)
• Example: Assume that, when a child is born, the
probability it is a girl is ½ and that the sex of the
child does not depend on the sex of an older sibling.
a) Probability distribution
X 0 1 2 3 4
b) Mean = np = 4 × 1/2 = 2
Variance = np(1 - p) = 4 × 1/2 × 1/2 = 1, so the SD is also 1
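The probabilities for X = 0, ..., 4 are not legible above. A sketch of how they could be computed, assuming X counts the girls among n = 4 independent births with p = 1/2 (as part (b) suggests) and using the standard binomial formula P(X = x) = C(n, x) p^x (1 − p)^(n − x):

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Expected value and variance computed directly from the distribution.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(pmf)
print(mean, variance)
```

The directly computed mean and variance agree with np and np(1 − p).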
Continuous Probability Distributions
• A continuous random variable X can take on any
value in a specified interval or range
Continuous Probability Distributions cont…
• Instead of assigning probabilities to specific
outcomes of the random variable X, probabilities
are assigned to ranges of values
• The probability associated with any one particular
value is equal to 0
• Therefore, P(X=x) = 0
• Also, P(X ≥ x) = P(X > x)
• We calculate:
Pr [ a < X < b], the probability of an
interval of values of X.
Normal Distribution
• The most important probability distribution in statistics
• The concept of “probability of X=x” in the discrete
probability distribution is replaced by the
“probability density function” f(x)
• The notation N(µ, σ²) denotes a normal distribution
with mean µ and variance σ².
1. The mean µ tells you about location
– Increase µ: location shifts right
– Decrease µ: location shifts left
– Shape is unchanged
2. The variance σ² tells you about narrowness or
flatness of the bell
– Increase σ²: bell flattens. Extreme values are more likely
– Decrease σ²: bell narrows. Extreme values are less likely
– Location is unchanged
Properties of the Normal Distribution
A probability distribution of a continuous variable. It
extends from minus infinity (-∞) to plus infinity
(+∞).
Symmetrical about its mean, µ.
The mean, the median and the mode are equal. It
is unimodal.
The total area under the curve above the x-axis is 1
square unit.
The curve never touches the x-axis.
As the value of σ increases, the curve becomes more
and more flat and vice versa.
The distribution is completely determined by
the parameters µ and σ.
Normal probabilities Empirical Rules
• The probability that a normal random variable will be
within 1 standard deviation from its mean (on
either side) is 0.6826, or approximately 0.68.
• The probability that a normal random variable will be
within 2 standard deviations from its mean is
0.9544, or approximately 0.95.
• The probability that a normal random variable will be
within 3 standard deviations from its mean is
0.9974.
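These probabilities can be reproduced without a table. The sketch below uses the identity P(−k < Z < k) = erf(k / √2) for a standard normal Z, with `math.erf` from the standard library; the function name is illustrative. The results differ slightly in the fourth decimal place from figures obtained by doubling four-decimal table entries.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal random variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```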
The Standard Normal Distribution
• The standard normal random variable, Z, is the
normal random variable with mean µ = 0 and
standard deviation σ = 1: Z~N(0, 1²).
[Figure: standard normal distribution, density f(z) against z from -5 to 5, centered at µ = 0 with σ = 1]
Finding Probabilities of the SND: P(0 < Z <
1.56)
Outcomes of the random variable Z are denoted by z;
The whole number and tenths decimal place of z are
listed in the column to the left of the table, and
The hundredths decimal place is shown in the row
across the top
• For a particular value of z, the entry in the body of
the table specifies the area beneath the curve to the
right of z, or P(Z> z)
Some sample values of z and their
corresponding areas are as follows
z       Area in the right tail
0.00    0.5000
1.65    0.049
1.96    0.025
2.58    0.005
3.00    0.001
Since the SND is symmetric about 0, the area
under the curve to the right of z is equal to the
area to the left of - z.
-z      Area in the left tail
0.00 0.5000
-1.65 0.049
-1.96 0.025
-2.58 0.005
-3.00 0.001
Finding Probabilities of the Standard Normal
Distribution: P(0 < Z < 1.56)
[Figure: standard normal density f(z) with z = 1.56 marked, shown beside the table of standard normal probabilities P(0 < Z < z); the full table follows.]
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
10/02/2022 205
To find P(Z < -2.47):
z     ...   .06      .07      .08
2.3   ...   0.4909   0.4911   0.4913
2.4   ...   0.4931   0.4932   0.4934
2.5   ...   0.4948   0.4949   0.4951
Table area for 2.47: P(0 < Z < 2.47) = 0.4932
P(Z < -2.47) = 0.5 - P(0 < Z < 2.47)
= 0.5 - 0.4932 = 0.0068
[Figure: Standard Normal Distribution, f(z) plotted against Z from -5 to 5, marking the table area for 2.47 and the area to the left of -2.47]
10/02/2022 206
To find P(1 ≤ Z ≤ 2):
1. Find table area for 2.00
F(2) = P(Z ≤ 2.00) = 0.5 + 0.4772 = 0.9772
2. Find table area for 1.00
F(1) = P(Z ≤ 1.00) = 0.5 + 0.3413 = 0.8413
3. P(1 ≤ Z ≤ 2.00) = P(Z ≤ 2.00) - P(Z ≤ 1.00)
= 0.9772 - 0.8413 = 0.1359
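The table look-ups above can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper name `phi` is an illustrative choice and not part of the lecture.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area to the left of -2.47 (table method: 0.5 - 0.4932)
p_left = phi(-2.47)

# Area between 1 and 2 (table method: 0.9772 - 0.8413)
p_between = phi(2.0) - phi(1.0)

print(round(p_left, 4), round(p_between, 4))
```

Rounded to four decimals, both values agree with the table-based answers (0.0068 and 0.1359).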
10/02/2022 207
Exercise
1. Compute P(-1 ≤ Z ≤ 1.5)
10/02/2022 208
Z - Transformation
If a random variable X is normally distributed with mean μ and
standard deviation σ, X ~ N(μ, σ), then we can
transform it to a SND with the help of the Z-transformation
Z = (x - μ) / σ
Z represents the Z-score for a given x value
10/02/2022 209
• This process is known as standardization and
gives the position on a normal curve with μ=0
and σ=1, i.e., the SND, Z.
10/02/2022 210
Example
• The diastolic blood pressures (DBP) of males 35–44 years of
age are normally distributed with µ = 80 mm Hg and
σ² = 144 (mm Hg)²
σ = 12 mm Hg
10/02/2022 211
a. What is the probability that a randomly selected male
has a BP above 95 mm Hg?
10/02/2022 212
b. What is the probability that a randomly
selected male has a DBP above 110 mm Hg?
Z = (110 - 80) / 12 = 2.50
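Both parts of the blood pressure example can be worked through in a few lines. This is a sketch under the stated values µ = 80 and σ = 12; the helper names `phi` and `prob_above` are illustrative assumptions, not lecture notation.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA = 80.0, 12.0  # DBP mean and standard deviation from the example

def prob_above(x):
    """Return the z-score of x and the upper-tail probability P(X > x)."""
    z = (x - MU) / SIGMA
    return z, 1.0 - phi(z)

z_a, p_a = prob_above(95)    # part a: z = 1.25
z_b, p_b = prob_above(110)   # part b: z = 2.50

print(z_a, round(p_a, 4))
print(z_b, round(p_b, 4))
```

With the printed table, the same answers come from 0.5 - 0.3944 for part a and 0.5 - 0.4938 for part b.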
10/02/2022 213
Exercise
10/02/2022 215
Introduction
The chi-square (χ²) distribution is one of the probability
distributions
The χ² distribution is not symmetrical; it is always
skewed to the right
The distribution takes only positive values, between 0
and infinity
The skewness diminishes as n gets larger
It depends on the degrees of freedom, df = (R-1)(C-1),
where R and C are the number of rows and columns
respectively. (The only parameter of the distribution)
10/02/2022 216
Test of significance using the χ²
Used for categorical data analysis
It compares the actual observed frequency in each
group with the expected frequency
Allows us to test for association between categorical
(nominal) variables
The null hypothesis for this test is that there is no
association between the variables.
• E.g., the proportion of disease is the same regardless
of exposure
HA is there is an association between the variables
10/02/2022 217
Assumptions of the χ² test
Each of the observations should be independent of
the other observations
10/02/2022 219
Calculation of expected frequency
The expected frequency in each cell is the
product of the row and column totals divided
by the sum of all the observed frequencies (i.e.
sample size)
10/02/2022 220
Counts in the Chi-Square Test of a 2x2 table
are represented as “a”, “b”, “c” and “d”.
10/02/2022 221
Example 1
MI status over 3 years, by OC use group
OC use group    Yes    No    Total
10/02/2022 223
Example 2: Observed Numbers
Response by Treatment
10/02/2022 224
Expected Numbers
10/02/2022 225
10/02/2022 226
10/02/2022 227
10/02/2022 228
A study was conducted to investigate the possible
cause of gastroenteritis outbreak following a lunch
served in a high school cafeteria. Among the 225
students who ate the sandwiches, 109 became ill.
While, among the 38 students who did not eat the
sandwiches, 4 became ill.
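The outbreak data above can be arranged as a 2x2 table and tested with the expected-frequency rule given earlier (row total x column total / sample size). The sketch below uses the standard Pearson statistic, the sum of (observed - expected)² / expected over all cells; the variable names are illustrative assumptions.

```python
# Rows: ate sandwiches, did not eat. Columns: ill, not ill.
observed = [
    [109, 225 - 109],
    [4, 38 - 4],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency = row total * column total / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic summed over all four cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)
```

The statistic is about 19.07 with df = 1, well above the 5% critical value of 3.84, so the null hypothesis of no association between eating the sandwiches and illness would be rejected.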
10/02/2022 230
Introduction
10/02/2022 232
Terminologies cont…
• Sampling population: the subset of the target
population from which a sample will be drawn.
10/02/2022 233
Terminologies cont…
Sampling frame: the list of all the units in the
reference population, from which a sample is to be
picked.
10/02/2022 234
Why sampling?
Often, it is too expensive or impossible to
collect information on an entire population.
10/02/2022 235
Advantages & disadvantages of sampling
Advantages
Saves resources
Improves quality of data
Disadvantages
10/02/2022 237
While selecting a SAMPLE, there are basic
questions:
10/02/2022 238
Methods of Sample Selection
10/02/2022 239
Probability sampling methods
Every sampling unit has a known and non-zero
probability of selection into the sample.
Might be costly
10/02/2022 240
How can random samples be selected?
5. Multi-stage sampling
10/02/2022 241
Simple Random Sampling
Least biased of all sampling techniques, there is no
subjectivity - each member of the total population has an
equal chance of being selected
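A simple random sample can be drawn from a numbered sampling frame with a random number generator. This is a minimal sketch; the frame of 400 units, the sample size of 100 and the function name are hypothetical values chosen for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

frame = list(range(1, 401))  # hypothetical sampling frame: units numbered 1..400
sample = simple_random_sample(frame, 100, seed=1)
print(sorted(sample)[:10])
```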
10/02/2022 243
Example (random numbers)
10/02/2022 244
Systematic sampling
Selection of individuals from the sampling frame
systematically rather than randomly
10/02/2022 245
Systematic sampling cont…
• Other items in the sample are obtained by adding the
sampling interval N/n successively to the random
number.
10/02/2022 247
Example
Therefore, K = 4.
You will need to select one unit out of every four units
to end up with a total of 100 units in your sample.
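The selection rule can be sketched in Python. The population size N = 400 is an assumption implied by K = 4 and a sample of 100; the function name is illustrative.

```python
import random

def systematic_sample(N, n, seed=None):
    """Pick a random start in 1..K, then every K-th unit, where K = N // n."""
    k = N // n
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random number between 1 and K inclusive
    return [start + i * k for i in range(n)]

units = systematic_sample(400, 100, seed=1)
print(units[:5])
```

If the random start is 3, for example, the selected units are 3, 7, 11, and so on up to 399.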
10/02/2022 248
Example cont…
10/02/2022 249
Stratified random sampling
It is done when the population is known to have
heterogeneity with regard to some factors, and those
factors are used for stratification
10/02/2022 250
Stratified random sampling cont…
A separate sample is taken independently from
each stratum.
Equal allocation:
– Allocate equal sample size to each stratum
10/02/2022 252
Stratified random sampling cont…
Proportionate allocation:
nj = (n / N) × Nj
where n is the total sample size, N the population size and Nj the size of stratum j
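Proportionate allocation gives each stratum a share of the sample equal to its share of the population. A minimal sketch, with hypothetical stratum names and sizes:

```python
def proportionate_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    # Rounding can make the parts differ slightly from n when shares are not whole numbers
    return {name: round(n * Nj / N) for name, Nj in strata_sizes.items()}

strata = {"urban": 3000, "rural": 7000}  # hypothetical stratum sizes Nj
alloc = proportionate_allocation(strata, 200)
print(alloc)
```

Here the urban stratum holds 30% of the population and so receives 30% of the 200 sampled units.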
10/02/2022 253
Cluster Sampling
Method of sampling in which the element selected is a
group (as distinguished from an individual), called a
cluster.
Steps
10/02/2022 255
Cluster Sampling cont…
Example
10/02/2022 256
Cluster Sampling cont…
Advantages
A list of all the individual study units in the reference
population is not required. It is sufficient to have a list
of clusters
Cost reduction
Disadvantages
Sampling error is usually higher than for a simple
random sample of the same size.
It is usually better to survey a large number of small
clusters instead of a small number of large clusters.
10/02/2022 257
Multi-stage sampling
Similar to the cluster sampling, except that it involves
picking a sample from within each chosen cluster,
rather than including all units in the cluster.
10/02/2022 258
Multi-stage sampling cont…
This type of sampling requires at least two stages.
10/02/2022 259
Multi-stage sampling cont…
You do not need to have a list of all of the units in the
population. All you need is a list of clusters and list of
the units in the selected clusters.
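A two-stage design can be sketched as: first sample clusters, then sample units within each chosen cluster. The village and household labels below are hypothetical, and simple random sampling is assumed at both stages.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical frame: 10 villages (clusters), each with 50 households (units)
clusters = {
    f"village_{i}": [f"hh_{i}_{j}" for j in range(1, 51)]
    for i in range(1, 11)
}
sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=10, seed=1)
print({c: len(units) for c, units in sample.items()})
```

Only the list of villages and the household lists of the three selected villages are needed, not a list of every household in the population.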
10/02/2022 260
Non-probability sampling
Every item has an unknown chance of being selected
10/02/2022 261
Non-probability sampling
Inappropriate if the aim is to measure variables and
generalize findings obtained from a sample to the
population
10/02/2022 262
The most common types of NPS
10/02/2022 263
Convenience or haphazard sampling
Sometimes referred to as accidental sampling.
Study units that happen to be available at the time of
data collection are selected.
It can deliver accurate results when the population is
homogeneous.
For example, a scientist could use this method to
determine whether a lake is polluted or not.
Assuming that the lake water is well-mixed, any
sample would yield similar information.
10/02/2022 264
Volunteer sampling
As the term implies, this type of sampling occurs
when people volunteer to be involved in the study.
10/02/2022 265
Quota sampling
Sampling is done until a specific number of units
(quotas) for various sub-populations have been
selected.
10/02/2022 266
Advantages of NPS
Easy
Less expensive
Does not require sampling frame
Disadvantages of NPS
Not representative of the population
Bias
10/02/2022 267
Sample Size Determination
In studies concerned with estimating some
characteristic of a population, sample size
calculations are important to ensure that estimates are
obtained with required precision or confidence.
10/02/2022 268
Sample size determination depends on the:
10/02/2022 269
Determination of Sample Size for
Estimating Means
To estimate a population mean,
n = (Zα/2)² σ² / d²
10/02/2022 270
Example
• Find the minimum sample size needed to estimate
the drop in heart rate (µ) for a new study using a
higher dose of propranolol than the standard one.
We require that the two-sided 95% CI for µ be no
wider than 5 beats per minute and the sample
standard deviation for change in heart rate equals 10
beats per minute.
With d = 5 / 2 = 2.5 (half the width of the CI):
n = (1.96)² × 10² / (2.5)² = 61.5, rounded up to 62 patients
10/02/2022 272
Example
A survey is being planned to determine what
proportion of families in a certain area are medically
indigent. It is believed that the proportion cannot be
greater than 0.35. A 95% confidence interval is
desired with d = 0.05. What size sample of families
should be selected?
n = (1.96)² (0.35)(0.65) / (0.05)² = 349.6, rounded up to 350 families
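Both sample size examples follow the same pattern and can be checked with two small functions. This is a sketch; the function names are illustrative, and results are rounded up to the next whole unit.

```python
import math

def n_for_mean(z, sigma, d):
    """Sample size for estimating a mean: n = z^2 * sigma^2 / d^2, rounded up."""
    return math.ceil((z * sigma / d) ** 2)

def n_for_proportion(z, p, d):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Heart rate example: SD 10 beats per minute, margin of error 2.5
n_heart_rate = n_for_mean(1.96, 10, 2.5)

# Medically indigent families: p assumed at most 0.35, d = 0.05
n_families = n_for_proportion(1.96, 0.35, 0.05)

print(n_heart_rate, n_families)
```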
10/02/2022 273
Note
10/02/2022 274
Sampling Distribution
10/02/2022 275
Introduction
The target of a scientist’s investigation is a population
with certain characteristic of interest: E.g., systolic
blood pressure
10/02/2022 276
Introduction
It would be too time consuming or too costly to
obtain the totality of population information in order
to learn about the parameter(s) of interest.
10/02/2022 278
Sampling Distributions
10/02/2022 279
Main types of sampling distributions
• Distribution of the sample mean
10/02/2022 280
Construction of sampling distributions
10/02/2022 281
Sampling distribution of sample mean
Suppose we have a population of size N=4,
constituting the ages of four outpatients.
x, Age (years): 18, 20, 22, 24
μ = Σxi / N = (18 + 20 + 22 + 24) / 4 = 21
σ = sqrt[ Σ(xi - μ)² / N ] = 2.236
10/02/2022 282
Now consider all possible samples, with
replacement, of size 2
Samples (rows: 1st observation, columns: 2nd observation)
Obs   18      20      22      24
18    18,18   18,20   18,22   18,24
20    20,18   20,20   20,22   20,24
22    22,18   22,20   22,22   22,24
24    24,18   24,20   24,22   24,24
16 possible samples
Sample means (rows: 1st observation, columns: 2nd observation)
Obs   18   20   22   24
18    18   19   20   21
20    19   20   21   22
22    20   21   22   23
24    21   22   23   24
16 sample means
10/02/2022 283
Sample mean (x̄)   Freq   P(x̄)
18 1 0.0625
19 2 0.1250
20 3 0.1875
21 4 0.2500
22 3 0.1875
23 2 0.1250
24 1 0.0625
10/02/2022 284
Sampling distribution of all sample means
[Figure: Sample Means Distribution, P(x̄) from 0 to 0.3 plotted against x̄ from 18 to 24]
10/02/2022 285
Mean and SD of sample means
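μx̄ = Σx̄i / N = (18 + 19 + ... + 24) / 16 = 21
σx̄ = sqrt[ Σ(x̄i - μx̄)² / N ]
= sqrt[ ((18 - 21)² + (19 - 21)² + ... + (24 - 21)²) / 16 ] = 1.58
(here N = 16, the number of possible samples)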
10/02/2022 286
Mean and SD of sample means
• We note that the mean of the sampling
distribution has the same value as the mean of
the original population.
10/02/2022 287
Standard error
• The standard deviation of any sample statistic is
called its standard error
10/02/2022 288
Properties of sampling distribution of mean
σx̄ = σ / sqrt(n)
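The four-patient example can be enumerated directly to confirm that the mean of the sample means equals μ and that their standard deviation equals σ / sqrt(n). A minimal sketch with illustrative variable names:

```python
import itertools
import math

population = [18, 20, 22, 24]  # ages of the four outpatients
n = 2                          # sample size, sampling with replacement

N = len(population)
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# All 16 ordered samples of size 2 and their means
means = [sum(s) / n for s in itertools.product(population, repeat=n)]

mu_xbar = sum(means) / len(means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

print(mu, round(sigma, 3))
print(mu_xbar, round(sigma_xbar, 2), round(sigma / math.sqrt(n), 2))
```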
10/02/2022 289
z-score/value
Helps in computing the probability of
obtaining a sample with a mean of some
specified magnitude.
z = (x̄ - μ) / (σ / sqrt(n))
Where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
10/02/2022 290
Distribution of the sample proportion
Sample proportion = p̂
Population proportion = p or π
10/02/2022 291
Properties
• The mean of the distribution, μp̂, will be equal
to the true population proportion, p, and the
variance of the distribution will be equal to
p(q)/n, where q = 1 - p.
μp̂ = p,   σp̂ = sqrt[ p(1 - p) / n ]
• The sampling distribution of p̂ will be
approximately normal when the sample size n
is large
10/02/2022 292
z-Value for Proportions
Standardize p̂ to a z value with the formula:
z = (p̂ - p) / σp̂ = (p̂ - p) / sqrt[ p(1 - p) / n ]
10/02/2022 293
Summary for SE
Statistic                  Standard error
• Sample mean, x̄           • SEx̄ = s / sqrt( n )
• Sample proportion, p̂     • SEp̂ = sqrt [ p(1 - p) / n ]
10/02/2022 294
Exercises
1. Suppose a population has mean μ = 50 and
standard deviation σ = 16. Suppose a random
sample size of 64 is selected.
Find the probability that the sample mean is >53
Steps
Write the given information
Sketch a normal curve
Convert the mean to z-score
Find the corresponding area under the SND curve
10/02/2022 295
z = (53 - 50) / (16 / sqrt(64)) = 1.5
The area under the SND curve above z = 1.5 is 0.0668,
so P(z > 1.5) = 0.0668
10/02/2022 296
2. According to a recent estimate, 19.4% of the under-
five children in a population are stunted
What is the probability that in a random sample of
size 150 from this population fewer than 15% will be
stunted?
• Find z-score
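Both exercises follow the listed steps: compute the standard error, convert to a z-score, and read off the area under the SND curve. The sketch below shows one way to carry them out; the helper `phi` and the variable names are illustrative assumptions.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 1: mu = 50, sigma = 16, n = 64, P(sample mean > 53)
se_mean = 16 / math.sqrt(64)
z1 = (53 - 50) / se_mean
p1 = 1.0 - phi(z1)

# Exercise 2: p = 0.194, n = 150, P(sample proportion < 0.15)
p, n = 0.194, 150
se_prop = math.sqrt(p * (1 - p) / n)
z2 = (0.15 - p) / se_prop
p2 = phi(z2)

print(z1, round(p1, 4))
print(round(z2, 2), round(p2, 4))
```

For Exercise 2 the z-score is about -1.36, so the probability is the area to the left of -1.36.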
10/02/2022 297
Chapter IV Estimation
10/02/2022 298
Introduction
• The values of population parameters are usually not known
10/02/2022 300
Estimation
The statistic itself is called an estimator
Point estimate
A point estimate of a population parameter is a single
value of a statistic.
For example, the sample mean x̄ is a point estimate of
the population mean μ. Similarly, the sample
proportion is a point estimate of the population
proportion P.
10/02/2022 302
The value of your sample statistic (e.g., your
sample mean or sample correlation) is used to
estimate the population parameter (e.g., the
population mean or the population correlation).
10/02/2022 303
Properties of a good estimator
Unbiasedness: if one could take repeated samples of
size n from the population, the average of these
estimates would equal the value of the population
parameter. The sample mean is an unbiased estimator of the
population mean μ; for a symmetric population, so is the sample median.
10/02/2022 304
Interval estimate
10/02/2022 306
10/02/2022 307
A point estimate does not give any indication
on how far away the parameter lies.
10/02/2022 308
CI has 3 components:
10/02/2022 309
Confidence coefficient is the measure of how
confident we want to be, critical value
10/02/2022 310
CIs also give information about the precision of an
estimate.
10/02/2022 311
Confidence interval (CI)
[Figure: sampling distribution with central area 1-α and area α/2 in each tail, with the tolerance of error marked]
10/02/2022 314
Example: 95% CI
10/02/2022 315
CI for a Single Population Mean
(normally distributed)
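Is the population normally distributed?
If the population standard deviation (σ) is not
known, use the sample standard deviation (S).
A 100(1-α)% C.I. for μ is:
x̄ ± (critical value) × (standard error), where the standard error is
σ / sqrt(n), or S / sqrt(n) when σ is unknown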
10/02/2022 316
Finding the Critical Value
10/02/2022 317
Example
Suppose that the mean of percentage of bile for 31
male patients is 84.64, and the standard deviation 24.
Find the 95% CI.
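One way to work this example is shown below. The critical value 1.96 (the z value for 95% confidence) is an assumption; because the standard deviation comes from a sample of 31, a t critical value with 30 degrees of freedom (about 2.042) could be used instead and would give a slightly wider interval.

```python
import math

n, mean, sd = 31, 84.64, 24.0
crit = 1.96  # assumed z critical value for a 95% CI

se = sd / math.sqrt(n)   # standard error of the sample mean
lower = mean - crit * se
upper = mean + crit * se

print(round(lower, 2), round(upper, 2))
```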
10/02/2022 318
CI for a single population proportion
• The distribution of the sample proportion is
approximately normal if sample size is large
SE = sqrt[ P(1 - P) / n ]
10/02/2022 319
10/02/2022 320
Lower limit = Point Estimate - (Critical Value) x
(Standard Error of Estimate)
Hence,
10/02/2022 321
Example
A random sample of 100 people shows that 40 are
smokers. Form a 95% CI for the true proportion of
smokers.
Point estimate = p̂ = 40 / 100 = 0.4
Standard error, Sp = sqrt[ 0.4(0.6) / 100 ] = 0.049
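The interval can be completed with lower and upper limits equal to the point estimate minus and plus the critical value times the standard error. A minimal sketch, taking 40 smokers out of 100 as in the point estimate 40/100 and the critical value 1.96 for 95% confidence:

```python
import math

x, n = 40, 100          # smokers and sample size
p_hat = x / n           # point estimate of the proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)

crit = 1.96             # z critical value for a 95% CI
lower = p_hat - crit * se
upper = p_hat + crit * se

print(round(se, 3), round(lower, 3), round(upper, 3))
```

The resulting 95% CI runs from about 0.304 to 0.496.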
10/02/2022 323
What is Hypothesis Testing?
• A statistical hypothesis is an assumption
about a population parameter. This assumption
may or may not be true.
10/02/2022 324
What is Hypothesis Testing?
Hypothesis testing aids in reaching a decision
(conclusion) concerning a population by
examining a sample from that population.
10/02/2022 325
Types of statistical hypotheses
1. Null Hypothesis, HO
Specifies a hypothesized real value, or values, for a
parameter
10/02/2022 326
Null Hypothesis, HO
10/02/2022 327
2. The Alternative Hypothesis, HA
10/02/2022 328
How to Test Hypotheses
1. State the hypotheses
The hypotheses are stated in such a way that
they are mutually exclusive
Example
H0: μ = μ0     H0: μ ≤ μ0     H0: μ ≥ μ0
H1: μ ≠ μ0     H1: μ > μ0     H1: μ < μ0
two-tailed     one-tailed     one-tailed
10/02/2022 329
How to Test Hypotheses…
10/02/2022 332
Z-statistic
Test statistic = (Statistic - Parameter) / (Standard
deviation of statistic)
Test statistic = (Statistic - Parameter) / (Standard error
of statistic)
Where Parameter is the value appearing in the null
hypothesis, and Statistic is the point estimate of
Parameter.
As part of the analysis, you may need to compute the
standard deviation or standard error of the statistic.
10/02/2022 333
How to Test Hypotheses…
4. Specify the desired level of significance for the
statistical test (α = 0.05, 0.01, etc.)
10/02/2022 334
How to Test Hypotheses…
Level of significance
You are the one who decides on the
significance level to use in your research study.
10/02/2022 335
How to Test Hypotheses…
10/02/2022 336
10/02/2022 337
How to Test Hypotheses…
10/02/2022 338
Decision Rules
For rejecting the null hypothesis
10/02/2022 339
Region of acceptance
10/02/2022 340
Region of acceptance
10/02/2022 341
Decision Errors
Two types of errors can result from a hypothesis test.
10/02/2022 343
Types of errors
                              Truth
Decision based on sample      No diff (H0 to be not rejected)    Diff (H0 to be rejected, H1)
H0 not rejected (No diff)     Right decision (1-α)               Type II error
10/02/2022 345
Example 1
The mean age of a random sample of 40
individuals is 27. If the variance of the
population is 20, can we conclude that the mean
age of the population is different than 30 years?
Step I
H0 : μ = 30
HA : μ ≠ 30
Step II: n = 40, σ² = 20, mean = 27, normally
distributed population
10/02/2022 346
Step III: test statistic
Z- statistic is appropriate
10/02/2022 347
Step V: Rejection region and Critical value
10/02/2022 348
Step VII:
P-value: <0.0002
10/02/2022 349
Example 2
Suppose that a random sample of 40 people gave a
mean age of 27. If the population variance is 20, can
we conclude that μ < 30?
Take significance level (α = 5 %)
H0: μ = 30 HA: μ < 30
Ztab = -1.645
Zcal = (27 - 30) / sqrt(20 / 40) = -4.24
10/02/2022 350
• Zcal < Ztab, Reject H0.
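Examples 1 and 2 use the same data and differ only in the alternative hypothesis. The sketch below computes the Z-statistic once and applies both decision rules; the helper `phi` and the variable names are illustrative assumptions.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, xbar, mu0, variance = 40, 27.0, 30.0, 20.0

se = math.sqrt(variance / n)   # standard error of the sample mean
z_cal = (xbar - mu0) / se      # test statistic

# Example 1: two-tailed test at alpha = 0.05 (critical values -1.96 and +1.96)
reject_two_tailed = abs(z_cal) > 1.96
p_two_tailed = 2 * phi(-abs(z_cal))

# Example 2: one-tailed test, HA: mu < 30 (critical value -1.645)
reject_one_tailed = z_cal < -1.645

print(round(z_cal, 2), reject_two_tailed, reject_one_tailed)
```

In both cases H0 is rejected, and the two-tailed P-value is far below 0.0002.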
10/02/2022 351
HT about a single population proportion
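• H0 : P = π0
• HA : P ≠ π0
z = (p - π0) / sqrt[ π0(1 - π0) / n ]
where p is the sample proportion and π0 is the hypothesized population proportion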
10/02/2022 353
H0: P = 0.082
HA: P ≠ 0.082
With α = 0.05, the critical values of Z are
-1.96 and +1.96. We reject H0 if Zcal < -1.96 or
Zcal > +1.96.
10/02/2022 354
Chapter V - Demography
10/02/2022 355
Introduction
Demography is a science that studies human
population with respect to size, distribution,
composition, social mobility and its
variation with respect to all the above features
and the causes of such variation and the
effect of all these on health, social, ethical,
and economic conditions.
10/02/2022 356
Size: the number of persons in the population at a
given time.
10/02/2022 357
Change: refers to the increase or decline of the
total population or its components. The
components of change are birth, death, and
migration.
10/02/2022 359
Uses of Demographic Data in Public health
Planning
Health service provision
Types of services
Health indicators
10/02/2022 360
Sources of Demographic Data
1. Census
10/02/2022 361
Characteristics of census
Universality
Simultaneity
Individual enumeration
There are two main different schemes for
enumerating a population in a census.
De facto: The enumeration is done according to
the actual place of residence on the day of the
census
De jure: The enumeration (or count) is done
according to the usual or legal place of residence
10/02/2022 362
Common errors in census data
Omission and over-enumeration.
Misreporting of age due to memory lapse,
preference for terminal digits, over/under-estimation.
Overstating of status within the occupation.
Under-reporting of births due to problems with the reference
period and memory lapse.
Under-reporting of deaths due to memory lapse and
a tendency not to report deaths, particularly
infant deaths.
10/02/2022 363
2. Sample survey
10/02/2022 365
Population pyramid
• Population pyramid presents the population of
an area or country in terms of its composition
by age and sex at a point in time
10/02/2022 366
Population pyramid
The pyramid consists of a series of bars, each drawn
proportionately to represent the relative contribution
of each age-sex group (often in five year groupings)
to the total population
10/02/2022 367
Population Pyramid
A triangular, broad-based pattern of a pyramid
reflects a high birth rate over a long period of
time
10/02/2022 368
Population Pyramid, Ethiopia
2007
10/02/2022 369
Demographic Transition
Demographic transition is a term used to describe the
major demographic trends of the past two
centuries.
10/02/2022 370
Pre-transitional: characterized by high
mortality and high fertility, with low
(moderate) population growth (young
population). Type I
10/02/2022 371
Demographic Transition
10/02/2022 372
Vital statistics
Dependency Ratio
It describes the relation between the
potentially self-supporting portion of the
population and the dependent portions
(young and aged) of the population.
10/02/2022 373
Sex Ratio
10/02/2022 374
Measures of Fertility
10/02/2022 375
1) Crude birth rate (CBR)
Is the number of live births in a year per 1000 mid-year
population in the same year.
CBR = (No. of live births in a year / Mid-year population of the same year) x 1000
2) General fertility rate (GFR)
Is the number of births in a specified period per 1000
women aged 15-49 year;
10/02/2022 376
3) Age specific fertility rate (ASFR)
Because fertility varies within the childbearing years,
demographers often measure fertility according to the
age of the mother.
10/02/2022 377
4) Total fertility rate (TFR)
It estimates the total number of live births 1,000 women would
have if they all lived through their entire reproductive
period and were subject to a given set of ASFRs.
Is the sum of all age specific fertility rates for each year of age
from 15- 49 years.
It is the average number of children that a synthetic
(artificial) cohort (a group of persons who share a common
experience within a defined period) of women would have at
the end of reproduction, if there were no mortality among
women of reproductive age; each woman will live up to 49
years of age, about a total of 35 years.
10/02/2022 378
TFR = Σ(i = 15 to 49) (Bi / Pif) x 1000, for a single-year age classification
TFR = 5 x Σ(i = 1 to 7) (Bi / Pif) x 1000, for a 5-year age classification
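The 5-year version of the TFR calculation can be sketched as follows. The births and female population counts are hypothetical figures invented for illustration, not data from the lecture.

```python
def tfr_per_1000(births, women, width=5):
    """TFR per 1000 women: width x sum of age-specific fertility rates per 1000."""
    asfr = [1000 * b / w for b, w in zip(births, women)]
    return width * sum(asfr)

# Hypothetical data for the seven 5-year age groups 15-19, ..., 45-49
births = [50, 200, 250, 200, 150, 80, 20]
women = [1000] * 7

tfr = tfr_per_1000(births, women)
print(tfr, tfr / 1000)
```

With these made-up figures the TFR is 4750 per 1000 women, or 4.75 children per woman.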
10/02/2022 379
5) Gross Reproduction Rate (GRR):
10/02/2022 381
Let Bf = Number of female births
Let Bm+f = Number of male and female births,
i.e. total births
10/02/2022 382
6) Net reproduction rate (NRR)
10/02/2022 383
NRR is the average number of daughters that
would be born to a woman if she passed
through her life-time from birth to the end of
her reproductive years conforming to the age
specific fertility and mortality rates of a given
year.
10/02/2022 384
Replacement Level Fertility is said to have
been reached when NRR=1.0
10/02/2022 386
Population Growth and Projection
The rate of increase or decline of the population
size due to natural causes (births and
deaths) can be estimated crudely by using the
measures related to births and deaths.
10/02/2022 387
Population projection provides information on the
future size and composition of the population of a
given area.
10/02/2022 388
Geometric projection model:
Doubling time
10/02/2022 389
For example, if the CBR=46, CDR=18 per 1000
population and population size of 25,460 in 1998, then ,
Crude rate of natural increase = 46 - 18 = 28 per 1000 =
2.8 percent per year.
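The worked figures can be carried forward in a short sketch. It assumes the usual geometric model Pt = P0 x (1 + r)^t and the doubling time ln(2) / ln(1 + r); the 10-year horizon and the function name are illustrative choices.

```python
import math

cbr, cdr = 46, 18   # crude birth and death rates per 1000 population
p0 = 25460          # population size in 1998

r = (cbr - cdr) / 1000  # crude rate of natural increase as a proportion (0.028)

def project(p0, r, t):
    """Geometric projection: population after t years at annual growth rate r."""
    return p0 * (1 + r) ** t

p_2008 = project(p0, r, 10)                    # projected size 10 years later
doubling_time = math.log(2) / math.log(1 + r)  # years for the population to double

print(round(p_2008), round(doubling_time, 1))
```

At 2.8 percent per year the population doubles in about 25 years, in line with the rule of thumb 70 / 2.8.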
10/02/2022 391