Biostatistics

Bio Statistics Notes
Biostatistics
Biostatistics combines two words, “Bio” and “Statistics”.
The “Bio” part refers to biology, “the study of living things”, while the “Statistics” part refers to “the
accumulation, tracking, analysis and application of data”. Biostatistics is the branch of statistics related to
medical and health applications, and it underpins the methodology of epidemiological investigation and
research. It is the use of statistical procedures and analysis in the study and practice of biology. In simple
words, the branch of statistics that deals with data relating to living organisms is called biostatistics.
More fully, the application of statistical processes and methods to the collection, analysis, and
interpretation of biological data, especially data relating to human biology, health and medicine, is
called biostatistics.
Applications of Biostatistics
Biostatistics has applications in all the life sciences. A few applications of biostatistics are
summarized below.
a) Community Medicine and Public Health
In modern medicine, biostatistics is used to determine how diseases develop, progress
and spread. For example, biostatisticians use statistics to predict the behavior of an
illness such as flu: the mortality rate, the symptoms and even the time of year people
might get it. Medical research uses biostatistics from beginning to end.
b) In Demography
It is used in estimating the attributes of a population, such as sex ratio, birth rates,
population density, etc.
c) In Pharmacology
To find the action of a drug, the drug is given to animals or humans to see whether
the observed changes are produced by the drug or by chance.
d) In Research
Research is incomplete without statistics. Every result needs to be statistically validated.
Designing the experiment, selecting the method of data collection, and deriving logical
conclusions from the data all require sufficient knowledge of statistics.
Variable
Prepared By: Sir Zahawat Sahib

A variable is any characteristic, number or quantity that can be measured or counted. It is a
characteristic that takes on different values in different persons, places or things. Examples of
variables include diastolic blood pressure, heart rate, the height of adult males, the weight of
preschool children and the age of patients seen in a dental clinic.
Types of Variable
a. Quantitative Variables
Variables that are measured on a numerical or quantitative scale, i.e. that can be measured in
the usual sense. For example, the height of adult males, the weight of preschool children and the age of
patients seen in a dental clinic are quantitative variables. Measurements made on
quantitative variables convey information regarding amount.
b. Qualitative Variables
Qualitative variables are also called categorical variables. Many characteristics cannot be
measured numerically; some of them can be ordered (ordinal) and some cannot be ordered
(nominal). Qualitative variables can be coded to appear numeric, but the numbers are meaningless as
quantities. Examples are the classification of people into socio-economic groups, examination grades, etc.
Types of Quantitative Variable

Discrete Variables
A discrete variable is characterized by gaps or interruptions in the values that it can
assume. These gaps indicate the absence of values between particular values that the
variable can assume. It typically takes only whole numbers. Examples are the number of admissions to a
general hospital, the number of decayed or missing teeth per child in an elementary school, and the
number of prescriptions an individual takes daily.
Continuous Variables
A continuous variable can assume any value within a specified relevant interval. Examples
of continuous variables include the various measurements that can be made on individuals, such as
height (inches), weight (pounds), skull circumference, heartbeat, blood pressure and time to recovery (days).
Scale of Measurements
Not all characteristics can be measured on the same scale, nor is the same statistical
procedure appropriate for handling every type of measurement. The psychologist Stanley Smith Stevens
proposed four scales of measurement which cover nearly all areas of learning.
They are:

Nominal, Ordinal, Interval, Ratio
Nominal Scale
As the name implies, it consists of “naming or labeling” observations, i.e. classifying them into
various mutually exclusive and collectively exhaustive categories, and the observations in each
category are counted. For example, gender (male, female), marital status (married, unmarried), etc.
Data obtained on a nominal scale are called nominal data or qualitative data and are analyzed by the
statistics of attributes. The summary statistic “mode” is computed from such data. Nominal data can be
represented by a pie chart or bar chart.
Example: gender (male = 1, female = 2) and baseball uniform numbers. The number provides no insight
into the play.
Ordinal Scale
Qualitative observations can be ranked or ordered according to some criterion, e.g. with respect
to some quality or performance, but the interval between categories is unknown or unequal. The ordinal
scale possesses a natural ordering, for example qualification (Matric, Inter, BA/BSc, Master, M.Phil., Ph.D.)
or feelings (very unhappy, unhappy, ok, happy, very happy). The defect of this scale is that it has
unequal intervals, i.e. we don’t know how much better one category is than another, nor can we say
that the difference between ok and unhappy is the same as the difference between very happy and happy.
Data obtained on an ordinal scale are called ordinal or ranked data. Summary statistics such as the median,
percentiles and Spearman’s rank correlation coefficient are computed from ordinal data. Ordinal data
are best presented in a column or bar chart rather than a pie chart, because a pie chart does not show
the ordering. Note: An ordinal scale implies a statement of “greater than” or “less than” without being
able to state how much greater or less.
Interval Scale
The interval scale has ordered numeric values with fixed, equal intervals, and it can go below zero
(i.e. it can take negative values), because “0” is not a true zero. On an interval scale the distance
between adjacent values is the same, i.e. the distance between 5 and 6 degrees is the same as that
between 7 and 8 degrees. It can also go below zero, for example the temperature of ice at -5 degrees.
Unlike the ordinal scale, the interval scale tells us not only which values are smaller or bigger but also
how much bigger or smaller they are. For example, if it is 45° on Sunday and 55° on Monday, we know not
only that it was hotter on Monday but also that it was 10° hotter. Zero is an arbitrary point on this scale
and does not mean the absence of the quantity, e.g. a temperature of zero degrees is hotter than -1 degree.
Summary statistics such as the mean, median and mode can easily be calculated from interval data.

Ratio Scale
This scale has ordered numeric values with equal intervals but cannot go below zero, i.e. it takes
only non-negative values. Here zero is a true zero. For example, a heartbeat of “0” means no life;
similarly, “0” money means no money. We can use the words “twice” and “thrice” on this scale. The ratio
scale is used for measuring height, length, width, etc. A wide range of both descriptive and inferential
statistics, for example the mean, median, mode and coefficient of variation (C.V), can be calculated from
ratio data. The ratio scale has all the properties of the interval scale, plus a true zero.
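As a rough illustration of which summary statistic suits which scale, the sketch below uses Python's standard `statistics` module. All four data lists are hypothetical values invented for the illustration, and the ordinal feelings are coded 1 to 5 purely so that a median can be taken.

```python
import statistics

# Hypothetical observations, one list per scale of measurement.
gender = ["Male", "Female", "Female", "Male", "Female"]    # nominal
feeling = [1, 2, 2, 3, 5]   # ordinal codes: 1 = very unhappy ... 5 = very happy
temperature_c = [-5.0, 0.0, 10.0, 15.0]                    # interval
height_cm = [150.0, 160.0, 170.0, 180.0]                   # ratio

mode_gender = statistics.mode(gender)           # nominal: only the mode is meaningful
median_feeling = statistics.median(feeling)     # ordinal: median and percentiles
mean_temp = statistics.mean(temperature_c)      # interval: differences, hence the mean
# Ratio: a true zero makes ratios meaningful, so the C.V (in percent) can be used.
cv_height = statistics.stdev(height_cm) / statistics.mean(height_cm) * 100

print(mode_gender, median_feeling, mean_temp, round(cv_height, 2))
```

The C.V is computed only for the ratio-scale variable because it divides by the mean, which is interpretable only when zero is a true zero.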
Independent Variable
An independent variable is presumed to influence other variables. Independent variables are
sometimes called manipulated variables or experimental variables. The independent variable is the
presumed cause, whereas the dependent variable is the presumed effect.
For example: in studying how stress affects the mental state of a human being, stress is the
independent variable.
Dependent Variables
A dependent variable is presumed to be affected by one or more independent variables. The
dependent variable is often called an outcome variable.
For example: if we are interested in how stress affects heart rate in humans, stress is the
independent variable and heart rate is the dependent variable.
Intervening/Mediating Variable
An intervening (mediating) variable is a variable whose existence is inferred but which cannot be
measured directly. For example, in determining the effect of video clips on the learning ability of B.S
students, the association between video clips and learning ability needs to be explained.
Numerical Data/Quantitative Data

Numerical or quantitative data are measurements expressed in terms of
numbers. These data have meaning as a measurement, such as a person’s height, weight or
blood pressure, or they are a count, such as the number of stock shares a person owns. Numerical data
can be further broken into two types: discrete and continuous.
Discrete Data
Discrete data represent items that can be counted; they take on possible values that can be
listed out. The list of possible values may be fixed (finite), or it may go from 0, 1, 2 on to
infinity (countably infinite). For example, the number of heads in 100 coin flips takes on values
from 0 to 100 (finite case), but the number of flips needed to get 100 heads takes on values from 100 up
to infinity (if you never get that 100th head). Its possible values are listed as 100, 101, 102, ...,
representing the countably infinite case.
Continuous Data
Continuous data represent measurements; their possible values cannot be counted and can only
be described using intervals on the real number line. For example, the exact amount of gas purchased at a
pump for cars with 20-gallon tanks would be continuous data, from 0 gallons to 20 gallons,
represented by the interval [0, 20] inclusive. You might pump 8.40 gallons, or 8.41, or 8.414 gallons, or
any possible number from 0 to 20. In this way, continuous data can be thought of as being uncountably
infinite.
Qualitative/Categorical Data
Qualitative data are categorical measurements expressed not in terms of numbers but in
kinds or names. In statistics, “qualitative data” is often used interchangeably with “categorical
data”. Categorical data represent characteristics such as a person’s gender, marital status, or the types
of movies they like. Categorical data can take on numerical values, such as “1” indicating male and “2”
indicating female, but those numbers do not have mathematical meaning. A classic example distinguishing
categorical from numerical data is given below.
Amount of money earned last week, birth date, exercise, favorite sport, hours of sleep per night,
language mostly spoken at home, foot length, opinion on environment conservation, etc.
Categorical Data                        Numerical Data
Favorite sport                          Birth date
Language mostly spoken at home          Exercise
Opinion on environment conservation     Hours of sleep per night
State/Territory lived in                Foot length
Important Questions About the Types of Variables
Question: What kind of variable is marital status?
Answer: Marital status is a qualitative/categorical variable. It can take on values such as “married”,
“widowed” and “divorced”.
Question: What kind of variable is song length?
Answer: Song length is a quantitative variable. It can take on values such as “180 seconds”, “189.2
seconds” and “210.0039 seconds”. It is a continuous quantitative variable because it can take on an
infinite number of values.

Survival Analysis
Survival analysis is a collection of statistical procedures for data analysis in which the outcome
variable of interest is the time until an event occurs. By time, we mean years, months, weeks or days from
the beginning of follow-up of an individual until an event occurs; alternatively, time can refer to the age
of an individual when an event occurs. By event, we mean death, disease incidence, relapse from
remission, or recovery.
Censored Data
Censoring occurs when we have some information about an individual’s survival time, but we don’t know
the survival time exactly.
For example
Leukemia Patients
As a simple example of censoring, consider leukemia patients followed until they go out of remission
(the event, shown as “X”). If, for a given patient, the study ends while the patient is still in remission
(i.e. the patient does not get the event), then that patient’s survival time is considered censored. We
know that for this person the survival time is at least as long as the period for which the person has been
followed, but if the person goes out of remission after the study ends, we don’t know the complete
survival time.
Causes of Censoring
1) A person does not experience the event before the study ends.
2) A person is lost to follow-up during the study period.
3) A person withdraws from the study because of death (if death is not the event of
interest) or for some other reason (e.g. an adverse drug reaction).
These situations are illustrated as follows.
• Person A is followed from the start of the study until getting the event at week 5. Therefore person A’s
survival time is 5 weeks and is not censored.
• Person B is also observed from the start of the study but is followed to the end of the 12-week study
period without getting the event. The survival time here is censored because we can say only that it
is at least 12 weeks.
• Person C enters the study between the 2nd and 3rd week and is followed until he or she withdraws
from the study at week 6. This person’s survival time is censored after 3.5 weeks.
• Person D enters the study at week 4 and is followed for the remainder of the study without getting
the event. This person’s censored time is 8 weeks.
• Person E enters the study at week 3 and is followed until week 9, when he is lost to follow-up. His
censored time is therefore 6 weeks.
• Person F enters at week 8 and is followed until getting the event at week 11.5. Therefore the
survival time is 3.5 weeks.
In short, of the six persons, two were observed to get the event (persons A and F) and four were
censored (B, C, D and E).
A table of the survival time data for the six persons is presented as:
Person   Survival time (weeks)   Failure (1) / Censored (0)
A        5                       1
B        12                      0
C        3.5                     0
D        8                       0
E        6                       0
F        3.5                     1
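The six follow-up histories can be turned into (survival time, status) pairs mechanically. The following is a minimal sketch: the `Subject` class and its field names are hypothetical, and Person C's entry is assumed to be week 2.5, since the notes only say "between the 2nd and 3rd week".

```python
from dataclasses import dataclass

STUDY_END = 12.0  # length of the study in weeks, as described for Person B


@dataclass
class Subject:
    name: str
    entry: float   # week the person entered the study
    end: float     # week of the event, withdrawal, loss to follow-up or study end
    event: bool    # True if the event was observed (failure = 1), False if censored

    @property
    def survival_time(self) -> float:
        # Observed follow-up time; a lower bound on survival when censored.
        return self.end - self.entry


subjects = [
    Subject("A", 0.0, 5.0, True),
    Subject("B", 0.0, STUDY_END, False),
    Subject("C", 2.5, 6.0, False),        # assumed entry at week 2.5
    Subject("D", 4.0, STUDY_END, False),
    Subject("E", 3.0, 9.0, False),
    Subject("F", 8.0, 11.5, True),
]

table = [(s.name, s.survival_time, int(s.event)) for s in subjects]
for row in table:
    print(row)
```

Each printed row is (person, survival time in weeks, 1 for failure or 0 for censored), matching the layout of the table in the notes.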
Types of Censoring
Right Censoring
When a person’s survival time becomes incomplete at the right side of the follow-up period,
which occurs when the study ends or when the person is lost to follow-up or withdraws, this is called
right censoring.
Left Censoring
When a person’s survival time becomes incomplete at the left side of the
follow-up period for that person, this is called left censoring. For example, if we are following persons
with HIV infection, we may start follow-up when a subject first tests positive for the HIV virus, but we
may not know exactly when the subject was first exposed to the virus. Thus the survival time is censored
on the left side.
Sampled Population
A population from which a sample is drawn or chosen is called the sampled population.
Target Population
A population about which information is required or wanted is called the target population.

Explanation
Suppose we want to know the opinion of GPGC Nowshera students about the examination system. Then
the sampled population may consist of the students of the statistics department, the political science
department, etc., and the target population will consist of all students in GPGC Nowshera.
Odds
The odds in favor of an event are the ratio of the probability that the event will happen to the
probability that it will not happen.
For example: the odds that a randomly chosen day of the week is a Sunday are one to six, which is
written as 1/6 or 1:6.
Example: There are 5 pink marbles, 2 blue and 8 purple. What are the odds in favor of picking 1
blue marble?
Solution: The odds of picking one blue marble are odds = p/q, where p is the probability of picking one
blue marble and q = 1 - p is the probability of not picking a blue marble.
$p = \binom{2}{1} / \binom{15}{1} = \frac{2}{15}$
$q = 1 - p = 1 - \frac{2}{15} = \frac{13}{15}$
$\text{Odds} = \frac{p}{q} = \frac{2/15}{13/15} = \frac{2}{13}$
Odds of 2:13 mean that picking a blue marble is less than half as likely as picking a marble of
another color.
Note (odds): The probability that an event will occur divided by the probability that the event will not
occur is known as the odds, i.e. odds = occurrence / non-occurrence. The range of odds is 0 to infinity.
Example: The probability of diabetes in a patient is 5%. Find the odds of diabetes.
Solution: The odds of diabetes in a patient are
$\text{Odds} = \frac{p}{1-p} = \frac{p}{q}$
where p is the probability of diabetes in a patient, 5% = 5/100 = 0.05, and
1 - p is the probability of no diabetes in a patient = 1 - 0.05 = 0.95.
$\text{Odds} = \frac{0.05}{0.95} = \frac{1}{19}$
Odds = 1:19
A patient is far less likely to have diabetes than not: for every patient with diabetes we expect 19
without.
Note (probability): Probability refers to the likelihood of occurrence of an event, i.e.
P = favorable outcomes / total equally likely outcomes.
Odds Ratio
It is defined as the ratio of the odds of an event occurring in one group to the odds of it
occurring in another group, i.e. the odds ratio compares the relative odds in each group.
Consider the typical two-by-two contingency table:
         X-     X+     Total
Y-       a      b      a+b
Y+       c      d      c+d
Total    a+c    b+d    n
Since the odds ratio is the ratio of two odds,
$\text{O.R} = \frac{a/b}{c/d} = \frac{ad}{bc}$
Odds can be computed from probability, and probability can be computed from odds.
$\text{Odds in favor of } A = \frac{p(A)}{1 - p(A)}$
$p(A) = \frac{\text{odds in favor of } A}{1 + \text{odds in favor of } A}$
Note that if the odds are the same in each row then the odds ratio is 1.
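The two conversion formulas can be checked with exact fractions. This is a small sketch; the function names are hypothetical, and the inputs are the marble (p = 2/15) and diabetes (p = 5/100) examples from these notes.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds in favor of an event: p / (1 - p)."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse relation: p = odds / (1 + odds)."""
    return odds / (1 + odds)


blue = odds_from_probability(Fraction(2, 15))        # marble example
diabetes = odds_from_probability(Fraction(5, 100))   # diabetes example
back = probability_from_odds(blue)                   # recovers the original probability

print(blue, diabetes, back)
```

Using `Fraction` instead of floats keeps the results in the same p/q form as the hand calculation.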

Interpretation
If O.R = 1
An odds ratio of 1 indicates that the condition or event under study is equally likely to occur in both
groups.
If O.R > 1
An odds ratio greater than 1 indicates that the condition or event under study is more likely to occur in
the first group.
If O.R < 1
An odds ratio less than 1 indicates that the condition or event under study is less likely to occur in
the first group.
The odds ratio must be non-negative, i.e. O.R ≥ 0. If the odds of the first group approach zero, then the
odds ratio approaches zero; when the odds of the second group approach zero, the odds ratio
approaches ∞.
Example: Consider the following data on the survival of passengers on the Titanic. There were 851
male passengers, of whom 142 survived and 709 died, and 462 female passengers, of whom 308 survived
and 154 died. Compute the odds ratio and interpret your result.
          Dead    Alive   Total
Male      709     142     851
Female    154     308     462
Total     863     450     1313
Solution:
First we calculate the odds.
$\text{Odds of death among males} = \frac{a}{b} = \frac{709}{142}$
$\text{Odds of death among females} = \frac{c}{d} = \frac{154}{308}$
Now we calculate the O.R.
$\text{O.R} = \frac{\text{Odds of death among males}}{\text{Odds of death among females}} = \frac{709/142}{154/308}$
O.R = 9.99

Interpretation:
The odds of dying on the Titanic were about 10 times higher for males than for females.
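The cross-product form ad/bc makes the calculation a one-liner. Below is a minimal sketch with a hypothetical helper name, applied to the Titanic counts from the table (a = 709, b = 142, c = 154, d = 308).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d) = ad / bc for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)


titanic = odds_ratio(709, 142, 154, 308)   # males vs. females, dead vs. alive
print(round(titanic, 2))
```

The same helper applies to any 2×2 table laid out with the group of interest in the first row and the event of interest in the first column.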
Example: Suppose that in a sample of 100 men, 60 have drunk wine in the previous week, while in a
sample of 100 women, only 20 have drunk wine in the same period. Calculate the odds ratio and comment
on your results.
         Drank wine   Did not drink wine   Total
Men      60           40                   100
Women    20           80                   100
Total    80           120                  200
Solution:
$\text{Odds of men who drank wine} = \frac{a}{b} = \frac{60}{40}$
$\text{Odds of women who drank wine} = \frac{c}{d} = \frac{20}{80}$
$\text{O.R} = \frac{\text{Odds of men who drank wine}}{\text{Odds of women who drank wine}} = \frac{60/40}{20/80}$
O.R = 6
Interpretation:
The odds of having drunk wine in the previous week are 6 times higher for men than for women.
Question: If the prevalence of smoking among lung cancer patients is 95 per 100, and the prevalence
of smoking among people without lung cancer is 25 per 100, calculate the odds ratio and comment on your
results.

                               Smoking   Non-smoking   Total
Lung cancer patients           95        5             100
Patients without lung cancer   25        75            100
Total                          120       80            200
Solution:
$\text{Odds of smoking among lung cancer patients} = \frac{a}{b} = \frac{95}{5}$
$\text{Odds of smoking among patients without lung cancer} = \frac{c}{d} = \frac{25}{75}$
$\text{O.R} = \frac{\text{Odds of smoking among lung cancer patients}}{\text{Odds of smoking among patients without lung cancer}} = \frac{95/5}{25/75}$
O.R = 57
Interpretation:
The odds of smoking among lung cancer patients are 57 times the odds of smoking among patients
without lung cancer.
Important Questions
Question: What does an odds ratio of 0.5 mean?
Answer: An odds ratio of 0.5 means that the odds of the exposure being found in the case group are 50%
lower than the odds of finding the exposure in the control group.
Question: What does an odds ratio of 0.75 mean?
Answer: An odds ratio of 0.75 means that in the first group the odds of the outcome are 25% lower, i.e.
an odds ratio less than “1” means that the first group is less likely to experience the event. Taking the
ratio the other way round gives 1/0.75 = 1.33, i.e. the odds of the outcome are 33% higher in the second
group than in the first.

Standard Error of Log Odds Ratio
The standard error of the log odds ratio is estimated simply by the square root of the sum of the
reciprocals of the four frequencies.
$S.E(\ln\hat{\theta}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$
Knowing this S.E, one can test the hypothesis $H_0: \ln(\theta) = 0$ and construct the confidence
interval
$\ln(\hat{\theta}) \pm Z_{\alpha/2}\, S.E(\ln\hat{\theta})$
where $\hat{\theta}$ is the estimated odds ratio and “$Z_{\alpha/2}$” is the value of “Z” defining the
confidence limits.
Example:
         Drank wine   Did not drink wine   Total
Men      60           40                   100
Women    30           70                   100
Total    90           110                  200
Calculate (1) the odds ratio, (2) test the hypothesis ln(θ) = 0, (3) the C.I for ln(θ)
(1) Odds Ratio
Solution:
$\text{Odds of men who drank wine} = \frac{a}{b} = \frac{60}{40}$
$\text{Odds of women who drank wine} = \frac{c}{d} = \frac{30}{70}$
$\text{O.R} = \frac{\text{Odds of men who drank wine}}{\text{Odds of women who drank wine}} = \frac{60/40}{30/70}$
O.R = 3.5
Interpretation:
The odds of drinking wine are 3.5 times higher for men than for women.
(2) Test the hypothesis ln(θ) = 0
Solution:
1) We formulate our null and alternative hypotheses as
$H_0: \ln(\theta) = 0$ vs. $H_1: \ln(\theta) \neq 0$
2) Level of significance
We set α = 0.05
3) Test statistic to be used
$Z = \frac{\ln(\hat{\theta})}{S.E(\ln\hat{\theta})}$
4) Computation
$\hat{\theta} = 3.5$, $\ln(\hat{\theta}) = 1.2527$
$Z = \frac{1.2527}{\sqrt{\frac{1}{60} + \frac{1}{40} + \frac{1}{30} + \frac{1}{70}}}$
$Z = 4.192$
5) Critical region
$|Z| \geq Z_{0.025} = 1.96$

6) Conclusion
Since Z = 4.19 falls in the critical region, we reject H0 and conclude that the association
between sex and drinking wine is significant at the α = 0.05 level.
(3) C.I for ln(θ)
The 100(1-α)% C.I for ln(θ) is
$\ln(\hat{\theta}) \pm 1.96\, S.E(\ln\hat{\theta})$
with $\ln(\hat{\theta}) = 1.2527$ and S.E = 0.2988:

$1.2527 \pm 1.96(0.2988)$
$(1.2527 - 0.5856,\ 1.2527 + 0.5856)$
$(0.6671,\ 1.8383)$
The 95% C.I for θ
$(e^{0.6671},\ e^{1.8383})$
$(1.949,\ 6.286)$
Since the C.I for θ doesn’t include “1”, there is a significant association between gender and
drinking wine.
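The whole calculation for this example (odds ratio, standard error, Z statistic and 95% interval) can be reproduced in a few lines. The sketch below uses only the formulas stated in these notes; the function name and the dictionary keys are hypothetical, and 1.96 is the two-sided critical value for α = 0.05.

```python
import math


def log_odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Wald test and confidence interval for the log odds ratio of a 2x2 table."""
    theta = (a * d) / (b * c)                        # estimated odds ratio
    log_theta = math.log(theta)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # S.E of ln(theta)
    z = log_theta / se                               # test statistic for H0: ln(theta) = 0
    lower = log_theta - z_crit * se
    upper = log_theta + z_crit * se
    return {
        "odds_ratio": theta,
        "log_or": log_theta,
        "se": se,
        "z": z,
        "ci_log": (lower, upper),
        "ci_or": (math.exp(lower), math.exp(upper)),  # back on the odds-ratio scale
    }


result = log_odds_ratio_inference(60, 40, 30, 70)    # men/women wine example
for key, value in result.items():
    print(key, value)
```

Because the code carries full precision through every step, the last digits may differ slightly from a hand calculation that rounds intermediate values.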
Question:
Who is more likely to drink beer on Queen’s Day, students or teachers?
           Drink beer   Do not drink beer   Total
Student    90           10                  100
Teacher    80           20                  100
Total      170          30                  200

Incidence
Incidence is a measure of disease that allows us to determine a person’s probability of being
diagnosed with a disease during a given period of time. Incidence is therefore the number of newly
diagnosed cases of a disease. An incidence rate is the number of new cases of a disease divided by the
number of persons at risk for that disease.
$\text{Incidence} = \frac{\text{New cases}}{\text{Total population}}$
OR
$\text{Incidence rate} = \frac{\text{No. of new cases}}{\text{No. of people at risk in the given time frame}}$
Example: Over the course of 1 year, 5 women are diagnosed with breast cancer out of a total female
study population of 200. Find the incidence.
Solution: Five women are diagnosed with breast cancer out of a total female study population of
200 (who did not have breast cancer at the beginning of the study period). Then we would say that the
incidence of breast cancer in this population is:
$\text{Incidence} = \frac{5}{200} = 0.025$
$\text{Incidence} = 0.025 \times 1000$
$\text{Incidence} = 25 \text{ cases per } 1000$
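Question: In a population of 1000 non-diseased persons, 28 were infected with HIV over two years of
observation. Find the incidence.
Solution:
$\text{Incidence} = \frac{\text{New cases}}{\text{Population at risk}} \times K$
$\text{Incidence} = \frac{28}{1000}$
$\text{Incidence} = 28 \text{ cases per } 1000$
$\text{Incidence} = 2.8\% \text{ over the two-year period}$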

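Question: 100 new cases occurred in a population of 50000 in a year. Calculate the incidence rate.
Solution:

$\text{Incidence proportion} = \frac{\text{New cases}}{\text{Population at risk}} \times K$
$\text{Incidence proportion} = \frac{100}{50000} \times 1000$
$\text{Incidence proportion} = 2 \text{ per } 1000 \text{ persons per year}$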
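The three incidence calculations above follow the same pattern, new cases divided by the population at risk and scaled by a multiplier K. A short sketch, assuming K = 1000 as in the worked examples (the function name is hypothetical):

```python
def incidence_per_k(new_cases, population_at_risk, k=1000):
    """Incidence expressed as new cases per k persons at risk."""
    return new_cases / population_at_risk * k


breast_cancer = incidence_per_k(5, 200)     # 5 new cases among 200 women in 1 year
hiv = incidence_per_k(28, 1000)             # 28 new infections among 1000 over 2 years
third = incidence_per_k(100, 50000)         # 100 new cases among 50000 in 1 year

print(breast_cancer, hiv, third)
```

The time frame is not part of the arithmetic, so it has to be reported alongside the number (per year, over two years, and so on).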
Prevalence
It refers to all cases, old and new, existing at a given point or period of time in a given
population: the total number of individuals who have an attribute or disease at a particular time (or
during a particular period) divided by the population at risk at that time (or the mid-year population). A
prevalence rate is the total number of cases of a disease existing in a population divided by the total
population.
$\text{Prevalence rate} = \frac{\text{old + new cases}}{\text{Total population}} \times K$
Question: If a measurement of cases is taken in a population of 40000 people, and 1200 were recently
diagnosed and 3500 are living with cancer, find the prevalence rate.
Solution:
$\text{Prevalence rate} = \frac{\text{old + new cases}}{\text{Total population}} \times K$
$\text{Prevalence rate} = \frac{1200 + 3500}{40000} \times 1000$
$\text{Prevalence rate} = 117.5 \approx 118$
The prevalence of cancer is about 118 per 1000 persons in the population.
Types of Prevalence
Point Prevalence
The number of all current cases (old + new) of a disease at one point in time in relation to a
defined population at that point in time. The “point in time” may be a day, several days, weeks, etc.,
depending upon the time required to examine the entire population.

$\text{Point prevalence} = \frac{\text{No. of all cases (old + new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}}$
Question: A health extension worker conducted a survey in a nearby elementary
school on 10 March 1997 to determine the prevalence of trachoma in that school. The total number of
students in the school was 200. The health extension worker examined all 200 students for trachoma, and
100 students were found to have trachoma. Calculate the point prevalence rate of trachoma in that school.
Solution:
$\text{Point prevalence} = \frac{\text{All students with trachoma on 10 March 1997}}{\text{The total population in that school}}$
$\text{Point prevalence} = \frac{100}{200} \times 100$
$\text{Point prevalence} = 50$
So there were 50 trachoma patients per 100 students on 10 March 1997, which means that 50% of the
students in that school were affected by trachoma.
Period Prevalence
The proportion of individuals in a specified population at risk who have the disease of interest
over a specified period of time, e.g. annual prevalence or lifetime prevalence. (When the type of
prevalence rate is not specified, it is usually point prevalence.)
$\text{Period prevalence} = \frac{\text{No. of existing cases and new cases}}{\text{Total population}}$
Question: Between June 30 and August 30, 1999, a population averaging 1600 had 29 existing cases of
hepatitis B on June 30 and 6 incident (new) cases of hepatitis B between July 1 and August 30. Find the
period prevalence.
Solution:
$\text{Period prevalence} = \frac{\text{No. of existing cases and new cases}}{\text{Total population}}$
$\text{Period prevalence} = \frac{29 + 6}{1600}$
$\text{Period prevalence} = 0.022$
So about 2.2% of the total population had the disease during the period.

Question: In a population of 1000 people, 100 people had diabetes on December 31, 2014. On June 30,
2015, 125 people currently have diabetes, so the number of new cases is 25. Find the prevalence on
June 30, 2015 and the incidence rate.
Solution:
$\text{Point prevalence} = \frac{\text{No. of all cases (old + new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}}$
$\text{Point prevalence} = \frac{125}{1000}$
$\text{Point prevalence} = 0.125$

$\text{Incidence proportion} = \frac{\text{New cases}}{\text{Population at risk}} = \frac{25}{1000 - 100} = \frac{25}{900}$
$\text{Incidence proportion} = 0.028$
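The diabetes question shows why prevalence and incidence use different denominators: prevalence divides by everyone, while incidence divides only by those still at risk (here 1000 - 100 = 900). A minimal sketch with hypothetical function names:

```python
def point_prevalence(existing_cases, population):
    """All current cases (old + new) divided by the population at that point in time."""
    return existing_cases / population


def incidence_proportion(new_cases, population, cases_at_start):
    """New cases divided by the people who were disease-free at the start."""
    at_risk = population - cases_at_start
    return new_cases / at_risk


prev = point_prevalence(125, 1000)          # prevalence on June 30, 2015
inc = incidence_proportion(25, 1000, 100)   # 25 new cases among 900 at risk

print(round(prev, 3), round(inc, 3))
```

People who already had diabetes on December 31, 2014 cannot become new cases, which is why they are removed from the incidence denominator.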
Relative Risk
A relative risk can only be calculated from prospective studies (cohort studies). It can be defined
as the ratio of the incidence rate among the exposed to the incidence rate among the non-exposed.
Mathematically
$\text{Relative Risk} = \frac{\text{Incidence rate among exposed}}{\text{Incidence rate among non-exposed}}$
Consider the following 2×2 contingency table for the calculation of measures of association.
Exposure    Outcome
            Present    Absent    Total
Present     a          b         a+b
Absent      c          d         c+d
Total       a+c        b+d       N
Interpretation
If R.R = 1
The incidence in the exposed is the same as the incidence in the non-exposed. There is no increased
risk, i.e. no association between exposure and outcome.
If R.R > 1
The incidence in the exposed is greater than the incidence in the non-exposed, i.e. there is an increased
risk of the outcome among the exposed. It is a positive association: the exposure is harmful, so those who
are exposed are at higher risk of suffering from the disease than those who are not exposed.
If R.R < 1
The incidence in the exposed is lower than the incidence in the non-exposed, i.e. the
risk is decreased. It is a negative association: the exposure is protective.
For example: providing a vaccine to a group is the exposure, and not providing the vaccine is
non-exposure. If R.R < 1, providing the vaccine is protective.
Note: The further the R.R is from 1, the stronger the association.
Example: Suppose we are researching the effect of benzene exposure on cancer. We go to a work center
where there is potential for exposure to benzene. There are 483 people in the work center; however, only
212 are exposed to benzene in their work duties. Cancer was found in 58 (12%) of the work center
employees, and 40 of the people with cancer were in the exposed group. Calculate the relative risk.
                       Cancer    No cancer    Total
Benzene exposure       40        172          212
No benzene exposure    18        253          271
Total                  58        425          483
Solution:
$\text{Disease risk among exposed} = \frac{a}{a+b} = \frac{40}{212} = 0.1887$
$\text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{18}{271} = 0.0664$
Now we calculate the R.R.
$\text{R.R} = \frac{\text{Disease risk among exposed}}{\text{Disease risk among not exposed}} = \frac{0.1887}{0.0664}$
R.R = 2.84
Interpretation:
Those exposed to benzene are 2.84 times as likely to get cancer as those who are not
exposed to benzene.
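The relative risk is the ratio of the two row-wise risks a/(a+b) and c/(c+d). A minimal sketch with a hypothetical function name, applied to the benzene table (a = 40, b = 172, c = 18, d = 253):

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed, a/(a+b), divided by risk in the non-exposed, c/(c+d)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


benzene = relative_risk(40, 172, 18, 253)
print(round(benzene, 2))
```

Unlike the odds ratio, this uses the row totals as denominators, so it only makes sense when the rows are the exposure groups that were followed forward in time.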
Question:
               Outcome
               Present   Absent   Total
Exposed        366       32       398
Non-exposed    64        319      383
Total          430       351      781
Solution:
$\text{Disease risk among exposed} = \frac{a}{a+b} = \frac{366}{398} = 0.9196$
$\text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{64}{383} = 0.1671$
$\text{R.R} = \frac{\text{Disease risk among exposed}}{\text{Disease risk among not exposed}} = \frac{0.9196}{0.1671}$
R.R = 5.50
Interpretation:
The exposed group is 5.50 times as likely to have the outcome as the non-exposed
group.
Question:
Smoking status   Low birth weight (LBW)   Not LBW   Total
Smokers          120                      240       360
Non-smokers      60                       580       640
Total            180                      820       1000

Solution:
$\text{Incidence of LBW among smokers} = \frac{a}{a+b} = \frac{120}{360} = 0.3333$
$\text{Incidence of LBW among non-smokers} = \frac{c}{c+d} = \frac{60}{640} = 0.09375$
$\text{R.R} = \frac{\text{Incidence of LBW among smokers}}{\text{Incidence of LBW among non-smokers}} = \frac{0.3333}{0.09375}$
R.R = 3.56 ≈ 3.6
Interpretation:
Based on the study, smokers are about 3.6 times as likely to have an LBW baby as non-smokers.
Question:
In a prospective study of pregnant women, information was collected on the exercise level of
low-risk pregnant women. A group of 217 women did no voluntary exercise during pregnancy, while
a group of 238 women exercised extensively. The outcome variable of interest is experiencing preterm
labor. The results are summarized as:
Risk factor              Preterm labor   No preterm labor   Total
Extreme exercising       22              216                238
Not extreme exercising   18              199                217
Total                    40              415                455
Solution:
$\text{Incidence of preterm labor with extreme exercise} = \frac{a}{a+b} = \frac{22}{238} = 0.0924$
$\text{Incidence of preterm labor without extreme exercise} = \frac{c}{c+d} = \frac{18}{217} = 0.0829$

$\text{R.R} = \frac{\text{Disease risk among exposed}}{\text{Disease risk among not exposed}} = \frac{0.0924}{0.0829}$
R.R = 1.11
Interpretation:
The result indicates that the risk of experiencing preterm labor when a woman exercises heavily
is 1.11 times the risk for women who do not exercise at all.
C.I for R.R

To construct the C.I for R.R, follow these steps.
1) Estimate the R.R from the given data.
2) Find the natural log “ln” of the R.R, i.e. ln(R.R).
3) Find the critical value from the standard normal distribution (1.96 for a 95% C.I).
4) Calculate the standard error of ln(R.R) by using the formula
$S.E(\ln R.R) = \sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}}$
5) Calculate the lower and upper limits on the ln scale.
$\ln(R.R) \pm 1.96\, S.E(\ln R.R)$

6) Use the exponential function to convert the limits back to the original scale,
i.e. exp(L.L), exp(U.L).
If the 95% C.I doesn’t contain the value “1”, the association is said to be statistically significant
at the α = 0.05 level.
Question:
Physicians enrolled in the Physicians’ Health Study were randomly assigned to take daily
aspirin or placebo. The table provides the number with myocardial infarction (M.I) in each group.

           Myocardial infarction
Group      Yes     No       Total
Aspirin    139     10898    11037
Placebo    239     10795    11034
Total      378     21693    22071
Calculate (1) the R.R, (2) the 95% C.I for the R.R
Solution:
$\text{Incidence of M.I among aspirin} = \frac{a}{a+b} = \frac{139}{11037} = 0.0126$
$\text{Incidence of M.I among placebo} = \frac{c}{c+d} = \frac{239}{11034} = 0.0217$
$\text{R.R} = \frac{\text{Incidence of M.I among aspirin}}{\text{Incidence of M.I among placebo}} = \frac{0.0126}{0.0217}$
R.R = 0.58
Interpretation:
The relative risk estimate is 0.58, which indicates that physicians in the aspirin group had a lower
risk of M.I than physicians in the placebo group.
Construct the 95% C.I for R.R
R.R = 0.58, ln (R.R) = -0.5447
$$ S.E[\ln(R.R)] = \sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}} $$
$$ S.E[\ln(R.R)] = \sqrt{\frac{10898}{139(11037)} + \frac{10795}{239(11034)}} $$
$$ S.E[\ln(R.R)] = 0.1058 $$
Now the 100(1-α)% C.I for ln (R.R)

$$ \ln(R.R) \pm 1.96 \times S.E[\ln(R.R)] $$
$$ -0.5447 \pm 1.96(0.1058) $$
$$ (-0.5447 - 0.207368,\; -0.5447 + 0.207368) $$
$$ (-0.752068,\; -0.337332) $$
The 95% C.I for R.R
$$ (e^{-0.752068},\; e^{-0.337332}) = (0.4714,\; 0.7137) $$
The 95% C.I indicates that the decreased risk related to daily aspirin use is significant at the α=0.05
level, since the interval does not contain “1”.
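
The steps for the R.R and its confidence interval can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `relative_risk_ci` and its argument order are assumptions made here, not part of the notes, and a, b, c, d follow the 2×2 table layout used above.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk and its confidence interval from a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    z:    standard normal critical value (1.96 for a 95% C.I)
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(R.R), as in the formula above
    se_ln_rr = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    # Limits on the ln scale, then back to the original scale with exp
    lower = math.exp(math.log(rr) - z * se_ln_rr)
    upper = math.exp(math.log(rr) + z * se_ln_rr)
    return rr, se_ln_rr, lower, upper

# Aspirin (a=139, b=10898) versus placebo (c=239, d=10795)
rr, se, lower, upper = relative_risk_ci(139, 10898, 239, 10795)
print(f"R.R = {rr:.3f}, S.E = {se:.4f}, 95% C.I = ({lower:.3f}, {upper:.3f})")
```

Because the function keeps full precision instead of rounding the R.R to 0.58 first, its limits may differ from the hand calculation in the third decimal place.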
Sensitivity p(T+/D+)
It is the probability of a positive test result given that the individual has the disease, i.e. the
likelihood of a diseased individual getting a positive test result. It is also called the true positive
rate. The complement of this is the false negative rate.
$$ p(T^{+}/D^{+}) = \frac{a}{a+c} = \frac{TP}{TP+FN} $$
False Negative p(T-/D+)

The probability that the test result is negative when the person is actually suffering from the disease,
i.e. the probability of a negative test result given that the person has the disease.
$$ p(T^{-}/D^{+}) = \frac{c}{a+c} = \frac{FN}{TP+FN} $$
Note: False negative rate + True positive rate (sensitivity) = 1 or 100%

Specificity p(T-/D-)
The probability of a negative test result given that the individual doesn’t have the disease, i.e. the
likelihood of a non-diseased individual getting a negative test result. It is also called the true
negative rate. The complement of this is the false positive rate.
$$ p(T^{-}/D^{-}) = \frac{d}{b+d} = \frac{TN}{TN+FP} $$
False Positive p(T+/D-)

The probability that the test result is positive when the person is actually not suffering from the
disease, i.e. the probability of a positive test result given that the person does not have the disease.
$$ p(T^{+}/D^{-}) = \frac{b}{b+d} = \frac{FP}{FP+TN} $$
Note: False positive rate + True negative rate (specificity) = 1 or 100%

Screening Test      Disease                 Total
                    D+          D-
T+                  (a) TP      (b) FP      a+b
T-                  (c) FN      (d) TN      c+d
Total               a+c         b+d         n
Positive Predictive Value (P.P.V) p(D+/T+)

The probability that a person who tests positive has the disease, i.e. the probability that a subject
has the disease given that the subject has a positive test result.
$$ p(D^{+}/T^{+}) = \frac{a}{a+b} = \frac{TP}{TP+FP} $$
Negative Predictive Value (N.P.V) p(D-/T-)

The probability that a person whose test is negative does not have the disease, i.e. the probability
that a subject doesn’t have the disease given that the subject has a negative test result.
$$ p(D^{-}/T^{-}) = \frac{d}{c+d} = \frac{TN}{TN+FN} $$

Question:
Up to 2018 the lab has tested 1000 cases, and the total number of patients is 400. The total number of
positive tests is 350, out of which 200 actually have the disease. Construct the table and calculate
Sensitivity, Specificity, False negative, False positive, P.P.V and N.P.V

           D+       D-       Total
T+         200      150      350
T-         200      450      650
Total      400      600      1000
$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{200}{400} = 0.5 $$
This means that there is a 50% chance that the person would get a positive test result when actually
he has the disease.
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{450}{600} = 0.75 $$
This means that there is a 75% chance that the person would get a negative test result when actually
he has no disease.
$$ \text{False Negative } p(T^{-}/D^{+}) = \frac{FN}{TP+FN} = \frac{c}{a+c} = \frac{200}{400} = 0.5 $$
This means that there is a 50% chance that the person would get a negative test result when actually
he has the disease.
$$ \text{False Positive } p(T^{+}/D^{-}) = \frac{FP}{FP+TN} = \frac{b}{b+d} = \frac{150}{600} = 0.25 $$
$$ \text{Positive Predictive Value (P.P.V) } p(D^{+}/T^{+}) = \frac{TP}{TP+FP} = \frac{a}{a+b} = \frac{200}{350} = 0.57 $$
This means that when the test result is positive there is a 57% chance that he has the disease.

$$ \text{Negative Predictive Value (N.P.V) } p(D^{-}/T^{-}) = \frac{TN}{TN+FN} = \frac{d}{c+d} = \frac{450}{650} = 0.69 $$
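Question:

           D+       D-       Total
T+         36       25       61
T-         9        230      239
Total      45       255      300
$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{36}{45} = 0.8 $$
This means that there is an 80% chance that the person would get a positive test result when actually
he has the disease.
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{230}{255} = 0.90 $$
$$ \text{False Negative } p(T^{-}/D^{+}) = \frac{FN}{TP+FN} = \frac{c}{a+c} = \frac{9}{45} = 0.2 $$
This means that there is a 20% chance that the person would get a negative test result when actually
he has the disease.
$$ \text{False Positive } p(T^{+}/D^{-}) = \frac{FP}{FP+TN} = \frac{b}{b+d} = \frac{25}{255} = 0.098 $$
This means that there is a 9.80% or about 10% chance that the person would get a positive test result
when actually he has no disease.
$$ \text{Positive Predictive Value (P.P.V) } p(D^{+}/T^{+}) = \frac{TP}{TP+FP} = \frac{a}{a+b} = \frac{36}{61} = 0.59 $$
This means that when the test result is positive there is a 59% chance that he has the disease.
$$ \text{Negative Predictive Value (N.P.V) } p(D^{-}/T^{-}) = \frac{TN}{TN+FN} = \frac{d}{c+d} = \frac{230}{239} = 0.96 $$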
Note: P.P.V and N.P.V are affected by prevalence; when prevalence increases, P.P.V increases
and N.P.V decreases.
$$ P.P.V = \frac{TP}{\text{All Test Positive}} = \frac{TP}{TP+FP} \qquad N.P.V = \frac{TN}{\text{All Test Negative}} = \frac{TN}{FN+TN} $$
P.P.V increases with increased specificity, so the higher the specificity the higher will be the P.P.V.
P.P.V also increases with prevalence. N.P.V increases with increased sensitivity and decreases
with increased prevalence, so the higher the prevalence the lower will be the N.P.V.
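
The screening measures defined above all come from the four cells of the 2×2 table, so they can be computed together. The following is a minimal sketch; the function name `diagnostic_metrics` and the dictionary keys are assumptions chosen here for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening-test measures from the cells of a 2x2 table.

    tp = a (true positives),  fp = b (false positives),
    fn = c (false negatives), tn = d (true negatives).
    """
    return {
        "sensitivity": tp / (tp + fn),     # p(T+/D+)
        "false_negative": fn / (tp + fn),  # p(T-/D+)
        "specificity": tn / (tn + fp),     # p(T-/D-)
        "false_positive": fp / (fp + tn),  # p(T+/D-)
        "ppv": tp / (tp + fp),             # p(D+/T+)
        "npv": tn / (tn + fn),             # p(D-/T-)
    }

# First worked question: T+ row = (200, 150), T- row = (200, 450)
m = diagnostic_metrics(tp=200, fp=150, fn=200, tn=450)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Calling the same function with tp=36, fp=25, fn=9, tn=230 gives the measures for the second table.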
Observational Studies
There are two basic types of Observational Studies
Prospective study
A prospective study is an observational study in which two random samples of subjects are
selected. One sample consists of subjects who possess the risk factor, and the other sample consists of
subjects who do not possess the risk factor. The subjects are followed into the future, i.e. they are
followed prospectively, and a record is kept of the number of subjects in each sample who at some point in
time are classifiable into each of the categories of the outcome variable. The data resulting from a
prospective study involving two dichotomous variables can be displayed in a 2×2 contingency table that
usually provides information regarding the number of subjects with and without the risk factor and the
number who did and did not succumb to the disease of interest, as well as the frequency for each
combination of categories of the two variables.
Classification of subjects with respect to disease status and risk factor
                  Disease Status
Risk Factor       Present     Absent      Total
Present           a           b           a+b
Absent            c           d           c+d
Total             a+c         b+d         n

Retrospective Study
A retrospective study is the reverse of a prospective study. The samples are selected from those falling
into the categories of the outcome variable. Then the investigator looks back (takes a retrospective look)
at the subjects and determines which ones have (or had) and which ones do not have (or did not have) the
risk factor. In general a prospective study is more expensive than a retrospective study.
Case Control Study
It is a type of retrospective study in which two groups with different known outcomes are
compared, that is, one group has the disease and the other doesn’t have the disease. We compare
the subjects who have a disease (the cases) with subjects who do not have that disease (the controls).
We calculate the Odds Ratio (O.R) from a case control study.
Risk Factor
A risk factor is something that increases your chance of getting a disease; this risk may come from
something you do. For example, smoking increases your chance of developing colon cancer, therefore
smoking is a risk factor for colon cancer.
Application of Bayes’ theorem to find P.P.V
Say you are given the prevalence as A, sensitivity as B, and specificity as C.
Probability of obtaining TP = A×B        Probability of obtaining FP = (1-A)×(1-C)
Now you have everything you need for P.P.V

$$ P.P.V = \frac{TP}{TP+FP} $$
Application of Bayes’ theorem to find N.P.V
Say you are given the prevalence as A, sensitivity as B, and specificity as C.
Probability of obtaining TN = (1-A)×C        Probability of obtaining FN = A×(1-B)
$$ N.P.V = \frac{TN}{TN+FN} $$
With TN and FN, we have all we need to find N.P.V
Bayes’ Rule by using prior information
$$ P(A/B) = \frac{P(A) \cdot P(B/A)}{P(B)} $$

Sensitivity and specificity are easy to evaluate from a case control study, but predictivity requires
that the subjects be followed until their disease status is confirmed as present or absent; this
could be very time consuming and expensive. Thus predictivity is difficult to evaluate directly. However,
there is another approach, called Bayes’ rule, which utilizes some prior (or additional) information. If the
prevalence of the disease in the target population is known, predictivity can be found by using sensitivity
and specificity as shown below. We calculate these probabilities by using the knowledge of sensitivity
p(T+/D+), specificity p(T-/D-) and the probability of the relevant disease in the general population P(D),
which is usually obtained from another independent study.
Application of Bayes’ theorem to calculate P.P.V

$$ P(D/T) = \frac{P(T \cap D)}{P(T)} $$
$$ T = (T \cap D) \cup (T \cap \bar{D}) $$
$$ P(T) = P(T \cap D) + P(T \cap \bar{D}) \qquad \text{(Equation A)} $$
as $T \cap D$ and $T \cap \bar{D}$ are mutually exclusive.
$$ P(T \cap D) = P(D) \cdot P(T/D) $$
$$ P(T \cap \bar{D}) = P(\bar{D}) \cdot P(T/\bar{D}) $$
Put in Equation A
$$ P(T) = P(D) \cdot P(T/D) + P(\bar{D}) \cdot P(T/\bar{D}) $$
Therefore we reach the following version of Bayes’ theorem for P.P.V

$$ P(D/T) = \frac{P(D) \cdot P(T/D)}{P(D) \cdot P(T/D) + P(\bar{D}) \cdot P(T/\bar{D})} $$
$$ P(T/D) = \text{Sensitivity} \qquad P(T/\bar{D}) = 1 - \text{Specificity} $$
$$ P(D) = \text{Probability of disease in the general population} \qquad P(\bar{D}) = 1 - P(D) $$
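Application of Bayes’ theorem to calculate N.P.V

$$ P(\bar{D}/\bar{T}) = \frac{P(\bar{D} \cap \bar{T})}{P(\bar{T})} = \frac{P(\bar{D}) \cdot P(\bar{T}/\bar{D})}{P(\bar{T})} $$
$$ \bar{T} = (\bar{T} \cap \bar{D}) \cup (\bar{T} \cap D) $$
$$ P(\bar{T}) = P(\bar{T} \cap \bar{D}) + P(\bar{T} \cap D) \qquad \text{(Equation A)} $$
as $\bar{T} \cap \bar{D}$ and $\bar{T} \cap D$ are mutually exclusive.
$$ P(\bar{T} \cap \bar{D}) = P(\bar{D}) \cdot P(\bar{T}/\bar{D}) $$
$$ P(\bar{T} \cap D) = P(D) \cdot P(\bar{T}/D) $$
Put in Equation A
$$ P(\bar{T}) = P(\bar{D}) \cdot P(\bar{T}/\bar{D}) + P(D) \cdot P(\bar{T}/D) $$
Therefore we reach the following version of Bayes’ theorem for N.P.V

$$ P(\bar{D}/\bar{T}) = \frac{P(\bar{D}) \cdot P(\bar{T}/\bar{D})}{P(\bar{D}) \cdot P(\bar{T}/\bar{D}) + P(D) \cdot P(\bar{T}/D)} $$
$$ P(\bar{T}/\bar{D}) = \text{Specificity} \qquad P(\bar{T}/D) = 1 - \text{Sensitivity} $$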
Question:
A medical research team wishes to evaluate a proposed screening test for Alzheimer’s disease.
The test was given to a random sample of 450 patients with Alzheimer’s disease and an independent random
sample of 500 patients without symptoms of the disease. The two samples were drawn from a population
of subjects who were 65 years of age or older. The result is as follows.
                  Disease Status
           Alzheimer’s     Alzheimer’s     Total
           Present         Absent
T+         436             5               441
T-         14              495             509
Total      450             500             950
Based on another independent study it is known that the percentage of patients with Alzheimer’s disease
is 11.3% out of all subjects who were 65 years of age or older. First we calculate sensitivity and
specificity as follows.
Solution:
$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{436}{450} = 0.97 $$
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{495}{500} = 0.99 $$
Now from the general population
$$ P(D) = 11.3\% = 0.113 \qquad P(\bar{D}) = 1 - 0.113 = 0.887 $$
For the positive predictive value of the test, we wish to estimate the probability that a subject who is
positive on the test has Alzheimer’s disease.
First we calculate P.P.V

$$ P(T/\bar{D}) = 1 - \text{Specificity} = 1 - 0.99 = 0.01 $$
$$ P(D/T) = \frac{(0.113)(0.97)}{(0.113)(0.97) + (0.887)(0.01)} $$
$$ P(D/T) = \frac{0.10961}{0.10961 + 0.00887} $$
$$ P(D/T) = \frac{0.10961}{0.11848} $$
$$ P(D/T) = 0.9251 $$
This means that about 93% of the subjects have the disease given that the test is positive.
Now we calculate N.P.V

$$ P(\bar{T}/D) = 1 - \text{Sensitivity} = 1 - 0.97 = 0.03 $$
$$ P(\bar{D}/\bar{T}) = \frac{P(\bar{D}) \cdot P(\bar{T}/\bar{D})}{P(\bar{D}) \cdot P(\bar{T}/\bar{D}) + P(D) \cdot P(\bar{T}/D)} $$
$$ P(\bar{D}/\bar{T}) = \frac{(0.887)(0.99)}{(0.887)(0.99) + (0.113)(0.03)} $$
$$ P(\bar{D}/\bar{T}) = \frac{0.87813}{0.87813 + 0.00339} $$
$$ P(\bar{D}/\bar{T}) = \frac{0.87813}{0.88152} $$
$$ P(\bar{D}/\bar{T}) = 0.9962 $$

This means that more than 99% of the subjects do not have the disease given that the test is negative.
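
The Bayes’ theorem formulas for P.P.V and N.P.V can be checked numerically. The sketch below is illustrative only; the function name `predictive_values` is an assumption made here, and the inputs are the prevalence 0.113 together with the sensitivity 436/450 and specificity 495/500 from the Alzheimer’s table.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """P.P.V and N.P.V from prevalence, sensitivity and specificity (Bayes' theorem)."""
    tp = prevalence * sensitivity              # P(D) * P(T/D)
    fp = (1 - prevalence) * (1 - specificity)  # P(not D) * P(T/not D)
    tn = (1 - prevalence) * specificity        # P(not D) * P(not T/not D)
    fn = prevalence * (1 - sensitivity)        # P(D) * P(not T/D)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

ppv, npv = predictive_values(prevalence=0.113,
                             sensitivity=436 / 450,
                             specificity=495 / 500)
print(f"P.P.V = {ppv:.4f}, N.P.V = {npv:.4f}")
```

Since the sensitivity is passed unrounded, the results can differ slightly in the last decimals from a hand calculation that rounds it to two places.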
Likelihood Ratio
A likelihood ratio describes how many times more likely a person with the disease is to receive a
particular test result than a person without the disease. The positive likelihood ratio (LR+) compares
how likely a positive result is in a patient with the disease and in a patient without the disease. The
negative likelihood ratio (LR-) makes the same comparison for a negative result.
For a useful test, LR+ is usually a number greater than “1” and LR- is usually between 0 and 1. When
LR=1 the test is useless, which means that the test result has no influence on whether a patient does
or does not have the disease.
$$ LR = \frac{\text{Probability of an individual with the condition having the test result}}{\text{Probability of an individual without the condition having the test result}} $$
$$ LR+ = \frac{P(\text{Test result is positive with disease})}{P(\text{Test result is positive without disease})} = \frac{P(T/D)}{P(T/\bar{D})} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} $$
$$ LR- = \frac{P(\text{Test result is negative with disease})}{P(\text{Test result is negative without disease})} = \frac{P(\bar{T}/D)}{P(\bar{T}/\bar{D})} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} $$
Test Accuracy
The accuracy of a test is its ability to differentiate the patients and the healthy cases correctly. To
estimate the accuracy of the test we calculate the proportion of true positives and true negatives
among all evaluated cases. Mathematically it can be stated as:
$$ \text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a+d}{a+b+c+d} $$
Scenario-I
Imagine we have a sample of 100 cases, 50 healthy and the other 50 patients. If a test is positive for
all patients and negative for all healthy ones, it is 100% accurate. In the figure, the test has been
able to differentiate the healthy cases and the patients exactly. In this example the sensitivity and
specificity of the test are also 100%.

           Patients     Healthy     Total
T+         50           0           50
T-         0            50          50
Total      50           50          100
$$ \text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a+d}{a+b+c+d} = \frac{50 + 50}{50 + 0 + 0 + 50} = 1 \text{ or } 100\% $$

$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{50}{50+0} = 1 \text{ or } 100\% $$
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{50}{50+0} = 1 \text{ or } 100\% $$
Taking into account the mentioned statistical characteristics, this test is appropriate for both
screening and final verification of a disease.
Scenario-II
Test with 75% accuracy, 50% sensitivity and 100% specificity. If the test can only diagnose 25 out of
the 50 patients and has reported the others as healthy (as we see from figure II), then the accuracy,
sensitivity and specificity are as given below. Of the 100 cases that have been tested, the test could
determine 25 patients and 50 healthy cases correctly, therefore the accuracy of the test is 75%.

           Patients     Healthy     Total
T+         25           0           25
T-         25           50          75
Total      50           50          100
$$ \text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a+d}{a+b+c+d} = \frac{25 + 50}{25 + 0 + 25 + 50} = 0.75 \text{ or } 75\% $$

$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{25}{50} = 0.5 \text{ or } 50\% $$
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{50}{0+50} = 1 \text{ or } 100\% $$
This test is not suitable for screening purposes but is suitable for final confirmation of a disease.
Scenario III
Test with 75% accuracy, 100% sensitivity and 50% specificity. If we assume that the test has been able
to identify only 25 of the 50 healthy cases and has reported the others as patients (as we see from
figure III), then the accuracy, sensitivity and specificity will be as follows.

           Patients     Healthy     Total
T+         50           25          75
T-         0            25          25
Total      50           50          100
$$ \text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a+d}{a+b+c+d} = \frac{50 + 25}{50 + 25 + 0 + 25} = 0.75 \text{ or } 75\% $$

$$ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{50}{50} = 1 \text{ or } 100\% $$
$$ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{25}{50} = 0.5 \text{ or } 50\% $$
This test is suitable for screening purposes but it is not suitable for final confirmation of a disease.
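
The three scenarios can be compared side by side with a small helper. This is a sketch under the assumption that each scenario is passed as (TP, FP, FN, TN); the function name `accuracy_summary` is made up here for illustration.

```python
def accuracy_summary(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, specificity) for a 2x2 table."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

scenarios = {
    "Scenario I": (50, 0, 0, 50),     # perfect test
    "Scenario II": (25, 0, 25, 50),   # misses half of the patients
    "Scenario III": (50, 25, 0, 25),  # flags half of the healthy cases
}
results = {name: accuracy_summary(*cells) for name, cells in scenarios.items()}
for name, (acc, sens, spec) in results.items():
    print(f"{name}: accuracy={acc:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Scenarios II and III share the same accuracy but differ in sensitivity and specificity, which is why accuracy alone does not show whether a test suits screening or confirmation.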
Diagnostic Test
A diagnostic test is a procedure performed to confirm or to determine the presence or absence of
disease in an individual suspected of having the disease, usually following the report of symptoms or
based on the results of other medical tests. This procedure will give a rapid indication of whether a
patient has a certain disease. A diagnostic test is any approach used to gather clinical information for
the purpose of making a clinical decision (i.e. a diagnosis). Some examples of diagnostic tests are x-rays,
biopsies, pregnancy tests, medical histories and results from physical examinations.
Diagnostic Odds Ratio

A diagnostic odds ratio (D.O.R) is a measure of the effectiveness of a diagnostic test. It is defined as
“the ratio of the odds of the test being positive if the subject has the disease relative to the odds of
the test being positive if the subject doesn’t have the disease”. The D.O.R ranges from 0 to ∞; for a
useful test the D.O.R is greater than 1, and the higher the D.O.R, the better the test discriminates.
$$ D.O.R = \frac{LR+}{LR-} $$
The Diagnostic Odds Ratio may be expressed in terms of the sensitivity and specificity of the test.
$$ D.O.R = \frac{\text{Sensitivity}}{1 - \text{Sensitivity}} \times \frac{\text{Specificity}}{1 - \text{Specificity}} $$
$$ D.O.R = \frac{\frac{a}{a+c}}{1 - \frac{a}{a+c}} \times \frac{\frac{d}{b+d}}{1 - \frac{d}{b+d}} $$
$$ D.O.R = \frac{\frac{a}{a+c}}{\frac{a+c-a}{a+c}} \times \frac{\frac{d}{b+d}}{\frac{b+d-d}{b+d}} $$
$$ D.O.R = \frac{a}{c} \times \frac{d}{b} $$
$$ D.O.R = \frac{ad}{bc} = O.R $$
The Diagnostic Odds Ratio may also be expressed in terms of positive predictive value (P.P.V) and negative
predictive value (N.P.V)
$$ D.O.R = \frac{P.P.V}{1 - P.P.V} \times \frac{N.P.V}{1 - N.P.V} $$
$$ D.O.R = \frac{\frac{a}{a+b}}{1 - \frac{a}{a+b}} \times \frac{\frac{d}{c+d}}{1 - \frac{d}{c+d}} $$
$$ D.O.R = \frac{\frac{a}{a+b}}{\frac{a+b-a}{a+b}} \times \frac{\frac{d}{c+d}}{\frac{c+d-d}{c+d}} $$
$$ D.O.R = \frac{a}{b} \times \frac{d}{c} $$
$$ D.O.R = \frac{ad}{bc} = O.R $$
Question:
Consider a test with the following 2×2 contingency table and calculate the Diagnostic O.R
               Disease Status
           D+       D-       Total
T+         26       12       38
T-         3        48       51
Total      29       60       89
$$ D.O.R = \frac{TP/FP}{FN/TN} $$
$$ D.O.R = \frac{26/12}{3/48} $$
$$ D.O.R = \frac{26}{12} \times \frac{48}{3} $$
$$ D.O.R = 34.67 $$
Since D.O.R>1, we conclude that the test is discriminating correctly.
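
The likelihood ratios and the diagnostic odds ratio can be computed from the same four cell counts. The sketch below uses the table from this question; the function name `likelihood_ratios` is an assumption chosen here.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-, D.O.R) from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg  # algebraically equal to (tp * tn) / (fp * fn)
    return lr_pos, lr_neg, dor

lr_pos, lr_neg, dor = likelihood_ratios(tp=26, fp=12, fn=3, tn=48)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, D.O.R = {dor:.2f}")
```

The function divides by zero when specificity is exactly 1 or sensitivity is exactly 1, so a perfect test needs separate handling.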
Probability Distribution
The probability distribution of a random variable describes how the probabilities are distributed
over the values of the random variable. A probability distribution is a listing of all the outcomes of an
experiment and their associated probabilities. For a discrete random variable X, the probability
distribution is defined by the probability mass function f(x) = p[X = x], which gives the
probability for each value of the random variable. Consider the example of tossing three coins, in
which the variable of interest is the random variable X, the number of heads when three coins are tossed.
S = {HHH, HHT, HTH, THH, TTH, HTT, THT, TTT}
Where X=0, 1, 2, 3
$$ p(X = 0) = p(\text{no heads}) = p(TTT) = \frac{1}{8} $$
$$ p(X = 1) = p(\text{one head}) = p(THT, HTT, TTH) = \frac{3}{8} $$
$$ p(X = 2) = p(\text{two heads}) = p(HHT, THH, HTH) = \frac{3}{8} $$
$$ p(X = 3) = p(\text{three heads}) = p(HHH) = \frac{1}{8} $$
Probability Distribution for number of heads
X          P(X=x)
0          1/8
1          3/8
2          3/8
3          1/8
Total      1
Discrete Probability Distribution
The probability distribution of a discrete random variable is a table, graph or formula that gives the
probability associated with each possible value that the variable can assume. For the discrete random
variable X, the probability mass function is denoted by f(x) = p[X = x], which satisfies the following two
conditions.
i. $f(x) \ge 0 \;\; \forall \; x \in X$
ii. $\sum f(x) = 1$
Binomial Experiment
A binomial experiment is a statistical experiment that has the following properties.
1) The experiment consists of “n” repeated trials.

2) Each trial can result in two possible outcomes; we call one of these outcomes a success, and
the other, a failure.
3) The probability of success, denoted by “p”, is the same on every trial.
4) The trials are independent, i.e. the outcome of one trial does not affect the outcome of any other
trial.
Consider the statistical experiment in which we flip a coin two times and count the number of times
that a head occurs. This is a binomial experiment because:
I. The experiment consists of repeated trials; we flip the coin two times.
II. Each trial can result in just two possible outcomes, i.e. head or tail.
III. The probability of success is constant, i.e. 1/2
IV. The trials are independent, i.e. getting a head on one trial doesn’t affect whether we
get a head on the other trial.
Everyday examples of Binomial Experiments
If a new drug is introduced to cure a disease, it either cures the disease (it is successful) or it
does not cure the disease (it is a failure). If you purchase a lottery ticket, you are either going to win
a prize or not. Basically, anything you can think of that can only be a success or a failure can be
represented by a binomial distribution.
Notations:
X: The number of successes that result from a binomial experiment.
n: The number of trials in the binomial experiment.

p: The probability of success on an individual trial.
q: The probability of failure on an individual trial (q = 1-p).
n!: The factorial of n.
b(X; n, p)→Binomial Probability
The probability that an “n” trial binomial experiment results in exactly X successes, when the probability
of success on an individual trial is p.
The probability distribution of a binomial random variable is called the Binomial Distribution.
Suppose a binomial experiment consists of “n” trials and results in “x” successes, and the probability of
success on an individual trial is p. Then the probability mass function (P.M.F) of the Binomial
Distribution is
$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n $$
$$ p[X = x] = f(x) = 0 \qquad \text{otherwise} $$
Question:
Suppose a die is rolled 5 times. What is the probability of getting exactly 2 fours?
Solution:
This is a binomial experiment in which the number of successes=2, the number of trials=5 and the
probability of success=p=1/6 or 0.167, therefore the binomial probability is
$$ b(X; n, p) = b(2; 5, 1/6) $$

$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 2 $$
$$ p[X = 2] = \binom{5}{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{5-2} $$
$$ p[X = 2] = 0.1608 $$
Question:
The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the
same school apply, what is the probability that at most 2 are accepted?
Solution:
Here n=5, p=0.3, q=0.7

$$ b(X; n, p) = b(X \le 2; 5, 0.3) $$
$$ p[X \le 2] = \sum_{x=0}^{2} b(x; n, p) $$
$$ p[X \le 2] = \sum_{x=0}^{2} \binom{5}{x} (0.3)^{x} (0.7)^{5-x} $$
$$ p[X \le 2] = \binom{5}{0} (0.3)^{0} (0.7)^{5-0} + \binom{5}{1} (0.3)^{1} (0.7)^{5-1} + \binom{5}{2} (0.3)^{2} (0.7)^{5-2} $$
$$ p[X \le 2] = 0.1681 + 0.3602 + 0.3087 $$
$$ p[X \le 2] = 0.8369 $$
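
The binomial probabilities in these questions can be reproduced with the P.M.F written as a small function. This is a sketch only; the names `binomial_pmf` and `binomial_cdf` are assumptions made here, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """P[X = x] for a binomial experiment with n trials and success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P[X <= x], obtained by summing the P.M.F from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

two_fours = binomial_pmf(2, 5, 1 / 6)    # exactly 2 fours in 5 rolls of a die
at_most_two = binomial_cdf(2, 5, 0.3)    # at most 2 of 5 students accepted
print(f"{two_fours:.4f} {at_most_two:.4f}")
```

"At least" questions follow from the complement, for example P[X ≥ 1] = 1 - binomial_pmf(0, n, p).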
Question:
60% of the people who purchase sports cars are men. If 10 sports car owners are randomly selected,
find the probability that exactly 7 are men.
Solution:
Here n=10, p=0.60, q=0.40, X=7
$$ b(X; n, p) = b(7; 10, 0.6) $$
$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 7 $$
$$ p[X = 7] = \binom{10}{7} (0.60)^{7} (0.40)^{10-7} $$
$$ p[X = 7] = 0.2150 $$
Question:
Suppose that 80% of adults with allergies report symptom relief with a specific
medication. If the medication is given to 10 new patients with allergies, what is the probability that it
is effective in exactly 7?
Solution:
Here n=10, p=0.80, q=0.20, X=7
$$ b(X; n, p) = b(7; 10, 0.8) $$

$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 7 $$
$$ p[X = 7] = \binom{10}{7} (0.80)^{7} (0.20)^{10-7} $$
$$ p[X = 7] = 0.2013 $$
Question:
The likelihood that a patient with a heart attack dies is 0.04. Suppose we have 5 patients who suffer a
heart attack. What is the probability that all survive?
Solution:
Here n=5, p=0.04, q=0.96, X=0 (X is the number of deaths)
$$ b(X; n, p) = b(0; 5, 0.04) $$
$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 0 $$
p(All the patients survive)
$$ p[X = 0] = \binom{5}{0} (0.04)^{0} (0.96)^{5-0} $$
$$ p[X = 0] = 0.8154 $$
There is an 81.54% chance that all patients will survive.
Question:
Suppose 3% of the students are suffering from anxiety. A sample of 5 students
is selected. Find the probability that out of these
I. No students are suffering.

II. At least one student is suffering.
III. At least a majority of the students are suffering.
IV. All the students are suffering.
Solution:
Let X be a random variable denoting the number of students suffering from anxiety.
Here n=5, p=0.03, q=0.97

No students are suffering.
$$ b(X; n, p) = b(0; 5, 0.03) $$
$$ p[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 0 $$
$$ p[X = 0] = \binom{5}{0} (0.03)^{0} (0.97)^{5-0} $$
$$ p[X = 0] = 0.8587 $$
At least one student is suffering.
$$ b(X; n, p) = b(X \ge 1; 5, 0.03) $$
$$ p[X \ge 1] = 1 - p(X < 1) $$
$$ p[X \ge 1] = 1 - 0.8587 $$
$$ p[X \ge 1] = 0.1413 $$
At least a majority of the students are suffering.
$$ b(X; n, p) = b(X \ge 3; 5, 0.03) $$
$$ p[X \ge 3] = 1 - p(X < 3) $$
$$ p[X \ge 3] = 1 - \{p(X = 0) + p(X = 1) + p(X = 2)\} $$
$$ p[X \ge 3] = 1 - \left\{\binom{5}{0} (0.03)^{0} (0.97)^{5-0} + \binom{5}{1} (0.03)^{1} (0.97)^{5-1} + \binom{5}{2} (0.03)^{2} (0.97)^{5-2}\right\} $$
$$ p[X \ge 3] = 1 - (0.8587 + 0.1328 + 0.0082) $$
$$ p[X \ge 3] = 1 - 0.9997 = 0.0003 $$
All the students are suffering.

$$ p[X = x] = \binom{n}{x} p^{x} q^{n-x}, \qquad X = 5 $$
$$ p[X = 5] = \binom{5}{5} (0.03)^{5} (0.97)^{5-5} $$
$$ p[X = 5] = 0.0000000243 \approx 0 $$

Poisson Distribution
The Poisson distribution is a discrete distribution. It is named after Siméon Denis Poisson
(1781-1840), a French mathematician who published its essentials in a paper in 1837. The Poisson
distribution and the binomial distribution have some similarities, but also several differences. The
binomial distribution describes a distribution of two possible outcomes, designated as success and
failure, from a given number of trials. The Poisson distribution focuses only on the number of discrete
occurrences over an interval. A Poisson experiment doesn’t have a given number of trials (n) as a binomial
experiment does. For example:
A binomial experiment might be used to determine how many black cars are in a random sample of 50
cars. A Poisson experiment might focus on the number of cars randomly arriving at a car wash during a 20
minute interval. The Poisson distribution has the following characteristics.
i) It is a discrete distribution.
ii) Each occurrence is independent of the other occurrences.
iii) It describes discrete occurrences over an interval.
iv) The number of occurrences in each interval can range from 0 to ∞.
v) The mean number of occurrences must be constant throughout the experiment.
Probability Mass function of Poisson distribution
Let X be a random variable having a p.m.f of the form
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots, \infty $$
$$ p[X = x] = f(x) = 0 \qquad \text{otherwise} $$
Then the random variable X is said to have a Poisson distribution with parameter μ, where the symbol
“!” denotes the factorial.
μ (the expected or mean number of occurrences) is sometimes written as λ and is sometimes called the
event rate or rate parameter.
Here e = 2.71828 (Euler’s constant)
Mean, Variance, β1 and β2 of a Poisson distribution
$$ \text{Mean} = \mu = \lambda \qquad Var(X) = \mu = \lambda \qquad \beta_1 = \frac{1}{\mu} \qquad \beta_2 = 3 + \frac{1}{\mu} $$
Question:
The average number of major storms in a city is 2 per year. What is the probability that exactly 3
storms will hit the city next year?
Solution:
Here μ=2, X=3, e=2.71828
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!} $$
$$ p[X = 3] = \frac{e^{-2} \, 2^{3}}{3!} $$
$$ p[X = 3] = 0.180 \text{ or } 18\% $$
Thus the probability of 3 storms happening next year is 18%
Question:
The average number of homes sold by a company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?
Solution:
Here μ=2, X=3, e=2.71828
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!} $$
$$ p[X = 3] = \frac{e^{-2} \, 2^{3}}{3!} $$
$$ p[X = 3] = 0.180 \text{ or } 18\% $$
Question:
Suppose the average number of lions seen in a jungle on a 1 day visit is 5. What is the probability
that tourists will see fewer than 4 lions on the next day's visit?
Solution:
Here μ=5, X=0, 1, 2, 3 or X<4, e=2.71828
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!} $$
Since we want to find the likelihood that tourists will see fewer than four lions, we want the
probability that they will see X=0, 1, 2, 3, i.e. X<4

$$ p[X < 4] = \sum_{x=0}^{3} p[X = x] $$
$$ p[X < 4] = p(X = 0) + p(X = 1) + p(X = 2) + p(X = 3) $$
$$ p[X < 4] = \frac{e^{-5} \, 5^{0}}{0!} + \frac{e^{-5} \, 5^{1}}{1!} + \frac{e^{-5} \, 5^{2}}{2!} + \frac{e^{-5} \, 5^{3}}{3!} $$
$$ p[X < 4] = 0.265 $$
Thus the probability that tourists will see no more than 3 lions is 0.265.
Question:
Consider a computer system with Poisson job arrivals (mean μ=2). Determine the probability of
I. Zero Jobs.
II. Exactly 2 Jobs.
III. At most three Jobs.
Solution:
Zero Jobs
Here μ=2, X=0, e=2.71828
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!} $$
$$ p[X = 0] = \frac{e^{-2} \, 2^{0}}{0!} $$
$$ p[X = 0] = 0.135 $$
Exactly 2 Jobs
Here μ=2, X=2, e=2.71828
$$ p[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!} $$
$$ p[X = 2] = \frac{e^{-2} \, 2^{2}}{2!} $$
$$ p[X = 2] = 0.27 $$

At most three Jobs
Here μ=2, X=0, 1, 2, 3, e=2.71828

$$ p[X < 4] = \sum_{x=0}^{3} p[X = x] $$
$$ p[X < 4] = p(X = 0) + p(X = 1) + p(X = 2) + p(X = 3) $$
$$ p[X < 4] = \frac{e^{-2} \, 2^{0}}{0!} + \frac{e^{-2} \, 2^{1}}{1!} + \frac{e^{-2} \, 2^{2}}{2!} + \frac{e^{-2} \, 2^{3}}{3!} $$
$$ p[X < 4] = 0.8571 $$
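
The Poisson answers above can be verified with a direct implementation of the P.M.F. This is an illustrative sketch; the function names `poisson_pmf` and `poisson_cdf` are assumptions made here.

```python
import math

def poisson_pmf(x, mu):
    """P[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def poisson_cdf(x, mu):
    """P[X <= x], obtained by summing the P.M.F from 0 to x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

three_storms = poisson_pmf(3, 2)       # exactly 3 storms when mu = 2
fewer_than_4_lions = poisson_cdf(3, 5) # X < 4 lions when mu = 5
at_most_3_jobs = poisson_cdf(3, 2)     # at most 3 jobs when mu = 2
print(f"{three_storms:.4f} {fewer_than_4_lions:.4f} {at_most_3_jobs:.4f}")
```

Note that "fewer than 4" and "at most 3" are the same event for a count, so both use `poisson_cdf(3, mu)`.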
Properties of Normal Distribution
• The function f(x) representing the normal distribution satisfies the properties of a proper p.d.f,
i.e. f(x) ≥ 0 ∀ x and the total area under the normal curve is unity.
• The Mean, Median and Mode of the normal distribution are equal.
Mean = Median = Mode = μ
• The normal distribution is symmetrical and unimodal (it has one mode).

• The M.D of the normal distribution is approximately 4/5 of its S.D. (M.D = 4/5 S.D)
• The odd order moments about the Mean are all zero, whereas the even order moments about the
Mean are given by
$$ \mu_{2n} = (2n-1)(2n-3) \cdots 5 \cdot 3 \cdot 1 \, \sigma^{2n} $$
$$ \mu_{2n+1} = 0 \;\; \forall \; n \quad \text{(odd order)} $$
• The normal curve extends indefinitely far to the left and to the right, approaching the x-axis more
closely as x increases in magnitude.
• The curve is symmetric about its Mean, and thus the area to the left of the Mean and the area
to the right of the Mean are each equal to 0.5.
• For the normal distribution about 68% of the area under the curve is between μ-σ and μ+σ,
about 95% of the area under the curve is between μ-2σ and μ+2σ, and about 99.7%
of the area under the curve is between μ-3σ and μ+3σ.
• The points of inflection of the curve are one standard deviation away from the Mean.
INDUCTIVE STATISTICS            DEDUCTIVE STATISTICS

From specific to general        From general to specific
Question:
The average on the statistics test was 78, with an S.D of 8. If the test scores are normally distributed,
find the probability that a student receives a test score less than 90.
Solution:
Given that μ=78, σ=8
Let “X” denote the score on the statistics test.
$$ P(X < 90) = P\left(\frac{X - \mu}{\sigma} < \frac{90 - \mu}{\sigma}\right) $$
In Standardized Form
$$ P(X < 90) = P\left(Z < \frac{90 - 78}{8}\right) $$
$$ P(X < 90) = P(Z < 1.50) $$
$$ P(X < 90) = P(-\infty < Z < 0) + P(0 < Z < 1.50) $$
$$ P(X < 90) = 0.5 + 0.4332 $$
$$ P(X < 90) = 0.9332 $$
Question:
The pollen count for a species of flower varies randomly in a manner well represented by a normal
distribution with μ=1000 and σ=80
I. Find the probability that an individual pollen count will be greater than 1200
II. Less than 775.
III. Between 800 and 1100.
Solution:
Given that μ=1000, σ=80
Let “X” represent the pollen count in a flower.
Find the probability that an individual pollen count will be greater than 1200
$$ P(X > 1200) = P\left(\frac{X - \mu}{\sigma} > \frac{1200 - \mu}{\sigma}\right) $$

In Standardized Form
$$ P(X > 1200) = P\left(Z > \frac{1200 - 1000}{80}\right) $$
$$ P(X > 1200) = P(Z > 2.50) $$
$$ P(X > 1200) = P(0 < Z < \infty) - P(0 < Z < 2.50) $$
$$ P(X > 1200) = 0.5 - 0.4938 $$
$$ P(X > 1200) = 0.0062 $$
Less than 775.
$$ P(X < 775) = P\left(\frac{X - \mu}{\sigma} < \frac{775 - \mu}{\sigma}\right) $$
In Standardized Form
$$ P(X < 775) = P\left(Z < \frac{775 - 1000}{80}\right) $$
$$ P(X < 775) = P(Z < -2.81) $$
$$ P(X < 775) = P(-\infty < Z < 0) - P(-2.81 < Z < 0) $$
$$ P(X < 775) = 0.5 - 0.4975 $$
$$ P(X < 775) = 0.0025 $$
Between 800 and 1100.
$$ P(800 \le X \le 1100) = P\left(\frac{800 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{1100 - \mu}{\sigma}\right) $$
In Standardized Form
$$ P(800 \le X \le 1100) = P\left(\frac{800 - 1000}{80} < Z < \frac{1100 - 1000}{80}\right) $$
$$ P(800 \le X \le 1100) = P(-2.50 < Z < 1.25) $$
$$ P(800 \le X \le 1100) = P(-2.50 < Z < 0) + P(0 < Z < 1.25) $$
$$ P(800 \le X \le 1100) = 0.4938 + 0.3944 $$
$$ P(800 \le X \le 1100) = 0.8882 $$
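
Instead of reading areas from the Z table, the normal probabilities can be computed from the error function in Python's standard library. This is a sketch; the helper name `normal_cdf` is an assumption made here, and it relies on the identity Φ(z) = ½[1 + erf(z/√2)].

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardize
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

score_below_90 = normal_cdf(90, mu=78, sigma=8)
pollen_above_1200 = 1.0 - normal_cdf(1200, mu=1000, sigma=80)
pollen_below_775 = normal_cdf(775, mu=1000, sigma=80)
pollen_between = normal_cdf(1100, 1000, 80) - normal_cdf(800, 1000, 80)
print(f"{score_below_90:.4f} {pollen_above_1200:.4f} "
      f"{pollen_below_775:.4f} {pollen_between:.4f}")
```

Small differences from table-based answers are expected, because the table rounds Z to two decimal places (for example -2.8125 is looked up as -2.81).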

P-Value (Probability Value)
When we perform a hypothesis test in statistics, a P-value helps us to determine the significance
of the results. The P-value is a measure of the strength of the evidence against the null hypothesis. The
P-value is used as an alternative to rejection points; it provides the smallest level of significance at
which the null hypothesis would be rejected.
Definition:
The P-value (or probability value) is the probability of getting a sample statistic (such as the Mean)
or a more extreme sample statistic in the direction of the alternative hypothesis when the null
hypothesis is true.
OR
“The P-value is the probability of getting the observed value of the test statistic, or a value with even
greater evidence against Ho, if the null hypothesis is actually true”
The smaller the P-value, the greater the evidence against Ho
Decision Rule when using a P-value

1) Reject Ho if P-value ≤ α
2) Do not reject Ho if P-value > α
Testing Hypotheses, solving problems (P-Value Method)

Step No: 1
State the hypotheses and identify the claim.
Step No: 2
Compute the test value.
Step No: 3
Find the P-value

Step No: 4
Make the decision
Step No: 5
Summarize the result
Two tailed test
P-value = P(Z < -|Zo| or Z > |Zo|)
P-value = 2P(Z > |Zo|)
Right tailed test
P-value = P(Z > Zo)
Left tailed test
P-value = P(Z < Zo)
Case: I
If HA or H1 contains a less than alternative, find the probability that Z < your test statistic (i.e. look
up your test statistic on the Z table and find its corresponding probability). The result is your P-value.
Note: In this case your test statistic is usually negative.
Case: II
If HA or H1 contains a greater than alternative, find the probability that Z > your test statistic
(i.e. look up your test statistic on the Z table, find its corresponding probability and subtract it from 1).
The result is your P-value.
Note: In this case your test statistic is usually positive.
Case: III
If HA or H1 contains a not equal to alternative, find the probability that Z is beyond your test
statistic and double it.

There are two cases
• If your test statistic is negative, first find the probability that Z is less than your test statistic
(i.e. look up your test statistic on the Z table and find its corresponding probability), then double
this probability to get the P-value.
• If your test statistic is positive, first find the probability that Z is greater than your test statistic
(i.e. look up your test statistic on the Z table, find its corresponding probability and subtract it
from 1), then double this result to get the P-value.
Question: A researcher wishes to test the claim that the average cost of tuition and fees at a 2-year
college is greater than $5550. She selects a random sample of 36 2-year colleges and finds the mean to be
$5800. The population S.D is $600. Is there evidence to support the claim at α = 0.05? Use the P-value
Method.
Solution:
1) State the hypotheses
HO: μ = 5550 VS H1: μ > 5550 (claim)
2) Test Statistic
Z = (5800 - 5550) / (600/√36) = 250/100
Z = 2.50
3) Compute the P-value
P-value = P(Z > 2.50) = 1 - P(Z < 2.50)
= 1 - 0.9938
= 0.0062
4) Make the Decision
0.0062 < 0.05
I.e. P < α, so reject HO.

5) Summarize the result
Since P is less than α, there is enough evidence to support the claim that the average tuition and
fees at 2-year colleges are greater than $5550.
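The arithmetic in this solution can be checked numerically. The sketch below uses only the figures given in the question; the variable names are hypothetical, and `NormalDist` from the Python standard library replaces the Z table lookup.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question (variable names are illustrative).
mu0, xbar, sigma, n, alpha = 5550, 5800, 600, 36, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))  # test value
p_value = 1 - NormalDist().cdf(z)     # right-tailed: P(Z > z)
reject_h0 = p_value <= alpha          # decision rule: reject Ho if P-value <= alpha

print(z, round(p_value, 4), reject_h0)
```

The P-value comes out at roughly 0.006, well below α = 0.05, so Ho is rejected.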
Question: A researcher wishes to test the claim that the average wind speed in a certain city is 9 miles
per hour. A sample of 36 days has an average wind speed of 9.3 miles per hour. The S.D of the population
is 0.8 miles per hour. At α = 0.01, is there enough evidence to reject the claim? Use the P-value Method.
Solution:
1) State the hypotheses
HO: μ = 9 (claim) VS H1: μ ≠ 9
2) Test Statistic
Z = (9.3 - 9) / (0.8/√36)
Z = 2.25
3) Compute the P-value
P(Z > 2.25) = 1 - P(Z < 2.25)
= 1 - 0.9878
= 0.0122
P-value = 2(0.0122)
P-value = 0.0244
4) Make the Decision
0.0244 > 0.01
I.e. P > α, so do not reject HO.

5) Summarize the result
Since P > α, there is not enough evidence to reject the claim that the average wind
speed is 9 miles per hour.
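As a rough check of this two-tailed calculation, the sketch below recomputes the test value and doubles the upper-tail probability. The variable names are hypothetical, and `NormalDist` from the Python standard library is used in place of the Z table.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question (variable names are illustrative).
mu0, xbar, sigma, n, alpha = 9, 9.3, 0.8, 36, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))         # test value
one_tail = 1 - NormalDist().cdf(abs(z))      # P(Z > |z|)
p_value = 2 * one_tail                       # two-tailed: double the tail area
reject_h0 = p_value <= alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```

Forgetting to double the tail area would give about 0.012, which is still above α = 0.01 here, but in other problems that omission can reverse the decision.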
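Question: Suppose the average number of Facebook friends is 150, with S.D = 40.3. A random sample of 64
high school students in a particular country revealed that their average number of Facebook friends was
160. At α = 0.01, is there sufficient evidence to conclude that the mean is greater than 150?
Solution:
1) State the hypotheses
HO: μ = 150 VS H1: μ > 150 (claim)
2) Test Statistic
Z = (160 - 150) / (40.3/√64)
Z = 1.9851 (use Z = 1.99 in the Z table)
3) Compute the P-value
P-value = P(Z > 1.99) = 1 - P(Z < 1.99)
P-value = 1 - 0.9767
P-value = 0.0233
4) Make the Decision
0.0233 > 0.01
I.e. P > α, so do not reject HO.

5) Summarize the result
Since P > α, there is not enough evidence to conclude that the mean number of Facebook friends is
greater than 150.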
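A printed Z table lists values to two decimal places, so the test value is rounded before the lookup. The sketch below compares the P-value at the rounded Z with the P-value at the unrounded Z, to show that the rounding does not affect the decision in this problem. The variable names are hypothetical, and `NormalDist` from the Python standard library stands in for the table.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question (variable names are illustrative).
mu0, xbar, sigma, n, alpha = 150, 160, 40.3, 64, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))     # unrounded test value
z_table = round(z, 2)                    # value as it would be looked up in a Z table

p_exact = 1 - NormalDist().cdf(z)        # right-tailed P-value, unrounded Z
p_table = 1 - NormalDist().cdf(z_table)  # right-tailed P-value, rounded Z

print(round(z, 4), z_table)
print(round(p_exact, 4), round(p_table, 4))
print(p_exact <= alpha, p_table <= alpha)  # reject Ho in either case?
```

Both P-values are a little above 0.02, so Ho is not rejected at α = 0.01 either way.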
