Biostatistics
The word biostatistics combines two words, "bio" and "statistics". The "bio" part refers to biology, "the study of living things", while the "statistics" part refers to "the accumulation, tracking, analysis and application of data". Biostatistics is the branch of statistics related to medical and health applications, and it underpins the methodologies of epidemiological investigation and research. It is the use of statistical procedures and analysis in the study and practice of biology. In simple words, the branch of statistics that deals with data relating to living organisms is called biostatistics: statistical processes and methods applied to the collection, analysis, and interpretation of biological data, especially data relating to human biology, health and medicine.
Applications of Biostatistics
Biostatistics has applications in all life sciences. A few applications of biostatistics are
summarized below.
b) In Demography
It is used in estimating the attributes of a population such as the sex ratio, birth rates,
density of population, etc.
c) In Pharmacology
To find the action of a drug, the drug is given to animals or humans to see whether
the changes produced are due to the drug or to chance.
d) In Research
Research is incomplete without statistics. Every result needs to be statistically validated.
To design an experiment, select the method of data collection, and derive logical
conclusions from the data, one needs adequate knowledge of statistics.
Variable
Types of Variable
a. Quantitative Variables
b. Qualitative Variables
Qualitative variables are also called categorical variables. Many characteristics cannot be measured numerically; some of them can be ordered and are called ordinal, and some cannot be ordered and are called nominal. Qualitative variables can be coded to appear numeric, but the numbers are meaningless. Examples are the classification of people into socio-economic groups, examination grades, etc.
Discrete Variables
A discrete variable is characterized by gaps or interruptions in the values that it can assume. These gaps indicate the absence of values between the particular values that the variable can assume; it takes only whole numbers. Examples are the number of admissions to general hospitals, the number of decayed or missing teeth per child in an elementary school, and the number of prescriptions an individual takes daily.
Continuous Variables
A continuous variable can assume any value within a specified relevant interval. Examples of continuous variables include the various measurements that can be made on individuals, such as height (inches), weight (pounds), skull circumference, heart rate, blood pressure, and time to recovery (days).
Scale of Measurements
Not all characteristics can be measured on the same scale, and the same statistical procedure is not appropriate for every type of measurement. The psychologist Stanley Smith Stevens proposed four scales of measurement, which cover nearly all areas of learning.
They are:
Nominal Scale
As the name implies, it consists of "naming or labeling" observations, i.e. classifying them into various mutually exclusive and collectively exhaustive categories; the observations in each category are then counted. Examples are gender (male, female), marital status (married, unmarried), etc. Data obtained on a nominal scale are called nominal data or qualitative data and are analyzed by the statistics of attributes. The summary statistic "mode" is computed from such data. Nominal data can be represented by a pie chart or a bar chart.
Example: gender coded as Male = 1, Female = 2, or baseball uniform numbers; the number provides no insight into the play.
Ordinal Scale
Qualitative observations can be ranked or ordered according to some criterion, e.g. with respect to some quality or performance, but the interval between categories is unknown or unequal. An ordinal scale possesses a natural ordering, for example qualification (Matric, Inter, BA/BSc, Master, M.Phil., Ph.D.) or feelings (very unhappy, unhappy, ok, happy, very happy). The defect of this scale is that it has unequal intervals, i.e. we don't know how much better one category is than another, nor can we say that the difference between ok and unhappy is the same as the difference between very happy and happy.
Data obtained on an ordinal scale are called ordinal or ranked data. Summary statistics such as the median, percentiles and Spearman's rank correlation coefficient are computed from ordinal data. Ordinal data should not be represented by a pie chart; the best choice is a column or bar chart. Note: an ordinal scale implies a statement of "greater than" or "less than" without being able to state how much greater or less.
Interval Scale
The interval scale has ordered numeric values with fixed, equal intervals, and it can go below zero (i.e. it can have negative values) because "0" is not a true zero. On an interval scale the distance between two adjacent values is the same everywhere, i.e. the distance between 5 and 6 degrees is the same as that between 7 and 8 degrees. It can also go below zero, for example the temperature of ice at -5 degrees. Unlike the ordinal scale, the interval scale tells us not only which values are smaller or bigger but also how much bigger or smaller they are. For example, if it is 45° on Sunday and 55° on Monday, we know not only that it was hotter on Monday but also that it was 10° hotter. Zero is a meaningful value on this scale and does not mean the absence of the quantity, e.g. a temperature of zero degrees is hotter than -1 degree. Statistics such as the mean, median and mode can easily be calculated from interval data.
Independent Variable
An independent variable is presumed to influence other variables. Independent variables are sometimes called manipulated variables or experimental variables. The independent variable is the presumed cause, whereas the dependent variable is the presumed effect.
Dependent Variables
A dependent variable is presumed to be affected by one or more independent variables. The dependent variable is often called an outcome variable.
For example, if we are interested in how stress affects heart rate in humans, stress is the independent variable and heart rate is the dependent variable.
Intervening/Mediating Variable
An intervening (mediating) variable is one whose existence is inferred but which cannot be measured. For example, in determining the effect of video clips on the learning ability of B.S. students, the association between video clips and learning ability needs to be explained.
Discrete Data
Discrete data represent items that can be counted; they take on possible values that can be listed out. The list of possible values may be fixed (finite), or it may go from 0, 1, 2 on to infinity (making it countably infinite). For example, the number of heads in 100 coin flips takes on values from 0 to 100 (finite case), but the number of flips needed to get 100 heads takes on values from 100 up to infinity (countably infinite case).
Continuous Data
Continuous data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line. For example, the exact amount of gas purchased at a pump for cars with 20-gallon tanks would be continuous data from zero gallons to 20 gallons, represented by the interval [0, 20], inclusive. You might pump 8.40 gallons, or 8.41, or 8.414 gallons, or any possible number from 0 to 20. In this way continuous data can be thought of as being uncountably infinite.
Qualitative/Categorical Data
Qualitative data are categorical measurements expressed not in terms of numbers but in terms of kinds or names. In statistics, "qualitative data" is often used interchangeably with "categorical data". Categorical data represent characteristics such as a person's gender, marital status, or the types of movies they like. Categorical data can take on numerical values, such as "1" indicating male and "2" indicating female, but those numbers do not have mathematical meaning. A classic exercise in identifying categorical data is given below.
Amount of money earned last week, birth date, exercise, Favorite sports, horse steps per night,
Language mostly spoken at home, foot length, opinion on environment conservation etc.
Answer: Marital status is a qualitative/categorical variable. It can take on values such as "married", "widowed", and "divorced".
Answer: Song length is a quantitative variable. It can take on values such as "180 seconds", "189.2 seconds", and "210.0039 seconds". It is a continuous quantitative variable because it can take on an infinite number of values.
Censored Data
Censoring occurs when we have some information about an individual's survival time but we don't know the survival time exactly.
For example
Leukemia Patients
As a simple example of censoring, consider leukemia patients followed until they go out of remission (the event, shown as "X"). If, for a given patient, the study ends while the patient is still in remission (i.e. the patient does not get the event), then that patient's survival time is considered censored. We know that for this person the survival time is at least as long as the period over which the person has been followed, but if the person goes out of remission after the study ends, we don't know the complete survival time.
Causes of Censoring
1) A person doesn't experience the event before the study ends.
2) A person is lost to follow-up during the study period.
3) A person withdraws from the study because of death (if death is not the event of
interest) or some other reason (e.g. an adverse drug reaction).
Person A is followed from the start of the study until getting the event at week 5. Therefore person A's survival time is 5 weeks and is not censored.
Person B is also observed from the start of the study but is followed to the end of the 12-week study period without getting the event. The survival time here is censored because we can say only that it is at least 12 weeks.
Person C enters the study between the 2nd and 3rd week and is followed until he or she withdraws from the study at week 6. This person's survival time is censored after 3.5 weeks.
In short, of the six persons observed, two got the event (persons A and F) and four were censored (B, C, D, and E).
A table of the survival time data for the six persons is presented as:
Types of Censoring
Right Censoring
When a person's survival time becomes incomplete at the right side of the follow-up period, which occurs when the study ends or when the person is lost to follow-up or withdraws, this is called right censoring.
Left Censoring
When a person's survival time becomes incomplete at the left side of the follow-up period for that person, this is called left censoring. For example, if we are following persons with HIV infection, we may start follow-up when a subject first tests positive for the HIV virus, but we may not know exactly the time of first exposure to the virus. Thus the survival time is censored on the left side.
Sampled Population
Target Population
Suppose we want to know the opinion of GPGC Nowshera students about the examination system. Then the sampled population may consist of the students of the statistics department, the political science department, etc., and the target population will consist of all students in GPGC Nowshera.
Odds
The odds in favor of an event are the ratio of the probability that the event will happen to the probability that it will not happen.
For example: the odds that a randomly chosen day of the week is a Sunday are one to six, which is sometimes written 1/6 or 1:6.
Example: There are 5 pink marbles, 2 blue marbles and 8 purple marbles. What are the odds in favor of picking 1 blue marble?
Solution: The odds of picking one blue marble are odds = p/q, where p is the probability of picking one blue marble and q = 1 - p is the probability of not picking a blue marble.
\[ p = \frac{\binom{2}{1}}{\binom{15}{1}} = \frac{2}{15} \]
\[ 1 - p = \frac{\binom{13}{1}}{\binom{15}{1}} = \frac{13}{15} \quad \text{or} \quad 1 - p = 1 - \frac{2}{15} = \frac{13}{15} \]
\[ \text{Odds} = \frac{p}{q} = \frac{2/15}{13/15} = \frac{2}{13} \]
Odds: the probability that an event will occur divided by the probability that the event will not occur is known as the odds, i.e. odds = occurrence / non-occurrence. The range of odds is 0 to infinity.
It means that the chance of picking a blue marble is less than half (2/13, about 0.15) of the chance of picking a marble other than blue.
Example: The probability of diabetes in patient is 5%. Find the odds of diabetes.
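The conversion between probability and odds can be sketched in a few lines of Python. This is only an illustration: the function names `odds_in_favor` and `probability_from_odds` are assumptions made here, and exact fractions are used so the results can be compared with hand calculation.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Return the odds p / (1 - p) for a probability with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability_from_odds(odds):
    """Invert the relationship: p = odds / (1 + odds)."""
    return odds / (1 + odds)


# Diabetes example: p = 5% gives odds of 1 to 19.
diabetes_odds = odds_in_favor(Fraction(5, 100))
# Marble example: p = 2/15 gives odds of 2 to 13.
marble_odds = odds_in_favor(Fraction(2, 15))

print(diabetes_odds)  # 1/19
print(marble_odds)    # 2/13
```

So the odds of diabetes are 1/19, about 0.053, which is close to the probability itself because the event is rare.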
Odds Ratio
It is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group, i.e. the odds ratio compares the relative odds in each group.
          X-      X+      Total
Y-        a       b       a+b
Y+        c       d       c+d
Total     a+c     b+d     n
Since the odds ratio is the ratio of two odds,
\[ \text{O.R} = \frac{a/b}{c/d} = \frac{ad}{bc} \]
Odds can be computed from probability, and probability can be computed from odds.
\[ \text{Odds in favor of } A = \frac{p(A)}{1 - p(A)} \]
An odds ratio = 1 indicates that the condition or event under study is equally likely to occur in both groups.
If O.R > 1
An odds ratio > 1 indicates that the condition or event under study is more likely to occur in the first group.
If O.R < 1
An odds ratio < 1 indicates that the condition or event under study is less likely to occur in the first group.
The odds ratio must be non-negative, i.e. O.R >= 0. If the odds of the first group approach zero, then the odds ratio approaches zero. But when the odds of the second group approach zero, the odds ratio approaches ∞.
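Example: Consider the following data on the survival of passengers on the Titanic. There were 851 male passengers, of whom 142 survived and 709 died. Compute the odds ratio and interpret your result.
Solution:
\[ \text{Odds of death among males} = \frac{a}{b} = \frac{709}{142} \approx 4.99 \]
\[ \text{Odds of death among females} = \frac{c}{d} = \frac{154}{308} = 0.5 \]
\[ \text{O.R} = \frac{a/b}{c/d} = \frac{4.99}{0.5} = 9.98 \]
The odds of dying on the Titanic were about 10 times higher for males than for females.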
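As a rough check of the Titanic calculation, the cross-product form O.R = ad/bc can be coded directly. The function name `odds_ratio` is an illustrative assumption; the counts are the ones used in this example (709 male deaths, 142 male survivors, 154 female deaths, 308 female survivors).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d), computed as ad / bc for a 2x2 table."""
    if b * c == 0:
        raise ZeroDivisionError("the cross-product b*c must be non-zero")
    return (a * d) / (b * c)


# a, b: deaths and survivors among males; c, d: deaths and survivors among females
titanic_or = odds_ratio(709, 142, 154, 308)
print(round(titanic_or, 2))  # close to 10
```

Without intermediate rounding the ratio is about 9.99; the value 9.98 in the notes comes from rounding the male odds to 4.99 first.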
Example: Suppose that in a sample of 100 men, 60 have drunk wine in the previous week, while in a sample of 100 women, only 20 have drunk wine in the same period. Calculate the odds ratio and comment on your results.
\[ \text{Odds of men who drink wine} = \frac{a}{b} = \frac{60}{40} = 1.5 \]
\[ \text{Odds of women who drink wine} = \frac{c}{d} = \frac{20}{80} = 0.25 \]
\[ \text{O.R} = \frac{1.5}{0.25} = 6 \]
Interpretation:
The odds of having drunk wine in the previous week are 6 times higher for men than for women.
Question: The prevalence of smoking among lung cancer patients is 95 per 100, and the prevalence of smoking among people without lung cancer is 25 per 100. Calculate the odds ratio and comment on your results.
Solution:
\[ \text{Odds of smoking among lung cancer patients} = \frac{a}{b} = \frac{95}{5} = 19 \]
\[ \text{Odds of smoking among people without lung cancer} = \frac{c}{d} = \frac{25}{75} = 0.333 \]
\[ \text{O.R} = \frac{19}{0.333} = 57 \]
Interpretation:
The odds of smoking are 57 times higher among patients with lung cancer than among people without lung cancer.
Important Questions
Answer: An odds ratio of 0.5 means that the odds of the exposure being found in the case group are 50% lower than the odds of finding the exposure in the control group.
Answer: An odds ratio of 0.75 means that in the first group the odds of the outcome are 25% lower, i.e. an odds ratio less than 1 means that the first group is less likely to experience the event. Inverting the comparison gives an odds ratio of 1/0.75 = 1.33, which means that in the second group the odds of the outcome are 33% higher than in the first group.
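\[ \text{S.E}(\ln \hat{\theta}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]
Knowing this S.E, one can test the hypothesis \(H_0: \ln(\theta) = 0\), where \(\theta\) is the odds ratio, and construct the confidence interval.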
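Example:
Calculate (1) the odds ratio, (2) test the hypothesis ln(θ) = 0, (3) the C.I for ln(θ).
Solution:
\[ \text{Odds of men who drink wine} = \frac{a}{b} = \frac{60}{40} = 1.5 \]
\[ \text{Odds of women who drink wine} = \frac{c}{d} = \frac{30}{70} = 0.4286 \]
\[ \text{O.R} = \frac{1.5}{0.4286} = 3.5 \]
Interpretation:
The odds of drinking wine are 3.5 times higher for men than for women.
Solution:
2) Level of significance
We set α = 0.05
4) Computation
\[ \hat{\theta} = 3.5, \qquad \ln(\hat{\theta}) = 1.2527 \]
\[ Z = \frac{\ln(\hat{\theta})}{\text{S.E}(\ln \hat{\theta})} = \frac{1.2527}{\sqrt{\frac{1}{60} + \frac{1}{40} + \frac{1}{30} + \frac{1}{70}}} = 4.192 \]
5) Critical Region
\[ |Z| \geq Z_{0.025} = 1.96 \]
Since Z = 4.19 falls in the critical region, we reject H0 and conclude that the association between sex and wine drinking is significant at the α = 0.05 level.
The 95% C.I for ln(θ) is obtained as
\[ \ln(\hat{\theta}) \pm 1.96 \times \text{S.E}(\ln \hat{\theta}) \]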
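The three parts of this example can be worked through numerically with the sketch below. The cell counts a = 60, b = 40, c = 30, d = 70 are read off the denominators of the Z computation; the function name `log_odds_ratio_test` and the dictionary keys are illustrative assumptions, not part of the notes.

```python
import math


def log_odds_ratio_test(a, b, c, d, z_crit=1.96):
    """Odds ratio, Wald Z statistic for H0: ln(theta) = 0, and confidence limits."""
    theta_hat = (a * d) / (b * c)
    log_theta = math.log(theta_hat)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_theta / se
    ci_log = (log_theta - z_crit * se, log_theta + z_crit * se)
    # Exponentiating the limits gives an interval for the odds ratio itself.
    ci_or = (math.exp(ci_log[0]), math.exp(ci_log[1]))
    return {"odds_ratio": theta_hat, "se": se, "z": z,
            "ci_log": ci_log, "ci_or": ci_or}


result = log_odds_ratio_test(60, 40, 30, 70)
print(round(result["odds_ratio"], 2), round(result["z"], 2))
print([round(x, 2) for x in result["ci_log"]])
print([round(x, 2) for x in result["ci_or"]])
```

Because the interval for ln(θ) excludes 0 (equivalently, the interval for the odds ratio excludes 1), the conclusion agrees with the Z test at the 5% level.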
Question:
\[ \text{Incidence} = \frac{\text{New cases}}{\text{Total population}} \]
OR
\[ \text{Incidence rate} = \frac{\text{No. of new cases}}{\text{No. of people at risk in the given time frame}} \]
Example: Over the course of 1 year, 5 women are diagnosed with breast cancer out of a total female study population of 200.
Solution: Five women are diagnosed with breast cancer out of a total female study population of 200 (who do not have breast cancer at the beginning of the study period). Then we would say that the incidence of breast cancer in this population is:
\[ \text{Incidence} = \frac{5}{200} = 0.025 \]
\[ \text{Incidence} = 0.025 \times 1000 = 25 \text{ cases per 1000 women per year} \]
Question: In a population of 1000 non-diseased persons, 28 were infected with HIV over two years of observation.
Solution:
\[ \text{Incidence} = \frac{\text{New cases}}{\text{Total population}} \times K \]
\[ \text{Incidence} = \frac{28}{1000} \times 1000 \]
\[ \text{Incidence} = 28 \text{ cases per } 1000 \]
Solution:
\[ \text{Incidence proportion} = \frac{100}{50000} \times K, \qquad K = 1000 \]
\[ \text{Incidence proportion} = 2 \text{ per 1000 persons per year} \]
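The incidence calculations above all follow the same pattern, new cases divided by the population at risk and scaled by K. A minimal sketch, assuming a hypothetical helper named `incidence_per_k`:

```python
def incidence_per_k(new_cases, population_at_risk, k=1000):
    """New cases per k persons at risk over the observation period."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases * k / population_at_risk


breast_cancer = incidence_per_k(5, 200)     # 5 new cases among 200 women in 1 year
hiv = incidence_per_k(28, 1000)             # 28 new infections among 1000 over 2 years
third_example = incidence_per_k(100, 50000) # 100 new cases among 50000 persons

print(breast_cancer, hiv, third_example)    # 25.0 28.0 2.0
```

The time frame is not an argument of the function, so it has to be reported alongside the number (per year, over two years, and so on).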
Prevalence
It refers to all cases, old and new, existing at a given point or period of time in a given population: the total number of individuals who have an attribute or disease at a particular time (or during a particular period) divided by the population at risk at that time (or the mid-year population). A prevalence rate is the total number of cases of a disease existing in a population divided by the total population.
Question: In a population of 40000 people, 1200 were recently diagnosed with cancer and 3500 are living with cancer. Find the prevalence rate.
Solution:
\[ \text{Prevalence rate} = \frac{1200 + 3500}{40000} \times 1000 \]
\[ \text{Prevalence rate} = 117.5 \approx 118 \text{ per } 1000 \]
Types of Prevalence
Point Prevalence
The number of all current cases (old + new) of a disease at one point of time in relation to a defined population at that point of time. The point of time may be a day, several days, weeks, etc., depending upon the time required to examine the entire population.
Question: A health extension worker conducted a survey in a nearby elementary school on 10th March 1997 to determine the prevalence of trachoma in that school. The total number of students in the school was 200. The health extension worker examined all 200 students for trachoma, and 100 students were found to have trachoma. Calculate the point prevalence rate of trachoma in that school.
Solution:
\[ \text{Point prevalence} = \frac{\text{All students with trachoma on 10 March 1997}}{\text{Total population in that school}} \times 100 \]
\[ \text{Point prevalence} = \frac{100}{200} \times 100 = 50 \]
So there were 50 trachoma patients per 100 students on 10th March 1997, which means that 50% of the students in that school were affected by trachoma.
Period Prevalence
The proportion of individuals in a specified population at risk who have the disease of interest over a specified period of time, e.g. annual prevalence or lifetime prevalence. (When the type of prevalence rate is not specified, it is usually point prevalence.)
Question: Between June 30 and August 30, 1999, a population averaging 1600 had 29 existing cases of hepatitis B on June 30 and 6 incident (new) cases of hepatitis B between July 1 and August 30. Find the period prevalence.
Solution:
\[ \text{Period prevalence} = \frac{\text{No. of existing cases and new cases}}{\text{Total population}} \]
\[ \text{Period prevalence} = \frac{29 + 6}{1600} = \frac{35}{1600} \approx 0.022 \]
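Point and period prevalence differ only in which cases are counted in the numerator, so both can be sketched with one helper. The name `prevalence` and its parameter names are assumptions for illustration; `k` is the multiplier (1, 100 or 1000) used in the worked examples.

```python
def prevalence(existing_cases, new_cases, population, k=1):
    """(Existing + new cases) per k persons in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return (existing_cases + new_cases) * k / population


# Period prevalence of hepatitis B: 29 existing + 6 new cases among 1600 persons.
hepatitis_b = prevalence(29, 6, 1600)
# Point prevalence of trachoma: 100 current cases among 200 students, per 100.
trachoma = prevalence(100, 0, 200, k=100)
# Cancer prevalence rate: 3500 existing + 1200 recent cases among 40000, per 1000.
cancer = prevalence(3500, 1200, 40000, k=1000)

print(round(hepatitis_b, 4), trachoma, cancer)  # 0.0219 50.0 117.5
```

The hepatitis B result, about 0.022, corresponds to roughly 22 cases per 1000 persons over the two-month period.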
Solution:
\[ \text{Point prevalence} = \frac{\text{No. of all cases (old + new) of a specified disease existing at a given point of time}}{\text{Estimated population at the same point of time}} \]
\[ \text{Point prevalence} = \frac{125}{1000} = 0.125 \]
\[ \text{Incidence proportion} = \frac{25}{900} \approx 0.028 \]
Relative Risk
A relative risk can only be calculated from prospective studies (cohort studies). It is defined as the ratio of the incidence rate among the exposed to the incidence rate among the non-exposed.
Mathematically
Consider the following 2×2 contingency table for the calculation of the measure of association.
                 Outcome
Exposure     Present   Absent   Total
Present      a         b        a+b
Absent       c         d        c+d
Total        a+c       b+d      N
\[ \text{R.R} = \frac{a/(a+b)}{c/(c+d)} \]
Interpretation
If R.R=1
If R.R = 1, then the incidence in the exposed equals the incidence in the non-exposed, i.e. there is no association between exposure and outcome.
If R.R>1
If R.R > 1, then the incidence in the exposed is greater than the incidence in the non-exposed, i.e. there is an increased risk of the outcome among the exposed. It is a positive association: the exposure is harmful, so those who are exposed are at higher risk of suffering from the disease than those who are not exposed.
If R.R<1
If R.R < 1, then the incidence in the exposed is lower than the incidence in the non-exposed, i.e. there is a decreased risk. It is a negative association: the exposure is protective.
For example: providing a vaccine to a group will be our exposure, and not providing the vaccine will be non-exposure. If R.R < 1, providing the vaccine is protective.
Note: The further the R.R is from 1, the stronger the association.
Example: Suppose we are researching the effect of benzene exposure on cancer. We go to a work center where there is known potential for exposure to benzene. There are 483 people in the work center; however, only 212 are exposed to benzene in their work duties. Cancer is found in 12% of the work center employees (58 people), and our investigation finds that 40 of the people with cancer were in the exposed group. Calculate the relative risk.
                       Cancer   No cancer   Total
Benzene exposure       40       172         212
No benzene exposure    18       253         271
Total                  58       425         483
Solution:
\[ \text{Disease risk among exposed} = \frac{a}{a+b} = \frac{40}{212} = 0.1887 \]
\[ \text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{18}{271} = 0.0664 \]
\[ \text{R.R} = \frac{0.1887}{0.0664} = 2.84 \]
We can say that those who are exposed to benzene are 2.84 times more likely to get cancer than those who are not exposed to benzene.
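The benzene calculation can be reproduced with a short sketch. The function name `relative_risk` is an illustrative assumption; the arguments follow the a, b, c, d layout of the 2×2 table (exposed with and without the outcome, then non-exposed with and without the outcome).

```python
def relative_risk(a, b, c, d):
    """Risk among exposed, a/(a+b), divided by risk among non-exposed, c/(c+d)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Benzene example: 40 of 212 exposed and 18 of 271 non-exposed workers have cancer.
benzene_rr = relative_risk(40, 172, 18, 253)
print(round(benzene_rr, 2))  # 2.84
```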
Question:
                 Outcome
                 Present   Absent   Total
Exposed          366       32       398
Not exposed      64        319      383
Total            430       351      781
Solution:
\[ \text{Disease risk among exposed} = \frac{a}{a+b} = \frac{366}{398} = 0.9196 \]
\[ \text{Disease risk among not exposed} = \frac{c}{c+d} = \frac{64}{383} = 0.1671 \]
\[ \text{R.R} = \frac{0.9196}{0.1671} = 5.50 \]
Interpretation:
We can say that the exposed group is 5.50 times more likely to have the outcome than the non-exposed group.
Question:
O.R= 3.6
Interpretation:
Based on the study, smokers are 3.6 times more likely to suffer LBW than non-smokers.
Question:
In a prospective study of pregnant women, information was collected on the exercise level of low-risk pregnant women. A group of 217 women did no voluntary exercise during pregnancy, while a group of 238 women exercised extensively. The outcome variable of interest is experiencing preterm labor. The results are summarized as:
Solution:
\[ \text{Incidence of preterm labor with extreme exercise} = \frac{a}{a+b} = 0.092 \]
\[ \text{Incidence of preterm labor without extreme exercise} = \frac{c}{c+d} = 0.082 \]
\[ \text{R.R} = \frac{0.092}{0.082} = 1.12 \]
Interpretation:
The result indicates that the risk of experiencing preterm labor when a woman exercises heavily is 1.12 times the risk for a woman who does not exercise at all.
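To find the confidence interval, use the value 1.96 from the standard normal distribution for a 95% C.I.
\[ \text{S.E}\ln(\text{R.R}) = \sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}} \]
Calculate the lower and upper limits on the ln scale, \(\ln(\text{R.R}) \pm 1.96 \times \text{S.E}\ln(\text{R.R})\), and exponentiate them to obtain the limits for R.R.
If the 95% C.I for R.R doesn't contain the value 1, the association is said to be statistically significant at the α = 0.05 level.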
Question:
Physicians enrolled in the Physicians' Health Study were randomly assigned to take daily aspirin or placebo. The table provides the number with M.I in each group.
Calculate (1) the R.R and (2) the 95% C.I for the R.R.
Solution:
\[ \text{Incidence of M.I among aspirin} = \frac{a}{a+b} = \frac{139}{11037} = 0.0126 \]
\[ \text{Incidence of M.I among placebo} = \frac{c}{c+d} = \frac{239}{11034} = 0.0217 \]
\[ \text{R.R} = \frac{0.0126}{0.0217} = 0.58 \]
Interpretation:
The relative risk estimate is 0.58, which indicates that physicians in the aspirin group had a lower risk of M.I than physicians in the placebo group.
\[ \text{S.E}\ln(\text{R.R}) = \sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}} \]
\[ \text{S.E}\ln(\text{R.R}) = \sqrt{\frac{10898}{139(11037)} + \frac{10795}{239(11034)}} \]
\[ \text{S.E}\ln(\text{R.R}) = 0.1058 \]
Now the 100(1 - α)% C.I for ln(R.R) is
\[ \ln(\text{R.R}) \pm z_{\alpha/2} \times \text{S.E}\ln(\text{R.R}) \]
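The remaining arithmetic for the confidence interval can be sketched as follows, using the aspirin counts from the S.E calculation (a = 139, b = 10898, c = 239, d = 10795). The function name `relative_risk_ci` is an assumption made for this illustration, and 1.96 is the usual normal critical value for a 95% interval.

```python
import math


def relative_risk_ci(a, b, c, d, z_crit=1.96):
    """Relative risk, S.E of ln(R.R), and confidence limits on the R.R scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    lower = math.exp(math.log(rr) - z_crit * se)
    upper = math.exp(math.log(rr) + z_crit * se)
    return rr, se, (lower, upper)


rr, se, (lower, upper) = relative_risk_ci(139, 10898, 239, 10795)
print(round(rr, 2), round(se, 4))
print(round(lower, 2), round(upper, 2))
```

The limits are computed on the ln scale and then exponentiated. Under these counts the whole interval lies below 1, so the protective association of aspirin is statistically significant at the 5% level.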
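Sensitivity p(T+/D+)
It is the probability of a positive test result given that the individual has the disease, i.e. the likelihood of a diseased individual getting a positive test result. It is also called the true positive rate. Its complement is the false negative rate.
In the formulas below a = TP, b = FP, c = FN and d = TN.
\[ \text{Sensitivity: } p(T^{+}/D^{+}) = \frac{a}{a+c} = \frac{TP}{TP+FN} \]
\[ \text{False negative: } p(T^{-}/D^{+}) = \frac{c}{a+c} = \frac{FN}{TP+FN} \]
\[ \text{Specificity: } p(T^{-}/D^{-}) = \frac{d}{b+d} = \frac{TN}{TN+FP} \]
\[ \text{False positive: } p(T^{+}/D^{-}) = \frac{b}{b+d} = \frac{FP}{FP+TN} \]
\[ \text{Positive predictive value: } p(D^{+}/T^{+}) = \frac{a}{a+b} = \frac{TP}{TP+FP} \]
\[ \text{Negative predictive value: } p(D^{-}/T^{-}) = \frac{d}{c+d} = \frac{TN}{TN+FN} \]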
If the total number of positive tests is 350, out of which 200 actually have the disease, and the lab has tested 1000 cases in total till 2018, of which 400 are patients, construct the table and calculate the sensitivity, specificity, false negative, false positive, P.P.V and N.P.V.
          D+      D-      Total
T+        200     150     350
T-        200     450     650
Total     400     600     1000
\[ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{200}{400} = 0.5 \]
This means that there is a 50% chance that a person gets a positive test result when he actually has the disease.
\[ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{450}{600} = 0.75 \]
This means that there is a 75% chance that a person gets a negative test result when he actually has no disease.
\[ \text{False negative } p(T^{-}/D^{+}) = \frac{FN}{TP+FN} = \frac{c}{a+c} = \frac{200}{400} = 0.5 \]
This means that there is a 50% chance that a person gets a negative test result when he actually has the disease.
\[ \text{False positive } p(T^{+}/D^{-}) = \frac{FP}{FP+TN} = \frac{b}{b+d} = \frac{150}{600} = 0.25 \]
This means that there is a 25% chance that a person gets a positive test result when he actually has no disease.
\[ \text{Positive predictive value (P.P.V) } p(D^{+}/T^{+}) = \frac{TP}{TP+FP} = \frac{a}{a+b} = \frac{200}{350} = 0.57 \]
This means that there is a 57% chance that a person actually has the disease when the test result is positive.
\[ \text{Negative predictive value (N.P.V) } p(D^{-}/T^{-}) = \frac{TN}{TN+FN} = \frac{d}{c+d} = \frac{450}{650} = 0.69 \]
This means that there is a 69% chance that a person actually has no disease when the test result is negative.
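All six measures come from the same four cell counts, so they can be computed together. The sketch below assumes the counts implied by this question (TP = 200, FP = 150, FN = 200, TN = 450); the function name `diagnostic_metrics` and the dictionary keys are illustrative assumptions.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic test measures from the four cell counts."""
    return {
        "sensitivity": tp / (tp + fn),          # p(T+/D+)
        "specificity": tn / (tn + fp),          # p(T-/D-)
        "false_negative_rate": fn / (tp + fn),  # p(T-/D+)
        "false_positive_rate": fp / (fp + tn),  # p(T+/D-)
        "ppv": tp / (tp + fp),                  # p(D+/T+)
        "npv": tn / (tn + fn),                  # p(D-/T-)
    }


metrics = diagnostic_metrics(tp=200, fp=150, fn=200, tn=450)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity condition on the true disease status, whereas P.P.V and N.P.V condition on the test result.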
Question:
\[ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{36}{45} = 0.8 \]
This means that there is an 80% chance that a person gets a positive test result when he actually has the disease.
\[ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{230}{255} = 0.90 \]
This means that there is a 90% chance that a person gets a negative test result when he actually has no disease.
\[ \text{False negative } p(T^{-}/D^{+}) = \frac{FN}{TP+FN} = \frac{c}{a+c} = \frac{9}{45} = 0.2 \]
This means that there is a 20% chance that a person gets a negative test result when he actually has the disease.
\[ \text{False positive } p(T^{+}/D^{-}) = \frac{FP}{FP+TN} = \frac{b}{b+d} = \frac{25}{255} = 0.098 \]
This means that there is a 9.8% (about 10%) chance that a person gets a positive test result when he actually has no disease.
\[ \text{Positive predictive value (P.P.V) } p(D^{+}/T^{+}) = \frac{TP}{TP+FP} = \frac{a}{a+b} = \frac{36}{61} = 0.59 \]
This means that there is a 59% chance that a person actually has the disease when the test result is positive.
\[ \text{Negative predictive value (N.P.V) } p(D^{-}/T^{-}) = \frac{TN}{TN+FN} = \frac{d}{c+d} = \frac{230}{239} = 0.96 \]
This means that there is a 96% chance that a person actually has no disease when the test result is negative.
Note: P.P.V and N.P.V are affected by prevalence; when prevalence increases, P.P.V increases and N.P.V decreases.
\[ \text{P.P.V} = \frac{TP}{\text{All test positives}} = \frac{TP}{TP+FP} \qquad \text{N.P.V} = \frac{TN}{\text{All test negatives}} = \frac{TN}{FN+TN} \]
P.P.V increases with increased specificity, so the higher the specificity, the higher the P.P.V. P.P.V also increases with prevalence. N.P.V increases with increased sensitivity and decreases with increased prevalence, so the higher the prevalence, the lower the N.P.V.
Observational Studies
There are two basic types of observational studies.
Prospective study
A prospective study is an observational study in which two random samples of subjects are selected. One sample consists of subjects who possess the risk factor, and the other sample consists of subjects who do not possess the risk factor. The subjects are followed into the future, i.e. they are followed prospectively, and a record is kept of the number of subjects in each sample who at some point in time are classifiable into each of the categories of the outcome variable. The data resulting from a prospective study involving two dichotomous variables can be displayed in a 2×2 contingency table that usually provides information regarding the number of subjects with and without the risk factor and the number who did and did not succumb to the disease of interest, as well as the frequency for each combination of categories of the two variables.
                 Disease Status
Risk Factor   Present   Absent   Total
Present       a         b        a+b
Absent        c         d        c+d
Total         a+c       b+d      n
Retrospective study
A retrospective study is the reverse of a prospective study. The samples are selected from those falling into the categories of the outcome variable. Then the investigator looks back (takes a retrospective look) at the subjects and determines which ones have (or had) and which ones do not have (or did not have) the risk factor. In general, a prospective study is more expensive than a retrospective study.
Case-control study
It is a type of retrospective study in which two groups with different known outcomes are compared, that is, one group has the disease and the other doesn't have the disease. We compare the subjects who have a disease (the cases) with subjects who do not have that disease (the controls). We calculate the odds ratio (O.R) from a case-control study.
Risk Factor
A risk factor is something that increases your chance of getting a disease; this risk may come from something you do. For example, smoking increases your chance of developing colon cancer; therefore smoking is a risk factor for colon cancer.
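\[ \text{N.P.V} = \frac{TN}{TN+FN} \]
\[ T = (T \cap D) \cup (T \cap \bar{D}) \]
\[ P(T) = P(T \cap D) + P(T \cap \bar{D}) \qquad \text{(Equation A)} \]
\[ P(T \cap \bar{D}) = P(\bar{D}) \cdot P(T/\bar{D}) \]
Put in Equation A
\[ P(T) = P(D) \cdot P(T/D) + P(\bar{D}) \cdot P(T/\bar{D}) \]
\[ P(\bar{D}) = 1 - P(D) \]
\[ p(\bar{D}/\bar{T}) = \frac{P(\bar{D}) \cdot P(\bar{T}/\bar{D})}{P(\bar{T})} \]
\[ P(\bar{T}) = P(\bar{T} \cap \bar{D}) + P(\bar{T} \cap D) \qquad \text{(Equation A)} \]
As \(\bar{T} \cap \bar{D}\) and \(\bar{T} \cap D\) are mutually exclusive,
\[ P(\bar{T} \cap \bar{D}) = P(\bar{D}) \cdot P(\bar{T}/\bar{D}) \]
\[ P(\bar{T} \cap D) = P(D) \cdot P(\bar{T}/D) \]
Put in Equation A
\[ P(\bar{T}) = P(\bar{D}) \cdot P(\bar{T}/\bar{D}) + P(D) \cdot P(\bar{T}/D) \]
\[ P(\bar{T}/D) = 1 - \text{Sensitivity} \]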
Question:
A medical research team wishes to evaluate a proposed screening test for Alzheimer's disease. The test was given to a random sample of 450 patients with Alzheimer's disease and an independent random sample of 500 patients without symptoms of the disease. The two samples were drawn from populations of subjects who were 65 years of age or older. The results are as follows.
                 Disease Status
          Alzheimer's present   Alzheimer's absent   Total
T+        436                   5                    441
T-        14                    495                  509
Total     450                   500                  950
Based on another independent study, it is known that the percentage of patients with Alzheimer's disease is 11.3% out of all subjects who are 65 years of age or older. First we calculate the sensitivity and specificity as follows.
Solution:
\[ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{436}{450} = 0.96 \]
(The exact value is 0.9689; it is truncated to two decimals here.) This means that there is a 96% chance that a person gets a positive test result when he actually has the disease.
\[ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{495}{500} = 0.99 \]
This means that there is a 99% chance that a person gets a negative test result when he actually has no disease.
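For the positive predictive value of the test, we wish to estimate the probability that a subject who is positive on the test has Alzheimer's disease.
\[ p(D/T) = \frac{P(D) \cdot P(T/D)}{P(D) \cdot P(T/D) + P(\bar{D}) \cdot P(T/\bar{D})} \]
\[ p(D/T) = \frac{(0.113)(0.96)}{(0.113)(0.96) + (0.887)(0.01)} = \frac{0.10848}{0.10848 + 0.00887} \]
\[ p(D/T) = \frac{0.10848}{0.11735} \]
\[ p(D/T) = 0.9244 \]
This means that about 92% of the subjects have the disease given that the test is positive.
For the negative predictive value of the test:
\[ p(\bar{D}/\bar{T}) = \frac{P(\bar{D}) \cdot P(\bar{T}/\bar{D})}{P(\bar{D}) \cdot P(\bar{T}/\bar{D}) + P(D) \cdot P(\bar{T}/D)} \]
\[ p(\bar{D}/\bar{T}) = \frac{(0.887)(0.99)}{(0.887)(0.99) + (0.113)(0.04)} = \frac{0.8781}{0.8781 + 0.0045} \]
\[ p(\bar{D}/\bar{T}) = \frac{0.8781}{0.8826} \]
\[ p(\bar{D}/\bar{T}) = 0.99 \]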
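The Bayes' theorem calculation above can be sketched in code so that the effect of prevalence is easy to explore. The function name `predictive_values` is an illustrative assumption; the inputs are the sensitivity 0.96, specificity 0.99 and prevalence 0.113 used in the Alzheimer's example, and the second call uses a hypothetical prevalence of 1% purely for comparison.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """P.P.V and N.P.V from Bayes' theorem for a given disease prevalence."""
    p_d = prevalence
    p_not_d = 1 - prevalence
    # Total probability of a positive and of a negative test result.
    p_pos = p_d * sensitivity + p_not_d * (1 - specificity)
    p_neg = p_not_d * specificity + p_d * (1 - sensitivity)
    ppv = p_d * sensitivity / p_pos
    npv = p_not_d * specificity / p_neg
    return ppv, npv


ppv, npv = predictive_values(0.96, 0.99, 0.113)
print(round(ppv, 4), round(npv, 4))

# Same test, hypothetical prevalence of 1%: P.P.V falls, N.P.V rises.
ppv_rare, npv_rare = predictive_values(0.96, 0.99, 0.01)
print(round(ppv_rare, 4), round(npv_rare, 4))
```

The comparison illustrates the note that P.P.V increases and N.P.V decreases as prevalence increases.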
Likelihood Ratio
The likelihood ratio describes how many times more likely a person with the disease is to receive a particular test result than a person without the disease. In other words, the positive likelihood ratio tells how likely a positive result is in a patient with the disease compared with a patient without the disease, and the negative likelihood ratio tells how likely a negative result is in a patient with the disease compared with a patient without the disease.
The LR+ of a positive test result is usually a number greater than 1, and the LR- of a negative test result usually lies between 0 and 1. When LR = 1 the test is useless: the result has very little influence on whether a patient does or does not have the disease.
\[ LR^{+} = \frac{P(T/D)}{P(T/\bar{D})} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \]
\[ LR^{-} = \frac{P(\bar{T}/D)}{P(\bar{T}/\bar{D})} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} \]
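As an illustration, the two likelihood ratios can be computed for the Alzheimer's screening test, taking its sensitivity as 0.96 and specificity as 0.99. The function name `likelihood_ratios` is an assumption made for this sketch.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) = (sens / (1 - spec), (1 - sens) / spec)."""
    if specificity >= 1 or specificity <= 0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.96, 0.99)
print(round(lr_pos, 1), round(lr_neg, 3))
```

Under these values a positive result is about 96 times more likely in a patient with the disease than in one without it, while a negative result is only about 0.04 times as likely.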
Test Accuracy
The accuracy of a test is its ability to differentiate the patients and the healthy cases correctly. To estimate the accuracy of a test, we calculate the proportion of true positives and true negatives among all evaluated cases. Mathematically it can be stated as:
\[ \text{Test accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Test accuracy} = \frac{a + d}{a + b + c + d} \]
Scenario-I
Imagine we have a sample of 100 cases, 50 healthy and the others patients. If a test is positive for all patients and negative for all healthy ones, it is 100% accurate. As Figure I shows, the test has been able to differentiate the healthy cases and the patients exactly. In this example the accuracy, sensitivity and specificity of the test are:
\[ \text{Test accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a + d}{a + b + c + d} = \frac{50 + 50}{50 + 50 + 0 + 0} = 1 \text{ or } 100\% \]
\[ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{50}{50+0} = 1 \text{ or } 100\% \]
This means that there is a 100% chance that a person gets a positive test result when he actually has the disease.
\[ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{50}{50+0} = 1 \text{ or } 100\% \]
This means that there is a 100% chance that a person gets a negative test result when he actually has no disease.
Taking into account the statistical characteristics mentioned, this test is appropriate for both screening and final verification of a disease.
Scenario-II
Test with 75% accuracy, 50% sensitivity and 100% specificity. If the test can diagnose only 25 out of the 50 patients and reports the others as healthy (as we see from Figure II), then the accuracy, sensitivity and specificity are as given below. Of the 100 cases that have been tested, the test determined 25 patients and 50 healthy cases correctly; therefore the accuracy of the test is 75%.
\[ \text{Test accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{25 + 50}{25 + 50 + 0 + 25} = 0.75 \text{ or } 75\% \]
\[ \text{Sensitivity } p(T^{+}/D^{+}) = \frac{TP}{TP+FN} = \frac{a}{a+c} = \frac{25}{25+25} = 0.5 \text{ or } 50\% \]
This means that there is a 50% chance that a person gets a positive test result when he actually has the disease.
\[ \text{Specificity } p(T^{-}/D^{-}) = \frac{TN}{TN+FP} = \frac{d}{b+d} = \frac{50}{0+50} = 1 \text{ or } 100\% \]
This means that there is a 100% chance that a person gets a negative test result when he actually has no disease.
This test is not suitable for screening purposes but is suitable for final confirmation of a disease.
Scenario-III
A test with 75% accuracy, 100% sensitivity and 50% specificity. If we assume that the test has been able
to identify only 25 of the 50 healthy cases and has reported the others as patients (as we see from
Figure III), then the accuracy, sensitivity and specificity will be as follows.
$$\text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{a + d}{a + b + c + d} = \frac{50 + 25}{50 + 25 + 25 + 0} = 0.75 \text{ or } 75\%$$
$$\text{Sensitivity } P(T^{+} \mid D^{+}) = \frac{TP}{TP + FN} = \frac{a}{a + c} = \frac{50}{50 + 0} = 1 \text{ or } 100\%$$
This means that there is a 100% chance that a person gets a positive test result when he actually
has the disease.
$$\text{Specificity } P(T^{-} \mid D^{-}) = \frac{TN}{TN + FP} = \frac{d}{b + d} = \frac{25}{25 + 25} = 0.5 \text{ or } 50\%$$
This means that there is a 50% chance that a person gets a negative test result when he actually
has no disease.
This test is suitable for screening purposes but is not suitable for final confirmation of a disease.
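The three scenarios can be checked with a few lines of code. This is a minimal sketch; the function name `diagnostic_metrics` is an assumption made for illustration, and the counts are the ones implied by the scenarios (50 patients and 50 healthy cases each).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity from the four cells of a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # P(T+ | D+)
        "specificity": tn / (tn + fp),   # P(T- | D-)
    }


# Counts implied by the three scenarios above.
scenarios = {
    "Scenario I": dict(tp=50, fp=0, fn=0, tn=50),
    "Scenario II": dict(tp=25, fp=0, fn=25, tn=50),
    "Scenario III": dict(tp=50, fp=25, fn=0, tn=25),
}

for name, counts in scenarios.items():
    m = diagnostic_metrics(**counts)
    print(f"{name}: accuracy={m['accuracy']:.0%}, "
          f"sensitivity={m['sensitivity']:.0%}, specificity={m['specificity']:.0%}")
```

Scenarios II and III share the same accuracy (75%) but differ in sensitivity and specificity, which is why accuracy alone does not tell us whether a test suits screening or confirmation.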
Diagnostic Test
A diagnostic test is a procedure performed to confirm or determine the presence or absence of
disease in an individual suspected of having the disease, usually following the report of symptoms or
based on the results of other medical tests. This procedure gives a rapid indication of whether a
patient has a certain disease. A diagnostic test is any approach used to gather clinical information for
the purpose of making a clinical decision (i.e. a diagnosis). Some examples of diagnostic tests are
X-rays, biopsies, pregnancy tests, medical histories and results from physical examinations.
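The Diagnostic Odds Ratio (D.O.R) is the ratio of the positive likelihood ratio to the negative
likelihood ratio.
$$D.O.R = \frac{LR^{+}}{LR^{-}}$$
The Diagnostic Odds Ratio may be expressed in terms of the sensitivity and specificity of the test.
$$D.O.R = \frac{\text{Sensitivity}}{1 - \text{Sensitivity}} \times \frac{\text{Specificity}}{1 - \text{Specificity}}$$
$$D.O.R = \frac{\frac{a}{a+c}}{1 - \frac{a}{a+c}} \times \frac{\frac{d}{b+d}}{1 - \frac{d}{b+d}}$$
$$D.O.R = \frac{\frac{a}{a+c}}{\frac{a+c-a}{a+c}} \times \frac{\frac{d}{b+d}}{\frac{b+d-d}{b+d}}$$
$$D.O.R = \frac{a}{c} \times \frac{d}{b} = \frac{ad}{bc}$$
It may also be expressed in terms of the positive and negative predictive values (P.P.V and N.P.V).
$$D.O.R = \frac{P.P.V}{1 - P.P.V} \times \frac{N.P.V}{1 - N.P.V}$$
$$D.O.R = \frac{\frac{a}{a+b}}{1 - \frac{a}{a+b}} \times \frac{\frac{d}{c+d}}{1 - \frac{d}{c+d}}$$
$$D.O.R = \frac{\frac{a}{a+b}}{\frac{a+b-a}{a+b}} \times \frac{\frac{d}{c+d}}{\frac{c+d-d}{c+d}}$$
$$D.O.R = \frac{a}{b} \times \frac{d}{c}$$
$$D.O.R = \frac{ad}{bc} = O.R$$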
Question:
Consider a test with the following 2×2 contingency table. Calculate the Diagnostic O.R.
                Disease Status
           D+      D-      Total
T+         26      12      38
T-          3      48      51
Total      29      60      89
$$D.O.R = \frac{TP / FP}{FN / TN} = \frac{26 / 12}{3 / 48} = \frac{26 \times 48}{12 \times 3} \approx 34.67$$
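The calculation for this table can be sketched in code as follows. The function name `diagnostic_odds_ratio` is an assumption for illustration; the cross-check against LR+/LR- uses only the definitions given earlier in this section.

```python
from math import isclose


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """D.O.R = (TP/FP) / (FN/TN) = (TP * TN) / (FP * FN)."""
    if fp == 0 or fn == 0:
        raise ValueError("D.O.R is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Cells of the 2x2 table in the question above.
tp, fp, fn, tn = 26, 12, 3, 48
dor = diagnostic_odds_ratio(tp, fp, fn, tn)

# Cross-check: the same value should come out of LR+ / LR-.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity
assert isclose(dor, lr_positive / lr_negative)

print(f"D.O.R = {dor:.2f}")
```

Both routes give a D.O.R of about 34.67 for this table.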
Probability Distribution
The probability distribution of a random variable describes how the probabilities are distributed
over the values of the random variable. A probability distribution is a listing of all the outcomes of an
experiment and their associated probabilities. For a discrete random variable X, the probability
distribution is defined by the probability mass function f(x) = P[X = x], which gives the
probability for each value of the random variable. Consider the example of tossing three coins, in
which the variable of interest is the random variable X, the number of heads.
Where X = 0, 1, 2, 3
$$P(X = 0) = P(\text{no heads}) = P(TTT) = \frac{1}{8}$$
$$P(X = 1) = P(\text{one head}) = P(THT, HTT, TTH) = \frac{3}{8}$$
$$P(X = 2) = P(\text{two heads}) = P(HHT, THH, HTH) = \frac{3}{8}$$
$$P(X = 3) = P(\text{three heads}) = P(HHH) = \frac{1}{8}$$
X        P(X=x)
0        1/8
1        3/8
2        3/8
3        1/8
Total    1
A probability mass function satisfies the following properties:
i. $f(x) \geq 0 \;\; \forall \; x \in X$,
ii. $\sum f(x) = 1$
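The three-coin distribution can be reproduced by listing all eight equally likely outcomes and counting heads. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")

# The two properties of a probability mass function.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1
```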
Binomial Experiment
A binomial experiment is a statistical experiment that has the following properties.
Consider the statistical experiment in which we flip a coin two times and count the number of times
that a head occurs. This is a binomial experiment because:
I. The experiment consists of repeated trials; we flip the coin two times.
II. Each trial can result in just two possible outcomes, i.e. head or tail.
III. The probability of success is constant, i.e. 1/2.
IV. The trials are independent, i.e. getting a head on one trial doesn't affect whether we
get a head on the other trial.
If a new drug is introduced to cure a disease, it either cures the disease (it is a success) or it
does not cure the disease (it is a failure). If you purchase a lottery ticket, you are either going to win a
prize or not. Basically, anything you can think of that can only be a success or a failure can be
represented by a binomial distribution.
Notations:
The binomial probability is the probability that an n-trial binomial experiment results in exactly x
successes, when the probability of success on an individual trial is p.
Suppose a binomial experiment consists of n trials and results in x successes. If the probability of
success on an individual trial is p and q = 1 − p, then the probability mass function (P.M.F) of the
binomial distribution is
$$P[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
$$P[X = x] = f(x) = 0 \quad \text{otherwise}$$
Question:
Suppose a die is rolled 5 times. What is the probability of getting exactly 2 fours?
Solution:
This is a binomial experiment in which the number of successes is x = 2, the number of trials is n = 5
and the probability of success is p = 1/6 or 0.167; therefore the binomial probability is
$$P[X = 2] = \binom{5}{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3} \approx 0.1608$$
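The binomial P.M.F translates directly into code. This is a minimal sketch; `binomial_pmf` is an illustrative name, and `math.comb` (Python 3.8 or later) supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P[X = x] for a binomial experiment with n trials and success probability p."""
    if not 0 <= x <= n:
        return 0.0                      # the P.M.F is zero outside 0..n
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


# Die rolled 5 times, exactly 2 fours: n = 5, x = 2, p = 1/6.
p_two_fours = binomial_pmf(2, 5, 1 / 6)
print(f"P[X = 2] = {p_two_fours:.4f}")
```

Cumulative probabilities such as P[X ≤ 2] follow by summing the P.M.F, for example `sum(binomial_pmf(x, n, p) for x in range(3))`.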
Question:
The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the
same school apply, what is the probability that at most 2 are accepted?
Solution:
$$P[X \leq 2] = \sum_{x=0}^{2} \binom{5}{x} (0.3)^{x} (0.7)^{5-x}$$
$$P[X \leq 2] = \binom{5}{0} (0.3)^{0} (0.7)^{5-0} + \binom{5}{1} (0.3)^{1} (0.7)^{5-1} + \binom{5}{2} (0.3)^{2} (0.7)^{5-2}$$
$$P[X \leq 2] = 0.16807 + 0.36015 + 0.30870$$
$$P[X \leq 2] = 0.83692 \approx 0.8369$$
Question:
60% of the people who purchase sports cars are men. If 10 sports car owners are randomly selected,
find the probability that exactly 7 are men.
Solution:
$$P[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 7$$
$$P[X = 7] = \binom{10}{7} (0.60)^{7} (0.40)^{10-7}$$
$$P[X = 7] = 0.2150$$
Question:
Suppose that 80% of adults with allergies report symptom relief with a specific
medication. If the medication is given to 10 new patients with allergies, what is the probability that
it is effective in exactly 7?
Solution:
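No working is given for this question. A sketch of the solution, assuming the same binomial P.M.F as in the other examples with n = 10, p = 0.80, q = 0.20 and x = 7:

```latex
P[X = 7] = \binom{10}{7} (0.80)^{7} (0.20)^{10-7}
         = 120 \times 0.2097152 \times 0.008
         \approx 0.2013
```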
Question:
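The likelihood that a patient with a heart attack dies of the attack is 0.04. Suppose we have 5
patients who suffer a heart attack. What is the probability that all survive?
Solution:
Let X be the number of patients who die, so that all the patients survive when X = 0.
$$P[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0$$
$$P(\text{all the patients survive}) = P[X = 0] = \binom{5}{0} (0.04)^{0} (0.96)^{5-0}$$
$$P[X = 0] = 0.8154$$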
Question:
In a class, 3% of the students are suffering from anxiety. A sample of 5 students
is selected. Find the probability that, out of these, none, at least one, and at least three are
suffering from anxiety.
Solution:
Let X be a random variable denoting the number of students suffering from anxiety, with n = 5 and
p = 0.03.
$$P[X = x] = f(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0$$
$$P[X = 0] = \binom{5}{0} (0.03)^{0} (0.97)^{5-0}$$
$$P[X = 0] = 0.8587$$
$$P[X \geq 1] = 1 - P(X < 1)$$
$$P[X \geq 1] = 1 - 0.8587$$
$$P[X \geq 1] = 0.1413$$
$$P[X \geq 3] = 1 - \left\{ \binom{5}{0} (0.03)^{0} (0.97)^{5-0} + \binom{5}{1} (0.03)^{1} (0.97)^{5-1} + \binom{5}{2} (0.03)^{2} (0.97)^{5-2} \right\}$$
$$P[X \geq 3] = 1 - (0.8587 + 0.1328 + 0.0082)$$
$$P[X \geq 3] = 1 - 0.9997 = 0.0003$$
A binomial experiment might be used to determine how many black cars are in a random sample of 50
cars. A Poisson experiment might focus on the number of cars randomly arriving at a car wash during a
20-minute interval. The Poisson distribution has the following characteristics.
i) It is a discrete distribution.
ii) Each occurrence is independent of the other occurrences.
iii) It describes discrete occurrences over an interval.
iv) The number of occurrences in each interval can range from 0 to ∞.
v) The mean number of occurrences must be constant throughout the experiment.
If the probability mass function of a random variable X is
$$P[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
$$P[X = x] = f(x) = 0 \quad \text{otherwise}$$
then the random variable X is said to have a Poisson distribution with parameter μ, where the symbol
"!" denotes the factorial.
μ (the expected or mean number of occurrences) is sometimes written as λ and is sometimes called the
event rate or rate parameter.
Question:
The average number of homes sold by a company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?
Solution:
$$P[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
$$P[X = 3] = \frac{e^{-2} \, 2^{3}}{3!}$$
$$P[X = 3] = 0.180 \text{ or } 18\%$$
Question:
Suppose the average number of lions seen in the jungle on a 1-day visit is 5. What is the probability
that tourists will see fewer than 4 lions on the next 1-day visit?
Solution:
$$P[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
Since we want the likelihood that the tourists will see fewer than four lions, we want the probability
that they will see X = 0, 1, 2 or 3 lions, i.e. X < 4.
$$P[X < 4] = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$$
$$P[X < 4] = 0.0067 + 0.0337 + 0.0842 + 0.1404 = 0.2650$$
Thus the probability that the tourists will see no more than 3 lions is 0.265.
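The Poisson calculations in these examples can be sketched in code. The names `poisson_pmf` and `poisson_cdf` are illustrative assumptions; the two calls use the values from the questions above (μ = 2 with x = 3, and μ = 5 with X < 4).

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P[X = x] for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(k, mu):
    """P[X <= k], obtained by summing the P.M.F from 0 to k."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))


# Homes example: mean of 2 per day, exactly 3 sold.
print(f"P[X = 3] = {poisson_pmf(3, 2):.3f}")

# Lions example: mean of 5 per visit, fewer than 4 seen, i.e. X <= 3.
print(f"P[X < 4] = {poisson_cdf(3, 5):.3f}")
```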
Question:
Consider a computer system with Poisson job arrivals at an average of 2 jobs per interval. Determine
the probability of:
I. Zero jobs.
II. Exactly 2 jobs.
III. At most three jobs.
Solution:
Zero Jobs
$$P[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
$$P[X = 0] = \frac{e^{-2} \, 2^{0}}{0!}$$
$$P[X = 0] = 0.135$$
Exactly 2 Jobs
$$P[X = x] = f(x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
$$P[X = 2] = \frac{e^{-2} \, 2^{2}}{2!}$$
$$P[X = 2] = 0.27$$
At most 3 Jobs
$$P[X < 4] = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$$
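The source stops before evaluating this last sum. A sketch of the remaining arithmetic, assuming the same mean μ = 2 used in the two preceding parts:

```latex
P[X < 4] = e^{-2} \left( \frac{2^{0}}{0!} + \frac{2^{1}}{1!} + \frac{2^{2}}{2!} + \frac{2^{3}}{3!} \right)
         \approx 0.1353 + 0.2707 + 0.2707 + 0.1804
         \approx 0.8571
```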
The function f(x) representing the normal distribution satisfies the properties of a proper p.d.f.
The mean, median and mode of the normal distribution are equal.
All odd-order moments about the mean are zero: $\mu_{2n+1} = 0 \;\; \forall \; n$.
The normal curve extends indefinitely to the left and to the right, approaching the x-axis more
closely as x increases in magnitude.
The curve is symmetric about its mean, and thus the area to the left of the mean and the area
to the right of the mean are each equal to 0.5.
For the normal distribution, about 68% of the area under the curve lies between μ − σ and μ + σ,
about 95% lies between μ − 2σ and μ + 2σ, and about 99.7% lies between μ − 3σ and μ + 3σ.
The points of inflection of the curve are one standard deviation away from the mean, at μ − σ and μ + σ.
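The 68%, 95% and 99.7% figures can be checked numerically. This sketch relies on the standard identity P(|Z| < k) = erf(k/√2) for a standard normal variable Z; the function name `area_within` is an illustrative assumption.

```python
from math import erf, sqrt


def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} S.D of the mean: {area_within(k):.4f}")
```

The area does not depend on μ or σ, only on the number of standard deviations k.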
Question:
The average on a statistics test was 78, with an S.D of 8. If the test scores are normally distributed,
find the probability that a student receives a test score less than 90.
Solution:
$$P(X < 90) = P\left(\frac{X - \mu}{\sigma} < \frac{90 - \mu}{\sigma}\right)$$
In Standardized Form
$$P(X < 90) = P\left(Z < \frac{90 - 78}{8}\right)$$
$$P(X < 90) = P(Z < 1.50)$$
$$P(X < 90) = P(-\infty < Z < 0) + P(0 < Z < 1.50)$$
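The source stops before the numerical value. Assuming the standard normal table value P(0 < Z < 1.50) = 0.4332, and using the fact that the area to the left of the mean is 0.5, the last line can be completed as:

```latex
P(X < 90) = P(-\infty < Z < 0) + P(0 < Z < 1.50)
          = 0.5 + 0.4332
          = 0.9332
```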
Question:
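The pollen count for a species of flower varies randomly in a manner well represented by a normal
distribution with μ = 1000 and σ = 80. Find the probability that an individual pollen count will be
I. Greater than 1200.
II. Less than 775.
III. Between 800 and 1100.
Solution:
I. Greater than 1200
$$P(X > 1200) = P\left(\frac{X - \mu}{\sigma} > \frac{1200 - \mu}{\sigma}\right)$$
In Standardized Form
$$P(X > 1200) = P\left(Z > \frac{1200 - 1000}{80}\right)$$
$$P(X > 1200) = P(Z > 2.50)$$
$$P(X > 1200) = P(0 < Z < \infty) - P(0 < Z < 2.50)$$
II. Less than 775
$$P(X < 775) = P\left(\frac{X - \mu}{\sigma} < \frac{775 - \mu}{\sigma}\right)$$
In Standardized Form
$$P(X < 775) = P\left(Z < \frac{775 - 1000}{80}\right)$$
$$P(X < 775) = P(Z < -2.81)$$
$$P(X < 775) = P(-\infty < Z < 0) - P(-2.81 < Z < 0)$$
III. Between 800 and 1100
$$P(800 \leq X \leq 1100) = P\left(\frac{800 - \mu}{\sigma} \leq \frac{X - \mu}{\sigma} \leq \frac{1100 - \mu}{\sigma}\right)$$
In Standardized Form
$$P(800 \leq X \leq 1100) = P\left(\frac{800 - 1000}{80} \leq Z \leq \frac{1100 - 1000}{80}\right)$$
$$P(800 \leq X \leq 1100) = P(-2.50 \leq Z \leq 1.25)$$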
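The three pollen-count probabilities can be evaluated without a Z table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Pollen count model from the question: mean 1000, standard deviation 80.
pollen = NormalDist(mu=1000, sigma=80)

p_greater_1200 = 1 - pollen.cdf(1200)              # I.   P(X > 1200)
p_less_775 = pollen.cdf(775)                       # II.  P(X < 775)
p_between = pollen.cdf(1100) - pollen.cdf(800)     # III. P(800 <= X <= 1100)

print(f"P(X > 1200)          = {p_greater_1200:.4f}")
print(f"P(X < 775)           = {p_less_775:.4f}")
print(f"P(800 <= X <= 1100)  = {p_between:.4f}")
```

The results are roughly 0.0062, 0.0025 and 0.888. Small differences from table-based answers are expected because the table rounds Z to two decimal places.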
Definition:
The P-value (or probability value) is the probability of getting a sample statistic (such as the mean)
or a more extreme sample statistic in the direction of the alternative hypothesis when the null
hypothesis is true.
OR
“The P-value is the probability of getting the observed value of the test statistic, or a value with even
greater evidence against H0, if the null hypothesis is actually true”
Step No: 2
Compute the test value.
Step No: 3
Find the P-value.
Step No: 5
Summarize the result.
P-value = 2P(Z > |Zo|) (two-tailed test)
P-value = P(Z > Zo) (right-tailed test)
P-value = P(Z < Zo) (left-tailed test)
Case: I
If HA or H1 contains a less-than alternative, find the probability that Z < your test statistic (i.e. look
up your test statistic on the Z table and find its corresponding probability). The result is your P-value.
Case: II
If HA or H1 contains a greater-than alternative, find the probability that Z > your test statistic
(i.e. look up your test statistic on the Z table, find its corresponding probability and subtract it
from 1). The result is your P-value.
Case: III
If HA or H1 contains a not-equal-to alternative, find the probability that Z is beyond your test
statistic and double it.
If your test statistic is negative, first find the probability that Z is less than your test statistic
(i.e. look up your test statistic on the Z table and find its corresponding probability), then double
this probability to get the P-value.
If your test statistic is positive, first find the probability that Z is greater than your test statistic
(i.e. look up your test statistic on the Z table, find its corresponding probability and subtract it
from 1), then double this result to get the P-value.
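The three cases can be collected into one small function. This is a sketch under stated assumptions: the function name `z_test_p_value` and the labels "less", "greater" and "two-sided" are illustrative choices, and the standard normal CDF from `statistics.NormalDist` replaces the Z-table lookup.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative):
    """P-value for a Z test statistic under the three forms of H1."""
    cdf = NormalDist().cdf                   # standard normal: P(Z < z)
    if alternative == "less":                # Case I: H1 contains "<"
        return cdf(z)
    if alternative == "greater":             # Case II: H1 contains ">"
        return 1 - cdf(z)
    if alternative == "two-sided":           # Case III: H1 contains "not equal"
        return 2 * (1 - cdf(abs(z)))         # double the tail beyond |z|
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Illustrative test statistic (not tied to a particular question).
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: {z_test_p_value(2.25, alt):.4f}")
```

Using `abs(z)` in the two-sided case covers both the negative and the positive test-statistic sub-cases, since the normal curve is symmetric.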
Question: A researcher wishes to test the claim that the average cost of tuition and fees at a 2-year
college is greater than $5550. She selects a random sample of 36 2-year colleges and finds the mean to be
$5800. The population S.D is $600. Is there evidence to support the claim at α = 0.05? Use the P-value
method.
Solution:
1) Hypotheses
H0: μ = 5550 vs H1: μ > 5550
2) Test Statistic
$$Z = \frac{5800 - 5550}{600 / \sqrt{36}}$$
Z = 2.50
P-value = P(Z > 2.50)
= 1 − 0.9938
= 0.0062
0.0062 < 0.05
I.e. P < α
Since P is less than α, there is enough evidence to support the claim that the tuition and
fees at 2-year colleges are greater than $5550.
Question: A researcher wishes to test the claim that the average wind speed in a certain city is 9 miles
per hour. A sample of 36 days has an average wind speed of 9.3 miles per hour. The S.D of the population
is 0.8 miles per hour. At α = 0.01, is there enough evidence to reject the claim? Use the P-value method.
Solution:
1) Hypotheses
H0: μ = 9 vs H1: μ ≠ 9
2) Test Statistic
$$Z = \frac{9.3 - 9}{0.8 / \sqrt{36}}$$
Z = 2.25
P(Z > 2.25) = 1 − 0.9878
P(Z > 2.25) = 0.0122
P-value = 2(0.0122)
P-value = 0.0244
0.0244 > 0.01
I.e. P > α
Since P > α, there is not enough evidence to reject the claim that the average wind
speed is 9 miles per hour.
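The wind-speed example can be reproduced end to end, from summary statistics to decision. This is a minimal sketch; the function name `one_sample_z_test` and its parameter names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alternative):
    """Return (Z, P-value) for a one-sample Z test with known population S.D."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Wind-speed example: H0: mu = 9 vs H1: mu != 9, alpha = 0.01.
alpha = 0.01
z, p = one_sample_z_test(sample_mean=9.3, mu0=9, sigma=0.8, n=36, alternative="two-sided")
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"Z = {z:.2f}, P-value = {p:.4f}, decision: {decision}")
```

The P-value of about 0.0244 exceeds α = 0.01, so the null hypothesis is not rejected, matching the conclusion above.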
Question: Suppose the average number of Facebook friends is 150, with S.D = 40.3. A random sample of 64
high school students in a particular country revealed that the average number of Facebook friends was
160. At α = 0.01, is there sufficient evidence to conclude that the mean is greater than 150?
Solution:
2) Test Statistic
$$Z = \frac{160 - 150}{40.3 / \sqrt{64}}$$
Z=1.9851
P-value =1-0.9767
P-value =0.0233
0.0233>0.01
I.e. P>α