Biostatistics

Content

S/N Topics
1 Epidemiology
2 Biostatistics
3 Research
4 Variable
5 Based On Relationship To Each Other
6 Based On Characteristics (The Values It Takes)
7 Data
8 Explanation And Examples
9 Scales Of Measurement
10 Study Design
11 Cross Sectional Study/Prevalence Study
12 Case-Control Study (Retrospective Study)
13 Cohort Study (Prospective Study)
14 Clinical Trial
15 Quasi Experimental Study
16 Measuring Association In Epidemiology
17 Sampling Technique
18 Types Of Sampling
19 Probability Sample
20 Non Probability Sampling
21 Measures Of Location
22 Measures Of Dispersion
23 Data Presentation
24 Presentation Of Quantitative Data
25 Presentation Of Qualitative Data
26 Hypothesis
27 Probability & ‘P’ Value
28 Parametric Test & Non-Parametric Test
29 Hypothesis Testing
30 Performance Of A Diagnostic Test
31 Previous Years Question

GENESIS 1
Biostatistics

EPIDEMIOLOGY
Definition: Epidemiology is the study of the distribution and determinants of health, disease, or injury
in human populations and the application of this study to the control of health problems

Examples
 National and local surveillance systems (cancer, AIDS, occurrence of E. coli O157:H7 outbreaks)
 Cohort study to investigate the association between cell phone use and the development of brain tumors
 Survey of individuals who took Cox-2 inhibitors

Statistics:
Statistics is a science dealing with the collection and analysis of data to obtain reliable results and
conclusions

Biostatistics
Def: Biostatistics is the application of statistics to problems in the biological sciences, health, and
medicine

Examples
 Computing age-adjusted cancer incidence rates to determine trends over time and locality
 Calculating statistical measures of the risk of developing brain tumors following cell phone use
after adjusting for possible confounding variables
 Quantifying the relationship between use of Cox-2 inhibitors and quality of life

RESEARCH

Def: Scientific and systematic search (collection, analysis and interpretation of data) for knowledge to answer a certain question or to solve a problem.

Research protocol: A structured account of the plan of action of an intended research project. It includes the problem statement, justification, methods to be followed and detailed budget.

Thesis/dissertation: A write-up of the findings of a research project, usually required for an academic degree.

Types Of Research:
Descriptive research:
 Describes the state of affairs as it exists at present.

Analytical research:
 Analyzes presently available facts to make a critical evaluation.

Qualitative research:
 Deals with qualitative phenomena/data.
 Subjective assessment of attitude, behavior, opinion, feeling, values, emotion, practice etc.

Quantitative research:
 Deals with quantitative phenomena/data.

Applied research:
 Problem-oriented research aimed at solving a specific problem.

Basic (Pure) research:
 Deals with basic biological processes at cellular level.

VARIABLE

Concept: Variable is the key concept in research. We understand the magnitude of a problem (how
big or small it is) by measuring some variables or their relationship.

Definition: A variable is a characteristic or attribute of an individual (or of people, animals, communities, time etc.) that varies.

Classification:
 Based on relationship to each other
o Independent (exposure) variable
o Dependent (outcome) variable
o Intervening variable
o Confounding variable

 Based on characteristics (the values it takes)
o Quantitative or numerical
 Discrete
 Continuous
o Qualitative or categorical
 Nominal
 Ordinal

BASED ON RELATIONSHIP TO EACH OTHER

Independent (Exposure) variable:
 It influences the dependent (outcome) variable.
 It is the cause, exposure or input of an outcome variable.
 Smoking causes lung cancer: smoking is the independent variable.
 Oral pill use is associated with cervical cancer: oral pill use is the independent variable.

Dependent (Outcome) variable:
 It is influenced by the independent variable.
 It is the effect or output of an exposure variable.
 Smoking causes lung cancer: lung cancer is the dependent variable.
 Oral pill use is associated with cervical cancer: cervical cancer is the dependent variable.

Intervening variable:
 These are variables through which the independent variable influences the dependent variable.
 Vitamin A maintains epithelial health and so prevents infection. Here vitamin A is the independent variable, epithelial health is the intervening variable and prevention of infection is the dependent variable.
 Salt intake causes hypertension, and hypertension causes MI. Here salt intake is the independent variable, hypertension the intervening variable and MI the dependent variable.

Confounding variable:
 These variables distort the relationship between the independent and dependent variables because they are independently associated with both.
 If a study assesses whether high alcohol consumption is a risk factor for coronary heart disease, smoking is a confounding variable, because smoking is known to be related to alcohol consumption and is also a risk factor for coronary heart disease.
 In the case of oral pill intake and cervical cancer, age at marriage is a confounder.
 A variable can be independent in one context but dependent, intervening or confounding in another. For example,
 Poverty > less vitamin intake > blindness
 Less vitamin intake > blindness > poverty
 Blindness of parents > poverty > less intake of vitamin A in children

BASED ON CHARACTERISTICS (THE VALUES IT TAKES)

Quantitative or numerical:
 These variables take numerical values whose size is meaningful
 They answer questions like how many and how much
 Usually composed of numbers
 They have measurement units
 Height in inches, age in years, income in dollars, family size in number of family members etc.
 These are divided into discrete and continuous variables
 A discrete variable does not take fractional values, e.g. pulse rate (72 beats/min), number of family members (4)
 A continuous variable can take fractional values, e.g. exact height (169.36 cm), weight (17.5 kg)

Qualitative or categorical:
 These are expressed as categories or levels
 They answer questions like which kind
 Usually composed of text
 May take numerical values whose size is not meaningful (social security number, Zip code)
 They have no measurement units
 Gender, pain, color etc. are some examples
 These are divided into nominal and ordinal variables
 Nominal variables are categories without a natural order, like gender
 Ordinal variables are categories with a natural order, like pain (mild, severe, very severe)
 Continuous or discrete numerical variables can often be converted into categorical (nominal or ordinal) variables, or vice versa.
 For example, pulse rate (a numerical variable) can be converted into three categories: bradycardia (≤60 bpm), normocardia (61-89 bpm) and tachycardia (≥90 bpm)
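
The conversion of a numerical variable into an ordinal one can be illustrated with a short Python sketch. The function name and the sample pulse rates are made up for illustration; the cut-offs are the ones given above.

```python
def pulse_category(bpm: int) -> str:
    """Map a pulse rate (beats per minute) to the three categories in the notes."""
    if bpm <= 60:
        return "bradycardia"
    if bpm <= 89:
        return "normocardia"
    return "tachycardia"


# Hypothetical pulse rates, used only to show the conversion.
rates = [52, 60, 72, 89, 90, 104]
categories = [pulse_category(r) for r in rates]
print(categories)
```

Note that the conversion loses information: once a pulse of 72 is stored as "normocardia", the original value cannot be recovered.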

Derived or computed variable:
 It is developed from one or more independently measured but related variables to give a valid account of an issue of interest.
 Socioeconomic status, for example, can be derived from household expenditure, years of schooling and living space, three important determinants of the socioeconomic status of a study population.

DATA

Concept: In any scientific research what we collect from the study participants is data. Quality
data is the key concern in research

Data: Information obtained through measurement or observation on an individual, object or phenomenon (or about any variable)

Classification:
 Based on characteristics (the values it takes)
o Quantitative or numerical
 Discrete
 Continuous
o Qualitative or categorical
 Nominal
 Ordinal

 Based on source
o Primary data
o Secondary data
o Derived data

 Based on the number of variables and pieces of information incorporated
o Univariate data
o Bivariate data
o Multivariate data

 Based on scale of measurement
o Nominal data
o Ordinal data
o Interval data
o Ratio data

EXPLANATION AND EXAMPLES

Primary data:
 Data obtained first hand by the researcher
 Generated by observation, measurement, experiment, interview etc.
Secondary data:
 Data already collected by someone else
 Data taken from records, journals, books etc.
Derived data:
 Derived from primary and secondary data
 BMI calculated from weight and height
Univariate data:
 Single variable and single piece of information
 Birth rate of male babies
Bivariate data:
 Two variables linking two pieces of information
 Birth rate of Rh +ve male babies
Multivariate data:
 More than two variables linking more than two pieces of information
 Birth rate of Rh +ve, premature male babies
Dichotomous (Binary) data:
 Express two mutually exclusive pieces of information
 Sex (male and female)
Sources of data:
 Observation
 Measurement, counting
 Experiment
 Survey
 Census
 Test, e.g. exam
 Records
 Documents
 Other studies
Methods of data collection:
 Observation, interview, survey, census, focus group discussion (FGD), questionnaire
 Measurement
 Experiment/testing
 Review of journals, documents, books
Data collection tools:
 Data collection sheet
 Standard questionnaire
 Laboratory instruments/testing kit
 Scale, watch, pen, paper

SCALES OF MEASUREMENT
 Nominal Scale
 Ordinal scale
 Interval scale
 Ratio scale


STUDY DESIGN

Def: Scientific and ethical ways and methods of search to harvest valid and reliable information.

Types Of Study Design:


Observational study
 Descriptive study
 Case study
 Case series
 Cross sectional study
 Analytical study
 Case control study
 Cohort study
 Cross sectional study

Experimental study
 Clinical trial (experimental study on humans)
 RCT (Randomized controlled clinical trial)
 CIT (Community interventional trial)
 Quasi experimental study

Difference between observational and experimental study:
Observational study | Experimental study
1. Based on observation | 1. Based on experiment
2. Nature affects the outcome | 2. Researcher intervenes to affect the outcome
3. Researcher only measures | 3. Researcher intervenes and measures
4. Fewer ethical problems | 4. More ethical problems

Difference between descriptive and analytical study:
Descriptive study | Analytical study
1. Describes the distribution of the problem | 1. Describes the determinants of the problem
2. No comparison group | 2. Comparison group present
3. No attempt to analyze the link between exposure and outcome | 3. Exposure and outcome relationship is analyzed
4. Usually no hypothesis testing | 4. Hypothesis testing done

CROSS SECTIONAL STUDY/PREVALENCE STUDY


Cross-sectional studies measure the prevalence of disease and thus are often called prevalence studies.
In a cross-sectional study the measurements of exposure and effect are made at the same time. It is not
easy to assess the reasons for associations shown in cross-sectional studies. The key question to be
asked is whether the exposure precedes or follows the effect. If the exposure data are known to
represent exposure before any effect occurred, the data from a cross-sectional study can be treated like
data generated from a cohort study.

Characteristics:
 Sampling is done without regard to either exposure or outcome.
 Finds what is happening.
 Done in a definite cross section of the population at a particular point of time or within a short span of time.
 Measures exposure & outcome simultaneously.

Advantage
 Quick, inexpensive.
 Can study several exposures & several outcomes at a time.
 Provides prevalence information.

Disadvantage
 Can’t infer temporality.
 Prone to bias.
 Not good for rare diseases.
 Not good for diseases with short duration & high fatality.

CASE-CONTROL STUDY (RETROSPECTIVE STUDY)


A case-control study is an observational analytic study in which diseased and non-diseased (affected and non-affected) subjects are identified after the fact and then compared regarding specific characteristics to determine possible associations with risk for the disease in question.

Characteristics:
 A case-control study is called a retrospective study, i.e. the study proceeds backwards: both exposure and outcome (disease) have occurred before the start of the study.
 It starts with the disease and proceeds from the disease to the cause.
 It is usually the first and easier approach to testing a causal hypothesis.
 It uses fewer subjects.
 It yields relatively quick results.
 It is suitable for studying rare diseases.
 It is relatively inexpensive.
 It uses a control or comparison group to support or refute an inference.

Steps in case-control study:
 Selection of cases and controls
 Matching
 Measurement of exposure
 Analysis and interpretation

Methodology (How to do case-control study)
1) Diseased (cases) and non-diseased (controls) populations must be identified, usually retrospectively.
2) Determination of exposure: Once cases and controls are selected, information must be collected on prior exposure to the risk factor of interest as well as to other exposures.
3) Matching of controls to cases on the basis of known risk factors for the disease of interest. The intent of matching is usually to decrease the possibility of confounding, i.e. mixing of the effect of interest with the effects of other risk factors.
4) Conclusions are useful for the generation of hypotheses and for initial evidence of putative risk associations. Results cannot be used to establish causality.

Example: A comparison of prior estrogen use in uterine cancer cases with that in age-matched controls without cancer, to assess the possible risk from exposure to estrogens, is an appropriate subject for a case-control study.

Advantages and disadvantages of case-control studies

Advantages
(i) Efficient for the study of rare diseases
(ii) Efficient for the study of chronic diseases
(iii) Tend to require a smaller sample size than other designs
(iv) Less expensive than alternative designs

Disadvantages
(i) Risk of disease cannot be estimated directly
(ii) Not efficient for the study of rare exposures
(iii) More susceptible to selection bias than alternative designs
(iv) Information on exposure may be less accurate than that available in alternative designs

COHORT STUDY (PROSPECTIVE STUDY)


A cohort study is an observational analytic study in which exposed and non-exposed populations are identified and followed prospectively over time to determine the rate of a specific clinical disease or event.

Characteristics:
 A cohort study is called a prospective study, i.e. the study proceeds forwards from cause to effect.
 It is done to test a precisely formulated hypothesis.
 It involves a group of people (a cohort) who share a common characteristic or experience within a defined time period, e.g. age, occupation, exposure to a drug or vaccine, pregnancy etc. They are identified prior to the appearance of the disease under investigation.
 The study groups (cohorts) are observed for a long period of time to determine the frequency of disease among them.
 The study results are delayed.
 It is not appropriate when the disease under study is rare.
 It is expensive.
 The comparison group may be the general population from which the cohort is drawn, or another cohort not exposed to the putative etiologic factor.

Types of cohort studies:
 Prospective cohort studies
 Retrospective cohort studies
 A combination of retrospective and prospective cohort studies

Steps (components) of Cohort studies:
(i) Selection of study subjects
(ii) Obtaining data on exposure
(iii) Selection of comparison groups
(iv) Follow up
(v) Analysis

Advantages and disadvantages of Cohort studies

Advantages
1. Direct calculation of risk ratio (relative risk)
2. May yield information on the incidence of disease
3. Clear temporal relationship between exposure and disease
4. Particularly efficient for the study of rare exposures
5. Can yield information on multiple exposures
6. Can yield information on multiple outcomes of a particular exposure
7. Minimizes bias
8. Strongest observational design for establishing a cause-and-effect relationship

Disadvantages
1. Time consuming
2. Often requires a large sample size
3. Expensive
4. Not efficient for the study of rare diseases
5. Losses to follow up may diminish validity
6. Changes over time in diagnostic methods may lead to biased results

Differences between case control and cohort studies (between retrospective and prospective study)

Case control study (Retrospective study) | Cohort study (Prospective study)
1. Proceeds from effect to cause | 1. Proceeds from cause to effect
2. Starts with the disease | 2. Starts with people exposed to the risk factor or suspected cause
3. Tests whether the suspected cause occurs more frequently in those with the disease than among those without the disease | 3. Tests whether the disease occurs more frequently in those exposed than in those not similarly exposed
4. Usually the first approach to the testing of a hypothesis | 4. Reserved for testing of a precisely formulated hypothesis
5. Involves fewer subjects | 5. Involves a larger number of subjects
6. Yields relatively quick results | 6. Long follow up period often needed, involving delayed results
7. Suitable for the study of rare diseases | 7. Inappropriate when the disease is rare
8. Yields only an estimate of relative risk (RR) | 8. Yields incidence rates, RR as well as attributable risk
9. Can’t yield information about diseases other than that selected for study | 9. Can yield information about more than one disease outcome
10. Relatively inexpensive | 10. Expensive

CLINICAL TRIAL
Whereas cohort and case-control studies aim to establish what has caused a disease, randomized controlled trials (RCTs) are conducted to examine the effectiveness of a particular intervention. These are also referred to as comparative or experimental studies or clinical trials. Subjects are recruited and then randomly allocated to receive a particular intervention or treatment. RCTs are usually conducted to compare the effectiveness of a specific treatment against one or more others. They may also be used for preventive (prophylactic) interventions.

Example:
Imagine that you wish to compare the effectiveness of a new anti-cancer drug with a current treatment. A group of patients would be randomly assigned to receive the new drug (group A), and the remainder would be given an existing drug (group B). Detailed records would be maintained on factors such as length of survival, side-effects experienced and quality of life. At the end of the trial, the results for group A would be compared with those for group B, and conclusions would be drawn as to which drug was the more effective.

Types Of Clinical Trial (CT)

 CT without control group
 CT with control group
o CT with concurrent control
 CT with randomization, e.g. RCT
 CT without randomization
o Self-control CT (before & after study)
o CT with non-concurrent control (historical control)

Randomized Controlled Clinical Trial (RCT):
 Involves human volunteers.
 Prospective, longitudinal, analytical.
 Classic features are
o Interventional
o Randomized
o Controlled
o Blinded (not always)

Basic Design Of RCT:

Reference population
  ↓
Study population
  ↓ (exclusion criteria and informed consent; those excluded or refusing leave the study)
Study subjects
  ↓
Random allocation
  ↓
Treatment group | Control group
  ↓ (losses to follow-up occur in both groups)
Outcome

Advantages of RCT:
o Reliable & valid.
o Ensures temporality & causality.
o Good control of confounders.

Disadvantages of RCT:
o Ethical constraints.
o Sometimes expensive.
o Often needs more time & a large sample.
o Non-compliance.
o Attrition (loss to follow up).

Blinding (masking)
Type | Participants blinded | Assessor blinded | Researcher blinded
Single blind | Yes | No | No
Double blind | Yes | Yes | No
Triple blind | Yes | Yes | Yes

QUASI EXPERIMENTAL STUDY

 It shares features of a CT (experimental study)
 It differs from an RCT in the key point of randomization
o CT without control group
o CT with control group but without randomization

MEASURING ASSOCIATION IN EPIDEMIOLOGY

Risk ratio or relative risk

 Risk is the probability that an event will happen. It is calculated by dividing the number of events by the number of people at risk. One boy is born for every two births, so the probability (risk) of giving birth to a boy is 1/2 = 0.5. If one in every 100 patients suffers a side-effect from a treatment, the risk is 1/100 = 0.01.
 Risk ratios are calculated by dividing the risk in the treated or exposed group by the risk in the control or unexposed group. A risk ratio of 1 indicates no difference in risk between the groups. If the risk ratio of an event is >1, the rate of that event is increased compared to controls. If <1, the rate of that event is reduced. Risk ratios are frequently given with their 95% CIs: if the CI for a risk ratio does not include 1 (no difference in risk), it is statistically significant.
 Relative risk is used in cohort studies, i.e. prospective studies that follow a group (cohort) over a period of time and investigate the effect of a treatment or risk factor.

Examples
 A cohort of 1000 regular football players and 1000 non-footballers was followed to see whether playing football was significantly associated with the injuries they received.
 After 1 year of follow-up there had been 12 broken legs in the football players and only four in the non-footballers.
 The risk of a footballer breaking a leg was therefore 12/1000 or 0.012. The risk of a non-footballer breaking a leg was 4/1000 or 0.004.
 The risk ratio of breaking a leg was therefore 0.012/0.004, which equals 3. The 95% CI was calculated to be 0.97 to 9.41. As the CI includes the value 1, we cannot exclude the possibility that there was no difference in the risk of footballers and non-footballers breaking a leg. However, given these results further investigation would clearly be warranted.
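
The football example can be reproduced with a few lines of Python. This is only a sketch: the function name is made up, and the confidence interval uses the common log-transformation approximation, which is an assumption because the notes do not say how their interval was obtained. It gives limits of roughly 0.97 and 9.3, close to but not identical with the quoted 0.97 to 9.41.

```python
import math


def risk_ratio_with_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (risk ratio, lower CI limit, upper CI limit)."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) under the log method (assumed, not from the notes).
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# 12 broken legs among 1000 footballers, 4 among 1000 non-footballers.
rr, lower, upper = risk_ratio_with_ci(12, 1000, 4, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the lower limit falls just below 1, the sketch leads to the same conclusion as the text: the result is not statistically significant at the 5% level.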

Odds Ratio
 Odds are calculated by dividing the number of times an event happens by the number of times it does not happen. One boy is born for every two births, so the odds of giving birth to a boy are 1:1 (or 50:50) = 1/1 = 1. If one in every 100 patients suffers a side-effect from a treatment, the odds are 1:99 = 1/99 = 0.0101.
 Odds ratios are calculated by dividing the odds of having been exposed to a risk factor among the cases by the odds of exposure in the control group. An odds ratio of 1 indicates no difference in risk between the groups, i.e. the odds in each group are the same. If the odds ratio of an event is >1, the rate of that event is increased in patients who have been exposed to the risk factor. If <1, the rate of that event is reduced. Odds ratios are frequently given with their 95% CI: if the CI for an odds ratio does not include 1 (no difference in odds), it is statistically significant.
 The odds ratio is used by epidemiologists in studies looking for factors which do harm. It is a way of comparing patients who already have a certain condition (cases) with patients who do not (controls), i.e. a “case-control study”.
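
A short Python sketch of the two calculations described above. The function names and the 2x2 counts in the odds ratio example are hypothetical, chosen only to show the arithmetic; the two odds examples are the ones from the text.

```python
def odds(events, total):
    """Odds = times the event happens / times it does not happen."""
    return events / (total - events)


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


print(odds(1, 2))     # one boy per two births
print(odds(1, 100))   # one side-effect per 100 patients

# Hypothetical case-control data: 30 of 100 cases and 10 of 100 controls were exposed.
example_or = odds_ratio(30, 70, 10, 90)
print(round(example_or, 2))
```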

Risk reduction and numbers needed to treat
ARR (absolute risk reduction) is the difference between the event rate in the intervention group and that in the control group. It is also the reciprocal of the NNT and is usually given as a percentage, i.e. ARR = 100/NNT. NNT (number needed to treat) is the number of patients who need to be treated for one to get benefit. RRR (relative risk reduction) is the proportion by which the intervention reduces the event rate.

Examples
One hundred women with vaginal candida were given an oral antifungal and 100 were given placebo. They were reviewed 3 days later. The results are given in the table below.

Results of placebo-controlled trial of oral antifungal agent

Group | Improved | No improvement
Given antifungal | 80 | 20
Given placebo | 60 | 40

ARR = improvement rate in the intervention group - improvement rate in the control group = 80% - 60% = 20%
NNT = 100/ARR = 100/20 = 5
So five women have to be treated for one to get benefit.
The incidence of candidiasis was reduced from 40% with placebo to 20% with treatment, i.e. by half.

Thus, the RRR is 50%.

In another trial young men were treated with an expensive lipid-lowering agent. Five years later the death rate from ischaemic heart disease (IHD) was recorded. See the table below for the results.

Group | Survived | Died
Given Cleverstatin | 998 (99.8%) | 2 (0.2%)
Given placebo | 996 (99.6%) | 4 (0.4%)

ARR = improvement rate in the intervention group - improvement rate in the control group = 99.8% - 99.6% = 0.2%
NNT = 100/ARR = 100/0.2 = 500
So 500 men have to be treated for 5 years for one to survive who would otherwise have died.
The incidence of death from IHD is reduced from 0.4% with placebo to 0.2% with treatment, i.e. by half.

Thus, the RRR is 50%.

The RRR and NNT from the same study can have opposing effects on prescribing habits. The RRR of 50% in this example sounds fantastic. However, for every life saved, 499 patients had unnecessary treatment for 5 years.
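
Both worked examples can be checked with a small Python sketch. The function name and argument names are made up for illustration. Rates are handled as proportions rather than percentages, so NNT is computed as 1/ARR, which is the same as 100 divided by the ARR in percent.

```python
def risk_reduction(bad_treated, n_treated, bad_control, n_control):
    """Return (ARR, NNT, RRR) from counts of unfavourable outcomes in each group."""
    rate_control = bad_control / n_control
    rate_treated = bad_treated / n_treated
    arr = rate_control - rate_treated   # absolute risk reduction
    nnt = 1 / arr                       # number needed to treat
    rrr = arr / rate_control            # relative risk reduction
    return arr, nnt, rrr


# Antifungal trial: 20/100 not improved on treatment, 40/100 on placebo.
print(risk_reduction(20, 100, 40, 100))

# Lipid-lowering trial: 2/1000 deaths on treatment, 4/1000 on placebo.
print(risk_reduction(2, 1000, 4, 1000))
```

The two trials share the same RRR (0.5) while their NNTs differ a hundredfold (about 5 versus about 500), which is the contrast the text draws.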


SAMPLING TECHNIQUE

Common Terms Used

 Population: All members of a group for which information is desired or sought
o Examples: humans, animals, plants, time, events, places etc.
 Target (reference) population: The population to which study results are extrapolated
 Sampling population: The convenient sub-group of the population from which the sample is actually taken
 Sampling unit (SU): Every member of a listed population
 Sampling frame (SF): A list of all sampling units
 Parameter: A numerical characteristic (such as the mean, proportion or variance) of a population
 Statistic: A numerical characteristic of a sample (such as its mean, median, proportion or standard deviation). We estimate parameters from statistics

TYPES OF SAMPLING

A. Probability or random or representative sampling:
 Simple random sampling (SRS)
 Systematic random sampling
 Stratified random sampling
 Cluster sampling
 Multi stage sampling
 Multi phase sampling

B. Non probability or non random or non-representative sampling:
 Convenience sampling
 Purposive (judgment) sampling
 Quota sampling


PROBABILITY SAMPLE
 Each member of the population has a known, non-zero chance (in simple random sampling, an equal chance) of being selected in the sample
 Ensures random selection of sampling units
 Representative of the population, so the information can be generalized to the whole population
 Needs prior planning, cost and a sampling frame

Simple random sampling:
 The simplest form of probability sampling
 A sampling frame is a must
 Sampling units are selected randomly from the sampling frame by lottery or random number table
 Good for small, homogeneous, easily accessible populations
 Highly representative of the total population, so results have high generalizability (external validity)

Systematic Random Sampling (Sy. RS):
 Much easier, simpler and less expensive
 Used when other probability sampling techniques are not applicable
 Good for large & scattered populations
 An SF is desirable but not a must
 The sampling interval (I) is calculated by dividing the population size (N) by the sample size (n), so I = N/n
 The 1st SU is selected by lottery or random number table from the first I units, and then every Ith unit of the sampling frame is selected

Example:
A study on patient satisfaction in an outpatient department of a tertiary care hospital, where a prior list of patients may not be available.
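
A minimal Python sketch of systematic random sampling, assuming a complete list is available. It uses the sampling interval in its usual form, population size divided by sample size, rounded down to a whole number. The function name, the numbered frame of 1000 units and the seed are all made up for illustration.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every I-th unit."""
    population_size = len(frame)
    interval = population_size // n      # sampling interval I, rounded down
    rng = random.Random(seed)
    start = rng.randrange(interval)      # random start among the first I units
    return [frame[start + i * interval] for i in range(n)]


# Hypothetical frame: patients numbered 1 to 1000, sample of 100, so I = 10.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=1)
print(sample[:5])
```

When no list exists, as in the outpatient example, the same rule can be applied to the order in which patients arrive.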

Stratified random sampling:

 Total population is divided into homogeneous strata (sub-groups)
 An SF is constructed for each stratum
 A sub-sample is taken from each stratum by SRS or Sy. RS
 Good for heterogeneous populations
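
A minimal Python sketch of stratified random sampling, using simple random sampling within each stratum. Taking the same fraction from every stratum (proportional allocation) is an assumption, since the notes do not specify how many units to take per stratum. The function name, strata names and sizes are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sub-sample of the given fraction from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))   # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical frames: 60 urban and 40 rural sampling units.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
chosen = stratified_sample(strata, 0.10, seed=7)
print({name: len(units) for name, units in chosen.items()})
```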

Cluster sampling
 Total population is divided into small clusters (groups)
 Clusters are regarded as SUs & their SF is prepared
 Some clusters are selected by SRS or Sy. RS
 All elementary units of the selected clusters are included in the sample
 Good when an SF of elementary units is not available & the population is large and dispersed

Example:
Vaccination coverage for tetanus toxoid (TT) among 15-45 year old pregnant women in Dhaka, a city with a population of 5,000,000. It may not be possible to find a list of all pregnant women in Dhaka City. We can collect a map of the city and divide all its wards into smaller blocks, say 1000 blocks with a population of 5000 each. If our sample size is 500, we may select 50 blocks randomly and then take 10 pregnant women as a cluster from each block.
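
The Dhaka example can be sketched in Python as follows. With 10 women taken per selected block, a sample of 500 corresponds to 50 blocks. The function name, block identifiers and the lists of eligible women are invented for illustration, and the blocks are chosen by simple random sampling.

```python
import random


def cluster_sample(blocks, n_blocks, per_block, seed=None):
    """Select blocks at random, then take up to per_block units from each."""
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)
    return {
        block: rng.sample(blocks[block], min(per_block, len(blocks[block])))
        for block in chosen_blocks
    }


# Hypothetical data: 1000 blocks, each with 20 eligible women.
blocks = {b: [f"block{b}-woman{i}" for i in range(20)] for b in range(1000)}
clusters = cluster_sample(blocks, n_blocks=50, per_block=10, seed=3)
total = sum(len(women) for women in clusters.values())
print(len(clusters), total)
```

Only the selected blocks need to be visited and listed, which is why the design works without a city-wide sampling frame.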

Multi stage or area sampling

 Sampling is done in stages
 The total population is first divided into a set of 1st stage SUs & a sample of these SUs is selected by SRS or Sy. RS.
 Each selected 1st stage SU is further divided into a set of 2nd stage SUs & a sample of these SUs is selected by SRS or Sy. RS.
 The procedure is continued until the desired stage is reached
 Good when an SF of elementary units is not available & the population is large and dispersed

Multi phase sampling
 Part of the information is collected from a large sample in the 1st phase.
 Additional information is collected from a sub-sample of the whole sample in subsequent phases.
 Similar types of SU are sampled at each phase, but in gradually smaller numbers.

NON PROBABILITY SAMPLING


Convenience sampling
 Sample selected considering:
o easy availability of SUs
o easy accessibility of SUs
o proximity of SUs to the researcher

Purposive (judgment) sampling
The researcher’s subjective judgment alone is used to select the SUs that the researcher judges to be typical and representative for the proposed study.

Quota sampling
 Total population is divided into homogeneous quotas.
 From each quota, samples are selected at the researcher’s discretion by convenience & purposive sampling.

Sample size:
o Should be optimum (neither too large nor too small), i.e. expected to give a valid result
o Calculated by statistical formulae appropriate for the type of study

MEASURES OF LOCATION

 Measures of central tendency
o Mean
o Median
o Mode
 Percentiles
 Deciles
 Quartiles

Mean
 The sum of all values of the observations divided by the total number of observations
 The sum of all scores divided by the total frequency
 The most stable measure of central tendency
 Can be affected by extreme values
 Its value may not be an actual value in the data set
 If a constant c is added to (or subtracted from) all values, the mean increases (or decreases) by the same amount c

Example:
The ages (in years) of 7 children seen in an emergency room after a house fire are 1, 1, 1, 2, 4, 6 and 6. Since the sum of the ages ($\sum x_i$) is 21 years and the number of children (n) is 7, the arithmetic mean ($\bar{x}$) of the ages is 21 years divided by 7, or 3 years.

Median
 Positional middle of an array of data
 Divides ranked values into halves, with 50% larger than and 50% smaller than the median value
 The median is a positional measure
 Can be determined only if the values are arranged in order
 Its value may not be an actual value in the data set
 It is affected by the position of items in the series but not by the value of each item
 Less affected by extreme values

Mode
 Value that occurs most frequently in the data set
 Locates the point where scores occur with the greatest density
 Less popular than the mean and median
 It may not exist, or if it does, it may not be unique
 Not affected by extreme values
 Applicable to both qualitative and quantitative data
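
The three measures can be computed for the ages in the example with Python's standard `statistics` module. This is just an illustration; the shift of 10 years and the second small data set are made up to show two of the properties listed above.

```python
import statistics

ages = [1, 1, 1, 2, 4, 6, 6]   # ages of the 7 children in the example

mean_age = statistics.mean(ages)       # 21 / 7
median_age = statistics.median(ages)   # middle (4th) value of the ordered data
mode_age = statistics.mode(ages)       # most frequent value
print(mean_age, median_age, mode_age)

# Adding a constant to every value shifts the mean by the same constant.
shifted_mean = statistics.mean([age + 10 for age in ages])
print(shifted_mean)

# The mode need not be unique: here two values tie for most frequent.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])
print(tied_modes)
```

Here the mean (3) is larger than the median (2) because the two children aged 6 pull the mean upwards, while the median depends only on position.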


Percentile
 Values of a data set arranged in ascending order and divided into 100 equal parts by 99
imaginary lines called percentiles (P1- P99)
 Kth percentile = value at position K(n + 1)/100 in the ordered data
 5th percentile means, 5% data are below & 95% data are above that value

Decile

 Values of a data set arranged in ascending order and divided into 10 equal parts by 9 imaginary lines called deciles (D1- D9)
 Kth decile = value at position K(n + 1)/10 in the ordered data
 5th decile means, 50% data are below & 50% data are above that value

Quartile
o Values of a data set arranged in ascending order and divided into 4 equal parts by 3 imaginary lines called quartiles (Q1- Q3)
o Kth quartile = value at position K(n + 1)/4 in the ordered data
o 1st quartile means, 25% data are below & 75% data are above that value
o IQR = Q3 – Q1. It contains the central 50% of values
o Median = P50 = D5 = Q2
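The position rule K(n + 1)/100 can be sketched in a few lines of Python. The helper names `value_at_position` and `percentile` are illustrative, and linear interpolation between neighbouring values for fractional positions is an assumption (the notes do not say how fractional positions are handled).

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based, possibly fractional, position."""
    n = len(sorted_values)
    if position <= 1:
        return float(sorted_values[0])
    if position >= n:
        return float(sorted_values[-1])
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part of the position
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    # Assumption: interpolate linearly between the two neighbouring values.
    return below + fraction * (above - below)


def percentile(values, k):
    """Kth percentile using the position K(n + 1)/100."""
    ordered = sorted(values)
    return value_at_position(ordered, k * (len(ordered) + 1) / 100)


ages = [1, 1, 1, 2, 4, 6, 6]   # ages from the mean example
q1 = percentile(ages, 25)      # Q1 = P25
q2 = percentile(ages, 50)      # Q2 = P50 = D5 = median
q3 = percentile(ages, 75)      # Q3 = P75
iqr = q3 - q1                  # inter-quartile range
print(q1, q2, q3, iqr)
```

Deciles and quartiles follow from the same function, since D5 = P50 and Q1 = P25.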

MEASURES OF DISPERSION

 Range
 Mean deviation (MD)
 Standard deviation (SD)
 Variance (S²)
 Coefficient of variation (CV)

Range

Definition: The range is the difference between the highest and lowest values in a series.

Calculation: The range is calculated by subtracting the lowest value in the series from the highest value.

Example: Five individuals arrested for driving automobiles under the influence of alcohol are aged 17, 18, 18, 21 and 26 years. The range of ages is 26 years minus 17 years, or 9 years.

Applications and characteristics


i. The range is used to measure data spread
ii. The range provides no information concerning the scatter within the series.

Variance
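Definition: The variance is the sum of the squares of the differences between the observations and their mean, divided by the total number of observations (for a sample, n − 1 is used as the divisor, as in the formula below).

This is given by the formula-

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Example: A sample consisting of nine numbers (observations) is 7, 3, 4, 6, 1, 6, 7, 6, 5. The mean is 45/9 = 5 and the sum of the squared deviations from the mean is 32, so S² = 32/(9 − 1) = 4.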

Standard deviation

Definition: The standard deviation is the positive square root of the variance.

Example: Using the above result (S² = 4)

S.D = √4 = 2 (units)
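The variance and standard deviation of the nine-number sample can be verified with Python's standard `statistics` module. This is a small sketch: `variance` and `stdev` divide by n − 1 (sample formulas), while `pvariance` divides by n and is shown only for contrast.

```python
import statistics

sample = [7, 3, 4, 6, 1, 6, 7, 6, 5]  # nine observations from the variance example

sample_variance = statistics.variance(sample)  # divisor n - 1
sample_sd = statistics.stdev(sample)           # positive square root of the variance
population_variance = statistics.pvariance(sample)  # divisor n, for comparison

value_range = max(sample) - min(sample)        # highest value minus lowest value

print(sample_variance, sample_sd, population_variance, value_range)
```

The n − 1 divisor gives a slightly larger variance than the n divisor; the gap narrows as the sample grows.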

Calculation:
The modified formula is as follows-
$$S.D = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
The steps involved in calculating the standard deviation are:
(i) First of all, the deviation of each value from the arithmetic mean is taken, (x − x̄)
(ii) Then, each deviation is squared, (x − x̄)²
(iii) The squared deviations are added, Σ(x − x̄)²
(iv) The result is divided by the number of observations N (or by (n − 1) in case the sample size is less than 30)
(v) Then the square root is taken, which gives the standard deviation.

Example: The diastolic blood pressure of 10 individuals is: 83,75,81,79,71,95,75,77,84,90. From this
the standard deviation can be calculated as follows.

X          X − X̄      (X − X̄)²
83           2            4
75          −6           36
81           0            0
79          −2            4
71         −10          100
95          14          196
75          −6           36
77          −4           16
84           3            9
90           9           81
X̄ = 81     N = 10      Total = 482

$$S.D = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{482}{9}} = \sqrt{53.56} \approx 7.32$$
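The five calculation steps can be followed one by one in code on the diastolic blood pressure data. This is an illustrative sketch; the variable names are not from the original text.

```python
import math

dbp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]  # diastolic blood pressures
n = len(dbp)
mean = sum(dbp) / n

deviations = [x - mean for x in dbp]       # step (i): deviation from the mean
squared = [d ** 2 for d in deviations]     # step (ii): square each deviation
total = sum(squared)                       # step (iii): add the squared deviations
variance = total / (n - 1)                 # step (iv): divide by n - 1 (n < 30)
sd = math.sqrt(variance)                   # step (v): take the square root

print(mean, total, round(variance, 2), round(sd, 2))
```

Because deviations from the mean always sum to zero, squaring them before adding is what makes the measure informative.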

Applications and characteristics of S.D


The standard deviation is the most useful measure of dispersion. In certain circumstances, quantitative
probability statements that characterize a series, a sample of observations, or a total population can be
derived from the standard deviation of the series, sample, or population.

Coefficient of variation
Definition: The coefficient of variation is the ratio of the standard deviation of a series to the arithmetic mean of the series. The coefficient of variation is unitless and is expressed as a percentage.

Calculation: The coefficient of variation is calculated as:


$$CV\ (\%) = \frac{SD}{\bar{x}} \times 100$$
Where CV = the coefficient of variation, SD = the standard deviation and x̄ = the arithmetic mean

Example:
In a typical medical school, the mean weight of 100 fourth-year medical students is 140 lb, with a standard deviation of 28 lb. The coefficient of variation for weight is 28 lb divided by 140 lb, multiplied by 100, or 20%.
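Because the coefficient of variation is unitless, it allows the spread of variables measured in different units to be compared. A minimal sketch follows; the function name is illustrative, and the blood pressure figures (mean 81, SD 7.32) are reused from the standard deviation example purely for comparison.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = standard deviation / mean x 100."""
    return 100 * sd / mean


cv_weight = coefficient_of_variation(sd=28, mean=140)   # weight example, in lb
cv_dbp = coefficient_of_variation(sd=7.32, mean=81)     # blood pressure example

print(cv_weight, round(cv_dbp, 1))
```

On these figures, weight varies relatively more than diastolic blood pressure, even though the two are measured in different units.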

Probability distribution (PD)
 Population distribution of an observation
 Concerned with parameters

Types of probability distribution (PD)


1. Normal (Gaussian) distribution
2. Asymmetric or skewed
o Right skewed (positive skew)
o Left skewed (negative skew)
3. Log-normal distribution:
This is a skewed distribution when plotted using an arithmetic scale, but is a normal distribution
using a logarithmic scale.

4. Binomial Distribution:
This describes the probability distribution of possible outcomes from a series of data when there are:
a) Only two mutually exclusive outcomes, e.g. success or failure, boy or girl.
b) A known number of independent trials of an event, and the probability of an event or outcome is the same for all trials, e.g. the probability of male births in a family of six children, where the child’s sex is the outcome of the trial.

5. Poisson Distribution :
This describes the probability of occurrence of rare events in a large population. It represents a
limiting case of the binomial distribution, e.g. the probability of occurrence of a specific congenital
birth defect in a large number of births .

Characteristics of normal distribution of data
 Bell shaped
 Bilaterally symmetrical frequency curve
 Mean, median & mode coincide and represent the highest point in frequency distribution.
 About 50% values above & 50% values below the mean.
 Maximum values lie in the middle around the mean.
 Mean ± 1SD covers about 68% of observations.
 Mean ± 2SD covers about 95% of observations.
 Mean ± 3SD covers about 99.7% of observations.
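The coverage figures for the normal curve can be reproduced with `statistics.NormalDist` from the Python standard library. This is a small sketch on the standard normal distribution (mean 0, SD 1); the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Proportion of observations lying within mean +/- k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1sd = coverage(1)
within_2sd = coverage(2)
within_3sd = coverage(3)
z_for_95 = standard_normal.inv_cdf(0.975)  # multiplier that covers exactly 95%

print(round(within_1sd, 4), round(within_2sd, 4), round(within_3sd, 4))
print(round(z_for_95, 2))
```

The exact 95% multiplier is 1.96 rather than 2, which is why 1.96 appears in confidence interval formulas.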

Symmetric distribution: A distribution having the same shape on either side of the center

Skewed distribution:
 One whose shapes on either side of the center differ; a nonsymmetrical distribution.
 Can be positively or negatively skewed, or bimodal

[Figure: symmetric curve] Zero skewness: mean = median = mode

[Figure: right-skewed curve] Positively skewed: mean and median are to the right of the mode. Mean > median > mode

[Figure: left-skewed curve] Negatively skewed: mean and median are to the left of the mode. Mean < median < mode

DATA PRESENTATION
Methods of data presentation:
 Textual methods.
 Tabular method e.g. Frequency distribution table, Contingency table, Cross table.
 Graphical method
o Graphs –
o Use coordinate system of both x-axis & y-axis.
o Usually for quantitative data
o e.g. histogram, frequency polygon, frequency curve, line chart, scatter diagram, etc.
o Charts / diagrams –
o Use only one coordinate (x-axis or y-axis)
o Usually for qualitative data
o e.g. bar diagram (simple, multiple, component), pie diagram, pictogram, map diagram.

Parts of a table
 Table number (Arabic numerals).
 Title & subtitle (if any).
 Head note (if necessary) to clarify any term used.
 Column heading & row heading.


 Main body.
 Foot notes (if necessary): any special remark.
 Source (if necessary).

General principles of graphs / diagram


 Simple & self explanatory & pleasant.
 Comprehensive title at top or bottom.
 Method of classification (independent variable) on x-axis.
 Frequency (dependent variable) on y-axis.
 Scale divisions clearly indicated.
 Source (if secondary data).

 Blood pressure data on a sample of 113 men

Commonly used graphs:


 Histogram(block frequency diagram)
 Frequency polygon

 Frequency curve
 Cumulative frequency curve (Ogive)
 Line graph
 Scatter /dot/correlation diagram
 Dot plot
 Stem and leaf plot
 Box-plot (Box and whisker plot)

Commonly used chart:
 Bar chart or bar diagram
 Pie chart or pie diagram or circle diagram or sector diagram
 Pictogram
 Map/spot diagram


PRESENTATION OF QUANTITATIVE DATA

Histogram:
 Graphical presentation of frequency distribution
 Variable characters of different groups are indicated on the horizontal line (x-axis), called the abscissa
 No. of observations is marked on the vertical line (y-axis), called the ordinate
 Frequency of each group forms a rectangle (column)

Frequency Polygon:

 An area diagram of frequency distribution developed over a histogram
 Mid points of the class intervals at the height of frequency are joined by straight lines
 It gives a polygon, a figure with many angles
Frequency Curve:
 If the no. of observations is very large & the group interval is reduced
 Frequency polygon tends to lose its angulations
 Gives rise to a smooth curve → frequency curve
Line Chart or Graph:
 A frequency polygon presenting variation by line
 Shows trend of event occurring over a period of time
 Shows rise, fall or periodic fluctuations
 Vertical axis may not start from zero, but from some point above it
Cumulative Frequency Diagram or “Ogive”
 Graph of the cumulative frequency distribution

 An ordinary frequency distribution table→ relative frequency table
 Cumulative frequency: total no. of persons in each particular range from lowest value of the
characteristic up to & including any higher group value
Scatter or Dot Diagram:
 Prepared after tabulation in which frequencies of at least two variables have been cross
classified
 Shows nature of correlation between two variable characters in the same person(s) (e.g., height & weight)

 Also called correlation diagram



PRESENTATION OF QUALITATIVE DATA

Bar Diagram:
 Graphically present frequencies of different categories of qualitative data
 Vertical/ horizontal
 May be descending/ascending order
 Widths should be equal

 Spacing between bars should also be equal

Simple Bar Diagram:


 Each bar represents frequency of a single category with a distinct gap from one another

Multiple bar diagram:
 Used to show comparison of two or more sets of related statistical data
Component/ proportional bar diagram:
 Used to compare sizes of different component parts among themselves

 Also shows relation between each part & the whole



Pie / sector Diagram:
 A circle whose area is divided into different segments by straight lines from the centre to the circumference
 Each segment expresses proportional components of the attributes
 Angle (°) of a sector is calculated by
 Class frequency (as %) × 3.6 or
 (Class frequency / total frequency) × 360
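The sector-angle rule is easy to apply in code. This is a minimal sketch; the function name and the blood-group frequencies are hypothetical and used only for illustration.

```python
def sector_angles(frequencies):
    """Angle of each sector = (class frequency / total frequency) x 360."""
    total = sum(frequencies.values())
    return {label: count * 360 / total for label, count in frequencies.items()}


# Hypothetical class frequencies (not from the original notes).
blood_groups = {"A": 25, "B": 35, "AB": 10, "O": 30}
angles = sector_angles(blood_groups)

for label, angle in angles.items():
    print(label, angle)
```

Since the total frequency here is 100, each frequency is also a percentage, so multiplying by 3.6 gives the same angles.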

Pictogram / Picture Diagram:
 A popular method to denote the frequency of the occurrence of events to the common man, such as attacks, deaths, number operated, admitted, discharged, accidents, etc. in a population

Map diagram/ spot Map:


 These diagrams are prepared to visualize the geographic distribution of frequency of
characteristics
 One point denotes occurrence of one or more events

Inferential statistics
 Conclusion about the population parameter
based on sample statistics
 Generalization about target population
based on sample result

Methods of inferential statistics
o Estimation (Estimates from sample)
o Hypothesis testing
Estimates from sample
o Estimation (Method of statistical inference): Use of statistic to estimate parameter
o Point estimate: sample statistic
o Interval estimate (CI): Interval around point estimate which is expected to contain
parameter with a certain confidence level
 95% CI = mean (m) or proportion (p) ± 1.96 SE.
 99% CI = mean (m) or proportion (p) ± 2.58 SE.
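The interval estimate can be worked through numerically. The sketch below assumes the usual standard error formulas, SE = SD/√n for a mean and SE = √(p(1 − p)/n) for a proportion, which the notes do not spell out here. The figures reuse the weight example (100 students, mean 140 lb, SD 28 lb); the proportion example is hypothetical.

```python
import math


def ci_for_mean(mean, sd, n, z=1.96):
    """Confidence interval for a mean: m +/- z x SE, with SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_for_proportion(p, n, z=1.96):
    """Confidence interval for a proportion: p +/- z x SE, with SE = sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


ci95 = ci_for_mean(mean=140, sd=28, n=100)           # 95% CI
ci99 = ci_for_mean(mean=140, sd=28, n=100, z=2.58)   # 99% CI
ci95_prop = ci_for_proportion(p=0.20, n=400)         # hypothetical: 20% of 400

print([round(v, 2) for v in ci95])
print([round(v, 2) for v in ci99])
print([round(v, 3) for v in ci95_prop])
```

A higher confidence level gives a wider interval for the same data.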

HYPOTHESIS
Def: Tentative & testable explanation of a research question arising out of an observation

Null hypothesis (H0): hypothesis of no difference. There is no difference between two statistics or between a statistic & a parameter.

Alternate hypothesis (HA): hypothesis of difference. There is a difference between two statistics or between a statistic & a parameter.

Hypothesis test: An approach to statistical inference resulting in a decision to reject or not to reject H0.


PROBABILITY & ‘P’ VALUE

Probability: The chance that an event occurs by chance or sampling error.

P-value: Quantitative estimate of probability.

Statistical significance:
Significant – Unlikely to occur by chance
Not significant – Likely to occur by chance

Level of significance:
Point of demarcation between occurrence by chance and not by chance for an observation.

Hypothesis testing

Method of Statistical Inference
 State the Ho.
 Decide on appropriate statistical test and calculate test statistic and p-value.
 Select the level of significance (α-value) e.g. 0.05, 0.01, 0.001.
 Interpretation.
 If p < α → Ho rejected, HA accepted (significant).
 If p > α → Ho retained (not significant).
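The decision rule can be sketched as follows. The example assumes a two-tailed z test whose test statistic has already been calculated; the value z = 2.5 is hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P-value of a two-tailed z test for the test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p, alpha=0.05):
    """Compare the p-value with the chosen level of significance."""
    if p < alpha:
        return "H0 rejected (significant)"
    return "H0 retained (not significant)"


p = two_sided_p_value(2.5)          # hypothetical test statistic
decision_5pct = decide(p, alpha=0.05)
decision_1pct = decide(p, alpha=0.01)

print(round(p, 4), decision_5pct, decision_1pct)
```

The same result can be significant at one level of significance and not at a stricter one, which is why α is fixed before interpreting the p-value.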

Errors of hypothesis testing


 Type I (α):
o incorrect rejection of H0
o false positive
o a significant difference is reported when none exists

 Type II (β):
o incorrect acceptance of H0
o false negative
o a real significant difference is missed

To reduce errors:
o sample size should be adequate

o confounder should be controlled


o bias should be checked

Analogy of errors of hypothesis test :

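Case: a person accused of assassination stands before a judge.
H0 imposed: the accused is presumed not guilty.
Basis of the judge's decision: witness evidence and the law (≡ sample information).

Judge's decision                 Accused guilty          Accused not guilty
Punished (H0 rejected)           Correct (OK)            Incorrect (α-error)
Not punished (H0 accepted)       Incorrect (β-error)     Correct (OK)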

Power of test (1 − β)

 Ability to detect true difference.
 Ability of correct rejection of H0
 Ability to detect significance when result really significant.
 β ≡ FN (failure to detect real difference)
 1 − β ≡ Ability to detect real difference.
 At 5% level of significance (0.05), Power is 80% (0.8) when β = 0.2

Statistical tests of significance


 One tail test: Effect of drug A >B or A<B
 Two tail test: Effect of drug A is different from that of B

PARAMETRIC TEST & NON-PARAMETRIC TEST

 Parametric test: Quantitative data with normal distribution


 T – test
 F – test (ANOVA)
 Pearson’s correlation coefficient test

 Non-parametric test: Qualitative data or quantitative data with skewed distribution,
 χ²-test,
 Fisher’s exact test,
 Spearman’s rank correlation test,
 proportion test
 Logistic regression etc

For qualitative data:

o χ²-test, Fisher’s exact test
o Proportion test.
o Spearman’s rank correlation test

For quantitative data:

o Student’s t-test.
o ANOVA (F-test)
o MWU test and WRS test: alternatives to the t & Z tests if the distribution is asymmetric.
o Pearson’s correlation coefficient test
HYPOTHESIS TESTING

 Univariate analysis :
o It involves one variable.
o e.g. t-test, χ²-test, ANOVA etc.

 Bivariate Analysis:

o It involves two variables together.
o e.g. simple regression test, correlation analysis etc.

 Multivariate analysis:
o It involves many independent and dependent variables.
o e.g. multiple regression, logistic regression etc

Assessment of association between variables

o Descriptive tool
 Comparative bar diagram – if qualitative data.
 Scatter diagram – if quantitative data.

o Inferential tool
 χ²-test, Correlation, Regression

STUDENT’S ‘T’ TEST

Paired & Unpaired


Preconditions:
o Random sampling
o Quantitative data.
o Normal distribution

o Comparison between two means.

ANOVA
Preconditions:
o Quantitative data
o Unpaired design
o Comparison among the means of > 2 groups.

CHI-SQUARE TEST

Preconditions:
o Qualitative data of bivariate cross table.
o Association between two variables.
o Comparison between two variables.

Proportion Test (z-test)


Preconditions
o Qualitative data.
o Compare percentage/ proportion between two groups.
o Unpaired design

Pearson’s Correlation Coefficient test


Preconditions:
o Assess nature (positive or negative) of association between two variables
o Assess strength of association between two variables.
o Both variables quantitative.
o One or both variables normally distributed.
o It is interpreted by the correlation coefficient (‘r’ value)
o A negative ‘r’ value indicates negative association and a positive ‘r’ value indicates positive association


Spearman’s rank correlation test

Preconditions:
o Alternative to Pearson’s correlation test
o Two ordinal data.
o One ordinal & one numerical data
o Both quantitative data but in skewed distribution

Correlation
Describes the strength of the linear relationship between variables and is denoted by the correlation
coefficient (r) or Pearson’s product moment correlation coefficient (a parametric test)
 Its value can range from – 1 to +1.
 A correlation coefficient may be strong but statistically non-significant because of sample size.
The statistical significance of the correlation coefficient is based on the associated p value.

 Assumes that one or both variables are normally distributed (i.e. parametric correlation)

Correlation coefficient (r)      Degree of association
0.8 to 1.0                       Strong
0.5 to 0.8                       Moderate
0.2 to 0.5                       Weak
0 to 0.2                         Negligible

 Scattergrams show the relationship between X and Y, e.g. height and weight (Fig. 7.9)
 Spearman’s and Kendall’s rank correlation coefficients are the non-parametric alternatives to Pearson’s correlation coefficient.
[Figure: three scattergrams showing high +ve correlation with r close to +1, zero correlation, and high –ve correlation with r close to –1]
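Pearson's correlation coefficient can be computed directly from its product-moment definition and then graded with the table above. This is a minimal sketch: the height and weight values are hypothetical, the function names are illustrative, and assigning the shared boundary values (0.2, 0.5, 0.8) to the higher grade is an assumption, since the table's ranges overlap at their edges.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def degree_of_association(r):
    """Grade |r| using the table; boundary values go to the higher grade (assumption)."""
    size = abs(r)
    if size >= 0.8:
        return "Strong"
    if size >= 0.5:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Negligible"


heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [50, 58, 69, 77, 90]        # hypothetical data
r = pearson_r(heights_cm, weights_kg)

print(round(r, 3), degree_of_association(r))
```

The sign of r gives the direction of the association and its size gives the strength; statistical significance still has to be judged from the associated p value.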

Regression
 Describes the relationship between two variables, and how one value varies depending on the value of another, e.g. the incidence of myocardial infarction and number of cigarettes smoked per day.
 Mainly used when there is one measured dependent variable and one or more independent variables.
 A regression line is a line that minimizes the sum of the squares of the vertical distances to the line of each data point, i.e. ‘least squares regression’ (Fig. 1.10)
 Used when the main purpose is to develop a predictive model, i.e. to predict Y for a given value of X, using the equation: Y = a + bX.
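A least squares line Y = a + bX can be fitted by hand with the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. These formulas are not stated in the notes, and the data below are hypothetical values chosen to lie exactly on a line.

```python
def least_squares(xs, ys):
    """Return intercept a and slope b of the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # regression coefficient (slope)
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, x):
    """Predicted Y for a given value of X."""
    return a + b * x


xs = [0, 1, 2, 3]       # hypothetical independent variable
ys = [2, 5, 8, 11]      # hypothetical dependent variable
a, b = least_squares(xs, ys)

print(a, b, predict(a, b, 4))
```

Unlike r, the slope b is expressed in the units of the outcome variable per unit of X and is not limited to the range −1 to +1.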
 As with the correlation coefficient (r), a slope of 0 represents no relationship between the variables. But a regression coefficient can vary between −∞ and +∞, and is expressed in the same units as the outcome variable.
 Two regression techniques: multiple linear regression and logistic regression.
 Multiple linear regression predicts a single dependent or response variable using a number of independent variables, e.g. blood pressure predicted by weight, age, smoking and family history. It can only be used for normally distributed data.
 Logistic regression is used to predict the probability of a binary outcome occurring, e.g. breast cancer/no cancer, using several predictor or explanatory variables, e.g. age, family history. It is often used to assess odds ratios in case-control studies and allows for the correction of multiple potential confounding factors.


PERFORMANCE OF A DIAGNOSTIC TEST

The following measurements of a test help us take the decision which test to be
used in a particular context:
 Sensitivity
 Specificity
 Positive predictive value

 Negative predictive value

                         Gold standard
Test result        Disease present     Disease absent      Total
Positive           a (TP)              b (FP)              a+b
Negative           c (FN)              d (TN)              c+d
Total              a+c                 b+d                 a+b+c+d

Sensitivity:
The ability of the test to identify correctly those who have the disease
Sensitivity = TP / (TP + FN) = a / (a + c)

Specificity:
The ability of the test to identify correctly those who do not have the disease
Specificity = TN / (TN + FP) = d / (b + d)

Positive predictive value:
The proportion of patients who test positive who actually have the disease
PPV = TP / (TP + FP) = a / (a + b)

Negative predictive value:
The proportion of patients who test negative who are actually free of the disease
NPV = TN / (TN + FN) = d / (c + d)

Must know:
 A highly sensitive test (few false negatives) is used in case of serious disease (do not want to miss a case), diseases having potential for person-to-person transmission, and when the subsequent diagnostic test is low risk and low cost
 A highly specific test (few false positives) is used when the subsequent diagnostic test is high risk and high cost, or when the psychological burden on the individual is high
 Sensitivity and specificity are fixed characteristics of the test but PPV and NPV are not
 NPV and PPV are determined by both the test characteristics (sensitivity and specificity) and
prevalence of the disease

Screening test:

Applied on apparently healthy population to identify disease before producing symptom

Diagnostic test: Applied to person with symptoms to confirm the presence of disease

Principles of screening (WHO):

 The condition should be an important health problem
 There should be a treatment for the health problem
 Facilities for diagnosis and treatment should be available
 There should be a latent stage of the disease
 There should be a test or examination for the disease
 The test should be acceptable to the population
 The natural history of the disease should be adequately understood
 There should be an agreed policy on who to treat
 The total cost of finding a case should be economically balanced in relation to medical expenditure as a whole
 Case finding should be a continuous process, not just a “once and for all” project

Characteristics of ideal screening test:




 Simple
 Rapid
 Inexpensive
 Safe
 Acceptable
 High reliability
 High validity

Reliability:
The capacity of a test to give the same result on repeated applications of the test in the same person with a given level of disease, whether correct or incorrect (does it always tell the same thing?)

Validity:
The capacity of the test to identify the condition correctly (can it tell the truth, is it saying the truth?).
The sensitivity and specificity are measures of validity of a test.

Previous Years Question


1. Age & Sex of a population is better described by- (Paediatrics, Jan’19)
a) Bar chart
b) Population pyramid
c) Histogram
d) Sector diagram
e) Pictogram
Ans: D
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-65]

2. Quantitative variables include (Paediatrics, Jan’19)
a) Weight
b) Age
c) Sex
d) Religion
e) Occupation
TTFFF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-08]

SBA 3. 3,2,2,5,7 sample- (Paediatrics, Jan’19)
a) Mean is 3
b) Mode is 2
c) Median is 3
d) Range is 1-10
e) SD is 2
Ans: B
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-164]

4. Qualitative data presented by (Paediatrics, July’16)
a) Bar diagram
b) Pictogram
c) Histogram
d) Scatter diagram
e) Frequency polygon
TTFFF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-62]

SBA 5. Standard deviation affected by (Paediatrics, July’16)
a) Mean
b) Median
c) Sample size
d) Mode
e) Class interval
Ans: A
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-172]

6. Central tendency includes (Paediatrics, Jan’19)
a) Percentile
b) Range
c) Standard deviation
d) Median
e) Mode
FFFTT
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-164]

SBA 7. If you are told to perform a study in chikungunya affected children of Dhaka city, which study design can be done (Paediatrics, Jan’19)
a) Case control
b) Cohort
c) Cross sectional
d) RCT
e) Experimental study
Ans: B
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-83]

SBA 8. Which includes Helsinki declaration (Paediatrics, July’18)
a) Human trial
b) Animal trial
c) War criminals
d) Both Human and Animal trial
e) Only observational study
Ans: A
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-123]
SBA 9. Placebo is known to the patient in which clinical trial (Paediatrics, July’18)
a) Open trial
b) Single blind trial
c) Double blind trial
d) Triple blind trial
e) None
Ans: A
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-101]

10. Which is not the principles of clinical ethics (Paediatrics, July’18)
a) Respect for persons and their autonomy
b) Beneficence
c) Maleficence
d) Justice
e) Accountability
TTFTF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-122]

SBA 11. Effective conversation does not include (Paediatrics, July’18)
a) Clear speaking
b) Don’t interrupt
c) Use jargon
d) Ask open question
e) Reassurance
Ans: C

12. Observational study (Surgery, Jan’19)
a) Longitudinal prospective study
b) Less ethical constraint
c) Less bias
d) Case control is best
e) Researcher intervenes and measures
TTTFF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-71]

13. Cross sectional study (Surgery, July’17)
a) Prevalence study
b) Retrospective study
c) Case control study
d) Observational study
e) Prospective study
Ans: A
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-89]

SBA 14. 2 groups of people with their blood cholesterol level, sample size is 2000; what test do you do to compare whether they are significantly different or not? (Obs. & Gynae, July’17)
a) T test
b) F test
c) Correlation analysis
d) Regression analysis
e) Z test
Ans: E
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-269]

15. Chi-square test (Obs. & Gynae, July’19)
a) Non parametric
b) Population variation
c) Exposure, affect difference/ compare
d) No mean, median, mode
e) Frequency more than 30
TTTTF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-283]

SBA 16. 2 drugs are tested upon two groups of people, if sample size is less than 300. Then which test is done in this study? (Obs. & Gynae, Jan’18)
a) Paired T test
b) Unpaired T test
c) Z test
d) Chi-square Test
e) F Test
Ans: A
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-265]

17. Sample size <30...... (Obs & Gynae, Jan’18)
a) Paired t test
b) Z test
c) Unpaired t test
d) ANOVA test
e) Chi-square test
TFTFT
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-285]
18. Cross sectional study (Obs. & Gynae, Jan’19)
a) Different group of people may involve
b) More accurate study
c) Pictorial by diagram
d) May not provide prevalence information
e) Data collection longitudinal
TF F(?) FF
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-89]

SBA 19. 7,3,2,2,1- (Obs. & Gynae, July’17)
a) Mean = 3
b) Median = 3
c) Mode = 2
d) Range = 4 - 7
e) SD is 2
Ans: C
[Ref: ABC of Research Methodology & Biostatistics/3rd/P-164]

Jan 2019
20. Following are true about Normal Distribution curve (FCPS - Medicine - 19Ja)
a) Plotted using an arithmetic scale
b) Bilaterally symmetrical
c) 50% values above the mean
d) ± 2SD covers 90% populations
e) Mean median mode coincide
F T T F T [Ref: Lecture of Prof. Mozammel Hoque Sir]

21. Regarding study design - (FCPS - Medicine - 19 Ja)
a) Scientific way to harvest valid and reliable information
b) Case control study is observational study
c) Cohort study is descriptive study
d) RCT is the strongest study design
e) Clinical trial is experimental study
T TFTT

22. Case control study is (FCPS - Medicine – 19Ja)
a) Analytical study
b) Randomized CT
c) Interventional study
d) Efficient for rare disease
e) Less expensive
TFFTT

July - 2018
23. True about mean?
a) Easy to measure
b) Not affected by extreme value
c) Applicable in skewed data
d) Separating the higher half from lower half of a data
e) Most stable measure of central tendency
TFFFT

July-17
24. Lyon's hypothesis
a) Explains phenotype effect of the X chromosome in same mammalian female.
b) Male has only one X chromosome
c) One female X chromosome inactivated in embryonic development
d) Lyon was a British geneticist
e) Hypothesis given on autosome
TTTTT

January-16
25. Chi square
a)
b)
c)
d)
e)

26. Meta analysis –
a)
b)
c)
d)
e)

July-13
27. Method for presentation of quantitative continuous numerical data:
a) Histogram
b) Line chart
c) Bar diagram
d) Frequency polygon
e) Scatter diagram
TTFTT [Ref: Lecture of Prof. Mozammel Hoque Sir]
28. There is a pre-requisite that the population must have a normal distribution for:
a) Chi squared test
b) Variance estimation
c) Estimation of Error of the mean
d) Kruskal-Wallis rank test
e) t-test
FTFFT [Ref: Lecture of Prof. Mozammel Hoque Sir]

29. Measures of central tendency are:
a) Median
b) Mean
c) Mode
d) Range
e) Standard deviation
TTTFF

30. Pie chart:
a) Also called pie diagram
b) 360° rotation
c) Compares between two different types of value
d) Also called sector diagram
e) 1% means 3.6° rotation
TTTTT

31. Standard deviation:
a) Most important measure of dispersion
b) Associated with all values
c) Low standard deviation values close to the mean
d) High standard deviation value spread out over a wider range
e) Expressed in the different unit from data
TFTTF

32. Regarding standard error:
a) Type I: Wrong refusal of null hypothesis
b) Type II: acceptance of null hypothesis
c) Occurs when more data are indicated
d)
e)
TTT

January 2011
33. Quantitative data of continuous type graphically represented by:
a) Pie chart
b) Histogram
c) Frequency polygon
d) Line chart
e) Ogive chart
FTTTT

34. Non-parametric test include:
a) Chi-square test
b) Mann-Whitney U test
c) Kruskal-Wallis (KW) test
d) Anova test
e) Rank correlation test
TTTFT [Ref: Lecture of Prof. Mozammel Hoque Sir]

35. In student's 't' test
a) Variable should be qualitative
b) Number of variable > 30
c) Variable distributed normally
d) Random sample used
e) Comparison between two means
FFTTT [Ref: Lecture of Prof. Mozammel Hoque Sir]

January 2013
36. Typical Experimental study includes:
a) Hospital trials
b) Cohort study
c) Clinical trial
d) Field trial
e) Community trial
TFTTT

January 2010
37. Histogram:
a)
b)
c)
d)
e)

38. Chi square test:
a)
b)
c)
d)
e)

January 2009
39. Line diagram:
a) Time dependent
b) Variable dependent
c) Always start from zero
d) Show periodic fluctuations over vertical axis
e) Called correlation diagram
TFFTF
July - 2008
40. Non-parametric test include:
a) Mann Whitney U test
b) Wilcoxon signed rank test
c) Pearson correlation test
d) T test
e) ANOVA
TTFFF

41. What are the properties of median?
a) Less affected by extreme values
b) Applicable for ordinal data
c) Can be obtained from qualitative data
d) Not easy to understand
e) May not be an actual value
TTFFT

42. What are the features of double blind clinical trial?
a) Both patient & doctor are unaware of the effect
b) Patient, doctor & pharmacist are unaware of the effect
c) Placebo effect could be present
d) Patient could not know anything regarding trial
e) Involves human volunteers
TFTFT

43. Which of the following indicates highly significant test:
a) 0.1
b) 0.01
c) 0.05
d) Less than 0.05
e) 1
B

January-07
44. Normal curve:
a) Pyramid shape
b) Dome shape
c) Bell shaped
d) Mean, median & mode coincide
e) Bilaterally symmetrical
FFTTT [Ref: Lecture of Prof. Mozammel Hoque Sir]

45. Measurement of quantitative value by:
a) Pictogram
b) Spot Graph
c) Ogive
d) Frequency curve
e) Pie diagram
FFTTF

46. Double blind clinical trial:
a) Patient receives placebo
b) Patient doesn't know about trial
c) Patient receives either one of the drug
d) Assessor knows about the drug
e) Investigators do not know about the drug they use
TFTFT

47. Followings are the significance tests for Quantitative variable:
a) Chi-squared test
b) z test
c) t test
d) Correlation
e) Wilcoxon signed rank test
FTTTF [Ref: Lecture of Prof. Mozammel Hoque Sir]

Previous Years’ Questions:


Medicine 6. What are the non-parametric tests (FCPS-
1. Square root of variance (FCPS-July-2011) July-2008)
a) Mean a) Mann whitney test
b) Variation b) Wilcoxon rank sum test
c) Standard deviation c) Pearson correlation test
d) Median d) T test

IS
e) Accuracy TTFF
FFTFF
7. What are the properties of median- (FCPS-
2. Quantitative data of continuous type July-2008)
graphically represented by – (FCPS-Jan-2011,09) a) Not affected by external values
a) Pie chart b) Applicable for nominal/ordinal data
b) Histogram c) Can be obtained from qualitative data
c) Frequency polygon d) Not easy to understand

S
d) Line chart TTFF
e) Ogive chart
FTTFF 8. What are the features of double blind clinical
trial? (FCPS-July-2008)
3. Non-parametric test include: (FCPS-Jan-2011) a) Both patient & doctor are unaware of the effect
a) Chi-square test b) Patient, doctor & pharmacist are unaware of the effect
b) Mann-Whitney test c) Placebo effect could be present
c) Wilcoxon rank sum test TFF
d) ANOVA test

e) Wilcoxon signed rank test 9. Which of the following indicates significant
TTTTT test (FCPS-July-2008)
a) 0.1
4. In Student’s t-test: (FCPS-Jan-2011) b) 0.01
a) Variable should be qualitative c) 0.05
b) Number of sample>30 d) Less than 0.05
c) Variables distributed normally FTFT
d) Random sample used
FTTT 10. Normal curve- (FCPS-Jan-2007)
a) Pyramid shape

5. Typical experimental study include: (FCPS- b) Dome shape


Jan-2011) c) Bell shaped
a) Hospital trials d) Mean, median & mode coincide
b) Cohort study FFTT
c) Clinical trial

d) Field trial 11. Measurement of quantitative value by-(FCPS-Jan-


e) Community trial 2007)
TFTTT a) Pictogram
b) Spot map
c) Histogram
d) Line graph
FFTT

12. Double blind clinical trial-(FCPS-Jan-2007) 18. Followings are the significance tests for
a) Patient receives placebo quantitative variable-(FCPS-05,Ju)
b) Patient doesn’t know about the trial a) Chi-squared test
c) Patient receives either one of the drugs b) z test
d) Assessor knows about the drug c) t test
e) Investigators do not know about the drug they d) Correlation
use e) Sign test
TTTFF FTTFF

13. Variable means -(FCPS-Jan-2007) 19. Followings are considered for frequency
a) Mean deviation distribution table (FCPS-05, Ju)
b) Measurement of proportion a) Group interval should be too broad not too
c) Difference between two proportions narrow
d) A trait that is measurable b) Number of groups should be between 6 & 16
FFFT depending on size of sample
c) Class interval should be different
14. Accuracy of a large population group d) Groups should be tabulated in ascending or

depends on (FCPS-Jan-2007) descending order
a) Sample e) Data omitted deliberately, reason should not be given
b) Population FFFTF
TF
20. Square root of variance is: (FCPS-04,Ju)
15. ST depression on the ECG measured in mm a) Standard deviation
is -(FCPS-05,Ju) b) Standard error

a) Categorical type of data c) Coefficient of variation
b) Ordinal type of data d) Square root of SD
c) Qualitative data e) Mean
d) Quantitative type of data TFFFF
e) Measured in ratio scale
FFFTF 21. Scales used in biostatistics: (FCPS-04,Ju)
a) Normal scale
16. The quality of measured data is expressed in b) Ratio scale
terms of -(FCPS-05,Ju) c) Continuous scale
a) Validity d) Intervening scale
b) Reliability e) Interval scale
c) Feasibility FTFFT

d) Precision
e) Sensitivity 22. In student’s test: (FCPS-04,Ju)
TTFTF a) Variable should be qualitative
b) Number of sample>30
17. The major types of probability sampling are c) Variables distributed normally

-(FCPS-05,Ju) d) Random sample used


a) Simple random sampling FTTT
b) Quota sampling 23. Measures of dispersion are: (FCPS-04,Ju)
c) Stratified sampling a) SD
d) Systematic sampling b) Range
e) Purposive sampling c) Median
TFTTF d) Co-efficient of variation
e) Harmonic mean
TTFTF

24. Graphical presentation of quantitative data SURGERY
are: (FCPS-04,Ju) 30. P value <0.5 means (FCPS-2011,Ju)
a) Frequency polygon a) Low probability of the result to occur under null
b) Histogram hypothesis
c) Scatter diagram b) Result is unlikely to occur out of sampling error
d) Multiple bar diagram c) Null hypothesis is rejected
e) Pie chart d) Result is not significant
TTTFF e) Result is likely to occur by chance
25. In T-test for statistical analysis-(FCPS-04,Ja) FFFTT

a) Data is qualitative
b) Sample is random 31. Cohort study-(FCPS-2011,Ju)
c) Sample size is more than 30 a) Longitudinal
d) Variable is normally distributed b) Interventional
e) Significant difference is shown between two c) Analytical
means d) Rare disease
FTTTF TFTF

26. Following are the measures of variability of 32. Random sampling -(FCPS-2007,Ju)
sample -(FCPS-04,Ja) a) It is biased
a) Coefficient of variation b) It is more reliable
b) Standard deviation c) Selected object may be included
c) Inter quartile range FTF
d) Standard error of proportion
e) Mean deviation 33. Quantitative variable includes -(FCPS-
TTTTT
2007,Ju)
a) Body weight
27. Measure of dispersion include- (FCPS-04,Ja) b) Hospital pt. number
a) Range c) Sex
b) Mode d) Emotion
c) Variation e) Age
d) Standard error TTFFT
e) Quantitative deviation
TFFTF 34. In double blind clinical trial -(FCPS-2007,Ja)
a) Both the patients and the investigators should
28. Scales of measurement include- (FCPS-04,Ja) know about the placebo and drug given
a) Normal scale b) All the patients will take placebo and the drug

b) Ratio scale c) Placebo has the similarity to active drug


c) Proportion scale d) One group will receive placebo
d) Interval scale e) Patients must be allocated on random basis
FTFT FFTTT

29. Z-test in statistical analysis - (FCPS-04, Ja) 35. The followings are the examples of non
a) Shows significant difference between two mean random sampling- (FCPS-2007,Ja)
b) It is done when sample size more than 30 a) Convergence sampling
c) Degree freedom needed to interpret it b) Quota sampling
d) Yates correction is sometimes needed c) Purposive sampling
e) Qualitative data is a precondition d) Cluster sampling
FTFFT e) Snowball sampling
FTTFT

36. Clinical trial (FCPS-2006,Ju) 42. Diagrammatic data presented by (FCPS-
a) Includes only healthy individuals 09,Ja)
b) Includes both healthy and diseased individual a) Histogram
c) There should be always a placebo b) Frequency diagram
d) Those who are undergoing trial should not know c) Bar diagram
anything about it d) Pictogram
e) Informed written consent is taken e) Scatter diagram
FTTFT FFTTF
43. Standard deviation (FCPS-09, Ja)

37. Population (FCPS-2006,Ju) a) Measures of dispersion from the mean value
a) Indicates only human b) Very simple
b) Can be any object of nature c) Square root of the variance
c) Is what we study d) Most commonly used measure of spread
d) Is a part of mass e) Relative measure of dispersion
e) In research which is studied TTTTF
FTTTT
44. Standard error- (FCPS-08,Ju)

38. Random sampling - (FCPS-2006,Ju) a) Measured by standard deviation
a) Mean or average b) Increase in sample size increases standard error
b) Mean deviation TF
c) Geometric mean
d) Co-efficient 45. Central tendency (FCPS-07,Ju)
e) Correlation co-efficient a) Mean
FTFTT b) Median
39. Clinical trial - (FCPS-2003,Ju)
c) Mode
d) Range
a) Include only healthy individuals e) Percentile
b) Include both healthy & diseased individuals TTTFF
c) There should be always a placebo
d) Those undergoing trial should not know 46. Infamous conduct (FCPS-07,Ju)
anything about it a) Unlawful abortion
e) Informed written consent is a must b) Issuing false certificate
FTFFT c) Political involvement
d) Drug addiction
40. Wt. & B.P. measure using which (FCPS- e) Relation with drug manufacturers
2003,Ju) TTFTF

a) Regression co-efficient
b) Chi-square test 47. Standard error (FCPS-07,Ju)
c) Correlation a) Measured by standard deviation
FFT b) Increase in sample size increases standard error
TF
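
Questions 44 and 47 turn on how the standard error relates to the standard deviation and the sample size. A minimal sketch of the usual formula for the standard error of the mean, SE = SD / sqrt(n), is given here. The helper name `standard_error` and the numbers used are hypothetical, for illustration only.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


sd = 5.0  # hypothetical standard deviation, for illustration only
for n in (25, 100, 400):
    # With the SD held fixed, a larger sample gives a smaller standard error.
    print(n, standard_error(sd, n))
```

Under this formula the standard error is computed from the standard deviation and falls as the sample size rises, which agrees with the TF answer pattern given for both questions.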

PAEDIATRICS
41. Probability sampling (FCPS-09,Ja) 48. Behavioral changes (FCPS-07,Ju)
a) Quota sampling a) Attitude
b) Simple sampling b) Advocacy
c) Systematic sampling c) Knowledge
d) Stratified sampling d) Acceptance
e) Multi-phase sampling e) Reinforcement
FTTTT TFTTF

49. Criteria for test performances (FCPS-07,Ju) 55. Following are the continuous variables
a) Specificity (FCPS-06,Ja)
b) Sensitivity a) Colour of people’s hair
c) Predictive value b) Religion of people
d) False positive test c) Annual rainfall in Bangladesh in 2004
TTTF d) The political party people vote for in an election
e) Relative humidity recorded at a given location
50. Quantitative variable (FCPS-07,Ja) FFTFT
a) Weight

b) Age 56. Probability (P) value (FCPS-05,Ju)
c) Sex a) Chance of occurrence by chance
d) Marital status b) Value 0-1
e) Height c) Value <0.5
TTFFT d) Value is <0.1
TTFF
51. “Diagnostic test” criteria (FCPS-07,Ja)
a) Specificity 57. Mean (FCPS-05,Ju)

b) Sensitivity a) Arithmetic average
c) Least negative chance b) Ex
d) Attributable population risk c) Alternative of bimodal
e) False positive readily occurs TFF
TTFFF
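
The test-performance criteria named in questions 49 and 51 (sensitivity, specificity, predictive value) are all ratios taken from a 2x2 table of test result against true disease status. The sketch below shows the standard definitions. The four counts are hypothetical and are not taken from any question in this set.

```python
# Hypothetical 2x2 table counts, for illustration only.
tp = 90    # diseased, test positive (true positives)
fn = 10    # diseased, test negative (false negatives)
fp = 20    # not diseased, test positive (false positives)
tn = 180   # not diseased, test negative (true negatives)

sensitivity = tp / (tp + fn)  # proportion of diseased people who test positive
specificity = tn / (tn + fp)  # proportion of non-diseased people who test negative
ppv = tp / (tp + fp)          # positive predictive value
npv = tn / (tn + fn)          # negative predictive value

print(sensitivity, specificity, ppv, npv)
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.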
OBS. & GYNAE
52. Cross sectional study features (FCPS-07,Ja)
a) Preventive study
b) Cumulative incidence
58. Central tendency means (FCPS-2010,Ja)
a) Discrete distribution
c) Incidence can be done b) Continuous distribution
FFT c) Height is the measurement of it
d) Helps in comparative study
53. Random sampling includes (FCPS-06,Ja) FFFT
a) Quota sampling
b) Purposive sampling 59. Case control study (FCPS-2008,Ja)
c) Cluster sampling a) Done in case of rare disease
d) Systematic sampling b) Prospective study
e) Stratified sampling c) Used in hypothesis
FFTTT TFF

54. Standard deviation (FCPS-06,Ja) 60. Measures of dispersion (FCPS-2007,Ju)


a) Means standard error a) Standard deviation
b) Is not commonly used in statistical analysis b) Median
c) Is denoted as square root of the variance c) Arithmetic mean

d) Is more acceptable than mean deviation d) Mean


e) Is a measure of dispersion from the mean e) Co-efficient of dispersion
FFTTT TFFFT

61. Analytical study (FCPS-2007,Ju) 68. Indicator of prevalence (FCPS-2006, Ja)
a) Cohort study a) Infant mortality
b) Case control study b) Maternal mortality
c) Cross observation c) Expectation of life
d) Ecological study d) Increase of fertility
e) Experimental case control study FFFF
TTFFT
62. Which graph is frequently used in 69. 3,1,7,2,2-(FCPS-2005,Ju)
quantitative study (FCPS-2007,Ju) a) Mean-3

a) Dot diagram b) Mode-2
b) Bar diagram c) Median-7
c) Histogram d) Range-6
d) Pictogram e) Standard deviation -0/9
e) Frequency polygon TTFTF
TTTTT
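
The summary statistics for the data in question 69 (3, 1, 7, 2, 2) can be worked out directly. A minimal sketch with Python's standard `statistics` module follows. The use of the sample (n - 1) form of the standard deviation is an assumption, since the question does not say which form is intended.

```python
import statistics

data = [3, 1, 7, 2, 2]

mean = statistics.mean(data)        # sum 15 divided by 5 observations
median = statistics.median(data)    # middle value of the sorted data 1, 2, 2, 3, 7
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest value minus smallest value
sd = statistics.stdev(data)         # sample standard deviation (n - 1 denominator)

print(mean, median, mode, data_range, round(sd, 2))
```

The mean is 3, the mode is 2 and the range is 6. The median of the sorted data is 2, so an option stating a median of 7 does not hold for these values.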
70. Analytical data -(FCPS-2005,Ju)
63. Multifactorial inheritance of disease-(FCPS- a) Case control study

2007, Ju) b) Cohort study
a) Neural tube defect c) Retrospective study
b) Congenital heart disease d) Random sampling
c) HTN TTTF
d) Diabetes insipidus
TTTF 71. Observation -(FCPS-2005,Ja)
a) Mean is always less than mode

64. Hypothesis is generated in -(FCPS-2007,JA) b) ½ Observation > median
a) Cohort study c) Data right median less than mode
b) Case-control study d) Mode frequency
c) Cross sectional study e) Variation
FFT FFFFF
65. Measurement of dispersion -(FCPS-2006,Ju) 72. Incidence of a disease frequently occur at
a) Mean high rate--(FCPS-2004,Ja)
b) Range a) Population distribution
c) Variance b) Prevalence rate
d) Mode c) Case fatality rate
FTTF d) Specific attack rate

e) Death rate
66. Z for - (FCPS-2006, Ja) FFFTF
a) Regression
b) Correlation 73. Best clinical trial/Type & study-(FCPS-
c) Standard deviation 2004,Ja)

d) Median a) Case control study


e) Mode b) Cohort study
FFFFF c) Retrospective study
d) Prospective study
67. Skewed diagram -(FCPS-2006,Ja) e) Cases solution
a) Bimodal FTFTF
b) Asymmetrical
c) Heterogenous
FTT

74. Epidemiological study (FCPS-2003,Ja) 79. Validity refers -(FCPS-2008,Ju)
a) Cohort study a) Express the degree to which 2 things related
b) Case control study b) Implied result of a test can be reproduced
c) Common trial c) Describe how a study measures the purpose
TTF d) Capacity to identify condition correctly
FFFT
75. Association (FCPS-2003,Ja)
a) Incidence-Morbidity 80. Which one of the following statements
b) Prevalence-Mortality concerning the distribution curve is not

c) Odds no.-Fatality accurate?
TTF a) The mean is the sum of all the scores divided by
the number of scores
Radiology b) The mean is a good measure of central tendency
in skewed distributions
76. Graphical presentation of quantitative data- c) The mean is higher than the median in positively
(FCPS-2008,Ju) skewed distributions
a) Bar diagram d) When there is an even number of numbers, the

b) Histogram median is the mean of the two middle numbers
c) Frequency polygon e) Many distributions have more than one mode
d) Pie diagram C
e) Ogive
FTTFF 81. Blood pressure is normally distributed. For
a sample of 100 Asian women the average blood
77. Example of central tendency -(FCPS-2008,Ju) pressure is 50, standard deviation=5, standard
a) Variance error = 1/2. Which one of the following
b) Mode statements is correct?
c) Standard deviation a) Approximately 95% of Asian women have blood
d) Range pressures in the range (45, 55)
e) Median b) We are approximately 95% confident that the
FTFFT population average blood pressure for Asian
women lies in the interval (49, 51)
78. Random sampling -(FCPS-2008,Ju) c) Blood pressure measurement in Asian women is
a) Is done when size is large not informative
b) Here some of the members have equal chance to d) 10% of Asian women have blood pressure below
be included 40 mmHg
c) Called probability sampling e) The mean blood pressure of Asian women must

d) Selected by choice lie within the range (49, 51)


e) Error is less B
FFTTT
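
The reasoning behind question 81 can be sketched numerically using the figures stated in the question (n = 100, mean = 50, standard deviation = 5). The multiplier 1.96 for an approximate 95% interval is an assumption based on the normal distribution, and the variable names are illustrative only.

```python
import math

n = 100      # sample size given in the question
mean = 50.0  # sample mean given in the question
sd = 5.0     # sample standard deviation given in the question

se = sd / math.sqrt(n)  # standard error of the mean
z = 1.96                # approximate 95% normal multiplier (assumed)

# Interval for the population mean: uses the standard error.
ci_low, ci_high = mean - z * se, mean + z * se

# Range covering about 95% of individual values: uses the standard deviation.
ref_low, ref_high = mean - z * sd, mean + z * sd

print(se, (ci_low, ci_high), (ref_low, ref_high))
```

The interval for the population mean comes out at roughly (49, 51), which matches option b. About 95% of individual values fall in roughly (40, 60) rather than (45, 55), which is why option a does not hold.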