6 Biostatistics 10082013
Information bias
confounding
True about cluster sampling is all except:
a) Precision higher than simple random sampling
b) Considered a rapid assessment method
c) Costlier than simple random sampling
d) Done for evaluation of immunization status
Design effect is used specifically in which of the following sampling approaches?
a) Cluster sampling
b) Simple random sampling
c) Systematic sampling
d) Quota sampling
Terms describing data
Quantitative data: measured on a natural numeric scale; can be subdivided into interval and ratio data
e.g. age, height, weight
Qualitative data: a characteristic that has no natural numeric scale; subdivided into nominal and ordinal scales
e.g. gender, eye color
Quantitative data
Discrete: values are distinct and separate
Values are invariably whole numbers
e.g. number of children in a family
Continuous: values can fall anywhere within an uninterrupted range
Can take either integral or fractional values
e.g. height, weight, age, birth weight, time, body temperature
The response which is graded by an observer on an agree-or-disagree continuum is based on:
a) Visual analog scale
b) Guttman scale
c) Likert scale
d) Adjectival scale
Measurement scales
Nominal: data are divided into qualitative categories or groups,
e.g. male/female;
urban/suburban/rural;
colours; religion.
Nominal data that fall into two groups (yes/no, present/absent) are called dichotomous (binary) data.
a) 0.01
b) 0.03
c) 0.001
d) 0.003
The normal distribution
Bell-shaped curve (Gaussian curve)
Mean, median and mode coincide
Standard normal distribution: mean = 0, SD = 1
Total area under the curve = 1
The tails approach but never touch the baseline
Mean ± 1 SD = 68.3% of the distribution
Mean ± 2 SD = 95.4% of the distribution
Mean ± 3 SD = 99.7% of the distribution
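The 68.3/95.4/99.7% figures can be checked from the normal cumulative distribution. A minimal sketch using only the Python standard library; the function name `fraction_within` is illustrative, not from the source.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within mean ± k SD."""
    # For a normal variable, P(|X - mean| < k * SD) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD: {fraction_within(k):.1%}")
```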
Skewed distribution
Positively skewed: Mean > Median > Mode
Relatively large number of low scores and a small
number of very high scores
Negatively skewed: Mean < Median < Mode
Relatively large number of high scores and a small
number of low scores
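The ordering of mean, median and mode in a positively skewed distribution can be seen on a small made-up data set (the values below are hypothetical, chosen to have many low scores and a few very high ones).

```python
import statistics

# Hypothetical scores: many low values and a few very high ones (positive skew)
scores = [1, 2, 2, 2, 3, 3, 4, 10, 20]

mean = statistics.mean(scores)      # pulled upward by the extreme values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```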
Measures of central tendency
Mean, median, mode
Mode: highest point on the frequency polygon; totally uninfluenced by a small number of extreme scores in a distribution; a distribution may be unimodal, bimodal or multimodal
Median: insensitive to a small number of extreme scores in the distribution; very useful for highly skewed distributions
Mean: responds to the exact value of every score in the distribution; unsuitable for highly skewed data
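How differently the mean and median respond to one extreme score can be shown with a short sketch; the pulse-rate values are hypothetical.

```python
import statistics

# Hypothetical pulse rates; the second list adds a single extreme value
pulse = [60, 62, 64, 66, 68]
pulse_with_outlier = pulse + [200]

for label, data in (("original", pulse), ("with outlier", pulse_with_outlier)):
    # The mean shifts a lot; the median barely moves
    print(label, "mean:", round(statistics.mean(data), 2), "median:", statistics.median(data))
```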
Measures of Variability
Range: minimum to maximum
Variance: $\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$ (population formula; the sample variance divides by n − 1)
Standard deviation: square root of the variance
Steps to calculate the SD:
1. Calculate the mean
2. Subtract the mean from each value
3. Square each result
4. Add the squared deviations
5. Divide the sum by the total number of observations (THIS IS THE VARIANCE)
6. Take the square root of the variance to get the SD
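The six steps above translate almost line for line into code. A minimal sketch on hypothetical data, cross-checked against the standard library's population and sample SD functions.

```python
import math
import statistics

def population_sd(values):
    # Step 1: calculate the mean
    mean = sum(values) / len(values)
    # Steps 2-3: subtract the mean from each value and square the result
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Steps 4-5: add the squared deviations and divide by N (this is the variance)
    variance = sum(squared_deviations) / len(values)
    # Step 6: the square root of the variance is the SD
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(population_sd(data))        # 2.0
print(statistics.pstdev(data))    # standard-library population SD, for comparison
print(statistics.stdev(data))     # sample SD, which divides by n - 1 instead of N
```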
Inferential statistics
Statistic & parameter
Random sampling distribution of means
Central limit theorem: for sufficiently large samples, the random sampling distribution (RSD) of means is approximately normal, whatever the shape of the population distribution, and its mean equals the population mean
This distribution has an SD known as the standard error of the mean, or simply the standard error (SE)
Standard Error
$SE = \frac{\sigma}{\sqrt{n}}$, where σ is the standard deviation of the population and n is the sample size
Inversely proportional to the square root of the sample size
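A quick simulation illustrates both the central limit theorem and the SE formula: sample means drawn from a non-normal (uniform) population have an SD close to σ/√n. The population, sample size, number of repetitions and seed are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 25                    # sample size (assumed for illustration)
reps = 20_000             # number of repeated samples

# Assumed population: uniform on [0, 1), whose SD is 1/sqrt(12) (about 0.289)
sigma = (1 / 12) ** 0.5

# Draw many samples and record each sample mean
sample_means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

empirical_se = statistics.stdev(sample_means)  # SD of the sampling distribution of means
theoretical_se = sigma / n ** 0.5              # SE = SD / sqrt(n)
print(f"empirical SE = {empirical_se:.4f}, sigma/sqrt(n) = {theoretical_se:.4f}")
```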
Z score
Denotes the location of an element in a normal distribution (in SD units)
$z = \frac{X - \mu}{\sigma}$
where X is the element, μ is the mean and σ is the SD.
Uses:
Finding the heart rate that separates the fastest-beating 5% of the population from the remaining 95%
Specifying the probability of an event
Limitation: z score tables are required
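In software, the inverse normal function can stand in for the table lookup. A minimal sketch using `statistics.NormalDist`; the heart-rate mean of 72 and SD of 8 are hypothetical values, not from the source, and `cutoff` is an illustrative helper name.

```python
from statistics import NormalDist

# z value that cuts off the top 5% of a standard normal distribution
z_95 = NormalDist().inv_cdf(0.95)
print(round(z_95, 3))  # 1.645

def cutoff(mean, sd, z):
    """Convert a z score back to the original units: X = mean + z * SD."""
    return mean + z * sd

# Hypothetical heart rates: mean 72, SD 8 beats per minute
print(round(cutoff(72, 8, z_95), 1))
```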
Question 1: The scores of students in a class test have a mean of 70 and a standard deviation of 12. What percentage of students would be expected to score more than 85?
The z score for the given data is
z = (85 − 70)/12 = 1.25
From the z score table, the fraction of the data below this z score is 0.8944.
This means 89.44% of the students scored below 85, and hence the percentage of students who scored above 85 = (100 − 89.44)% = 10.56%.
Hence, the required probable percentage is 10.56%.
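The same calculation for Question 1 can be reproduced without a printed table, assuming the scores are normally distributed as the question implies.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=12)  # class test: mean 70, SD 12
z = (85 - 70) / 12                    # 1.25
below_85 = scores.cdf(85)             # fraction scoring below 85 (about 0.8944)
above_85 = 1 - below_85
print(f"z = {z}, above 85: {above_85:.2%}")
```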
Example: An organization surveyed the monthly salary, in dollars, of its clerical-level employees. The data revealed a mean of $4000 with a standard deviation of $600. Find what percentage of employees are in the salary bracket [$3000, $4500].
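The source leaves this example unanswered. Assuming salaries are normally distributed, the share in the bracket is the difference between two cumulative probabilities, which comes to roughly 75%. A minimal sketch:

```python
from statistics import NormalDist

salary = NormalDist(mu=4000, sigma=600)  # clerical salaries: mean $4000, SD $600 (normality assumed)
z_low = (3000 - 4000) / 600              # about -1.67
z_high = (4500 - 4000) / 600             # about 0.83

# Area between the two z scores = P(X < 4500) - P(X < 3000)
share = salary.cdf(4500) - salary.cdf(3000)
print(f"z from {z_low:.2f} to {z_high:.2f}: {share:.1%} of employees")
```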
i.e. we can be 95% confident that the population mean systolic blood pressure lies between 118.04 and 121.96.
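The setup for this interval is not shown, but the quoted limits are consistent with a sample mean of 120 and an SE of 1 (inferred from the limits themselves: their midpoint is 120 and their half-width is 1.96). A minimal sketch of the mean ± 1.96 SE rule under that assumption:

```python
def confidence_interval(mean, se, z=1.96):
    """Approximate 95% confidence interval: mean ± z * SE."""
    return (mean - z * se, mean + z * se)

# Values inferred from the quoted limits (not stated in the source): mean 120, SE 1
low, high = confidence_interval(120, 1.0)
print(round(low, 2), round(high, 2))  # 118.04 121.96
```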
A study was undertaken to compare treatment options in black and white patients diagnosed with breast cancer. The 95% confidence interval for the odds ratio of blacks being more likely than whites to be untreated was 1.1 to 2.5. The statement that most accurately describes the meaning of these limits is that:
a) 95% of the time blacks are more likely than whites to be untreated
b) 95% of the odds ratios fall within these limits
c) the probability is 95% that odds ratios in similar studies would fall within these limits
d) since the observed odds ratio falls in the centre of these limits, the probability is 95% that it is the correct value
Precision and Accuracy
Precision is the degree to which successive measurements yield similar results (repeatability, consistency, reliability)
Accuracy: The degree to which a measurement is close
to the true value.
The width of the Confidence Interval reflects
precision: The wider the confidence interval, the less
precise the estimate
TESTING OF HYPOTHESIS
Null hypothesis (H0): Mean1 = Mean2
[Table: study conclusion "intervention better than control" compared with the truth, showing Type I error and correct conclusion]
Type I error: rejecting the null hypothesis when it is true
Type II error: failing to reject (accepting) the null hypothesis when it is false
In a clinical trial, two drugs A and B were administered to alternate patients in 100 cases of hypertension, and the effects of the 2 drugs were compared statistically using the chi-square test. The value of chi-square was 4.12 with 1 degree of freedom, against a table value of 3.84 at the 5% level. Which of the following conclusions can be drawn from this study?
1. Null hypothesis is proved
2. Null hypothesis is rejected
3. There is no statistical difference between the effects of 2 drugs
4. The probability of the effect of the 2 drugs being the same is less than 0.05
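The table comparison can also be done by computing the p-value directly. With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the standard library is enough; a minimal sketch:

```python
import math
from statistics import NormalDist

chi_sq = 4.12  # observed chi-square, 1 degree of freedom

# With 1 df: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))

critical = NormalDist().inv_cdf(0.975) ** 2  # 5% critical value, about 3.84
print(f"p = {p_value:.3f}, critical value = {critical:.2f}")
print("reject H0 at 5%" if chi_sq > critical else "fail to reject H0 at 5%")
```

The result (p a little above 0.04, chi-square above 3.84) is consistent with rejecting the null hypothesis at the 5% level.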
Hypothesis Testing
Five Important Terms
Outcome variable
Exposures
Bias
Confounder(s)
Chance factor
Types of Variables
Variables: categorical or quantitative
Statistical Estimation
Statistical Hypothesis Testing
Statistical Modeling
Data Mining
Three ways of Describing results
1. Graphically
2. Tabular form
3. Statistics or summary measures
1. Univariate analysis
2. Bivariate analysis:
(Cat, Cat)
(Cat, Quant)
(Quant, Quant)
3. Multivariate analysis
One or all three may be performed, depending on the need
• Incidence
• Cumulative incidence (new cases during a given period ÷ population at risk)
• Incidence density (new cases during a given period ÷ total person-time)
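The two incidence measures differ only in the denominator. A minimal sketch with a hypothetical cohort (all numbers are illustrative, not from the source):

```python
# Hypothetical cohort: 200 people at risk at the start, followed for up to 5 years
population_at_risk = 200
new_cases = 10
total_person_years = 950.0  # sum of each person's disease-free follow-up time

cumulative_incidence = new_cases / population_at_risk  # risk over the period
incidence_density = new_cases / total_person_years     # rate per person-year

print(f"cumulative incidence: {cumulative_incidence:.3f} over 5 years")
print(f"incidence density: {incidence_density * 1000:.1f} per 1000 person-years")
```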
Measures of location
Quartiles (Q1, Q2, Q3)
Deciles (D1, D2, ..., D5, ..., D9)
Percentiles (P1, P2, ..., P50, ..., P99)
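Q2, D5 and P50 all mark the same point, the median. A minimal sketch with `statistics.quantiles` on hypothetical data; note that software packages use different percentile methods, and this uses Python's default ("exclusive").

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 11]  # hypothetical observations

quartiles = statistics.quantiles(data, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99

# Q2, D5 and P50 coincide with the median
print(quartiles[1], deciles[4], percentiles[49], statistics.median(data))
```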
Measures of Dispersion
1. Range
2. Interquartile range
3. Mean deviation
4. Standard deviation (SD)
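All four measures of dispersion can be computed with the standard library. A minimal sketch on hypothetical data; the IQR uses Python's default quartile method, and other methods may give slightly different values.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

value_range = max(data) - min(data)          # 1. range
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles ("exclusive" method)
iqr = q3 - q1                                # 2. interquartile range
mean = statistics.fmean(data)
mean_deviation = statistics.fmean(abs(x - mean) for x in data)  # 3. mean (absolute) deviation
sd = statistics.pstdev(data)                 # 4. SD (population formula, divides by N)

print(value_range, iqr, mean_deviation, sd)
```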
Coefficient of Variation: CV = (SD / mean) × 100
Used to compare the variability (SD) between groups, especially when their means or units differ
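Because the CV is unit-free, it allows variables on different scales to be compared. A minimal sketch with hypothetical height and weight summaries:

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

# Hypothetical groups measured in different units
height_cv = coefficient_of_variation(mean=160, sd=6)  # cm
weight_cv = coefficient_of_variation(mean=60, sd=8)   # kg
print(f"height CV = {height_cv:.2f}%, weight CV = {weight_cv:.2f}%")
```

Here weight is relatively more variable than height, even though both SDs are of similar size.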
• Point estimation
Statistics such as the mean, proportion or correlation coefficient computed from a sample serve as estimates of the corresponding population parameters.
• Interval estimation
[Figure: normal curve with shaded central areas of about 68% and 95%]
H0: µ1 = µ2
against
H1: µ1 < µ2 (left-tailed)
H1: µ1 ≠ µ2 (two-tailed)
H1: µ1 > µ2 (right-tailed)
Types of Errors
Study results \ "Truth" | Treatments differ | Treatments do not differ
Treatments differ | A: Correct (true positive) | B: Type I error (false positive)
Treatments do not differ | C: Type II error (false negative) | D: Correct (true negative)
Errors in Statistical Testing
[Figure: p-value scale with regions labelled "No evidence (not significant)" and "Weak evidence (not significant)"; example p = .0069]
Clinical Significance vs Statistical Significance
Because the sample size is so large, we are able to detect a very small change in temperature.
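The effect of sample size can be shown with a two-sample z test on a fixed, clinically trivial difference. A minimal sketch; the 0.05-degree difference, SD of 0.5 and group sizes are hypothetical, and equal SDs and group sizes are assumed.

```python
import math
from statistics import NormalDist

def two_sample_z(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z test, equal SD and group size assumed)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: a 0.05 degree difference in mean temperature, SD 0.5 degrees
for n in (50, 10_000):
    print(n, two_sample_z(diff=0.05, sd=0.5, n_per_group=n))
```

The same tiny difference is far from significant with 50 per group but highly significant with 10,000 per group.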
Which of the following is most likely to contribute to a statistically significant but clinically meaningless outcome?
1. Categorical vs Categorical
2. Categorical vs Quantitative
3. Quantitative vs Quantitative
Categorical vs categorical:
X = 2, Y = 2: unrelated → chi-square test, Fisher's exact test; related → McNemar test
X > 2, Y > 2: unrelated → chi-square test, Fisher's exact test
X: group variable
Y: outcome variable
Proportion Test
Rx A: 20 (5.1%) | 373 (94.9%) | total 393
Rx B: 6 (1.9%) | 316 (98.1%) | total 322
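The slide does not show which test was run or its result. One common choice for comparing 20/393 with 6/322 is a pooled two-proportion z test (no continuity correction), which for a 2×2 table is equivalent to the uncorrected chi-square test (z² = χ²). A minimal sketch under that assumption:

```python
import math
from statistics import NormalDist

# Counts from the table: first-column outcome and row total
a, n1 = 20, 393  # Rx A
b, n2 = 6, 322   # Rx B

p1, p2 = a / n1, b / n2
pooled = (a + b) / (n1 + n2)  # common proportion under H0
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"{p1:.1%} vs {p2:.1%}: z = {z:.2f}, p = {p_value:.3f}")
```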
Parametric tests (Y: normally distributed)
X = 2 groups: unrelated → Student's t test; related → paired t test
X > 2 groups: unrelated → one-way ANOVA; related → repeated-measures ANOVA
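The difference between the unrelated and related versions of the t test shows up clearly when the same data are analysed both ways. A minimal sketch computing only the t statistics (the standard library has no t distribution); the blood-pressure readings are hypothetical.

```python
import math
import statistics

# Hypothetical systolic BP (mmHg) in the same five subjects before and after treatment
before = [150, 142, 160, 155, 148]
after = [144, 140, 152, 150, 146]

# Paired t test: analyse the within-subject differences
diffs = [x - y for x, y in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Student's (unpaired) t test with pooled variance, treating the groups as independent
n1, n2 = len(before), len(after)
pooled_var = ((n1 - 1) * statistics.variance(before) + (n2 - 1) * statistics.variance(after)) / (n1 + n2 - 2)
t_unpaired = (statistics.mean(before) - statistics.mean(after)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"paired t = {t_paired:.2f} (df {len(diffs) - 1}), unpaired t = {t_unpaired:.2f} (df {n1 + n2 - 2})")
```

Pairing removes between-subject variation, so the paired statistic is much larger here. In practice, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` return both the statistic and the p-value.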
a) Chi-square test
b) Unpaired or independent t test
c) ANOVA
d) Paired t test
A clinical trial of an antihypertensive agent is
performed by administering the drug or a
placebo, with a washout period in between, to
each study subject. The treatments are
administered in random order, and each study
subject serves as his or her own control. The trial
is double-blind. The appropriate significance test
for the change in blood pressure with drug versus
placebo is
a. ANOVA
b. Chi-square test
c. Paired t-test
d. Pearson correlation coefficient
Linear regression would be an appropriate
method for which of the following scenarios:
a) Correlation
b) Bland-Altman plot
c) Regression
d) None of the above
A clinical trial was performed in which asthma patients were randomized into three treatment groups: salmeterol BD, albuterol QID, and placebo. The investigators measured FEV1 values in the three treatment groups. The test of significance used will be:
a) Chi-square test
b) Unpaired t-test
c) Z test
d) ANOVA
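One-way ANOVA compares the variation between group means with the variation within groups. A minimal sketch of the F statistic on hypothetical FEV1 values (litres); `scipy.stats.f_oneway` would also give the p-value.

```python
import statistics

# Hypothetical FEV1 values (litres) for three treatment groups
groups = {
    "salmeterol": [2.8, 3.0, 3.2],
    "albuterol": [2.5, 2.7, 2.6],
    "placebo": [2.2, 2.4, 2.3],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = statistics.fmean(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups.values())

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```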
How to report the results?
Variable | Group 1 mean (SD) | Group 2 mean (SD) | Difference (95% CI) | p value
Birth weight of newborns | 3.20 (0.49) | 3.60 (0.37) | 0.4 (0.06 – 0.72) | 0.022
If there were truly no difference in birth weight between babies born to the two groups of mothers, a difference at least this large would be expected by chance only about 2 times in 100 (p = 0.022).
The Multivariate Problem
Typical question asked is bivariate:
Exposure (risk factor) → Outcome
Therefore,
E, C1, C2, ... → Outcome
(Independent variables)