6 Biostatistics 10082013

Biostatistics
Biostatistics (a combination of the words biology and statistics; sometimes
referred to as biometry or biometrics) is the application of
statistics to a wide range of topics in biology. The science
of biostatistics encompasses the design of biological
experiments, especially in medicine and agriculture; the
collection, summarization, and analysis of data from
those experiments; and the interpretation of, and
inference from, the results.
Types of sampling
Probability sampling
Non probability sampling
Probability sampling: each element of the population is given a known and
non-zero chance of being included in the sample.
• The sampling frame should be complete
• The selection process must involve a known probability of inclusion
• During implementation only the selected units are taken; there is no
replacement for non-response.
Probability sampling
 It is more representative.
 Quantitative study
 For analytical study, hypothesis testing for statistical inferential
studies
 When the group is heterogeneous and large
 When I want to know what causes something
 To study comparison
 When I have time and money
 I want to generalise small patterns to big patterns
 I want to make sure that others can repeat my findings in
study(reliability)
Random events are predictable, we can compare, generalise.
Such samples are somewhat free from human bias and personal
judgments
Main disadvantage
Must be strictly structured; expensive and extensive.
Non probability sampling
Purposive sampling, judgmental sampling
Homogenous group- where variations are less
Small population
Exploratory study, qualitative research
When lot of details are to be studied
To study how social changes occur, used for describing
trends of opinion
 The study is flexible
Less expensive and extensive- very convenient
Disadvantage
Cannot be generalized, hence cannot be used for
parameter estimation
Probability sampling
Types
• Simple random: like a lottery
• Systematic random: like a simple random sample, but with a difference:
it starts from a randomly chosen point and every nth unit is taken from
the list. The sampling interval depends on the sizes of the population and
the sample
• Stratified random: the population is divided into definite subgroups
or strata, and a random sample is then taken from each stratum.
(The ratio of sample size to population size should remain the same in each
stratum.) It takes care of inter-stratum variability.
• Cluster: drawn in stages, e.g. in the first stage blocks are selected
randomly; in the second stage a cluster of households may be
selected randomly from each selected block
• Multistage sampling: probability proportional to size (PPS sampling)
Non probability samples
Types
Purposive samples: the respondents are selected purposely because they are
more knowledgeable, e.g. policymakers, the principal of a college
Quota samples: the number of respondents is fixed in each category;
when the specifically designated respondents are not available they are
replaced by others, e.g. immunisation coverage
Chunk samples: a chunk of people who are available at the spot of
interview, e.g. talking to the first 10 mothers who are available at the
immunisation centre.
Volunteer samples: people who volunteer as respondents, e.g.
newspaper, telephone, internet surveys
Snowball sample: a first case with a known characteristic or
behaviour is identified; the first case locates the second, the second the
third, and so on in a chain
Probability samples are scientific based on law of
chance
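The probability sampling schemes listed earlier (simple random, systematic, stratified) can be sketched in a few lines of Python. This is only an illustrative sketch: the sampling frame of 100 numbered units, the two strata and the 10% sampling fraction are hypothetical, not taken from the slides.

```python
import random

def simple_random_sample(units, n, seed=0):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def systematic_sample(units, n, seed=0):
    """Random start, then every k-th unit, where k is the sampling interval."""
    k = len(units) // n              # sampling interval = population / sample
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start within the first interval
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Apply the same sampling fraction to every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

# Hypothetical sampling frame and strata, for illustration only
population = list(range(1, 101))
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(strata, 0.10)
```

With a 10% fraction the larger stratum contributes 6 units and the smaller one 4, so each stratum is represented in proportion to its size.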
Probability sample
• Sample size is determined depending on the level of precision desired
• Specify the probability of selection of each unit in the sample
• Estimate the amount of sampling error
• Determine the level of confidence
Non probability samples
• The samples are drawn based on judgment and purpose
• Cannot be used for scientific deductions on probability of bias, sampling
error, level of confidence
Errors
Error is a false or mistaken result in a study
It has two components
Sampling error or random error-
Non sampling error or systematic error
It can be due to

Investigator, observer
Study participants
Study instrument
Errors can be due to bias and may be random
Sampling Error
Sampling errors are due to chance
Refers to fluctuations around a true value because
of sample variability
The sampling error is measured by the standard deviation of
sample means
The higher the random error, the lower the precision of the
study
It can be reduced by
• increasing the size of the sample
• repeated sampling
SYSTEMATIC ERROR
Because of any other cause than the sampling
variability
 Selection bias-
 fault in selection of subjects
 Factors influencing participation in the study-response bias
 Factors influencing participation drop outs
 Information bias
 confounding
True about cluster sampling is all except
a)Precision higher than Simple Random Sampling
b)Considered Rapid Assessment Method
c) Costlier than Simple Random Sampling
d)Done for evaluation of immunization status
Design effect is used specifically in which of the
sampling approaches:
a)Cluster sampling
b)Simple random sampling
c) Systematic sampling
d)Quota sampling
Terms describing data
Quantitative data- there is a natural numeric scale- can be
subdivided into interval / ratio data
Age, height, weight
Qualitative data- Measuring a characteristic which has no
natural numeric scale – subdivided into nominal/ ordinal
scale- Gender, Eye color
Quantitative data
Discrete- values are distinct and separate
Values are invariably whole numbers
No. of children in a family
Continuous- Those which have uninterrupted range of
values
Can assume either integral or fractional values
Height, weight, age, birth weight, time, body temperature
The response which is graded by an observer on an
agree or disagree continuum is based on:
a)Visual analog scale
b)Guttman scale
c) Likert scale
d)Adjectival scale
Measurement scales
 Nominal: Data are divided into qualitative categories or groups
 e.g. male/female;
 urban/suburban/ rural;
 Colours ; religion.
 There is no implication of order or ratio.
 Nominal data that fall into two groups (yes/no , present/absent) are called
dichotomous data/ binary variable.

Ordinal: Data can be placed in a meaningful order e.g.
Rank; Mild/ moderate/ severe or grading of Ca Lungs –
Grade I/II/III/IV. There is no information about the size of
the interval.
No conclusion can be drawn whether the difference
between the first and second category is same as the
difference between the second and third.
Likert scale is an ordinal scale on responses to degree of
agreement viz. Strongly agree ↔ agree ↔ Equivocal ↔ disagree
↔ strongly disagree
The semantic differential scale of Osgood is another example, viz.
Among available answers – 30% were Yes; 40% were Maybe
and 30% were No; here 30%, 40% and 30% - refers to Osgood’s
scale.
Interval scale
• Interval scale: the numerical unit of measurement has a meaningful order
and meaningful equal intervals.
• The difference between any two measurements is shown in terms of an
interval between two points on the scale
• e.g. IQ level, body temperature on the Celsius/Fahrenheit scale.
The distance between these ordered category values is equal because there is
some acceptable physical unit of measurement
The zero point is arbitrary
Interval scale
On the Celsius scale, the difference between 100 °C and 90 °C
is the same as the difference between 50 °C and 40 °C
However, because interval scales do not have an absolute
zero, ratios of scores are not meaningful
100 °C is not twice as hot as 50 °C because 0 °C does not
indicate a complete absence of heat
Ratio scale
Ratio scale: Have the same properties as interval scale, but
because it has an absolute zero, meaningful ratios do exist.
Most biomedical variables form ratio scales e.g. Kelvin
temp, BP, Pulse rate etc.
Possible to multiply or divide across a ratio scale
Ratio between 2 values on a scale is a meaningful measure
of the relative magnitude of 2 measurements.
Ratio scale
The only ratio scale of
temperature is Kelvin scale,
in which zero degrees
indicate an absolute absence
of heat
Also, it would be correct to
say that a pulse rate of 120 is
twice as fast as pulse rate of
60.
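The interval-versus-ratio distinction can be checked numerically. The sketch below converts the Celsius readings used above to Kelvin (adding 273.15) to show that differences survive the change of scale while the apparent "twice as hot" ratio does not; the variable names are illustrative.

```python
def celsius_to_kelvin(celsius):
    """Shift to the Kelvin scale, whose zero is an absolute zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale
diff_high = 100 - 90
diff_low = 50 - 40

# Ratios are only meaningful on a ratio scale
naive_ratio = 100 / 50                                        # 2.0 on the Celsius scale
true_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)   # about 1.15

print(diff_high == diff_low)    # True
print(round(true_ratio, 2))     # 1.15
```

On the Kelvin scale 100 °C is only about 15% hotter than 50 °C, not twice as hot.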
Measurement scales
Type of scale | Description | Key words | Examples
Nominal (categorical) | Different groups | This or that or that | Gender, comparing among treatment interventions
Ordinal | Groups in sequence | Comparative quality, rank order | Olympic medals, class rank in school
Interval | Exact differences among groups | Quantity, mean, SD | Temp in Celsius, Fahrenheit
Ratio | Interval + true zero point | Zero means zero | Temperature measured in Kelvin
Graphical representation of data
Quantitative Data:
Histogram
Frequency polygon
Scatter / Dot diagram
Cumulative frequency diagram (Ogive)
Line chart / graph
Qualitative data
Bar diagram
Pie / Sector diagram
Pictogram
Mapping
Probability
Addition Rule: Probability of ANY ONE (Either this or
that) of several events occurring is equal to the sum of
their individual probabilities provided they are mutually
exclusive
Multiplication rule: Probability of two or more
statistically independent events occurring together
(This & that) is equal to the product of their individual
probabilities.
If prevalence of diabetes is 10%, the probability that
three people selected at random from the population
will have diabetes is:
a) 0.01
b) 0.03
c) 0.001
d) 0.003
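Both rules can be worked through with exact fractions. The sketch below applies the addition rule to a hypothetical die roll (an assumption made for illustration) and the multiplication rule to the diabetes question above, assuming the three people are selected independently.

```python
from fractions import Fraction

# Addition rule: mutually exclusive events (hypothetical fair die)
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two          # either a 1 or a 2

# Multiplication rule: independent events
prevalence = Fraction(10, 100)        # prevalence of diabetes is 10%
p_all_three = prevalence ** 3         # all three people have diabetes

print(p_one_or_two)                   # 1/3
print(float(p_all_three))             # 0.001
```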
The normal distribution
Bell-shaped curve, Gaussian curve
Mean, median and mode coincide
Mean = 0 and SD = 1 for the standard normal distribution
Area under the curve = 1
Ends never touch the baseline
Mean ± 1 SD covers 68.3% of the distribution
Skewed distribution
Positively skewed: Mean > Median > Mode
Relatively large number of low scores and a small
number of very high scores
Negatively skewed: Mean < Median < Mode
Relatively large number of high scores and a small
number of low scores
Measures of central tendency
Mean, Median, Mode
Mode: Highest point on frequency polygon ; Totally
uninfluenced by small number of extreme scores in a
distribution ; uni-modal / bi-modal / multi-modal
Median: Insensitive to a small number of extreme
scores in the distribution ; Very useful for highly
skewed distributions
Mean: Responds to the exact value of every score in
the distribution ; Unsuitable for skewed data
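A small positively skewed data set makes the ordering Mean > Median > Mode concrete. The scores below are hypothetical and chosen only so that a single extreme value pulls the mean upward.

```python
import statistics

# Hypothetical scores: many low values and one very high value
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

mean = statistics.mean(scores)       # responds to every value, including 20
median = statistics.median(scores)   # middle of the ordered values
mode = statistics.mode(scores)       # most frequent value

print(mean, median, mode)            # 5.1 3.0 2
```

The single score of 20 drags the mean well above the median, which is why the median is preferred for highly skewed data.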
Measures of Variability
Range: Minimum to maximum
Variance: $\sigma^2 = \sum (X - \bar{X})^2 / N$
Standard Deviation: Square root of variance
Steps to calculate SD:
1. Calculate the mean
2. Subtract the mean from each value
3. Square each result
4. Add the squared values
5. Divide the sum by the total number of observations (THIS IS
THE VARIANCE)
6. Take the square root of the variance to get the SD
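The six steps above translate directly into code. This is a minimal sketch on a hypothetical set of eight observations; it divides by N as in the steps, which gives the population SD (a sample SD would divide by N - 1 instead).

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations

mean = sum(values) / len(values)             # step 1: the mean
deviations = [x - mean for x in values]      # step 2: subtract the mean
squared = [d ** 2 for d in deviations]       # step 3: square each result
total = sum(squared)                         # step 4: add the squared values
variance = total / len(values)               # step 5: divide by N (variance)
sd = math.sqrt(variance)                     # step 6: square root gives the SD

print(mean, variance, sd)                    # 5.0 4.0 2.0
```

The result agrees with `statistics.pstdev(values)` from the Python standard library.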
Inferential statistics
Statistic & Parameter
Random sampling distribution of means
Central limit theorem: for sufficiently large samples the RSD of means will
be approximately normal, whatever the shape of the population distribution,
and the mean of this distribution is equal to the population mean
This distribution has a SD known as the Standard error
of means or simply Standard Error (SE)
Standard Error
Standard error = Standard deviation of the population
/ Square root of the sample size
Inversely proportional to the square root of the sample size
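A short sketch shows how the standard error shrinks with sample size. The SD of 10 and the sample sizes are hypothetical; `standard_error` is an illustrative helper name.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

se_small = standard_error(10, 25)     # n = 25
se_large = standard_error(10, 100)    # n = 100, four times larger

print(se_small, se_large)             # 2.0 1.0
```

Quadrupling the sample size only halves the standard error, because the sample size enters through its square root.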
Z score
Denotes the location of an element in a normal
distribution (in terms of SD)
$z = (X - \mu) / \sigma$
where X is the element, μ is the mean and σ is the SD.
Uses:
If we want to know what heart rate divides the fastest-
beating 5% of the population from the remaining 95%
Specifying probability of an event
Limitation: z score tables required
Question 1: The test scores of students in a class test have
a mean of 70 and a standard deviation of 12. What is
the probable percentage of students who scored more than
85?
The z score for the given data is
$z = (85 - 70) / 12 = 1.25$
From the z score table, the fraction of the data
below this z score is 0.8944.
This means 89.44% of the students scored 85 or less,
and hence the percentage of
students who scored above 85 = (100 –
89.44)% = 10.56%
Hence, the required probable percentage is 10.56%.
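The table lookup in Question 1 can be reproduced without a printed z table. The sketch below uses `NormalDist` from the Python standard library and assumes, as the question does, that the scores are normally distributed.

```python
from statistics import NormalDist

mean, sd, cutoff = 70, 12, 85

z = (cutoff - mean) / sd              # location of 85 in SD units
below = NormalDist().cdf(z)           # fraction of scores at or below 85
above = 1 - below                     # fraction of scores above 85

print(z)                              # 1.25
print(round(below, 4))                # 0.8944
print(round(above * 100, 2))          # 10.56
```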
Example: An organization made a survey on the monthly salary
of its clerical level employees, in dollars. The data revealed a
mean of $4000 with a standard deviation of $600. Find
what percentage of employees are in the salary bracket [3000,
4500].
The z score for a salary of 3000
= (3000 − 4000) / 600 = −1.67 (approx)
The z score for a salary of
4500 = (4500 − 4000) / 600 = 0.83 (approx)
From the z score table, the fraction of the data below a
z score of −1.67 = 0.0475
z score of 0.83 = 0.7967
Therefore, the fraction of data between the z scores of
−1.67 and 0.83 = 0.7967 – 0.0475 = 0.7492
Hence, 74.92% of clerical level employees are within the
salary bracket [3000, 4500].
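The same salary calculation can be done directly on the salary scale. This sketch again assumes normally distributed salaries and uses `NormalDist` from the Python standard library; because it does not round the z scores to two decimals, the answer differs slightly from the table-based 74.92%.

```python
from statistics import NormalDist

salaries = NormalDist(mu=4000, sigma=600)

below_3000 = salaries.cdf(3000)       # fraction earning less than 3000
below_4500 = salaries.cdf(4500)       # fraction earning less than 4500
in_bracket = below_4500 - below_3000  # fraction inside [3000, 4500]

print(round(in_bracket * 100, 1))     # close to 75 percent
```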
The LL and UL estimates for the population mean are given as:
Mean - C × SE and Mean + C × SE
C = confidence coefficient, $SE = SD / \sqrt{n}$,
n = sample size.
If 95% confidence is desired, C = 1.96,
for 99% confidence, C = 2.58
for 99.9% confidence, C = 3.29
Example:
In a study of a sample of 100 subjects it was found that
the mean systolic blood pressure was 120 mm Hg
with a standard deviation of 10 mm Hg. Find the
95% confidence limits for the population mean of
systolic blood pressure.
$SE = SD / \sqrt{n} = 10 / \sqrt{100} = 10/10 = 1$
LL: mean - 1.96 × 1 = 120 - 1.96 = 118.04
UL: mean + 1.96 × 1 = 120 + 1.96 = 121.96
i.e. we can state with 95% confidence that the population mean systolic blood
pressure lies between 118.04 and 121.96 mm Hg.
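The confidence limits in this example follow mechanically from the formula Mean ± C × SE. A minimal sketch, with `confidence_limits` as an illustrative helper name and the confidence coefficients taken from the list above:

```python
import math

def confidence_limits(mean, sd, n, c=1.96):
    """Lower and upper limits: mean -/+ C * SE, with SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - c * se, mean + c * se

ll_95, ul_95 = confidence_limits(120, 10, 100)           # 95% limits
ll_99, ul_99 = confidence_limits(120, 10, 100, c=2.58)   # 99% limits

print(round(ll_95, 2), round(ul_95, 2))   # 118.04 121.96
print(round(ll_99, 2), round(ul_99, 2))   # 117.42 122.58
```

Asking for more confidence widens the interval, so the estimate becomes less precise.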
A study was undertaken to compare treatment options in
black and white patients who are diagnosed as having breast
cancer. The 95% confidence interval for the odds ratio for
blacks being more likely to be untreated than whites was 1.1
to 2.5. The statement that most accurately describes the
meaning of these limits is that:
a)95% of the time blacks are more likely than whites to be untreated
b)95% of the odds ratios fall within these limits
c) the probability is 95% that odds ratio in similar studies would fall
within these limits
d)since the observed odds ratio falls in the centre of these limits, the
probability is 95% that it is the correct value
Precision and Accuracy
Precision is the degree to which successive
measurements yield similar results (Repeatability,
Consistency, Reliability)
Accuracy: The degree to which a measurement is close
to the true value.
 The width of the Confidence Interval reflects
precision: The wider the confidence interval, the less
precise the estimate
TESTING OF HYPOTHESIS
Hypothesis:
Null Hypothesis (H0)
Mean1 = Mean2
Alternate Hypothesis (H1)
Mean1 < Mean2
Mean1 > Mean2
Mean1 ≠ Mean2
FOUR POSSIBILITIES IN A DECISION MAKING PROCESS
Study findings (test result) | Truth: intervention same as control (H0 true) | Truth: intervention better than control (H0 false)
Intervention same as control (H0 accepted) | Correct conclusion | Type II error
Intervention better than control (H0 rejected) | Type I error | Correct conclusion
Type I error: Rejecting the null hypothesis when it is
true
Type II error: Accepting the null hypothesis when it is
false
In a clinical trial, two drugs A and B were administered to alternate patients in
100 cases of hypertension and the effect of these 2 drugs was studied
statistically by applying chi-square test. The value of chi-square was 4.12 with
degree of freedom =1 against the table value of 3.84 at 5% level. Which of the
following conclusions can be drawn from this study?
 1. Null hypothesis is proved
 2. Null hypothesis is rejected
 3. There is no statistical difference between the effects of 2 drugs
 4. The probability of the effect of the 2 drugs being the same is less than 0.05
The correct choices are:

 a. 1 and 3
 b. 2 and 3
 c. 2 and 4
 d. 1, 3 and 4
A study was undertaken to evaluate any increased
risk of breast cancer among women who use birth
control pills. The relative risk was calculated. A
type I error in this study consists of concluding:
a)A significant increase in the relative risk when the
relative risk is actually 1
b)A significant increase in the relative risk when the
relative risk is actually greater than 1
c) No significant increase in the relative risk when the
relative risk is actually 1
d)No significant increase in the relative risk when the
relative risk is actually greater than 1
In a randomized trial of patients who received a
cadaver renal transplant 100 were treated with
cyclosporine and 50 were treated with
conventional immunosuppression therapy. The
difference in treatments was not statistically
significant at the 5% level. Therefore:
a)This study has proven cyclosporine is not effective
b)Cyclosporine could be significant at the 1% level
c) Cyclosporine could be significant at the 10% level
d)The treatments should not be compared because of
the differences in the sample sizes
Data Scales & types of test
Questions concerning | Nominal | Ordinal | Interval / Ratio
Proportions | Chi-square | |
One or two means | | | t-test (z-test if n>100)
> 2 means | | | ANOVA
Variances | | | F-test
Association | | Spearman (ρ) | Pearson (r)
Value prediction | | | Regression
Non Parametric tests
Also called as rank, distribution free, small sample
tests
Used for quantitative or ordinal data
Test hypothesis about medians
Appropriate for skewed data
Make few assumptions
Useful and easy for small sample sizes
Confidence intervals difficult
Biostatistics
I. Descriptive bio-statistics
II. Inferential bio-statistics
Estimation: Point estimate

Interval estimate (95% CI)
Hypothesis Testing
Five Important Terms
Outcome variable
Exposures
Bias
Confounder(s)
Chance factor
Types of Variables
Variables
Categorical
• Nominal: Gender, Smk, Treat groups, Cure
• Ordinal: Severity of Disease, Dose groups
Quantitative
• Discrete: TLC
• Continuous: Age, bmi, Sbp
In Clinical Trial:
Exposure: Always Categorical
Outcome: May be Categorical/Quantitative
Confounders: May be Categorical/Quantitative
What Statistical Calculations CAN Do
Statistical Estimation
Statistical Hypothesis Testing
Statistical Modeling
Data Mining
Three ways of Describing results
1. Graphically
2. Tabular form
3. Statistics or summary measures
May decide to use combination of above
There is no other way of reporting of results

Three Ways of Data Analysis
1. Uni-variate analysis
2. Bi-variate analysis:
(Cat, Cat)
(Cat, Quant)
(Quant, Quant)
3. Multivariate analysis:
May decide to perform one or all three
depending on the need
There is no other way of data analysis

Univariate Analysis
• Prevalence/mean of baseline characteristics
• Incidence
• Cumulative incidence (new cases during a given
time/population at risk)
• Incidence density (new cases during a given
period/total person time)
In each group separately
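The two incidence measures defined above differ only in their denominators. The sketch below uses hypothetical counts (20 new cases, 1000 people at risk, 4000 person-years of follow-up) purely to illustrate the arithmetic.

```python
# Hypothetical cohort figures, for illustration only
new_cases = 20
population_at_risk = 1000
person_years = 4000

# Cumulative incidence: new cases / population at risk
cumulative_incidence = new_cases / population_at_risk

# Incidence density: new cases / total person-time
incidence_density = new_cases / person_years

print(cumulative_incidence)           # 0.02, i.e. 2% over the period
print(incidence_density * 1000)       # 5.0 per 1000 person-years
```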

Univariate Analysis
II. Quantitative variable:
Measures of Central Tendency
Mean: Arithmetic mean
Geometric mean
Harmonic mean
Median (n:even, n:odd)
Mode
Measures of locations
Quartiles (Q1, Q2, Q3)
Deciles (D1, D2, ----, D5, ----, D9)
Percentiles (P1, P2, ----, P50, ----,
Measures of Dispersion
1. Range
2. Inter quartile range
3. Mean Deviation
4. Standard Deviation (SD)
• Coefficient of Variation: comparing SD between groups
• Standard error (SE = SD/sqrt(n)): NOT a descriptive statistic
• Which one should you report, SD or SE, for showing variation in
observations?
Estimation of Population Parameter
Two types of Estimation
•Point estimation
mean, proportion, correlation coefficient etc.
computed from sample serve as estimates of the
population parameters.
This estimate is a single value and is called Point estimate.
•Interval estimation
A lower limit (LL) and an upper limit (UL) are computed

from sample values
These limits are called Confidence limits

or Interval estimates.
Empirical properties of a Normal Deviate
Mean ± 1 SD (Z = ±1): 68.3% area under the curve
Mean ± 1.96 SD (Z = ±1.96): 95.0% area under the curve
Mean ± 2.58 SD (Z = ±2.58): 99.0% area under the curve
Types of Hypotheses
Null Hypothesis (H0): no difference in outcome
between the treatments
Alternative Hypothesis (H1): difference in outcome
between the treatments
Example: Difference in means
H0: µ1 = µ2
against
H1: µ1 < µ2 (left-tailed)
H1: µ1 ≠ µ2 (two-tailed)
H1: µ1 > µ2 (right-tailed)
Types of Errors
Study results | Truth: treatments differ | Truth: treatments do not differ
Treatments differ | A: Correct (true positive) | B: Type I error (false positive)
Treatments do not differ | C: Type II error (false negative) | D: Correct (true negative)
Errors in Statistical Testing
Decision | Truth in the population: H0 is true | Truth in the population: H0 is false
Accept H0 | No error | Type II error (β)
Reject H0 | Type I error (α) | No error
Analogous with a laboratory test
α: False positivity
β: False negativity
1 – β: Sensitivity (power of a test)
Types of errors
Type I error:
finds difference in treatments when in actuality no
difference
 P (Type I error) = α
Type II error:
fails to find a difference in treatments when in actuality
there is a difference
P (Type II error) = β
Power of the study
1- β: Ability of the study to detect a true difference
between groups
Usually, it is 80% or 90%
Probability of detecting a specified difference in
outcome between the treatments is 80% or 90%
Which of the following is an accurate statement
concerning the statistical power of the study
a)It is low when the alpha error is high

b)It is low when the beta error is high
c) It is low when the beta error is low
d)It varies inversely with sample size
P - Value
Probability of getting a result as extreme as or more
extreme than the one observed when the null
hypothesis is true.
When our study results in a probability of 0.01, we say
that, if the null hypothesis were true, a difference at least as large as the
one we found would arise by chance about 1 time in 100.
It is therefore unlikely that our results occurred by chance, and the
difference we found in the sample is probably due to the
exposure.
‘P’ as a significance level
P < 0.05 result is statistically significant
P > 0.05 result is not statistically significant.
These cutoffs are arbitrary & have no specific importance.

Interpreting the p-value…
p < .01: Overwhelming evidence (highly significant)
.01 ≤ p < .05: Strong evidence (significant)
.05 ≤ p < .10: Weak evidence (not significant)
p ≥ .10: No evidence (not significant)
Example: p = .0069
Clinical Significance Vs Statistical Significance
A possible antipyretic is tested in patients with the common cold.
500 receive the candidate drug
500 receive a placebo control
Temperatures measured 4 hours after dosing
Group | N | Mean | StDev | SE Mean
Drug | 500 | 39.950 | 0.653 | 0.029
Control | 500 | 40.058 | 0.699 | 0.031
p value = 0.011
Statistical Significance? Yes. Probably there is a reduction in temperature
Clinical Significance? No. Temperature only fell by about 0.1 °C
Because the sample size is so large we are able to detect a very small
change in temperature
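The contrast between statistical and clinical significance can be seen by recomputing the test from the summary figures in this example. The slide does not say which test produced p = 0.011, so the sketch below assumes a two-sided large-sample z-test for the difference between two means; `two_sample_z` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided z-test for a difference in means from summary statistics."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # SE of the difference
    z = (mean1 - mean2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))               # two-tailed p-value
    return mean1 - mean2, z, p

# Drug group versus control group, figures taken from the example
diff, z, p = two_sample_z(39.950, 0.653, 500, 40.058, 0.699, 500)

print(round(diff, 3))     # difference in mean temperature
print(p < 0.05)           # True: statistically significant
```

The p-value comes out at roughly 0.01, in line with the reported 0.011, while the difference itself is only about a tenth of a degree.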
Which of the following is most likely to contribute
to a statistically significant but clinically
meaningless outcome
a)Obtaining a p value larger than alpha

b)Performing a two-tailed test instead of a one tailed test
c) Setting alpha at <0.05
d)Using a very large sample
Bivariate Analyses
1. Categorical vs Categorical
2. Categorical vs Quantitative
3. Quantitative vs Quantitative
X=2, Y=2
• Unrelated: Chi-square test, Fisher's exact test
• Related: McNemar test
X>2, Y>2
• Unrelated: Chi-square test, Fisher's exact test
X: Group variable
Y: Outcome variable
Proportion Test
Treatment | CURE | NO CURE | TOTAL
Rx A | 20 (5.1%) | 373 (94.9%) | 393
Rx B | 6 (1.9%) | 316 (98.1%) | 322
What is our interest?
Rx A cure rate Vs Rx B cure rate
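The two cure rates can be compared with a chi-square test on the 2×2 table. This is a sketch under stated assumptions: it uses the plain Pearson chi-square statistic without continuity correction, and for 1 degree of freedom it obtains the p-value from the identity P(χ² > x) = erfc(√(x/2)); `chi_square_2x2` is an illustrative helper name.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rx A: 20 cured, 373 not cured; Rx B: 6 cured, 316 not cured
chi2, p = chi_square_2x2(20, 373, 6, 316)

print(chi2 > 3.84)        # True: exceeds the 5% table value for 1 df
print(p < 0.05)           # True
```

The statistic is a little above 5, so the difference between 5.1% and 1.9% would be judged significant at the 5% level under these assumptions.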

A group of researchers were investigating gender of
the head of the household in families of the patients
whose medical costs are covered by insurance,
medicaid or self. Select the most appropriate
statistical method to use in analyzing the data:
a)Independent group t-test

b)Analysis of variance
c) Multiple regression
d)chi-square test
Fisher’s Exact Test
Exact test is recommended when
• the overall total of the table is less than 20 or
• the smallest of the four expected numbers is less than 5
McNemar’s Chi-Square test
(Paired Case)
In some studies, researcher is interested in comparing
two proportions which are paired
This arises whenever two proportions are measured on
the same individuals from matched pair design
McNemar’s Chi-Square test is based on the numbers of
discordant pairs
2. Categorical vs Quantitative
Groups (X) | Parametric (Y: Normal) | Non-Parametric (Y: Non-Normal)
X=2, unrelated | Student's t test | Wilcoxon rank-sum
X=2, related | Paired t test | Wilcoxon signed-rank
X>2, unrelated | One-way ANOVA | Kruskal-Wallis
X>2, related | Repeated measures ANOVA | Friedman's test
An investigator wants to study the association
between maternal intake of iron supplements and
birth weights of newborn babies. He collects
relevant data from 100 pregnant women and their
newborns. What statistical Test of hypothesis would
you advise for the investigator in this situation?
a)Chi-square test
b)Unpaired or independent t test
c) ANOVA
d)Paired t test
A clinical trial of an antihypertensive agent is
performed by administering the drug or a
placebo, with a washout period in between, to
each study subject. The treatments are
administered in random order, and each study
subject serves as his or her own control. The trial
is double-blind. The appropriate significance test
for the change in blood pressure with drug versus
placebo is
a. ANOVA
b. Chi-square test
c. Paired t-test
d. Pearson correlation coefficient
Linear regression would be an appropriate
method for which of the following scenarios:
a)Predicting blood pressure levels when the family

history of the presence or absence of hypertension is
known
b)Predicting serum cholesterol levels when dietary
intake data are known
c) Predicting the blood type of a patient when the blood
types of the parents are known
d)Predicting the peak air flow rates when the pack-years
of cigarettes smoked are known
Lung Function can be measured by spirometer, or by
a peak flow meter, The analysis used for comparing
the two methods for measuring the true value of lung
function can be done by:
a)correlation
b)Bland Altman Plot
c) Regression
d)None of the above
A clinical trial was performed in which asthma
patients were randomized into three treatment
groups: salmeterol BD, albuterol QID, and placebo.
The investigators measured FEV1 values seen in three
treatment groups. The test of significance done will
be:
a)Chi-square test
b)Unpaired t –test
c) Z test
d)ANOVA
How to report the results?
Variable | Normal diet (n=14), Mean (SD) | Normal diet + adjunct (n=15), Mean (SD) | Diff in means (95% CI) | P-Value
Birth weight of newborns | 3.20 (0.49) | 3.60 (0.37) | 0.4 (0.06 – 0.72) | 0.022
If there were truly no difference, a difference in birth weight this large
between the two groups of mothers would be found by chance only about 2 times
in 100.
The Multivariate Problem
Typical question asked is bi-variate:
Exposure (risk factor) → Outcome
However, there are other factors (confounders / effect modifiers)
Have to control for these other factors to get an unbiased
exposure (risk factor) → outcome relationship
Therefore,
E, C1, C2, ….. → Outcome
(Independent variables)
The Analysis: Mathematical Model
Outcome | Model (confounding factors: quantitative and/or categorical)
Quantitative | Multiple linear regression
Categorical: binary | Multiple logistic regression
Categorical: >2 cat | Multiple logistic regression
Categorical: ordinal | Ordinal logistic regression
In general | Analysis of covariance
Time to an event | Cox regression
