20.9.2022
Prof. Seema Daud

Analytic Biostatistics
1. Bivariate Analysis
2. Hypothesis Testing
3. Correlation

Bivariate & Multivariate Inferential Statistics
Simultaneous comparison & analysis of two or more variables in one, two or more samples.
Existence of Association or Relationship between variables:
• Direction of association
• Strength of association
Existence of Difference between variables:
• Finding significance of difference between variables

Bivariate Analysis
• Examines the relationship between one independent (possibly causal) variable and one dependent (outcome) variable
• Can be used to test hypotheses and to probe for relationships
• The types of variables and the research design set the limits on the choice of test used for bivariate analysis

Depending on the types of variables, there are four bivariate analyses:
1. Correlation
2. Chi Square Test
3. t test (sample size ≤ 30)
   – Student t test
   – Paired t test
4. z test (sample size > 30)
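The variable-type rules given in the slides below (correlation, chi square test, t and z tests) can be condensed into a small lookup. This is a minimal sketch, assuming the four type labels used in these notes and the n ≤ 30 cut-off stated above; the function name `choose_bivariate_test` is hypothetical, and combinations the notes do not cover return `None`.

```python
def choose_bivariate_test(var1, var2, n=None, paired=False):
    """Suggest a bivariate test from two variable types, following the rules in these notes.

    var1, var2: "continuous", "ordinal", "dichotomous" or "nominal".
    n: total sample size, used only to choose between t and z tests.
    paired: True when the dichotomous variable marks paired (e.g. before/after) measurements.
    """
    ranked = {"continuous", "ordinal"}          # pairs analysed by correlation
    categorical = {"dichotomous", "nominal"}    # pairs analysed by the chi square test
    pair = {var1, var2}

    if pair <= ranked:
        return "correlation"
    if pair <= categorical:
        return "chi square test"
    if pair == {"continuous", "dichotomous"}:
        if n is not None and n > 30:
            return "z test"                     # large sample (n > 30)
        return "paired t test" if paired else "independent (Student) t test"
    return None                                 # combination not covered in these notes


print(choose_bivariate_test("continuous", "ordinal"))
print(choose_bivariate_test("dichotomous", "nominal"))
print(choose_bivariate_test("continuous", "dichotomous", n=26))
print(choose_bivariate_test("continuous", "dichotomous", n=12, paired=True))
```

The lookup only encodes the slide rules; it does not check assumptions such as normality, which still have to be judged from the data.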

Steps of testing statistical significance
To ascertain a statistically significant relationship (association or difference) between variables:
1. Formulate the hypothesis
2. Fix the weightage given to chance: the alpha (α) level against which the p value is judged
3. Do the mathematical calculations, i.e. apply a test of statistical significance (calculate the critical ratio)
4. Calculate the degrees of freedom
5. Interpret the result & draw an inference or conclusion

Department of Community Medicine, LMDC

To ascertain a statistically significant association or difference between variables:
• Formulate a hypothesis
• Fix the weightage given to chance
• Do mathematical calculations
• Interpret the result

Hypothesis
A hypothesis is a prediction about what the examination of appropriately collected data will show.
Null Hypothesis (H0): There is no relationship (difference or association) between two or more variables.
Alternate Hypothesis (Ha): There is a relationship (difference or association) between two or more variables.

Alpha (α) Error (Type I Error)
The null hypothesis is rejected when it is true, i.e. an association or difference is concluded although there is none between the groups compared (False Positive).

Beta (β) Error (Type II Error)
The null hypothesis is accepted (not rejected) when it is false, i.e. no association or difference is concluded although one exists between the groups compared (False Negative).
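The two error types can be made concrete with a small simulation. This is a sketch under stated assumptions, not part of the original notes: it uses a one-sample z test with a known standard deviation of 1, a sample size of 40, α = 0.05, and a hypothetical true mean of 0.3 for the "H0 false" case. When H0 is true, the share of rejections estimates the type I error rate (it should sit near α); when H0 is false, the share of non-rejections estimates the type II error rate.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p value of a one-sample z test with known standard deviation sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, n=40, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples in which H0 (mean = 0) is rejected at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1) for _ in range(n)]
        if z_test_p_value(sample, mu0=0, sigma=1) <= alpha:
            rejected += 1
    return rejected / trials


# H0 true (mean really is 0): every rejection here is a type I error
type_i = rejection_rate(true_mean=0.0)
# H0 false (mean is 0.3): every non-rejection here is a type II error
type_ii = 1 - rejection_rate(true_mean=0.3)
print(f"type I error rate = {type_i:.3f}, type II error rate = {type_ii:.3f}")
```

With these settings the type I rate comes out close to 0.05 and the type II rate a little above 0.5; a larger sample or a larger true difference would shrink β.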

Correlation
Association between continuous or ordinal variables
(Effect of independent on dependent variables)

Independent variable | Dependent variable | Example
Continuous (C) | Continuous (C) | Age (C) & systolic blood pressure (C)
Continuous (C) | Ordinal (O) | Age (C) & level of satisfaction with health care (O)
Ordinal (O) | Ordinal (O) | Severity of illness (O) & level of satisfaction with health care (O)

Correlation
1. Measures the strength and direction of the linear association between two variables.
2. Two variables are said to be correlated if a change in the independent (exposure) variable is accompanied by a change in the dependent (outcome) variable, either in the same or in the reverse direction.


Correlation
This association can be seen on a scatter (joint distribution) diagram, where the independent variable is plotted on the x axis and the dependent variable on the y axis.

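Joint Distribution Graph
(Figure: scatter plot of the two variables with the line of best fit.)

Correlation Coefficient
• The correlation coefficient, denoted by r, is an index of the extent to which two variables are (linearly) associated.
• r can take values between –1 and +1; the sign gives the direction of the association, and –1 or +1 indicates a perfect linear relationship.
• Zero (0) means no (linear) association.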
Correlation (Scatter Plot)
(Figure: example scatter plots with r = +0.9, r = +0.3, r = 0, r = –1, r = –0.5 and r = –0.2.)


Examples of Positive Correlations
• Salt intake in mg and blood pressure in mm Hg (hypertension)
• Age in years and height in meters
• Blood sugar level in mg/dL and grade of retinopathy
• Hours of meditation and attention level
• Hours of study and test scores

Examples of Negative Correlations (Inverse relationship)
• Age in years and memory graded on a scale of 1 to 10
• Stage of cancer and survival time
• Blood plasma volume in ml & hematocrit value
• Circulating zinc level in blood in mcg/ml and goitre volume in cm³

Chi Square Test
Association between categorical variables (excluding ordinal variables)

First variable | Second variable | Example: checking association between
Dichotomous (D) | Dichotomous (D) | Disease or cure (D) in treated & untreated groups (D)
Dichotomous (D) | Nominal (N) | Disease or cure (D) & blood groups (N)
Nominal (N) | Nominal (N) | Ethnicity (N) & blood groups (N)

Chi Square (χ²) Test
Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed value
E = Expected value

Steps:
Formulate Hypothesis (H0 & Ha)
Fix the weightage given to chance: alpha (α) level: p value
Do mathematical calculations or apply test of statistical significance (calculate critical ratio)
Calculate Degree of Freedom
Interpret the result & draw inference or conclusion
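The formula, together with the expected-value rule used in the worked example below (E = row total × column total / grand total), can be sketched for any r × c table. This is a minimal Python sketch; the function name `chi_square` is hypothetical. It keeps expected values unrounded, so for the obesity data it returns about 7.78 rather than the hand-calculated 8.17, which uses expected values rounded to whole numbers.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    observed: list of rows, each a list of observed counts.
    Expected count for each cell = row total x column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e            # (O - E)^2 / E for this cell
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


chi2, df = chi_square([[7, 20], [15, 8]])       # obesity clinic table
print(f"chi-square = {chi2:.2f}, df = {df}")
```

A table whose rows are exact multiples of each other (for example `[[10, 20], [5, 10]]`) has O = E in every cell and gives χ² = 0.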


Data:
50 patients were examined in an obesity control clinic. 27 patients exercised regularly while 23 patients did not. Among those who exercised regularly, 7 were obese and the rest were non-obese. Among those who did not exercise regularly, 15 were obese and the rest were non-obese.

Research Question: Is there an association between exercise and obesity?
Null Hypothesis (H0): There is no association between exercise and obesity.
Alternate Hypothesis (Ha): There is an association between exercise and obesity.
α level: 0.05 (H0 is rejected if p ≤ 0.05)

Construct a 2 (r) × 2 (c) table (2 rows and 2 columns):

Independent variable | Dependent variable, category 1 | Dependent variable, category 2 | Total
Category 1 | Cell 1: O1, E1 | Cell 2: O2, E2 | Row total = Cells 1+2 (O1 + O2)
Category 2 | Cell 3: O3, E3 | Cell 4: O4, E4 | Row total = Cells 3+4 (O3 + O4)
Total | Column total = Cells 1+3 (O1 + O3) | Column total = Cells 2+4 (O2 + O4) | Grand total = Cells 1+2+3+4 (O1 + O2 + O3 + O4)

Exercise | Obese | Non-obese | Total
Exercise regularly | O1 = 7 | O2 = 20 | 27
Don't exercise regularly | O3 = 15 | O4 = 8 | 23
Total | 22 | 28 | 50


Expected values:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
Cell 1: E1 = (27 × 22) / 50 = 11.88 ≈ 12
Cell 2: E2 = (27 × 28) / 50 = 15.12 ≈ 15
Cell 3: E3 = (23 × 22) / 50 = 10.12 ≈ 10
Cell 4: E4 = (23 × 28) / 50 = 12.88 ≈ 13

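Exercise | Obese | Non-obese | Total
Exercise regularly | O1 = 7, E1 = 12 | O2 = 20, E2 = 15 | 27
Don't exercise regularly | O3 = 15, E3 = 10 | O4 = 8, E4 = 13 | 23
Total | 22 | 28 | 50

Calculate χ²:
$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}$$
$$= \frac{(7 - 12)^2}{12} + \frac{(20 - 15)^2}{15} + \frac{(15 - 10)^2}{10} + \frac{(8 - 13)^2}{13}$$
$$= \frac{(-5)^2}{12} + \frac{5^2}{15} + \frac{5^2}{10} + \frac{(-5)^2}{13} = \frac{25}{12} + \frac{25}{15} + \frac{25}{10} + \frac{25}{13}$$
$$= 2.08 + 1.67 + 2.50 + 1.92$$
$$\chi^2 = 8.17$$
(Using the unrounded expected values 11.88, 15.12, 10.12 and 12.88 gives χ² ≈ 7.78; the conclusion is unchanged.)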
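The effect of rounding the expected values can be checked directly. This sketch recomputes χ² for the obesity table twice, once with the whole-number expected values used above and once with the unrounded ones; the variable and helper names are illustrative only.

```python
observed = [7, 20, 15, 8]            # O1..O4 from the obesity clinic table
row_totals = [27, 27, 23, 23]        # row total belonging to each cell
col_totals = [22, 28, 22, 28]        # column total belonging to each cell
grand_total = 50


def chi_square_terms(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


exact_expected = [r * c / grand_total for r, c in zip(row_totals, col_totals)]
rounded_expected = [round(e) for e in exact_expected]

chi2_rounded = chi_square_terms(observed, rounded_expected)
chi2_exact = chi_square_terms(observed, exact_expected)

print("exact E:  ", [round(e, 2) for e in exact_expected])
print("rounded E:", rounded_expected)
print(f"chi-square with rounded E = {chi2_rounded:.2f}")
print(f"chi-square with exact E   = {chi2_exact:.2f}")
```

It reports about 8.17 with rounded and about 7.78 with unrounded expected values; both exceed 6.64, the df = 1 table value for p = 0.01, so the inference does not depend on the rounding.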


Interpretation of result (1st way, without df)
Roughly, if χ² > 3, there is a statistically significant association (null hypothesis is rejected & alternate hypothesis is accepted). If χ² < 3, there is no statistically significant association (null hypothesis is accepted & alternate hypothesis is rejected).
This is only a rough guide: in the chi-square distribution table the cut-off at p = 0.05 is 3.84 for df = 1 and rises with df (5.99 for df = 2), so borderline values should be checked against the table.
As the χ² value of 8.17 is greater than 3, the null hypothesis is rejected. In the given data set, there is a statistically significant association of exercise with obesity.

Interpretation of result (2nd way, with df)
• Calculate the degrees of freedom (df):
  df = (r – 1) × (c – 1), where r = number of rows and c = number of columns
  df = (2 – 1) × (2 – 1) = 1 × 1 = 1
• Check the p value from the table for the χ² value of 8.17 and df = 1.
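Instead of a table lookup, the exact p value for df = 1 can be computed, because a χ² variable with one degree of freedom is the square of a standard normal variable. This is a minimal sketch using only Python's `math.erfc`; the helper names are hypothetical, and the shortcut is valid for df = 1 only.

```python
import math


def chi_square_df(rows, cols):
    """Degrees of freedom for an r x c contingency table: (r - 1) x (c - 1)."""
    return (rows - 1) * (cols - 1)


def chi_square_p_df1(chi2):
    """Upper-tail p value of a chi-square statistic with exactly 1 degree of freedom.

    With df = 1, chi-square = Z^2 for a standard normal Z, so
    P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


df = chi_square_df(2, 2)
p = chi_square_p_df1(8.17)
print(f"df = {df}, p = {p:.4f}")
```

For χ² = 8.17 this gives p ≈ 0.004, consistent with the table reading of p < 0.01; for χ² = 3.84 it gives p ≈ 0.05, which is why 3.84 is the df = 1 cut-off.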

Chi-Square Distribution Table
(Rows: degrees of freedom (df). Columns: probability (p). Entries: χ² values.)

df | 0.95 | 0.90 | 0.80 | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05 | 0.01 | 0.001
1 | 0.004 | 0.02 | 0.06 | 0.15 | 0.46 | 1.07 | 1.64 | 2.71 | 3.84 | 6.64 | 10.83
2 | 0.10 | 0.21 | 0.45 | 0.71 | 1.39 | 2.41 | 3.22 | 4.60 | 5.99 | 9.21 | 13.82
3 | 0.35 | 0.58 | 1.01 | 1.42 | 2.37 | 3.66 | 4.64 | 6.25 | 7.82 | 11.34 | 16.27
4 | 0.71 | 1.06 | 1.65 | 2.20 | 3.36 | 4.88 | 5.99 | 7.78 | 9.49 | 13.28 | 18.47
5 | 1.14 | 1.61 | 2.34 | 3.00 | 4.35 | 6.06 | 7.29 | 9.24 | 11.07 | 15.09 | 20.52
6 | 1.63 | 2.20 | 3.07 | 3.83 | 5.35 | 7.23 | 8.56 | 10.64 | 12.59 | 16.81 | 22.46
7 | 2.17 | 2.83 | 3.82 | 4.67 | 6.35 | 8.38 | 9.80 | 12.02 | 14.07 | 18.48 | 24.32
8 | 2.73 | 3.49 | 4.59 | 5.53 | 7.34 | 9.52 | 11.03 | 13.36 | 15.51 | 20.09 | 26.12
9 | 3.32 | 4.17 | 5.38 | 6.39 | 8.34 | 10.66 | 12.24 | 14.68 | 16.92 | 21.67 | 27.88
10 | 3.94 | 4.86 | 6.18 | 7.27 | 9.34 | 11.78 | 13.44 | 15.99 | 18.31 | 23.21 | 29.59
Columns p = 0.95 to 0.10: non-significant. Columns p = 0.05 to 0.001: significant.

Inference
• For the χ² value of 8.17 at 1 df, the table value of p is < 0.01 (8.17 lies between 6.64, the value for p = 0.01, and 10.83, the value for p = 0.001).
• The p value of < 0.01 is much smaller than 0.05.
Interpretation: In the given data set, the null hypothesis is rejected, as there is a statistically significant association of exercise with obesity.
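Reading the table can also be automated. This sketch stores only the p = 0.05, 0.01 and 0.001 columns for df = 1 to 10, copied from the table above, and reports the smallest of those p levels whose critical value the statistic reaches; `p_bracket` is a hypothetical name.

```python
# Critical chi-square values copied from the table above:
# for each df, the values at p = 0.05, 0.01 and 0.001.
CRITICAL = {
    1: (3.84, 6.64, 10.83),
    2: (5.99, 9.21, 13.82),
    3: (7.82, 11.34, 16.27),
    4: (9.49, 13.28, 18.47),
    5: (11.07, 15.09, 20.52),
    6: (12.59, 16.81, 22.46),
    7: (14.07, 18.48, 24.32),
    8: (15.51, 20.09, 26.12),
    9: (16.92, 21.67, 27.88),
    10: (18.31, 23.21, 29.59),
}
P_LEVELS = (0.05, 0.01, 0.001)


def p_bracket(chi2, df):
    """Smallest tabulated p level (0.05, 0.01 or 0.001) whose critical value chi2 reaches.

    Returns None when chi2 is below the p = 0.05 critical value (non-significant).
    """
    if df not in CRITICAL:
        raise ValueError("table only covers df = 1 to 10")
    result = None
    for level, critical in zip(P_LEVELS, CRITICAL[df]):
        if chi2 >= critical:
            result = level
    return result


print(p_bracket(8.17, 1))   # obesity example
```

A return value of 0.01 is read as "p < 0.01", matching the inference above; `None` means the result falls in the non-significant columns.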

t test and t distribution
• In 1908, William Sealy Gosset, an Englishman publishing under the pseudonym "Student", developed the t test and the t distribution.
• At least one variable is continuous (usually the dependent one).
• The t test assumes the data are normally distributed.

Types of t test
• One sample t test: compares the mean of a single group against a known or standard mean.
• Independent sample (unpaired) t test: compares the means of two independent (different) groups.
• Paired t test: compares means from the same group at different times.
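The one sample and paired t tests are not worked through in these notes, so the sketch below is an illustration under assumptions: it uses the standard one-sample statistic t = (x̄ − μ0) / (s / √n) with df = n − 1, and treats the paired test as a one-sample test of the before-minus-after differences against 0. The blood pressure readings and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and df for comparing a sample mean with a known or standard mean mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)     # standard error of the mean (stdev divides by n - 1)
    return (mean(sample) - mu0) / se, n - 1


def paired_t(before, after):
    """Paired t test: one-sample t test on within-subject differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    differences = [b - a for b, a in zip(before, after)]
    return one_sample_t(differences, 0)


# Hypothetical systolic BP (mm Hg) of 5 patients before and after treatment
before = [140, 150, 135, 145, 160]
after = [132, 141, 130, 139, 150]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Because the paired test works on differences, each subject acts as his or her own control; the resulting t is then compared with the t table at df = n − 1.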


t test: Comparison of two means

First variable | Second variable | Example | Test of significance
Continuous (C) | Dichotomous unpaired: one group vs. a standard value (C) | Comparison of systolic blood pressure (C) in a group and a standard systolic BP value (C) | One sample t test
Continuous (C) | Dichotomous unpaired (DU) | Difference in systolic blood pressure (C) in males & females (DU) | Independent sample (Student) t test (n ≤ 30)
Continuous (C) | Dichotomous paired (DP) | Difference in systolic blood pressure (C) before & after treatment (DP) | Paired t test

Steps:
Formulate Hypothesis (H0 & Ha)
Fix the weightage given to chance: alpha (α) level: p value
Do mathematical calculations or apply test of statistical significance (calculate critical ratio)
Calculate Degree of Freedom
Interpret the result & draw inference or conclusion

Student t test calculation steps
• The t test is used to calculate the significance of observed differences between the means of two samples.
• Step 1: calculate the means of the two samples
• Step 2: calculate the variances of the two samples
• Step 3: calculate the t value
• Step 4: calculate the degrees of freedom
• Step 5: compare the calculated t value with the critical t value in the table
• Step 6: interpret the result of the t test

A researcher wants to determine whether there is a difference between the systolic blood pressure of males and females. He measured the BP of 26 subjects (14 male & 12 female), and his data were as follows:

Group | Number (n) | Mean (average systolic BP) | Variance (s²)
Male | 14 | 118.3 mm Hg | 70.1 (mm Hg)²
Female | 12 | 107.0 mm Hg | 82.5 (mm Hg)²

Null Hypothesis: There is no difference in systolic BP between males & females.
Alternative Hypothesis: There is a difference in systolic BP between males & females.
α level: 0.05 (H0 is rejected if p ≤ 0.05)

$$t = \frac{\text{Difference between Means}}{\sqrt{\frac{\text{Variance 1}}{\text{Sample size 1}} + \frac{\text{Variance 2}}{\text{Sample size 2}}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
x̄1 = mean of Sample 1
x̄2 = mean of Sample 2
n1 = number of subjects in Sample 1
n2 = number of subjects in Sample 2
s1² = variance of Sample 1
s2² = variance of Sample 2
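The formula above can be coded directly from the summary statistics in the table. The second function is an addition, not from the slides: the classic Student t test pools the two variances, which matches the df = n1 + n2 – 2 used in the calculation; for these data it gives t ≈ 3.30 instead of 3.28, so the conclusion does not change. The function names are hypothetical.

```python
import math


def t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """t as in the formula above: difference of means / sqrt(var1/n1 + var2/n2)."""
    se = math.sqrt(var1 / n1 + var2 / n2)   # unpooled standard error of the difference
    return (mean1 - mean2) / se


def t_pooled(mean1, var1, n1, mean2, var2, n2):
    """Classic Student t with a pooled variance estimate (shown for comparison)."""
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se


male = (118.3, 70.1, 14)      # mean, variance, n
female = (107.0, 82.5, 12)
print(f"t (formula above) = {t_from_summary(*male, *female):.2f}")
print(f"t (pooled)        = {t_pooled(*male, *female):.2f}")
```

The two versions agree closely here because the group variances and sizes are similar; they diverge more when one group is much larger or much more variable than the other.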


Calculations:
$$t = \frac{118.3 - 107.0}{\sqrt{70.1/14 + 82.5/12}} = \frac{11.30}{\sqrt{5.007 + 6.875}} = \frac{11.30}{\sqrt{11.88}} = \frac{11.30}{3.45} = 3.28$$

Degrees of Freedom:
df = n1 + n2 – 2 = (14 + 12) – 2 = 26 – 2 = 24
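In place of a printed t table, the two-tailed p value can be approximated by numerically integrating the t density. This sketch uses Simpson's rule and Python's `math.lgamma`; it is an illustration rather than a validated statistical routine, and the function names are hypothetical. For t = 3.28 with df = 24 it returns p ≈ 0.003.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 - 2 * (area under the density from 0 to |t|), by Simpson's rule."""
    b = abs(t)
    h = b / steps                         # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    area = total * h / 3                  # area from 0 to |t|
    return 1 - 2 * area


df = 14 + 12 - 2
p = t_two_tailed_p(3.28, df)
print(f"df = {df}, two-tailed p = {p:.4f}")
```

As a check, t = 2.064 at df = 24 (the usual two-tailed 0.05 table value) gives p close to 0.05.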

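Inference
• For the t value of 3.28 with 24 degrees of freedom, the two-tailed p value from the t table lies between 0.005 and 0.002 (at df = 24 the critical t values are 3.09 for p = 0.005 and 3.47 for p = 0.002), i.e. p < 0.01, well below α = 0.05.
• This means that, in the given data set, male subjects have a statistically significantly different (higher) systolic BP than females.
(Figure: t distribution, t value = 3.28; df = 24.)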
