
Statistical methods in research

Gergő József Szőllősi


Assistant lecturer
UD – Faculty of Public Health
szollosi.gergo@sph.unideb.hu
Which statistical test should I choose?
Terminology

[Figure: diagram linking the target population, the study population, the sample (observations x1, x2, …, xn), the sample statistic (x̄) and the population parameter.]
• What is the most likely value of the parameter of interest?
• How precisely does the statistic estimate the parameter?
• We estimate the population parameters with the sample statistics.
What is the minimum a user should know about each biostatistical method?

• What type of data can be analysed by that method?
• What are the assumptions of the method?
• How can one check whether these assumptions are fulfilled?
• How can the analysis be done in a statistical package?
• How should the result of the analysis be interpreted?
Steps
• Data collection
• Data management: data entry and data checking
• Describing and depicting the data, data reduction
• Data analysis (statistical conclusion)
• Checking the results
• Interpretation
How to start…?
Let us assume
• Null hypothesis (H0): there is no association (the statistical test computes the p-value assuming that H0 is true)
• Research hypothesis (H1): there is an association
Decision threshold (significance level, usually 5%)

– p below the threshold: H0 is rejected, so we accept the opposite statement (research hypothesis)

– significant results/differences

OR

– p at or above the threshold: there is not enough evidence against H0, so we keep it

– NO significant results/differences
What does the p-value show?
• The smaller the p-value, the stronger the evidence against the null hypothesis.
• The p-value should be interpreted TOGETHER WITH the point estimate.
• It is a measure of the evidence against the null hypothesis: the probability of obtaining data at least as extreme as ours if the null hypothesis were true.
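To make "data at least as extreme as ours, if H0 were true" concrete, here is a minimal Python sketch of a permutation test for the difference between two averages. It is a swapped-in illustration, not one of the tests listed on these slides; the function name, the 10,000 shuffles and the fixed seed are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation p-value for a difference in averages.

    If H0 (no difference) holds, the group labels are exchangeable, so
    shuffling them shows how often a difference at least as large as the
    observed one arises by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # counting the observed labelling itself keeps p above zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [6.5, 6.8, 7.1, 6.9, 7.3]
p = permutation_p_value(a, b)
print(p < 0.05)  # True here: the two groups do not overlap at all
```

With p below the 5% decision threshold, H0 would be rejected; the point estimate (the difference between the averages) should still be reported next to it.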
Is there a difference between…?

[Figure: Question → Statistics → Results → Interpretation. P-value > 0.05: accept the null hypothesis, reject the research hypothesis. P-value < 0.05: reject the null hypothesis, accept the research hypothesis; summary: the null hypothesis seems very unlikely in the light of the data.]

P-value | Null hypothesis (H0) | Research hypothesis (H1) | Statistical interpretation
< 0.05 | Reject | Accept | Significant association/difference
≥ 0.05 | Accept | Reject | NO significant association/difference

BUT! We can never say that the null hypothesis is certainly true!
(We only lack enough evidence to reject it.)
Presentation of data

Normal distribution
• Mean and standard deviation

Non-normal distribution
• Median and interquartile range
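Both summaries can be sketched with Python's standard `statistics` module. This is a minimal illustration under stated assumptions: `describe` is a hypothetical helper, it does not check normality itself, and the quartiles use the module's default ("exclusive") method, which other packages may compute slightly differently.

```python
import statistics


def describe(values):
    """Both summaries from the slide; choosing between them is left to the user."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, 'exclusive' method
    return {
        # for (approximately) normally distributed data
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 in the denominator
        # for non-normal (e.g. skewed) data
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```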
Statistical tests
• 95% confidence intervals (mean and proportion)
• T-test
• ANOVA
• Linear regression
• Chi-square test
• Cross-tabs
Each test weighs the null hypothesis (no difference, equal) against the research hypothesis.

Research questions usually have the following format:


• How strong is the association between A and B?
• Is A related to B?
• Is A different from B?
Test characteristics
Definitions

Sensitivity: the ability of a test to correctly identify patients with a disease.


Specificity: the ability of a test to correctly identify people without the disease.
True positive: the person has the disease and the test is positive.
True negative: the person does not have the disease and the test is negative.
False positive: the person does not have the disease and the test is positive.
False negative: the person has the disease and the test is negative.
Prevalence: the percentage of people in a population who have the condition of
interest.

https://ebn.bmj.com/content/23/1/2
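The definitions above translate directly into ratios of the four cells of a 2×2 table. A minimal Python sketch follows; the function name and the example counts are hypothetical.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and prevalence from 2x2 table counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # share of diseased people who test positive
        "specificity": tn / (tn + fp),    # share of healthy people who test negative
        "prevalence": (tp + fn) / total,  # share of people who have the condition
    }


# hypothetical counts: 100 diseased and 900 healthy people
print(diagnostic_summary(tp=90, fp=30, fn=10, tn=870))
```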
Which statistical test should I choose?
• Point estimate (average) and its 95% confidence interval
• 95% CIs are more informative than p-values, because they show the size of the difference

• Average ± 1.96 SE (95%) | proportion ± 1.96 SE (95%)
• Average ± 1.64 SE (90%) | proportion ± 1.64 SE (90%)
• Average ± 2.58 SE (99%) | proportion ± 2.58 SE (99%)
• It is the interval estimate that answers the question of how precisely the sample statistic estimates the population parameter
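The "estimate ± z × SE" rule can be sketched in Python as follows. Assumptions: the helper names are hypothetical, the usual normal multipliers 1.64, 1.96 and 2.58 are used, SE of an average is s/√n and SE of a proportion is √(p(1−p)/n). For small samples a t multiplier is normally used for averages, and this simple proportion interval is unreliable when p is close to 0 or 1.

```python
import math
import statistics

# usual normal (z) multipliers for two-sided confidence levels
Z = {90: 1.64, 95: 1.96, 99: 2.58}


def mean_ci(values, level=95):
    """Average ± z * SE, with SE = s / sqrt(n)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - Z[level] * se, m + Z[level] * se


def proportion_ci(successes, n, level=95):
    """Proportion ± z * SE, with SE = sqrt(p(1 - p) / n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - Z[level] * se, p + Z[level] * se


print(proportion_ci(40, 100))                  # roughly (0.304, 0.496)
print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9], 99))   # wider than the 95% interval
```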
Normal distribution
• No matter how the original data are distributed, if the sample size is large enough, the sampling distribution of the sample mean is approximately normal (central limit theorem). This allows us to use mathematical probability models to infer the precision of the sample statistics.
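A small simulation can show this. The sketch below assumes an exponential population with mean 1 and SD 1, chosen only because it is clearly skewed; the helper name, sample size and seed are illustrative.

```python
import random
import statistics


def sample_means(n, repeats=2000, seed=42):
    """Means of `repeats` random samples of size n from a skewed population.

    The population is exponential with mean 1 and SD 1 (an illustrative choice).
    """
    rng = random.Random(seed)
    return [statistics.mean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repeats)]


means = sample_means(50)
se = 1 / 50 ** 0.5  # theoretical SE of the mean: population SD / sqrt(n)
share = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)
print(statistics.mean(means), statistics.stdev(means), share)
```

The sample means centre on 1, their SD is close to 1/√50, and roughly 95% of them fall within ±1.96 SE, as a normal distribution predicts, even though the individual observations are far from normal.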
Which statistical test should I choose?
• T-test
• We can test hypotheses about averages (estimated from the sample)
• Types of t-test:
– One-sample
– Paired (pre-post)
– Two-sample (first use an F-test to decide whether the variances are equal!)

How large is the difference between the means of two populations?
Did the mean of a continuous variable change in a population?
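The three t statistics can be sketched with the standard library. These hypothetical helpers return only the t statistic and its degrees of freedom; the two-sample version uses the pooled variance when the variances are taken to be equal (homoscedastic) and Welch's formula otherwise (heteroscedastic).

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and df for H0: population mean == mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


def paired_t(before, after):
    """Paired (pre-post) t-test: a one-sample t-test on the differences vs 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)


def two_sample_t(x, y, equal_var=True):
    """Student's t (pooled variance) if equal_var, otherwise Welch's t."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    diff = statistics.mean(x) - statistics.mean(y)
    if equal_var:  # homoscedastic: pool the two sample variances
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
        se = math.sqrt(sp2 * (1 / nx + 1 / ny))
        df = nx + ny - 2
    else:  # heteroscedastic: Welch-Satterthwaite degrees of freedom
        a, b = vx / nx, vy / ny
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
    return diff / se, df


print(paired_t([10, 12, 9, 11], [12, 13, 11, 14]))
```

A two-sided p-value would then come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available; SciPy's `ttest_1samp`, `ttest_rel` and `ttest_ind(..., equal_var=...)` perform the whole test.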
F-test
• Homoscedastic
– The two samples' variances are equal
• Heteroscedastic
– The two samples' variances are not equal

• Even if the two groups' averages are equal, there could be a significant difference between their variances
• The F-test is used to discover whether there is a significant difference between the variances of the two datasets.
• So the F-test hypotheses are the following:
– H1: the two standard deviations are not equal (different).
(stdev1≠stdev2)
– H0: the two standard deviations are equal (no difference).
(stdev1=stdev2)
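The F statistic is the ratio of the two sample variances. In this minimal sketch (hypothetical helper), the larger variance goes in the numerator so that F ≥ 1, together with the matching degrees of freedom.

```python
import statistics


def f_statistic(x, y):
    """Variance ratio F = larger s^2 / smaller s^2, with (numerator df, denominator df)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


print(f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # (4.0, 4, 4)
```

With the larger variance on top, a two-sided p-value can be obtained as `2 * scipy.stats.f.sf(F, df1, df2)` if SciPy is available; P<0.05 would point to heteroscedastic groups.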
Which statistical test should I choose?
• ANOVA
• Comparison of the averages of at least three groups.
• P<0.05 → significant differences among the averages
• P>0.05 → NO significant differences among the averages (the point estimates can be interpreted, but the differences are not statistically proven!)

Is the mean of a continuous variable different in different populations/groups?
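One-way ANOVA compares the spread of the group averages around the grand mean with the spread of the values inside each group. A minimal standard-library sketch (hypothetical helper) that returns the F statistic and its degrees of freedom:

```python
import statistics


def one_way_anova(*groups):
    """F statistic and df for H0: all group means are equal."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # between groups: spread of the group averages around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within groups: spread of the values around their own group average
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

With only two groups, this F equals the square of the pooled two-sample t statistic. If SciPy is available, `scipy.stats.f_oneway` returns the same F together with its p-value.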
Which statistical test should I choose?
• Linear regression (simple or multiple)
• Continuous variables (!) → What is the association between the two variables? How does one affect the other?
• Regression coefficient (direction and size)
• Coefficient of determination, R² (strength of the relationship)
• Equation (y = bx + a)
• Sign of the coefficient:
0 → no correlation
− → negative correlation
+ → positive correlation

Is there a linear relationship between the two variables, and how does the expected value of the outcome variable depend on the value of the explanatory variable?
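For one explanatory variable, the least-squares line y = bx + a, Pearson's r and R² follow from three sums of squares. This is a minimal sketch with a hypothetical helper name; in simple linear regression R² equals r².

```python
import math
import statistics


def simple_linear_regression(x, y):
    """Least-squares fit of y = b*x + a, plus Pearson r and R^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                   # slope: sign = direction, size = change in y per unit x
    a = my - b * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return {"b": b, "a": a, "r": r, "r_squared": r ** 2}


print(simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # y = 2x + 1
```

On Python 3.10+, `statistics.linear_regression(x, y)` (slope, intercept) and `statistics.correlation(x, y)` (Pearson r) can be used as a cross-check.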
Which statistical test should I choose?
• Chi-square test
• Do the frequencies of a categorical variable differ between two (or more) groups? (R×C table)
• Categorical variables!
• Contingency table
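The Pearson chi-square statistic compares each observed cell count with the count expected under H0 (no association), namely row total × column total / grand total. A minimal sketch for an R×C table of counts (hypothetical helper):

```python
def chi_square(table):
    """Pearson chi-square statistic and df for an R x C contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


print(chi_square([[10, 20], [20, 10]]))  # about (6.67, 1)
```

The chi-square approximation is usually trusted only when the expected counts are not too small (a common rule of thumb is at least 5 per cell). `scipy.stats.chi2_contingency` also gives the p-value, but for 2×2 tables it applies Yates' continuity correction by default (`correction=True`), so its statistic differs from this uncorrected one.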
[Figure: decision chart of PARAMETRIC STATISTICAL TESTS (normal distribution)]
• Continuous variables
– Comparison of averages, 1 group: average ± 95% CI; one-sample t-test
– Comparison of averages, 2 groups: paired t-test (pre-post); or F-test first (P<0.05: heteroscedastic; P>0.05: homoscedastic), then two-sample t-test
– Comparison of averages, 2+ groups: ANOVA
– Effect on each other: positive, negative or neutral association? How to predict one from the other? → linear regression
– How strong is the correlation? → Pearson correlation
• Categorical variables (frequencies)
– Chi-square test
– Proportion + 95% CI
