Use of Software For Statistical Analysis
Outline
Use of computers in research
Hypothesis testing
Type I error and Type II error
What is a computer?
A computer is a programmable machine designed to automatically carry out a sequence of arithmetic or logical operations. The particular sequence of operations can be changed readily, allowing the computer to solve more than one kind of problem.
when compared with handwriting, organization, editing and correction of documents.
STORAGE: ability to store large amounts of documents with ease of retrieval.
ACCURACY: consistent, nearly error-free performance.
AUTOMATION: performs tasks with keyboard flicks.
DILIGENCE: carries out tasks exactly as instructed. Obeys commands as programmed, but remember GIGO (garbage in, garbage out).
What is research?
Systematic collection, analysis and interpretation
Types of research
Basic Research
Generates new knowledge and technologies to deal with major unresolved problems.
Applied Research
Identifies priority problems, and designs and evaluates policies and programs that will deliver the greatest benefit while making optimal use of the available resources.
Research Methods (diagram): Qualitative; Quantitative; Descriptive; Observational; Analytical; Experimental; Randomized Controlled Trial; Nonrandomized Controlled Trial
Conceptual
Collecting & preparing data for analysis
Data storage, data entry, editing, data management.
other auxiliary memories like CDs, DVDs, external hard drives, memory cards and flash drives.
Dissemination of research findings.
Caveats
Computers make research work faster, easier, more accurate and more reliable while minimizing errors. However, remember: garbage in, garbage out. Computers only carry out instructions and cannot think. We should be aware of the abilities and limitations of computer software for optimum results. Plan well.
25/06/2012
Hypothesis testing
Test Statistic
the research question into null and alternative hypotheses. The null hypothesis (H0) is a claim of no difference. The opposing hypothesis is the alternative hypothesis (H1). The alternative hypothesis is a claim of a difference in the population. This is the hypothesis the researcher often hopes to bolster.
It is important to keep in mind that the null and
depending on:
the nature of the data, and the null and alternative hypotheses.
sample statistic to an expected population parameter. Large test statistics indicate data are far from expected, providing evidence against the null hypothesis and in favor of the alternative hypothesis.
probability called a P-value. The P-value answers the question "If the null hypothesis were true, what is the probability of observing the current data or data that are more extreme?" Small P-values provide evidence against the null hypothesis because they say the observed data are unlikely when the null hypothesis is true.
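A minimal Python sketch of this idea: a one-sample z-test compares a sample mean with the mean expected under H0 and converts the resulting test statistic into a P-value. The data, the hypothesised mean `mu0` and the known standard deviation `sigma` are made-up values for illustration, and the z-test is only one convenient choice of test statistic, not one these slides prescribe.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Return (z, two-sided P-value) for H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    standard_error = sigma / sqrt(len(sample))
    # Test statistic: distance of the sample mean from the expected
    # population mean, measured in standard errors.
    z = (mean(sample) - mu0) / standard_error
    # Probability of a statistic at least this extreme, in either
    # direction, if the null hypothesis were true.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: nine measurements, H0 mean of 100, known sigma of 15.
sample = [112, 104, 118, 97, 109, 121, 103, 115, 111]
z, p = one_sample_z_test(sample, mu0=100, sigma=15)
print(f"z = {z:.2f}, P = {p:.4f}")  # z = 2.00, P = 0.0455
```

A larger |z| means the sample mean lies further from what H0 predicts, which is why the P-value shrinks as the test statistic grows.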
not significant
When P-value ≤ .10 the observed difference is marginally significant
When P-value ≤ .05 the observed difference is significant
When P-value ≤ .01 the observed difference is highly significant
Use of "significant" in this context means the observed difference is not likely due to chance. It does not mean "important" or "meaningful".
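The conventions above can be sketched as a small Python helper. The function name `describe_p_value` is hypothetical. The sketch treats each cut-off as an inclusive upper bound and anything above .10 as not significant. Both readings are assumptions.

```python
def describe_p_value(p):
    """Map a P-value to the conventional wording used above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    # Check the strictest cut-off first so each value gets one label.
    if p <= 0.01:
        return "highly significant"
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "marginally significant"
    return "not significant"


for p in (0.003, 0.04, 0.08, 0.25):
    print(p, describe_p_value(p))
```

The labels say nothing about the size or importance of the difference. They only summarise how surprising the data would be under the null hypothesis.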
Step D: Decision
Alpha (α) is a probability threshold for a decision.
If P ≤ α, we will reject the null hypothesis. Otherwise it will be retained for want of evidence.
acceptance.
(WRONG! Failure to reject the null hypothesis
hypothesis is incorrect.
(WRONG! The p value is the probability of the
taken on unwise mechanical use. There is no sharp distinction between significant and insignificant
Vocabulary
Null hypothesis (H0)
A statement that declares the observed difference is
P-value
A probability statement that answers the question "If the null hypothesis were true, what is the probability of observing the current data or data that is more extreme than the current data?" It is the probability of the data conditional on the truth of H0. It is NOT the probability that the null hypothesis is true.
Type I error
A rejection of a true null hypothesis; a false alarm,
Confidence (1 - α)
The complement of alpha.
Beta (β)
The probability of a type II error; probability of a
Defendant guilty
Decision table labels: Null Hypothesis True; Reject Null Hypothesis; Fail to Reject Null Hypothesis
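The two kinds of error can be made concrete with a small simulation, sketched here in Python. It repeatedly draws samples, applies a two-sided z-test at alpha = 0.05, and counts how often the null hypothesis is rejected. The sample size, effect size, number of trials and the function name `rejection_rate` are all illustrative assumptions. Power is used in its usual sense of 1 - beta.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25,
                   alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples in which a two-sided z-test
    rejects H0: population mean == mu0."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= critical_z:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (false alarm).
type_i_rate = rejection_rate(true_mean=0.0)
# H0 is false here, so every failure to reject is a Type II error.
power = rejection_rate(true_mean=0.5)
beta = 1 - power

print(f"Type I error rate: {type_i_rate:.3f} (should be close to alpha)")
print(f"Type II error rate (beta): {beta:.3f}, power: {power:.3f}")
```

With a true null hypothesis the rejection rate settles near alpha. With a real difference, the rate of missed detections estimates beta for that particular effect size and sample size.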
inferences
The choice of an appropriate statistical test is based on a sound research proposal.
Test statistics are fundamental to statistical inferences.
"Good answers come from good questions, not from esoteric analysis." Schoolman et al. (1968)
Research Design
Quality of Data
Number of groups of observations
Distribution of Data
Ordinal Data
Dichotomous & Nominal Data
Combination of variables
Continuous/Nominal
One-way analysis of variance
Continuous/Continuous
Pearson correlation coefficient; Linear regression
Ordinal/Dichotomous
Unpaired Mann-Whitney U test; Chi-square test
Ordinal/Ordinal
Spearman Correlation Coefficient (rho); Kendall Correlation Coefficient (tau)
Ordinal/Continuous
Spearman Correlation Coefficient (rho); Kendall Correlation Coefficient (tau)
Chi-square test
Dichotomous/Nominal
Chi-square test
Nominal/Nominal
Chi-square test
Continuous/All continuous
Multiple linear regression
Time-to-event data
Kaplan-Meier survival curve
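The pairings listed above can be kept as a simple lookup, sketched here in Python. The dictionary `SUGGESTED_TESTS` and the function `suggest_tests` are hypothetical names. Only the two-variable pairings are included, copied as they appear in the list. An empty result means the list gives no entry for that pair, not that no test exists.

```python
SPEARMAN = "Spearman correlation coefficient (rho)"
KENDALL = "Kendall correlation coefficient (tau)"

# Keys are the two variable types of each pairing, in the order listed.
SUGGESTED_TESTS = {
    ("continuous", "nominal"): ["One-way analysis of variance"],
    ("continuous", "continuous"): ["Pearson correlation coefficient",
                                   "Linear regression"],
    ("ordinal", "dichotomous"): ["Unpaired Mann-Whitney U test",
                                 "Chi-square test"],
    ("ordinal", "ordinal"): [SPEARMAN, KENDALL],
    ("ordinal", "continuous"): [SPEARMAN, KENDALL],
    ("dichotomous", "nominal"): ["Chi-square test"],
    ("nominal", "nominal"): ["Chi-square test"],
}


def suggest_tests(first_type, second_type):
    """Return the tests listed for a pair of variable types, or []."""
    key = (first_type.strip().lower(), second_type.strip().lower())
    return SUGGESTED_TESTS.get(key, [])


print(suggest_tests("Continuous", "Continuous"))
print(suggest_tests("Ordinal", "Dichotomous"))
```

A lookup like this is only a starting point. The research design, the number of groups and the distribution of the data still decide whether a listed test is appropriate.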
Non-parametric tests
Many statistical tests are based upon the assumption that the data are sampled from a normal distribution. These tests are referred to as parametric tests. Commonly used parametric tests are:
Mean, SD; Student t test, unpaired t test, paired t test; ANOVA; Pearson correlation; Linear regression
data from a non-Gaussian population? The central limit theorem ensures that parametric tests work well with large samples even if the population is non-Gaussian. Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group.
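A short Python simulation illustrates the central limit theorem at work. It draws from a strongly skewed population (an exponential distribution, chosen here purely as an example) and compares the skewness of single observations with the skewness of means of 24 observations, the "two dozen" mentioned above. The sample counts and the seed are arbitrary assumptions.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: 0 for a symmetric distribution such as
    the Gaussian, positive for a long right tail."""
    centre = fmean(values)
    spread = pstdev(values)
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)


rng = random.Random(42)

# Single observations from a right-skewed (exponential) population.
raw_draws = [rng.expovariate(1.0) for _ in range(5000)]

# Means of samples of 24 observations from the same population.
sample_means = [fmean(rng.expovariate(1.0) for _ in range(24))
                for _ in range(5000)]

print(f"skewness of single observations: {skewness(raw_draws):.2f}")
print(f"skewness of means of 24:         {skewness(sample_means):.2f}")
```

The means are far more symmetric than the raw observations, which is why tests based on means tolerate non-Gaussian populations once the groups are reasonably large.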
with data from a Gaussian population? Nonparametric tests work well with large samples from Gaussian populations. The P values tend to be a bit too large, but the discrepancy is small. In other words, nonparametric tests are only slightly less powerful than parametric tests with large samples.
data from non-Gaussian populations? You can't rely on the central limit theorem, so the P value may be inaccurate.
Small samples.
When you use a nonparametric test with data from a Gaussian population, the P values tend to be too high. The nonparametric tests lack statistical power with small samples.
one- or two-sided P value. The P value is calculated for the null hypothesis that the two population parameters are equal, and any discrepancy between the two sample statistics is due to chance. If this null hypothesis is true, the one-sided P value is the probability that the two sample statistics would differ as much as was observed (or further) in the direction specified by the hypothesis just by chance, even though the means of the overall populations are actually equal.
probability that the sample statistics would differ that much in the opposite direction. The two-sided P value is twice the one-sided P value. A one-sided P value is appropriate when you can state with certainty (and before collecting any data) that there either will be no difference between the means or that the difference will go in a direction you can specify in advance (i.e., you have specified which group will have the larger mean).
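The relationship between the two P values can be sketched in Python for a z statistic. The function name `p_values_from_z` and the value z = 1.8 are illustrative assumptions. The sketch also assumes the predicted direction corresponds to a positive z and that the test statistic is symmetric about zero under the null hypothesis.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (one-sided, two-sided) P-values for a z statistic.

    The one-sided value assumes the direction predicted in advance
    corresponds to a positive z.
    """
    one_sided = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_sided, two_sided


one_sided, two_sided = p_values_from_z(1.8)
print(f"one-sided P = {one_sided:.4f}")  # one-sided P = 0.0359
print(f"two-sided P = {two_sided:.4f}")  # two-sided P = 0.0719

# Data that go the "wrong" way give a large one-sided P value.
wrong_way, _ = p_values_from_z(-1.8)
print(f"one-sided P when the data go the other way = {wrong_way:.4f}")
```

The same data fall below 0.05 with a one-sided P value and above it with a two-sided one, which is why the choice has to be made before the data are seen.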
difference before collecting data, then a two-sided P value is more appropriate. If in doubt, select a two-sided P value. If you select a one-sided test, you should do so before collecting any data and you need to state the direction of your experimental hypothesis. If the data go the other way, you must be willing to attribute that difference (or association or correlation) to chance, no matter how striking the data. If you would be intrigued, even a little, by data
essential than its solution which may be merely a matter of mathematical or experimental skill. - Albert Einstein
series of programs for use in epidemiology and statistics based on JavaScript and HTML
PSPP
A free software replacement for SPSS
Proprietary
SAS
SPSS
Stata
Workflow
Raw Data
Statistical methods
Organize raw data: code, rename and label variables; eliminate outliers
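A minimal Python sketch of this step, using only the standard library. The records, variable names, labels and codes are invented for illustration. The 1.5 x IQR rule for spotting outliers is an assumption, since the slides do not name a rule. Suspect values are set aside for review rather than silently deleted, and the raw records are left untouched.

```python
from statistics import quantiles

# Hypothetical raw export: terse variable names and numeric codes.
raw_rows = [
    {"v1": 1, "v2": 34, "v3": 120},
    {"v1": 2, "v2": 41, "v3": 118},
    {"v1": 2, "v2": 29, "v3": 125},
    {"v1": 1, "v2": 52, "v3": 131},
    {"v1": 1, "v2": 47, "v3": 122},
    {"v1": 2, "v2": 38, "v3": 128},
    {"v1": 1, "v2": 45, "v3": 260},  # implausible, likely an entry slip
    {"v1": 2, "v2": 33, "v3": 119},
]

RENAMES = {"v1": "sex", "v2": "age_years", "v3": "systolic_bp"}
LABELS = {
    "sex": "Sex of participant",
    "age_years": "Age (years)",
    "systolic_bp": "Systolic blood pressure (mmHg)",
}
SEX_CODES = {1: "male", 2: "female"}


def tidy(row):
    """Rename the variables of one record and decode the sex code."""
    renamed = {RENAMES[name]: value for name, value in row.items()}
    renamed["sex"] = SEX_CODES[renamed["sex"]]
    return renamed


def iqr_fences(values, k=1.5):
    """Lower and upper fences: quartiles widened by k times the IQR."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


rows = [tidy(row) for row in raw_rows]  # raw_rows itself is unchanged
low, high = iqr_fences([row["systolic_bp"] for row in rows])
kept = [row for row in rows if low <= row["systolic_bp"] <= high]
flagged = [row for row in rows if not low <= row["systolic_bp"] <= high]

print(f"{LABELS['systolic_bp']}: fences {low:.2f} to {high:.2f}")
print("flagged for review:", flagged)
```

Keeping these steps in a script means the same cleaning can be rerun on the raw data at any time and gives the same result.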
after we get back to work after a short break
for our friends (coauthors), so that they can understand what we are doing
for our enemies: we should always (even years after) be able to prove our results exactly
Practical Demos