Data Collection and Analysis
C. DATA ANALYSIS
Descriptive Research
It generally uses different types of descriptive
statistics: frequencies, central tendencies (averages),
and variability.
• Describing what is and what the data shows.
Descriptive Statistics
• Used to present quantitative descriptions in a manageable form.
• Help to simplify large amounts of data, such as measurements of a large number of people on any measure.
• Each descriptive statistic reduces lots of data into a simpler summary.
• Typically distinguished from inferential statistics.
The Distribution
It is a summary of the frequency of individual values or
ranges of values for a variable. The simplest
distribution would list every value of a variable and
the number of persons who had each value.
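As a minimal sketch in Python (assuming the raw values are available as a list), the simplest distribution can be built by counting how many times each value occurs. The values used here are the test scores from the central-tendency example in this section; the function name is hypothetical.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count) pairs sorted by value: the simplest distribution."""
    counts = Counter(values)  # value -> number of times it occurs
    return sorted(counts.items())

scores = [15, 20, 21, 20, 36, 15, 25, 15]
for value, count in frequency_distribution(scores):
    print(value, count)
```

Each printed line pairs a score with the number of people who obtained it, which is exactly the list described above.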
Frequencies
Category   Percent
36-45      21%
46-55      45%
56-65      19%
66+        6%
Figure 1. Frequency distribution bar chart (histogram or bar chart; x-axis: categories Under 35, 36-45, 46-55, 56-65, 66+; y-axis: frequency, 0% to 50%)
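When values are grouped into ranges, as in the categories above, each category's percentage is its count divided by the total number of cases. The sketch below is hypothetical throughout: it assumes the categories are age ranges, uses made-up ages (not the data behind the table), and assumes that 35 and below counts as "Under 35", since the category labels leave 35 itself unassigned.

```python
CATEGORIES = ["Under 35", "36-45", "46-55", "56-65", "66+"]

def categorize(age):
    """Map an age to a category label. Assumption: ages <= 35 are 'Under 35'."""
    if age <= 35:
        return "Under 35"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

def percent_distribution(ages):
    """Return {category: percent of all cases}, in table order."""
    counts = {label: 0 for label in CATEGORIES}
    for age in ages:
        counts[categorize(age)] += 1
    total = len(ages)
    return {label: 100 * counts[label] / total for label in CATEGORIES}

# Hypothetical ages for illustration only
ages = [29, 38, 41, 47, 50, 52, 54, 58, 63, 70]
for label, pct in percent_distribution(ages).items():
    print(f"{label:>8}  {pct:.0f}%")
```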
Central tendencies
These include the mean, the mode, and the median. These measures
provide information about the average and typical behavior of
the language learners with regard to the linguistic elements being
investigated.
The mean refers to the measure obtained by adding all the scores of the
respondents and dividing the sum by the number of subjects.
The mode refers to the score that occurs most frequently in the group of
respondents.
The median is the score that divides the distribution into two halves:
half of the scores are above it and half are below it. (Seliger and Shohamy;
Catane, 2000)
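For example, the mean or average quiz score is determined by summing all the scores
and dividing by the number of students taking the exam. Consider the test
score values:
15 20 21 20 36 15 25 15
The sum of these 8 values is 167, so the mean is 167 / 8 = 20.875. To find the median,
we order the scores from lowest to highest:
15 15 15 20 20 21 25 36
Because there is an even number of scores, the median is the average of the two middle
(4th and 5th) scores: (20 + 20) / 2 = 20. The mode is the most frequently occurring value;
in the ordered list, 15 occurs three times, so the mode is 15.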
Notice that for the same set of 8 scores we got three different values
(20.875, 20, and 15) for the mean, median and mode respectively. If the
distribution is truly normal (i.e., bell-shaped), the mean, median and mode are
all equal to each other.
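Python's standard `statistics` module can reproduce the three values for these scores. This is offered only as a quick check of the arithmetic, not as part of the original method.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum / count = 167 / 8
median = statistics.median(scores)  # average of the 4th and 5th ordered scores
mode = statistics.mode(scores)      # most frequent score

print(mean, median, mode)  # 20.875 20.0 15
```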
Dispersion
Refers to the spread of the values around the central tendency. There are
two common measures of dispersion, the range and standard deviation.
The range is simply the highest value minus the lowest value.
In our example (ordered scores: 15 15 15 20 20 21 25 36), the highest value is 36 and the lowest is 15, so the range is 36 - 15 = 21.
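To compute the standard deviation, we first find the distance (discrepancy) between each value and the mean.
We know from above that the mean is 20.875.
Notice that values that are below the mean have negative discrepancies and values above
it have positive ones. Next, we square each discrepancy:
Score   Discrepancy (score - 20.875)   Squared discrepancy
15      -5.875                          34.515625
20      -0.875                           0.765625
21       0.125                           0.015625
20      -0.875                           0.765625
36      15.125                         228.765625
15      -5.875                          34.515625
25       4.125                          17.015625
15      -5.875                          34.515625
Adding the squared discrepancies gives the sum of squares; here, the result is 350.875. After that, we divide it by
the number of scores minus one: 350.875 / 7 = 50.125.
This value is known as the variance. To get the standard deviation, we take the square root of
the variance (remember that we squared the deviations earlier). This would be SQRT(50.125)
= 7.0799.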
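The steps above translate directly into code. This minimal sketch follows them one by one (range, discrepancies, sum of squares, division by n - 1, square root) and then cross-checks the result against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Range: highest value minus lowest value
value_range = max(scores) - min(scores)                # 36 - 15 = 21

# Step 1: discrepancy of each score from the mean
mean = sum(scores) / len(scores)                       # 20.875
discrepancies = [x - mean for x in scores]

# Step 2: square each discrepancy and add them up (sum of squares)
sum_of_squares = sum(d ** 2 for d in discrepancies)    # 350.875

# Step 3: divide by the number of scores minus 1 to get the variance
variance = sum_of_squares / (len(scores) - 1)          # 50.125

# Step 4: the standard deviation is the square root of the variance
std_dev = math.sqrt(variance)                          # about 7.0799

# Cross-check with the standard library (sample statistics use n - 1)
assert math.isclose(std_dev, statistics.stdev(scores))
print(value_range, sum_of_squares, variance, round(std_dev, 4))  # 21 350.875 50.125 7.0799
```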
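Although this computation may seem convoluted, it’s actually quite simple.
To see this, consider the formula for the standard deviation:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} $$
where $X_i$ is each score, $\bar{X}$ is the mean, and $n$ is the number of scores (here 8).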
For instance, if the scores are approximately normally distributed, about 95% of them fall within two
standard deviations of the mean. Since the mean in our example is 20.875 and the standard deviation is
7.0799, we can estimate that approximately 95% of the scores will fall in the range
of 20.875 - (2 × 7.0799) to 20.875 + (2 × 7.0799), or between 6.7152 and 35.0348. This kind of information
is a critical stepping stone to enabling us to compare the performance of an individual on one
variable with their performance on another, even when the variables are measured on entirely
different scales.
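The two-standard-deviation range can be computed directly. The sketch also standardizes a score as a z-score, (score - mean) / standard deviation, which is one common way (not named in the original) to put variables measured on different scales onto a common footing. The "about 95%" reading holds only if the scores are roughly normal; `z_score` is a hypothetical helper name.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

# Roughly 95% of scores fall within 2 SDs of the mean if the data are near-normal
low, high = mean - 2 * sd, mean + 2 * sd
print(f"{low:.4f} to {high:.4f}")  # 6.7152 to 35.0348

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(round(z_score(36, mean, sd), 2))  # 2.14
```

A z-score of about 2.14 says the top score of 36 lies a little more than two standard deviations above the mean; a z-score computed the same way on another test could then be compared with it directly.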
Different types of experimental designs call for different
methods of analysis. When comparing two groups, such as an
experimental and a control group, a t-test is used; if more than
two groups are compared, one-way analysis of variance (ANOVA) is an
appropriate statistical measure. Factorial analysis of
variance is used for more complex experimental designs.
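To make the one-way ANOVA idea concrete, here is a minimal sketch of its F statistic: the ratio of between-group to within-group variance. The function name and group scores are hypothetical, and turning F into a p-value (from the F distribution with k - 1 and N - k degrees of freedom) is left to a statistics package.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three groups
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]
print(one_way_anova_f(group_a, group_b, group_c))
```

A large F means the group means differ by more than the variation inside the groups would lead us to expect.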
What Is a T-Test?
A t-test is a statistical test that is used to compare the means of two
groups. It is often used in hypothesis testing to determine whether a
process or treatment actually has an effect on the population of interest, or
whether two groups are different from one another.
When to use a t-test
A t-test can only be used when comparing the means of two groups (a.k.a. pairwise
comparison). If you want to compare more than two groups, or if you want to do multiple
pairwise comparisons, use an ANOVA test or a post-hoc test.
The t-test is a parametric test of difference, meaning that it makes the same assumptions
about your data as other parametric tests. The t-test assumes your data:
1. are independent;
2. are (approximately) normally distributed;
3. have a similar amount of variance within each group being compared (a.k.a.
homogeneity of variance).
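The homogeneity-of-variance assumption is often checked with Levene's test (not mentioned in the original). A minimal sketch, assuming the classic mean-centred version: its statistic W is a one-way ANOVA F computed on each score's absolute distance from its own group mean. The data are hypothetical, and comparing W with a critical F value (k - 1 and N - k degrees of freedom) is left to a table or statistics package.

```python
def levene_w(*groups):
    """Levene's W: one-way ANOVA F on |score - group mean| (mean-centred variant)."""
    # Transform each score into its absolute deviation from its group mean
    deviations = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]

    all_z = [z for d in deviations for z in d]
    grand_mean = sum(all_z) / len(all_z)
    k, n_total = len(deviations), len(all_z)

    # Ordinary one-way ANOVA F on the transformed values
    ss_between = sum(len(d) * (sum(d) / len(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - sum(d) / len(d)) ** 2 for d in deviations for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical: the second group is the first shifted by 10, so the spreads match
print(levene_w([1, 2, 3, 4], [11, 12, 13, 14]))
# Hypothetical: a group with a much larger spread pushes W up
print(levene_w([1, 2, 3, 4], [0, 10, 20, 30]))
```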
What type of t-test should I use?
When choosing a t-test, you will need to consider two things: whether the groups being compared come from a single
population or two different populations, and whether you want to test the difference in a specific direction.
One-tailed or two-tailed t-test?
• If you only care whether the two populations are different from one another, perform a two-tailed t-test.
• If you want to know whether one population mean is greater than or less than the other, perform a one-tailed
t-test.
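The practical difference between the two choices shows up in the p-value: for a test statistic that lies in the hypothesized direction, the two-tailed p-value is twice the one-tailed one. To stay within Python's standard library, this sketch uses the standard normal distribution (`statistics.NormalDist`) as a large-sample stand-in for the t distribution; with small samples, a real t-test would use the t distribution with the appropriate degrees of freedom. The function name is hypothetical.

```python
from statistics import NormalDist

def p_values(statistic):
    """Return (one_tailed, two_tailed) p-values for a test statistic.

    Normal approximation to the t distribution (reasonable only for large
    samples). The one-tailed p-value assumes the hypothesized direction is
    'greater than' (upper tail).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - std_normal.cdf(statistic)
    two_tailed = 2 * (1 - std_normal.cdf(abs(statistic)))
    return one_tailed, two_tailed

one, two = p_values(1.80)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

With a statistic of 1.80, the one-tailed p-value (about 0.036) falls below 0.05 while the two-tailed one (about 0.072) does not, which is why the direction of a one-tailed test should be decided before the data are analyzed.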
One-sample, two-sample, or paired t-test?
• If the groups come from a single population (e.g. measuring
before and after an experimental treatment), perform a paired
t-test.
• If the groups come from two different populations (e.g. two
different species, or people from two separate cities), perform
a two-sample t-test (a.k.a. independent t-test).
• If there is one group being compared against a standard value (e.g.
comparing the acidity of a liquid to a neutral pH of 7), perform
a one-sample t-test.
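The three designs differ in how the t statistic is formed. A minimal sketch using only the standard library, assuming the pooled-variance (equal-variance) form of the two-sample test, which matches the homogeneity-of-variance assumption listed earlier. The function names and data are hypothetical; each function returns the statistic and its degrees of freedom, and a p-value would come from the t distribution (for example, in a statistics package).

```python
import math
import statistics

def one_sample_t(sample, standard):
    """Compare one group's mean with a fixed standard value (e.g. pH 7)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - standard) / se, n - 1

def paired_t(before, after):
    """Paired design: a one-sample t-test on the within-subject differences."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)

def two_sample_t(group1, group2):
    """Independent groups, pooled variance (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, n1 + n2 - 2

# Hypothetical data
print(one_sample_t([6.8, 7.1, 6.9, 7.4, 7.3], 7))
print(paired_t(before=[10, 12, 9, 11], after=[12, 14, 10, 13]))
print(two_sample_t([20, 22, 19, 24], [15, 17, 16, 18]))
```

The paired version is written as a one-sample test on the differences because that is what the paired design amounts to: each subject serves as their own comparison.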
Performing a t-test
The p-value indicates that these variables are not independent of each other
and that there is a statistically significant relationship between the categorical
variables.
What are special concerns with regard to the Chi-Square statistic?
There are a number of important considerations when using the Chi-Square statistic to evaluate a cross
tabulation. Because of how the Chi-Square value is calculated, it is extremely sensitive to sample size:
when the sample size is too large (~500), almost any small difference will appear statistically significant.
It is also sensitive to the distribution within the cells, and SPSS gives a warning message if cells have
an expected count of fewer than 5 cases. This can be addressed by always using categorical variables with a
limited number of categories (e.g., by combining categories if necessary to produce a smaller table).
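Both concerns can be seen in a minimal sketch of the Pearson chi-square statistic for a cross tabulation, computed with the standard library only. It also lists cells whose expected count is below 5. The function name and tables are hypothetical, and converting the statistic to a p-value (with (rows - 1) × (columns - 1) degrees of freedom) is left to a statistics package.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Returns (statistic, degrees_of_freedom, low_expected_cells), where
    low_expected_cells lists (row, column) positions with expected count < 5.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    low_expected_cells = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
            if expected < 5:
                low_expected_cells.append((i, j))

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, low_expected_cells

# Hypothetical 2 x 2 cross tabulation
small = [[10, 20], [20, 10]]
# Same proportions, ten times the sample size
large = [[100, 200], [200, 100]]
print(chi_square(small))
print(chi_square(large))
```

Keeping the proportions fixed and multiplying every cell by ten multiplies the statistic by ten (from about 6.67 to about 66.67), which illustrates why very large samples make even small differences look significant.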