
Descriptive Statistics

 Statistical procedures that are used for summarizing a set of data
◦ Picture techniques:
 Frequency distributions
 Stem-and-leaf displays
 Histograms
 Frequency polygons
 Bar graphs
 Pie graphs
◦ Measures of central tendency:
 The mode, median, and mean
◦ Measures of variability:
 The range, interquartile range, and box-and-whisker
plot
 Standard deviation and variance
 Frequency Distributions
◦ They show how many subjects in the data are in the
same category or have the same score (in numbers
and percentages).
◦ Three basic kinds:
 simple (or ungrouped) frequency distributions
 grouped frequency distributions
 cumulative frequency distributions
◦ N is used for the total number of scores in the analysis
◦ n is used for the number of scores in a category
◦ f (frequency) is sometimes used instead of n

TABLE 1. Age Distribution within the Sample

Age    Frequency    Percentage
17          4           6.7
18         22          36.7
19         20          33.3
20          8          13.3
21          6          10.0
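
A simple frequency distribution like Table 1 can be sketched in a few lines of Python. The raw ages are not listed in the slides, so the list below is rebuilt from the table's frequencies; the function name is only illustrative.

```python
from collections import Counter

# Ages rebuilt from the frequencies in Table 1 (the raw data are not shown).
ages = [17] * 4 + [18] * 22 + [19] * 20 + [20] * 8 + [21] * 6

def frequency_distribution(scores):
    """Return (score, n, percentage) rows, ordered by score."""
    counts = Counter(scores)
    total = len(scores)  # N, the total number of scores
    return [(score, n, round(100 * n / total, 1))
            for score, n in sorted(counts.items())]

rows = frequency_distribution(ages)
for score, n, pct in rows:
    print(f"{score:>3} {n:>10} {pct:>12.1f}")
```

Each row holds the score, its frequency (n), and its share of N as a percentage.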

TABLE 2. Age by Gender Distribution within the Sample (N = 60)

           Females              Males
        (n = 30, 50%)       (n = 30, 50%)
Age       f       %           f       %
17        0      0.0          4     13.3
18       16     53.3          6     20.0
19        6     20.0         14     46.7
20        5     16.7          3     10.0
21        3     10.0          3     10.0

TABLE 3. Grouped Frequency Distribution of Number of English MP3 Songs in Cellphone (N = 60)

Number of Songs    Frequency    Percentage
0 – 99                 13          21.7
100 – 299              32          53.3
300 – 499               9          15.0
500 – 699               2           3.3
700 – 899               1           1.7
>900                    3           5.0
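
Grouping scores into class intervals can be sketched as follows. The intervals are taken from Table 3, but the song counts are hypothetical values invented for illustration (the raw data are not given), and treating the top class as "900 or more" is an assumption.

```python
# Class intervals from Table 3; None marks the open-ended top class
# (assumed here to start at 900).
INTERVALS = [(0, 99), (100, 299), (300, 499), (500, 699), (700, 899), (900, None)]

def grouped_frequencies(scores, intervals=INTERVALS):
    """Count the scores in each (low, high) interval, bounds inclusive."""
    rows = []
    for low, high in intervals:
        n = sum(1 for s in scores if s >= low and (high is None or s <= high))
        label = f"{low}+" if high is None else f"{low}-{high}"
        rows.append((label, n, round(100 * n / len(scores), 1)))
    return rows

# Hypothetical song counts, invented for illustration (not the study's data).
songs = [0, 45, 120, 150, 250, 299, 300, 480, 650, 3245]
table = grouped_frequencies(songs)
for label, n, pct in table:
    print(f"{label:<10} {n:>3} {pct:>6.1f}")
```

Grouping loses information: once 120 and 250 share the 100-299 class, the individual scores can no longer be recovered from the table.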

TABLE 4. Grouped Frequency Distribution of Number of English MP3 Songs in Cellphone by Gender (N = 60)

                       Females              Males
                    (n = 30, 50%)       (n = 30, 50%)
Number of Songs       f       %           f       %
0 – 99                7     23.3          6     20.0
100 – 299            16     53.3         16     53.3
300 – 499             5     16.7          4     13.3
500 – 699             0      0.0          2      6.7
700 – 899             1      3.3          0      0.0
>900                  1      3.3          2      6.7

TABLE 5. Cumulative Frequency Distribution of Region (N = 60)

Region                Number of       Percentage of    Cumulative
                      Participants    Participants     Percentage
MARMARA                    16             26.7            26.7
AEGEAN                     10             16.7            43.3
MEDITERRANEAN               9             15.0            58.3
BLACK SEA                   8             13.3            71.7
CENTRAL ANT.                7             11.7            83.3
EASTERN ANT.                6             10.0            93.3
SOUTHEASTERN ANT.           4              6.7           100.0
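
The cumulative column in Table 5 can be reproduced with a running total. This is a minimal sketch using the frequencies from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies copied from Table 5, in the order listed there.
regions = [("MARMARA", 16), ("AEGEAN", 10), ("MEDITERRANEAN", 9),
           ("BLACK SEA", 8), ("CENTRAL ANT.", 7), ("EASTERN ANT.", 6),
           ("SOUTHEASTERN ANT.", 4)]

total = sum(n for _, n in regions)                  # N = 60
running = list(accumulate(n for _, n in regions))   # cumulative frequencies

cumulative = [
    (name, n, round(100 * n / total, 1), round(100 * cum / total, 1))
    for (name, n), cum in zip(regions, running)
]
for name, n, pct, cum_pct in cumulative:
    print(f"{name:<18} {n:>3} {pct:>6.1f} {cum_pct:>6.1f}")
```

The cumulative percentage is computed from the running count rather than by adding the rounded percentages, which would drift (26.7 + 16.7 gives 43.4, not 43.3).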

TABLE 6. Cumulative Frequency Distribution of Region by Gender (N = 60)

                        Females                    Males
                     (n = 30, 50%)             (n = 30, 50%)
Region              f      %    Cumltv. %     f      %    Cumltv. %
MARMARA             7    23.3     23.3        9    30.0     30.0
AEGEAN              6    20.0     43.3        4    13.3     43.3
MDTRN.              5    16.7     60.0        4    13.3     56.7
BLACK SEA           4    13.3     73.3        4    13.3     70.0
CENT. ANT.          2     6.7     80.0        5    16.7     86.7
EASTRN. ANT.        2     6.7     86.7        4    13.3    100.0
STH.EST. ANT.       4    13.3    100.0        -      -        -

 A stem-and-leaf display is similar to a
grouped frequency distribution, but it
involves no loss of information

 In a stem-and-leaf display:
◦ first, the score intervals are set up on the left side
of a vertical line
◦ these intervals (stem) contain all but the last digit
of the scores falling into each interval
◦ then, to the right of the vertical line, the final digit
of each score in the interval is given (leaf)

TABLE 7. Stem-and-Leaf Display of Proficiency Scores

 0 | 367
 2 | 799
 3 | 235
 4 | 459
 5 | 12233467799
 6 | 1222223344677799
 7 | 11222336
 8 | 0134688
 9 | 0122
10 | 00
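
The construction steps described above can be sketched in Python. The function name is illustrative, and the sketch assumes non-negative whole-number scores; the sample uses a handful of the scores shown in Table 7.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all but the last digit) to its sorted leaves (last digits)."""
    display = defaultdict(str)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # e.g. 92 -> stem 9, leaf 2
        display[stem] += str(leaf)
    return dict(display)

# A few of the scores shown in Table 7, used here for illustration.
sample = [92, 3, 100, 29, 7, 90, 27, 6, 91, 29, 100, 92]
for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem:>2} | {leaves}")
```

Every original score can be read back from the display (stem 9 with leaf 2 is 92), which is why no information is lost. Stems with no scores are simply omitted in this sketch.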
 In a histogram:
◦ vertical columns indicate how many times any given
score (or score interval) appears in the data set
◦ the horizontal axis (x axis) is labeled with scores on
the dependent variable
◦ the vertical axis (y axis) is labeled with frequencies
◦ a tall bar indicates a high frequency of occurrence
◦ a short bar indicates a low frequency of occurrence
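
The counting behind a histogram can be sketched as below. The scores are read off the stem-and-leaf display in Table 7; the interval width of 10 is an assumption (the bins used in the original figure are not stated), and the bars are printed sideways as rows of `#` rather than as vertical columns.

```python
from collections import Counter

# Proficiency scores read off Table 7 (stem -> leaves).
STEMS = {0: "367", 2: "799", 3: "235", 4: "459", 5: "12233467799",
         6: "1222223344677799", 7: "11222336", 8: "0134688",
         9: "0122", 10: "00"}
scores = [10 * stem + int(leaf) for stem, leaves in STEMS.items() for leaf in leaves]

WIDTH = 10  # assumed interval width

def histogram_counts(values, width=WIDTH):
    """Frequency of each score interval, keyed by the interval's lower bound."""
    counts = Counter((v // width) * width for v in values)
    lows = range(min(counts), max(counts) + width, width)
    return {low: counts.get(low, 0) for low in lows}   # keep empty intervals

counts = histogram_counts(scores)
for low, freq in counts.items():
    print(f"{low:>3}-{low + WIDTH - 1:<3} {'#' * freq}")
```

A long row corresponds to a tall bar (high frequency) and a short row to a short bar (low frequency).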

Figure 1. Distribution of Proficiency Scores in the Sample (N = 60)

 A frequency polygon (line graph):
◦ is similar to a histogram
◦ has a horizontal axis labeled with individual scores
or score intervals
◦ has a vertical axis labeled with frequencies
◦ first a single dot is put for the frequency of each
score on the horizontal axis
◦ then the dots are connected with straight lines

Figure 2. Frequency Polygon for Age Distribution in the Sample (N = 60)

 A bar graph is different from a histogram

 The horizontal axis of a bar graph represents
different categories of a qualitative variable

 In a histogram, by contrast, the horizontal axis
is labeled with numerical values of a
quantitative variable

Figure 3. Distribution of Participants According to Region (N = 60)

Figure 4. Pie Graph Percentage Distribution of Participants According to
Region (N = 60)

 The mode (Mo) is the most frequently
occurring score in a data set.

 The median (Mdn) is the number that lies at
the midpoint of the distribution (when the
scores are written from lowest to highest);
it divides the distribution into two equally
large parts.

 The mean (M, µ, x̄) is calculated by dividing
the sum of scores by the number of scores.

 Sample mode, median, mean calculation:
Raw scores: 6 2 5 1 2 9 3 6 2

Ordered scores: 1 2 2 2 3 5 6 6 9

Mode: 2 Median: 3 Mean: 4
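
The same three values can be checked with Python's standard `statistics` module. This is a minimal sketch using the nine scores from the example above.

```python
from statistics import mean, median, mode

scores = [6, 2, 5, 1, 2, 9, 3, 6, 2]   # the nine raw scores from the example

ordered = sorted(scores)        # ordering is what locating the median by hand requires
central = {
    "mode": mode(scores),       # most frequently occurring score
    "median": median(scores),   # middle score of the ordered list
    "mean": mean(scores),       # sum of scores / number of scores
}
print(ordered)
print(central)
```

With an even number of scores, `median` returns the average of the two middle scores. If several scores tie for most frequent, `statistics.multimode` lists all of them.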

 A measure of variability
◦ indicates the degree of dispersion among the
scores;
◦ indicates how spread out the scores are.

 If the scores are
◦ very similar, there is little dispersion and little
variability (homogeneous);
◦ very dissimilar, there is a high degree of dispersion
and high variability (heterogeneous).

 The range is the difference between the lowest and
highest scores.

 The range is usually reported by giving the extreme scores,
but sometimes the difference between the extreme scores
is given.

 In any group of scores,
◦ the numerical value that separates the top 25 percent of the scores from
the bottom 75 percent is the upper quartile (Q3);

◦ the numerical value that separates the bottom 25 percent of the scores
from the top 75 percent is the lower quartile (Q1);

◦ the numerical value that separates the scores into two equal
halves is the median (Q2).

 The interquartile range indicates the spread among the
middle 50 percent of the scores.
 Sample range, Q1, Q2, Q3, IQR

Raw scores:
16 38 43 19 6 45 47 41 26 8 51 31 12 61 46 67 14 17 44 19 37 32 19

Ordered scores, with Q1, Q2, and Q3 in brackets:
6 8 12 14 16 [17] 19 19 19 26 31 [32] 37 38 41 43 44 [45] 46 47 51 61 67

 Mean: 32.13; Median: 32; Mode: 19
 Range: 6–67, or 61
 Q1 (lower quartile): 17
 Q2: 32
 Q3 (upper quartile): 45
 interquartile range: 17–45, or 28
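
These figures can be reproduced with the standard `statistics` module (Python 3.8 or later). This is a sketch, not necessarily the method used to prepare the slide: quartile conventions differ between textbooks and software, and the default "exclusive" method of `quantiles` happens to agree with the values above for these 23 scores.

```python
from statistics import quantiles

scores = [16, 38, 43, 19, 6, 45, 47, 41, 26, 8, 51, 31,
          12, 61, 46, 67, 14, 17, 44, 19, 37, 32, 19]

low, high = min(scores), max(scores)
score_range = high - low            # range as a single number

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1                       # spread of the middle 50 percent

print(f"Range: {low}-{high}, or {score_range}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

With other data sets, a different quartile convention (for example `method="inclusive"`) can give slightly different Q1 and Q3 values.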
 Proficiency Scores in Data Set 1:
◦ Range: 3–100, or 97
◦ Q1: 52
◦ Q2: 63
◦ Q3: 73
◦ interquartile range: 52–73, or 21

 With a box-and-whisker plot, the degree of variability within a
data set is summarized with a picture.

 A rectangle (box) is drawn to the right of a vertical line labeled
with scores on the dependent variable.

 The positions of the top and bottom sides of the rectangle are
determined by Q3 and Q1.

 Two vertical lines (whiskers) are drawn on the outside of the
rectangle and they extend to the highest and lowest scores.

 However, neither whisker can be longer than 1.5 times the
height of the rectangle.

 If there are any scores further out than the whiskers, they are
considered to be outliers, and their positions are indicated by
small circles or asterisks.
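
The numbers behind a box-and-whisker plot can be sketched as follows. The scores are read off the stem-and-leaf display in Table 7; ending each whisker at the most extreme score within 1.5 box heights is a common convention and is assumed here. The quartiles come from `statistics.quantiles`, so they may differ slightly from values computed under another convention.

```python
from statistics import quantiles

# Proficiency scores read off Table 7 (stem -> leaves).
STEMS = {0: "367", 2: "799", 3: "235", 4: "459", 5: "12233467799",
         6: "1222223344677799", 7: "11222336", 8: "0134688",
         9: "0122", 10: "00"}
scores = sorted(10 * stem + int(leaf)
                for stem, leaves in STEMS.items() for leaf in leaves)

q1, _, q3 = quantiles(scores, n=4)    # bottom and top sides of the box
box_height = q3 - q1                  # the interquartile range
lower_limit = q1 - 1.5 * box_height   # farthest a whisker may reach downward
upper_limit = q3 + 1.5 * box_height   # farthest a whisker may reach upward

inside = [s for s in scores if lower_limit <= s <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [s for s in scores if s < lower_limit or s > upper_limit]

print(f"box: {q1} to {q3}; whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers}")
```

For these scores the three lowest values fall beyond the lower whisker and would be drawn as separate circles or asterisks.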
 The standard deviation (SD, s, σ, ±, sigma)
◦ is based on all the scores in a group of scores;
◦ is determined by
 figuring how much each score deviates from the mean
 and putting these deviation scores into a computational
formula.

 In a set of scores, the standard deviation shows
on average how much the scores deviate from
the mean.

 The variance is the square of the standard
deviation (s², σ²).
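
The two steps (deviation scores, then a formula) can be sketched in Python. The scores are read off the stem-and-leaf display in Table 7. The sketch assumes the sample formula, which divides by n − 1; for these scores that choice gives a standard deviation of about 21.744 and a variance of about 472.817.

```python
from math import sqrt
from statistics import stdev

# Proficiency scores read off Table 7 (stem -> leaves).
STEMS = {0: "367", 2: "799", 3: "235", 4: "459", 5: "12233467799",
         6: "1222223344677799", 7: "11222336", 8: "0134688",
         9: "0122", 10: "00"}
scores = [10 * stem + int(leaf) for stem, leaves in STEMS.items() for leaf in leaves]

n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]            # distance of each score from the mean
sum_of_squares = sum(d ** 2 for d in deviations)   # squaring removes the signs

sample_variance = sum_of_squares / (n - 1)         # assumed sample formula (n - 1)
sample_sd = sqrt(sample_variance)                  # SD is the square root of the variance

print(f"mean = {mean:.2f}, variance = {sample_variance:.3f}, SD = {sample_sd:.3f}")
```

`statistics.stdev` applies the same n − 1 formula, while `statistics.pstdev` divides by n and is meant for a whole population.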

                       N     Minimum    Maximum    Mean      Std. Deviation    Variance
AGE                    60    17         21         18.83     1.076             1.158
PROFICIENCY SCORE      60    3          100        61.78     21.744            472.817
LLS USE SCORE          60    1.99       4.22       3.2400    .43710            .191
ANXIETY SCORE          60    1.08       5.96       3.3480    1.03192           1.065
NUMBER of MP3s         60    0          3245       306.58    464.637           215887.6
Valid N (listwise)     60

PROFICIENCY SCORE
GENDER                 Count    Mean    Standard Deviation
FEMALE                 30       65      17
MALE                   30       58      25

PROFICIENCY SCORE
REGION                 Count    Mean    Standard Deviation
MARMARA                16       63      21
AEGEAN                 10       62      21
MEDITERRANEAN           9       61      16
BLACK SEA               8       67      27
CENTRAL ANT.            7       68      25
EASTERN ANT.            6       49      30
SOUTHEASTERN ANT.       4       59       9

PROFICIENCY SCORE
S.E.S.                 Count    Mean    Standard Deviation
HIGH S.E.S.             9       61      28
MID S.E.S.             34       61      22
LOW S.E.S.             17       64      20

 Almost all the techniques used for describing
data describe features of the entire data set.

 Sometimes, researchers use standard scores
to focus on a single score within a data set.

 The most frequently used standard score is
called the z-score.

 A z-score indicates how many standard
deviations a particular raw score is above or
below the group mean.

 With z-scores, the mean is fixed at 0 and the
standard deviation is fixed at 1.

 Therefore, a z-score gives an exact answer to
the question ‘How many standard deviations
is a given score above or below the mean?’

 For example,
◦ a z-score of +3 indicates that the person’s score
was 3 standard deviations above the group mean;
◦ a z-score of -1.4 indicates that the person’s score
was 1.4 standard deviations below the group mean.

Some proficiency scores from Data Set 1 (mean: 61.78; SD: 21.74)

Score    z-score
 67       +0.24
 92       +1.39
 44       -0.82
 63       +0.06
  3       -2.70
 81       +0.88
 62       +0.01
100       +1.76
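
A z-score is the raw score minus the group mean, divided by the standard deviation. The sketch below applies that formula to the scores listed above, using the stated mean and SD; the function name is illustrative.

```python
MEAN = 61.78   # group mean of the proficiency scores, as stated above
SD = 21.74     # standard deviation of the proficiency scores, as stated above

def z_score(raw, mean=MEAN, sd=SD):
    """Standard deviations by which raw lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

raw_scores = [67, 92, 44, 63, 3, 81, 62, 100]
z_values = {raw: round(z_score(raw), 2) for raw in raw_scores}
for raw, z in z_values.items():
    print(f"{raw:>3}  {z:+.2f}")
```

Values worked out by hand can differ from these in the last digit if the mean is rounded first or the result is truncated rather than rounded.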
