Descriptive Statistics

 Statistical procedures that are used for
summarizing a set of data
◦ Picture techniques:
 Frequency distributions
 Stem-and-leaf displays
 Histograms
 Frequency polygons
 Bar graphs
 Pie graphs
◦ Measures of central tendency:
 The mode, median, and mean
◦ Measures of variability:
 The range, interquartile range, and box-and-whisker plot
 Standard deviation and variance
 Frequency Distributions
◦ They show how many subjects in the data are in the
same category or have the same score (in numbers
and percentages).
◦ Three basic kinds:
 simple (or ungrouped) frequency distributions
 grouped frequency distributions
 cumulative frequency distributions
◦ N is used for the total number of scores in analysis
◦ n is used for the number of scores in a category
◦ f (frequency) is sometimes used instead of n
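
As a rough illustration of these definitions, the sketch below builds a simple and a cumulative frequency distribution in Python. The list of ages is rebuilt from the Table 1 frequencies, and the function name `frequency_table` is only an illustrative choice.

```python
from collections import Counter

# Ages rebuilt from the Table 1 frequencies (4, 22, 20, 8 and 6 cases).
ages = [17] * 4 + [18] * 22 + [19] * 20 + [20] * 8 + [21] * 6

def frequency_table(scores):
    """Return rows of (score, f, percent, cumulative percent), lowest score first."""
    total = len(scores)           # N: total number of scores
    counts = Counter(scores)      # n (or f): number of scores per category
    rows, running = [], 0
    for score in sorted(counts):
        f = counts[score]
        running += f              # cumulative frequency so far
        rows.append((score, f,
                     round(100 * f / total, 1),
                     round(100 * running / total, 1)))
    return rows

for row in frequency_table(ages):
    print(row)
```

A grouped distribution could be obtained the same way by first replacing each score with its interval label.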

TABLE 1. Age Distribution within the Sample

Age   Frequency   Percentage
17        4           6.7
18       22          36.7
19       20          33.3
20        8          13.3
21        6          10.0

TABLE 2. Age by Gender Distribution within the Sample (N = 60)

         Females            Males
      (n = 30, 50%)     (n = 30, 50%)
Age     f      %          f      %
17      0     0.0         4    13.3
18     16    53.3         6    20.0
19      6    20.0        14    46.7
20      5    16.7         3    10.0
21      3    10.0         3    10.0

TABLE 3. Grouped Frequency Distribution of Number of English MP3 Songs in
Cellphone (N = 60)

Number of Songs   Frequency   Percentage
0 – 99               13          21.7
100 – 299            32          53.3
300 – 499             9          15.0
500 – 699             2           3.3
700 – 899             1           1.7
>900                  3           5.0

TABLE 4. Grouped Frequency Distribution of Number of English MP3 Songs in Cellphone
by Gender (N = 60)

                       Females            Males
                    (n = 30, 50%)     (n = 30, 50%)
Number of Songs       f      %          f      %
0 – 99                7    23.3         6    20.0
100 – 299            16    53.3        16    53.3
300 – 499             5    16.7         4    13.3
500 – 699             0     0.0         2     6.7
700 – 899             1     3.3         0     0.0
>900                  1     3.3         2     6.7

TABLE 5. Cumulative Frequency Distribution of Region (N = 60)

                     Number of       Percentage of     Cumulative
Region               Participants    Participants      Percentage
MARMARA                  16              26.7              26.7
AEGEAN                   10              16.7              43.3
MEDITERRANEAN             9              15.0              58.3
BLACK SEA                 8              13.3              71.7
CENTRAL ANT.              7              11.7              83.3
EASTERN ANT.              6              10.0              93.3
SOUTHEASTERN ANT.         4               6.7             100.0


TABLE 6. Cumulative Frequency Distribution of Region by Gender (N = 60)

                         Females                   Males
                      (n = 30, 50%)             (n = 30, 50%)
Region              f      %    Cum. %        f      %    Cum. %
MARMARA             7    23.3    23.3         9    30.0    30.0
AEGEAN              6    20.0    43.3         4    13.3    43.3
MDTRN.              5    16.7    60.0         4    13.3    56.7
BLACK SEA           4    13.3    73.3         4    13.3    70.0
CENT. ANT.          2     6.7    80.0         5    16.7    86.7
EASTRN. ANT.        2     6.7    86.7         4    13.3   100.0
STH.EST. ANT.       4    13.3   100.0         -      -       -

 A stem-and-leaf display is similar to a
grouped frequency distribution, but no
information is lost: every individual score
can still be read from the display

 In a stem-and-leaf display:
◦ first, the score intervals are set up on the left side
of a vertical line
◦ these intervals (stem) contain all but the last digit
of the scores falling into each interval
◦ then, to the right of the vertical line, the final digit
of each score in the interval is given (leaf)

TABLE 7. Stem-and-Leaf Display of Proficiency Scores
 0 | 367
 2 | 799
 3 | 235
 4 | 459
 5 | 12233467799
 6 | 1222223344677799
 7 | 11222336
 8 | 0134688
 9 | 0122
10 | 00
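
The construction described above can be sketched in a few lines of Python. This is only a minimal illustration: the helper name `stem_and_leaf` is hypothetical, and the example scores are a handful read off stems 0, 2 and 10 of Table 7.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all but the last digit) to a string of leaves (last digits)."""
    display = defaultdict(str)
    for score in sorted(scores):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(score, 10)    # e.g. 27 -> stem 2, leaf 7
        display[stem] += str(leaf)
    return dict(display)

# A few scores taken from Table 7, entered in arbitrary order.
scores = [29, 3, 100, 7, 27, 6, 29, 100]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is kept, the original scores can be rebuilt from the display, which is the sense in which no information is lost.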
 In a histogram:
◦ vertical columns indicate how many times any given
score (or score interval) appears in the data set
◦ the horizontal axis (x axis) is labeled with scores on
the dependent variable
◦ the vertical axis (y axis) is labeled with frequencies
◦ a tall bar indicates a high frequency of occurrence
◦ a short bar indicates a low frequency of occurrence

Figure 1. Distribution of Proficiency Scores in the Sample (N = 60)

 A frequency polygon (line graph):
◦ is similar to a histogram
◦ has a horizontal axis labeled with individual scores
or score intervals
◦ has a vertical axis labeled with frequencies
◦ first a single dot is put for the frequency of each
score on the horizontal axis
◦ then the dots are connected with straight lines

Figure 2. Frequency Polygon for Age Distribution in the Sample (N = 60)

 A bar graph is different from a histogram.

 The horizontal axis of a bar graph represents
different categories of a qualitative variable.

 In a histogram, by contrast, the horizontal axis
is labeled with numerical values of a
quantitative variable.

Figure 3. Distribution of Participants According to Region (N = 60)

Figure 4. Pie Graph Percentage Distribution of Participants According to
Region (N = 60)

 The mode (Mo) is the most frequently
occurring score in a data set.

 The median (Mdn) is the number that lies at
the midpoint of the distribution (when the
scores are written from lowest to highest),
dividing the distribution into two equally
large parts.

 The mean (M, µ, x̄) is calculated by dividing
the sum of scores by the number of scores.

 Sample mode, median, mean calculation:

Raw scores:     6 2 5 1 2 9 3 6 2
Ordered scores: 1 2 2 2 3 5 6 6 9

Mode: 2   Median: 3   Mean: 4
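
As a quick check of the sample calculation, the three measures can be reproduced with Python's standard `statistics` module. This is just an illustrative sketch using the nine scores above.

```python
from statistics import mean, median, mode

scores = [6, 2, 5, 1, 2, 9, 3, 6, 2]

print("Ordered:", sorted(scores))
print("Mode:", mode(scores))      # most frequently occurring score
print("Median:", median(scores))  # middle score of the ordered list
print("Mean:", mean(scores))      # sum of scores / number of scores
```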

 A measure of variability
◦ indicates the degree of dispersion among the scores;
◦ indicates how spread out the scores are.

 If the scores are
◦ very similar, there is little dispersion and little
variability (homogeneous);
◦ very dissimilar, there is a high degree of dispersion
and high variability (heterogeneous).

 The range is the difference between the lowest and
highest scores.

 The range is usually reported by giving the extreme scores,
but sometimes the difference between the extreme scores
is given.

 In any group of scores,
◦ the numerical value that separates the top 25 percent scores from
the bottom 75 percent scores is the upper quartile (Q3);
◦ the numerical value that separates the bottom 25 percent scores
from the top 75 percent scores is the lower quartile (Q1);
◦ the numerical value that separates the scores into two equal
halves is the median (Q2).

 The interquartile range indicates the spread among the
middle 50 percent of the scores.
 Sample range, Q1, Q2, Q3, IQR

Raw scores:
16 38 43 19 6 45 47 41 26 8 51 31 12 61 46 67 14 17 44 19 37 32 19

Ordered scores:
6 8 12 14 16 17 19 19 19 26 31 32 37 38 41 43 44 45 46 47 51 61 67
             Q1                Q2                Q3

 Mean: 32.13; Median: 32; Mode: 19

 Range: 67 – 6 or 61
 Q1 (lower quartile): 17
 Q2: 32
 Q3 (upper quartile): 45
 interquartile range: 17–45 or 28
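
The figures above can be reproduced with a short Python sketch. The slides do not say which quartile convention was used, so the choice of `statistics.quantiles` with its default "exclusive" method is an assumption; other conventions can give slightly different quartiles on other data sets.

```python
from statistics import quantiles

scores = [16, 38, 43, 19, 6, 45, 47, 41, 26, 8, 51, 31,
          12, 61, 46, 67, 14, 17, 44, 19, 37, 32, 19]

# quantiles() sorts the data itself; "exclusive" is the library default
# and is assumed here, not taken from the slides.
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")

score_range = max(scores) - min(scores)   # highest minus lowest score
iqr = q3 - q1                             # spread of the middle 50 percent

print(f"Range: {min(scores)}-{max(scores)} or {score_range}")
print(f"Q1: {q1}, Q2: {q2}, Q3: {q3}, IQR: {iqr}")
```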
 Proficiency Scores in Data Set 1:
◦ Range: 100 – 3
◦ Q1: 52
◦ Q2: 63
◦ Q3: 73
◦ interquartile range: 52-73

 With a box-and-whisker plot, the degree of variability within a
data set is summarized with a picture.

 A rectangle (box) is drawn to the right of a vertical line labeled
with scores on the dependent variable.

 The positions of the top and bottom sides of the rectangle are
determined by Q3 and Q1.

 Two vertical lines (whiskers) are drawn on the outside of the
rectangle and they extend to the highest and lowest scores.

 However, neither whisker can be longer than 1.5 times the
height of the rectangle.

 If there are any scores further out than the whiskers, they are
considered to be outliers, and their positions are indicated by
small circles or asterisks.
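
The whisker rule can be made concrete with a small sketch. Since the height of the box is Q3 minus Q1, a whisker may reach at most 1.5 times that distance beyond the box. The function name `box_stats` and the quartile method (Python's default "exclusive" method) are assumptions made for illustration, not something the slides specify.

```python
from statistics import quantiles

def box_stats(scores, k=1.5):
    """Box edges, whisker ends and outliers under the 1.5 x box-height rule."""
    ordered = sorted(scores)
    q1, _, q3 = quantiles(ordered, n=4)   # default "exclusive" method (assumed)
    reach = k * (q3 - q1)                 # the longest a whisker may be
    low, high = q1 - reach, q3 + reach
    inside = [x for x in ordered if low <= x <= high]
    outliers = [x for x in ordered if x < low or x > high]
    # Whiskers stop at the most extreme scores that are still within reach.
    return {"box": (q1, q3),
            "whiskers": (inside[0], inside[-1]),
            "outliers": outliers}

scores = [16, 38, 43, 19, 6, 45, 47, 41, 26, 8, 51, 31,
          12, 61, 46, 67, 14, 17, 44, 19, 37, 32, 19]
print(box_stats(scores))          # no score lies beyond the whiskers
print(box_stats(scores + [150]))  # a hypothetical extra score of 150 is flagged
```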
 The standard deviation (SD, s, σ, sigma)
◦ is based on all the scores in a group of scores;
◦ is determined by
 figuring how much each score deviates from the mean
 and putting these deviation scores into a computational formula.

 In a set of scores, the standard deviation shows
on average how much the scores deviate from
the mean.

 The variance is the square of the standard
deviation (s², σ²).
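
A minimal sketch of the calculation follows, using the nine scores from the mode, median and mean example. The slides do not give the computational formula, so the divisor is an assumption: n - 1 is the usual choice for a sample, and n for a whole population.

```python
from math import sqrt

def variance_and_sd(scores, sample=True):
    """Return (variance, standard deviation) from the deviation scores."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    divisor = n - 1 if sample else n      # assumed convention, see lead-in
    variance = sum(squared_deviations) / divisor
    return variance, sqrt(variance)       # SD is the square root of the variance

scores = [6, 2, 5, 1, 2, 9, 3, 6, 2]      # mean = 4
variance, sd = variance_and_sd(scores)
print(f"variance = {variance:.3f}, SD = {sd:.3f}")
```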

                      N   Minimum   Maximum     Mean    Std. Deviation    Variance
AGE                  60     17        21       18.83         1.076           1.158
PROFICIENCY SCORE    60      3       100       61.78        21.744         472.817
LLS USE SCORE        60      1.99      4.22     3.2400       0.43710         0.191
ANXIETY SCORE        60      1.08      5.96     3.3480       1.03192         1.065
NUMBER of MP3s       60      0      3245      306.58       464.637      215887.6
Valid N (listwise)   60

Gender               Count   Mean   Standard Deviation
MALE                   30     58          25

Region               Count   Mean   Standard Deviation
AEGEAN                 10     62          21
BLACK SEA               8     67          27
CENTRAL ANT.            7     68          25
EASTERN ANT.            6     49          30
SOUTHEASTERN ANT.       4     59           9

S.E.S.               Count   Mean   Standard Deviation
HIGH S.E.S.             9     61          28
MID S.E.S.             34     61          22
LOW S.E.S.             17     64          20

 Almost all the techniques used for describing
data describe features of the entire data set.

 Sometimes, researchers use standard scores
to focus on a single score within a data set.

 The most frequently used standard score is
called the z-score.

 A z-score indicates how many standard
deviations a particular raw score is above or
below the group mean.

 With z-scores, the mean is fixed at 0 and the
standard deviation is fixed at 1.

 Therefore, a z-score gives an exact answer to
the question ‘How many standard deviations
is a given score above or below the mean?’

 For example,
◦ a z-score of +3 indicates that that person’s score
was 3 standard deviations above the group mean;
◦ a z-score of -1.4 indicates that that person’s score
was 1.4 standard deviations below the group mean.

Some proficiency scores from Data Set 1 and their z-scores
(mean: 61.78; SD: 21.74)

Score    z-score
  67     + 0.23
  92     + 1.38
  44     - 0.81
  63     + 0.05
   3     - 2.70
  81     + 0.88
  62     + 0.009
 100     + 1.75
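
The conversion from a raw score to a z-score is a single subtraction and division, sketched below with the rounded mean and SD quoted for Data Set 1. The function name `z_score` is illustrative. Because the mean and SD are rounded here, the last digit of a result may differ slightly from the tabled values.

```python
def z_score(raw, mean, sd):
    """How many standard deviations `raw` lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

MEAN, SD = 61.78, 21.74   # rounded values quoted for Data Set 1

for raw in (67, 92, 44, 63, 3, 81, 62, 100):
    print(f"{raw:>3}  {z_score(raw, MEAN, SD):+.2f}")
```

A raw score equal to the mean gets z = 0, and a raw score exactly one SD above the mean gets z = +1, which is what fixing the mean at 0 and the SD at 1 amounts to.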


