Lec 11 Chapter IV Descriptive and Inferential Statistics

Chapter IV:

Descriptive and inferential statistics

4.1 Descriptive statistics
Purpose: To communicate the essential characteristics of a
population through a data set obtained from a sample.
Population sample Data Tables
Numerical indexes
(Averages, Percentages, percentile ranks, variability
measures, correlation coefficients, regression coefficients,
etc) 1
4.1.1 Frequency distributions and Graphs

Frequency Distribution:

1. Ungrouped frequency distribution

2. Grouped frequency distribution

Ungrouped frequency distribution is an arrangement of data in
which the frequency of each data value of any variable is shown.

Grouped frequency distribution is an arrangement of data in
which the data values of any variable are clustered or grouped into
intervals and the frequency of each interval is shown.
• Steps in constructing freq. distribution tables:
– List each data value in ascending order (column 1)

– Count the number of times each value occurs, i.e., its frequency (column 2). To construct a grouped freq. distn., collapse the data values into intervals and find the freq. in each interval; intervals do not overlap.
– Cumulative frequency of each value/interval (column 3).
– Percentage of each value/interval (column 4, optional).
– Cumulative percentage of each value/interval (column 5, optional).

Categorical Freq. Table

Blood Type   Frequency   Cumulative Frequency   Percent   Cumulative Percent
A            5           5                      20        20
B            4           9                      16        36
AB           7           16                     28        64
O            9           25                     36        100
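The construction steps above can be sketched in Python using only the standard library. The function name `frequency_table` is hypothetical. Because blood type is categorical, the sketch accepts an explicit category order instead of sorting in ascending order, and it rebuilds the raw data from the frequencies in the table so the columns can be recomputed.

```python
from collections import Counter


def frequency_table(values, order=None):
    """Return rows of (value, frequency, cumulative frequency, percent, cumulative percent)."""
    counts = Counter(values)
    # Column 1: the data values, in the given order or else in ascending order
    keys = order if order is not None else sorted(counts)
    n = len(values)
    rows, cum = [], 0
    for key in keys:
        f = counts[key]   # column 2: frequency
        cum += f          # column 3: cumulative frequency
        rows.append((key, f, cum, 100 * f / n, 100 * cum / n))  # columns 4 and 5
    return rows


# Raw data rebuilt from the blood-type frequencies (5 A, 4 B, 7 AB, 9 O)
blood = ["A"] * 5 + ["B"] * 4 + ["AB"] * 7 + ["O"] * 9
for row in frequency_table(blood, order=["A", "B", "AB", "O"]):
    print("%-3s %2d %3d %5.1f %6.1f" % row)
```

For numeric data, omit `order` and the values are listed in ascending order, as in the first step.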

Grouped Frequency Table

Temperature   Frequency   Cumulative Frequency   Percent   Cumulative Percent
100-104       2           2                      4         4
105-109       8           10                     16        20
110-114       18          28                     36        56
115-119       13          41                     26        82
120-124       7           48                     14        96
125-129       1           49                     2         98
130-134       1           50                     2         100
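A minimal sketch of the grouped case, assuming integer data values that are all at or above the first lower limit. Both function names are hypothetical. `group_into_intervals` collapses raw values into non-overlapping intervals of equal width, and `cumulative_columns` recomputes the cumulative and percentage columns from the interval frequencies of the temperature table (the raw temperatures are not given, so only the frequencies are used).

```python
def group_into_intervals(values, start, width):
    """Count integer values into non-overlapping intervals start..start+width-1, and so on."""
    counts = {}
    for v in values:
        # Lower limit of the interval containing v (assumes v >= start)
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    top = max(counts)
    # Include empty intervals too, so the table has no gaps
    return [(lo, lo + width - 1, counts.get(lo, 0))
            for lo in range(start, top + 1, width)]


def cumulative_columns(freqs):
    """Return (frequency, cumulative frequency, percent, cumulative percent) per interval."""
    n = sum(freqs)
    rows, cum = [], 0
    for f in freqs:
        cum += f
        rows.append((f, cum, 100 * f / n, 100 * cum / n))
    return rows


# Frequencies of the seven temperature intervals (width 5, starting at 100)
freqs = [2, 8, 18, 13, 7, 1, 1]
labels = [f"{100 + 5 * i}-{104 + 5 * i}" for i in range(len(freqs))]
for label, row in zip(labels, cumulative_columns(freqs)):
    print(label, *row)
```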

Graphic presentations of data
• Graphs include:
– Bar graphs

– Histograms

– Line graphs

– Scatter plots

– Pie charts

– Pictorial diagrams, etc

Define each and obtain examples of each.

(Figures omitted: example bar chart, line chart, pie chart and scatter diagram.)

4.1.2 Measures of central tendency, variability,
relative position, and relationships

• Measures of central tendency include:

– Mean: arithmetic average value

– Median: 50th percentile value

– Mode: the most frequent value

Compare the mean, median and mode.

• In a normal distribution (symmetrical, unimodal and bell-shaped)
the three averages are equal.
• In a skewed (asymmetrical) distribution, the three averages
differ.
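To compare the three averages, here is a small sketch using Python's standard `statistics` module on two hypothetical data sets: one symmetric, and one with a single large value that stretches the right tail.

```python
import statistics

symmetric = [2, 3, 4, 4, 4, 5, 6]        # hypothetical data, symmetric about 4
right_skewed = [1, 2, 2, 3, 4, 5, 11]    # hypothetical data, long tail to the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

In the symmetric set all three averages equal 4. In the skewed set the mean (4) is pulled toward the tail past the median (3) and the mode (2), giving Mode < Median < Mean.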
Skewness and Kurtosis
• Normal distribution: mean = median = mode
• Negatively skewed distributions
– Skewed to the left
– Mean < Median < Mode
• Positively skewed distributions
– Skewed to the right
– Mode < Median < Mean
• Outliers at only one end cause skewness.
• The tail indicates the direction of the skewness.
• Kurtosis refers to the peakedness of a distribution, i.e., how high and sharp the hump at the modal point is.

• Both skewness and normality are a matter of degree. They are
approximated by histograms of the frequency distributions of samples
drawn from skewed and normal populations, respectively.
• According to Karl Pearson, the following empirical relation holds
approximately between the mean, median and mode of a moderately
skewed distribution:
mean − mode ≈ 3(mean − median), i.e., mode ≈ 3 × median − 2 × mean
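A short sketch of Pearson's empirical relation, rearranged as mode ≈ 3 × median − 2 × mean, together with Pearson's second skewness coefficient, 3(mean − median)/SD, whose sign gives the direction of skew. The function names and the numbers used are hypothetical, and the sample SD (`statistics.stdev`) is an assumed choice.

```python
import statistics


def pearson_mode_estimate(mean, median):
    """Mode implied by Pearson's relation mean - mode = 3(mean - median)."""
    return 3 * median - 2 * mean


def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3(mean - median) / SD (sample SD)."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)


print(pearson_mode_estimate(mean=50, median=48))     # 44
print(pearson_skewness([1, 2, 2, 3, 4, 5, 11]) > 0)  # True: skewed to the right
```

A positive coefficient means the mean lies above the median (positive skew). A negative coefficient means the mean lies below it (negative skew).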

Measures of variability
Measures of variability include:
– Range : difference between the largest and
the smallest value in the data set.
– Variance: average of the squared deviations of
scores from their mean.
• Population: $\sigma^2 = \frac{\sum (X-\mu)^2}{N}$; sample: $s^2 = \frac{\sum (X-M)^2}{n-1}$
– Standard deviation: square root of the variance.
– Coefficient of variation: (SD/mean) × 100%.
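The measures listed above can be computed with Python's `statistics` module. `pvariance` divides by N (population) and `variance` divides by n − 1 (sample). The scores are hypothetical, and treating them as a sample for the SD and CV is an assumption.

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical sample, mean = 6

data_range = max(scores) - min(scores)      # largest minus smallest value
pop_var = statistics.pvariance(scores)      # sum of (X - mu)^2 / N
sample_var = statistics.variance(scores)    # sum of (X - M)^2 / (n - 1)
sd = statistics.stdev(scores)               # sample SD = square root of sample variance
cv = sd / statistics.mean(scores) * 100     # coefficient of variation, in percent

print(data_range, pop_var, sample_var, round(sd, 3), round(cv, 1))
```

The squared deviations here are 4, 4, 0, 1 and 1 (sum 10). The population variance is therefore 10/5 = 2 and the sample variance is 10/4 = 2.5.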
Normal distribution
Circle vs. normal curve
• Circle
– Equation: $x^2 + y^2 = r^2$, where r = the radius
– Helps in studying wheels and circular motions that are not perfectly circular.
– Think of circles of different radii.
• Normal curve
– Equation: $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, where e ≈ 2.718, π ≈ 3.14, µ = population mean, σ = population SD
– Helps in studying variables that are not perfectly normal.
– Think of normal curves of different µ and σ.
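A sketch that evaluates the normal-curve equation directly with Python's `math` module. The function name `normal_pdf` is hypothetical. At x = µ the height is 1/(σ√(2π)), so a larger σ gives a lower, flatter curve.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve: e^(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# Same mean, different SDs: peak heights at x = mu
print(round(normal_pdf(0, 0, 1), 4), round(normal_pdf(0, 0, 2), 4))
# The curve is symmetric about mu
print(normal_pdf(1, 0, 1) == normal_pdf(-1, 0, 1))
```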
Shapes of normal distributions (figure omitted: two panels, each showing curves 1 and 2)
a) Same mean, different SDs: µ1 = µ2, σ1 > σ2.
b) Different means, same SD: µ1 > µ2, σ1 = σ2.
NB: You can imagine another situation in which the distributions have
different means and different standard deviations.
SD and normal distribution

• For a given normal population distribution:
– about 68.26% of cases fall within ±1 SD of the mean
– about 95.44% of cases fall within ±2 SD of the mean
– about 99.74% of cases fall within ±3 SD of the mean
• Each percentage is the portion of the area under the normal curve
between those SD limits.
• The total area under the curve is 1 (100%).
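These areas can be computed with the error function: for a normal distribution, the proportion within k SDs of the mean is P(|Z| < k) = erf(k/√2). The function name `area_within` is hypothetical. Exact values round to 68.27%, 95.45% and 99.73%. The slide's figures come from doubling the rounded table areas between the mean and 1, 2 and 3 SD (34.13%, 47.72% and 49.87%).

```python
import math


def area_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.2f}%")
```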
Measures of relative position
• Provides info. about where a score falls with respect to the other scores in the
distribution of data.
• Raw score
• Derived scores: mean, SD, percentile ranks, deciles, quartiles, Z scores, etc.
• Percentile rank: the percentage of scores in a reference group
(norm group) that fall below a particular raw score.
• Percentile rank of score X = (no. of scores below X + .5)/total no. of scores,
the whole multiplied by 100.
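A minimal sketch of the percentile-rank formula. The added .5 counts half of score X itself. The sketch assumes the common generalisation for ties: add half the number of scores equal to X. The function name and the norm-group scores are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentile rank of x: (no. below x + 0.5 * no. equal to x) / N * 100."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return 100 * (below + 0.5 * equal) / len(scores)


scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]  # hypothetical norm group
print(percentile_rank(scores, 70))  # 45.0
```

Four scores fall below 70 and one equals it, so the rank is (4 + .5)/10 × 100 = 45.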
Standard score and Percentile rank
• Quartiles divide the distn. into four parts.

• Deciles divide the distn. into 10 parts.

• Median is 50th percentile or 2nd quartile, or 5th decile.

• Standard scores: scores converted from one scale to
another so that they have a particular mean and
SD and are easier to interpret. Ex. Z score.
• Z = (X − M)/SD; Z scores have mean 0 and SD 1. If the raw scores are
normally distributed, the Z scores follow the standard normal distribution.
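A short sketch of the Z-score conversion using hypothetical scores. The sample SD (`statistics.stdev`) is an assumed choice; use `statistics.pstdev` when the data are a whole population. Whatever the shape of the raw data, the converted scores have mean 0 and SD 1.

```python
import statistics


def z_scores(data):
    """Convert raw scores to Z = (X - M) / SD, using the sample SD."""
    m = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - m) / sd for x in data]


scores = [4, 8, 6, 5, 7]  # hypothetical raw scores, M = 6
z = z_scores(scores)
print([round(v, 3) for v in z])
print(statistics.mean(z), round(statistics.stdev(z), 6))
```

The conversion changes only the location and scale of the scores, not the shape of their distribution.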
Measures of Relationships
• Measures of relationships include:
– Pearson’s correlation coefficient r
– Spearman’s rank correlation coefficient ρ
– Contingency table
– Regression coefficient β
• Simple regression coefficient
• Multiple regression coefficients
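A standard-library sketch of the two correlation coefficients. Pearson's r is the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. Spearman's ρ is Pearson's r computed on the ranks, with tied values given the average of their rank positions. All function names and data here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not linearly
print(round(pearson_r(x, y), 4), spearman_rho(x, y))
```

Because y rises steadily with x, the ranks agree perfectly and ρ = 1. Pearson's r is slightly below 1 because the relationship is not a straight line.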

Data 1
Exercise 4.1.2
Using SPSS and Data 1 above, answer each of the following.

1. Summarize the data for the three continuous variables (Startsal,
GPA and GRE), reporting averages, SD, range and coefficient of
variation in an appropriate table. Display the data in frequency
distribution tables, histograms with a normal curve, and scatter
plots for these variables. In a paragraph, describe the characteristics
of the population from which this sample is drawn.

2. Draw frequency bar graphs and percentage pie charts for Gender
and Major.
Exercise 4.1.2 (cont)
Based on Data 1, use SPSS to obtain:
3. A contingency table for Startsal by
Gender and Major.
4. The percentile rank of GPA of 2.9 and
salary of 33000.
5. Z score of GPA of 2.9 and salary of 33000.
6. Pearson’s r between Startsal and GRE.
7. Spearman ρ between rank scores of Startsal
and GRE.

