Business Statistics - Session Descriptive Statistics
Descriptive Statistics
• After attending this session, you should be able to:
Measures of Central Tendency
Measures of Dispersion
Measures of Shape
A population consists of all the items or individuals about which you want to reach conclusions.
All of the Good Tunes and More sales transactions for a specific year, all the full-time students
enrolled in a college, and all the registered voters in Ohio are examples of populations.
A sample is a portion of a population selected for analysis. For example, a class has 160 students
enrolled in a year. These 160 students are the population of the class. If 65 of the 160 are
chosen randomly for some study, then these 65 students are the sample that represents the population.
A parameter is a measure that describes a variable using population data. The mean score
calculated for the entire class is a parameter.
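The class example can be sketched in Python. This is only an illustration: the scores are randomly generated placeholders (not data from the session), and the names `population_scores` and `sample_scores` are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical scores for the whole class of 160 students (the population).
population_scores = [rng.randint(40, 100) for _ in range(160)]

# A simple random sample of 65 students, drawn without replacement.
sample_scores = rng.sample(population_scores, 65)

# The mean over the whole class is a parameter; the sample mean estimates it.
population_mean = statistics.mean(population_scores)
sample_mean = statistics.mean(sample_scores)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (n = 65):        {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why the distinction between a population measure and a sample measure matters.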
Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Identify the variables for which you can
calculate central tendency
• Use appropriate Excel functions to
calculate the measures of central
tendency
• What do you notice?
Introduction
The concept of central tendency plays a dominant role in the study of statistics.
In many frequency distributions, the tabulated values show a distinct tendency to cluster around a
central value. This behaviour of the data to concentrate the values around a central part of the
distribution is called central tendency.
A good measure of central tendency should possess as far as possible the following
characteristics:
Easy to understand.
Simple to compute.
Uniquely defined.
Cont….
Common Measures of Central Tendency
Mean
Median
Mode
Arithmetic Mean
The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the
number of values. In algebraic language, if X1, X2, X3, ..., Xn are the n values of a variate X, then
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Properties of Arithmetic Mean
Merits and Demerits of Arithmetic Mean
Weighted Arithmetic Mean
Median
Calculation of Median
Quartiles
Deciles
Percentiles
Mode
If the distribution is skewed, the mean, the median and the mode are not equal.
In a moderately skewed distribution, the distance between the mean and the median is
approximately one third of the distance between the mean and the mode. This can be
expressed as:
$$\text{Mean} - \text{Median} \approx \tfrac{1}{3}\,(\text{Mean} - \text{Mode})$$
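The one-third relationship rearranges to Mode ≈ 3 × Median − 2 × Mean. A minimal sketch of that rearrangement follows; the function name `empirical_mode` and the input figures are hypothetical, and the result is only an approximation for moderately skewed data.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Approximate the mode from the mean and median.

    Rearranges (mean - median) = (mean - mode) / 3, which is assumed
    to hold only roughly, for moderately skewed distributions.
    """
    return 3 * median - 2 * mean


# Illustrative figures, not taken from the session data.
mean, median = 50.0, 48.0
mode = empirical_mode(mean, median)

print("Approximate mode:", mode)
print("mean - median      =", mean - median)
print("(mean - mode) / 3  =", (mean - mode) / 3)
```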
Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Calculate the Five Number Summary
using the appropriate Excel function
• What do these positional measures tell
us?
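For readers working outside Excel, the five-number summary (minimum, Q1, median, Q3, maximum) can be sketched with Python's standard library. The data below are placeholders rather than the 'Grad Survey' sheet, and the `"inclusive"` method is chosen on the assumption that it corresponds to the interpolation used by Excel's QUARTILE.INC.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 asks for the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])


# Hypothetical observations, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(data)
print("min, Q1, median, Q3, max =", summary)
```

Read together, the five values show where the middle half of the data lies (between Q1 and Q3) and how far the extremes sit from it.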
Summary
Measures of central tendency describe one of the most important characteristics of
the data. Depending on the situation, one of the various measures of central tendency may
be chosen as the most representative.
The arithmetic mean is widely used and understood. The question is what characterizes
the three measures of centrality, and what the relative merits of each are in a given
situation.
The mean summarizes all the information in the data. It can be visualized as a single
point where all the mass (the weight) of the observations is concentrated, like a centre of
gravity in physics. The mean also has some desirable mathematical properties that make it
useful in the context of statistical inference.
To simplify manual calculation, we may sometimes use a shift of origin and a
change of scale. Shifting the origin is achieved by adding or subtracting a constant to all
observations; changing the scale is achieved by multiplying or dividing all observations by a
constant. In the case of discrete data, we add or subtract (usually subtract) a constant from the
individual observations. For grouped data, we add or subtract (usually subtract) the
constant from the class mark values.
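A small sketch of the shortcut, assuming an origin `A` and a scale factor `h` chosen by eye: each observation is coded as u = (x − A) / h, the mean of the coded values is taken, and the result is converted back with mean = A + h × mean(u). The data and the names `assumed_origin` and `scale` are hypothetical.

```python
import statistics

# Hypothetical observations that are awkward to average by hand.
observations = [1010, 1030, 1040, 1060, 1060]

assumed_origin = 1030  # A: constant subtracted from every observation
scale = 10             # h: constant every shifted observation is divided by

# Coded values u = (x - A) / h are small and easy to work with.
coded = [(x - assumed_origin) / scale for x in observations]

# Undo the coding: mean(x) = A + h * mean(u).
mean_via_coding = assumed_origin + scale * statistics.mean(coded)
mean_direct = statistics.mean(observations)

print("Coded values:        ", coded)
print("Mean via coding:     ", mean_via_coding)
print("Mean computed directly:", mean_direct)
```

Both routes give the same mean; the coding only makes the arithmetic lighter.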
There are cases where the relative importance of the different items is not the same. In
such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the
grouped data calculations studied earlier, where the frequency acts as a weight associated
with the class mark.
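The weighted arithmetic mean is the sum of weight × value divided by the sum of the weights. The sketch below uses made-up values and weights; for grouped data, the class marks would play the role of `values` and the frequencies the role of `weights`.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores with unequal importance.
scores = [70, 80, 90]
weights = [2, 3, 5]

result = weighted_mean(scores, weights)
print("Weighted mean:", result)
```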
The median is the middle value when the data are arranged in order. The median is resistant to
extreme observations. It is like the geometric centre in physics. When we want to
guard against the influence of a few outlying observations (called outliers), we may use the
median.
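The resistance of the median to outliers can be seen in a short comparison. The figures are illustrative only: one observation is replaced by an extreme value and both measures are recomputed.

```python
import statistics

# Hypothetical observations without and with one extreme value.
typical = [30, 32, 35, 38, 40]
with_outlier = [30, 32, 35, 38, 400]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print("Without outlier: mean =", mean_typical, " median =", median_typical)
print("With outlier:    mean =", mean_outlier, " median =", median_outlier)
```

The mean is pulled strongly towards the extreme value while the median does not move.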
Quantiles are related positional measures. They are useful and frequently employed.
The most familiar quantiles are quartiles, deciles, and percentiles.
Quartiles are position values similar to the median. There are three quartiles, denoted
by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile. The second quartile Q2 is
nothing but the median. In a distribution, one fourth of the items are less than Q1 and the
other three fourths are greater than Q1. Q3 is called the upper quartile or third quartile:
three fourths of the items are less than Q3 and one fourth are greater than Q3.
The inter-quartile range is defined as the difference between the third and first quartiles,
Q3 − Q1. It is a measure of the spread of the data.
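A brief sketch of the inter-quartile range as Q3 − Q1, comparing two made-up data sets that share the same median but differ in spread. The quartile method (`"inclusive"`) is an assumption; other conventions give slightly different quartiles.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1 using inclusive quartile interpolation."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data sets with the same median (50) but different spread.
tight = [48, 49, 50, 50, 50, 51, 52]
wide = [20, 35, 45, 50, 55, 65, 80]

iqr_tight = interquartile_range(tight)
iqr_wide = interquartile_range(wide)

print("IQR of tightly clustered data:", iqr_tight)
print("IQR of widely spread data:    ", iqr_wide)
```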
D1, D2, D3, ..., D9 are the nine deciles. They divide a series into 10 equal
parts. One tenth of the items are less than or equal to D1, one tenth of the items are more
than or equal to D9, and one tenth of the items lie between any successive pair of deciles when
all the items are in ascending order.
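The nine deciles can be obtained as the cut points that split an ordered series into ten parts. A minimal sketch with a placeholder series of the integers 1 to 99, using the standard library's default interpolation method:

```python
import statistics

# Hypothetical series: the integers 1..99 in ascending order.
series = list(range(1, 100))

# n=10 returns the nine cut points D1..D9.
deciles = statistics.quantiles(series, n=10)

for index, value in enumerate(deciles, start=1):
    print(f"D{index} = {value}")

# Roughly one tenth of the items are at or below D1.
at_or_below_d1 = sum(1 for x in series if x <= deciles[0])
print("Items at or below D1:", at_or_below_d1, "of", len(series))
```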
The Pth percentile of a group of observations is that observation below which lie P% (P
percent) of the observations. The position of the Pth percentile is given by $(n+1)P/100$,
where ‘n’ is the number of data points.
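A sketch of locating a percentile by position, assuming the common textbook convention that the Pth percentile sits at position (n + 1)P/100 in the ordered data, with linear interpolation when the position is fractional. Other conventions exist (Excel and statistical packages offer several), so treat the function names and the convention here as assumptions.

```python
def percentile_position(n: int, p: float) -> float:
    """1-based position of the Pth percentile under the (n + 1)P/100 convention."""
    return (n + 1) * p / 100


def percentile(values, p: float) -> float:
    """Pth percentile by position, interpolating between neighbouring observations."""
    ordered = sorted(values)
    n = len(ordered)
    # Keep the position inside the valid range 1..n.
    position = min(max(percentile_position(n, p), 1), n)
    lower = int(position)          # 1-based index of the observation below
    fraction = position - lower    # how far towards the next observation
    if lower >= n:
        return float(ordered[-1])
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


# Hypothetical observations, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print("Position of the 75th percentile:", percentile_position(len(data), 75))
print("75th percentile:", percentile(data, 75))
print("50th percentile (median):", percentile(data, 50))
```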
The mode of a data set is the value that occurs most frequently. There are many
situations in which the arithmetic mean and the median fail to reveal the most representative
figure of the data, for example the most common size of shoes or the most common
size of garments. In such cases, the mode is the best-suited measure of central tendency.
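The shoe-size case can be sketched as follows. The sizes are invented for illustration; `statistics.multimode` is used because a data set may have more than one most frequent value.

```python
import statistics

# Hypothetical shoe sizes sold in a day.
shoe_sizes = [7, 8, 8, 9, 8, 10, 9, 8]

most_common_sizes = statistics.multimode(shoe_sizes)  # list of all modes
mean_size = statistics.mean(shoe_sizes)

print("Most common size(s):", most_common_sizes)
print("Mean size:", mean_size)
```

The mean here is a size that may not even be stocked, whereas the mode identifies the size actually in greatest demand.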
A distribution in which the mean, the median, and the mode coincide is known as a
symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric
distribution, and it is very commonly used.
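In a moderately skewed distribution, the relationship among the three measures can be expressed as:
$$\text{Mean} - \text{Median} \approx \tfrac{1}{3}\,(\text{Mean} - \text{Mode})$$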
No single average can be regarded as the best or most suitable under all
circumstances. Each average has its merits and demerits and its own particular field of
importance and utility. A proper selection of an average depends on the (1) nature of the data
Kurtosis
StatTools calculated a kurtosis of 4.72, but Excel's KURT( ) function calculated 1.72. Is StatTools wrong,
or Excel?
Neither: the two values differ by exactly 3, which is consistent with Excel's KURT( ) reporting
excess kurtosis (kurtosis minus 3).
If your distribution has a kurtosis of less than 3, it has a negative excess kurtosis and is called
platykurtic.
If your distribution has a kurtosis greater than 3, it has a positive excess kurtosis and is called
leptokurtic.
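The difference between kurtosis and excess kurtosis can be sketched in Python. The function below follows the bias-corrected sample formula documented for Excel's KURT( ), which returns excess kurtosis; adding 3 gives the figure on the scale where a normal distribution scores 3. The data are illustrative, the function names are hypothetical, and the assumption that another tool reports "excess + 3" should be checked against that tool's documentation.

```python
import statistics


def excess_kurtosis_sample(values):
    """Bias-corrected sample excess kurtosis (formula documented for Excel's KURT)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four data points are required")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("standard deviation must not be zero")
    z4 = sum(((x - mean) / s) ** 4 for x in values)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second


def classify(kurtosis):
    """Label a distribution from its kurtosis on the 'normal = 3' scale."""
    if kurtosis > 3:
        return "leptokurtic"
    if kurtosis < 3:
        return "platykurtic"
    return "mesokurtic"


# Illustrative observations.
data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
excess = excess_kurtosis_sample(data)
kurtosis = excess + 3  # same quantity on the scale where normal = 3

print(f"Excess kurtosis: {excess:.4f}")
print(f"Kurtosis:        {kurtosis:.4f}")
print("Shape:", classify(kurtosis))
```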