
Business Statistics – Session 3

Descriptive Statistics
• After attending this session, you should be able to:

● Understand the development, importance and role of statistics
● Understand the classification of statistical techniques as a
decision-making tool
● Learn how to install and use the Data Analysis ToolPak for
conducting statistical analysis
● Learn descriptive statistical analysis techniques and their utility
The Blue Crabs of Naples



The Pyramid of Statistics

APPLIED

DESCRIPTIVE INFERENTIAL ANALYTICAL


Measures of Descriptive Statistics


Measures of Central Tendency
• Mean
• Median
• Mode

Measures of Dispersion
• Standard deviation
• Variance
• Range and Interquartile Range (IQR)

Measures of Shape
• Kurtosis
• Skewness
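As a rough illustration of the central tendency and dispersion measures listed above, the sketch below computes them with Python's standard `statistics` module. The `scores` list is hypothetical data invented for this example, and the sample (n - 1) versions of variance and standard deviation are an assumption; the population versions are `pvariance` and `pstdev`.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [52, 61, 61, 68, 70, 74, 79, 85, 90]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)
data_range = max(scores) - min(scores)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} range={data_range}")
```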
The Basic Statistics Vocabulary

Population consists of all the items or individuals about which you want to reach conclusions.

All of the Good Tunes and More sales transactions for a specific year, all the full-time students
enrolled in a college, and all the registered voters in Ohio are examples of populations.

Sample is a portion of a population selected for analysis. For example, suppose a class has 160 students
enrolled in a year. These 160 students are the population of the class. If 65 of the 160 are
chosen randomly for a study, these 65 students are a sample that represents the population.

Parameter is a measure that describes a variable using population data. The mean score
calculated for the entire class is a parameter.

Statistic is a measure that describes a variable using sample data. The mean score calculated
for the 65 sampled students is a statistic.
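The class example above can be sketched in Python. The marks are randomly generated, hypothetical values (the source gives no data); only the sizes 160 and 65 come from the text. The population mean plays the role of the parameter and the sample mean the role of the statistic.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical marks for the whole class of 160 students (the population).
population = [random.randint(35, 100) for _ in range(160)]

# A simple random sample of 65 students, drawn without replacement.
sample = random.sample(population, 65)

parameter = statistics.mean(population)  # describes the population
statistic = statistics.mean(sample)      # describes the sample

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two values are usually close but rarely identical, which is why a statistic is treated as an estimate of the parameter.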


Activity 1: Measuring Central Tendency

Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Identify the variables for which you can
calculate central tendency
• Use appropriate Excel functions to
calculate the measures of central
tendency
• What do you notice?
Introduction
• The concept of central tendency plays a dominant role in the study of statistics.

• In many frequency distributions, the tabulated values show a distinct tendency to cluster
or to group around a typical central value.

• This behaviour of the data to concentrate the values around a central part of the distribution
is called the ‘Central Tendency’ of the data.


Characteristics of Central Tendency

A good measure of central tendency should possess, as far as possible, the following
characteristics:

• Easy to understand.
• Simple to compute.
• Based on all observations.
• Uniquely defined.
• Possibility of further algebraic treatment.
• Not unduly affected by extreme values.

Common Measures of Central Tendency

• Mean
• Median
• Mode
Arithmetic Mean
• The arithmetic mean of a series is the quotient obtained by dividing the sum of the
values by the number of items. In algebraic language, if X1, X2, X3, ..., Xn are the n
values of a variate X, the arithmetic mean is (X1 + X2 + X3 + ... + Xn) / n.

• Properties of Arithmetic Mean
• Calculation of Simple Arithmetic Mean
• Merits and Demerits of Arithmetic Mean
• Weighted Arithmetic Mean
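A minimal sketch of the simple and weighted arithmetic mean follows. The function names, the marks and the weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of items."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


marks = [60, 70, 80]   # hypothetical marks in three courses
credits = [1, 2, 3]    # hypothetical course credits used as weights

print(arithmetic_mean(marks))         # simple mean
print(weighted_mean(marks, credits))  # mean pulled towards the heavier courses
```

With equal weights the weighted mean reduces to the simple arithmetic mean.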
Median

Median is the value which divides the distribution of data, arranged in
ascending or descending order, into two equal parts. Thus, the ‘Median’ is the
value of the middle observation.

• Calculation of Median
• Merits and Demerits of Median
• Partition Values or Positional Measures
• Quartiles
• Deciles
• Percentiles
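The definition of the median can be sketched as follows for ungrouped data. Averaging the two middle values when the number of observations is even is the usual convention and is assumed here; the function name is illustrative.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 1, 5]))  # odd n: the single middle observation
print(median([7, 3, 9, 1]))     # even n: average of the two middle observations
```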
Mode

• Mode is the value which has the greatest frequency density. Mode is denoted by Z.

• Calculation of Mode
• Merits and Demerits of Mode
• Graphic Location of Mode
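For ungrouped data, the mode is simply the most frequent value. The sketch below counts frequencies and returns every value tied for the highest count; the shoe sizes are hypothetical. It does not cover the grouped-data case based on frequency density.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency."""
    if not values:
        raise ValueError("at least one value is required")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


shoe_sizes = [7, 8, 8, 9, 8, 10, 9, 8]  # hypothetical data
Z = modes(shoe_sizes)                    # Z is the symbol the slide uses for the mode

print(Z)
print(modes([1, 1, 2, 2, 3]))  # a bimodal data set has two modes
```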


Empirical Relationship between Mean, Median and Mode
• A distribution in which the mean, the median, and the mode coincide is known
as a symmetrical (bell-shaped) distribution. The normal distribution is one such
symmetric distribution, and it is very commonly used.

• If the distribution is skewed, the mean, the median and the mode are not equal.
In a moderately skewed distribution, the distance between the mean and the median is
approximately one third of the distance between the mean and the mode. This can be
expressed as:

Mean – Median ≈ (Mean – Mode) / 3

Mode ≈ 3 * Median – 2 * Mean
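The empirical relationship can be turned into a one-line estimate of the mode. This is only a sketch: the function name and the input values are hypothetical, and the result is an approximation that applies to moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Estimate the mode from the empirical relationship Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


mean, median = 50, 48  # hypothetical values
mode = empirical_mode(mean, median)

print(mode)
# Check the first form of the relationship: Mean - Median = (Mean - Mode) / 3
print(mean - median, (mean - mode) / 3)
```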


Limitations of Central Tendency

The arithmetic mean (AM) is not a suitable measure of central tendency:

• In case of highly skewed data.
• In case of uneven or irregular spread of the data.
• In open-end distributions.
• When average growth or average speed is required.
• When there are extreme values in the data.

Except in these cases, AM is widely used in practice.
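The "average speed" and "average growth" cases can be illustrated with the harmonic and geometric means from Python's `statistics` module (available from Python 3.8). The slide does not name these alternatives, so treating them as the replacements is an assumption, and the speeds and growth factors are hypothetical.

```python
import statistics

# Average speed over two legs of EQUAL distance: the harmonic mean applies.
speeds_kmph = [40, 60]
naive_speed = statistics.mean(speeds_kmph)             # overstates the true average
average_speed = statistics.harmonic_mean(speeds_kmph)

# Average growth over several periods: the geometric mean of growth factors applies.
growth_factors = [1.10, 1.20, 0.90]  # +10%, +20%, -10%
average_growth = statistics.geometric_mean(growth_factors)

print(f"arithmetic mean speed: {naive_speed}, harmonic mean speed: {average_speed}")
print(f"average growth factor per period: {average_growth:.4f}")
```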


Plotting the CT Measures
Activity 2: Positional Measures

Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Calculate the Five Number Summary
using appropriate Excel functions
• What do these positional measures tell
us?
Summary

• Measures of central tendency capture one of the most important characteristics of
the data. Depending on the situation, one of the various measures of central tendency may
be chosen as the most representative.

• The arithmetic mean is widely used and understood. The question is what characterizes the three
measures of centrality, and what the relative merits of each are in a given situation.

• The mean summarizes all the information in the data. It can be visualized as a single
point where all the mass (the weight) of the observations is concentrated, like a centre of
gravity in physics. The mean also has some desirable mathematical properties that make it
useful in the context of statistical inference.

• To simplify manual calculation, we may sometimes use a shift of origin and a
change of scale. A shift of origin is achieved by adding or subtracting a constant to all
observations; a change of scale is achieved by multiplying or dividing all observations by a
constant. For ungrouped (discrete) data, the constant is added to or subtracted from (usually
subtracted from) the individual observations; for grouped data, it is applied to the
class-mark values.

• There are cases where the relative importance of the different items is not the same. In
such a case, we need to compute the weighted arithmetic mean. The procedure is similar to the
grouped data calculations studied earlier, in which the frequency acts as a weight associated
with the class mark.

• Median is the middle value when the data is arranged in order. The median is resistant to
extreme observations. Median is like the geometric centre in physics. When we want to
guard against the influence of a few outlying observations (called outliers), we may use the
median.
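The shift of origin and change of scale mentioned above can be sketched as follows. The symbols A (assumed origin) and h (scale factor) and the data are assumptions introduced for illustration; the slides do not name them.

```python
def mean_by_shift_and_scale(values, origin, scale):
    """Compute the mean via d = (x - A) / h, then mean = A + h * mean(d)."""
    if scale == 0:
        raise ValueError("scale must be non-zero")
    deviations = [(x - origin) / scale for x in values]
    return origin + scale * sum(deviations) / len(deviations)


values = [1010, 1020, 1030, 1040, 1050]  # hypothetical observations
A, h = 1030, 10                          # assumed origin and scale

# The transformed values are -2, -1, 0, 1, 2, which are easy to average by hand.
print(mean_by_shift_and_scale(values, A, h))
print(sum(values) / len(values))  # direct calculation gives the same mean
```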
• Quantiles are related positional measures of central tendency. These are useful and
frequently employed measures. The most familiar quantiles are quartiles, deciles, and percentiles.

• Quartiles are positional values similar to the median. There are three quartiles, denoted
by Q1, Q2 and Q3. Q1 is called the lower quartile or first quartile: one fourth of the items
in a distribution are less than Q1 and the other three fourths are greater than Q1. The second
quartile Q2 is nothing but the median. Q3 is called the upper quartile or third quartile:
three fourths of the items are less than Q3.

• Inter-quartile range is defined as the difference between the third and first quartiles,
Q3 – Q1. It is a measure of the spread of the data.

• D1, D2, D3, ..., D9 are the nine deciles. They divide a series into 10 equal
parts. One tenth of the items are less than or equal to D1, one tenth of the items are more
than or equal to D9, and one tenth of the items lie between any successive pair of deciles when
all the items are in ascending order.
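The quartiles, inter-quartile range and deciles described above can be sketched with `statistics.quantiles` (Python 3.8 and later). The data are hypothetical, and the `"inclusive"` interpolation method is an assumption; other tools use other conventions and may give slightly different quartiles.

```python
import statistics

# Hypothetical data set of 11 observations.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# Three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

# Nine cut points D1..D9 that split the ordered data into ten parts.
deciles = statistics.quantiles(data, n=10, method="inclusive")

print("Five-number summary:", five_number_summary)
print("IQR:", iqr)
print("Deciles:", deciles)
```

The second quartile equals the median and the fifth decile, as the definitions require.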

• The Pth percentile of a group of observations is the value below which P% (P
percent) of the observations lie. The position of the Pth percentile is given by a formula in n,
where ‘n’ is the number of data points.

• If the resulting position is a fraction, we need to interpolate the value.

• The mode of a data set is the value that occurs most frequently. There are many
situations in which the arithmetic mean and the median fail to reveal the true characteristics of
the data (the most representative figure), for example, the most common size of shoes or the most
common size of garments. In such cases, the mode is the best-suited measure of central tendency.

• A distribution in which the mean, the median, and the mode coincide is known as a
symmetrical (bell-shaped) distribution. The normal distribution is one such symmetric
distribution, and it is very commonly used.
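The percentile position and interpolation step mentioned earlier in this list can be sketched as below. The position formula did not survive in the slide text, so the sketch ASSUMES the common textbook convention position = (n + 1) * P / 100; other conventions exist and give slightly different answers. The function name and the data are hypothetical.

```python
def percentile(values, p):
    """Pth percentile using the ASSUMED position (n + 1) * p / 100 with linear interpolation."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100           # 1-based position in the ordered data
    position = min(max(position, 1), n)    # keep the position inside the data
    lower = int(position)                  # whole part of the position
    fraction = position - lower            # fractional part to interpolate over
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # hypothetical observations, n = 11

print(percentile(data, 50))  # position 6, a whole number: no interpolation needed
print(percentile(data, 30))  # position 3.6: interpolate between the 3rd and 4th values
```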

In a moderately skewed distribution, the relationship between the three measures can be expressed as:

• Mean – Median ≈ (Mean – Mode) / 3

• Mode ≈ 3 * Median – 2 * Mean

• No single average can be regarded as the best or most suitable under all
circumstances. Each average has its merits and demerits and its own particular field of
importance and utility. A proper selection of an average depends on (1) the nature of the data
and (2) the purpose of the enquiry or the requirement of the data.


Practice Time:
Excel Functions for Descriptive Stats
5-Number Summary & Box Plot



Skewness
Kurtosis – A Question to Ponder

StatTools calculated a kurtosis of 4.72, but Excel's KURT( ) function calculated 1.72. Is StatTools wrong,
or Excel?

They're both right.

The standard computation of kurtosis gives a value of 3 for a normal distribution, and some
statistical tools follow that norm.

The KURT function in Excel, by contrast, calculates the excess kurtosis:

(excess kurtosis) = (kurtosis) – 3

If your distribution has a kurtosis of less than 3, it has a negative excess kurtosis and is called
platykurtic.

If your distribution has a kurtosis greater than 3, it has a positive excess kurtosis and is called
leptokurtic.
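The difference between kurtosis and excess kurtosis can be sketched with the plain moment-based definition (fourth central moment divided by the squared variance). This is an illustrative assumption about the formula; Excel's KURT applies an additional small-sample adjustment, so its output need not match this sketch exactly. The data are hypothetical.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis measured relative to the normal distribution."""
    return kurtosis(values) - 3


def classify(values):
    """Label the shape using the sign of the excess kurtosis."""
    excess = excess_kurtosis(values)
    if excess < 0:
        return "platykurtic"
    if excess > 0:
        return "leptokurtic"
    return "mesokurtic"


data = [1, 2, 3, 4, 5]  # hypothetical, evenly spread values
print(kurtosis(data), excess_kurtosis(data), classify(data))
```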
Kurtosis
