Fundamentals of Statistics With MS Excel


Descriptive Statistics

with Microsoft Excel


DataSense Analytics

Presented By:
Engr. Raniel B. Taripe, CIE MBA-FM
Analytics is everywhere…
Statistics is everywhere…
OUTLINE:
In this session, we will try to
answer the following questions:

I. What is Descriptive Statistics?


II. Why do we need to study Descriptive Statistics
and its application in MS Excel?
III. When should we study and apply
Statistics in MS Excel?
IV. Where is Descriptive Statistics applicable?
V. How do we analyze a dataset using Descriptive
Statistics through MS Excel?
I. What is
Descriptive
Statistics?
Before we discuss DESCRIPTIVE
STATISTICS, let us first define
the word STATISTICS
STATISTICS
• The scientific discipline that involves the
collection, analysis, interpretation,
presentation, and organization of data.
Statistics provides methods and techniques
for summarizing and making inferences
from data, allowing researchers and
analysts to draw conclusions and make
informed decisions based on the available
information.
• © CallTutors
• © International Six Sigma Institute
II. Why do we need to study
Descriptive Statistics and its
Application in MS Excel?
III. When should we study and
apply Statistics in MS Excel?
IV. Where is Descriptive Statistics
applicable?
Analytics is everywhere…
Statistics is everywhere…
V. How do we analyze a dataset using
Descriptive Statistics
through MS Excel?
• © International Six Sigma Institute
Measures of Central Tendencies
• 1. Mean
• 2. Median
• 3. Mode
• 4. Midrange

Measures of Dispersion
• 1. Range
• 2. Average Deviation
• 3. Variance and Standard Deviation

Measures of Location
• 1. Quartiles, Deciles and Percentiles
• 2. Midhinge, Interquartile Range & Quartile Deviation
• 3. Coefficient of Variation
• 4. Kurtosis
• 5. Skewness
• 6. Outliers
• 7. Boxplot
MEASURES OF
CENTRAL TENDENCIES
MEAN
• The arithmetic mean, often simply called the mean, is the most frequently
used measure of central tendency.
• The mean is the only common measure of central tendency in which all
values play an equal role.
PROPERTIES OF MEAN
• 1. A set of data has only one mean.
• 2. Mean can be applied for interval and ratio data.
• 3. All values in the data set are included in computing the mean.
• 4. The mean is very useful in comparing two or more datasets.
• 5. The mean is affected by extremely small or large values in a dataset.
• 6. The mean cannot be computed for the data in a frequency distribution
with an open-ended class.
• 7. Mean is most appropriate in symmetrical data.

$$ \bar{x} = \frac{\sum x}{n} $$
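As a minimal sketch of the formula above, the mean can be computed by hand and checked against a library function. The sample values are hypothetical; in Excel the equivalent would typically be `=AVERAGE(range)`.

```python
from statistics import mean

# Hypothetical sample data, used only for illustration.
scores = [4, 8, 15, 16, 23, 42]

# Mean = sum of all values divided by the number of values (n).
x_bar = sum(scores) / len(scores)
print(x_bar)  # 18.0

# The standard library gives the same result.
library_mean = mean(scores)
print(library_mean == x_bar)  # True
```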
EXCEL TIME!!!
MEDIAN
• The median is the midpoint of the data array. A dataset that has been
ordered, whether ascending or descending, is called a data
array. The median is the best measure of central tendency for ordinal data.
PROPERTIES OF MEDIAN
• 1. The median is unique, there is only one median for a set of data.
• 2. The median is found by arranging the set of data from lowest to
highest (or vice versa) and getting the value of the middle observation.
• 3. Median is not affected by the extreme small or large values.
• 4. Median can be computed for an open-ended frequency distribution.
• 5. Median can be applied for ordinal, interval and ratio data.
• 6. The median is most appropriate for skewed data.

$$ \tilde{x}\ \text{(Rank Value)} = \frac{n+1}{2} $$
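The rank value formula can be sketched as follows. The function name `median_by_rank` is hypothetical, and averaging the two middle observations when the rank falls between two positions (even n) is the usual convention, assumed here. In Excel the equivalent would typically be `=MEDIAN(range)`.

```python
import math
from statistics import median

def median_by_rank(values):
    """Locate the median using the rank value (n + 1) / 2."""
    data = sorted(values)          # build the data array
    n = len(data)
    rank = (n + 1) / 2             # 1-based rank of the median
    lower = data[math.floor(rank) - 1]
    upper = data[math.ceil(rank) - 1]
    # For odd n, lower and upper are the same observation.
    return (lower + upper) / 2

odd_case = median_by_rank([7, 1, 3, 9, 5])   # rank 3 -> value 5
even_case = median_by_rank([1, 3, 5, 7])     # rank 2.5 -> between 3 and 5
print(odd_case, even_case)
```

Both results agree with `statistics.median` on the same inputs.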
EXCEL TIME!!!
MODE
• The mode is the value in a dataset that appears most frequently.
Like the median and unlike the mean, extreme values in a dataset
do not affect the mode.
• A dataset may not contain any mode if none of the values is “most
typical”.

• Types of Modes: Unimodal, Bimodal, Multimodal, No Mode


PROPERTIES OF MODE
• 1. The mode is found by locating the most frequently occurring value.
• 2. The mode is the easiest average to compute.
• 3. There can be more than one mode or even no mode in any
given dataset.
• 4. Mode is not affected by extreme small or large values.
• 5. Mode can be applied for nominal, ordinal, interval and ratio data.
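The types of modes listed above can be illustrated with a short sketch. The function name `classify_modes` is hypothetical, and it assumes one common convention: a dataset in which every value occurs exactly once has no mode. In Excel, `=MODE.SNGL(range)` and `=MODE.MULT(range)` cover similar ground.

```python
from collections import Counter

def classify_modes(values):
    """Return the mode type and the list of modes for a non-empty dataset."""
    counts = Counter(values)
    top = max(counts.values())                 # highest frequency
    if top == 1 and len(counts) > 1:
        return "no mode", []                   # assumed convention
    modes = sorted(v for v, c in counts.items() if c == top)
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

print(classify_modes([1, 2, 2, 3]))            # one most frequent value
print(classify_modes([1, 1, 2, 2, 3]))         # two values tie
print(classify_modes([1, 2, 3]))               # every value occurs once
```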
EXCEL TIME!!!
MIDRANGE
• The midrange is the average of the lowest and highest value in a
dataset.
PROPERTIES OF MIDRANGE
• 1. The midrange is easy to compute.
• 2. The midrange gives the midpoint.
• 3. The midrange is unique.
• 4. Midrange is affected by extreme small or large values.
• 5. Midrange can be applied for interval and ratio data.

$$ \text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2} $$
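A minimal sketch of the midrange, which also shows how a single extreme value shifts it. The data and the function name `midrange` are hypothetical; in Excel this could be written as `=(MIN(range)+MAX(range))/2`.

```python
def midrange(values):
    """Average of the lowest and highest value."""
    return (min(values) + max(values)) / 2

data = [4, 8, 15, 16, 23, 42]       # hypothetical sample
with_extreme = data + [400]         # same sample plus one extreme value

print(midrange(data))               # 23.0
print(midrange(with_extreme))       # 202.0
```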
EXCEL TIME!!!
MEASURES OF
DISPERSION
RANGE
• The simplest measure of dispersion is the range: the difference
between the highest and lowest values in a dataset.

$$ \text{Range} = \text{Highest value} - \text{Lowest value} $$
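The range formula is a one-liner in code. This sketch uses a hypothetical function name `value_range` and made-up data; in Excel it could be written as `=MAX(range)-MIN(range)`.

```python
def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

data = [4, 8, 15, 16, 23, 42]   # hypothetical sample
print(value_range(data))        # 38
```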


EXCEL TIME!!!
AVERAGE DEVIATION
• The average deviation of a dataset is the average of the absolute
differences between each element and a given point, usually the mean.
• Also called mean absolute deviation.

$$ AD = \frac{\sum |x - \bar{x}|}{N} $$
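The average deviation formula can be sketched step by step. The function name `average_deviation` and the data are hypothetical; in Excel the equivalent would typically be `=AVEDEV(range)`.

```python
def average_deviation(values):
    """Mean of the absolute deviations from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

data = [2, 4, 6, 8]                 # mean is 5
# Absolute deviations: 3, 1, 1, 3 -> sum 8 -> 8 / 4
print(average_deviation(data))      # 2.0
```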
EXCEL TIME!!!
VARIANCE AND
STANDARD DEVIATION
• The standard deviation is the most widely used measure of
dispersion. The more spread out the data points, the higher the
deviation.
• The variance is the square of the standard deviation. It is the
mathematical expectation of the squared deviations from
the mean.
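As a sketch of the relationship between variance and standard deviation, the example below computes both the population and the sample versions. The data are hypothetical. In Excel the corresponding functions would typically be `VAR.P`, `STDEV.P`, `VAR.S` and `STDEV.S`.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data, mean is 5

# Population versions divide the squared deviations by N.
pop_var = pvariance(data)           # equals 4
pop_sd = pstdev(data)               # equals 2.0

# Sample versions divide by n - 1 instead.
samp_var = variance(data)           # equals 32 / 7
samp_sd = stdev(data)

# The variance is the square of the standard deviation.
print(abs(pop_sd ** 2 - pop_var) < 1e-12)      # True
print(abs(samp_sd ** 2 - samp_var) < 1e-12)    # True
```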
EXCEL TIME!!!
MEASURES OF
LOCATION
QUARTILES, DECILES
AND PERCENTILES
QUARTILES AND
PERCENTILES
• When presenting or analyzing a dataset, it is sometimes helpful to
group subjects into several equal groups. For example, to create
four equal groups, we need the values that split the data such that
25% of the observations are in each group. These cut-off points are
called quartiles. When the dataset is split into 10 equal parts, the
cut-off points are called deciles, and when it is split into 100 equal
parts, they are called percentiles.

The general term for such cut-off points is quantiles.
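The cut-off points described above can be sketched with Python's `statistics.quantiles`. The data are hypothetical, and the `"inclusive"` method is chosen on the assumption that it matches the interpolation used by Excel's `QUARTILE.INC` and `PERCENTILE.INC`; other quantile conventions give slightly different values.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data

# n=4 returns the three cut-off points that create four equal groups.
quartile_points = quantiles(data, n=4, method="inclusive")
print(quartile_points)                         # [3.0, 5.0, 7.0]

# n=10 returns the nine deciles; n=100 would return 99 percentiles.
decile_points = quantiles(list(range(1, 12)), n=10, method="inclusive")
print(len(decile_points))                      # 9
```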
EXCEL TIME!!!
MIDHINGE,
INTERQUARTILE
RANGE, & QUARTILE
DEVIATION
MIDHINGE
• The midhinge is the mean of the first and third quartiles in the
dataset. It is used to overcome potential problems introduced by
extreme values (or outliers) in the dataset.
EXCEL TIME!!!
INTERQUARTILE
RANGE
• The interquartile range (IQR), also called the midspread or middle fifty,
is the difference between the 3rd and 1st quartiles.
EXCEL TIME!!!
QUARTILE DEVIATION
• The quartile deviation (QD), which is half of the interquartile range,
is a slightly better measure of absolute dispersion than the range, but it
ignores the observations in the tails.
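The midhinge, interquartile range and quartile deviation all derive from Q1 and Q3, so they can be sketched together. The function name `quartile_measures` and the data are hypothetical, and the quartiles use the `"inclusive"` method, assumed here to correspond to Excel's `QUARTILE.INC`.

```python
from statistics import quantiles

def quartile_measures(values):
    """Midhinge, IQR and quartile deviation from Q1 and Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "midhinge": (q1 + q3) / 2,            # mean of Q1 and Q3
        "iqr": q3 - q1,                       # Q3 minus Q1
        "quartile_deviation": (q3 - q1) / 2,  # half of the IQR
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # Q1 = 3, Q3 = 7
print(quartile_measures(data))
```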
EXCEL TIME!!!
COEFFICIENT OF
VARIATION
• When two samples use the same unit of measure, the
variance and standard deviation of each can be compared directly.
• When one is interested in comparing the standard deviations of
two (or more) datasets with different units, the coefficient of variation
can be applied.

$$ CV = \frac{s}{\bar{x}} (100) $$
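A minimal sketch of the coefficient of variation, comparing two hypothetical samples measured in different units. The function name and data are illustrative only; in Excel this could be written as `=STDEV.S(range)/AVERAGE(range)*100`.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]    # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]         # hypothetical weights

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

# Both samples have the same standard deviation, but relative to
# their means the weights vary more than the heights.
print(round(cv_height, 2), round(cv_weight, 2))
```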
EXCEL TIME!!!
KURTOSIS
• Kurtosis is from the Greek word kyrtos or kurtos, meaning bulging.
• In statistics, kurtosis (or excess) is a measure used to
describe the distribution of observed data around the mean.
• It measures the relative peakedness or flatness of a distribution (as
compared to the normal distribution, which has an excess kurtosis of 0).
3 TYPES OF KURTOSIS
• Leptokurtic (kurtosis > 0)
• Mesokurtic (kurtosis = 0)
• Platykurtic (kurtosis < 0)
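The classification above can be sketched in code. The slides do not give a kurtosis formula, so this example assumes the bias-adjusted sample excess kurtosis, which to my understanding is the formula Excel documents for `KURT`. The function names and the tolerance `tol` are hypothetical.

```python
from statistics import mean, stdev

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed formula)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    x_bar = mean(values)
    s = stdev(values)                          # sample standard deviation
    fourth = sum(((x - x_bar) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

def kurtosis_type(k, tol=1e-9):
    """Map an excess kurtosis value to one of the three types."""
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"

k = sample_excess_kurtosis([1, 2, 3, 4, 5])    # flat, evenly spread data
print(round(k, 4), kurtosis_type(k))
```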
EXCEL TIME!!!
SKEWNESS
• The coefficient of skewness measures the lack of symmetry in the shape of a
distribution.
• The range of possible skewness values is theoretically unbounded, but it normally ranges
from -3 to +3. It relates the difference between the mean and the median to the
standard deviation. The long tail of the distribution points in the direction of
the skewness.
• It is possible to have skewness values higher than +3 or lower than -3. These indicate significant
skew, meaning the distribution has a long tail to the right or to the left.
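A coefficient that relates the difference between the mean and the median to the standard deviation is commonly written as 3(mean - median) / s, known as Pearson's second coefficient of skewness. The sketch below assumes that form, since the slides do not state the formula, and uses hypothetical data. Excel's `SKEW` function uses a different, moment-based formula, so its result will generally not match this one.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

symmetric = [1, 2, 3, 4, 5]        # mean equals median
right_tail = [1, 2, 3, 4, 20]      # one large value pulls the mean up

sk_symmetric = pearson_skewness(symmetric)
sk_right = pearson_skewness(right_tail)

# Zero for symmetric data, positive when the long tail is on the right.
print(sk_symmetric, round(sk_right, 3))
```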
EXCEL TIME!!!
OUTLIERS
• An outlier is an observation point that is distant from other observations or an observation
that lies outside the overall pattern of a distribution. A dataset should be checked for
extremely high or extremely low values, called outliers.
• Outliers can strongly affect the mean and standard deviation of a variable.

• Mild Outlier
< [ Q1 - 1.5(IQR) ] or > [ Q3 + 1.5(IQR) ]

• Extreme Outlier
< [ Q1 - 3(IQR) ] or > [ Q3 + 3(IQR) ]
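The two sets of fences above can be sketched as a small classifier. The function name `find_outliers` and the data are hypothetical. The quartiles use the `"inclusive"` method, assumed to correspond to Excel's `QUARTILE.INC`, and a value beyond the extreme fence is reported only as extreme, not also as mild.

```python
from statistics import quantiles

def find_outliers(values):
    """Split values into mild and extreme outliers using IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr

    result = {"mild": [], "extreme": []}
    for x in sorted(values):
        if x < extreme_low or x > extreme_high:
            result["extreme"].append(x)
        elif x < mild_low or x > mild_high:
            result["mild"].append(x)
    return result

# Q1 = 13.5, Q3 = 18.5, IQR = 5: mild fences 6 and 26, extreme -1.5 and 33.5
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 60]
print(find_outliers(data))
```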
EXCEL TIME!!!
BOXPLOT
• A boxplot or box-and-whisker plot is a graph of a dataset obtained by drawing a horizontal
line from the minimum data value to Q1, drawing a horizontal line from Q3 to the
maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a
vertical line inside the box passing through the median or second quartile (Q2).
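The five values that define the boxplot described above (minimum, Q1, Q2, Q3, maximum) can be computed with a short sketch. The function name `five_number_summary` and the data are hypothetical, and the quartiles again assume the `"inclusive"` method. The sketch only computes the numbers; drawing the chart is left to a plotting tool such as Excel's Box and Whisker chart type.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, Q1, median (Q2), Q3 and maximum of a dataset."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data
low, q1, q2, q3, high = five_number_summary(data)

# Whiskers run from low to q1 and from q3 to high; the box spans q1 to q3.
print(low, q1, q2, q3, high)
```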
EXCEL TIME!!!