
STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION
WEEK 2

KREANNE LOCSON FALCASANTOS-DIAZ
Faculty, ADZU
In this session, we will:
◦ discuss how to construct a frequency distribution;
◦ explain the characteristics of the mean, median, and mode;
◦ illustrate the measures of location, namely:
   o percentiles
   o quartiles
   o deciles.
THE FREQUENCY DISTRIBUTION
A frequency distribution is the organization of raw data in table form,
using classes and frequencies.

Types of frequency distributions:

◦ Categorical frequency distribution

◦ Ungrouped frequency distribution

◦ Grouped frequency distribution

The frequency or the frequency count for a data value is the number of times
the value occurs in the data set.
A categorical frequency distribution represents data that can be placed in
specific categories.
Example:
Twenty-five incoming freshmen were given a blood test to determine their blood type.
The data set is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
CLASS    TALLY           FREQUENCY    PERCENT
A        IIIII           5            5/25 = 20%
B        IIIII − II      7            7/25 = 28%
O        IIIII − IIII    9            9/25 = 36%
AB       IIII            4            4/25 = 16%
TOTAL                    25           100%
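The tally above can also be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are our own and are not part of the slides.

```python
from collections import Counter

# The 25 blood types from the example, row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)   # frequency of each category
n = len(blood_types)            # total number of observations

print(f"{'CLASS':<6}{'FREQUENCY':>10}{'PERCENT':>9}")
for blood_type in ["A", "B", "O", "AB"]:
    freq = counts[blood_type]
    # percent = frequency / total * 100
    print(f"{blood_type:<6}{freq:>10}{100 * freq / n:>8.0f}%")
print(f"{'TOTAL':<6}{n:>10}{100:>8}%")
```

The percent column is simply each frequency divided by the total of 25, expressed as a percentage.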
An ungrouped frequency distribution lists the data values with the corresponding
number of times (frequency count) each value occurs.
Example:
The following data represent the number of defective bulbs observed each day over a
25-day period for a manufacturing process. Summarize the information with a frequency
distribution.
CLASS (DEFECTS)    FREQUENCY
6                  4
7                  1
9                  2
10                 5
11                 6
12                 3
14                 1
16                 1
20                 1
21                 1
TOTAL              25
A grouped frequency distribution is obtained by constructing classes (or
intervals) for the data, and then listing the corresponding number of values
(frequency count) in each interval.

To construct a frequency distribution, follow these rules:


1. There should be between 5 and 20 classes.
2. It is preferable but not absolutely necessary that the class width be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be continuous.
5. The classes must be exhaustive.
6. The classes must be equal in width.
Constructing a Grouped Frequency Distribution
To construct a frequency distribution, follow these steps:
1. Determine the classes.
◦ Find the highest and lowest values.
◦ Find the range.
◦ Select the number of classes desired.
◦ Find the width by dividing the range by the number of classes and rounding up.
◦ Select a starting point (usually the lowest value or any convenient number less than the
lowest value); add the width to get the lower limits.
◦ Find the upper class limits.
◦ Find the boundaries.
2. Tally the data.
3. Find the numerical frequencies from the tallies, and find the cumulative frequencies.
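Number of classes
◦ Some statisticians use the "2 to the k" rule: choose the smallest number of classes k such that $2^k > n$, where n is the number of observations.
◦ The $2^k$ rule is just a guide.
◦ If the rule suggests you need 6 classes, also consider using 5 or 7 classes, but certainly not 3 or 9.

k        1   2   3   4    5    6    7     8     9
$2^k$    2   4   8   16   32   64   128   256   512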
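As a rough illustration, the table of powers of 2 can be turned into a small helper. The sketch assumes the rule means "pick the smallest k for which 2 to the power k exceeds the number of observations n", which is how the rule is commonly stated; the function name is our own.

```python
def classes_by_2_to_k(n):
    """Return the smallest k such that 2**k > n (assumed reading of the rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# For 50 observations: 2**5 = 32 is not greater than 50, but 2**6 = 64 is.
suggested = classes_by_2_to_k(50)
print(suggested)  # 6
```

Since the rule is only a guide, a suggestion of 6 classes still leaves 5 or 7 classes as reasonable choices.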
NOTE
◦ The class limits should have the same decimal place value as the data, but the class boundaries
should have one additional place value and end in a 5.
Example:
The data below represent the record high temperatures in degrees Fahrenheit (F) for
each of the 50 cities in the Philippines this April. Construct a grouped frequency distribution for
the data, using 7 classes.

112 100 127 120 134 118 105 110 109 112

110 118 117 116 118 122 114 114 105 109

107 112 114 115 118 117 118 122 106 110

116 108 110 121 113 120 119 111 104 111

120 113 120 117 105 110 118 112 114 114
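◦ Find the highest value and lowest value:
$H = 134$ and $L = 100$.

◦ Find the range: $R = H - L$,

so $R = 134 - 100 = 34$.

◦ In this case, we will use 7 classes to construct the frequency distribution.

◦ Find the class width by dividing the range by the number of classes and rounding up: $34 / 7 \approx 4.86$, so the class width is 5.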
CLASS LIMITS    CLASS BOUNDARIES    TALLY                          FREQUENCY
100 − 104       99.5 − 104.5        II                             2
105 − 109       104.5 − 109.5       IIIII − III                    8
110 − 114       109.5 − 114.5       IIIII − IIIII − IIIII − III    18
115 − 119       114.5 − 119.5       IIIII − IIIII − III            13
120 − 124       119.5 − 124.5       IIIII − II                     7
125 − 129       124.5 − 129.5       I                              1
130 − 134       129.5 − 134.5       I                              1
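The construction steps for this example can be sketched in Python as follows. This is one possible way to do it, not the only one: it starts the first class at the lowest value, and the variable names are our own.

```python
import math

# Record high temperatures (degrees Fahrenheit) from the example.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]
num_classes = 7

lowest, highest = min(temps), max(temps)
data_range = highest - lowest                 # 134 - 100 = 34
width = math.ceil(data_range / num_classes)   # 34 / 7 = 4.86, rounded up to 5

table = []
for i in range(num_classes):
    lower = lowest + i * width                # lower class limit
    upper = lower + width - 1                 # upper class limit
    freq = sum(lower <= t <= upper for t in temps)
    # Boundaries carry one extra decimal place and end in 5.
    table.append((lower, upper, lower - 0.5, upper + 0.5, freq))

for lower, upper, lower_b, upper_b, freq in table:
    print(f"{lower}-{upper}   {lower_b}-{upper_b}   {freq}")
```

Because the classes are mutually exclusive and exhaustive, the frequencies add up to the 50 data values.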
Sometimes it is necessary to use a cumulative frequency distribution. A cumulative
frequency distribution is a distribution that shows the number of data values less than or equal
to a specific value (usually an upper boundary).

CLASS LIMITS    CLASS BOUNDARIES    FREQUENCY    CUMULATIVE FREQUENCY
100 − 104       99.5 − 104.5        2            2
105 − 109       104.5 − 109.5       8            2 + 8 = 10
110 − 114       109.5 − 114.5       18           2 + 8 + 18 = 28
115 − 119       114.5 − 119.5       13           2 + 8 + 18 + 13 = 41
120 − 124       119.5 − 124.5       7            2 + 8 + 18 + 13 + 7 = 48
125 − 129       124.5 − 129.5       1            2 + 8 + 18 + 13 + 7 + 1 = 49
130 − 134       129.5 − 134.5       1            2 + 8 + 18 + 13 + 7 + 1 + 1 = 50
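The running totals in the cumulative frequency column can be reproduced with a short Python sketch. It takes the class frequencies of the temperature example as given and pairs each running total with the upper class boundary; the variable names are our own.

```python
from itertools import accumulate

upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"values less than {boundary}: {total}")
```

The last cumulative frequency always equals the total number of data values, here 50.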
THE FREQUENCY DISTRIBUTION USING MS EXCEL
The FREQUENCY function can be found in the Formulas menu under the Statistical category.
Follow these steps:

◦ Go to the Formulas menu.

◦ Click on More Functions.

◦ Under the Statistical category, choose the FREQUENCY function.
◦ We will get the FREQUENCY function dialog box.
Example
To calculate the frequencies, we first have to group the data obtained from the students'
scores/marks.
Create a new column named Frequency, then enter the FREQUENCY formula in column G by selecting
G3 to G9.
Note that we need to select the entire frequency column to avoid an error value.

◦ FREQUENCY takes the data array first and the bins array second. Here the student marks (C3:C22) are the data array and column F (F3:F9) is the bins array, so type
=FREQUENCY(C3:C22,F3:F9) and press CTRL+SHIFT+ENTER.

◦ Once we press CTRL+SHIFT+ENTER, Excel encloses the formula in curly braces { }, which marks it as an array formula.
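For readers who want to see what the FREQUENCY function is counting, here is a rough Python sketch of the same idea. Each bin value acts as an upper limit: a data value is counted in the first bin whose limit is greater than or equal to it, and one extra count collects anything above the largest bin. The marks and bins below are hypothetical, since the worksheet itself is not reproduced here, and the function name is our own.

```python
from bisect import bisect_left

def excel_like_frequency(data, bins):
    """Count values per bin, treating each bin value as an upper limit.

    Returns len(bins) + 1 counts; the last one holds values above the
    largest bin. The bins are sorted first (an assumption of this sketch).
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for value in data:
        # Index of the first bin limit that is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts

marks = [35, 42, 58, 61, 77, 80, 49, 90, 100, 23]  # hypothetical student marks
bins = [40, 60, 80]                                 # hypothetical upper limits

result = excel_like_frequency(marks, bins)
print(result)  # [2, 3, 3, 2]
```

With these made-up marks, two scores are 40 or below, three fall from 41 to 60, three from 61 to 80, and two are above 80.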
Example
To create a pivot table, go to the Insert menu and select PivotTable.
Drag the Sales field to Row Labels, then drag the same Sales field to Values.
Set the value field setting to Count to get the sales count numbers.
Click a sales number under Row Labels, right-click, then choose the Group option.
We will get the Grouping dialog box below:
Edit the grouping numbers to start at 5000, end at 18000, and group by 1000, and then click OK.
We will obtain the result below.
To create a graphical representation of this result, go to the Insert menu and select the Column chart.
We obtain the graph below.
Example
Go to the Data menu. At the top right, we can find Data Analysis. Click on Data Analysis, which is
highlighted as shown below.
We obtain a dialog box as shown below. Choose the Histogram option and click OK.
We then obtain the Histogram dialog box as shown below.
Select the Input Range and Bin Range as shown below.
Make sure to tick the boxes for Labels, Cumulative Percentage, and Chart Output, and then click OK.
We obtain this result…
THE MEASURES OF CENTRAL TENDENCY
Data Description
◦ Central Tendency
◦ Position
◦ Variation
Measures of Central Tendency
◦ The mean is the sum of the values, divided by the total number of values.
◦ The median is the midpoint of the data array.
◦ The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2.
◦ The value that occurs most often in a data set is called the mode.

Mean (Arithmetic Mean)
◦ Mean (Arithmetic Mean) of Data Values
◦ Sample mean (sample size n):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

◦ Population mean (population size N):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Mean (Arithmetic Mean)
◦ The Most Common Measure of Central Tendency
◦ Affected by Extreme Values (Outliers)

[Figure: two dot plots. Left (scale 0 to 10): Mean = 5. Right (scale 0 to 14): Mean = 6.]
Median
◦ Robust Measure of Central Tendency
◦ Not Affected by Extreme Values

[Figure: two dot plots. Left (scale 0 to 10): Median = 5. Right (scale 0 to 14): Median = 5.]

◦ In an Ordered Array, the Median is the ‘Middle’ Number


◦ If n or N is odd, the median is the middle number
◦ If n or N is even, the median is the average of the 2 middle numbers
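The contrast between the mean and the median can be tried out in Python. The two data sets below are hypothetical: they were chosen only so that the means come out as 5 and 6 while the median stays at 5, and they are not necessarily the data behind the original plots.

```python
from statistics import mean, median

without_extreme = [1, 3, 5, 7, 9]    # hypothetical data, no extreme value
with_extreme = [1, 3, 5, 7, 14]      # same data with the largest value made extreme

# The mean is pulled toward the extreme value; the median is not.
print(mean(without_extreme), median(without_extreme))  # mean 5, median 5
print(mean(with_extreme), median(with_extreme))        # mean 6, median 5

# With an even number of values, the median is the average
# of the two middle numbers: (3 + 5) / 2.
even_sized = [1, 3, 5, 7]
print(median(even_sized))
```

Changing 9 to 14 raises the sum from 25 to 30, so the mean moves from 5 to 6, while the middle value of the ordered array is still 5.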
Mode
◦ A Measure of Central Tendency
◦ Value that Occurs Most Often
◦ Not Affected by Extreme Values
◦ There May Not Be a Mode
◦ There May Be Several Modes
◦ Used for Either Numerical or Categorical Data

[Figure: two dot plots. Left (scale 0 to 14): Mode = 9. Right (scale 0 to 6): No Mode.]
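The properties of the mode listed above can be illustrated with Python's `statistics.multimode`. All data sets here are hypothetical. Note one assumption: `multimode` returns every value when nothing repeats, so the small wrapper below (our own helper) reports an empty list in that case to match the "No Mode" convention used in these slides.

```python
from statistics import multimode

def modes(data):
    """Return the list of modes, or [] when no value repeats (slide convention)."""
    if len(set(data)) == len(data):
        return []                 # every value occurs exactly once: no mode
    return multimode(data)        # most frequent value(s), in order of first appearance

one_mode = [1, 3, 9, 9, 9, 12]             # hypothetical: a single mode
several_modes = [15, 20, 20, 15, 33, 40]   # hypothetical: two modes
no_repeats = [0, 1, 2, 3, 4, 5, 6]         # hypothetical: no mode
blood = ["A", "O", "B", "O", "AB"]         # hypothetical categorical data

print(modes(one_mode))        # [9]
print(modes(several_modes))   # [15, 20]
print(modes(no_repeats))      # []
print(modes(blood))           # ['O']
```

The last call shows that the mode also works for categorical data, where the mean and median are not defined.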
THE MEASURES OF CENTRAL TENDENCY USING EXCEL
To compute the mean, use the AVERAGE function: type =AVERAGE(data array) in an empty cell,
then press ENTER.
After pressing ENTER, we obtain the result below:

This implies that the average score of the students is 48.2.
You can also use the Sigma icon or the Formulas tab to compute the mean.
To compute the median, use the MEDIAN function: type =MEDIAN(data array) in an empty
cell, then press ENTER.
After pressing ENTER, we obtain the result below:

This implies that the median score of the students is 42.5.
To compute the mode, we can use the MODE function: type =MODE(data array) in an empty cell,
then press ENTER. However, a data set can have more than one mode. In that case, we
can use =MODE.MULT(data array).
◦ To do this, first highlight multiple cells vertically, then enter
=MODE.MULT(data array) and press CTRL+SHIFT+ENTER.
After pressing CTRL+SHIFT+ENTER, we obtain the result below:

This implies that this data set is multimodal because it has 4 modes,
namely 20, 15, 55, and 33.
THE MEASURES OF LOCATION
Location or Position
◦ Used to describe the position of a data value in relation to the rest of the data.

◦ Types:
1. Quartiles
2. Percentiles
3. Deciles
Quartiles
Q1, Q2, and Q3 divide the ranked scores into four equal parts.

(minimum) --25%-- Q1 --25%-- Q2 (median) --25%-- Q3 --25%-- (maximum)
Q1 – Lower Quartile
◦ At most, 25% of data is smaller than Q1.

◦ It divides the lower half of a data set in half.

Q2 – Median
◦ The median divides the data set in half.

◦ 50% of the data values fall below the median and 50% fall above.
Q3 – Upper Quartile
◦ At most, 25% of data is larger than Q3.

◦ It divides the upper half of the data set in half.


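Quartile Formulas (each gives the position of the quartile in the ranked data, where n is the number of values):

$$Q_1 = \frac{n+1}{4}\text{th value}$$

$$Q_2 = \frac{2(n+1)}{4} = \frac{n+1}{2}\text{th value}$$

$$Q_3 = \frac{3(n+1)}{4}\text{th value}$$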
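The position formulas can be sketched in Python as shown below. One assumption goes beyond the slides: when a position is not a whole number, the sketch interpolates linearly between the two neighboring ranked values, which is a common convention but only one of several in use. The function names and sample data are our own.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ranked data; fractional positions
    are linearly interpolated (an assumption of this sketch)."""
    whole = int(position)            # whole part of the position
    fraction = position - whole      # fractional part, 0 if none
    value = ordered[whole - 1]
    if fraction > 0:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value

def quartiles(data):
    """Return (Q1, Q2, Q3) using the positions k(n + 1)/4 for k = 1, 2, 3."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch needs at least 3 values")
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

scores = [14, 2, 10, 6, 4, 12, 8]        # hypothetical, n = 7
print(quartiles(scores))                 # positions 2, 4, 6 -> (4, 8, 12)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # positions 2.25, 4.5, 6.75
```

With n = 7 the positions are whole numbers, so the quartiles are the 2nd, 4th, and 6th ranked values. The interquartile range for that data would be 12 minus 4, or 8.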
Interquartile Range

◦ The interquartile range is Q3 − Q1.

◦ 50% of the observations in the distribution are in the interquartile range.

◦ The following figure shows the relationship between the quartiles, the median, and the
interquartile range.
Deciles
D1, D2, D3, D4, D5, D6, D7, D8, and D9 divide the ranked data into ten equal parts.

--10%-- D1 --10%-- D2 --10%-- D3 --10%-- D4 --10%-- D5 --10%-- D6 --10%-- D7 --10%-- D8 --10%-- D9 --10%--
Percentiles
◦ Values of the variable that divide a ranked set into 100 subsets.

◦ For example, P30 is the value below which about 30% of the ranked data fall.


Quartiles    Deciles
Q1 = P25     D1 = P10
Q2 = P50     D2 = P20
Q3 = P75     D3 = P30
             ⋮
             D9 = P90
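These equivalences can be checked numerically with Python's `statistics.quantiles`, which returns the cut points that divide the data into n equal groups. The data set is hypothetical, and the check relies on the library's default interpolation method being applied consistently to all three calls.

```python
from math import isclose
from statistics import quantiles

data = [12, 15, 20, 22, 27, 33, 38, 41, 47, 55, 60]   # hypothetical scores

percentiles = quantiles(data, n=100)   # 99 cut points: P1 ... P99
deciles = quantiles(data, n=10)        # 9 cut points:  D1 ... D9
quartile_values = quantiles(data, n=4) # 3 cut points:  Q1, Q2, Q3

# Qk should match P(25k), and Dk should match P(10k).
quartiles_match = all(
    isclose(quartile_values[k - 1], percentiles[25 * k - 1]) for k in (1, 2, 3)
)
deciles_match = all(
    isclose(deciles[k - 1], percentiles[10 * k - 1]) for k in range(1, 10)
)
print(quartiles_match, deciles_match)
```

List indices start at 0, so P25 is `percentiles[24]` and D1 is `deciles[0]`.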
THE MEASURES OF LOCATION USING EXCEL
To compute percentiles, use the PERCENTILE function: type =PERCENTILE(data array, k)
in an empty cell, where k is the percentile written as a decimal between 0 and 1
(for example, 0.25 for the 25th percentile), then press ENTER.
Let's compute $P_{25}$.
After pressing ENTER, we obtain the result below:

This implies that a raw score of 20 corresponds to the 25th percentile.
Thus, a student whose raw score is 20 scored higher than about 25% of the
students in the class.
To compute quartiles, use the QUARTILE function: type =QUARTILE(data array, quart)
in an empty cell, where quart is the quartile number (1 for Q1, 2 for Q2, 3 for Q3),
then press ENTER.
Let's compute $Q_3$.
After pressing ENTER, we obtain the result below:

This implies that a student whose raw score is 66.75 performed better than
75% of the students in the class.
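A word of caution when comparing Excel output with hand computation: to the best of our knowledge, Excel's QUARTILE and PERCENTILE functions interpolate between positions 1 and n (the same convention as QUARTILE.INC and PERCENTILE.INC), while the position formula (n + 1)/4 corresponds to the convention used by QUARTILE.EXC. The Python sketch below contrasts the two conventions on a small hypothetical data set, using the `method` argument of `statistics.quantiles`.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data, n = 8

# "inclusive": positions run from 1 to n (assumed to match QUARTILE / QUARTILE.INC).
inclusive = quantiles(scores, n=4, method="inclusive")

# "exclusive": positions k(n + 1)/4, as in the quartile position formulas.
exclusive = quantiles(scores, n=4, method="exclusive")

print(inclusive)  # [2.75, 4.5, 6.25]
print(exclusive)  # [2.25, 4.5, 6.75]
```

Both conventions agree on the median, but Q1 and Q3 differ, so small discrepancies between Excel and a manual calculation do not necessarily indicate an error.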
REFERENCES

◦ Bluman, A. G. (2014). Elementary statistics: A step by step approach (9th ed.). McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. ISBN 978-0-07-353498-5.

◦ Excel frequency distribution. (2020). EDUCBA. https://www.educba.com/excel-frequency-distribution/
