
Jullie Carmelle H. Chatto
BSA-II
STAT 1N
Statistical Analysis with Computer Application
Statistical Terms
• Statistics – a branch of mathematics that deals with systematic methods of collecting,
classifying, presenting, analyzing and interpreting quantitative or numerical data (del
Rosario, 1999).
• Data – the quantities (numbers) or qualities (attributes) measured or observed that are
to be collected and/or analyzed.
• Variable – a property that can take on different values or categories which cannot
be predicted with certainty.
Examples: smoking habit, attitude toward the head, height, faculty rank.
• Variate – the actual value of a variable. It is commonly termed a random variable.
• Constant – a fundamental quantity that does not change in value.
• Population – the set of all conceivable observations of a certain phenomenon.
• Sample – a finite number of items selected from a population, possessing characteristics
identical to those of the population from which it was taken.

Divisions of Statistics:
A. Descriptive Statistics – is concerned with the collection, classification and presentation of
data designed to summarize and describe the group characteristics. It refers to the
methods of summarizing and presenting data in the form which will make them easier to
analyze and interpret.
B. Inferential Statistics – refers to the drawing of conclusions or judgments about a population
based on a representative sample systematically taken from the same population. Its aim is
to give concise information about large groups of data without dealing with each and
every element of these groups. It is the process of drawing conclusions and making decisions
about the population based on evidence obtained from a sample. It includes estimation and
hypothesis testing.

Classifications of Statistics:
A. Parametric Statistics – is an approach which assumes a random sample from a normal
distribution and involves testing of hypothesis about the population parameter. This
approach is appropriate generally for interval and ratio data.
B. Nonparametric Statistics – is a statistical approach for estimation and hypothesis testing
when no underlying data distribution is assumed. This procedure is appropriate when the
sample is too small to assess the form of the distribution.

Types of Data:
A. Categorical Data
a. Nominal Scale – categorical data whose categories have no natural order.
Examples: gender, mode of transportation, nationality, occupation, civil status.
b. Ordinal Scale – categorical data whose categories are ordered.
Examples: pain level (none, mild, moderate, severe), social status
B. Continuous Data
a. Interval Scale – continuous data having equal intervals but no absolute zero point.
Examples: temperature in degrees Celsius or Fahrenheit
b. Ratio Scale – continuous data having both equal intervals and an absolute zero
point.
Examples: weight in pounds, height in centimeters, age in years, income.

Types of Variables:
A. Response Variable or Dependent Variable (Y) – is a variable which is affected by or
related to the value of some other variables. It can be continuous or categorical data.
B. Explanatory Variable or Independent Variable (X) – is a variable that is thought to
influence or affect the values of the response variable. It can be continuous or categorical data.
C. Controlled Variable – a variable that is kept the same throughout the experiment.

Classification of Variables:
A. Qualitative Variable – is one whose categories are simply used as labels to distinguish
one group from another. Its values are intrinsically nonnumeric (categorical). The categories
can be assigned numeric codes, but they remain intrinsically qualitative.
Examples: sex (male = 1, female = 0), occupation, race.
B. Quantitative Variable – is one whose values can be measured and ordered
according to quantity. Its values are intrinsically numeric.
Examples: number of children in a family, number of students in a class.

Divisions of Quantitative Variables:


A. Discrete Variable – consists of variates which cannot progress from one value to the
next without a break, and may be represented by whole numbers only.
B. Continuous Variable – consists of variates which can progress from one value to
the next without a break, and may be represented by a whole number or a fraction.

Steps in Statistical Inquiry or Investigation
1. Collection of Data
2. Processing/Organizing of Data
3. Presentation of Data
4. Analysis of Data
5. Interpretation of Data

DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of interest,
in an established systematic fashion that enables one to answer stated research questions,
test hypotheses, and evaluate outcomes.

NB: The data collected must be valid, reliable, relevant to the problem at hand, and
consistent with other information.

Categories/Sources of Data:
1. Primary Data – refer to data obtained directly from an original source by means of
actual observations or by conducting interviews.
2. Secondary Data – refer to data or information that come from existing records
(published and/or unpublished) in usable form such as surveys, census, business
journals and magazines, newspapers, commercial publications, and others such as
theses and dissertations, and research papers, etc.
3. Internal Data – data taken from the company’s own record of operations such as sales
records, production records, personnel records, etc.
4. External Data – data that come from outside sources and not from the company’s own
records.

Methods of Data Collection:


1. Interview or Direct Method
This method is a data gathering device wherein the research worker or
interviewer gets the needed data/information from the respondent or interviewee
verbally and directly in face-to-face contact. One marked advantage of this method is
that a skillful interviewer may draw from the interviewee certain types of personal and
confidential information which may not be obtainable through the other methods of data
collection.
2. Questionnaire or Indirect Method
This is a data-gathering instrument consisting of a list of well-planned written
questions related to a particular topic, sent by mail or handed to individuals, with space
provided for the response to each question.
3. Registration Method
This is data gathering by means of registration.
Examples: records of births, marriages and deaths (NSO), records of Filipino
voters (COMELEC).
4. Observation
Observation as means of gathering data is employed when certain data or
information cannot be secured adequately or validly through the use of the other
methods of data collection except through the use of observation. Observation must be
specific, systematic, quantitative, expert and its results must be checked and
substantiated.
5. Experimentation
Data and information can also be gathered by means of experimentation.

NB: After data have been collected, they have to be processed.



DATA PROCESSING
Data processing occurs when data is collected and translated into usable information. It
starts with data in its raw form and converts it into a more readable format (graphs, documents,
etc.).

Steps in Processing Data:


1. Data Collection
Data is pulled from available sources. It is important that the data sources available are
trustworthy and well-built so the data collected (and later used as information) is of the
highest possible quality.

2. Data Preparation/Editing
Editing raw data is necessary to detect errors and omissions, to ensure that the data
gathered are accurate, complete, and consistent with other information, and to arrange them
in a way that facilitates coding and classification. Data preparation, often referred to as
“pre-processing”, is the stage at which raw data is cleaned up and organized for the following
stage of data processing. During preparation, raw data is diligently checked for any errors.
The purpose of this step is to eliminate bad data (redundant, incomplete, or incorrect data)
and begin to create high-quality data.

3. Data Input/Coding
The clean data is then entered into its destination and translated into a language that is
understandable. Data input/coding is the first stage in which raw data begins to take the
form of usable information. Coding means assigning numerals and other symbols to the
data collected to be able to group them into a limited number of classes or categories.

4. Processing/Classification
This refers to the sorting of the data and grouping them on the basis of some similarity.

5. Data Output/Interpretation
The output/interpretation stage is the stage at which data is finally usable to non-data
scientists. It is translated, readable, and often in the form of graphs, videos, images, plain
text, etc.

STAT 1N
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected
should be organized systematically for easier and faster interpretation. They may be
presented in any of the following forms:
• textual form – if the data to be presented are few
• tabular and graphical forms – when more detailed information about the data is to be
presented
A table is used when you want to present data in a systematic and organized manner so
that reading and interpretation will be simpler and easier. When a table is used, you must
remember the following:
• state the title of the table
• label the columns properly
• indicate the date of the survey
• identify the source of the data
• arrange the data systematically in columns

Example 1:
University of Bohol
College of Business and Accountancy
Enrolment, Second Semester, SY 2019 - 2020

Year Level    Male    Female
First          136       158
Second         112       105
Third           96       193
Fourth          88       102
Total          432       558

You will observe that the table above shows clearly the enrolment data of University of Bohol,
College of Business and Accountancy for the second semester of school year 2019-2020.
Another type of tabular presentation is the frequency table also known as a frequency
distribution. It is an arrangement of the data that shows the frequency of occurrence of
different values of the variables.
A frequency table is constructed by listing the measurements from highest to lowest,
then making tally marks to record how often each number occurs. After tallying, count the
marks and record them in the proper column.
Example 2:
The scores of 45 students on a 20-point Math quiz are as follows:
17 20 15 18 19 16 11 10 15 16 13 20
12 12 13 14 11 10 14 13 12 11 14 18
13 15 14 10 15 16 17 17 18 20 15 19
18 17 16 15 12 12 19 19 20

Prepare a frequency table for the set of data.


Solution:
To prepare a frequency table for the given set of scores, the scores are listed
from highest to lowest, and tally marks are made and counted. The counted tally marks are then
recorded under the Frequency column. Notice that every 5th tally crosses the first four
tallies (written here as a dash after four strokes). This is done to make counting of marks
easier, especially if the number of cases is big.

Score    Tallies     Frequency
20       ////        4
19       ////        4
18       ////        4
17       ////        4
16       ////        4
15       //// - /    6
14       ////        4
13       ////        4
12       //// -      5
11       ///         3
10       ///         3
Total                45
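
The tallying in Example 2 can also be checked by computer. The following is a minimal Python sketch, assuming the 45 scores are typed in as a list; the function name `frequency_table` is only illustrative.

```python
from collections import Counter

# The 45 quiz scores from Example 2.
scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16, 13, 20,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11, 14, 18,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20, 15, 19,
    18, 17, 16, 15, 12, 12, 19, 19, 20,
]

def frequency_table(values):
    """Return (value, frequency) pairs listed from highest to lowest value."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)

table = frequency_table(scores)
print("Score  Frequency")
for score, freq in table:
    print(f"{score:>5}  {freq:>9}")
print(f"Total  {sum(freq for _, freq in table):>9}")
```

Counting with `Counter` plays the role of the tally marks; sorting in reverse order gives the highest-to-lowest listing used in the table.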

STAT 1N
FREQUENCY DISTRIBUTION
Frequency Distribution is a tabular arrangement of data showing its classification or grouping
according to magnitude or size.
Example: The distribution of bi-monthly salaries of 75 employees of Dragon Company.

Bi-monthly salary in Pesos Number of Employees


5,100 – 5,599 10
5,600 – 6,099 15
6,100 – 6,599 22
6,600 – 7,099 13
7,100 – 7,599 9
7,600 – 8,099 6
n = 75

Components of a Frequency Distribution


1. Class Limits - the end numbers of a class.
In the example above, the class limits are: 5,100 – 5,599; 5,600 – 6,099; 6,100 –
6,599; 6,600 – 7,099; 7,100 – 7,599; 7,600 – 8,099
- Each class has a lower limit and an upper limit.
The upper limits of the example above are: 5,599, 6,099, 6,599, 7,099, 7,599, 8,099.
The lower limits are: 5,100, 5,600, 6,100, 6,600, 7,100, 7,600.
2. Class Boundaries - are the “true” class limits.
- The lower boundary is obtained by subtracting ½ or 0.5 from the lower limit.
The lower boundary of the first class interval (from top) is P5,099.5, and so on.
- The upper boundary is obtained by adding ½ or 0.5 to the upper limit.
The upper boundary of the first class interval (from top) is P5,599.5.
3. Class Mark – also known as the “class midpoint”. It is usually represented by X.
– it is obtained by averaging either the class limits or the class boundaries.
The class mark of the first class interval (from top) is (5,100 + 5,599) ÷ 2 =
P5,349.5 or (5,099.5 + 5,599.5) ÷ 2 = P5,349.5.

4. Class Interval - the range of values used in defining a class.
- it is simply the length of a class.
- it is the difference or distance between the upper and lower boundaries
of each class.

5. Class Size - the width of each class interval.
- it is the difference between the lower limits (or the upper limits, or the
boundaries) of two successive classes.
In the example above, the class size is 500, obtained by subtracting 5,100 from 5,600
or 5,599 from 6,099.
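
The components defined above can be computed mechanically from a pair of class limits. This is a small Python sketch under the assumption that the data are recorded to the nearest whole unit (so the boundaries lie 0.5 beyond the limits); the function name `class_components` and the `unit` parameter are illustrative.

```python
def class_components(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary, class mark, class size).

    `unit` is the precision of the recorded data; half of it is
    subtracted from / added to the limits to get the boundaries.
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    class_mark = (lower_limit + upper_limit) / 2
    class_size = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, class_mark, class_size

# First class of the Dragon Company salary distribution: 5,100 - 5,599.
first_class = class_components(5100, 5599)
print(first_class)
```

For the first salary class this gives the boundaries 5,099.5 and 5,599.5, the class mark 5,349.5 and the class size 500, in agreement with the values worked out above.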

Steps in Constructing a Frequency Distribution


1. Determine the range using the formula R = highest value – lowest value.
2. Determine the class interval.
$$\text{Class interval} = \frac{\text{range}}{\text{tentative number of classes}}$$
3. The ideal number of classes is between 5 and 15.
4. Set up the class intervals.
5. Start the first class with a lower limit either equal to or a little bit less than the lowest
observed value.
6. Set up the class boundaries if necessary.
7. Tally the number of observations into the appropriate class intervals.
8. Use class intervals of uniform size.
9. Each item should belong to one class only.
10. Avoid using classes that overlap.
11. If possible, do not use open-ended classes like under P100 and P800 and above.
12. To determine the tentative number of classes, the formula K = √n can be used, where K is the
number of classes and n is the total number of observations. The Sturges Rule can also be
used: K = 1 + 3.33 log n, where n is the total frequency (total number of observed values)
and log is the common (base-10) logarithm.
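
The two rules in item 12 can be tried out quickly in Python. This is only a sketch: the function names are illustrative, and rounding the result up to the next whole number is an assumption made here so that the classes are sure to cover the data.

```python
import math

def classes_sqrt(n):
    """Tentative number of classes from K = sqrt(n), rounded up (assumption)."""
    return math.ceil(math.sqrt(n))

def classes_sturges(n):
    """Tentative number of classes from K = 1 + 3.33 log n (base-10 log), rounded up."""
    return math.ceil(1 + 3.33 * math.log10(n))

n = 35
print(classes_sqrt(n), classes_sturges(n))
```

Either result is only a starting point; the final choice should still fall within the ideal range of 5 to 15 classes.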

Example:
The following are the scores obtained by 35 students in a quiz in math.
67 54 73 55 80 80 72
72 78 47 65 60 65 90
82 93 45 57 77 42 64
100 95 35 70 61 51 85
66 98 69 88 83 55 73
Construct a frequency distribution showing the frequency, class limits, class boundaries and
class midpoints.

Step 1: Determine the tentative number of classes.


K = 1 + 3.33 log n = 1 + 3.33 log 35 = 6.141747 – round up to 7
Since there are only 35 observations, 7 is an appropriate number of classes.

Step 2: Find the range


R = 100 - 35 = 65

Step 3: Find the class interval


Class Interval = 65 ÷ 7 = 9.2857 – round up to 10, so that the 7 classes cover every score from 35 to 100

Step 4: Set up the frequency table and tally.


FREQUENCY TABLE
Scores      Tally            Frequency (f)
35 – 44     II               2
45 – 54     IIII             4
55 – 64     IIIII – I        6
65 – 74     IIIII – IIIII    10
75 – 84     IIIII – I        6
85 – 94     IIII             4
95 – 104    III              3
                             n = 35

Frequency table showing the frequency, class limits, class boundaries and class marks.

Class Intervals    Frequency (f)    Class Boundaries    Class Marks
35 – 44            2                34.5 – 44.5         39.5
45 – 54            4                44.5 – 54.5         49.5
55 – 64            6                54.5 – 64.5         59.5
65 – 74            10               64.5 – 74.5         69.5
75 – 84            6                74.5 – 84.5         79.5
85 – 94            4                84.5 – 94.5         89.5
95 – 104           3                94.5 – 104.5        99.5
                   n = 35
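
The grouping of the 35 scores can be reproduced with a short program. The following Python sketch assumes whole-number scores and equal-width classes starting at 35 with class size 10, as in the table above; the function name `grouped_distribution` is illustrative.

```python
# The 35 math quiz scores from the example.
scores = [
    67, 54, 73, 55, 80, 80, 72,
    72, 78, 47, 65, 60, 65, 90,
    82, 93, 45, 57, 77, 42, 64,
    100, 95, 35, 70, 61, 51, 85,
    66, 98, 69, 88, 83, 55, 73,
]

def grouped_distribution(values, start, size, num_classes):
    """Tally whole-number values into equal-width classes.

    Each row is (lower limit, upper limit, frequency,
    lower boundary, upper boundary, class mark).
    """
    rows = []
    for k in range(num_classes):
        lower = start + k * size
        upper = lower + size - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, freq,
                     lower - 0.5, upper + 0.5, (lower + upper) / 2))
    return rows

rows = grouped_distribution(scores, start=35, size=10, num_classes=7)
for lower, upper, freq, lb, ub, mark in rows:
    print(f"{lower:>3} - {upper:<3}  f = {freq:>2}  {lb} - {ub}  mark = {mark}")
print("n =", sum(row[2] for row in rows))
```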

CUMULATIVE FREQUENCY DISTRIBUTION


Class Intervals    F     %F          F≤    %F≤         F≥    %F≥
35 - 44            2     5.714 %     2     5.714 %     35    100 %
45 - 54            4     11.429 %    6     17.143 %    33    94.286 %
55 - 64            6     17.143 %    12    34.286 %    29    82.857 %
65 - 74            10    28.571 %    22    62.857 %    23    65.714 %
75 - 84            6     17.143 %    28    80 %        13    37.143 %
85 - 94            4     11.429 %    32    91.429 %    7     20 %
95 - 104           3     8.571 %     35    100 %       3     8.571 %
                   n = 35

(F ≤) = the “less than” cumulative frequency distribution is obtained by adding the frequencies of
the class intervals successively from the lowest to the highest.
(2 + 4 = 6, 6 + 6 = 12, 12 + 10 = 22, 22 + 6 = 28, 28 + 4 = 32, 32 + 3 = 35)
(F ≥) = the “more than” cumulative frequency distribution starts with the total frequency for the
lowest class and is obtained by successively subtracting each frequency under (f).
(35 – 2 = 33, 33 – 4 = 29, 29 – 6 = 23, 23 – 10 = 13, 13 – 6 = 7, 7 – 4 = 3)

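How to obtain %F, %F≤ and %F≥: divide each frequency or cumulative frequency by n = 35 and
multiply by 100. For the first three classes:
%F                   %F≤                   %F≥
2/35 = 5.714 %       2/35 = 5.714 %        35/35 = 100 %
4/35 = 11.429 %      6/35 = 17.143 %       33/35 = 94.286 %
6/35 = 17.143 %      12/35 = 34.286 %      29/35 = 82.857 %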
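
The cumulative columns can be generated from the class frequencies alone. This is a minimal Python sketch, assuming the frequencies are listed from the lowest class to the highest; the variable and function names are illustrative.

```python
from itertools import accumulate

# Class frequencies of the 35 scores, from the lowest class to the highest.
freqs = [2, 4, 6, 10, 6, 4, 3]
n = sum(freqs)

# "Less than" cumulative frequencies: running totals from the lowest class up.
less_than = list(accumulate(freqs))

# "More than" cumulative frequencies: running totals from the highest class down.
more_than = list(accumulate(reversed(freqs)))[::-1]

def percent(counts, total):
    """Express each count as a percentage of the total, to 3 decimal places."""
    return [round(100 * c / total, 3) for c in counts]

print(less_than, more_than)
print(percent(freqs, n))
print(percent(less_than, n))
print(percent(more_than, n))
```

Accumulating from the top class downward gives the same F≥ column as subtracting each frequency from the running total, as described above.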

Interpretation of the cumulative frequency distribution:


F
– 2 students got a score from 35 to 44
– 4 students got a score from 45 to 54
– 6 students got a score from 55 to 64
%F
– 5.714 % of the students got a score from 35 to 44
– 11.429 % of the students got a score from 45 to 54
– 28.571 % of the students got a score from 65 to 74
F≤
– 2 students got a score of 44 and below
– 6 students got a score of 54 and below
– 28 students got a score of 84 and below
%F≤
– 5.714 % of the students got a score of 44 and below
– 62.857 % of the students got a score of 74 and below
– 34.286 % of the students got a score of 64 and below
F≥
– 35 students got a score of 35 and above
– 33 students got a score of 45 and above
– 23 students got a score of 65 and above
%F≥
– 100 % of the students got a score of 35 and above
– 65.714 % of the students got a score of 65 and above
– 20 % of the students got a score of 85 and above

PRESENTATION OF DATA
A. Tabular – presentation using tables
B. Textual – this mode of presentation combines text and figures in a statistical report.
The common example of textual presentation is the news item
C. Graphical – most effective means in presenting statistical data.
- presentation using graphs
Types of graphs:
1. bar graph
2. line graph
3. circle graph or pie chart
4. scatter diagram
5. pictograph or pictogram

GRAPHICAL REPRESENTATION OF THE FREQUENCY DISTRIBUTION


1. Histogram – is a special bar graph constructed by plotting the class boundaries on the
horizontal axis or x – axis against frequencies plotted on the y - axis
2. Frequency polygon - a closed broken line curve constructed by plotting the class marks
on the x – axis against the class frequencies plotted on the y – axis
- to close the frequency polygon, the coordinate end points are terminated
on the x – axis at the midpoints of empty classes before the lowest and
after the highest class intervals
- an empty class is a class with zero frequency
3. Ogive – is the graph of the cumulative frequency distribution.
- it is constructed by plotting the class boundaries on the horizontal or x –
axis against the cumulative “less than” and cumulative “more than” frequencies on the
y – axis
- to complete the ogive, the “less than” curve is started at the lower
boundary of the lowest class interval, where the cumulative frequency is zero
4. Circle graph or Pie chart

TYPES OF FREQUENCY CURVES


1. Normal Curve – represents a symmetrical distribution.
2. Positively skewed – represents an asymmetric distribution that tails off to the right.
3. Negatively skewed – represents an asymmetric distribution that tails off to the left.

STAT 1N
QUANTILES OF UNGROUPED DATA
Quantiles are values that divide the data (distribution) into a given number of equal parts. Like
the median, the quantiles are also “positional measures”. Some of the quantiles are:

a.) Quartiles – values that divide the distribution into 4 equal parts.
• Q1 – (1st quartile) the value at or below which 25 % of the distribution lies
• Q2 – (2nd quartile) the value at or below which 50 % of the distribution lies
• Q3 – (3rd quartile) the value at or below which 75 % of the distribution lies
• Q4 – (4th quartile) the value at or below which 100 % of the distribution lies
b.) Deciles – values that divide the distribution into ten (10) equal parts.
• D1 – (1st decile) the value at or below which 10 % of the distribution lies
• D2 – (2nd decile) the value at or below which 20 % of the distribution lies
• D3 – (3rd decile) the value at or below which 30 % of the distribution lies
• D4 – (4th decile) the value at or below which 40 % of the distribution lies
• D10 – (10th decile) the value at or below which 100 % of the distribution lies
c.) Percentiles – values that divide the distribution into 100 equal parts.
• P1 – (1st percentile) the value at or below which 1 % of the distribution lies
• P2 – (2nd percentile) the value at or below which 2 % of the distribution lies
• P3 – (3rd percentile) the value at or below which 3 % of the distribution lies
• P4 – (4th percentile) the value at or below which 4 % of the distribution lies
• P100 – (100th percentile) the value at or below which 100 % of the distribution lies
HOW TO SOLVE FOR ANY QUANTILE
1. Array the data according to magnitude or size.
2. Compute the position of the given quantile in the distribution using the formula
$$\text{position} = \frac{P\,(n + 1)}{100}$$
where: P = the desired percentage
n = number of items or scores
(for a decile, P is the decile number and the divisor is 10, as in Example 2 below)
3. Locate the item (or score) corresponding to the obtained position in the distribution.
Always start from the lowest score.
4. If the obtained position is not a whole number, interpolate between the two neighboring scores.

Examples
1) Find the 20th percentile or P20 of the following scores: 40, 45, 42, 37, 36, 32, 28, 26, 25.

Solution:
1. Array the scores in decreasing order of magnitude.
45 42 40 37 36 32 28 26 25

2. Locate the position of the score corresponding to the 20th percentile using the
formula
$$\frac{P\,(n + 1)}{100} = \frac{20\,(9 + 1)}{100} = 2$$
3. Locate the second score from the lowest. The answer is 26. Thus, P20 = 26. This
means that about 20 % of the scores fall at or below 26.

2) Find the 5th decile (or D5) of the following scores: 19, 25, 38, 45, 65, 81.

Solution:
1. Arrange the scores in an increasing order of magnitude.
19 25 38 45 65 81
2. Locate the position of D5 using the formula: P (n + 1)/10 = 5 (6 + 1)/10 = 3.5
Since 3.5 is not a whole number, we have to interpolate using the following steps:
a. Get the difference between the third and fourth scores from the lowest score, since
3.5 is between the 3rd and 4th scores. 45 – 38 = 7.
b. Multiply the difference obtained by the decimal part of the position in no. 2. That is, 7 x 0.5 = 3.5.
c. Add the product to the lower score (38) to obtain D5. Therefore, D5 = 38 + 3.5 = 41.5
This means that 50 % of the scores lie below 41.5.
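
The position-and-interpolation procedure used in the two examples can be written as one short function. This is a Python sketch of the P(n + 1) position formula only; the function name `quantile` and its parameters `k` and `parts` are illustrative, and clamping the position to the range 1 to n is an added assumption for quantiles that would otherwise fall outside the data.

```python
def quantile(values, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts.

    Uses the position formula k(n + 1) / parts, counted from the lowest
    score, with linear interpolation when the position is not whole.
    """
    data = sorted(values)                  # array from lowest to highest
    n = len(data)
    position = k * (n + 1) / parts
    position = min(max(position, 1), n)    # assumption: stay inside the data
    whole = int(position)
    fraction = position - whole
    lower = data[whole - 1]
    if fraction == 0:
        return lower
    upper = data[whole]
    return lower + fraction * (upper - lower)

p20 = quantile([40, 45, 42, 37, 36, 32, 28, 26, 25], k=20, parts=100)
d5 = quantile([19, 25, 38, 45, 65, 81], k=5, parts=10)
print(p20, d5)
```

Statistical software often uses slightly different position rules, so results from other programs may differ a little from this hand method.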

MEASURES OF CENTRAL TENDENCY (GROUPED DATA)


Grouped Data – refer to sets of data presented in the forms of frequency distributions. These
are data grouped or classified into categories for better presentation and analysis.

ARITHMETIC MEAN: GROUPED DATA


To compute for the mean of grouped data, we need to determine the midpoint of each
class interval. Since in a frequency distribution, there is no way of getting the sum of the
individual observed values needed in determining the mean, a reasonable assumption is made
that the midpoint of a class interval is equal to the average of all observed values within that
interval. This means that all observed values belonging to each class will be treated as equal
to the midpoint of each class.

TWO METHODS OF COMPUTING FOR THE MEAN OF GROUPED DATA :


1. Long method:
$$\bar{X} = \frac{\sum X_i f_i}{n}$$
where: X̅ = mean
Xi (X1, X2, ...) = the class midpoints
fi (f1, f2, ...) = the corresponding frequencies
n = total number of observations
2. Short Method:
A shorter method of finding the mean of grouped data is with the use of CODING.
The coded formula for the mean is:
$$\bar{X} = X_0 + \left[\frac{\sum u_i f_i}{n}\right] \times C$$
where: X0 = assumed mean or coded mean
C = class size
Instead of using the actual class midpoints or class marks, codes (denoted by u) are used,
which are consecutive integers assigned to the classes. The coding technique is
as follows:
1. Choose one of the class midpoints as the assumed mean, denoted by X0, preferably the one
at or near the center of the distribution or the midpoint of the class with the highest frequency.
2. Under the new column u, write the code zero opposite X0, and assign consecutive positive
integers (whole numbers) to the classes higher in value than the class with the assumed
mean, and consecutive negative integers to those classes lower in value.
3. Multiply the coded values by their corresponding frequencies and compute the
algebraic sum.
4. Substitute into the coded formula and compute the mean.

Example:
The following is the distribution of the wages of 50 workers of HGW Manufacturing Co.
taken during a particular week last May.
Weekly Wages (in peso) Number of Workers ( fi )
870 – 899 4
900 – 929 6
930 – 959 10
960 – 989 13
990 – 1019 8
1020 – 1049 7
1050 – 1079 2
TOTAL 50
Determine the mean using the two methods.
Solution:
Weekly Wages    Number of       Class Midpoints    Xi fi      ui    ui fi
(in peso)       Workers (fi)    (Xi)
870 – 899       4               884.5              3538       -3    -12
900 – 929       6               914.5              5487       -2    -12
930 – 959       10              944.5              9445       -1    -10
960 – 989       13              974.5 (X0)         12668.5     0      0
990 – 1019      8               1004.5             8036        1      8
1020 – 1049     7               1034.5             7241.5      2     14
1050 – 1079     2               1064.5             2129        3      6
TOTAL           n = 50                             48545             -6
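Long Method:
$$\bar{X} = \frac{\sum X_i f_i}{n} = \frac{48545}{50} = \text{P}970.90$$
Short Method:
$$\bar{X} = X_0 + \left[\frac{\sum u_i f_i}{n}\right] \times C$$
where: C = class size = 30
X0 = assumed mean = 974.5
$$\bar{X} = 974.5 + \left[\frac{-6}{50}\right] \times 30 = 974.5 - 3.6 = \text{P}970.90$$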
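
Both methods can be checked against each other by computer. The following Python sketch uses the class midpoints and frequencies of the HGW Manufacturing Co. example; the function names and the `assumed_index` parameter (the position of the class chosen as X0) are illustrative.

```python
# Class midpoints and frequencies from the weekly wage distribution.
midpoints = [884.5, 914.5, 944.5, 974.5, 1004.5, 1034.5, 1064.5]
freqs = [4, 6, 10, 13, 8, 7, 2]

def mean_long(midpoints, freqs):
    """Long method: sum of (midpoint x frequency) divided by n."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(midpoints, freqs)) / n

def mean_coded(midpoints, freqs, assumed_index, class_size):
    """Short (coded) method: X0 + (sum of u x f / n) x C."""
    n = sum(freqs)
    # Codes: 0 at the assumed mean, negative below it, positive above it.
    codes = [i - assumed_index for i in range(len(midpoints))]
    coded_sum = sum(u * f for u, f in zip(codes, freqs))
    return midpoints[assumed_index] + (coded_sum / n) * class_size

print(round(mean_long(midpoints, freqs), 2))
print(round(mean_coded(midpoints, freqs, assumed_index=3, class_size=30), 2))
```

The two methods always agree when the classes have equal size, whichever class is chosen as the assumed mean; the coded method simply keeps the arithmetic smaller for hand computation.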

MEDIAN: GROUPED DATA


The median of a frequency distribution (grouped data) can be found by the following formula:

$$Me = L_{Me} + \left[\frac{n/2 - F_{\le Me}}{f_{Me}}\right] \times C$$

where:
LMe = lower limit (strictly, the lower boundary) of the median class
n = total number of observations
F≤Me = cumulative frequency immediately preceding the median class
fMe = frequency of the median class
C = class size
The median class is the class which contains the (n/2)th value.

Example : Find the median of the following frequency distribution.

Weekly Wages (in peso)    Number of Workers (fi)    F≤
870 – 899                 4                         4
900 – 929                 6                         10
930 – 959                 10                        20   (F≤Me)
960 – 989 (Me class)      13                        33
990 – 1019                8                         41
1020 – 1049               7                         48
1050 – 1079               2                         50
TOTAL                     50

To determine the median class:

Solve for n/2 = 50/2 = 25, so the median is the 25th item.

Then set up the cumulative “less than” (F≤) frequencies and locate where the 25th item is in
the distribution. The median class is the 960 – 989 class interval, where the 25th item
falls.

Solve for Me:

$$
\begin{aligned}
Me &= L_{Me} + \left[\frac{n/2 - F_{\le Me}}{f_{Me}}\right] \times C \\
   &= 959.5 + \left[\frac{50/2 - 20}{13}\right] \times 30 \\
   &= 959.5 + \left[\frac{25 - 20}{13}\right] \times 30 \\
   &= 959.5 + 11.54
\end{aligned}
$$

Me = P971.04
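
The search for the median class and the substitution into the formula can be sketched in Python as follows. It assumes whole-number class limits (so each lower boundary is the lower limit minus 0.5) and equal class sizes; the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, freqs, class_size):
    """Median of a frequency distribution with equal-size classes."""
    n = sum(freqs)
    target = n / 2                 # position of the median item
    cumulative = 0                 # F<= of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            lower_boundary = lower - 0.5
            return lower_boundary + (target - cumulative) / f * class_size
        cumulative += f
    raise ValueError("the distribution has no observations")

lower_limits = [870, 900, 930, 960, 990, 1020, 1050]
freqs = [4, 6, 10, 13, 8, 7, 2]
median = grouped_median(lower_limits, freqs, class_size=30)
print(round(median, 2))
```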

MODE OF GROUPED DATA


To determine the mode of grouped data, we have to find first the modal class. In
a frequency distribution, the modal class is the class with the highest frequency.

The formula in finding the mode is:

$$M_o = L_{Mo} + \left[\frac{d_1}{d_1 + d_2}\right] \times C$$
where:
LMo = lower boundary of the modal class
d1 = difference between the frequency of the modal class and the frequency of the
class next lower in value
d2 = difference between the frequency of the modal class and the frequency of the
class next higher in value
C = class size

Example: Find the mode of the following frequency distribution.


Weekly Wages (in peso) Number of Workers ( fi )
870 – 899 4
900 – 929 6
930 – 959 10
960 – 989 13
990 – 1019 8
1020 – 1049 7
1050 – 1079 2
TOTAL 50

Solution:

Weekly Wages (in peso)       Number of Workers (fi)    Lower class boundary
870 – 899                    4                         869.5
900 – 929                    6                         899.5
930 – 959                    10                        929.5
960 – 989 (modal class)      13                        959.5
990 – 1019                   8                         989.5
1020 – 1049                  7                         1019.5
1050 – 1079                  2                         1049.5
TOTAL                        n = 50

d1 = 13 – 10 = 3
d2 = 13 – 8 = 5

The modal class is the 960 – 989 class.

$$
\begin{aligned}
M_o &= L_{Mo} + \left[\frac{d_1}{d_1 + d_2}\right] \times C \\
    &= 959.5 + \left[\frac{3}{3 + 5}\right] \times 30 \\
    &= 959.5 + \frac{90}{8}
\end{aligned}
$$
Mo = P970.75
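
The same computation can be sketched in Python. The sketch assumes whole-number class limits, equal class sizes and a single modal class; treating a missing neighbor class (when the modal class is first or last) as having frequency zero is an added assumption, and the function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_limits, freqs, class_size):
    """Mode of a frequency distribution with one modal class."""
    i = freqs.index(max(freqs))                        # modal class
    below = freqs[i - 1] if i > 0 else 0               # class next lower in value
    above = freqs[i + 1] if i < len(freqs) - 1 else 0  # class next higher in value
    d1 = freqs[i] - below
    d2 = freqs[i] - above
    lower_boundary = lower_limits[i] - 0.5
    return lower_boundary + d1 / (d1 + d2) * class_size

lower_limits = [870, 900, 930, 960, 990, 1020, 1050]
freqs = [4, 6, 10, 13, 8, 7, 2]
mode = grouped_mode(lower_limits, freqs, class_size=30)
print(mode)
```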

STAT 1N
Measures of Central Tendency – Ungrouped Data
Measures of Central Tendency
Description of statistical data can be quite brief or elaborate depending on the nature of
the data or what we intend to do. Sometimes, presenting data as they are, in raw form, and
letting them speak for themselves may be quite satisfactory, but data summarized further by
means of appropriate statistical description give more useful information. One such
description is the MEASURE OF CENTRAL TENDENCY.
A MEASURE OF CENTRAL TENDENCY of a given set of data is the value around which the
whole set of data tends to cluster. It is represented by a single number which
summarizes and describes the whole set.

The most commonly used measures of central tendency are :


a. arithmetic mean
b. median
c. mode

MEASURES OF CENTRAL TENDENCY OF UNGROUPED DATA

Ungrouped data - refer to data not organized into frequency distribution.

1. Arithmetic Mean – may be defined as an arithmetic average.
- it is the sum of the observed values divided by the number of observations.
- it is a computed average and its magnitude is influenced by every value in the set.
- it is the location measure most frequently used, but can be misleading when the
distribution contains extremely large or small values.

FORMULAS
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where: μ (read as mu) = population mean
N = total number of items in the population
Xi = the ith observed value
Ʃ (summation symbol) = means "the sum of"
i = index of each item (i = 1, 2, ..., N)

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where: X̅ (read as x bar) = sample mean
n = total number of items in the sample

A. UNWEIGHTED or SIMPLE MEAN - takes into consideration each of the item values without
regard to their relative importance.

Example :
What is the mean age of a group of 8 children whose ages are: 8, 8 ½, 9, 10, 9 ½, 10, 12, 13?

Solution:
Given: n = 8
$$
\begin{aligned}
\bar{X} &= \frac{\sum_{i=1}^{n} X_i}{n} \\
        &= \frac{8 + 8\tfrac{1}{2} + 9 + 10 + 9\tfrac{1}{2} + 10 + 12 + 13}{8} \\
        &= \frac{80}{8}
\end{aligned}
$$
= 10 years old

B. WEIGHTED MEAN - takes into consideration the proper weights assigned to the observed
values according to their relative importance.

$$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

where: Wi = weight of each item
Xi = value of each item
X̅ = mean
Example :
A market vendor sold 3 dozen eggs at P72.00 per dozen, 5 dozen at P77.40
per dozen, and 2 dozen at P85.80 per dozen. Find the weighted mean price per dozen
of the eggs the vendor sold.
Solution:
$$
\begin{aligned}
\bar{X} &= \frac{W_1 X_1 + W_2 X_2 + W_3 X_3}{W_1 + W_2 + W_3} \\
        &= \frac{3(72.00) + 5(77.40) + 2(85.80)}{3 + 5 + 2} \\
        &= \frac{216.00 + 387.00 + 171.60}{10} \\
        &= \frac{774.60}{10}
\end{aligned}
$$
= P77.46 per dozen
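
The weighted mean is easy to verify by computer. This is a minimal Python sketch using the egg prices as values and the numbers of dozens sold as weights; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

prices = [72.00, 77.40, 85.80]   # price per dozen, in pesos
dozens = [3, 5, 2]               # weights: number of dozens sold at each price
print(round(weighted_mean(prices, dozens), 2))
```

With equal weights the function reduces to the simple mean, which is why the unweighted mean can be seen as a special case.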

2. MEDIAN - is the midpoint of the distribution.
- half of the values in the distribution fall below the median, and the other half fall
above it.
- for a distribution having an even number of arrayed observations, the median is
the average of the two middle values.
- for an odd number of arrayed observations, the median is the middle value.
- it is the most appropriate measure of center when the data contain extreme values,
since it is resistant to them.
- it is a positional average; hence, its value depends on its position in the array
and on the number of items in the distribution.
Examples:
1. Find the median of the following set of observations: 1, 8, 7, 4, 3
Solution:
Array the set of observations and find the median.
1, 3, 4, 7, 8
The median is 4, which is the middle item.
2. Compute the median of the following set of data: 14, 12, 7, 9, 10, 6
Solution:
Array the data and compute the median.
6, 7, 9, 10, 12, 14
Median = (9 + 10) ÷ 2 = 9.5

3. MODE - the value that appears with the highest (greatest) frequency.
- the value that appears most often.
Examples:
1. Determine the mode of the following distribution: 3, 8, 10, 5, 3, 5, 2, 5, 7
The mode is 5. The distribution is unimodal.
2. Find the mode of the following distribution: 20, 15, 10, 9, 7, 20, 10, 10, 20
The modes are 20 and 10. The distribution is bimodal.
3. Determine the mode of the following distribution: 7, 5, 10, 23, 11, 8, 15
There is no mode since no value is repeated.
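
The three measures for ungrouped data can be checked with Python's standard `statistics` module. In this sketch the helper `modes` is an illustrative function written for these notes; it returns an empty list when no value is repeated, to match the "no mode" convention used above.

```python
import statistics
from collections import Counter

def modes(values):
    """Return all modes in increasing order, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

ages = [8, 8.5, 9, 10, 9.5, 10, 12, 13]
print(statistics.mean(ages))                       # simple mean of the ages
print(statistics.median([1, 8, 7, 4, 3]))          # odd number of items
print(statistics.median([14, 12, 7, 9, 10, 6]))    # even number of items
print(modes([3, 8, 10, 5, 3, 5, 2, 5, 7]))
print(modes([20, 15, 10, 9, 7, 20, 10, 10, 20]))
print(modes([7, 5, 10, 23, 11, 8, 15]))
```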
Activity:
1. The following distribution are the scores obtained by 10 applicants in the
Entrance Examination for 1st year college of ABM School:
60 75 85 90 98 80 75 75 95 90
Find the: a. mean
b. median
c. mode

2. In a certain Statistics class, a student obtained a score of 90 in a 30-minute
quiz, a score of 85 in a one-hour quiz and a score of 88 in a 1 ½-hour quiz.
Find the student’s mean score in the 3 tests.

STAT 1N
QUANTILES OR FRACTILES: GROUPED DATA
In a frequency distribution, a quantile or fractile is a value at or below which a given
fraction of the distribution lies. Like the median, the quantiles or fractiles are also
positional measures.

Quartiles – are values that divide a distribution into 4 equal parts.

To compute the quartiles, we use the following formulas:

$$Q_1 = L_{Q_1} + \left[\frac{n/4 - F_{\le Q_1}}{f_{Q_1}}\right] \times C$$

$$Q_2 = L_{Q_2} + \left[\frac{2n/4 - F_{\le Q_2}}{f_{Q_2}}\right] \times C$$

$$Q_3 = L_{Q_3} + \left[\frac{3n/4 - F_{\le Q_3}}{f_{Q_3}}\right] \times C$$
where:
Q1, Q2, Q3 = the quartiles
fQ1, fQ2, fQ3 = the frequencies of the respective quartile classes
LQ1, LQ2, LQ3 = the lower boundaries of the respective quartile classes
F≤Q1, F≤Q2, F≤Q3 = the cumulative “less than” frequencies immediately preceding the
respective quartile classes
n = total frequency
C = class size

Example: Referring to the distribution below, find Q1, Q2 and Q3.

Weekly Wage (in peso)    Number of Workers (f)    Cumulative “less than” frequency (F≤)
870 – 899                4                        4
900 – 929                6                        10
930 – 959 (Q1 class)     10                       20
960 – 989 (Q2 class)     13                       33
990 – 1019 (Q3 class)    8                        41
1020 – 1049              7                        48
1050 – 1079              2                        50
                         n = 50

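First find the position of each quartile to locate its class:
Q1 class: n/4 = 50/4 = 12.5
Q2 class: 2n/4 = n/2 = 50/2 = 25
Q3 class: 3n/4 = 3(50)/4 = 37.5
Solutions:
$$
\begin{aligned}
Q_1 &= L_{Q_1} + \left[\frac{n/4 - F_{\le Q_1}}{f_{Q_1}}\right] \times C \\
    &= 929.5 + \left[\frac{50/4 - 10}{10}\right] \times 30 \\
    &= 929.5 + 7.5
\end{aligned}
$$
Q1 = P937.00

$$
\begin{aligned}
Q_2 &= L_{Q_2} + \left[\frac{2n/4 - F_{\le Q_2}}{f_{Q_2}}\right] \times C \\
    &= 959.5 + \left[\frac{2(50)/4 - 20}{13}\right] \times 30 \\
    &= 959.5 + \left[\frac{25 - 20}{13}\right] \times 30 \\
    &= 959.5 + 11.54
\end{aligned}
$$
Q2 = P971.04

$$
\begin{aligned}
Q_3 &= L_{Q_3} + \left[\frac{3n/4 - F_{\le Q_3}}{f_{Q_3}}\right] \times C \\
    &= 989.5 + \left[\frac{3(50)/4 - 33}{8}\right] \times 30 \\
    &= 989.5 + \left[\frac{37.5 - 33}{8}\right] \times 30 \\
    &= 989.5 + 16.88
\end{aligned}
$$
Q3 = P1,006.38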
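
Since the three quartile formulas differ only in the fraction of n they use, they can be handled by one function. The following Python sketch assumes whole-number class limits and equal class sizes; the function name `grouped_quantile` and its parameters `k` and `parts` are illustrative.

```python
def grouped_quantile(lower_limits, freqs, class_size, k, parts):
    """k-th quantile of a frequency distribution divided into `parts` parts.

    Uses L + [(k*n/parts - F<=) / f] x C, where L is the lower boundary
    of the quantile class and F<= is the cumulative frequency before it.
    """
    n = sum(freqs)
    target = k * n / parts         # position of the quantile item
    cumulative = 0                 # F<= of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return (lower - 0.5) + (target - cumulative) / f * class_size
        cumulative += f
    raise ValueError("the distribution has no observations")

lower_limits = [870, 900, 930, 960, 990, 1020, 1050]
freqs = [4, 6, 10, 13, 8, 7, 2]
quartiles = [grouped_quantile(lower_limits, freqs, 30, k, 4) for k in (1, 2, 3)]
print([round(q, 2) for q in quartiles])
```

Calling the same function with `parts=10` or `parts=100` gives deciles and percentiles, because only the target position changes.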

DECILES AND PERCENTILES: GROUPED DATA

DECILES – are values that divide the distribution into 10 equal parts.
PERCENTILES – are values that divide the distribution into 100 equal parts.
The quartiles, deciles and the percentiles are computed in the same way as the
median. For example, to find the decile D4, we use the formula:
$$D_4 = L_{D_4} + \left[\frac{4n/10 - F_{\le D_4}}{f_{D_4}}\right] \times C$$

To determine percentile P30, we use the formula:

$$P_{30} = L_{P_{30}} + \left[\frac{30n/100 - F_{\le P_{30}}}{f_{P_{30}}}\right] \times C$$

Example: Referring to the distribution below, find D6 and P40.

Weekly Wage (in peso)    Number of Workers (f)    Cumulative “less than” frequency (F≤)
870 – 899                4                        4
900 – 929                6                        10
930 – 959 (P40 class)    10                       20
960 – 989 (D6 class)     13                       33
990 – 1019               8                        41
1020 – 1049              7                        48
1050 – 1079              2                        50
                         n = 50

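Solution:
$$
\begin{aligned}
D_6 &= L_{D_6} + \left[\frac{6n/10 - F_{\le D_6}}{f_{D_6}}\right] \times C \\
    &= 959.5 + \left[\frac{6(50)/10 - 20}{13}\right] \times 30 \\
    &= 959.5 + 23.08
\end{aligned}
$$
D6 = P982.58

$$
\begin{aligned}
P_{40} &= L_{P_{40}} + \left[\frac{40n/100 - F_{\le P_{40}}}{f_{P_{40}}}\right] \times C \\
       &= 929.5 + \left[\frac{40(50)/100 - 10}{10}\right] \times 30 \\
       &= 929.5 + \left[\frac{20 - 10}{10}\right] \times 30
\end{aligned}
$$
P40 = P959.50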

Activity :
Referring to the distribution below, find the following:
1. Q4
2. D8
3. D1
4. P20
5. P75
