
APPLIED STATISTICS FOR BUSINESS AND ECONOMICS

MODULE 1

1.1 INTRODUCTION

WHAT IS STATISTICS?

Statistics is a branch of mathematics that deals primarily with the collection, presentation, analysis, and interpretation of quantitative data to draw sound conclusions in the face of uncertainty. This module examines these activities as well as the branches of statistics.

Importance of Statistics

1. Data are everywhere.
2. Statistical techniques are used to make decisions that affect our daily lives.
3. Knowledge of statistical methods will help you understand how decisions are made and how they can affect an individual.

1.2 DESCRIPTIVE AND INFERENTIAL

1.3 TYPES OF DATA

DATA TYPES. The characteristics of data help the researcher select and apply the proper statistical procedures to present the data accurately.

2 TYPES OF DATA

QUALITATIVE DATA – also known as categorical data; expresses qualities, attributes, or responses. Such data are normally in nominal form, where mathematical operations cannot be used, and are often answerable by only two responses, as with gender. We can “quantify” such data by transforming them into dummy data – that is, giving a numeric value to their presence or absence, such as 0 for male and 1 for female.

QUANTITATIVE DATA – also known as numerical data; obtained from measurement. They express an amount using a specified level or type of measurement. They can be discrete (whole numbers) or continuous (fractions, decimals, infinite numbers).
o Discrete variable: can assume only certain values, and there are gaps between values. Example: the number of bedrooms in a home (1, 2, 3, 4). Notice that a home can have 3 or 4 bedrooms, but it cannot have 3.56 bedrooms. Thus, there is a “gap” between possible values. Typically, discrete variables result from counting.
o Continuous variable: can assume any value within a specific range. Example: for grade point average (GPA), we could report the GPA of a student as 1.2576952. The usual practice is to round it off to 2 decimal places – 1.26. Typically, continuous variables result from measuring.

1.4 LEVELS OF MEASUREMENT
There are four levels of measurement: nominal, ordinal, interval, and ratio. The lowest, or the most primitive, measurement is the nominal level. The highest, or the level that gives us the most information about the observation, is the ratio level of measurement.
1. NOMINAL DATA. The simplest form of data. No scales or measurements are involved, as the values merely differentiate classes or categories; hence, no mathematical operation can be performed on them. The categories are mutually exclusive (an individual or object is included in only one category) and exhaustive (each individual, object, or measurement must appear in one category).

Example. Country, Race, and Hair color


2. ORDINAL DATA. These are used in ranking (levels that indicate the superiority of one level over another). They indicate a position in a series or order, such as 1st, 2nd, 3rd.

Example:

3. INTERVAL DATA. Possesses the characteristics of ordinal data and differentiates between classes or categories, normally with an equal unit of measurement between each score. Examples are temperature, IQ scores, personality scores, and the like.

EXAMPLE:

4. RATIO DATA. The highest form of measurement. It has the properties of interval measurement plus two more: (1) it has a meaningful zero point – the complete absence of the quantity being measured; and (2) the ratio between two numbers is meaningful.

1.5 POPULATION AND SAMPLE

POPULATION is the entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest. Such observations have common attributes, which are classified or grouped to determine certain movements that can be useful to a researcher. The scope of such data can range from individuals to cross-country cases. If all data are considered, the results can be regarded as accurate.

SAMPLE refers to a particular part taken to represent the population, which makes processing easier, less costly, and more practical. On the other hand, since it is only a part, sampling error is expected. The researcher should first establish the reliability of the sample before drawing any conclusions or generalizations from it about the population.
1.6 SAMPLING PROCEDURES

NON-PROBABILITY SAMPLING | PROBABILITY SAMPLING

Definition and Procedure in Drawing a Sample
Subjects are chosen without regard to their probability. It does not involve random selection and uses no sampling frame. | Subjects are chosen on the basis of known probability (chance). The exact reverse of non-probability sampling, it involves random selection and follows a sampling frame.

Used when...
Pilot studies or small-scale project market | Whenever possible

Advantages
Convenient, speedy, less costly | Samples are representative of the population and thus can be used to draw sound conclusions about the population.

Disadvantages
Less representative, lack of accuracy due to selection bias, cannot be readily used to draw conclusions about the population. | Costly, rigorous (hence, a slow process), and poses considerable inconvenience to the researcher

METHODS FOR NON-PROBABILITY (NON-RANDOM) SAMPLING

1. ACCIDENTAL/CONVENIENCE SAMPLING – also known as haphazard or “man on the street interview” sampling. It involves choosing the samples (subjects) based on availability, proximity, and convenience to the researcher, or by the use of volunteers (self-selected samples).

2. PURPOSIVE SAMPLING – subjects are chosen based on the purpose of the study; hence they are practically a “pre-defined group” chosen to fit the research goals. This technique is useful in reaching targeted groups but is highly discouraged, especially in economic and business research, due to its biased nature. Nonetheless, it can still be useful in qualitative assessments. Common methods of purposive sampling are as follows:

o Modal Instance – based on a typical case or certain indicators such as income, age, etc. (which makes it highly subjective; used in informal studies).

Example: A researcher wants to study the typical video game user. They do an initial study of a wide range of people and find that the majority of people playing video games are prosperous males between the ages of 18 and 25. They then recruit only people who fit these criteria for their study.

o Expert Sampling – sampling expert subjects on a certain topic or criterion.

Example: A study of expert research engineers starts with an exploration of whom other engineers look up to and who are most valued by their employers. The result determines that an 'expert' can be defined as only those who have been awarded ten or more patents and who have at least twelve years of experience.

o Quota Sampling – involves a pre-defined number of samples and is normally used with two-category criteria such as gender. The two sides can be proportional or non-proportional.

Quota sampling is of two types. The first, proportionate quota sampling, represents the characteristics of the population by sampling in proportion to them. For example, suppose we are studying a population that is 40 percent female and 60 percent male, and we need a sample of size 100. Selection does not stop until both targets are hit. Once the target for one group is reached, say 40 females, selection of males continues in the same way; any further eligible female who comes along is not selected because that quota is already complete.

o Snowball Sampling – also known as “pyramiding”; it involves samples based on recommendations from the prior sample.

Example: A researcher is studying environmental engineers but can only find five. She asks these engineers if they know any more. They give her several further referrals, who in turn provide additional contacts. In this way, she manages to contact sufficient engineers.
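The proportionate quota procedure described under Quota Sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name quota_sample, the (identifier, group) pairs, and the alternating stream of arrivals are hypothetical; only the quotas of 40 females and 60 males come from the example.

```python
def quota_sample(candidates, quotas):
    """Fill each group's quota in arrival order, skipping groups already full.

    candidates: iterable of (identifier, group) pairs (hypothetical format).
    quotas: dict mapping group -> number of subjects wanted.
    """
    counts = {group: 0 for group in quotas}
    selected = []
    for identifier, group in candidates:
        # Accept the candidate only while that group's quota is still open.
        if group in counts and counts[group] < quotas[group]:
            selected.append((identifier, group))
            counts[group] += 1
        # Stop only when every quota has been met.
        if all(counts[g] == quotas[g] for g in quotas):
            break
    return selected


# Hypothetical stream of arrivals that alternates female, male.
stream = [(i, "female" if i % 2 == 0 else "male") for i in range(200)]
sample = quota_sample(stream, {"female": 40, "male": 60})
n_female = sum(1 for _, g in sample if g == "female")
n_male = sum(1 for _, g in sample if g == "male")
print(n_female, n_male)  # 40 60
```

Once the 40th female has been taken, later females in the stream are passed over while the male quota continues to fill, which mirrors the rule stated in the example.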
METHODS FOR PROBABILITY (RANDOM) SAMPLING

1. SIMPLE RANDOM SAMPLING – a sample selected so that each item or person in the population has the same chance of being included.

A. Fishbowl Method – uses the notion of a “lottery”; thus, for each observation, the probability of being selected is 1/N, where N is the total number of observations in the population.

Example: A population consists of 845 employees of Mitra Industries. A sample of 52 employees is to be selected from that population. One way of ensuring that every employee in the population has the same chance of being chosen is to first write the name of each employee on a small slip of paper and deposit all of the slips in a box.

B. Table of Random Numbers – uses an array of randomly arranged numbers to determine who will be included in the sample.

Example: To select a sample of employees, you first choose a starting point in the table. Any starting point will do. Suppose the time is 3:04. You might look at the third column and then move down to the fourth set of numbers. The number is 03759. Since there are only 845 employees, we will use the first three digits of a five-digit random number. Thus, 037 is the number of the first employee to be a member of the sample.

2. SYSTEMATIC SAMPLING – employs an interval (k) obtained by dividing the population size by the target sample size (N / n).

Example: The sales division of Computer Graphic Inc. needs to quickly estimate the mean dollar revenue per sale during the past month. It finds that 2,000 sales invoices were recorded and stored in file drawers and decides to select 100 invoices to estimate the mean dollar revenue. Simple random sampling would require numbering each invoice before using the random number table to select the 100 invoices.

First, k is calculated as the population size divided by the sample size. For Computer Graphic Inc., we would select every 20th (2,000/100) invoice from the file drawers; in so doing, the numbering process is avoided. If k is not a whole number, then round down.

Random sampling is used in the selection of the first invoice. For example, a number between 1 and k, or 20, would be selected from a random number table. Say the random number was 18. Then, starting with the 18th invoice, every 20th invoice (18, 38, 58, etc.) would be selected as the sample.

3. STRATIFIED SAMPLING – taking a sample per stratum or group within a population (say, the number of students per college in a certain university). The process is done by reversing systematic sampling: % k = (n / N) x 100

Example: College students can be grouped as full time or part time, male or female, or traditional or nontraditional. Once the strata are defined, we can apply simple random sampling within each group or stratum to collect the sample.
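The simple random, systematic, and stratified methods described above can be sketched with Python's standard random module. This is a minimal sketch under stated assumptions: the function names, the seed parameter, and the use of a single sampling fraction n/N for every stratum are illustrative choices, not part of the module; the invoice numbers (N = 2,000, n = 100, random start 18) come from the Computer Graphic Inc. example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, start):
    """Take every k-th item, with k = N // n (rounded down), from a 1-based start."""
    k = len(population) // n
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Positions start, start + k, start + 2k, ... (1-based), n items in total.
    return [population[start - 1 + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=None):
    """Apply simple random sampling within each stratum at the same fraction n/N."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }


invoices = list(range(1, 2001))  # invoice numbers 1..2000
picked = systematic_sample(invoices, 100, start=18)
print(picked[:3], len(picked))   # [18, 38, 58] 100

# Hypothetical strata sizes, sampled at 10 percent each.
strata = {"full time": list(range(300)), "part time": list(range(100))}
by_stratum = stratified_sample(strata, 0.10, seed=1)
print({name: len(members) for name, members in by_stratum.items()})
```

In practice the random start for the systematic sample would itself be drawn at random between 1 and k, for example with random.randint(1, k).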

4. CLUSTER SAMPLING – taking a sample from a group of the same or similar elements gathered or occurring closely together, such as a geographical area.

Example: Suppose you divided NCR into 12 primary units, then selected at random four regions (2, 7, 4, and 12) and concentrated your efforts in these primary units. You could take a random sample of the residents in each of these regions and interview them. (Note that this is a combination of cluster sampling and simple random sampling.)

5. MULTISTAGE SAMPLING – also called multistage cluster sampling, is exactly what it sounds like: sampling in stages.

It is a more complex form of cluster sampling, in which smaller groups are successively selected from large populations to form the sample population used in your study. Due to this multi-step nature, the sampling method is sometimes referred to as phase sampling.

Example:

1.7 DETERMINING THE SAMPLE SIZE

Most surveys are conducted on a sample basis because of financial and economic considerations, time, and the manageability of the data involved. If the population size is known when sampling from a finite population of N individuals, the sample size n may be obtained from Slovin’s formula:

n = N / (1 + Ne²)

where e is the margin of error.

The margin of error is the error we expect to commit in using the sample, since a sample yields only an estimate of the parameter.

Example:
A group of researchers was tasked by the Department of Education to survey whether students in Metro Manila are in favor of moving the start of classes from June to September. If there are 1,000,000 students and a 10% margin of error is allowed, compute the sample size.
Given: N = 1,000,000   e = 10%

1.8 ECONOMICS AND BUSINESS STATISTICS

Economic statistics is therefore the application of statistical techniques to business and/or economic data in order to draw sound conclusions intended to describe, validate, and forecast economic behavior in times when we are faced with uncertainty. This branch of economic science is one of the building blocks for understanding a far more advanced branch called Econometrics.

Importance of Statistics to Different Professions
Statistics as a science of numbers is needed and appreciated by virtually all professionals. It gives them the power to reach sound decisions when faced with uncertainty. Users of such information include, but are not limited to, the following: economists, businessmen, engineers, political scientists, sociologists, weather forecasters, medical doctors, industrial psychologists, and chemists.
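As a worked check of the sample-size example in Section 1.7, the sketch below evaluates Slovin's formula, n = N / (1 + Ne²), for N = 1,000,000 and e = 10%. The function name slovin_sample_size is hypothetical, and rounding the result up to the next whole respondent is an assumption made here, not a rule stated in the module.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e**2).

    Returns the exact value and the value rounded up to a whole subject
    (rounding up is an assumption of this sketch).
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return n, math.ceil(n)


exact, rounded = slovin_sample_size(1_000_000, 0.10)
print(round(exact, 2), rounded)  # 99.99 100
```

With a 10% margin of error the denominator is dominated by Ne² = 10,000, so the required sample stays near 100 even for a very large population.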
MODULE 2

2.1 INTRODUCTION

WHAT IS PRIMARY AND SECONDARY DATA?

PRIMARY DATA. Data coming from primary sources, which include government agencies, business establishments, organizations, and individuals who carry original data or have firsthand information relevant to a given problem.

SECONDARY DATA. Data coming from secondary sources, which include newspapers, magazines, journals, and published materials.

2.2 METHODS OF DATA COLLECTION

A. DIRECT OR INTERVIEW METHOD. An interview is a face-to-face conversation between two persons: the one soliciting information (interviewer) and the one supplying the data (interviewee). The interview method is effective when the elements in a sample are not so numerous. The researcher could use either a personal or a telephone interview. The advantage of the interview method is that the question can be repeated, rephrased, or modified for better understanding. However, as the method requires a face-to-face conversation, it may be too costly. It also demands much time.

Example:
o What is your greatest accomplishment?
o Can you tell me about a time when you encountered a business challenge?
o For social media marketing, which platforms do you prefer and why?

B. INDIRECT OR QUESTIONNAIRE METHOD. A questionnaire is an instrument that contains a prepared set of questions. It can be distributed either by mail or by hand to the intended persons and collected by the same process. A question may be structured or unstructured. A structured question, also called a close-ended question, is one where respondents must tick or check the most appropriate response from a list of choices. An unstructured question, on the other hand, is one whose purpose is to solicit the opinion, ideas, or perception of a person. This method of collecting data is more economical than the interview because it can involve a greater number of individuals in a population with the same amount of funds. However, the researcher cannot expect that all the questionnaires mailed will be retrieved, since many respondents will simply ignore them.

C. DOCUMENTS OR REGISTRATION METHOD. This method of collecting data makes use of important documents, such as records of the number of households, birth rates, death rates, and marriages, that can be found in both private and government offices. It is very economical not only in terms of cost but also in terms of time and effort.

Example:

The Philippine Statistics Authority takes care of keeping birth, death, and marriage records.
The Commission on Elections takes care of updating the list of registered voters.
Companies' records of individual employees are kept and later used as a basis for promotion and salary increase.

D. OBSERVATION METHOD. This method is used when we want to conduct a study by way of direct observation, that is, obtaining data pertaining to the behavior of an individual or a group of individuals at the time of occurrence of a given situation. Subjects may be observed individually or collectively depending on the objectives of the investigator. One limitation of this method is that, in most cases, observation can be made only at the time the appropriate events occur.

Example: A newly discovered medicine is claimed to strengthen the immune system. Using the observation method, select a sample of individuals and ask them to take the medicine for a specified period of time. After that period, each of the sampled individuals is asked whether the medicine has improved their immune system.

E. EXPERIMENT METHOD. This method of collecting data is used to find cause-and-effect relationships. The data needed to establish such a relationship may be obtained through a series of experiments. Secondary data can be obtained from journals and periodicals, newspapers, tables, unpublished or published research papers, theses, and dissertations.

Example: A businessman wanted to find out the effect of a gasoline additive on the gasoline consumption of cars.

2.3 METHODS OF DATA PRESENTATION

A. TEXTUAL FORM. This method, also called the paragraph method, combines text and figures in a statistical report. It presents data in paragraph form and is effective when the objective is to call the reader’s attention to data that require special emphasis.

Example: Asha, Raffy says, is profitable to grow because it yields 3.5 to 4 tons of pods per hectare, equivalent to about 1.8 tons of shelled nuts. At the farm gate price of Php 60 per kilo shelled nuts, the gross could be Php 108,000. Raffy said the average cost of production is Php 30,000 per hectare. Add a little more for the cost of marketing and the margin is still substantial. (Philippine Panorama of the Manila Bulletin, June 17, 2007, p.16)

B. TABULAR FORM. This method presents data in rows and columns. It is more convenient and understandable than the textual method because the numerical information is displayed in a more concise and systematic manner, using vertical and horizontal lines with corresponding headings. A statistical table has four essential components: table heading, body, stub, and box head.

Table heading – shows the table number and the title. The table number gives the table an identity, while the title briefly explains what is being presented.
Body – the main part of the table, which contains the quantitative information.
Stubs – the row labels, usually at the left of the body. These are classifications or categories which are presented as values of a variable.
Box heads – the captions that appear above the columns.

In addition to these components, footnotes may be placed immediately below the main part of the table, and a source note acknowledging the origin of the data may appear below the title or below the footnote.
C. GRAPHICAL FORM. This method gives a visual presentation of the data. Graphs provide an easier way to identify patterns in a set of data. The graphical form is the most effective way of presenting statistical data because important relationships are brought out clearly. Comparisons and trends of quantitative values are readily visible, which eases the communication of results or information. Graphs are of various types: bar graph, line graph, pie chart, pictograph, and statistical maps.

1. Bar Graph. A bar graph can be used to organize data/information visually. Bar graphs are helpful in comparing quantities. A bar graph can be horizontal or vertical.

2. Line Graph. A line graph is the most practical and effective device for showing a general trend, pattern, or change over a given time. It makes use of ordered pairs plotted in a coordinate plane. The categories or time periods are arranged chronologically on the horizontal axis and the relevant values are indicated on the vertical axis. The figure below illustrates a line graph.

3. Pie Graph. A circle or pie graph is a commonly used method for displaying information in graphical form. As the name suggests, a circle graph consists of a circular region divided into sections that do not overlap, and each section represents a part or percentage of the whole being considered.

To get an idea of how the family budgeted the monthly income of Php 15,000, the table below shows the distribution.

4. Picture Graph. A picture graph or pictograph is used to describe the difference among a few quantities. It is a very effective tool for attracting attention since it uses pictures or symbols to convey the numerical information obtained.
2.4 FREQUENCY DISTRIBUTION

A frequency distribution is a tabular arrangement of the data by categories or classes and their corresponding frequencies. The frequency of a particular observation is the number of times the observation occurs in a category or class.

Example:

A sample of fifty customers at a newly opened supermarket has been selected at random. The following data show the customers’ ages.

The numbers above, whether arranged by magnitude or not, are called raw data.

One of the most convenient rules giving explicit guidelines for the number of classes to use is Sturges’ Rule. The number of classes is determined according to the following formula:

K = 1 + 3.3 log N

Where:
K = number of class intervals
N = total number of observations
log N = logarithm of N to the base 10

There are 50 observations, and we can determine the number of classes using Sturges’ Rule as follows:

K = 1 + 3.3 log 50 = 1 + 3.3(1.699) ≈ 6.61, or about 7 class intervals

The ideal number of class intervals is 5 to 15. Fewer than 8 class intervals are recommended for data with fewer than 50 observations/values. For data with 50 to 100 observations/values, the suggested number is greater than 8. Note that too few class intervals will result in crowded data, while too many class intervals tend to spread out the data too much.

Steps in Constructing Frequency Distribution Table

1. Decide on the number of class intervals to use, between 5 and 15. Too many class intervals result in several empty class intervals, while too few crowd the data. Use Sturges’ formula whenever possible.

2. Compute the Range. This is the difference between the highest value and the lowest value in the set of data.

R = HO – LO

Where:
R = Range
HO = Highest Observation
LO = Lowest Observation

Solution:
R = 60 – 10
R = 50

3. Determine the class size or class width. This is the distance between the lower limit and the upper limit of a class. It is obtained by dividing the range by the number of classes.
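Steps 1 to 3 can be checked with a short Python sketch. It uses the figures from the example (N = 50, highest value 60, lowest value 10); the function names are hypothetical, and rounding the Sturges result up to a whole number of classes is an assumption of this sketch rather than a rule stated in the text.

```python
import math


def sturges_classes(n_observations):
    """Sturges' Rule as given in the text: K = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n_observations)


def data_range(highest, lowest):
    """R = HO - LO."""
    return highest - lowest


k_exact = sturges_classes(50)
k = math.ceil(k_exact)  # assumption: round up to a whole class count
r = data_range(60, 10)
width = r / k           # class width = range / number of classes
print(round(k_exact, 2), k, r, round(width, 2))  # 6.61 7 50 7.14
```

The unrounded width of about 7.14 would then be rounded to a convenient class size before the class limits are set.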
Note: If the observations are in tenths, e.g. if the highest value is 4.9 and the lowest value is 1.8, and using class intervals, the class size is obtained as follows.

Similarly, if the observations are in hundredths, e.g. if the highest value = 17.68 and the lowest value = 15.29, and using 9 class intervals, the suggested class size is obtained as follows.

(17.68 – 15.29) / 9 = 2.39 / 9 ≈ 0.27

4. Choose an appropriate lower limit for the first class interval. This number should be less than or equal to the lowest value in the data. It is more convenient to use a lower limit that is divisible by the class width. Add the class width to obtain the next lower class limit. Keep adding the class width to get all the other lower class limits.

5. Find the upper class limits. If the class size is rounded to the units place, subtract 1 from the second lower class limit to arrive at the first upper class limit. Subtract 0.1 instead if it is rounded to the tenths place, and 0.01 if it is rounded to the hundredths place.

6. Determine the class boundaries. The class boundaries are the true limits of a class interval, made up of the lower class boundary and the upper class boundary. A class boundary is midway between the upper limit of one class and the lower limit of the next higher class interval.

If we are dealing with discrete data, the class boundaries are obtained by adding 0.5 to the upper limit and subtracting 0.5 from the lower limit. For example, we have a class of 3-5. The class boundaries are 2.5-5.5. The lower boundary 2.5 is obtained by subtracting 0.5 from 3. The upper boundary 5.5 is obtained by adding 0.5 to 5.

If the data are continuous, the number to be added to the upper limit and subtracted from the lower limit depends on the number of decimal places the observations have. Assume we have a class of 15.4-18.6. Observe that there is one decimal place. To get the lower and upper class boundaries, subtract 0.05 from 15.4 and add 0.05 to 18.6. The lower and upper class boundaries are 15.35 and 18.65, respectively. Suppose we have the class 0.346-0.418; what will be its class boundaries? There are three decimal places. To get the lower class boundary, subtract 0.0005 from 0.346 to get 0.3455. To get the upper class boundary, add 0.0005 to 0.418 to get 0.4185. The class boundaries are therefore 0.3455 – 0.4185.

7. Find the Class Mark or Class Midpoint. This is needed in computing the mean and some measures of variability. It is obtained by taking the average of the lower and upper limits.

8. Tally the raw scores and indicate the frequency for each of the class intervals.

9. Get the relative frequency. This gives the percentage of observations in a particular class of interest. It is obtained by dividing the frequency of the class by the total number of observations.

10. Add the frequencies and indicate the sum.
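The class boundary and class mark rules in steps 6 and 7 can be expressed compactly in Python. The sketch below is one possible way to do it, assuming class limits are passed as strings so that the number of decimal places is preserved; the function names are hypothetical. Half a unit of the last decimal place is subtracted from the lower limit and added to the upper limit.

```python
from decimal import Decimal


def class_boundaries(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary) for class limits given as strings.

    Half a unit of the last decimal place is used: 0.5 for whole numbers,
    0.05 for one decimal place, 0.0005 for three decimal places, and so on.
    """
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    decimals = max(-lower.as_tuple().exponent, -upper.as_tuple().exponent, 0)
    half_unit = Decimal(5).scaleb(-(decimals + 1))
    return lower - half_unit, upper + half_unit


def class_mark(lower_limit, upper_limit):
    """Midpoint of a class: the average of its lower and upper limits."""
    return (Decimal(lower_limit) + Decimal(upper_limit)) / 2


print(class_boundaries("3", "5"))          # 2.5 and 5.5
print(class_boundaries("15.4", "18.6"))    # 15.35 and 18.65
print(class_boundaries("0.346", "0.418"))  # 0.3455 and 0.4185
print(class_mark("3", "5"))                # 4
```

Decimal arithmetic is used instead of floats so that results such as 15.35 come out exactly rather than as 15.350000000000001.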
Step 8: Put a mark beside the appropriate class for each number in the data set, write down the frequencies, and find the less than and the greater than cumulative frequencies, F< and F>, respectively. The less than cumulative frequencies (F<) are determined by adding the frequency of the first class interval to that of the next to obtain the F< of the second class interval; the result is then added to the next frequency, and so on until the total frequency is arrived at. To illustrate using the table below, we start with 7 as the frequency of the first class interval under F<; then 7 + 18 = 25, the frequency of the second interval of F<; 25 + 10 = 35, the frequency of the third interval for F<, etc.

The greater than cumulative frequencies (F>) are arrived at by assigning the total frequency to the first interval of F>, then subtracting the frequency of the first class interval of the distribution from this total to obtain the frequency of the second interval of F>, and repeating the process until the frequency of the last interval of the frequency distribution and that of the last interval under F> are equal. Thus, 50 - 7 = 43, 43 – 18 = 25, etc.

Hence the table below shows the frequencies F< and F> for each interval of the frequency distribution.
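The cumulative and relative frequency calculations can be sketched as follows. Only the first three class frequencies (7, 18, 10) and the total of 50 come from the text; the remaining frequencies in the list are hypothetical fillers chosen so that the total is 50, and the function names are illustrative.

```python
from itertools import accumulate


def less_than_cumulative(frequencies):
    """F<: running total from the first class to the last."""
    return list(accumulate(frequencies))


def greater_than_cumulative(frequencies):
    """F>: start at the total, then subtract each earlier class frequency."""
    total = sum(frequencies)
    result = []
    for f in frequencies:
        result.append(total)
        total -= f
    return result


def relative_frequencies(frequencies):
    """Share of observations in each class, as a percentage."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


# 7, 18, 10 and the total of 50 are from the text; 8, 5, 2 are hypothetical.
freqs = [7, 18, 10, 8, 5, 2]
print(less_than_cumulative(freqs))     # [7, 25, 35, 43, 48, 50]
print(greater_than_cumulative(freqs))  # [50, 43, 25, 15, 7, 2]
print(relative_frequencies(freqs))     # [14.0, 36.0, 20.0, 16.0, 10.0, 4.0]
```

As a check on the procedure, the last F< entry equals the total frequency and the last F> entry equals the frequency of the last class.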
MODULES 3 & 4

3.1 MEASURE OF CENTRAL TENDENCY

Measures of central tendency are commonly referred to as averages. The purpose of an average is to pinpoint the center of a set of observations. Three measures of central tendency commonly used in business and economics are the arithmetic mean, median, and mode.

A statistic is any measure calculated from sample data. Thus, measures of location from a sample are statistics. On the other hand, numerical values calculated from population data are called parameters. In order to distinguish one type from the other, the following notations shall be used:

3.2 METHODS OF CENTRAL TENDENCY OF UNGROUPED DATA

Ungrouped or raw data are data which are not yet organized or arranged into a frequency distribution.

A. ARITHMETIC MEAN
The arithmetic mean or arithmetic average is defined as the sum of the values of the variable divided by the number of observations. The definition is the same for both the sample and the population, although we use a different symbol to refer to each.

POPULATION MEAN

𝜇 = ∑Xi / N

Where:
𝜇 = population mean
N = total number of observations
Xi = observed value
∑ = summation notation

SAMPLE MEAN

𝒙̅ = ∑𝑥𝑖 / n, where n is the number of observations in the sample

Example: The monthly salaries of six librarians in the University of the Philippines are Php 17,200, Php 17,500, Php 18,400, Php 17,800, Php 18,100 and Php 19,000. Find the average monthly salary.

WEIGHTED MEAN
There are some cases where some values are given more importance than others. The mean derived in this case is known as the weighted mean.

𝒙̅ = ∑(𝑤𝑖 𝑥𝑖) / ∑𝑤

Where:
𝒙̅ = weighted mean
𝑤𝑖 = weight of each item
𝑥𝑖 = value of each item
∑𝑤 = sum of weights

Example:
WWW construction firm has 10 workers who are paid Php 350 per day, 5
workers who are paid Php 455 per day and 2 workers who are paid Php
600 per day. What is the weighted daily wage of 17 construction workers?
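The arithmetic mean and the weighted mean can both be checked numerically. The sketch below uses the librarians' salaries and the WWW construction firm's wages from the examples; the function names are hypothetical, and the number of workers at each wage serves as the weight.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


salaries = [17_200, 17_500, 18_400, 17_800, 18_100, 19_000]
print(arithmetic_mean(salaries))  # 18000.0

daily_wages = [350, 455, 600]
workers = [10, 5, 2]              # weights: number of workers at each wage
print(round(weighted_mean(daily_wages, workers), 2))  # 410.29
```

The weighted result differs from the simple average of the three wage rates (about 468.33) because most workers are paid the lowest rate.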

𝒙̅ = [10(350) + 5(455) + 2(600)] / 17 = 6,975 / 17
𝒙̅ = Php 410.2941176 or Php 410.29

B. MEDIAN
The median of ungrouped data arranged in an array (increasing or decreasing order of magnitude) is the middle observation for an odd number of items, or the arithmetic mean of the two middle values when the number of items in the distribution is even.

Example 1:
The weekly wages in pesos of a production worker in a manufacturing company are: Php 720, Php 738, Php 750, Php 720, Php 732, Php 690, Php 684, Php 762, Php 762, Php 696 and Php 690. Find the median.

Arranged in increasing order: 684, 690, 690, 696, 720, 720, 732, 738, 750, 762, 762.

The median is Php 720, which is the middle item when the items are arranged in an array.

Example 2:
Worker participation in management is a new concept that involves employees in corporate decision making. The following data are the percentages of employees involved in worker participation programs in a sample of 12 firms. 32,33,35,42,43,42,5,46,44,47,48,48. Find the median.

The median is 43.5, which is the arithmetic mean of the two middle values.

Note: The median 43.5 is not found in the given set of observations. The mean and median may or may not be the same value.

C. MODE
The mode of ungrouped data is defined as the value that appears with the highest frequency, that is, the item that appears most often. It is usually denoted by 𝒙̂ (read as x hat). It is generally used with nominal data. It can easily be identified by inspecting an ungrouped set of data for the score or item which occurs most frequently.

When all values appear with the same frequency, the mode does not exist. A distribution with only one mode is called unimodal, while a
distribution which has two modes is bimodal, and one with three or more modes is multimodal.

Example 1:
Find the mode of the following set of items: 3,5,8,9,10
Answer:
The answer is no mode because there is no value that occurs more than once.

Example 2:
Find the mode of the following set of items: 33,45,38,38, 49,60
Answer:
The mode is 38. It is unimodal.

Example 3:
Find the mode of the following set of items: 13,13,14,12,15,18,17,17
Answer:
The modes are 13 and 17. Both values appear twice, so this set is bimodal.

Example 4:
Find the mode of the following set of items: 3,5,7,7,8,5,5,8,8,9,10,9,9
Answer:
The modes are 5, 8, and 9. Each of these values appears thrice, so this set is trimodal or multimodal.

4.1 MEASURES OF RELATIVE STANDING: UNGROUPED DATA

Measures of central location provide us information about central values or averages. Another set of measures that helps us describe a data set is the measures of relative standing. Quantiles are an extension of the median concept; these are the values which divide a set of data into equal parts. The measures of relative standing are the quartiles, deciles, and percentiles.

Percentile: The whole data set is divided into 100 equal parts.
Decile: The data set is divided into 10 equal parts.
Quartile: The data set is divided into 4 equal parts.
Note: At certain points, these three measures will have the same values.

The following guidelines will help identify the percentile location, Lp.
1. If Lp is a whole number, the percentile is the Lth value in the ordered set of observations.
2. If Lp is not a whole number, the percentile lies between the Lth and (L + 1)st values, where L is the whole-number part of Lp: take the difference between the Lth and (L + 1)st values, multiply it by the decimal portion of Lp, and add the result to the Lth value.

Note: Deciles and quartiles, which correspond to percentiles at equal intervals of 10 and 25, respectively, can be calculated using the same formula by replacing the “P” with a “D” or “Q.”

Example:

The daily wages of 20 tailors of LMN Dressing Company are as follows:
th
The product of 50 and .7 is 35. Add 35 to 500, the 14 value. So now
decile 7 is 535 (500 + 35).
Compute for - a: Q1 b: D7 c: P87 (values is in Php)
Interpretation:
Decile 7 tells us that the lower 70% has wages less than 535 and the
upper 70% has wages greater than 535.

th th
Quartile 1 or Percentile 25 lies between the 5 and 6 ordered value
because the percentile location is not a whole number. We will take the th th
th th th th Percentile 87 lies between the 18 and 19 ordered value because the
difference between the 5 and 6 value. The 5 value is 290 and the 6 percentile location is not a whole number. We will take the difference
value is 300. The difference between 290 and 300 is 10. Multiply 10 by the th th th th
decimal portion of the percentile location, which in our case is .25. between the 18 and 19 value. The 18 value is 615 and the 19 value
th is 630. The difference between 615 and 630 is 15. Multiply 15 by the
The product of 10 and .25 is 2.5. Add 2.5 to 290, the 5 value. So now decimal portion of the percentile location, which in our case is .27.
quartile 1 is 292.5 (290 + 2.5). th
The product of 15 and .27 is 4.05. Add 4.05 to 615, the 18 value. So
Interpretation: now percentile 87 is 619.05 (615 + 4.05).
Quartile 1 tells us that the lower 25% has wages less than 292.5 and
the upper 25% has wages greater than 292.5. Interpretation:
Percentile 87 tells us that the lower 87% has wages less than 619.05
and the upper 87% has wages greater than 619.05.

D. Supposethatintheexampleabovenis23,with3morevaluesgreaterthan
650,findPercentile 75.

th th
Decile 7 or Percentile 70 lies between the 14 and 15 ordered value
because the percentile location is not a whole number. We will take the
th th th
difference between the 14 and 15 value. The 14 value is 500 and the
th
15 value is 550. The difference between 500 and 550 is 50. Multiply 50
by the decimal portion of the percentile location, which in our case is .7.
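
The location rule and the interpolation step used in the worked example, including the whole-number case for n = 23, can be sketched in Python. This is only an illustration: the function names `percentile_location` and `interpolate` are assumptions, the location rule (n + 1)P/100 is the one implied by the worked values, and only the wage values quoted in the text are used because the full wage list is not reproduced here.

```python
import math

def percentile_location(n, p):
    """Location of percentile p in an ordered set of n observations."""
    return (n + 1) * p / 100

def interpolate(lower_value, upper_value, location):
    """Add (decimal portion of location) x (gap between neighbours) to the lower value."""
    fraction = location - math.floor(location)
    return lower_value + fraction * (upper_value - lower_value)

# Values quoted in the worked example (n = 20 tailors, wages in Php).
q1 = interpolate(290, 300, percentile_location(20, 25))    # 5th and 6th values
d7 = interpolate(500, 550, percentile_location(20, 70))    # 14th and 15th values
p87 = interpolate(615, 630, percentile_location(20, 87))   # 18th and 19th values
print(round(q1, 2), round(d7, 2), round(p87, 2))

# Item D: with n = 23 the location of percentile 75 is a whole number,
# so no interpolation is needed and the value at that position is taken.
location_d = percentile_location(23, 75)
print(location_d, location_d.is_integer())
```

Results are rounded before printing because decimal fractions such as .7 and .27 are not stored exactly in floating point.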
For item D, the percentile location is L75 = (23 + 1)(75/100) = 18, which
is a whole number. Therefore, percentile 75 is the 18th observation, which
is 615.

4.2 MEASURES OF RELATIVE STANDING: GROUPED DATA

Quartiles: Grouped Data
The formula for quartiles is patterned from the median formula. If we
compute, for example, quartile 1, the formula is:

Q1 = lbQ1 + ((n/4 - cfQ1) / fQ1) × i

Where:
lbQ1 = lower boundary of the quartile 1 class
n = number of observations
cfQ1 = cumulative frequency before the quartile 1 class
fQ1 = frequency of the quartile 1 class
𝑖 = class interval

Steps in Identifying the Quartile 1 class.
1. Divide the number of observations by 4.
2. Go over the entries in the less than cumulative frequency column.
The first class whose cumulative frequency is greater than n/4 is the
quartile 1 class.

Solution:
Steps:
1. n/4 = 43/4 = 10.75
2. Quartile 1 class is the second class because the sum of the
frequencies up to the second class is greater than 10.75.
Deciles: Grouped Data
The formula for deciles is also patterned from the median formula.
Suppose we want to know decile 7. The formula for decile 7 is given
below.

D7 = lbD7 + ((7n/10 - cfD7) / fD7) × i

Where:
lbD7 = lower boundary of the decile 7 class
n = number of observations
cfD7 = cumulative frequency before the decile 7 class
fD7 = frequency of the decile 7 class
𝑖 = class interval

Steps in Identifying the Decile 7 class.
1. Multiply the number of observations by 7 and divide the product by 10.
2. The decile 7 class is the first class whose cumulative frequency is
greater than the result of step 1.

Solution:
Steps:
1. 7n/10 = 7(43)/10 = 30.1
2. The third class is the decile 7 class because its cumulative
frequency, which is 31, is greater than 30.1.

Percentiles: Grouped Data
Percentiles divide the set of observations into 100 equal divisions. As
with quartiles and deciles, the formula is patterned from the median
formula. For example, suppose we want to find percentile 85. The formula is
given below:

P85 = lbP85 + ((85n/100 - cfP85) / fP85) × i

Here lbP85, cfP85, and fP85 are defined for the percentile 85 class in the
same way as for the decile 7 class.

Steps in Identifying the Percentile 85 class.
1. Multiply the number of observations by 85 and divide the product by 100.
2. The percentile 85 class is the first class whose cumulative frequency is
greater than the result of step 1.

Solution:
Steps:
1. 85n/100 = 85(43)/100 = 36.55
2. Percentile 85 class is the fourth class because its cumulative
frequency is greater than 36.55.
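
The grouped-data formulas for quartiles, deciles, and percentiles share one pattern, which can be sketched in Python. This is a hedged illustration only: the function name `grouped_quantile` is an assumption, and the frequency table is HYPOTHETICAL (the module's own table is not reproduced here). It was chosen only so that n = 43 and the quartile 1, decile 7, and percentile 85 classes are the second, third, and fourth classes, as in the solutions. The sketch picks the first class whose cumulative frequency reaches the target (greater than or equal to), which gives the same value as the "greater than" rule whenever the classes have nonzero frequency.

```python
def grouped_quantile(lower_boundaries, frequencies, width, k, parts):
    """k-th quantile out of `parts` equal divisions (4, 10, or 100) for grouped data."""
    n = sum(frequencies)
    target = k * n / parts   # n/4 for Q1, 7n/10 for D7, 85n/100 for P85
    cumulative_before = 0
    for lower_boundary, frequency in zip(lower_boundaries, frequencies):
        # The quantile class is the first class whose cumulative
        # frequency reaches the target.
        if cumulative_before + frequency >= target:
            return lower_boundary + (target - cumulative_before) / frequency * width
        cumulative_before += frequency
    raise ValueError("target exceeds the total frequency")

# HYPOTHETICAL table, not the module's: five classes of width 10, n = 43.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [6, 12, 13, 8, 4]   # cumulative: 6, 18, 31, 39, 43

q1 = grouped_quantile(lower_boundaries, frequencies, 10, 1, 4)
d7 = grouped_quantile(lower_boundaries, frequencies, 10, 7, 10)
p85 = grouped_quantile(lower_boundaries, frequencies, 10, 85, 100)
print(round(q1, 2), round(d7, 2), round(p85, 2))
```

Because the table is invented, the printed values illustrate the arithmetic only and are not the answers to the module's example.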
MODULE 5

5.1 MEASURES OF VARIABILITY (DISPERSION): RANGE

A. Range
The simplest measure of dispersion is the range. It is the difference
between the largest and the smallest values in a data set. The range for
ungrouped data is obtained by finding the difference between the
highest and the lowest value. For grouped data, the range is determined by
subtracting the lower boundary of the lowest class interval from the upper
boundary of the highest class interval of a frequency distribution, because
the class boundaries are considered the true limits.

Ungrouped data: Range (R) = Highest Value (H) – Lowest Value (L) or H – L

Grouped data: Range (R) = Upper Boundary of the Highest Class
Interval (𝑈𝐵𝐻𝐶𝑖) – Lower Boundary of the Lowest Class Interval (𝐿𝐵𝐿𝐶𝑖)
or 𝑈𝐵𝐻𝐶𝑖 – 𝐿𝐵𝐿𝐶𝑖

Interquartile Range
Quartiles divide a numerical distribution into four equal parts. The first
or lower quartile lies at 25% of the total number of values, while the third
or upper quartile lies at 75%.

The interquartile range (IQR) is the difference between the value of the
third or upper quartile (𝑄3) and the first or lower quartile (𝑄1):
IQR = 𝑄3 – 𝑄1

Semi-Interquartile Range
The semi-interquartile range (SIQR) or quartile deviation (QD) indicates the
variation or dispersion of the values covering the middle 50% of the
distribution of the data. It is found by getting half of the distance
between the third or upper quartile and the first or lower quartile:
SIQR = (𝑄3 – 𝑄1) / 2

Example for Ungrouped Data:
A company produces the following number of units for a given period:
21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26 and 34.

Determine the following: A. Range B. Interquartile Range (IQR) and C.
Semi-interquartile Range (SIQR)

Solution:
To find the IQR and SIQR, first arrange the production units in
increasing order and then calculate 𝑄1 and 𝑄3.

Example for Grouped Data:
The table below shows the average production of 60 employees of a
manufacturing company in a given week. Find the A. Range B. Interquartile
Range (IQR) and C. Semi-interquartile Range (SIQR)

B. IQR: Calculate the first and third quartiles using the formulas for
quartiles of grouped data.
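
For the ungrouped production example in this section, the range, IQR, and SIQR can be checked with a short Python sketch. It assumes the percentile location rule (n + 1)P/100 with interpolation from Module 4; the helper name `percentile` is illustrative, and other software may use a different quartile convention and give slightly different quartiles.

```python
import math

def percentile(sorted_values, p):
    """Value at location (n + 1) * p / 100, interpolating between neighbours.

    Assumes the location falls between 1 and n (true for Q1 and Q3 here).
    """
    location = (len(sorted_values) + 1) * p / 100
    position = math.floor(location)           # 1-based position L
    fraction = location - position            # decimal portion of the location
    lower = sorted_values[position - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[position]
    return lower + fraction * (upper - lower)

units = [21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, 34]
ordered = sorted(units)

data_range = ordered[-1] - ordered[0]   # highest value minus lowest value
q1 = percentile(ordered, 25)
q3 = percentile(ordered, 75)
iqr = q3 - q1
siqr = iqr / 2
print(data_range, q1, q3, iqr, siqr)   # 15 22.25 30.75 8.5 4.25
```

With n = 16 the quartile locations are 4.25 and 12.75, so both quartiles are interpolated between neighbouring ordered values.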
5.2 MEASURES OF VARIABILITY (DISPERSION): MEAN ABSOLUTE
DEVIATION (MAD)
Mean deviation or average deviation is defined as the average of the
absolute deviations of the individual values of a set of numerical data from
the mean, the median, or the mode. Among the three, the mean is the most
preferred and commonly used measure of central tendency for computing
the mean deviation or average deviation.

Ungrouped data: MD = ∑|xi − 𝑥̅| / n
Grouped data: MD = ∑ fi|xi − 𝑥̅| / n

Where:
xi = the individual value for ungrouped data, or the midpoint of
each class interval for grouped data
𝑥̅ = the mean of the data
n = the total number of frequencies
fi = the frequency of each class interval

For ungrouped data, the mean deviation or average deviation is
determined by the following procedure:

1. Arrange the values from lowest to highest or vice versa
2. Compute the value of the mean
3. Find the absolute value of each deviation from the mean
4. Find the sum of the absolute values in Step 3
5. Substitute the values in the formula and solve.

Example:
The following are random data on the number of fabric conditioners sold by a
grocery store in 10 days: 34, 22, 23, 27, 16, 35, 25, 18, 33 and 37.

For grouped data, the mean deviation or average deviation is
determined by the following procedure:

1. Compute the mean 𝑥̅ of the distribution
2. Subtract the mean from each of the midpoints and write the
absolute values of the results under the column |𝑥 − 𝑥̅|
3. Find the product of the items under column f and the items under column
|𝑥 − 𝑥̅|
4. Add the products in Step 3 to obtain the value of ∑ 𝑓|𝑥 − 𝑥̅|
5. Divide the sum obtained in Step 4 by n.

Example: Calculate the mean deviation (MD) or average deviation
(AD) of the scores obtained by 90 applicants for employment as shown
on the table below
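
The two procedures for the mean deviation can be sketched in Python. The function names are illustrative assumptions, not part of the module. The ungrouped call uses the fabric conditioner sales from the example; the grouped function is shown without data because the table of 90 applicants is not reproduced here.

```python
def mean_absolute_deviation(values):
    """Average absolute deviation from the mean for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

def grouped_mean_absolute_deviation(midpoints, frequencies):
    """Same measure for grouped data, weighting each class midpoint by its frequency."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies)) / n

sales = [34, 22, 23, 27, 16, 35, 25, 18, 33, 37]
md = mean_absolute_deviation(sales)
print(round(md, 2))   # mean is 27, absolute deviations sum to 62, so MD is 6.2
```

Sorting the values first, as in Step 1, is a convenience for manual computation and does not change the result.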
5.2 MEASURES OF VARIABILITY (DISPERSION): VARIANCE

Variance is defined as the average of the squared deviations from the
mean. The square root of the variance is known as the standard deviation.
The variance of sample data is denoted by S² (read as S squared or the
square of S), while the symbol for the variance of the population is 𝜎²,
read as sigma squared.

To determine the variance of ungrouped data, let us follow the steps
below:

1. Arrange the values according to magnitude, lowest to highest or vice
versa
2. Calculate the mean
3. Obtain the individual deviations from the mean
4. Square each deviation and write the results under column |𝒙 − 𝒙̅|²
5. Find the sum of the squared deviations
6. Divide the sum in Step 5 by n − 1 for sample data or by n for
population data.

Example:
Determine the variance of the following 10 sample data: 10, 12, 9, 18, 14,
16, 14, 18, 19 and 20
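
The steps for the variance of ungrouped data can be sketched in Python, using the 10 sample values from the example in this section. The function name `variance` and its `sample` flag are illustrative assumptions; the divisor is n − 1 for sample data and n for population data, as stated in the steps.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1 if sample else n)

data = [10, 12, 9, 18, 14, 16, 14, 18, 19, 20]
s_squared = variance(data)                  # mean is 15, squared deviations sum to 132
s = math.sqrt(s_squared)                    # standard deviation
print(round(s_squared, 2), round(s, 2))     # 14.67 3.83
```

Treating the same ten values as a population instead, `variance(data, sample=False)` divides 132 by 10 rather than 9.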
