APPLIED STATISTICS FOR BUSINESS AND ECONOMICS Midterms Reviewer
MODULE 1
1.1 INTRODUCTION
3 TYPES OF DATA
DATA TYPES. The characteristics of data will help the researcher to select and apply proper statistical procedures to present the data accurately.
2. PURPOSIVE SAMPLING: subjects are chosen based on the purpose of the study; hence they are practically a “pre-defined group” who know where, whom and how to go with the research goals. This technique is useful in reaching targeted groups but is highly discouraged, especially in economic and business research, due to its biased nature. Nonetheless, this technique can still be useful in qualitative assessments. Common methods of purposive sampling are as follows:
o Snowball Sampling: also known as “pyramiding”, it involves samples based on recommendations from the prior sample.
Example: A researcher is studying environmental engineers but can only find five. She asks these engineers if they know any more. They give her several further referrals, who in turn provide additional contacts. In this way, she manages to contact sufficient engineers.
METHODS FOR PROBABILITY (RANDOM) SAMPLING
1. SIMPLE RANDOM SAMPLING: a sample selected so that each item or person in the population has the same chance of being included.
A. Fishbowl Method: uses the notion of a “lottery”; for each observation, the probability of being selected is 1/N, where N is the total number of observations in the population.
Example: A population consists of 845 employees of Mitra Industries. A sample of 52 employees is to be selected from that population. One way of ensuring that every employee in the population has the same chance of being chosen is to first write the name of each employee on a small slip of paper and deposit all of the slips in a box.
B. Table of Random Numbers: uses an array of randomly arranged numbers to determine who will be included in the sample.
Example: To select a sample of employees, you first choose a starting point in the table. Any starting point will do. Suppose the time is 3:04. You might look at the third column and then move down to the fourth set of numbers. The number is 03759. Since there are only 845 employees, we will use the first three digits of a five-digit random number. Thus, 037 is the number of the first employee to be a member of the sample.
2. SYSTEMATIC SAMPLING: employs the use of intervals (k) obtained by dividing the population size by the target sample size (N / n).
Example: The sales division of Computer Graphic Inc. needs to quickly estimate the mean dollar revenue per sale during the past month. It finds that 2,000 sales invoices were recorded and stored in file drawers and decides to select 100 invoices to estimate the mean dollar revenue. Simple random sampling requires the numbering of each invoice before using the random number table to select the 100 invoices.
First, k is calculated as the population size divided by the sample size. For Computer Graphic Inc., we would select every 20th (2,000/100) invoice from the file drawers; in so doing, the numbering process is avoided. If k is not a whole number, then round down.
Random sampling is used in the selection of the first invoice. For example, a number from a random number table between 1 and k, or 20, would be selected. Say the random number was 18. Then, starting with the 18th invoice, every 20th invoice (18, 38, 58, etc.) would be selected as the sample.
3. STRATIFIED SAMPLING: taking a sample per stratum or group within a population (say, the number of students per college in a certain university). The process is done by reversing systematic sampling: % k = (n / N) x 100
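The systematic sampling procedure described for the Computer Graphic Inc. invoices can be sketched in Python. This is only an illustrative sketch: the function name systematic_sample_positions and its start and seed parameters are assumptions introduced here, not part of the reviewer.

```python
import random


def systematic_sample_positions(population_size, sample_size, start=None, seed=None):
    """Return (k, start, positions) for a systematic sample of 1-based positions."""
    # Interval k = N / n, rounded down when it is not a whole number.
    k = population_size // sample_size
    if k < 1:
        raise ValueError("sample_size must not exceed population_size")
    if start is None:
        # Random start between 1 and k, inclusive.
        start = random.Random(seed).randint(1, k)
    # Take every k-th item beginning at the random start.
    positions = list(range(start, population_size + 1, k))[:sample_size]
    return k, start, positions


# Invoice example: N = 2,000, n = 100, random start fixed at 18 to match the text.
k, start, positions = systematic_sample_positions(2000, 100, start=18)
print(k, positions[:3], len(positions))  # 20 [18, 38, 58] 100
```

Passing start explicitly reproduces the worked example; leaving it out draws the first position at random, as the procedure requires.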
Example: Suppose you divided NCR into 12 primary units, then selected four regions at random (2, 7, 4, and 12) and concentrated your efforts in these primary units. You could take a random sample of the residents in each of these regions and interview them. (Note that this is a combination of cluster sampling and simple random sampling.)
5. MULTISTAGE SAMPLING: also called multistage cluster sampling, it is exactly what it sounds like: sampling in stages.
It is a more complex form of cluster sampling, in which smaller groups are successively selected from large populations to form the sample population used in your study. Due to this multi-step nature, the sampling method is sometimes referred to as phase sampling.
The margin of error is the error we expect to commit in getting the sample, since the sample provides only an estimate of the population parameter.
Example:
A group of researchers was tasked by the Department of Education to survey whether students in Metro Manila are in favor of moving the start of classes from June to September. If there are 1,000,000 students and a 10% margin of error is expected, compute the sample size.
Given: N = 1,000,000 e = 10%
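The sample-size formula for the Department of Education example is not shown in this excerpt. The sketch below assumes it is Slovin's formula, n = N / (1 + N e^2), which is commonly taught with exactly these inputs (population size N and margin of error e). Both the choice of formula and the function name slovin_sample_size are assumptions.

```python
import math


def slovin_sample_size(population, margin_of_error):
    """Sample size under Slovin's formula (assumed here), rounded up to a whole subject."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Given in the example: N = 1,000,000 students, e = 10%.
print(slovin_sample_size(1_000_000, 0.10))  # 100
```

Rounding up is a design choice so that the sample is never smaller than the formula requires.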
B. INDIRECT OR QUESTIONNAIRE METHOD. A questionnaire is an instrument that contains a prepared set of questions. It can be distributed either by mail or by hand to the intended person and ...
Example:
The Philippine Statistics Authority takes care of keeping birth, death, and marriage records.
The Commission on Elections takes care of updating the list of registered voters.
Company records of individual employees are kept and later used as a basis for promotion and salary increase.
D. OBSERVATION METHOD. This method is used when we want to conduct a study by way of direct observation. It obtains data pertaining to the behavior of an individual or a group of individuals at the time of occurrence of a given situation. Subjects may be observed individually or collectively depending on the objectives of the investigator. One limitation of this method is that, in most cases, observation is made only at the time of occurrence of the appropriate events.
Example: A newly discovered medicine is claimed to strengthen the immune system. Using the observation method, select a sample of individuals and ask them to take the medicine for a specified period of time. After that period, each of the sampled individuals is asked whether the medicine has improved their immune system.
E. EXPERIMENT METHOD. This method of collecting data is used to find cause-and-effect relationships. The data needed to establish a cause-and-effect relationship may be obtained through a series of experiments. Secondary data can be obtained from journals and periodicals, newspapers, tables, unpublished or published research papers, and theses and dissertations.
... hectare. Add a little more for the cost of marketing and the margin is still substantial. (Philippine Panorama of the Manila Bulletin, June 17, 2007, p.16)
B. TABULAR FORM. This method presents data in rows and columns. It is more convenient and understandable than the textual method because the numerical information is displayed in a more concise and systematic manner, using vertical and horizontal lines under the corresponding headings. A statistical table has four essential components: table heading, body, stub, and box head.
Table heading: shows the table number and the title. The table number gives the table an identity, while the title briefly explains what is being presented.
Body: the main part of the table, which contains the quantitative information.
Stubs: the row labels, usually placed to the left of the body. These are classifications or categories which are presented as values of a variable.
Box heads: the captions that appear above the columns. In addition to these components, footnotes may be placed immediately below the main part of the table, and a source note acknowledging the origin of the data may appear below the title or below the footnote.
2. Line Graph. A line graph is the most practical and effective device for showing a general trend, pattern or change over a given time. It makes use of ordered pairs plotted in a coordinate plane. The categories or time periods are chronologically arranged on the horizontal axis and the relevant values are indicated on the vertical axis. The figure below illustrates a line graph.
4. Picture Graph. A picture graph or pictograph is used to describe the difference among a few quantities. It is a very effective tool for attracting attention since it uses pictures or symbols to convey the message of the obtained numerical information.
2.4 FREQUENCY DISTRIBUTION
A frequency distribution is a tabular arrangement of data using categories or classes and their corresponding frequencies. The frequency of a particular observation is the number of times the observation occurs in a category or class.
A sample of fifty customers at a newly opened supermarket has been selected at random. The following data show the customers’ ages.
The numbers above, whether or not arranged by magnitude, are called raw data.
One of the most convenient rules giving explicit guidelines for the number of classes to use is Sturges’ Rule. The number of classes is determined according to the following formula:
k = 1 + 3.322 log n, where k is the number of classes and n is the number of observations.
There are 50 observations, and we can determine the number of classes using Sturges’ Rule as follows:
k = 1 + 3.322 log 50 ≈ 6.64, or about 7 classes.
The ideal number of class intervals is 5 to 15. Fewer than 8 class intervals are recommended for data with fewer than 50 observations/values. For data with 50 to 100 observations/values, the suggested number should be greater than 8. Please note that too few class intervals will result in crowded data, while too many class intervals tend to spread out the data too much.
1. Decide on the number of class intervals to use, between 5 and 15. Too many class intervals result in several empty class intervals, while too few create long details. Use Sturges’ formula whenever possible.
2. Compute the range. This is the difference between the highest value and the lowest value in the set of data.
R = HO - LO
Where:
R = Range
HO = Highest Observation
LO = Lowest Observation
3. Determine the class size or class width. This is the distance or gap between the lower limit and the upper limit. It is obtained by dividing the range by the number of classes.
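Steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch: it assumes Sturges' Rule in its usual form k = 1 + 3.322 log10(n) and rounds k to the nearest whole number. The function names are illustrative, and the class-width inputs reuse the hundredths figures from this section (highest value 17.68, lowest value 15.29, 9 classes).

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' Rule, rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n))


def class_width(highest, lowest, classes, decimals=0):
    """Range (HO - LO) divided by the number of classes, rounded to the data's precision."""
    return round((highest - lowest) / classes, decimals)


print(sturges_classes(50))                       # 7
print(class_width(17.68, 15.29, 9, decimals=2))  # 0.27
```

Rounding to the nearest value is an assumption; some texts always round the class width up so that the classes are sure to cover the whole range.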
Note: If the observations are in tenths, e.g., the highest value is 4.9 and the lowest value is 1.8, and using class intervals, the class size is obtained as follows.
Similarly, if the observations are in hundredths, e.g., the highest value is 17.68 and the lowest value is 15.29, and 9 class intervals are used, the suggested class size is obtained as follows.
4. Choose an appropriate lower limit for the first class interval. This number shall be less than or equal to the lowest value in the data. It is more convenient to use a lower limit that is divisible by the class width. Add the class width to obtain the next lower class limit. Keep on adding the class width to get all the other lower class limits.
5. Find the upper class limits. If the class size is rounded off to the units place, subtract 1 from the second lower class limit to arrive at the first upper class limit. Subtract 0.1 instead if the class size is rounded off to the tenths place, and 0.01 if it is rounded to the hundredths place.
6. Determine the class boundaries. The class boundaries are the true limits of a class interval, made up of the lower class boundary and the upper class boundary. The class boundary is midway between the upper limit of a class and the lower limit of the next higher class interval.
If we are dealing with discrete data, the class boundaries are obtained by adding 0.5 to the upper limit and at the same time subtracting 0.5 from the lower limit. For example, we have a class of 3-5. The class boundaries are 2.5-5.5. The lower boundary 2.5 is obtained by subtracting 0.5 from 3. The upper boundary 5.5 is obtained by adding 0.5 to 5.
If the data are continuous, the number to be added to the upper limit and subtracted from the lower limit depends on the number of decimal places the observations have. Assume we have a class of 15.4-18.6. Observe that there is one decimal place. To get the lower and upper class boundaries, subtract 0.05 from 15.4 and add 0.05 to 18.6. The lower and upper class boundaries are 15.35 and 18.65, respectively. Suppose we have the class 0.346-0.418; what will be its class boundaries? There are three decimal places. To get the lower class boundary, subtract 0.0005 from 0.346 to get 0.3455. To get the upper class boundary, add 0.0005 to 0.418 to get 0.4185. The class boundaries are therefore 0.3455-0.4185.
8. Tally the raw scores and indicate the frequency for each of the class intervals.
9. Get the relative frequency. This gives us the percentage of observations in a particular class of interest. It is obtained by dividing the frequency of the class by the total number of observations.
10. Add the frequencies and indicate the sum.
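The class-boundary rule in step 6 amounts to moving each limit outward by half of the smallest unit in which the limits are written. The sketch below is one way to express that in Python. The function name class_boundaries is an assumption, and the limits are passed as strings so that the decimal module keeps the number of decimal places exact.

```python
from decimal import Decimal


def class_boundaries(lower_limit, upper_limit):
    """Return (lower boundary, upper boundary) for class limits given as strings."""
    lower, upper = Decimal(lower_limit), Decimal(upper_limit)
    # Exponent of the last written digit: 0 for "3", -1 for "15.4", -3 for "0.346".
    exponent = min(lower.as_tuple().exponent, upper.as_tuple().exponent)
    # Half of one unit in the last place: 0.5, 0.05, 0.0005, ...
    half_unit = Decimal(5).scaleb(exponent - 1)
    return lower - half_unit, upper + half_unit


print(class_boundaries("3", "5"))          # boundaries 2.5 and 5.5
print(class_boundaries("15.4", "18.6"))    # boundaries 15.35 and 18.65
print(class_boundaries("0.346", "0.418"))  # boundaries 0.3455 and 0.4185
```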
Step 8: Put a mark beside the appropriate class for each number in the data set, write down the frequencies and find the less than and the greater than cumulative frequencies, F< and F>, respectively. The less than cumulative frequencies (F<) are determined by adding the frequency of the first class interval to that of the next to obtain the cumulative frequency of the second class interval; the result is then added to the next frequency, and so on until the total frequency is reached. To illustrate using the table below, we start with 7 as the frequency of the first class interval under F<; then 7 + 18 = 25, the cumulative frequency of the second interval under F<; 25 + 10 = 35, the cumulative frequency of the third interval under F<, etc.
Hence the table below shows the F< and F> columns of the frequency distribution.
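The running sums can be sketched with itertools.accumulate. Only the first three frequencies (7, 18, 10) come from the text. The last two (9 and 6) are made-up values chosen so that the total is 50. The F> column is computed here in the conventional way, by accumulating from the last class backwards, which the text does not spell out.

```python
from itertools import accumulate

# First three frequencies are from the text; 9 and 6 are hypothetical fillers.
frequencies = [7, 18, 10, 9, 6]

# F<: running total from the first class downward.
less_than = list(accumulate(frequencies))

# F>: running total from the last class upward, listed in class order.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)     # [7, 25, 35, 44, 50]
print(greater_than)  # [50, 43, 25, 15, 6]
```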
MODULES 3 & 4
SAMPLE MEAN
Example 2:
Worker participation in management is a new concept that involves
employees in corporate decision making. The following data are the
percentages of employees involved in worker participation programs in a
sample of 12 firms: 32, 33, 35, 42, 43, 42, 5, 46, 44, 47, 48, 48. Find the median.
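As a quick check, the median of these twelve percentages can be computed with Python's statistics module. The values are taken exactly as printed above, including the 5. If that entry is a misprint in the source, the result would change.

```python
import statistics

# Percentages for the 12 firms, exactly as printed in the example.
participation = [32, 33, 35, 42, 43, 42, 5, 46, 44, 47, 48, 48]

# With an even number of values, the median is the mean of the two middle
# values of the sorted data (here the 6th and 7th values, 42 and 43).
print(statistics.median(participation))  # 42.5
```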
The median is Php 720, which is the middle item when the items are arranged in order.
C. MODE
The mode for ungrouped data is defined as the value that appears with the highest frequency, that is, the item that appears most often. It is usually denoted by 𝒙̂ (read as x hat) and is generally used with nominal data. It can be easily identified by inspecting an ungrouped set of data for the score or item which occurs most frequently.
When all values appear with the same frequency, the mode does not exist. A distribution with only one mode is called unimodal, a distribution with two modes is bimodal, and a set of data with three or more modes is multimodal.
Example 1:
Find the mode of the following set of items: 3, 5, 8, 9, 10
Answer:
The answer is no mode because there is no value that occurs more than once.
Example 2:
Find the mode of the following set of items: 33, 45, 38, 38, 49, 60
Answer:
The mode is 38. It is unimodal.
Example 3:
Find the mode of the following set of items: 13, 13, 14, 12, 15, 18, 17, 17
Answer:
The modes are 13 and 17. Both values appear twice, so this is bimodal.
Example 4:
Find the mode of the following set of items: 3, 5, 7, 7, 8, 5, 5, 8, 8, 9, 10, 9, 9
Answer:
The modes are 5, 8, and 9. Each value appears three times, so this is trimodal or multimodal.
Percentile: The whole data set is equally divided into 100 parts.
Decile: Data set is divided into 10 equal parts.
Quartile: Data set is divided into 4 equal parts.
Note: At certain points, these three measures will have the same values.
The following guidelines will help identify percentile location.
1. If Lp is a whole number, the percentile is the Lth value in the ordered set of observations.
2. If Lp is not a whole number, the percentile lies between the Lth and (L + 1)st values: take the difference between the Lth and (L + 1)st values, multiply the result by the decimal portion of Lp, and add the product to the Lth value.
3.
Note: Deciles and quartiles, which are synonymous with percentiles in equal intervals of 10 and 25, respectively, can be calculated using the same formula by replacing the “P” with a “D” or “Q.”
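The percentile-location guidelines can be sketched in Python. The location formula itself is not printed in this excerpt. The sketch assumes Lp = (n + 1) P / 100, which is the formula that reproduces the worked locations in this reviewer (5.25 for Q1, 14.7 for D7 and 18.27 for P87 when n = 20). In the wages list, only the 5th, 6th, 14th, 15th, 18th and 19th values and the count of 20 come from the worked example. Every other value is a hypothetical filler.

```python
def percentile(values, p):
    """Percentile p of ungrouped data, using the location Lp = (n + 1) * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100
    whole = int(location)          # L: whole-number part of the location
    fraction = location - whole    # decimal portion of the location
    if whole < 1:                  # location falls before the first value
        return ordered[0]
    if whole >= n:                 # location falls at or after the last value
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    # Lth value plus the decimal portion times the gap to the (L + 1)st value.
    return lower + fraction * (upper - lower)


# 20 wages in Php; positions 5, 6, 14, 15, 18 and 19 match the worked example.
wages = [200, 220, 250, 270, 290, 300, 320, 350, 380, 400,
         420, 450, 480, 500, 550, 580, 600, 615, 630, 650]

print(round(percentile(wages, 25), 2))  # Q1  -> 292.5
print(round(percentile(wages, 70), 2))  # D7  -> 535.0
print(round(percentile(wages, 87), 2))  # P87 -> 619.05
```

When the location is a whole number the decimal portion is zero, so the function simply returns the Lth value, as guideline 1 states.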
Compute for: a. Q1 b. D7 c. P87 (values are in Php)
Quartile 1 or Percentile 25 lies between the 5th and 6th ordered values because the percentile location is not a whole number. We take the difference between the 5th and 6th values. The 5th value is 290 and the 6th value is 300. The difference between 290 and 300 is 10. Multiply 10 by the decimal portion of the percentile location, which in our case is .25.
The product of 10 and .25 is 2.5. Add 2.5 to 290, the 5th value. So quartile 1 is 292.5 (290 + 2.5).
Interpretation:
Quartile 1 tells us that the lower 25% has wages less than 292.5 and the upper 75% has wages greater than 292.5.
Decile 7 or Percentile 70 lies between the 14th and 15th ordered values because the percentile location is not a whole number. We take the difference between the 14th and 15th values. The 14th value is 500 and the 15th value is 550. The difference between 500 and 550 is 50. Multiply 50 by the decimal portion of the percentile location, which in our case is .7.
The product of 50 and .7 is 35. Add 35 to 500, the 14th value. So decile 7 is 535 (500 + 35).
Interpretation:
Decile 7 tells us that the lower 70% has wages less than 535 and the upper 30% has wages greater than 535.
Percentile 87 lies between the 18th and 19th ordered values because the percentile location is not a whole number. We take the difference between the 18th and 19th values. The 18th value is 615 and the 19th value is 630. The difference between 615 and 630 is 15. Multiply 15 by the decimal portion of the percentile location, which in our case is .27.
The product of 15 and .27 is 4.05. Add 4.05 to 615, the 18th value. So percentile 87 is 619.05 (615 + 4.05).
Interpretation:
Percentile 87 tells us that the lower 87% has wages less than 619.05 and the upper 13% has wages greater than 619.05.
D. Suppose that in the example above n is 23, with 3 more values greater than 650, find Percentile 75.
Percentile location is a whole number. Therefore percentile 75 is the 18th observation, which is 615.
4.2 MEASURES OF RELATIVE STANDING: GROUPED DATA
Quartiles: Grouped Data
The formula for quartiles is patterned from the median formula. If we compute, for example, quartile 1, the formula is:
Q1 = lbQ1 + [(n/4 - cfq1) / fq1] × i
Where:
lbQ1 = lower boundary of quartile 1 class
n = number of observations
cfq1 = cumulative frequency before quartile 1 class
fq1 = frequency of quartile 1 class
𝑖 = class interval
Steps in Identifying the Quartile 1 class.
1. Divide the number of observations by 4.
2. Go over the entries in the less than cumulative frequency column. The class that has a sum of frequencies greater than n/4 is the quartile 1 class.
Solution:
Steps:
1. n/4 = 43/4 = 10.75
2. Quartile 1 class is the second class because the sum of the frequencies of the second class is greater than 10.75.
Deciles: Grouped Data
The formula for deciles is also patterned from the median formula. Suppose we want to know decile 7. The formula for decile 7 is given below.
D7 = lbD7 + [(7n/10 - cfD7) / fD7] × i
Where:
lbD7 = lower boundary of decile 7 class
n = number of observations
cfD7 = cumulative frequency before decile 7 class
fD7 = frequency of decile 7 class
𝑖 = class interval
Steps in Identifying the Decile 7 class.
1. Multiply the number of observations (n) by 7 and divide the product by 10.
2. The decile 7 class is the class that has a sum of frequencies greater than the result of step 1.
Solution:
Steps:
1. 7n/10 = 7(43)/10 = 30.1
2. Decile 7 class is the class that has a sum of frequencies greater than the result of step 1.
3. The third class is the decile 7 class because the sum of the frequencies in the class, which is 31, is greater than 30.1.
Steps in Identifying the Percentile 85 class.
1. Multiply the number of observations (n) by 85 and divide the product by 100.
2. The percentile 85 class is the class that has a sum of frequencies greater than the result of step 1.
Solution:
Steps:
1. 85n/100 = 85(43)/100 = 36.55
2. Percentile 85 class is the fourth class. The sum of frequencies of that class is greater than 36.55.
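The grouped-data quartile, decile and percentile procedures share one pattern: locate the class whose cumulative frequency first reaches the target (n/4, 7n/10 or 85n/100), then interpolate inside that class. The sketch below assumes the usual median-style formula, lower boundary + ((target - cf before) / class frequency) × i. The frequency table is hypothetical. It was chosen only so that n = 43 and the quartile 1, decile 7 and percentile 85 classes fall in the second, third and fourth classes, as in the solutions above. The class boundaries and the width of 10 are invented.

```python
from itertools import accumulate


def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """Return (class number, value) for the quantile at `fraction` of the data."""
    n = sum(frequencies)
    target = fraction * n                      # e.g. n/4, 7n/10, 85n/100
    cumulative = list(accumulate(frequencies))
    # First class whose cumulative frequency reaches the target. The text says
    # "greater than"; an exact tie is an edge case it does not cover.
    index = next(i for i, cf in enumerate(cumulative) if cf >= target)
    cf_before = cumulative[index - 1] if index > 0 else 0
    value = lower_boundaries[index] + (target - cf_before) / frequencies[index] * width
    return index + 1, value


# Hypothetical table: n = 43, cumulative frequencies 8, 20, 31, 40, 43.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [8, 12, 11, 9, 3]

for label, fraction in [("Q1", 0.25), ("D7", 0.70), ("P85", 0.85)]:
    class_number, value = grouped_quantile(lower_boundaries, frequencies, 10, fraction)
    print(label, "is in class", class_number, "with value", round(value, 2))
```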
MODULE 5
5.1 MEASURES OF VARIABILITY (DISPERSION): RANGE
A. Range
The simplest measure of dispersion is the range. It is the difference between the largest and the smallest values in a data set. The range for ungrouped data is obtained by finding the difference between the highest and the lowest value. For grouped data, the range is determined by subtracting the lower boundary of the lowest class interval from the upper boundary of the highest class interval of a frequency distribution, because the class boundaries are considered the true limits.
The interquartile range (IQR) is found by taking the difference between the value of the third quartile (𝑄3) or upper quartile and the first quartile (𝑄1) or lower quartile.
Semi-Interquartile Range
Example for Ungrouped Data:
A company produces the following number of units for a given period: 21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26 and 34.
Determine the following: A. Range B. Interquartile Range (IQR) and C. Semi-interquartile Range (SQR)
Solution:
Example for Grouped Data:
The table below shows the average production of 60 employees of a manufacturing company in a given week. Find the A. Range B. Interquartile Range (IQR) and C. Semi-interquartile Range (SQR)
B. IQR: Calculate the first and third quartiles using the formulas used for Quartiles: Grouped Data.
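The ungrouped example can be checked with a short Python sketch. The reviewer's own solution is not shown in this excerpt, so two points are assumptions: the quartiles are located with the (n + 1) convention, which is what statistics.quantiles does with method="exclusive", and the semi-interquartile range is taken as half of the IQR.

```python
import statistics

# Units produced, from the ungrouped example.
units = [21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, 34]

# A. Range: highest value minus lowest value.
data_range = max(units) - min(units)

# B. IQR: Q3 - Q1, with quartile locations based on (n + 1).
q1, q2, q3 = statistics.quantiles(units, n=4, method="exclusive")
iqr = q3 - q1

# C. Semi-interquartile range: half of the IQR.
sqr = iqr / 2

print(data_range, q1, q3, iqr, sqr)  # 15 22.25 30.75 8.5 4.25
```

A different quartile convention (for example method="inclusive") would give slightly different quartiles, so the convention should match the one used in class.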
5.2 MEASURES OF VARIABILITY (DISPERSION): MEAN ABSOLUTE
DEVIATION (MAD)
Mean Deviation or average deviation is defined as the average of the absolute deviations of the individual values of a set of numerical data from either the mean, the median or the mode. Among the three, the mean is the most preferred and commonly used measure of central tendency for computing the mean deviation or average deviation.
MAD = Σ|xi - 𝑥̅| / n (ungrouped data)
MAD = Σ fi|xi - 𝑥̅| / n (grouped data)
Where:
xi = the individual value for ungrouped data, and the midpoint of each class interval for grouped data
𝑥̅ = the mean of the data
n = the total number of observations (the sum of the frequencies)
fi = the frequency of each class interval
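As an illustrative check of this definition, the sketch below computes the mean absolute deviation about the mean. It assumes the usual form, the sum of |xi - mean| (weighted by fi for grouped data) divided by n. The function name is an assumption. The ten daily sales figures are the ones used in this section's example.

```python
def mean_absolute_deviation(values, frequencies=None):
    """MAD about the mean; pass class midpoints and frequencies for grouped data."""
    if frequencies is None:
        frequencies = [1] * len(values)   # ungrouped data: each value counts once
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, frequencies)) / n


# Fabric conditioner units sold over 10 days (ungrouped example).
sales = [34, 22, 23, 27, 16, 35, 25, 18, 33, 37]
print(mean_absolute_deviation(sales))  # 6.2
```

The mean of the sales data is 27 and the absolute deviations sum to 62, which gives 62 / 10 = 6.2.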
Example:
The following are random data on the number of fabric conditioners sold by a grocery store in 10 days: 34, 22, 23, 27, 16, 35, 25, 18, 33 and 37.
For grouped data, the mean deviation or average deviation is determined by the following procedure:
5.2 MEASURES OF VARIABILITY (DISPERSION): VARIANCE
Example:
Determine the variance of the following 10 sample data: 10, 12, 9, 18, 14, 16, 14, 18, 19 and 20
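The variance of the ten values in this example can be checked with Python's statistics module. The variance formula is not shown in this excerpt. Because the values are described as sample data, the sketch assumes the sample variance (divisor n - 1) is intended and prints the population variance (divisor n) for comparison.

```python
import statistics

# The 10 sample values from the variance example.
data = [10, 12, 9, 18, 14, 16, 14, 18, 19, 20]

mean = statistics.mean(data)                          # 15
squared_deviations = sum((x - mean) ** 2 for x in data)  # 132

sample_variance = statistics.variance(data)       # 132 / 9, about 14.67
population_variance = statistics.pvariance(data)  # 132 / 10 = 13.2

print(mean, squared_deviations)
print(round(sample_variance, 2), round(population_variance, 2))
```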