Statistics Modules


History of Statistics

• The term statistics came from the Latin phrase “ratio status”, which means the study of practical politics or the statesman’s art.
• In the middle of the 18th century, the term statistik (a term due to Achenwall) was used, a German term defined as “the political science of several countries.”
• From statistik it became statistics, defined as a statement in figures and facts of the present condition of a state.

Examples:
1) Comparing the effects of five kinds of fertilizers on the yield of a particular variety of corn.
2) Determining the income distribution of Filipino families.
3) Comparing the effectiveness of two diet programs.
4) Prediction of daily temperatures.
5) Evaluation of student performance.

Hence, Statistics is a scientific body of knowledge that deals with collection of data,
organization or presentation of data, analysis and interpretation of data.
Descriptive Statistics may answer questions such as:
1. How many male and female students are interested in Market Research?
2. What are the highest and lowest scores obtained in the standardized admission exam?
3. What are the characteristics of the most successful students according to research?
4. Which group of employees has produced more outputs?
Inferential Statistics draws inferences about the population based on the data gathered from samples using the techniques of descriptive statistics. Descriptive statistics is therefore the backbone of inferential statistics. It may answer questions like:

1. Is there a significant difference between the performance of the male and female students in
statistics?
2. Is there a significant difference between the proportion of students who are interested in taking COURSERA online courses and those who are not?
3. Is there a significant correlation between educational attainment and job satisfaction?

With inferential statistics, we are trying to reach conclusions that extend beyond the immediate data alone. We may also use inferential statistics to judge the probability that an observed difference between groups is a dependable one, or one that might have happened due to chance. It is a matter of deciding between reality and coincidence.
Key Definitions

A universe is the collection of things or observational units under consideration.

A population is the set of all possible values of the variable.

Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha) and β (beta).

Example: There are 8,756 math education students enrolled in Davao Region this school year. The average age of math education students is 20.
N = 8,756 and μ = 20 are parameters because they both describe the population.
A sample is a small portion or part of a population; it is a representative of the population in a research study.

Statistics are numerical measures that describe a sample.

Example: Out of the 8,756 students enrolled in Education (major in Math), 2,890 are male. The average age of male students is 21.

n = 2,890 and x̄ = 21 are statistics because they both describe the sample.
A sample is expected to mirror the population from which it comes; however, there is no
guarantee that any sample will be precisely representative of the population from which it was
drawn.

Variable - a variable is a characteristic of objects, people, or events that can take on different values. It can vary in quantity (e.g. weight of people) or quality (e.g. hair color of people). Variables can be classified in different ways.

Types of Variables. There are basically two types of random variable yielding two types of data:
qualitative and quantitative.

VARIABLES
  Qualitative
  Quantitative
    Discrete
    Continuous

Qualitative variable (non-numerical values) - a variable that is conceptualized and analyzed as distinct categories, with no continuum implied. It is also termed a categorical variable: observations are put in the same or different classes, each class being considered as possessing some common characteristic that is not shared by those in other classes.

Example: eye color, gender, occupation, religious preference.

Quantitative variable (numerical values) - a variable that is conceptualized and analyzed along an implied continuum. It differs in amount or degree.

a. Discrete (countable) - a variable which consists of either a finite number of values or a countable number of values.

Example: number of courses, number of siblings, number of laptop brands

b. Continuous (measurable) - a variable which can assume any of an infinite number of values, and can be associated with points on a continuous line interval.

Example: height, weight, math aptitude

c. Constant - a constant is a characteristic of objects, people, or events that does not vary. For example, the temperature at which water boils (100°C) is a constant.

In the broadest sense, all collected data are “measured” in some form. For example, even discrete quantitative data can be thought of as arising by a process of “measurement through counting.” There are four widely recognized levels of measurement: nominal, ordinal, interval and ratio.

Levels of Measurement

1. Nominal level of measurement uses categories that are mutually exclusive and exhaustive. It is used to differentiate classes or categories for purely classification or identification purposes. It is the weakest form of measurement because no attempt can be made to account for differences within a particular category or to specify any ordering or direction across the various categories. Nominal data are discrete variables.

Mutually Exclusive is a property of a set of categories such that an individual or object is included in only one category.

Exhaustive is a property of a set of categories such that each individual or object must appear in a category.

2. Ordinal level of measurement is used in ranking. It is a somewhat stronger form of measurement because an observed value classified into one category is said to possess more of the property being scaled than does an observed value classified into another category. Nevertheless, within a particular category no attempt is made to account for differences between the classified values. Moreover, ordinal scaling is still a weak form of measurement because no meaningful numerical statements can be made about the differences between the categories. That is, the ordering implies only which category is “greater” or “lesser”, not how much “greater” or “lesser”. Ordinal data are discrete variables.

Qualitative Variable          Categories
Student class designation     Freshman, Sophomore, Junior, Senior
Faculty Rank                  Instructor, Assistant Professor, Associate Professor
Hotel Ratings                 ★, ★★, ★★★, ★★★★, ★★★★★
The ordinal scale accounts for order; there is no indication of distance between positions.

3. Interval level of measurement - is used to classify, order and differentiate between classes or categories in terms of degrees of difference. Interval data are either discrete or continuous variables, e.g. temperature (in °C or °F).
Equal intervals; no absolute zero.

4. Ratio level of measurement - differs from the interval level of measurement in only one aspect: it has a true zero point (complete absence of the attribute being measured). With an absolute zero point it can be said that one observation is “twice as fast”, “half as long”, and so on, compared with another. Ratio data are either discrete or continuous variables. Has absolute zero.

Example: weight in pounds, age (in years or days), salary (in Philippine peso)
What is sampling?

Sampling is the act, process, or technique of selecting an appropriate sample, or a representative part of a population, for the purpose of determining the characteristics of the whole population.

A sample is a group in a research study on which information is obtained. A population is a group to which the results of the study are intended to apply. In almost all research, the sample is smaller than the population, since researchers rarely have access to all the members of the population.

One of the most important steps in the research process is to select the sample of
individuals who will participate as a part of the study. Sampling refers to the process of selecting
these individuals.

What is the purpose of sampling?

The purpose of sampling is for the researchers to be able to draw conclusions about the population from the study of samples. We use inferential statistics, which enables us to determine a population’s characteristics by directly observing or studying only a portion (or sample) of the population. We use a sample rather than a complete enumeration (a census) of the population because it is more convenient and cheaper to observe a small part rather than the whole.

WHY DO WE USE SAMPLES?

1. Reduced Cost
2. Greater Speed or Timeliness
3. Greater Efficiency and Accuracy
4. Greater Scope
5. Convenience
6. Necessity
7. Ethical Considerations

TWO TYPES OF SAMPLES


1. Probability sample
2. Non-probability sample

PROBABILITY SAMPLES
Samples are obtained using some objective chance mechanism, thus involving
randomization.
They require the use of a complete listing of the elements of the universe called the
sampling frame.
The probabilities of selection are known.
They are generally referred to as random samples.
They allow drawing of valid generalizations about the universe/population.
1. Simple Random Sampling is a process of selecting a sample of size n from the population via random numbers or through a lottery.
2. Systematic Sampling is a process of selecting every kth element in the population until the desired number of subjects or respondents is attained.
Example: Suppose we have the data shown below and we want to select every 5th element on the list.
23 34 12 14 13 23 24 39 27 23
12 15 16 23 26 28 23 22 19 34
24 22 18 30 23 24 17 18 15 12
Therefore, the samples taken at every 5th position from left to right are 13, 23, 26, 34, 23 and 12.
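The selection above can be reproduced with a short Python sketch. The function name `systematic_sample` is only illustrative, and the sketch assumes counting starts at the first element of the list, as in the example; in practice a random starting position is often chosen first.

```python
def systematic_sample(values, k):
    """Return every k-th element of values, counting positions from 1."""
    return values[k - 1::k]

# The 30 data values from the example, read row by row.
data = [
    23, 34, 12, 14, 13, 23, 24, 39, 27, 23,
    12, 15, 16, 23, 26, 28, 23, 22, 19, 34,
    24, 22, 18, 30, 23, 24, 17, 18, 15, 12,
]

sample = systematic_sample(data, 5)
print(sample)  # [13, 23, 26, 34, 23, 12]
```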
3. Stratified sampling is a process of subdividing the population into subgroups or strata and
drawing members at random from each subgroup or stratum.
Example: Given the population of a certain university and a target sample population of 5455,
determine the sample size of each subgroup or courses.
Field of Specialization Population
Education 6,000
Agricultural Engineering 500
Information Technology 2,000
Agribusiness 1,000
Accountancy 2,500
Total 12,000

To determine the sample size, we will use Slovin’s formula:
$n = \frac{N}{1 + Ne^2}$
where: n = the desired sample size
N = population size
e = margin of error
Example 1: Find n if N = 12,000 and e = 1%.
$n = \frac{N}{1 + Ne^2} = \frac{12{,}000}{1 + 12{,}000(0.01)^2} = \frac{12{,}000}{1 + 1.2} = \frac{12{,}000}{2.2} \approx 5{,}455$

So, the sample size at a 1% margin of error is 5,455.
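As a quick check of Slovin’s formula, here is a minimal Python sketch. The function name `slovin_sample_size` is hypothetical, and rounding the result up to the next whole respondent is an assumption of this sketch. For N = 12,000 the formula gives about 5,455 at e = 0.01 and about 2,069 at e = 0.02.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up (assumed convention)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

print(slovin_sample_size(12_000, 0.01))  # 5455
print(slovin_sample_size(12_000, 0.02))  # 2069
```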


To determine the sample size in each subgroup, we multiply the total sample size by each subgroup’s share (percentage) of the population. The computation is shown in the last column of the table below.

Field of Specialization     Population   Percentage   Sample Size   Found by
Education                   6,000        50.00        2,728         0.50 x 5455
Agricultural Engineering    500          4.17         227           0.04167 x 5455
Information Technology      2,000        16.67        909           0.16667 x 5455
Agribusiness                1,000        8.33         455           0.08333 x 5455
Accountancy                 2,500        20.83        1,136         0.20833 x 5455
Total                       12,000       100.00       5,455
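The proportional allocation in the table can be sketched in Python as follows. The function name `proportional_allocation` is illustrative only. The sketch rounds each stratum’s share to the nearest whole number; with other data the rounded sizes may not add up exactly to the target, so the total should be checked.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

strata = {
    "Education": 6000,
    "Agricultural Engineering": 500,
    "Information Technology": 2000,
    "Agribusiness": 1000,
    "Accountancy": 2500,
}

allocation = proportional_allocation(strata, 5455)
for name, size in allocation.items():
    print(name, size)
print("Total", sum(allocation.values()))
```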
4. Cluster Sampling is a process of selecting clusters from a population which is very large or widely spread out over a wide geographical area.
Example: Suppose we want to know the opinion of the residents of Davao Region regarding the COVID pandemic. We may use cluster sampling by subdividing the region into districts and then selecting at random the districts to be used as the sample.

NON-PROBABILITY SAMPLES
It is a sampling procedure in which samples are selected in a deliberate manner with little or no attention to randomization; it is also called non-probability sampling.

1. Convenience Sampling is a process of selecting a group of individuals who are (conveniently) available for study.
Example: A researcher may only include close friends and clients to be included in the
sample population.
2. Purposive sampling is a process of using judgment to select a sample which the researcher believes, based on prior information, will provide the data needed. The disadvantage of purposive sampling is that the researcher’s judgment may be in error: he or she may not be correct in estimating the representativeness of a sample or their expertise regarding the information needed. It is also called judgment sampling.
Example: A human resource director interviews the qualified applicants for a supervisory position. (Note: Qualified applicants are selected by the HR Director based on his own judgment.)
3. Quota sampling is applied when an investigator collects information from an assigned number, or quota, of individuals from one of several sample units fulfilling certain prescribed criteria or belonging to one stratum. Its advantage is that it is cheaper to administer.
Example: To get the most popular noontime show, each field researcher is given a quota
of 200 viewers per area.
4. Snowball sampling is a technique in which one or more members of a population are located
and used to lead the researchers to other members of the population.
Example: Imagine attempting to obtain a frame that includes all homeless people in Davao Region. To obtain a sample of homeless individuals, the researcher will interview individuals on the street or at a homeless shelter.

After the research problem has been laid, the next step is to determine the methods to collect
data. Here are the five basic methods in collecting data:

1. Direct or Interview Method. It is a face-to-face encounter between the interviewer and the interviewee. The interview may vary according to the preference of either or both parties. However, this method is time-consuming, expensive, and has limited field coverage.
2. Indirect or Questionnaire Method. Unlike the direct method, this method utilizes questionnaires to obtain information. It can be done by mail, or the questionnaires can be hand-carried to the intended respondents.
3. Registration Method. This method of gathering information is governed by laws. Examples: birth certificates, death certificates, licenses, etc.
4. Observation Method. Data pertaining to the behaviors of an individual or a group of individuals at the time of occurrence of a given situation are best obtained by observation. One limitation of this method is that observation is made only at the time of occurrence of the appropriate events.
5. Experiment Method. This is used to determine the cause-and-effect relationship of certain phenomena under controlled conditions. This method is usually employed by scientific researchers.

1. Textual Method. This method presents the collected data in narrative and paragraph forms.
2. Tabular Method. This method presents the collected data in tables, arranged in an orderly way in rows and columns for easier and more comprehensive comparison of figures.
3. Graphical Method. This method presents the collected data in visual or pictorial form to get a
clear view of data (e.g. histogram, pie chart, pareto chart, pictograph, etc.)
When conducting statistical research, an investigation or a study, the researcher must gather data for the particular variable under investigation. To describe situations, make conclusions, and draw inferences about events, the researcher must organize the data gathered in some meaningful way. The easiest and most widely used way of organizing data is to construct a frequency distribution. A frequency distribution is a grouping of the data into categories showing the number of observations in each of the non-overlapping classes.
After organizing the data, the next move of the researcher is to present the data so they can be understood easily by those who will benefit from reading the study. The most useful method of presenting the data is by constructing graphs and charts. There are a number of ways to plot graphs and charts, and each one has a specific purpose.
TEXTUAL PRESENTATION OF DATA
Good statistical presentation requires making it easy for readers to understand and interpret the data, and to identify key patterns or trends.
Data presented in paragraph or in sentences, are said to be in textual form. This includes
enumeration of important characteristics, emphasizing the most significant features and
highlighting the most striking attributes of the set of data. Please see example below.
The data are Math test scores of 15 students on a 50-item test: 47, 48, 49, 42, 42, 36, 38, 40, 35, 50, 44, 45, 45, 50, 50. Make a simple analysis by writing findings, drawing conclusions and making an inference.
Writing the data in numerical order may help to analyze the data:
35, 36, 38, 40, 42, 42, 44, 45, 45, 47, 48, 49, 50, 50, 50
Findings: The lowest score is 35, and the highest is 50. Three students got a perfect score of 50; one student each got 35, 36, 38, 40, 44, 47, 48 and 49, while two students each got 42 and 45. If the passing mark is 70% (35 out of 50), nobody failed the test.
Conclusions: I therefore conclude that the students performed well in the test.
Inference: If this trend continues, then it is likely that nobody will fail in this Math class.
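The findings above can be double-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the passing score of 35 assumes that the 70% passing mark applies to the 50 test items.

```python
from collections import Counter

scores = [47, 48, 49, 42, 42, 36, 38, 40, 35, 50, 44, 45, 45, 50, 50]
total_items = 50
passing_score = 70 * total_items // 100  # 35 items, assuming a 70% passing mark

ordered = sorted(scores)           # data in numerical order
counts = Counter(scores)           # how many students obtained each score
failed = [s for s in scores if s < passing_score]

print("Ordered:", ordered)
print("Lowest:", ordered[0], "Highest:", ordered[-1])
print("Students with 50:", counts[50])
print("Number who failed:", len(failed))
```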

Definitions

Findings are results of an investigation.


Example: Sixty-four percent of the 100 sample service crews in
randomly selected JolliDo outlets are 21 years old and below.
Conclusion: an opinion based on findings; a generalization on population based
on the result of the investigation on samples.
Example: I therefore conclude that 64% of all JolliDo employees
nationwide are 21 years old and below.
Inference: an educated guess or a meaningful prediction based on findings and
conclusions.
Example: If this trend continues, then job applicants who are 22 years
old and above have a slim chance of being accepted at JolliDo.

Though analysis can be done from the text, it is recommended to organize the data in tables for better comparison of values and quicker, better analysis of details. Furthermore, if data are presented in plain text, readers sometimes get bored; thus tables and graphs are often used.
Defining Some Terms
Before we start constructing a frequency distribution, we must define some terms that are essential for a deeper understanding of the nature of the data displayed in a frequency distribution.
Raw data are the data collected in original form.
Range is the difference between the highest value and the lowest value in a distribution.
Frequency distribution is the organization of data in tabular form, using mutually exclusive classes and showing the number of observations in each.
Class Limits are the highest and lowest values describing a class.
Class Boundaries are the upper and lower values of a class in a grouped frequency distribution; they have one more decimal place than the class limits and end with the digit 5.
Interval (width) is the distance between the class lower boundary and the class upper boundary, and it is denoted by the symbol i.
Frequency (f) is the number of values in a specific class of a frequency distribution.
Relative Frequency is the value obtained when the frequency in each class of the frequency distribution is divided by the total number of values.
Percentage is obtained by multiplying the relative frequency by 100%.
Cumulative Frequency (cf) is the sum of the frequencies accumulated up to the upper boundary of a class in a frequency distribution.
Midpoint is the point halfway between the class limits of each class and is representative of the data within that class.
A grouped frequency distribution is used when the range of the data set is large; the data must be grouped into classes, whether they are categorical data or interval data. For interval data, the classes are more than one unit in width. The procedure for constructing the frequency distribution is discussed in the succeeding sections.

Determining the Class Interval


Generally, the number of classes for a frequency distribution table varies from 5 to 20, depending primarily on the number of observations in the data set. It is preferable to have more classes as the size of a data set increases. The decision about the number of classes depends on the method used by the researcher.
1. Rule 1: One way to determine the number of classes is to use the smallest positive integer k such that $2^k \ge n$, where n is the total number of observations. Using Formula 2-1, we can obtain the ideal class interval.

Class Interval (i) = $\frac{\text{Range}}{\text{Number of Classes}} = \frac{HV - LV}{k}$ (Formula 2-1)
where: HV = highest value in the data set; k = number of classes
LV = lowest value in the data set; i = suggested class interval

2. Rule 2: Another way to determine the class interval is by Formula 2-2.

Class Interval (i) = $\frac{\text{Range}}{1 + 3.322 \log N}$ (Formula 2-2), where N = the number of observations and log is the base-10 logarithm
3. Rule 3: Another guideline to determine the class interval is to have an ideal number of classes and then apply Formula 2-3.

Class Interval (i) = $\frac{HV - LV}{\text{Number of Classes}}$ (Formula 2-3)
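Rules 1 and 2 can be compared with a small Python sketch. The function names are hypothetical, and the inputs (HV = 77, LV = 18, n = 50) are taken from the travel agency example that follows. Rounding the interval up to a whole number is left to the reader, as in the worked solution.

```python
import math

def classes_by_power_of_two(n):
    """Rule 1: smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def interval_rule_1(high, low, n):
    """Formula 2-1: range divided by the Rule 1 number of classes."""
    return (high - low) / classes_by_power_of_two(n)

def interval_rule_2(high, low, n):
    """Formula 2-2: range divided by 1 + 3.322 * log10(n)."""
    return (high - low) / (1 + 3.322 * math.log10(n))

print(classes_by_power_of_two(50))           # 6, since 2**6 = 64 >= 50
print(round(interval_rule_1(77, 18, 50), 2))
print(round(interval_rule_2(77, 18, 50), 2))
```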
Example: EPA Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The owner wants additional information on the ages of the people taking travel tours. A random sample of 50 customers who took travel tours last summer revealed these ages.

18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48

Construct a frequency distribution using Rule 2 and determine the following:

a. Range
b. Interval
c. Class limits
d. Class boundaries
e. Relative frequencies
f. Percentages
g. Cumulative frequencies
h. Midpoints
Solution:
Step 1: Arrange the raw data in ascending order.
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77

Step 2: Determine the classes.

• Find the range.
Range = HV - LV = 77 - 18 = 59
• Determine the class interval or width.
Class Interval $i = \frac{\text{Range}}{1 + 3.322 \log N} = \frac{77 - 18}{1 + 3.322 \log 50} = \frac{59}{1 + 3.322(1.698970004)} = \frac{59}{6.643978354} = 8.88 \approx 9$
Note: Round the value of the interval up to the nearest whole number if there is a remainder.
• Select a starting point for the lowest class limit. The lowest value in the data set is 18; this will serve as our starting point.
• Set the individual class limits. We add 9 to each lower class limit until we reach the required number of classes (18, 27, 36, 45, 54, 63 and 72). To obtain the upper class limits, we subtract one unit from the lower limit of the second class to obtain the upper limit of the first class. That is, 27 - 1 = 26. Then add the interval (or width) to each upper limit to obtain all the upper limits (26, 35, 44, 53, 62, 71 and 80).
• Set the class boundaries of each class. To obtain the class boundaries, subtract 0.5 from each lower class limit and add 0.5 to each upper class limit.
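The construction of class limits and boundaries in Step 2 can be sketched in Python. The function name `build_classes` is illustrative, and the sketch assumes whole-number data, so that one unit is 1 and the boundary offset is 0.5.

```python
def build_classes(lowest, highest, width):
    """Build class limits and boundaries, starting at the lowest value."""
    classes = []
    lower = lowest
    while lower <= highest:
        upper = lower + width - 1  # one unit below the next lower limit
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
        })
        lower += width
    return classes

for c in build_classes(18, 77, 9):
    print(c["limits"], c["boundaries"])
```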
Step 3: Tally the raw data.
Class Limits  Class Boundaries  Tally
18-26  17.5-26.5  |||
27-35  26.5-35.5  |||||
36-44  35.5-44.5  ||||| ||||
45-53  44.5-53.5  ||||| ||||| ||||
54-62  53.5-62.5  ||||| ||||| |
63-71  62.5-71.5  ||||| |
72-80  71.5-80.5  ||

Step 4: Convert the tallied data into numerical frequencies.


Class Limits  Class Boundaries  Tally  Frequency
18-26  17.5-26.5  |||  3
27-35  26.5-35.5  |||||  5
36-44  35.5-44.5  ||||| ||||  9
45-53  44.5-53.5  ||||| ||||| ||||  14
54-62  53.5-62.5  ||||| ||||| |  11
63-71  62.5-71.5  ||||| |  6
72-80  71.5-80.5  ||  2

Step 5: Determine the relative frequency. (Formula: $\frac{f}{N}$)
Class Limits  Class Boundaries  Frequency  Relative Frequency
18-26 17.5-26.5 3 0.06
27-35 26.5-35.5 5 0.10
36-44 35.5-44.5 9 0.18
45-53 44.5-53.5 14 0.28
54-62 53.5-62.5 11 0.22
63-71 62.5-71.5 6 0.12
72-80 71.5-80.5 2 0.04
50
Step 6: Determine the percentage. (Formula: $\frac{f}{N} \times 100\%$)
Class Limits  Class Boundaries  Frequency  Relative Frequency  Percentage
18-26 17.5-26.5 3 0.06 6%
27-35 26.5-35.5 5 0.10 10%
36-44 35.5-44.5 9 0.18 18%
45-53 44.5-53.5 14 0.28 28%
54-62 53.5-62.5 11 0.22 22%
63-71 62.5-71.5 6 0.12 12%
72-80 71.5-80.5 2 0.04 4%
50 100%

Step 7: Determine the midpoints. (Formula: $\frac{\text{lower class limit} + \text{upper class limit}}{2}$)
Class Limits  Class Boundaries  Frequency  Relative Frequency  Percentage  Midpoint
18-26 17.5-26.5 3 0.06 6% 22
27-35 26.5-35.5 5 0.10 10% 31
36-44 35.5-44.5 9 0.18 18% 40
45-53 44.5-53.5 14 0.28 28% 49
54-62 53.5-62.5 11 0.22 22% 58
63-71 62.5-71.5 6 0.12 12% 67
72-80 71.5-80.5 2 0.04 4% 76
50 100%
e.g (18+26)/2 = 22
(27+35)/2 = 31

Step 8: Determine the cumulative frequencies.


Class Limits  Class Boundaries  Frequency  Relative Frequency  Percentage  Midpoint  Cumulative Frequency
18-26 17.5-26.5 3 0.06 6% 22 3
27-35 26.5-35.5 5 0.10 10% 31 8
36-44 35.5-44.5 9 0.18 18% 40 17
45-53 44.5-53.5 14 0.28 28% 49 31
54-62 53.5-62.5 11 0.22 22% 58 42
63-71 62.5-71.5 6 0.12 12% 67 48
72-80 71.5-80.5 2 0.04 4% 76 50
50 100%
e.g., 3
3 + 5 = 8
3 + 5 + 9 = 17
3 + 5 + 9 + 14 = 31
3 + 5 + 9 + 14 + 11 = 42
3 + 5 + 9 + 14 + 11 + 6 = 48
3 + 5 + 9 + 14 + 11 + 6 + 2 = 50
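Steps 4 to 8 can be reproduced with the following Python sketch, which counts the frequency of each class and derives the relative frequency, percentage, midpoint and cumulative frequency. The function name `frequency_table` and the dictionary keys are illustrative choices, not part of the module.

```python
# Ages of the 50 customers from the example, in ascending order.
ages = [
    18, 19, 24, 27, 28, 29, 31, 34, 36, 36,
    37, 38, 39, 39, 40, 42, 44, 45, 46, 46,
    47, 48, 48, 48, 49, 49, 50, 51, 51, 52,
    53, 54, 54, 55, 56, 57, 58, 58, 59, 60,
    61, 62, 63, 64, 66, 67, 68, 70, 74, 77,
]
class_limits = [(18, 26), (27, 35), (36, 44), (45, 53),
                (54, 62), (63, 71), (72, 80)]

def frequency_table(values, limits):
    """Return one row per class with the quantities from Steps 4 to 8."""
    n = len(values)
    rows = []
    cumulative = 0
    for lower, upper in limits:
        f = sum(1 for v in values if lower <= v <= upper)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "frequency": f,
            "relative": f / n,
            "percentage": 100 * f / n,
            "midpoint": (lower + upper) / 2,
            "cumulative": cumulative,
        })
    return rows

table = frequency_table(ages, class_limits)
for row in table:
    print(row)
```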

Now that the data are arranged in a frequency distribution table, it is easier to give findings, draw informed conclusions and make sound inferences.
Findings:
A. Basic findings are those which you can see directly from the table;

Stem-and-Leaf Plot
A statistician named John Tukey introduced the stem-and-leaf plot. The objective of this method is to overcome, to some extent, the loss of actual observations brought about by the histogram. The advantage of the stem-and-leaf plot over the histogram is that we can see the actual observations.
The stem is the leading digit or digits and the leaf is the trailing digit. The stem is placed
at the first column and the leaf at the second column.
Example: EPA Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The owner wants additional information on the ages of the people taking travel tours. A random sample of 50 customers who took travel tours last summer revealed these ages.

18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77

Construct a stem-and-leaf plot.


Solution:
The stems (leading digits) for the raw data are 1, 2, 3, 4, 5, 6, 7. The leaves for each stem (trailing digits) are recorded in the same row and are rank-ordered to form a stem-and-leaf plot.

Stem (tens digit)   Leaf (units digit)
1                   8, 9
2                   4, 7, 8, 9
3                   1, 4, 6, 6, 7, 8, 9, 9
4                   0, 2, 4, 5, 6, 6, 7, 8, 8, 8, 9, 9
5                   0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9
6                   0, 1, 2, 3, 4, 6, 7, 8
7                   0, 4, 7
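A stem-and-leaf plot like the one above can be generated with a short Python sketch. The function name `stem_and_leaf` is illustrative, and the sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem) and list units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    print(stem, "|", ", ".join(str(leaf) for leaf in plot[stem]))
```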

Graphing Frequency Distribution


When the data set contains a large number of values, drawing conclusions from an ordered array or stem-and-leaf plot is often difficult. We will need graphs or charts in such situations. There are a number of graphs or charts that show numerical data visually. These include the histogram, the frequency polygon and the cumulative frequency polygon (ogive).
In this section, we discuss several graphical methods that are used for interval data. The most important of these graphical methods is the histogram. The histogram is a powerful graphical technique used to summarize interval data, and it also helps explain an important aspect of probability.
A. Histogram
A histogram is a graph in which the classes are marked on the horizontal axis (x-axis) and
the class frequencies on the vertical axis (y-axis). The height of the bars represents the class
frequencies, and the bars are drawn adjacent to each other. Nevertheless, the histogram focuses
on the frequency of each class and sacrifices whatever information was contained in the actual
observations.
B. Frequency Polygon
A frequency polygon is a graph that displays the data using points which are connected by
lines. The frequencies are represented by the heights of the points at the midpoints of the classes.
The vertical axis represents the frequency of the distribution while the horizontal axis represents
the midpoints of the frequency distribution.
C. Cumulative frequency polygon
A cumulative frequency polygon or ogive is a graph that displays the cumulative
frequencies for the classes in a frequency distribution. The vertical axis represents the cumulative
frequency of the distribution while the horizontal axis represents the upper class boundaries (real
upper limits) of the frequency distribution.
Example:
Class Limits  Class Boundaries  Frequency  Relative Frequency  Percentage  Midpoint  Cumulative Frequency
18-26 17.5-26.5 3 0.06 6% 22 3
27-35 26.5-35.5 5 0.10 10% 31 8
36-44 35.5-44.5 9 0.18 18% 40 17
45-53 44.5-53.5 14 0.28 28% 49 31
54-62 53.5-62.5 11 0.22 22% 58 42
63-71 62.5-71.5 6 0.12 12% 67 48
72-80 71.5-80.5 2 0.04 4% 76 50
50 100%
Construct a bar chart, frequency polygon and cumulative frequency polygon. What conclusions can you reach based on the information presented in the histogram?
Solution:
a. Constructing a bar chart.
1. Find the class limits of each class.
2. Draw and label the x-axis and y-axis.
3. Represent the frequency on the y-axis and the class limits on the x-axis.
4. Use the frequency to represent the height and draw the vertical bars.
5. The class frequencies are scaled along the vertical axis and the class limits along the
horizontal axis.

Figure 2.1: Bar Chart of Age of Travellers (x-axis: Age, class limits; y-axis: Frequency)


b. Constructing a Frequency Polygon
1. Find the midpoint of each class.
2. Draw and label the x-axis and y-axis.
3. Represent the frequency on the y-axis and the midpoint on the x-axis.
4. Connect adjacent points with line segments. Draw a line back to the x-axis at the beginning
and end of the graph.

Figure 2.2: Frequency Polygon of Age of Travellers (x-axis: Age, class midpoints; y-axis: Frequency)

c. Constructing a Cumulative Frequency Polygon (ogive)


1. Draw and label the x-axis and y-axis.
2. Represent the frequency on the y-axis and the upper class boundaries on the x-axis.
3. Connect adjacent points with line segments.

Figure 2.3: Ogive for Age of Travellers (x-axis: Age, upper class boundary; y-axis: Cumulative Frequency)

Other Types of Graphs


As discussed in the previous section, the only allowable calculation on nominal data is to count the frequency of each value of the variable. We can graphically display the counts in three ways: pareto charts, bar charts, and pie charts. This section also shows how to construct a time series graph, a pictograph and a scatter plot.
A. Pareto Chart
A pareto chart is a graph used to represent a frequency distribution for categorical (nominal-level) data. Frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
B. Bar Chart (Bar Graph)
A bar chart is similar to a histogram. The bases of the rectangles are arbitrary intervals whose centers are codes. The height of each rectangle represents the frequency of that category. It is also applicable to categorical (or nominal) data.
C. Pie Chart (Circle Graph)
A pie chart is a circle divided into portions that represent the relative frequencies (or percentages) of the data belonging to different categories. The data in a pie chart should be categorical or nominal-level.
D. Time Series Graph
A time series graph represents data that occur over a specific period of time under observation. In addition, it shows a trend or pattern of increase or decrease over the period of time.
E. Pictograph
A pictograph immediately suggests the nature of the data being shown. It is a
combination of the attention-getting quality and the accuracy of the bar chart. Appropriate
pictures arranged in a row (sometimes in a column) present the quantities for comparison.
Now, we will illustrate how to construct the Pareto chart, histogram, pie chart, time series
graph, pictograph and scatter plot using the succeeding examples.
Example 1: Using the information in the table about the favorite snacks of 870 youths, construct
a Pareto chart, a histogram and a pie chart.

Products Sales
Junk Foods 135
Candy 250
Ice Cream 185
Chocolate 210
Others 90

Solution:
a. Constructing a Pareto Chart
Steps: 1. Arrange the data from highest to lowest according to frequency.

Products Sales
Candy 250
Chocolate 210
Ice Cream 185
Junk Foods 135
Others 90
2. Draw and label the x-axis (Products) and y-axis (Sales).
3. Construct the chart by arranging the frequencies from highest to lowest, from left to
right. Draw bars of the same width, with heights corresponding to the frequencies.
Figure 2.4: Pareto Chart of Favorite Snacks
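
The ordering step can be sketched in Python as follows. This is only an illustration of the steps above using the Example 1 table; the text bars (one `#` per 10 units) are an assumption made for display and stand in for the drawn chart.

```python
# Favorite-snack counts from the table in Example 1.
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}

# Step 1: arrange the categories from highest to lowest frequency.
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Steps 2-3: one bar per category, from left to right in that order.
# Here each '#' stands for 10 units so the bars can be shown as text.
for product, count in pareto_order:
    print(f"{product:<11}{'#' * (count // 10)} {count}")
```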

b. Constructing a Pie Chart

Steps: 1. Since there are 360° in a circle, the frequency of each class must be converted into a proportional
part of the circle. This conversion is done by applying the formula

$$\text{Degrees} = \left(\frac{f}{n}\right)\left(360^\circ\right)$$

where f = frequency; n = sum of frequencies

Hence, the following conversions are obtained. The degrees should total 360°.

Candy: $\left(\frac{250}{870}\right)\left(360^\circ\right) \approx 103^\circ$

Chocolate: $\left(\frac{210}{870}\right)\left(360^\circ\right) \approx 87^\circ$

2. Each frequency must also be converted to a percentage, and the percentages should total 100%. This
conversion is done by applying the formula

$$\text{Percentage} = \left(\frac{f}{n}\right)\left(100\%\right)$$

where f = frequency, and n = sum of frequencies

Candy: $\left(\frac{250}{870}\right)\left(100\%\right) \approx 29\%$

Junk Foods: $\left(\frac{135}{870}\right)\left(100\%\right) \approx 16\%$

Chocolate: $\left(\frac{210}{870}\right)\left(100\%\right) \approx 24\%$

Ice Cream: $\left(\frac{185}{870}\right)\left(100\%\right) \approx 21\%$

3. Using a protractor, graph each section and write its name and appropriate percentage, as shown in
Figure 2.5.

Figure 2.5: Pie Chart of Favorite Snacks
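
The two conversions can be checked for every category with a short Python sketch. It applies the degree and percentage formulas to the Example 1 table and rounds to whole numbers, as in the solution; the dictionary names are illustrative.

```python
# Favorite-snack counts from the table in Example 1.
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}

n = sum(sales.values())  # sum of frequencies (870)

# Degrees = (f / n) * 360, rounded to the nearest whole degree.
degrees = {product: round(f / n * 360) for product, f in sales.items()}

# Percentage = (f / n) * 100, rounded to the nearest whole percent.
percentages = {product: round(f / n * 100) for product, f in sales.items()}

for product in sales:
    print(f"{product:<11}{degrees[product]:>4} degrees {percentages[product]:>4}%")

print("Total:", sum(degrees.values()), "degrees,", sum(percentages.values()), "%")
```

For this data set the rounded values total exactly 360° and 100%. With other data, rounding may leave the totals slightly off, in which case one sector is usually adjusted by hand.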

Example 2: Using the information in the table about the dollar to peso exchange rate from January to
December of 2019, construct a time series graph.

Solution:
Steps: 1. Draw and label the x-axis and y-axis.
2. Label the x-axis for months and the y-axis for Peso per US dollar.
3. Plot each point according to the table.
4. Draw line segments connecting adjacent points.

Figure 2.6: Time Series Graph of Peso-Dollar Rate


Example 3: EPAR Realty Inc. is a real estate company that develops housing in Rizal province. The
information in the table shows the number of houses constructed from 2006 to 2010. Construct a
pictograph.
Year            2006  2007  2008  2009  2010
No. of Houses    400   250   600   550   700

Solution:

Steps: 1. Draw and label the x-axis and y-axis.
2. Label the x-axis for years and the y-axis for Number of Houses.
3. Draw a house to represent the number of houses.

Legend: one house symbol = 200 houses


Figure 2.7: Pictograph of Number of Houses
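
To decide how many house symbols to draw for each year, each count is divided by the value of one symbol in the legend (200 houses). The sketch below does this for the Example 3 table; the names `houses` and `UNIT` are illustrative.

```python
# Number of houses per year, from the table in Example 3.
houses = {2006: 400, 2007: 250, 2008: 600, 2009: 550, 2010: 700}

UNIT = 200  # one house symbol represents 200 houses (from the legend)

# Number of symbols per year; a fractional part means a partial symbol is drawn.
symbols = {year: count / UNIT for year, count in houses.items()}

for year, n_symbols in symbols.items():
    print(year, n_symbols)
```

For example, 2007 needs 1.25 symbols (one full house and a quarter of another), while 2010 needs 3.5.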
Example 4: The owner of a chain of halo-halo stores would like to study the effect of
atmospheric temperature on sales during the summer season. A random sample of 12 days is
selected, with the results given as follows.
Day                 1    2    3    4    5    6    7    8    9   10   11   12
Temperature (°F)   79   76   78   84   90   83   93   94   97   85   88   82
Total Sales       147  143  147  168  206  155  192  211  209  187  200  150
Plot the data on a scatter diagram.
Solution:
Steps: 1. Draw and label the x-axis and y-axis.
2. Label the x-axis for Temperature and y-axis for Sales.
3. Plot the point for each ordered pair (temperature, sales) in the Cartesian coordinate system.
Figure 2.8: Scatter Plot of Sales
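
As a small sketch of step 3, the ordered pairs for the scatter plot can be formed by pairing each day's temperature with its sales. The minimum and maximum of each variable are also computed, since they suggest a suitable scale for each axis. The variable names are illustrative.

```python
# Data from the table in Example 4 (12 sampled days).
temperature_f = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
total_sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

# Each ordered pair (temperature, sales) is one point on the scatter plot.
points = list(zip(temperature_f, total_sales))

# Smallest and largest values, used to choose the scale of each axis.
x_range = (min(temperature_f), max(temperature_f))
y_range = (min(total_sales), max(total_sales))

for point in points:
    print(point)
print("x-axis range:", x_range, "y-axis range:", y_range)
```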

Guidelines for Developing Good Graphs / Charts


Good graphical displays tell what the data are conveying. Sadly, many graphs or charts
shown in newspapers and magazines are misleading, incorrect, or so complicated that they should not be
used. To develop good graphs / charts, there are some guidelines that need to be borne in mind,
such as:
1. The graph/chart should include a title.
2. The scales for all axes should be included.
3. The scale on the y-axis should start at zero.
4. The graph/chart should not distort the data.
5. The x-axis and y-axis should be properly labeled.
6. The graph/chart should not contain unnecessary decorations.
7. The simplest possible graph/chart should be used for any data set.

SECOND WEEK OUTPUT


This will be done in pairs! Please choose your partner and work together in doing this
output.
You are going to gather at least TWO different sets of data from your classmates.
Examples:
1. Age and Height
2. Gender and Favorite Cartoon Character.

Requirements are the following:


1. n ≥ 30
2. Write a brief explanation of the nature of the data.
   a. What data did you gather?
   b. Why did you gather such data; what do you want to know from your
      classmates?
3. Organize them in a frequency distribution and/or a contingency table.
   a. Table number and title
   b. All columns should have headers
4. Construct a graph (using one graph is enough).
   a. Indicate figure number and title
   b. Show percentages for a pie chart
   c. Both axes should be properly labeled if bar graphs will be used.
5. Write at least 3 relevant findings based on the results of the data gathered.
6. Give conclusions based on findings.

Abstraction: Any data set can be characterized by measuring its central tendency. A measure of central
tendency, commonly referred to as an average, is a single value that represents a data set. Its purpose is
to locate the center of a data set. The arithmetic mean, often called the mean, is the most frequently
used measure of central tendency. The mean is the only common measure in which all values play an
equal role; that is, to determine its value you need to consider every value in the data set. The mean
is appropriate for determining the central tendency of interval or ratio data. The symbol $\bar{x}$,
called “x bar”, is used to represent the mean of a sample, and the symbol $\mu$, called “mu”, is used to
denote the mean of a population.

A. Properties of Mean
1. A set of data has only one mean.
2. The mean can be applied to interval and ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. The mean is affected by extremely small or large values in a data set.
6. The mean cannot be computed for the data in a frequency distribution with an open-ended class.

B. Mean for Ungrouped Data

Sample Mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

Population Mean:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

Example 1: The daily rates of eight employees of a certain municipality of Davao del Sur are Php 550, 420,
560, 500, 700, 670, 860, and 480. Find the mean daily rate of the employees.

Solution:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{550 + 420 + 560 + 500 + 700 + 670 + 860 + 480}{8} = \frac{4{,}740}{8} = 592.50$$

The sample mean daily rate of the employees is Php 592.50.

Example 2: Find the population mean of the ages of the middle-management employees of a
certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58 and 55.

Solution:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{53 + 45 + 59 + 48 + 54 + 46 + 51 + 58 + 55}{9} = \frac{469}{9} = 52.11$$

The population mean age of the middle-management employees is 52.11.

C. Sample Mean for the Grouped Data

Sample Mean:

$$\bar{x} = \frac{\sum fx}{n}$$

where:
$\bar{x}$ = sample mean
f = frequency
x = the value representing each class (the class midpoint)
$\sum fx$ = sum of all the products of f and x
n = total number of values in the sample

Example 3: Using the example provided in Module 2, EPA Travel Agency, determine the mean of the
frequency distribution of the ages of 50 people taking travel tours.

Solution:

Class Limits   Frequency (f)   Midpoint (x)   fx
18-26                3              22          66
27-35                5              31         155
36-44                9              40         360
45-53               14              49         686
54-62               11              58         638
63-71                6              67         402
72-80                2              76         152
Total               50                       2,459

Applying the formula to obtain the value of the sample mean:

$$\bar{x} = \frac{\sum fx}{n} = \frac{2{,}459}{50} = 49.18$$

Weighted Mean, Geometric Mean and Combined Mean

A. Weighted Mean
The weighted mean is particularly useful when various classes or groups contribute differently to the
total. The weighted mean is found by multiplying each value by its corresponding weight and dividing by
the sum of the weights.

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$$

where:
$\bar{x}_w$ = weighted mean
$w_i$ = corresponding weight
$x_i$ = the value of any particular observation or measurement

Example 1: At the Mathematics Department of Davao del Sur State College there are 18 instructors,
12 assistant professors, 7 associate professors, and 3 professors. Their monthly salaries are Php 30,500,
33,700, 38,600 and 45,000, respectively. What is the weighted mean salary?

Solution:

$$\bar{x}_w = \frac{18(30{,}500) + 12(33{,}700) + 7(38{,}600) + 3(45{,}000)}{18 + 12 + 7 + 3} = \frac{1{,}358{,}600}{40} = 33{,}965$$

The weighted mean salary is Php 33,965.00.

B. Geometric Mean
The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n
numbers. There are two main applications of the geometric mean: the first is to average percents, indexes,
and relatives; the second is to establish the average percent increase in production, sales or other
business transactions or economic series from one period of time to another.

$$GM = \sqrt[n]{(x_1)(x_2)(x_3)\cdots(x_n)}$$

$$GM = \sqrt[n-1]{\frac{\text{value at the end of the period}}{\text{value at the start of the period}}} - 1$$

where:
GM = geometric mean
$x_i$ = the value of any particular observation or measurement
n = number of observations

Note: The geometric mean cannot be computed if one of the numbers is zero or negative.

Example 2: Suppose the profits earned by the EPA Construction Company on five projects
were 5, 6, 4, 8 and 10 percent, respectively. What is the geometric mean profit?

Solution: $x_1 = 5$, $x_2 = 6$, $x_3 = 4$, $x_4 = 8$, $x_5 = 10$, n = 5

$$GM = \sqrt[5]{(5)(6)(4)(8)(10)} = \sqrt[5]{9{,}600} = 6.26$$

The geometric mean profit is 6.26 percent.

Example 3: Badminton as a sport grew rapidly in 2008. From January to December 2008, the number of
badminton clubs in Metro Manila increased from 20 to 155. Compute the mean monthly percent increase
in the number of badminton clubs.

Solution:

$$GM = \sqrt[12-1]{\frac{155}{20}} - 1 = \sqrt[11]{7.75} - 1 = 0.2046$$

Hence, the number of badminton clubs increased at a rate of about 0.2046, or 20.46%, per month.

C. Combined Mean
The combined mean is the grand mean of all the values in all groups when two or more groups are
combined. There will be times when we want to determine a mean from a number of other means. To compute
the combined mean for a group of means, we must know the size of each group, N. The formula is

$$\bar{X}_{CM} = \frac{\sum_{i=1}^{n} N_i \bar{X}_i}{\sum_{i=1}^{n} N_i} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + \dots + N_n \bar{X}_n}{N_1 + N_2 + \dots + N_n}$$

where:
$\bar{X}_{CM}$ = combined mean
$\bar{X}_i$ = sample means
$N_i$ = sample sizes

Example 4: A study comparing the typical household incomes for 3 districts in the City of Davao was
initiated to see where differences in household incomes lie across districts. The mean household incomes
for a sample of 45 different families in three districts of Davao are shown in the following table.
Calculate a combined mean to obtain the average household income for all 45 families in the Davao sample.

District 1               District 2               District 3
$\bar{X}_1$ = Php 30,400   $\bar{X}_2$ = Php 27,300   $\bar{X}_3$ = Php 42,500
$N_1$ = 12               $N_2$ = 18               $N_3$ = 15

Solution:

$$\bar{X}_{CM} = \frac{30{,}400(12) + 27{,}300(18) + 42{,}500(15)}{12 + 18 + 15} = \frac{1{,}493{,}700}{45} = 33{,}193.33$$

Thus, the combined mean household income in the three districts of Davao City is Php 33,193.33.
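
The worked examples in this section can be re-computed with a short Python sketch. The function names below (`mean`, `grouped_mean`, `weighted_mean`, `geometric_mean`, `mean_rate_of_increase`, `combined_mean`) are illustrative helpers written for this check and are not part of the module. Each applies the corresponding formula from the text under the assumption that the inputs are plain lists of numbers.

```python
from math import prod


def mean(values):
    """Arithmetic mean of ungrouped data: sum of the values over their count."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f * x over n, with x the class midpoint."""
    return weighted_mean(midpoints, frequencies)


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive numbers only")
    return prod(values) ** (1 / len(values))


def mean_rate_of_increase(start, end, n):
    """(n - 1)th root of end / start, minus 1, for n observations."""
    return (end / start) ** (1 / (n - 1)) - 1


def combined_mean(means, sizes):
    """Grand mean of several groups, weighting each group mean by its size."""
    return weighted_mean(means, sizes)


# Data taken from the worked examples in this section.
daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
midpoints = [22, 31, 40, 49, 58, 67, 76]
frequencies = [3, 5, 9, 14, 11, 6, 2]
salaries = [30500, 33700, 38600, 45000]
staff_counts = [18, 12, 7, 3]
profits = [5, 6, 4, 8, 10]
district_means = [30400, 27300, 42500]
district_sizes = [12, 18, 15]

print(mean(daily_rates))                                           # 592.5
print(round(mean(ages), 2))                                        # 52.11
print(round(grouped_mean(midpoints, frequencies), 2))              # 49.18
print(weighted_mean(salaries, staff_counts))                       # 33965.0
print(round(geometric_mean(profits), 2))                           # 6.26
print(round(mean_rate_of_increase(20, 155, 12), 4))                # 0.2046
print(round(combined_mean(district_means, district_sizes), 2))     # 33193.33
```

The grouped mean and the combined mean reuse `weighted_mean` because both formulas have the same form: the frequencies, or the group sizes, act as the weights.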
