
Lecture Guide

Introduction to Statistics

Statistics is a field of study that most of us practice, often unknowingly, in our daily lives. Examples include asking your friends about their birthday, height and weight, or taking note of your average daily expenses so that you can budget your allowance. Published surveys (such as those of the Social Weather Stations, SWS) may help us make decisions about the subject matter concerned. Statistics is used in various fields of study such as, but not limited to, business and economics, biology and other life sciences, natural sciences, agriculture, psychology and other social sciences, education, engineering and other technological fields.

Statistics deals with the collection, organization, presentation, analysis and interpretation of numerical data used as information for decision-making (Cabero, 2013). There are two divisions of Statistics.

Descriptive statistics is concerned with collecting, organizing, summarizing and presenting data. It does not involve making generalizations or predictions.
Example: The market researcher of a manufacturing company constructs a graph showing the fluctuations in sales for a major product line during the last 3 years.
Notice that in this situation the researcher collected, organized and presented the data using a graph. No generalizations are made, so it is an example of descriptive statistics.

Inferential statistics involves generalizations, predictions, estimations, or approximations.
Example: The manager of a department store records the number of buying customers daily for seven consecutive weeks and then estimates the average number of buying customers for the following weeks.
Observe that after gathering the data, the manager used the result of the data collection to approximate the average number of buying customers for the succeeding weeks.

Practice!
Classify each of the following situations as involving descriptive or inferential statistics.
1. In 2020, the gross sales of KN95 Company increased by 100%.
2. Three out of five Filipinos say the national government, rather than the local government, is more responsible for solving the Covid-19 crisis.
3. It is expected that the interest rates for bank loans will decrease by 2% in 2021.
4. Forty percent of the inventories was sold by the online seller.

Remember that in Statistics, we deal with data, which are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are formulated. Data can be grouped or ungrouped. Grouped data are organized into categories with corresponding frequencies, while ungrouped data are not organized in any way.

In this course, it is also important to understand the concepts of population and sample. A population refers to an entire group of people, objects, events, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature. On the other hand, a sample is a part or subset of a population chosen for study. A descriptive measure derived from the population is called a parameter, while a measure computed from a sample is called a statistic.

A variable is a characteristic or attribute of persons or objects that changes or can take different values for different individuals. In contrast, a constant is a symbol of a quantity that does not change or vary. Variables can be classified as shown below.

TYPES OF VARIABLES
1. Qualitative – an attribute of a person or object that is expressed non-numerically.
2. Quantitative – takes on numerical values as it differs in degree. Quantitative variables can be further classified into:
a. Discrete – can take integer values. They are countable.
Example: Number of books printed by Rex Bookstore for the year 2020
b. Continuous – can assume any value at any point along an interval.
Example: The length of time (in hours) students review for the Final Examination.

Variables may also be independent or dependent. Independent variables are variables that are manipulated by researchers and whose effects are measured and compared. Another name for an independent variable is predictor. The independent variables predict or forecast the values of the dependent variable in the model. Thus, dependent variables are the values that are predicted by the independent variables.

Practice!
Classify the following as qualitative or quantitative. If it is quantitative, state if it is discrete or continuous.
1. Weight of female 3rd year BSOA students.
2. Place of birth
3. Number of students per section
4. Lot area of LSPU campuses.
5. Citizenship of foreign students in LSPU

Variables can also be identified according to their level of measurement. This could be illustrated in the figure below.
1. Nominal – allows us to qualitatively distinguish data into categories. All qualitative variables use this scale.
Examples: gender, political affiliation, religion, region, status
2. Ordinal – categorized variables are ranked according to a common characteristic.
Examples: ranking of honor students, assessments of levels of job performance, ranking of candidates in a beauty contest
3. Interval – each level or rank has precise differences between measures. However, it does not have a “true zero” point, which means that when the value of the measurement is zero, it does not mean that there is “no” value.
Examples: temperature, IQ level
4. Ratio – has the same characteristics as an interval scale, but a true zero exists.
Examples: weight, height, area

PRACTICE!
Classify each as nominal, ordinal, interval or ratio level of measurement.
1. Size of classroom
2. Places of residence
3. Thermostat setting in summer
4. The scent of a flower
5. The order of birth in the family

Performance Tasks
I. State five (5) instances in your daily life in which you practice Statistics. Explain.
a.
b.
c.
d.
e.

II. In each of these statements, tell whether descriptive or inferential statistics was used.
1. According to the Human Resource Department of a certain company, the total number of employees is 5,000.
2. The dean recorded enrolment statistics of the college for the last 6 semesters and then determined if there will be a relative increase or decrease in the enrolment for the next semester.
3. Advertising expenses for Jaya Com in 2019 was P10 million.
4. The average salary of a random sample of 50 private high school teachers in 2019 was P10,000.
5. Correlation of inventory management system and the financial performance of a manufacturing company.

III. Determine whether the numerical value is a parameter or a statistic. Explain. (2 pts. each)
1) A recent survey by the alumni of a major university indicated that the average salary of 1,000 of its 9,000 graduates was P15,000.
2) The average salary of all assembly-line employees at a certain car manufacturer is P25,000.
3) The average late fee for 360 credit card holders was found to be P500.

IV. State whether each is quantitative or qualitative.
1. Number of years of service in the teaching profession.
2. Fields of specialization of doctors in a hospital.
3. Citizenship.
4. Amount of imported rice in NFA’s warehouses
5. Representation expenses of CEOs of Pinoy Investment Company

V. Identify each as discrete or continuous.
1. Student enrollment in a university
2. Number of television sets per household.
3. Amount of money a college student spends on books per semester
4. Inflation rate
5. Daily tons of garbage.

VI. Determine the independent and dependent variables in the following research titles: (2 pts. each)
1. Inflation and Its Effect on Pricing Strategy of Car Dealers in Manila
Independent variable:
Dependent variable:

2. Use of Balanced Scorecard and Its Effects to the Organizational Performance of Selected Small and Medium Enterprises in Laguna
Independent variable:
Dependent variable:

3. Level of Implementation of Purchasing Policy and Its Relationship to the Service Quality of Café Business
Independent variable:
Dependent variable:

VII. Classify each as nominal, ordinal, interval or ratio level of data.
1. License plate numbers
2. Evaluation of faculty members by students on a Likert Scale
3. Intelligence quotient of Managers in McDonalds
4. Cost of smart phones.
5. Brand of smart phones

SUMMATION NOTATION
In the succeeding chapters, we will study the different numerical values that describe a set of data. The formulas for these values use a symbol called the summation notation.

Compute for the values of the following

The summation notation is represented by the symbol ∑, the capital Greek letter sigma, which corresponds to the letter S of the English alphabet. By definition,

\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n

which could be read as “the summation of x sub i as i goes from 1 to n”. Notice that the notation means adding the terms x_i, with the subscript i starting at the number below the summation sign and ending at the number above it.
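
As a minimal sketch of how the notation is evaluated, the Python snippet below adds the terms x_i over a chosen range of subscripts. The data values and the helper name `summation` are illustrative assumptions, not taken from this lecture guide.

```python
# Hypothetical data values x_1, ..., x_5 (assumed for illustration only).
x = [2, 4, 6, 8, 10]

def summation(values, start, stop):
    """Return x_start + ... + x_stop, using 1-based subscripts as in the notation."""
    # Python lists are 0-based, so x_i is stored at index i - 1.
    return sum(values[i - 1] for i in range(start, stop + 1))

total = summation(x, 1, len(x))          # sum of x_i for i = 1 to 5
partial = summation(x, 2, 4)             # sum of x_i for i = 2 to 4
sum_of_squares = sum(v ** 2 for v in x)  # sum of (x_i squared)
square_of_sum = sum(x) ** 2              # (sum of x_i) squared, a different quantity

print(total, partial, sum_of_squares, square_of_sum)
```

The last two lines are kept separate on purpose: squaring each term before adding is not the same as adding first and then squaring.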

For example,

Practice!
Given the values

Note that subscripts are numbers used to designate or represent each number.

PRACTICE! Compute for the values of the following


Expand each of the following summation notation.

1.

2.

3.

Properties of Summation

Performance Tasks
I. Express each of the following expressions in summation notation: (3 points each)
II. Given the values

Compute for the values of the following. Show your solution. (3 points each)

Methods of Data Collection and Presentation
In previous topics, we learned that data are pieces of information gathered, recorded and utilized for the purpose of analysis. Statisticians need data to work on, and the finished product is a useful and intelligent guide for decision-making. There are two types of data according to source.

TYPES OF DATA
1. Primary – collected directly from the source; also referred to as first-hand information.
Examples: survey through questionnaire, personal investigation
2. Secondary – collected by someone else and has undergone statistical treatment.
Examples: research journal, newspaper

There are different ways to collect data. The most appropriate method depends on the kind of research one is doing as well as the resources available to the researcher.

METHODS OF COLLECTING DATA
1. Direct method – data are gathered through interviews. The researcher collects data through a series of questions asked of the subject of the study.
2. Indirect method – data are collected through survey questionnaires. Questionnaires are sent or given to the subjects of the study to be answered.
3. Observation method – looking at and taking note of a behavior at the appropriate time and situation.
4. Registration – data are acquired from private and government offices such as the Philippine Statistics Authority, Bangko Sentral ng Pilipinas and local government.
5. Experiment – the researcher investigates whether one variable has an effect on another variable.

In doing one’s study, specifically when dealing with a large population, a sample from this large group can be chosen. An acceptable sample size that could be taken from a population can be computed using Slovin’s formula.

Slovin’s formula

n = \frac{N}{1 + N e^{2}}

where n is the sample size, N is the population size and e is the margin of error.

Example:
You decided to make a study about the financial literacy of LSPU-SPCC students. You found out from the Registrar that there are 9,500 students enrolled this 1st semester of school year 2020-2021. Your teacher said that the margin of error to be used is 5%. Determine an acceptable sample size that you can use for your study.

From the problem, we could identify the following values:
N = 9,500
e = 5% or 0.05
so we can compute for n, or the sample size, as

n = \frac{9500}{1 + 9500(0.05)^{2}} = \frac{9500}{24.75} \approx 383.84 \approx 384

Therefore, out of the 9,500 students, you can choose 384 students to be the subjects of your study. You now have an acceptable sample size for your study.

Another question would be: how would you choose those who are going to be part of your sample? For this, we define the probability and non-probability sampling techniques.

Sampling technique – the process used to determine which individual members of the population could be included in the sample. There are two forms of sampling: probability sampling, where every member of the population has a known chance of being selected (an equal chance, in simple random sampling), and non-probability sampling, which does not calculate the chance for each member to be part of the sample. Though it is advisable to have a probability-based sample, other factors such as availability of resources, cost, and time should also be considered.

PROBABILITY SAMPLING TECHNIQUES
1. Random sampling – it is important for this method to identify each member of the population, since all of them would have an equal chance of being selected for the sample.
Example: If you are a researcher making a study on 3rd year BSOA students in LSPU-SPCC, a group composed of 160 students, identify all the students, arrange the names in alphabetical order and assign a number to each. You can draw lots, as in a raffle, where the lots carry the numbers of the students. Each number that you draw becomes part of your sample. You do this until you have reached your desired sample size.
2. Systematic sampling – in this method, you choose members of the population at a fixed interval, so that the difference between any two consecutive selected numbers is the same.
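
The Slovin computation and the first two probability sampling techniques can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the result of Slovin's formula is rounded up to a whole subject (the lecture guide only shows the rounded answer, 384), students are represented by the numbers 1 to 160, and the starting number 4 and interval 6 for systematic sampling are illustrative choices.

```python
import math
import random

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e**2), rounded up (assumed rounding rule)."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))

def simple_random_sample(ids, n, seed=None):
    """Draw n distinct IDs, each with the same chance, like drawing lots."""
    return sorted(random.Random(seed).sample(ids, n))

def systematic_sample(ids, start, interval):
    """Take the ID at 1-based position `start`, then every `interval`-th ID after it."""
    return ids[start - 1::interval]

n = slovin_sample_size(9500, 0.05)        # N = 9,500 and e = 0.05
students = list(range(1, 161))            # 160 students numbered 1 to 160
lots = simple_random_sample(students, 10, seed=1)
every_sixth = systematic_sample(students, start=4, interval=6)

print(n)
print(lots)
print(every_sixth[:5])
```

Fixing the seed only makes the random draw repeatable for checking; in an actual study the draw would not be seeded this way.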
Example: From the example above, the first step in systematic random sampling is to pick an integer that is less than the total number of the population. This will be your first subject. For example, you choose subject number 4. The next step is to choose another integer, which will be the sampling interval, that is, the difference between the numbers of consecutive subjects. Let’s say we choose 6. Then systematic sampling would give you your subjects as students with numbers 4, 10, 16, 22, 28, and so on.
Systematic random sampling allows researchers to create samples without using a random number generator.
3. Stratified sampling – the subjects are initially grouped into different classifications such as demographic profile (gender, age, or income status). These groups (called strata) should be mutually exclusive, that is, no subject should belong to more than one group. From each of the groups, the researcher gets subjects. This means that the sample will include all categories defined by the researcher. It could be illustrated in the figure below.
4. Cluster sampling – the researcher divides the population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population, and all units belonging to the chosen clusters are included in the sample. An illustration of cluster sampling is shown in the figure below.
5. Multi-stage sampling – significant clusters of the chosen subjects are split into sub-groups at various stages to make data collection simpler.

NON-PROBABILITY SAMPLING
1. Convenience (accidental) sampling – from the name itself, it can be deduced that this sampling method is based on what is convenient to the researcher. An example is when a researcher chooses subjects who are readily accessible to him.
2. Quota – the selected sample has the same proportions of subjects as the entire population with respect to known characteristics or traits.
3. Purposive – the researcher selects a sample based on their knowledge about the population and the study itself. The purpose of the research is the main consideration in choosing the sample.
4. Snowball sampling – the identified research subjects look for other participants of the study. This method is particularly appropriate when subjects are hard to find. An example could be a study on illegal drug users.

Once you have collected the data needed for your study, it is important that you know how you will present the materials you gathered. Below are the different ways in which you can present your data.

METHODS OF DATA PRESENTATION
1. Textual form – data are presented in paragraph and narrative form.
2. Tabular form – quantitative information is summarized in rows and columns.
a. Table heading – displays the table number and title.
b. Body – main part of the table, which comprises the numerical data and information.
c. Stubs – describe the information found in the rows of the table.
d. Box heads – describe the data found in the columns of the table.
e. Footnotes – clarify information in the table that may not be clearly shown or presented from the title, captions and stubs.
f. Source note – origin of the data.
3. Graphical form – data are presented in charts, graphs or pictures, utilizing the power of visual display to communicate information efficiently. There are different types of graphs.
a. Bar graph – consists of a vertical and horizontal axis and displays data as rectangular bars with lengths proportional to the values that they represent.
b. Line graph – represents continuous data and is appropriate for predicting future events over time.
c. Pie chart – useful when one wants to represent proportions of the total being considered.
d. Pictograph – uses symbols or pictures to represent the frequency or percentage of the obtained numerical information.
e. Stem-and-leaf diagram – also called a stem-and-leaf plot, invented by John Tukey, is a graphical display that summarizes data while maintaining the individual data points. In the said diagram, the "stem" is a column of the unique elements of data after removing the last digit. The final digits ("leaves") of each column are then placed in a row next to the appropriate column and sorted in numerical order.

Steps in Constructing a Stem and Leaf Plot
1. Divide each data value into two parts, the stem and the leaf.
2. List the stems in a column. Then write a vertical line on its right.
3. For each number, record the leaf portion in the same row as its corresponding stem.
4. Order the leaves from lowest to highest.
Note that it would be easier to make a stem-and-leaf plot if you first arrange the data from lowest to highest.

Example: Daniel, the owner of King of Hearts Milk Tea, recorded the ages of those who ordered online in one day.
18 24 35
49 14 15
17 20 19
18 21 18
19 20 31
15 32 40
Make a stem-and-leaf plot.

Arranging the data from lowest to highest, we have
14 15 15 17 18 18 18 19 19 20 20 21 24 31 32 35 40 49

Then the stems would be the first digits. Arranging the leaves correspondingly, we will have our final stem-and-leaf plot as
1 | 455788899
2 | 0014
3 | 125
4 | 09

FREQUENCY DISTRIBUTION TABLE

In this section, we will discuss the construction of a table displaying the data organized in categories with their corresponding frequencies. It is called the frequency distribution table.
To construct a frequency distribution table, consider the example below while going through the steps of construction.
Example: Mrs. Valenzuela made a survey on the public utility (tricycle) drivers in San Pablo City, Laguna. She recorded the ages of her participants as follows:
23 45 42 55
28 49 49 58
38 45 45 29
28 50 25 26
24 46 43 60
42 43 32 57
44 43 42 43
49 41 26 29
49 59 52 20
22 45 59 25
1. Determine the lowest value (LV) and highest value (HV) in the data set. Compute the range, R = HV – LV. In the above example,
R = 60 – 20 = 40
2. Determine the number of classes or categories, k, using the formula k = √N, where N is the number of observations in the data set. Round off k to the nearest integer. In this case, the total number of data is 40, therefore,
k = √40 = 6.32, rounded off to 6
3. Calculate the class size, c, using the formula c = R/k. Round off c to the nearest value with the same precision as the raw data. Since we already have the values for R and k, then
c = 40/6 = 6.67
but since the raw data are expressed in whole numbers, we will round off the value of c to 7. Therefore c = 7.
4. Choose an appropriate lower limit for the first class interval, LL1. It is appropriate to select the lowest value as the lower limit of the first class interval. Thus, in our example, LL1 = 20.
5. Find the upper class limits, using UL1 = LL1 + c – 1/10^x, where x is the number of decimal places of the raw data. In our example x = 0, so UL1 = 20 + 7 – 1 = 26.
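
Steps 1 to 5 can be sketched in Python as follows, using the 40 ages recorded by Mrs. Valenzuela. The variable names are illustrative assumptions, and the sketch relies on the raw data being whole numbers (x = 0 decimal places), so the unit 1/10^x equals 1.

```python
import math

# Ages of the 40 tricycle drivers from the example.
ages = [23, 45, 42, 55, 28, 49, 49, 58, 38, 45, 45, 29, 28, 50, 25, 26,
        24, 46, 43, 60, 42, 43, 32, 57, 44, 43, 42, 43, 49, 41, 26, 29,
        49, 59, 52, 20, 22, 45, 59, 25]

lowest, highest = min(ages), max(ages)
data_range = highest - lowest             # step 1: R = HV - LV
k = round(math.sqrt(len(ages)))           # step 2: k = sqrt(N), nearest integer
c = round(data_range / k)                 # step 3: c = R / k, whole-number precision
unit = 1                                  # 1 / 10**x with x = 0 decimal places

classes = []
lower = lowest                            # step 4: LL1 is the lowest value
for _ in range(k):
    upper = lower + c - unit              # step 5: UL = LL + c - 1/10**x
    classes.append((lower, upper))
    lower += c                            # next lower limit is one class size higher

print(data_range, k, c)
print(classes)
```

Each pair in `classes` holds the lower and upper limit of one class interval.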
We can now identify the classes or categories, starting with the first interval from (4) and (5) and adding the class size to get the succeeding lower and upper class limits, as shown below.
Class Intervals
20 – 26
27 – 33
34 – 40
41 – 47
48 – 54
55 – 61

6. Determine the class boundaries. The class boundaries are the true limits of a class interval, made up of the lower class boundary and the upper class boundary. A class boundary is midway between the upper limit of one class interval and the lower limit of the next higher class interval.
Computing for the first class interval, the upper class boundary is (26 + 27)/2 = 26.5 and, by the same half-unit adjustment, the lower class boundary is 20 – 0.5 = 19.5. The boundaries of the remaining class intervals are obtained in the same way.
Notice that the upper true class boundary is equal to the lower true class boundary of the succeeding class interval.

Find the class mark or midpoint of each class interval by taking half the sum of its lower and upper class limits. For the first class interval, we can compute the class mark as (20 + 26)/2 = 23.

7. Tally the raw scores and indicate the frequency for each of the class intervals. Add the frequencies and indicate the sum.

8. We can also compute the less than (<) or greater than (>) cumulative frequency. It is obtained by successively adding the frequencies of all the classes. The < cumulative frequency adds the frequencies from the lowest to the highest class interval, while the > cumulative frequency adds the frequencies from the highest to the lowest class interval.

9. There is also what we call relative frequency. Relative frequencies can be written as fractions, percents, or decimals. We can compute this by dividing each frequency by the total number of observations. Thus, for the first class interval, we have 8/40 = 0.20.

Therefore, we can present the whole frequency distribution table as shown below:

Class Interval   Class Boundaries   Class Mark    f    <cf   >cf   rf
20 – 26          19.5 – 26.5        23            8     8    40    0.200
27 – 33          26.5 – 33.5        30            5    13    32    0.125
34 – 40          33.5 – 40.5        37            1    14    27    0.025
41 – 47          40.5 – 47.5        44           14    28    26    0.350
48 – 54          47.5 – 54.5        51            6    34    12    0.150
55 – 61          54.5 – 61.5        58            6    40     6    0.150
Total                                            40                1.000

GRAPHICAL REPRESENTATION OF A FREQUENCY DISTRIBUTION
1. Histogram – a bar graph plotting the class mark on the horizontal or x-axis and the corresponding frequency on the vertical or y-axis. In our example, the class marks 23, 30, 37, 44, 51 and 58 go on the x-axis and the frequencies 8, 5, 1, 14, 6 and 6 on the y-axis.
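
Steps 6 to 9 and a rough histogram can be reproduced with the short Python sketch below. It assumes the class limits 20–26 through 55–61 obtained earlier, uses illustrative variable names, and prints a row of `#` marks per class as a text stand-in for the histogram bars rather than the lecture guide's own figure.

```python
from itertools import accumulate

# Ages of the 40 tricycle drivers from the example.
ages = [23, 45, 42, 55, 28, 49, 49, 58, 38, 45, 45, 29, 28, 50, 25, 26,
        24, 46, 43, 60, 42, 43, 32, 57, 44, 43, 42, 43, 49, 41, 26, 29,
        49, 59, 52, 20, 22, 45, 59, 25]
classes = [(20, 26), (27, 33), (34, 40), (41, 47), (48, 54), (55, 61)]

rows = []
for lower, upper in classes:
    f = sum(lower <= a <= upper for a in ages)    # step 7: tally the frequency
    rows.append({
        "interval": f"{lower}-{upper}",
        "boundaries": (lower - 0.5, upper + 0.5),  # step 6: half a unit beyond each limit
        "mark": (lower + upper) / 2,               # class mark (midpoint)
        "f": f,
        "rf": f / len(ages),                       # step 9: relative frequency
    })

freqs = [r["f"] for r in rows]
less_cf = list(accumulate(freqs))                  # step 8: lowest to highest class
greater_cf = list(accumulate(freqs[::-1]))[::-1]   # step 8: highest to lowest class

for r, lcf, gcf in zip(rows, less_cf, greater_cf):
    bar = "#" * r["f"]                             # one mark per observation
    print(r["interval"], r["boundaries"], r["mark"], r["f"], lcf, gcf, r["rf"], bar)
```

The same `rows`, `less_cf` and `greater_cf` values are what a plotting library would need for the histogram, frequency polygon and ogive.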
2. Frequency polygon – a line graph plotting the frequencies against the class mark.
3. Less/greater than cumulative frequency polygon or ogive – a line graph with the lower class boundary on the x-axis and the cumulative frequency on the y-axis.
4. Relative frequency polygon – a line graph with the class mark on the x-axis and the relative frequency on the y-axis.

Performance Tasks
I. For each situation, identify the appropriate method of data collection you would use. Explain your answer. (3 points each)

1. You want to know if students who enter the classroom early would sit in the front part of the classroom.

2. You want to determine if the movie “Through Night and Day” is a good movie.

3. You want to find if there is a relationship between the number of hours a student sleeps and her academic performance.

4. You want to know the percentages of households with different income levels in the Philippines.

5. You randomly give your classmates two kinds of cookies, one from your recipe and another from your mother. You want to know which students will return for a second cookie.

II. Suppose that there are 2,400 city government employees and you want to survey them to find out which tools are best suited to their jobs. You decide that a margin of error of 0.05 is acceptable. Using Slovin's formula, how many employees would you survey? Show your complete solution. (5 points)

III. Miss Bernardo, a guidance counselor, is studying the mental wellbeing of the 7,540 students at her university during the Covid-19 pandemic. She decides to start by asking a random sample of 60 students. (3 points each)
Identify the type of sampling in each of the following survey methods.

a) The guidance counselor assigns each student a number from 1 to 7,540. She selects the sample by randomly choosing one of the first 100 numbers and every 110th number thereafter.

b) The guidance counselor assigns each student a number from 0001 to 7540 and uses a computer to randomly generate a list of 60 numbers to select the students for the sample.

c) Students are listed by the barangay they live in. The guidance counselor randomly selects six barangays and then randomly selects five students from each one.

d) An equal proportion of students are randomly selected from each discipline.
IV. A sample of fifty customers at a newly-opened supermarket has been selected at random. The following data show the customers’ ages.

12 23 19 21 35
10 21 27 23 21
23 16 23 11 13
41 32 21 21 39
59 42 27 28 60
37 28 29 19 29
33 20 27 18 53
23 18 19 50 47
25 17 34 46 21
48 26 47 14 52

1. Construct a stem-and-leaf plot. (5 points)

2. Construct a complete frequency distribution table. Show solution for each column. (20 points)

3. Represent the frequency distribution table in no. 2 using a
a. Histogram (5 points)

b. Cumulative frequency polygon (5 points)

Learning Resources
• Sirug, W. S. (2015). Introduction to Business Statistics: A Comprehensive Approach, Revised Edition. Manila: Mindshapers Co., Inc.
• Bueno, D. C. (2016). Introduction to Statistics (Concepts and Applications in Research). Quezon City: Great Books Trading.
• Cabero, J. B., et al. (2013). Business Statistics. Mandaluyong City: Anvil Publishing, Inc.
• Barradas, J. (n.d.). Advanced Statistics. San Pablo City: San Pablo Colleges.
• James, G. (2009). An Introduction to Statistical Learning with Applications in R. New York: Springer.
• Population Definition. Retrieved from https://www.investopedia.com/terms/p/population.asp
• Independent and Dependent Variable. Retrieved from https://www.statisticssolutions.com/independent-and-dependent-variables/

------------------------------------------------------------------------------------
Intellectual Property

This module is for educational purposes only, under Sec. 185 of RA 8293, which states, “The fair use of a copyrighted work for criticism, comment, news reporting, teaching including multiple copies for classroom use, scholarship, research, and similar purposes is not an infringement of copyright”.

The unauthorized reproduction, use, and dissemination of this module, without joint consent of the authors and LSPU, is strictly prohibited and shall be prosecuted to the full extent of the law, including appropriate administrative, civil, and criminal sanctions.
------------------------------------------------------------------------------------

Self-Assessment
