
Methods of Data Collection

Collection of Data ● 1st step in conducting statistical inquiry

● Refers to data gathering
● Involves acquiring information from published literature, surveys through
questionnaires or interviews, experiments, documents and records,
tests or examinations, and other forms of data gathering instruments

Data gathering Systematic method of collecting and measuring data from different
sources of information in order to provide answers

Investigator The person who conducts the inquiry.

Enumerator The one who helps in collecting information

Respondent The person from whom information is collected

3 basic methods of collecting data

Retrospective study Uses the population or a sample of historical data that has been
archived over some period of time

Observational study Process or population is observed and disturbed as little as possible, and
the quantities of interests are recorded

Designed experiment Deliberate or purposeful changes are made in the controllable variables of the
system or process. The resulting system output data are
observed, and an inference or decision is made about which variables are
responsible for the observed changes in output performance

Planning and Conducting Surveys

Survey Method of asking respondents some well-constructed questions. It is an
efficient way of collecting information and easy to administer, wherein a
wide variety of information can be collected. The researcher can be
focused and can stick to the questions that interest him and are necessary
in his statistical inquiry or study

Can be done via face-to-face interviews or self-administered through the
use of questionnaires

Steps in designing a survey 1. Determine the objectives of your survey: What questions do you want
to answer?

2. Identify the target population and sample: Whom will you interview? Who
will be the respondents? What sampling method will you use?

3. Choose an interviewing method: face-to-face interview, phone
interview, self-administered paper survey, or internet survey.

4. Decide what questions you will ask, in what order, and how to phrase them.

5. Conduct the interview and collect the information.

6. Analyze the results by making graphs and drawing conclusions.

Sampling Process of selecting units (e.g. people, organizations) from a population of
interest

Sample Representative of the target population

Target population Entire group a researcher is interested in. The group about which the
researcher wishes to draw conclusions

2 ways of selecting a sample

Non-probability sampling ● Judgment or subjective sampling.

● Convenient and economical but the conclusions made are not so reliable
● Most common types: convenience sampling, purposive sampling and
quota sampling
Convenience sampling The researcher uses a device in obtaining the information from the
respondents which favors the researcher but can cause bias to the respondents.

e.g. online poll, surveying your friends.

Purposive sampling The selection of respondents is predetermined according to the

characteristic of interest made by the researcher.

e.g. interviewing group of ME students regarding ME course

Quota sampling Proportional quota sampling

- represents the major characteristics of the population by sampling a
proportional amount of each.
- e.g. if you know the population has 40% women and 60% men,
and that you want a total sample size of 100, you will continue
sampling until you get those percentages and then you will stop.

Non-proportional quota sampling

- less restrictive
- A minimum number of sampled units in each category is specified,
without concern for having numbers that match the proportions
in the population.

Probability sampling ● Every member of the population is given an equal chance to be
selected as a part of the sample.
● Several probability techniques: simple random sampling, stratified
sampling and cluster sampling.

Simple random sampling Basic sampling technique where a group of subjects (sample) is selected
for study from a larger group (population). Each individual is chosen
entirely by chance and each member of the population has an equal
chance of being included in the sample.

Stratified sample Obtained by taking samples from each stratum or sub-group of a
population. When a sample is to be taken from a population with several
strata, the proportion of each stratum in the sample should be the same
as in the population.

Cluster sampling Sampling technique where the entire population is divided into groups, or
clusters, and a random sample of these clusters is selected. All
observations in the selected clusters are included in the sample

Systematic sampling Probability sampling method where researchers select members of the
population at a regular interval.
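
The four probability sampling techniques described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the "women"/"men" strata, the ten "block" clusters, and all function names are hypothetical and not part of the original notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Each member has an equal chance of being included."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take members at a regular interval k."""
    k = len(population) // n              # sampling interval
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population.
    Rounding may make the total differ slightly from n for awkward sizes."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of each."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                    # seeded only for repeatability
population = list(range(1, 101))          # hypothetical units 1..100
strata = {"women": list(range(1, 41)), "men": list(range(41, 101))}
clusters = {f"block{b}": list(range(b * 10 + 1, b * 10 + 11)) for b in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 10, rng)
clus = cluster_sample(clusters, 2, rng)
```

With a 40%/60% population, the stratified sample of 10 contains 4 units from the first stratum and 6 from the second, while the cluster sample keeps all 20 members of the two chosen blocks.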

Planning and Conducting Experiments: Introduction to Design of Experiments

Design of Experiments (DOE) ● Tool to develop an experimentation strategy that maximizes learning
using minimum resources.
● Widely and extensively used by engineers and scientists in improving
existing processes through maximizing the yield and decreasing the
variability or in developing new products and processes.
● It is a technique needed to identify the "vital few" factors in the most
efficient manner and then direct the process to its best setting to
meet the ever-increasing demand for improved quality and increased
productivity.

Six stages to be carried out for the design of experiments ● Describe - identifying objectives and important factors that are relevant
in carrying out the experiment
● Specify - determining the best setting in accomplishing the objective
of the experiment
● Design - design the model process that will be used in the experiment
and conduct an initial test run
● Collect - generate and record data from the runs
● Fit - validate the result of the trial by conducting additional runs to
confirm if objectives were achieved.
● Predict
Basic Concepts of Probability and Statistics

Probability Branch of mathematics concerned with theories of uncertainty, ways of
measuring uncertainty and the application of techniques involving uncertainty.

Statistics Branch of mathematics that examines and investigates ways to process and
analyze the data gathered.

Two major areas of statistics

Descriptive statistics includes those methods concerned with collecting, organizing, summarizing
and presenting data without drawing inference about a large group.

Inferential statistics (Inductive Statistics or Statistical Inference) refers to those methods
concerned with the analysis of a subset of data leading to predictions and inferences
about the entire set of data.

Statistical terms

Population consists of the totality of the observations with which we are concerned

Sample collection of some of the elements obtained from the population

Parameter any numerical value describing a characteristic of a population

Statistic any numerical value describing a characteristic of a sample

Constant characteristics or properties for which the members of the population are the same

Variable is a characteristic that changes or varies over time or for different individuals or
objects under consideration

Types of variable

1. Qualitative variables measure a quality or characteristic on each experimental unit

e.g. eye color, gender

2. Quantitative variables measure a numerical quantity or amount on each experimental unit

e.g. number of accidents, volume in a glass, weight of package

a. Discrete variable countable number of values

e.g. number of family members

b. Continuous Uncountable number of values

e.g. time, distance, volume, height

Variables according to scale of measurement

Nominal Values represent categories with no inherent order

e.g. gender, civil status

Ordinal Values represent categories with inherent order (ranking)

e.g. educational background, quality of service, grades

Interval Values represent ordered categories with equal intervals between them

e.g. temperature

Ratio Values have equal intervals and a true zero point, so ratios between values are meaningful

e.g. employment size

Data gathering instruments 1. Questionnaires

2. Interviews
3. Experiments
4. Observations
Sampling method 1. Simple Random Sampling
2. Stratified Random Sampling
3. Systematic Random Sampling
4. Cluster Sampling
5. Stage Sampling
6. Slovin’s Formula

How to present data? 1. Graphs

2. Table charts

Graphs for Qualitative data

Graphs for Qualitative Data ● What values of the variable have been measured
● How often each value has occurred

Three measures available for this purpose

Frequency the number of times a score or group of scores (class) occurs in a population or
sample

Relative Frequency the frequency of one score or group of scores divided by the total frequency of
all the observations

Relative Frequency = frequency/n

n = sum of frequencies

Percentage The percentage of measurements in each category (100 x relative frequency)
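
As a small illustration of the three measures, the sketch below tallies a made-up list of eye colors (the data and variable names are hypothetical) and derives the relative frequency and percentage of each category from the frequency, using Relative Frequency = frequency/n.

```python
from collections import Counter

# Hypothetical qualitative data: eye color of 8 respondents.
eye_colors = ["brown", "blue", "brown", "green",
              "brown", "blue", "brown", "brown"]

frequency = Counter(eye_colors)                 # how often each value occurs
n = sum(frequency.values())                     # n = sum of frequencies
relative_frequency = {c: f / n for c, f in frequency.items()}
percentage = {c: 100 * rf for c, rf in relative_frequency.items()}

for category in frequency:
    print(category, frequency[category],
          relative_frequency[category], percentage[category])
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check on any hand-built table.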


Graphs for Qualitative Data

Describing data by the amount measured in each category

Pie Chart circular graph that shows how the measurements are distributed among the
categories; it displays how the total quantity is distributed among the categories

one sector of a circle is assigned to each category; the angle of each sector
should be proportional to the proportion of measurements (relative frequency)
in that category

Angle = relative frequency x 360°
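
The angle rule can be checked with a short calculation. The category counts below are hypothetical and serve only to show that the sector angles add up to a full circle.

```python
# Hypothetical category frequencies.
frequencies = {"brown": 5, "blue": 2, "green": 1}
n = sum(frequencies.values())

# Angle = relative frequency x 360 degrees
angles = {category: f / n * 360 for category, f in frequencies.items()}

print(angles)                 # brown: 225.0, blue: 90.0, green: 45.0
print(sum(angles.values()))   # 360.0
```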

Bar Chart the height of the bar measures how often a particular category was observed;
it uses the height of the bar to display the amount in a particular category

Describing data by time series

Line Chart When a quantitative variable is recorded over time at equally spaced intervals
(such as daily, weekly, monthly, quarterly), the data set forms a time series.
Time series data are most effectively presented on a line chart with time as
the horizontal axis.

Describing data by frequency of occurrence

Relative Frequency Histogram For a quantitative data set, it is a bar graph in which the height
of the bar represents the proportion or relative frequency of occurrence for a particular
class or sub-interval being measured. The classes or sub-intervals are plotted
along the horizontal axis.

Frequency Distribution For ungrouped data: it is a tabulation of data showing the frequency of
occurrence of the different values of the variable.

For grouped data: it is a tabulation of data showing the number of
observations that fall in each of the classes.

Class/Class Interval a symbol defining the arbitrary groupings.

e.g. 9-11

Class Limits the end numbers of the class or class interval.

e.g. 9 and 11 where 9 is the lower class limit and 11 is
the upper class limit

Class Interval Size difference between two successive lower class limits or
two successive upper class limits.

Class Boundary halfway between the lower limit of one class and the
upper limit of the preceding. It is the exact limit.
e.g. In the interval 9-11, 8.5 is the lower class
boundary and 11.5 is the upper class boundary

Class Mark the midpoint between the upper and lower class
boundaries or class limits of a class interval

Class Width the difference between upper and lower class
boundaries of a class interval
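
The class terms above can be worked out for the example interval 9-11. This sketch assumes the data are recorded to the nearest whole unit, so each class boundary lies half a unit beyond the class limit; the function name `describe_class` and the `unit` parameter are hypothetical.

```python
def describe_class(lower_limit, upper_limit, unit=1):
    """Return boundaries, class mark and class width for one class interval.
    `unit` is the assumed measurement precision (1 for whole numbers)."""
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "mark": (lower_limit + upper_limit) / 2,      # midpoint of the class
        "width": upper_boundary - lower_boundary,     # upper minus lower boundary
    }

info = describe_class(9, 11)
print(info)   # boundaries (8.5, 11.5), mark 10.0, width 3.0
```

The width of 3 agrees with the class interval size obtained from two successive lower limits, for example 12 - 9 when the next class is 12-14.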

Class Frequency (f) the number of observations falling in a particular class.

Relative Frequency the frequency of one observation or group of
observations divided by the total frequency of all
the observations

Cumulative Frequency the frequency of any class plus the frequencies of all
preceding classes in a distribution.
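
A grouped frequency distribution with its class, relative and cumulative frequencies can be tabulated as follows. The scores and the four classes are invented for illustration.

```python
from itertools import accumulate

# Hypothetical classes (lower limit, upper limit) and raw scores.
classes = [(9, 11), (12, 14), (15, 17), (18, 20)]
scores = [9, 10, 12, 13, 13, 14, 15, 16, 16, 17, 18, 20]

# Class frequency: number of observations falling in each class.
class_frequency = [sum(lo <= x <= hi for x in scores) for lo, hi in classes]
n = sum(class_frequency)

# Relative frequency: class frequency divided by the total frequency.
relative_frequency = [f / n for f in class_frequency]

# Cumulative frequency: frequency of a class plus all preceding classes.
cumulative_frequency = list(accumulate(class_frequency))

for (lo, hi), f, rf, cf in zip(classes, class_frequency,
                               relative_frequency, cumulative_frequency):
    print(f"{lo}-{hi}  f={f}  rf={rf:.3f}  cf={cf}")
```

The last cumulative frequency always equals n, here 12.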

Histogram a vertical bar graph that shows the frequencies of
scores or classes of scores by the height of the bar.

Frequency Polygon a graph on which the frequencies of classes are
plotted at the class mark and the class marks are
connected by straight lines.

The choice of a graph

Histogram preferred when only one distribution is to be presented.

Frequency Polygon more useful and better in comparing two or more distributions graphically on
the same axes.

The ogive useful in making estimates of quantiles, medians and other similar points of
relative position.

The Pie diagram (circle graph) useful when one wishes to picture proportions in a striking way.

Properties of Frequency Distribution

Properties of Frequency Distribution Frequency distributions differ from each other in terms of
their four important properties: central location, variation, skewness and kurtosis.
Central Location refers to the value near the center of frequency distribution.

Variation refers to the extent of spreading out of individual measures from the measure
of central tendency.

Kurtosis refers to the flatness or peakedness of one distribution relative to another.

Skewness refers to the symmetry or asymmetry of a frequency distribution.

Three Measures of Central Tendency

Measure of Central Tendency Any measure indicating the center of a set of data arranged
in an increasing or decreasing order of magnitude.

Mean the arithmetic average of all the scores or groups of scores in a distribution. It
is denoted by the symbol (μ) for population mean and X-bar for sample mean.

Ungrouped data: X-bar = ΣX / n
Grouped data: X-bar = ΣfM / n, or using an assumed mean, X-bar = A + Σf(M - A) / n

X = score or measure in the series
n = number of measures in the series
f = frequency
M = midpoint or class mark
A = assumed mean
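
The symbols listed above can be put to work in a short sketch. It assumes the standard formulas X-bar = ΣX/n for ungrouped data and X-bar = ΣfM/n for grouped data, plus the assumed-mean shortcut X-bar = A + Σf(M - A)/n; the midpoints and frequencies are hypothetical.

```python
def mean_ungrouped(scores):
    """X-bar = sum of X divided by n."""
    return sum(scores) / len(scores)

def mean_grouped(midpoints, frequencies):
    """X-bar = sum of f*M divided by n, where n is the sum of frequencies."""
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n

def mean_assumed(midpoints, frequencies, assumed_mean):
    """X-bar = A + sum of f*(M - A) divided by n; any A gives the same result."""
    n = sum(frequencies)
    deviations = sum(f * (m - assumed_mean)
                     for f, m in zip(frequencies, midpoints))
    return assumed_mean + deviations / n

# Hypothetical grouped data: class marks and class frequencies.
midpoints = [10, 13, 16, 19]
frequencies = [2, 4, 4, 2]

print(mean_ungrouped([2, 4, 6, 8]))                 # 5.0
print(mean_grouped(midpoints, frequencies))         # 14.5
print(mean_assumed(midpoints, frequencies, 13))     # 14.5
```

The assumed-mean form only shifts the arithmetic toward smaller numbers, which is why it was popular for hand computation.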

Median point on the scale of measurement that divides a series of ranked
observations into halves such that half of the observations fall above it and the
other half fall below it.

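Ungrouped data: the middle value of the ranked observations (the average of the two middle
values when n is even)
Grouped data: Median = L + [(n/2 - fc) / fm] i

L = exact lower limit of the class containing the median
fc = sum of all frequencies below L
fm = frequency of the class interval containing the median
n = number of cases or observations
i = class interval size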
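
Using the symbols above, the grouped median is commonly computed as Median = L + [(n/2 - fc)/fm] i. The sketch below assumes that formula and equal class sizes; the class boundaries and frequencies are hypothetical.

```python
def median_ungrouped(scores):
    """Middle value of the ranked scores (mean of the two middle ones if n is even)."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def median_grouped(lower_boundaries, frequencies, i):
    """Median = L + ((n/2 - fc) / fm) * i for the class containing the median."""
    n = sum(frequencies)
    fc = 0                                   # sum of frequencies below L
    for L, fm in zip(lower_boundaries, frequencies):
        if fm > 0 and fc + fm >= n / 2:      # this class contains the median
            return L + (n / 2 - fc) / fm * i
        fc += fm
    raise ValueError("no observations")

# Hypothetical classes 9-11, 12-14, 15-17, 18-20 (exact lower limits shown).
lower_boundaries = [8.5, 11.5, 14.5, 17.5]
frequencies = [2, 4, 4, 2]

print(median_ungrouped([3, 1, 2]))                       # 2
print(median_ungrouped([4, 1, 3, 2]))                    # 2.5
print(median_grouped(lower_boundaries, frequencies, 3))  # 14.5
```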

Mode point on the measurement scale with the maximum frequency in the given
distribution.

Ungrouped data Grouped data

Mode is the measurement which occurs most frequently

Basic Concepts of Probability

Probability the chance that something will happen.

Definition 1

sample space (S) The set of all possible outcomes of a statistical experiment

Element, Sample point, Member (of the sample space) Each outcome in a sample space

Statement or rule method Describes a sample space with a large or infinite number of sample points

Definition 2

Event subset of a sample space.

Definition 3

Complement The complement of an event A with respect to S is the subset of
all elements of S that are not in A. We denote the complement
of A by the symbol A’.

S = {book, cellphone, mp3, paper, stationery, laptop}
Let A = {book, stationery, laptop, paper}
Then the complement of A is:
A’ = {cellphone, mp3}

Definition 4

Intersection The intersection of two events A and B, denoted by the symbol
A∩B, is the event containing all elements that are common to A and B.
Definition 5

Disjoint Two events A and B are mutually exclusive, or disjoint, if A∩B = Ф, that is,
if A and B have no elements in common.

Definition 6

Union The union of the two events A and B, denoted by the symbol AUB, is the
event containing all the elements that belong to A or B or both.

Let A = {a,b,c} and B = {b,c,d,e}; then A U B = {a,b,c,d,e}
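
Definitions 3 to 6 map directly onto Python's built-in set operations. The sketch reuses the sets from the examples in this section; the names `X`, `Y` and `complement_A` are chosen here only for illustration.

```python
# Complement: elements of S that are not in A.
S = {"book", "cellphone", "mp3", "paper", "stationery", "laptop"}
A = {"book", "stationery", "laptop", "paper"}
complement_A = S - A
print(complement_A)                     # the set {cellphone, mp3}

# Union and intersection of two events.
X = {"a", "b", "c"}
Y = {"b", "c", "d", "e"}
union = X | Y                           # all elements in X or Y or both
intersection = X & Y                    # elements common to X and Y
print(sorted(union), sorted(intersection))

# Disjoint (mutually exclusive): no elements in common.
print(A.isdisjoint(complement_A))       # True
```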

Counting Sample Points

Theorem 1 If an operation can be performed in n1 ways, and if for each of these ways
a second operation can be performed in n2 ways, then the two operations
can be performed together in n1n2 ways.

Example: How many sample points are there in the sample space when a pair of dice
is thrown once?
n1 = 6, n2 = 6
n1n2 = 6(6) = 36 sample points
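
The multiplication rule can be confirmed by listing the sample space explicitly. This sketch enumerates every ordered pair for two dice; the variable names are illustrative.

```python
from itertools import product

faces = range(1, 7)                          # n1 = n2 = 6 outcomes per die
sample_space = list(product(faces, faces))   # every (first die, second die) pair

print(len(sample_space))    # 36
print(sample_space[:3])     # [(1, 1), (1, 2), (1, 3)]
```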

Theorem 2 If an operation can be performed in n1 ways, and if for each of these a
second operation can be performed in n2 ways, and for each of the first two
a third operation can be performed in n3 ways, and so forth, then the
sequence of k operations can be performed in n1n2...nk ways.

Theorem 3 A permutation is an arrangement of all or part of a set of objects. The
number of permutations of n objects is n!

Example: In how many ways can 5 examinees be lined up to go inside the
testing centers?

5! = 120 ways

Theorem 4 The number of permutations of n distinct objects taken r at a time is
nPr = n!/(n-r)!

Theorem 5 The number of permutations of n objects arranged in a circle is (n-1)!

Permutations that occur by arranging objects in a circle are called circular
permutations.

Example: In how many ways can 6 students be seated at a round dining
table?

(6-1)! = 5! = 120 ways

Theorem 6 The number of distinct permutations of n things of which n1 are of one kind,
n2 of a second kind, ..., nk of a kth kind is n!/(n1! n2! ... nk!)

Theorem 7 The number of ways of partitioning a set of n objects into r cells with n1
elements in the first cell, n2 elements in the second, and so forth, is
n!/(n1! n2! ... nr!), where n1 + n2 + ... + nr = n

Theorem 8 The number of combinations of n distinct objects taken r at a time is
nCr = n!/[r!(n-r)!]
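
The counting rules in Theorems 3 to 8 can be evaluated with Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The helper names and the worked numbers, such as the letters of MISSISSIPPI, are illustrative choices, not taken from the notes.

```python
from math import comb, factorial, perm

def circular_permutations(n):
    """Theorem 5: n objects arranged in a circle, (n - 1)!"""
    return factorial(n - 1)

def multinomial(counts):
    """Theorems 6 and 7: n! / (n1! n2! ... nk!) with n = sum of the counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)      # each division is exact
    return result

print(factorial(5))                  # Theorem 3: 5 examinees in a line, 120
print(perm(5, 2))                    # Theorem 4: 5 objects taken 2 at a time, 20
print(circular_permutations(6))      # Theorem 5: 6 students at a round table, 120
print(multinomial([1, 4, 4, 2]))     # Theorem 6: letters of MISSISSIPPI, 34650
print(multinomial([3, 2, 2]))        # Theorem 7: 7 objects into cells of 3, 2, 2: 210
print(comb(5, 2))                    # Theorem 8: 5 objects taken 2 at a time, 10
```

Permutations count ordered selections and combinations count unordered ones, so `perm(n, r)` equals `comb(n, r)` times r!.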

Theorem 9 If an experiment can result in any of N different equally likely outcomes,
and if exactly n of these outcomes correspond to event A, then the
probability of event A is P(A) = n/N

Theorem 10 If A and B are any two events, then

P(A U B) = P(A) + P(B) - P(A∩B)

Corollary 2.1: If A and B are mutually exclusive, then

P(A U B) = P(A) + P(B)

Corollary 2.1 is an immediate result of Theorem 10, since if A and B
are mutually exclusive, A∩B = Ф and then P(A∩B) = P(Ф) = 0

Corollary 2.2: If A1, A2, ..., An are mutually exclusive, then

P(A1 U A2 U ... U An) = P(A1) + P(A2) + ... + P(An)

Theorem 11 For three events A, B and C ,

P(A U B U C) =
P(A) + P(B) + P(C) - P(A∩B) - P(A∩C) - P(B∩C) + P(A∩B∩C)
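
The additive rules in Theorems 10 and 11 can be verified numerically on a small sample space. This sketch assumes one roll of a fair die with equally likely outcomes; the events A, B and C are hypothetical.

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space of one fair die

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                        # even number
B = {4, 5, 6}                        # number greater than 3
C = {1, 2}                           # number less than 3

# Theorem 10: P(A U B) = P(A) + P(B) - P(A∩B)
two_events = P(A) + P(B) - P(A & B)
print(P(A | B), two_events)          # 2/3 2/3

# Theorem 11: inclusion-exclusion for three events
three_events = (P(A) + P(B) + P(C)
                - P(A & B) - P(A & C) - P(B & C)
                + P(A & B & C))
print(P(A | B | C), three_events)    # 5/6 5/6
```

Exact fractions are used so that both sides of each identity compare equal without rounding error.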

Theorem 12 If A and A’ are complementary events, then

P(A) + P(A’) = 1

This follows since A U A' = S and the sets A and A' are disjoint, so
1 = P(S) = P(A U A') = P(A) + P(A')

Theorem 13 If in an experiment the events A and B can both occur, then

P(A∩B) = P(A)P(B|A)

provided P(A) > 0

P(A) = P(A∩B)/ P(B|A)

P(B|A) = P(A∩B)/P(A)

Theorem 14 Two events A and B are independent if and only if

P(A∩B) = P(A)P(B)
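
Theorems 13 and 14 can be illustrated by counting outcomes for a pair of fair dice. The events below are hypothetical examples chosen so that one pair is independent and another is not.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # 36 equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

def conditional(B, A):
    """P(B|A) = P(A∩B) / P(A), provided P(A) > 0."""
    return P(A & B) / P(A)

A = {s for s in S if s[0] % 2 == 0}          # first die shows an even number
B = {s for s in S if sum(s) == 7}            # the dice total 7
C = {s for s in S if sum(s) == 8}            # the dice total 8

print(conditional(B, A))                     # 1/6, the same as P(B)
print(P(A & B) == P(A) * P(B))               # True: A and B are independent
print(P(A & C) == P(A) * P(C))               # False: A and C are dependent
```

Knowing that the first die is even does not change the chance of a total of 7, but it does change the chance of a total of 8.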

Theorem 15 If in an experiment, the events A1, A2, ..., Ak can occur, then

P(A1∩A2∩...∩Ak) = P(A1)P(A2|A1)P(A3|A1∩A2)...P(Ak|A1∩A2∩...∩Ak-1)

If the events A1, A2, ..., Ak are independent, then

P(A1∩A2∩...∩Ak) = P(A1)P(A2)...P(Ak)

Theorem 16 If events B1, B2, ..., Bk constitute a partition of the sample space S such that
P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A of S,

P(A) = ∑ P(Bi∩A) = ∑ P(Bi)P(A|Bi), where each sum runs over i = 1, 2, ..., k

Theorem 17 If events B1, B2, ..., Bk constitute a partition of the sample space S such that
P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A of S such that P(A) ≠ 0,

P(Br|A) = P(Br∩A)/∑ P(Bi∩A)

= P(Br)P(A|Br)/∑ P(Bi)P(A|Bi), for r = 1, 2, ..., k
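
Theorems 16 and 17 (the rule of total probability and Bayes' rule) can be sketched with a three-part partition. The scenario of three machines and all the probabilities below are invented purely for illustration.

```python
from fractions import Fraction

# Hypothetical partition: machines B1, B2, B3 make 30%, 45%, 25% of all items.
prior = [Fraction(30, 100), Fraction(45, 100), Fraction(25, 100)]
# Hypothetical P(A|Bi): chance that an item from machine i is defective.
likelihood = [Fraction(2, 100), Fraction(3, 100), Fraction(2, 100)]

# Theorem 16: P(A) = sum of P(Bi) P(A|Bi)
joint = [p * l for p, l in zip(prior, likelihood)]   # P(Bi∩A)
p_A = sum(joint)

# Theorem 17: P(Br|A) = P(Br) P(A|Br) / sum of P(Bi) P(A|Bi)
posterior = [j / p_A for j in joint]

print(p_A)             # 49/2000
print(posterior[2])    # 10/49
```

The posterior probabilities always add up to 1, because the Bi partition the sample space.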
