Eda 1
Data gathering Systematic method of collecting and measuring data from different
sources of information in order to answer the questions of interest
Retrospective study Uses the population, or a sample, of historical data that has been
archived over some period of time
Observational study The process or population is observed and disturbed as little as possible, and
the quantities of interest are recorded
Steps in designing a survey 1. Determine the objectives of your survey: What questions do you want
to answer?
2. Identify the target population sample: Whom will you interview? Who
will be the respondents? What sampling method will you use?
4. Decide what questions you will ask, in what order, and how to phrase
them.
Target population The entire group a researcher is interested in; the group about which the
researcher wishes to draw conclusions
Simple random sampling Basic sampling technique where a group of subjects (sample) is selected
for study from a larger group (population). Each individual is chosen
entirely by chance and each member of the population has an equal
chance of being included in the sample.
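A minimal Python sketch of simple random sampling, assuming a hypothetical population of 100 numbered subjects. The helper name `simple_random_sample` is illustrative only; it relies on `random.Random.sample`, which draws without replacement so every member has the same chance of being selected.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member has an equal chance of being chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)


# Hypothetical population: subjects numbered 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```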
Cluster sampling Sampling technique where the entire population is divided into groups, or
clusters, and a random sample of these clusters is selected. All
observations in the selected clusters are included in the sample
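A sketch of cluster sampling under stated assumptions: the clusters below (class sections holding student IDs) are hypothetical, and `cluster_sample` is an illustrative helper. The random choice is made over whole clusters, and every observation in a chosen cluster goes into the sample.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep every observation in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # pick cluster labels at random
    return {label: clusters[label] for label in chosen}


# Hypothetical clusters: sections of a class, each a list of student IDs
clusters = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7],
    "C": [8, 9, 10, 11, 12],
    "D": [13, 14],
}
selected = cluster_sample(clusters, 2, seed=1)
# Flatten: all observations from the selected clusters form the sample
sample = [obs for members in selected.values() for obs in members]
```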
Systematic sampling Probability sampling method where researchers select members of the
population at a regular interval (e.g., every kth member after a random starting point).
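A minimal sketch of systematic sampling, assuming the population is already listed in some order and has at least n members. The interval k = N // n and a random start inside the first interval are one common convention, not something these notes specify; `systematic_sample` is a hypothetical helper.

```python
import random


def systematic_sample(population, n, seed=None):
    """Take every k-th member after a random start, with k = N // n."""
    k = len(population) // n                   # sampling interval
    start = random.Random(seed).randrange(k)   # random start within the first interval
    return population[start::k][:n]            # trim in case N is not a multiple of n


# Hypothetical population of 100 numbered members, sample of 10 (so k = 10)
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=7)
```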
Design of Experiments (DOE) ● Tool to develop an experimentation strategy that maximizes learning
using minimum resources.
● Widely used by engineers and scientists to improve existing processes
by maximizing the yield and decreasing the variability, or to develop
new products and processes.
● It identifies the "vital few" factors in the most efficient manner and
then directs the process to its best setting to meet the ever-increasing
demand for improved quality and increased productivity.
Six stages to be carried out for the design of experiments ● Describe - identify the objectives and the important factors relevant
to carrying out the experiment
● Specify - determine the best setting for accomplishing the objective
of the experiment
● Design - design the model process that will be used in the experiment
and conduct an initial test run
● Collect - generate and record the data from the runs
● Fit - validate the result of the trial by conducting additional runs to
confirm whether the objectives were achieved
● Predict
Basic Concepts of Probability and Statistics
Statistics Branch of mathematics that examines and investigates ways to process and
analyze the data gathered.
Descriptive statistics includes those methods concerned with collecting, organizing, summarizing
and presenting data without drawing inferences about a larger group.
Inferential statistics (Inductive Statistics or Statistical Inference) refers to those methods concerned
with the analysis of a subset of data leading to predictions and inferences about the entire set of data.
Statistical terms
Population consists of the totality of the observations with which we are concerned
Constant a characteristic or property that is the same for all members of the
population
Variable a characteristic that changes or varies over time or across the individuals or
objects under consideration
Types of variable
Interval Values represent ordered categories with equal intervals between them,
e.g. temperature
Graphs for Qualitative Data should show: ● What values of the variable have been measured
● How often each value has occurred
Frequency the number of times a score or group of scores (class) occurs in a population or
sample
Relative Frequency the frequency of one score or group of scores divided by the total frequency of
all the observations
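Frequency and relative frequency can be tallied directly. This sketch uses a hypothetical list of scores and an illustrative helper `frequency_table`; `collections.Counter` counts each value, and each count is divided by the total number of observations.

```python
from collections import Counter


def frequency_table(data):
    """Return {value: (frequency, relative frequency)}, ordered by value."""
    counts = Counter(data)
    total = len(data)
    return {value: (freq, freq / total) for value, freq in sorted(counts.items())}


scores = [3, 5, 5, 2, 3, 5, 4, 3, 5, 1]   # hypothetical scores
table = frequency_table(scores)
for value, (freq, rel) in table.items():
    print(value, freq, rel)
```

The relative frequencies always sum to 1 (up to floating-point rounding), since together they account for every observation.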
Pie Chart circular graph that shows how the measurements are distributed among the
categories
one sector of the circle is assigned to each category; the angle of each sector
should be proportional to the relative frequency of the measurements
in that category
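Since a full circle is 360°, each sector angle is 360° times the category's relative frequency. A short sketch with hypothetical survey counts; `pie_angles` is an illustrative name.

```python
def pie_angles(counts):
    """Sector angle in degrees = 360 x relative frequency of each category."""
    total = sum(counts.values())
    return {category: 360 * freq / total for category, freq in counts.items()}


# Hypothetical survey responses per category
responses = {"Agree": 45, "Neutral": 30, "Disagree": 25}
angles = pie_angles(responses)
print(angles)
```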
Bar Chart the height of the bar measures how often a particular category was observed
Line Chart When a quantitative variable is recorded over time at equally spaced intervals
(such as daily, weekly, monthly, quarterly), the data set forms a time series.
Time series data are most effectively presented on a line chart with time on
the horizontal axis.
Relative Frequency Histogram For a quantitative data set, a bar graph in which the height of each bar
represents the proportion or relative frequency of occurrence for a particular
class or sub-interval being measured. The classes or sub-intervals are plotted
along the horizontal axis.
Frequency Distribution (ungrouped data) A tabulation of data showing the frequency of
occurrence of the different values of the variable.
Class Interval Size difference between two successive lower class limits or
two successive upper class limits.
Class Boundary the point halfway between the lower limit of one class and the
upper limit of the preceding class; it is the exact limit.
For example, in the interval 9-11, 8.5 is the lower class
boundary and 11.5 is the upper class boundary
Class Mark the midpoint between the upper and lower class
boundaries or class limits of a class interval
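The class interval size, class boundaries, and class marks can be computed from the class limits. This sketch assumes at least two consecutive, equal-width classes separated by a constant gap (1 for integer data); the classes 6-8, 9-11, 12-14 and the helper `class_summary` are hypothetical, chosen so the 9-11 class reproduces the example above.

```python
def class_summary(classes):
    """classes: list of (lower limit, upper limit) pairs, assumed consecutive,
    equal-width, and separated by a constant gap."""
    size = classes[1][0] - classes[0][0]             # difference of successive lower limits
    half_gap = (classes[1][0] - classes[0][1]) / 2   # half the gap between adjacent classes
    rows = []
    for lower, upper in classes:
        boundaries = (lower - half_gap, upper + half_gap)
        mark = (lower + upper) / 2                   # midpoint of the class limits
        rows.append({"limits": (lower, upper), "boundaries": boundaries, "mark": mark})
    return size, rows


size, rows = class_summary([(6, 8), (9, 11), (12, 14)])
print(size)
for row in rows:
    print(row)
```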
Frequency Polygon useful for comparing two or more distributions graphically on
the same axes.
The ogive useful in making estimates of quantiles, medians and other similar points of
relative position.
The Pie diagram (circle graph) useful when one wishes to picture proportions in a striking way.
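An ogive plots cumulative frequency against the upper class boundaries, so a point of relative position such as the median can be read off by interpolating to half the total frequency. This is a sketch assuming hypothetical grouped data and straight-line interpolation between ogive points; `ogive_points` and `estimate_from_ogive` are illustrative helpers.

```python
from itertools import accumulate


def ogive_points(upper_boundaries, frequencies):
    """Pair each upper class boundary with the cumulative frequency up to that class."""
    return list(zip(upper_boundaries, accumulate(frequencies)))


def estimate_from_ogive(points, first_lower_boundary, target):
    """Linearly interpolate along the ogive to find where the cumulative
    frequency reaches target (target must be > 0)."""
    prev_x, prev_cf = first_lower_boundary, 0
    for x, cf in points:
        if cf >= target:
            return prev_x + (target - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    raise ValueError("target exceeds the total frequency")


# Hypothetical grouped data: classes 6-8, 9-11, 12-14, 15-17
points = ogive_points([8.5, 11.5, 14.5, 17.5], [2, 5, 8, 5])
total = points[-1][1]
median_estimate = estimate_from_ogive(points, 5.5, total / 2)
print(points, median_estimate)
```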
Properties of Frequency Distribution Frequency distributions differ from each other in terms of four
important properties: central location, variation, skewness and
kurtosis.
Central Location refers to the value near the center of the frequency distribution.
Variation refers to the extent to which individual measures spread out from the measure
of central tendency.
Three Measures of Central Tendency Any measure indicating the center of a set of data arranged
in an increasing or decreasing order of magnitude.
Mean the arithmetic average of all the scores or groups of scores in a distribution. It
is denoted by the symbol μ for the population mean and x̄ (X-bar) for the sample mean.
Mode the point on the measurement scale with the maximum frequency in the given
distribution
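Both measures can be checked with Python's standard `statistics` module. The data below are a hypothetical sample; the mean is also computed by hand as the sum of the scores divided by their number.

```python
import statistics

# Hypothetical sample of scores
data = [4, 7, 7, 8, 9, 7, 6, 8]

x_bar = sum(data) / len(data)      # sample mean: sum of scores / number of scores
mode = statistics.mode(data)       # score with the maximum frequency

assert x_bar == statistics.mean(data)
print(x_bar, mode)                 # 7.0 7
```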
Definition 1
sample space (S) The set of all possible outcomes of a statistical experiment
Statement or rule method Describes a sample space with a large or infinite number of sample points
Definition 2
Definition 3
Definition 4
Disjoint Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ∅, that is,
if A and B have no elements in common.
Definition 6
Union The union of the two events A and B, denoted by the symbol A ∪ B, is the
event containing all the elements that belong to A or B or both.
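Python sets mirror these definitions: `&` is intersection and `|` is union. The events below are hypothetical ones defined on the sample space of a single die roll.

```python
# Sample space for one roll of a die, with events as Python sets
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # even number
B = {1, 3, 5}        # odd number
C = {4, 5, 6}        # number greater than 3


def are_disjoint(X, Y):
    """Mutually exclusive: the intersection is the empty set."""
    return X & Y == set()


print(are_disjoint(A, B))   # True
print(are_disjoint(A, C))   # False: 4 and 6 are common to both
union_AC = A | C            # elements in A or C or both
```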
Theorem 1 If an operation can be performed in n1 ways, and if for each of these ways
a second operation can be performed in n2 ways, then the two operations
can be performed together in n1n2 ways.
How many sample points are there in the sample space when a pair of dice
is thrown once?
n1 = 6 (outcomes of the first die)
n2 = 6 (outcomes of the second die)
n1n2 = 6(6) = 36 sample points
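The count from Theorem 1 can be confirmed by listing the sample space explicitly; `itertools.product` builds every ordered pair (first die, second die).

```python
from itertools import product

die = range(1, 7)
# Each sample point is an ordered pair (first die, second die)
sample_space = list(product(die, die))

n1, n2 = len(die), len(die)
print(len(sample_space), n1 * n2)   # 36 36
```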
5! = 120 ways
Theorem 6 The number of distinct permutations of n things of which n1 are of one kind,
n2 of a second kind,..., nk of a kth kind is
$$\frac{n!}{n_1!\,n_2!\cdots n_k!}$$
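A brute-force check of Theorem 6 on a small hypothetical case: the letters of "AABBB" (n1 = 2, n2 = 3). Listing every arrangement with `itertools.permutations` and keeping only the distinct ones should match n!/(n1! n2!) = 120/12 = 10. The helper `distinct_permutations` is illustrative.

```python
from itertools import permutations
from math import factorial, prod


def distinct_permutations(counts):
    """n! / (n1! n2! ... nk!) where counts = [n1, n2, ..., nk]."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)


brute = len(set(permutations("AABBB")))   # distinct arrangements found by enumeration
formula = distinct_permutations([2, 3])
print(brute, formula)                     # 10 10
```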
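Theorem 7 The number of ways of partitioning a set of n objects into r cells with n1
elements in the first cell, n2 elements in the second, and so forth, is
$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\,n_2!\cdots n_r!}, \quad \text{where } n_1 + n_2 + \cdots + n_r = n.$$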
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$
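The three-event addition rule can be verified exactly on a small example, assuming a fair die so that P(E) = |E| / |S|. The events A, B, C and the helper `P` are hypothetical; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

S = set(range(1, 7))                      # one roll of a fair die


def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


A, B, C = {1, 2, 3}, {2, 4, 6}, {2, 3, 6}
lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs, rhs)                           # 5/6 5/6
```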
$$P(A \cap B) = P(A)\,P(B \mid A)$$
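A worked check of the multiplication rule, using a standard 52-card deck as an assumed example: drawing two cards without replacement, let A = first card is an ace and B = second card is an ace. Then P(A) = 4/52 and P(B | A) = 3/51, and enumerating all ordered draws gives the same answer.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]   # rank 1 = ace
p_first_ace = Fraction(4, 52)                 # P(A)
p_second_ace_given_first = Fraction(3, 51)    # P(B|A): 3 aces left among 51 cards
rule = p_first_ace * p_second_ace_given_first

draws = list(permutations(deck, 2))           # ordered draws without replacement
both_aces = sum(1 for a, b in draws if a[0] == 1 and b[0] == 1)
brute = Fraction(both_aces, len(draws))
print(rule, brute)                            # 1/221 1/221
```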
Theorem 16 If events B1, B2,..,Bk constitute a partition of the sample space S such that
P(Bi) ≠ 0 for i = 1,2,...,k, then for any
event A of S,
$$P(A) = \sum_{i=1}^{k} P(B_i \cap A) = \sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)$$
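A sketch of the rule of total probability with hypothetical figures: three machines B1, B2, B3 produce 30%, 45% and 25% of a plant's items, with defect rates 2%, 3% and 2%, and A is the event that a randomly chosen item is defective. The helper `total_probability` is illustrative and checks that the P(Bi) sum to 1, as a partition requires.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum_i P(B_i) P(A | B_i) for a partition B_1, ..., B_k of S."""
    if sum(priors) != 1:
        raise ValueError("the B_i must partition S, so their probabilities sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


# Hypothetical figures: output shares P(B_i) and defect rates P(A | B_i)
priors = [Fraction("0.30"), Fraction("0.45"), Fraction("0.25")]
likelihoods = [Fraction("0.02"), Fraction("0.03"), Fraction("0.02")]
p_a = total_probability(priors, likelihoods)
print(p_a, float(p_a))                    # 49/2000 0.0245
```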
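Theorem 17 (Bayes' rule) If events B1,B2,..,Bk constitute a partition of the sample space S such that
P(Bi) ≠ 0 for i = 1,2,...,k, then for any event A of S such that P(A) ≠ 0,
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{\sum_{i=1}^{k} P(B_i \cap A)} = \frac{P(B_r)\,P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)}, \quad r = 1, 2, \ldots, k$$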
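A sketch of Bayes' rule with hypothetical figures: three machines B1, B2, B3 produce 30%, 45% and 25% of the items, with defect rates 2%, 3% and 2%. Given that an item is defective (event A), the code computes the probability it came from B3. The helper `bayes` is illustrative; its denominator is the total probability of A.

```python
from fractions import Fraction


def bayes(priors, likelihoods, r):
    """P(B_r | A) for a partition B_1..B_k; r is 1-based as in the theorem."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B_i) P(A | B_i)
    p_a = sum(joint)                                        # total probability of A
    if p_a == 0:
        raise ValueError("Bayes' rule requires P(A) != 0")
    return joint[r - 1] / p_a


# Hypothetical output shares P(B_i) and defect rates P(A | B_i)
priors = [Fraction("0.30"), Fraction("0.45"), Fraction("0.25")]
likelihoods = [Fraction("0.02"), Fraction("0.03"), Fraction("0.02")]
posterior_3 = bayes(priors, likelihoods, 3)   # P(B_3 | A)
print(posterior_3)                            # 10/49
```

Because the posteriors P(Br | A) for r = 1, ..., k share the same denominator, they always sum to 1.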