Handout-A-Preliminaries (Advance Statistics)

EFREN S. TELLERMO, Course Professor
• Used to mean numerical facts presented in forms such as tables or graphs.
• A calculation on a collection of numerical values.
• A methodology for arranging data in a format useful for decision making.
• Statistical procedures can be very analytical and use theoretical information from probability functions to make decisions where randomness plays a part in observed outcomes.
• Experimental analysis in the traditional sciences and social sciences.
• Quality control.
• Forecasting for the purpose of planning (business, government, etc.).
• Statistical reports of business activities.
• Estimating.
• Testing.
• Any procedure that relies on sampling.
• Simulation and experimentation.
• Useful for communicating information through statistical data presentation techniques.
• Useful in understanding the sampling-based techniques used by decision-makers in your field of study, your workplace, and the world around you, and in applying them yourself.
• To be technically literate in a complex technical world, a person should understand the meaning of the statistical measures on which decisions are based.
Statistics
-is concerned with collecting, organizing, summarizing, analyzing and interpreting data for the purpose of making accurate decisions in the face of uncertainty.
Descriptive Statistics
-involves methods of organizing, picturing and summarizing information from samples or populations.
Inferential Statistics
-involves methods of using information from a
sample to draw conclusions regarding the population.
Population
A collection of all possible individuals, objects, or
measurements of interest.
Sample
A selection of some of the objects from the population.
Measurements/observations from part of the population.
Population parameter
A numerical measure that describes an aspect of a population.
Sample statistic
A numerical measure that describes an aspect of a sample.
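To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is hypothetical (50 made-up scores, one at each value from 50 to 99), and the sample size of 10 and the random seed are arbitrary choices for illustration. The population mean is a parameter; the mean of a random sample is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: scores of all 50 students in a course,
# made up for illustration (one student at each score from 50 to 99).
population = list(range(50, 100))

# Population parameter: a numerical measure of the whole population.
mu = statistics.mean(population)

# Sample statistic: the same measure computed on a random sample of 10 students.
random.seed(7)  # fixed seed so the run is reproducible
sample = random.sample(population, k=10)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu}")
print(f"sample mean (statistic):     {x_bar}")
```

Rerunning with a different seed changes the statistic but never the parameter, which is why inferential statistics has to reason about how far a statistic may be from the parameter.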
Data
-refers to the kinds of information researchers obtain on the subjects of their research.
Qualitative Data (usually non-numerical labels or categories called attributes) - data are referred to as qualitative when the observations are made by classifying them according to category or description.
Example:
• The religious denomination of community members is to be recorded and analyzed. (Example data: “Christian”, “Muslim”, ...)
• The ranks of military personnel at an armed forces base are recorded. (Example data: “Corporal”, “Sergeant”, “Corporal”, ...)
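Qualitative data are usually summarized by counting how many observations fall in each category. A small sketch using Python's `collections.Counter`, with only the three example ranks listed above as input (a real data set would be much larger):

```python
from collections import Counter

# The three example observations of military rank given above.
ranks = ["Corporal", "Sergeant", "Corporal"]

# Tally how many observations fall in each category.
counts = Counter(ranks)
for rank, count in counts.most_common():
    print(rank, count)
```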
Quantitative Data (Numerical observations) - data are called quantitative when the observations are arrived at by either measuring or counting.
Example:
• The volume of fuel purchased by customers at a self-serve gas bar was to be analyzed. The volume was measured and displayed by a device on the gas pump.
• The number of phone calls received per month per household for charity solicitation in a certain neighbourhood was analyzed. A random sample of households was asked to keep a log of their charity phone calls.
Discrete Data (Counts)
-data are called discrete when the possible values of the variable are countable. Such data often result from a counting process, in which case the variable’s value is known exactly. Assuming the counting is done accurately, a count should have no error attached to it.
Example:
• The number of children per household.
• The number of magazines subscribed to by a household.
• The number of birds visiting a bird feeder in an hour.
Continuous Data (Measurements)
-data are called continuous when the variable can assume any real value within an interval. Such data often result from a measuring process. Because of the limitations of the measuring process, the value of an observation can only be determined to the precision of the device used to record the measurement. All measurements contain some error.
Example:
• Height, weight, volume.
Levels of Measurement
• Nominal level - this applies to data that consist of names, labels, or categories. In this form, data can only be categorized, such as by religious denomination, political affiliation, etc. Outside of counting the number of observations in each category, very few arithmetic calculations can be done on these data.
• Ordinal level - this applies to data that can be arranged in order, or rank ordered, as well as counted. However, differences between data values either cannot be determined or are meaningless. An example is rating information with the options good, average, poor.
• Interval level - this applies to data that can be arranged in order and for which differences between values are meaningful. Data can be quantified to the extent that they can be placed on a number scale. The number scale has the limitation of not having a meaningful zero point for comparison purposes. The Celsius or Fahrenheit temperature scale is an example of this: zero degrees on these scales does not represent an absence of temperature. A rating scale for things like consumer preference is another example.
• Ratio level - this is the most precise level of measurement. Data in this form can be placed on a number line with a meaningful zero point. A meaningful zero indicates the absence of a quantity. The weight of the net contents of a packaged consumer product is an example.
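The practical consequence of the missing zero on an interval scale is that ratios of values are not meaningful. The sketch below assumes the standard conversion K = °C + 273.15 to the Kelvin scale, whose zero point is absolute zero, so Kelvin behaves as a ratio scale; the two temperatures are arbitrary.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius (interval scale) to kelvins (ratio scale)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# On the Celsius scale the ratio looks like "twice as hot" ...
print(warm_c / cool_c)  # 2.0
# ... but on a scale with a true zero the ratio is only about 1.035.
print(celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c))
```

Differences survive the conversion (20 - 10 is 10 in both scales), which is exactly what the interval level guarantees.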
Problem set 1. Classify the data produced by the following variables as qualitative or quantitative, and for each quantitative variable decide whether it is discrete or continuous. Identify the level of measurement (nominal/ordinal/interval/ratio) of each variable.

Items | Qualitative/Quantitative | Discrete/Continuous | Nominal/Ordinal/Interval/Ratio
1. Scores in a test | | |
2. Height of GS students | | |
3. Color of hair of Cebuanos | | |
4. Favourite foods of the native people | | |
5. Score in a basketball game | | |
6. Weight of the box of apples | | |
7. Travel time from school to home | | |
8. Body temperature | | |
9. Socio-economic status of the Guimarasnons | | |
10. Grades of the students in Statistics | | |
Measures of Central Tendency
For descriptive purposes, a single number is sometimes used to represent an entire array of observations. A statistical measure of the center of a distribution is a value that is representative of the entire array of observations. Another name for a measure of the center is an average value. As the name suggests, a collection of observations tends to cluster around some central value.
Central tendency refers to the “middle” value, or perhaps a typical value, of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the best one to use depends on the situation.
Mean
• Also known as the arithmetic mean.
• If the word “average” is used without a qualifier, the arithmetic mean is the average meant.
• It is the x-value at the “balance point” or “center of gravity” of the distribution curve.
• The mean requires a quantitative variable, typically at the interval or ratio level of measurement.
• The mean is the most commonly used measure of central tendency; when we talk about an “average”, we usually refer to the mean.
• Sometimes it is useful to give more weight to certain data points, in which case the result is called the weighted arithmetic mean.
$$\text{Average} = \frac{\text{sum of all observations}}{\text{number of observations}}$$

$$\bar{x} = \frac{\sum x}{n}$$

Where: ∑ – summation
x – scores
n – total number of scores
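The formula translates directly into code. This is a minimal sketch; the weighted version uses the standard formula Σwx / Σw, and the grades and unit weights in the usage line are hypothetical values chosen for illustration, not taken from this handout.

```python
def mean(scores):
    """Arithmetic mean: sum of all observations / number of observations."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need equal-length inputs and a nonzero total weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Test scores used in the standard deviation example of this handout.
scores = [44, 50, 38, 96, 42, 47, 40, 39, 44, 50]
print(mean(scores))  # 49.0

# Hypothetical: grades 85, 90, 78 in subjects worth 3, 2 and 1 units.
print(weighted_mean([85, 90, 78], [3, 2, 1]))  # 85.5
```

With all weights equal, `weighted_mean` reduces to `mean`, which is a quick sanity check on the formula.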
Advantages:
-It is the best measure for a regular (symmetric) distribution: it is the most reliable and stable, with the least probable error. It is the most generally recognized measure of central tendency.
Disadvantages:
-The mean does not supply information about the homogeneity of the group. The more heterogeneous the set of observations or group of individuals, the less satisfactory the mean is as a measure of central tendency.
• The median is determined by sorting the data set from lowest to highest values and taking the data point in the middle of the sequence (or the average of the two middle points when the number of observations is even). There is an equal number of points above and below the median.
• The median is that value in the distribution such that half of the observations are less than this value and half are greater than this value.
• For example, in the data set {1,2,3,4,5} the median is 3; there are two data points greater than this value and two data points less than this value. In this case, the median is equal to the mean (3).
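Determining the Position of the Median:

$$\text{Position} = \frac{n + 1}{2}$$

Example:
4, 6, 7, 8, 10, 12, 15, 21
Here n = 8, so the median is at position (8 + 1)/2 = 4.5, halfway between the 4th value (8) and the 5th value (10): median = (8 + 10)/2 = 9.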
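The (n + 1)/2 position rule can be sketched in Python as follows. This is an illustrative implementation, not a library function: when the position ends in .5, it averages the two neighbouring sorted values; when n is odd, both neighbours are the same value.

```python
import math

def median(data):
    """Median using the (n + 1) / 2 position rule on the sorted data."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("median requires at least one value")
    position = (n + 1) / 2                    # 1-based position; ends in .5 when n is even
    lower = values[math.floor(position) - 1]  # value at or just below the position
    upper = values[math.ceil(position) - 1]   # value at or just above the position
    return (lower + upper) / 2                # same value twice when n is odd

print(median([4, 6, 7, 8, 10, 12, 15, 21]))  # 9.0
print(median([1, 2, 3, 4, 5]))               # 3.0
```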
Advantages:
-It is the best measure of central tendency when the distribution is irregular or skewed. It can be located in an open-ended distribution or when the data are incomplete.
Disadvantages:
- It has a larger possible error than the mean. It does not lend itself to algebraic treatment.
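Why the median suits skewed data can be seen by comparing it with the mean on the test scores from the standard deviation example of this handout, where the single score of 96 lies far above the rest. The sketch uses Python's standard `statistics` module; dropping the 96 is only a what-if comparison.

```python
import statistics

scores = [44, 50, 38, 96, 42, 47, 40, 39, 44, 50]

print(statistics.mean(scores))    # 49: pulled upward by the extreme score
print(statistics.median(scores))  # 44.0

# Remove the extreme score: the mean drops to about 43.78,
# while the median stays at 44.
trimmed = [x for x in scores if x != 96]
print(statistics.mean(trimmed))
print(statistics.median(trimmed))  # 44
```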
• The mode is that value of the variable that
occurs the most often. Since it occurs the most
often, it is the x value with the greatest
frequency on the frequency polygon.
• The mode is the most frequently occurring value in the data set. For example, in the data set {1,2,3,4,4}, the mode is 4. A data set can have more than one mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.
• The mode may not be unique since a
distribution may have more than one mode.
• There is no calculation required to find the
mode since it is obtained by inspection of the
data.
• For data measured at the nominal level, it is
the only average that can be found.
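Finding the mode "by inspection" amounts to counting frequencies. Python's `statistics.multimode` (available since Python 3.8) returns every value tied for the highest frequency, in the order each first appears, so it handles the multimodal case and nominal data alike. The data sets below are the ones used in the bullets above, plus a hypothetical set with no repeated values.

```python
from statistics import multimode

print(multimode([1, 2, 3, 4, 4]))                       # [4]
print(multimode([1, 1, 2, 3, 3]))                       # [1, 3] (two modes)
print(multimode(["Corporal", "Sergeant", "Corporal"]))  # ['Corporal'] (nominal data)

# Hypothetical data with no repeated values: every value ties,
# so the "mode" says little about the data.
print(multimode([5, 8, 2]))                             # [5, 8, 2]
```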
Advantages:
- The mode is always an actual value from the data set, and it is simple to find by inspection.
Disadvantages:
- The mode may not be meaningful for a small number of cases, where no value may be repeated.
• The range is obtained by computing the difference between the largest observed value of the variable in a data set and the smallest one.
• Range = Max − Min.
▪ Example 1.
• Seven participants in a bike race had the following finishing times in minutes: 28, 22, 26, 29, 21, 23, 24. What is the range?
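The range in Example 1 needs only the largest and smallest values, as this short Python sketch shows:

```python
# Finishing times (minutes) of the 7 bike-race participants in Example 1.
times = [28, 22, 26, 29, 21, 23, 24]

data_range = max(times) - min(times)  # Range = Max - Min
print(f"Range = {max(times)} - {min(times)} = {data_range} minutes")
```

Because it uses only two observations, the range is very sensitive to a single unusually fast or slow finisher.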
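Standard Deviation
• It is a measure of how spread out numbers are.
• It tells how far the typical data point in a distribution strays from the mean of the distribution.
• Formula (for a sample of n scores):

$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

For an entire population, divide by n instead of n − 1.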
• Compute the standard deviation of the scores of 10 randomly selected students in a test:
44, 50, 38, 96, 42, 47, 40, 39, 44, 50
Mean: x̄ = 490 / 10 = 49

x | x − 49 | (x − 49)²
44 | −5 | 25
50 | 1 | 1
38 | −11 | 121
96 | 47 | 2209
42 | −7 | 49
47 | −2 | 4
40 | −9 | 81
39 | −10 | 100
44 | −5 | 25
50 | 1 | 1
Total | | 2616

$$sd = \sqrt{\frac{2616}{9}} = \sqrt{290.67} \approx 17.05$$
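The hand computation can be checked with a short Python sketch. It divides by n − 1 as in the worked example, which matches the standard library's `statistics.stdev` (sample standard deviation); `statistics.pstdev` divides by n and is shown only for comparison.

```python
import math
import statistics

scores = [44, 50, 38, 96, 42, 47, 40, 39, 44, 50]

n = len(scores)
mean = sum(scores) / n                             # 49.0
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of (x - mean)^2
sd = math.sqrt(sum_sq_dev / (n - 1))               # sample sd: divide by n - 1

print(sum_sq_dev)                           # 2616.0
print(round(sd, 2))                         # 17.05
print(round(statistics.stdev(scores), 2))   # 17.05, same as above
print(round(statistics.pstdev(scores), 2))  # 16.17, population version (divide by n)
```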
Problem set 2
The following are the grades of the students: 75, 77, 78, 79, 80, 81, 83, 84, 95, 88.
Compute the following:
• mean, median, and mode
• range and standard deviation
