STAT 1N Notes
Jullie Carmelle H. Chatto
BSA-II
STAT 1N
Statistical Analysis with Computer Application
Statistical Terms
Statistics – is a branch of mathematics that deals with the systematic method of collecting,
classifying, presenting, analyzing and interpreting quantitative or numerical data. (del
Rosario, 1999).
Data – are the quantities (numbers) or qualities (attributes) measured or observed that are
to be collected and/or analyzed.
Variable – refers to a property that can take on different values or categories which cannot
be predicted with certainty.
Examples: smoking habit, attitude toward the head, height, faculty ranks.
Variate - the actual values of the variables. It is commonly termed as random variable.
Constant – refers to the fundamental quantities that do not change in value.
Population – is a set of data consisting of all conceivable possible observations of a
certain phenomenon.
Sample – is a finite number of items selected from a population possessing identical
characteristics with those of the population from which it was taken.
Divisions of Statistics:
A. Descriptive Statistics – is concerned with the collection, classification and presentation of
data designed to summarize and describe the group characteristics. It refers to the
methods of summarizing and presenting data in the form which will make them easier to
analyze and interpret.
B. Inferential Statistics – refers to the drawing of conclusions or judgments about a population
based on a representative sample systematically taken from the same population. Its aim is
to give concise information about large groups of data without dealing with each and
every element of these groups. It is the process of drawing conclusions and making decisions
about the population based on evidence obtained from a sample. It includes estimation and
hypothesis testing.
Classifications of Statistics:
A. Parametric Statistics – is an approach which assumes a random sample from a normal
distribution and involves testing of hypothesis about the population parameter. This
approach is appropriate generally for interval and ratio data.
B. Nonparametric Statistics – is a statistical approach for estimation and hypothesis testing
when no underlying data distribution is assumed. This procedure is appropriate if the
sample size is too small to assess the form of the distribution.
Types of Data:
A. Categorical Data
a. Nominal Scale – categorical data whose categories have no natural order.
Examples: gender, mode of transportation, nationality, occupation, civil status.
b. Ordinal Scale – categorical data whose categories have a natural order.
Examples: pain level (none, mild, moderate, severe), social status
B. Continuous Data
a. Interval Scale – continuous data with equal intervals between values but no absolute
zero point.
Examples: temperature in degrees Celsius or Fahrenheit
b. Ratio Scale – continuous data having both equal intervals and an absolute zero
point.
Examples: weight in pounds, height in centimeters, age in years, income.
Types of Variables:
A. Response Variable or Dependent Variable (Y) – is a variable which is affected by or
related to the value of some other variables. It can be continuous or categorical data.
B. Explanatory Variable or Independent Variable (X) – is a variable that is thought to
influence or affect the values of the response variable. It can be continuous or categorical data.
C. Controlled Variable – a variable that is kept the same throughout the experiment.
Classification of Variables:
A. Qualitative Variable – is one whose categories are simply used as labels to distinguish
one group from another. This variable has values that are intrinsically nonnumeric
(categorical). Its categories can be assigned numeric codes, but they remain intrinsically
qualitative.
Examples: sex (male = 1, female = 0), occupation, race.
B. Quantitative Variable – is one whose categories can be measured and ordered
according to quantity. These are variable values that are intrinsically numeric.
Examples: number of children in a family, number of students in a class.
Steps in Statistical Inquiry or Investigation
1. Collection of Data
2. Processing/Organizing of Data
3. Presentation of Data
4. Analysis of Data
5. Interpretation of Data
DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of interest,
in an established systematic fashion that enables one to answer stated research questions,
test hypotheses, and evaluate outcomes.
NB: The data collected must be valid, reliable, relevant to the problem at hand, and
consistent with other information about it.
Categories/Sources of Data:
1. Primary Data – refer to data obtained directly from an original source by means of
actual observations or by conducting interviews.
2. Secondary Data – refer to data or information that come from existing records
(published and/or unpublished) in usable form such as surveys, census, business
journals and magazines, newspapers, commercial publications, and others such as
theses and dissertations, and research papers, etc.
3. Internal Data – data taken from the company’s own record of operations such as sales
records, production records, personnel records, etc.
4. External Data – data that come from outside sources and not from the company’s own
records.
DATA PROCESSING
Data processing occurs when data is collected and translated into usable information. It
starts with data in its raw form and converts it into a more readable format (graphs, documents,
etc.).
2. Data Preparation/Editing
Editing raw data is necessary to detect errors and omissions and to ensure that the data
gathered are accurate, complete, and consistent with other information. The data should also
be arranged in such a way as to facilitate coding and classification. Data preparation, often
referred to as “pre-processing,” is the stage at which raw data is cleaned up and organized for
the following stage of data processing. During preparation, raw data is diligently checked for
any errors. The purpose of this step is to eliminate bad data (redundant, incomplete, or
incorrect data) and begin to create high-quality data.
3. Data Input/Coding
The clean data is then entered into its destination and translated into a language that is
understandable. Data input/coding is the first stage in which raw data begins to take the
form of usable information. Coding means assigning numerals and other symbols to the
data collected to be able to group them into a limited number of classes or categories.
4. Processing/Classification
This refers to the sorting of the data and grouping them on the basis of some similarity.
5. Data Output/Interpretation
The output/interpretation stage is the stage at which data is finally usable to non-data
scientists. It is translated, readable, and often in the form of graphs, videos, images, plain
text, etc.
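The editing, coding and classification stages above can be sketched in a few lines of Python. This is only an illustration: the records are hypothetical, the function names `edit`, `code` and `classify` are made up for this sketch, and the codes male = 1, female = 0 follow the example given earlier for qualitative variables.

```python
# Hypothetical raw survey records; None marks a missing answer.
raw_records = [
    {"name": "A", "sex": "male"},
    {"name": "B", "sex": "female"},
    {"name": "C", "sex": None},
    {"name": "D", "sex": "female"},
]

# Coding scheme taken from the notes' example: male = 1, female = 0.
SEX_CODES = {"male": 1, "female": 0}


def edit(records):
    """Data preparation/editing: drop incomplete records."""
    return [r for r in records if r["sex"] is not None]


def code(records):
    """Data input/coding: attach a numeric code to each category."""
    return [{**r, "sex_code": SEX_CODES[r["sex"]]} for r in records]


def classify(records):
    """Processing/classification: group names by their code."""
    groups = {}
    for r in records:
        groups.setdefault(r["sex_code"], []).append(r["name"])
    return groups


groups = classify(code(edit(raw_records)))
print(groups)
```

Record C is removed during editing because its answer is missing, so only three records reach the classification stage.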
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected
should be organized systematically for easier and faster interpretation. They may be
presented in any of the following forms:
- textual form – if the data to be presented are few
- tabular and graphical forms – when more detailed information about the data is to be
presented
A table is used when you want to present data in a systematic and organized manner so
that reading and interpretation will be simpler and easier. When a table is used, you must
remember the following:
- state the title of the table
- label the columns properly
- indicate the date of the survey
- identify the source of the data
- arrange the data systematically in columns
Example 1:
University of Bohol
College of Business and Accountancy
Enrolment, Second Semester, SY 2019 - 2020
Year Level    Male    Female
First          136       158
Second         112       105
Third           96       193
Fourth          88       102
Total          432       558
You will observe that the table above shows clearly the enrolment data of University of Bohol,
College of Business and Accountancy for the second semester of school year 2019-2020.
Another type of tabular presentation is the frequency table also known as a frequency
distribution. It is an arrangement of the data that shows the frequency of occurrence of
different values of the variables.
A frequency table is constructed by listing the measurements from highest to lowest,
then making tally marks to record how often each number occurs. After tallying, count the
marks and record them in the proper column.
Example 2:
The scores of 45 students on a 20-point Math quiz are as follows:
17 20 15 18 19 16 11 10 15 16 13 20
12 12 13 14 11 10 14 13 12 11 14 18
13 15 14 10 15 16 17 17 18 20 15 19
18 17 16 15 12 12 19 19 20
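The tallying procedure for Example 2 can be checked with a short Python sketch. It uses `collections.Counter` from the standard library in place of manual tally marks; the function name `frequency_table` is an assumption made for this illustration.

```python
from collections import Counter

# The 45 quiz scores from Example 2.
scores = [17, 20, 15, 18, 19, 16, 11, 10, 15, 16, 13, 20,
          12, 12, 13, 14, 11, 10, 14, 13, 12, 11, 14, 18,
          13, 15, 14, 10, 15, 16, 17, 17, 18, 20, 15, 19,
          18, 17, 16, 15, 12, 12, 19, 19, 20]


def frequency_table(values):
    """Return (value, frequency) pairs listed from highest to lowest value."""
    counts = Counter(values)
    return [(v, counts[v]) for v in sorted(counts, reverse=True)]


table = frequency_table(scores)
for score, freq in table:
    # One '|' per occurrence stands in for a tally mark.
    print(f"{score:>5}  {'|' * freq:<8} {freq}")
```

The frequencies should add up to the number of students (45), which is a quick way to confirm that no score was missed while tallying.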
FREQUENCY DISTRIBUTION
Frequency Distribution is a tabular arrangement of data showing its classification or grouping
according to magnitude or size.
Example: The distribution of bi-monthly salaries of 75 employees of Dragon Company.
Example:
The following are the scores obtained by 35 students in a quiz in math.
67 54 73 55 80 80 72
72 78 47 65 60 65 90
82 93 45 57 77 42 64
100 95 35 70 61 51 85
66 98 69 88 83 55 73
Construct a frequency distribution showing the frequency, class limits, class boundaries and
class midpoints.
Frequency table showing the frequency, class limits, class boundaries and class marks.
How to obtain %F, %F≤ and %F≥ (divide each frequency or cumulative frequency by n = 35):
%F                   %F≤                   %F≥
2/35 = 5.714 %       2/35 = 5.714 %        35/35 = 100 %
4/35 = 11.429 %      6/35 = 17.143 %       33/35 = 94.286 %
6/35 = 17.143 %      12/35 = 34.286 %      29/35 = 82.857 %
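The sketch below rebuilds a frequency distribution for the 35 quiz scores, with class limits, class boundaries, class marks, %F, %F≤ and %F≥. The class intervals are an assumption: a class size of 10 starting at 35 (35 – 44, 45 – 54, ...) reproduces the three rows of percentages shown above, but the notes do not state the intervals explicitly. The function name `grouped_table` is also made up for this illustration.

```python
# The 35 quiz scores from the example.
scores = [67, 54, 73, 55, 80, 80, 72,
          72, 78, 47, 65, 60, 65, 90,
          82, 93, 45, 57, 77, 42, 64,
          100, 95, 35, 70, 61, 51, 85,
          66, 98, 69, 88, 83, 55, 73]


def grouped_table(values, start, width):
    """Build one row per class; assumes whole-number scores."""
    n = len(values)
    rows = []
    lower = start
    cum_before = 0  # cumulative frequency of all lower classes
    while lower <= max(values):
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,
            "f": f,
            "pct": round(100 * f / n, 3),
            "pct_le": round(100 * (cum_before + f) / n, 3),
            "pct_ge": round(100 * (n - cum_before) / n, 3),
        })
        cum_before += f
        lower += width
    return rows


# Assumed classes: size 10, first lower limit 35.
rows = grouped_table(scores, start=35, width=10)
for r in rows:
    print(r)
```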
PRESENTATION OF DATA
A. Tabular – presentation using tables
B. Textual – this mode of presentation combines text and figures in a statistical report.
The common example of textual presentation is the news item
C. Graphical – most effective means in presenting statistical data.
- presentation using graphs
Types of graphs:
1. bar graph
2. line graph
3. circle graph or pie chart
4. scatter diagram
5. pictograph or pictogram
QUANTILES OF UNGROUPED DATA
Quantiles are values that divide the data (distribution) into a given number of equal parts. Like
the median, the quantiles are also “positional measures.” Some of the quantiles are:
a.) Quartiles – values that divide the distribution into 4 equal parts.
Q1 – (1st quartile) the value at or below which 25 % of the given distribution lies
Q2 – (2nd quartile) the value at or below which 50 % of the given distribution lies
Q3 – (3rd quartile) the value at or below which 75 % of the given distribution lies
Q4 – (4th quartile) the value at or below which 100 % of the given distribution lies
b.) Deciles – values that divide the distribution into ten (10) equal parts.
D1 – (1st decile) the value at or below which 10 % of the given distribution lies
D2 – (2nd decile) the value at or below which 20 % of the given distribution lies
D3 – (3rd decile) the value at or below which 30 % of the given distribution lies
D4 – (4th decile) the value at or below which 40 % of the given distribution lies
D10 – (10th decile) the value at or below which 100 % of the given distribution lies
c.) Percentiles – values that divide the distribution into 100 equal parts.
P1 – (1st percentile) the value at or below which 1 % of the given distribution lies
P2 – (2nd percentile) the value at or below which 2 % of the given distribution lies
P3 – (3rd percentile) the value at or below which 3 % of the given distribution lies
P4 – (4th percentile) the value at or below which 4 % of the given distribution lies
P100 – (100th percentile) the value at or below which 100 % of the given distribution lies
HOW TO SOLVE FOR ANY QUANTILE
1. Array the data according to magnitude or size.
2. Compute the position of the given quantile in the distribution using the formula
$$\text{position} = \frac{P(n + 1)}{100}$$
where: P = the desired percentage
n = number of items or scores
3. Locate the item (or score) corresponding to the obtained position in the distribution.
Always start from the lowest score.
4. If the obtained position is not a whole number, interpolate between the two adjacent scores.
Examples
1) Find the 20th percentile (P20) of the following scores: 40, 45, 42, 37, 36, 32, 28, 26, 25.
Solution:
1. Array the scores in decreasing order of magnitude.
45 42 40 37 36 32 28 26 25
2. Compute the position of the score corresponding to the 20th percentile using the
formula
$$\frac{P(n + 1)}{100} = \frac{20(9 + 1)}{100} = 2$$
3. Locate the second score from the lowest. The answer is 26. Thus, P20 = 26. This
means that 20 % of the scores fall at or below 26.
2) Find the 5th decile (D5) of the following scores: 19, 25, 38, 45, 65, 81.
Solution:
1. Arrange the scores in increasing order of magnitude.
19 25 38 45 65 81
2. Locate the position of D5 using the formula P(n + 1)/10, where P is the decile number:
5(6 + 1)/10 = 3.5
Since 3.5 is not a whole number, we have to interpolate using the following steps:
a. Get the difference between the third and fourth scores from the lowest score, since
3.5 is between the 3rd and 4th scores. 45 – 38 = 7.
b. Multiply the difference obtained by the decimal part of the position in no. 2. That is 7 x 0.5 = 3.5.
c. Add the product to the lower score (38) to obtain D5. Therefore, D5 = 38 + 3.5 = 41.5.
This means that 50 % of the scores lie below 41.5.
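The position formula and the interpolation steps used in the two examples can be written as one small Python function. This is a sketch, not a standard library routine: the name `quantile` and its parameters `k` (which quantile) and `parts` (4, 10 or 100) are assumptions for this illustration, and returning the lowest or highest score when the position falls outside the data is an added convention that the notes do not cover.

```python
def quantile(values, k, parts):
    """k-th quantile when the data are divided into `parts` equal parts.

    Position = k(n + 1)/parts, counted from the lowest score.
    """
    data = sorted(values)               # step 1: array the data
    n = len(data)
    position = k * (n + 1) / parts      # step 2: compute the position
    rank = int(position)                # whole-number part of the position
    fraction = position - rank          # decimal part, used to interpolate
    if rank < 1:                        # assumed convention: clamp to the ends
        return data[0]
    if rank >= n:
        return data[-1]
    lower, upper = data[rank - 1], data[rank]
    # step 4: interpolate between the two adjacent scores
    return lower + fraction * (upper - lower)


p20 = quantile([40, 45, 42, 37, 36, 32, 28, 26, 25], k=20, parts=100)
d5 = quantile([19, 25, 38, 45, 65, 81], k=5, parts=10)
print(p20, d5)
```

Sorting in increasing order inside the function means the position can always be counted from the lowest score, whichever way the scores were originally listed.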
Example:
The following is the distribution of the wages of 50 workers of HGW Manufacturing Co.
taken during a particular week last May.
Weekly Wages (in pesos)    Number of Workers (fi)
870 – 899                    4
900 – 929                    6
930 – 959                   10
960 – 989                   13
990 – 1019                   8
1020 – 1049                  7
1050 – 1079                  2
TOTAL                       50
Determine the mean using the two methods.
Solution:
Weekly Wages    Number of       Class Midpoints    Xi fi      ui    ui fi
(in pesos)      Workers (fi)    (Xi)
870 – 899        4               884.5              3538       -3    -12
900 – 929        6               914.5              5487       -2    -12
930 – 959       10               944.5              9445       -1    -10
960 – 989       13               974.5 (X0)         12668.5     0      0
990 – 1019       8              1004.5              8036        1      8
1020 – 1049      7              1034.5              7241.5      2     14
1050 – 1079      2              1064.5              2129        3      6
TOTAL           n = 50                              48545             -6
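Long Method:
$$\bar{X} = \frac{\sum X_i f_i}{n} = \frac{48545}{50} = \text{P}970.90$$
Short Method:
$$\bar{X} = X_0 + \left[\frac{\sum u_i f_i}{n}\right] \times C$$
Where: C = class size = 30
X0 = assumed mean = 974.5
$$\bar{X} = 974.5 + \left[\frac{-6}{50}\right] \times 30 = 974.5 - 3.6 = \text{P}970.90$$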
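Both methods can be verified with a short Python sketch over the wage distribution. The function names `mean_long` and `mean_short` and the `assumed_index` parameter (the position of the class whose midpoint serves as X0) are assumptions made for this illustration; the class size is taken as the gap between consecutive lower limits.

```python
# (lower limit, upper limit, frequency) for each wage class.
classes = [
    (870, 899, 4), (900, 929, 6), (930, 959, 10), (960, 989, 13),
    (990, 1019, 8), (1020, 1049, 7), (1050, 1079, 2),
]


def mean_long(classes):
    """Long method: sum of (midpoint x frequency) divided by n."""
    n = sum(f for _, _, f in classes)
    total = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return total / n


def mean_short(classes, assumed_index):
    """Short method: X0 + (sum of ui fi / n) x C, with ui counted in class steps."""
    n = sum(f for _, _, f in classes)
    lo0, hi0, _ = classes[assumed_index]
    x0 = (lo0 + hi0) / 2                       # assumed mean
    width = classes[1][0] - classes[0][0]      # class size C
    sum_uf = sum((i - assumed_index) * f
                 for i, (_, _, f) in enumerate(classes))
    return x0 + sum_uf * width / n


print(round(mean_long(classes), 2))
print(round(mean_short(classes, assumed_index=3), 2))
```

Any class may be chosen as the assumed mean; the short method returns the same result, which is why it is mainly a device for simplifying hand computation.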
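For the median, set up the cumulative less than (F≤) frequency and locate where the 25th item
(n/2 = 50/2 = 25) is in the distribution. The median class is the 960 – 989 class interval, where
the 25th item falls; the cumulative frequency of the classes before it is 4 + 6 + 10 = 20.
$$Me = L_{Me} + \left[\frac{n/2 - F_{\le}}{f_{Me}}\right] \times C$$
$$= 959.5 + \left[\frac{50/2 - 20}{13}\right] \times 30$$
$$= 959.5 + \left[\frac{25 - 20}{13}\right] \times 30$$
$$= 959.5 + 11.54$$
$$Me = \text{P}971.04$$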
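The median computation can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the lower class boundary is taken as the lower limit minus 0.5, which assumes whole-peso class limits as in this table.

```python
# (lower limit, upper limit, frequency) for each wage class.
classes = [
    (870, 899, 4), (900, 929, 6), (930, 959, 10), (960, 989, 13),
    (990, 1019, 8), (1020, 1049, 7), (1050, 1079, 2),
]


def grouped_median(classes):
    """Median of grouped data: L + [(n/2 - F) / f] x C."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]   # class size C
    cum_before = 0                          # F: cumulative frequency below the class
    for lo, _, f in classes:
        if cum_before + f >= n / 2:         # first class containing the n/2-th item
            boundary = lo - 0.5             # lower class boundary (assumed unit gap)
            return boundary + (n / 2 - cum_before) / f * width
        cum_before += f


median = grouped_median(classes)
print(round(median, 2))
```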
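Solution:
$$Mo = L_{Mo} + \left[\frac{d_1}{d_1 + d_2}\right] \times C$$
where d1 is the difference between the frequency of the modal class (960 – 989) and that of the
class before it (13 – 10 = 3), and d2 is the difference between the frequency of the modal class
and that of the class after it (13 – 8 = 5).
$$= 959.5 + \left[\frac{3}{3 + 5}\right] \times 30$$
$$= 959.5 + \frac{90}{8}$$
$$Mo = \text{P}970.75$$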
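A minimal Python sketch of the same mode computation is given below. The function name `grouped_mode` is hypothetical; treating a missing neighbouring class as having frequency 0 and using the first class with the highest frequency are assumptions that this example does not exercise.

```python
# (lower limit, upper limit, frequency) for each wage class.
classes = [
    (870, 899, 4), (900, 929, 6), (930, 959, 10), (960, 989, 13),
    (990, 1019, 8), (1020, 1049, 7), (1050, 1079, 2),
]


def grouped_mode(classes):
    """Mode of grouped data: L + [d1 / (d1 + d2)] x C."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))             # modal class (first one if tied)
    width = classes[1][0] - classes[0][0]   # class size C
    before = freqs[i - 1] if i > 0 else 0   # assumed 0 when no class precedes
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    boundary = classes[i][0] - 0.5          # lower class boundary (assumed unit gap)
    return boundary + d1 / (d1 + d2) * width


mode = grouped_mode(classes)
print(mode)
```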
Measures of Central Tendency – Ungrouped Data
Measures of Central Tendency
Description of statistical data can be quite brief or elaborate depending on the nature of
the data or what we intend to do. Sometimes, presenting data as they are, in raw form, and
letting them speak for themselves may be quite satisfactory, but data summarized further by
means of appropriate statistical description give more useful information. One of these
appropriate statistical descriptions is the MEASURES OF CENTRAL TENDENCY.
A MEASURE OF CENTRAL TENDENCY of a given set of data is the value around which the
whole set of data tends to cluster. It is represented by a single number which summarizes and
describes the whole set.
FORMULAS
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
A. UNWEIGHTED or SIMPLE MEAN - takes into consideration each of the item values without
regard to their relative importance.
Example:
What is the mean age of a group of 8 children whose ages are: 8, 8 ½, 9, 10, 9 ½, 10, 12, 13?
Solution:
Given: n = 8
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
$$= \frac{8 + 8\tfrac{1}{2} + 9 + 10 + 9\tfrac{1}{2} + 10 + 12 + 13}{8}$$
$$= \frac{80}{8}$$
$$= 10 \text{ years old}$$
B. WEIGHTED MEAN - takes into consideration the proper weights assigned to the observed
values according to their relative importance.
$$\bar{X} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$
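The two means can be compared with a short Python sketch. The ages come from the example in the notes; the values and weights passed to `weighted_mean` are hypothetical numbers chosen only to show how the weights enter the formula, and both function names are assumptions for this illustration.

```python
def simple_mean(values):
    """Unweighted mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of Wi*Xi divided by the sum of Wi."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Ages of the 8 children from the example.
ages = [8, 8.5, 9, 10, 9.5, 10, 12, 13]
print(simple_mean(ages))

# Hypothetical illustration: two values where the second carries three
# times the weight of the first.
print(weighted_mean([80, 90], weights=[1, 3]))
```

When every weight is equal, the weighted mean reduces to the simple mean.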
3. MODE - the value that appears with the highest (greatest) frequency.
- the value that appears most often.
Examples:
1. Determine the mode of the following distribution: 3, 8, 10, 5, 3, 5, 2, 5, 7
The mode is 5. It is uni-modal.
2. Find the mode of the following distribution: 20, 15, 10, 9, 7, 20, 10, 10, 20
The modes are 20 and 10. It is bi-modal.
3. Determine the mode of the following distribution: 7, 5, 10, 23, 11, 8, 15
There is no mode since no value is repeated.
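The three mode examples can be reproduced with `collections.Counter`. In this sketch the function name `modes` is hypothetical, and "no mode" is represented by an empty list whenever the highest frequency is 1, following the rule stated in the third example.

```python
from collections import Counter


def modes(values):
    """Return the value(s) with the highest frequency, or [] if none repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:            # no value is repeated, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 8, 10, 5, 3, 5, 2, 5, 7]))        # uni-modal
print(modes([20, 15, 10, 9, 7, 20, 10, 10, 20]))  # bi-modal
print(modes([7, 5, 10, 23, 11, 8, 15]))           # no mode
```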
Activity:
1. The following are the scores obtained by 10 applicants in the
Entrance Examination for 1st year college of ABM School:
60 75 85 90 98 80 75 75 95 90
Find the: a. mean
b. median
c. mode
QUANTILES OR FRACTILES: GROUPED DATA
In a frequency distribution, a quantile or fractile is a value at or below which a given
fraction of the distribution lies. Like the median, the quantiles or fractiles are also
positional measures.
$$Q_1 = L_{Q_1} + \left[\frac{n/4 - F_{\le Q_1}}{f_{Q_1}}\right] \times C$$
$$Q_2 = L_{Q_2} + \left[\frac{2n/4 - F_{\le Q_2}}{f_{Q_2}}\right] \times C$$
$$Q_3 = L_{Q_3} + \left[\frac{3n/4 - F_{\le Q_3}}{f_{Q_3}}\right] \times C$$
where:
Q1, Q2, Q3 = the quartiles
fQ1, fQ2, fQ3 = the frequencies of the respective quartile classes
LQ1, LQ2, LQ3 = the lower class boundaries of the respective quartile classes
F≤Q1, F≤Q2, F≤Q3 = the respective cumulative less than frequencies immediately
preceding the given quartile class
n = total frequency
C = class size
First locate each quartile class from its position in the cumulative frequency (n = 50):
Q1 class: n/4 = 50/4 = 12.5
Q2 class: 2n/4 = n/2 = 50/2 = 25
Q3 class: 3n/4 = 3(50)/4 = 37.5
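Solutions:
$$Q_1 = L_{Q_1} + \left[\frac{n/4 - F_{\le Q_1}}{f_{Q_1}}\right] \times C$$
$$= 929.5 + \left[\frac{50/4 - 10}{10}\right] \times 30$$
$$= 929.5 + 7.5$$
$$= \text{P}937.00$$
$$Q_2 = L_{Q_2} + \left[\frac{2n/4 - F_{\le Q_2}}{f_{Q_2}}\right] \times C$$
$$= 959.5 + \left[\frac{2(50)/4 - 20}{13}\right] \times 30$$
$$= 959.5 + \left[\frac{25 - 20}{13}\right] \times 30$$
$$= 959.5 + 11.54$$
$$= \text{P}971.04$$
$$Q_3 = L_{Q_3} + \left[\frac{3n/4 - F_{\le Q_3}}{f_{Q_3}}\right] \times C$$
$$= 989.5 + \left[\frac{3(50)/4 - 33}{8}\right] \times 30$$
$$= 989.5 + \left[\frac{37.5 - 33}{8}\right] \times 30$$
$$= 989.5 + 16.88$$
$$= \text{P}1,006.38$$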
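Solution:
The position of D6 is 6n/10 = 6(50)/10 = 30, which falls in the 960 – 989 class.
$$D_6 = L_{D_6} + \left[\frac{6n/10 - F_{\le D_6}}{f_{D_6}}\right] \times C$$
$$= 959.5 + \left[\frac{6(50)/10 - 20}{13}\right] \times 30$$
$$= 959.5 + 23.08$$
$$= \text{P}982.58$$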
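The quartile and decile solutions share one pattern, so they can be checked with a single Python function. This is a sketch: the name `grouped_quantile` and its parameters `k` and `parts` are assumptions for this illustration, the data are the weekly wage classes with n = 50 used in the solutions, and the lower class boundary is taken as the lower limit minus 0.5.

```python
# (lower limit, upper limit, frequency) for each wage class, n = 50.
classes = [
    (870, 899, 4), (900, 929, 6), (930, 959, 10), (960, 989, 13),
    (990, 1019, 8), (1020, 1049, 7), (1050, 1079, 2),
]


def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data: L + [(k*n/parts - F) / f] x C."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]   # class size C
    target = k * n / parts                  # position of the quantile
    cum_before = 0                          # F: cumulative frequency below the class
    for lo, _, f in classes:
        if cum_before + f >= target:        # quantile class found
            boundary = lo - 0.5             # lower class boundary (assumed unit gap)
            return boundary + (target - cum_before) / f * width
        cum_before += f


q1 = grouped_quantile(classes, 1, 4)
q2 = grouped_quantile(classes, 2, 4)
q3 = grouped_quantile(classes, 3, 4)
d6 = grouped_quantile(classes, 6, 10)
print(q1, round(q2, 2), q3, round(d6, 2))
```

Percentiles follow the same pattern with `parts=100`.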
Activity :
Referring to the distribution below, find the following:
1. Q4
2. D8
3. D1
4. P20
5. P75