Advance Research and Statistics
RESEARCH
AND STATISTICS
by sir lee knows
The Importance of Statistics
It is obvious that society can’t be run effectively on the
basis of hunches or trial and error, and that in business and
economics much depends on the correct analysis of
numerical information. Decisions based on data will provide
better results than those based on intuition or gut feelings.
What applies to this wider world also applies to undertaking
research into it. Learning to use statistics in
your studies and research will have a wider benefit than helping
you towards a qualification. Once you have mastered the
language and some of the techniques in order to make sense
of your investigation, you will have supplied yourself with a
knowledge and understanding that will enable you to cope
with the information you will encounter in your everyday life.
Statistical thinking permeates all social interaction.
For example, take these statements:
*‘The earlier you start thinking about the topic of your
research project, the more likely it is that you will produce
good work.’
*‘You will get more reliable information about that from a
refereed academic journal than a newspaper.’
Or these questions:
*‘Which university should I go to?’
*‘Should I buy a new car or a second-hand one?’
*‘Should the company buy this building or just rent it?’
*‘Should we invest now or wait till the new financial year?’
*‘When should we launch our new product?’
All of these require decisions to be made, all have costs
and benefits (either financial or emotional), all are based upon
different amounts of data, and all involve or necessitate some
kind of statistical calculation. This is where an understanding
of statistics (knowledge of statistical techniques) and research
will come in handy.
Division of Statistics
1. Descriptive Statistics refers to the gathering,
tabulation and organization of data. It discusses the
characteristics, attributes, position and dispersion of
data.
2. Inferential Statistics is the logical process from
sample analysis to a generalization or conclusion about a
population.
Sources of Data
1. Primary Data are data that come from an original
source, and are intended to answer specific research
questions.
2. Secondary Data are data that are taken from
previously recorded data.
Types of Data
1. Qualitative Data are data that merely describe the
subject being studied. They depict characteristics,
attributes or descriptions of certain behaviours.
2. Quantitative Data are data that can be
measured or counted.
Constants and Variables
Constant. A constant is a characteristic of objects,
people or events that does not vary.
Variable. A variable is a characteristic of objects,
people or events that varies.
Classification of Variables
Experimental Classification. A researcher may
classify variables according to the functions they
serve in the experiment.
1. Independent Variable (explanatory variable) is a
variable controlled by the experimenter/researcher and
expected to have an effect on the behaviour of the
subject.
2. Dependent Variable is a measure of the behaviour of
subjects that is expected to be influenced by the independent
variable.
Levels of Data/Measurement
1. Nominal data is used to differentiate classes or
categories for purely classification or identification purposes.
It is the weakest form of measurement because no
attempt can be made to account for differences within
the particular category or to specify any ordering or
direction across the various categories. Nominal data are
discrete variables.
2. Ordinal data is used in ranking. Though ordinal data
is somewhat stronger than nominal data, it is still a weak
form of measurement because no meaningful numerical
statements can be made about differences between the
categories. The ordering implies only which category is
“greater” or “lesser” – not how much “greater” or “lesser”.
These are also discrete variables.
3. Interval data is used to classify, order and differentiate
between classes or categories in terms of degrees of
difference. Interval data are either discrete or
continuous variables.
4. Ratio data differs from interval data only in one aspect; it
has a true zero point. It represents distances from a natural
origin like the length, weight, height, etc. Ratio data are
discrete or continuous variables.
n = 1,500 / (1 + 3.75)
n = 8,000 / (1 + 20)
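Both fractions above are consistent with the sample-size pattern n = N / (1 + N e^2) at a 5% margin of error, since 1,500 x 0.05^2 = 3.75 and 8,000 x 0.05^2 = 20. The sketch below assumes that this is the pattern intended. The slide that states the formula is not part of this text, and the names `sample_size` and `margin_of_error` are illustrative only.

```python
def sample_size(population, margin_of_error=0.05):
    """Assumed pattern: population / (1 + population * margin_of_error ** 2)."""
    return population / (1 + population * margin_of_error ** 2)


n_first = sample_size(1500)   # 1,500 / (1 + 3.75)
n_second = sample_size(8000)  # 8,000 / (1 + 20)
print(round(n_first, 2), round(n_second, 2))
```

In practice a sample size would be rounded up to a whole number of respondents.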
As
3. GRAPHICAL PRESENTATION. The data are presented
in visual form. Graphs may appear in many forms.
TABULAR PRESENTATION OF DATA
ARRAY is the simplest arrangement of data. It is merely a
list of the scores from highest to lowest or from lowest to
highest.
Ex. Arrange the following scores:
18, 10, 6, 15, 23, 22, 11, 12, 14, 17, 19, 16, 7, 10
Ascending: 6, 7, 10, 10, 11, 12, 14, 15, 16, 17, 18, 19, 22, 23
Descending: 23, 22, 19, 18, 17, 16, 15, 14, 12, 11, 10, 10, 7, 6
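As a quick check of the array example, here is a minimal Python sketch that produces both arrangements with the built-in `sorted` function. The variable names are illustrative.

```python
scores = [18, 10, 6, 15, 23, 22, 11, 12, 14, 17, 19, 16, 7, 10]

ascending = sorted(scores)                 # lowest to highest
descending = sorted(scores, reverse=True)  # highest to lowest

print("Ascending: ", ascending)
print("Descending:", descending)
```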
RANKING OF SCORES is important in order
to identify the position of an observation, an
individual or an object in relation to the others
in the group according to some
characteristic, such as magnitude, quality
or importance.
Ranks are denoted by numbers:
a rank of 1 is given to the highest
score, a rank of 2 to the next, etc.
Example: Below are scores of selected students in Algebra.
15 12 12 12 18 11 15 12 19 16
16 18 17 16 14 14 14 12 11 16
15 12 12 15 11 20 21 20 23 11
16 16 16 16 19 19 19 19 17 17
6. Cumulative frequencies.
Example
1. Class Interval: the end numbers of an interval
Class Interval/Class Limit
15 - 19
10 - 14
 5 -  9
2. Class Frequency
Class Interval   f
15 - 19          2
10 - 14          7
 5 -  9          1
n = 10
3. Class Boundaries: the true class limits
Descending Order (half the gap between adjacent classes: (15 - 14)/2 = 1/2 = 0.5)
CI        f   CB
15 - 19   2   14.5 - 19.5
10 - 14   7    9.5 - 14.5
 5 -  9   1    4.5 -  9.5
Ascending Order (half the gap between adjacent classes: (200 - 150)/2 = 50/2 = 25)
CI          f   CB
100 - 150   4    75 - 175
200 - 250   2   175 - 275
300 - 350   4   275 - 375
400 - 450       375 - 475
4. Class Marks
CI        f   CB            M    Values in the class
15 - 19   2   14.5 - 19.5   17   15, 16, 17, 18, 19
10 - 14   7    9.5 - 14.5   12   10, 11, 12, 13, 14
 5 -  9   1    4.5 -  9.5    7
CI          f   CB          M
100 - 150   4    75 - 175   125
200 - 250   2   175 - 275   225
300 - 350   4   275 - 375   325
5. Relative Frequencies (F)
CI        f   CB            M    F     %F
15 - 19   2   14.5 - 19.5   17   0.2   20
10 - 14   7    9.5 - 14.5   12   0.7   70
 5 -  9   1    4.5 -  9.5    7   0.1   10
n = 10                           1.0   100
6. Cumulative Frequencies (CF)
CI        f   CB            M    F     %F   <F   >F
15 - 19   2   14.5 - 19.5   17   0.2   20   10    2
10 - 14   7    9.5 - 14.5   12   0.7   70    8    9
 5 -  9   1    4.5 -  9.5    7   0.1   10    1   10
n = 10
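The columns of the small example table (CB, M, F, %F, <F and >F) can all be derived from the class limits and frequencies. The following is a minimal Python sketch of that derivation. The dictionary keys mirror the column headings, and every other name is illustrative.

```python
# Class limits and frequencies from the example, highest class first.
classes = [(15, 19, 2), (10, 14, 7), (5, 9, 1)]
n = sum(f for _, _, f in classes)

# Half the gap between adjacent classes: (15 - 14) / 2 = 0.5
half_gap = (classes[0][0] - classes[1][1]) / 2

rows = []
for lower, upper, f in classes:
    rows.append({
        "CI": f"{lower} - {upper}",
        "f": f,
        "CB": (lower - half_gap, upper + half_gap),  # class boundaries
        "M": (lower + upper) / 2,                    # class mark
        "F": f / n,                                  # relative frequency
        "%F": 100 * f / n,
    })

# <F accumulates from the lowest class upward.
running = 0
for row in reversed(rows):
    running += row["f"]
    row["<F"] = running

# >F accumulates from the highest class downward.
running = 0
for row in rows:
    running += row["f"]
    row[">F"] = running

for row in rows:
    print(row)
```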
Constructing Frequency Distribution Table
Steps:
1. Determine the range of values. R = HS - LS
2. Determine the desired number of classes.
3. Locate the desired class interval.
a. if desired TNC is given
b. if TNC is not given
4. Formulate a frequency table of class intervals, using the
lowest value as the lower limit of the first class interval.
5. Get the number of data (frequency) for every class
interval.
6. Compute the class mark/class midpoint of each class
interval.
7. Get the class boundaries.
8. Obtain the relative frequencies (F).
9. Find the greater-than (>F) and less-than (<F) cumulative
frequencies.
Example: Below are ages (in years) of selected employees of
a certain school in Lipa City.
19 21 23 34 38 36 25 28 21 23 25 29 39 42 48 50 52
28 27 22 25 26 29 38 39 42 40 43 34 36 51 49 48 47
23 20 25 27 29 33 32 39 39 38 19 54 50 41 42 48 36
26 32 28 27 23 25 23 24 29 36 42 51 48 49 39 25 28
Prepare a CFDT using:
a. Six (6) as the tentative number of classes (TNC), in descending order.
b. No TNC, in ascending order.
Follow the Steps in Constructing CFDT
a. TNC = 6 and Descending Order
1st: R = Highest Age - Lowest Age; R = 54 - 19; R = 35
2nd: TNC = 6
3rd: Interval = R/TNC; I = 35/6; I = 5.83, rounded up to 6
4th: Formulate a Table in descending
order.
5th: Make the lowest age as the lowest
limit.
CL/CI f CB M/X F %F <F >F
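The first five steps for case (a) can be sketched in Python as follows. This is a minimal sketch, assuming the interval is rounded up (5.83 becomes 6) and that the first class starts at the lowest age. The variable names are illustrative.

```python
import math

ages = [
    19, 21, 23, 34, 38, 36, 25, 28, 21, 23, 25, 29, 39, 42, 48, 50, 52,
    28, 27, 22, 25, 26, 29, 38, 39, 42, 40, 43, 34, 36, 51, 49, 48, 47,
    23, 20, 25, 27, 29, 33, 32, 39, 39, 38, 19, 54, 50, 41, 42, 48, 36,
    26, 32, 28, 27, 23, 25, 23, 24, 29, 36, 42, 51, 48, 49, 39, 25, 28,
]

tnc = 6                                  # tentative number of classes
value_range = max(ages) - min(ages)      # R = 54 - 19 = 35
interval = math.ceil(value_range / tnc)  # 35 / 6 = 5.83, rounded up to 6

# Build the classes upward from the lowest age, tallying each one.
table = []
lower = min(ages)
while lower <= max(ages):
    upper = lower + interval - 1
    frequency = sum(lower <= age <= upper for age in ages)
    table.append((lower, upper, frequency))
    lower += interval

table.reverse()  # descending order, highest class first

for lower, upper, frequency in table:
    print(f"{lower} - {upper}   {frequency}")
```

The tallies sum to 68, the number of ages listed.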
µ = population mean
x̄ = sample mean
2. MEDIAN (Md) is the middlemost value of the data when arranged in order.
n = 15
2nd: Mdn: 6, 7, 10, 10, 11, 11, 11, 12, 13, 14, 15, 15, 17, 18, 19
Position of Mdn = (15 + 1)/2 = 8th score; Mdn = 12
Mo = 11
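The median position and the mode for this set of scores can be checked with a short Python sketch. It assumes an odd number of scores, as in the example, so that (n + 1)/2 is a whole position. The variable names are illustrative.

```python
import statistics

scores = [6, 7, 10, 10, 11, 11, 11, 12, 13, 14, 15, 15, 17, 18, 19]

n = len(scores)
position = (n + 1) // 2                # (15 + 1) / 2 = 8th score
median = sorted(scores)[position - 1]  # lists are zero-indexed
mode = statistics.mode(scores)         # most frequent score

print(f"n = {n}, position = {position}, Mdn = {median}, Mo = {mode}")
```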
Seatwork: Compute the measures of central tendency of the
given set of data.
A. 10,14,14,14,15,16,16,16,17,17,18,18,19,20 (Male)
B. 6,7,9,10,12,13,15,16,17,19,20,21,22,23 (Female)
WEIGHTED MEAN
Another way of computing the mean, in which the weight of
each score is taken into account.
Example: 1. If the final examination in a Statistics class
is given a weight of 5, class standing a weight of 2, and the
average of quizzes a weight of 4, and a student got grades
of 88, 93 and 82, respectively, what would be the
student’s:
a. unweighted grade,
b. weighted grade,
c. median grade, and
d. modal grade?
Solutions:
a. Unweighted Mean
Mn = (88 + 93 + 82) / 3; Mn = 87.67
b. Weighted Mean
WM = [88(5) + 93(2) + 82(4)] / (5 + 2 + 4); WM = 954/11; WM = 86.73
c. Median
Arrange the data first, repeating each grade according to its weight:
82, 82, 82, 82, 88, 88, 88, 88, 88, 93, 93
Position of Mdn = (11 + 1)/2 = 6th data; Mdn = 88
d. Mode:
Mo = 88
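The four answers to Example 1 can be reproduced with a minimal Python sketch. It follows the solution above in repeating each grade by its weight before taking the median and mode. The variable names are illustrative.

```python
import statistics

grades = [88, 93, 82]  # final examination, class standing, average of quizzes
weights = [5, 2, 4]

unweighted_mean = sum(grades) / len(grades)
weighted_mean = sum(g * w for g, w in zip(grades, weights)) / sum(weights)

# Repeat each grade according to its weight, then sort.
expanded = sorted(g for g, w in zip(grades, weights) for _ in range(w))
median = statistics.median(expanded)
mode = statistics.mode(expanded)

print(round(unweighted_mean, 2), round(weighted_mean, 2), median, mode)
```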
2. Below are the final grades of Student A during the second
semester of SY 2019 – 2020 in six subjects with the
number of credit units for each subject.
Subject Credit Units Grade
Accounting 1 6 2.25
Chemistry 5 1.75
Math 1 3 1.50
English 1 3 1.25
NSTP 102 2 1.25
PE 2 1 1.50
Required: Compute the mean grades (unweighted and
weighted), median grade and modal grade.
Solutions:
a. Unweighted Mean
Mn = (2.25 + 1.75 + 1.50 + 1.25 + 1.25 + 1.50) / 6; Mn = 1.58
b. Weighted Mean
Equal grades have their credit units combined: 1.50 carries 3 + 1 = 4 units
and 1.25 carries 3 + 2 = 5 units.
WM = [6(2.25) + 5(1.75) + 4(1.50) + 5(1.25)] / (6 + 5 + 4 + 5); WM = 34.5/20
WM = 1.725
c. Median:
1st: 1.25, 1.25, 1.25, 1.25, 1.25, 1.5, 1.5, 1.5, 1.5, 1.75,
1.75, 1.75, 1.75, 1.75, 2.25, 2.25, 2.25, 2.25, 2.25, 2.25
Position of Mdn = (20 + 1)/2 = 10.5th data; Mdn = 1.75
d. Mode:
Mo = 2.25
3. Below is the result of the survey gathered by Ms. Tan in her
thesis entitled “Factors Affecting the Performance of
Selected Employees in Lipa City”. Compute the WM.
A. Salary and Other Financial Benefits. (N = 150)
Items 5 4 3 2 1 WM
Total/Composite Mean
Solutions:
Items 5 4 3 2 1 WM
Mn = 51.5 + (-197/68)(6); Mn = 51.5 - 17.38; Mn = 34.12 years
Mn = 21.5 + (143/68)(6); Mn = 21.5 + 12.62;
Mn = 34.12 years
b. Median:
Compute first the half-sum: 68/2 = 34th age
Mdn = 30.5 + [(34 - 31)/9](6); Mdn = 30.5 + 2;
Mdn = 32.5 years
c. Mode.
Formula:
Mo = LMo + [(fMo - f1) / (2fMo - f1 - f2)](i)
where: LMo = lower boundary of the modal class;
fMo = frequency of the modal class
f1 = second-highest class frequency
f2 = third-highest class frequency
i = interval
Mo = 24.5 + [(19 - 14) / (2(19) - 14 - 12)](6)
Mo = 24.5 + (5/12)(6)
Mo = 24.5 + 2.5; Mo = 27 years
CL/CI     f    CB            <F
49 - 54    8   48.5 - 54.5   68
43 - 48    6   42.5 - 48.5   60
37 - 42   14   36.5 - 42.5   54
31 - 36    9   30.5 - 36.5   40
25 - 30   19   24.5 - 30.5   31
19 - 24   12   18.5 - 24.5   12
Total     68
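The grouped mean of 34.12 years can be checked from the class frequencies in two ways: from the class marks directly, and from an assumed mean with coded deviations. The sketch below is a minimal illustration. The six frequencies are tallied from the 68 ages listed earlier, and the function names are illustrative.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [(19, 24, 12), (25, 30, 19), (31, 36, 9),
           (37, 42, 14), (43, 48, 6), (49, 54, 8)]
width = 6
n = sum(f for _, _, f in classes)  # 68


def class_mark(lower, upper):
    return (lower + upper) / 2


# Method 1: sum of f * M divided by n.
mean_from_marks = sum(f * class_mark(lo, hi) for lo, hi, f in classes) / n


# Method 2: assumed mean A plus (sum of f * d / n) * width,
# where d = (M - A) / width is the coded deviation of each class.
def mean_from_assumed(assumed_mean):
    sum_fd = sum(f * (class_mark(lo, hi) - assumed_mean) / width
                 for lo, hi, f in classes)
    return assumed_mean + sum_fd / n * width


print(round(mean_from_marks, 2))          # class-mark method
print(round(mean_from_assumed(51.5), 2))  # sum of f*d is -197
print(round(mean_from_assumed(21.5), 2))  # sum of f*d is 143
```

Any class mark can serve as the assumed mean, and the result is the same.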
Solution:
a. Q3
1st: Find the value of nN/4.
Q3 = 3(68)/4; Q3 = 51st age
2nd: Use the table to look where the 51st age belongs.
3rd: Use the formula to solve Q3
Q3 = LQ3 + [(nN/4 - <F) / fQ3](i)
Q3 = 36.5 + [(51 - 40)/14](6); Q3 = 36.5 + 4.71; Q3 = 41.21 years
Solution:
b. D6
1st: Find the value of nN/10.
D6 = 6(68)/10; D6 = 40.8th age
2nd: Use the table to look where the 40.8th age belongs.
3rd: Use the formula to solve D6
D6 = LD6 + [(nN/10 - <F) / fD6](i)
D6 = 36.5 + [(40.8 - 40)/14](6); D6 = 36.5 + 0.34; D6 = 36.84 years
Solution:
c. P85
1st: Find the value of nN/100.
P85 = 85(68)/100; P85 = 57.8th age
2nd: Use the table to look where the 57.8th age belongs.
3rd: Use the formula to solve P85
P85 = LP85 + [(nN/100 - <F) / fP85](i)
P85 = 42.5 + [(57.8 - 54)/6](6); P85 = 42.5 + 3.8; P85 = 46.30 years
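Q3, D6 and P85 all use the same interpolation, differing only in the divisor (4, 10 or 100). A minimal Python sketch of that shared pattern follows. The function name `grouped_quantile` and its parameters are illustrative, and the class frequencies are those tallied from the 68 ages.

```python
# (lower class boundary, frequency), lowest class first.
classes = [(18.5, 12), (24.5, 19), (30.5, 9),
           (36.5, 14), (42.5, 6), (48.5, 8)]
WIDTH = 6


def grouped_quantile(k, parts):
    """Return the k-th of `parts` quantiles, e.g. (3, 4) for Q3."""
    n = sum(f for _, f in classes)
    position = k * n / parts   # nN/4, nN/10 or nN/100
    below = 0                  # <F of the class just below the current one
    for lower_boundary, f in classes:
        if below + f >= position:
            return lower_boundary + (position - below) / f * WIDTH
        below += f
    raise ValueError("position lies beyond the last class")


q3 = grouped_quantile(3, 4)
d6 = grouped_quantile(6, 10)
p85 = grouped_quantile(85, 100)
print(round(q3, 2), round(d6, 2), round(p85, 2))
```

Calling `grouped_quantile(1, 2)` applies the same interpolation to the half-sum and gives the median.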
MEASURES OF VARIABILITY
In summarizing a given set of data, the measures of central
tendency (MCT) are sometimes not enough to give useful
information. They have to be
supplemented by other measures of description such as
the measures of variability, which indicate the extent to
which values in a distribution are spread out around a
central tendency.
MOST COMMONLY USED MEASURES OF VARIABILITY
1. RANGE. It is the difference between the highest and the
lowest data.
Formula: R = Highest Data - Lowest Data.
2. QUARTILE DEVIATION. This may be used to minimize the
effect of extremely low and high values on the measure of
spread.
Formula:
QD = (Q3 - Q1) / 2
3. MEAN ABSOLUTE DEVIATION (MAD). It is the
average of the absolute deviations of the values in a
distribution from the mean.
Absolute values are the values of the numbers
irrespective of sign.
Formula:
MAD = ∑|x - Mn| / N; x = given data
Example
1. Ms. A and Ms. B are applying for a
secretarial position in a well-known
company in Gumaca, Quezon. They are
required to undergo a test to determine
their typing ability. The results of the test
were as follows:
Ms. A: 20, 22, 25, 25, 25, 28, 30
Ms. B: 21, 23, 25, 25, 25, 27, 29
Who performed better?
Using the MCT, the mean, median and mode of Ms. A and
Ms. B are the same.
Ms. A: Mn = 25; Md = 25 and Mo = 25
Ms. B: Mn = 25; Md = 25 and Mo = 25
Measures of Variability
1. Range
RA = 30 - 20; RA = 10
RB = 29 - 21; RB = 8
2. QD
3. MAD
Typing Score (Ms. A)   |x - Mn|   Typing Score (Ms. B)   |x - Mn|
20                     5          21                     4
22                     3          23                     2
25                     0          25                     0
25                     0          25                     0
25                     0          25                     0
28                     3          27                     2
30                     5          29                     4
∑|x - Mn| = 16                    ∑|x - Mn| = 12
Formula:
MAD = ∑|x - Mn| / N; x = given data
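Applying the range and MAD formulas to the two sets of typing scores can be sketched in Python as below. The function names are illustrative.

```python
def value_range(scores):
    """R = highest data - lowest data."""
    return max(scores) - min(scores)


def mean_absolute_deviation(scores):
    """MAD = sum of |x - Mn| divided by N."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


ms_a = [20, 22, 25, 25, 25, 28, 30]
ms_b = [21, 23, 25, 25, 25, 27, 29]

for name, scores in (("Ms. A", ms_a), ("Ms. B", ms_b)):
    print(name,
          "R =", value_range(scores),
          "MAD =", round(mean_absolute_deviation(scores), 2))
```

Both measures come out smaller for Ms. B (R = 8, MAD = 12/7), so her scores are clustered more tightly around the shared mean of 25 than those of Ms. A (R = 10, MAD = 16/7).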
HOMEWORK:
Below are the daily wages (in pesos) of selected employees of a
factory in Lucena City.
450, 400, 325, 375, 475, 455, 440, 460, 465, 450, 425, 385, 390, 405
405, 425, 430, 435, 440, 495, 505, 510, 505, 510, 495, 400, 395, 390
380, 350, 355, 350, 360, 365, 375, 385, 395, 345, 385, 425, 400, 415
365, 385, 385, 390, 400, 400, 400, 485, 390, 365, 405, 450, 455, 465
Required:
1. Construct a CFDT using 7 TNC in descending order.
2. Compute the MCT (use 2 methods for mean).