Data Collection
Presents:
Probability and Statistics (Lec #2)
LECTURE #2
30-44 1500
20-29 4500
10-19 9000
Stratified Sampling
A researcher wishes to conduct a survey among the students from the different departments. He
wishes to find out how many students from each department are needed to represent the population.
Suppose the distribution of the population is as follows:
Management 1,725
Finance 955
Entrepreneurship 828
All members in each selected group are used.
Perform a systematic sample of 80 companies from the 1,304 companies that are members of the
Manufacturer’s Association.
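A minimal sketch of how the systematic sample above might be drawn. It assumes the companies are numbered 1 to 1,304, that the sampling interval k is the population size divided by the sample size, rounded down, and that the starting point is picked at random among the first k numbers. The function name and the fixed seed are illustrative only.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the 1-based positions chosen by a systematic sample."""
    k = population_size // sample_size   # sampling interval (1304 // 80 = 16)
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start among the first k units
    # Take every k-th unit from the start until sample_size units are chosen.
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(1304, 80, seed=1)
print(len(sample), sample[:5])
```

With k = 16, the last selected position is at most 16 + 79 × 16 = 1,280, so every selected number refers to an existing company.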
Multi-Stage Sampling
Multi-stage sampling uses several stages or phases in getting random
samples from the general population; this is useful in conducting
a nationwide survey or any survey involving a very large population.
Sampling Techniques
In non-probability sampling, the selection of units is determined solely by rules or
guidelines set by the researcher/investigator. It does not involve random sampling.
Convenience Samples
A convenience sample consists only of available
members of the population.
Example:
You are doing a study to determine the number of years of education
each teacher at your college has. Identify the sampling technique
used if you select the samples listed.
1.) You randomly select two different departments and survey each
teacher in those departments.
2.) You select only the teachers you currently have this semester.
3.) You divide the teachers up according to their department and then
choose and survey some teachers in each department.
Quota sampling
Quota sampling is like stratified sampling: it first identifies the strata and their
proportions as they are represented in the population, and then convenience or
judgement sampling is used to select the required number of subjects from each
stratum.
Snowball Sampling
Snowball sampling (the referral method) is useful when populations are inaccessible or hard to
find.
Judgement sampling
A judgement sample is a selection of documents that
is based on the opinion of the auditor, rather than a
statistical sampling technique that uses random
selections. The resulting selection may reflect the
biases of the person making the selection, and so
could yield unreliable results.
Non-Probability Sampling
Identifying the Sampling Technique
Example continued:
You are doing a study to determine the number of years of education each
teacher at your college has. Identify the sampling technique used if you
select the samples listed.
1.) This is a cluster sample because whole departments are randomly selected and
every teacher in each selected department is surveyed.
2.) This is a convenience sample because you are using the teachers that are
readily available to you.
3.) This is a stratified sample because the teachers are divided by department
and some from each department are randomly selected.
Sampling
Sampling is the act, process or technique of selecting
a suitable sample, or a representative part of a
population, for the purpose of determining parameters
or characteristics of the whole population.
Basic Formula
Example
In a population of 22,000 students enrolled at Saint Louis University in a
particular semester, what sample size is needed to get an accurate result
for a study using a margin of error of a) 1%, b) 2.5%, c) 5%?
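The notes do not show which formula is meant here. One common choice for sample-size problems of this form is Slovin's formula, n = N / (1 + N e^2), where N is the population size and e is the margin of error. The sketch below assumes that formula and assumes the result is rounded up to a whole student; both are assumptions, not something stated in the notes.

```python
import math
from fractions import Fraction

def slovin_sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e^2), rounded up (assumed formula)."""
    e = Fraction(margin_of_error)   # exact arithmetic, e.g. Fraction("0.025")
    n = population_size / (1 + population_size * e * e)
    return math.ceil(n)

N = 22000
sizes = {label: slovin_sample_size(N, e)
         for label, e in [("a) 1%", "0.01"), ("b) 2.5%", "0.025"), ("c) 5%", "0.05")]}
for label, n in sizes.items():
    print(label, n)
```

Exact fractions are used so that a result that is already a whole number, such as case a), is not pushed up by floating-point error.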
Summation Notation
• $\sum x$ means the sum of the possible values of the variable x.
• $\sum x^2$ means the sum of the squares of the values of the variable x.
• $(\sum x)^2$ means the square of the sum of the values of the variable x.
• $\sum (x + 2)$ means the sum of the set of values that are each 2 more than the value of
x.
• $\sum_{i=1}^{n} x_i$ is used when large quantities of data are collected, where i = 1 to n
indicates the range of values to be summed.
Example
Given the following values of x: 2,5,-3,4,1,1
Find , , ,
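The quantities asked for in this example are not legible in the notes. As an illustration only, the sketch below evaluates the four kinds of sums described in the Summation Notation list (sum of the values, sum of the squares, square of the sum, and sum of the values each increased by 2) for the given x. The variable names are illustrative.

```python
x = [2, 5, -3, 4, 1, 1]

sum_x = sum(x)                           # sum of the values
sum_x_squared = sum(v ** 2 for v in x)   # sum of the squares
square_of_sum = sum(x) ** 2              # square of the sum
sum_x_plus_2 = sum(v + 2 for v in x)     # sum of each value increased by 2

print(sum_x, sum_x_squared, square_of_sum, sum_x_plus_2)
```

Note that the sum of the squares (56) and the square of the sum (100) differ, which is why the two notations must be kept apart.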
Example
Student number   1 2 3 4 5 6 7 8
No. of subjects  4 5 7 4 6 5 8 4
Find , ,
Presentation of Data
Textual Presentation
• Use of words, statements and paragraphs to present
data or information.
Example:
Graphical Presentation
• A method wherein the set of data is presented in
visual forms called graphs (pie graphs, bar graphs or
histograms, line graphs and other graphs).
• However, graphs have limitations: they
require more skill and time to prepare; they can
only be made after the data have been presented in
tabular form; and they are not as precise as tables.
●Tabular Presentation
Stem and Leaf method
Tabular Representation
• Tabular presentation is the use of tables, one of which is the frequency
distribution table. After the data have been gathered, they have to be put
into a form that will make them easier to handle and to interpret.
Stem-and-Leaf Plot
In a stem-and-leaf plot, each number is separated into a stem (usually
the entry’s leftmost digits) and a leaf (usually the rightmost digit). This is an
example of exploratory data analysis.
Example:
The following data represents the ages of 30 students in a statistics
class. Display the data in a stem-and-leaf plot.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
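A minimal sketch of how the stem-and-leaf plot for these ages could be built, assuming the stem is the tens digit and the leaf is the ones digit, as described above. The function name is illustrative.

```python
from collections import defaultdict

ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        37, 38, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the values first puts the leaves of each stem in increasing order, which is the usual ordered form of the plot.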
FroydWess - Online Notes
Constructing Stem-and-Leaf
Guidelines
Class     Frequency
1 – 4     4
5 – 8     5
9 – 12    3
13 – 16   4
17 – 20   2
Callouts: lower class limits (the first number of each class); frequencies (the second column).
Two Types of Frequency Distribution
30 15
32 30
34 10
36 20
38 24
40 12
42 8
Two Types of Frequency Distribution
Class     Frequency, f
1 – 4     4
5 – 8     5    (5 – 1 = 4)
9 – 12    3    (9 – 5 = 4)
13 – 16   4    (13 – 9 = 4)
17 – 20   2    (17 – 13 = 4)
The class width is 4.
The range is the difference between the maximum and minimum data
entries.
Frequency of Nominal Data
Male and Female College Students
Majoring in ECE
SEX FREQUENCY
MALE 23
FEMALE 107
TOTAL 130
Definition:
1. Raw data – the set of data in its
original form.
2. Array – an arrangement of observations
according to their magnitude, whether in
increasing or decreasing order.
Advantages: it is easier to detect the smallest
and largest values and to find the
measures of position.
Constructing an Ungrouped Frequency Distribution
Guidelines
32 29 32 33 31 29 29 27 31 31
32
31
30
29
28
27
26
TOTAL N=
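A minimal sketch of how the frequencies for this ungrouped distribution could be tallied from the ten raw values above. It lists only the values that actually occur, from largest to smallest; any value with no occurrences has a frequency of 0.

```python
from collections import Counter

data = [32, 29, 32, 33, 31, 29, 29, 27, 31, 31]

counts = Counter(data)   # value -> number of times it occurs
# List values from largest to smallest, as in the table above.
for value in sorted(counts, reverse=True):
    print(value, counts[value])
print("TOTAL N =", sum(counts.values()))
```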
Example 2
32.5 36.5 37.3 35.1 33.6 37.1 33.9 35.4 35 36
34.7 36.4 35.6 32.1 33.4 36.5 32.4 33.7 36.3 37.2
32.8 .7.5 35.7 35.4 35.9 34.1 36.1 34.4 34.5 34.1
Type Frequency
Motor Vehicle 43,500
Falls 12,200
Poison 6,400
Drowning 4,600
Fire 4,200
Ingestion of Food/Object 2,900
Firearms 1,400
(Source: US Dept. of
Transportation)
Pie Chart
To create a pie chart for the data, find the relative frequency (percent) of
each category.
Type   Frequency   Relative Frequency
Motor Vehicle 43,500
Falls 12,200
Poison 6,400
Drowning 4,600
Fire 4,200
Ingestion of Food/Object 2,900
Firearms 1,400
n = 75,200
Pie Chart
Type   Frequency   Relative Frequency   Angle
Motor Vehicle 43,500 0.578 208.2°
Falls 12,200 0.162 58.4°
Poison 6,400 0.085 30.6°
Drowning 4,600 0.061 22.0°
Fire 4,200 0.056 20.1°
Ingestion of Food/Object 2,900 0.039 13.9°
Firearms 1,400 0.019 6.7°
[Figure: pie chart of accidental deaths by type. Labeled slices include Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%.]
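A minimal sketch of how the relative frequencies and central angles in the pie chart table could be computed. It assumes each angle is the unrounded relative frequency times 360°, which reproduces the values in the table. The variable names are illustrative.

```python
deaths = {
    "Motor Vehicle": 43500,
    "Falls": 12200,
    "Poison": 6400,
    "Drowning": 4600,
    "Fire": 4200,
    "Ingestion of Food/Object": 2900,
    "Firearms": 1400,
}

n = sum(deaths.values())   # total frequency, 75,200
rows = {}
for kind, f in deaths.items():
    rel = f / n            # relative frequency of this category
    # Store (relative frequency, central angle in degrees), rounded for display.
    rows[kind] = (round(rel, 3), round(rel * 360, 1))

for kind, (rel, angle) in rows.items():
    print(f"{kind:26s} {rel:.3f} {angle:6.1f}")
```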
1 2 3 4 TOTAL
BSARCH 187 250 182 123 742
BSCE 623 471 511 261 1866
BSCHE 168 177 193 86 624
BSECE 78 69 81 38 266
BSEE 107 71 130 60 368
BSEM 3 19 2 0 24
BSGE 30 29 29 5 93
BSIE 42 37 42 19 140
BSME 219 246 255 87 807
BSMECE 39 44 34 8 125
Dot Plot
In a dot plot, each data entry is plotted, using a point, above a
horizontal axis.
Example:
Use a dot plot to display the ages of the 30 students in the statistics
class.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
[Figure: dot plot, "Ages of Students". Horizontal axis: 15 to 57 in steps of 3.]
From this graph, we can conclude that most of the values lie
between 18 and 32.
Class   Frequency, f   Relative Frequency
18 – 25 13 0.433
26 – 33 8 0.267
34 – 41 4 0.133
42 – 49 3 0.1
50 – 57 2 0.067
Relative Frequency
The relative frequency of a class is the portion or
percentage of the data that falls in that class. To find the
relative frequency of a class, divide the frequency f by the
sample size n.
$\text{Relative frequency} = \dfrac{\text{Class frequency}}{\text{Sample size}} = \dfrac{f}{n}$
Class   Frequency, f   Relative Frequency
1 – 4   4              0.222
Relative frequency = 4 / 18 ≈ 0.222
Cumulative Frequency
The cumulative frequency of a class is the sum of the
frequency for that class and all the previous classes.
Ages of Students
Class     Frequency, f   Cumulative Frequency
18 – 25   13             13
26 – 33   8              13 + 8 = 21
34 – 41   4              21 + 4 = 25
42 – 49   3              25 + 3 = 28
50 – 57   2              28 + 2 = 30
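A minimal sketch of how the relative and cumulative frequencies for the "Ages of Students" distribution could be computed from the class frequencies. The variable names are illustrative.

```python
from itertools import accumulate

classes = ["18 - 25", "26 - 33", "34 - 41", "42 - 49", "50 - 57"]
freqs = [13, 8, 4, 3, 2]

n = sum(freqs)                                # sample size, 30
relative = [round(f / n, 3) for f in freqs]   # f / n for each class
cumulative = list(accumulate(freqs))          # running total of the frequencies

for c, f, r, cf in zip(classes, freqs, relative, cumulative):
    print(c, f, r, cf)
```

The last cumulative frequency equals the sample size, which is a quick check that no class was skipped.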
Midpoint
The midpoint of a class is the sum of the lower and upper
limits of the class divided by two. The midpoint is
sometimes called the class mark.
$\text{Midpoint} = \dfrac{\text{Lower class limit} + \text{Upper class limit}}{2}$
Midpoint
Example:
Find the midpoints for the “Ages of Students” frequency
distribution.
Ages of Students
Class     Frequency, f   Midpoint
18 – 25   13             21.5
26 – 33   8              29.5
34 – 41   4              37.5
42 – 49   3              45.5
50 – 57   2              53.5
For the first class: 18 + 25 = 43, and 43 ÷ 2 = 21.5.
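The midpoints above follow directly from the definition (lower limit plus upper limit, divided by two). A short sketch, with illustrative variable names:

```python
class_limits = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]

# Midpoint (class mark) = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
print(midpoints)
```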
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Example continued:
The lower class limits are 18, 26, 34, 42, and 50.
The upper class limits are 25, 33, 41, 49, and 57.
Ages of Students
77 82 92 93 88 88 90 62 68
83 60 88 73 90 95 87 68 80
87 85 83 72 97 87 77 68 90
87 90 92 90 95 80 83 78 77
75 90 93 83 87 87 82 92 78
77 80 80 85 63 88 90 70 98
77 87 93 85 80 93 93 87 83
78 85 97 90 93 87 98 88 88
95 82 80 85 78 73 78 92 75
75 85 75 97 88 85 95 80 88
Frequency Histogram
Class boundaries are the numbers that separate the classes without
forming gaps between them.
The horizontal scale of a histogram can be marked with either the class
boundaries or the midpoints.
Frequency Histogram
Example:
The following data represents the ages of 30 students in a statistics
class. Construct a frequency distribution that has five classes.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
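[Figure: frequency histogram, "Ages of Students". Vertical axis: frequency f, with bar heights 13, 8, 4, 3, 2. Horizontal axis: Age (in years), class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5, with a broken axis.]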
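A minimal sketch of how the five-class frequency distribution behind this histogram could be constructed. It assumes whole-number data, a class width equal to the range divided by the number of classes and rounded up to the next whole number, and a first lower limit equal to the minimum value. The function name is illustrative.

```python
ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        37, 38, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def frequency_distribution(values, num_classes):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    low, high = min(values), max(values)
    # Range / number of classes, rounded up to the next whole number (36 / 5 -> 8).
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)   # count values in the class
        table.append((lower, upper, f))
    return table

table = frequency_distribution(ages, 5)
for lower, upper, f in table:
    # Class boundaries sit half a unit outside the limits for whole-number data.
    print(f"{lower} - {upper}  f = {f}  boundaries {lower - 0.5} to {upper + 0.5}")
```

The boundaries printed for each class (17.5 to 25.5, and so on) are the values marked on the horizontal scale of the histogram.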
Frequency Polygon
A frequency polygon is a line graph that emphasizes the
continuous change in frequencies.
[Figure: frequency polygon, "Ages of Students". Vertical axis: frequency f. Horizontal axis: Age (in years), midpoints 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, 61.5, with a broken axis. The line is extended to the x-axis.]
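[Figure: relative frequency histogram, "Ages of Students". Vertical axis: relative frequency (portion of students), with bar heights 0.433, 0.267, 0.133, 0.1, 0.067. Horizontal axis: Age (in years), class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5.]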
Cumulative Frequency Graph
A cumulative frequency graph, or ogive, is a line graph
that displays the cumulative frequency of each class at its
upper class boundary.
[Figure: ogive, "Ages of Students". Vertical axis: cumulative frequency. Horizontal axis: Age (in years), class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5. The graph ends at the upper boundary of the last class.]
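A minimal sketch of the points an ogive for the "Ages of Students" data would pass through. It assumes whole-number data, so each upper class boundary is half a unit above the upper class limit, and it assumes the graph starts at the lower boundary of the first class (17.5) with a cumulative frequency of 0. The variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [25, 33, 41, 49, 57]
freqs = [13, 8, 4, 3, 2]

# Upper class boundaries are half a unit above the upper limits (whole-number data).
upper_boundaries = [u + 0.5 for u in upper_limits]
cumulative = list(accumulate(freqs))

# Assumed starting point: lower boundary of the first class, cumulative frequency 0.
points = [(17.5, 0)] + list(zip(upper_boundaries, cumulative))
print(points)
```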
More Graphs and Displays
Type Frequency
Motor Vehicle 43,500
Falls 12,200
Poison 6,400
Drowning 4,600
Fire 4,200
Ingestion of Food/Object 2,900
Firearms 1,400
(Source: US Dept. of
Transportation)
Pareto Chart
[Figure: Pareto chart, "Accidental Deaths". Vertical axis: frequency, 5000 to 45000 in steps of 5000. One category label, Poison, survives.]
From the scatter plot, you can see that as the number of absences
increases, the final grade tends to decrease.
Time Series Chart
A data set that is composed of quantitative data entries taken at regular
intervals over a period of time is a time series. A time series chart is
used to graph a time series.
Example:
The following table lists the number of minutes Robert used
on his cell phone for the last six months. Construct a time
series chart for the number of minutes used.
Month      Minutes
January    236
February   242
March      188
April      175
May        199
June       135
Time Series Chart
[Figure: time series chart. Vertical axis: Minutes, 0 to 250 in steps of 50. Horizontal axis: Month, Jan to June.]
Grouped Frequency of Interval Data
Given the following raw scores in a Calculus
examination:
47 56 42 28 56 41 56 55 59
78 50 55 57 38 62 52 66 65
79 33 34 37 47 42 68 62 54
80 68 48 56 39 77 80 62 71
57 52 60 70
A-strongly favorable, 90
B-favorable, 48
C-slightly favorable, 88
D-slightly unfavorable, 48
E-unfavorable, 15
F-strongly unfavorable, 25
The Histogram of Person’s Age with Frequency of Travel
age freq RF
19-20 20 39.2%
21-22 21 41.2%
23-24 4 7.8%
25-26 4 7.8%
27-28 2 3.9%
total 51 100%
Exercises
From the previous grouped data on Calculus scores,
a. Draw its histogram using the frequency on the y
axis and the midpoints on the x axis.
b. Draw the line graph or frequency polygon using the
frequency on the y axis and the midpoints on the x axis.
c. Draw the less-than and greater-than ogives of the
data. An ogive is a cumulation of frequencies by
class intervals. For the greater-than ogive, let the y axis
be CF> and the x axis be the LCB; for the less-than ogive,
let the y axis be CF< and the x axis be the UCB.
d. Plot the relative frequency, using the relative
frequency in percent on the y axis and the midpoints
on the x axis.