
Course name: Probability and Statistics

Course Code & Number: FET201

Credit hours: 3

Textbook: Elementary Statistics: A Step by Step Approach, 8th Edition, by Allan Bluman, McGraw-Hill.

Instructor: Associate Professor Dr. Aabed Mohammed


1- Introduction:
Definition of statistics, types of statistics, population, sample, variables and types of
variables, boundaries of a continuous variable.
2- Frequency distributions and graphs:
• Categorical frequency distribution
• Grouped frequency distribution
• Ungrouped frequency distribution
• Histogram, frequency polygon, ogive, stem and leaf plots
• Other types of graphs (pie graph, bar graph, Pareto chart, time series graph)
3- Data description:
• Measures of central tendency
• Measures of variation
• Measures of position
4- Probability:
• Basic concepts: probability experiment, outcome, sample space, event, tree diagram
• Probability of an event, complement of an event, mutually exclusive events
• Addition rule, multiplication rules, conditional probability
5- Discrete probability distributions:
• Probability distributions
• Mean, variance, standard deviation, and expectation
• The binomial distribution
6- The normal distribution:
• Properties of the normal distribution
• The standard normal distribution
• Applications of the normal distribution
7- Correlation and regression:
• Correlation: scatter plot, linear correlation coefficient, levels of correlation
• Regression: equation of regression

1- Introduction and Basic Concepts
Statistics: the science of conducting studies to collect,
organize, summarize, analyze, and draw conclusions from data.
A population: all subjects (human or otherwise) that
are being studied.
Example: All students who registered in the university last year.
A sample: a group of subjects selected from a population.
Example: A group of students who registered in the department of

Types of statistics

Descriptive statistics: consists of the collection, organization, summarization,
and presentation of data.
EX: "The average age of the students is 14 years."

Inferential statistics: consists of generalizing from samples to populations,
hypothesis testing, determining relationships among variables,
and making predictions.
EX: "The relationship between smoking and lung cancer."

Inferential statistics uses probability; that is, probability is a tool of
inferential statistics.
The Variable and its classifications
• A variable: a characteristic or attribute that can assume
different values.
• Data: the values that the variables can assume.
• Data set: a collection of data values. Each value in the
data set is called a data value or a datum.
Types of data
Most data can be put into the following categories:
• Qualitative
• Quantitative
1- Qualitative data are also often called categorical data and
are generally described by words or letters. For instance:
Color: black, dark, brown, light brown, blonde, gray, or red.
Blood type: A, B, O, or AB.
Major: IT, IS, Mechatronics, Biomedical, …
2- Quantitative data are always numbers and are the result of
counting or measuring. For example: number of students, age,
height, weight, temperature, etc.
Quantitative variables can be divided into two types:
Discrete variables: assume values that can be counted.
For example: the number of children in a family, the
number of students in a classroom, the number of phone
calls you receive each day of the week.
Continuous variables: result from measurements.
For example: lengths, weights, or times.

[Diagram: types of data, divided into quantitative (numerical) and qualitative (categorical)]

Determine the correct data type (quantitative or qualitative).
Indicate whether quantitative data are continuous or discrete.
a. the number of pairs of shoes you own
b. the type of car you drive
c. the distance it is from your home to the nearest grocery
d. the number of classes you take per school year.
e. the type of calculator you use
f. weights of sumo wrestlers
g. number of correct answers on a quiz
h. IQ (intelligence quotient)
Answers:
a. quantitative, discrete
b. qualitative, or categorical
c. quantitative, continuous
d. quantitative, discrete
e. qualitative, or categorical
f. quantitative, continuous
g. quantitative, discrete
h. quantitative, continuous

The boundaries of a continuous variable
The boundaries of a continuous variable are given in one
additional decimal place and always end with the digit 5.

Exercise: Give the boundaries of each value.

a. 36 inches. b. 105.4 miles. c. 72.6 tons.
d. 5.27 centimeters. e. 5 ounces.
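
The boundary rule can be sketched in a few lines of Python. This is only an illustration, assuming the measurement is supplied as text so that its number of decimal places is known; the function name `boundaries` is a hypothetical helper, not something from the textbook.

```python
from decimal import Decimal

def boundaries(value: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) boundaries for a measurement given as text."""
    measured = Decimal(value)
    # Unit of the last recorded decimal place: 1 for "36", 0.1 for "105.4".
    unit = Decimal(1).scaleb(measured.as_tuple().exponent)
    half = unit / 2  # half a unit adds one decimal place ending in 5
    return measured - half, measured + half

for reading in ["36", "105.4", "5.27"]:
    low, high = boundaries(reading)
    print(f"{reading}: {low} - {high}")
```

Each printed pair has one more decimal place than the reading and ends in the digit 5, as the rule states.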
2- Frequency Distribution
Data collected in original form is called raw data.
A frequency distribution is the organization of raw data in
table form, using classes and frequencies.
Categorical Frequency Distributions

Example 2-1. Twenty-five army inductees were given a

blood test to determine their blood type. The data set is

Construct a frequency distribution for the data.

Class   Frequency   Relative frequency   Percent
A       5           5/25 = 0.20          20
B       7           7/25 = 0.28          28
O       9           9/25 = 0.36          36
AB      4           4/25 = 0.16          16
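
As a rough sketch, the relative frequency and percent columns can be computed from the class frequencies. The counts 5, 7, 9, and 4 are read off the fractions in the table; pairing them with the labels A, B, O, and AB (in that order) is an assumption based on the blood types listed earlier, and `categorical_distribution` is a hypothetical helper name.

```python
# Frequencies taken from the fractions 5/25, 7/25, 9/25, 4/25.
# The class labels are assumed, following the order A, B, O, AB.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}

def categorical_distribution(freqs):
    """Return {class: (frequency, relative frequency, percent)}."""
    n = sum(freqs.values())
    return {label: (f, f / n, 100 * f / n) for label, f in freqs.items()}

table = categorical_distribution(frequencies)
for label, (f, rel, pct) in table.items():
    print(f"{label:<4}{f:>3}{rel:>8.2f}{pct:>6.0f}%")
```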

Grouped Frequency Distribution
Example: The following data represent the record high
temperatures for each of the 50 states. Construct a grouped
frequency distribution for the data using 7 classes.

Determine the classes.
Determine the lowest value (L), L = 100, and the
highest value (H), H = 134.
Find the range (R). Range = highest value - lowest value = 134 - 100 = 34.
Find the class width.
Class width = Range / number of classes
= 34/7 ≈ 4.9, which rounds up to 5.
Rounding rule: always round up if there is a remainder.
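
The width calculation can be checked with a short Python sketch. It assumes whole-number data, and `class_width` is a hypothetical helper name.

```python
import math

def class_width(lowest, highest, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    data_range = highest - lowest
    return math.ceil(data_range / num_classes)

width = class_width(100, 134, 7)  # 34 / 7 is about 4.86
print(width)  # 5
```

Note that `math.ceil` leaves the quotient unchanged when there is no remainder; in that case it is worth checking that the last class still covers the highest value.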

Constructing a Grouped Frequency Distribution
• For convenience's sake, we will choose the lowest data
value, 100, for the first lower class limit.
• The subsequent lower class limits are found by
adding the width to the previous lower class limits.
• The first upper class limit is one less than the next lower class limit.
• The subsequent upper class limits are found by adding the width to the
previous upper class limits.
Class Limits
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
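
The construction of the class limits can be sketched as follows, assuming whole-number data so that each upper limit is one less than the next lower limit. The function name `class_limits` is a hypothetical helper.

```python
def class_limits(lowest, width, num_classes):
    """Build (lower, upper) class limits for whole-number data."""
    limits = []
    for i in range(num_classes):
        lower = lowest + i * width   # add the width to the previous lower limit
        upper = lower + width - 1    # one less than the next lower limit
        limits.append((lower, upper))
    return limits

limits = class_limits(100, 5, 7)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```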

Exercise: Construct a relative Frequency and a percent Frequency for this

Finding the class width from the frequency distribution table:
Class width = lower (or upper) class limit of one class - lower (or upper) class limit of
the preceding class
Class width = (upper class limit - lower class limit of the same class) + 1
Class width = upper class boundary - lower class boundary of the same class
The class midpoint Xm

$$X_m = \frac{\text{lower limit} + \text{upper limit}}{2} = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$

Xm of any class = Xm of the preceding class + the class width

Exercise: Find the midpoint for the classes in the previous example.
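
One way to check the midpoints is a short Python sketch that applies the midpoint formula to the class limits of the record high temperature example. The variable names are illustrative only.

```python
# Class limits from the record high temperature example.
class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]

# Midpoint = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
print(midpoints)

# Consecutive midpoints differ by the class width.
gaps = {b - a for a, b in zip(midpoints, midpoints[1:])}
print(gaps)
```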
Rules for Classes in Grouped Frequency Distributions
1. There should be 5-20 classes.
2. The classes must be mutually exclusive.
3. The classes must be continuous.
4. The classes must be exhaustive.
5. The classes must be equal in width (except in open-ended
distributions).

Cumulative Frequency
A cumulative frequency distribution is a distribution that
shows the number of data values less than or equal to a specific
value (usually an upper boundary).

Ungrouped Frequency Distribution
When the range of the data values is relatively small, a frequency distribution
can be constructed using single data values for each class. This type of
distribution is called an ungrouped frequency distribution.

The data shown here represent the number of miles per gallon (mpg) that 30
selected four-wheel-drive sports utility vehicles obtained in city driving.
Construct a frequency distribution.

STEP 1 Determine the classes.
Determine the lowest value (L), L=12, highest value (H), H=19.
Find the range (R), R=H-L=19-12=7.


2-2 Graphs
The Three Most Common Graphs in Research
1. Histogram
2. Frequency Polygon
3. Cumulative Frequency Polygon (Ogive)

1- Histograms
The histogram is a graph that displays the data by using
contiguous (unless the frequency of a class is 0) vertical bars of
various heights to represent the frequencies of the classes.

1: Draw and label the x and y axes. The x axis is
always the horizontal axis, and the y axis is always
the vertical axis.
2: Represent the class boundaries on the x axis
and the frequencies on the y axis.
3: Using the frequencies as the heights, draw vertical
bars for each class.
Example 2-4
Construct a histogram to represent the data for
the record high temperatures for each of the 50
states (see Example 2–2 for the data).
Class Limits   Frequency
100 - 104       2
105 - 109       8
110 - 114      18
115 - 119      13
120 - 124       7
125 - 129       1
130 - 134       1
Histograms use class boundaries and
frequencies of the classes.
Class Limits   Class Boundaries   Frequency
100 - 104       99.5 - 104.5       2
105 - 109      104.5 - 109.5       8
110 - 114      109.5 - 114.5      18
115 - 119      114.5 - 119.5      13
120 - 124      119.5 - 124.5       7
125 - 129      124.5 - 129.5       1
130 - 134      129.5 - 134.5       1
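
The conversion from class limits to class boundaries, and a rough text version of the histogram, can be sketched in Python. This assumes whole-number data (so each boundary lies half a unit beyond the limit); `to_boundaries` is a hypothetical helper, and the bars are drawn sideways with `#` characters rather than as a real chart.

```python
# (class limits, frequency) for the record high temperatures.
classes = [((100, 104), 2), ((105, 109), 8), ((110, 114), 18),
           ((115, 119), 13), ((120, 124), 7), ((125, 129), 1),
           ((130, 134), 1)]

def to_boundaries(lower, upper, unit=1):
    """Extend each limit by half a unit to get the class boundaries."""
    half = unit / 2
    return lower - half, upper + half

for (lower, upper), freq in classes:
    low, high = to_boundaries(lower, upper)
    # One '#' per data value, so bar length equals the class frequency.
    print(f"{low:>5} - {high:<5} | {'#' * freq} ({freq})")
```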


Frequency Polygon
The frequency polygon is a graph that displays the data by using
lines that connect points plotted for the frequencies at the class
midpoints. The frequencies are represented by the heights of the points.
1: Draw and label the x and y axes.
2: Represent the midpoints on the x axis.
3: Choose a suitable scale for the frequencies, and label it on the y axis.
4: Plot the points, then connect adjacent points with line segments. Draw a line back to
the x axis at the beginning and end of the graph, at the same
distance that the previous and next midpoints would be located.
Example 2-5
Construct a frequency polygon to represent the
data for the record high temperatures for each of
the 50 states.
Class Limits   Frequency
100 - 104       2
105 - 109       8
110 - 114      18
115 - 119      13
120 - 124       7
125 - 129       1
130 - 134       1
Frequency polygons use class midpoints
and frequencies of the classes.
Class Limits   Class Midpoints   Frequency
100 - 104      102                2
105 - 109      107                8
110 - 114      112               18
115 - 119      117               13
120 - 124      122                7
125 - 129      127                1
130 - 134      132                1
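
The points of the frequency polygon, including the two extra points where the line returns to the x axis, can be listed with a short Python sketch. The variable names are illustrative; plotting is left to whatever tool is at hand.

```python
midpoints = [102, 107, 112, 117, 122, 127, 132]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Distance between neighbouring midpoints (the class width).
width = midpoints[1] - midpoints[0]

# Close the polygon on the x axis where the previous and next
# midpoints would be located, each with frequency 0.
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

for x, y in points:
    print(x, y)
```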


An Ogive (Cumulative Frequency Polygon)
The ogive is a graph that represents the cumulative
frequencies for the classes in a frequency distribution.
1: Draw and label the x and y axes.
2: Represent the class boundaries on the x axis.
3: Choose a suitable scale for the cumulative frequencies, and
label it on the y axis.
4: Plot the points at the upper class boundaries and connect them with line segments.

Example 2-6
Construct an ogive to represent the data for the
record high temperatures for each of the 50
states (see Example 2–2 for the data).
Class Limits   Frequency
100 - 104       2
105 - 109       8
110 - 114      18
115 - 119      13
120 - 124       7
125 - 129       1
130 - 134       1
Ogives use upper class boundaries and
cumulative frequencies of the classes.
Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104       99.5 - 104.5       2           2
105 - 109      104.5 - 109.5       8          10
110 - 114      109.5 - 114.5      18          28
115 - 119      114.5 - 119.5      13          41
120 - 124      119.5 - 124.5       7          48
125 - 129      124.5 - 129.5       1          49
130 - 134      129.5 - 134.5       1          50

Class Boundaries    Cumulative Frequency
Less than  99.5      0
Less than 104.5      2
Less than 109.5     10
Less than 114.5     28
Less than 119.5     41
Less than 124.5     48
Less than 129.5     49
Less than 134.5     50
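
The cumulative frequencies are running totals of the class frequencies, which can be sketched with `itertools.accumulate` from the Python standard library. The list names are illustrative only.

```python
from itertools import accumulate

upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Running total of the frequencies up to each upper class boundary.
cumulative = list(accumulate(frequencies))

# The ogive starts at the first lower boundary with a cumulative frequency of 0.
ogive_points = [(99.5, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, total in ogive_points:
    print(f"Less than {boundary}: {total}")
```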

2.2 Relative Frequency Graphs
If proportions are used instead of frequencies, the
graphs are called relative frequency graphs.

Relative frequency graphs are used when the
proportion of data values that fall into a given class
is more important than the actual number of data
values that fall into that class.

Example 2-7 Page #57
Construct a histogram, frequency polygon, and ogive
using relative frequencies for the distribution (shown
here) of the miles that 20 randomly selected runners
ran during a given week.
Class Boundaries   Frequency
 5.5 - 10.5        1
10.5 - 15.5        2
15.5 - 20.5        3
20.5 - 25.5        5
25.5 - 30.5        4
30.5 - 35.5        3
35.5 - 40.5        2

The following is a frequency distribution of
miles run per week by 20 selected runners.
Divide each frequency by the total frequency to get the relative frequency.
Class Boundaries   Frequency   Relative Frequency
 5.5 - 10.5        1           1/20 = 0.05
10.5 - 15.5        2           2/20 = 0.10
15.5 - 20.5        3           3/20 = 0.15
20.5 - 25.5        5           5/20 = 0.25
25.5 - 30.5        4           4/20 = 0.20
30.5 - 35.5        3           3/20 = 0.15
35.5 - 40.5        2           2/20 = 0.10

                   Σf = 20     Σrf = 1.00

Use the class boundaries and the
relative frequencies of the classes.

Frequency Polygons
The following is a frequency distribution of
miles run per week by 20 selected runners.
Class Boundaries   Class Midpoints   Relative Frequency
 5.5 - 10.5         8                0.05
10.5 - 15.5        13                0.10
15.5 - 20.5        18                0.15
20.5 - 25.5        23                0.25
25.5 - 30.5        28                0.20
30.5 - 35.5        33                0.15
35.5 - 40.5        38                0.10

Use the class midpoints and the
relative frequencies of the classes.

The following is a frequency distribution of
miles run per week by 20 selected runners.
Class Boundaries   Frequency   Cumulative Frequency   Cum. Rel. Frequency
 5.5 - 10.5        1            1                      1/20 = 0.05
10.5 - 15.5        2            3                      3/20 = 0.15
15.5 - 20.5        3            6                      6/20 = 0.30
20.5 - 25.5        5           11                     11/20 = 0.55
25.5 - 30.5        4           15                     15/20 = 0.75
30.5 - 35.5        3           18                     18/20 = 0.90
35.5 - 40.5        2           20                     20/20 = 1.00
                   Σf = 20
Ogives use upper class boundaries and
cumulative frequencies of the classes.
Class Boundaries    Cum. Rel. Frequency
Less than  5.5      0
Less than 10.5      0.05
Less than 15.5      0.15
Less than 20.5      0.30
Less than 25.5      0.55
Less than 30.5      0.75
Less than 35.5      0.90
Less than 40.5      1.00
Use the upper class boundaries and the
cumulative relative frequencies.
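
The relative and cumulative relative frequencies for the runners' data can be reproduced with a short Python sketch. Using `fractions.Fraction` is a choice made here to keep the arithmetic exact (so the final value is exactly 1); it is not part of the textbook method.

```python
from fractions import Fraction
from itertools import accumulate

frequencies = [1, 2, 3, 5, 4, 3, 2]  # miles run per week, 20 runners
n = sum(frequencies)

# Relative frequency = class frequency / total frequency.
relative = [Fraction(f, n) for f in frequencies]

# Cumulative relative frequency = running total of the relative frequencies.
cum_relative = list(accumulate(relative))

print([float(r) for r in relative])
print([float(c) for c in cum_relative])
```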

Shapes of Distributions


Other Types of Graphs
Stem and Leaf Plots
A stem and leaf plot is a data plot that uses part of a data
value as the stem and part of the data value as the leaf to
form groups or classes.

It has the advantage over a grouped frequency distribution
of retaining the actual data while showing them in graphical form.


At an outpatient testing center, the number of
cardiograms performed each day for 20 days is shown.
Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43 2 57 23
36 32 33 32 44
32 52 44 51 45
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
32, 33, 36, 43, 44, 44, 45, 51, 52, 57

Step 2 Separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57

Stem and Leaf Plot

Stem Leaf
0 2
1 3 4
2 0 3 5
3 1 2 2 2 2 3 6
4 3 4 4 5
5 1 2 7
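
The two steps can be sketched in Python for the cardiogram data, using the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

data = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
        36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

def stem_and_leaf(values):
    """Group sorted two-digit values by tens digit (stem) and ones digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):      # Step 1: arrange the data in order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)       # Step 2: separate by first digit
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```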

An insurance company researcher conducted a survey on the number of car
thefts in a large city for a period of 30 days last summer. The raw data are
shown. Construct a stem and leaf plot by using classes 50–54, 55–59, 60–64,
65–69,70–74, and 75–79.
52 62 51 50 69
58 77 66 53 57
75 56 55 67 73
79 59 68 65 72
57 51 63 69 75
65 53 78 66 55
Step 1 Arrange the data in order:

Step 2 Separate the data according to the classes.

Stem and Leaf Plot

Stem Leaf
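
For the car theft data, the requested classes (50-54, 55-59, and so on) mean each stem appears twice: once for leaves 0-4 and once for leaves 5-9. A minimal Python sketch of that split follows; `split_stem_and_leaf` is a hypothetical helper name.

```python
thefts = [52, 62, 51, 50, 69, 58, 77, 66, 53, 57,
          75, 56, 55, 67, 73, 79, 59, 68, 65, 72,
          57, 51, 63, 69, 75, 65, 53, 78, 66, 55]

def split_stem_and_leaf(values):
    """Group values by (stem, half), where half 0 holds leaves 0-4 and half 1 holds 5-9."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows

rows = split_stem_and_leaf(thefts)
for stem, half in sorted(rows):
    print(stem, "|", " ".join(str(leaf) for leaf in rows[(stem, half)]))
```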

The Pie Graph
Pie graphs are used extensively in statistics. The
purpose of the pie graph is to show the
relationship of the parts to the whole.
The pie graph is used to represent a nominal or
categorical variable.
A pie graph is a circle that is divided into
sections according to the percentage of
frequencies in each category of the distribution.
Example: Construct a pie graph showing the blood
types of the army inductees described in Example 2-1.
The frequency distribution is repeated here.
Step 3 Using a protractor, graph each section and write its name and
corresponding percentage, as shown in the following figure.
The average amounts spent by college freshmen for
school items are shown. Construct a pie graph.
Electronics/computers   $728
Dorm items              $344
Clothing                $141
Shoes                   $72

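Convert the frequency to degrees, and also the frequency to a percentage:

$$\text{Degrees} = \frac{f}{n} \times 360^\circ \qquad \text{Percent} = \frac{f}{n} \times 100\%$$

$$\begin{array}{lll}
\text{Electronics} & \frac{728}{1285} \times 360^\circ \approx 204^\circ & \frac{728}{1285} \times 100\% \approx 56\% \\
\text{Dorm items} & \frac{344}{1285} \times 360^\circ \approx 96^\circ & \frac{344}{1285} \times 100\% \approx 27\% \\
\text{Clothing} & \frac{141}{1285} \times 360^\circ \approx 40^\circ & \frac{141}{1285} \times 100\% \approx 11\% \\
\text{Shoes} & \frac{72}{1285} \times 360^\circ \approx 20^\circ & \frac{72}{1285} \times 100\% \approx 6\%
\end{array}$$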
Step 3 Using a protractor, graph each section and write its name and
corresponding percentage, as shown in the following figure.
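
The degree and percent conversions for the college spending data can be checked with a short Python sketch. The dictionary and variable names are illustrative. Percents are kept to one decimal place here; the unrounded electronics share is about 56.7%, which the worked example lists as 56%, presumably so that the whole-number percents total 100.

```python
amounts = {
    "Electronics/computers": 728,
    "Dorm items": 344,
    "Clothing": 141,
    "Shoes": 72,
}
total = sum(amounts.values())  # n = 1285

# Degrees = f / n * 360, rounded to the nearest whole degree.
degrees = {item: round(f / total * 360) for item, f in amounts.items()}

# Percent = f / n * 100, kept to one decimal place.
percents = {item: round(f / total * 100, 1) for item, f in amounts.items()}

for item in amounts:
    print(f"{item:<22}{degrees[item]:>4} degrees{percents[item]:>7}%")
```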
Bar Graphs
When the data are qualitative or categorical, bar graphs can
be used to represent the data. A bar graph can be drawn
using either horizontal or vertical bars.
A bar graph represents the data by using vertical or
horizontal bars whose heights or lengths represent the
frequencies of the data.

Pareto Charts
A Pareto chart is used to represent a frequency
distribution for a categorical variable, and the
frequencies are displayed by the heights of vertical bars,
which are arranged in order from highest to lowest.

Step 1 Arrange the data from the largest to smallest
according to frequency.

Step 2 Draw and label the x and y axes.

Step 3 Draw the bars corresponding to the frequencies.

The graph shows that the number of homeless people
is about the same for Atlanta and Chicago and a lot
less for Baltimore and St. Louis.
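
The ordering step of a Pareto chart can be sketched in Python. Because the homeless counts are not reproduced in these notes, the sketch reuses the blood type frequencies from Example 2-1 (5, 7, 9, 4), with the labels A, B, O, AB assumed in that order; the text bars stand in for a real chart.

```python
# Frequencies from Example 2-1; the class labels are an assumption.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}

# Step 1: arrange the categories from largest to smallest frequency.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

# Steps 2-3: one bar per category, with length equal to its frequency.
for label, freq in pareto_order:
    print(f"{label:<3}| {'#' * freq} {freq}")
```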
The Time Series Graph
When data are collected over a period of time, they can
be represented by a time series graph.
The number of homicides that occurred in the
workplace for the years 2003 to 2008 is shown. Draw a
time series graph for the data.


Step 1 Draw and label the x and y axes.

Step 2 Label the x axis for years and the y axis for the number of homicides.
Step 3 Plot each point according to the table.
Step 4 Draw line segments connecting adjacent points.

There was a slight decrease in the years ’04, ’05, and
’06, compared to ’03, and again an increase in ’07. The
largest decrease occurred in ’08.

