Principles of Statistics
1.1 Introduction
Statistical techniques are the techniques used in conducting a
statistical enquiry into a certain phenomenon. They include all the
statistical methods, from the collection of data to the
interpretation of the collected data.
One of the important statistical methods is the collection of data.
There are different methods for collecting primary and secondary data.
Although the tradition of collecting data and using it for various
purposes is very old, the development of modern statistics as a
subject is of recent origin. The development of the subject took place
mainly after the sixteenth century. The notable mathematicians who
contributed to the development of statistics are Galileo, Pascal,
de Méré, Fermat and Cardano of the sixteenth and seventeenth
centuries. Then in later years
the subject was developed by Abraham de Moivre (1667 - 1754),
Marquis de Laplace (1749 - 1827), Carl Friedrich Gauss (1777 - 1855),
Adolphe Quetelet (1796 - 1874), Francis Galton (1822 - 1911),
etc. Karl Pearson (1857 - 1936), who is regarded as the father of
modern statistics, was greatly motivated by the research of Galton
and was the first person to be appointed as Galton Professor in the
University of London. William S. Gosset (1876 - 1937), a student of
Karl Pearson, propounded a number of statistical formulae under
the pen-name of 'Student'.
R.A. Fisher is yet another notable contributor to the field of statistics.
His book 'Statistical Methods for Research Workers', published in
1925, marks the beginning of the theory of modern statistics.
The science of statistics also received contributions from notable
1.7.1. Nominal is the lowest level. Only names are meaningful here.
1.7.2. Ordinal adds an order to the names.
1.7.3. Interval adds meaningful differences.
1.7.4. Ratio adds a zero so that ratios are meaningful.
1.8. Exercises
Exercise 1
Exercise 2
Describe types of data.
Exercise 3
Explain the Levels of Measurement.
Exercise 4
Explain simple random sampling.
Exercise 5
Where do you get data and information (data sources)?
Exercise 6
List all functions of Statistics.
Exercise 7
Explain what is meant by the term population.
Exercise 8
Explain what is meant by the term sample.
Exercise 9
Explain how a sample differs from a population.
Exercise 10
Explain what is meant by the term sample data.
2.1 Introduction
Step (3)
Finding Class Limits:
Each class has two limits, lower limit and upper limit.
The limits could actually appear in the data and have gaps between
the upper limit of one class and the lower limit of the next.
Lower class limit = The smallest data value in the sample or less.
Upper class limit = Lower class limit + (W - 1).
Mid-Class (Mid-point): The number in the middle of the class.
It is found by adding the upper and lower limits for each class and
dividing by two.
Mid-Class = (Upper class Limit + Lower class Limit) / 2.
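The class-limit and mid-class rules above can be sketched in a few lines of Python. This is only an illustration for whole-number data: the function name `class_limits` and its parameters are hypothetical, and it assumes that each class starts one unit above the upper limit of the previous class.

```python
def class_limits(lowest, width, num_classes):
    """Return a list of (lower limit, upper limit, mid-class) tuples."""
    classes = []
    lower = lowest
    for _ in range(num_classes):
        upper = lower + (width - 1)       # Upper limit = Lower limit + (W - 1)
        mid = (upper + lower) / 2         # Mid-Class = (Upper + Lower) / 2
        classes.append((lower, upper, mid))
        lower = upper + 1                 # assumed gap of one unit between classes
    return classes


limits = class_limits(lowest=3, width=3, num_classes=7)
for lower, upper, mid in limits:
    print(f"{lower}-{upper}  mid-class = {mid}")
```

With a lowest value of 3 and a width of 3, the sketch lists seven classes starting at 3-5.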
Example1
The number of items rejected daily by a manufacturer because of
defects was recorded for the 30 days. The results are shown below:
4 9 13 7 12 15 5 8 5 7 15 17 19 8 6
6 4 10 8 22 16 9 5 3 9 21 14 13 18 7
Construct:
1. Frequency distribution table.
2. Relative frequency,
3. Percent frequency,
n = 30.
$K = 2.5 \times \sqrt[4]{30}$
K = 5.8508.
Step (2):
Width of the Class (W): From the data in Example 1, the largest
data value = 22 and the smallest data value = 3. Hence, the range (R)
is
R = 22 – 3 = 19, and we already have an approximate number of
classes K = 5.8508.
Therefore,
W = R / K
W = 19 / 5.8508 = 3.2474, which is rounded to the nearest whole number, W = 3.
Step (3) Class Limits: We can find the class limits as follows:
Lower class limit = the smallest data value in the sample or less = 3.
Upper class limit = Lower class limit + (W - 1) = 3 + (3 – 1) = 5.
We define the first class limits as (3 – 5), second class limits (6 – 8),
third class limits (9 – 11), fourth class limits (12 – 14), fifth class
limits (15 – 17), sixth class limits (18 – 20), and seventh class limits
(21 – 23). The smallest data value, 3, is included in the (3 – 5) class.
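The three steps of Example 1 can be checked with a short Python sketch. The variable names are illustrative, and rounding the width to the nearest whole number with `round` is an assumption that matches the value W = 3 used here.

```python
# Items rejected daily over 30 days (data of Example 1).
data = [4, 9, 13, 7, 12, 15, 5, 8, 5, 7, 15, 17, 19, 8, 6,
        6, 4, 10, 8, 22, 16, 9, 5, 3, 9, 21, 14, 13, 18, 7]

n = len(data)
k = 2.5 * n ** 0.25             # Step 1: approximate number of classes
r = max(data) - min(data)       # Step 2: range
w = round(r / k)                #         class width as a whole number

# Step 3: build classes from the smallest value until the largest is covered.
classes = []
lower = min(data)
while lower <= max(data):
    classes.append((lower, lower + w - 1))
    lower += w

# Count how many observations fall inside each class.
freq = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

for (lo, hi), f in zip(classes, freq):
    print(f"{lo}-{hi}: {f}")
```

The printed frequencies can be compared with the frequency row of the tables that follow in this chapter.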
Note that the sum of all the relative frequencies must always be
equal to 1.00 and the sum of all percent frequencies must always be
equal to 100. In the above example, we see that on 20% of the days
between 3 and 5 items were rejected, and on 6.67% of the days
between 18 and 20 items were rejected.
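As a small check of the two sums, the relative and percent frequencies can be computed from the class frequencies of Example 1. This is a minimal sketch; the list names are illustrative only.

```python
# Class frequencies for the classes 3-5, 6-8, ..., 21-23 of Example 1.
freq = [6, 8, 4, 4, 4, 2, 2]
n = sum(freq)

relative = [f / n for f in freq]           # relative frequency = f_i / n
percent = [100 * rf for rf in relative]    # percent frequency = relative * 100

print(round(sum(relative), 2))   # should equal 1.00
print(round(sum(percent), 2))    # should equal 100
```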
4- Recall the cumulative frequency: the ascending cumulative frequency
distribution shows the number of items with values less than or equal
to the upper class limit of each class.
Table 2.3 provides the Ascending and Descending Cumulative
frequency distribution for the items rejected daily data.
Total 30 - --
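The two cumulative columns can be sketched in Python as follows. The ascending column is a running total of the frequencies. For the descending column this sketch assumes the usual convention that each entry counts the observations in that class and in all higher classes.

```python
from itertools import accumulate

# Class frequencies for the classes 3-5, 6-8, ..., 21-23 of Example 1.
freq = [6, 8, 4, 4, 4, 2, 2]

# Ascending: number of values less than or equal to each upper class limit.
ascending = list(accumulate(freq))

# Descending (assumed convention): values in this class and all higher classes.
descending = [sum(freq[i:]) for i in range(len(freq))]

print(ascending)
print(descending)
```

The last ascending entry and the first descending entry must both equal n = 30.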
Freq. (f_i)        6    8    4     4      4      2      2
Mid-Class (X_i)    4    7    10    13     16     19     22
Example 2
Suppose the age of 10 students is:
21, 18, 19, 21, 20, 25, 22, 23, 24, and 26.
Find: 1) Frequency distribution table,
2) Relative frequency,
3) Percent frequency,
4) Cumulative frequency and
5) Mid-class.
The Solution:
n = 10.
$K = 2.5 \times \sqrt[4]{10}$
K = 4.4456
Freq. f i 2 3 2 2 1 10
Note that the sum of the relative frequency column must always be
equal to 1.00 and the sum of the percent frequency column must
always be equal to 100, as in the above example.
4- Calculate the Ascending and Descending Cumulative Frequencies.
Table 2.7 shows the frequency distribution and the Ascending and
Descending Cumulative frequency distribution for the students' age data:
Total 10
Freq. f i 2 3 2 2 1 10
Frequency(fi) 2 3 2 2
The Solution:
Frequency on the y-axis and the class limits on the X-axis.
Figure 2.1 Histogram for data in example 4.
[Histogram: frequency (y-axis) against the classes 18-19, 20-21, 22-23 and 24-25 (x-axis).]
Example 5
Draw a line graph from the following data.
Class              3-5  6-8  9-11  12-14  15-17  18-20  21-23
Freq. (f_i)        6    8    4     4      4      2      2
Mid-Class (X_i)    4    7    10    13     16     19     22
The Solution:
We put mid-class on the X-Axis and frequency on the Y-Axis to get
the following graph.
[Line graph: frequency (y-axis) against the mid-class values 4, 7, 10, 13, 16, 19 and 22 (x-axis).]
Example 6
Draw the ogive (cumulative frequency curve) for the Ascending and
Descending Cumulative Frequency for the following data.
Ascending and Descending Cumulative Frequencies.
Items rejected daily     ≤ 5   ≤ 8   ≤ 11   ≤ 14   ≤ 17   ≤ 20   ≤ 23
Ascending Cumulative     6     14    18     22     26     28     30
Descending Cumulative    30    24    16     12     8      4      2
The Solution
Put cumulative frequency on the vertical y-axis.
Put the class boundaries on the horizontal X-axis
Figure 2.3 Ogive for the data in Example 6.
[Ogive: cumulative frequency (y-axis, 0 to 35) against items rejected daily, less than or equal to 5, 8, 11, 14, 17, 20 and 23 (x-axis).]
Example7
We asked first year students in Accounting Dept./ Cihan University
about their Blood group. Their responses are listed below:
Data from a Sample of 50 Students
(AB)+ A+ B+ O- (AB)+ O- (AB)+ B+ O+ (AB)+
A+ B+ A+ (AB)+ B+ (AB)- B- B+ B+ O+
B+ B- A- (AB)- B- B- (AB)+ O+ A+ B+
(AB)- (AB)+ O+ B- A+ A+ B+ O+ A+ A+
Find:
1. Frequency distribution table,
2. Relative frequency,
3. Percent frequency,
4. Ascending cumulative frequency.
The solution:
1. To develop a frequency distribution table for these data, we
count the number of times each of the Blood groups appears in the
data set. (AB)+ appears 10 times, O+ appears 6 times, B+ appears 9
times, and so on. These counts are summarized in the frequency
distribution Table 2.11 below:
Table 2.11
Frequency Distribution of Blood group
25 Dr. Abood Mohammed Jameel
Statistics and Probability for Business and Financial Sciences
Dr. Abood Mohammed Jameel
Blood group (AB)+ O+ O- B+ B- A+ A- (AB)- Total
Frequency ( f i ) 10 6 3 9 6 9 3 4 50
Table 2.12
Relative, Percent and Ascending Cumulative
Frequency Distribution of Students' Blood Group
Blood group   Frequency   Relative Frequency   Percent Frequency %   Ascending Cumulative
              (f_i)       (R.f_i)              (P.f_i)               Frequency
(AB)+         10          0.20                 20                    10
O+            6           0.12                 12                    16
O-            3           0.06                 6                     19
B+            9           0.18                 18                    28
B-            6           0.12                 12                    34
A+            9           0.18                 18                    43
A-            3           0.06                 6                     46
(AB)-         4           0.08                 8                     50
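The columns of Table 2.12 can be reproduced from the counts in Table 2.11 with a short Python sketch. The dictionary names are illustrative; the counts themselves are taken from Table 2.11.

```python
from itertools import accumulate

# Frequency of each blood group, as listed in Table 2.11.
counts = {"(AB)+": 10, "O+": 6, "O-": 3, "B+": 9,
          "B-": 6, "A+": 9, "A-": 3, "(AB)-": 4}

n = sum(counts.values())
relative = {group: f / n for group, f in counts.items()}
percent = {group: 100 * f / n for group, f in counts.items()}
ascending = dict(zip(counts, accumulate(counts.values())))

for group in counts:
    print(group, counts[group], relative[group], percent[group], ascending[group])
```

For raw categorical responses, `collections.Counter` could be used to obtain the counts first.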
[Bar graph: frequency (f_i) on the y-axis for each blood group on the x-axis: (AB)+, O+, O-, B+, B-, A+, A-, (AB)-.]
[Pie chart of relative frequencies: AB+ 0.2, O+ 0.12, O- 0.06, B+ 0.18, B- 0.12, A+ 0.18, A- 0.06, AB- 0.08.]
2.5 Exercises
Exercise 1
Define the following: Element, Variable, Sample, Statistical
Population, Types of data.
Exercise 2
Consider the following data
Good Very Good Excellent Good Very Good
Find:
1-Frequency distribution Table
2-Relative frequency and Percent frequency distribution
3-Ascending and Descending Cumulative frequency
Exercise 3
Consider the following data:
Apple L.G H.P IBM Sony Dell L.G H.P IBM Sony L.G
H.P IBM Sony Apple L.G H.P IBM Sony Apple L.G
H.P Apple IBM L.G H.P IBM Sony Apple Dell L.G H.P
Dell L.G H.P Dell L.G H.P Dell Sony
Find:
1.Frequency distribution Table,
2. Relative frequency and Percent frequency distribution,
3. Ascending and Descending Cumulative frequency.
Exercise 4
The data below show the blood groups of 40 people,
where: 1 = Type A+ blood group. 2 = Type O+ blood group.
3 = Type B+ blood group. 4 = Type (AB)+ blood group.
3 4 4 3 2 3 4 3 1 4 4 2 4 3 1 4 4 2 4 4
2 3 2 3 3 2 3 2 1 3 2 3 3 4 1 4 2 3 4 1
4 9 13 7 12 15 5 8 5 7 15 17 19 8 3
4 10 8 22 16 9 5 3 9 21 14 13 18 7 5
Group A B C D E Total
Freq. ( f i ) 10 25 40 30 15 120
Draw:
1. Bar –Graph.
2. Pie –Chart (Circle- graph)
Exercise 7
Suppose you have the following data:
Frequency 7 7 4 4 4 2 2 30
Find:
Exercise 8
The data below show deaths due to a variety of causes for 45 people,
where 1 = heart disease, 2 = cancer, 3 = accidents, 4 = other.
2 3 3 4 1 4 2 3 4 1 3 2 4 3 4
3 4 4 3 2 3 4 3 1 4 4 2 3 1 4
2 4 4 2 3 2 2 3 2 2 3 3 4 1 4
Group A B C D E Total
Group A B C D E Total
Group A B C D E Total
Exercise 12
Consider the following table:
Find:
1.The (f 3 ) of class (16-20).
2. The Width of the class (W).
3. The Range (R).
Exercise 13
Consider the following table:
Freq. f i 20 25 50 15 10 120
fi 25 35 55 --- 10 150
Find:
1. The (f i ) of class (15-18).
2. The Width of the class (W).
3. The Range (R).
Exercise 15:
Consider the following table
fi 25 35 55 25 10 150
Find:
1.The Ascending and Descending Cumulative frequency distribution.
2. Draw O-give graph.
Exercise16:
Consider the following data.
14 21 23 21 16
19 22 25 16 16
20 23 16 20 19
24 26 15 22 24
20 22 24 22 20
Exercise18:
Consider the following table:
10-20 60 20 60
21-31 30
43-53 18
54-64 10
Total 300
Exercise19:
Suppose you have the following Data:
5-10 22
11-16 140 28
----- 17
29-34 13
Exercise20
Consider the following data.
Exercise21
A doctor's office staff studied the waiting times for patients who arrive
at the office with a request for emergency service. The following data
with waiting times in minutes were collected over 20 days.
2 5 10 12 4 4 5 17 11 8
9 8 12 21 6 8 7 13 18 4 3
Use classes of 0-4, 5-9, and so on in the following:
Show the frequency distribution. Show the relative frequency
distribution.
Chapter Three
(Measures of Location)
3.1 Introduction
Summarization of the data is a necessary function of any statistical
analysis. As a first step in this direction, the huge mass of data is
summarized in the form of tables and frequency distributions. In
order to bring the characteristics of the data into sharp focus, these
tables and frequency distributions need to be summarized further.
A measure of central tendency, or an average, is an essential
summary measure in any statistical analysis. An average
is a single value which can be taken as representative of the whole
distribution.
3.2 Mean
Before the discussion of the mean, we shall introduce certain notations.
Consider that there are n observations whose values are denoted
by $X_1, X_2, \ldots, X_n$ respectively. The sum of these observations $X_1$ +
$\bar{X} = \sum X_i / n$ … 3.2
10 20 25 30 35 28 22 15 18 12
$\bar{X} = \sum X_i / n = (X_1 + X_2 + \ldots + X_{10}) / n$
$\bar{X} = (10 + 20 + 25 + 30 + 35 + 28 + 22 + 15 + 18 + 12) / 10 = 215 / 10 = 21.5$
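The same arithmetic can be checked with a minimal Python sketch. The standard-library function `statistics.mean` is shown next to the direct formula; the variable names are illustrative.

```python
import statistics

data = [10, 20, 25, 30, 35, 28, 22, 15, 18, 12]

total = sum(data)                  # sum of the observations
mean = total / len(data)           # mean = sum of X_i divided by n

print(total, mean)
print(statistics.mean(data))       # library result for comparison
```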
Example 2
$\bar{X} = \sum X_i / n = (X_1 + X_2 + \ldots + X_{30}) / n$
Example 3
Find median of the following observations:
20, 15, 25, 28, 18, 16, 30.
The Solution: Writing the observations in ascending order, we get 15,
16, 18, 20, 25, 28, 30. Since n = 7 is odd, the median is the
(7 + 1)/2 = 4th observation. Hence the median, denoted by Md, is 20.
Note: The same value of Md will be obtained by arranging the
observations in descending order of magnitude.
Example 4
Find the median of the following data:
245, 230, 265, 236, 220, 250.
The Solution:
Arranging these observations in ascending order of magnitude, we
get 220, 230, 236, 245, 250, 265.
Here n = 6, i.e., even.
The median will be the mean of the 6/2 = 3rd and the [(6/2) + 1] = 4th
observations. Hence Md = (236 + 245) / 2 = 240.5
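The odd and even cases of Examples 3 and 4 can be sketched as one small Python function. The function name `median` is illustrative; the standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Median of a list: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: the (n + 1)/2-th observation
        return ordered[mid]
    # n even: mean of the (n/2)-th and (n/2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


md_odd = median([20, 15, 25, 28, 18, 16, 30])
md_even = median([245, 230, 265, 236, 220, 250])
print(md_odd, md_even)
```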
Example 5
Compute mode of the following data:
3, 4, 5, 10, 15, 3, 6, 7, 9, 12, 10, 16, 18,
20, 10, 9, 8, 19, 11, 14, 10, 13, 17, 9, 11
The Solution:
Writing this in the form of a frequency distribution, we get
Values: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Frequency: 2 1 1 1 1 1 3 4 2 1 1 1 1 1 1 1 1 1
Mode = 10, since 10 has the highest frequency (4).
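The frequency count used to find the mode can be sketched with `collections.Counter` from the Python standard library. Returning a list of modes is a choice made here so that data with more than one mode are also handled.

```python
from collections import Counter

data = [3, 4, 5, 10, 15, 3, 6, 7, 9, 12, 10, 16, 18,
        20, 10, 9, 8, 19, 11, 14, 10, 13, 17, 9, 11]

counts = Counter(data)                        # value -> frequency
highest = max(counts.values())                # largest frequency
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, highest)
```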
Example 6
Consider the following data set:
1 2 3 4 5 6 7 8 8 9 8 10 11 12 13 14 15
1. The Mean: $\bar{X} = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 8 + 9 + 8 + \ldots + 15) / 17 = 136 / 17 = 8$.
2. The Median: we arrange the values in ascending order.
1 2 3 4 5 6 7 8 8 8 9 10 11 12 13 14 15.
The median is the value in the middle, i.e. the 9th of the 17 values, which is 8.
3. The Mode will be equal to 8.
3.5 Summary:
The Mean is used in computing other statistics (such as the variance). It
does not exist for open ended grouped frequency distributions. It is
often not appropriate for skewed distributions such as salary
information.
The Median is the center number and is good for skewed distributions
because it is resistant to extreme values. The Mode is used to describe
the most typical case. The mode can be used with nominal data whereas the
others can't. The mode may or may not exist and there may be more
than one value for the mode.
3.6 Exercises
Exercise 1
The following are behavioral ratings as measured for 10 cases.
3 , 6 , 4 , 3 , 4 , 4 , 5 , 2 , 5, 4
Compute:
A) The mean ,
B) The median and
C) The mode.
Exercise 3
Suppose the following data represent 25 cell phone prices sold in the
Erbil area. Data are in dollars ($).
119 121 120 118 119 118 119 120 121 117 125 128
121 118 129 128 121 119 124 121 123 127 130 129 121
Exercise 4
A sample of 20 college Lecturers showed the following hours taken
during the first semester, 2011-2012.
15 14 16 22 24 16 18 20 22 24
36 34 36 22 20 18 22 36 18 20
$R = X_L - X_S$ … 4.1
Since the range only uses the largest and smallest values, it is greatly
affected by extreme values, that is - it is not resistant to change.
Example 1
18 16 17 12 23 21 18 22 26 19 25
Range (R) = 26 – 12 = 14
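A minimal Python sketch of the range follows, together with an illustration of how a single extreme value changes it. The function name `value_range` and the added value 90 are hypothetical and serve only to show the sensitivity.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


data = [18, 16, 17, 12, 23, 21, 18, 22, 26, 19, 25]

r = value_range(data)                    # largest is 26, smallest is 12
with_outlier = value_range(data + [90])  # one hypothetical extreme value added

print(r, with_outlier)
```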
$\sigma^2 = \sum (X_i - \mu)^2 / N$ … 4.2
$\sigma^2$ = population variance
… 4.3
$S^2 = \sum (X_i - \bar{X})^2 / (n - 1)$ … 4.4
$S^2$ = sample variance
$X_i$ = individual value
$\bar{X}$ = sample mean
n = number of values
Degrees of freedom.
There are (n – 1) degrees of freedom in computing the variance, because
if (n -1) values are known, the nth one is determined automatically.
This is because all of the deviations (X_i − X̄) must add to zero.
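The degrees-of-freedom argument can be illustrated numerically. The sketch below uses an arbitrary small data set (an assumption made only for illustration) and shows that once n − 1 deviations from the mean are known, the last one is fixed.

```python
data = [18, 16, 17, 12, 23, 21, 18, 22, 26, 19, 25]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Suppose only the first n - 1 deviations are known.
known = deviations[:-1]
implied_last = -sum(known)     # the last deviation must cancel the others

print(sum(deviations))             # zero, up to floating-point rounding
print(implied_last, deviations[-1])
```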
… 4.5
or
$S = \sqrt{S^2}$
Example 2
Consider the following data set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The Solution:
To solve a problem like this, we have to prepare the following table:
Table 4.1
X_i            1   2   3   4   5   6   7   8   9   10  11  12  13  14  15    Total 120
(X_i − X̄)     -7  -6  -5  -4  -3  -2  -1  0   1   2   3   4   5   6   7     Total 0
(X_i − X̄)²    49  36  25  16  9   4   1   0   1   4   9   16  25  36  49    Total 280
Hence,
1. The Mean: $\bar{X} = (1 + 2 + 3 + \ldots + 14 + 15) / 15 = 120 / 15 = 8$
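The remaining quantities for this data set can be computed with a short Python sketch that follows the sample formulas, dividing by n − 1. The variable names are illustrative.

```python
import math

data = list(range(1, 16))            # the values 1, 2, ..., 15

n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

variance = sum(squared) / (n - 1)    # sample variance, n - 1 degrees of freedom
std_dev = math.sqrt(variance)        # sample standard deviation

print(mean, sum(squared), variance, round(std_dev, 5))
```

The sum of squared deviations agrees with the total of the last row of Table 4.1.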
$C.V = (S / \bar{X}) \times 100\%$ … 4.6
Example 3
Find the coefficient of variation (C.V) for the data in Example 2.
The Solution:
Mean = 8, S= 4.47213, then the C.V can be calculated by:
C.V = (4.47213 / 8)*100% = 55.9016%.
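The coefficient of variation can also be obtained directly from the raw data. This sketch uses `statistics.stdev`, which is the sample standard deviation (divisor n − 1), and assumes the data set is the values 1 to 15.

```python
import statistics

data = list(range(1, 16))            # the values 1, 2, ..., 15

mean = statistics.mean(data)
s = statistics.stdev(data)           # sample standard deviation
cv = s / mean * 100                  # coefficient of variation in percent

print(round(cv, 4))
```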
Example 4
The following data represent the daily salaries (ID) paid to 15 employees
working in a construction company.
50 55 60 65 80 68 78 75
60 72 65 77 50 65 70
Calculate:
1. The Variance (S2).
2. The Standard deviation(S).
3. The Coefficient of Variation (C.V).
The Solution:
We can calculate the Variance (S2) , the Standard deviation(S) and The
Coefficient of Variation (C.V) from the data in the following Table 4.2:
Table 4.2
X_i   50 55 60 65 80 68 78 75 60 72 65 77 50 65 70   Total 990
Hence,
1.The Mean: 990/15 = 66
2.The variance: S2 = 1306 /14 = 93.2857
3.The standard deviation S = √ 93.2857 = 9.6585
4. C.V = ( 9.6585 / 66 ) * 100% = 14.6341 %
Example 5
Suppose you have the following 16 observations:
14 16 17 18 21 17 25 13
15 17 19 22 23 20 22 25
The Solution:
From the data in Table 4.3, we get:
X_i           14  16  17  18  21  17  25  13  15  17  19  22  23  20  22  25    Total 304
(X_i − X̄)    -5  -3  -2  -1  2   -2  6   -6  -4  -2  0   3   4   1   3   6     Total 0
(X_i − X̄)²   25  9   4   1   4   4   36  36  16  4   0   9   16  1   9   36    Total 210
Example 6
The number of absence hours was recorded by a sample of 20 Students
as follows:
3 4 4 5 6 6 6 6 7 7
7 7 8 9 10 10 11 12 15 17
Calculate: 1.The Variance (S2), 2.The Standard deviation (S ).
3. Coefficient of variation (C.V).
The Solution:
From the data in the Table 4.4, we get;
1.The Mean: 160/20 = 8
2.The variance: S2 = 250 /19 = 13.1579
3.The standard deviation S = √ 13.1579 = 3.6274
4. C.V = (3.6274 / 8) * 100% = 45.34 %
4.7 Measure of Position
Standard Scores (z-scores): The standard score is obtained by
subtracting the mean and dividing the difference by the standard
deviation. The symbol is Z, also called a Z-score.
$Z = (X - \mu) / \sigma$ or $Z = (X - \bar{X}) / S$ … 4.7
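A z-score can be sketched in Python as follows. As an illustration only, the sketch reuses the daily salary data of Example 4 in this chapter and the sample form of the formula; the function name `z_score` is hypothetical.

```python
import statistics

salaries = [50, 55, 60, 65, 80, 68, 78, 75, 60, 72, 65, 77, 50, 65, 70]

mean = statistics.mean(salaries)     # sample mean
s = statistics.stdev(salaries)       # sample standard deviation


def z_score(x):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / s


z_high = z_score(80)
z_at_mean = z_score(mean)
print(round(z_high, 4), z_at_mean)
```

A value equal to the mean has a z-score of zero; values above the mean give positive scores and values below give negative scores.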
Chapter Five
Mean, Variance, and Standard deviation
For Grouped (Classified) data.
$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$ … 5.1
Example 1
A student in Accounting Department/ Cihan University-Erbil has passed
his final exam/first semester and got the following marks:
Table 5.1
Subject                     Mark (X_i)   Units (W_i)
Principles of Accounting    65           5
Principles of Statistics    70           3
Principles of Management    73           3
Microeconomics              77           3
Financial Mathematics       60           3
Computer Skills             85           2
English Language            83           3
Microeconomics 77 3 231
Example 2
Grades 50 60 74 80 54 70
Units (W i ) 2 3 4 3 2 3
The Solution:
We have to find the following table:
Table 5.3
Weighted Mean (Average)
Grades (X_i)   Units (w_i)   X_i × w_i
50             2             100
60             3             180
74             4             296
80             3             240
54             2             108
70             3             210
Total          17            1134
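The totals of Table 5.3 and the resulting weighted mean can be checked with a minimal Python sketch of formula 5.1. The variable names are illustrative.

```python
grades = [50, 60, 74, 80, 54, 70]    # X_i
units = [2, 3, 4, 3, 2, 3]           # w_i

products = [x * w for x, w in zip(grades, units)]   # X_i * w_i column
weighted_sum = sum(products)
total_units = sum(units)
weighted_mean = weighted_sum / total_units          # formula 5.1

print(products, weighted_sum, total_units, round(weighted_mean, 4))
```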
$S^2 = \{ \sum (X_i - \bar{X})^2 w_i \} / (\sum w_i - 1)$ … 5.2
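The weighted variance can be sketched in Python as below. The sketch reads the divisor of formula 5.2 as the sum of the weights minus one, which is an assumption about the intended formula, and uses the grades and units of Table 5.3 purely as sample input. As a cross-check, the result is compared with the ordinary sample variance of the data in which each grade is repeated w_i times.

```python
import math
import statistics

grades = [50, 60, 74, 80, 54, 70]    # X_i
units = [2, 3, 4, 3, 2, 3]           # w_i

total_units = sum(units)
mean_w = sum(x * w for x, w in zip(grades, units)) / total_units

# Weighted sum of squared deviations, divided by (sum of weights - 1).
weighted_ss = sum(w * (x - mean_w) ** 2 for x, w in zip(grades, units))
variance_w = weighted_ss / (total_units - 1)
std_dev_w = math.sqrt(variance_w)

# Cross-check: repeat each grade w_i times and use the usual sample variance.
expanded = [x for x, w in zip(grades, units) for _ in range(w)]
check = statistics.variance(expanded)

print(round(variance_w, 4), round(std_dev_w, 4), round(check, 4))
```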
Example 3
Use the data in Table 5.1 to calculate the Variance (S2).
The Solution:
Create the following Table 5.4
Table 5.4
Subject                     Mark (X_i)   Unit (W_i)   (X_i − X̄)   (X_i − X̄)²   (X_i − X̄)² × W_i
Principles of Statistics    70           3            -2           4             12
Principles of Management    73           3            1            1             3
Microeconomics              77           3            5            25            75
$\bar{X} = \sum w_i X_i / \sum w_i = 1584 / 22 = 72$.
$S = \sqrt{S^2}$ … 5.3
Example 4
Use the data in Table 5.4 to calculate the Standard deviation (S).
The Solution:
S = √ S2 = √ 69.9047 = 8.3609
Example 5
Consider the following Table 5.5
Table 5.5
Wi 25 35 55 25 10 150
Calculate:
1.The weighted Mean,
Table 5.6
Weighted Mean, Variance, and Standard deviation
Class   W_i   Mid-Class = X_i   X_i × W_i   (X_i − X̄)   (X_i − X̄)²   (X_i − X̄)² × W_i