Statistics 4


Statistics has been defined as “the science which deals with the collection, tabulation, analysis and interpretation of numerical data” (Croxton and Cowden).
The values recorded for different objects, whether collected in a survey or taken from records, are called data in Statistics. Each value in the data is known as an observation. Statistical data may be classified as follows:
(a) Based on the characteristic: quantitative data and qualitative data
(b) Based on the nature of the characteristic: discrete data and continuous data
(c) Based on the level of measurement: nominal, ordinal, interval and ratio data
(d) Based on the way of obtaining the data: primary data and secondary data
(1) Nominal Scale or Categorical Scale
The word nominal comes from the Latin word ‘nomen’, which means name. Under the nominal scale we divide the objects under study into two or more categories by giving each category a unique name. For example, a population can be divided into two categories, male ‘M’ and female ‘F’:

Category Name/Code
Male M
Female F

Here we have named male as ‘M’ and female as ‘F’. This is not the only way: we can also code male by ‘0’ and female by ‘1’, or use any other convenient symbols. The essential requirement is that each category receives a unique name.
Note 1: In the nominal scale we have only coded the objects, so the signs less than (<) and greater than (>) make no sense. For example, if we code Hindu and Muslim by ‘1’ and ‘2’ respectively, neither Hindu > Muslim nor Muslim > Hindu makes any sense. Similarly, neither male > female nor female > male makes any sense. That is, we cannot talk about order between categories on a nominal scale.
(2) Ordinal Scale
As the name suggests, in addition to the names or codes given to the different categories, the ordinal scale also provides an order among the categories. That is, we can place the objects in a series according to the ranks given on the ordinal scale. However, we cannot find the actual difference between two categories. Example: grades in an examination, such as A+, A, B+, B, C+, C, etc.
(3) Interval Scale
The nominal scale gives only names to the different categories; the ordinal scale, one step further, also provides an order between the categories; and the interval scale, one step beyond the ordinal scale, also gives meaning to the difference between any two values.

The interval scale is used to measure years/historical time/calendar time, temperature (except on the Kelvin scale), height relative to sea level, marks in tests with negative marking, etc. Mathematically, this scale permits + and − in addition to >, <, = and ≠.

(4) Ratio Scale

The ratio scale is the highest level of measurement. The nominal scale gives only names to the categories, the ordinal scale adds order, and the interval scale adds meaningful differences; the ratio scale, in addition to names, order and differences, also has a natural (absolute) zero. On a ratio scale, values of the characteristic cannot be negative. The ratio scale is used to measure temperature in Kelvin, weight, height, length, age, mass, time, plane angle, etc.
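The cumulative nature of the four scales (each one keeps everything the previous scale allows and adds something) can be captured in a few lines of Python. This is only a sketch for illustration: the names `SCALES`, `NEW_OPERATIONS` and `permitted_operations` are hypothetical, and the multiplication and division entries for the ratio scale are our own assumption, following from its natural zero rather than stated in the text above.

```python
# Hypothetical illustration: each scale keeps the operations of the scales
# below it and adds new ones (nominal < ordinal < interval < ratio).
SCALES = ["nominal", "ordinal", "interval", "ratio"]

NEW_OPERATIONS = {
    "nominal": {"=", "!="},   # only names/codes: same category or not
    "ordinal": {"<", ">"},    # adds order between categories
    "interval": {"+", "-"},   # adds meaningful differences
    "ratio": {"*", "/"},      # assumption: a natural zero makes ratios meaningful
}


def permitted_operations(scale):
    """Return the set of operations that make sense on the given scale."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale
    ops = set()
    for name in SCALES[: level + 1]:
        ops |= NEW_OPERATIONS[name]
    return ops


print(sorted(permitted_operations("ordinal")))  # ['!=', '<', '=', '>']
```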
Quantitative Data
As the name suggests, quantitative data are related to quantity. Data are said to be quantitative if a numerical quantity (which exactly measures the characteristic under study) is associated with each observation. Generally, interval or ratio scales are used to measure quantitative data. Characteristics such as weight, height, age, length, area, volume, money, temperature, humidity and size generally give quantitative data.
Qualitative Data
As the name suggests, qualitative data are related to the quality of an object or thing. Quality cannot be measured numerically in exact terms. Thus, if the characteristic/attribute under study can be measured only on the basis of its presence or absence, the data obtained are known as qualitative data. Generally, nominal and ordinal scales are used to measure qualitative data. Characteristics such as gender, marital status, qualification, colour, religion, satisfaction, types of trees, beauty and honesty generally give qualitative data.
Discrete Data
If the characteristic under study is such that the observations can take at most a countable number of values between any two given limits, the corresponding data are known as discrete data.
For example,
(i) The numbers of books on the shelves of an almirah in a library form discrete data, because the number of books may be 0, 1, 2, 3, … but cannot take values such as 0.8, 1.32 or 1.53245.
Continuous Data
Data are said to be continuous if the observations of the characteristic under study may take any real value between two given limits.
For example,
(i) The heights of the students of a class of, say, 30 students form continuous data: if the minimum and maximum heights are 152 cm and 175 cm, a student's height may take any value between 152 cm and 175 cm, such as 152.2375 cm, 160.31326… cm, etc.
(ii) The weights of the students of a class also form continuous data, because a weight may be 48.25796… kg, 50.275 kg, 42.314314314… kg, etc.
Primary Data
Data collected by an investigator, agency or institution for a specific purpose, and used for the first time by those who collected them, are called primary data.

For example, suppose a research scholar wants to know the mean age of the M.Sc. Chemistry students of a particular university. If he collects the age of each M.Sc. Chemistry student of that university by contacting each student personally, the data so obtained are primary data for that research scholar.
Secondary Data
Data obtained by an investigator, agency or institution from a source which already exists are called secondary data. That is, these data were originally collected by some investigator, agency or institution and have already been used at least once; now they are being used at least a second time.
For example, consider the example given for primary data. If the research scholar takes the ages of the students from the records of that university, the data obtained are secondary data. Note that in both cases the data are the same; only the way of collecting them differs.
Methods of collection of primary data
(1) Direct Personal Investigation Method
(2) Telephone Method
(3) Indirect Oral Interviews Method
(4) Local Correspondents Method
(5) Mailed Questionnaires Method
(6) Schedules Method
Methods of collection of secondary data
(1) Published Sources
(a) International Publications, Government Publications in India, Published Reports of Commissions and Committees, Research Publications, Reports of Trade and Industry Associations, Published Printed Sources, Published Electronic Sources
(2) Unpublished Sources
Difference between primary and secondary data
Factor of difference  Primary data                                  Secondary data
Definition            Data collected by an investigator, agency     Data obtained/gathered by an investigator,
                      or institution for a specific purpose; these  agency or institution from a source which
                      people are the first to use the data          already exists
Time                  Long time required for collection             Less time required for collection
Money                 More money needed for collection              Less money needed for collection
Reliability           More reliable                                 Less reliable
Hand                  First-hand data                               Second-hand data
Manpower              More manpower needed                          Less manpower needed
Adequacy              More adequate                                 Less adequate
Suitability           More suitable                                 Less suitable

Frequency Distribution
When the observations (raw data) are large in number, it is not easy to handle the data in that form. It then becomes necessary to condense the data as far as possible without losing any information of interest. We do this with the help of a frequency distribution.
An arrangement showing the frequency corresponding to each value of the variable is called a frequency distribution.

Let us consider the ages of 30 students selected at random from among those studying in a certain class.
20, 22, 25, 22, 21, 22, 25, 24, 23, 22, 21, 20, 21, 22, 23, 25, 23, 24, 22, 24, 21, 20, 23, 21, 22, 21, 20, 21, 22, 25.

Age of students   Tally marks   Frequency
20                ||||          4
21                |||| ||       7
22                |||| |||      8
23                ||||          4
24                |||           3
25                ||||          4
Total                           30
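The tallying above can be reproduced with Python's standard `collections.Counter`, which counts how often each distinct value occurs. This is a minimal sketch; the helper name `discrete_frequency_distribution` is ours, not from the text.

```python
from collections import Counter

# Ages (in years) of the 30 students listed above.
ages = [20, 22, 25, 22, 21, 22, 25, 24, 23, 22, 21, 20, 21, 22, 23,
        25, 23, 24, 22, 24, 21, 20, 23, 21, 22, 21, 20, 21, 22, 25]


def discrete_frequency_distribution(values):
    """Return {value: frequency} with the values in ascending order."""
    counts = Counter(values)  # counts how often each distinct value occurs
    return {value: counts[value] for value in sorted(counts)}


table = discrete_frequency_distribution(ages)
for age, freq in table.items():
    print(f"{age:>3} {freq:>3}")
print("Total", sum(table.values()))  # Total 30
```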

Discrete Frequency Distribution

A frequency distribution in which the information is distributed in different classes on the basis of a discrete variable is known as a discrete frequency distribution. For example, the distribution of ages above is a discrete frequency distribution.
Continuous Frequency Distribution
A distribution in which the information is distributed in different classes on the basis of a continuous variable is
known as continuous frequency distribution.
Example: The marks of 30 students in statistics are given below:
10, 12, 25, 32, 27, 32, 38, 43, 39, 55, 29, 38, 57, 08, 06, 13, 27, 25, 29, 53,
55, 45, 35, 48, 47, 59, 15, 19, 48, 55
Classify the above data by taking a suitable class interval.
Methods of Forming a Continuous Frequency Distribution
Exclusive Method
Under this method, the upper limit of each class is excluded from that class, and the class intervals are fixed so that the upper limit of one class is the lower limit of the next class. In the following example, 24 students have secured marks between 0 and 50. A student who secured 20 marks would be included in the class 20-30, not in 10-20. This method is widely followed in practice.
Example 3: 24 students appeared in an entrance test in which all questions are objective type, with 25% negative marking. The marks obtained out of a maximum of 50 are as follows:
17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31, 31, 34, 30, 36, 35, 45, 25, 15,
20, 42, 40, 30
Prepare a frequency distribution by using the exclusive method.
Solution: The frequency distribution of the marks obtained by the above 24 students, using the exclusive method, is given below in Table 13.8:
Classes   Tally bars   No. of students
0-10      |            1
10-20     |||          3
20-30     |||| |       6
30-40     |||| ||||    9
40-50     ||||         5
Total                  24
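The exclusive rule (lower limit included, upper limit excluded) amounts to integer division of each observation's distance from the first lower limit by the class width. A minimal Python sketch, assuming equal class widths; `exclusive_frequencies` is a hypothetical helper name, and it raises an error rather than silently dropping an observation that falls outside the classes.

```python
# Marks of the 24 students in Example 3.
marks = [17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31,
         31, 34, 30, 36, 35, 45, 25, 15, 20, 42, 40, 30]


def exclusive_frequencies(data, start, width, n_classes):
    """Frequencies for equal classes start-(start+width), ... by the exclusive method.

    An observation equal to an upper limit goes to the next class, so 20 is
    counted in 20-30 and not in 10-20.
    """
    freqs = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        freqs[k] += 1
    return freqs


START, WIDTH = 0, 10
freqs = exclusive_frequencies(marks, start=START, width=WIDTH, n_classes=5)
for k, f in enumerate(freqs):
    lower = START + k * WIDTH
    print(f"{lower}-{lower + WIDTH} {f}")
```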
Inclusive Method
Under the inclusive method of classification, both the lower and the upper limit of a class are included in that class itself. The following frequency distribution is formed using the inclusive method for the data of Example 3 given above.
Table 13.9: Frequency Distribution of 24 Students by Inclusive Method
Class   Tally bars   No. of students
0-9     |            1
10-19   |||          3
20-29   |||| |       6
30-39   |||| ||||    9
40-49   ||||         5
Total                24

That is, if data are classified so that both the lower and the upper class limits are included in the same class interval, the class intervals are called inclusive class intervals.
To convert data from inclusive form to exclusive form, first find half of the difference between the lower limit of a class and the upper limit of the preceding class. This value is then subtracted from the lower limit of each class and added to the upper limit of each class. In the above example, this value is (10 − 9)/2 = 0.5, so the class intervals become (−0.5)-9.5, 9.5-19.5, …, 39.5-49.5. If all the observations are positive, the lower limit of the first class can be taken as 0, so in this case the class intervals are 0-9.5, 9.5-19.5, …, 39.5-49.5.
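The conversion rule can be written as a short Python function. This sketch assumes at least two classes and the same gap between every pair of consecutive classes; the name `inclusive_to_exclusive` and the `nonnegative` switch (for the case where the first lower limit is taken as 0) are our own choices.

```python
def inclusive_to_exclusive(classes, nonnegative=False):
    """Convert inclusive classes such as [(0, 9), (10, 19), ...] to exclusive ones.

    The correction is half the gap between the lower limit of a class and the
    upper limit of the preceding class; it is subtracted from every lower limit
    and added to every upper limit. If nonnegative is True (all observations
    positive), the first lower limit is taken as 0.
    """
    correction = (classes[1][0] - classes[0][1]) / 2  # e.g. (10 - 9) / 2 = 0.5
    converted = [(lower - correction, upper + correction) for lower, upper in classes]
    if nonnegative:
        converted[0] = (0, converted[0][1])
    return converted


inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
print(inclusive_to_exclusive(inclusive))
# [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
print(inclusive_to_exclusive(inclusive, nonnegative=True)[0])  # (0, 9.5)
```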
Relative Frequency Distribution
A relative frequency corresponding to a class is the ratio of the frequency of that class to the total frequency.
The corresponding frequency distribution is called relative frequency distribution. If we multiply each relative
frequency by 100, we get the percentage frequency corresponding to that class and the corresponding frequency
distribution is called “Percentage frequency distribution”.
Example 1: A frequency distribution of marks of 50 students in a subject is as given below:
Class (Marks): 0-10 10-20 20-30 30-40 40-50
Frequency: 6 10 14 18 2
Prepare relative and percentage frequency distributions.
Solution: The relative and percentage frequency distributions can be formed as given in the following table:
Class     Frequency     Relative frequency   Percentage frequency
(Marks)   (f)           (f/N)                (f/N) × 100
0-10      6             6/50 = 0.12          0.12 × 100 = 12%
10-20     10            10/50 = 0.20         0.20 × 100 = 20%
20-30     14            14/50 = 0.28         0.28 × 100 = 28%
30-40     18            18/50 = 0.36         0.36 × 100 = 36%
40-50     2             2/50 = 0.04          0.04 × 100 = 4%
Total     Σf = N = 50   1.00                 100

Cumulative Frequency Distribution

The cumulative frequency of a class is the total of all the frequencies up to and including that class. A
cumulative frequency distribution is a frequency distribution which shows the observations ‘less than’ or ‘more
than’ a specific value of the variable.
The number of observations less than the upper class limit of a given class is called the less than cumulative frequency, and the corresponding distribution is called the ‘less than’ cumulative frequency distribution. Similarly, the number of observations more than the lower class limit of a given class is called the more than cumulative frequency, and the corresponding distribution is called the ‘more than’ cumulative frequency distribution. In the following example, both the ‘less than’ and the ‘more than’ cumulative frequency distributions are obtained.
Example 2: For the following frequency distribution of marks of 50 students in a subject, form both types of
cumulative frequency distributions.
Class (Marks) 0-10 10-20 20-30 30-40 40-50
No. of Students 7 11 15 12 5
Solution: Cumulative frequency distributions are formed as given in the following table:
Given frequency          Less than cumulative         More than cumulative
distribution             frequency distribution       frequency distribution
Classes   No. of         Marks       No. of           Marks       No. of
          students       less than   students         more than   students
0-10      7              10          7                0           50
10-20     11             20          18               10          43
20-30     15             30          33               20          32
30-40     12             40          45               30          17
40-50     5              50          50               40          5
Total     50
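Both cumulative columns are running totals, which the standard `itertools.accumulate` computes directly: the ‘less than’ column accumulates from the first class, and the ‘more than’ column accumulates from the last class and is then reversed back. A minimal sketch using the data of Example 2:

```python
from itertools import accumulate

lower_limits = [0, 10, 20, 30, 40]
upper_limits = [10, 20, 30, 40, 50]
students = [7, 11, 15, 12, 5]

# 'Less than' cumulative frequency: running total from the first class.
less_than = list(accumulate(students))                  # [7, 18, 33, 45, 50]
# 'More than' cumulative frequency: running total from the last class, reversed back.
more_than = list(accumulate(reversed(students)))[::-1]  # [50, 43, 32, 17, 5]

for lo, hi, lt, mt in zip(lower_limits, upper_limits, less_than, more_than):
    print(f"less than {hi}: {lt:>2}   more than {lo}: {mt:>2}")
```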

Components of a Table
The components of a table may vary from case to case depending on the given data, but a good table should contain at least the following components:
1. Table Number
2. Table Heading
3. Caption
4. Stub
5. Body of Table
6. Head Note
7. Foot Note
Graphs of Frequency Distributions
Graphs can be drawn for discrete as well as continuous frequency distributions.
Let us first consider the frequency distribution of a discrete variable.
To represent a discrete frequency distribution graphically, we take the values of the variable on the X-axis and the corresponding frequencies on the Y-axis. The different values of the variable are located as points on the horizontal axis, and at each of these points a perpendicular bar is drawn to represent the corresponding frequency. Such a diagram is called a ‘Frequency Bar Diagram’. For example, consider the frequency distribution of the number of peas per pod for 198 pods given in Table 15.1:

No of peas per pod 1 2 3 4 5 6 7
Frequency (number of pods) 14 23 66 40 26 18 11
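On paper the bars are vertical; in plain text it is easier to draw them sideways. The rough sketch below (a hypothetical helper `text_bar_diagram`, where by our own choice each `#` stands for about two pods) only illustrates that the length of each bar is proportional to its frequency; it is not a substitute for a properly scaled graph.

```python
peas_per_pod = [1, 2, 3, 4, 5, 6, 7]
pods = [14, 23, 66, 40, 26, 18, 11]   # Table 15.1, 198 pods in all


def text_bar_diagram(values, freqs, unit=2):
    """One line per value; the bar has about freq/unit '#' characters."""
    lines = []
    for value, freq in zip(values, freqs):
        bar = "#" * round(freq / unit)  # bar length proportional to frequency
        lines.append(f"{value:>2} | {bar} {freq}")
    return lines


print("\n".join(text_bar_diagram(peas_per_pod, pods)))
```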

Graphs for continuous frequency distributions:

Important: Graphs of a continuous frequency distribution are drawn only for exclusive classes. If the data are given with inclusive class intervals, first convert them into exclusive classes and then construct the graph.
(i) Histogram (equal class intervals)
A histogram is drawn by constructing adjacent rectangles over the class intervals such that the areas of the rectangles are proportional to the corresponding class frequencies.
The class boundaries are marked on the X-axis (horizontal axis) and the corresponding frequencies on the Y-axis (vertical axis). A rectangle is then constructed over each class; since all the classes have equal width, the height (and hence the area) of each rectangle is proportional to the frequency of that class.
Histogram (unequal class intervals)
If the class intervals are not of equal size, we first calculate the width of each class and then find the height of the rectangle for each class by the formula

$$\text{Height of the rectangle} = \frac{\text{Class frequency}}{\text{Width of its class interval}} \times \text{The least width of the classes} \qquad (15.1)$$
Let us draw a histogram for the frequency distribution given below in Table 15.2:
Class Intervals 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80

Frequency 2 3 13 18 9 7 6 2
Histogram for the above data is given below.

Now let us consider the following frequency distribution with unequal class intervals:
Class      0-10  10-20  20-30  30-40  40-70  70-80  80-100
Frequency  20    32     8      2      60     35     10
Since the class intervals are unequal, we have to adjust the frequencies of the classes 40-70 and 80-100 by the formula in equation 15.1. These calculations are shown in Table 15.4 below:
Class interval (CI)   Frequency   Width of CI   Height of the rectangle
0-10                  20          10            20
10-20                 32          10            32
20-30                 8           10            8
30-40                 2           10            2
40-70                 60          30            (60/30) × 10 = 20
70-80                 35          10            35
80-100                10          20            (10/20) × 10 = 5

(ii) Frequency Polygon

Another way, besides the histogram, of presenting a frequency distribution graphically is the frequency polygon. To draw a frequency polygon, first the mid values of all the class intervals and the corresponding frequencies are plotted as points on rectangular coordinate axes. Then these points are joined by line segments, and the graph thus obtained is known as a frequency polygon. One important point to keep in mind is that, whenever a frequency polygon is required, we take two imaginary class intervals, each with frequency zero, one just before the first class interval and the other just after the last class interval. Adding these two class intervals ensures the property that
Area under the polygon = Area of the histogram
For example, for the frequency distribution given in Table 15.2, we first plot the points (5, 2), (15, 3), …, (75, 2) on graph paper along with the histogram bars. Then we join the successive points (including the mid points of the two imaginary class intervals, each with zero frequency) by line segments to get the frequency polygon. The frequency polygon for the frequency distribution given in Table 15.2 is shown in Fig. 15.4.

Note 3: In some cases the first class interval does not start from zero. In such situations we mark a kink on the horizontal axis, which indicates the continuity of the scale starting from zero. Let us take an example of this type.

Example 1: Draw a frequency polygon for the following frequency distribution:

Class interval  40-50  50-60  60-70  70-80  80-90  90-100  100-110  110-120
Frequency       4      10     11     13     18     14      11       5

(iii) Frequency Curve
In simple words, a frequency curve is a smooth curve obtained by joining the points (not necessarily all the points) of the frequency polygon such that
(a) like the frequency polygon, it starts from the base line (horizontal axis) and ends at the base line;
(b) the area under the frequency curve remains approximately equal to the area under the frequency polygon.
Let us now explain the concept theoretically. Suppose we draw a sample of size n from a large population. A frequency curve is the graph of a continuous variable, and the continuity of the variable implies that, however small a class interval we take, there will be some observations in it. So, as the sample size n increases and the class intervals are made smaller, the number of line segments becomes large and the frequency polygon tends to coincide with a smooth curve passing through these points. This smooth curve is known as the frequency curve.
In the following example we have drawn both the frequency polygon and the frequency curve to make the idea clear.
Example 2: Draw the frequency polygon and frequency curve for the following frequency distribution.

Class interval  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
Frequency       2      5      8      15     18     10     3      1

Solution: The frequency polygon and frequency curve for the above data are shown in Fig. 15.6.

(iv) Cumulative Frequency Curves

To draw the less than cumulative frequency curve (or less than ogive), the less than cumulative frequencies are plotted against the upper limits of the class intervals to which they correspond, and the points are joined by line segments. Similarly, the more than cumulative frequency curve (or more than ogive) is obtained by plotting the more than cumulative frequencies against the lower limits of the class intervals and joining the points. The two ogives may be defined as follows:
Less Than Ogive: If we plot points with the upper limits of the classes as abscissae and the cumulative frequencies of values less than these upper limits as ordinates, and join the points so plotted by line segments, the curve obtained is known as the “less than cumulative frequency curve” or “less than ogive”. It is a rising curve.
More Than Ogive: If we plot points with the lower limits of the classes as abscissae and the cumulative frequencies of values more than these lower limits as ordinates, and join the points so plotted by line segments, the curve obtained is known as the “more than cumulative frequency curve” or “more than ogive”. It is a falling curve.
Let us draw both ogives (‘less than’ and ‘more than’) for the following frequency distribution of the weekly wages of workers:
Weekly wages     0-10  10-20  20-30  30-40  40-50
No. of workers   45    55     70     40     10

Before drawing the ogives, we make a cumulative frequency distribution as given in table 15.6
                         Less than cumulative          More than cumulative
                         frequency distribution        frequency distribution
Weekly     No. of        Wages        Number of        Wages        Number of
wages      workers       less than    workers          more than    workers
0-10       45            10           45               0            220
10-20      55            20           100              10           175
20-30      70            30           170              20           120
30-40      40            40           210              30           50
40-50      10            50           220              40           10

Note 4: The median may also be obtained by drawing a dotted vertical line through the point of intersection of the two ogives, when both are drawn on a single figure; the point where this line meets the horizontal axis gives the median.
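The intersection in Note 4 can be located numerically by treating each ogive as the line segments joining its points. In this sketch we add the end points (0, 0) to the less than ogive and (50, 0) to the more than ogive (our assumption, so that both are defined at every class boundary); the helper `ogive_intersection` is hypothetical. Because the more than value here equals N minus the less than value at each boundary, the crossing is where the less than ogive reaches N/2 = 110.

```python
from itertools import accumulate

bounds = [0, 10, 20, 30, 40, 50]     # class boundaries of the weekly wages
workers = [45, 55, 70, 40, 10]

N = sum(workers)                                      # 220
# Less than ogive at every boundary (assumed: 0 workers earn less than 0).
less_than = [0] + list(accumulate(workers))           # [0, 45, 100, 170, 210, 220]
# More than ogive at every boundary (assumed: 0 workers earn more than 50).
more_than = [N - c for c in less_than]                # [220, 175, 120, 50, 10, 0]


def ogive_intersection(xs, rising, falling):
    """x-coordinate where the two ogives, joined by line segments, cross."""
    for i in range(len(xs) - 1):
        d1 = rising[i] - falling[i]
        d2 = rising[i + 1] - falling[i + 1]
        if d1 <= 0 <= d2:
            if d1 == d2:            # both zero: the ogives meet at xs[i]
                return xs[i]
            t = -d1 / (d2 - d1)     # fraction of the way along this segment
            return xs[i] + t * (xs[i + 1] - xs[i])
    raise ValueError("the ogives do not cross")


median = ogive_intersection(bounds, less_than, more_than)
print(round(median, 2))  # 21.43
```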

A pie diagram (pie chart) is used when we want to show the relationship between a whole and its parts, i.e. a pie chart shows how the whole is divided into different parts. For example, suppose the total monthly expenditure of a family is Rs 1000, of which Rs 250 is spent on food, Rs 200 on education, Rs 100 on rent, Rs 150 on transport and Rs 300 on miscellaneous items. Then 25%, 20%, 10%, 15% and 30% of the total expenditure of the family are spent on food, education, rent, transport and miscellaneous items respectively. Note that if the share spent on food (say) increases from 25% to 30%, the percentages of the other heads must shrink so that the total remains 100%. Similarly, if the share of any one head decreases, the percentages of the other heads must grow so that the total remains 100%. That is why a pie chart shows the relationship between a whole and its parts.
Steps used for constructing a pie chart:
Step 1 Find the total of the different parts.
Step 2 Find the sector angle (in degrees) of each part, keeping in mind that the total angle around the centre of a circle is 360°; i.e. sector angle = (value of the part / total) × 360°.
Step 3 Find the percentage of each part, taking the total obtained in Step 1 as 100 percent.
Step 4 Draw a circle and divide it into sectors with the angles obtained in Step 2, so that each sector (or its area) represents the size of the corresponding part. The diagram thus obtained is the pie chart for the given data.
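Steps 1 to 3 are plain arithmetic and can be sketched in Python (Step 4, the drawing itself, is left to graph paper or a plotting library and is not shown). The helper name `pie_chart_table` is hypothetical; the data are the family expenditure figures from the text, and each value is multiplied before dividing so the results come out exact.

```python
def pie_chart_table(parts):
    """Steps 1-3: total, sector angle (degrees) and percentage of each part."""
    total = sum(parts.values())                    # Step 1
    table = {}
    for name, value in parts.items():
        angle = value * 360 / total                # Step 2: share of 360 degrees
        percent = value * 100 / total              # Step 3: share of 100 percent
        table[name] = (angle, percent)
    return total, table


expenditure = {"Food": 250, "Education": 200, "Rent": 100,
               "Transport": 150, "Miscellaneous": 300}   # Rs, from the text
total, table = pie_chart_table(expenditure)
for name, (angle, percent) in table.items():
    print(f"{name:<14} {angle:6.1f} deg {percent:5.1f}%")
```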
Example 10: A company is started by four persons A, B, C and D, who share the profit or loss in the ratio 4 : 3 : 2 : 1. In the year 2010 the company earned a profit of Rs 14400. Represent their shares of the profit in a pie chart.
Solution: The given ratio is 4 : 3 : 2 : 1, so the sum of the ratios = 4 + 3 + 2 + 1 = 10.

Partner  Profit (in Rs)          Sector angle (in degrees)                        Percentage
A        (4/10) × 14400 = 5760   (5760/14400) × 360 = 144, or (4/10) × 360 = 144   (5760/14400) × 100 = 40
B        (3/10) × 14400 = 4320   (4320/14400) × 360 = 108, or (3/10) × 360 = 108   (4320/14400) × 100 = 30
C        (2/10) × 14400 = 2880   (2880/14400) × 360 = 72, or (2/10) × 360 = 72     (2880/14400) × 100 = 20
D        (1/10) × 14400 = 1440   (1440/14400) × 360 = 36, or (1/10) × 360 = 36     (1440/14400) × 100 = 10
Total    14400                   360                                              100


[Figure: Pie chart of profits (in Rs): Partner A 40%, Partner B 30%, Partner C 20%, Partner D 10%]

(i) In drawing the components of a pie diagram, it is advisable to follow some logical arrangement, pattern or sequence; for example, according to size, with the largest on top and the others following in sequence clockwise.
(ii) A pie chart is used only when
(a) the parts make a meaningful whole. For example, the expenditures of a family on different items make a meaningful whole, but if a city has 100 doctors, 40 engineers, 50 milkmen and 80 businessmen, the total of these does not make a meaningful whole, so a pie chart should not be used here.
(b) the observations of the different parts are made at the same time.
We have discussed the method of drawing a pie diagram in this section. Let us now discuss some limitations of the pie diagram.
E9) Represent by a pie diagram the following data on the utilisation of every 100 paise of income by XYZ company in the year 2009-10.
Item/Head Money spent (in paise)
Manufacturing Expenses 42
Salaries of employees 14
Selling and distribution Expenses 8
Interest Charges 6
Advertisement Expenses 15
Excise duty of sales 5
Taxation 10
E10) Draw a pie diagram to represent a family's expenditure of Rs 100 over the different budget heads given below:
Item Expenditure (in Rs.)
Food 25
Clothing 15
Education 20
Transport 10
Outing 10
Miscellaneous 5
Saving 15

Measures of Central Tendency
The term average in Statistics refers to a one-figure summary of a distribution. It gives a value around which the distribution is concentrated; for this reason an average is also called a measure of central tendency. For example, if Mr. X drives his car at an average speed of 60 km/hr, we get an idea that he drives fast (on Indian roads, of course!). To compare the performance of two classes, we can compare their average scores in the same test. Thus, an average condenses a distribution into a single value that is supposed to represent the distribution. This helps both in assessing a single distribution and in comparing it with another distribution.

The following are the various measures of central tendency:

Arithmetic Mean, Weighted Mean, Median, Mode, Geometric Mean, Harmonic Mean

Properties of a Good Average

The following are the properties of a good measure of average:
1. It should be simple to understand
2. It should be easy to calculate
3. It should be rigidly defined
4. It should be capable of further algebraic treatment
5. It should be least affected by sampling fluctuations
6. It should be based on all the observations
7. It should be possible to calculate even for open-end class intervals
8. It should not be affected by extremely small or extremely large observations
Arithmetic mean (also called mean) is defined as the sum of all observations divided by the number of
observations. Arithmetic mean may be calculated for the following two types of data:
1. For Ungrouped Data (raw data)
Mathematically, if x1, x2, …, xn are the n observations, then their mean is

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
2. For Discrete Data
If fi is the frequency of xi (i = 1, 2, …, k), the formula for the arithmetic mean is

$$\bar{X} = \frac{f_1x_1 + f_2x_2 + \cdots + f_kx_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

3. For Continuous Data

If fi is the frequency of xi (i = 1, 2, …, k), where xi is the mid value of the ith class interval, the formula for the arithmetic mean is

$$\bar{X} = \frac{f_1x_1 + f_2x_2 + \cdots + f_kx_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{N} = \frac{\sum fx}{\sum f}$$

where N = f1 + f2 + … + fk.
Problem 1: Calculate mean of the weights of five students
54, 56, 70, 45, 50 (in kg)
Solution: If we denote the weight of a student by x, the mean is obtained by

$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{54 + 56 + 70 + 45 + 50}{5} = \frac{275}{5} = 55$$

Thus, the average weight of the students is 55 kg.
Problem 3: Calculate the arithmetic mean for the following data:
x   20   30   40
f   5    6    4

Solution:
x    f         fx
20   5         100
30   6         180
40   4         160
     Σf = 15   Σfx = 440

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{440}{15} = 29.33$$

Problem 4: For the following data, calculate the arithmetic mean.

Class Interval  0-10  10-20  20-30  30-40  40-50
Frequency       3     5      7      9      4

Solution:
Class interval  Mid value x  Frequency f    fx
0-10            5            3              15
10-20           15           5              75
20-30           25           7              175
30-40           35           9              315
40-50           45           4              180
                             Σf = N = 28    Σfx = 760

$$\text{Mean} = \frac{\sum_{i=1}^{k} f_i x_i}{N} = \frac{760}{28} = 27.143$$
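The three formulas for the arithmetic mean (ungrouped, discrete and continuous data) differ only in what is summed. A minimal Python sketch, checked against Problems 1, 3 and 4; the function names are ours, and for continuous data each class is represented by its mid value, as in the text.

```python
def mean_ungrouped(xs):
    """Sum of the observations divided by their number."""
    return sum(xs) / len(xs)


def mean_discrete(xs, fs):
    """Sum of f*x divided by the sum of f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_continuous(classes, fs):
    """Same formula, with x taken as the mid value of each class interval."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return mean_discrete(mids, fs)


print(mean_ungrouped([54, 56, 70, 45, 50]))                        # 55.0
print(round(mean_discrete([20, 30, 40], [5, 6, 4]), 2))            # 29.33
print(round(mean_continuous([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)],
                            [3, 5, 7, 9, 4]), 3))                  # 27.143
```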
Merits and Demerits of Arithmetic Mean
Merits of Arithmetic Mean
1. It utilizes all the observations;
2. It is rigidly defined;
3. It is easy to understand and compute;
4. It can be used for further mathematical treatment; and
5. It is least affected by sampling fluctuations.
Demerits of Arithmetic Mean
1. It is badly affected by extremely small or extremely large values;
2. It cannot be calculated for open end class intervals; and
3. It is generally not preferred for highly skewed distributions.

Three algebraic properties of the mean:

1. ∑(x − Mean) = 0, i.e. the sum of the deviations of the observations from their mean is zero. Deviation is also called dispersion, which will be discussed in detail in Unit 2 of this block.
2. The sum of the squares of the deviations taken from the mean is least in comparison with the same taken from any other value, i.e. ∑(x − Mean)² ≤ ∑(x − A)² for any value A.
3. The arithmetic mean is affected by both change of origin and change of scale:

$$\text{If } U = \frac{X - A}{h}, \text{ then } \bar{X} = A + h\,\bar{U},$$

where A and h are constants.
Combined Mean
If the arithmetic means and the numbers of observations of two or more related groups are known, we can calculate the combined mean of these groups. The combined mean of two related groups is

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

Here, $\bar{X}_{12}$ = combined mean of the two groups,
$N_1$ = no. of observations in the first group, $N_2$ = no. of observations in the second group,
$\bar{X}_1$ = mean of the first group, $\bar{X}_2$ = mean of the second group.

Similarly, the formula can be extended to k groups as

$$\bar{X}_{12\ldots k} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + \cdots + N_k\bar{X}_k}{N_1 + N_2 + \cdots + N_k}$$

Example: The mean marks of 60 students in section A is 40 and mean marks of 40 students in section B is 45.
Find the combined mean of the 100 students in both the sections.
Solution: Here, $N_1 = 60$, $N_2 = 40$, $\bar{X}_1 = 40$ and $\bar{X}_2 = 45$. Using the formula, the combined mean of all the 100 students is

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{60 \times 40 + 40 \times 45}{100} = \frac{2400 + 1800}{100} = 42 \text{ marks.}$$

Example: The mean wage of 100 workers in a factory running two shifts of 60 and 40 workers is 38. The mean
wage of 60 workers in the morning shift is Rs. 40. Find the mean wage of 40 workers in the evening shift.
Solution: Here, $N_1 = 60$, $N_2 = 40$, $\bar{X}_1 = 40$ and $\bar{X}_{12} = 38$. We are required to find the value of $\bar{X}_2$. Using the formula for the combined mean,

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

we have

$$38 = \frac{60 \times 40 + 40\,\bar{X}_2}{100}$$

or $3800 - 2400 = 40\,\bar{X}_2$, so $\bar{X}_2 = 1400/40 = 35$ Rs.

So the mean wage of the 40 workers in the evening shift is Rs. 35.

Example: Find the combined mean from the following data:
Group: 1 2 3
Number: 200 250 300
Mean: 25 20 15
Solution: Here, we are given data for three groups, which can be written symbolically as
$N_1 = 200$, $N_2 = 250$, $N_3 = 300$, $\bar{X}_1 = 25$, $\bar{X}_2 = 20$, $\bar{X}_3 = 15$.

For the combined mean, we substitute these values in the formula and get
$$\bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3} = \frac{200 \times 25 + 250 \times 20 + 300 \times 15}{200 + 250 + 300} = \frac{5000 + 5000 + 4500}{750} = \frac{14500}{750} = 19.33$$
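The combined-mean formula is easy to check numerically. The following minimal Python sketch uses two hypothetical helper names, `combined_mean` and `missing_group_mean`. The second helper simply rearranges the two-group formula to solve for an unknown group mean, as in the evening-shift example.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(N_i * mean_i) / sum(N_i)."""
    if not sizes or len(sizes) != len(means):
        raise ValueError("need one mean for each group size")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def missing_group_mean(n1, mean1, n2, combined):
    """Rearrange N1*X1 + N2*X2 = (N1 + N2)*X12 to solve for X2."""
    return ((n1 + n2) * combined - n1 * mean1) / n2


print(combined_mean([60, 40], [40, 45]))                      # 42.0
print(missing_group_mean(60, 40, 40, 38))                      # 35.0
print(round(combined_mean([200, 250, 300], [25, 20, 15]), 2))  # 19.33
```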

Correcting the Arithmetic Mean

Remark: To correct a wrongly computed mean, first find the corrected $\sum X$ (or $\sum fX$ in the case of a discrete or continuous series). To do this, subtract the wrong items from the incorrect $\sum X$ (or $\sum fX$) and add the correct items. Finally, divide the corrected $\sum X$ by the number of observations to get the corrected mean.
Example: The average marks of 80 students were found to be 40. Later, it was discovered that a score of 54 was misread as 84. Find the corrected mean of the 80 students.
Solution: We are given $N = 80$, $\bar{X} = 40$.
Since $\bar{X} = \dfrac{\sum X}{N}$, we have $\sum X = N\bar{X} = 80 \times 40 = 3200$.

But because of the error discovered, $\sum X = 3200$ is not correct.

Correct $\sum X$ = incorrect $\sum X$ − misread observation + correct observation

= 3200 − 84 + 54 = 3170.

$\therefore$ The corrected average is
$$\bar{X} = \frac{\text{Corrected } \sum X}{N} = \frac{3170}{80} = 39.625.$$
Example: The mean of 100 items is found to be 30. If, at the time of calculation, two items were wrongly taken as 32 and 12 instead of 23 and 11, find the correct mean.
Solution: Given that $N = 100$, $\bar{X} = 30$.
$\therefore \sum X = N\bar{X} = 100 \times 30 = 3000$ (incorrect total of 100 items).
$\therefore$ Corrected $\sum X$ = incorrect $\sum X$ − wrong observations + correct observations
= 3000 − (32 + 12) + (23 + 11) = 2990

$\therefore$ Corrected mean
$$= \frac{\text{Corrected } \sum X}{N} = \frac{2990}{100} = 29.90$$
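Here is a minimal Python sketch of the correction procedure in the Remark above. It assumes we know the number of observations, the incorrect mean, and the lists of wrong and correct items. The helper name `corrected_mean` is hypothetical.

```python
def corrected_mean(n, incorrect_mean, wrong_items, correct_items):
    """Corrected mean = (N*mean - wrong items + correct items) / N."""
    total = n * incorrect_mean                              # incorrect sum of X
    total = total - sum(wrong_items) + sum(correct_items)   # corrected sum of X
    return total / n


print(corrected_mean(80, 40, [84], [54]))            # 39.625
print(corrected_mean(100, 30, [32, 12], [23, 11]))   # 29.9
```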

Median is that value of the variable which divides the whole distribution into two equal parts. Note that the data must first be arranged in ascending or descending order of magnitude.
Median for Ungrouped Data
Mathematically, if $x_1, x_2, \ldots, x_n$ are the n observations, then to obtain the median we first arrange these n values in either ascending or descending order. Then

$$M_d = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} \quad (\text{when } n \text{ is odd})$$

$$M_d = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2} \quad (\text{when } n \text{ is even})$$
Problem 5: Find the median of the following observations:
6, 4, 3, 7, 8
Solution: First we arrange the given data in ascending order as
3, 4, 6, 7, 8
Since the number of observations (5) is odd, the median is the middle, i.e. 3rd, value, which is 6.
Problem 6: Calculate the median for the following data:
Solution: First we arrange the given data in ascending order as 3, 4, 7, 8, 9, 10.
Here, the number of observations (n) = 6 (even). So we get the median by
$$M_d = \frac{\left(\frac{6}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{6}{2} + 1\right)^{\text{th}} \text{ observation}}{2} = \frac{3^{\text{rd}} \text{ observation} + 4^{\text{th}} \text{ observation}}{2} = \frac{7 + 8}{2} = \frac{15}{2} = 7.5$$
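The odd/even rule can be expressed directly in code. The following is a minimal Python sketch with a hypothetical function name. Python's standard `statistics.median` follows the same convention for ungrouped data.

```python
def median(values):
    """Median of ungrouped data using the odd/even rule."""
    xs = sorted(values)                  # arrange in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # ((n + 1)/2)th value; list index is 0-based
    return (xs[mid - 1] + xs[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th values


print(median([6, 4, 3, 7, 8]))       # 6
print(median([3, 4, 7, 8, 9, 10]))   # 7.5
```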
For Ungrouped Data (when frequencies are given)
If $x_i$ are the different values of the variable with frequencies $f_i$, we first calculate the cumulative frequencies from the $f_i$. The median is then defined by

$M_d$ = value of the variable corresponding to the $\left(\dfrac{\sum f_i}{2}\right)^{\text{th}} = \left(\dfrac{N}{2}\right)^{\text{th}}$ cumulative frequency.

Note: If N/2 is not exactly equal to a cumulative frequency, then the value of the variable corresponding to the next cumulative frequency is the median.

Problem 7: Find the median of the given frequency distribution

x  20  40  60  80
f   7   5   4   3
Solution: First we find the cumulative frequencies
x   f  c.f.
20  7  7
40  5  12
60  4  16
80  3  19

Here $N = \sum_{i=1}^{4} f_i = 19$.
$M_d$ = value of the variable corresponding to the $\left(\dfrac{19}{2}\right)^{\text{th}}$ cumulative frequency
= value of the variable corresponding to 9.5. Since 9.5 is not among the cumulative frequencies, we take the next cumulative frequency, 12. The value of the variable against cumulative frequency 12 is 40, so the median is 40.
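The cumulative-frequency rule above, including the Note about taking the next cumulative frequency, can be sketched as follows. This minimal Python sketch uses a hypothetical function name and assumes the values are listed in ascending order.

```python
from itertools import accumulate


def median_from_frequencies(values, freqs):
    """Value whose cumulative frequency first reaches N/2 (values in ascending order)."""
    half = sum(freqs) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= half:   # exactly N/2, or the next cumulative frequency above it
            return x
    raise ValueError("empty distribution")


print(median_from_frequencies([20, 40, 60, 80], [7, 5, 4, 3]))   # 40
```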
2. Median for Grouped Data
For class intervals, we first find the cumulative frequencies from the given frequencies and then calculate the median using the following formula:
$$\text{Median} = L + \frac{\frac{N}{2} - C}{f} \times h$$
where, L = lower limit of the median class,
N = total frequency,
C = cumulative frequency of the pre-median class,
f = frequency of the median class, and
h = width of the median class.
The median class is the class in which the (N/2)th observation falls. If N/2 is not equal to any cumulative frequency, then the class with the next higher cumulative frequency is taken as the median class.
E6) Find the Median and Mode for the following frequency distribution
Marks            0-10  10-20  20-30  30-40  40-50  50-60  60-70
No. of students  5     10     15     20     12     10     8

Solution: First we calculate the cumulative frequency distribution

Marks  f       Cumulative Frequency
0-10   5       5
10-20  10      15
20-30  15      30 = C
30-40  20 = f  50
40-50  12      62
50-60  10      72
60-70  8       80
       N = 80

Here N/2 = 80/2 = 40. Since 40 is not among the cumulative frequencies, the class corresponding to the next cumulative frequency, 50, is the median class. Thus 30-40 is the median class, with L = 30, C = 30, f = 20 and h = 10.
$$\text{Median} = L + \frac{\frac{N}{2} - C}{f} \times h = 30 + \frac{40 - 30}{20} \times 10 = 35$$
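Here is a minimal Python sketch of the grouped-median formula. It assumes equal class widths and classes given by their lower limits, and the function name is hypothetical. It is applied to the E6 marks distribution.

```python
from itertools import accumulate


def grouped_median(lower_limits, width, freqs):
    """Median = L + (N/2 - C) / f * h, assuming equal class widths h."""
    half = sum(freqs) / 2
    cum = list(accumulate(freqs))
    for i, cf in enumerate(cum):
        if cf >= half:                       # median class: first c.f. reaching N/2
            c = cum[i - 1] if i > 0 else 0   # c.f. of the pre-median class
            return lower_limits[i] + (half - c) / freqs[i] * width
    raise ValueError("empty distribution")


marks_lower = [0, 10, 20, 30, 40, 50, 60]
students = [5, 10, 15, 20, 12, 10, 8]
print(grouped_median(marks_lower, 10, students))   # 35.0
```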
Merits of Median
1. It is rigidly defined;
2. It is easy to understand and compute;
3. It is not affected by extremely small or extremely large values; and

4. It can be calculated even for open end classes (like “less than 10” or “50 and above”).
Demerits of Median
1. In case of even number of observations we get only an estimate of the median by taking the mean of
the two middle values. We don’t get its exact value;
2. It does not utilize all the observations. The median of 1, 2, 3 is 2. If the observation 3 is replaced by
any number higher than or equal to 2 and if the number 1 is replaced by any number lower than or
equal to 2, the median value will be unaffected. This means 1 and 3 are not being utilized;
3. It is not amenable to algebraic treatment; and
4. It is affected by sampling fluctuations.

Mode is that observation in a distribution which has the maximum frequency. For example, when we say that the average size of shoes sold in a shop is 7, we mean the modal size, i.e. the size sold most frequently.
For Ungrouped Data
Mathematically, if $x_1, x_2, \ldots, x_n$ are the n observations and some of the observations are repeated in the data, say $x_i$ is repeated the highest number of times, then $x_i$ is the mode.
Problem 9: Find the mode of the given data
2, 2, 3, 4, 7, 7, 7, 7, 9, 10, 12, 12
Solution: Since 7 has the maximum frequency (it occurs four times), the mode is 7.
For Grouped Data:
When the data are grouped into classes, the following formula for the mode is used:
$$M_0 = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

where L = lower limit of the modal class,

$f_1$ = frequency of the modal class,
$f_0$ = frequency of the pre-modal class,
$f_2$ = frequency of the post-modal class, and
h = width of the modal class.
The modal class is the class which has the maximum frequency.
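The E6 solution works out only the median. The grouped-mode formula can be applied to the same data with a minimal Python sketch like the one below. The function name is hypothetical. The sketch assumes equal class widths and, as an assumption of this sketch, treats a missing neighbouring class (modal class at either end) as having frequency 0.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for equal-width classes."""
    k = freqs.index(max(freqs))                      # modal class (first one if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # pre-modal frequency
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # post-modal frequency
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("2*f1 - f0 - f2 is zero; the formula is undefined")
    return lower_limits[k] + (f1 - f0) / denom * width


marks_lower = [0, 10, 20, 30, 40, 50, 60]
students = [5, 10, 15, 20, 12, 10, 8]
print(round(grouped_mode(marks_lower, 10, students), 2))   # 33.85
```

For E6 the modal class is 30-40 (f1 = 20, f0 = 15, f2 = 12), which gives 30 + (5/13) × 10 ≈ 33.85.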
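Relationship between Mean, Median and Mode
Mode = 3 Median − 2 Mean
Note: This is an empirical relationship that holds approximately for moderately skewed distributions. Using it, we can estimate any one of the mean, median and mode if the other two are known.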

Merits of Mode
1. Mode is the easiest average to understand and also easy to calculate;
2. It is not affected by extreme values;
3. It can be calculated for open end classes;
4. The formula can be applied as long as the modal class, the pre-modal class and the post-modal class are of equal width; and
5. Mode can be calculated even if the other classes are of unequal width.
Demerits of Mode
1. It is not rigidly defined. A distribution can have more than one mode;
2. It is not utilizing all the observations;
3. It is not amenable to algebraic treatment; and
4. It is greatly affected by sampling fluctuations.
Example: An incomplete distribution is given below:
Variable   10-20  20-30  30-40  40-50  50-60  60-70  70-80  Total
Frequency  12     30     ?      65     ?      25     18     229
You are told that the median value is 46. Find the missing frequencies.
Example: The median and mode of the following wage distribution are Rs. 33.5 and Rs. 34 respectively. However, three frequencies are missing. Determine their values.
Wages (in hundred Rs.)  0-10  10-20  20-30  30-40  40-50  50-60  60-70  Total
Frequencies             4     16     ?      ?      ?      6      4      230
Harmonic Mean

The harmonic mean (H.M.) of a set of observations is the reciprocal of the arithmetic average of the reciprocals of the observations. Thus,

$$\text{H.M.} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$$

Weighted Harmonic Mean

In some cases it becomes necessary to calculate a weighted mean. For example, if an automobile covers different distances at different speeds, the average speed can be obtained using the weighted harmonic mean. The formula for the weighted harmonic mean is

$$\text{Weighted H.M.} = \frac{\sum W}{\sum (W/X)}$$
Here, $W_1, W_2, \ldots, W_N$ stand for the respective weights of $X_1, X_2, \ldots, X_N$.

Example: A cyclist covers his first five km at an average speed of 10 km p.h., another three km at 8 km p.h. and the last two km at 5 km p.h. Find the average speed for the entire journey and verify your answer.

Solution: Since the cyclist covers different distances at different speeds, the weighted harmonic mean, with the distances as weights, is the appropriate average speed.

Speed (X)  Distance (W)  W/X
10         5             0.500
8          3             0.375
5          2             0.400
           ΣW = 10       Σ(W/X) = 1.275

$$\text{Weighted H.M.} = \frac{\sum W}{\sum (W/X)} = \frac{10}{1.275} = 7.84 \text{ km p.h.}$$

Verification: W/X is the time taken for each part of the journey, so the total time is 0.5 + 0.375 + 0.4 = 1.275 hours for 10 km. The average speed is therefore 10/1.275 ≈ 7.84 km p.h.

Thus, the average speed for the entire journey is 7.84 km p.h.
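The weighted harmonic mean is a one-line computation. The following minimal Python sketch (hypothetical function name) reproduces the cyclist example with the distances as weights.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted H.M. = sum(W) / sum(W / X)."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


speeds = [10, 8, 5]      # km p.h.
distances = [5, 3, 2]    # km; the distances act as weights
print(round(weighted_harmonic_mean(speeds, distances), 2))   # 7.84
```

The same call with speeds 20, 40, 15 and distances 16, 20, 10 handles the train example that follows, giving about 23.39 m.p.h.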

Example: A train goes at a speed of 20 miles per hour for the first 16 miles and at a speed of 40 m.p.h. for the next 20 miles. It covers the last 10 miles at a speed of 15 m.p.h. Find its average speed.
Example: In the given frequency distribution, two frequencies are missing and the mean is found to be 1.46.
Number of Accidents (X)  0   1  2  3   4   5  Total
Frequency (f)            46  ?  ?  25  10  5  200
Find the missing frequencies.

Solution: Let the missing frequencies be $f_1$ and $f_2$.

X  f    fX
0  46   0
1  f1   f1
2  f2   2f2
3  25   75
4  10   40
5  5    25
   200  ΣfX = 140 + f1 + 2f2

Then $\sum f = N = 200 = 86 + f_1 + f_2$,
or $f_1 + f_2 = 200 - 86 = 114$ … (i)
Also, since $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{140 + f_1 + 2f_2}{200} = 1.46$ (given),
$f_1 + 2f_2 = 292 - 140 = 152$ … (ii)
Solving (i) and (ii), we get $f_1 = 76$ and $f_2 = 38$.
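The two conditions (i) and (ii) form a 2 × 2 linear system. The following minimal Python sketch solves it by substitution. The function name and the convention of marking the missing frequencies with `None` are assumptions made for illustration.

```python
def two_missing_frequencies(xs, freqs, total, mean):
    """Solve for the two frequencies marked None, given N and the mean.

    (i)  f_a + f_b         = N - (sum of known f)
    (ii) x_a*f_a + x_b*f_b = N*mean - (sum of known f*x)
    """
    missing = [i for i, f in enumerate(freqs) if f is None]
    if len(missing) != 2:
        raise ValueError("exactly two frequencies must be missing")
    a, b = missing
    known_f = sum(f for f in freqs if f is not None)
    known_fx = sum(x * f for x, f in zip(xs, freqs) if f is not None)
    s = total - known_f            # right-hand side of (i)
    t = total * mean - known_fx    # right-hand side of (ii)
    # Substitute f_a = s - f_b into (ii) and solve for f_b
    f_b = (t - xs[a] * s) / (xs[b] - xs[a])
    f_a = s - f_b
    return f_a, f_b


fa, fb = two_missing_frequencies([0, 1, 2, 3, 4, 5], [46, None, None, 25, 10, 5], 200, 1.46)
print(round(fa), round(fb))   # 76 38
```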

Example: The following data relate to the marks of 70 students in statistics. Find the mean.
Marks (more than)  20  30  40  50  60  70
No. of students    70  63  40  30  18  7
Solution: In this example, a 'more than' cumulative frequency distribution is given. To compute the mean, the given distribution is first converted into a simple frequency distribution, as shown in the table (taking A = 55 and h = 10):
Computing the Arithmetic Mean
Cumulative classes  Classes  f             X   d' = (X − 55)/10  fd'
More than 20        20-30    70 − 63 = 7   25  −3                −21
More than 30        30-40    63 − 40 = 23  35  −2                −46
More than 40        40-50    40 − 30 = 10  45  −1                −10
More than 50        50-60    30 − 18 = 12  55   0                  0
More than 60        60-70    18 − 7 = 11   65   1                 11
More than 70        70-80    7 − 0 = 7     75   2                 14
                             N = 70                               Σfd' = −52

$$\bar{X} = A + h \times \frac{\sum fd'}{N} = 55 + 10 \times \left(\frac{-52}{70}\right) = 55 - 7.43 = 47.57 \text{ marks.}$$
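The conversion from a 'more than' table to class frequencies, followed by the step-deviation mean, can be sketched in Python as below. The function name is hypothetical. As in the table above, the sketch assumes that the last open class ("more than 70") is closed with the same width h.

```python
def mean_from_more_than(lower_bounds, more_than, width, assumed_mean):
    """Arithmetic mean of a 'more than' cumulative table using step deviations."""
    # Class frequency = this 'more than' count minus the next one (0 after the last class)
    nxt = list(more_than[1:]) + [0]
    freqs = [c - n for c, n in zip(more_than, nxt)]
    mids = [lb + width / 2 for lb in lower_bounds]        # class mid-points X
    d = [(x - assumed_mean) / width for x in mids]        # d' = (X - A) / h
    n = sum(freqs)
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    return freqs, assumed_mean + width * sum_fd / n       # Xbar = A + h * sum(fd') / N


freqs, mean = mean_from_more_than([20, 30, 40, 50, 60, 70], [70, 63, 40, 30, 18, 7], 10, 55)
print(freqs)            # [7, 23, 10, 12, 11, 7]
print(round(mean, 2))   # 47.57
```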
Example: Determine the modal value in the following series.
Value: 10 12 14 16 18 20 22 24 26 28 30 32
Frequency: 7 15 21 38 34 34 11 19 10 38 5 2
(ANS= 18)

Measures of Dispersion
Different measures of central tendency give a value around which the data is concentrated. But it gives no idea
about the nature of scatter or spread. For example, the observations 10, 30 and 50 have mean 30 while the
observations 28, 30, 32 also have mean 30. Both the distributions are spread around 30. But it is observed that
the variability among units is more in the first than in the second. In other words, there is greater variability or
dispersion in the first set of observations. Measure of dispersion is calculated to get an idea about the variability
in the data. Following are the different measures of dispersion:
1. Range
2. Standard Deviation
Range is defined as the difference between the maximum value and the minimum value of the variable in the distribution: $R = X_{\max} - X_{\min}$.
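Variance (σ²) and Standard Deviation (σ)
Variance is the average of the squares of the deviations of the values from the mean. Standard deviation (SD) is defined as the positive square root of the variance. The formulas are

$$\operatorname{Var}(x) = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \text{SD} = \sigma = \sqrt{\operatorname{Var}(x)} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$

and for a frequency distribution, the formulas are

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2, \qquad \text{SD} = \sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{N}}, \qquad N = \sum_{i=1}^{k} f_i$$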
E8) Calculate standard deviation for the following data:
Class 0-10 10-20 20-30 30-40 40-50
Frequency 5 8 15 16 6
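The frequency-distribution formula with divisor N can be sketched in Python as below, using class mid-points as the $x_i$. The function name is hypothetical. Applied to the E8 data, it gives a mean of 27, a variance of 132 and an S.D. of about 11.49. Note that `statistics.pvariance` uses the same divisor n, while `statistics.variance` divides by n − 1.

```python
import math


def grouped_sd(lower_limits, width, freqs):
    """Mean, variance and S.D. of a frequency distribution (divisor N, class mid-points)."""
    mids = [lo + width / 2 for lo in lower_limits]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, var, math.sqrt(var)


mean, var, sd = grouped_sd([0, 10, 20, 30, 40], 10, [5, 8, 15, 16, 6])
print(mean, var, round(sd, 2))   # 27.0 132.0 11.49
```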

Coefficient of Variation (CV)

It is a relative measure of variability. When comparing two data series, the series with the smaller CV is the more consistent (reliable) one. It is defined as

$$\text{CV} = \frac{\sigma}{\bar{x}} \times 100$$
Problem 10: Suppose batsman A has mean 50 with SD 10, and batsman B has mean 30 with SD 3. What do you infer about their performance?
Solution: A has a higher mean than B. This means A is the better run maker.
However, B has a lower CV (3/30 × 100 = 10%) than A (10/50 × 100 = 20%) and is consequently more consistent.
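Here is a minimal Python sketch of the CV, either from raw data or from a known mean and S.D. Both function names are hypothetical. The sketch assumes the population S.D. (divisor n) used in this unit, via `statistics.pstdev`.

```python
import statistics


def coefficient_of_variation(data):
    """CV = (S.D. / mean) * 100, using the population S.D. (divisor n)."""
    return statistics.pstdev(data) / statistics.mean(data) * 100


def cv_from_summary(mean, sd):
    """CV when only the mean and S.D. are known."""
    return sd / mean * 100


print(round(cv_from_summary(50, 10), 2), round(cv_from_summary(30, 3), 2))   # 20.0 10.0
```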
Example: The sum of squares of 100 observations was calculated as 7961. Later, it was found that two values, 53 and 42, were wrongly read as 35 and 24 at the time of calculation. Find the corrected sum of squares.
Solution: Given the incorrect $\sum X^2 = 7961$.

$\therefore$ Corrected $\sum X^2$ = incorrect $\sum X^2$ − (squares of wrong observations) + (squares of correct observations)

$\therefore$ Corrected $\sum X^2 = 7961 - (35^2 + 24^2) + (53^2 + 42^2)$
= 7961 − (1225 + 576) + (2809 + 1764) = 10733.
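The same swap-out/swap-in idea applies to $\sum X^2$, using squares instead of raw values. The following minimal Python sketch uses a hypothetical helper name.

```python
def corrected_sum_of_squares(incorrect_total, wrong_items, correct_items):
    """Corrected sum of X^2: drop the squares of wrong items, add the squares of correct ones."""
    return (incorrect_total
            - sum(x ** 2 for x in wrong_items)
            + sum(x ** 2 for x in correct_items))


print(corrected_sum_of_squares(7961, [35, 24], [53, 42]))   # 10733
```

For an observation that was wrongly included and has no replacement (as in Question2 below), pass an empty list as `correct_items`.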
Question1: The sum of squares of 20 observations was worked out as 5100. But while calculating it, an observation 31 was wrongly taken as 13. Find the corrected sum of squares.
Question2: The sum of squares of 50 observations is 4122. An observation 39 was wrongly included in the series. Find the sum of squares of the remaining 49 observations.
Question3: The arithmetic mean and the S.D. of a series of 20 items were calculated as 20 cm and 5 cm respectively. But while calculating them, an item 13 was taken as 30. Find the correct arithmetic mean and standard deviation.
Question4: The mean and S.D. of 20 items are found to be 10 and 2 respectively. At the time of checking, it was observed that one item, 8, was incorrect. Find the mean and the S.D. if (i) the wrong item is omitted, and (ii) it is replaced by 12.
Properties of Standard Deviation
1. The S.D. of a series remains unchanged if each variate value is increased or decreased by the same constant. In other words, the S.D. is independent of change of origin.
Let $Y = X + b$, where b is a constant.
Then $\sigma_Y = \sigma_X$, i.e., the S.D.s of the variables X and Y are equal.
Example: Suppose 5, 8, 17, 12 and 7 are five observations on a variable X. A new variable Y is obtained by adding 2 (a constant) to each observation on X. Further, let Z be another variable defined by subtracting 3 from each value of X. Find the standard deviations of the variables X, Y and Z, say $\sigma_X$, $\sigma_Y$ and $\sigma_Z$ respectively.
(Ans: 4.26, 4.26, 4.26)
2. If the values of the variable X are multiplied (or divided) by a constant, the S.D. of the new observations can be obtained by multiplying (or dividing) the initial S.D. by the absolute value of the same constant. Symbolically,
If $Y = kX$, where k is a constant,
then $\sigma_Y = |k|\,\sigma_X$.
In other words, the S.D. is affected by change of scale.
Example: Suppose 2, 6, 9, 5, 4 are five observations on a variable X. A new variable Y is obtained by multiplying each observation on X by 3 (a constant). Further, another variable Z is obtained by dividing each observation on X by 2. Find the S.D.s of the variables X, Y and Z, say $\sigma_X$, $\sigma_Y$, $\sigma_Z$ respectively.
(Ans: 2.32, 6.95, 1.16)
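Both properties can be checked numerically on the two examples above. The following sketch uses `statistics.pstdev` (divisor n, matching this unit's definition). Adding or subtracting a constant leaves the S.D. unchanged, while multiplying or dividing scales it by the same factor.

```python
import statistics

x = [5, 8, 17, 12, 7]
y = [v + 2 for v in x]       # change of origin: add 2
z = [v - 3 for v in x]       # change of origin: subtract 3
print([round(statistics.pstdev(s), 2) for s in (x, y, z)])     # [4.26, 4.26, 4.26]

x2 = [2, 6, 9, 5, 4]
y2 = [3 * v for v in x2]     # change of scale: multiply by 3
z2 = [v / 2 for v in x2]     # change of scale: divide by 2
print([round(statistics.pstdev(s), 2) for s in (x2, y2, z2)])  # [2.32, 6.95, 1.16]
```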

3. The combined standard deviation can be calculated if the standard deviations, means and numbers of items in the different groups are given. The formula for the combined standard deviation of two related groups is:
$$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2)}{N_1 + N_2}}$$
where
$\sigma_{12}$ = combined S.D. of the two groups,
$\sigma_1$ = standard deviation of the first group,
$\sigma_2$ = standard deviation of the second group,
$N_1$ = no. of observations in the first group,
$N_2$ = no. of observations in the second group,
$D_1 = \bar{X}_1 - \bar{X}_{12}$ and $D_2 = \bar{X}_2 - \bar{X}_{12}$, where $\bar{X}_{12}$ = combined mean of the two groups,
$\bar{X}_1$ = mean of the first group,
$\bar{X}_2$ = mean of the second group.

Question5: The standard deviation of 5 items is found to be √50. What will be the standard deviation if the values of all the items are increased by 5? (Ans: √50)
Question6: A sample of 35 values has mean 80 and S.D. 4. A second sample of 65 values from the same population has mean 70 and S.D. 3. Find the mean and standard deviation of the combined sample of 100 values. (Ans: mean = 73.5, S.D. = 5.85)

Question7: Find the mean and the standard deviation of the two groups taken together:
Group  Number  Mean  S.D.
A      113     160   22
B      120     150   20

(Ans: mean = 154.85, S.D. = 21.58)
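The combined-S.D. formula generalises directly to k groups. The following minimal Python sketch uses a hypothetical function name and returns both the combined mean and the combined S.D. It reproduces the answers to Question6 and Question7.

```python
import math


def combined_sd(sizes, means, sds):
    """Combined mean and S.D.: sqrt(sum N_i (sd_i^2 + D_i^2) / sum N_i), D_i = mean_i - combined mean."""
    n = sum(sizes)
    cm = sum(ni * mi for ni, mi in zip(sizes, means)) / n
    ss = sum(ni * (si ** 2 + (mi - cm) ** 2) for ni, mi, si in zip(sizes, means, sds))
    return cm, math.sqrt(ss / n)


print([round(v, 2) for v in combined_sd([35, 65], [80, 70], [4, 3])])        # [73.5, 5.85]
print([round(v, 2) for v in combined_sd([113, 120], [160, 150], [22, 20])])  # [154.85, 21.58]
```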
Question: Below are the numbers of runs scored by two batsmen in eight matches:
Batsman A  27  16   39  45  101  80  40  52
Batsman B  0   100  80  5   60   40  10  121
Who is the better run scorer? Also find which of the two batsmen is more consistent in scoring.
(Ans: Mean = 50 and 52, so batsman B is the better run scorer; CV = 52.13% and 82.53%, so batsman A is more consistent.)

When two variables are related in such a way that change in the value of one variable affects the value of
another variable, then variables are said to be correlated or there is correlation between these two variables.
In many practical applications, we come across situations where observations are available on two or more variables. The following examples illustrate such situations:
Heights and weights of persons of a certain group; Sales revenue and advertising expenditure in business; and
Time spent on study and marks obtained by students in exam.
If data are available for two variables, say x and y, it is called bivariate distribution.
Let us consider the example of sales revenue and expenditure on advertising in business. A natural question arises: is there any connection between sales revenue and expenditure on advertising? Does sales revenue increase or decrease as expenditure on advertising increases or decreases?
Similarly, in the example of time spent on study and marks obtained by students, a natural question is whether marks increase or decrease as the time spent on study increases or decreases.
Types of Correlation
(a) Positive Correlation
Correlation between two variables is said to be positive if the values of the variables deviate in the same
direction i.e. if the values of one variable increase (or decrease) then the values of other variable also increase
(or decrease). Some examples of positive correlation are correlation between
Heights and weights of a group of persons; Household income and expenditure; Amount of rainfall and yield of crops; and Expenditure on advertising and sales revenue.
In the last example, it is observed that as the expenditure on advertising increases, sales revenue also increases. Thus the change is in the same direction. Hence the correlation is positive.
In the remaining three examples also, the value of the second variable usually increases (or decreases) as the value of the first variable increases (or decreases).
(b) Negative Correlation
Correlation between two variables is said to be negative if the values of variables deviate in opposite direction
i.e. if the values of one variable increase (or decrease) then the values of other variable decrease (or increase).
Some examples of negative correlations are correlation between
Volume and pressure of a perfect gas; Price and demand of goods; Literacy and poverty in a country; and Time spent on watching TV and marks obtained by students in examinations.
In the first example pressure decreases as the volume increases or pressure increases as the volume decreases.
Thus the change is in opposite direction.
Therefore, the correlation between volume and pressure is negative.
In remaining three examples also, values of the second variable change in the opposite direction of the change
in the values of first variable.
Scatter Diagram
A scatter diagram is a statistical tool for judging whether a correlation may exist between a dependent variable and an independent variable. It does not give the exact relationship between the two variables, but it indicates whether they are correlated or not.
Let $(x_i, y_i)$; $(i = 1, 2, \ldots, n)$ be the bivariate distribution. If the values of the dependent variable y are plotted against the corresponding values of the independent variable x in the xy-plane, the resulting diagram of dots is called a scatter diagram or dot diagram. Note that a scatter diagram is not suitable for a large number of observations.
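Coefficient of Correlation
The coefficient of correlation measures the intensity or degree of linear relationship between two variables.
If X and Y are two random variables, then the correlation coefficient between X and Y is denoted by r and defined as

$$r = \operatorname{Corr}(x, y) = \frac{\operatorname{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}$$

Equivalently, in terms of raw sums,

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$$

where $S_{XY} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$, $S_{XX} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$ and $S_{YY} = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$.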

Properties of Correlation Coefficient

1. The correlation coefficient lies between −1 and +1.
2. The correlation coefficient is independent of change of origin and scale.
Description: If we subtract a constant from each original variable and divide by another positive constant, i.e.
$$U = \frac{X - a}{h} \quad \text{and} \quad V = \frac{Y - b}{k}, \qquad h, k > 0,$$
then the correlation coefficient between the new variables U and V is the same as the correlation coefficient between X and Y, i.e. $\operatorname{Corr}(x, y) = \operatorname{Corr}(u, v)$.
3. If X and Y are two independent variables, then the correlation coefficient between X and Y is zero, i.e. $\operatorname{Corr}(x, y) = 0$. (The converse need not hold: zero correlation only rules out a linear relationship.)
Problem 1: Find the correlation coefficient between advertisement expenditure and profit from the following
data: (Ans- 0.27)
Advertisement expenditure 30 44 45 43 34 44
Profit 56 55 60 64 62 63
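A direct implementation of the $S_{XY}/\sqrt{S_{XX}S_{YY}}$ form, applied to the advertisement/profit data of Problem 1, is sketched below. The function name is hypothetical. Python 3.10+ also provides `statistics.correlation` for Pearson's r.

```python
import math


def pearson_r(xs, ys):
    """r = S_XY / sqrt(S_XX * S_YY) with S_XY = sum(xy) - n*xbar*ybar, etc."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    syy = sum(y * y for y in ys) - n * ybar ** 2
    return sxy / math.sqrt(sxx * syy)


adv = [30, 44, 45, 43, 34, 44]
profit = [56, 55, 60, 64, 62, 63]
print(round(pearson_r(adv, profit), 2))   # 0.27
```

For data with large values, the deviation form $\sum (x_i - \bar{x})(y_i - \bar{y})$ is numerically safer than the raw-sum shortcut. The shortcut is used here only because it mirrors the formula in the text.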

Question: The coefficient of correlation between two variates X and Y is 0.8 and their covariance is 20. If the variance of X is 16, find the SD of the Y series. (Ans: 6.25)
Question: From the data given below, find the number of items (N):
$r = 0.5$, $\sum XY = 120$, $\sum X^2 = 90$ and $\sigma_y = 8$,
where X and Y are deviations from their respective arithmetic means. (Ans: 10)

Example: A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:
$N = 25$, $\sum X = 125$, $\sum Y = 100$, $\sum X^2 = 650$, $\sum Y^2 = 460$ and $\sum XY = 508$.
It was, however, discovered at the time of checking that two pairs of observations had not been copied correctly. They were taken as (6, 14) and (8, 6), while the correct values were (8, 12) and (6, 8). Find the corrected value of r.
Solution:
Incorrect values                 Correct values
X      Y   X²   Y²   XY          X      Y   X²   Y²   XY
6      14  36   196  84          8      12  64   144  96
8      6   64   36   48          6      8   36   64   48
Total  14  20   100  232  132    Total  14  20   100  208  144

Thus, Corrected $\sum X$ = 125 − 14 + 14 = 125
Corrected $\sum Y$ = 100 − 20 + 20 = 100
Corrected $\sum X^2$ = 650 − 100 + 100 = 650
Corrected $\sum Y^2$ = 460 − 232 + 208 = 436
Corrected $\sum XY$ = 508 − 132 + 144 = 520

Corrected $\bar{X} = \dfrac{1}{n}\sum X = \dfrac{125}{25} = 5$, Corrected $\bar{Y} = \dfrac{1}{n}\sum Y = \dfrac{100}{25} = 4$

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = \frac{520 - 25 \times 5 \times 4}{\sqrt{(650 - 25 \times 5 \times 5)(436 - 25 \times 4 \times 4)}} = \frac{20}{\sqrt{25 \times 36}} = \frac{20}{30} = 0.667$$

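The correction can be automated by keeping the five running sums and swapping the miscopied pairs in and out. The following minimal Python sketch uses two hypothetical helpers. `r_from_sums` uses the raw-sum form of r multiplied through by n, which is algebraically equivalent to the formula above.

```python
import math


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)) from raw sums."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def correct_sums(sums, wrong_pairs, right_pairs):
    """Swap miscopied (x, y) pairs out of the sums [Sx, Sy, Sxx, Syy, Sxy]."""
    def contrib(pairs):
        return [sum(x for x, _ in pairs), sum(y for _, y in pairs),
                sum(x * x for x, _ in pairs), sum(y * y for _, y in pairs),
                sum(x * y for x, y in pairs)]
    w, c = contrib(wrong_pairs), contrib(right_pairs)
    return [s - wi + ci for s, wi, ci in zip(sums, w, c)]


sums = correct_sums([125, 100, 650, 460, 508], [(6, 14), (8, 6)], [(8, 12), (6, 8)])
print(sums)                               # [125, 100, 650, 436, 520]
print(round(r_from_sums(25, *sums), 3))   # 0.667
```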
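Question: With 10 observations each on two variables X and Y, the following data were observed:
$\bar{X} = 12$, $\sigma_X = 3$, $\bar{Y} = 15$, $\sigma_Y = 4$ and $r = 0.5$. However, on subsequent verification it was found that one value of X (= 15) and one value of Y (= 13) were wrongly taken as 16 and 18 respectively. Find the correct value of the correlation coefficient.
Solution: Since
$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \sigma_X \sigma_Y},$$
$$\sum (X - \bar{X})(Y - \bar{Y}) = N r \sigma_X \sigma_Y$$
or $\sum XY - \bar{X}\sum Y - \bar{Y}\sum X + N\bar{X}\bar{Y} = 10 \times 0.5 \times 3 \times 4 = 60$
or $\sum XY - N\bar{X}\bar{Y} - N\bar{X}\bar{Y} + N\bar{X}\bar{Y} = 60$ [as $\sum X = N\bar{X}$ and $\sum Y = N\bar{Y}$]
or $\sum XY = 60 + N\bar{X}\bar{Y} = 60 + 10 \times 12 \times 15 = 1860$
Corrected $\sum XY = 1860 - (16 \times 18) + (15 \times 13) = 1767$
Corrected $\sum X = 12 \times 10 - 16 + 15 = 119$
Corrected $\sum Y = 15 \times 10 - 18 + 13 = 145$

Now, considering
$$\sigma_X^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{\sum X^2}{N} - \bar{X}^2,$$
we get $\sum X^2 = N(\sigma_X^2 + \bar{X}^2) = 10(9 + 144) = 1530$.
Similarly, $\sum Y^2 = N(\sigma_Y^2 + \bar{Y}^2) = 10(16 + 225) = 2410$.

$\therefore$ Corrected $\sum X^2 = 1530 - 16^2 + 15^2 = 1499$
Corrected $\sum Y^2 = 2410 - 18^2 + 13^2 = 2255$

Therefore, the corrected value of r is
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left(N\sum X^2 - (\sum X)^2\right)\left(N\sum Y^2 - (\sum Y)^2\right)}} = \frac{10 \times 1767 - 119 \times 145}{\sqrt{(10 \times 1499 - 119^2)(10 \times 2255 - 145^2)}} = \frac{415}{\sqrt{829 \times 1525}} = 0.369$$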
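When only summary statistics are given, the raw sums must first be rebuilt from them, as in the solution above. The following Python sketch follows those steps. Like the source solution, it treats the misread values as the single pair (16, 18) being replaced by (15, 13).

```python
import math

n, xbar, sdx, ybar, sdy, r = 10, 12, 3, 15, 4, 0.5

# Recover the raw sums from the summary statistics
sx, sy = n * xbar, n * ybar                  # Sum X = N*Xbar, Sum Y = N*Ybar
sxx = n * (sdx ** 2 + xbar ** 2)             # Sum X^2 = N(sigma_x^2 + Xbar^2)
syy = n * (sdy ** 2 + ybar ** 2)             # Sum Y^2 = N(sigma_y^2 + Ybar^2)
sxy = n * (r * sdx * sdy + xbar * ybar)      # Sum XY = N(r*sigma_x*sigma_y + Xbar*Ybar)

# Replace the wrong pair (16, 18) with the correct pair (15, 13)
sx += 15 - 16
sy += 13 - 18
sxx += 15 ** 2 - 16 ** 2
syy += 13 ** 2 - 18 ** 2
sxy += 15 * 13 - 16 * 18

r_new = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sx, sy, sxx, syy, sxy, round(r_new, 3))   # 119 145 1499 2255 1767.0 0.369
```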

Concept of Rank Correlation
When the characteristics are not measurable, we use rank correlation. This situation arises when we deal with qualitative characteristics such as honesty, beauty, voice, etc. We denote the rank correlation coefficient by $r_s$. It is given by
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad \text{where } d_i = R_x - R_y$$
This formula was given by Spearman, and hence it is known as Spearman's rank correlation coefficient formula.
Problem 1: Suppose we have the ranks of 8 B.Sc. students in Statistics and Mathematics. On the basis of these ranks, we would like to know to what extent the students' knowledge of Statistics and Mathematics is related.
Rank in Statistics   1  2  3  4  5  6  7  8
Rank in Mathematics  2  4  1  5  3  8  7  6
Solution:
Rank in Statistics (R_x)  Rank in Mathematics (R_y)  d_i = R_x − R_y  d_i²
1                         2                          −1               1
2                         4                          −2               4
3                         1                          2                4
4                         5                          −1               1
5                         3                          2                4
6                         8                          −2               4
7                         7                          0                0
8                         6                          2                4
                                                                      Σd_i² = 22
Here, n = number of paired observations = 8.

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 22}{8 \times 63} = 1 - \frac{132}{504} = \frac{372}{504} = 0.74$$

Thus there is a positive association between the ranks in Statistics and Mathematics.
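Spearman's formula needs only the rank differences. The following minimal Python sketch uses hypothetical function names. It assumes there are no tied values, since tied observations need average ranks and a modified formula, which this sketch does not handle. Ranks are assigned with rank 1 for the largest mark, as is usual for judges' scores.

```python
def ranks_descending(values):
    """Rank 1 = largest value; assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("ties need average ranks; not handled in this sketch")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rs(rx, ry):
    """r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(round(spearman_rs([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 1, 5, 3, 8, 7, 6]), 2))   # 0.74
```

The same two functions can be applied to the marks awarded by judges X and Y in the question that follows.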

Question: Calculate the rank correlation coefficient from the following marks, out of 200, given by two judges X and Y to 8 participants in a music competition:

Participant No. 1 2 3 4 5 6 7 8
Marks awarded by X 74 98 110 70 65 85 88 59
Marks awarded by Y 121 133 170 102 90 152 160 85
(ANS: 0.9286)

The correlation coefficient measures the strength of a linear relationship and its direction, i.e. whether the correlation is positive or negative. When one variable is considered as the independent variable and another as the dependent variable, and we are interested in estimating the value of the dependent variable for a particular value of the independent variable, we study regression analysis. For example, we might be interested in estimating the production of a crop for a particular amount of rainfall, predicting demand from price, or predicting students' marks from their study hours.
Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable from another. The variable to be predicted is called the dependent variable and is denoted by Y. The predictor is called the independent variable, or explanatory variable, and is denoted by x. In simple regression analysis, only a straight-line relationship between two variables is examined. Nonlinear relationships, and regression models with more than one independent variable, can be explored using multiple regression models. Regression analysis is thus a statistical technique for investigating the relationship between variables. Examples include the effect of a price increase on demand, the effect of a change in the money supply on the interest rate, and the effect of a change in advertising expenditure on sales and profit in business.

Definition: Regression analysis is a mathematical measure of the average relationship between two or more variables.
There are two types of variables in regression analysis:
(a) Independent variable
(b) Dependent variable
The variable which is used for prediction is called independent variable. It is also known as regressor or
predictor or explanatory variable.
The variable whose value is predicted by the independent variable is called dependent variable. It is also
known as regressed or explained variable.

If the scatter diagram shows some relationship between the independent variable x and the dependent variable y, then the points of the scatter diagram will be more or less concentrated around a curve, which may be called the curve of regression.
Lines of Regression
The line of regression of y on x is used when x is the independent variable and y is the dependent variable.
Let the equation of the line of regression of y on x be
$$Y = b_0 + b_1 x$$

where $b_0$ is known as the intercept and

$b_1$ is known as the slope of the regression line, with
$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}$$

Problem 1: The heights of fathers and sons in inches are given below:

Height of father  65  66  67  67  68
Height of son     66  68  65  69  74
Find the line of regression and calculate the estimated average height of the son when the height of the father is 68.5 inches.
Solution: Since we want to estimate the son's height from the father's height, let the height of the father (x) be the independent variable and the height of the son (y) be the dependent variable. So let the regression line be
$$y = a + bx$$
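The least-squares formulas for the slope and intercept can be applied to this data with a short Python sketch. The function name is hypothetical. Because the question asks for the son's height at a given father's height, the sketch takes the father's height as the independent variable x.

```python
def fit_line(xs, ys):
    """Least-squares line y = b0 + b1*x with b1 = Sxy/Sxx and b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


father = [65, 66, 67, 67, 68]   # independent variable x
son = [66, 68, 65, 69, 74]      # dependent variable y
b0, b1 = fit_line(father, son)
print(round(b0, 2), round(b1, 2), round(b0 + b1 * 68.5, 2))   # -69.92 2.08 72.35
```

This gives approximately y = −69.92 + 2.08x and an estimated son's height of about 72.35 inches for a father of height 68.5 inches.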
Problem 2: The regression lines of y on x and of x on y are, respectively,
$$-2x + 3y = 8$$
$$5x - y = 6$$
Then, find
(i) the mean values of x and y,
(ii) the coefficient of correlation between x and y, and
(iii) the standard deviation of y, given that the variance of x is 5.
(i) Since both regression lines (y on x and x on y) pass through $(\bar{x}, \bar{y})$, this point is their point of intersection. Thus, to get the mean values of x and y, we solve the given simultaneous equations
$$-2x + 3y = 8$$
$$5x - y = 6$$
Solving these simultaneous equations, we get $\bar{x} = 2$ and $\bar{y} = 4$, which are the means of x and y.

(ii) To find the correlation coefficient, the given equations are expressed as

$$y = 2.67 + 0.67x \quad \ldots (1)$$
$$x = 1.20 + 0.20y \quad \ldots (2)$$

Note: To find the regression coefficient of y on x, the regression line of y on x is expressed in the form $y = a + bx$, where b is the regression coefficient of y on x. Similarly, to find the regression coefficient of x on y, the regression line of x on y is expressed in the form $x = c + dy$, where d is the regression coefficient of x on y. In our problem, dividing the first line by 3, i.e. by the coefficient of y, gives an equation of the form $y = a + bx$, namely $y = 2.67 + 0.67x$. Similarly, dividing the second regression equation by 5 gives equation (2).

From equations (1) and (2), the regression coefficients of y on x and of x on y are respectively
$b_{yx} = 0.67$ and
$b_{xy} = 0.20$
By the property of regression coefficients, $r^2 = b_{yx} \times b_{xy}$, and r has the same sign as the two regression coefficients (both positive here). So
$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{0.67 \times 0.20} = 0.37$$
Thus, the correlation coefficient is r ≈ 0.37.

(iii) By the definition of the regression coefficient of y on x,
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
The variance of x is $\sigma_x^2 = 5$, so $\sigma_x = \sqrt{5} \approx 2.24$. Using the unrounded values $b_{yx} = 2/3$ and $r = \sqrt{2/15} \approx 0.365$,
$$\sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{(2/3)\sqrt{5}}{\sqrt{2/15}} = \sqrt{\frac{50}{3}} \approx 4.08$$
Thus, the standard deviation of y is about 4.08, and the variance of y is 50/3 ≈ 16.67.
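The three steps of Problem 2 can be automated in Python. The function name and the `(a, b, c)` encoding of a line $ax + by = c$ are assumptions made for illustration. The caller must say which line is y on x. As a sanity check, the sketch rejects inputs whose regression coefficients have opposite signs or a product greater than 1.

```python
import math


def from_regression_lines(y_on_x, x_on_y):
    """Each line is (a, b, c) meaning a*x + b*y = c.

    Returns (xbar, ybar, b_yx, b_xy, r).
    """
    (a1, b1, c1), (a2, b2, c2) = y_on_x, x_on_y
    det = a1 * b2 - a2 * b1
    xbar = (c1 * b2 - c2 * b1) / det      # Cramer's rule: the lines meet at (xbar, ybar)
    ybar = (a1 * c2 - a2 * c1) / det
    b_yx = -a1 / b1                       # y = ... + b_yx * x
    b_xy = -b2 / a2                       # x = ... + b_xy * y
    prod = b_yx * b_xy                    # equals r^2
    if prod < 0 or prod > 1:
        raise ValueError("inconsistent lines: check which one is y on x")
    r = math.copysign(math.sqrt(prod), b_yx)   # r takes the sign of the coefficients
    return xbar, ybar, b_yx, b_xy, r


xbar, ybar, b_yx, b_xy, r = from_regression_lines((-2, 3, 8), (5, -1, 6))
sigma_y = b_yx * math.sqrt(5) / r         # from b_yx = r * sigma_y / sigma_x, var(x) = 5
print(xbar, ybar, round(r, 3), round(sigma_y, 2))   # 2.0 4.0 0.365 4.08
```

The same function applies to the laboratory-record exercise later in this section, passing 8X − 10Y = −66 as the line of Y on X.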

E1) Marks of 6 students of a class in paper I and paper II of statistics are given below:

Paper I 45 55 66 75 85 100
Paper II 56 55 45 65 62 71
Find
(i) both regression coefficients,
(ii) both regression lines, and
(iii) correlation coefficient.

E2) We have the following data on variables x and y:

x 5 4 3 2 1

y 9 8 10 11 12
Find
(i) both regression coefficients,
(ii) correlation coefficient,
(iii) regression lines of y on x and x on y, and
(iv) estimate y for x =4.5.

E3) If the two regression lines are

6x + 15y = 27
6x + 3y = 15,
Then, calculate
(i) correlation coefficient, and
(ii) mean values of x and y.

E) Using the regression equation y = 90 + 50x, fill in the values in the table below.
Sample No. (i)  12    21    15    1     24
x_i             0.96  1.28  1.65  1.84  2.35
y_i             138   160   178   190   210
ŷ_i             138
ê_i             0

Note: $\hat{y}_i = 90 + 50x_i$ and $\hat{e}_i = y_i - \hat{y}_i$
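The fitted values and residuals in the table are simple arithmetic. The following minimal Python sketch uses a hypothetical function name. It assumes the first observed value is 138, consistent with the given $\hat{y}_1 = 138$ and $\hat{e}_1 = 0$.

```python
def fitted_and_residuals(xs, ys, b0=90.0, b1=50.0):
    """y_hat = b0 + b1*x and e_hat = y - y_hat for each observation."""
    y_hat = [b0 + b1 * x for x in xs]
    resid = [y - yh for y, yh in zip(ys, y_hat)]
    return y_hat, resid


xs = [0.96, 1.28, 1.65, 1.84, 2.35]
ys = [138, 160, 178, 190, 210]
y_hat, resid = fitted_and_residuals(xs, ys)
print([round(v, 1) for v in y_hat])
print([round(v, 1) for v in resid])
```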

E) A hosiery mill wants to estimate how its monthly costs are related to its monthly output rate. For that, the firm collects data on its costs and output for a sample of nine months, as given in the table below:

Output (tons)  Production cost (thousands of dollars)
1              2
2              3
4              4
8              7
6              6
5              5
8              8
9              8
7              6
1) Construct a scatter diagram for the data given above.
2) Calculate the best linear regression line, where the monthly cost is the dependent variable and the monthly output is the independent variable.
3) Use this regression line to predict the firm's monthly costs if it decides to produce 4 tons per month.
E) You are given the following information about advertising expenditure and sales:

                     Adv. Exp. (X) (Rs. lakhs)   Sales (Y) (Rs. lakhs)
Mean                 10                          90
Standard deviation   3                           12
Correlation coefficient = 0.8
(i) Obtain the two regression equations.
(ii) Find the likely sales when the advertisement budget is Rs. 15 lakhs.
(iii) What should be the advertisement budget if the company wants to attain a sales target of Rs. 120 lakhs?
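The two regression lines can be built directly from the means, the standard deviations and r, using $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$. The helper `regression_lines` below is a hypothetical sketch applied to the figures in the table. Part (ii) uses the line of Y on X. Part (iii) uses the line of X on Y.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) for the line of Y on X and for the line of X on Y."""
    b_yx = r * sd_y / sd_x                    # regression coefficient of Y on X
    b_xy = r * sd_x / sd_y                    # regression coefficient of X on Y
    y_on_x = (mean_y - b_yx * mean_x, b_yx)   # Y = a + b_yx * X
    x_on_y = (mean_x - b_xy * mean_y, b_xy)   # X = c + b_xy * Y
    return y_on_x, x_on_y


# Advertising expenditure (X) and sales (Y), both in Rs. lakhs
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(10, 90, 3, 12, 0.8)
print(f"Y = {a_yx:.1f} + {b_yx:.1f} X")
print(f"X = {a_xy:.1f} + {b_xy:.1f} Y")
print(f"likely sales when X = 15: {a_yx + b_yx * 15:.1f}")
print(f"budget needed for Y = 120: {a_xy + b_xy * 120:.1f}")
```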

E) In a partially destroyed laboratory record of an analysis of correlation data, only the following results are legible:
Variance of X = 9
Regression equations: 8X − 10Y + 66 = 0 and
40X − 18Y = 214
Find on the basis of the above information:
(i) The mean values of X and Y,
(ii) Coefficient of correlation between X and Y, and
(iii) Standard deviation of Y.
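Both regression lines pass through $(\bar{X}, \bar{Y})$, so the means come from solving the two equations simultaneously, and r follows from the two slopes. The sketch below uses a hypothetical helper, `from_regression_lines`. It assumes each line is supplied as the coefficients (a, b, c) of aX + bY = c. It first treats the first line as the regression of Y on X and swaps the roles if that would make $b_{yx} b_{xy} > 1$. It is meant as a way to check a hand solution, not as a prescribed method.

```python
import math


def from_regression_lines(line1, line2):
    """Each line is (a, b, c) meaning a*X + b*Y = c.

    Returns (mean_x, mean_y, b_yx, b_xy, r). Assumes b1 and a2 (or, after
    swapping, b2 and a1) are non-zero.
    """
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or identical")
    # Both regression lines pass through the point of means (Cramer's rule).
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det

    b_yx = -a1 / b1          # line 1 solved for Y
    b_xy = -b2 / a2          # line 2 solved for X
    if b_yx * b_xy > 1:      # wrong assignment: swap the roles of the lines
        b_yx, b_xy = -a2 / b2, -b1 / a1
    if not 0 <= b_yx * b_xy <= 1:
        raise ValueError("these cannot be two regression lines")
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return mean_x, mean_y, b_yx, b_xy, r


mean_x, mean_y, b_yx, b_xy, r = from_regression_lines((8, -10, -66), (40, -18, 214))
sd_x = 3                      # variance of X is 9
sd_y = b_yx * sd_x / r        # from b_yx = r * sd_y / sd_x
print(mean_x, mean_y, round(r, 3), round(sd_y, 3))
```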

Distinction Between Correlation And Regression
Both correlation and regression play an important role in the study of relationships, but there are some distinctions between them, which can be described as follows:
(i) Correlation studies the linear relationship between two variables, while regression analysis is a mathematical measure of the average relationship between two or more variables.
(ii) Correlation has limited application because it gives only the strength of the linear relationship, while the purpose of regression is to "predict" the value of the dependent variable for given values of one or more independent variables.
(iii) Correlation makes no distinction between independent and dependent variables, while linear regression does. That is, correlation does not consider the concept of dependent and independent variables, whereas in regression analysis one variable is considered as the dependent variable and the other(s) as independent variable(s).


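Angle between the two lines of regression: let $m_1 = \dfrac{\sigma_y}{r\,\sigma_x}$ and $m_2 = r\,\dfrac{\sigma_y}{\sigma_x}$ be the slopes of the regression lines of x on y and of y on x respectively (both plotted with y on the vertical axis). The angle θ between the lines satisfies
$\tan\theta = \dfrac{m_1 - m_2}{1 + m_1 m_2}$, which gives
$\theta = \tan^{-1}\left[\dfrac{1 - r^2}{r}\cdot\dfrac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}\right]$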

If r = 0, i.e. the variables are uncorrelated, then tan θ = ∞ ⇒ θ = π/2. In this case the lines of regression are perpendicular to each other.

If r = ±1, i.e. the variables are perfectly positively or negatively correlated, then tan θ = 0 ⇒ θ = 0 or π. In this case the two regression lines coincide. They cannot be distinct parallel lines, because both pass through the point of means $(\bar{x}, \bar{y})$.
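The following is a small sketch of the angle formula; the function name is hypothetical. Applying `math.atan2` to the numerator and denominator of tan θ avoids dividing by zero when r = 0. It also returns θ in [0, π], which matches the two limiting cases just described.

```python
import math


def angle_between_regression_lines(r: float, sd_x: float, sd_y: float) -> float:
    """Angle theta (radians) with
    tan(theta) = (1 - r**2) / r * sd_x * sd_y / (sd_x**2 + sd_y**2).
    """
    numerator = (1 - r * r) * sd_x * sd_y
    denominator = r * (sd_x ** 2 + sd_y ** 2)
    return math.atan2(numerator, denominator)   # r = 0 gives exactly pi/2


for r in (0.0, 0.5, 1.0, -1.0):
    theta = angle_between_regression_lines(r, 1.0, 1.0)
    print(f"r = {r:+.1f}: theta = {math.degrees(theta):.2f} degrees")
```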

Forecasting and Time Series
A forecast is a prediction of future conditions based on an analysis of data collected over a period of time.
Time Series
A time series is a set of observations taken at specified times, usually at equal intervals. In other words, a series of observations recorded over time, i.e. data on any characteristic collected with respect to time, is known as a time series. Examples of time series include the population of a country recorded at the ten-yearly censuses, the annual production of a crop (say, wheat) over a number of years, and the wholesale price index over a number of months. In fact, data related to business and economic activities recorded over time generally give rise to time series.
Components of Time Series
The characteristic movements of a time series may be classified into four categories called the components of a time series. In a long time series, we generally have the following four components:
1. Trend or Long term movements.
2. Seasonal Component
3. Cyclic Component
4. Irregular or Random Component

1. Trend
The general tendency of the values of the data to increase or decrease over a long period of time is called the "trend". Some time series show an upward trend while others show a downward trend. For example, upward trends are seen in data on population and on the number of passengers in the Metro, while data on births and deaths show a downward trend.
A time series may show a linear or a non-linear trend. Suppose the time series data are plotted on graph paper. If the points lie more or less around a straight line, the tendency shown by the data is called a linear trend. If the points do not lie more or less around a straight line, the tendency shown by the data is called a non-linear trend.

2. Seasonal component (variation)
The variations in the values of the data that occur in a regular and periodic manner within one year are called seasonal variations. Seasonal variations may be quarterly, monthly, weekly, daily, etc., depending on the type of data available. For example, sales of ice cream increase in summer and sales of raincoats increase in the rainy season. The amplitudes of seasonal variation are different for different periods. There are two types of seasonal variation: those due to natural forces and those due to man-made conventions.
Seasonal variations due to natural forces:
Variations in a time series that arise due to changes in seasons, weather conditions and climate are known as seasonal variations due to natural forces. For example, sales of umbrellas and raincoats increase very fast in the rainy season, and the demand for ACs goes up in summer.
Seasonal variations due to man-made conventions:
Variations in a time series that arise due to changes in the fashions, habits, tastes and customs of people in a society are called seasonal variations due to man-made conventions. For example, in our country sales of gold and clothes go up during the marriage season and festivals.

3. Cyclic Component (variations)

The oscillatory variations in the values of time series data with a period of oscillation of more than one year are called cyclic variations, or the cyclic component of a time series. One period of oscillation is called one cycle. Unlike seasonal variation, the length (or duration) of a cycle in a cyclic variation is not the same from one cycle to the next. Cyclic variations generally occur in commercial and economic time series, in which the length of a cycle can vary from 2 to 10 years. For this reason, cyclic variations are also called "business cycles".

4. Random or Irregular Movements (Variations)
The variations in a time series which do not repeat in a definite pattern are called irregular variations, or the irregular component of a time series. We cannot predict their time of occurrence, direction or magnitude. These variations usually occur due to earthquakes, floods, wars, accidents, etc. For example, the amount of money deposited in Indian banks rose sharply in November 2016.
E) The following are the annual profits (in thousands of rupees) of a business for the years 1993 to 1998:
Year 1993 1994 1995 1996 1997 1998
Profits (in ‘000 Rs.) 60 72 75 65 80 85
a) Use the method of least squares to fit a straight line to the above data.
b) Plot the above figures and draw the line.
c) Also make an estimate of the profits for the year 2000.
E) Use three-year and four-year moving averages to determine the trend and the short-term fluctuations of the following data.
Year : 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Production ('000 tons) : 21 22 23 25 24 22 25 26 27 26
Also plot the data and the moving-average trend.
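Moving averages of odd and even period can be sketched as follows. The sketch assumes the usual convention that an even-period (here 4-year) average is centred by averaging successive pairs; the helper names are hypothetical. Short-term fluctuations are taken as actual value minus trend, i.e. an additive view.

```python
def moving_average(values, k):
    """Simple k-term moving averages (len(values) - k + 1 of them)."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def centred_moving_average(values, k):
    """For even k, average successive pairs of k-term averages so that each
    trend value lines up with an actual period; odd k needs no centring."""
    simple = moving_average(values, k)
    if k % 2:
        return simple
    return [(p + q) / 2 for p, q in zip(simple, simple[1:])]


years = list(range(1991, 2001))
production = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]

for k in (3, 4):
    trend = centred_moving_average(production, k)
    offset = k // 2  # the first trend value belongs to years[offset]
    print(f"{k}-year moving average:")
    for year, y, t in zip(years[offset:], production[offset:], trend):
        print(f"  {year}: trend {t:6.3f}, short-term fluctuation {y - t:+.3f}")
```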
Forecasting Models
The Additive Model
One of the most widely used models is the additive forecasting model. In this model it is assumed that at any
time t, the time series is the sum of all the components. Symbolically, the model is
$y_t = T_t + C_t + S_t + I_t$
where $y_t$ = the value of the time series at time t,
$T_t$ = the long-term trend at time t,
$C_t$ = the cyclic variation at time t,
$S_t$ = the seasonal variation at time t,
$I_t$ = the irregular or random variation at time t.
In the additive model, it is assumed that the effect of the cyclic component ($C_t$) remains the same for all cycles and that the effect of any seasonal variation ($S_t$) remains the same during any year (or corresponding period).
Similarly, it is assumed that the irregular component ($I_t$) has the same effect throughout.
The Multiplicative Model
In the additive model, we have assumed that the time series is the sum of the trend, cyclical, seasonal and
random components. From practical experience, scientists have found that additive models are appropriate
when the seasonal variations remain unchanged, that is, the seasonal variations do not depend on the trend of
the time series.
However, in practice, there are a number of situations where the seasonal variations change over time. When
the seasonal variations exhibit an increasing or decreasing trend, we can try the multiplicative model. In the
multiplicative model it is assumed that the time series is obtained as a product of the four time series
components, that is,
$y_t = T_t \cdot C_t \cdot S_t \cdot I_t$
Multiplicative models are found to be appropriate for many economic time series data such as data related to
production of electricity, number of passengers going abroad, consumption of cold drinks, etc.
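To show the practical difference between the two models, the sketch below builds a hypothetical quarterly series (all numbers are invented for illustration) with the cyclic and irregular components switched off. Under the additive model the seasonal swing around the trend stays the same size. Under the multiplicative model it grows with the trend.

```python
def additive(trend, cyclic, seasonal, irregular):
    """y_t = T_t + C_t + S_t + I_t (all components in the units of y)."""
    return [t + c + s + i for t, c, s, i in zip(trend, cyclic, seasonal, irregular)]


def multiplicative(trend, cyclic, seasonal, irregular):
    """y_t = T_t * C_t * S_t * I_t (T_t in units of y, the others as ratios)."""
    return [t * c * s * i for t, c, s, i in zip(trend, cyclic, seasonal, irregular)]


# Hypothetical quarterly series over two years: a rising trend, a fixed
# seasonal pattern, and no cyclic/irregular effect (0 for additive, 1 for multiplicative).
trend = [100 + 10 * q for q in range(8)]
seasonal_add = [5, -5, 10, -10] * 2            # same absolute swing every year
seasonal_mult = [1.05, 0.95, 1.10, 0.90] * 2   # same percentage swing every year
zeros, ones = [0] * 8, [1] * 8

y_add = additive(trend, zeros, seasonal_add, zeros)
y_mult = multiplicative(trend, ones, seasonal_mult, ones)

# Seasonal deviation from the trend in each model
dev_add = [y - t for y, t in zip(y_add, trend)]
dev_mult = [round(y - t, 2) for y, t in zip(y_mult, trend)]
print(dev_add)    # repeats unchanged each year
print(dev_mult)   # grows as the trend rises
```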

Random Experiment
A random experiment is an experiment in which all the possible outcomes are known in advance, but we cannot predict which of them will occur when we perform the experiment. For example, tossing a coin is a random experiment: the possible outcomes, head and tail, are known in advance, but which one will turn up is not known.
Similarly, 'throwing a die' and 'drawing a card from a well-shuffled pack of 52 playing cards' are examples of random experiments.
Performing an experiment is called a trial, e.g. tossing a coin is a trial, and throwing a die is a trial.
Sample Space
The set of all possible outcomes of a random experiment is known as the sample space and is usually denoted by S. The total number of elements in the sample space is known as the size of the sample space and is denoted by n(S). For example:
(i) If we toss a coin, then the sample space is S = {H, T}, where H and T denote head and tail respectively, and n(S) = 2.
(ii) If a die is thrown, then the sample space is S = {1, 2, 3, 4, 5, 6} and n(S) = 6 (∵ a die has six faces numbered 1, 2, 3, 4, 5, 6).
(iii) If a coin and a die are thrown simultaneously, then the sample space is
S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6} and n(S) = 12.
(iv) If a coin is tossed twice or two coins are tossed simultaneously then the sample space is
S = {HH, HT, TH, TT}, Here, n(S) = 4.
(v) If a die is thrown twice or a pair of dice is thrown simultaneously, then the sample space is
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2),
(3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}. Here, n(S) = 36.
(vi) If a family has two children, then the sample space is S = {B1B2, B1G2, G1B2, G1G2},
where B_i denotes that the i-th birth is a boy and G_i denotes that the i-th birth is a girl, i = 1, 2.

This sample space can also be written as S = {BB, BG, GB, GG}.
(vii) If a bag contains 3 red and 4 black balls and
(a) one ball is drawn from the bag, then the sample space is
{R1, R2, R3, B1, B2, B3, B4}, where R1, R2, R3 denote the three red balls and B1, B2, B3, B4
denote the four black balls in the bag.
Remark 1: If a random experiment with x possible outcomes is performed n times, then the total number of elements in the sample space is $x^n$, i.e. $n(S) = x^n$. For example, if a coin is tossed twice, then $n(S) = 2^2 = 4$; if a die is thrown thrice, then $n(S) = 6^3 = 216$.
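Sample spaces of repeated or combined experiments are Cartesian products, so they can be enumerated with Python's `itertools.product`. The sketch below checks the sizes listed above and Remark 1; the variable names are illustrative.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# A coin and a die thrown together: every (coin, die) pair
coin_and_die = list(product(coin, die))
# A die thrown twice, or a pair of dice: ordered pairs
two_dice = list(product(die, repeat=2))
# A coin tossed three times, written as strings such as "HTH"
three_tosses = ["".join(seq) for seq in product(coin, repeat=3)]

print(len(coin_and_die), len(two_dice), len(three_tosses))  # 12 36 8
print(three_tosses)
# Remark 1: x outcomes repeated n times gives x**n sample points
print(len(list(product(die, repeat=3))))                     # 216
```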
Sample Point
Each outcome of an experiment is visualised as a sample point in the sample space. e.g.
(i) If a coin is tossed then getting head or tail is a sample point.
(ii) If a die is thrown twice, then getting (1, 1) or (1, 2) or (1, 3) or…or (6, 6) is a sample point.
Event
A set of one or more possible outcomes of an experiment constitutes what is known as an event. Thus, an event can be defined as a subset of the sample space, e.g.
i) In a die-throwing experiment, the event of getting a number less than 5 is the set {1, 2, 3, 4},

which consists of 4 outcomes and is a subset of the sample space

S = {1, 2, 3, 4, 5, 6}.

ii) If a card is drawn from a well-shuffled pack of playing cards, then the event of getting a card of the spade suit is {1_S, 2_S, 3_S, ..., 9_S, 10_S, J_S, Q_S, K_S},

where the suffix S under each character denotes that the card is a spade, and J, Q and K represent
jack, queen and king respectively.

Exhaustive Cases
The total number of possible outcomes in a random experiment is called the exhaustive cases. In other words,
the number of elements in the sample space is known as number of exhaustive cases, e.g.
(i) If we toss a coin, then the number of exhaustive cases is 2 and the sample space in this case is {H, T}.
(ii) If we throw a die, then the number of exhaustive cases is 6 and the sample space in this case is {1, 2, 3, 4, 5, 6}.
Favourable Cases
The cases which favour the happening of an event are called favourable cases, e.g.
(i) For the event of drawing a card of spade from a pack of 52 cards, the number of favourable cases is 13.
(ii) For the event of getting an even number in throwing a die, the number of favourable cases is 3 and the
event in this case is {2, 4, 6}.
Mutually Exclusive Cases
Cases are said to be mutually exclusive if the happening of any one of them prevents the happening of all others
in a single experiment, e.g.
(i) In a coin tossing experiment head and tail are mutually exclusive as there cannot be simultaneous
occurrence of head and tail.
Equally Likely Cases
Cases are said to be equally likely if we do not have any reason to expect one in preference to the others. If there is some reason to expect one in preference to the others, then the cases are not equally likely. For example,
(i) Head and tail are equally likely in an experiment of tossing an unbiased coin. This is because if someone
is expecting say head, he/she does not have any reason as to why he/she is expecting it.
(ii) All the six faces in an experiment of throwing an unbiased die are equally likely.
You will become more familiar with the concept of “equally likely cases” from the following examples, where
the non-equally likely cases have been taken into consideration:
(i) Cases of “passing” and “not passing” a candidate in a test are not equally likely. This is because a
candidate has some reason(s) to expect “passing” or “not passing” the test. If he/she prepares well for the
test, he/she will pass the test and if he/she does not prepare for the test, he/she will not pass. So, here the
cases are not equally likely.
(ii) The cases of a ceiling fan "falling" and "not falling" are not equally likely. This is because we can give reason(s) for it not falling if the bolts and other parts are in good condition.

Example 14: In a lottery, one has to choose six numbers at random out of the numbers from 1 to 30. He/she will get the prize only if all six chosen numbers match the six numbers already decided by the lottery committee. Find the probability of winning the prize.
Solution: Out of 30 numbers, 6 can be chosen in
$\binom{30}{6} = \dfrac{30 \times 29 \times 28 \times 27 \times 26 \times 25}{6!} = \dfrac{30 \times 29 \times 28 \times 27 \times 26 \times 25}{720} = 593775$ ways.
∴ Number of exhaustive cases = 593775.
Out of these 593775 ways, there is only one way to win the prize (i.e. choosing the six numbers already fixed by the committee).
Here, the number of favourable cases is 1.
Hence, $P(\text{winning the prize}) = \dfrac{\text{Favourable cases}}{\text{Exhaustive cases}} = \dfrac{1}{593775}$.
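The count of exhaustive cases is a binomial coefficient, available as `math.comb` (Python 3.8+). Using `fractions.Fraction` keeps the probability exact.

```python
from fractions import Fraction
from math import comb

exhaustive = comb(30, 6)   # ways to choose 6 numbers out of 30
favourable = 1             # only the committee's own six numbers win
p_win = Fraction(favourable, exhaustive)
print(exhaustive, p_win)   # 593775 1/593775
```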
Concept of odds in favour of and against the happening of an event
Let n be the number of exhaustive cases in a random experiment, which are mutually exclusive and equally likely as well. Let m of these n cases be favourable to the happening of an event A (say). Thus, the number of cases against A is n − m.
Then the odds in favour of event A are m : (n − m) (i.e. m ratio n − m), and the odds against A are (n − m) : m (i.e. n − m ratio m).
Example 15: If the odds in favour of event A are 3 : 4, what is the probability of happening of A?
Solution: As the odds in favour of A are 3 : 4, ∴ m = 3 and n − m = 4, which implies that n = 7. Thus,
the probability of happening of A, i.e. $P(A) = \dfrac{m}{n} = \dfrac{3}{7}$.
Example 16: Find the probability of event A if
(i) the odds in favour of event A are 4 : 3, (ii) the odds against event A are 5 : 8.
Solution: (i) We know that if the odds in favour of A are m : n, then
$P(A) = \dfrac{m}{m + n} \Rightarrow P(A) = \dfrac{4}{4 + 3} = \dfrac{4}{7}$.
(ii) Here, n − m = 5 and m = 8; therefore, n = 5 + 8 = 13.
We know that if the odds against the happening of an event A are
(n − m) : m, then $P(A) = \dfrac{m}{n} \Rightarrow P(A) = \dfrac{8}{13}$.
Example 17: If P(A) = 3/5, then find
(i) the odds in favour of A; (ii) the odds against the happening of event A.
Solution: (i) As P(A) = 3/5, ∴ the odds in favour of A in this case are 3 : (5 − 3) = 3 : 2.
(ii) We know that if P(A) = m/n, then the odds against the happening of A are (n − m) : m.
∴ In this case the odds against the happening of event A are (5 − 3) : 3 = 2 : 3.
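The odds/probability conversions in Examples 15 to 17 can be written as three small functions. The function names are hypothetical, and the sketch again uses exact fractions.

```python
from __future__ import annotations

from fractions import Fraction


def prob_from_odds_in_favour(a: int, b: int) -> Fraction:
    """Odds in favour a : b  ->  P(A) = a / (a + b)."""
    return Fraction(a, a + b)


def prob_from_odds_against(a: int, b: int) -> Fraction:
    """Odds against a : b (i.e. (n - m) : m)  ->  P(A) = b / (a + b)."""
    return Fraction(b, a + b)


def odds_in_favour(p: Fraction) -> tuple[int, int]:
    """P(A) = m/n in lowest terms  ->  odds in favour m : (n - m)."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator


print(prob_from_odds_in_favour(3, 4))   # Example 15: 3/7
print(prob_from_odds_in_favour(4, 3))   # Example 16 (i): 4/7
print(prob_from_odds_against(5, 8))     # Example 16 (ii): 8/13
m, rest = odds_in_favour(Fraction(3, 5))
print(f"in favour {m}:{rest}, against {rest}:{m}")   # Example 17
```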

Relative Frequency Approach And Statistical Probability

Classical definition of probability fails if

i) the possible outcomes of the random experiment are not equally likely, and/or
ii) the number of exhaustive cases is infinite.
In such cases, we obtain the probability by observing the data. This approach to probability is called the relative
frequency approach and it defines the statistical probability.

Statistical (or Empirical) Probability
If an event A (say) happens m times in n trials of an experiment which is performed repeatedly under
essentially homogeneous and identical conditions (e.g. if we perform an experiment of tossing a coin in a room,
then it must be performed in the same room and all other conditions for tossing the coin should also be identical
and homogeneous in all the tosses), then the probability of happening of A is defined as:
$P(A) = \lim_{n \to \infty} \dfrac{m}{n}$.

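As an illustration, suppose we toss a coin 200 times and observe 50 heads. Then the probability of a head is estimated by the proportion of heads, i.e.
$\dfrac{m}{n} = \dfrac{50}{200} = \dfrac{1}{4}$.
(Strictly, a finite number of trials gives only an estimate of the limiting value.)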
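The limiting definition can be illustrated by simulation. The sketch assumes a fair coin simulated with Python's `random` module, seeded for reproducibility, so the relative frequency should settle near 1/2 as n grows. A single run of 200 tosses, as in the illustration above, gives only an estimate. The function name is hypothetical.

```python
import random


def relative_frequency_of_heads(n_tosses: int, seed: int = 0) -> float:
    """Toss a simulated fair coin n_tosses times and return m/n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```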
Let S be a sample space for a random experiment and A be an event which is a subset of S. Then P(A) is called a probability function if it satisfies the following axioms:
(i) P(A) is real and P(A) ≥ 0
(ii) P(S) = 1
(iii) If A1, A2, ... is any finite or infinite sequence of disjoint (mutually exclusive) events in S, then

$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$

Now, let us give some results using the probability function. Before taking up these results, we discuss some statements with their meanings in terms of set theory. If A and B are two events, then in terms of set theory we write
i) 'At least one of the events A or B occurs' as $A \cup B$
ii) 'Both the events A and B occur' as $A \cap B$
iii) 'Neither A nor B occurs' as $\bar{A} \cap \bar{B}$
iv) 'Event A occurs and B does not occur' as $A \cap \bar{B}$

v) 'Exactly one of the events A or B occurs' as $(A \cap \bar{B}) \cup (\bar{A} \cap B)$

vi) 'Not more than one of the events A or B occurs' as $(\bar{A} \cap \bar{B}) \cup (A \cap \bar{B}) \cup (\bar{A} \cap B)$.

Similarly, you can write the meanings of such statements in terms of set theory for three or more events. For example, for three events A, B and C, the happening of at least one of the events is written as $A \cup B \cup C$.
1. Prove that the probability of the impossible event is zero, i.e. $P(\emptyset) = 0$.
2. The probability of non-happening of an event A, i.e. of the complementary event $\bar{A}$ of A, is given by $P(\bar{A}) = 1 - P(A)$.

3. (i) $P(A \cap \bar{B}) = P(A) - P(A \cap B)$

(ii) $P(\bar{A} \cap B) = P(B) - P(A \cap B)$

Example 4: A, B and C are three mutually exclusive and exhaustive events associated with a random experiment. Find P(A) given that
$P(B) = \dfrac{3}{4}P(A)$ and $P(C) = \dfrac{1}{3}P(B)$.
Example 5: If two dice are thrown, what is the probability that the sum is

a) greater than 9, and b) neither 10 nor 12?
Solution: a) P[sum > 9] = P[sum = 10 or sum = 11 or sum = 12] = P[sum = 10] + P[sum = 11] + P[sum = 12]
$= \dfrac{3}{36} + \dfrac{2}{36} + \dfrac{1}{36} = \dfrac{6}{36} = \dfrac{1}{6}$
[∵ for sum = 10, there are three favourable cases (4, 6), (5, 5) and (6, 4). Similarly, for sum = 11 and 12, there are two and one favourable cases respectively.]
b) Let A denote the event that the sum is 10 and B the event that the sum is 12. Then

Required probability $= P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$ [using De Morgan's law]

$= 1 - P(A \cup B) = 1 - [P(A) + P(B)]$ [∵ A and B are mutually exclusive]
$= 1 - \left(\dfrac{3}{36} + \dfrac{1}{36}\right) = 1 - \dfrac{4}{36} = 1 - \dfrac{1}{9} = \dfrac{8}{9}$.
E5) Fourteen balls are serially numbered and placed in a bag. Find the probability that a ball drawn bears a number that is a multiple of 3 or 5.
Addition Theorem on Probability for Two Events
Let S be the sample space of a random experiment and A, B ⊆ S be two events. Then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
If events A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.
Similarly, for three non-mutually exclusive events A, B and C, we have
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

and for three mutually exclusive events A, B and C, we have

$P(A \cup B \cup C) = P(A) + P(B) + P(C)$.

Example 1: From a pack of 52 playing cards, one card is drawn at random. What is the probability that it is the jack of spades or the queen of hearts?
Solution: Let A and B be the events of drawing the jack of spades and the queen of hearts, respectively.
∴ $P(A) = \dfrac{1}{52}$ and $P(B) = \dfrac{1}{52}$ [∵ there is one card each of the jack of spades and the queen of hearts]

Here, a card cannot be both the jack of spades and the queen of hearts; hence A and B are mutually exclusive.
∴ Applying the addition theorem for mutually exclusive events,
the required probability $= P(A \cup B) = P(A) + P(B) = \dfrac{1}{52} + \dfrac{1}{52} = \dfrac{2}{52} = \dfrac{1}{26}$.
Example 2: 25 lottery tickets are marked with the first 25 natural numbers. A ticket is drawn at random. Find the probability that it bears a multiple of 5 or 7.
Solution: Let A be the event that the drawn ticket bears a multiple of 5 and B be the event that it bears a multiple of 7.
A = {5, 10, 15, 20, 25}, B = {7, 14, 21}. Here, A ∩ B = ∅,

∴ A and B are mutually exclusive, and hence $P(A \cup B) = P(A) + P(B) = \dfrac{5}{25} + \dfrac{3}{25} = \dfrac{8}{25}$.
Example 3: Find the probability of getting either a multiple of 3 or a prime number when a fair die is thrown.
Solution: When a die is thrown, the sample space is S = {1, 2, 3, 4, 5, 6}.
Let A be the event of getting a multiple of 3 and B be the event of getting a prime number.
∴ A = {3, 6}, B = {2, 3, 5}, A ∩ B = {3}. Here, A ∩ B is not an empty set,

∴ A and B are not mutually exclusive, and hence the required probability = P(A ∪ B)
$= P(A) + P(B) - P(A \cap B) = \dfrac{2}{6} + \dfrac{3}{6} - \dfrac{1}{6} = \dfrac{2 + 3 - 1}{6} = \dfrac{4}{6} = \dfrac{2}{3}$.
Example 4: A card is drawn at random from a pack of 52 playing cards. Find the probability that it is an ace or a red card.
Solution: Let A be the event that the drawn card is an ace and B be the event that it is a red card.
There are four aces and 26 red cards in a pack of 52 playing cards, and 2 of the cards are red aces.
∴ $P(A) = \dfrac{4}{52}$, $P(B) = \dfrac{26}{52}$ and $P(A \cap B) = \dfrac{2}{52}$.
∴ The required probability = P(A ∪ B)
= P(A) + P(B) − P(A ∩ B)
$= \dfrac{4 + 26 - 2}{52} = \dfrac{28}{52} = \dfrac{7}{13}$.
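The addition theorem can be checked by brute force over a 52-card deck, using the ace and red-card events from the solution above. The deck representation and helper names are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suits))   # 52 (rank, suit) cards; iterates the dict keys


def p(event):
    """Classical probability of an event given as a predicate on cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card):
    return card[0] == "A"


def is_red(card):
    return suits[card[1]] == "red"


lhs = p(lambda c: is_ace(c) or is_red(c))                        # P(A or B) by counting
rhs = p(is_ace) + p(is_red) - p(lambda c: is_ace(c) and is_red(c))  # addition theorem
print(lhs, rhs)   # 7/13 7/13
```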
E1) A card is drawn from a pack of 52 playing cards. Find the probability that it is either a king or a red card.

E2) Two dice are thrown together. Find the probability that the sum of the numbers turned up is either 6 or 8.

Conditional Probability
We have discussed earlier that P(A) represents the probability of happening event A for which the number of
exhaustive cases is the number of elements in the sample space S. P(A) dealt earlier was the unconditional
probability. Here, we are going to deal with conditional probability.
Let us start with taking the following example:
Suppose a card is drawn at random from a pack of 52 playing cards. Let A be the event of drawing a black
colour face card. Then A = {Js, Qs, Ks, Jc, Qc, Kc} and hence P(A) = 6/52 = 3/26.
Let B be the event of drawing a card of spade i.e. B = {1s, 2s, 3s, 4s, 5s, 6s, 7s, 8s, 9s, 10s, Js, Qs, Ks}.
Suppose that, after a card is drawn from the pack, we are told that a spade has been drawn, i.e. B has happened. Then the probability of event A is no longer 3/26. We now know that the card drawn is a spade (i.e. one of 13 cards), so there are 13 exhaustive cases and not 52. Among these 13 spades there are 3 black face cards. Hence the probability of drawing a black face card, given that it is a spade, is P(A|B) = 3/13. This is the conditional probability of A given that B has already happened.
Note: Here, the symbol '|' used in P(A|B) should be read as 'given' and not 'upon'. P(A|B) is the conditional probability of happening of A given that B has already happened, i.e. here A happens depending on the condition of B.
So, the conditional probability P(A|B) is also the probability of happening of A, but here we are given the information that the event B has already happened. P(A|B) refers to the sample space B and not S.
Remark 1: P(A|B) is meaningful only when P(B) ≠ 0, i.e. when the event B is not an impossible event.
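The idea that conditioning on B shrinks the sample space to B can be checked directly. Counting within B gives the same value as the ratio P(A ∩ B)/P(B). The card encoding (ace written as "1", suits as single letters) follows the notation above but is otherwise an assumption of this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["s", "c", "h", "d"]   # spades, clubs, hearts, diamonds
deck = set(product(ranks, suits))

A = {c for c in deck if c[0] in {"J", "Q", "K"} and c[1] in {"s", "c"}}  # black face card
B = {c for c in deck if c[1] == "s"}                                     # a spade

p_A = Fraction(len(A), len(deck))
# Conditional probability: restrict the sample space to B
p_A_given_B = Fraction(len(A & B), len(B))
# Same value from P(A|B) = P(A and B) / P(B) computed over the full deck
p_via_formula = Fraction(len(A & B), len(deck)) / Fraction(len(B), len(deck))
print(p_A, p_A_given_B, p_via_formula)   # 3/26 3/13 3/13
```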
Multiplication Law of Probability
Statement: For two events A and B,
P(A ∩ B) = P(A) P(B|A), P(A) > 0 … (1)
= P(B) P(A|B), P(B) > 0, … (2)
where P(B|A) is the conditional probability of B given that A has already happened and P(A|B) is the conditional probability of A given that B has already happened.
Similarly, for three events A, B and C,
P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B),
where P(C|A ∩ B) represents the probability of happening of C given that A and B both have already happened.
Before defining the independent events, let us again consider the concept of conditional probability taking the
following example:
Suppose we draw a card from a pack of 52 playing cards. The probability of drawing a spade is 13/52. Now suppose we do not replace the card and draw the next card. The probability that the second card is a spade, given that the first card was a spade, is 12/51; this is a conditional probability. If the first card had been replaced, this conditional probability would have been 13/52. So, if sampling is done without replacement, the probability of the second draw (and of subsequent draws made in the same way) is affected. If it is done with replacement, the probability of the second and subsequent draws remains unaltered.
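Enumerating all ordered pairs of draws shows the effect of replacement on the conditional probability. `itertools.permutations` gives draws without replacement, and `itertools.product` gives draws with replacement. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]  # 52 cards


def is_spade(card):
    return card[1] == "S"


def p_second_spade_given_first_spade(draws):
    """Among ordered draws whose first card is a spade, the share whose
    second card is also a spade."""
    first_spade = [d for d in draws if is_spade(d[0])]
    both = [d for d in first_spade if is_spade(d[1])]
    return Fraction(len(both), len(first_spade))


without_repl = p_second_spade_given_first_spade(list(permutations(deck, 2)))  # 52*51 pairs
with_repl = p_second_spade_given_first_spade(list(product(deck, repeat=2)))   # 52*52 pairs
print(without_repl)   # 4/17, i.e. 12/51
print(with_repl)      # 1/4, i.e. 13/52
```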
So, in the above example, if the draws are made with replacement, then the happening or non-happening of any draw is not affected by the preceding draws. Let us now define independent events.
Independent Events

Events are said to be independent if happening or non-happening of any one event is not affected by the
happening or non-happening of other events. For example, if a coin is tossed certain number of times, then
happening of head in any trial is not affected by any other trial i.e. all the trials are independent.

Two events A and B are independent if and only if P(B|A) = P(B), i.e. the information that A has happened is irrelevant: even if A has already happened, the probability of B is unaltered. For example, let A be the event of getting a head in the 4th toss of a coin and B be the event of getting a head in the 5th toss. Then the probability of getting a head in the 5th toss is 1/2 whether or not we know the outcome of the 4th toss, i.e. P(B|A) = P(B).

Multiplicative Law for Independent Events:

If A and B are independent events, then

P(A ∩ B) = P(A) P(B).

This is because if A and B are independent, then P(B|A) = P(B), and hence equation (1) discussed in Sec. 3.3 of this unit becomes P(A ∩ B) = P(A) P(B).

Similarly, if A, B and C are three (mutually) independent events, then

P(A ∩ B ∩ C) = P(A) P(B) P(C).
Remark 2: Mutually exclusive events with non-zero probabilities can never be independent.
Result: If events A and B are independent, then prove that
(i) $\bar{A}$ and B are independent, (ii) A and $\bar{B}$ are independent, (iii) $\bar{A}$ and $\bar{B}$ are independent.
Example 6: A die is rolled. If the outcome is a number greater than 3, what is the probability that it is a prime number?
Solution: The sample space of the experiment is
S = {1, 2, 3, 4, 5, 6}.
Let A be the event that the outcome is a number greater than 3 and B be the event that it is a prime number.
∴ A = {4, 5, 6}, B = {2, 3, 5} and hence A ∩ B = {5}.
∴ P(A) = 3/6, P(B) = 3/6, P(A ∩ B) = 1/6.
Now, the required probability $= P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{1/6}{3/6} = \dfrac{1}{3}$.

Example 7: A couple has 2 children. What is the probability that both children are boys if it is known that

(i) the younger child is a boy, (ii) the older child is a boy, (iii) at least one of them is a boy?

Solution: Let B_i, G_i denote that the i-th birth is a boy and a girl respectively, i = 1, 2.

Then for a couple having two children, the sample space is

S = {B1B2, B1G2, G1B2, G1G2}.

Let A be the event that both children are boys; then A = {B1B2}.

(i) Let B be the event that the younger child is a boy, i.e.

B = {B1B2, G1B2}. Hence A ∩ B = {B1B2}.

∴ Required probability $= P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{1/4}{2/4} = \dfrac{1}{2}$.
(ii) Let C be the event that the older child is a boy; then C = {B1B2, B1G2}

and hence A ∩ C = {B1B2}.

∴ Required probability $= P(A \mid C) = \dfrac{P(A \cap C)}{P(C)} = \dfrac{1/4}{2/4} = \dfrac{1}{2}$.
(iii) Let D be the event that at least one of the children is a boy; then
D = {B1B2, B1G2, G1B2} and hence

A ∩ D = {B1B2}.

∴ Required probability $= P(A \mid D) = \dfrac{P(A \cap D)}{P(D)} = \dfrac{1/4}{3/4} = \dfrac{1}{3}$.
Example 8: An urn contains 4 red and 7 blue balls. Two balls are drawn one by one without replacement. Find the probability of getting 2 red balls.
Solution: Let A be the event that the first ball drawn is red and B be the event that the second ball drawn is red.
∴ P(A) = 4/11 and P(B|A) = 3/10 [∵ it is given that one red ball has already been drawn]

The required probability $= P(A \cap B) = P(A)\,P(B \mid A) = \dfrac{4}{11} \times \dfrac{3}{10} = \dfrac{12}{110} = \dfrac{6}{55}$.

Example 9: Three cards are drawn one by one without replacement from a well-shuffled pack of 52 playing cards. What is the probability that the first card is a jack, the second is a queen and the third is again a jack?

Solution: Define the following events:

E1: getting a jack in the first draw, E2: getting a queen in the second draw, and

E3: getting a jack in the third draw.

Since the cards are drawn without replacement, the events are not independent, and hence

Required probability $= P(E_1 \cap E_2 \cap E_3) = P(E_1)\,P(E_2 \mid E_1)\,P(E_3 \mid E_1 \cap E_2)$
$= \dfrac{4}{52} \times \dfrac{4}{51} \times \dfrac{3}{50} = \dfrac{1}{13} \times \dfrac{2}{17} \times \dfrac{1}{25} = \dfrac{2}{5525}$.
Example 10: (i) If A and B are independent events with
P(A ∪ B) = 0.8 and P(B) = 0.4, then find P(A).

(ii) If A and B are independent events with P(A) = 0.2 and P(B) = 0.5, then find P(A ∪ B).

(iii) If A and B are independent events with P(A) = 0.4 and P(B) = 0.3, then find P(A|B) and P(B|A).
(iv) If A and B are independent events with P(A) = 0.4 and P(B) = 0.2, then find

$P(\bar{A} \cap B)$, $P(A \cap \bar{B})$, $P(\bar{A} \cap \bar{B})$.

Solution: (i) We are given P(A ∪ B) = 0.8, P(B) = 0.4.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) [∵ events A and B are independent]
⇒ 0.8 = P(A) + 0.4 − 0.4P(A) ⇒ 0.4 = (1 − 0.4) P(A) = 0.6 P(A)
⇒ P(A) = 0.4/0.6 = 4/6 = 2/3.
(ii) We are given that P(A) = 0.2, P(B) = 0.5.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) [∵ events A and B are independent]
= 0.2 + 0.5 − 0.10 = 0.6.
(iii) We are given that P(A) = 0.4, P(B) = 0.3.
Now, as A and B are independent events,
∴ P(A ∩ B) = P(A)P(B)
= 0.4 × 0.3 = 0.12.
Hence, from the definition of conditional probability, we have
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.12}{0.3} = \dfrac{12}{30} = \dfrac{2}{5} = 0.4$, and $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.12}{0.4} = \dfrac{12}{40} = \dfrac{3}{10} = 0.3$.
(iv) We are given that P(A) = 0.4, P(B) = 0.2.
We know that if two events A and B are independent, then
$\bar{A}$ and B; A and $\bar{B}$; $\bar{A}$ and $\bar{B}$ are also independent events.

∴ $P(\bar{A} \cap B) = P(\bar{A})\,P(B)$ [using the concept of independent events]

= (1 − P(A)) P(B) = (1 − 0.4)(0.2) = (0.6)(0.2) = 0.12
$P(A \cap \bar{B}) = P(A)\,P(\bar{B})$ [∵ A and $\bar{B}$ are independent]
= (0.4)(1 − 0.2) = 0.32
$P(\bar{A} \cap \bar{B}) = P(\bar{A})\,P(\bar{B}) = (1 - 0.4)(1 - 0.2) = (0.6)(0.8) = 0.48$.

Example 11: Three unbiased coins are tossed simultaneously. In which of the following cases are the events A
and B independent?
(i) A be the event of getting exactly one head
B be the event of getting exactly one tail
(ii) A be the event that first coin shows head
B be the event that third coin shows tail
(iii) A be the event that shows exactly two tails
B be the event that third coin shows head
Solution: When three unbiased coins are tossed simultaneously, the sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, with 8 equally likely outcomes.

(i) A = {HTT, THT, TTH}, B = {HHT, HTH, THH}, A ∩ B = { } = ∅
$\Rightarrow P(A) = \dfrac{3}{8},\ P(B) = \dfrac{3}{8},\ P(A \cap B) = \dfrac{0}{8} = 0$
Hence, $P(A)P(B) = \dfrac{3}{8} \times \dfrac{3}{8} = \dfrac{9}{64} \neq P(A \cap B)$
∴ Events A and B are not independent.

(ii) A = {HHH, HHT, HTH, HTT}, B = {HHT, HTT, THT, TTT}, A ∩ B = {HHT, HTT}
$\Rightarrow P(A) = \dfrac{4}{8} = \dfrac{1}{2},\ P(B) = \dfrac{4}{8} = \dfrac{1}{2},\ P(A \cap B) = \dfrac{2}{8} = \dfrac{1}{4}$
$P(A)P(B) = \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{1}{4} = P(A \cap B)$ ⇒ Events A and B are independent.
(iii) A = {HTT, THT, TTH}, B = {HHH, HTH, THH, TTH}, A ∩ B = {TTH}
$P(A) = \dfrac{3}{8},\ P(B) = \dfrac{4}{8} = \dfrac{1}{2},\ P(A \cap B) = \dfrac{1}{8}.$
Hence, $P(A)P(B) = \dfrac{3}{8} \times \dfrac{1}{2} = \dfrac{3}{16} \neq P(A \cap B)$ ⇒ Events A and B are not independent.
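The independence test $P(A \cap B) = P(A)P(B)$ can also be checked by brute force: list the 8 equally likely outcomes and count. The sketch below is one way to do this in Python, assuming the outcomes are equally likely; the names `prob` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses: 8 equally likely strings such as "HTT"
S = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def independent(a, b):
    """A and B are independent iff P(A ∩ B) = P(A) P(B)."""
    return prob(lambda s: a(s) and b(s)) == prob(a) * prob(b)

cases = {
    "i":   (lambda s: s.count("H") == 1, lambda s: s.count("T") == 1),
    "ii":  (lambda s: s[0] == "H",       lambda s: s[2] == "T"),
    "iii": (lambda s: s.count("T") == 2, lambda s: s[2] == "H"),
}
results = {k: independent(a, b) for k, (a, b) in cases.items()}
print(results)  # {'i': False, 'ii': True, 'iii': False}
```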
Example 12: Two cards are drawn from a pack of cards in succession with replacement of first card. Find the
probability that both are the cards of ‘heart’.
Solution: Let A be the event that the first card drawn is a heart card and B be the event that second card is a
heart card.
As the cards are drawn with replacement,
A and B are independent and hence the required probability
$= P(A \cap B) = P(A)P(B) = \left(\dfrac{13}{52}\right)\left(\dfrac{13}{52}\right) = \dfrac{1}{4} \times \dfrac{1}{4} = \dfrac{1}{16}.$
Example 13: A class consists of 10 boys and 40 girls. 5 of the students are rich and 15 students are brilliant.
Find the probability of selecting a brilliant rich boy.
Solution: Let A be the event that the selected student is brilliant, B be the event that he/she is rich and C be
the event that the student is boy.
$P(A) = \dfrac{15}{50},\ P(B) = \dfrac{5}{50},\ P(C) = \dfrac{10}{50}$, and hence
the required probability $= P(A \cap B \cap C)$
$= P(A)P(B)P(C)$  [∵ A, B and C are independent]
$= \dfrac{15}{50} \times \dfrac{5}{50} \times \dfrac{10}{50} = \dfrac{3}{10} \times \dfrac{1}{10} \times \dfrac{1}{5} = \dfrac{3}{500}$

E3) A card is drawn from a well-shuffled pack of cards. If the card drawn is a face card, what is the
probability that it is a king?
E4) Two cards are drawn one by one without replacement from a well-shuffled pack of 52 cards. What is the
probability that both the cards are red?
E5) A bag contains 10 good and 4 defective items. Two items are drawn one by one without replacement. What
is the probability that the first item drawn is defective and the second one is good?
E6) The odds in favour of passing a driving test by a person X are 3 : 5 and the odds in favour of passing the same
test by another person Y are 3 : 2. What is the probability that both will pass the test?


If A and B are two independent events, then the probability of happening of at least one of the events is
$P(A \cup B) = 1 - P(\overline{A \cup B}) = 1 - P(\bar{A} \cap \bar{B})$  [by De Morgan's law]
$= 1 - P(\bar{A})P(\bar{B})$  [∵ A and B, and hence $\bar{A}$ and $\bar{B}$, are independent]

Similarly, if we have n independent events $A_1, A_2, \ldots, A_n$, then the probability of happening of at least one of the events is
$P(A_1 \cup A_2 \cup \ldots \cup A_n) = 1 - P(\bar{A}_1)P(\bar{A}_2)\cdots P(\bar{A}_n)$,
i.e. probability of happening of at least one of the independent events
= 1 – probability of happening of none of the events.
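The complement rule above reduces to a one-line computation. A minimal Python sketch, assuming the events are mutually independent (the function name `p_at_least_one` is illustrative), checked against the figures of Example 14 (4/5 and 2/3):

```python
from fractions import Fraction
from math import prod  # Python 3.8+

def p_at_least_one(probs):
    """1 - P(A1')P(A2')...P(An') for independent events A1, ..., An."""
    return 1 - prod(1 - p for p in probs)

# Example 14: hit probabilities 4/5 and 2/3
print(p_at_least_one([Fraction(4, 5), Fraction(2, 3)]))  # 14/15
```

With no events at all the product is empty (equal to 1), so the function returns 0, which matches the fact that "at least one of zero events" is impossible.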
Example 14: A person is known to hit the target in 4 out of 5 shots, whereas another person is known to hit it in 2 out of 3
shots. Find the probability that the target is hit when they both try.
Solution: Let A be the event that first person hits the target and B be the event that second person hits the target.
$\therefore P(A) = \dfrac{4}{5},\ P(B) = \dfrac{2}{3}$
Now, as both the persons try independently,
the required probability = probability that the target is hit
= probability that at least one of the persons hits the target
$= P(A \cup B)$
$= 1 - P(\bar{A})P(\bar{B}) = 1 - \left(1 - \dfrac{4}{5}\right)\left(1 - \dfrac{2}{3}\right) = 1 - \left(\dfrac{1}{5}\right)\left(\dfrac{1}{3}\right) = 1 - \dfrac{1}{15} = \dfrac{14}{15}.$
E 7) A problem in statistics is given to three students A, B and C whose chances of solving it are 0.3, 0.5 and
0.6 respectively. What is the probability that the problem is solved?
Example 15: Husband and wife appear in an interview for two vacancies for the same post. The probabilities
of the husband's and the wife's selection are $\dfrac{2}{5}$ and $\dfrac{1}{5}$ respectively. Find the probability that
(i) exactly one of them is selected, (ii) at least one of them is selected, (iii) none is selected.
Solution: Let H be the event that the husband is selected and W be the event that the wife is selected. Then,
$P(H) = \dfrac{2}{5},\ P(W) = \dfrac{1}{5} \Rightarrow P(\bar{H}) = 1 - \dfrac{2}{5} = \dfrac{3}{5},\ P(\bar{W}) = 1 - \dfrac{1}{5} = \dfrac{4}{5}$

(i) The required probability $= P\big((H \cap \bar{W}) \cup (\bar{H} \cap W)\big) = P(H \cap \bar{W}) + P(\bar{H} \cap W)$  [by the addition theorem for mutually exclusive events]
$= P(H)P(\bar{W}) + P(\bar{H})P(W)$  [∵ selections of husband and wife are independent]
$= \dfrac{2}{5} \times \dfrac{4}{5} + \dfrac{3}{5} \times \dfrac{1}{5} = \dfrac{8}{25} + \dfrac{3}{25} = \dfrac{11}{25}.$

(ii) The required probability $= P(H \cup W) = 1 - P(\bar{H})P(\bar{W}) = 1 - \dfrac{3}{5} \times \dfrac{4}{5} = 1 - \dfrac{12}{25} = \dfrac{13}{25}.$

(iii) The required probability $= P(\bar{H} \cap \bar{W}) = P(\bar{H})P(\bar{W})$  [∵ H and W are independent] $= \dfrac{3}{5} \times \dfrac{4}{5} = \dfrac{12}{25}.$
Example 16: A person X speaks the truth in 80% of cases and another person Y speaks the truth in 90% of cases.
Find the probability that they contradict each other in stating the same fact.
Solution: Let A and B be the events that person X and person Y speak the truth respectively; then
$P(A) = \dfrac{80}{100} = 0.8,\ P(B) = \dfrac{90}{100} = 0.9 \Rightarrow P(\bar{A}) = 1 - 0.8 = 0.2,\ P(\bar{B}) = 1 - 0.9 = 0.1.$
They contradict each other exactly when one of them speaks the truth and the other does not. Thus, the required probability
$= P\big((A \cap \bar{B}) \cup (\bar{A} \cap B)\big) = P(A \cap \bar{B}) + P(\bar{A} \cap B)$  [by the addition law for mutually exclusive events]
$= P(A)P(\bar{B}) + P(\bar{A})P(B)$  [by the multiplication law for independent events]
$= 0.8 \times 0.1 + 0.2 \times 0.9 = 0.08 + 0.18 = 0.26 = 26\%.$
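Examples 15(i) and 16 both ask for the probability that exactly one of two independent events occurs. A small Python sketch of that formula (the helper name `exactly_one` is illustrative) reproduces both answers with exact fractions:

```python
from fractions import Fraction

def exactly_one(p, q):
    """P(exactly one of two independent events) = p(1 - q) + (1 - p)q."""
    return p * (1 - q) + (1 - p) * q

# Example 15(i): P(H) = 2/5, P(W) = 1/5
husband_wife = exactly_one(Fraction(2, 5), Fraction(1, 5))
# Example 16: X and Y contradict each other iff exactly one tells the truth
contradict = exactly_one(Fraction(8, 10), Fraction(9, 10))
print(husband_wife, contradict)  # 11/25 13/50
```

Note that 13/50 is the 0.26 obtained in Example 16.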
E 8) Two cards are drawn from a pack of cards in succession presuming that drawn cards are replaced. What
is the probability that both drawn cards are of the same suit?


There are experiments which are completed in two stages. Such experiments are termed
two-stage experiments. At the first stage, the experiment involves the selection of one of a given number of
possible mutually exclusive events. At the second stage, the experiment involves the happening of an event
which is a subset of at least one of the events of the first stage.
As an illustration of a two-stage experiment, let us consider the following example:
Suppose there are two urns, Urn I and Urn II. Suppose Urn I contains 4 white and 6 blue balls, and Urn II contains 4
white and 5 blue balls. One of the urns is selected at random and a ball is drawn from it. Here, the first stage is the
selection of one of the urns and the second stage is the drawing of a ball of a particular colour.
If we are interested in finding the probability of the event of the second stage, then it
is obtained using the law of total probability, which is stated as follows:
Law of Total Probability
Statement: Let S be the sample space and $E_1, E_2, \ldots, E_n$ be n mutually exclusive
and exhaustive events with $P(E_i) \neq 0$; i = 1, 2, …, n. Let A be any event which is a
subset of $E_1 \cup E_2 \cup \ldots \cup E_n$ (i.e. A occurs with at least one of the events $E_1, E_2, \ldots, E_n$) with P(A) > 0;
then
$P(A) = P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2) + \ldots + P(E_n)P(A \mid E_n) = \sum_{i=1}^{n} P(E_i)\,P(A \mid E_i).$
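The law of total probability is a weighted sum, which the following Python sketch computes with exact fractions. The function name and the check that the priors sum to 1 (exhaustiveness) are our own additions for illustration, not part of the text; the sample values are those of Examples 1 and 2 below.

```python
from fractions import Fraction as F

def total_probability(priors, likelihoods):
    """P(A) = sum of P(E_i) * P(A | E_i) over mutually exclusive, exhaustive E_i."""
    if sum(priors) != 1:
        raise ValueError("P(E_1), ..., P(E_n) must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Example 1: two bags, each chosen with probability 1/2; P(red | bag) = 5/11, 3/7
p_red = total_probability([F(1, 2), F(1, 2)], [F(5, 11), F(3, 7)])
# Example 2: machine shares 3/10, 1/4, 9/20; defect rates 1%, 1.2%, 2%
p_defective = total_probability([F(3, 10), F(1, 4), F(9, 20)],
                                [F(1, 100), F(12, 1000), F(2, 100)])
print(p_red, p_defective)  # 34/77 3/200
```

Here 3/200 is the 0.015 of Example 2.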

Example 1: There are two bags. First bag contains 5 red, 6 white balls and the second bag contains 3 red, 4
white balls. One bag is selected at random and a ball is drawn from it. What is the probability that it is
(i) red, (ii) white.
Solution: Let $E_1$ be the event that the first bag is selected and $E_2$ be the event that the second bag is selected.
$\therefore P(E_1) = P(E_2) = \dfrac{1}{2}.$
(i) Let R be the event of getting a red ball from the selected bag. $\therefore P(R \mid E_1) = \dfrac{5}{11}$ and $P(R \mid E_2) = \dfrac{3}{7}.$
Thus, the required probability is given by
$P(R) = P(E_1)P(R \mid E_1) + P(E_2)P(R \mid E_2) = \dfrac{1}{2} \times \dfrac{5}{11} + \dfrac{1}{2} \times \dfrac{3}{7} = \dfrac{5}{22} + \dfrac{3}{14} = \dfrac{35 + 33}{154} = \dfrac{68}{154} = \dfrac{34}{77}$
(ii) Let W be the event of getting a white ball from the selected bag.
$\therefore P(W \mid E_1) = \dfrac{6}{11}$ and $P(W \mid E_2) = \dfrac{4}{7}.$
Thus, the required probability is given by
$P(W) = P(E_1)P(W \mid E_1) + P(E_2)P(W \mid E_2) = \dfrac{1}{2} \times \dfrac{6}{11} + \dfrac{1}{2} \times \dfrac{4}{7} = \dfrac{3}{11} + \dfrac{2}{7} = \dfrac{21 + 22}{77} = \dfrac{43}{77}.$
Example 2: A factory produces a certain type of output using 3 machines. The respective daily production figures
are: machine X, 3000 units; machine Y, 2500 units; and machine Z, 4500 units. Past experience shows that 1%
of the output produced by machine X is defective. The corresponding fractions of defectives for the other two
machines are 1.2 and 2 percent respectively. An item is drawn from the day's production. What is the
probability that it is defective?
Solution: Let E1, E2 and E3 be the events that the drawn item is produced by machine X, machine Y and
machine Z, respectively. Let A be the event that the drawn item is defective.
As the total number of units produced by all the machines is 3000 + 2500 + 4500 = 10000,
$\therefore P(E_1) = \dfrac{3000}{10000} = \dfrac{3}{10},\ P(E_2) = \dfrac{2500}{10000} = \dfrac{1}{4},\ P(E_3) = \dfrac{4500}{10000} = \dfrac{9}{20}.$
$P(A \mid E_1) = \dfrac{1}{100} = 0.01,\ P(A \mid E_2) = \dfrac{1.2}{100} = 0.012,\ P(A \mid E_3) = \dfrac{2}{100} = 0.02.$
Thus, the required probability = probability that the drawn item is defective
$= P(A) = P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2) + P(E_3)P(A \mid E_3)$
$= \dfrac{3}{10} \times 0.01 + \dfrac{1}{4} \times 0.012 + \dfrac{9}{20} \times 0.02 = \dfrac{3}{1000} + \dfrac{3}{1000} + \dfrac{9}{1000} = \dfrac{15}{1000} = 0.015.$
Example 3: There are two coins, one unbiased and the other two-headed; otherwise they are identical. One of
the coins is taken at random without seeing it and tossed. What is the probability of getting a head?
Solution: Let $E_1$ and $E_2$ be the events of selecting the unbiased coin and the two-headed coin respectively. Let
A be the event of getting a head on the tossed coin.
$\therefore P(E_1) = \dfrac{1}{2},\ P(E_2) = \dfrac{1}{2}$  [∵ selection of each of the coins is equally likely]
$P(A \mid E_1) = \dfrac{1}{2}$  [∵ if it is the unbiased coin, then head and tail are equally likely]
$P(A \mid E_2) = 1$  [∵ if it is the two-headed coin, then getting a head is certain]
Thus, the required probability $= P(A) = P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2) = \dfrac{1}{2} \times \dfrac{1}{2} + \dfrac{1}{2} \times 1 = \dfrac{1}{4} + \dfrac{1}{2} = \dfrac{3}{4}.$
Example 4: The probabilities of selection of 3 persons for the post of principal in a newly started college are
in the ratio 4 : 3 : 2. The probabilities that they will introduce co-education in the college are 0.2, 0.3 and 0.5,
respectively. Find the probability that co-education is introduced in the college.
Solution: Let $E_1, E_2, E_3$ be the events of selection of the first, second and third person for the post of principal,
respectively. Let A be the event that co-education is introduced.
$\therefore P(E_1) = \dfrac{4}{9},\ P(E_2) = \dfrac{3}{9},\ P(E_3) = \dfrac{2}{9},\ P(A \mid E_1) = 0.2,\ P(A \mid E_2) = 0.3,\ P(A \mid E_3) = 0.5.$
Thus, the required probability = P(A)
$= P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2) + P(E_3)P(A \mid E_3)$
$= \dfrac{4}{9} \times 0.2 + \dfrac{3}{9} \times 0.3 + \dfrac{2}{9} \times 0.5 = \dfrac{0.8}{9} + \dfrac{0.9}{9} + \dfrac{1}{9} = \dfrac{2.7}{9} = 0.3$
E1) A person gets a construction job and agrees to undertake it. The completion of the job in time depends on
whether or not there is a strike in the company. There is a 40% chance that there will be a
strike. The probability that the job is completed in time is 30% if the strike takes place and 70% if the strike
does not take place. What is the probability that the job will be completed in time?
E2) What is the probability that a year selected at random will contain 53 Sundays?
E3) There are two bags: the first bag contains 3 red and 5 black balls, and the second bag contains 4 red and 5 black
balls. One ball is drawn from the first bag and put into the second bag without noticing its colour.
Then two balls are drawn from the second bag. What is the probability that the balls are of opposite colours?
If we are interested in finding the probability of the event of the second stage, then it is obtained using the law of total
probability. But if the happening of the event of the second stage is given to us and, on this basis, we find the
probabilities of the events of the first stage, then these are the revised (or posterior) probabilities of the events of the
first stage, and they are obtained using an important theorem known as Bayes' theorem, given by Thomas Bayes.
This theorem is also known as the 'inverse probability theorem' because, having moved from the first stage to the second
stage, we go back and find the (revised) probabilities of the events of the first stage, i.e. we move inversely. Thus, using
this theorem, probabilities can be revised on the basis of new related information.
Statement: Let S be the sample space and $E_1, E_2, \ldots, E_n$ be n mutually exclusive and exhaustive events with
$P(E_i) \neq 0$; i = 1, 2, …, n. Let A be any event which is a subset of $E_1 \cup E_2 \cup \ldots \cup E_n$ (i.e. A occurs with at least one of the
events $E_1, E_2, \ldots, E_n$) with P(A) > 0 [notice that up to this line the statement is the same as that of the law of total
probability]; then
$P(E_i \mid A) = \dfrac{P(E_i)\,P(A \mid E_i)}{P(A)},\quad i = 1, 2, \ldots, n,$
where $P(A) = P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2) + \ldots + P(E_n)P(A \mid E_n).$
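Bayes' theorem reuses the products $P(E_i)P(A \mid E_i)$ from the law of total probability: dividing each by their sum gives the posterior probabilities. A minimal Python sketch under that reading, with an illustrative function name, checked against the two-bag data used in Example 5 below:

```python
from fractions import Fraction as F

def posterior(priors, likelihoods):
    """Bayes' theorem: P(E_i | A) = P(E_i) P(A | E_i) / P(A) for every i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_a = sum(joint)  # law of total probability
    return [j / p_a for j in joint]

# Two bags chosen with probability 1/2 each; P(red | bag) = 5/11 and 3/7
print(posterior([F(1, 2), F(1, 2)], [F(5, 11), F(3, 7)]))
```

The posteriors always add up to 1, since every term is divided by the same total P(A).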

Example 5: There are two bags. The first bag contains 5 red and 6 white balls and the second bag contains 3 red and 4
white balls. One bag is selected at random and a ball is drawn from it. If the ball is found to be red, what is the
probability of
(i) selecting the first bag, (ii) selecting the second bag?
Solution: First, we give the solution exactly as given for Example 1 of Sec. 4.3 of this unit, which gives $P(R) = \dfrac{34}{77}$. After that,
we proceed as follows:
(i) Probability of selecting the first bag given that the ball drawn is red
$P(E_1 \mid R) = \dfrac{P(E_1)P(R \mid E_1)}{P(R)} = \dfrac{\frac{1}{2} \times \frac{5}{11}}{\frac{34}{77}} = \dfrac{5}{22} \times \dfrac{77}{34} = \dfrac{35}{68}$
(ii) Probability of selecting the second bag given that the ball drawn is red
$P(E_2 \mid R) = \dfrac{P(E_2)P(R \mid E_2)}{P(R)} = \dfrac{\frac{1}{2} \times \frac{3}{7}}{\frac{34}{77}} = \dfrac{3}{14} \times \dfrac{77}{34} = \dfrac{33}{68}$
Example 6: A factory produces a certain type of output using 3 machines. The respective daily production figures
are: machine X, 3000 units; machine Y, 2500 units; and machine Z, 4500 units. Past experience shows that 1%
of the output produced by machine X is defective. The corresponding fractions of defectives for the other two
machines are 1.2 and 2 percent respectively. An item is drawn from the day's production, and if the drawn item
is found to be defective, what is the probability that it has been produced by machine Y?
Solution: Proceed exactly as in the solution of Example 2, and then as follows:
Probability that the drawn item has been produced by machine Y given that it is defective
$P(E_2 \mid A) = \dfrac{P(E_2)P(A \mid E_2)}{P(A)} = \dfrac{\frac{1}{4} \times 0.012}{0.015} = \dfrac{0.003}{0.015} = \dfrac{1}{5}.$
Example 7: The probabilities of selection of 3 persons for the post of principal in a newly started college are
in the ratio 4 : 3 : 2. The probabilities that they will introduce co-education in the college are 0.2, 0.3 and 0.5,
respectively. If co-education is introduced by the candidate selected for the post of principal, what is the
probability that the first candidate was selected?
Solution: First give the solution of Example 4, then proceed as follows:
The required probability $= P(E_1 \mid A) = \dfrac{P(E_1)P(A \mid E_1)}{P(A)} = \dfrac{\frac{4}{9} \times 0.2}{0.3} = \dfrac{0.8}{9 \times 0.3} = \dfrac{8}{27}.$
E4) A bag contains 4 red and 5 white balls. Another bag contains 2 red and 3
white balls. A ball is drawn from the first bag and is transferred to the second
bag. A ball is then drawn from the second bag and is found to be red, what is
the probability that red ball was transferred from first to second bag?
E5) An insurance company insured 1000 scooter drivers, 3000 car drivers and 6000 truck drivers. The
probabilities that scooter, car and truck drivers meet an accident are 0.02, 0.04, 0.25 respectively. One of
the insured persons meets with an accident. What is the probability that he is a
(i) car driver (ii) truck driver
E6) By examining the chest X-ray, the probability that T.B. is detected when a person is actually suffering
from T.B. is 0.99. The probability that the doctor diagnoses incorrectly that a person has T.B. on the basis
of X-ray is 0.002. In a certain city, one in 1000 persons suffers from T.B. A person is selected at random
and is diagnosed to have T.B., what is the chance that he actually has T.B.?
E7) A person speaks the truth 3 out of 4 times. A die is thrown. She reports that it is a five. What is the probability
that it actually was a five?
Let $E_1$ be the event that the person speaks the truth, $E_2$ be the event that she tells a lie and A be the event that she
reports a five.
$\therefore P(E_1) = \dfrac{3}{4},\ P(E_2) = \dfrac{1}{4},\ P(A \mid E_1) = \dfrac{1}{6},\ P(A \mid E_2) = \dfrac{5}{6}.$
By the law of total probability, we have $P(A) = P(E_1)P(A \mid E_1) + P(E_2)P(A \mid E_2)$
$= \dfrac{3}{4} \times \dfrac{1}{6} + \dfrac{1}{4} \times \dfrac{5}{6} = \dfrac{3}{24} + \dfrac{5}{24} = \dfrac{8}{24} = \dfrac{1}{3}.$
Thus, the required probability $= P(E_1 \mid A) = \dfrac{P(E_1)P(A \mid E_1)}{P(A)} = \dfrac{\frac{3}{4} \times \frac{1}{6}}{\frac{1}{3}} = \dfrac{3}{4} \times \dfrac{1}{6} \times 3 = \dfrac{3}{8}.$

Q1: A construction company is bidding for two contracts, A and B. The probability that the company will get
contract A is 3/5, that it will get contract B is 1/4, and that it gets both contracts is 1/8.
What is the probability that the company will get contract A or B?
Q2: Each item produced by a certain process may have one or both of two types of defects, A and B. It is
known that 22% of the items have type A defects and 12% have type B defects. Further, 8% are known to have
both types of defects. What is the probability that a randomly selected item will be defective?
Ans. 0.26
Q3: In a class 40% students read statistics, 25% Mathematics and 15% both Mathematics and Statistics. One
student is selected at random. Find the probability,
(i) that he reads Statistics, if it is known that he reads Mathematics;
(ii) that he reads Mathematics, if it is known that he reads Statistics.
Q4: (i) The probabilities of A, B and C solving a problem are 1/3, 2/7 and 3/8 respectively. If all three try to solve the
problem simultaneously, find the probability that the problem will be solved.
(ii) In the above example, find the probability that exactly one of them will solve the problem.
Q5: Three critics review a book. Odds in favour of the book are 5: 2, 4: 3, and 3: 4 respectively for the three
critics. Find the probability that majority are in favour of the book.

Q6: Three balls are drawn successively from a box containing 6 red, 4 white and 5 blue balls. Find the
probability that they are of different colours if each ball is (i) not replaced, (ii) replaced. Ans. 24/91 and 16/75

Q7: The probability that at least one of two independent events occurs is 0.5. The probability that the first event
occurs but not the second is 3/25. Also the probability that the second event occurs but not the first is 8/25. Find
the probability that none of the two events occurs.
Q8: Suppose that A and B are two independent events associated with a random experiment. If the probability
that A or B occurs equals 0.6, while the probability that A occurs equals 0.4, determine the probability that B occurs.
Q9: In a certain college, the geographical distribution of male students is as follows: 50% come from the East, 30%
come from the Midwest and 20% come from the Far West. The following proportions of the male students wear
ties: 80% of the Easterners, 60% of the Midwesterners and 40% of the Far Westerners. What is the probability
that a student who wears a tie comes from the East?

Random Experiment
An experiment in which all the possible outcomes are known in advance, but we cannot predict which of
them will occur when we perform the experiment, is called a random experiment. For example, tossing a
coin is a random experiment, as the possible outcomes head and tail are known in advance but which one will turn
up is not known.
Similarly, 'throwing a die' and 'drawing a card from a well-shuffled pack of 52 playing cards' are
examples of random experiments.
A quantity which takes different values is called a variable. Variables are of two types.
Fixed (Deterministic) variable
A variable whose values are known in advance is called a fixed variable. For example, the month of the year is a
fixed variable: it is known in advance which month comes next.
Random Variable
A variable whose possible values are known in advance, but for which we cannot predict which of them will occur
when we perform the experiment, is called a random variable.
A random variable is a numerical-valued function defined on the sample space of a random experiment. Random
variables are denoted by capital letters such as X, Y, Z. For example, in tossing a coin, let X = 1 if the
coin falls heads and X = 0 if the coin falls tails; then X is a random variable. A random variable has the
following properties:
i) Each particular value of the random variable can be assigned some probability.
ii) The probabilities associated with all the different values of the random variable add up to 1 (unity).
Discrete Random Variable
A random variable is said to be discrete if it takes either a finite or a countable number of values.
A countable number of values means that the values can be arranged in a sequence $x_1, x_2, x_3, \ldots$, i.e. they can be
listed one after another. For example,
suppose X is a random variable taking the values 2, 5, 8, 11, …; then we can go on listing the fifth, sixth, …
values, so X in this example is a discrete random variable. The number of students present each day in a class
during an academic session is an example of a discrete random variable, as the number cannot take a fractional value.
Continuous Random Variable
A random variable is said to be continuous if it can take all possible real (i.e. integer as well as fractional)
values between two certain limits. For example, temperature of a city at various points of time during a day is
an example of continuous random variable as the temperature takes uncountable values, i.e. it can take
fractional values also.
E2) Which of the random variables given below are discrete? Give reasons for your answer.
1. The daily measurement of snowfall at Shimla
2. The number of industrial accidents in each month.
3. The number of defective goods in a shipment (lot) of goods from a manufacturer.
Probability Mass Function
Let X be a discrete random variable (r.v.) which takes the values $x_1, x_2, \ldots$ and let $P[X = x_i] = p(x_i)$. This function
$p(x_i)$, i = 1, 2, …, defined for the values $x_1, x_2, \ldots$, is called the probability mass function of X if
(i) $p(x_i) \ge 0$ for all i, and
(ii) $\sum_i p(x_i) = 1$.

Probability Distribution

The set $\{(x_1, p(x_1)), (x_2, p(x_2)), \ldots\}$ specifies the probability distribution of a discrete r.v. X. The probability
distribution of the r.v. X can also be exhibited in the following manner:

X      x1      x2      x3      …
p(x)   p(x1)   p(x2)   p(x3)   …
Now, let us take up some examples concerning probability mass function:
Example 1: State, giving reasons, which of the following are not probability distributions:
(i)
X      0     1
p(x)   1/2   3/4

(ii)
X      0     1      2
p(x)   3/4   −1/2   3/4

(iii)
X      0     1     2
p(x)   1/4   1/2   1/4

Solution:
(i) Here $p(x_i) \ge 0$, i = 1, 2; but
$\sum_{i=1}^{2} p(x_i) = p(x_1) + p(x_2) = p(0) + p(1) = \dfrac{1}{2} + \dfrac{3}{4} = \dfrac{5}{4} > 1.$
So, the given distribution is not a probability distribution, as $\sum_{i=1}^{2} p(x_i)$ is greater than 1.

(ii) It is not a probability distribution, as $p(x_2) = p(1) = -\dfrac{1}{2}$, i.e. negative.
(iii) Here, $p(x_i) \ge 0$, i = 1, 2, 3 and $\sum_{i=1}^{3} p(x_i) = p(x_1) + p(x_2) + p(x_3) = p(0) + p(1) + p(2) = \dfrac{1}{4} + \dfrac{1}{2} + \dfrac{1}{4} = 1.$

∴ The given distribution is a probability distribution.

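The two conditions of a probability mass function are easy to test mechanically. The sketch below (with an illustrative helper `is_pmf`) uses exact fractions so that the sum-to-one check is not spoiled by floating-point rounding; the three tables are those of Example 1.

```python
from fractions import Fraction as F

def is_pmf(probs):
    """Valid p.m.f. iff every p(x_i) >= 0 and the p(x_i) sum to exactly 1."""
    return all(p >= 0 for p in probs) and sum(probs) == 1

tables = {
    "i":   [F(1, 2), F(3, 4)],
    "ii":  [F(3, 4), F(-1, 2), F(3, 4)],
    "iii": [F(1, 4), F(1, 2), F(1, 4)],
}
print({k: is_pmf(v) for k, v in tables.items()})  # {'i': False, 'ii': False, 'iii': True}
```

Table (ii) is a useful reminder that the sum condition alone is not enough: its entries add up to 1, yet one of them is negative.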
Example 2: For the following probability distribution of a discrete r.v. X, find
(i) the constant c, (ii) P[X ≤ 3] and (iii) P[1 < X < 4].
X      0   1   2   3    4    5
p(x)   0   c   c   2c   3c   c
Solution:
(i) As the given distribution is a probability distribution, $\sum_i p(x_i) = 1$
$\Rightarrow 0 + c + c + 2c + 3c + c = 1 \Rightarrow 8c = 1 \Rightarrow c = \dfrac{1}{8}.$
(ii) $P[X \le 3] = P[X = 3] + P[X = 2] + P[X = 1] + P[X = 0] = 2c + c + c + 0 = 4c = 4 \times \dfrac{1}{8} = \dfrac{1}{2}.$
(iii) $P[1 < X < 4] = P[X = 2] + P[X = 3] = c + 2c = 3c = 3 \times \dfrac{1}{8} = \dfrac{3}{8}.$
Example 3: Find the probability distribution of the number of heads when three fair coins are tossed.
Solution: Let X be the number of heads in the toss of three fair coins.
The random variable "the number of heads" in a toss of three coins may be 0, 1, 2 or 3, associated with
the sample space S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
∴ X can take the values 0, 1, 2, 3, with
$P[X = 0] = P[TTT] = \dfrac{1}{8},\ P[X = 1] = P[HTT, THT, TTH] = \dfrac{3}{8},\ P[X = 2] = P[HHT, HTH, THH] = \dfrac{3}{8},$
$P[X = 3] = P[HHH] = \dfrac{1}{8}.$
The probability distribution of X, i.e. the number of heads when three coins are tossed simultaneously, is
X      0     1     2     3
p(x)   1/8   3/8   3/8   1/8
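The same distribution can be obtained by enumerating the sample space and counting heads. A short Python sketch, assuming the 8 outcomes are equally likely (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
counts = Counter(o.count("H") for o in outcomes)  # number of heads -> frequency
pmf = {x: Fraction(counts[x], len(outcomes)) for x in range(4)}
print(pmf)
```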

Probability Density Function

Let X be a continuous random variable which takes values in the interval (a, b) [i.e. all values between a
and b, a < b]. A function f(x) defined on X is called the probability density function of X if
(i) f(x) is non-negative for a ≤ x ≤ b, i.e. f(x) ≥ 0 for all x lying between a and b, and
(ii) the area under the graph of f(x) and above the interval (a, b) is 1, i.e. $\int_a^b f(x)\,dx = 1$.
Example 5: A continuous random variable X has the probability density function
$f(x) = Ax^3$, 0 ≤ x ≤ 1. Find (i) A, (ii) P[0.2 < X < 0.5].
Solution:
(i) As f(x) is a probability density function,
$\int_{\mathbb{R}} f(x)\,dx = 1 \Rightarrow \int_0^1 f(x)\,dx = 1 \Rightarrow \int_0^1 Ax^3\,dx = 1 \Rightarrow A\left[\dfrac{x^4}{4}\right]_0^1 = 1 \Rightarrow A\left(\dfrac{1}{4} - 0\right) = 1 \Rightarrow A = 4.$
(ii) $P[0.2 < X < 0.5] = \int_{0.2}^{0.5} f(x)\,dx = \int_{0.2}^{0.5} 4x^3\,dx = 4\left[\dfrac{x^4}{4}\right]_{0.2}^{0.5} = (0.5)^4 - (0.2)^4$
$= 0.0625 - 0.0016 = 0.0609$
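The exact answers above come from the antiderivative. As an independent numerical check, the sketch below approximates the same integrals with the composite midpoint rule; this numerical method is our substitution for illustration, not the method used in the text, and the helper names are hypothetical.

```python
def integrate(f, a, b, n=10_000):
    """Composite midpoint rule: approximates the area under f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x, A=4):
    return A * x**3  # density of Example 5, with A = 4 as found in part (i)

total = integrate(f, 0, 1)           # should be very close to 1
p_interval = integrate(f, 0.2, 0.5)  # should be very close to 0.0609
print(round(total, 4), round(p_interval, 4))
```

With 10,000 sub-intervals the midpoint error for this cubic is far below the four decimal places shown in the text.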


The expected value of a discrete random variable X is $E(X) = \sum_{i} x_i p_i$.

But if X is a continuous random variable having the probability density function f(x), then in place of
summation we use integration, and in this case the expected value of X is defined as

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
Example 3: Find the expectation of the number on an unbiased die when thrown.
Solution: Let X be a random variable representing the number on a die when thrown.
X can take the values 1, 2, 3, 4, 5, 6 with $P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = P[X = 5] = P[X = 6] = \dfrac{1}{6}$.
Thus, the probability distribution of X is given by
X:      1     2     3     4     5     6
p(x):   1/6   1/6   1/6   1/6   1/6   1/6
Hence, the expectation of the number on the die when thrown is

$E(X) = \sum_{i=1}^{6} x_i p_i = 1 \times \dfrac{1}{6} + 2 \times \dfrac{1}{6} + 3 \times \dfrac{1}{6} + 4 \times \dfrac{1}{6} + 5 \times \dfrac{1}{6} + 6 \times \dfrac{1}{6} = \dfrac{21}{6} = \dfrac{7}{2}$
Example 2: A player tosses two unbiased coins. He wins Rs 5 if 2 heads
appear, Rs 2 if one head appears and Rs 1 if no head appears. Find the
expected value of the amount won by him.
Solution: In tossing two unbiased coins, the sample space is S = {HH, HT, TH, TT}.
$\therefore P(\text{2 heads}) = \dfrac{1}{4},\ P(\text{one head}) = \dfrac{2}{4},\ P(\text{no head}) = \dfrac{1}{4}.$
Let X be the amount in rupees won by him.
∴ X can take the values 5, 2 and 1 with
$P[X = 5] = P(\text{2 heads}) = \dfrac{1}{4}$,
$P[X = 2] = P(\text{one head}) = \dfrac{2}{4}$, and
$P[X = 1] = P(\text{no head}) = \dfrac{1}{4}$.
∴ The probability distribution of X is
X:      5     2     1
p(x):   1/4   2/4   1/4
The expected value of X is given as
$E(X) = \sum_{i=1}^{3} x_i p_i = x_1p_1 + x_2p_2 + x_3p_3 = 5 \times \dfrac{1}{4} + 2 \times \dfrac{2}{4} + 1 \times \dfrac{1}{4} = \dfrac{5}{4} + \dfrac{4}{4} + \dfrac{1}{4} = \dfrac{10}{4} = 2.5.$
Thus, the expected value of amount won by him is Rs 2.5.
Example 5: For a continuous random variable X whose probability density function is given by
$f(x) = \dfrac{3}{4}x(2 - x),\ 0 \le x \le 2$, find the expected value of X.
Solution: The expected value of a continuous random variable X is given by
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 x \cdot \dfrac{3}{4}x(2 - x)\,dx = \dfrac{3}{4}\int_0^2 x^2(2 - x)\,dx$
$= \dfrac{3}{4}\int_0^2 (2x^2 - x^3)\,dx = \dfrac{3}{4}\left[\dfrac{2x^3}{3} - \dfrac{x^4}{4}\right]_0^2 = \dfrac{3}{4}\left[\dfrac{2(2)^3}{3} - \dfrac{(2)^4}{4}\right]$
$= \dfrac{3}{4}\left[\dfrac{16}{3} - \dfrac{16}{4}\right] = \dfrac{3}{4} \times \dfrac{16}{12} = 1$
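The three expectations above can be reproduced with a short Python sketch: a finite sum for the discrete cases and, for the continuous case, a midpoint-rule approximation of $\int x f(x)\,dx$ (a numerical stand-in for the exact integration in the text). The helper names are illustrative.

```python
from fractions import Fraction as F

def expectation_discrete(values, probs):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def expectation_continuous(f, a, b, n=10_000):
    """E(X) = integral of x f(x) dx over [a, b], via the composite midpoint rule."""
    h = (b - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))
    return h * sum(x * f(x) for x in mids)

die = expectation_discrete(range(1, 7), [F(1, 6)] * 6)               # 7/2
game = expectation_discrete([5, 2, 1], [F(1, 4), F(2, 4), F(1, 4)])  # 5/2, i.e. Rs 2.5
cont = expectation_continuous(lambda x: 0.75 * x * (2 - x), 0, 2)    # close to 1
print(die, game, round(cont, 4))
```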
Binomial Distribution (used when n is small and p > 0.05)
1) It involves a repetition of n identical trials.
2) The trials are independent of each other.
3) Each trial has two possible outcomes (success and failure).
4) The probability of success p is the same for every trial.
A discrete random variable X is said to follow the binomial distribution with parameters n and p if its probability
mass function is given by
$P[X = x] = \begin{cases} {}^nC_x\, p^x q^{n-x}; & x = 0, 1, 2, \ldots, n \\ 0; & \text{elsewhere} \end{cases}$
where n is the number of independent trials,
x is the number of successes in n trials,
p is the probability of success in each trial, and
q = 1 – p is the probability of failure in each trial.
Mean = np and Variance = npq, so Mean > Variance (since 0 < q < 1).
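A minimal Python sketch of the binomial p.m.f., using `math.comb` (Python 3.8+) and exact fractions. It also confirms Mean = np and Variance = npq for a fair coin tossed six times (n = 6, p = 1/2, the setting of Example 2 below). The function name is illustrative.

```python
from fractions import Fraction as F
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx p^x q^(n - x) for x = 0, 1, ..., n, and 0 elsewhere."""
    if not 0 <= x <= n:
        return 0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, F(1, 2)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2
print(mean == n * p, variance == n * p * (1 - p))  # True True
```

Partial sums of `pmf` give the "less than", "at most" and "at least" probabilities directly, e.g. `sum(pmf[:3])` is P[X < 3].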
Example 2: An unbiased coin is tossed six times. Find the probability of obtaining
(i) exactly 3 heads (ii) less than 3 heads (iii) more than 3 heads (iv) at most 3 heads
(v) at least 3 heads (vi) more than 6 heads
Solution: Let p be the probability of getting a head (success) in a toss of the coin and n be the number of trials.
$\therefore n = 6,\ p = \dfrac{1}{2}$ and hence $q = 1 - p = 1 - \dfrac{1}{2} = \dfrac{1}{2}$.
Let X be the number of successes in n trials.
∴ By the binomial distribution, we have
$P[X = x] = {}^nC_x\, p^x q^{n-x};\ x = 0, 1, 2, \ldots, n$
$= {}^6C_x \left(\dfrac{1}{2}\right)^x \left(\dfrac{1}{2}\right)^{6-x} = {}^6C_x \left(\dfrac{1}{2}\right)^6 = \dfrac{1}{64}\,{}^6C_x;\ x = 0, 1, 2, \ldots, 6.$

(i) P[exactly 3 heads] $= P[X = 3] = \dfrac{1}{64}\,{}^6C_3 = \dfrac{1}{64} \times \dfrac{6 \times 5 \times 4}{3 \times 2} = \dfrac{20}{64} = \dfrac{5}{16}$  [Recall ${}^nC_x = \dfrac{n!}{x!\,(n-x)!}$]

(ii) P[less than 3 heads] $= P[X < 3] = P[X = 2 \text{ or } X = 1 \text{ or } X = 0] = P[X = 2] + P[X = 1] + P[X = 0]$
$= \dfrac{1}{64}\,{}^6C_2 + \dfrac{1}{64}\,{}^6C_1 + \dfrac{1}{64}\,{}^6C_0 = \dfrac{1}{64}\left[{}^6C_2 + {}^6C_1 + {}^6C_0\right] = \dfrac{1}{64}\left[\dfrac{6 \times 5}{2} + 6 + 1\right] = \dfrac{22}{64} = \dfrac{11}{32}.$

(iii) P[more than 3 heads] $= P[X > 3] = P[X = 4 \text{ or } X = 5 \text{ or } X = 6]$  [∵ in 6 trials one can have at most 6 heads]
$= P[X = 4] + P[X = 5] + P[X = 6] = \dfrac{1}{64}\,{}^6C_4 + \dfrac{1}{64}\,{}^6C_5 + \dfrac{1}{64}\,{}^6C_6$
$= \dfrac{1}{64}\left[\dfrac{6 \times 5}{2} + 6 + 1\right] = \dfrac{22}{64} = \dfrac{11}{32}.$

(iv) P[at most 3 heads] = P[3 or less than 3 heads] $= P[X = 3] + P[X = 2] + P[X = 1] + P[X = 0]$
$= \dfrac{1}{64}\,{}^6C_3 + \dfrac{1}{64}\,{}^6C_2 + \dfrac{1}{64}\,{}^6C_1 + \dfrac{1}{64}\,{}^6C_0 = \dfrac{1}{64}[20 + 15 + 6 + 1] = \dfrac{42}{64} = \dfrac{21}{32}.$

(v) P[at least 3 heads] = P[3 or more heads] $= P[X = 3] + P[X = 4] + P[X = 5] + P[X = 6]$
$= 1 - \left[P[X = 0] + P[X = 1] + P[X = 2]\right]$  [∵ the sum of the probabilities of all possible values of a random variable is 1]
$= 1 - \dfrac{11}{32} = \dfrac{21}{32}.$

(vi) P[more than 6 heads] = P[7 or more heads] = P[an impossible event] = 0
Example 3: The chance of catching a cold for workers working in an ice factory during winter is 25%. What is
the probability that out of 5 workers, 4 or more will catch a cold?
Solution: Let catching a cold be a success and p be the probability of success for each worker.
∴ Here, n = 5, p = 0.25, q = 0.75 and, by the binomial distribution,
$P[X = x] = {}^nC_x\, p^x q^{n-x} = {}^5C_x (0.25)^x (0.75)^{5-x};\ x = 0, 1, 2, \ldots, 5$

Therefore, the required probability $= P[X \ge 4] = P[X = 4 \text{ or } X = 5] = P[X = 4] + P[X = 5]$

$= {}^5C_4 (0.25)^4 (0.75)^1 + {}^5C_5 (0.25)^5 (0.75)^0 = 0.0146484375 + 0.0009765625 = 0.015625$
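Summing the two binomial terms with exact fractions shows that the probability in Example 3 is exactly 1/64 = 0.015625. A short Python sketch of an "at least k successes" helper (the name is illustrative; it assumes the binomial model of the example):

```python
from fractions import Fraction
from math import comb

def binomial_at_least(k, n, p):
    """P(X >= k) = sum over x = k..n of nCx p^x (1 - p)^(n - x)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

prob = binomial_at_least(4, 5, Fraction(1, 4))
print(prob, float(prob))  # 1/64 0.015625
```

The same helper with floats, `binomial_at_least(4, 5, 0.6)`, gives about 0.337, the answer to part (ii) of Problem 3 below.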

E1) The probability of a man hitting a target is 1/4. He fires 5 times. What is the probability of his hitting the
target at least twice?
E2) A policeman fires 6 bullets on a dacoit. The probability that the dacoit will be killed by a bullet is 0.6.
What is the probability that the dacoit is still alive?

Example 2: You are sitting in a plane waiting for its take off. The pilot announces a delay until some incoming
planes land. Suppose you want to find the following:
i) how long it will be before take-off;
ii) how many incoming planes there are.
Problem 3: It has been claimed that in 60% of all solar heat installations, the utility bill is reduced by at least
one-third. Accordingly, what are the probabilities that the utility bill will be reduced by at least one-third in
i) four of five installations?
ii) at least four of the five installations?
Solution: Here the random variable follows the binomial distribution with p = 0.6, r = 4 and n = 5.
To find (i), we have to calculate P[X = 4], which is given by
$P[X = 4] = {}^5C_4 (0.6)^4 (0.4) = 0.259$
Now to find (ii), we have to find the probability that X is at least 4. This probability is the sum of the
probabilities that X = 4 and X = 5, because 'at least 4' means '4 or more'.
Thus we have to find P[X = 4] + P[X = 5].
$P[X = 5] = {}^5C_5 (0.6)^5 = 0.078$
∴ The required probability = 0.259 + 0.078 = 0.337.

Poisson Distribution (used when n is large and p < 0.05)

The Poisson distribution is a limiting case of the binomial distribution under the following conditions:
i) n, the number of trials, is indefinitely large, i.e. n → ∞.
ii) p, the constant probability of success for each trial, is very small, i.e. p → 0.
iii) np is a finite quantity, say λ.
Definition: A random variable X is said to follow the Poisson distribution if its probability mass function is given by
$p(x) = P[X = x] = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}; & x = 0, 1, 2, 3, \ldots \text{ and } \lambda > 0 \\ 0; & \text{elsewhere} \end{cases}$

Mean = λ and Variance = λ, i.e. Mean = Variance.
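To see the limiting behaviour numerically, the sketch below compares binomial probabilities for a large n and small p with Poisson probabilities for λ = np. The values n = 1500 and p = 0.001 are the ones used in the solution of Example 2 later in this section; the helper names are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x!  for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) = nCx p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Large n, small p, with n * p = lam held fixed
n, p = 1500, 0.001
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
```

For these values the two columns agree to within about 0.0002, which is why the Poisson formula is used as a stand-in for the binomial when n is large and p is small.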

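When the probability of x arrivals in a time interval of length t is required, we use

$p(x) = P[X = x] = \begin{cases} \dfrac{e^{-\lambda t}(\lambda t)^x}{x!}; & x = 0, 1, 2, 3, \ldots \text{ and } \lambda > 0 \\ 0; & \text{elsewhere} \end{cases}$

where λ is the average arrival rate per unit of time and x is the number of arrivals in t units of time.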

Also we know that  = 72 arrivals per hour is a constant for this situation. Since in the question  is given in
‘hour’, to standardise the unit, we have to find ‘t’ in hour.
1 1
i.e. 60 minutes = 1 hour 3 minutes = hour t = hour
20 20
1 1 
 72  72 
e 20  20 
e  3.6 (3.6) 4
p(4)  
4! 4!
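A minimal Python sketch of the time-interval form $P[X = x] = e^{-\lambda t}(\lambda t)^x / x!$; the function and parameter names are illustrative. It evaluates p(4) for λ = 72 arrivals per hour and t = 3 minutes = 1/20 hour, so that λt = 3.6.

```python
from math import exp, factorial

def poisson_in_interval(x, rate, t):
    """P(x arrivals in an interval of length t) = e^(-rate*t) (rate*t)^x / x!."""
    m = rate * t  # expected number of arrivals in the interval
    return exp(-m) * m**x / factorial(x)

# rate in arrivals per hour, t in hours (3 minutes = 3/60 hour)
print(round(poisson_in_interval(4, rate=72, t=3 / 60), 4))  # about 0.1912
```

Keeping `rate` and `t` in the same time unit is exactly the standardisation step described in the text.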
Note: In most cases for the Poisson distribution, if we are to compute probabilities of the
type P[X > a] or P[X ≥ a], we write them as P[X > a] = 1 − P[X ≤ a] and
P[X ≥ a] = 1 − P[X < a], because X can take the values 0, 1, 2, … without any upper limit, so we cannot add up to a last value;
hence the probability is written in terms of its complementary probability, which is a finite sum.
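The complement trick in the note can be written as follows. The helper names are illustrative, and the example value λ = 1.5 is the one used in Example 2 below; the complementary form needs only a finite sum.

```python
from math import exp, factorial

def poisson_cdf(a, lam):
    """P(X <= a) = sum of e^(-lam) lam^x / x! for x = 0..a."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(a + 1))

def poisson_greater_than(a, lam):
    """P(X > a) = 1 - P(X <= a)."""
    return 1 - poisson_cdf(a, lam)

def poisson_at_least(a, lam):
    """P(X >= a) = 1 - P(X < a) = 1 - P(X <= a - 1)."""
    return 1 - poisson_cdf(a - 1, lam)

print(round(poisson_greater_than(2, 1.5), 4))
```

Carried to more decimal places, P[X > 2] for λ = 1.5 is about 0.1912; the 0.1913 obtained in Example 2 comes from rounding $e^{-1.5}$ to 0.2231 before multiplying.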
Example 2: If the probability that an individual suffers a bad reaction from an injection of a given serum is 0.001, determine the probability that out of 1500 individuals
i) exactly 3, ii) more than 2 individuals suffer from a bad reaction.
Solution: Let X be the Poisson variate "number of individuals suffering from a bad reaction". Then
n = 1500, p = 0.001, ∴ λ = np = (1500)(0.001) = 1.5
∴ By Poisson distribution,
$P[X = x] = \dfrac{e^{-\lambda}\lambda^x}{x!} = \dfrac{e^{-1.5}(1.5)^x}{x!}; \quad x = 0, 1, 2, \ldots$
i) The desired probability $= P[X = 3] = \dfrac{e^{-1.5}(1.5)^3}{3!} = \dfrac{0.2231 \times 3.375}{6} = 0.1255$
ii) The desired probability
$P[X > 2] = 1 - P[X \le 2] = 1 - \left(P[X = 2] + P[X = 1] + P[X = 0]\right)$
$= 1 - \left(\dfrac{e^{-1.5}(1.5)^2}{2!} + \dfrac{e^{-1.5}(1.5)^1}{1!} + \dfrac{e^{-1.5}(1.5)^0}{0!}\right)$
$= 1 - e^{-1.5}\left(\dfrac{2.25}{2} + 1.5 + 1\right) = 1 - 3.625\,e^{-1.5} = 1 - 3.625 \times 0.2231 = 1 - 0.8087 = 0.1913$
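The same two probabilities can be computed in Python. This sketch uses the exact value of $e^{-1.5}$ rather than the four-figure table value 0.2231, so part (ii) comes out as about 0.1912 instead of 0.1913; the difference is only rounding.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P[X = x] for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 1500, 0.001
lam = n * p                                   # lambda = np = 1.5

p_exactly_3 = poisson_pmf(3, lam)             # (i)
# (ii) "more than 2" has no upper limit, so use the complement 1 - P[X <= 2]
p_more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))

print(round(p_exactly_3, 4), round(p_more_than_2, 4))
```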
Example 3: If the mean of a Poisson distribution is 1.44, find its variance.
Solution: Here, mean = 1.44 ⇒ λ = 1.44
Hence, Variance = λ = 1.44
Example 4: If a Poisson variate X is such that P[X = 1] = 2P[X = 2], find the mean and variance of the distribution.
Solution: Let λ be the mean of the distribution; hence by Poisson distribution,
$P[X = x] = \dfrac{e^{-\lambda}\lambda^x}{x!}; \quad x = 0, 1, 2, \ldots$
Now, P[X = 1] = 2P[X = 2]
$\Rightarrow \dfrac{e^{-\lambda}\lambda^1}{1!} = 2\,\dfrac{e^{-\lambda}\lambda^2}{2!} \Rightarrow \lambda = \lambda^2 \Rightarrow \lambda^2 - \lambda = 0 \Rightarrow \lambda(\lambda - 1) = 0 \Rightarrow \lambda = 0, 1$
But λ = 0 is rejected
[since if λ = 0 then either n = 0 or p = 0, which implies that the Poisson distribution does not exist in this case.]
Hence mean = λ = 1, and Variance = λ = 1.

Example 1: It is known that the number of heavy trucks arriving at a railway station follows the Poisson distribution. If the average number of truck arrivals during a specified period of an hour is 2, find the probabilities that during a given hour
a) no heavy truck arrives, b) at least two trucks arrive.
Solution: Here, the average number of truck arrivals is 2,
i.e. mean = 2 ⇒ λ = 2
Let X be the number of trucks arriving during a given hour;
∴ by Poisson distribution, we have $P[X = x] = \dfrac{e^{-\lambda}\lambda^x}{x!} = \dfrac{e^{-2}2^x}{x!}; \quad x = 0, 1, 2, \ldots$
(a) P[arrival of no heavy truck] $= P[X = 0] = \dfrac{e^{-2}2^0}{0!} = e^{-2} = 0.1353$ [see the table given in the Appendix at the end of this unit]
(b) P[arrival of at least two trucks] $= P[X \ge 2] = P[X = 2] + P[X = 3] + \ldots = 1 - \left(P[X = 0] + P[X = 1]\right)$
$= 1 - \left(\dfrac{e^{-2}2^0}{0!} + \dfrac{e^{-2}2^1}{1!}\right) = 1 - e^{-2}(1 + 2) = 1 - 0.1353 \times 3 = 0.5941$
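A short Python check of the truck example. Again the exact $e^{-2}$ is used instead of the table value 0.1353, so (b) prints 0.594 rather than 0.5941.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P[X = x] for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2                                       # mean arrivals per hour
p_none = poisson_pmf(0, lam)                  # (a) P[X = 0] = e^-2
# (b) complement rule: P[X >= 2] = 1 - (P[X = 0] + P[X = 1])
p_at_least_two = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))
print(round(p_none, 4), round(p_at_least_two, 4))
```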
E1) Assume that the chance of an individual coal miner being killed in a mine accident during a year is 1/1400. Use the Poisson distribution to calculate the probability that in a mine employing 350 miners, there will be at least one fatal accident in a year. (Use $e^{-0.25} \approx 0.78$.)
E2) The mean and standard deviation of a Poisson distribution are 6 and 2
respectively. Test the validity of this statement.
E3) For a Poisson distribution, it is given that P[X = 1] = P[X = 2], find the value of mean of distribution.
Hence find P[X = 0] and P[X = 4].

1. A policeman fires 6 bullets at a dacoit. The probability that the dacoit will be killed by a bullet is 0.6.
What is the probability that the dacoit is still alive?
2. It has been claimed that in 60% of all solar heat installations, the utility bill is reduced by at least one-
third. Accordingly, what are the probabilities that the utility bill will be reduced by one-third in
i) four of five installations?
ii) at least four of the five installations?
3. An oil exploration firm plans to drill six holes. It is believed that the probability that each hole will yield
oil is 0.1. Since the holes are in quite different locations, the outcome of drilling one hole is statistically
independent of that of drilling any of the other holes.
(a) If the firm will be able to stay in business only if two or more holes produce oil, what is the
probability of its staying in business?
(b) Give the expected value of the number of holes that result in oil.
4. If a bank receives on average λ = 6 bad cheques per day, what is the probability that it will receive 4 bad cheques on any given day?
5. A hospital has 20 kidney dialysis machines, and the chance of any one of them malfunctioning during any day is 0.02. We want to find the probability that exactly 3 machines will be out of service on the same day. Then,
i) Can we use the binomial formula to find this probability? If yes, calculate the probability.
ii) Can we use the Poisson formula to find this? If yes calculate the probability.

Uniform Distribution
Definition: A random variable X is said to follow the uniform distribution in the interval [a, b], where a < b, if its probability density function (pdf) is given by:
$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
Mean $= \dfrac{a + b}{2}$ and Variance $= \dfrac{(b - a)^2}{12}$
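The uniform formulas can be illustrated with a small Python sketch. The interval [2, 8] is a hypothetical choice, not taken from any exercise here; the sketch computes the mean and variance from the formulas, cross-checks the variance by numerical integration, and finds an interval probability as overlap length divided by b − a.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_prob(c: float, d: float, a: float, b: float) -> float:
    """P[c <= X <= d]: overlap of [c, d] with [a, b], divided by the length b - a."""
    overlap = min(d, b) - max(c, a)
    return max(overlap, 0.0) / (b - a)


a, b = 2.0, 8.0                      # hypothetical interval for illustration
mean = (a + b) / 2                   # (a + b) / 2
variance = (b - a) ** 2 / 12         # (b - a)^2 / 12

# Cross-check the variance with a midpoint-rule integral of (x - mean)^2 f(x) over [a, b].
steps = 10_000
h = (b - a) / steps
numeric_var = sum(
    (a + (i + 0.5) * h - mean) ** 2 * uniform_pdf(a + (i + 0.5) * h, a, b) * h
    for i in range(steps)
)

print(mean, variance, round(numeric_var, 6), uniform_prob(3, 5, a, b))
```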

E 14) Suppose that the weight of sugar obtained from processing a tank of sugar cane juice is uniformly distributed with a mean of 10 kg and a range of 1.8 kg. Then

i) What are the largest and smallest weights of sugar obtained from a tank of sugar cane juice?

ii) What is the probability that a tank of juice will yield sugar weighing between 9 kg. and 10.5 kg.?

E 15) A train is due to arrive at 5.30 p.m. but in practice is equally likely to arrive at any time between 2
minutes early and 30 minutes late. Let the time of arrival (expressed as minutes from due time) be X.
Sketch the pdf f(x) of the r.v. X and shade the areas given below.

1) The probability that the train is less than 10 minutes late.

2) The probability that the train is late, but less than 16 minutes late.
6. Verify whether or not the following situations can be described by a uniform distribution.
a) The average life span of a light bulb produced by a manufacturing company.
b) The number of defective items produced by an assembly process.

Normal Distribution
Definition: A continuous random variable X is said to follow normal distribution with parameters μ (−∞ < μ < ∞) and σ² (σ > 0) if its probability density function (pdf) is given by
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$
Mean = μ and Variance = σ²

In short we write normal distribution as X ~ N(μ, σ²) and read it as "X follows normal distribution with mean μ and variance σ²".
i) The curve of the normal distribution is bell-shaped as shown

ii) The curve of the distribution is completely symmetrical about x = μ, i.e. if we fold the curve at x = μ, both parts of the curve are mirror images of each other.
iii) For normal distribution, Mean = Median = Mode
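iv) Area property:
$P[\mu - \sigma < X < \mu + \sigma] = \int_{\mu - \sigma}^{\mu + \sigma} f(x)\,dx = 0.6827,$
$P[\mu - 2\sigma < X < \mu + 2\sigma] = \int_{\mu - 2\sigma}^{\mu + 2\sigma} f(x)\,dx = 0.9544,$
$P[\mu - 3\sigma < X < \mu + 3\sigma] = \int_{\mu - 3\sigma}^{\mu + 3\sigma} f(x)\,dx = 0.9973.$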
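The three areas can be reproduced without tables through the standard relation $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, using Python's `math.erf`. This is a sketch; `std_normal_cdf` is our own helper name.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P[Z <= z] for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# P[mu - k*sigma < X < mu + k*sigma] = P[-k < Z < k], whatever mu and sigma are.
areas = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sigma: {area:.4f}")
```

The printed values are 0.6827, 0.9545 and 0.9973; the 0.9544 quoted above comes from doubling the four-figure table entry 0.4772.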

Standard Normal Distribution (SND)

If we put $Z = \dfrac{X - \text{mean}}{\text{SD}} = \dfrac{X - \mu}{\sigma}$ in the normal distribution, then the normal distribution transforms into the standard normal distribution as
$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$
The mean of SND is 0 and its variance is 1. That is, if X ~ N(μ, σ²) then Z ~ N(0, 1).
Note: To find a probability by using the normal distribution, we first transform the normal variate X to the standard normal variate (S.N.V.) Z as $Z = \dfrac{X - \text{mean}}{\text{SD}} = \dfrac{X - \mu}{\sigma}$.
This is because the computation of $\int_a^b \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx$ requires the construction of separate tables for different values of μ and σ: the normal variate X may have any mean and standard deviation, so different tables would be required for different μ and σ. Infinitely many tables would then have to be constructed, which is impossible. The beauty of the standard normal variate is that its mean is always '0' and its standard deviation is always '1', as shown in Unit 13. So, whatever the mean and standard deviation of a normal variate, after transforming it to the standard normal variate they are always '0' and '1' respectively, and hence only one table is required.
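The argument for a single table can be checked numerically. The sketch below (illustrative; the parameter sets are made up) computes P[a < X < b] twice: once by integrating the N(μ, σ²) density directly and once by standardising the limits and using one standard normal cdf. The two agree, and the last two cases, which standardise to the same limits −1 and 2, give identical answers.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def std_normal_cdf(z: float) -> float:
    """The single 'table' we need: Phi(z) = P[Z <= z] for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_by_integration(a, b, mu, sigma, steps=100_000):
    """P[a < X < b] by integrating the N(mu, sigma^2) density directly (midpoint rule)."""
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))


def prob_by_standardising(a, b, mu, sigma):
    """P[a < X < b] = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)


for mu, sigma, a, b in [(80, 5, 72, 90), (0, 1, -1, 2), (500, 40, 460, 580)]:
    print(mu, sigma, round(prob_by_integration(a, b, mu, sigma), 6),
          round(prob_by_standardising(a, b, mu, sigma), 6))
```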
Example 1: If the r.v. X is normally distributed with mean 80 and standard deviation 5, then find
(i) P[X > 95], (ii) P[X < 72], (iii) P[60.5 < X < 90].

Solution: Here we are given that X is normally distributed with mean 80 and standard deviation (S.D.) 5,
i.e. Mean = μ = 80 and variance = σ² = (S.D.)² = 25.
If Z is the S.N.V., then $Z = \dfrac{X - \mu}{\sigma} = \dfrac{X - 80}{5}$
(i) When X = 95, $Z = \dfrac{95 - 80}{5} = \dfrac{15}{5} = 3$
∴ $P[X > 95] = P\left[\dfrac{X - 80}{5} > \dfrac{95 - 80}{5}\right] = P[Z > 3]$
= 0.5 − P[0 ≤ Z ≤ 3] = 0.5 − 0.4987 = 0.0013 [using the table of areas under the normal curve]
(ii) When X = 72, $Z = \dfrac{72 - 80}{5} = \dfrac{-8}{5} = -1.6$
∴ P[X < 72] = P[Z < −1.6]
= P[Z > 1.6] [since the normal curve is symmetrical about the line Z = 0]
= 0.5 − P[0 ≤ Z ≤ 1.6]
= 0.5 − 0.4452 [using the table of areas under the normal curve]
= 0.0548
(iii) When X = 60.5, $Z = \dfrac{60.5 - 80}{5} = \dfrac{-19.5}{5} = -3.9$
When X = 90, $Z = \dfrac{90 - 80}{5} = \dfrac{10}{5} = 2$
∴ P[60.5 < X < 90] = P[−3.9 < Z < 2]
= P[−3.9 < Z < 0] + P[0 < Z < 2]
= P[0 < Z < 3.9] + P[0 < Z < 2] [since the normal curve is symmetrical about the line Z = 0]
= 0.5000 + 0.4772
= 0.9772
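A Python version of the three parts, with Φ computed from `math.erf` in place of the printed table (an assumption of this sketch, not the text's method):

```python
from math import erf, sqrt

MU, SIGMA = 80, 5


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P[Z <= z] for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_score(x: float) -> float:
    """Standardise X ~ N(80, 5^2) to Z = (X - 80) / 5."""
    return (x - MU) / SIGMA


p_i = 1 - std_normal_cdf(z_score(95))                                 # P[X > 95] = P[Z > 3]
p_ii = std_normal_cdf(z_score(72))                                    # P[X < 72] = P[Z < -1.6]
p_iii = std_normal_cdf(z_score(90)) - std_normal_cdf(z_score(60.5))  # P[-3.9 < Z < 2]

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
```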
7. A filling machine is set to pour 952 ml (millilitres) of oil into bottles. The amounts of fill are normally distributed with a mean of 952 ml and a standard deviation of 4 ml. Use the standard normal table to find the probability that a bottle contains between 952 and 956 ml of oil.
8. For each of these write down the equivalent standard normal probability.
a) The number of people who visit a historic monument in a week is normally distributed with a mean
of 10,500 and a standard deviation of 600. Consider the probability that fewer than 9000 people visit
in a week.
b) The number of cheques processed by a bank each day is normally distributed with a mean of 30,100
and a standard deviation of 2450. Consider the probability that the bank processes more than 32,000
cheques in a day.

A group of elements or units under study by an analyst is called a population. For example, the collection of
books in a library, the particles in a salt bag, the rivers in India, the students in a classroom, etc. are considered
as populations in Statistics.
The total number of elements / items / units / observations in a population is known as population size and
denoted by N. The characteristic under study may be denoted by X or Y.
A sample is a part / fraction / subset of the population. The procedure of drawing a sample from the population
is called sampling. The number of units selected in a sample is known as sample size and it is denoted by n.
Complete Survey and Sample Survey
(1) Complete Survey or Complete Enumeration or Census
When each and every element or unit of the population is investigated or studied for the characteristics under
study then we call it complete survey or census. For example, suppose we want to find out the average height
of the students of a study centre then if we measure the height of each and every student of this study centre to
find the average height of the students then such type of survey is called complete survey.
(2) Sample Survey or Sample Enumeration
When only a part or a small number of elements or units (i.e. sample) of population are investigated or studied
for the characteristics under study then we call it sample survey or sample enumeration. In the above
example, if we select some students of this study centre and measure the height to find average height of the
students then such type of survey is called sample survey.
Simple Random Sampling or Random Sampling
A sampling technique is said to be simple random sampling if the sample is drawn in such a way that each
element or unit of the population has an equal and independent chance of being included in the sample. If a
sample is drawn by this method, it is known as a simple random sample or random sample. The random sample of size n is denoted by $X_1, X_2, \ldots, X_n$ or $Y_1, Y_2, \ldots, Y_n$, and the observed values of this sample are denoted by $x_1, x_2, \ldots, x_n$ or $y_1, y_2, \ldots, y_n$.

(1) Simple Random Sampling without Replacement (SRSWOR)

In simple random sampling, if the elements or units are drawn one by one and a unit drawn at one draw is not replaced in the population before the subsequent draws, the method is called SRSWOR. If we draw a sample of size n from a population of size N without replacement, then the total number of possible samples is ${}^{N}C_{n}$. For example, consider a population that consists of three elements, A, B and C. Suppose we wish to draw a random sample of two elements; then N = 3 and n = 2. The total number of possible random samples without replacement is ${}^{N}C_{n} = {}^{3}C_{2} = 3$, namely (A, B), (A, C) and (B, C).
(2) Simple Random Sampling with Replacement (SRSWR)
In simple random sampling, if the elements or units are drawn one by one and a unit drawn at one draw is replaced in the population before the subsequent draw, the method is called SRSWR. In this method, the same element or unit can appear more than once in the sample, and the probability of selecting a unit at each draw remains the same, i.e. 1/N. In this method, the total number of possible samples is $N^n$. In the above example, the total number of possible random samples with replacement is $N^n = 3^2 = 9$, namely (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B) and (C, C).
Example 2: If population size is 6 then how many samples of size of 4 are possible with replacement?
Solution: Here, we are given that Population size = N = 6 and Sample size = n = 4
Since the number of all possible samples of size n taken from a population of size N with replacement is $N^n$, in our case $N^n = 6^4 = 1296$.
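The counts for both sampling schemes can be confirmed by listing the samples with Python's `itertools`: `combinations` gives the unordered samples without replacement and `product` the ordered samples with replacement. A minimal sketch using the A, B, C example and Example 2:

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C"]
n = 2

srswor = list(combinations(population, n))    # without replacement: unordered, no repeats
srswr = list(product(population, repeat=n))   # with replacement: ordered, repeats allowed

print(len(srswor), srswor)   # NCn = 3C2 = 3
print(len(srswr), srswr)     # N^n = 3^2 = 9

# Example 2: N = 6, n = 4 with replacement
print(len(list(product(range(6), repeat=4))))  # 6^4 = 1296
```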
A parameter is a function of population values which is used to represent a certain characteristic of the population. For example, population mean, population variance, population coefficient of variation, population correlation coefficient, etc. are all parameters. The population mean is usually denoted by µ and the population variance by σ².
Any quantity which is calculated from sample values and does not contain any unknown parameter is known as a statistic. For example, if $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a population with mean µ and variance σ² (both unknown), then the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a statistic, whereas $\bar{X} - \mu$ and $\bar{X}/\sigma$ are not statistics because both are functions of unknown parameters.
Sample Mean and Sample Variance
If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a population whose probability density (mass) function is f(x, θ), then the sample mean is defined as
$\bar{X} = \dfrac{X_1 + X_2 + \ldots + X_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
and the sample variance is defined as
$S^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$
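The two definitions translate directly into code. In this sketch the data values are illustrative, and the result is compared with Python's `statistics.variance`, which also uses the divisor n − 1.

```python
from statistics import variance


def sample_mean(xs):
    """X-bar = (1/n) * sum of X_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S^2 = sum of (X_i - X-bar)^2 divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [4, 2, 3, 1]   # illustrative sample values
print(sample_mean(data), sample_variance(data), variance(data))
```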
Statistical Inference
Generally population parameters are unknown, and when the population is too large, or the units of the population are destructive in nature, or resources and manpower are limited, it is not practically possible to examine each and every unit of the population to obtain the population parameters. In such situations, one can draw a sample from the population under study and utilize the sample observations to draw reliable conclusions about the population parameters.
The technique of drawing reliable conclusions about the population on the basis of a sample drawn from it is known as statistical inference. Statistical inference may be divided into two areas:
(i) The population parameters are unknown and we may want to guess the true values of the unknown parameters on the basis of a random sample drawn from the population. This type of problem is known as "Estimation".
(ii) Some information is available about the population or parameter and we may like to verify whether the information is true or not on the basis of a random sample drawn from the population. This type of problem is known as "Testing of hypothesis".
Sampling Distribution of Mean
A list of all possible values for a sample mean with probability associated with each value is called a sampling
distribution of the mean.
Consider a population comprising four typists who type the sample page of a manuscript. The number of errors
made by each typist is shown below:
Typist Number of Errors
A 4
B 2
C 3
D 1
i) Calculate the population mean
ii) How many samples of size 2 are possible?

iii) Construct the sampling distribution of means by taking samples of size 2 and organise the data.
iv) Calculate the mean of the sampling distribution and compare it with the population mean.

Solution: The population mean (average number of errors) can be obtained as
$\mu = \dfrac{4 + 2 + 3 + 1}{4} = 2.5$
The number of possible samples of size 2 with replacement is $N^n = 4^2 = 16$. The possible samples and the sample mean of each sample are shown in the following table:
Sample No. Sample (in terms of Typists) Sample Observations Sample Mean (X̄)
1 (A, A) (4, 4) 4.0
2 (A, B) (4, 2) 3.0
3 (A, C) (4, 3) 3.5
4 (A, D) (4, 1) 2.5
5 (B, A) (2, 4) 3.0
6 (B, B) (2, 2) 2.0
7 (B, C) (2, 3) 2.5
8 (B, D) (2, 1) 1.5
9 (C, A) (3, 4) 3.5
10 (C, B) (3, 2) 2.5
11 (C, C) (3, 3) 3.0
12 (C, D) (3, 1) 2.0
13 (D, A) (1, 4) 2.5
14 (D, B) (1, 2) 1.5
15 (D, C) (1, 3) 2.0
16 (D, D) (1, 1) 1.0

The sampling distribution of the sample mean is shown as:

S. No. X̄ Frequency (f) Probability (p)
1 1.0 1 1/16 = 0.0625
2 1.5 2 2/16 = 0.1250
3 2.0 3 3/16 = 0.1875
4 2.5 4 4/16 = 0.2500
5 3.0 3 3/16 = 0.1875
6 3.5 2 2/16 = 0.1250
7 4.0 1 1/16 = 0.0625
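The table can be rebuilt by brute force: list all 16 ordered samples of size 2, average each, and count how often each mean occurs. The sketch uses `fractions.Fraction` so that probabilities stay exact, and it also answers part (iv) by computing the mean of the sampling distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

errors = {"A": 4, "B": 2, "C": 3, "D": 1}              # typist -> number of errors
mu = Fraction(sum(errors.values()), len(errors))        # population mean, 5/2

samples = list(product(errors, repeat=2))               # 4^2 = 16 ordered samples, with replacement
means = [Fraction(errors[a] + errors[b], 2) for a, b in samples]

freq = Counter(means)
dist = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}
mean_of_means = sum(m * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), freq[m], p)
print("population mean:", float(mu), "mean of sample means:", float(mean_of_means))
```

The mean of the sample means comes out as 2.5, equal to the population mean μ.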

E) The ages of six executives of a company are

Name Age
Mr. Ravi 54
Mrs. Veena 50
Mrs. Shanti 52
Mr. Suresh 48
(i) How many samples of size 3 are possible?
(ii) Construct the sampling distribution of means by taking samples of size 3 and organise the data.
(iii) Calculate the mean of the sampling distribution and compare it with the population mean.
E4) If the lives of 3 televisions of a certain company are 8, 6 and 10 years, construct the sampling distribution of the average life of the televisions by taking all samples of size 2.

Note: The mean of the sampling distribution of the mean is equal to the population mean, that is, $\mu_{\bar{X}} = \mu$.
If the samples are drawn from a normal population with mean µ and variance σ², then the sampling distribution of the mean $\bar{X}$ is also a normal distribution with mean µ and variance σ²/n, that is,
if $X_i \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
The standard deviation of the sampling distribution of a statistic is known as its standard error and is denoted by SE. If $X_1, X_2, \ldots, X_n$ is a random sample of size n drawn without replacement from a finite population of size N with mean µ and variance σ², then the standard error of the sample mean ($\bar{X}$) is given by
$SE(\bar{X}) = \sqrt{\dfrac{N - n}{N - 1}}\,\dfrac{\sigma}{\sqrt{n}}$
where N is the population size, n the sample size and σ the population standard deviation.

If the population is infinite or very large (or sampling is with replacement), then the SE of the sample mean is $SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$.
Example 3: Diameter of a steel ball bearing produced by a semi-automatic machine is known to be distributed
normally with mean 12 cm and standard deviation 0.1 cm. If we take a random sample of size 10 with
replacement then find standard error of sample mean for estimating the population mean of diameter of steel
ball bearing for whole population.
Solution: Here, we are given that μ = 12, σ = 0.1, n = 10.
Since the sampling is done with replacement, the standard error of the sample mean for estimating the population mean is given by
$SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.1}{\sqrt{10}} = 0.0316 \approx 0.03$
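A small helper covering both standard error formulas. The finite-population case (N = 50) is a hypothetical addition for comparison; the example itself uses σ/√n.

```python
from math import sqrt
from typing import Optional


def se_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the sample mean.

    N is None: infinite population or sampling with replacement, sigma / sqrt(n).
    N given:  SRSWOR from a finite population of size N, multiplied by
              the factor sqrt((N - n) / (N - 1)).
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


print(round(se_mean(0.1, 10), 4))           # Example 3: about 0.0316
print(round(se_mean(0.1, 10, N=50), 4))     # hypothetical finite population of 50 bearings
```

When N is very large the correction factor approaches 1, so the two formulas agree.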
Example 1: Diameter of a steel ball bearing produced on a semi-automatic machine is known to be distributed
normally with mean 12 cm and standard deviation 0.1 cm. If we take a random sample of size 10 then find
(i) Mean and variance of sampling distribution of mean.
(ii) The probability that the sample mean lies between 11.95 cm and 12.05 cm.
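One way to work this example numerically, assuming the sample mean is exactly normal with mean μ and variance σ²/n (true here because the population is normal), with Φ taken from `math.erf` rather than a table:

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P[Z <= z] for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma, n = 12, 0.1, 10
mean_xbar = mu                  # E(X-bar) = mu
var_xbar = sigma**2 / n         # Var(X-bar) = sigma^2 / n
se = sqrt(var_xbar)

# P[11.95 < X-bar < 12.05] after standardising with Z = (X-bar - mu) / SE
prob = std_normal_cdf((12.05 - mu) / se) - std_normal_cdf((11.95 - mu) / se)
print(mean_xbar, round(var_xbar, 4), round(prob, 4))
```

The sketch gives mean 12, variance 0.001 and a probability of about 0.886.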

Sampling Distribution of Proportion

A list of all possible values of a sample proportion, with the probability associated with each value, is called the sampling distribution of the proportion.
Example: Suppose there is a lot of 3 cartons A, B and C of electric bulbs, and each carton contains 20 bulbs. The
number of defective bulbs in each carton is given below:
Carton Number of Defectives Bulbs
A 2
B 4
C 1
(i) Calculate the population proportion of defective bulbs
(ii) How many samples of size 2 are possible?
(iii)Construct the sampling distribution of proportion by taking samples of size 2 and organise the data.
(iv) Calculate the mean of the sampling distribution of proportion and compare it with the population proportion.
Solution: The population proportion of defective bulbs can be obtained as
$P = \dfrac{2 + 4 + 1}{20 + 20 + 20} = \dfrac{7}{60}$
Sample No. Sample (Cartons) Sample Observations Sample Proportion (p)
1 (A, A) (2, 2) 4/40
2 (A, B) (2, 4) 6/40
3 (A, C) (2, 1) 3/40
4 (B, A) (4, 2) 6/40
5 (B, B) (4, 4) 8/40
6 (B, C) (4, 1) 5/40
7 (C, A) (1, 2) 3/40
8 (C, B) (1, 4) 5/40
9 (C, C) (1, 1) 2/40

Since there are 9 possible samples, the probability of selecting any one sample is 1/9. We then arrange the possible sample proportions with their respective probabilities in Table 2.3:

S. No. Sample Proportion (p) Frequency Probability
1 2/40 1 1/9
2 3/40 2 2/9
3 4/40 1 1/9
4 5/40 2 2/9
5 6/40 2 2/9
6 8/40 1 1/9

This distribution is called the sampling distribution of the sample proportion.
Note: The mean of the sampling distribution of the proportion is equal to the population proportion, that is, $\mu_p = P$.
If the sample size is large (n ≥ 30), the sampling distribution of the proportion p is approximately a normal distribution with mean P and variance $\dfrac{P(1 - P)}{n}$, that is,
$p \sim N\left(P, \dfrac{P(1 - P)}{n}\right)$
The standard error of the sample proportion p (sampling without replacement from a finite population) is given by
$SE(p) = \sqrt{\dfrac{N - n}{N - 1}}\,\sqrt{\dfrac{P(1 - P)}{n}}$
where N is the population size, n the sample size and P the population proportion.
If the population is infinite or very large, then the SE of the sample proportion is given by
$SE(p) = \sqrt{\dfrac{P(1 - P)}{n}}$
Example 3: A machine produces a large number of items of which 15% are found to be defective. If a random
sample of 200 items is taken from the population and sample proportion is calculated then find
(i) Mean and standard error of sampling distribution of proportion.

(ii) The probability that less than or equal to 12% defectives are found in the sample.
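A sketch for this example using the large-population standard error and the normal approximation, with Φ from `math.erf`; no continuity correction is applied, which is an assumption of the sketch.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P[Z <= z] for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def se_proportion(P: float, n: int) -> float:
    """SE of the sample proportion for a very large (effectively infinite) population."""
    return sqrt(P * (1 - P) / n)


P, n = 0.15, 200
se = se_proportion(P, n)
# Normal approximation to p, without a continuity correction
p_at_most_12 = std_normal_cdf((0.12 - P) / se)
print(P, round(se, 4), round(p_at_most_12, 4))
```

It gives a mean of 0.15, a standard error of about 0.0252 and P[p ≤ 0.12] of roughly 0.117.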
Estimator and Estimate
Generally, population parameters are unknown and the whole population is too large to find out the parameters. Since a sample drawn from a population always contains some information about the population, in such situations we guess or estimate the value of the parameter under study based on a random sample drawn from that population.
Any statistic which is used to estimate an unknown population parameter is known as an estimator, and the value of the estimator based on the observed sample is known as an estimate of the parameter. For example, if we want to estimate the average height (μ) of students in a college with the help of the sample mean $\bar{X}$, then $\bar{X}$ is the estimator and its particular value, say, 165 cm, is the estimate of the population average height (μ).

In many real-life problems, the population parameter(s) is (are) unknown and one is interested in obtaining the value of the parameter. But if the whole population is too large to study, or the units of the population are destructive in nature, or resources and manpower are limited, then it is not practically convenient to examine each and every unit of the population to find the value of the parameter. In such situations, we can draw a sample from the population under study and utilize the sample observations to find/estimate the parameter.
The technique of estimating (finding) the unknown parameter with the help of sample observations is called estimation.
Estimation is categorised into two categories, namely:
Point estimation
If we find a single value with the help of sample observations which is taken as the estimate value of unknown
parameter then this value is known as point estimate and the technique of estimating the unknown parameter
with a single value is known as “point estimation”.
Interval estimation
If we compute an interval on the basis of sample observation, which will contain the parameter with certain
probability (confidence) then this interval is known as interval estimate of the parameter and this technique of
estimating is known as "interval estimation". Such an interval is also called a confidence interval.
For example, if we estimate the average weight of men living in a colony on the basis of sample mean, say, 62
kg then 62 kg is called point estimate of average weight of men in the colony and this procedure is called as
point estimation. If we estimate the average weight of men by an interval, say, [40, 110], with 90% confidence that the true average weight lies in this interval, then this interval is called interval estimate and this procedure is
called as interval estimation.
Criteria (properties) of Good Estimator
For a parameter there may exist more than one estimator. For example, for estimating the population mean, the sample mean, $(X_{max} + X_{min})/2$, the sample median, etc. are all estimators. So the question arises which one is a good estimator. An estimator is said to be good if it has the following properties:
An estimator (T) is said to be unbiased for the population parameter (θ) if and only if the mean of the sampling distribution of the estimator is equal to the true value of the parameter, i.e.
$E(T) = \theta$ for all $\theta \in \Theta$
This property of an estimator is called unbiasedness.
But if the expected value of the estimator is not equal to the true value of the parameter, then the estimator is said to be a "biased estimator", that is, if
$E(T) \neq \theta$
then the estimator T is called a biased estimator of θ.
1. The sample mean is an unbiased estimator of the population mean.
2. The sample proportion is an unbiased estimator of the population proportion.
Example 2: A random sample of 10 cadets of a centre is selected and their weights (in kg) are measured, as given below:
48, 50, 62, 75, 80, 60, 70, 56, 52, 78
Determine an unbiased estimate of the average weight of cadets of the centre.

Solution: We know that the sample mean ($\bar{X}$) is an unbiased estimator of the population mean and its particular value is an unbiased estimate of the population mean; therefore,
$\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i = \dfrac{X_1 + X_2 + \ldots + X_n}{n} = \dfrac{48 + 50 + 62 + 75 + 80 + 60 + 70 + 56 + 52 + 78}{10} = \dfrac{631}{10} = 63.10$
Hence, an unbiased estimate of the average weight of cadets of the centre is 63.10 kg.
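The estimate itself is just the sample mean. As a hedged illustration of what "unbiased" means, the sketch also draws many samples from a hypothetical normal population with μ = 60 and σ = 10 (values chosen for illustration) and shows that the sample means average out close to μ.

```python
import random

weights = [48, 50, 62, 75, 80, 60, 70, 56, 52, 78]
xbar = sum(weights) / len(weights)
print(f"unbiased estimate of the mean weight: {xbar:.2f} kg")

# Illustration of unbiasedness on a hypothetical N(60, 10^2) population:
# the average of many sample means should be close to mu = 60.
random.seed(42)
mu, sigma, n, reps = 60.0, 10.0, 10, 20_000
sample_means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]
avg_of_means = sum(sample_means) / reps
print(round(avg_of_means, 2))
```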
An unbiased estimator $T_1$ of a parameter θ is said to be more efficient than another unbiased estimator $T_2$ of θ if
$Var(T_1) < Var(T_2)$ for all n
Example 7: Show that the sample mean is a more efficient estimator than the sample median for estimating the mean of a normal population.
Solution: If $X_1, X_2, \ldots, X_n$ is a random sample taken from a normal population with mean μ and variance σ², then we know that the sample mean ($\bar{X}$) and the sample median ($\tilde{X}$) are normally distributed (for the median, approximately, for large n) as
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ and $\tilde{X} \sim N\left(\mu, \dfrac{\pi\sigma^2}{2n}\right)$.
Therefore, $Var(\bar{X}) = \dfrac{\sigma^2}{n}$ and $Var(\tilde{X}) = \dfrac{\pi\sigma^2}{2n}$.
But $Var(\tilde{X}) = \dfrac{\pi}{2}\cdot\dfrac{\sigma^2}{n}$ and $\dfrac{\pi}{2} > 1$; therefore, $Var(\bar{X}) < Var(\tilde{X})$. Thus, we conclude that the sample mean is a more efficient estimator than the sample median.
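The variance comparison can be illustrated by simulation. This sketch draws repeated samples of size 25 from a standard normal population (an illustrative choice) and compares the spread of the sample means with that of the sample medians; the ratio should come out roughly π/2 ≈ 1.57.

```python
import random
from statistics import median


def spread(values):
    """Variance of a list of numbers with divisor len(values)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


random.seed(7)
mu, sigma, n, reps = 0.0, 1.0, 25, 20_000   # illustrative population and sample size

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(median(sample))

var_mean = spread(means)       # theory: sigma^2 / n = 0.04
var_median = spread(medians)   # large-n theory: pi * sigma^2 / (2n), about 0.063
print(round(var_mean, 4), round(var_median, 4), round(var_median / var_mean, 2))
```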
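Confidence interval for the mean when variance is known: $\left[\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$

Confidence interval for the mean when variance is unknown: $\left[\bar{X} - t_{(n-1),\,\alpha/2}\dfrac{S}{\sqrt{n}},\; \bar{X} + t_{(n-1),\,\alpha/2}\dfrac{S}{\sqrt{n}}\right]$

Confidence interval for population proportion: $\left[p - z_{\alpha/2}\sqrt{\dfrac{p(1 - p)}{n}},\; p + z_{\alpha/2}\sqrt{\dfrac{p(1 - p)}{n}}\right]$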

Example 1: The life of the tyres manufactured by a company follows a normal distribution with standard deviation 3200 km. A sample of 250 tyres is taken and it is found that the average life of the tyres is 50000 km with a standard deviation of 3500 km. Establish the 99% confidence interval within which the mean life of tyres of the company is expected to lie.
Solution: Here, we are given that n = 250, σ = 3200, $\bar{X}$ = 50000, S = 3500.
Since the population standard deviation, i.e. the population variance σ², is known, we use the confidence limits
$\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
(the sample standard deviation S is not needed here).
For a 99% confidence interval, we have 1 − α = 0.99 ⇒ α = 0.01. For α = 0.01 we have $z_{\alpha/2} = z_{0.005} = 2.58$.
Therefore, the 99% confidence limits are $\bar{X} \pm 2.58\dfrac{\sigma}{\sqrt{n}}$.
By putting in the values of n, $\bar{X}$ and σ, the 99% confidence limits are
$50000 \pm 2.58 \times \dfrac{3200}{\sqrt{250}} = 50000 \pm 522.16$, i.e. 49477.84 and 50522.16.
Hence, the 99% confidence interval within which the mean life of tyres of the company is expected to lie is
[49477.84, 50522.16]
Example 2: It is known that the weight of students of a Study Centre of IGNOU follows a normal distribution. To estimate the average weight, a sample of 10 students is taken from this Study Centre, and the sample mean and SD are found to be 63 and 11.79, respectively. Compute the 95% confidence interval for the average weight of students of the Study Centre of IGNOU.
Solution: Since the population variance is unknown, we use the confidence limits for the average weight of students of the Study Centre given by
$\bar{X} \pm t_{(n-1),\,\alpha/2}\dfrac{S}{\sqrt{n}}$
For a 95% confidence interval, we have 1 − α = 0.95 ⇒ α = 0.05. Also, from the t-table, we have $t_{(n-1),\,\alpha/2} = t_{(9),\,0.025} = 2.262$. Thus, the 95% confidence limits are
$\bar{X} \pm t_{(9),\,0.025}\dfrac{S}{\sqrt{n}} = 63 \pm 2.262 \times \dfrac{11.79}{\sqrt{10}} = 63 \pm 8.43$, i.e. 54.57 and 71.43.
Hence, the required 95% confidence interval for the average weight of students of the Study Centre of IGNOU is
[54.57, 71.43]
Example 4: A sample of 200 voters is chosen at random from all voters in a given city. 60% of them were in favour of a particular candidate. If a large number of voters cast their votes, find the 99% and 95% confidence intervals for the proportion of voters in favour of the candidate.
Solution: Here, we are given
n = 200, p = 0.60
Confidence limits for the proportion are $p \pm z_{\alpha/2}\sqrt{\dfrac{p(1 - p)}{n}}$
For a 99% confidence interval, we have 1 − α = 0.99 ⇒ α = 0.01. For α = 0.01, we have $z_{0.005} = 2.58$, and for α = 0.05, $z_{0.025} = 1.96$.
Therefore, the 99% confidence limits for the proportion of voters in favour of the candidate are
$p \pm z_{0.005}\sqrt{\dfrac{p(1 - p)}{n}} = 0.60 \pm 2.58\sqrt{\dfrac{0.60 \times 0.40}{200}}$
$= 0.60 \pm 2.58 \times 0.0346$
$= 0.60 \pm 0.09$, i.e. 0.51 and 0.69
Hence, the required 99% confidence interval is [0.51, 0.69].
Similarly, the 95% confidence limits are $p \pm z_{0.025}\sqrt{\dfrac{p(1 - p)}{n}} = 0.60 \pm 1.96 \times 0.0346 = 0.60 \pm 0.07$, i.e. 0.53 and 0.67.
Hence, the 95% confidence interval is [0.53, 0.67].
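A sketch of the proportion interval, computing the standard error at full precision (about 0.0346) and using the table values 2.58 and 1.96:

```python
from math import sqrt


def proportion_interval(p: float, n: int, z: float):
    """Confidence limits p -/+ z * sqrt(p(1 - p) / n) for a population proportion."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


ci_99 = proportion_interval(0.60, 200, 2.58)
ci_95 = proportion_interval(0.60, 200, 1.96)
print([round(v, 3) for v in ci_99], [round(v, 3) for v in ci_95])
```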

In our day-to-day life, we see different commercial advertisements on television and in newspapers, magazines, etc., and if someone is interested in testing such claims or statements, we come across the problem of testing of hypothesis. For example,
(i) a motorcycle buyer wants to test whether the claim that a certain brand of motorcycle gives an average mileage of 60 km/liter is true or false,
(ii) a banana trader wants to test whether the average weight of a banana from Kerala is more than 200 gm,
(iii) a doctor wants to test whether a new medicine is really more effective for controlling blood pressure than the old medicine,
(iv) an economist wants to test whether the variability in incomes differs in two populations,
(v) a psychologist wants to test whether the proportion of literates in two groups of people is the same, etc.
In all the cases discussed above, the decision maker is interested in making inference about the population
parameter(s). Here we are interested in testing a claim or statement or assumption about the value of population
parameter(s). Such claim or statement is postulated in terms of hypothesis.
In statistics, a hypothesis is a statement or a claim or an assumption about the value of a population
parameter (e.g., mean, median, variance, proportion, etc.).
Similarly, in the case of two or more populations, a hypothesis is a comparative statement or a claim or an
assumption about the values of population parameters. (e.g., means of two populations are equal, variance of
one population is greater than other, etc.). The plural of hypothesis is hypotheses.
In hypothesis testing problems, we should first identify the claim or statement or assumption or hypothesis to be tested and write it in words. Once the claim has been identified, we write it in symbolic form if possible. As in the above examples,
(i) The motorcycle buyer may write the claim or postulate the hypothesis "the motorcycle of a certain brand gives an average mileage of 60 km/liter." Here we are concerned with the average mileage of the motorcycle, so let µ represent the average mileage; then our hypothesis becomes µ = 60 km/liter.
(ii) Similarly, the banana trader may write the statement or postulate the hypothesis "the average weight of a banana from Kerala is greater than 200 gm." So our hypothesis becomes µ > 200 gm.
(iii) The doctor may write the claim or postulate the hypothesis "the new medicine is really more effective for controlling blood pressure than the old medicine." Here we are concerned with the average effect of the medicines, so let µ1 and µ2 represent the average effects of the new and old medicines respectively on controlling blood pressure; then our hypothesis becomes µ1 > µ2.
(iv) The economist may write the statement or postulate the hypothesis "the variability in incomes differs in two populations." Here we are concerned with the variability in income, so let $\sigma_1^2$ and $\sigma_2^2$ represent the variances of incomes in the two populations respectively; then our hypothesis becomes $\sigma_1^2 \neq \sigma_2^2$.
(v) The psychologist may write the statement or postulate the hypothesis "the proportion of literates in two groups of people is the same." Here we are concerned with the proportion of literates, so let P1 and P2 represent the proportions of literates of the two groups of people respectively; then our hypothesis becomes P1 = P2 or P1 − P2 = 0.
The hypothesis is classified according to its nature and usage as we will discuss in subsequent subsections.
Simple and Composite Hypotheses
In general, if a hypothesis specifies only one (exact) value of the population parameter, it is known as a simple hypothesis. If a hypothesis specifies not just one value but a range of values that the population parameter may assume, it is called a composite hypothesis.

In the above examples, the hypothesis postulated in (i), µ = 60 km/liter, is a simple hypothesis because it gives a single value of the parameter (µ = 60), whereas the hypothesis postulated in (ii), µ > 200 gm, is a composite hypothesis because it does not specify the exact average weight of a banana; it may be 260, 350, 400 gm or any other value above 200 gm.
Similarly, (iii) µ1 > µ2 (or µ1 − µ2 > 0) and (iv) σ1² ≠ σ2² (or σ1² − σ2² ≠ 0) are not simple hypotheses because they allow more than one value, such as µ1 − µ2 = 4, µ1 − µ2 = 7, σ1² − σ2² = 2, σ1² − σ2² = 5, etc., whereas (v) P1 = P2 (or P1 − P2 = 0) is a simple hypothesis because it gives a single value of the parameter, P1 − P2 = 0.
Null and Alternative Hypotheses
As discussed above, in a hypothesis testing problem we first identify the claim or statement to be tested and write it in symbolic form. After that, we write the complement (opposite) of the claim in symbolic form. In our motorcycle example, the claim is µ = 60 km/liter, so its complement is µ ≠ 60 km/liter. In (ii) the claim is µ > 200 gm, so its complement is µ ≤ 200 gm. If the claim were µ < 200 gm, its complement would be µ ≥ 200 gm. The claim and its complement are formed in such a way that together they cover all possible values of the population parameter.
Once the claim and its complement have been established, we decide which of the two is the null hypothesis and which is the alternative hypothesis. The rule of thumb is that the statement containing equality is the null hypothesis. That is, the hypothesis which contains the symbol =, ≤ or ≥ is taken as the null hypothesis, and the hypothesis which does not contain equality, i.e. contains ≠, > or <, is taken as the alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis by H1 or HA.
In our motorcycle example, the claim is µ = 60 km/liter and its complement is µ ≠ 60 km/liter. Since the claim µ = 60 km/liter contains the equality sign, we take it as the null hypothesis and the complement µ ≠ 60 km/liter as the alternative hypothesis, that is,
H0: µ = 60 km/liter and H1: µ ≠ 60 km/liter
In our second example of bananas, the claim is µ > 200 gm and its complement is µ ≤ 200 gm. Since the complement µ ≤ 200 gm contains the equality sign, we take the complement as the null hypothesis and the claim µ > 200 gm as the alternative hypothesis, that is,
H0: µ ≤ 200 gm and H1: µ > 200 gm
Formally, these hypotheses are defined as follows.
The hypothesis which we wish to test is called the null hypothesis.
According to Prof. R.A. Fisher,
“A null hypothesis is a hypothesis which is tested for possible rejection under the assumption that it is true.”
The hypothesis which complements the null hypothesis is called the alternative hypothesis.
Note 1: Some authors use the equality sign (=) in the null hypothesis instead of the ≥ and ≤ signs.
The alternative hypothesis is of two types:
(i) Two-sided (two-tailed) alternative hypothesis
(ii) One-sided (one-tailed) alternative hypothesis
If the alternative hypothesis gives the alternative to the null hypothesis in both directions (less than and greater than) of the parameter value specified in the null hypothesis, it is known as a two-sided alternative hypothesis; if it gives an alternative in one direction only (less than or greater than), it is known as a one-sided alternative hypothesis. For example, if our alternative hypothesis is H1: θ ≠ 60, it is a two-sided alternative hypothesis because it means that the value of the parameter θ is either greater than or less than 60. Similarly, if H1: θ > 60, it is a right-sided alternative hypothesis because it means that the value of θ is greater than 60, and if H1: θ < 60, it is a left-sided alternative hypothesis because it means that the value of θ is less than 60.

In order to test a hypothesis, the entire sample space S is partitioned into two disjoint sub-spaces, say, ω and ω̄ = S − ω. If the calculated value of the test statistic lies in ω, we reject the null hypothesis, and if it lies in ω̄, we do not reject the null hypothesis. The region ω is called the “rejection region” or “critical region” and the region ω̄ is called the “non-rejection region”. Therefore, we can say that
“A region of the sample space such that, if the calculated value of the test statistic lies in it, we reject the null hypothesis, is called the critical region or rejection region.”
The rejection (critical) region lies in one tail or in both tails of the probability curve of the sampling distribution of the test statistic, depending on the alternative hypothesis. The critical value is the value (or values) that separates the rejection region from the non-rejection region. Three cases arise:
Case I: If the alternative hypothesis is right-sided, such as H1: θ > θ0 or H1: θ1 > θ2, then the entire critical (rejection) region of size α lies in the right tail.
Case II: If the alternative hypothesis is left-sided, such as H1: θ < θ0 or H1: θ1 < θ2, then the entire critical (rejection) region of size α lies in the left tail.

Case III: If the alternative hypothesis is two-sided, such as H1: θ ≠ θ0 or H1: θ1 ≠ θ2, then the critical (rejection) region is split between the two tails, with a region of size α/2 in each tail.


We have a rule that if the value of the test statistic falls in the rejection (critical) region, we reject the null hypothesis, and if it falls in the non-rejection region, we do not reject it. A test statistic is calculated from the observed sample observations, but a sample is only a small part of the population about which the decision is to be taken. A random sample may or may not be a good representative of the population, and a faulty sample misleads the inference (conclusion) about the null hypothesis. For example, an engineer may infer that a packet of screws is sub-standard when actually it is not; this error is caused by a poor (faulty) sample. Similarly, the engineer may infer that a packet of screws is good when actually it is sub-standard. So we can commit two kinds of errors while testing a hypothesis, which are summarised in the following table:
Decision            H0 True             H1 True
Reject H0           Type-I Error        Correct Decision
Do not reject H0    Correct Decision    Type-II Error

Let us take a situation where a patient suffering from high fever visits a doctor, and suppose the doctor formulates the null and alternative hypotheses as
H0: The patient is a malaria patient
H1: The patient is not a malaria patient
Then the following cases arise:
Case I: Suppose that H0 is really true, that is, the patient actually has malaria, and after observation and pathological and clinical examination the doctor rejects H0, that is, declares the patient a non-malaria patient. This is not a correct decision; the doctor commits an error known as a type-I error.
Case II: Suppose that H0 is actually false, that is, the patient does not have malaria, and after observation the doctor rejects H0, that is, declares the patient a non-malaria patient. This is a correct decision.
Case III: Suppose that H0 is really true, that is, the patient actually has malaria, and after observation the doctor does not reject H0, that is, declares the patient a malaria patient. This is a correct decision.
Case IV: Suppose that H0 is actually false, that is, the patient does not have malaria, and after observation the doctor does not reject H0, that is, declares the patient a malaria patient. This is not a correct decision; the doctor commits an error known as a type-II error.
Thus, we formally define type-I and type-II errors as below:
Type-I Error: (important)
The decision to reject the null hypothesis H0 when it is true is called a type-I error. The probability of committing a type-I error is called the size of the test, is denoted by α and is given by
α = P[Reject H0 when H0 is true] = P[Reject H0 | H0 is true]
Type-II Error: (important)
The decision not to reject the null hypothesis H0 when it is false (i.e. H1 is true) is called a type-II error. The probability of committing a type-II error is generally denoted by β and is given by
β = P[Do not reject H0 when H0 is false] = P[Do not reject H0 when H1 is true]
= P[Do not reject H0 | H1 is true] = P[X ∈ ω̄ | H1], where ω̄ is the non-rejection region.
Power of the test
The power of a test is the probability of rejecting H0 when H1 is true, i.e. of correctly rejecting a false null hypothesis:
1 − β = 1 − P[Do not reject H0 | H1 is true]
= P[Reject H0 | H1 is true]
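To make α, β and the power 1 − β concrete, here is a minimal Python sketch for a right-tailed Z-test of a mean with known σ (H0: µ ≤ µ0 against H1: µ > µ0), in the spirit of the banana example. The function name `error_probabilities`, the "true" mean µ1 = 205 gm, σ = 15 gm and n = 36 are hypothetical values chosen only for illustration; β and the power depend on which specific alternative value µ1 is assumed.

```python
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, alpha):
    """Right-tailed Z-test of H0: mu <= mu0 against H1: mu > mu0 (sigma known).

    Returns (alpha, beta, power), where beta and power are evaluated at one
    specific alternative value mu1 of the population mean.
    """
    se = sigma / n ** 0.5                         # standard error of the sample mean
    z_alpha = NormalDist().inv_cdf(1 - alpha)     # right-tail critical value of Z
    x_crit = mu0 + z_alpha * se                   # reject H0 when the sample mean exceeds this
    # Type-II error: the sample mean lands in the non-rejection region
    # even though the true mean is mu1, i.e. P[X-bar <= x_crit | mu = mu1].
    beta = NormalDist(mu1, se).cdf(x_crit)
    return alpha, beta, 1 - beta


# Hypothetical numbers for the banana example: mu0 = 200 gm, true mean 205 gm,
# sigma = 15 gm, sample of n = 36 bananas, alpha = 0.05.
alpha, beta, power = error_probabilities(200, 205, 15, 36, 0.05)
print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

With these assumed numbers β comes out at roughly 0.36, so the power is about 0.64. Increasing n shrinks the standard error and hence β, while α stays fixed at the chosen level.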

The probability of a type-I error is known as the level of significance of the test. It is also called the size of the test or the size of the critical region, and is denoted by α. Generally, it is fixed in advance at the 5% or 1% level (α = 0.05 or 0.01).
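Since α is the probability of rejecting a true H0, a test applied to many samples generated with H0 true should reject H0 in roughly 100α% of them. The following minimal simulation sketch in Python assumes a two-tailed Z-test with known σ for the motorcycle example (H0: µ = 60 km/liter); σ = 5, n = 30, the number of repetitions, the seed and the function name are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, fmean


def simulated_type_one_error(mu0=60.0, sigma=5.0, n=30, alpha=0.05,
                             reps=20_000, seed=1):
    """Estimate the type-I error rate of a two-tailed Z-test by simulation.

    Every sample is drawn from N(mu0, sigma^2), so H0: mu = mu0 is true and
    each rejection is a type-I error.
    """
    rng = random.Random(seed)                      # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    se = sigma / n ** 0.5                          # standard error of the sample mean
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:                        # falls in the rejection region
            rejections += 1
    return rejections / reps


print(simulated_type_one_error())              # expected to be near alpha = 0.05
print(simulated_type_one_error(alpha=0.01))    # expected to be near alpha = 0.01
```

The estimated rejection rate fluctuates slightly around α because of simulation error; more repetitions make it settle closer to the chosen level of significance.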


A test of the null hypothesis is said to be a two-tailed test if the alternative hypothesis is two-tailed, whereas if the alternative hypothesis is one-tailed, the test is said to be a one-tailed test.
For example, if our null and alternative hypotheses are
H0: θ = θ0 and H1: θ ≠ θ0
then the test of the null hypothesis is a two-tailed test because the alternative hypothesis is two-tailed, which means that the parameter θ can take a value greater than θ0 or less than θ0.
If the null and alternative hypotheses are
H0: θ ≤ θ0 and H1: θ > θ0
then the test of the null hypothesis is a right-tailed test because the alternative hypothesis is right-tailed.
Similarly, if the null and alternative hypotheses are
H0: θ ≥ θ0 and H1: θ < θ0
then the test of the null hypothesis is a left-tailed test because the alternative hypothesis is left-tailed.
Step I: First of all, we set up the null hypothesis H0 and the alternative hypothesis H1. Suppose we want to test the hypothetical / claimed / assumed value θ0 of a parameter θ. Then we can take the null and alternative hypotheses as
H0: θ = θ0 and H1: θ ≠ θ0   (two-tailed test)
H0: θ ≤ θ0 and H1: θ > θ0, or H0: θ ≥ θ0 and H1: θ < θ0   (one-tailed test)
When comparing the same parameter of two populations, say θ1 and θ2, our null and alternative hypotheses would be
H0: θ1 = θ2 and H1: θ1 ≠ θ2   (two-tailed test)
H0: θ1 ≤ θ2 and H1: θ1 > θ2, or H0: θ1 ≥ θ2 and H1: θ1 < θ2   (one-tailed test)
Step II: After setting up the null and alternative hypotheses, we decide the level of significance (α) at which we want to test the hypothesis. Generally, it is taken as 5% or 1% (α = 0.05 or 0.01).
Step III: The third step is to choose an appropriate test statistic under H0 for testing the null hypothesis, as given below:
$$\text{Test statistic} = \frac{\text{Statistic} - \text{Value of the parameter under } H_0}{\text{Standard error of the statistic}}$$
Step IV: Calculate the value of the test statistic described in Step III on the basis of the observed sample data.
Step V: Obtain the critical (cut-off) value(s) from the sampling distribution of the test statistic and construct the rejection (critical) region of size α. Critical values for various levels of significance are generally tabulated for the standard sampling distributions of test statistics, such as the Z-table, χ²-table, t-table, etc.
Step VI: Compare the calculated value of the test statistic obtained in Step IV with the critical value(s) obtained in Step V and locate the position of the calculated test statistic, that is, whether it lies in the rejection region or in the non-rejection region.
Step VII: Finally, we reach a conclusion as follows:
(i) If the calculated value of the test statistic lies in the rejection region at α level of significance, we reject the null hypothesis. This means that the sample data provide sufficient evidence against the null hypothesis and that there is a significant difference between the hypothesized value of the parameter and the observed value of the statistic.
(ii) If the calculated value of the test statistic lies in the non-rejection region at α level of significance, we do not reject the null hypothesis. This means that the sample data fail to provide sufficient evidence against the null hypothesis, and the difference between the hypothesized value of the parameter and the observed value of the statistic may be due to sampling fluctuation.
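Steps III to VII can be sketched for the Z-test as a small Python function. This is a minimal illustration using only the standard library; the function name `z_test` and its `tail` argument ("two", "right" or "left", matching the three forms of H1 in Step I) are hypothetical. As a usage example it is applied to the numbers of Example 4 in this section, treating the sample SD as σ because the sample is large.

```python
from statistics import NormalDist


def z_test(statistic, hypothesized_value, standard_error, alpha=0.05, tail="two"):
    """Steps III-VII of the testing procedure for a Z-test.

    tail = "two"   for H1: theta != theta0
           "right" for H1: theta >  theta0
           "left"  for H1: theta <  theta0
    Returns (z, critical value, reject H0?).
    """
    # Steps III-IV: (statistic - value under H0) / standard error
    z = (statistic - hypothesized_value) / standard_error
    std_normal = NormalDist()
    # Step V: critical value(s) for a rejection region of size alpha
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)  # region of size alpha/2 in each tail
        reject = abs(z) > z_crit
    elif tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)      # whole region in the right tail
        reject = z > z_crit
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)          # negative; whole region in the left tail
        reject = z < z_crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    # Steps VI-VII: locate z relative to the critical value and conclude
    return z, z_crit, reject


# Example 4 (pens): H0: mu >= 460 vs H1: mu < 460; large sample, so S replaces sigma
z, z_crit, reject = z_test(453, 460, 25 / 100 ** 0.5, alpha=0.01, tail="left")
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Returning the critical value alongside the decision keeps Step VI visible: the reader can see where the calculated Z lies relative to the cut-off rather than only the final verdict.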

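Summary of commonly used tests
(Decision rules are written in the two-tailed form. For a one-tailed test, compare the calculated value with the one-tailed critical value in the direction specified by H1.)

1. Population mean
Hypotheses: H0: µ = µ0 and H1: µ ≠ µ0; H0: µ ≤ µ0 and H1: µ > µ0; H0: µ ≥ µ0 and H1: µ < µ0
(a) Population SD σ known: Z-test
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \qquad \bar{X} = \frac{1}{n}\sum X$$
Reject H0 if |Zcal| > ztab; otherwise H0 may be accepted.
(b) Population SD unknown: t-test
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad S = \sqrt{\frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)}$$
Reject H0 if |tcal| > t(n−1),tab; otherwise H0 may be accepted.
(c) Population SD unknown and n > 30: Z-test
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Reject H0 if |Zcal| > ztab; otherwise H0 may be accepted.

2. Population proportion: Z-test
Hypotheses: H0: P = P0 and H1: P ≠ P0; H0: P ≤ P0 and H1: P > P0; H0: P ≥ P0 and H1: P < P0
$$Z = \frac{p - P_0}{\sqrt{P_0(1 - P_0)/n}}$$
where p is the sample proportion and n is the sample size.
Reject H0 if |Zcal| > ztab; otherwise H0 may be accepted.

3. Difference of two population means
Hypotheses: H0: µ1 = µ2 and H1: µ1 ≠ µ2; H0: µ1 ≤ µ2 and H1: µ1 > µ2; H0: µ1 ≥ µ2 and H1: µ1 < µ2
(a) Population SDs known: Z-test
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Reject H0 if |Zcal| > ztab; otherwise H0 may be accepted.
(b) Population SDs unknown (assumed equal): t-test
$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where
$$S_p^2 = \frac{\left(\sum X^2 - n_1\bar{X}^2\right) + \left(\sum Y^2 - n_2\bar{Y}^2\right)}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Reject H0 if |tcal| > t(n1+n2−2),tab; otherwise H0 may be accepted.
(c) Population SDs unknown and both n1 and n2 > 30: Z-test
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
Reject H0 if |Zcal| > ztab; otherwise H0 may be accepted.

4. Paired samples (n1 = n2 = n): paired t-test
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}, \qquad d = X - Y, \quad \bar{d} = \frac{1}{n}\sum d, \quad S_d = \sqrt{\frac{1}{n-1}\sum\left(d - \bar{d}\right)^2}$$
Reject H0 if |tcal| > t(n−1),tab; otherwise H0 may be accepted.

5. Population variance or SD (sample size n < 30): chi-square test
Hypotheses: H0: σ² = σ0² and H1: σ² ≠ σ0²; H0: σ² ≤ σ0² and H1: σ² > σ0²; H0: σ² ≥ σ0² and H1: σ² < σ0²
$$\chi^2 = \frac{nS^2}{\sigma_0^2}, \qquad S^2 = \frac{1}{n}\sum\left(X - \bar{X}\right)^2$$
Reject H0 if χ²cal > χ²(n−1),tab; otherwise H0 may be accepted.

6. Two population variances or SDs (n1 or n2 < 30): F-test
Hypotheses: H0: σ1² = σ2² and H1: σ1² ≠ σ2²; H0: σ1² ≤ σ2² and H1: σ1² > σ2²; H0: σ1² ≥ σ2² and H1: σ1² < σ2²
$$F = \frac{S_1^2}{S_2^2}$$
where S1² and S2² are the sample variances computed with divisors n1 − 1 and n2 − 1.
Reject H0 if Fcal > F(n1−1, n2−1),tab; otherwise H0 may be accepted.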
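The statistics in the summary above translate directly into code. Below is a minimal standard-library Python sketch; the function names are hypothetical. It assumes `statistics.variance` and `statistics.stdev` (divisor n − 1) for S in the t- and F-rows, while `chi2_variance` uses the sum of squared deviations, which equals nS² with the divisor-n S² of the chi-square row. It computes only the test statistics and their degrees of freedom; the critical values still have to be read from the t, χ² and F tables.

```python
from math import sqrt
from statistics import mean, stdev, variance


def t_one_sample(xs, mu0):
    """t = (X-bar - mu0) / (S / sqrt(n)) with S using divisor n - 1; df = n - 1."""
    n = len(xs)
    return (mean(xs) - mu0) / (stdev(xs) / sqrt(n)), n - 1


def z_proportion(p, p0, n):
    """Z = (p - P0) / sqrt(P0 (1 - P0) / n), where p is the sample proportion."""
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


def z_two_means(xbar, ybar, s1, s2, n1, n2):
    """Z = (X-bar - Y-bar) / sqrt(s1^2/n1 + s2^2/n2).

    s1, s2 are the population SDs, or the sample SDs when both samples are large.
    """
    return (xbar - ybar) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def t_pooled(xs, ys):
    """Two-sample t statistic with pooled variance Sp^2; df = n1 + n2 - 2."""
    n1, n2 = len(xs), len(ys)
    sp2 = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    return (mean(xs) - mean(ys)) / sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def t_paired(xs, ys):
    """Paired t statistic on the differences d = X - Y; df = n - 1."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


def chi2_variance(xs, sigma0_sq):
    """chi^2 = sum (X - X-bar)^2 / sigma0^2 (= n S^2 / sigma0^2, divisor-n S^2); df = n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / sigma0_sq, len(xs) - 1


def f_ratio(xs, ys):
    """F = S1^2 / S2^2 with divisor n - 1 variances; df = (n1 - 1, n2 - 1)."""
    return variance(xs) / variance(ys), (len(xs) - 1, len(ys) - 1)


# Usage with data from examples in this section
diet_a = [12, 8, 14, 16, 13, 12, 8, 14, 10, 9]
diet_b = [14, 13, 12, 15, 16, 14, 18, 17, 21, 15]
print(t_pooled(diet_a, diet_b))                       # Example 8: (t, df)
print(z_proportion(0.35, 0.25, 100))                  # Example 12
print(z_two_means(1300, 1280, 41, 46, 200, 200))      # Example 14
```

Because these functions return only the statistic (and degrees of freedom), the decision still follows Steps V to VII: compare each value with the tabulated critical value for its degrees of freedom and for the tail(s) named in H1.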

Critical Values for Z-test

Level of Significance (α)   Two-Tailed Test   Right-Tailed Test   Left-Tailed Test
α = 0.05 (= 5%)             zα/2 = ±1.96      zα = 1.645          −zα = −1.645
α = 0.01 (= 1%)             zα/2 = ±2.58      zα = 2.33           −zα = −2.33
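The entries of this table are quantiles of the standard normal distribution, so they can be reproduced with Python's standard-library `statistics.NormalDist`. The following is a minimal sketch; the helper name `z_critical_values` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution N(0, 1)


def z_critical_values(alpha):
    """Critical values of Z for a given level of significance alpha."""
    return {
        "two-tailed": std_normal.inv_cdf(1 - alpha / 2),  # reject if |Z| exceeds this
        "right-tailed": std_normal.inv_cdf(1 - alpha),    # reject if Z exceeds this
        "left-tailed": std_normal.inv_cdf(alpha),         # reject if Z is below this
    }


for alpha in (0.05, 0.01):
    values = z_critical_values(alpha)
    print(alpha, {name: round(v, 3) for name, v in values.items()})
```

At the 1% level the unrounded values are about 2.576 and 2.326, which the table rounds to 2.58 and 2.33; the left-tailed value is always the negative of the right-tailed one.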

Example 1: A light bulb company claims that the 100-watt light bulb it sells has an average life of 1200 hours
with a standard deviation of 100 hours. For testing the claim 50 new bulbs were selected randomly and allowed
to burn out. The average lifetime of these bulbs was found to be 1180 hours. Is the company’s claim true at 5% level of significance?
Solution: Here, we are given that
Specified value of population mean = µ0 = 1200 hours,
Population standard deviation = σ = 100 hours,
Sample size = n = 50
Sample mean = X̄ = 1180 hours.
In this example, the population parameter being tested is the population mean, i.e. the average life of a bulb (µ), and we want to test the company’s claim that the average life of a bulb is 1200 hours. So our claim is µ = 1200 and its complement is µ ≠ 1200. Since the claim contains the equality sign, we take the claim as the null hypothesis and the complement as the alternative hypothesis. So
H0: µ = µ0 = 1200 (average life of a bulb is 1200 hours) (claim)
H1: µ ≠ 1200 (average life of a bulb is not 1200 hours)
Since the alternative hypothesis is two-tailed, the test is a two-tailed test.
Here, we want to test a hypothesis about the mean when the population SD (variance) is known and the sample size n = 50 (> 30) is large, so we use the Z-test.
Thus, for testing the null hypothesis the test statistic is given by
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1180 - 1200}{100/\sqrt{50}} = \frac{-20}{14.14} = -1.41$$
The critical (tabulated) values for a two-tailed test at 5% level of significance are ±zα/2 = ±z0.025 = ±1.96.

Since |Z| = 1.41 < 1.96 (ztab), we do not reject the null hypothesis. Since the null hypothesis is the claim, we support the claim at 5% level of significance.
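The arithmetic of Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value is computed from the normal distribution rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

# Data of Example 1
mu0 = 1200      # hypothesized mean life (hours)
sigma = 100     # known population SD (hours)
n = 50          # sample size
x_bar = 1180    # sample mean (hours)
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))           # Z = (X-bar - mu0) / (sigma / sqrt(n))
z_tab = NormalDist().inv_cdf(1 - alpha / 2)     # two-tailed critical value z_{alpha/2}
decision = "reject H0" if abs(z) > z_tab else "do not reject H0"
print(f"Z = {z:.2f}, critical values = ±{z_tab:.2f}: {decision}")
```

It prints `Z = -1.41, critical values = ±1.96: do not reject H0`, matching the hand calculation above.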
Example 2: A manufacturer claims that a special type of projector bulb has an average life of 160 hours. To check this claim, an investigator takes a sample of 20 such bulbs, puts them on test, and obtains an average life of 167 hours with a standard deviation of 16 hours. Assuming that the lifetime of such bulbs follows a normal distribution, does the investigator accept the manufacturer’s claim at 5% level of significance?
Example 3: The mean share price of companies in the Pharma sector is Rs. 70. The share prices of all companies change from time to time. After a month, a sample of 10 Pharma companies was taken and their share prices were noted as below:
70, 76, 75, 69, 70, 72, 68, 65, 75, 72

Assuming that the share prices follow a normal distribution, test whether the mean share price is still the same at 1% level of significance.
Example 4: A manufacturer of ball point pens claims that a certain pen manufactured by him has a mean writing-life of at least 460 A-4 size pages. A purchasing agent selects a sample of 100 pens and puts them to the test. The mean writing-life of the sample is found to be 453 A-4 size pages with a standard deviation of 25 A-4 size pages. Should the purchasing agent reject the manufacturer’s claim at 1% level of significance?
Example 5: In two samples of women from Punjab and Tamilnadu, the mean heights of 1000 and 2000 women are 67.6 and 68.0 inches, respectively. If the population standard deviations of height in Punjab and Tamilnadu are the same and equal to 5.5 inches, can the mean heights of Punjab and Tamilnadu women be regarded as the same at 1% level of significance?
Example 6: In a random sample of 1200 persons from a large population, 30% had blue eyes, and in a random sample of 900 persons from another population, 20% had blue eyes. Test whether the proportion of blue-eyed persons is the same in the two populations at 5% level of significance.
Example 7: Out of 200 patients who were given a particular injection, 180 survived. Test the hypothesis that the survival rate is more than 80% at 5% level of significance.
Example 7: A tyre manufacturer claims that the average life of a particular category of his tyres is 18000 km when used under normal driving conditions. A random sample of 16 tyres was tested. The mean and SD of the life of the tyres in the sample were 20000 km and 6000 km, respectively. Assuming that the life of the tyres is normally distributed, test the claim of the manufacturer at 1% level of significance using an appropriate test.
Example 8: In a random sample of 10 pigs fed on diet A, the gains in weight (in pounds) in a certain period were:
12, 8, 14, 16, 13, 12, 8, 14, 10, 9
In another random sample of 10 pigs fed on diet B, the gains in weight (in pounds) in the same period were:
14, 13, 12, 15, 16, 14, 18, 17, 21, 15
Assuming that the gains in weight due to both diets follow normal distributions with equal variances, test whether diets A and B differ significantly in their effect on weight gain at 5% level of significance.
Example 9: Two different types of drugs, A and B, were tried on some patients for increasing their weights. Six persons were given drug A and another 7 persons were given drug B. The gains in weight (in pounds) are given below:
Drug A 5 8 7 10 9 6 −
Drug B 9 10 15 12 14 8 12

Assuming that the increases in weight due to both drugs follow normal distributions with equal variances, do the two drugs differ significantly with regard to their mean weight increase at 5% level of significance?
Example 10: A group of 12 children was tested to find out how many digits they could repeat from memory after hearing them once. They were then given a practice session for this test, and the next week they were retested. The results obtained were as follows:
Child Number 1 2 3 4 5 6 7 8 9 10 11 12
Recall Before 6 4 5 7 6 4 3 7 8 4 6 5
Recall After 6 6 4 7 6 5 5 9 9 7 8 7

Assuming that the memories of the children before and after the practice session follow normal distributions, does the memory practice session improve the performance of the children?

Example 11: Ten students were given a test in Statistics, and after one month’s coaching they were given a similar test again. The increase in their marks in the second test over the first is shown below:
Roll No. 1 2 3 4 5 6 7 8 9 10
Increase in Marks 6 −2 8 −4 10 2 5 −4 6 0

Assuming that the increase in marks follows a normal distribution, do the data indicate that the students have gained knowledge from the coaching at 1% level of significance?
Example 12: A machine produces a large number of items, of which 25% are found to be defective. To check this, the company manager takes a random sample of 100 items and finds 35 defective items. Is there evidence of further deterioration in quality at 5% level of significance?
Example 13: In a random sample of 100 persons from town A, 60 are found to be high consumers of wheat. In
another sample of 80 persons from town B, 40 are found to be high consumers of wheat. Do these data reveal a significant difference between the proportions of high wheat consumers in town A and town B (at α = 0.05)?
Example 14: Two brands of electric bulbs are quoted at the same price. A buyer tested a random sample of 200 bulbs of each brand and found the following information:
Mean Life (hrs.) SD(hrs.)
Brand A 1300 41
Brand B 1280 46
Is there a significant difference between the mean lifetimes of the two brands of electric bulbs at 1% level of significance?
E5) The following data were collected during a test to determine consumer preference among five leading brands of bath soap:
Brand Preferred A B C D E Total
Number of Customers 194 205 204 196 201 1000

Test that the preference is uniform over the five brands at 5% level of significance.
E6) The following table gives the numbers of road accidents that occurred on the various days of the week:
Days Mon Tue Wed Thu Fri Sat Sun
Number of Accidents 14 15 8 20 11 9 14

Test whether the accidents are uniformly distributed over the week by the chi-square test at 1% level of significance.
E7) A cigarette manufacturer claims that the variance of nicotine content of its cigarettes is 0.62. Nicotine
content is measured in milligrams and is normally distributed. A sample of 25 cigarettes has a variance
of 0.65. Test the manufacturer’s claim at 5% level of significance.

E8) The 12 measurements of the same object on an instrument are given below:
1.6, 1.5, 1.3, 1.5, 1.7, 1.6, 1.5, 1.4, 1.6, 1.3, 1.5, 1.5
Assuming that the measurements of the instrument follow a normal distribution, test at 1% level of significance whether the variance in the measurements of the instrument is less than 0.016.
E10) The variance of a certain dimension of articles produced by a machine has been 7.2 over a long period. A random sample of 20 articles gave a variance of 8. Assuming that the dimension is normally distributed, is it justifiable to conclude at 5% level of significance that the variability has increased?

1. The mean mark obtained by the students of a mathematics course at IGNOU is 54.5 with a standard deviation of 8.0. At one of the study centres, where 100 students took the examination, the mean mark was 55.9. Are the students of this study centre significantly (1) different from, (2) better than the rest of the students of that course at IGNOU at the 0.01 level?
2. A consumer magazine, when comparing various brands of paints, stated that the drying time of one
particular brand was found to be four hours. The manufacturer was not particularly pleased with this and
consequently modified the paint to try to reduce the drying time. The paint was then tested by a random
sample of 40 customers all of whom were decorating their living rooms. For this sample the mean
drying time in hours was found to be 3.85 and the sample standard deviation was 0.55.
a) Analyse the sample data using the one-sided z-test.
b) Find a 95% confidence interval for the population mean of the drying times for the modified paint.
3. The breaking strengths of cables made by a company had a mean of 1800 N. The company then adopted a new technique which is believed to increase the breaking strengths. 50 cables made by the new technique were tested to see whether the belief is justified or not. The mean breaking strength of these 50 cables is found to be 1850 N with a standard deviation of 100 N. Is the belief justified at a) 5% level b) 1% level?
4. As part of a survey on drivers’ reaction times for a driving magazine, 300 drivers were subjected to the
following test: each driver was asked to press a lever with his/her foot in response to a flashing light.
The reaction times (in seconds) were recorded and the sample mean was found to be 0.83. The sample
standard deviation was 0.31. What can you conclude about drivers’ reaction times?
5. A machine manufactures standard weights to be used in weighing scales. To check if the machine is working properly, a random sample of five 2-kg weights was taken. Each 2-kg weight was weighed on a special scale, and the actual weights were found to have a mean of 1.962 kg and a standard deviation of 0.038 kg. If α = 0.05, can you say that the machine is in proper working order?
6. A management school claims that the starting salaries of its graduates average Rs. 10,000 or more per month. A random sample of 7 students who had recently graduated showed an average salary of Rs. 9700 with a standard deviation of Rs. 306. At a 5% level of significance, would you accept the claim?
7. The specifications for the production of a certain alloy call for 23.2% copper. In 10 analyses, the mean copper content was found to be 23.5% with a standard deviation of 0.24%. Can we conclude that the product meets the specifications if α = 0.05?
8. The diameters of bolts manufactured by a machine are known to have a standard deviation of 0.0002
cm. A random sample of 10 bolts has an average diameter of 0.5046 cm. Test the hypothesis that the
true mean diameter of bolts is 0.51 cm, using α = 0.01.
9. A new teaching technique is to be tested. A group of 22 students were taught in the traditional way.
Another group of 18 students was taught with the help of the new technique. The two groups were then
given a standardised test which is known to have a standard deviation of 25. The mean score of the
traditional group was 127 and that of the experimental group was 136. If α = 0.1, do you think that the
new technique is significantly better?
10. A psychologist gave a test to decide whether male students are as smart as female students. The sample of 40 female students had a mean score of 131 and the sample of 36 male students had a mean score of 126. The test has a standard deviation of 16. Is there a difference at the 0.01 level of significance?
We have been considering cases where σ1 and σ2 are known. If they are not known, they have to be estimated from the samples. If the samples are large, these estimates are quite close to the true values, so we can use them in forming the test statistic Z. The next exercise presents one such situation.
11. A sample of 100 electric light bulbs produced by manufacturer A showed a mean lifetime of 1190 h and a standard deviation of 90 h. A sample of 75 bulbs produced by manufacturer B showed a mean lifetime of 1230 h and a standard deviation of 120 h. a) Is there a difference between the two brands of bulbs at a significance level of 0.05? b) Are the bulbs of manufacturer B superior to those of manufacturer A at the same level?
12. We want to test the effect of a new fertiliser on wheat production. For this, 24 plots of land of equal area were chosen. Half of these were treated with the new fertiliser and the other half with the old one. With the new fertiliser, the mean yield was 48 kg with a standard deviation of 4 kg. With the old fertiliser, the mean yield was 51 kg with a standard deviation of 3.6 kg. Can we say at 5% level of significance that there is an improvement in the yield because of the new fertiliser? What will be your conclusion at 1% level?
13. A botanist was interested in knowing whether there was a difference in the time fruits matured on different parts of a plant, and recorded the day of the first fruit on the top and on the bottom for 15 plants. All the fruits came out during the same month.
Top 3 6 7 5 8 9 10 10 7 8 6 9 10 12 4
Bottom 7 9 5 8 8 10 11 12 6 9 7 13 8 13 8
Is there a significant difference in the time to mature at the 1% level of significance?
14. The pulse rates of 12 people were recorded before and after taking a new drug.
Before 68 71 84 93 67 74 82 77 71 83 62 66
After 71 70 81 97 73 80 90 76 80 79 80 67
Using the 10% level of significance, can you say that there is a significant increase in the pulse rate?
15. A random sample of size 1000 from machine 1 contained 20 defectives, and a random sample of size 1500 from machine 2 contained 40 defectives. If α = 0.05, can you say that machine 1 is better than machine 2?
16. A flu vaccine was given to 125 of a total of 200 employees of a firm. Thirty employees who had received the vaccine came down with flu, while 25 of those who had not received it were also stricken. At 1% level of significance, would you say that the vaccine was effective?

The chi-square test of independence is used to test whether two attributes are independent.
Null and alternative hypotheses
H0: The two attributes (characteristics) are independent
H1: They are not independent
Suppose there are two attributes, say A and B. Let attribute A have r categories A1, A2, …, Ar and attribute B have c categories B1, B2, …, Bc. The observed frequencies in the different classes can be arranged in a table known as a contingency table.
A \ B   B1    B2    …   Bj    …   Bc    Total
A1      O11   O12   …   O1j   …   O1c   R1
A2      O21   O22   …   O2j   …   O2c   R2
⋮       ⋮     ⋮         ⋮         ⋮     ⋮
Ai      Oi1   Oi2   …   Oij   …   Oic   Ri
⋮       ⋮     ⋮         ⋮         ⋮     ⋮
Ar      Or1   Or2   …   Orj   …   Orc   Rr
Total   C1    C2    …   Cj    …   Cc    N
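Test statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)} \quad \text{under } H_0$$
where Oij is the observed frequency and Eij is the expected frequency of cell (i, j), given by
$$E_{ij} = \frac{R_i \times C_j}{N} = \frac{\text{Sum of } i\text{th row} \times \text{Sum of } j\text{th column}}{\text{Total sample size}}$$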
Take the decision about the null hypothesis as follows:
If the calculated value of the test statistic is greater than the tabulated value of χ² with (r − 1)(c − 1) degrees of freedom, we reject the null hypothesis; otherwise we may accept the null hypothesis.
Example 5: 1000 students at college level were graded according to their IQ level and the economic condition
of their parents.
                      IQ Level
Economic Condition    High    Low    Total
Poor                  240     160    400
Rich                  460     140    600
Total                 700     300    1000

Test whether the IQ level of students is independent of the economic condition of their parents at 5% level of significance.
Solution: H0 : IQ level and economic condition are independent
H1 : IQ level and economic condition are not independent
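For testing the null hypothesis, the test statistic is
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$$
where
$$E_{ij} = \frac{R_i \times C_j}{N} = \frac{\text{Sum of } i\text{th row} \times \text{Sum of } j\text{th column}}{\text{Total sample size}}$$
Therefore,
$$E_{11} = \frac{R_1 \times C_1}{N} = \frac{400 \times 700}{1000} = 280, \qquad E_{12} = \frac{R_1 \times C_2}{N} = \frac{400 \times 300}{1000} = 120$$
$$E_{21} = \frac{R_2 \times C_1}{N} = \frac{600 \times 700}{1000} = 420, \qquad E_{22} = \frac{R_2 \times C_2}{N} = \frac{600 \times 300}{1000} = 180$$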
Calculations for (O − E)²/E:
Observed        Expected        (O − E)   (O − E)²   (O − E)²/E
Frequency (O)   Frequency (E)
240             280             −40       1600       5.71
160             120              40       1600       13.33
460             420              40       1600       3.81
140             180             −40       1600       8.89
Total = 1000    1000                                 31.74
Therefore, from the above calculations, we have
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 31.74$$
The degrees of freedom will be (r –1)(c –1) = (2 – 1)(2 – 1) = 1.
The critical value of χ2 with 1 degree of freedom at 5% level of significance is 3.84.
Since calculated value of test statistic (= 31.74) is greater than critical value (= 3.84) so we reject the null
hypothesis i.e. we reject the claim at 5% level of significance.
Thus, we conclude that sample provides us sufficient evidence against the claim so IQ level of students is not
independent of the economic condition of their parents.
Example 6: Calculate the expected frequencies for the following data, presuming the two attributes to be independent, and check
whether the condition of home and the condition of the child are independent at 5% level of significance.
Condition of Child    Condition of Home
                      Clean    Dirty
Clean                 70       50
Fairly Clean          80       20
Dirty                 35       45

Solution: H0 : Condition of home and condition of child are independent
H1 : Condition of home and condition of child are not independent
For testing the null hypothesis, the test statistic is
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)} \text{ under } H_0$$
Now, under H0, the expected frequencies can be obtained as:
Condition of Child    Condition of Home    Total
                      Clean    Dirty
Clean                 70       50          120
Fairly Clean          80       20          100
Dirty                 35       45          80
Total                 185      115         300

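$$E_{ij} = \frac{R_i \times C_j}{N} = \frac{\text{Sum of } i\text{th row} \times \text{Sum of } j\text{th column}}{\text{Total sample size}}$$
$$E_{11} = \frac{R_1 \times C_1}{N} = \frac{120 \times 185}{300} = 74; \qquad E_{12} = \frac{R_1 \times C_2}{N} = \frac{120 \times 115}{300} = 46$$
$$E_{21} = \frac{R_2 \times C_1}{N} = \frac{100 \times 185}{300} = 61.67; \qquad E_{22} = \frac{R_2 \times C_2}{N} = \frac{100 \times 115}{300} = 38.33$$
$$E_{31} = \frac{R_3 \times C_1}{N} = \frac{80 \times 185}{300} = 49.33; \qquad E_{32} = \frac{R_3 \times C_2}{N} = \frac{80 \times 115}{300} = 30.67$$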
Calculations for (O − E)²/E:
Observed        Expected        (O − E)   (O − E)²   (O − E)²/E
Frequency (O)   Frequency (E)
70              74.00           −4.00     16.00      0.22
50              46.00            4.00     16.00      0.35
80              61.67           18.33     335.99     5.45
20              38.33          −18.33     335.99     8.77
35              49.33          −14.33     205.35     4.16
45              30.67           14.33     205.35     6.70
Total = 300     300                                  25.64
Therefore, from the above calculations, we have
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = 25.64$$

The degrees of freedom will be (r –1)(c –1) = (3 – 1)(2 – 1) = 2.

The critical value of χ2 with 2 degrees of freedom at 5% level of significance is 5.99.
Since calculated value of test statistic (= 25.64) is greater than critical value (= 5.99) so we reject the null
hypothesis i.e. we reject the claim at 5% level of significance.
Thus, we conclude that the sample provides us sufficient evidence against the claim so condition of home and
condition of the child are not independent.
E4) A group of 1650 school children were classified according to their performance in school tests and family
economic level. Test whether there is any association between these two attributes. (Given: the tabulated value of χ² at 5% level with 9 df is 16.918.)

Economic     Performance
Level        Very Good   Good   Average   Poor   Total
Very Rich    4           7      16        25     52
Rich         13          37     79        73     202
Average      105         372    298       175    950
Poor         35          213    75        123    446
Total        157         629    468       396    1650

E5) The following contingency table presents the analysis of 300 persons according to hair colour and eye colour:
Hair Colour   Eye Colour
              Blue   Grey   Brown   Total
Fair          30     10     40      80
Brown         40     20     40      100
Black         50     30     40      120
Total         120    60     120     300

Test the hypothesis that there is an association between hair colour and eye colour at 1% level of significance.
PAIRED t-TEST
When two samples are not independent, i.e., the observations are recorded on the same individuals or items, we use the paired t-test.
Such observations are generally recorded to assess the effectiveness of a particular training, diet,
treatment, medicine, etc. In such situations, the observations are recorded “before and after” the introduction of the
training, treatment, etc., as the case may be.
Let (X1, Y1), (X2, Y2), …, (Xn, Yn) be a paired random sample of size n and let the difference between the paired
observations Xi and Yi be denoted by di, that is,
$$d_i = X_i - Y_i \quad \text{for all } i = 1, 2, \ldots, n$$
Here, we want to test whether there is an effect of a diet, training, treatment, medicine, etc. So we can take the null
hypothesis as
H0: μ1 = μ2 or H0: μD = μ1 − μ2 = 0
and the alternative hypothesis as
H1: μ1 ≠ μ2 or H1: μD ≠ 0 [for two-tailed test]
or
H0: μ1 ≤ μ2 and H1: μ1 > μ2, or H0: μ1 ≥ μ2 and H1: μ1 < μ2 [for one-tailed test]
For testing the null hypothesis, the test statistic t is given by
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} \sim t_{(n-1)}$$
where
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad S_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} d_i^2 - n\bar{d}^2\right)$$
Example 5: A group of 12 children was tested to find out how many digits they could repeat from memory
after hearing them once. They were then given a practice session for this test. The next week they were retested. The
results obtained were as follows:
Child Number 1 2 3 4 5 6 7 8 9 10 11 12
Recall Before 6 4 5 7 6 4 3 7 8 4 6 5
Recall After 6 6 4 7 6 5 5 9 9 7 8 7

Assuming that the memory scores of the children before and after the practice session follow normal distributions, does
the memory practice session improve the performance of the children?
Solution: Here, we want to test whether the memory practice session improves the performance of children. If μ1 and μ2
denote the mean digit repetition before and after the practice respectively, our claim is μ1 < μ2 and its complement is μ1 ≥ μ2.
Since the complement contains the equality sign, we take the complement as the null hypothesis and the
claim as the alternative hypothesis. Thus,
H0: μ1 ≥ μ2 and H1: μ1 < μ2

Since the alternative hypothesis is left-tailed so the test is left-tailed test.

This is a before-and-after situation. Also, it is given that the memory scores of the children before and after the
practice session follow normal distributions, so the population of differences will also be normal. Since all the
assumptions of the paired t-test are met, we can use the paired t-test.
For testing the null hypothesis, the test statistic t is given by
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} \sim t_{(n-1)} \qquad \ldots (3)$$
where d̄ and Sd are the mean and standard deviation of the sample differences.
Child    Digit Recall            d = X − Y    d²
         Before (X)   After (Y)
1        6            6           0           0
2        4            6          −2           4
3        5            4           1           1
4        7            7           0           0
5        6            6           0           0
6        4            5          −1           1
7        3            5          −2           4
8        7            9          −2           4
9        8            9          −1           1
10       4            7          −3           9
11       6            8          −2           4
12       5            7          −2           4
Total                            Σd = −14     Σd² = 32

From the above calculations, we have
$$\bar{d} = \frac{1}{n}\sum d = \frac{1}{12}(-14) = -1.1667$$
$$S_d^2 = \frac{1}{n-1}\left(\sum d^2 - n\bar{d}^2\right) = \frac{1}{11}\left(32 - 12 \times (-1.1667)^2\right) = \frac{15.667}{11} = 1.4242, \qquad S_d = \sqrt{1.4242} = 1.1934$$
$$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{-1.1667}{1.1934/\sqrt{12}} = \frac{-1.1667}{0.3445} \approx -3.39$$
The critical value of the test statistic t for a left-tailed test with (n − 1) = 11 df at 5% level of significance is
$$-t_{(n-1),\,\alpha} = -t_{(11),\,0.05} = -1.796$$
Since the calculated value of the test statistic t (= −3.39) is less than the critical value (= −1.796), the calculated
value of t lies in the rejection region, so we reject the null hypothesis and support the alternative hypothesis, i.e., we
support the claim at 5% level of significance.
Thus, we conclude that the sample provides sufficient evidence in support of the claim, so we may assume that the
memory practice session improves the performance of children.
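The paired t statistic of the example can be recomputed with a short Python sketch of the formulas t = d̄/(Sd/√n) and Sd² = (Σd² − n d̄²)/(n − 1). The helper name `paired_t` is hypothetical; the critical value −1.796 still comes from the t table.

```python
import math


def paired_t(before, after):
    """Paired t statistic t = d_bar / (S_d / sqrt(n)) with d_i = X_i - Y_i.

    Returns (t, degrees_of_freedom).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    # S_d^2 = (sum of d_i^2 - n * d_bar^2) / (n - 1)
    s_d = math.sqrt((sum(di * di for di in d) - n * d_bar ** 2) / (n - 1))
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Digit-recall data of the example (X = before, Y = after the practice session)
before = [6, 4, 5, 7, 6, 4, 3, 7, 8, 4, 6, 5]
after = [6, 6, 4, 7, 6, 5, 5, 9, 9, 7, 8, 7]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -3.39 11
```

Computed without intermediate rounding, t ≈ −3.39 with 11 df; rounding d̄ and Sd to two decimals early can move the second decimal, but t stays well below −1.796, so H0 is rejected either way.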

CHI-SQUARE TEST FOR GOODNESS OF FIT (important for short note)

This test is used to test that a random variable under study follows a specified distribution such as uniform,
binomial, Poisson, normal, etc. Here, we compare observed frequencies in each category with theoretically
expected frequencies. This test is known as the “goodness of fit test” because it tests how well an observed
frequency distribution fits a theoretical distribution such as normal, uniform, binomial, etc.
This test works under the following assumptions:
(i) The sample observations are random and independent.
(ii) The sample size is large.
(iii) The observations may be classified into non-overlapping categories.
(iv) The expected frequency of each class is greater than five.
(v) The sum of observed frequencies is equal to the sum of expected frequencies, i.e., ΣO = ΣE.
We can take the null and alternative hypotheses as
H0: Data follow a specified distribution
H1: Data do not follow a specified distribution
Test statistic:
The test statistic is given by
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(k-1)}$$
where Oi is the observed frequency and Ei is the expected frequency of the ith category.
The expected frequencies are obtained as
$$E_i = Np_i \quad \text{for all } i = 1, 2, \ldots, k$$
where pi (i = 1, 2, …, k) is the probability that an observation falls in the ith category.
We obtain the tabulated (critical) value of χ² from the chi-square table.
Take the decision about the null hypothesis as follows:
if the calculated value of χ² is greater than the tabulated value, we reject the null hypothesis; otherwise we do not
reject the null hypothesis.
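The goodness of fit statistic can be sketched in a few lines of Python, assuming the category probabilities pi are fully specified in advance (so the degrees of freedom are k − 1, as in the formula above). The function name `chi_square_gof` is hypothetical, and the check on expected frequencies mirrors assumption (iv).

```python
def chi_square_gof(observed, probs):
    """Goodness-of-fit statistic sum (O_i - E_i)^2 / E_i with E_i = N * p_i.

    Returns (statistic, degrees_of_freedom, expected).
    Degrees of freedom are k - 1 (no parameters estimated from the data).
    """
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]
    # Assumption (iv): every expected frequency should be greater than five.
    if min(expected) <= 5:
        raise ValueError("an expected frequency is not greater than five")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


# Road accidents Mon..Sun, tested against a uniform distribution (p = 1/7)
accidents = [14, 15, 8, 20, 11, 9, 14]
stat, df, expected = chi_square_gof(accidents, [1 / 7] * 7)
print(round(stat, 4), df)  # 7.6923 6
```

The same function applies to the soap-brand example with probs = [0.2] * 5; the comparison with the tabulated value is left to the chi-square table, as in the text.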
E2) The following table gives the numbers of road accidents that occurred during the various days of the week:
Days Mon Tue Wed Thu Fri Sat Sun
Number of Accidents 14 15 8 20 11 9 14

Test whether the accidents are uniformly distributed over the week by the chi-square test at 1% level of significance.
Solution: H0: The accidents are uniformly distributed over the week
H1: The accidents are not uniformly distributed over the week
Since the data are given in categorical form and we are interested in fitting a distribution, we can use the chi-square
goodness of fit test. The test statistic is
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(k-1)}$$
In a uniform distribution all the outcomes considered have equal probability. Therefore, the probability that an
accident occurs on any particular day is the same. Thus,
$$p_1 = p_2 = \cdots = p_7 = p = \frac{1}{7}$$
The theoretical or expected frequency for each day is obtained by multiplying the appropriate probability by the
total number of accidents, that is, the sample size N. Therefore,
$$E_1 = E_2 = \cdots = E_7 = Np = 91 \times \frac{1}{7} = 13$$
Calculations for (O − E)²/E:
Days    Observed        Expected        (O − E)   (O − E)²   (O − E)²/E
        Frequency (O)   Frequency (E)
Mon     14              13               1         1          0.0769
Tue     15              13               2         4          0.3077
Wed     8               13              −5         25         1.9231
Thu     20              13               7         49         3.7692
Fri     11              13              −2         4          0.3077
Sat     9               13              −4         16         1.2308
Sun     14              13               1         1          0.0769
Total   91              91                                    7.6923

From the above calculation, we have
$$\chi^2 = \sum_{i=1}^{7}\frac{(O_i - E_i)^2}{E_i} = 7.6923$$
The critical value of chi-square with k − 1 = 7 − 1 = 6 degrees of freedom at 1% level of significance is 16.81.
Since calculated value of test statistic (= 7.6923) is less than critical value (= 16.81) so we do not reject the null
hypothesis i.e. we support the claim at 1% level of significance.
Thus, we conclude that the sample fails to provide us sufficient evidence against the claim so we may assume
that the accidents are uniformly distributed over the week.
Example 1: The following data are collected during a test to determine consumer preference among five
leading brands of bath soaps:
Brand Preferred A B C D E Total
Number of Customers 194 205 204 196 201 1000

Test that the preference is uniform over the five brands at 5% level of significance.

Test for two Population Variances (F-Test)

We can take our null and alternative hypotheses as
H0: σ1² = σ2² = σ² and H1: σ1² ≠ σ2² [for two-tailed test]
or
H0: σ1² ≤ σ2² and H1: σ1² > σ2², or H0: σ1² ≥ σ2² and H1: σ1² < σ2² [for one-tailed test]
For testing the null hypothesis, the test statistic F is given by
$$F = \frac{S_1^2}{S_2^2} \sim F_{(n_1-1,\; n_2-1)}$$
where
$$S_1^2 = \frac{1}{n_1-1}\left(\sum X^2 - n_1\bar{X}^2\right) \quad\text{and}\quad S_2^2 = \frac{1}{n_2-1}\left(\sum Y^2 - n_2\bar{Y}^2\right)$$

Example 1: Two sources of raw materials are under consideration by a bulb manufacturing company. Both
sources seem to have similar characteristics, but the company is not sure about their respective uniformity. A
sample of 12 lots from source A yields a variance of 125 and a sample of 10 lots from source B yields a
variance of 112. Is it likely that the variance of source A is greater than that of source B at significance level α = 0.01?
Solution: Here, we are given that
n1 = 12, S1² = 125, n2 = 10, S2² = 112
Here, we want to test whether the variance of source A is greater than the variance of source B. If σ1² and σ2²
denote the variances in the raw materials of sources A and B respectively, our claim is σ1² > σ2² and its
complement is σ1² ≤ σ2². Since the complement contains the equality sign, we take the complement as the null
hypothesis and the claim as the alternative hypothesis. Thus,
H0: σ1² ≤ σ2² and H1: σ1² > σ2²
Since the alternative hypothesis is right-tailed, the test is a right-tailed test.
For testing this, the test statistic is given by
$$F = \frac{S_1^2}{S_2^2} = \frac{125}{112} = 1.12$$
The critical value is F(n1 − 1, n2 − 1), α = F(11, 9), 0.01 = 5.18.
Since the calculated value of the test statistic (= 1.12) is less than the critical value (= 5.18), we do not reject the null
hypothesis and reject the alternative hypothesis, i.e., we reject the claim at 1% level of significance.
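The F statistic of the example, F = S1²/S2² with (n1 − 1, n2 − 1) df, and the sample variance formula S² = (ΣX² − nX̄²)/(n − 1) can be sketched in Python. Both helper names are hypothetical, and the raw-data lists at the end are made-up values used only to show the call; the critical value still has to be read from an F table.

```python
def sample_variance(values):
    """S^2 = (sum x^2 - n * x_bar^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(v * v for v in values) - n * mean ** 2) / (n - 1)


def f_statistic(s1_sq, n1, s2_sq, n2):
    """F = S1^2 / S2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    return s1_sq / s2_sq, (n1 - 1, n2 - 1)


# Summary figures of the bulb example: source A (n1 = 12, S1^2 = 125),
# source B (n2 = 10, S2^2 = 112)
f, dfs = f_statistic(125, 12, 112, 10)
print(round(f, 2), dfs)  # 1.12 (11, 9)

# With raw data, compute the variances first (hypothetical values)
x = [4.1, 3.9, 4.4, 4.0, 4.6]
y = [5.0, 4.2, 4.8, 3.9]
print(f_statistic(sample_variance(x), len(x), sample_variance(y), len(y)))
```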
E2) Two sources of raw materials are under consideration by a bulb manufacturing company. Both sources
seem to have similar characteristics, but the company is not sure about their respective uniformity. A sample of
12 lots from source A yields a variance of 125 and a sample of 10 lots from source B yields a variance of 112.
Is it likely that the variance of source A differs significantly from the variance of source B at significance level
α = 0.01?

ANALYSIS OF VARIANCE (ANOVA) [most important topic]
Analysis of variance is used for testing the equality of the means of several populations. It does so by comparing the
variability between the sample means with the variability within the samples.
According to Professor R. A. Fisher, Analysis of Variance (ANOVA) is a method of splitting the total
variation in the data into two components: one due to assignable causes (between-group variability) and the other
due to chance causes (within-group variability).
The t-test is used to test hypotheses about the means of two populations. But there are many situations
where we have to test the hypothesis of equality of more than two means. For example, one may be
interested in testing whether there is a significant difference between three teaching methods of statistical
techniques on the basis of sample data; in agriculture, an experimenter may want to compare three or more
fertilizers; in the medical field, an investigator may wish to know whether four drugs are equally efficient in the
control of blood pressure.
In such situations we could use the t-test for every pair of means, but then we have to apply the t-test many times,
and as a result the overall probability of a type I error increases. Therefore, in such situations we use the analysis
of variance (ANOVA).
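The growth of the type I error can be illustrated with a rough calculation. Assuming, for simplicity, that the m = k(k − 1)/2 pairwise t-tests among k means were independent and each run at level α, the probability of at least one false rejection would be 1 − (1 − α)^m. The pairwise tests actually share samples and are correlated, so treat this as an approximation; the helper name `familywise_error` is hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """Approximate P(at least one type I error) over all pairwise t-tests.

    Assumes the k(k - 1)/2 pairwise tests are independent, which is only a
    rough approximation because they share samples.
    """
    m = k * (k - 1) // 2          # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    m, p = familywise_error(k)
    print(k, m, round(p, 3))
# 3 3 0.143
# 4 6 0.265
# 5 10 0.401
```

Even with four means (six t-tests at 5% each), the chance of at least one spurious "significant" difference is roughly 26%, which is why a single F-test is preferred.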
Assumptions of ANOVA
1. Dependent variable measured at least on interval scale;
2. The samples are independently and randomly drawn from the population;
3. Population under study follows the normal distribution;
4. The samples have approximately equal variance;
5. Various effects are additive in nature; and
6. Errors (eij) are independently identically distributed normal with mean zero and variance σe2.
One Way Classification
If the observations in an experiment are classified on the basis of a single criterion, the classification is
called one-way classification. For example, if we consider the yield of four varieties of wheat, we divide the
plots into four groups. In this case the observations (yields) are classified on the basis of a single
criterion, the variety of wheat, so the classification is called one-way classification.
Two Way Classification
If the observations in an experiment are classified on the basis of two criteria then the classification is called
two way classification. For example, we may consider the yields of four varieties of wheat using four different
types of fertilizers. In such experiment, the observations are classified according to two criteria (the wheat
variety and the type of fertilizer). So it is called a two-way classification.
Model for One Way Classification
Suppose there are k normal populations with means μ1, μ2, ..., μk and common variance σ². Further, let us draw
k random samples (one from each population) from these populations. Let ni (i = 1, 2, ..., k) be the size of the
sample from the ith population. Using the sample information, we wish to test
H0: μ1 = μ2 = … = μk
against H1: At least two means are not equal.
Let yij (i = 1, 2, ..., k; j = 1, 2, ..., ni) be the jth observation of the ith sample; then one-way classified data can be arranged as
shown in the following table:
Level of Factor/Treatment   1      2      …     k
Observations                y11    y21    …     yk1
                            y12    y22    …     yk2
                            y13    y23    …     yk3
                            …      …      …     …
                            y1n1   y2n2   …     yknk
Total                       T1     T2     …     Tk

The linear mathematical model for one-way classified data can be written as
$$y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, 2, \ldots, k;\; j = 1, 2, \ldots, n_i$$
where μ represents the general mean (effect) and αi represents the effect of the ith treatment. In general,
μ and αi are unknown quantities and are assumed to be constant, whereas eij represent the errors due to random
fluctuations and hence are assumed to be random variables. They are independently and identically distributed normal with
mean zero and variance σe².
In one-way classification we split the total variation as
Total sum of squares (TSS) = Sum of squares due to treatment (SST) + Sum of squares due to error (SSE)
Model ANOVA table for One Way Classification
ANOVA Table for One-way Classified Data
Source of Variation    df      SS     MSS                   Variance Ratio (Fcal)   FTab
Treatment (Between)    k − 1   SST    MSST = SST/(k − 1)    F = MSST/MSSE           F with (k − 1), (N − k) df
Error (Within)         N − k   SSE    MSSE = SSE/(N − k)
Total                  N − 1   TSS
Procedure for one way analysis of variance for k independent sample:
Step 1: The first step of the procedure is to set up the null and alternative hypotheses.
We want to test the equality of the population means, i.e. homogeneity of effect of different levels of a factor.
Hence, the null hypothesis is given by
H0: μ1 = μ2 = . . . = μk
Against the alternative hypothesis
H1: At least two means are not equal
Step 2: Calculate the correction factor (CF) as
$$CF = \frac{G^2}{N}$$
where the grand total G = sum of all values, and N = total number of observations = n1 + n2 + … + nk = Σ ni (i = 1 to k).

Step 3: Find the sum of squares of all the observations as
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 = y_{11}^2 + y_{12}^2 + \cdots + y_{1n_1}^2 + y_{21}^2 + y_{22}^2 + \cdots + y_{2n_2}^2 + \cdots + y_{k1}^2 + y_{k2}^2 + \cdots + y_{kn_k}^2$$
This is also known as the raw sum of squares (RSS).

Step 4: After that, find the Total Sum of Squares (TSS) as
TSS = RSS − CF
Step 5: Find the Sum of Squares due to Treatment or Factor (SST) as
$$SST = \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k}\right) - CF$$
where Ti is the sum or total of the ith treatment or factor.

Step 6: After that, find the Error Sum of Squares (SSE) as
SSE = TSS − SST
The above analysis is presented in the following table:
ANOVA Table for One-way Classified Data
Source of Variation    df      SS     MSS                   Variance Ratio (Fcal)   FTab
Treatment              k − 1   SST    MSST = SST/(k − 1)    F = MSST/MSSE           F with (k − 1), (N − k) df
Error                  N − k   SSE    MSSE = SSE/(N − k)
Total                  N − 1   TSS

Thus, if an observed value of F is greater than the tabulated value of F for {(k−1), (N−k)} df and specific level
of significance (usually 5% or 1%), then H0 is rejected otherwise, it may be accepted.
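Steps 2 to 6 translate directly into code. The following is a minimal Python sketch of that procedure (CF, RSS, TSS, SST, SSE, mean squares, F); the function name `one_way_anova` and the dictionary keys are hypothetical choices, and the tabulated F is still taken from the F table.

```python
def one_way_anova(groups):
    """One-way ANOVA following Steps 2-6: CF, RSS, TSS, SST, SSE, then F.

    `groups` is a list of samples, one list of observations per treatment.
    Returns a dict with the sums of squares, degrees of freedom, mean squares and F.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)                  # N
    grand_total = sum(sum(g) for g in groups)              # G
    cf = grand_total ** 2 / n_total                        # Step 2: CF = G^2 / N
    rss = sum(y * y for g in groups for y in g)            # Step 3: raw sum of squares
    tss = rss - cf                                         # Step 4: TSS = RSS - CF
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # Step 5: sum T_i^2/n_i - CF
    sse = tss - sst                                        # Step 6: SSE = TSS - SST
    msst = sst / (k - 1)
    msse = sse / (n_total - k)
    return {
        "SST": sst, "SSE": sse, "TSS": tss,
        "df": (k - 1, n_total - k, n_total - 1),
        "MSST": msst, "MSSE": msse, "F": msst / msse,
    }


# Example 1 (scores of 8th-class students in four schools)
schools = [
    [8, 6, 7, 5, 9],
    [6, 4, 6, 5, 6, 7],
    [6, 5, 5, 6, 7, 8, 5],
    [5, 6, 6, 7, 6, 7],
]
result = one_way_anova(schools)
print(round(result["F"], 4), result["df"])  # 1.3163 (3, 20, 23)
```

Without intermediate rounding F = 1.3163; the worked example reports 1.3164 because it divides the rounded mean squares. The same function reproduces E 1) (SST = 40, SSE = 60, F = 4).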
Example 1: An investigator is interested in knowing the level of knowledge about the history of India of students in 4
different schools in a city. A test is given to 5, 6, 7 and 6 students of the 8th class of the 4 schools respectively. Their scores out of 10 are
given below:
School I (S1) 8 6 7 5 9
School II (S2) 6 4 6 5 6 7
School III(S3) 6 5 5 6 7 8 5
School IV(S4) 5 6 6 7 6 7
Solution: If μ1, μ2, μ3, μ4 denote the average scores of students of the 8th class of schools I, II, III, IV respectively,
Null hypothesis H0: μ1 = μ2 = μ3 = μ4
Alternative hypothesis H1: Differences among μ1, μ2, μ3, μ4 are significant.
        S1    S2    S3    S4    S1²   S2²   S3²   S4²
        8     6     6     5     64    36    36    25
        6     4     5     6     36    16    25    36
        7     6     5     6     49    36    25    36
        5     5     6     7     25    25    36    49
        9     6     7     6     81    36    49    36
        –     7     8     7     –     49    64    49
        –     –     5     –     –     –     25    –
Total   35    34    42    37    255   198   260   231

Grand Total G = 35 + 34 + 42 + 37 = 148
Since N = n1 + n2 + n3 + n4 = 5 + 6 + 7 + 6 = 24,
$$\text{Correction Factor (CF)} = \frac{G^2}{N} = \frac{148^2}{24} = 912.6667$$
$$\text{Raw Sum of Squares (RSS)} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 = 255 + 198 + 260 + 231 = 944$$
Total Sum of Squares (TSS) = RSS − CF = 944 − 912.6667 = 31.3333
$$\text{Sum of Squares due to Treatments (SST)} = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \frac{T_4^2}{n_4} - CF = \frac{35^2}{5} + \frac{34^2}{6} + \frac{42^2}{7} + \frac{37^2}{6} - 912.6667$$
= 245 + 192.6667 + 252 + 228.1667 − 912.6667 = 5.1667
Sum of Squares due to Errors (SSE) = TSS − SST = 31.3333 − 5.1667 = 26.1666
$$MSST = \frac{SST}{k-1} = \frac{5.1667}{3} = 1.7222, \qquad MSSE = \frac{SSE}{N-k} = \frac{26.1666}{20} = 1.3083$$
Source of Variation   SS        df    MSS      F
Between schools       5.1667    3     1.7222   F = 1.7222/1.3083 = 1.3164
Within schools        26.1666   20    1.3083
Calculated F = 1.3164
Tabulated F at 5% level of significance with (3, 20) degrees of freedom is 3.10.
Conclusion: Since Calculated F < Tabulated F, so we may accept H0 and conclude that level of knowledge of
schools I, II, III and IV do not differ significantly.
Example 2: Suppose we have three fertilizers and want to compare their efficacy. This could be done by a field
experiment in which each fertilizer is applied to 10 plots, and the 30 plots are later harvested, with the crop
yield being calculated for each plot. The data were recorded in the following table:
Fertilizer   Yields (in tonnes) from the 10 plots allocated to that fertilizer
1 6.27 5.36 6.39 4.85 5.99 7.14 5.08 4.07 4.35 4.95
2 3.07 3.29 4.04 4.19 .41 0.75 04.87 3.94 6.49 3.15
3 4.04 3.79 4.56 4.55 4.53 3.53 3.71 7.00 4.61 4.55
H0: Mean effect of Ist fertilizer = Mean effect of the IInd fertilizer = Mean effect IIIrd fertilizer
H0: μ1 = μ2 = μ3
H1: At least one is different
Steps for calculating the different sums of squares:
Grand Total = Total of all observations = Σ yij = G = 139.20
Correction Factor (CF) = G²/N = (139.20 × 139.20)/30 = 645.89
Raw Sum of Squares (RSS) = Σ yij² (summed over all 30 observations)
Total Sum of Squares (TSS) = RSS − CF = 36.4449
$$\text{Sum of Squares due to Fertilizer (SST)} = \frac{T_1^2}{10} + \frac{T_2^2}{10} + \frac{T_3^2}{10} - CF = \frac{(54.5)^2}{10} + \frac{(40)^2}{10} + \frac{(44.9)^2}{10} - CF = 10.8227$$
Sum of Squares due to Error (SSE) = TSS − SST = 36.4449 − 10.8227 = 25.6222
Mean Sum of Squares due to Treatment (MSST) = SST/df = 10.8227/2 = 5.4114
Mean Sum of Squares due to Error (MSSE) = SSE/df = 25.6222/27 = 0.9490
Variance ratio F2,27 = MSST/MSSE = 5.4114/0.9490 = 5.70
Tabulated F2,27 at 5% level of significance = 3.35
Since the calculated value of F2,27 (= 5.70) is greater than the tabulated value (= 3.35), we reject H0.
It means there is a significant difference among the effects of these three fertilizers.
E 1) Three varieties A, B and C of wheat are sown in five plots each, and the following yields are obtained:
Plots A B C
1 8 7 12
2 10 5 9
3 7 10 13
4 14 9 12
5 11 9 14

Set up a table of analysis of variance and find out whether there is a significant difference between the
yields of these varieties.
Solution: ANOVA Table for One-way Classified Data
Source of Variation (SV)          df    Sum of Squares   Mean Sum of Squares (MSS)   Variance Ratio (F)
Due to varieties (treatments)     2     SST = 40         MSST = 40/2 = 20            F2,12 = 20/5 = 4
Due to error (within groups)      12    SSE = 60         MSSE = 60/12 = 5
Total                             14    TSS = 100

For ν1 = 2 and ν2 = 12, the table value of F at 5% level of significance is 3.89, which can be seen from the
statistical table. Since the calculated value (= 4) is greater than the table value of F at 5% level of significance,
we reject the null hypothesis and conclude that the difference between the mean yields of the three
varieties is significant.
E 2) The following figures relate to production in kg of three varieties P, Q, R of wheat sown in 12 plots:
P 14 16 18
Q 14 13 15 22
R 18 16 19 15 20
Is there any significant difference in the production of these varieties?
(Here G = 200, N = 12, CF = 3333.33 and RSS = 3416.)
Source of Variation (SV)   df    Sum of Squares (SS)   Mean Sum of Squares (MSS)   Variance Ratio (F)
Between varieties          2     7.467                 3.733                       F2,9 = 3.733/8.356 = 0.447
Due to error               9     75.200                8.356
Total                      11    82.667
Since the calculated F (= 0.447) is less than the tabulated F2,9 at 5% level of significance (= 4.26), the difference in
production between the varieties is not significant.

E 3) In 25 plots, four varieties ν1, ν2, ν3, ν4 of wheat are randomly put and their yields in kg are shown below.
ν1 2000   ν3 2270   ν2 2230   ν4 2270   ν4 2180
ν2 2160   ν1 2100   ν2 2050   ν3 2300   ν2 2280
ν1 2200   ν1 2300   ν4 2040   ν3 2420   ν1 2240
ν4 2370   ν1 2250   ν2 2040   ν2 2360   ν1 2460
ν3 2210   ν1 2340   ν2 2190   ν1 2150   ν3 2020
Perform the ANOVA to test whether there is any significant difference between varieties of wheat.
(The sums of squares below are computed on the coded values u = (y − 2000)/10; such coding does not change the value of F.)
Source of Variation   SS          df    MSS        F
Between varieties     107.4114    3     35.8038    F = 35.8038/178.0776 = 0.2011
Due to errors         3739.6286   21    178.0776
Total                 3847.04     24
Calculated F = 0.2011
Tabulated value of F at 5% level of significance with (3, 21) degrees of freedom is 3.07.
Conclusion: Since calculated F < tabulated F, we may accept H0 and conclude that the varieties ν1, ν2,
ν3, ν4 of wheat are homogeneous.
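The ANOVA for E 3) can be checked, and the effect of coding demonstrated, with a small self-contained Python sketch. The helper `anova_f` is a hypothetical compact version of the one-way procedure (CF, TSS, SST, then F = MSST/MSSE); the yields are regrouped by variety from the field layout. Coding u = (y − 2000)/10 divides both mean squares by 100, so F is unchanged.

```python
def anova_f(groups):
    """One-way ANOVA F = MSST / MSSE (CF, TSS, SST, SSE as in the procedure)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    cf = sum(sum(g) for g in groups) ** 2 / n                 # CF = G^2 / N
    tss = sum(y * y for g in groups for y in g) - cf          # TSS = RSS - CF
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf      # SST
    return (sst / (k - 1)) / ((tss - sst) / (n - k))          # MSST / MSSE


# Yields of E 3) grouped by variety (read off the field layout)
yields = {
    "v1": [2000, 2100, 2200, 2300, 2240, 2250, 2460, 2340, 2150],
    "v2": [2230, 2160, 2050, 2280, 2040, 2360, 2190],
    "v3": [2270, 2300, 2420, 2210, 2020],
    "v4": [2270, 2180, 2040, 2370],
}
raw = list(yields.values())
coded = [[(y - 2000) / 10 for y in g] for g in raw]   # u = (y - 2000) / 10
print(round(anova_f(raw), 4), round(anova_f(coded), 4))  # 0.2011 0.2011
```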
Example 4: A chemical firm wants to determine how four catalysts differ in yield. The firm runs the
experiment in three of its plants, types A, B and C. In each plant, the yield is measured with each catalyst. The
yields are as follows:
Plant Catalyst
1 2 3 4
A 2 1 2 4
B 3 2 1 3
C 1 3 3 1

Perform the ANOVA and comment whether the yield due to a particular catalyst is significant or not at 5%
level of significance.

Set up an ANOVA table for the following per-hectare yields of three varieties of wheat, each grown on four plots.
Plot    Variety of wheat
        A1    A2    A3
1 16 15 15
2 17 15 14
3 13 13 13
4 18 17 12

Population is the collection or group of individuals/items/units/observations under study. The books in a
library, the particles in a room, the rivers in India, the students in a classroom, etc. are examples of populations.
A sample is a fraction, part or subset of the population, drawn through a valid statistical procedure, that is regarded as
representative of the whole population.
The valid statistical procedure of drawing a sample from the population is called sampling.
Complete Enumeration and Sample Survey
(1) Complete Enumeration or Census
When each and every unit of the population is investigated or studied for the characteristics under study then we
call it complete enumeration or census. For example, checking at border of a country, census of population of
a country, census of import and export, etc.
(2) Sample Survey or Sample Enumeration
When only a part or a small number of units of population are investigated or studied for the characteristics
under study then we call it sample enumeration or sample survey.
Advantages of Sampling Survey over Census or Complete Enumeration
Reduced Cost
Since in a sample survey we study only a part of the population, the cost of the survey in terms of money and
manpower is considerably smaller than that of complete enumeration.
Saving of Time
Sampling results can be analysed more quickly than complete enumeration.
Greater scope
In certain cases complete enumeration is not possible and we are bound to use a sample enquiry, for example
when the population units are destroyed during the investigation, as in testing bombs, bullets, the lifetime of
electric bulbs, etc.
Greater Accuracy
A sample survey can give data of better quality than a complete survey because in a sample survey it may be
possible to use better resources, such as trained field workers and better equipment, than in complete enumeration.
Sampling and Non-Sampling Error
(1) Sampling Error
The error which arises due to the fact that only a part of the population, called the sample, is used to estimate the
population parameters and draw inferences is known as sampling error. However carefully a sample is selected,
there will always be a difference between the population value and its corresponding estimate, so this error is
present in every sampling scheme. A sample with the smallest sampling error is considered a good representative
of the population. The error can be reduced by increasing the size of the sample.
(2) Non-Sampling Error
When all the units of the population are enumerated, one would expect that there is no error. However, in
practice this is not so, because it is difficult to avoid errors of observation or ascertainment completely. The
error which arises at the stages of observation, classification, tabulation, analysis, etc. is known as non-sampling
error. This is why non-sampling error is present in both the census and the sample survey.

Subjective or judgment or purposive sampling
Any type of sampling in which the selection of units in the sample depends on the personal discretion or judgment
of the investigator is called subjective or judgment sampling. This type of sampling is used with a definite
purpose in view and as such is not used for general purposes. The investigator includes in the sample those items
which he thinks are most typical of the population with respect to the characteristics under study. For example,
if we want to draw a sample of patients suffering from tuberculosis (TB), since it is not possible to ascertain the whole
population of TB patients, people suffering from TB (for example, patients of a TB sanatorium) are selected in the sample. This
sampling method is not preferred because, if the investigator is biased, the sample will not give a true picture of the population.
Simple Random Sampling or Random Sampling
The simplest and most common method of sampling is simple random sampling. In simple random sampling
the sample is drawn in such a way that each unit of the population has an equal and independent chance of
being included in the sample. Simple random sampling may be classified as follows:
(1) Simple Random Sampling with Replacement (SRSWR)
In simple random sampling, if the units are drawn one by one in such a way that each unit drawn is
replaced back in the population before the subsequent draw, it is called SRSWR. In this type of sampling
from a population of size N, the probability of selection of a unit at each draw remains 1/N. This sampling is
used when the population is homogeneous.
(2) Simple Random Sampling without Replacement (SRSWOR)
In simple random sampling, if the units are drawn one by one in such a way that a unit drawn is
not replaced back in the population before the subsequent draws, it is called SRSWOR.
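The difference between the two schemes can be illustrated with Python's standard `random` module: `choices` draws with replacement (repeats possible, each draw picks any unit with probability 1/N), while `sample` draws without replacement (all units distinct). The population of 20 numbered units and the seed are made-up values for illustration only.

```python
import random

population = list(range(1, 21))   # hypothetical population of N = 20 numbered units
rng = random.Random(42)           # fixed seed only so the run is reproducible

# SRSWR: each unit is replaced before the next draw, so repeats are possible
# and every draw picks any unit with probability 1/N.
srswr = rng.choices(population, k=5)

# SRSWOR: drawn units are not replaced, so the 5 units are all distinct.
srswor = rng.sample(population, k=5)

print("SRSWR :", srswr)
print("SRSWOR:", srswor)
```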
There are two methods of selecting a simple random sample:
1. Lottery Method
2. Use of random numbers tables

Lottery Method
This is the simplest method of drawing a simple random sample. All units of the population are numbered, and we
take identical cards of the same size, colour and shape, one for each population unit, bearing these numbers. The
cards are put in a rotating drum or container in which they are well mixed. If we want to draw a sample of size n
with replacement, we draw a card from the drum, note the number on it, select the corresponding unit of the
population and replace the card before the next draw; then the drum is rotated, another card is drawn and its number
is noted. This procedure continues until we get a sample of size n, and the units of the population corresponding to
these numbers are selected. If a drawn card is not replaced back before the next draw, we get an SRSWOR.
Use of Random Numbers table
The lottery method, discussed above, becomes quite cumbersome to use if the size of the population is very
large. An alternative method of random selection is to use a table of random numbers. A random
number table is an arrangement of the digits 0 to 9, constructed so that all the digits 0, 1, 2, …, 9 appear
independently of each other and with approximately the same frequency. If we have to select a sample from a
population of size N (≤ 99), the digits can be combined two by two to give pairs from 00 to 99. The method of
drawing the random sample consists of the following steps:

1. Identify the N units in the population with the numbers from 1 to N.

2. Select at random any page of the random number tables and pick up the numbers in any row, column or
diagonal at random; discard any number which is greater than N.
3. The population units corresponding to the numbers selected in step-2, constitute the random sample.
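Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not a published procedure: the helper `sample_from_digits` is hypothetical, the short digit string is made up for the example, and the "page" of digits comes from Python's pseudo-random generator rather than a printed table.

```python
import random


def sample_from_digits(digits, N, n):
    """Read a random-digit string two digits at a time (pairs 00-99) and keep
    the first n distinct unit numbers in 1..N (N <= 99).

    Pairs equal to 00, pairs greater than N, and repeats (SRSWOR) are discarded.
    Fewer than n numbers are returned if the digits run out.
    """
    if not 1 <= N <= 99:
        raise ValueError("two-digit pairs only cover N <= 99")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen


# A made-up digit string standing in for one row of a table
print(sample_from_digits("9503470388120030", N=40, n=3))  # [3, 12, 30]

# Pseudo-random digits as a stand-in for a whole table page
rng = random.Random(7)
page = "".join(str(rng.randrange(10)) for _ in range(200))
print(sample_from_digits(page, N=40, n=5))
```

In the first call the pairs 95, 47 and 88 exceed N = 40, the second 03 is a repeat, and 00 is not a unit number, so only 03, 12 and 30 are kept.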


Stratified Random Sampling
When the units of the population are scattered and not homogeneous with respect to the characteristic under study, a simple random sample may not represent the population properly. In stratified random sampling, the whole population is divided into homogeneous groups or classes, known as strata, with respect to the characteristic under study. Auxiliary information related to the characteristic under study may be used to divide the population into strata such that the units within each stratum are as homogeneous as possible and the strata are as different from each other as possible.
Thus, all the strata together make up the population. A sample is then drawn from each stratum, and finally all these samples are combined to get the ultimate sample. For example, suppose the population consists of N units with a heterogeneous structure. First we divide the population into k non-overlapping strata of sizes N1, N2, N3, ..., Nk such that each stratum is homogeneous. Evidently N = N1 + N2 + N3 + ... + Nk. Then a sample of size n1 is drawn from the first stratum by simple random sampling, a sample of n2 units from the second stratum, and so on up to the kth stratum. All these k samples are combined to get the ultimate sample, so the total sample size is n = n1 + n2 + n3 + ... + nk. This method of sampling is known as stratified random sampling because stratification is done first, to form homogeneous groups, and samples are then drawn from each stratum by simple random sampling.
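The stratify, sample, combine procedure above can be sketched in Python. This is a minimal illustration assuming SRSWOR within each stratum via `random.Random.sample`; the function name, the stratum labels and the small population of 60 units are all hypothetical.

```python
import random


def stratified_random_sample(strata, sizes, seed=None):
    """Draw an SRSWOR of size sizes[h] from each stratum h and combine them.

    strata: dict mapping a stratum label to the list of its unit identifiers
    sizes:  dict mapping the same labels to the sample sizes n_h
    """
    rng = random.Random(seed)
    ultimate_sample = []
    for label, units in strata.items():
        ultimate_sample.extend(rng.sample(units, sizes[label]))  # SRS within the stratum
    return ultimate_sample


# Hypothetical population of N = 60 units split into k = 3 strata
strata = {
    "stratum 1": list(range(1, 11)),   # N1 = 10
    "stratum 2": list(range(11, 31)),  # N2 = 20
    "stratum 3": list(range(31, 61)),  # N3 = 30
}
sizes = {"stratum 1": 2, "stratum 2": 4, "stratum 3": 6}
sample = stratified_random_sample(strata, sizes, seed=42)
print(len(sample))  # n = n1 + n2 + n3 = 12
```

How the sizes n_h are chosen is a separate question, which the allocation rules that follow address.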
Allocation of Sample Size
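1. Equal allocation: $n_i = \dfrac{n}{k}$, for $i = 1, 2, \ldots, k$
2. Proportional allocation: $n_i = \dfrac{n N_i}{N}$
3. Optimum (Neyman) allocation: $n_i = n\,\dfrac{W_i S_i}{\sum_{i=1}^{k} W_i S_i} = n\,\dfrac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}$, where $W_i = N_i/N$ and $S_i$ is the standard deviation of the ith stratum.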

Problem: Suppose three small towns are under study, with populations N1 = 50000, N2 = 30000 and N3 = 40000 respectively. A stratified random sample with a total sample size of n = 500 is to be taken. Determine the sample size to be taken from each town using (a) proportional and (b) optimum allocation. It is roughly known from a previous survey that S1 = 30, S2 = 15 and S3 = 20.
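Solution: (a) Under proportional allocation, with N = 50000 + 30000 + 40000 = 120000:
$n_1 = n\dfrac{N_1}{N} = 500 \times \dfrac{50000}{120000} \approx 208$; $n_2 = n\dfrac{N_2}{N} = 500 \times \dfrac{3}{12} = 125$; $n_3 = n\dfrac{N_3}{N} = 500 \times \dfrac{4}{12} \approx 167$.
(b) Under optimum allocation, $\sum_{i=1}^{3} N_i S_i = 1500000 + 450000 + 800000 = 2750000$, so
$n_1 = n\dfrac{N_1 S_1}{\sum_{i=1}^{3} N_i S_i} = 500 \times \dfrac{1500000}{2750000} \approx 273$; $n_2 = n\dfrac{N_2 S_2}{\sum_{i=1}^{3} N_i S_i} = 500 \times \dfrac{450000}{2750000} \approx 82$; $n_3 = n\dfrac{N_3 S_3}{\sum_{i=1}^{3} N_i S_i} = 500 \times \dfrac{800000}{2750000} \approx 145$.
Here 272.7 is rounded to 273, so that n1 + n2 + n3 = 500.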

E) A sample of 60 persons is to be drawn from a population of 600 persons belonging to two villages, A and B. The means and SDs of their monthly wages are given below:
Village Size Mean SD
A 400 60 20
B 200 120 80
Allocate the sample between the two villages using proportional and optimum allocation.
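The proportional and optimum allocation formulas can be checked numerically. The sketch below is a minimal Python version, assuming each n_i is rounded to the nearest integer (which, in general, need not make the n_i add up exactly to n); the function names are hypothetical. It reproduces the town problem solved above, and the same two calls can be used for exercise E.

```python
def proportional_allocation(n, N_sizes):
    """n_i = n * N_i / N, rounded to the nearest integer."""
    N = sum(N_sizes)
    return [round(n * Ni / N) for Ni in N_sizes]


def optimum_allocation(n, N_sizes, S_values):
    """Neyman allocation n_i = n * N_i S_i / sum(N_i S_i), rounded to the nearest integer."""
    products = [Ni * Si for Ni, Si in zip(N_sizes, S_values)]
    total = sum(products)
    return [round(n * p / total) for p in products]


N_sizes = [50000, 30000, 40000]
S_values = [30, 15, 20]
print(proportional_allocation(500, N_sizes))        # [208, 125, 167]
print(optimum_allocation(500, N_sizes, S_values))   # [273, 82, 145]
```

Note how optimum allocation shifts sample units towards the stratum with the largest N_i S_i (the first town), compared with proportional allocation.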
Systematic sampling
In systematic sampling, one unit is selected at random and the subsequent units are selected according to a predetermined pattern. It is used, for example, in surveys of timber in a forest and of books in a library.
Advantages of systematic sampling
1. Systematic sampling is very simple to carry out.
2. It is relatively inexpensive.
3. The systematic sample is spread uniformly over the whole population.
4. Systematic sampling can be more efficient than simple random sampling, for example when the population values show a linear trend.
Linear Systematic Sampling
Suppose we have a population of size N and we have to draw a sample of size n. This method is applicable when the population size N is a multiple of the sample size n, i.e., N = nk where k is an integer.
Step I: First, we assign the numbers 1 to N to the population units.
Step II: We select a random number r between 1 and k, i.e., 1 ≤ r ≤ k, from the random number table; r is called the random start and k the sampling interval.
Step III: We then select every kth unit of the population into the sample. In this way we get a sample of size n consisting of the units
r, r + k, r + 2k, ..., r + (n − 1)k
This technique generates k possible systematic samples, each selected with equal probability 1/k. This method is known as linear systematic sampling.
Example: Suppose there are 20 units in a population, serially numbered 1 to 20, and we have to draw a systematic sample of size 4. Here k = N/n = 20/4 = 5.
So first we select a random number between 1 and 5 from the random number table. Suppose this number is 3; then we select the remaining sample units systematically as
3, 3 + k, 3 + 2k, 3 + 3k
3, 8, 13, 18
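Steps I to III can be sketched in Python as follows. This is a minimal illustration; the function name `linear_systematic_sample` is hypothetical, and when no random start is given it draws one with Python's `random` module instead of a random number table.

```python
import random


def linear_systematic_sample(N, n, r=None, seed=None):
    """Units r, r + k, ..., r + (n - 1)k with k = N / n (N must be a multiple of n)."""
    if N % n != 0:
        raise ValueError("linear systematic sampling needs N = nk")
    k = N // n                                   # sampling interval
    if r is None:
        r = random.Random(seed).randint(1, k)    # random start, 1 <= r <= k
    if not 1 <= r <= k:
        raise ValueError("random start must satisfy 1 <= r <= k")
    return [r + j * k for j in range(n)]


print(linear_systematic_sample(20, 4, r=3))  # [3, 8, 13, 18]
```

Running it for every start r = 1, ..., 5 lists the k = 5 possible samples; together they cover each of the 20 units exactly once.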
Circular Systematic Sampling

The main drawback of linear systematic sampling is that it can be used only when N is a multiple of n, i.e., N = kn. In general, N need not be a multiple of n; for example, if N = 15 and n = 4, then N/n = 15/4 = 3.75. In such situations we use circular systematic sampling. This method has the following steps:
Step I: First, we assign the numbers 1 to N to the population units and regard the N units as arranged around a circle.

Step II: We take k as N/n rounded off to the nearest integer.

Step III: We select a random number from 1 to N; let this number be i.

Step IV: Starting from unit i, we select every kth unit in a circular manner (after unit N we continue from unit 1) until n units are selected.
For example, suppose we have a population of 14 households from which we have to draw a sample of size 5. Here N = 14 and n = 5, so N/n = 14/5 = 2.8 and k = 3.

First we select a random number from 1 to N, i.e., 1 to 14. If it is 7, the selected sample is
7, 7 + k, 7 + 2k, ...
7, 10, 13, 2, 5
(16 exceeds 14, so counting continues around the circle: 16 − 14 = 2.)
If the random number is 9, the sample is
9, 12, 1, 4, 7
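Steps I to IV can be expressed with modular arithmetic, since wrapping around the circle is the same as reducing unit numbers modulo N. The sketch below is a minimal Python illustration; `circular_systematic_sample` is a hypothetical name, and rounding N/n half upwards is an assumption for the case where N/n ends in exactly .5.

```python
import math


def circular_systematic_sample(N, n, i):
    """Circular systematic sample with random start i (1 <= i <= N).

    k is N/n rounded to the nearest integer; unit numbers wrap around after N.
    """
    k = math.floor(N / n + 0.5)   # round half up, e.g. 14/5 = 2.8 -> 3
    return [(i - 1 + j * k) % N + 1 for j in range(n)]


print(circular_systematic_sample(14, 5, 7))  # [7, 10, 13, 2, 5]
print(circular_systematic_sample(14, 5, 9))  # [9, 12, 1, 4, 7]
```

The random start itself can be drawn with `random.randint(1, 14)` when no random number table is at hand.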
E) Information on the production of wheat in 25 districts is collected for a particular season. Select a systematic random sample of 7 units from the following data.

Example 4: A chemical firm wants to determine how four catalysts differ in yield. The firm runs the experiment in three of its plants, A, B and C. In each plant, the yield is measured with each catalyst. The yields are as follows:
Plant  Catalyst 1  Catalyst 2  Catalyst 3  Catalyst 4
A  2  1  2  4
B  3  2  1  3
C  1  3  3  1

Perform the ANOVA and comment on whether the difference in yield due to the catalysts is significant at the 5% level of significance.
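A quick numerical check of the ANOVA can be done in plain Python. This sketch assumes the design is analysed as a two-way classification with one observation per cell, treating plants as blocks and catalysts as treatments; that reading of the example is an assumption, and `anova_two_way` is a hypothetical helper.

```python
def anova_two_way(table):
    """Two-way ANOVA without replication.

    table[i][j] is the yield in plant i (row, treated as a block) with
    catalyst j (column, the treatment). Returns sums of squares and F ratios.
    """
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / (r * c)                       # correction factor G^2/N
    tss = sum(v * v for row in table for v in row) - cf   # total SS
    ss_plants = sum(sum(row) ** 2 for row in table) / c - cf
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ss_catalysts = sum(t * t for t in col_totals) / r - cf
    sse = tss - ss_plants - ss_catalysts                  # residual (error) SS
    df_plants, df_catalysts, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    mse = sse / df_error
    return {
        "SS catalysts": ss_catalysts,
        "SS plants": ss_plants,
        "SSE": sse,
        "F catalysts": (ss_catalysts / df_catalysts) / mse,
        "F plants": (ss_plants / df_plants) / mse,
    }


yields = [
    [2, 1, 2, 4],  # plant A
    [3, 2, 1, 3],  # plant B
    [1, 3, 3, 1],  # plant C
]
result = anova_two_way(yields)
print({name: round(value, 3) for name, value in result.items()})
# {'SS catalysts': 1.0, 'SS plants': 0.167, 'SSE': 10.5, 'F catalysts': 0.19, 'F plants': 0.048}
```

With 3 and 6 degrees of freedom, the F ratio for catalysts (about 0.19) is far below the tabulated 5% value F(3, 6) = 4.76, so under this analysis the differences in yield among catalysts are not significant.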

Test 1
Q1: A computer chip manufacturer claims that at most 2 percent of the chips it produces are defective. An electronics company, impressed by that claim, has purchased a large quantity of chips. To check the manufacturer's claim, the company decides to test a sample of 250 of these chips. If there are eight defective chips among these 250, does this disprove the manufacturer's claim at the 5% level of significance?
Q2: A researcher would like to test whether there is any significant difference between the safety consciousness of men and women while driving a car. In a sample of 300 men, 130 said that they used seat belts. In a sample of 300 women, 90 said that they used seat belts. Test the claim that there is no significant difference between the safety consciousness of men and women while driving a car, at the 5% level of significance.
Q3: A company manufactures two types of bulbs (A and B). The manager of the company tests a random sample of 50 bulbs of type A and 60 bulbs of type B, and obtains the following information:
Type  Mean life (in hours)  SD (in hours)
Type A  1300  50
Type B  1200  60

Test whether there is a significant difference in the average life of the two types of bulbs.
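The three questions of Test 1 are large-sample z tests. The sketch below computes the test statistics in Python as a way to check hand calculations; it assumes a right-tailed test for Q1 (H1: p > 0.02), two-tailed tests for Q2 and Q3, the pooled proportion for Q2, and the sample SDs in place of the population SDs for Q3. The function names are hypothetical.

```python
from math import sqrt


def z_one_proportion(x, n, p0):
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def z_two_proportions(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic for the difference of two means."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


print(round(z_one_proportion(8, 250, 0.02), 2))           # Q1: 1.36
print(round(z_two_proportions(130, 300, 90, 300), 2))     # Q2: 3.39
print(round(z_two_means(1300, 50, 50, 1200, 60, 60), 2))  # Q3: 9.53
```

Compared with the usual normal critical values (1.645 for the right-tailed test in Q1, 1.96 for the two-tailed tests in Q2 and Q3, taking 5% for Q3 as well), Q1 does not disprove the claim, while Q2 and Q3 indicate significant differences.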

Test 2
Q1: Two sources of raw materials are under consideration by a bulb manufacturing company. Both sources seem to have similar characteristics, but the company is not sure about their respective uniformity. A sample of 12 lots from source A yields a variance of 125, and a sample of 10 lots from source B yields a variance of 112. Is it likely that the variance of source A is greater than that of source B, at significance level α = 0.01?
Q2: The pulse rates of 6 people were recorded before and after taking a new drug:
Before 68 71 84 93 67 74
After 71 70 81 97 73 80
Using the 1% level of significance, can you say that there is a significant increase in the pulse rate?

Q3: A computer chip manufacturer claims that at most 2 percent of the chips it produces are defective. An electronics company, impressed by that claim, has purchased a large quantity of chips. To check the manufacturer's claim, the company decides to test a sample of 250 of these chips. If there are eight defective chips among these 250, does this disprove the manufacturer's claim at the 5% level of significance?
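For Q1 and Q2 of Test 2, the test statistics can be checked in Python with the standard `statistics` module (Q3 repeats Q1 of Test 1). This is a sketch under stated assumptions: for Q1 the given variances are treated as unbiased sample variances, so the degrees of freedom are 12 − 1 and 10 − 1; for Q2 the differences are taken as after − before, and `statistics.stdev` uses the n − 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev

# Q1: variance ratio, larger variance in the numerator
var_a, n_a = 125, 12
var_b, n_b = 112, 10
F = var_a / var_b
print(round(F, 3), (n_a - 1, n_b - 1))  # 1.116 (11, 9)

# Q2: paired t test on the differences d = after - before
before = [68, 71, 84, 93, 67, 74]
after = [71, 70, 81, 97, 73, 80]
d = [a - b for a, b in zip(after, before)]
t = mean(d) / (stdev(d) / sqrt(len(d)))
print(d, round(t, 2))  # [3, -1, -3, 4, 6, 6] 1.64
```

F ≈ 1.12 is far below the tabulated 1% value for (11, 9) degrees of freedom, and t ≈ 1.64 is below the one-tailed 1% value t(5) = 3.365, so under these assumptions neither result is significant at the 1% level.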

Example: In the given frequency distribution two frequencies are missing and its mean is found to be 1.46.
Number of Accidents (X): 0 1 2 3 4 5 Total
Frequency (f) 46 ? ? 25 10 5 200
Find the missing frequencies.
Solution: Let the missing frequencies be f1 and f2.
X  f  fX
0  46  0
1  f1  f1
2  f2  2f2
3  25  75
4  10  40
5  5  25
Total  N = 200  ΣfX = 140 + f1 + 2f2
Then Σf = N, i.e., 86 + f1 + f2 = 200,
or f1 + f2 = 200 − 86 = 114 … (i)
Also, since $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{140 + f_1 + 2f_2}{200} = 1.46$ (given),
f1 + 2f2 = 292 − 140 = 152 … (ii)
Subtracting (i) from (ii) gives f2 = 38, and then f1 = 114 − 38 = 76.
Hence f1 = 76 and f2 = 38.
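Equations (i) and (ii) form a 2 × 2 linear system, which the sketch below solves exactly with Python's `fractions.Fraction` (passing the mean as the string "1.46" avoids floating-point error). The helper name and its argument layout are hypothetical; it assumes exactly two unknown frequencies, at X = x1 and X = x2.

```python
from fractions import Fraction


def two_missing_frequencies(known, N, mean, x1, x2):
    """Solve for the frequencies f1 (at X = x1) and f2 (at X = x2).

    known: dict {X: f} for the classes whose frequencies are given.
    Uses  f1 + f2 = N - sum(f)                ... (i)
          x1*f1 + x2*f2 = N*mean - sum(f*X)   ... (ii)
    """
    mean = Fraction(mean)                      # pass the mean as a string, e.g. "1.46"
    s = N - sum(known.values())
    t = N * mean - sum(x * f for x, f in known.items())
    f2 = (t - x1 * s) / (x2 - x1)              # eliminate f1 from (i) and (ii)
    f1 = s - f2
    return f1, f2


f1, f2 = two_missing_frequencies({0: 46, 3: 25, 4: 10, 5: 5}, 200, "1.46", 1, 2)
print(f1, f2)  # 76 38
```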

Example: The following data relate to the marks of 70 students in statistics. Find the mean.
Marks (More than): 20 30 40 50 60 70
No. of students : 70 63 40 30 18 7
Solution: In this example, a ‘more than’ cumulative frequency distribution is given. For computing mean, the
given distribution is converted into a simple frequency distribution as shown in the table:
Computing the arithmetic mean
Cumulative classes  Classes  f  X  d' = (X − 55)/10  fd'
More than 20  20-30  70 – 63 = 7  25  –3  –21
More than 30  30-40  63 – 40 = 23  35  –2  –46
More than 40  40-50  40 – 30 = 10  45  –1  –10
More than 50  50-60  30 – 18 = 12  55  0  0
More than 60  60-70  18 – 7 = 11  65  1  11
More than 70  70-80  7 – 0 = 7  75  2  14
Total  N = 70  Σfd' = –52

$\bar{X} = A + h\,\dfrac{\sum fd'}{N} = 55 + 10 \times \dfrac{-52}{70} = 55 - 7.43 = 47.57$ marks.
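The conversion from a "more than" cumulative distribution and the step-deviation formula can be reproduced in Python. This is a minimal sketch; the function name is hypothetical, and it assumes equal class widths, with the last class containing everything above its lower limit.

```python
def mean_from_more_than(lower_limits, more_than, width):
    """Convert a 'more than' cumulative distribution to class frequencies and
    return (frequencies, mean) using the step-deviation formula.
    """
    counts = list(more_than) + [0]             # nothing lies beyond the last class
    freqs = [counts[i] - counts[i + 1] for i in range(len(lower_limits))]
    mids = [L + width / 2 for L in lower_limits]
    A = mids[len(mids) // 2]                   # assumed mean (55 here; any mid-value works)
    d = [(x - A) / width for x in mids]        # step deviations d'
    N = sum(freqs)
    mean = A + width * sum(f * di for f, di in zip(freqs, d)) / N
    return freqs, mean


freqs, mean = mean_from_more_than([20, 30, 40, 50, 60, 70], [70, 63, 40, 30, 18, 7], 10)
print(freqs, round(mean, 2))  # [7, 23, 10, 12, 11, 7] 47.57
```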
Example: Determine the modal value in the following series.
Value: 10 12 14 16 18 20 22 24 26 28 30 32
Frequency: 7 15 21 38 34 34 11 19 10 38 5 2
(Ans: 18)
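The answer 18 cannot be read off by inspection, because the largest frequency (38) occurs at both 16 and 28. The sketch below assumes the answer comes from the usual six-column grouping method (single frequencies, sums of twos starting at the first and second values, sums of threes starting at the first, second and third values) and tallies how often each value lies in a maximum group; `grouping_mode` is a hypothetical name.

```python
from collections import Counter


def grouping_mode(values, freqs):
    """Modal value by the grouping method (six columns).

    Column I: single frequencies; II, III: sums of twos starting at the 1st
    and 2nd values; IV, V, VI: sums of threes starting at the 1st, 2nd and
    3rd values. In each column every value in a maximum group gets one tally.
    """
    tallies = Counter()
    n = len(freqs)
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [range(i, i + size) for i in range(start, n - size + 1, size)]
        totals = [sum(freqs[j] for j in g) for g in groups]
        best = max(totals)
        for g, total in zip(groups, totals):
            if total == best:                 # ties: every maximal group counts
                for j in g:
                    tallies[values[j]] += 1
    mode, _ = tallies.most_common(1)[0]       # value with the most tallies
    return mode, tallies


values = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
freqs = [7, 15, 21, 38, 34, 34, 11, 19, 10, 38, 5, 2]
mode, tallies = grouping_mode(values, freqs)
print(mode, sorted(tallies.items()))
# 18 [(14, 1), (16, 4), (18, 5), (20, 3), (22, 1), (28, 1)]
```

The value 18 receives five tallies against four for 16, which settles the tie left by column I.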

Example: The sum of squares of 100 observations was calculated as 7961. Later, it was found that two values, 53 and 42, were wrongly read as 35 and 24 at the time of calculation. Find the corrected sum of squares.
Solution: Given the incorrect $\sum X^2 = 7961$,
corrected $\sum X^2$ = incorrect $\sum X^2$ − (squares of wrong observations) + (squares of correct observations)
= 7961 − (35² + 24²) + (53² + 42²)
= 7961 − (1225 + 576) + (2809 + 1764) = 10733.
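The correction rule is a one-line computation; the sketch below (hypothetical helper name) reproduces the example and can be reused for Question1 and Question2 below.

```python
def corrected_sum_of_squares(wrong_total, wrong_values, correct_values):
    """Remove the squares of the misread values and add the squares of the correct ones."""
    return (wrong_total
            - sum(v ** 2 for v in wrong_values)
            + sum(v ** 2 for v in correct_values))


print(corrected_sum_of_squares(7961, [35, 24], [53, 42]))  # 10733
```

For a value that was wrongly included (as in Question2), pass an empty list of correct values.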
Question1: The sum of squares of 20 observations was worked out as 5100. But while calculating it, an observation 31 was wrongly taken as 13. Find the corrected sum of squares.
Question2: The sum of squares of 50 observations is 4122. An observation 39 was wrongly included in the series. Find the sum of squares of the remaining 49 observations.
Question3: The arithmetic mean and the S.D. of a series of 20 items were calculated as 20 cm and 5 cm respectively. But while calculating them, an item 13 was misread as 30. Find the correct arithmetic mean and standard deviation.
Question4: The mean and S.D. of 20 items are found to be 10 and 2 respectively. At the time of checking, it was observed that one item, 8, was incorrect. Find the mean and the S.D. if (i) the wrong item is omitted, (ii) it is replaced by 12.
Properties of Standard Deviation
1. The S.D. of a series remains unchanged if each variate value is increased or decreased by the same constant. In other words, the S.D. is independent of change of origin.
Let Y = X ± b, where b is a constant.
Then $\sigma_Y = \sigma_X$, i.e., the S.D.s of the variables X and Y are equal.
Example: Suppose 5, 8, 17, 12 and 7 are five observations on a variable X. A new variable Y is obtained by adding 2 (a constant) to each observation on X. Further, let Z be another variable defined by subtracting 3 from each value of X. Find the standard deviations of the variables X, Y and Z, say $\sigma_X$, $\sigma_Y$ and $\sigma_Z$ respectively.
(Ans: 4.26, 4.26, 4.26)
2. If the values of a variable X are multiplied (or divided) by a constant, the S.D. of the new observations is obtained by multiplying (or dividing) the original S.D. by the same constant (taken in absolute value). Symbolically,
if Y = kX, where k is a constant,
then $\sigma_Y = |k|\,\sigma_X$.
In other words, the S.D. is affected by change of scale.
Example: Suppose 2, 6, 9, 5, 4 are five observations on a variable X. A new variable Y is obtained by multiplying each observation on X by 3 (a constant). Further, another variable Z is obtained by dividing each observation on X by 2. Find the S.D.s of the variables X, Y and Z, say $\sigma_X$, $\sigma_Y$, $\sigma_Z$ respectively.
(Ans: 2.32, 6.95, 1.16)
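Both properties can be verified numerically with Python's `statistics.pstdev`, which uses the divisor N; that matches the answers given above (with the divisor N − 1 the first S.D. would not be 4.26). The variable names are illustrative only.

```python
from statistics import pstdev  # population S.D. (divisor N)

# Property 1: change of origin leaves the S.D. unchanged
x = [5, 8, 17, 12, 7]
y = [v + 2 for v in x]     # add 2
z = [v - 3 for v in x]     # subtract 3
print([round(pstdev(s), 2) for s in (x, y, z)])     # [4.26, 4.26, 4.26]

# Property 2: change of scale multiplies (or divides) the S.D. by the same constant
x2 = [2, 6, 9, 5, 4]
y2 = [3 * v for v in x2]   # multiply by 3
z2 = [v / 2 for v in x2]   # divide by 2
print([round(pstdev(s), 2) for s in (x2, y2, z2)])  # [2.32, 6.95, 1.16]
```

The exact value of 3 × 2.3152 is 6.9455, which rounds to 6.95.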

3. The combined standard deviation can be calculated if the standard deviations, means and numbers of items of the different groups are given. For two groups the formula is:
$$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2)}{N_1 + N_2}}$$
where
$\sigma_{12}$ = combined S.D. of the two groups
$\sigma_1$ = standard deviation of the first group
$\sigma_2$ = standard deviation of the second group
$N_1$ = no. of observations in the first group
$N_2$ = no. of observations in the second group
$D_1 = \bar{X}_1 - \bar{X}_{12}$, $D_2 = \bar{X}_2 - \bar{X}_{12}$
$\bar{X}_{12} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$ = combined mean of the two groups
$\bar{X}_1$ = mean of the first group
$\bar{X}_2$ = mean of the second group

Question5: The standard deviation of 5 items is found to be √50. What will be the standard deviation if the values of all the items are increased by 5? (Ans: √50)
Question6: A sample of 35 values has mean 80 and S.D. 4. A second sample of 65 values from the same population has mean 70 and S.D. 3. Find the mean and standard deviation of the combined sample of 100 values. (Ans: 73.5, 5.85)

Question7: Find the mean and the standard deviation of the two groups taken together:
Group  Number  Mean  S.D.
A  113  160  22
B  120  150  20

(Ans: 154.85, 21.58)
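The combined mean and combined S.D. formula of property 3 can be written as a short Python function; the sketch below (hypothetical function name) reproduces the answers to Question6 and Question7.

```python
from math import sqrt


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and S.D. of two groups from their sizes, means and S.D.s."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n          # combined mean
    d1, d2 = mean1 - mean12, mean2 - mean12         # D1 and D2
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean12, sqrt(var12)


m, s = combined_mean_sd(35, 80, 4, 65, 70, 3)        # Question6
print(round(m, 2), round(s, 2))                      # 73.5 5.85
m, s = combined_mean_sd(113, 160, 22, 120, 150, 20)  # Question7
print(round(m, 2), round(s, 2))                      # 154.85 21.58
```

The D1² and D2² terms account for the spread between the group means, which is why the combined S.D. in Question6 (5.85) exceeds both group S.D.s.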

Example: A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:
N = 25, $\sum X = 125$, $\sum Y = 100$, $\sum X^2 = 650$, $\sum Y^2 = 460$ and $\sum XY = 508$.
It was, however, discovered at the time of checking that two pairs of observations had not been correctly copied. They were taken as (6, 14) and (8, 6), while the correct values were (8, 12) and (6, 8). Find the corrected value of r.
Incorrect values  |  Correct values
X  Y  X²  Y²  XY  |  X  Y  X²  Y²  XY
6  14  36  196  84  |  8  12  64  144  96
8  6  64  36  48  |  6  8  36  64  48
Total  14  20  100  232  132  |  Total  14  20  100  208  144

Corrected $\sum X$ = 125 – 14 + 14 = 125
Corrected $\sum Y$ = 100 – 20 + 20 = 100
Corrected $\sum X^2$ = 650 – 100 + 100 = 650
Corrected $\sum Y^2$ = 460 – 232 + 208 = 436
Corrected $\sum XY$ = 508 – 132 + 144 = 520
Hence the corrected value of r is
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{(25 \times 650 - 125^2)(25 \times 436 - 100^2)}} = \frac{500}{\sqrt{625 \times 900}} = \frac{500}{750} \approx 0.67.$$
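The correction of the totals and the computation of r from sums can be scripted. The sketch below uses two hypothetical helpers: one removes the wrongly copied pairs and adds the correct ones, the other applies the computational formula for r.

```python
from math import sqrt


def pearson_r_from_sums(n, sx, sy, sxx, syy, sxy):
    """r = (N ΣXY - ΣX ΣY) / sqrt([N ΣX² - (ΣX)²][N ΣY² - (ΣY)²])."""
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def correct_sums(sums, wrong_pairs, right_pairs):
    """Subtract the wrongly copied (x, y) pairs from the totals and add the correct ones."""
    sx, sy, sxx, syy, sxy = sums
    for pairs, sign in ((wrong_pairs, -1), (right_pairs, 1)):
        for x, y in pairs:
            sx += sign * x
            sy += sign * y
            sxx += sign * x * x
            syy += sign * y * y
            sxy += sign * x * y
    return sx, sy, sxx, syy, sxy


corrected = correct_sums((125, 100, 650, 460, 508), [(6, 14), (8, 6)], [(8, 12), (6, 8)])
print(corrected)                                      # (125, 100, 650, 436, 520)
print(round(pearson_r_from_sums(25, *corrected), 4))  # 0.6667
```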

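Question: With 10 observations on each of two variables X and Y, the following data were obtained: $\bar{X} = 12$, $\sigma_X = 3$, $\bar{Y} = 15$, $\sigma_Y = 4$ and $r = 0.5$. However, on subsequent verification it was found that one value of X (= 15) and the corresponding value of Y (= 13) had been wrongly taken as 16 and 18 respectively. Find the correct value of the correlation coefficient.
Solution: Since
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \sigma_X \sigma_Y},$$
we have $\sum (X - \bar{X})(Y - \bar{Y}) = N r \sigma_X \sigma_Y = 10 \times 0.5 \times 3 \times 4 = 60$.
Expanding the left-hand side,
$$\sum XY - \bar{X}\sum Y - \bar{Y}\sum X + N\bar{X}\bar{Y} = \sum XY - N\bar{X}\bar{Y} = 60 \quad \left[\text{as } \textstyle\sum X = N\bar{X} \text{ and } \sum Y = N\bar{Y}\right],$$
so $\sum XY = 60 + N\bar{X}\bar{Y} = 60 + 10 \times 12 \times 15 = 1860$.
Corrected $\sum XY = 1860 - (16 \times 18) + (15 \times 13) = 1767$
Corrected $\sum X = 12 \times 10 - 16 + 15 = 119$
Corrected $\sum Y = 15 \times 10 - 18 + 13 = 145$
Now $\sigma_X^2 = \dfrac{\sum X^2}{N} - \bar{X}^2$, so $\sum X^2 = N(\sigma_X^2 + \bar{X}^2) = 10(9 + 144) = 1530$.
Similarly, $\sum Y^2 = N(\sigma_Y^2 + \bar{Y}^2) = 10(16 + 225) = 2410$.
Corrected $\sum X^2 = 1530 - 16^2 + 15^2 = 1499$
Corrected $\sum Y^2 = 2410 - 18^2 + 13^2 = 2255$
Therefore, the corrected value of r is
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} = \frac{10 \times 1767 - 119 \times 145}{\sqrt{(10 \times 1499 - 119^2)(10 \times 2255 - 145^2)}} = \frac{415}{\sqrt{829 \times 1525}} \approx 0.37.$$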
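When only means, S.D.s and r are given, the raw sums have to be recovered first, as in the solution above. The sketch below does this in Python with a hypothetical helper `sums_from_summary`, assuming S.D.s computed with the divisor N (as in the solution) and that the wrongly recorded values 16 and 18 belong to the same pair.

```python
from math import sqrt


def sums_from_summary(n, mean_x, sd_x, mean_y, sd_y, r):
    """Recover ΣX, ΣY, ΣX², ΣY², ΣXY from means, S.D.s (divisor N) and r."""
    sx, sy = n * mean_x, n * mean_y
    sxx = n * (sd_x ** 2 + mean_x ** 2)
    syy = n * (sd_y ** 2 + mean_y ** 2)
    sxy = n * r * sd_x * sd_y + n * mean_x * mean_y
    return sx, sy, sxx, syy, sxy


n = 10
sx, sy, sxx, syy, sxy = sums_from_summary(n, 12, 3, 15, 4, 0.5)
# Replace the wrongly recorded pair (16, 18) by the correct pair (15, 13)
sx += 15 - 16
sy += 13 - 18
sxx += 15 ** 2 - 16 ** 2
syy += 13 ** 2 - 18 ** 2
sxy += 15 * 13 - 16 * 18
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sx, sy, sxx, syy, sxy)  # 119 145 1499 2255 1767.0
print(round(r, 3))            # 0.369
```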

Random Numbers

The random numbers have been generated through a probabilistic mechanism. The numbers have the following properties:

i) The probability that each digit 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 will appear at any particular place is the same, namely 1/10.
ii) The occurrences of any two digits, in any two places, are independent of each other.
Pseudo-Random Numbers
Random numbers that repeat after a certain period in a cycle, or that show correlation between successive numbers, or in which several high numbers follow or precede several low numbers, are called pseudo-random numbers. In other words, random numbers are called pseudo-random numbers when they are generated by some deterministic process. Such numbers can be used effectively in simulation if the period of the cycle is very large.
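A linear congruential generator, one form of the congruential method listed below, shows clearly why pseudo-random numbers repeat in a cycle: each number is a fixed function of the previous one, so the sequence must eventually cycle. The sketch below is illustrative only; the tiny parameters (modulus 16, multipliers 5 and 4, increments 3 and 0) are chosen here to make the cycles visible and are not recommended values.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def cycle_length(seed, a, c, m):
    """Length of the cycle that the generator eventually repeats."""
    first_seen = {}
    for step, x in enumerate(lcg(seed, a, c, m)):
        if x in first_seen:
            return step - first_seen[x]
        first_seen[x] = step


print(list(islice(lcg(7, 5, 3, 16), 20)))
# [6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, 3, 2, 13, 4, 7, 6, 1, 8, 11]
print(cycle_length(7, 5, 3, 16))  # 16: every value 0..15 appears once per cycle
print(cycle_length(1, 4, 0, 16))  # 1: the sequence 4, 0, 0, ... gets stuck at 0
print([x / 16 for x in islice(lcg(7, 5, 3, 16), 4)])  # [0.375, 0.0625, 0.5, 0.6875]
```

Dividing by m scales the integers to numbers in [0, 1). Practical generators use a very large m, so that the period is long enough for simulation work.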

Use of Random Number Generation

In probability and statistical applications, we require large quantities of random numbers. Reading random numbers from tables and using them in the analysis is very slow, and sometimes we need more random numbers than are published in the tables. So it is necessary to devise a mechanism through which we can generate random numbers automatically. Also, in different situations we need random numbers from different distributions, such as Poisson, binomial, normal and gamma. In such situations, we use generated random numbers.
Generation of Random Numbers
There are various techniques for generating random numbers. Some important methods of generating random
numbers are
1) Lottery Method
2) Middle Square Method
3) Congruential Method

