IME602Slides 01


IME602: PROBABILITY & STATISTICS (Part 01)


Raghu Nandan Sengupta
Department of Management Sciences (DoMS)
Indian Institute of Technology Kanpur, INDIA

IME602:Probability & Statistics R.N.Sengupta,DoMS.,IIT Kanpur,INDIA 1


(Part # 01)
Objective of the Course
• Nowadays the use of Probability and Statistics is widespread in all domains of science and engineering.
• Hence it is imperative for the theoretician as well as the practitioner to understand the basics and to use probability and statistical tools and methods in a well-grounded manner.

Objective of the Course
(contd…)
• This course will benefit students in their masters and doctoral programs when they work in areas such as engineering, social sciences, management science, etc., where probability, statistics and their methodologies are used quite extensively.
• Furthermore, the knowledge gained from this course will help learners tackle and solve interesting problems from both theoretical and practical viewpoints.

Key learning takeaways
• Will help students master the rich repertoire of methodology and tools used in the domain of Probability, Statistics, Distribution, Sampling Theory, Sampling Distribution, Regression Analysis, etc.
• Will provide students with both the basic and advanced theoretical background in Probability & Statistics.
• Will equip learners with the requisite skills in utilizing different techniques through practical applications and solved examples.
• Will help participants learn and build on their expertise in the use of data, statistical theory and statistical tools in a myriad of applications like engineering, management science, social science, basic sciences, etc.

Syllabus
Axioms of probability; Conditional probability; Discrete and continuous random
variables; Functions of random variables; Moments of random variables;
Generating functions; Limit theorems; Jointly distributed random variables;
Sufficiency and completeness; Descriptive and inferential statistics; Sampling
theory and sampling distributions; Methods for statistical inference; Theory of
point estimation and estimation of parameters; Theory of interval estimation;
Theory of hypotheses testing; Analysis of variance; Brief introduction to
Multivariate Analysis; Linear and multiple linear regression; Introduction to
statistical packages, e.g., MATLAB.

Text Books
1) Rohatgi, V. K. and Saleh, A. K. Md. E., An Introduction to Probability and Statistics, John Wiley & Sons, 2001, ISBN (10): 9814-12-603-9.

2) Larson, H. J., Introduction to Probability Theory and Statistical Inference, John Wiley & Sons, 1982.

3) Ross, S. M., Introduction to Probability Models, Harcourt Indian Private Ltd., 2000, ISBN (10): 0-12-598475-8.

Reference Books
1) Casella, G. and Berger, R. L., Statistical Inference, Duxbury Advanced Series, 2002, ISBN (10): 81-315-0394-1.

2) Cox, D. R. and Hinkley, D. V., Theoretical Statistics, Chapman and Hall, 1974, ISBN (10): 0-412-12420-3.

3) Cramér, H., Mathematical Methods of Statistics, Princeton University Press, 1999, ISBN (13): 978-0-691-00547-8.

4) Feller, W., An Introduction to Probability Theory and its Applications (Volume I), John Wiley & Sons, 2000, ISBN (10): 9971-51-315-3.

Reference Books (contd..)
5) Feller, W., An Introduction to Probability Theory and its Applications (Volume II), John Wiley & Sons, 2000, ISBN (10): 9971-51-298-X.

6) Goon, A. M., Gupta, M. K. and Dasgupta, B., An Outline of Statistical Theory (Volume I), The World Press Private Limited, Calcutta, 1998.

7) Goon, A. M., Gupta, M. K. and Dasgupta, B., An Outline of Statistical Theory (Volume II), The World Press Private Limited, Calcutta, 2000, ISBN (10): 81-87567-26-0.

8) Goon, A. M., Gupta, M. K. and Dasgupta, B., Fundamentals of Statistics (Volume I), The World Press Private Limited, Calcutta, 2000, ISBN (10): 81-87567-25-2.

Reference Books (contd..)
9) Goon, A. M., Gupta, M. K. and Dasgupta, B., Fundamentals of Statistics (Volume II), The World Press Private Limited, Calcutta, 2001, ISBN (10): 81-87567-18-X.

10) Hogg, R. V. and Craig, A. T., Introduction to Mathematical Statistics, Pearson Education, 2004, ISBN (10): 81-7808-630-1.

11) Loeve, Michel, Probability Theory, Affiliated East West Press Pvt. Ltd., 1963.

12) Mood, A. M., Graybill, F. A. and Boes, D. E., Introduction to the Theory of Statistics, Tata McGraw Hill Publication, 2001, ISBN (10): 0-07-044520-6.

Reference Books (contd..)
13) Parzen, E., Modern Probability Theory and its Application, John Wiley & Sons,
1992, ISBN (10): 0471668257.

14) Lehmann, E. L. and Romano, J. P., Testing Statistical Hypotheses, Springer Verlag
Publishers, 2005, ISBN (10): 0387988645.

15) Draper, N. R. and Smith, H., Applied Regression Analysis, Wiley & Sons, 1981,
ISBN (10): 0471170828.

16) Walpole, R. E., Myers, R. H., Myers, S. L. and Ye, K., Probability and Statistics
for Engineers and Scientists, Pearson Education, 2007, ISBN (10): 81-317-1552-3.

17) Yule, G. U. and Kendall, M. G., An Introduction to the Theory of Statistics, Universal Book Stall, 1999, ISBN (10): 81-85461-71-6.

Plan of classes (considering 14 weeks, with 02 classes each week, of 1.25 hours each)

S.No.   Week(s)    Coverage of Topics
01      01 to 02   Axioms of probability; Conditional probability
02      03 to 04   Discrete and continuous random variables; Functions of random variables; Moments of random variables
03      05 to 06   Generating functions; Limit theorems; Jointly distributed random variables; Sufficiency and completeness
04      07 to 09   Descriptive and inferential Statistics; Sampling theory and sampling distributions
05      10 to 13   Method for statistical inference; Theory of point estimation and estimation of parameters; Theory of interval estimation; Theory of hypotheses testing; Analysis of variance
06      14         Brief Introduction of Multivariate Analysis; Linear and multiple linear regression
Evaluation Methodology

1) Quizzes: 20%
2) Assignments: 20%
3) Mid-term examination: 25%
4) Final examination: 35%

Software/Language
1) MATLAB <http://www.mathworks.com/>. One can find server-based MATLAB at <https://www.iitk.ac.in/ccnew/>
2) R <https://www.r-project.org/>
3) SPSS <https://www.spss.co.in/>
4) SAS <https://www.sas.com/en_in/home.html>

Practical cases
1) Newspaper vendor: wants to maximize
profit.
2) Production manager: wants to minimize
waiting time of jobs on machines and thus
reduce inventory and costs.
3) Marketing manager: wants to recommend
the best changes for a product (which is
being sold in the market) so that it will do
well.

Statistics
The word STATISTICS is derived
from the Italian word stato, which
means ″state″ and statista refers
to a person involved with the
affairs of the state.

Statistics
• Nowadays, STATISTICS (in the plural sense) is the study of qualitative and quantitative data from our surroundings, be it the environment or any system, so as to draw meaningful conclusions about that environment or system.
• It also means (in the singular sense) the body of methods meant for the treatment of such data.

Main steps in the study of Statistics
• Method of collection of data (primary or secondary)
• Scrutiny of data
• Presentation of data (non-frequency data, frequency data)
• Analysis of data through statistical models/methods
• Conclusions from the results thus obtained
• Modification of statistical models/methods depending on the results obtained
Descriptive Statistics
Presentation of data
• Non frequency data
• Frequency data

Non frequency data
Time series or Historical data
Here the values of one or more variables, such as the population of India
or the price of petroleum, are given for different periods of time. For
instance, we may be interested in knowing how the population changes over
time or how the production of petroleum changes over time.

Non frequency data
Time series or Historical data
[Figure: BSE(30) Close plotted against Date, from 3-Jan-94 to 31-Mar-94; vertical axis from 0 to 5000]

Non frequency data
Spatial series data
It may be that the values of one or more
variables are given for different individuals
in a group for the same period of time. But
instead of considering the group as such
we may be more interested in studying the
way the values of the variable(s) change
from individual to individual in that group.

Non frequency data
Spatial series data
[Figure: Fertiliser Consumption for few Indian states for 1999-2000 (in tonnes); horizontal axis: States, vertical axis: Fertiliser consumption (in tonnes), 500000 to 3500000]

Frequency data
We have the data on one or more variables for
different individuals for different periods of time
or for different points. But now we are more
interested in the characteristic(s) of the group
rather than the individuals in that group. In
studying the IQ level of students in a school we
may be interested in such group characteristics
as the percentage of students with IQ higher
than 130 or the percentage of students with
average IQ less than 90, etc.
Frequency data:
Tabular representation
India at a glance (% of GDP), by year
Item              1983   1993   2002   2003
Agriculture       36.6   31.0   22.7   22.2
Industry          25.8   26.3   26.6   26.6
Mfg               16.3   16.1   15.6   15.8
Services          37.6   42.8   50.7   51.2
Pvt Consump       71.8   37.4   65.0   64.9
GOI consump       10.6   11.4   12.5   12.8
Import             8.1   10.0   15.6   16.0
Domes save        17.6   22.5   24.2   22.2
Interests paid     0.4    1.3    0.7   18.3
Note: 2003 refers to 2003-2004; data are preliminary. Gross domestic savings figures are taken directly from India's central statistical organization.

Frequency data
Diagrammatic representation
The line diagram (also called a historigram) is the device most commonly
used for representing time series. It is a graph showing the relationship
of the given variable with time. There are three types of line diagram,
because the scales used for the two co-ordinate axes may both be arithmetic
(or natural) scales, or one may be arithmetic and the other logarithmic, or
both may be logarithmic. A line diagram where the vertical scale is
logarithmic but the horizontal scale is of the ordinary arithmetic type is
called a ratio chart or semi-logarithmic chart. When both the vertical and
the horizontal axes are logarithmic, the chart is called a doubly-logarithmic
chart.
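The name "ratio chart" can be illustrated with a minimal Python sketch. The series below is hypothetical (it is not data from these slides) and simply doubles every period; the point is that equal ratios between successive values turn into equal vertical distances once the vertical scale is logarithmic.

```python
import math

# Hypothetical series that doubles every period (constant ratio of 2).
values = [100, 200, 400, 800, 1600]

# On an arithmetic scale a point sits at its value;
# on a logarithmic scale it sits at the logarithm of its value.
log_positions = [math.log10(v) for v in values]

# Vertical distance between successive points on each scale.
arithmetic_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(arithmetic_steps)                  # steps keep growing
print([round(s, 4) for s in log_steps])  # steps are all equal to log10(2)
```

A series growing at a constant rate therefore appears as a straight line on a semi-logarithmic chart.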
Frequency data
Line diagram representation
[Figure: line diagram of Fertiliser Consumption for few Indian states for 1999-2000 (in tonnes); horizontal axis: States, vertical axis: Fertiliser consumption (in tonnes), 500000 to 3500000]

Frequency data: Bar diagram
(histogram) representation
[Figure: vertical bar diagram "World Population (projected mid 2004)"; horizontal axis: Year (1950 to 2000, by decade), vertical axis: Population (0 to 7000000000)]

Frequency data: Bar diagram
(histogram) representation
[Figure: horizontal bar diagram "World Population (projected mid 2004)"; vertical axis: Year (1950 to 2000, by decade), horizontal axis: Population (0 to 7000000000)]

Frequency data: Bar diagram
(histogram) representation
[Figure: bar diagram "Number of countries"; horizontal axis: GDP in 1000 US$ (year 2002), in classes 10 to 15, 15 to 20, ..., 45 to 50; vertical axis: Number (0 to 12)]

Frequency data: Bar diagram
(histogram) representation
[Figure: vertical bar diagram "Height and Weight of individuals"; horizontal axis: Individual (Ram, Shyam, Rahim, Praveen, Saikat, Govind, Alan), vertical axis: Height/Weight (0 to 200); series: Height (in cms), Weight (in kgs)]

Frequency data: Bar diagram
(histogram) representation
[Figure: horizontal bar diagram "Height and Weight of individuals"; vertical axis: Individual (Ram, Shyam, Rahim, Praveen, Saikat, Govind, Alan), horizontal axis: Height/Weight (0 to 200); series: Height (in cms), Weight (in kgs)]

Frequency data
Pictorial diagram representation
This type of presentation depicts the data more vividly, and in many
cases it is the popular method of representing data. A suitable symbol is
first chosen to represent a definite number or quantity of units of the
variable. Against each observation the symbol is then repeated in
proportion to its value, so that the magnitude of the variable at that
data point can be read off at a glance.
Frequency data
Pictorial diagram representation
Number of universities in the following states in USA
Note: Each symbol represents 5 universities
[Figure: pictogram with one row of symbols for each of Alabama, Alaska, Arizona, Arkansas, Colorado, Connecticut, Delaware and Kansas]

Frequency data
Statistical map representation
Suppose, for example, that we want to show diagrammatically the regional
seismicity in Alaska for earthquakes of all magnitudes reported between
01/01/1960 and 11/09/2002. The colour code as shown indicates the depth h
of the event: blue: 0 < h ≤ 33 km, green: 33 < h ≤ 75 km, red:
75 < h ≤ 125 km and yellow: h > 125 km. The larger circles are earthquakes
of M 7.0 and higher from 1900 to 11/09/2002. The colour of a circle
indicates the depth of the event, as above.
The star indicates the location of the 03/11/2002
Frequency data
Statistical map representation

Frequency data: Divided bar
diagram representation
Suppose we have the time spent, in hours, by a student appearing for the
CBSE examination in preparing for Mathematics, Physics, Chemistry and
Biology. We collect the student's preparation pattern for a five-day
period and want to represent the data thus obtained. In that case we would
use the divided bar diagram as illustrated below.
Frequency data: Divided bar
diagram representation
[Figure: divided bar diagram "Time spent in preparation for subjects"; horizontal axis: Day (Monday to Friday), vertical axis: Hours (0.0 to 8.0); segments: Mathematics, Physics, Chemistry, Biology]

Frequency data: Stacked column
diagram representation
The method of depicting the data is similar to the divided bar diagram
representation, but here we represent the percentage-wise figures for the
variables at each data point. Suppose we record the consumption, in
rupees, of a family on the four main categories of food in the months of
January to June. Since the total amount spent can differ from month to
month, we depict the percentage-wise consumption of food for the four
categories.
Frequency data: Stacked column
diagram representation
[Figure: stacked bar diagram "Percentage wise consumption of food"; vertical axis: Month (January to June), horizontal axis: Percentage (0% to 100%); segments: Rice, Wheat, Vegetables, Cereals]

Frequency data: Pie
diagram/chart representation
When the values of a variable are given
for a number of categories, as in spatial
series, we may be interested in a
comparison of the categories or series or
the contribution of each category to the
total. Here the proportions or percentages
of various categories, rather than the
absolute values for the categories, will be
the principal subject of study
Frequency data: Pie
diagram/chart representation
[Figure: pie chart "Median marks in JMET (2003)"; sectors: Verbal, Quantitative, Analytical, Data Interpretation]

Frequency data
Textual representation
In textual representation of data we depict the
information through text.
Suppose that for the year 2004-2005 we know the number of postgraduate
students who have registered in different engineering courses at IIT
Kanpur. The figures are 83 in Aerospace, 88 in Chemical, 139 in Civil,
222 in Electrical, 176 in Mechanical and 115 in Computer Science. Given
this data, we may be required to use the information to answer some
queries.
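As a minimal sketch of the kind of queries such textual data can answer, the Python fragment below stores the figures quoted above in a dictionary and computes the total registration, the largest course and each course's percentage share. The variable names and the particular queries are illustrative assumptions, not part of the original slides.

```python
# Registered postgraduate students, 2004-2005 (figures quoted in the text).
registrations = {
    "Aerospace": 83,
    "Chemical": 88,
    "Civil": 139,
    "Electrical": 222,
    "Mechanical": 176,
    "Computer Science": 115,
}

# Query 1: total number of registered students.
total = sum(registrations.values())

# Query 2: course with the largest registration.
largest = max(registrations, key=registrations.get)

# Query 3: percentage share of each course in the total.
shares = {course: 100 * n / total for course, n in registrations.items()}

print(total, largest)
for course, share in shares.items():
    print(f"{course}: {share:.1f}%")
```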
Frequency data
Stem leaf representation
The stem and leaf representation is a quick way of looking at a data set.
It contains the information of a histogram but avoids the loss of
information that results from aggregating the data into intervals. The
stem and leaf display is based on the tallying principle but also uses the
decimal base of our number system. In the stem and leaf representation,
the stem is the number without its rightmost digit (the leaf). The stem is
written to the left of a vertical line separating the stem from the leaf.
Suppose we have the numbers 105, 106, 107, 109, 100, 108. Then, using the
stem and leaf representation, we would depict the numbers as
10 | 567908
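The construction can be sketched in a few lines of Python. This is a minimal illustration that assumes non-negative integers and keeps the leaves in the order in which the numbers arrive, as the slide does; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(numbers):
    """Return one display line per stem, leaves kept in input order."""
    leaves = defaultdict(list)
    for n in numbers:
        stem, leaf = divmod(n, 10)  # leaf = rightmost digit, stem = the rest
        leaves[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves[stem])}" for stem in sorted(leaves)]

lines = stem_and_leaf([105, 106, 107, 109, 100, 108])
print("\n".join(lines))  # 10 | 567908
```

Many textbooks additionally sort the leaves within each stem; that would be a one-line change (`sorted(leaves[stem])`).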
Frequency data
Box plot representation
The box plot is also called the box and whisker plot. A box plot displays
five summary measures of the distribution of the data, which are
• median
• lower quartile
• upper quartile
• smallest observation
• largest observation.

Frequency data
Box plot representation

[Figure: box plot with the whiskers labelled and the box marked at LQ, Median and UQ]

Frequency data
Box plot representation
Here:
• UQ - LQ = Inter quartile range (IQR)
• X = Smallest observation within 1.5(IQR) of LQ
• Y = Largest observation within 1.5(IQR) of UQ
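The quantities above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical data set; note that several conventions exist for computing quartiles, and the "inclusive" method used here is only one of them, so other software may report slightly different LQ and UQ values.

```python
import statistics

def box_plot_summary(data):
    """Five-number summary plus the whisker ends X and Y."""
    xs = sorted(data)
    # Quartiles by the "inclusive" convention (an assumption of this sketch).
    lq, median, uq = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = uq - lq
    # Whiskers stop at the most extreme observations within 1.5 * IQR.
    x = min(v for v in xs if v >= lq - 1.5 * iqr)
    y = max(v for v in xs if v <= uq + 1.5 * iqr)
    return {"smallest": xs[0], "LQ": lq, "median": median, "UQ": uq,
            "largest": xs[-1], "IQR": iqr, "X": x, "Y": y}

# Hypothetical data with one extreme value (50).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

In this illustration the observation 50 lies beyond UQ + 1.5(IQR), so the upper whisker ends at Y = 9 and 50 would be drawn as a separate point.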
Definitions
Quantitative variable: It can be described by a number for which
arithmetic operations such as averaging make sense.
Qualitative (or categorical) variable: It simply records a quality, e.g.,
good, bad, right, wrong, etc.

As already discussed, statistics deals with measurements, some being
qualitative and others quantitative. The measurements are the actual
numerical values of a variable. Qualitative variables could be described
by numbers, although such a description might be arbitrary, e.g.,
good = 1, bad = 0, right = 1, wrong = 0, etc.

Scales of measurement
Nominal scale: In this scale numbers are
used simply as labels for groups or classes. If
we are dealing with a data set which consists
of colours blue, red, green and yellow, then we
can designate blue = 3, red = 4, green = 5 and
yellow = 6. We can state that the numbers
stand for the category to which a data point
belongs. It must be remembered that nothing
is sacrosanct regarding the numbering against
each category. This scale is used for
qualitative data rather than quantitative data.
Scales of measurement
Ordinal Scale: In this scale of measurement, data elements may be ordered
according to relative size or quality. For example, a customer or a buyer
can rank a particular characteristic of a car as good, average or bad, and
while doing so he/she can assign numeric values such as good = 10,
average = 5 and bad = 0. Only the order of these values carries meaning.
Scales of measurement
Interval Scale: For the interval scale we specify intervals so as to
record the particular characteristic we are measuring, and assign each
item or data point to an interval according to its value. Suppose we are
measuring the age of school-going students in classes 5 to 12 in the city
of Kanpur. We may form the intervals 10-12 years, 12-14 years, ...,
18-20 years. Given one data point, i.e., the age of a student, we put it
under one particular interval, e.g., if the student's age is 11 years, we
put it under the interval 10-12 years. More generally, on an interval
scale the differences between measurements are meaningful but their ratios
are not, because the zero point is arbitrary (e.g., temperature in degrees
Celsius).

Scales of measurement
Ratio Scale: If two measurements are in
ratio scale, then we can take ratios of
measurements. The ratio scale
represents the reading for each recorded
data in a way which enables us to take a
ratio of the readings in order to depict it
either pictorially or in figures. Examples
of ratio scale are measurements of
weight, height, area, length etc.
Definitions
Population: Consists of the set of all measurements in which the
investigator is interested. An example is all the students in the city of
Kanpur. The population is also called the universe. The size of a
population is denoted by N.

Sample: Is a subset of measurements selected from the population. Sampling
from the population is often done randomly, such that every possible
sample of n elements has an equal chance of being selected. A sample
selected in this way is called a simple random sample or just a random
sample. An example is the students in Kendriya Vidyalaya inside the IIT
Kanpur campus. The size of a sample is denoted by n.
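Drawing a simple random sample can be sketched with Python's standard `random` module. The population below is hypothetical (units labelled 1 to 100), and the seed is fixed only so that the illustration is reproducible; `random.Random.sample` draws without replacement, so that every subset of size n is equally likely.

```python
import random

# Hypothetical population: N = 100 units labelled 1..100.
population = list(range(1, 101))
N = len(population)

n = 10                      # desired sample size
rng = random.Random(42)     # fixed seed, for reproducibility of the sketch only

# Simple random sample of n units, drawn without replacement.
sample = rng.sample(population, n)

print(N, n, sorted(sample))
```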
Definitions
• Tally number: The tally number of a particular value of the variable is the count of tally marks recording the number of times that value occurs in the universe or sample.

Definitions
• Frequency (absolute frequency): The number of data points which fall within a given class, or take a given value, in a frequency distribution. It thus denotes the number of occurrences of a particular outcome. We denote the frequency (absolute frequency) by $f_i$.
Definitions
• Cumulative frequency: The cumulative frequency corresponding to the upper boundary of any class interval or value in a frequency distribution is the total absolute frequency of all values less (greater) than that boundary for the class or value. We denote the cumulative frequency of less (greater) than type by $F = \sum_{i \le n} f_i$ ($F = \sum_{i \ge n} f_i$).

Example # 001
Consider the following data on the size (number of members) of thirty
families in the city of Jaipur.

2, 6, 3, 4, 4, 5, 3, 6, 4, 4, 5, 3, 2, 3, 6, 5, 4, 4, 4,
3, 2, 4, 5, 6, 7, 4, 4, 5, 3, 3.

Now the question is: how do we represent the data using tally numbers,
frequencies and cumulative frequencies?
Example # 001 (contd..)
# of members   Tally #      f_i   CF (less than type)   CF (greater than type)
2              |||          3     3                     30
3              |||| ||      7     10                    27
4              |||| ||||    10    20                    20
5              ||||         5     25                    10
6              ||||         4     29                    5
7              |            1     30                    1
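The frequency and cumulative frequency columns of this table can be reproduced with a short Python sketch using only the standard library. The variable names are illustrative; the data are the thirty family sizes of Example # 001.

```python
from collections import Counter
from itertools import accumulate

family_sizes = [2, 6, 3, 4, 4, 5, 3, 6, 4, 4, 5, 3, 2, 3, 6, 5, 4, 4, 4,
                3, 2, 4, 5, 6, 7, 4, 4, 5, 3, 3]

freq = Counter(family_sizes)          # absolute frequency of each value
values = sorted(freq)                 # distinct values in ascending order
f = [freq[v] for v in values]

# Less than type: running total from the smallest value upwards.
cf_less = list(accumulate(f))
# Greater than type: running total from the largest value downwards.
cf_greater = list(accumulate(reversed(f)))[::-1]

for v, fi, lt, gt in zip(values, f, cf_less, cf_greater):
    print(v, fi, lt, gt)
```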

Example # 001 (contd..)
[Figure: Pictorial representation of cumulative frequencies; horizontal axis: Number (2 to 7), vertical axis: Cumulative frequency (0 to 35); series: CF less than type, CF greater than type]

Ogive and Median
A cumulative frequency diagram is called the ogive. The abscissa of the
point of intersection of the less than type and greater than type ogives
represents the median of the data.

Guidelines for depicting
frequency tables
1) The classes should be mutually
exclusive and exhaustive
2) The number of classes should be neither
too small nor too large
3) As far as practicable, the classes should
be of equal lengths

Example # 002
The following data relates to the height in cms of forty
individuals.
160.1, 167.2, 181.3, 154.7, 172.3, 161.3, 182.4, 158.2,
167.3, 159.4, 150.1, 157.3, 152.8, 155.8, 146.0, 162.0,
147.9, 149.9, 173.4, 166.4, 182.3, 151.2, 168.3, 170.1,
187.6, 163.4, 183.3, 171.9, 179.4, 166.8, 179.2, 168.3,
165.2, 166.7, 165.1, 166.3, 166.3, 173.4, 164.2, 164.9

1) We are required to prepare a frequency distribution table showing the frequencies and the cumulative frequencies.
2) We are required to draw a histogram to exhibit the frequency distribution graphically.
3) We are required to draw the two ogives.

Example # 002 (contd..)
Frequency table for the heights
Class Interval    f_i   CF (less than type)   CF (greater than type)
145.95-152.95     6     6                     40
152.95-159.95     5     11                    34
159.95-166.95     13    24                    29
166.95-173.95     9     33                    16
173.95-180.95     2     35                    7
180.95-187.95     5     40                    5
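The table can be checked with a minimal Python sketch that assigns each of the forty heights of Example # 002 to its class. The starting boundary 145.95, the class width 7 and the six classes are read off the table above; the variable names are illustrative.

```python
from itertools import accumulate

heights = [160.1, 167.2, 181.3, 154.7, 172.3, 161.3, 182.4, 158.2,
           167.3, 159.4, 150.1, 157.3, 152.8, 155.8, 146.0, 162.0,
           147.9, 149.9, 173.4, 166.4, 182.3, 151.2, 168.3, 170.1,
           187.6, 163.4, 183.3, 171.9, 179.4, 166.8, 179.2, 168.3,
           165.2, 166.7, 165.1, 166.3, 166.3, 173.4, 164.2, 164.9]

lower, width, k = 145.95, 7.0, 6   # first boundary, class width, number of classes

# Class index of a height h is the number of whole widths above the first boundary.
counts = [0] * k
for h in heights:
    counts[int((h - lower) // width)] += 1

cf_less = list(accumulate(counts))
cf_greater = list(accumulate(reversed(counts)))[::-1]

for i, (fi, lt, gt) in enumerate(zip(counts, cf_less, cf_greater)):
    print(f"{lower + i * width:.2f}-{lower + (i + 1) * width:.2f}", fi, lt, gt)
```

Because the boundaries end in .95 while the data carry one decimal place, no observation can fall exactly on a boundary, so the classes are mutually exclusive and exhaustive.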

Example # 002 (contd..)
To draw the cumulative frequency curve of less (greater) than type, first
specify the class intervals. Then depict the class intervals along the
horizontal axis and the cumulative frequency of less (greater) than type
along the vertical axis.
Against each class interval, mark the point given by the corresponding
cumulative frequency of less (greater) than type. Joining these points
with straight lines gives the respective ogives.
Example # 002 (contd..)
[Figure: Cumulative frequency chart; horizontal axis: Class interval (145.95-152.95 to 180.95-187.95), vertical axis: Cumulative frequency (0 to 45); series: CF less than type, CF greater than type]

Definitions
1) Class limit: The end points of any class
2) Class boundaries: The end points of any class interval
3) Relative frequency: It is the ratio $f_i / F$ of the class frequency to the total frequency $F$

• Note: Using this concept of relative frequency and cumulative relative frequency of less (greater) than type, we can also draw the cumulative relative frequency curves

Definitions: Different Measures
1) Measure of central tendency
• Mean (Arithmetic mean (AM), Geometric mean (GM), Harmonic mean (HM))
• Median
• Mode
2) Measure of dispersion
• Variance or Standard deviation
3) Measures of shape
• Skewness
• Kurtosis

Definition: Different Means
• Given n observations $X_1, \ldots, X_n$, we define the following:
$AM = (X_1 + \cdots + X_n)/n$
$GM = (X_1 \times \cdots \times X_n)^{1/n}$
$HM = n/(1/X_1 + \cdots + 1/X_n)$
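The three definitions translate directly into Python. The sketch below is a minimal illustration on a hypothetical data set of positive numbers (GM and HM as defined here require positive observations); the function names are illustrative.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product; assumes all observations are positive.
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # n divided by the sum of reciprocals; assumes no observation is zero.
    return len(xs) / sum(1 / x for x in xs)

data = [1, 2, 4]  # hypothetical observations
am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
print(am, gm, hm)
```

For positive observations the three means always satisfy HM ≤ GM ≤ AM, with equality only when all observations are equal.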

Arithmetic Mean (m)

When estimating the long-term expectation of a random variable, the
arithmetic mean is a natural choice, e.g., finding the average age of a
group of persons, the average income of a group of people, etc.
Harmonic Mean
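Suppose a car travels a distance x with a velocity of $v_1$ and returns
over the same distance with a velocity of $v_2$. What is the average
velocity? It is the total distance divided by the total time:
$v = \frac{2x}{\frac{x}{v_1} + \frac{x}{v_2}} = \frac{2}{\frac{1}{v_1} + \frac{1}{v_2}}$

Hence we see that in this case we use the harmonic mean and not the
arithmetic mean.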
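A quick numerical check, with hypothetical velocities and distance, shows that total distance divided by total time agrees with the harmonic mean of the two velocities and differs from their arithmetic mean.

```python
import math

def average_velocity(x, v1, v2):
    """Total distance divided by total time for an out-and-back trip."""
    total_time = x / v1 + x / v2
    return 2 * x / total_time

# Hypothetical trip: 120 km out at 40 km/h, 120 km back at 60 km/h.
x, v1, v2 = 120.0, 40.0, 60.0

v = average_velocity(x, v1, v2)
hm = 2 / (1 / v1 + 1 / v2)   # harmonic mean of v1 and v2
am = (v1 + v2) / 2           # arithmetic mean of v1 and v2

print(v, hm, am)
```

The result does not depend on the distance x, which cancels out of the formula.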

Harmonic Mean
Application areas are: (1) design stream
flow estimation for waste load allocation
(2) estimation of effective petrochemical
and geophysical properties of a
heterogeneous system of porous media.

Geometric Mean
Suppose you have an investment which earns 10% the first year, 50% the
second year and 30% the third year, and we are interested in finding an
equivalent average rate of return (geometric mean), say r.
Now we have
$P(1+0.1)(1+0.5)(1+0.3) = P(1+r)^3$
Hence r = 28.97%
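The calculation can be verified with a few lines of Python: the equivalent rate is the geometric mean of the yearly growth factors, minus one. Variable names are illustrative.

```python
yearly_returns = [0.10, 0.50, 0.30]   # returns quoted in the text

# Overall growth factor over the three years: (1.1)(1.5)(1.3).
growth = 1.0
for r in yearly_returns:
    growth *= 1 + r

# Equivalent constant yearly rate: cube root of the growth factor, minus 1.
r_equiv = growth ** (1 / len(yearly_returns)) - 1

print(f"{100 * r_equiv:.2f}%")
```

Note that the arithmetic mean of the three returns is 30%, which overstates the equivalent rate.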
Median (e) and Mode (o)
 Median(e) : The median of a data set is
the value below which lies half of the data
points. To find the median we use F (e) =
0.5.
 Mode (o): The mode of a data set is the
value that occurs most frequently. Hence
f(o)  f(x);  x.

Example # 003

Example # 004

Example # 005
Consider the following data points:
5, 7, 10, 7, 10, 11, 3, 5, 5
Sorted, these are 3, 5, 5, 5, 7, 7, 10, 10, 11, so for these
data points we have
m = 7; e = 7; o = 5; $\sigma^2$ = 6.89
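These summary measures can be reproduced with Python's standard `statistics` module. This is a sketch that treats the nine values as the whole population, so the variance uses the divisor n:

```python
import statistics

data = [5, 7, 10, 7, 10, 11, 3, 5, 5]

m = statistics.mean(data)        # arithmetic mean
e = statistics.median(data)      # middle value of the sorted data
o = statistics.mode(data)        # most frequent value
var = statistics.pvariance(data) # population variance (divisor n)

print(m, e, o, round(var, 2))
```

Note that the median is read from the sorted data, not from the order in which the points are listed.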

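Variance, Standard deviation,
Skewness, Kurtosis
• Variance: $V[X] = \sigma^2 = E\left[(X - E[X])^2\right]$
• Standard deviation (SD) $= \sigma$
• Skewness $= \gamma_1 = \dfrac{\mu_3}{\sigma^3}$, where $\mu_3 = E\left[(X - E[X])^3\right]$
• Kurtosis $= \gamma_2 = \dfrac{\mu_4}{\sigma^4} - 3$, where $\mu_4 = E\left[(X - E[X])^4\right]$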

Variance

Example # 006

Example # 007

Covariance & Correlation
Coefficient

Covariance

Correlation Coefficient: Proof

Example # 008
Student   Ht (cms)   Wt (Kgs)
1         140.421    61.9169
2         143.9951   62.46649
3         147.1222   62.84981
4         148.4561   64.20493
5         148.8841   67.58643
6         150.0285   68.24731
7         150.2456   69.03822
8         151.3753   69.83972
9         151.4656   70.56466
10        151.789    70.63883
11        152.1741   71.88143
12        152.5998   72.34983
13        156.2783   73.7473
14        160.9203   73.86208
15        161.3461   74.37396
16        161.5423   79.73833
17        166.0541   80.5299
18        168.0039   83.80738
19        168.2229   84.92075
20        171.456    85.50759

• Given are the height (in cms) and weight (in Kgs) of
twenty students studying in Class XI at Sheiling House,
Kanpur. Find the mean, variance, covariance and
correlation values.
• Remember this is a sample and NOT the population,
hence we use the corresponding sample statistics
(divisor n - 1 for the variance and covariance).
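A minimal sketch of the requested calculation. The 20 height and weight values are my reading of the extracted table (each value reassembled from its split digits), so treat them as an assumption; the sample formulas use the divisor n - 1.

```python
import math

# Heights (cms) and weights (Kgs) as read from the Example # 008 table.
heights = [140.421, 143.9951, 147.1222, 148.4561, 148.8841,
           150.0285, 150.2456, 151.3753, 151.4656, 151.789,
           152.1741, 152.5998, 156.2783, 160.9203, 161.3461,
           161.5423, 166.0541, 168.0039, 168.2229, 171.456]
weights = [61.9169, 62.46649, 62.84981, 64.20493, 67.58643,
           68.24731, 69.03822, 69.83972, 70.56466, 70.63883,
           71.88143, 72.34983, 73.7473, 73.86208, 74.37396,
           79.73833, 80.5299, 83.80738, 84.92075, 85.50759]

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

mean_h, mean_w = sample_mean(heights), sample_mean(weights)
var_h = sample_cov(heights, heights)   # sample variance of height
var_w = sample_cov(weights, weights)   # sample variance of weight
cov_hw = sample_cov(heights, weights)
corr_hw = cov_hw / math.sqrt(var_h * var_w)

print(round(mean_h, 3), round(mean_w, 3))
print(round(var_h, 3), round(var_w, 3), round(cov_hw, 3), round(corr_hw, 3))
```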
Example # 008 (contd…)

Descriptive statistics
Suppose the data are available in the form of a
frequency distribution. Assume there are k
classes, with the mid-points of the corresponding
class intervals being x1, x2, …, xk and the
corresponding frequencies being f1, f2, …, fk, such
that n = f1 + f2 + … + fk.
Then: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k} x_i f_i$ and $\sigma = \left\{\dfrac{1}{n}\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i\right\}^{1/2}$
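The grouped-data formulas can be sketched as follows. The mid-points correspond to height classes of width 7 starting at 145.95; the frequencies are hypothetical and chosen only for illustration.

```python
import math

# Mid-points of the classes 145.95-152.95, 152.95-159.95, ..., 180.95-187.95.
midpoints = [149.45, 156.45, 163.45, 170.45, 177.45, 184.45]
# Hypothetical class frequencies (not taken from the slides).
freqs = [5, 13, 11, 7, 3, 1]

n = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
sd = math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / n)

print(n, round(mean, 3), round(sd, 3))
```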

Histogram for the heights
[Figure: frequency distribution of the heights. Frequency is plotted against the classes 145.95-152.95, 152.95-159.95, 159.95-166.95, 166.95-173.95, 173.95-180.95 and 180.95-187.95.]

Ogive curves for the heights
[Figure: cumulative frequency (CF) chart showing two ogive curves for the heights, plotted against the class intervals 145.95-152.95 to 180.95-187.95.]

Descriptive statistics
Consider m groups of observations with
respective means $\mu_1, \mu_2, \ldots, \mu_m$ and standard
deviations $\sigma_1, \sigma_2, \ldots, \sigma_m$. Let the group sizes be
$n_1, n_2, \ldots, n_m$ such that $n = n_1 + n_2 + \ldots + n_m$.
Then:
$\mu_{OVERALL} = \dfrac{1}{n}\sum_{i=1}^{m} n_i \mu_i$
$\sigma_{OVERALL} = \left[\dfrac{1}{n}\left\{\sum_{i=1}^{m} n_i \sigma_i^2 + \sum_{i=1}^{m} n_i (\mu_i - \mu_{OVERALL})^2\right\}\right]^{1/2}$

Example # 009
In a batch of 10 children the IQ of the dull
boy is 36 below the average IQ of the
other children. Show that the standard
deviation of IQ for all the children cannot
be less than 10.8. If the standard deviation
is actually 11.4, determine the
standard deviation when the dull boy is left
out.

Example # 009 (contd..)
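Take m = 2, such that $n_1 = 1$ (the dull boy) and $n_2 = 9$.
It is given that $\mu_2 - \mu_1 = 36$ and we also
know that $\sigma_1 = 0$.
Hence: $\sigma_{OVERALL} = \left[0.9\,\sigma_2^2 + 116.64\right]^{1/2}$
• From above we have $\sigma_{OVERALL} \ge \sqrt{116.64} = 10.8$
• If $\sigma_{OVERALL} = 11.4$, we have $\sigma_2 = \sqrt{(11.4^2 - 116.64)/0.9} \approx 3.85$.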
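The result can be verified with the combined standard deviation formula for groups. In this sketch the mean IQ of 100 for the other nine children is a hypothetical value; only the gap of 36 matters for the answer.

```python
import math

def pooled_sd(sizes, means, sds):
    """Overall SD of several groups from their sizes, means and SDs."""
    n = sum(sizes)
    mu = sum(k * m for k, m in zip(sizes, means)) / n
    within = sum(k * s ** 2 for k, s in zip(sizes, sds))
    between = sum(k * (m - mu) ** 2 for k, m in zip(sizes, means))
    return math.sqrt((within + between) / n)

# Group 1: the single boy, 36 below the others; group 2: the other nine.
lower_bound = pooled_sd([1, 9], [64.0, 100.0], [0.0, 0.0])

# SD of the nine children when the overall SD is 11.4.
sigma_2 = math.sqrt((11.4 ** 2 - 116.64) / 0.9)
check = pooled_sd([1, 9], [64.0, 100.0], [0.0, sigma_2])

print(lower_bound, round(sigma_2, 3), check)
```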

Set Theory: Definition/Rules
• Set theory: Branch of
mathematical logic that studies
sets
• Set: A well-defined collection of
objects, called members or
elements of the set

Example # 010

Venn Euler Diagram
U

A is Red

B is Yellow (Remember A is inside B)


A is RED


A B

U

A B

$B - A = B \cap A^C$

Probability
 Probability is the branch of mathematics
concerning numerical descriptions of how
likely an event is to occur, or how likely it is
that a proposition is true
 Probability of an event is defined as a
quantitative measure of uncertainty of the
occurrence of the event

Probability (P(A)): The probability of an event is defined as a quantitative
measure of the uncertainty of the occurrence of the event.
• Objective probability: Based on games of chance, and can be
mathematically proved or verified. If the experiment is the same
for two different persons, then the value of the objective
probability remains the same. It is the limiting value of the
relative frequency. Example: the probability of getting the
number 5 when we roll a fair die.
• Subjective probability: Based on personal judgment,
intuition and subjective criteria. Its value will change from
person to person. Example: one person sees the chance of
India winning the test series with Australia as high while
another person sees it as low.

Random event
Random experiment: An experiment whose
outcome cannot be predicted with certainty.
• Sample space ($\Omega$): The set of all possible
outcomes of a random experiment
• Sample point ($\omega_i$): An element of the
sample space
• Event (A): A subset of the sample space,
i.e., a collection of sample point(s).

For a random experiment, we denote
$P(\omega_i) = p_i$ and $P(A) = \sum_{\omega_i \in A} p_i$
Where:
• $P(\omega_i) = p_i$ = Probability of occurrence of the sample
point $\omega_i$
• $P(A)$ = Probability of occurrence of the event A
• $P(\Omega) = \sum_{\omega_i \in \Omega} p_i = 1$

Example #011
Suppose there are two dice, each with faces 1, 2, …, 6,
and they are rolled simultaneously. This rolling of the
two dice constitutes our random experiment.
Then we have:
• $\Omega$ = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), …,
(5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.
• $\omega_i$ = (1,1), (1,2), …, (6,5), (6,6)
• We define the event A such that the outcomes for
the two dice are equal in one simultaneous throw, then A
= {(1,1), (2,2), …, (6,6)}
• $P(\omega_i)$: $p_1 = p_2 = \ldots = p_{36} = 1/36$
• $P(A) = p_1 + p_8 + p_{15} + p_{22} + p_{29} + p_{36} = 6/36$
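The sample space and the event can be enumerated directly. This is a small sketch assuming both dice are fair, so that every sample point has probability 1/36:

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die).
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

# Event A: both dice show the same face.
A = [w for w in omega if w[0] == w[1]]
prob_A = p * len(A)

# 1-based positions of the sample points of A within omega.
indices = [omega.index(w) + 1 for w in A]

print(len(omega), indices, prob_A)
```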
Example # 012
Suppose a fair coin is tossed repeatedly till the first
head is obtained.
Then we have:
• $\Omega$ = {(H), (T,H), (T,T,H), …}
• $\omega_i$ = (H), (T,H), (T,T,H), …
• We define the event A such that at most 3 tosses
are needed to obtain the first head, then A =
{(H), (T,H), (T,T,H)}
• $P(\omega_i)$: $p_1 = 1/2$, $p_2 = (1/2)^2$, $p_3 = (1/2)^3$, $p_4 = (1/2)^4$, …
• $P(A) = p_1 + p_2 + p_3 = 7/8$
Classical definition of
probability
Under this definition we consider the
following:
 Sample space is finite.
 All the sample points are equally likely,
i.e., they have equal probability of
occurrence or equal relative frequency of
occurrence.

Example # 013
In a club there are 10 members, of whom 5 are
Asians and the rest are Americans. A committee
of 3 members has to be formed and these
members are to be chosen randomly. Find the
probability that there will be at least 1 Asian and
at least 1 American in the committee.
Total number of cases = 10C3 = 120 and the number of
cases favouring the formation of the committee
is 5C2*5C1 + 5C1*5C2 = 100
Hence P(A) = 100/120
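The counting argument can be reproduced with `math.comb`; this is a sketch of the classical (equally likely) calculation:

```python
from fractions import Fraction
from math import comb

# All ways to choose 3 committee members out of 10.
total = comb(10, 3)

# 2 Asians + 1 American, or 1 Asian + 2 Americans.
favourable = comb(5, 2) * comb(5, 1) + comb(5, 1) * comb(5, 2)

prob = Fraction(favourable, total)
print(total, favourable, prob)
```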

Example # 014
There is a box containing 10 coloured balls out of
which 4 are red, 4 are blue and 2 are white.
1) If you draw a ball at random what is the probability
that it is a white ball?
2) If you draw two balls consecutively and with
replacement then what is the probability that both
the balls are red?
3) If you draw two balls consecutively and without
replacement, then what is the probability that the first
is blue and the second is white?
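One way to work the three parts, assuming every ball in the box is equally likely to be drawn (the slides pose the questions without a worked solution, so this is a sketch rather than the author's answer):

```python
from fractions import Fraction

red, blue, white = 4, 4, 2
n = red + blue + white

# 1) A single draw gives a white ball.
p_white = Fraction(white, n)

# 2) Two draws with replacement, both red.
p_two_red = Fraction(red, n) ** 2

# 3) Two draws without replacement: first blue, then white.
p_blue_then_white = Fraction(blue, n) * Fraction(white, n - 1)

print(p_white, p_two_red, p_blue_then_white)
```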

Axiomatic definition of
probability
Under this definition we allow for
the following:
• The sample space may be infinite.
• The sample points need not be equally
likely.

Example # 015
Suppose we continue with Example # 012,
which we have just discussed, and we
define the event B that at least 5 tosses
are needed to produce the first head
• $\Omega$ = {(H), (T,H), (T,T,H), …}
• $\omega_i$ = (H), (T,H), (T,T,H), …
• $P(\omega_i)$: $p_1 = 1/2$, $p_2 = (1/2)^2$, $p_3 = (1/2)^3$, $p_4 = (1/2)^4$, …
• $P(B) = p_5 + p_6 + p_7 + \ldots = 1 - (p_1 + p_2 + p_3 + p_4) = 1/16$
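Both coin-tossing events (at most 3 tosses, and at least 5 tosses) follow from the same sample-point probabilities. A minimal sketch, assuming a fair coin:

```python
from fractions import Fraction

def p(i):
    """Probability that the first head appears on toss i (fair coin)."""
    return Fraction(1, 2) ** i

# Event A: at most 3 tosses are needed.
prob_A = sum(p(i) for i in range(1, 4))

# Event B: at least 5 tosses are needed, via the complement.
prob_B = 1 - sum(p(i) for i in range(1, 5))

print(prob_A, prob_B)
```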

Example # 016
Two fair coins are tossed simultaneously
and such simultaneous tossing is
repeated till you get two tails together.
What is the probability that you need at
most two such simultaneous tosses to
achieve this objective?
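A possible solution sketch, assuming the two coins are independent so that each simultaneous toss gives two tails with probability 1/4:

```python
from fractions import Fraction

# Probability of two tails on one simultaneous toss of two fair coins.
p_tt = Fraction(1, 2) * Fraction(1, 2)

# Success on the first toss, or failure first and success on the second.
prob = p_tt + (1 - p_tt) * p_tt
print(prob)
```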

Probability as a Function

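[Figure: probability as a function. The domain is the set of outcomes {1, 2, 3, 4, 5, 6} and the co-domain/range is the interval from 0 to 1; each outcome is mapped to 1/6.]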

Probability, Chance, Relative
Frequency
• Consider the following readings for the speeds (km/hr) of 20 cars
travelling past a crossing: 65, 66, 70, 64, 72, 70, 66, 77, 75, 65, 64, 69, 49,
53, 54, 55, 55, 70, 73, 66
• The sum of the relative frequencies is 1, as it SHOULD be
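The relative frequencies of the 20 speed readings can be tabulated as below; this is a sketch using exact fractions so that the total is exactly 1.

```python
from collections import Counter
from fractions import Fraction

speeds = [65, 66, 70, 64, 72, 70, 66, 77, 75, 65,
          64, 69, 49, 53, 54, 55, 55, 70, 73, 66]

counts = Counter(speeds)
# Relative frequency = count of a value / total number of readings.
rel_freq = {s: Fraction(c, len(speeds)) for s, c in sorted(counts.items())}
total = sum(rel_freq.values())

for speed, rf in rel_freq.items():
    print(speed, rf)
print("sum =", total)
```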

Cumulative distribution function
(cdf) or the distribution function
We denote the distribution function by F(x)
Discrete case: $F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$
Continuous case: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} dF(t)$

Properties of distribution
function
1) F(x) is non-decreasing in x, i.e.,
if $x_1 < x_2$, then $F(x_1) \le F(x_2)$
2) $\lim F(x) = 0$ as $x \to -\infty$
3) $\lim F(x) = 1$ as $x \to +\infty$
4) F(x) is right continuous

Theorem in probability
For any events $A, B \subseteq \Omega$
• $0 \le P(A) \le 1$
• If $A \subseteq B$, then $P(A) \le P(B)$
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• $P(A^C) = 1 - P(A)$
• $P(\Omega) = 1$
• $P(\emptyset) = 0$

Definitions
• Mutually exclusive: Consider n events
A1, A2, …, An. They are mutually exclusive
if no two of them can occur together, i.e.,
$P(A_i \cap A_j) = 0\ \forall\, i \ne j$ with $i, j \le n$
• Mutually exhaustive: Consider n events
A1, A2, …, An. They are mutually
exhaustive if at least one of them must
occur, i.e., $P(A_1 \cup A_2 \cup \ldots \cup A_n) = 1$
Example # 017
Suppose a fair die with faces 1, 2, …, 6 is rolled. Then
$\Omega$ = {1, 2, 3, 4, 5, 6}. Let us define the events A1 = {1,
2}, A2 = {3, 4, 5, 6} and A3 = {3, 5}
• The events A2 and A3 are neither mutually exclusive
nor exhaustive
• A1 and A3 are mutually exclusive but not exhaustive
• A1, A2 and A3 are not mutually exclusive but are
exhaustive
• A1 and A2 are mutually exclusive as well as
exhaustive
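The four statements can be checked mechanically with Python sets. The helper names below are illustrative, not part of the course material:

```python
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}
A1, A2, A3 = {1, 2}, {3, 4, 5, 6}, {3, 5}

def mutually_exclusive(*events):
    """True when no two of the events share an outcome."""
    return all(not (a & b) for a, b in combinations(events, 2))

def exhaustive(*events):
    """True when the events together cover the whole sample space."""
    return set().union(*events) == omega

print(mutually_exclusive(A2, A3), exhaustive(A2, A3))
print(mutually_exclusive(A1, A3), exhaustive(A1, A3))
print(mutually_exclusive(A1, A2, A3), exhaustive(A1, A2, A3))
print(mutually_exclusive(A1, A2), exhaustive(A1, A2))
```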
Unconditional Probability

Example # 018

Conditional probability
Let A and B be two events such that P(B) > 0.
Then the conditional probability of A given B is
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Assume $\Omega$ = {1, 2, 3, 4, 5, 6}, A = {2}, B = {2, 4, 6}.
Then $A \cap B$ = {2} and
$P(A \mid B) = \dfrac{1/6}{3/6} = \dfrac{1}{3}$, $P(B \mid A) = \dfrac{1/6}{1/6} = 1$
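The same two conditional probabilities can be obtained by counting outcomes. This sketch assumes the six outcomes are equally likely:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability of an event on an equally likely sample space."""
    return Fraction(len(event & omega), len(omega))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b), defined when P(b) > 0."""
    return prob(a & b) / prob(b)

A, B = {2}, {2, 4, 6}
p_a_given_b = conditional(A, B)
p_b_given_a = conditional(B, A)
print(p_a_given_b, p_b_given_a)
```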

Example # 019

Bayes Theorem
Let B1, B2, …, Bn be mutually exclusive
and exhaustive events such that P(Bi) > 0
for every i = 1, 2, …, n and A be any event
such that $P(A) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i)$, then we have
$P(B_j \mid A) = \dfrac{P(A \mid B_j) P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i) P(B_i)}$

[Figure: the sample space partitioned into the events B1, B2, …, B11, with the event A overlapping several of them.]

Bayes' Theorem: Proof

Example #020
In an examination each question has four
alternative answers, of which only one is correct.
If a student knows the correct alternative then
he/she is definitely able to identify it. Otherwise
he/she picks one of the alternatives at random.
Given that a student has identified the correct
alternative, what is the conditional probability that
he/she knew it, assuming 70% of the students
know the correct alternative to the question
under consideration?
Example # 020 (contd..)
Let us define the following events
• A = The student identifies the correct alternative
• B1 = The student knows the correct alternative
• B2 = The student does not know the correct
alternative
Then we know that P(B1) = 0.7, P(B2) = 0.3,
P(A|B1) = 1 and P(A|B2) = 0.25, from which we have
P(B1|A) = (1 × 0.7)/(1 × 0.7 + 0.25 × 0.3) = 0.9032

Example # 021
The marketing manager of a toy manufacturing company is
considering the marketing of a new toy. In the past 40% of
the toys introduced by the company have been successful
and 60% have been unsuccessful. Before the toy is
marketed, a market research is conducted and a report,
either favourable or unfavourable, is compiled. In the past,
80% of the successful toys received a favourable market
research report, and 30% of the unsuccessful received a
favourable market research report. The marketing manager
wants to know the probability that the toy will be successful
if it receives a favourable report
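Both the examination example and the toy example are direct applications of Bayes theorem. The sketch below uses a small helper (`posterior`, a hypothetical name) and the probabilities stated in the two problems; the answer for the toy example is my own calculation, not one given on the slides.

```python
def posterior(priors, likelihoods, j):
    """P(B_j | A) from priors P(B_i) and likelihoods P(A | B_i)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[j] * likelihoods[j] / total

# Examination: B1 = knows the answer, B2 = does not; A = answers correctly.
knew = posterior([0.7, 0.3], [1.0, 0.25], 0)

# Toy: B1 = successful, B2 = unsuccessful; A = favourable report.
success = posterior([0.4, 0.6], [0.8, 0.3], 0)

print(round(knew, 4), round(success, 4))
```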

Independence of events

Example # 022
• Consider the example of rolling a die
• If A is the event that the number appearing is odd and B is the
event that the number appearing is a multiple of 3, then Pr(A) = 3/6
= 1/2 and Pr(B) = 2/6
• Also, A ∩ B is the event that the number appearing is odd AND a
multiple of 3, so that Pr(A ∩ B) = 1/6
• Pr(A│B) = Pr(A ∩ B)/Pr(B) = (1/6)/(2/6) = 1/2 = Pr(A), which implies
that the occurrence of event B has not affected the
probability of occurrence of the event A
• If A and B are independent events, then Pr(A│B) = Pr(A)
• Using the multiplication rule of probability, Pr(A ∩ B) = Pr(B) ×
Pr(A│B) = Pr(B) × Pr(A)
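The independence of the two events can be confirmed by enumeration; a minimal sketch assuming a fair die:

```python
from fractions import Fraction

omega = set(range(1, 7))
A = {x for x in omega if x % 2 == 1}   # odd number
B = {x for x in omega if x % 3 == 0}   # multiple of 3

def prob(event):
    return Fraction(len(event), len(omega))

# Independence: Pr(A and B) equals Pr(A) * Pr(B).
independent = prob(A & B) == prob(A) * prob(B)
p_a_given_b = prob(A & B) / prob(B)

print(prob(A), prob(B), prob(A & B), p_a_given_b, independent)
```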
Inequalities in Probability
• Inequalities in probability theory are essential tools used
frequently in proofs.
• Probability inequalities cover inequalities related to
events, distribution functions, characteristic functions,
moments and random variables (elements) and their sums.
• Examples
  • Boole’s Inequality
  • Bonferroni Inequality
  • Markov’s Inequality
  • Chebyshev’s Inequality
  • Chernoff Bounds
  • Cauchy-Schwarz Inequality

Inequalities in Probability
(contd…)
• Define the probability space
(Ω, ℱ, P).
Ω: Sample space
ℱ: Collection of subsets of Ω (the events), forming a σ-field
P: Function mapping ℱ to
[0, 1]
Inequalities in Probability
(contd…): Boole’s Inequality

Inequalities in Probability (contd…):
Bonferroni Inequality

Inequalities in Probability (contd…):
Poincare’s Theorem

• Consider events $A_1$ and $A_2$, such that $(A_1 \cup A_2) = A_1 + A_2 - (A_1 \cap A_2) = A_1 + \{A_2 - (A_1 \cap A_2)\}$
• $P(A_1 \cup A_2) = P(A_1) + P\{A_2 - (A_1 \cap A_2)\} - P[A_1 \cap \{A_2 - (A_1 \cap A_2)\}]$
• $P(A_1 \cup A_2) = P(A_1) + P\{A_2 - (A_1 \cap A_2)\} - P(\emptyset)$
• $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$, hence it
holds true for n = 2
• Assume it is also true for n

Inequalities in Probability (contd…):
Markov Inequality
• Let X be a non-negative r.v. such
that E[X] exists
• Then for every positive a we
have
$P(X \ge a) \le E[X]/a$
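To see the Markov bound at work, the sketch below compares the exact tail probability with E[X]/a for a fair die. The die is only an illustrative choice of a non-negative random variable, not an example from the slides.

```python
from fractions import Fraction

# pmf of a fair die: a simple non-negative random variable.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expectation = sum(x * p for x, p in pmf.items())

rows = []
for a in range(1, 7):
    tail = sum(p for x, p in pmf.items() if x >= a)   # P(X >= a)
    bound = expectation / a                           # E[X] / a
    rows.append((a, tail, bound))
    print(a, tail, bound, tail <= bound)
```

The bound is loose for small a (it can exceed 1) and tightens as a grows.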

Distribution
A random variable (r.v.) is used to denote the
outcome of an experiment. We usually denote
the r.v. by X, Y or Z, and the corresponding
probability distribution by f(x), f(y) or f(z).
Depending on the possible outcomes, the
distribution is:
• Discrete: probability mass function (pmf)
• Continuous: probability density function
(pdf)
Discrete distribution
1) Uniform discrete distribution
2) Binomial distribution
3) Negative binomial distribution
4) Geometric distribution
5) Hypergeometric distribution
6) Poisson distribution
7) Log distribution

Bernoulli Trials
1) Each trial has two possible
outcomes, say a success and a
failure.
2) The trials are independent
3) The probability of success as well
as the probability of failure remains
the same from one trial to another

Uniform discrete distribution
[X ~ UD (a , b)]
$f(x) = 1/n$, $x = a, a+k, a+2k, \ldots, b$
• a and b are the parameters where $a, b \in R$
• $E[X] = a + \dfrac{k}{2}(n - 1)$
• $V[X] = k^2\left(\dfrac{n^2 - 1}{12}\right)$
• Example: Generating the random numbers
1, 2, 3, …, 10. Hence X ~ UD(1, 10) where
a = 1, k = 1, b = 10. Hence n = 10.
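The mean and variance formulas can be checked against a direct calculation for the example X ~ UD(1, 10). This is a sketch using exact fractions:

```python
from fractions import Fraction

a, k, n = 1, 1, 10
support = [a + k * j for j in range(n)]   # 1, 2, ..., 10
p = Fraction(1, n)

# Direct calculation from the pmf.
mean = sum(x * p for x in support)
var = sum((x - mean) ** 2 * p for x in support)

# Closed-form expressions.
formula_mean = a + Fraction(k, 2) * (n - 1)
formula_var = k ** 2 * Fraction(n ** 2 - 1, 12)

print(mean, formula_mean, var, formula_var)
```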

[Figure: Uniform discrete distribution, f(x) plotted against x.]

Binomial distribution
[X ~ B (p , n)]
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$, $x = 0, 1, 2, \ldots, n$
• n and p are the parameters where $p \in [0, 1]$, $q = 1 - p$ and $n \in Z^+$
• $E[X] = np$
• $V[X] = npq$
• Example: Consider you are checking the quality of the
product coming out of the shop floor. A product can
either pass (with probability p = 0.8) or fail (with
probability q = 0.2) and for checking you take 50 such
products (n = 50). Then if X is the random variable
denoting the number of successes in these 50 inspections,
we have
X ~ ${}^{50}C_{x}(0.8)^{x}(0.2)^{50-x}$
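A short sketch of the binomial pmf for the inspection example (n = 50, p = 0.8), confirming numerically that the probabilities sum to 1 and that the mean and variance equal np and npq:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 50, 0.8
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))

print(round(total, 6), round(mean, 6), round(var, 6))
```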
[Figure: Binomial distribution, f(x) plotted against x for x = 0 to 50.]

Example # 023
20% of the BSNL shares are owned
by Mr.Murali Lal. A random sample of
5 shares is chosen. What is the
probability that at most 2 of them will
be found to be owned by Mr.Murali
Lal?

Example # 023 (contd..)
Here n = 5, p = 0.2, q = 0.8. Hence the required
probability is:
$P(X \le 2) = P(X=0) + P(X=1) + P(X=2)$
$P(X \le 2) = {}^{5}C_{0}(0.2)^{0}(0.8)^{5} + {}^{5}C_{1}(0.2)^{1}(0.8)^{4} + {}^{5}C_{2}(0.2)^{2}(0.8)^{3} = 0.94208$

Example # 024
Mr. Abhishek Bhatia finds that 70% of
the students have answered question
# 1 correctly in his chemistry
examination. A random sample of 10
answer scripts is chosen. What is the
probability that at least 3 of the
students have correctly answered the
question # 1?
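One way to work this example, assuming the 10 scripts behave as independent Bernoulli trials with p = 0.7 (the slides do not give a worked solution, so this is a sketch): the complement of "at least 3" is "at most 2".

```python
from math import comb

n, p = 10, 0.7

# P(X <= 2) for X ~ Binomial(10, 0.7).
p_at_most_2 = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(3))

# P(X >= 3) via the complement.
p_at_least_3 = 1 - p_at_most_2
print(round(p_at_least_3, 5))
```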
Negative binomial distribution
[X ~ NB (p , r)]
$f(x) = {}^{r+x-1}C_{r-1}\, p^{r} q^{x}$, $x = 0, 1, 2, \ldots$
• X is the number of failures preceding the r-th success
• p and r are the parameters where $p \in [0, 1]$ and $r \in Z^+$
• $E[X] = rq/p$
• $V[X] = rq/p^2$
• Example: Consider the example above where you are
still inspecting items from the production line. But now
you are interested in finding the probability distribution
of the number of failures preceding the 5th success of
getting the right product. Then, we have, considering
p = 0.8, q = 0.2
X ~ ${}^{5+x-1}C_{5-1}(0.8)^{5}(0.2)^{x}$

[Figure: Negative binomial distribution, f(x) plotted against x for x = 0 to 30.]

Example # 025
Suppose that the probability of a
manufacturing process producing a defective
item is 0.05. Suppose further that the quality
of any one item is independent of the quality
of any other item produced. Mr. Rao, the
quality control officer, selects items at random
from the production line and stops his
inspection when he gets 10 good items.
What is the probability of getting 2 defective
items before he stops inspection?
Example # 025 (contd..)
• Here r = 10, p = 0.95, q = 0.05
• Hence the required probability is
$P(X=2) = {}^{10+2-1}C_{10-1}(0.95)^{10}(0.05)^{2} \approx 0.0823$

Example # 026
Mr. Agrawal is the immigration official at IGIA
and he knows that the probability of an Indian
origin person coming in any flight from abroad is
10%. He notes down the country of origin of
persons and stops when he has noted
down exactly 3 people of Indian origin.
What is the probability of Mr. Agrawal noting
down 2 non Indian origin people before he stops
his checking?
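Both negative binomial examples can be evaluated with one small function. In this sketch X counts the failures before the r-th success; for the immigration example I assume r = 3, p = 0.10 and x = 2, which is my reading of the problem since no worked solution is given.

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """P(X = x): x failures before the r-th success, success probability p."""
    return comb(r + x - 1, r - 1) * p ** r * (1 - p) ** x

# Quality control: 2 defectives before the 10th good item, p = 0.95.
ex025 = neg_binom_pmf(2, 10, 0.95)

# Immigration: 2 non Indian origin persons before the 3rd Indian origin person.
ex026 = neg_binom_pmf(2, 3, 0.10)

print(round(ex025, 5), round(ex026, 5))
```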

Geometric distribution
[X ~ G (p)]
$f(x) = p q^{x}$, $x = 0, 1, 2, \ldots$
• p is the parameter where $p \in [0, 1]$
• $E[X] = q/p$ (r = 1 in the Negative Binomial distribution
case)
• $V[X] = q/p^2$ (r = 1 in the Negative Binomial distribution
case)
• Example: Consider the example above. But now you
are interested in finding the probability distribution of
the number of failures preceding the 1st success of
getting the right product. Then, we have, considering
p = 0.8, q = 0.2
X ~ $(0.8)(0.2)^{x}$
[Figure: Geometric distribution, f(x) plotted against x for x = 0 to 25.]

Example # 027
A recent study indicates that Colgate toothpaste
has a market share of 45% (versus 55% for
Pepsodent). Miss Dabawallah, a marketing
research executive, wants to conduct a new
taste test for which she wants users of Colgate.
Potential participants for the test are selected by
random screening of users of toothpaste to find
Colgate users. What is the probability that 5
users will have to be interviewed by Miss
Dabawallah to find the 1st Colgate toothpaste
user?
Example # 027 (contd…)

Here p = 0.45, q = 0.55. Four non-users must be
interviewed before the first Colgate user, hence
the required probability is
$P(X=4) = (0.45)^{1}(0.55)^{4} \approx 0.0412$

Example # 028
Mr.Moti Singh is a lottery ticket sales
person and he knows that the
probability of a person coming to him
to check whether he/she has won the
lottery is 1%. What is the probability
that 10 persons arrive before the
winner arrives?
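The two geometric examples reduce to the same pmf. In this sketch X counts the failures before the first success; for the lottery example I assume the question asks for exactly 10 non-winners before the first winner, which is my reading since no solution is given.

```python
def geom_pmf(x, p):
    """P(X = x): x failures before the first success, success probability p."""
    return p * (1 - p) ** x

# Toothpaste survey: 4 non-users, then the first Colgate user (5th interview).
ex027 = geom_pmf(4, 0.45)

# Lottery: 10 non-winners, then the first winner, with p = 0.01.
ex028 = geom_pmf(10, 0.01)

print(round(ex027, 6), round(ex028, 6))
```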

Hypergeometric distribution
[X ~ HG (N, n, p)]
$f(x) = \dfrac{{}^{Np}C_{x}\ {}^{Nq}C_{n-x}}{{}^{N}C_{n}}$, $0 \le x \le Np$ and $0 \le (n - x) \le Nq$
• N, n and p are the parameters
• $E[X] = np$
• $V[X] = npq\{(N - n)/(N - 1)\}$
• Example: Consider the example above. But now you are interested in
finding the probability distribution of the number of failures (successes), i.e., of
getting the wrong (right) product, when we choose n products
for inspection out of the total population N. If the population is 100 and
we choose 10 out of those, then the probability distribution of getting the
right product, denoted by X, is given by
X ~ ${}^{85}C_{x}\ {}^{15}C_{10-x} / {}^{100}C_{10}$
• Remember
  • p (0.85) and q (0.15) are the proportions of getting a good item and
a bad item respectively.
  • In this distribution we are considering that the choosing is done without
replacement

[Figure: Hypergeometric distribution, f(x) plotted against x for x = 1 to 10.]

Example # 029
Suppose that automobiles arrive at Mr.
Ghosh's garage in lots of 10 and that, for time
and resource considerations, he can inspect
only 5 out of each 10 for safety. The 5 cars
are randomly chosen from the 10 on the lot.
If 2 out of the 10 cars on the lot are below
standards for safety, what is the probability
that at most 1 out of the 5 cars to be
inspected by Mr. Ghosh will be found not
meeting the safety standards?
Example # 029 (contd…)
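Here N = 10, n = 5, Np = 2, Nq = 8
Hence the required probability is
$P(X \le 1) = P(X=0) + P(X=1)$
$P(X \le 1) = \dfrac{{}^{2}C_{0}\ {}^{8}C_{5}}{{}^{10}C_{5}} + \dfrac{{}^{2}C_{1}\ {}^{8}C_{4}}{{}^{10}C_{5}} = \dfrac{56 + 140}{252} = \dfrac{7}{9} \approx 0.778$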
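The hypergeometric probability for the garage example can be computed exactly. In this sketch the argument `K` (number of sub-standard cars in the lot) plays the role of Np:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x marked items in a sample of n drawn without replacement
    from N items of which K are marked."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

# 10 cars in the lot, 2 below standard, 5 inspected.
prob = hypergeom_pmf(0, 10, 2, 5) + hypergeom_pmf(1, 10, 2, 5)
print(prob, float(prob))
```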

Example # 030
Mrs. Patanaik is the teacher of class III, which consists of 30 students. Every day during the morning prayer session she would like to check whether all her students have brought their lunches. Because of paucity of time, each day she randomly selects 10 students, and she finds that on average 4 out of these 10 do not bring their lunches. If on a particular day she selects 10 students, what is the probability that exactly 5 of them would not have brought their lunches?

Poisson distribution
[X ~ P($\lambda$)]
$f(x) = e^{-\lambda}\lambda^{x}/x!$, x = 0, 1, 2, …
• $\lambda$ is the parameter where $\lambda > 0$
• E[X] = $\lambda$
• V[X] = $\lambda$
• Example: Consider the arrival of customers at the bank teller counter. If we are interested in finding the probability distribution of the number of customers arriving at the counter in specific intervals of time and we know that the average number of customers arriving is 5, then we have
X ~ $e^{-5}5^{x}/x!$
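
A small Python sketch of the Poisson probabilities for the bank teller example (average of 5 arrivals) follows. The function name `poisson_pmf` is illustrative, and the sum is truncated at 60 terms on the assumption that the remaining tail is negligible for this value of the parameter.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 5                                     # average number of arrivals
probs = [poisson_pmf(x, lam) for x in range(60)]

# Both the mean and the variance should come out close to lam
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
```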

Poisson distribution
[X ~ P($\lambda$)]
[Figure: Poisson distribution; f(x) plotted against x = 0, ..., 20]

Example # 031
Mr. Gurneek Singh, who is the cashier at the departmental store cash counter, notices that the average number of customers arriving at the cash counter per 5 minutes is 10. What is the probability that more than 5 customers arrive at the cash counter within an interval of 5 minutes?

Example # 031 (contd…)
Here $\lambda = 10$. Hence the required probability is
$P(X \ge 6) = 1 - \{P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)\}$
$P(X \ge 6) = 1 - \{e^{-10}10^{0}/0! + e^{-10}10^{1}/1! + e^{-10}10^{2}/2! + e^{-10}10^{3}/3! + e^{-10}10^{4}/4! + e^{-10}10^{5}/5!\} \approx 0.933$

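The tail probability in Example # 031 can be evaluated as a complement of the first six Poisson terms. The sketch below assumes nothing beyond the formula on the slide; the names are illustrative.

```python
from math import exp, factorial

lam = 10                                    # average arrivals per 5 minutes

# P(X <= 5): sum of the first six Poisson terms
p_at_most_5 = sum(exp(-lam) * lam ** x / factorial(x) for x in range(6))

# P(X >= 6) = 1 - P(X <= 5)
p_more_than_5 = 1.0 - p_at_most_5
```
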
Example # 032
Mr. Pankaj Yadav is the shop floor manager and he notices that the average number of jobs arriving at the lathe machine per hour is 25. What is the probability that exactly 10 jobs arrive at the lathe machine in an hour?

Log distribution
[X ~ L(p)]
$f(x) = -(\log_e p)^{-1} x^{-1} (1 - p)^{x}$, x = 1, 2, 3, …
• p is the parameter where $p \in (0, 1)$
• $E[X] = -(1-p)/(p \log_e p)$
• $V[X] = -(1-p)[1 + (1-p)/\log_e p]/(p^{2} \log_e p)$
• Example
1) Emission of gases from engines against fuel type
2) Used to represent the distribution of the number of items of a product purchased by a buyer in a specified period of time
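
The mean and variance expressions above can be checked numerically against the probability function. In this sketch the value p = 0.3 and the truncation at 500 terms are arbitrary choices made for illustration.

```python
from math import log

def log_pmf(x, p):
    """f(x) = -(1 - p)**x / (x * ln p) for x = 1, 2, 3, ..."""
    return -((1 - p) ** x) / (x * log(p))

p = 0.3                                     # illustrative parameter value
xs = range(1, 500)                          # tail beyond 500 is negligible here

total = sum(log_pmf(x, p) for x in xs)
mean = sum(x * log_pmf(x, p) for x in xs)
var = sum(x * x * log_pmf(x, p) for x in xs) - mean ** 2

# Closed forms from the slide
mean_formula = -(1 - p) / (p * log(p))
var_formula = -(1 - p) * (1 + (1 - p) / log(p)) / (p ** 2 * log(p))
```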
Log distribution
[X ~ L(p)]
[Figure: Log distribution; f(x) plotted against x = 1, ..., 20]

Continuous distribution
1) Uniform distribution
2) Normal distribution
3) Exponential distribution
4) Chi-Square distribution
5) Gamma distribution
6) Beta distribution
7) Cauchy distribution

8) t-distribution
9) F-distribution
10)Log-normal distribution
11)Weibull distribution
12)Double exponential distribution
13)Pareto distribution
14)Logistic distribution

Uniform distribution
[X ~ U(a, b)]
$f(x) = 1/(b - a)$, $a \le x \le b$
• a and b are the parameters where $a, b \in R$ and a < b
• E[X] = (a+b)/2
• $V[X] = (b-a)^{2}/12$
• Example: Choosing any number between 1 and 10, both inclusive, from the real line

Uniform distribution
[X ~ U(a, b)]
[Figure: Uniform distribution; f(x) plotted against x from 1 to 10]

Normal distribution
[X ~ N($\mu_X$, $\sigma_X^{2}$)]
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} e^{-(x - \mu_X)^{2}/(2\sigma_X^{2})}$, $-\infty < x < \infty$
• $\mu_X$, $\sigma_X^{2}$ are the parameters where $\mu_X \in R$ and $\sigma_X^{2} > 0$
• E[X] = $\mu_X$
• V[X] = $\sigma_X^{2}$
• Example: Consider the average age of a student between class VII and VIII selected at random from all the schools in the city of Kanpur
Normal distribution
[X ~ N($\mu_X$, $\sigma_X^{2}$)]
[Figure: Normal distribution; f(x) plotted against x from -10 to 10]

Standard normal distribution
Putting $Z = (X - \mu_X)/\sigma_X$ in the normal distribution we have the standard normal distribution
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}$
where $\mu_Z = 0$ and $\sigma_Z = 1$
Remember
• $F(x) = P(X \le x) = F(z) = P(Z \le z)$, with $z = (x - \mu_X)/\sigma_X$
• $f(z) = \phi(z)$
• $F(z) = P(Z \le z) = \int_{-\infty}^{z} f(u)\,du = \int_{-\infty}^{z} dF(u) = \Phi(z)$
Standard normal distribution
(PDF)
[Figure: Standard normal distribution; f(z) plotted against z from -10 to 10]

Standard normal distribution
(CDF)
[Figure: Normal F(x) plotted against X]

Normal distribution results
• $\phi(-z) = \phi(z)$, i.e., f(-z) = f(z)
• $\Phi(-z) = 1 - \Phi(z)$, i.e., $P(Z \le -z) = 1 - P(Z \le z)$
• $P(a \le X \le b) = \Phi[(b - \mu_X)/\sigma_X] - \Phi[(a - \mu_X)/\sigma_X]$
• $P(X \le b) = \Phi[(b - \mu_X)/\sigma_X]$
• $P(a \le X) = 1 - \Phi[(a - \mu_X)/\sigma_X]$
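
The normal distribution results listed above translate directly into code once the standard normal CDF is available. The sketch below builds that CDF from `math.erf`; the function names and the values of the mean, standard deviation and interval are hypothetical.

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

def prob_at_most(b, mu, sigma):
    """P(X <= b)."""
    return Phi((b - mu) / sigma)

def prob_at_least(a, mu, sigma):
    """P(a <= X)."""
    return 1.0 - Phi((a - mu) / sigma)

mu, sigma = 10.0, 2.0                       # hypothetical parameters
p_mid = prob_between(8.0, 12.0, mu, sigma)  # one standard deviation either side
```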
Normal distribution results
[Figure: Normal distribution; f(x) plotted against x from -10 to 10, with the points a and b marked]

Finding Probabilities using Standard Normal Distribution: P(0 < Z < 1.56)
[Figure: Standard Normal Distribution; f(z) plotted against Z from -5 to 5, with the area between 0 and 1.56 marked]
Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = .4406

Standard Normal Probabilities
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Standard Normal distribution
Z ~ N(0,1), given by the equation $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}$
The area within an interval (a, b) is given by
$P(a \le Z \le b) = \int_{a}^{b} \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}\,dz$
which is not integrable algebraically. The Taylor expansion of the above assists in speeding up the calculation, which is
$F(z) = P(Z \le z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-1)^{k} z^{2k+1}}{(2k+1)\,2^{k}\,k!}$
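
The Taylor expansion of the standard normal CDF can be sketched in a few lines of Python. The number of terms (40) is an arbitrary choice that is more than enough for moderate z; the comparison against `math.erf` is included only as a cross-check, and the function name `phi_taylor` is illustrative.

```python
from math import erf, factorial, pi, sqrt

def phi_taylor(z, terms=40):
    """Standard normal CDF from the truncated Taylor series
    1/2 + (1/sqrt(2*pi)) * sum_k (-1)^k z^(2k+1) / ((2k+1) 2^k k!)."""
    s = sum((-1) ** k * z ** (2 * k + 1) / ((2 * k + 1) * 2 ** k * factorial(k))
            for k in range(terms))
    return 0.5 + s / sqrt(2.0 * pi)

z = 1.56
area_0_to_z = phi_taylor(z) - 0.5           # P(0 <= Z <= z), as tabulated
reference = 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

For z = 1.56 the value of `area_0_to_z`, rounded to four places, should match the table entry .4406.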


Example # 033
In an examination 20% of the students failed (i.e., obtained a score less than or equal to 40 marks out of 100) and 10% of the students obtained a grade A (a score of 70 marks or above out of 100). Assuming a normal distribution of marks, find the mean and the standard deviation of the distribution.

Example # 033 (contd…)
Steps
1) $P(X \le 40) = 0.2 = P[(X - \mu_X)/\sigma_X \le (40 - \mu_X)/\sigma_X] = P(Z \le z_1) = \Phi(z_1)$. Hence $z_1 = -0.84$
2) $P(X \ge 70) = 0.1 = P[(X - \mu_X)/\sigma_X \ge (70 - \mu_X)/\sigma_X] = P(Z \ge z_2) = 1 - P(Z \le z_2) = 1 - \Phi(z_2)$. Hence $\Phi(z_2) = 0.9$, i.e., $z_2 = +1.28$
Hence we have from the above two equations:
• $z_1 = (40 - \mu_X)/\sigma_X = -0.84$
• $z_2 = (70 - \mu_X)/\sigma_X = +1.28$
• $\mu_X = 51.9$; $\sigma_X = 14.2$

Example # 034
In Prof. Ram Pal's mathematics examination 20% of the students failed (i.e., obtained a score less than or equal to 40 marks out of 100) and 10% of the students obtained a grade A (a score of 70 marks or above out of 100). Assuming a normal distribution of marks, find the mean and the standard deviation of the distribution of marks in mathematics.

Example # 034 (contd…)
1) $P(X \le 40) = 0.2 = P[(X - \mu_X)/\sigma_X \le (40 - \mu_X)/\sigma_X] = P(Z \le z_1) = \Phi(z_1)$. Hence $z_1 = -0.84$
2) $P(X \ge 70) = 0.1 = P[(X - \mu_X)/\sigma_X \ge (70 - \mu_X)/\sigma_X] = P(Z \ge z_2) = 1 - P(Z \le z_2) = 1 - \Phi(z_2)$. Hence $\Phi(z_2) = 0.9$, i.e., $z_2 = +1.28$
Hence we have from the above two equations:
• $z_1 = (40 - \mu_X)/\sigma_X = -0.84$
• $z_2 = (70 - \mu_X)/\sigma_X = +1.28$
• $\mu_X = 51.9$; $\sigma_X = 14.2$

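The two quantile equations in this example can also be solved with the standard library instead of a printed table. This is a sketch using `statistics.NormalDist`; because it uses unrounded quantiles, the results differ slightly from the table-based values.

```python
from statistics import NormalDist

std = NormalDist()                          # standard normal, mean 0 and sd 1

z1 = std.inv_cdf(0.20)                      # 20% scored 40 or less
z2 = std.inv_cdf(0.90)                      # 10% scored 70 or more

# Solve (40 - mu)/sigma = z1 and (70 - mu)/sigma = z2 for mu and sigma
sigma = (70 - 40) / (z2 - z1)
mu = 40 - z1 * sigma
```
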
Example # 034 (contd…)
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Example # 035
Dr. Satish Kashyap finds that his patients' heights are normally distributed with mean 165 cm and standard deviation 20 cm. What is the probability of a patient's height being between 160 and 170 cm?

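One way to evaluate the probability asked for in Example # 035 is to take the difference of two normal CDF values. The sketch below uses `statistics.NormalDist` with the mean and standard deviation stated in the example; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=165, sigma=20)      # parameters stated in the example

# P(160 <= X <= 170) = F(170) - F(160)
p_between = heights.cdf(170) - heights.cdf(160)

# Same quantity through the standardised limits z = +/- 0.25
z_low, z_high = (160 - 165) / 20, (170 - 165) / 20
p_standardised = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
```

Both routes give the same value, which can be compared with twice the table entry for z = 0.25.
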
Exponential distribution
[X ~ E(a, $\theta$)]
$f(x) = \frac{1}{\theta} e^{-(x - a)/\theta}$, $a < x < \infty$
• a and $\theta$ are the parameters where $a \in R$ and $\theta > 0$
• E[X] = a + $\theta$
• V[X] = $\theta^{2}$
• Example: The life distribution of the number of hours an electric bulb survives.
Exponential distribution
[X ~ E(a, $\theta$)]
[Figure: Exponential distribution; f(x) plotted against x from 0 to 20]

Exponential distribution
(CDF)
[Figure: Exponential F(x) plotted against X]

Example # 036
Mr. K. Bharadwaj, an electrician, knows that the time a bulb operates before it fuses is exponentially distributed with a mean life of 100 hours. What is the probability that any bulb will work continuously for at least 200 hours?

Example # 036 (contd…)
Here $\theta = 100$ (and a = 0). Hence the required probability is
$P(X \ge 200) = 1 - P(X < 200)$
$P(X \ge 200) = 1 - \int_{0}^{200} \frac{1}{100}\exp\left(-\frac{x}{100}\right) dx = e^{-2} \approx 0.135$

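The integral in Example # 036 has a closed form, so the survival probability can be computed directly. The sketch below assumes the location parameter a is 0, as the lower limit of the integral suggests; the function name `exp_cdf` is illustrative.

```python
from math import exp

def exp_cdf(x, theta, a=0.0):
    """F(x) = 1 - exp(-(x - a)/theta) for x > a, and 0 otherwise."""
    if x <= a:
        return 0.0
    return 1.0 - exp(-(x - a) / theta)

theta = 100.0                               # mean life in hours
p_at_least_200 = 1.0 - exp_cdf(200.0, theta)
```
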
Example # 037
Mr. Lyndoh the owner of an electronics
shop knows that the average life of a
tape recorder he sells is 1.5 years.
What is the probability that any such
tape recorder would function for at
most 3 years?

Example # 038 (Electronic Components)
In a quality control survey the following data were obtained from the QC log book, giving the number of hours of working before the electronic component fails and the number of such components.

Hours of working   Number failed after this hour   Hours of working   Number failed after this hour
2.0    41    6.0    27
5.0    30    4.0    34
3.5    35    1.0    45
1.5    43    2.5    39
4.5    32    7.0    25
0.5    48    6.5    26
10.0   18    5.5    29
3.0    37    7.5    24
8.5    21    8.0    22
9.5    19    9.0    20

Example # 038 (Electronic Components) (contd…)
[Figure: Frequency of number of failures; number of failures plotted against hours of working]
[Figure: Probability; P[X=x] = f(x) plotted against reading number 1 to 20]
[Figure: Cumulative probability plotted against hours of working]

Example # 038 (Electronic Components) (contd…)
• Probability that the electronic component will survive after 6 hours?
• Average life of the electronic component?
• Standard deviation of life for the electronic component?
• Probability that the electronic component will fail before 4 hours?
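
One possible way to approach these questions is to treat the tabulated counts as frequencies of failure at each hour mark and convert them to relative frequencies. That reading of the table is an assumption made for this sketch, and the variable names are illustrative.

```python
from math import sqrt

# Hours of working -> number failed (values copied from the table)
failed = {
    0.5: 48, 1.0: 45, 1.5: 43, 2.0: 41, 2.5: 39, 3.0: 37, 3.5: 35,
    4.0: 34, 4.5: 32, 5.0: 30, 5.5: 29, 6.0: 27, 6.5: 26, 7.0: 25,
    7.5: 24, 8.0: 22, 8.5: 21, 9.0: 20, 9.5: 19, 10.0: 18,
}

total = sum(failed.values())
prob = {h: c / total for h, c in failed.items()}   # relative frequencies

# Questions from the slide, under the assumption stated above
p_survive_6 = sum(p for h, p in prob.items() if h > 6.0)
p_fail_before_4 = sum(p for h, p in prob.items() if h < 4.0)
mean_life = sum(h * p for h, p in prob.items())
sd_life = sqrt(sum((h - mean_life) ** 2 * p for h, p in prob.items()))
```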

Example # 039 (NO2)
URL: http://lib.stat.cmu.edu/datasets/
URL: http://lib.stat.cmu.edu/datasets/NO2.dat
The data are a subsample of 500 observations from a data set that originates in a study where air pollution at a road is related to traffic volume and meteorological variables, collected by the Norwegian Public Roads Administration. The response variable (column 1) consists of hourly values of the logarithm of the concentration of NO2 (particles), measured at Alnabru in Oslo, Norway, between October 2001 and August 2003. The predictor variables (columns 2 to 8) are the logarithm of the number of cars per hour, temperature 2 meters above ground (degree C), wind speed (meters/second), the temperature difference between 25 and 2 meters above ground (degree C), wind direction (degrees between 0 and 360), hour of day and day number from October 1, 2001.

Example # 039 (NO2) (contd…)

Statistic   Ln(NO2)    ln(Cars)   Temp(2 m)   Wind Speed   Temp(25)-Temp(2)   Wind direction   Hour of day   Day #
Average     3.698368   6.973342   0.8474      3.056        0.1494             143.3704         12.382        310.474
SD(S)       0.750597   1.087166   6.524636    1.784172     1.065237           86.51021         6.802693      200.9778
SD(P)       0.749846   1.086079   6.518108    1.782387     1.064171           86.42366         6.795887      200.7767
Median      3.84802    7.42536    1.1         2.8          0                  97               12.5          212
Maximum     6.39509    8.34854    21.1        9.9          4.3                359              24            608
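
The rows SD(S) and SD(P) in the summary table appear to be the sample and population standard deviations (divisors n - 1 and n). The sketch below shows the relation between the two on a small hypothetical sample and then checks it against the tabulated Ln(NO2) values with n = 500.

```python
from math import sqrt
from statistics import pstdev, stdev

sample = [3.2, 4.1, 3.9, 4.4, 2.8]          # hypothetical readings
n = len(sample)

s_sample = stdev(sample)                    # divisor n - 1
s_pop = pstdev(sample)                      # divisor n

# The two are linked by SD(P) = SD(S) * sqrt((n - 1)/n)
ratio_ok = abs(s_pop - s_sample * sqrt((n - 1) / n)) < 1e-12

# Same relation applied to the tabulated Ln(NO2) column, n = 500
sd_s_table = 0.750597
sd_p_from_s = sd_s_table * sqrt(499 / 500)
```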


Example # 039 (NO2) (contd…)
[Figure: Ln(NO2) concentration plotted against data #]
[Figure: ln(Cars) plotted against data #]
[Figure: Temp(2 m) plotted against data #]
[Figure: Wind speed plotted against data #]
[Figure: Temp(25)-Temp(2) plotted against data #]
[Figure: Wind direction plotted against data #]
[Figure: ln(Cars) vs Ln(NO2)]
[Figure: Temp(2 m) vs Ln(NO2)]
[Figure: Frequency distribution of Ln(NO2); data ranges of width 0.5, from 1 to 1.5 through 6 to 6.5]
[Figure: Frequency distribution of Ln(NO2); data ranges of width 0.25, from 1 to 1.25 through 6 to 6.25]
[Figure: Cumulative relative frequency of Ln(NO2) plotted against Ln(NO2) range]

Example # 040 (Arrival times at CNC
work station)
In a factory shop floor for a certain CNC machine (machine
marked # 1) the number of jobs arriving per unit time are given
below
# of Arrivals Frequency
0 2
1 4
2 4
3 1
4 2
5 1
6 4
7 6
8 4
9 1
Example # 040 (Arrival times at CNC work station) M/C # 01 (contd…)
[Figure: Frequency distribution of arrivals; frequency plotted against # of arrivals, 0 to 9]
[Figure: Relative frequency of number of arrivals]
[Figure: Cumulative relative frequency of number of arrivals]

Example # 040 (Arrival times at CNC work station) M/C # 01 (contd…)
1) The probability of the number of arrivals of jobs being equal to or more than 7 is 11/29, i.e., about 0.38.
2) The average number of arrivals of jobs is 135/29, i.e., about 5.
3) The probability of the number of arrivals of jobs being less than or equal to 4 is 13/29, i.e., about 0.45.

Example # 040 (Arrival times at CNC work station) M/C # 02 (contd…)
In the same factory, on the same shop floor, for the same type of CNC machine (machine marked # 2), the number of jobs arriving per unit time is given below. Here we assume machine # 2 is being used for a different type of machining operation.
# of Arrivals Frequency
4 8
5 11
6 19
7 15
8 14
9 22
10 18
11 15
12 8

Example # 040 (Arrival times at CNC work station) M/C # 02 (contd…)
[Figure: Frequency distribution of arrivals; frequency plotted against # of arrivals, 4 to 12]
[Figure: Relative frequency of number of arrivals]
[Figure: Cumulative relative frequency of # of arrivals]

Example # 040 (Arrival times at CNC work station) M/C # 02 (contd…)
1) The probability of the number of arrivals of jobs being equal to or more than 7 is 92/130, i.e., about 0.71.
2) The average number of arrivals of jobs is 1057/130, i.e., about 8.
3) The probability of the number of arrivals of jobs being less than or equal to 4 is 8/130, i.e., about 0.06.

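The summary figures for the two machines can be recomputed from the frequency tables. This is a sketch; the helper names `mean_arrivals` and `prob` are illustrative, and the dictionaries simply copy the tabulated frequencies.

```python
# Number of arrivals -> frequency, copied from the two tables
machine_1 = {0: 2, 1: 4, 2: 4, 3: 1, 4: 2, 5: 1, 6: 4, 7: 6, 8: 4, 9: 1}
machine_2 = {4: 8, 5: 11, 6: 19, 7: 15, 8: 14, 9: 22, 10: 18, 11: 15, 12: 8}

def mean_arrivals(freq):
    """Frequency-weighted average number of arrivals."""
    return sum(k * f for k, f in freq.items()) / sum(freq.values())

def prob(freq, event):
    """Relative frequency of the arrival counts k for which event(k) is true."""
    return sum(f for k, f in freq.items() if event(k)) / sum(freq.values())

m1_mean = mean_arrivals(machine_1)
m1_ge_7 = prob(machine_1, lambda k: k >= 7)
m1_le_4 = prob(machine_1, lambda k: k <= 4)

m2_mean = mean_arrivals(machine_2)
m2_ge_7 = prob(machine_2, lambda k: k >= 7)
m2_le_4 = prob(machine_2, lambda k: k <= 4)
```
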
Example # 040 (Arrival times at CNC work station) M/C # 01 and M/C # 02 (contd…)
1) Can you comment on which machine is the bottleneck, considering that the processing time in machine # 1 and machine # 2 is the same?
2) Can this sort of concept be extended to more than two machines?
3) What else should we consider, apart from the processing time in each machine, in order to find the optimal production rate per unit time (say per shift)?

IME602:Probability & Statistics R.N.Sengupta,DoMS.,IIT Kanpur,INDIA 258


(Part # 01)
Example # 041 (SAT)
This data set [ ] includes eight variables:
1) STATE: Name of state
2) COST: Current expenditure per pupil (measured in thousands of dollars per average daily attendance in public elementary and secondary schools)
3) RATIO: Average pupil/teacher ratio in public elementary and
secondary schools during Fall 1994
4) SALARY: Estimated average annual salary of teachers in public
elementary and secondary schools during 1994-95 (in thousands of
dollars)
5) PERCENT: percentage of all eligible students taking the SAT in
1994-95
6) VERBAL: Average verbal SAT score in 1994-95
7) MATH: Average math SAT score in 1994-95
8) TOTAL: Average total score on the SAT in 1994-95

Example # 041 (SAT)
(contd…)
[Figure: Histogram for Cost; COST plotted by state]
[Figure: Histograms for Cost and Ratio; COST and RATIO plotted by state]
[Figure: Histogram of Cost, Ratio and Salary; COST, RATIO and SALARY plotted by state]

Example # 041 (SAT)
(contd…)
Average value is given by $E[X] = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance is given by $V[X] = \frac{1}{n}\sum_{i=1}^{n} (X_i - E[X])^{2}$
Covariance is given by $Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = \rho_{X,Y}\sqrt{V[X]}\sqrt{V[Y]}$

Example # 041 (SAT)
(contd…)

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

Mean 5.90526 16.858 34.82892 35.24 457.14 508.78 965.92

Median 5.7675 16.6 33.2875 28 448 497.5 945.5

Maximum 9.774 24.3 50.045 81 516 592 1107

Minimum 3.656 13.8 25.994 4 401 443 844

Standard Deviation (sample) 1.362807 2.266355 5.941265 26.76242 35.17595 40.20473 74.82056

Standard Deviation (population) 1.34911 2.243577 5.881552 26.49344 34.82241 39.80065 74.06857

Example # 041 (SAT)
(contd…)
Covariance matrix

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

COST 1.820097 -1.12303 6.901753 21.18202 -19.2638 -18.7619 -38.0258

RATIO -1.12303 5.033636 -0.01512 -12.6639 4.98188 8.52076 13.50264

SALARY 6.901753 -0.01512 34.59266 96.10822 -97.6868 -93.9432 -191.63

PERCENT 21.18202 -12.6639 96.10822 701.9024 -824.094 -916.727 -1740.82

VERBAL -19.2638 4.98188 -97.6868 -824.094 1212.6 1344.731 2557.331

MATH -18.7619 8.52076 -93.9432 -916.727 1344.731 1584.092 2928.822

TOTAL -38.0258 13.50264 -191.63 -1740.82 2557.331 2928.822 5486.154

Example # 041 (SAT)
(contd…)
Correlation matrix

COST RATIO SALARY PERCENT VERBAL MATH TOTAL

COST 1 -0.37103 0.869802 0.592627 -0.41005 -0.34941 -0.38054

RATIO -0.37103 1 -0.00115 -0.21305 0.063767 0.095422 0.081254

SALARY 0.869802 -0.00115 1 0.61678 -0.47696 -0.40131 -0.43988

PERCENT 0.592627 -0.21305 0.61678 1 -0.89326 -0.86938 -0.88712

VERBAL -0.41005 0.063767 -0.47696 -0.89326 1 0.970256 0.991503

MATH -0.34941 0.095422 -0.40131 -0.86938 0.970256 1 0.993502

TOTAL -0.38054 0.081254 -0.43988 -0.88712 0.991503 0.993502 1

IME602:Probability & Statistics R.N.Sengupta,DoMS.,IIT Kanpur,INDIA 266
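
The link between the covariance and correlation tables can be sketched as follows. The helper functions use the 1/n definitions of mean, variance and covariance; the COST and SALARY numbers are copied from the covariance table, while the short lists `xs` and `ys` are made-up data for a sanity check.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Variance with divisor n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Covariance with divisor n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation_from_cov(cov_xy, var_x, var_y):
    """rho = Cov[X, Y] / (sqrt(V[X]) * sqrt(V[Y]))."""
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

# COST and SALARY entries taken from the covariance table
rho_cost_salary = correlation_from_cov(6.901753, 1.820097, 34.59266)

# Made-up, perfectly linear data: the correlation should be 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
rho_line = correlation_from_cov(covariance(xs, ys), variance(xs), variance(ys))
```

The value of `rho_cost_salary` should reproduce the COST and SALARY entry of the correlation table.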


(Part # 01)
Example # 042 (S&P)
URL: http://lib.stat.cmu.edu/datasets
URL: http://lib.stat.cmu.edu/datasets/spdc2693
The data set consists of the Standard and Poor's 500 Index closing values from 1926 to 1993. See also djdc0093. Submitted by Eduardo Ley (edley@eco.uc3m.es) [13/Mar/96] (333 kbytes).
The first column contains the date (yymmdd) and the second column contains the value. These data are used in:
E. Ley (1996): "On the Peculiar Distribution of the U.S. Stock Indices," forthcoming in The American Statistician
Example # 042 (S&P)
(contd…)
[Figure: Value of S&P; value (USD) plotted against date]

Example # 042 (S&P)
(contd…)
For our problem we take n = 300
• E(X) = 17.09927
• SD(Sample) = 2.649796
• SD(Population) = 2.645376

Example # 042 (S&P)
(contd…)
[Figure: Frequency distribution; frequency plotted against range]
[Figure: Relative frequency distribution; relative frequency plotted against range, from 11 to 12 through 20 to 21]
[Figure: Cumulative relative frequency plotted against range number]

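Log-normal distribution
[X ~ LN($\mu_X$, $\sigma_X^{2}$)]
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X x} e^{-(\log_e x - \mu_X)^{2}/(2\sigma_X^{2})}$, $0 < x < \infty$
• $\mu_X$, $\sigma_X^{2}$ are the parameters where $\mu_X \in R$ and $\sigma_X^{2} > 0$
• $E[X] = \exp(\mu_X + \sigma_X^{2}/2)$
• $V[X] = \exp(2\mu_X + \sigma_X^{2})\{\exp(\sigma_X^{2}) - 1\}$
• Example: Stock prices return distribution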
Log-normal distribution
[X ~ LN($\mu_X$, $\sigma_X^{2}$)]
[Figure: Log-normal distribution; f(x) plotted against x from 0.5 to 38]

Relationship between Poisson
and Exponential distribution
If the intervals between successive events of a process are independent and identically distributed, each following an exponential distribution with rate λ, then the number of events in a time interval of length t follows a Poisson distribution with mean λt.
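The relationship can be illustrated by simulation. The sketch below assumes a rate of 2 events per unit time and an interval of length 3 (both illustrative values); it generates exponential gaps between events, counts the events falling in the interval, and compares the mean and variance of the counts, which for a Poisson distribution are both equal to rate × length.

```python
import random


def count_events(rate, horizon, rng):
    """Count events in [0, horizon] when inter-event gaps are Exp(rate)."""
    count = 0
    clock = rng.expovariate(rate)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(rate)
    return count


def simulate_counts(rate, horizon, trials, seed=0):
    """Return the sample mean and variance of the event counts."""
    rng = random.Random(seed)
    counts = [count_events(rate, horizon, rng) for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean, var


# Illustrative values (assumed): rate 2 per unit time, interval length 3.
rate, horizon = 2.0, 3.0
count_mean, count_var = simulate_counts(rate, horizon, 20_000)
print("expected Poisson mean and variance:", rate * horizon)
print("simulated mean:", count_mean, "simulated variance:", count_var)
```

Equal mean and variance is a signature of the Poisson distribution, so both simulated values landing near 6 is consistent with the stated result, although it is not a proof of it.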
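Few approximations
Normal approximation to Binomial distribution
Let X ~ B(p, n), where n is large and p is not too close to 0 or 1, so that both np and nq are large (q = 1 - p). Then the distribution of X can be approximated by the Normal distribution N(np, npq). (When n is large and p is small, the Poisson approximation with mean np is the usual choice instead.)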

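De Moivre-Laplace Limit Theorems
Here $\Phi$ is the standard normal cdf and $q = 1 - p$.
1) $P(a \le X \le b) \approx \Phi\left[\dfrac{b - np}{\sqrt{npq}}\right] - \Phi\left[\dfrac{a - np}{\sqrt{npq}}\right]$
2) $P(a \le X) \approx 1 - \Phi\left[\dfrac{a - np}{\sqrt{npq}}\right]$
3) $P(X \le b) \approx \Phi\left[\dfrac{b - np}{\sqrt{npq}}\right]$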
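To see how close the approximation is, the sketch below compares the exact binomial probability with the normal approximation from formula 1), using illustrative values n = 100, p = 0.5, a = 45, b = 55. The optional continuity correction (widening the interval by 0.5 on each side) is a common refinement and is an addition here, not something stated on the slide.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_exact(n, p, a, b):
    """Exact P(a <= X <= b) for X binomial with n trials and success prob p."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(a, b + 1))


def normal_approx(n, p, a, b, continuity=False):
    """Normal approximation to P(a <= X <= b); optional continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    shift = 0.5 if continuity else 0.0
    upper = std_normal_cdf((b + shift - mean) / sd)
    lower = std_normal_cdf((a - shift - mean) / sd)
    return upper - lower


# Illustrative values (assumed, not from the slides).
n, p, a, b = 100, 0.5, 45, 55
exact = binomial_exact(n, p, a, b)
plain = normal_approx(n, p, a, b)
corrected = normal_approx(n, p, a, b, continuity=True)
print("exact:", exact)
print("normal approximation:", plain)
print("with continuity correction:", corrected)
```

For a discrete X, the plain formula tends to understate the probability of an interval that includes both endpoints, which is why the corrected version usually sits closer to the exact value.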

Tchebychev's Inequality
• Let X be a r.v. such that $E[X] = \mu_X$ and $V[X] = \sigma_X^2$
• Then for every positive t we have
$P(|X - \mu_X| \ge t\sigma_X) \le 1/t^2$
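The bound can be checked empirically. The sketch below assumes draws from an exponential distribution with rate 1 (so that the mean and standard deviation are both 1); this choice of distribution is only for illustration, since the inequality holds for any distribution with finite variance.

```python
import random


def tail_fraction(draws, mu, sigma, t):
    """Fraction of draws at least t standard deviations away from the mean."""
    return sum(1 for x in draws if abs(x - mu) >= t * sigma) / len(draws)


rng = random.Random(0)
# Exponential with rate 1: mean 1 and standard deviation 1 (assumed example).
draws = [rng.expovariate(1.0) for _ in range(50_000)]

results = {}
for t in (1.5, 2.0, 3.0):
    empirical = tail_fraction(draws, 1.0, 1.0, t)
    bound = 1.0 / t ** 2
    results[t] = (empirical, bound)
    print(f"t={t}: empirical tail={empirical:.4f}, bound 1/t^2={bound:.4f}")
```

The empirical tail fractions are typically far below 1/t^2, which reflects that the inequality is a worst-case bound over all distributions rather than a sharp estimate for any particular one.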

Bernoulli's Theorem
• Let $X_n$ be the number of successes in n Bernoulli trials, each with success probability p
• Then for arbitrary positive $\varepsilon$ we have
$\lim_{n \to \infty} P[\,|X_n/n - p| < \varepsilon\,] = 1$
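A small simulation can illustrate the theorem. The sketch below assumes p = 0.3 and ε = 0.05 (illustrative values) and estimates, for increasing n, the probability that the observed proportion of successes lies within ε of p.

```python
import random


def fraction_within(n, p, eps, runs, rng):
    """Estimate P(|X_n / n - p| < eps) from repeated runs of n Bernoulli trials."""
    hits = 0
    for _ in range(runs):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        if abs(successes / n - p) < eps:
            hits += 1
    return hits / runs


rng = random.Random(0)
p, eps = 0.3, 0.05  # illustrative values (assumed)
sizes = (10, 100, 1000)
estimates = [fraction_within(n, p, eps, 1000, rng) for n in sizes]
for n, est in zip(sizes, estimates):
    print(f"n={n}: estimated P(|X_n/n - p| < eps) = {est:.3f}")
```

The estimated probability should rise towards 1 as n grows, which is the behaviour the theorem describes.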
Central Limit Theorem

Convergence Theorems
• In probability theory, there exist different notions of convergence of random variables.
• Convergence of a sequence of random variables to some limit random variable is an important concept in probability theory, with applications to statistics and stochastic processes.

Convergence Theorems
(contd…)
 Stochastic convergence (in general mathematics)
formalizes the idea that a sequence of essentially random
or unpredictable events can sometimes be expected to
settle down into a behavior that is essentially unchanging
when items far enough into the sequence are studied.
 Convergence is related to some fixed/constant value, or
increasing preference towards a certain outcome or
increasing aversion against straying far away from a
certain outcome.
 Convergence is related to some unchanging
probability distribution as values in the sequence
continue to change.

Convergence Theorems
(contd…)
• Convergence in distribution
• Convergence in probability
• Almost sure convergence
• Sure convergence / pointwise convergence
• Convergence in mean
• Other concepts of convergence
Convergence Theorems
(contd…)
• We assume that {X1, X2, …} is a sequence of random variables, X is a random variable, and all of them are defined on the same probability space (Ω, ℱ, P).
Ω: Sample space
ℱ: σ-algebra of subsets of Ω (the events)
P: Probability measure, a function mapping ℱ to [0,1]
Convergence Theorems (contd…):
Convergence in Distribution

Convergence Theorems (contd…):
Convergence in Probability

 Convergence in probability is a
condition on the joint cdf's, as
opposed to convergence in
distribution, which is a
condition on the individual cdf's
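A standard counterexample makes the distinction concrete. Take X standard normal and define X_n = -X for every n. Each X_n has exactly the same cdf as X, so X_n converges to X in distribution, yet |X_n - X| = 2|X| never shrinks, so there is no convergence in probability. The sketch below checks both facts by simulation; the sample size and the value ε = 0.5 are illustrative assumptions.

```python
import random


def ecdf(sample, x):
    """Empirical cdf of the sample evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)


rng = random.Random(0)
x_draws = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
xn_draws = [-x for x in x_draws]  # X_n = -X for every n

# Individual cdfs agree (up to sampling error) at a few check points.
cdf_gap = max(abs(ecdf(x_draws, c) - ecdf(xn_draws, c)) for c in (-1.0, 0.0, 1.0))

# Jointly, X_n and X are far apart with high probability.
eps = 0.5
far_fraction = sum(
    1 for a, b in zip(xn_draws, x_draws) if abs(a - b) > eps
) / len(x_draws)

print("largest gap between the two empirical cdfs:", cdf_gap)
print("estimated P(|X_n - X| > eps):", far_fraction)
```

The first number being near 0 reflects agreement of the individual cdfs, while the second staying well away from 0 reflects the joint behaviour of (X_n, X), which is what convergence in probability depends on.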

Convergence Theorems
(contd…)
• Big O notation describes the limiting behavior of a function as its argument tends towards a particular value or infinity. f(x) = O(g(x)) means that |f(x)| is bounded above by a constant multiple of |g(x)| near the limit, i.e., f grows no faster than g.
• Small o notation describes a strict upper bound on the growth of a function. f(x) = o(g(x)) means that f(x)/g(x) tends to 0 near the limit, i.e., f grows strictly slower than g.

Convergence Theorems (contd…):
Convergence Almost Surely

Convergence Theorems (contd…):
Convergence Surely

Convergence Theorems (contd…):
Convergence in Mean

Convergence Theorems
(contd…)
