
BITS Pilani

Pilani Campus

Chapter 7: Statistical Intervals Based on a Single Sample

Course No: MATH F113
Probability and Statistics
Sumanta Pasari
sumanta.pasari@pilani.bits-pilani.ac.in

Interval Estimation

• A point estimate cannot be expected to provide the exact value of the population parameter (it provides only a close value).
• Usually, an interval estimate can be obtained by adding and subtracting a margin of error to the point estimate. Then,
  Interval Estimate = Point Estimate ± Margin of Error
• Interval estimation provides us information about how close the point estimate is to the value of the parameter.
• Why do we use the term confidence interval?

Interval (CI) Estimation

– Instead of considering a statistic as a point estimator, we may use random intervals to understand the parameter.
– In this case, the end points of the interval are RVs, and we can talk about the probability (in the sense of the frequency definition) that it brackets the parameter value.

Confidence Interval: A $100(1-\alpha)\%$ confidence interval for a parameter $\theta$ is a random interval $[L_1, L_2]$ such that $P[L_1 \le \theta \le L_2] = 1-\alpha$, regardless of the value of $\theta$.

Theorem 7.1: Interval estimation for µ: σ known

Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ (unknown) and variance $\sigma^2$ (known). Then, we know,
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1).$$
Taking two points $\pm z_{\alpha/2}$ symmetrically about the origin, we get
$$P\!\left(-z_{\alpha/2} \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1-\alpha.$$
Here $(1-\alpha)$ is known as the confidence level, and $\alpha$ is the level of significance. Equivalently,
$$P\!\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha.$$
Hence, the confidence interval for the population mean $\mu$ having confidence level $100(1-\alpha)\%$ is given as
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$
The endpoints of the confidence interval are called confidence limits.


Interval estimation for µ: σ known

Most commonly used confidence levels: [table of $z_{\alpha/2}$ values for common confidence levels]
Hence, the 95% CI for $\mu$ is given as $\left(\bar x - 1.96\frac{\sigma}{\sqrt n},\ \bar x + 1.96\frac{\sigma}{\sqrt n}\right)$. That is,
$$P\!\left(\bar X - 1.96\frac{\sigma}{\sqrt n} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt n}\right) = 0.95.$$

Practice Problems

Ex.1. The mean of a sample of size 50 from a normal population is observed to be 15.68. If the s.d. of the population is 3.27, find (a) an 80%, (b) a 95%, and (c) a 99% confidence interval for the population mean. Can you find the respective margins of error? What is the length of the CI in each case?

Sol. (b) First check the two assumptions: (i) normality, (ii) σ known.
Step 1: Here $n = 50$, $\bar x = 15.68$, $\sigma = 3.27$, and $\alpha = 0.05$. We need a CI for $\mu$.
Step 2: As $\alpha = 0.05$, we need $z_{\alpha/2}$ such that $P(Z \le z_{\alpha/2}) = 0.975$. From the cumulative normal distribution table, $z_{\alpha/2} = 1.96$.
Step 3: The CI for $\mu$ (σ known) is $\left(\bar x - z_{0.025}\frac{\sigma}{\sqrt n},\ \bar x + z_{0.025}\frac{\sigma}{\sqrt n}\right) = (14.77,\ 16.59)$.
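A minimal Python sketch of this computation (standard library only; the helper name is my own, not from the slides), checked against Ex.1:

```python
# Minimal sketch (not from the slides): z-based CI for the mean, sigma known.
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, conf=0.95):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)              # margin of error
    return xbar - margin, xbar + margin

# Ex.1: n = 50, xbar = 15.68, sigma = 3.27
for conf in (0.80, 0.95, 0.99):
    print(conf, z_confidence_interval(15.68, 3.27, 50, conf))
# conf = 0.95 gives ~ (14.77, 16.59); the CI length is twice the margin.
```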

Interval estimation for µ: σ known

Confusions and confusions?

In the above example, we found the 95% CI for μ is (14.77, 16.59). This means the unknown μ lies within the fixed interval with probability 0.95. That is,
P[μ lies in (14.77, 16.59)] = 0.95 – right?

If not, then what is the interpretation of "95% confidence"?
- Long-run relative frequency?
- A single replication/realization of the random interval is not enough! Not satisfactory, at least.

95% CIs for population mean

[Figure: one hundred 95% CIs (asterisks identify intervals that do not include μ).]
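The long-run relative-frequency interpretation is easy to check numerically. A small simulation sketch (my own illustration, with arbitrary μ, σ, n):

```python
# About 95% of random intervals built this way cover the true mean.
import math, random
from statistics import NormalDist

random.seed(1)
mu, sigma, n = 10.0, 3.0, 25
z = NormalDist().inv_cdf(0.975)
trials, covered = 10_000, 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / math.sqrt(n)
    covered += (xbar - half <= mu <= xbar + half)
print(covered / trials)   # close to 0.95
```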

Practice Problems

HW 1. Studies have shown that the random variable X, the processing time required to do a multiplication on a new 3-D computer, is normally distributed with mean μ and standard deviation 2 microseconds. A random sample of 16 observations is to be taken.
(a) These data are obtained:
42.65 45.15 39.32 44.44
41.63 41.54 41.59 45.68
46.50 41.35 44.37 40.27
43.87 43.79 43.28 40.70
Based on these data, find an unbiased estimate for μ.
(b) Find a 95% confidence interval for μ.

Confidence Level, Precision, and Sample Size

Ex.2. Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 millisec. A new operating system has been installed, and we wish to estimate the true average response time μ for the new environment. Assuming that response times are still normally distributed with σ = 25, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10?
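Since the z-interval has width $w = 2z_{\alpha/2}\sigma/\sqrt n$, solving for n gives $n = (2z_{\alpha/2}\sigma/w)^2$, rounded up. A minimal sketch (the helper name is my own):

```python
# Smallest n so that the z-based CI has width at most w.
import math
from statistics import NormalDist

def sample_size_for_width(sigma, w, conf=0.95):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((2 * z * sigma / w) ** 2)

print(sample_size_for_width(sigma=25, w=10))   # Ex.2: n = 97
```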
Impracticality of Assumptions in CI

In practice, we usually face two main problems in applying the previous CI formula:
• What if the population is not normal? (A large sample size is needed – can we take help from the CLT?)
• What if the population variance is unknown?

7.2: Large-Sample CI for µ

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ (unknown) and variance $\sigma^2$ (known). Then, using the CLT,
$$\bar X \approx N\!\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; \frac{\bar X - \mu}{\sigma/\sqrt n} \approx N(0,1),$$
and the $100(1-\alpha)\%$ CI for $\mu$ is given as $\left(\bar x - z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt n}\right)$.

Now let $X_1, X_2, \ldots, X_n$ be a large random sample ($n > 40$) from a population with mean $\mu$ (unknown) and sample variance $S^2$. Then
$$\frac{\bar X - \mu}{S/\sqrt n} \approx N(0,1) \ \text{approximately},$$
so
$$P\!\left(\bar X - z_{\alpha/2}\frac{S}{\sqrt n} \le \mu \le \bar X + z_{\alpha/2}\frac{S}{\sqrt n}\right) \approx 1-\alpha.$$
Hence, the large-sample confidence interval for the population mean $\mu$ having confidence level $100(1-\alpha)\%$ is approximately
$$\left(\bar x - z_{\alpha/2}\frac{s}{\sqrt n},\ \bar x + z_{\alpha/2}\frac{s}{\sqrt n}\right); \quad n > 40 \text{ is needed}.$$

HW 2

One Sided CI for µ

Deriving a CI: Example 7.5

Example 7.5
(https://en.wikipedia.org/wiki/Relationships_among_probability_distributions)

Sample/Population Proportion

Let us draw a random sample $X_1, X_2, \ldots, X_n$ of size n from a population, where
$X_i = 1$, if the i-th member of the sample has the trait,
$X_i = 0$, if the i-th member does not have the trait.
Then $X = \sum_{i=1}^n X_i$ gives the number of objects in the sample with the trait, and the statistic $X/n$ gives the proportion of the sample with the trait. Note that X is a binomial RV with parameters n (known) and p.

Sample Proportion

The statistic that estimates the parameter p, the proportion of a population that has some property, is the sample proportion
$$\hat p = \frac{\text{number in sample with the trait (success)}}{\text{sample size}} = \frac{X}{n}.$$
Properties:
(i) As the sample size increases (n large), the sampling distribution of $\hat p$ becomes approximately normal (WHY?)
(ii) The mean of $\hat p$ is p, and the variance of $\hat p$ is $\dfrac{p(1-p)}{n}$ (WHY?)
(iii) Can we get estimators of p? Point and interval estimators.

Note that $\hat p = \dfrac{X}{n} = \dfrac{\sum_{i=1}^n X_i}{n}$, where each $X_i$ is an independent point binomial (Bernoulli) RV, that is, $P(X_i = 1) = p$ and $P(X_i = 0) = 1-p$:

x_i:     1    0
f(x_i):  p    1-p

E[X_i] = 1(p) + 0(1-p) = p
Var(X_i) = E[X_i²] - (E[X_i])² = p(1-p)

Form of Sampling Distribution of Sample Proportion

If $np \ge 10$ and $n(1-p) \ge 10$, the sampling distribution of $\hat p$ can be approximated by a normal distribution with mean p and s.d. $\sqrt{\dfrac{p(1-p)}{n}}$.
Knowing $E[\hat p] = 0.60$ and $\sigma_{\hat p} = 0.0894$, can we find
(i) $P(0.55 \le \hat p \le 0.65) = ?$
(ii) $P(0.50 \le \hat p \le 0.65) = ?$

(Traditional) Confidence Interval on p

Interval Estimate = Point Estimate ± Margin of Error
Note that for large n (using the CLT),
$$\hat p \approx N\!\left(p, \frac{p(1-p)}{n}\right) \;\Rightarrow\; \frac{\hat p - p}{\sqrt{p(1-p)/n}} \approx N(0,1).$$
Taking two points $\pm z_{\alpha/2}$ symmetrically about the origin, we get
$$P\!\left(-z_{\alpha/2} \le \frac{\hat p - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) = 1-\alpha.$$
Here $(1-\alpha)$ is known as the confidence level.

Confidence Interval on p

$$P\!\left(\hat p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1-\alpha.$$
As p is unknown, the above confidence bounds are not statistics. So replace p by its unbiased estimator $\hat p$; then the CI on p having confidence level $(1-\alpha)$ is
$$\left(\hat p_{obs} - z_{\alpha/2}\sqrt{\frac{\hat p_{obs}(1-\hat p_{obs})}{n}},\ \hat p_{obs} + z_{\alpha/2}\sqrt{\frac{\hat p_{obs}(1-\hat p_{obs})}{n}}\right).$$
The endpoints of the confidence interval are called confidence limits.

Self Study: Score Confidence Interval (Page 280, 8th Ed)
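A sketch of this traditional (Wald-type) interval in Python (the helper name is my own):

```python
# Traditional CI on p: phat +/- z_{alpha/2} * sqrt(phat(1-phat)/n).
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    phat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# e.g. x = 75 successes in n = 193 trials (cf. Ex 3 below):
print(proportion_ci(75, 193))   # ~ (0.3198, 0.4574)
```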
Sample Size for Estimating p

We can be $100(1-\alpha)\%$ sure that $\hat p$ and p differ by at most d, where
$$d = z_{\alpha/2}\sqrt{\frac{\hat p_{obs}(1-\hat p_{obs})}{n}}.$$
Thus, the sample size for estimating p when a prior estimate is available is
$$n = z_{\alpha/2}^2\,\frac{\hat p_{obs}(1-\hat p_{obs})}{d^2}.$$
It can be shown that $\hat p_{obs}(1-\hat p_{obs}) \le \dfrac14$. Thus, the sample size for estimating p when a prior estimate is not available is
$$n = \frac{z_{\alpha/2}^2}{4d^2}.$$
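A sketch of both sample-size formulas (function names are my own; the printed values match Ex 3(c) below):

```python
# Sample sizes for estimating p to within d, with and without a prior estimate.
import math
from statistics import NormalDist

def n_with_prior(phat, d, conf=0.95):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * phat * (1 - phat) / d**2)

def n_without_prior(d, conf=0.95):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 / (4 * d**2))

print(n_with_prior(0.389, 0.03), n_without_prior(0.03))   # ~1015, ~1068
```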

Problem Solving

Ex 3. A study of electromechanical protection devices used in electrical power systems showed that of 193 devices that failed when tested, 75 were due to mechanical part failures.
a) Find a point estimate for p, the proportion of failures that are due to mechanical failures.
b) Find a 95% confidence interval on p.
c) How large a sample is required to estimate p to within 0.03 with 95% confidence?

Sol. Let the random variable X = number of failed devices that failed due to mechanical failure among the 193 failed devices. X has approx. normal dist. with mean 193p and variance 193p(1-p).
a) Point estimate for p: $\hat p_{obs} = x/n = 75/193 = 0.3886$.
b) The 95% confidence interval on p is
$$\frac{75}{193} \pm z_{0.025}\sqrt{\frac{\frac{75}{193}\left(1-\frac{75}{193}\right)}{193}} = (0.3198,\ 0.4574).$$
c) With the prior estimate: $n = \dfrac{1.96^2(0.389)(0.611)}{0.03^2} \approx 1015$.
Without a prior estimate: $n = \dfrac{z_{\alpha/2}^2}{4d^2} = \dfrac{1.96^2}{(4)(0.03^2)} \approx 1068$.

Practice Problems
Interval estimation for µ: σ unknown

Interval Estimate for µ = $\bar X$ ± Margin of Error
• If the sample size is small and the population standard deviation σ is unknown, then?
  – As in the large-sample case, we can use the sample standard deviation S to calculate the margin of error.
• With a small sample, what is the distribution of the RV obtained by replacing σ by S? Can it be a normal distribution, as earlier?
  – In this case, assuming that the population is normal, we obtain a T-distribution – HOW?

7.3: CI based on a normal population

Note: Let $Z \sim N(0,1)$ and $\chi^2_\nu$ be an independent chi-squared RV with $\nu$ degrees of freedom. Then $T = \dfrac{Z}{\sqrt{\chi^2_\nu/\nu}}$ follows a T-distribution with $\nu$ dof.

Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{\bar X - \mu}{S/\sqrt n} \sim T_{n-1}.$$

T-Distribution

• A random variable T with $\nu$ degrees of freedom (called the parameter) is a continuous r.v. with density
$$f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$
• The density plot is bell-shaped, symmetric about 0.
• The variance of T decreases as $\nu$ increases. In fact, T approximates the standard normal for large $\nu$.

Properties of T-distribution

• Critical values for the t-distribution are given in Table A.5.
• By $t_{\alpha,\nu}$ we denote the value of the t-variable such that the area under its density to its right is α (the degrees of freedom $\nu$ must be mentioned separately).
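If SciPy is available, the Table A.5 lookups can be reproduced directly (a sketch, not part of the slides):

```python
# t critical values via the inverse cdf (ppf); area 1 - alpha/2 to the left.
from scipy.stats import t

print(t.ppf(0.975, df=6))    # t_{0.025,6}  ~ 2.447
print(t.ppf(0.975, df=29))   # t_{0.025,29} ~ 2.045 (used in Example 11 below)
```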


Interval estimation for µ: σ unknown Interval estimation for µ: σ unknown
(small sample)
The T -distribution is symmetric, and becomes approx. std. normal for large n.
Taking two points  t 2 symmetrically about the origin, we get
 
 X - 
P  -t 2,n -1   t 2,n -1   1 - 
 S 
 
 n 
Here 1 -   is known as confidence level, and  is the level of significance. So,
 S S 
P X - t 2,n -1    X  t 2,n -1   1 - 
 n n 
Hence, the CI for   unknown  having confidence level 100  1 -   %
 S S 
is given as  x - t 2,n -1 , x  t 2,n -1  .
 n n 
49 BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus

Examples

Ex.4. Seven laboratory experiments on the value of g (acceleration due to gravity, which follows a normal distribution) at Pilani gave a mean 977.51 cm/s² and a s.d. 4.42 cm/s². Find a 95% CI for the true value of g (i.e., the population mean).

Sol.
Step 1: Here $n = 7$, $\bar x = 977.51$, $s = 4.42$, and $\alpha = 0.05$. We need a CI for $\mu$. The population is also known to be normally distributed.
Step 2: As $\alpha = 0.05$, we need $t_{\alpha/2}$ from the t-distribution with $(n-1) = 6$ degrees of freedom, such that $P(T \le t_{\alpha/2}) = 0.975$. From the t-distribution table, $t_{0.025,6} = 2.447$.
Step 3: The CI for $\mu$ (σ unknown) is
$$\left(\bar x - t_{0.025}\frac{s}{\sqrt n},\ \bar x + t_{0.025}\frac{s}{\sqrt n}\right) = 977.51 \pm 2.447\cdot\frac{4.42}{\sqrt 7} = (973.42,\ 981.60).$$

(https://en.wikipedia.org/wiki/Student%27s_t-distribution — a t-table similar to Table A.5)
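A quick check of Ex.4 in Python (variable names are my own):

```python
# t-based CI for Ex.4 (n = 7, xbar = 977.51, s = 4.42).
import math
from scipy.stats import t

n, xbar, s = 7, 977.51, 4.42
half = t.ppf(0.975, df=n - 1) * s / math.sqrt(n)   # margin ~ 4.09
print((xbar - half, xbar + half))                  # ~ (973.42, 981.60)
```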

[Figure: layout of the t-table, similar to Table A.5 — rows indexed by degrees of freedom, columns by cumulative probability $1-\alpha/2$, entries $t_{\alpha/2}$.]

Example 11 (Page 288)

• Even as traditional markets for sweetgum lumber have declined, large section solid timbers traditionally used for construction bridges and mats have become increasingly scarce.
• The article "Development of Novel Industrial Laminated Planks from Sweetgum Lumber" (J. of Bridge Engr., 2008: 64–66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.


Example 11 (Page 288) Example 11 (Page 288)
cont’d cont’d

• Here is data on the modulus of rupture (psi; the article contained • Let’s now calculate a confidence interval for true
summary data expressed in MPa):
average MOR using a confidence level of 95%.
• 6807.99 7637.06 6663.28 6165.03 6991.41 6992.23
The CI is based on n – 1 = 29 degrees of freedom,
• 6981.46 7569.75 7437.88 6872.39 7663.18 6032.28 so the necessary t critical value is t.025,29 = 2.045.
• 6906.04 6617.17 6984.12 7093.71 7659.50 7378.61 The interval estimate is now
• 7295.54 6702.76 7440.17 8053.26 8284.75 7347.95
• 7422.69 7886.87 6316.67 7713.65 7503.33 7674.99

BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus

Examples

Self Study: Prediction Interval for a Single Future Value (Page 288, 8th Ed)

HW.3. Seven laboratory experiments on the value of g (acceleration due to gravity, which follows a normal distribution) at Pilani gave a mean 977.51 cm/s² and a s.d. 4.42 cm/s². Find 80%, 90% and 95% CIs for the population mean. Can you find the respective margins of error (bound on the error of estimation)?

HW.4. A sample of size 15 taken from a larger population (normally distributed) has a sample mean 12 and sample variance 25. Construct a 95% CI for the population mean when the population s.d. is 5. What is the length of the CI?

One Sided CI

One-sided confidence intervals can be used to place a bound on the maximum or minimum value of the population mean.
An interval $(-\infty, L_1]$ such that $P(\mu \le L_1) = 1-\alpha$ allows us to place a bound on the maximum value of the population mean: $L_1 = \bar X + t_{\alpha,n-1}\,S/\sqrt n$.
An interval $[L_2, \infty)$ such that $P(\mu \ge L_2) = 1-\alpha$ allows us to place a bound on the minimum value of the population mean: $L_2 = \bar X - t_{\alpha,n-1}\,S/\sqrt n$.

HW 5: Use the following data on X, the time that a commercial airliner stays at the gate during a through flight, to find a 95% one-sided confidence interval that puts a bound on the minimum time in minutes for μ:
25 29 32 37 40 27 30 35 38 41 42 45 45 47 49 50 55 53 60 (assume normality of the population)

Solution: $\bar x = 41.05$, $s^2 = 98.61$, $s = 9.93$.
$$L_{obs} = \bar x - t_{0.05,18}\,\frac{s}{\sqrt n} = 41.05 - 1.734\cdot\frac{9.93}{\sqrt{19}} = 37.10, \qquad \text{CI} = [37.10, \infty).$$
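A sketch of HW 5's lower bound in Python (variable names are my own):

```python
# One-sided 95% lower confidence bound for the gate-time data.
import math
from statistics import mean, stdev
from scipy.stats import t

data = [25, 29, 32, 37, 40, 27, 30, 35, 38, 41,
        42, 45, 45, 47, 49, 50, 55, 53, 60]
n = len(data)
lower = mean(data) - t.ppf(0.95, df=n - 1) * stdev(data) / math.sqrt(n)
print(lower)   # ~ 37.10, so the 95% one-sided CI is [37.10, inf)
```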
7.4: Interval estimation of Variability

Recall: $S^2$ is an unbiased estimator for $\sigma^2$.

Theorem: Let $X_1, \ldots, X_n$ be a random sample of size n from a normal population with mean $\mu$ and s.d. $\sigma$. Then
$$\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \frac{(X_i - \bar X)^2}{\sigma^2}$$
has a chi-squared distribution with (n-1) degrees of freedom.

Recall Chi-squared Distribution: the chi-squared dist. with (n-1) degrees of freedom is the gamma dist. with $\alpha = (n-1)/2$, $\beta = 2$.

Interval estimation for σ²

Theorem: Let $X_1, \ldots, X_n$ be a random sample of size n from a normal population with mean $\mu$ and s.d. $\sigma$. Then the $100(1-\alpha)\%$ confidence interval estimate for $\sigma^2$ is given by
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}.$$
Proof:
$$P\!\left(\chi^2_{1-\alpha/2,\,df} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,df}\right) = 1-\alpha \;\Rightarrow\; \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,df}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,df}}.$$
Thus the $100(1-\alpha)\%$ CI for $\sigma^2$ is $\left(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$.

A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in the interval for $\sigma^2$. An upper or a lower confidence bound results from replacing α/2 with α in the corresponding limit of the CI.

Interval estimation for σ²

Ex.5. A sample of size 9 from a normal population is given below. Find the 90% CI for the mean μ of the population. Also find the 90% CI for the variance σ² of the population. Sample: 0, 1, -1, 1, 1, 0, -1, -2, 3. Also find the 90% CI for σ.

Sol. Here $n = 9$, $s^2 = 0.24$ (check), and $\alpha = 0.10$. To find the CI for the population variance $\sigma^2$, first calculate $\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.95,8} = 2.733$ and $\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.05,8} = 15.507$. Recall that the $100(1-\alpha)\%$ CI for $\sigma^2$ is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right).$$
Thus, the 90% CI for $\sigma^2$ is (0.1238, 0.7025). The length of the CI for $\sigma^2$ is (0.7025 - 0.1238) = 0.5787. The 90% CI for σ is (?, ?).

HW.6. The heights in inches of 8 students of a college, chosen at random, were as follows: 62.2, 62.4, 63.1, 63.2, 65.5, 66.2, 66.3, 66.5. Compute 90% and 95% CIs for the variance of the population of heights, assuming it to be normal. Also find the length of the interval in each case.
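A sketch of the chi-squared interval (using the slide's n = 9 and s² = 0.24; the helper name is my own):

```python
# CI for a variance: ((n-1)s^2/chi2_{a/2}, (n-1)s^2/chi2_{1-a/2}).
from scipy.stats import chi2

def variance_ci(s2, n, conf=0.90):
    alpha = 1 - conf
    lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)
    hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)
    return lo, hi

print(variance_ci(0.24, 9))   # ~ (0.1238, 0.7025)
```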
[Figure: layout of the chi-squared table, similar to Table A.7 of your book — rows indexed by degrees of freedom, columns by cumulative probability, entries $\chi^2_\alpha$.]

One Sided CI on σ²

Ex 6: X is the actual length of 63 mm nails. Use the given data to find a 95% one-sided confidence interval on the variance in length:
63.0 63.1 63.0 63.0 62.9 63.0 63.0
63.1 62.8 63.1 63.1 63.0 62.9 63.2
The manufacturer wants to check to be sure that the population variance of the length of nails being produced does not exceed 0.03. Assuming normality of the population, does this sample indicate that this is the case? Explain.

Sol. $\sum_{i=1}^{14} x_i = 882.2$; $\sum_{i=1}^{14} x_i^2 = 55591.34$; $s^2 = 0.0105$.
We want L such that $P(\sigma^2 \le L) = 0.95$:
$$L_{obs} = \frac{(n-1)s^2}{\chi^2_{1-\alpha,13}} = \frac{13 \times 0.0105}{5.89} = 0.023.$$
Yes, this is the case: the entire interval (0, 0.023] lies below 0.03.

Supplementary HW


BITS Pilani BITS Pilani
Pilani Campus Pilani Campus

Course No: MTH F113 Chapter 8: Tests of Hypothesis Based on a Single


Sample
Probability and Statistics
Sumanta Pasari
BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus

Objectives

• Understanding hypothesis testing
• Constructing null and alternative hypotheses
• Type I and Type II errors
• Power of a test
• Test for population mean (σ known; σ unknown)
• Test for population proportion

Testing of Hypothesis

• Although we have studied parameter estimation, many times we need to take decisions based on samples; the decision may be correct or it may be incorrect.
• Testing of hypothesis is used to verify whether a statement about the value of a population parameter should be rejected or not.
• The statement is verified based on the information available from random samples.
• Either the statement will be rejected or the statement cannot be rejected (that is, "accepted") based on the information available from samples.
• Two types of statement: null hypothesis and alternative hypothesis.

Let us have a look…

• Which college is the best for Mechanical Engineering studies?
• SONY TVs work at least 2000 days – correct?
• Diagnosis of a disease by X-ray machine
• Testing efficiency of a new medicine available in the market
• Is there a change in relative saving of people at Pilani in 1990 and 2021?
• A recruiter wants to recruit a few students from BITS Pilani – how?
• Has the increased advertising of a new magazine changed its sale?
• CBI is trying hard to arrest the criminal. Whom to arrest?

Testing of Hypothesis

• The null hypothesis, denoted by H0, is a tentative preconceived assumption about a population parameter. It always includes the 'statement of equality'; that is, the equality part always appears with H0. ("Null" means of no value, no effect, or no consequence.)
• The alternative hypothesis, denoted by Ha or H1, is the opposite of what is stated in the null hypothesis. The alternative hypothesis is what the test is attempting to establish.
• If the information available from the sample data contradicts the null hypothesis, we reject it (similar to accepting the alternative hypothesis); otherwise, we say "we fail to reject" the null hypothesis.
Examples: Testing of Hypothesis

A criminal trial: In a trial, the jury must decide between two hypotheses. The null hypothesis (prior belief) is
H0: The defendant is innocent.
The alternative hypothesis, or research hypothesis, is
H1: The defendant is guilty.
The jury do not know which hypothesis is true. They must make a decision on the basis of the evidence presented.

In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., enough evidence to support the alternative hypothesis).

If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis. Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. By the same logic, we do not say that we accept the null hypothesis; rather we say that "we fail to reject the null hypothesis" from the information available in the sample.

Types of Errors

Reality \ Decision   H0 accepted                       H0 rejected
H0 true              No error                          Type I error (probability = α)
H0 false             Type II error (probability = β)   No error

• H0: the null hypothesis; H1: the alternative hypothesis
• Type I error: rejecting the null hypothesis when it is actually true; P(type I) = α
• Type II error: failing to reject the null hypothesis when it is false; P(type II) = β
• Power of a test (1-β): probability of rejecting the null hypothesis when it is false

[Figure: the critical value separating the "Accept H0" and "Reject H0" regions.]
Reducing both type I and type II errors together is not possible, although one can try to make either type of error reasonably small!

Level of Significance

• The decision depends on the value of the test statistic on a sample and hence has randomness in it.
• There is a chance that the null hypothesis is rejected when it is true, that is, we have committed a type I error.
• The probability of a Type I error is P[H0 is rejected when H0 is true].
• This is also called the level of significance and denoted by α.

Type-I Error

A type I error is an error made when the null hypothesis is rejected in spite of it being true. The probability of committing a type I error is called the 'level of significance' of the test and is denoted by 'α'.
The set of values of the test statistic that leads us to reject the null hypothesis is termed the 'Critical Region'.
Type-II Error

We design the test so that the probability of committing a type I error is approximately the value we desire. Sometimes it might also happen that the observed value of the test statistic does not fall in the rejection region even though the null hypothesis is not true and should be rejected. This is a type II error. Its probability of occurrence is given by beta (β).

A type II error is an error made when the null hypothesis is not rejected when, in fact, the research theory is true. The probability of committing a type II error is denoted by β.

Power of a Test

The probability that the null hypothesis will be rejected when, in fact, the research theory is true is called the power of the test (1-β).
Note: We either fail to reject the null hypothesis (with probability β) or reject it (with probability equal to the power), so β + power = 1.
Note: Our objective is always to keep α and β as small as possible and the power of the test as high as possible. This is usually achieved by choosing an appropriate sample size.

Constructing null and alternative hypotheses

• One-tailed and two-tailed tests:
  H0: μ ≥ μ0 vs H1: μ < μ0 — one-tailed (lower-tail, or left-tailed)
  H0: μ ≤ μ0 vs H1: μ > μ0 — one-tailed (upper-tail, or right-tailed)
  H0: μ = μ0 vs H1: μ ≠ μ0 — two-tailed
• The probability of a Type I error is α = P(H0 is rejected when H0 is true). This is also called the level of significance.
• The probability of a Type II error is β = P(H0 is accepted when H0 is false).

Test Procedure

One-tailed or two-tailed?

Ex.1. From long experience of the Coca-Cola company, it is known that yield is normally distributed with a mean of 500 units and standard deviation 96 units. For a modified process, the yield is 535 units for a sample of size 50. At the 5% significance level, has the modified process increased the yield?
Sol. Here H0: μ = 500 (this specifies a single value for the parameter); actually, we shall assume H0: μ ≤ 500, and
H1: μ > 500 (this is what we want to test)
⇒ one-tailed test; test for μ with σ known. Is the calculated value greater than the critical value?

Ex.2. A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170. A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. It is known that the accounts are approximately normally distributed with a s.d. of $65. At α = 5%, can we conclude that the new system will be cost-effective?
Sol. The system is cost-effective if the mean account balance for all customers (population) is greater than $170, that is, if μ > $170. Our null hypothesis is thus H0: μ ≤ 170, and
H1: μ > 170 (this is what we want to test)
⇒ one-tailed test; test for μ with σ known.

One-tailed or two-tailed?

Ex.3. A drug is given to 10 patients, and the increments in their blood pressure were recorded as 3, 6, -2, 4, -4, 1, -6, 0, 0, 2. Is it reasonable to believe that the drug has no effect on change of the mean blood pressure? Test at the 5% significance level, assuming that the population is normal with variance 1.
Sol. Formulate the hypothesis: H0: μ = 0, H1: μ ≠ 0
⇒ two-tailed test; test for μ with σ known. Does the calculated value fall in the rejection region of H0 (that is, beyond the critical values)?

Ex.4. The mean weekly sales of a magazine was 146 units. After an advertisement campaign, the mean of weekly sales in 22 stores for a typical week increased to 154 with a standard deviation of 15 units. Was the advertisement successful at the 5% significance level? It is given that the weekly sales of the magazine follow a normal distribution.
Sol. Formulate the hypothesis: H0: μ ≤ 146, H1: μ > 146
⇒ one-tailed test; test for μ with σ unknown, small sample from a normal population.

Ex.5. A state highway patrol periodically samples vehicle speeds at various locations on a particular highway. The sample of vehicle speeds is used to test the hypothesis H0: μ ≤ 65. A sample of 64 vehicles shows a mean speed of 66.2 kmph with a s.d. of 4.2 kmph. Use α = 0.05 to test H0. Assume normality of the population.
Sol. Formulate the hypothesis: H0: μ ≤ 65, H1: μ > 65
⇒ one-tailed test; test for μ with σ unknown (large sample).
One-tailed or two-tailed?

Ex.6. A marketing research firm conducted a survey 10 years ago and found that the average household income of Pilani is Rs. 12000. Mr. Agrawal, who has recently joined the firm, wants to verify the accuracy of the data. For this, the firm decides to take a random sample of 200 households. The sample mean and sample s.d. are Rs. 13000 and Rs. 100. Verify Mr. Agrawal's doubt at α = 0.05, assuming normality of the population.
Sol. Formulate the hypothesis: H0: μ = 12000, H1: μ ≠ 12000
⇒ two-tailed test; test for μ with σ unknown (sample size n = 200).

Ex.7. A CFL manufacturing company supplies its products to various retailers. The company has received complaints from retailers that the average life of its CFLs is not 24 months, as the company claims. For verification, the company collected a random sample of 150 CFLs and found that the average life is 23 months. Assuming σ = 5 months, test the average population life of CFLs at α = 0.08.
Sol. Formulate the hypothesis: H0: μ = 24, H1: μ ≠ 24 ⇒ two-tailed; test for μ with σ known.

Ex.8. In a golf course, over the past years, 20% of the players were women. In an effort to increase the proportion of women players, a special promotion was implemented. Now the manager would like to see whether the promotion helped to increase the proportion of women players. A random sample of 400 players was selected, and 100 of the players were women. Test the hypothesis at the 5% significance level.
Sol. Formulate the hypothesis: H0: p ≤ 0.2, H1: p > 0.2 ⇒ one-tailed test; test for p.

Ex.9. A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed, with 25 responses. Test the hypothesis at α = 0.05.
Sol. Formulate the hypothesis: H0: p = 0.08, H1: p ≠ 0.08 ⇒ two-tailed test; test for p.

Steps of Hypothesis Testing

Step 1. Develop the null and alternative hypotheses; determine the appropriate statistical test.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.
Step 4. Based on α, identify the critical values.
Step 5. Reject H0 if the calculated test statistic value falls in the rejection region.

Test Statistic

1. Test statistic for the population mean:
(a) when the population variance is known: $Z = \dfrac{\bar X - \mu_0}{\sigma/\sqrt n}$
(b) when the population variance is unknown, but having a large sample (n > 40): $Z = \dfrac{\bar X - \mu_0}{S/\sqrt n}$
(c) when the population variance is unknown (small sample from a normal population): $T_{n-1} = \dfrac{\bar X - \mu_0}{S/\sqrt n}$
2. Test statistic for the population proportion (large sample): $Z = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$

Lower-tailed test for population mean (σ known): H0: μ ≥ μ0 vs H1: μ < μ0.
[Figure: rejection region of area α in the lower tail.]

Upper-tailed test for population mean (σ known): H0: μ ≤ μ0 vs H1: μ > μ0.
[Figure: rejection region of area α in the upper tail.]

Two-tailed test for population mean (σ known): H0: μ = μ0 vs H1: μ ≠ μ0.
[Figure: "Do Not Reject H0" (acceptance region) in the middle, rejection regions of area α/2 in each tail.]

Examples: Hypothesis Testing

Ex.8.1. From long experience of the Coca-Cola company, it is known that yield is normally distributed with a mean of 500 units and standard deviation 96 units. For a modified process, the yield is 535 units for a sample of size 50. At the 5% significance level, has the modified process increased the yield?

Sol.
Step 1: Here H0: μ ≤ 500, H1: μ > 500 ⇒ one-tailed (right-tailed) test; test for μ with σ known.
Step 2: From the sample data, we compute
$$z_{calculated} = \frac{\bar x - \mu_0}{\sigma/\sqrt n} = \frac{535 - 500}{96/\sqrt{50}} = 2.57.$$
Step 3: At the 5% significance level, $z_{0.05} = 1.645$ (from the one-tailed Z-table).
Step 4: As $z_{calculated} > z_{0.05}$, reject the null hypothesis (i.e., there is enough evidence to accept the alternative hypothesis).
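A sketch of Steps 2–4 in Python (standard library only; variable names are my own):

```python
# Right-tailed z-test for Ex.8.1 (critical-value approach).
import math
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 500, 96, 50, 535, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic ~ 2.57
z_crit = NormalDist().inv_cdf(1 - alpha)    # z_{0.05} ~ 1.645
print(z, z_crit, z > z_crit)                # True -> reject H0
```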

Examples: Hypothesis Testing

Ex.8.2. A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170. A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. It is known that the accounts are approximately normally distributed with a s.d. of $65. At α = 5%, can we conclude that the new system will be cost-effective?

Sol.
Step 1: Here H0: μ ≤ 170, H1: μ > 170 ⇒ one-tailed (right-tailed) test; test for μ with σ known.
Step 2: From the sample data, we compute
$$z_{calculated} = \frac{\bar x - \mu_0}{\sigma/\sqrt n} = \frac{178 - 170}{65/\sqrt{400}} = 2.46.$$
Step 3: At the 5% significance level, $z_{0.05} = 1.645$ (from the one-tailed Z-table).
Step 4: As $z_{calculated} > z_{0.05}$, reject the null hypothesis (i.e., accept H1).
Examples: Hypothesis Testing

Ex.8.3. A drug is given to 10 patients, and the increments in their blood pressure were recorded as 3, 6, -2, 4, -4, 1, -6, 0, 0, 2. Is it reasonable to believe that the drug has no effect on change of the mean blood pressure? Test at 95% confidence level, assuming that the population is normal with variance 1.

Sol.
Step 1: Formulate the hypothesis: H0: μ = 0, H1: μ ≠ 0 ⇒ two-tailed test for μ, σ known.
Step 2: From the sample data, we compute
$$z_{calculated} = \frac{\bar x - \mu_0}{\sigma/\sqrt n} = \frac{0.4 - 0}{1/\sqrt{10}} = 1.265.$$
Step 3: At the 95% confidence level, $z_{0.025} = 1.96$ and $-z_{0.025} = -1.96$ (from the two-tailed Z-table, where we find $z$ such that the two tails together have area α).
Step 4: As $z_{calculated}$ does not fall in the rejection region, we fail to reject H0. We can believe that the drug has no effect on change of the mean blood pressure.

Examples: Hypothesis Testing

Ex.8.4. The mean weekly sales of a magazine was 146 units. After an advertisement campaign, the mean of weekly sales in 22 stores for a typical week increased to 154 with a standard deviation of 15 units. Was the advertisement successful at the 5% significance level? It is given that the weekly sales of the magazine follow a normal distribution.

Sol.
Step 1: Formulate the hypothesis: H0: μ ≤ 146, H1: μ > 146 ⇒ one-tailed test; test for μ with σ unknown, small sample size from a normal population.
Step 2: From the sample data, we compute
$$t_{calculated} = \frac{\bar x - \mu_0}{S/\sqrt n} = \frac{154 - 146}{15/\sqrt{22}} = 2.501.$$
Step 3: For α = 0.05 and 21 dof, $t_{21,0.05} = 1.721$ (from the one-tailed T-table).
Step 4: As $t_{calculated} > t_{21,0.05}$, reject the null hypothesis (i.e., accept H1). We can conclude that the advertisement was successful.
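A sketch of the same t-test in Python (variable names are my own):

```python
# Right-tailed t-test for Ex.8.4.
import math
from scipy.stats import t

mu0, xbar, s, n, alpha = 146, 154, 15, 22, 0.05
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # ~ 2.501
t_crit = t.ppf(1 - alpha, df=n - 1)          # t_{0.05,21} ~ 1.721
print(t_stat > t_crit)                       # True -> reject H0
```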
Testing for p (large sample)

Ex.8.8. In a golf course, over the past years, 20% of the players were women. In an effort to increase the proportion of women players, a special promotion was implemented. Now the manager would like to see whether the promotion helped to increase the proportion of women players. A random sample of 400 players was selected, and 100 of the players were women. Test the hypothesis at the 5% significance level.

Sol.
Step 1: Formulate the hypothesis: H0: p ≤ 0.20, H1: p > 0.20 ⇒ one-tailed test; test for p (sample size n = 400).
Step 2: From the sample data, we compute
$$z_{cal} = \frac{\hat p_{obs} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.25 - 0.20}{\sqrt{\dfrac{0.20 \times 0.80}{400}}} = 2.5.$$
Step 3: For α = 0.05, $z_{0.05} = 1.645$ (from the one-tailed z-table).
Step 4: As $z_{cal} > z_{0.05}$, reject the null hypothesis (i.e., accept H1). We can conclude that the proportion of women players has increased.
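A sketch of the large-sample proportion test in Python (variable names are my own):

```python
# Right-tailed z-test for a proportion (Ex.8.8).
import math
from statistics import NormalDist

p0, x, n, alpha = 0.20, 100, 400, 0.05
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)   # 2.5
print(z > NormalDist().inv_cdf(1 - alpha))        # True -> reject H0
```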

Examples: Hypothesis Testing

HW1. In 64 randomly selected hours of production, the sample mean and sample s.d. of the number of acceptable pieces produced by an automatic machine are 1038 and 146, respectively. At α = 0.05, does this enable us to reject H0: μ = 1000 against H1: μ > 1000?

HW2. Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3, 14.6 mg per cigarette. Show that the difference between the mean of this sample and the average tar content μ = 14 claimed by the manufacturer is significant at the level of significance 0.05. Assume normality of the population. Hint: H0: μ = 14, H1: μ ≠ 14.

HW3. A random sample of 225 undergraduates enrolled in a marketing course was asked to respond on a scale from one (strongly disagree) to seven (strongly agree) to the proposition: "Insurance helps to protect from financial losses". The sample mean response was 4.27 and the sample standard deviation was 1.32. Test the hypothesis H0: μ = 4 against H1: μ ≠ 4 at the 5% significance level.

P-Value Approach

Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the test statistic.
Step 4. Use the value of the test statistic to compute the P-value.
Step 5. Reject H0 if the test statistic falls in the rejection region (i.e., P-value < α).

Sometimes, though we have taken a decision about which hypothesis to accept, we still want to support it by seeing whether the test statistic is deep inside the critical region or on the border. This can be decided using the tail probability, or P-value.

Examples: Hypothesis Testing (using P-value)

Ex.8.1 (revisited). From long experience of the Coca-Cola company, it is known that yield is normally distributed with a mean of 500 units and standard deviation 96 units. For a modified process, the yield is 535 units for a sample of size 50. At the 5% significance level, has the modified process increased the yield?

Sol.
Step 1: Here H0: μ ≤ 500, H1: μ > 500 ⇒ one-tailed test; test for μ with σ known.
Step 2: From the sample data, we compute
$$z_{calculated} = \frac{\bar x - \mu_0}{\sigma/\sqrt n} = \frac{535 - 500}{96/\sqrt{50}} = 2.57.$$
Step 3: For z = 2.57, the cumulative probability is 0.9949 (from the table), so
P-value = 1 - 0.9949 = 0.0051.
Step 4: As P-value < α, reject the null hypothesis.
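The same upper-tail P-value in Python (a sketch; standard library only):

```python
# P-value for the right-tailed z-test of Ex.8.1.
from statistics import NormalDist

z = 2.57
p_value = 1 - NormalDist().cdf(z)   # upper-tail area ~ 0.0051
print(p_value, p_value < 0.05)      # True -> reject H0
```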
Examples: Hypothesis Testing (using P-value)

Ex.8.3 (revisited). A drug is given to 10 patients, and the increments in their blood pressure were recorded as 3, 6, -2, 4, -4, 1, -6, 0, 0, 2. Is it reasonable to believe that the drug has no effect on change of the mean blood pressure? Test at 95% confidence level, assuming that the population is normal with variance 1.
Sol.
Step 1: Formulate the hypothesis: H0: μ = 0, H1: μ ≠ 0 ⇒ two-tailed test for μ, σ known.
Step 2: $z_{calculated} = \dfrac{\bar x - \mu_0}{\sigma/\sqrt n} = \dfrac{0.4 - 0}{1/\sqrt{10}} = 1.26$.
Step 3: For z = 1.26, the cumulative probability is 0.8962 (from the table), so P-value = 2(1 - 0.8962) = 0.2076.
Step 4: As P-value > α, we fail to reject H0. We can believe that the drug has no effect on change of the mean blood pressure.

Problem Solving: Using p-Value

Ex.8.8 (revisited). In a golf course, over the past years, 20% of the players were women. A special promotion was implemented, and in a random sample of 400 players, 100 were women. Test at the 5% significance level whether the promotion helped to increase the proportion of women players.
Sol.
Step 1: Formulate the hypothesis: H0: p ≤ 0.20, H1: p > 0.20 ⇒ one-tailed test; test for p (sample size n = 400).
Step 2: $z = \dfrac{\hat p_{obs} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.25 - 0.20}{\sqrt{0.20 \times 0.80/400}} = 2.5$.
Step 3: For z = 2.50, the cumulative probability is 0.9938 (from the table), so p-value = 1 - 0.9938 = 0.0062.
Step 4: As p-value < α, we reject H0. We conclude that the proportion of women players has increased.

P-value and Significance Testing

• From the observed value u0 of the test statistic U, consider, under the assumption of H0, the probability that U lies to the extreme of this observed value — on both sides, the left side, or the right side, respectively, depending on the nature of the alternative hypothesis. This is called the P-value, or descriptive significance level.
• Significance testing: reject H0 if the P-value is small. Here no level is pre-set; the decision is taken using only the P-value.
• The previous approach — testing using a critical region with a pre-set level of significance — is called hypothesis testing.

Type-I and Type-II error (Ex. 8.1, Type-I and Type-II error (Ex. 8.1,
page 304) page 304)

BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus


Type-I and Type-II error (Ex. 8.1, Type-I and Type-II error (Ex. 8.1,
page 304) page 304)

BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus

Type-I and Type-II error (Ex. 8.1, Errors in Hypothesis Testing


page 304)

BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus

Errors in Hypothesis Testing Errors in Hypothesis Testing

BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus


Type-I and Type-II error

Ex.10. To test the null hypothesis that the population mean is 4 (H0: μ = 4) against the alternative hypothesis μ > 4 (say, μ = 5), a test is designed based on a random sample of size 49. It is decided that the null hypothesis will be rejected if the observed sample mean $\bar x \ge 4.3$. If the population variance is 9, find (a) the distribution of $\bar X$ assuming H0 true, (b) the distribution of $\bar X$ assuming H1 true, and (c) the probabilities of type-I and type-II errors. (d) What is the power of the test?

Hint. Using the CLT, we can assume that $\bar X \sim N(\mu, \sigma^2/n)$. Here $\sigma^2 = 9$, $n = 49$.
(a) If H0 is true, μ = 4. Thus $\bar X \sim N(4, 9/49)$.
(c) Prob. of type-I error α = P(reject H0 when H0 true)
$$= P\!\left(\bar X \ge 4.3 \text{ when } \bar X \sim N(4, 9/49)\right) = P\!\left(\frac{\bar X - 4}{3/7} \ge \frac{4.3 - 4}{3/7}\right) = P(Z \ge 0.7) = 0.2420.$$
......
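A sketch completing the remaining parts numerically (assuming, as in the hint, $\bar X \sim N(\mu, (3/7)^2)$):

```python
# alpha, beta, and power for Ex.10 (reject H0: mu = 4 when xbar >= 4.3).
import math
from statistics import NormalDist

se = 3 / math.sqrt(49)                    # s.d. of the sample mean = 3/7
alpha = 1 - NormalDist(4, se).cdf(4.3)    # P(Xbar >= 4.3 | mu = 4) ~ 0.2420
beta = NormalDist(5, se).cdf(4.3)         # P(Xbar <  4.3 | mu = 5) ~ 0.0512
print(alpha, beta, 1 - beta)              # power ~ 0.9488
```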

A Normal Population with Known σ

Type-II error for a normal population with known σ

A Normal Population with Known σ (Ex. 8.7, from textbook)

Similarly, for population proportion p (under large sample): HW 4


BITS Pilani BITS Pilani
Pilani Campus Pilani Campus

Course No: MTH F113 Chapter 12: Simple Linear Regression Model
(12.1 and 12.2)
Probability and Statistics Sumanta Pasari
BITS Pilani, Pilani Campus
(sumanta.pasari@pilani.bits-pilani.ac.in) BITS Pilani, Pilani Campus

Let's Observe… linear or non-linear?

(1) Relationship between degrees Fahrenheit and degrees Celsius: $F = \frac95 C + 32$ (deterministic)
(2) Circumference = π × diameter, i.e., $C = 2\pi r$ (deterministic)
(3) Height and weight of students (is there a perfect relationship?)
(4) Driving speed and gas mileage (is there a deterministic relation?)
(5) Fertilizer and crop yield (production)
(6) Drug dosage and time to get cured
(7) Income and expenditure of a group of persons
(8) Sunshine hours/temperature and ice-cream sale
(9) Age of a car and its sale price

Purpose: Linear Regression

For the following observed data, how can we get the "best-fit" line? What would be the equation?

How to Formulate?

(1) Recall that we studied the conditional mean, say $\mu_{Y|x}$. What does $\mu_{Y|x}$ mean?
(2) Suppose X is NOT a RV, but rather a mathematical variable; e.g., let X: depth of water, Y: the water temperature. Then can we model the water temperature Y as a function of X?
(3) Thus, aren't we dealing with a conditional variable $Y|x$?
(4) This $Y|x$ will have a mean $\mu_{Y|x}$ (a function of x).
(5) Can I express this linearly as $\mu_{Y|x} = \beta_0 + \beta_1 x$? (Linear curve of regression of Y on X)
Simple Linear Regression

Simple linear regression (regression means 'act of going back', 'return', or 'reversion') is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
• One variable, denoted by X, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted by Y, is regarded as the response, outcome, or dependent variable.
Why is it called a "Simple Linear Regression" model? What is a model? Why simple? Why linear? What is regression?

What is a Model? Role Model?

A representation of some phenomenon. However, these are non-math/non-stats models… then??

Math/Stats Model

1. Often, they describe relationships between variables.
2. Could be deterministic, or probabilistic (stochastic). Probabilistic models include regression models, correlation models, and other models.

Regression Model

• 1 explanatory variable → simple regression (linear or non-linear)
• 2+ explanatory variables → multiple regression (linear or non-linear)

Linear Probabilistic Model

When $\sigma^2$ is small, an observed point (x, y) will almost always fall quite close to the true regression line, whereas observations may deviate considerably from their expected values (corresponding to points far from the line) when $\sigma^2$ is large.

Problem Solving (HW)



Scattergram

In a regression study, it is useful to plot the data points in the xy-plane. Such a plot is called a scattergram (scatter diagram). We do not expect the points to lie exactly on a straight line. However, if linear regression is applicable, then they should exhibit a linear trend.

Assumptions

Simple Linear Regression Model: $Y|x = \beta_0 + \beta_1 x + \varepsilon$
Simple Linear Regression Equation: $\mu_{Y|x} = \beta_0 + \beta_1 x$
Estimated Simple Linear Regression Equation: $\hat\mu_{Y|x} = b_0 + b_1 x$ (or, $\hat y = b_0 + b_1 x$)

Model assumptions:
1. $E[\varepsilon] = 0$
2. $V[\varepsilon] = \sigma^2$, the same for all values of x
3. The values of $\varepsilon$ are independent.
4. $\varepsilon \sim N(0, \sigma^2)$ ⇒ Y is also normally distributed.

Estimating Model Parameters
Need Computation Table / Problem Solving

Ex. 1: In the following table, x is the tensile force applied to a steel specimen in thousands of pounds and y is the resulting elongation in thousandths of an inch:

x:  1   2   3   4   5   6
y:  14  33  40  63  76  85

(a) Graph the data to verify that it is reasonable to assume that the regression of Y on X is linear.
(b) Find the equation of the least-squares line and use it to predict the elongation when the tensile force is 3.5 thousand pounds.

Problem Solving

Sol. (a) Do it yourself.
(b) Since n = 6, we need:
$$\sum_{i=1}^n x_i = 21,\quad \sum_{i=1}^n x_i^2 = 91,\quad \sum_{i=1}^n y_i = 311,\quad \sum_{i=1}^n x_i y_i = 1342,\quad \sum_{i=1}^n y_i^2 = 19855.$$
Estimated regression line: $\hat\mu_{Y|x} = b_0 + b_1 x \;\Rightarrow\; \hat y = 1.133 + 14.486\,x$ (interpretation of results?).
Therefore, for tensile force 3.5, $\hat y = 51.83$ (meaning?).

Ex. 2. For the following data set, plot a scattergram and subjectively state whether it appears that a linear regression will (i) fit the data well, (ii) give only a fair fit, or (iii) fit the data poorly.

x:  5   15  25  35  45  50
Y:  10  18  20  25  32  45

Ex. 3. Now estimate $\beta_0$ and $\beta_1$. Find the residuals in each case and verify that, apart from round-off error, the residuals sum to 0.
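A sketch of the least-squares computation for Ex. 1 (the helper name is my own):

```python
# Least-squares fit: b1 = (n*Sxy - Sx*Sy)/(n*Sxx - Sx^2), b0 = ybar - b1*xbar.
def least_squares(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = sy / n - b1 * sx / n
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4, 5, 6], [14, 33, 40, 63, 76, 85])
print(b0, b1)            # ~ 1.133, 14.486
print(b0 + b1 * 3.5)     # predicted elongation ~ 51.83
```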

Problem Solving

Sol. (Ex. 3)
$$b_1 = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = 0.66, \qquad b_0 = 25 - 0.66(29.2) = 5.73, \qquad \hat y = 5.73 + 0.66\,x.$$

y_i:           10    18    20     25     32     45
estimated ŷ_i: 8.95  15.59 22.23  28.87  35.52  38.84
e_i:           1.05  2.41  -2.23  -3.87  -3.52  6.16

$\sum e_i \approx 0$

Estimating σ²
The Coefficient of Determination

The error sum of squares SSE can be interpreted as a measure of how much variation in y is left unexplained by the model—that is, how much cannot be attributed to a linear relationship. A quantitative measure of the total amount of variation in observed y values is given by the total sum of squares (SST).

[Figure: using the model to explain y variation: (a) data for which all variation is explained; (b) data for which most variation is explained; (c) data for which little variation is explained.]

Coefficient of Determination

Sum of squares due to error (SSE): $SSE = \sum_{i=1}^n (y_i - \hat y_i)^2$
Total sum of squares (SST): $SST = \sum_{i=1}^n (y_i - \bar y)^2$
Sum of squares due to regression (SSR): $SSR = \sum_{i=1}^n (\hat y_i - \bar y)^2$
Relation: $SST = SSR + SSE$
Coefficient of determination: $r^2 = 1 - \dfrac{SSE}{SST} = \dfrac{SSR}{SST}$
Sample correlation coefficient: $r = (\text{sign of } b_1)\sqrt{r^2}$

Problem Solving

Ex 4: Armand's pizza parlour:

Restaurant:        1   2    3   4    5    6    7    8    9    10
Students (1000s):  2   6    8   8    12   16   20   20   22   26
Sales ($1000s):    58  105  88  118  117  137  157  169  149  202

(a) Find an estimated regression line of Y on X.
(b) Find the coefficient of determination and a point estimate of the correlation (sample correlation coefficient).
(c) You may use Excel sheets.
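A self-contained sketch for Ex 4 (variable names are my own):

```python
# Least-squares fit and r^2 for Armand's pizza data.
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sxy / sxx                             # 5.0
b0 = ybar - b1 * xbar                      # 60.0
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - ybar) ** 2 for y in ys)
r2 = 1 - sse / sst                         # ~ 0.9027
r = (1 if b1 >= 0 else -1) * r2 ** 0.5     # ~ 0.9501
print(b0, b1, r2, r)
```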

Problem Solving

HW 1: The city's transportation department is interested in studying the relationship between the temperature and the number of passengers that use public transportation. The manager recorded the temperature at the beginning of the hour, and then had a bus driver record the number of passengers that boarded the bus throughout the hour. Their findings are listed below.

temperature: 42   37   46   30   50   43   43   46   46   49
passenger:   173  149  185  123  201  174  175  188  186  198

(a) Find an estimated regression line of Y on X.
(b) Obtain a point estimate of the correlation coefficient.

HW 2: Sporty cars are designed to provide better handling, acceleration and a more responsive driving experience than a normal sedan. But, even within this select group of cars, performance as well as price vary. The road-test scores and prices of 12 randomly selected sporty cars are provided below.
(a) Develop a scatter diagram with price as the independent variable. What does it indicate about the relationship between the two variables?
(b) Find an estimated regression line of Y on X.
(c) Obtain a point estimate of the correlation coefficient.

HW 3:
(a) Develop a scatter diagram with price as the independent variable. What does it indicate about the relationship between the two variables?
(b) Find an estimated regression line of Y on X. Obtain a point estimate of the correlation coefficient. Find SSR, SST and the coefficient of determination. Find an estimate of σ².

HW 4:

Problem Solving (HW 5)


Chapter 2
Discrete Probability Distributions

Note: These lecture notes aim to present a clear and crisp presentation of some topics in Probability and Statistics. Comments/suggestions are welcome via the e-mail: sukuyd@gmail.com to Dr. Suresh Kumar.

Contents
2 Discrete Probability Distributions
  2.1 Definitions
      2.1.1 Expectation
      2.1.2 Variance
      2.1.3 Standard Deviation
      2.1.4 Moments and moment generating function
  2.2 Geometric Distribution
      2.2.1 Negative Binomial Distribution
  2.3 Binomial Distribution
      2.3.1 Multinomial Distribution
  2.4 Hypergeometric Distribution
      2.4.1 Binomial distribution as a limiting case of hypergeometric distribution
      2.4.2 Generalization of the hypergeometric distribution
  2.5 Poisson Distribution
  2.6 Uniform Distribution

2.1 Definitions

Discrete Random Variable
Suppose a random experiment results in finite or countably infinite outcomes with sample space S. Then a variable X taking real values x corresponding to each outcome of the random experiment (or each element of S) is called a discrete random variable. In other words, the discrete random variable X is a function from the sample space S to the set of real numbers. So, in principle, the discrete random variable X, being a function, could have any given definition.

Probability Mass Function (pmf)
A function f is said to be the probability mass function of a discrete random variable X if it satisfies the following three conditions:
(i) f(x) ≥ 0 for each value x of X.
(ii) f(x) = P(X = x), that is, f(x) provides the probability for each value x of X.
(iii) Σ_x f(x) = 1, that is, the sum of the probabilities of all values x of X is 1.

Cumulative Distribution Function (cdf)
A function F defined by
F(x) = Σ_{y ≤ x} f(y)
is called the cumulative distribution function of X. Therefore, F(x) is the sum of the probabilities of all the values of X starting from its lowest value up to the value x.

Example with finite sample space
Consider the random experiment of tossing two fair coins. Then the sample space is
S = {HH, HT, TH, TT}.
Let X denote the number of heads. Then X is a discrete random variable, the function from the sample space S onto the set {0, 1, 2}, that is,
X : S = {HH, HT, TH, TT} → {0, 1, 2},
since X(HH) = 2, X(HT) = 1, X(TH) = 1 and X(TT) = 0. In tabular form, it can be displayed as

Outcome   HH   HT   TH   TT
X = x      2    1    1    0

So here the discrete random variable X assumes only the three values x = 0, 1, 2.
We find that P(X = 0) = 1/4, P(X = 1) = 1/2 and P(X = 2) = 1/4. It is easy to see that the function f given by

X = x              0     1     2
f(x) = P(X = x)   1/4   1/2   1/4

is the pmf of X. It gives the probability distribution of X.
The cumulative distribution function F of X is given by

X = x              0     1     2
F(x) = P(X ≤ x)   1/4   3/4    1

Remark: Note that X is a function with domain the sample space S. So, in the above example, X could also be defined as the number of tails, and accordingly we could write its pmf and cdf.

Example with countably infinite sample space
Suppose a fair coin is tossed again and again till head appears. Then the sample space is
S = {H, TH, TTH, TTTH, . . .}.
The outcome H corresponds to the possibility of getting head in the first toss. The outcome TH corresponds to the possibility of getting tail in the first toss and head in the second toss. Likewise, TTH corresponds to the possibility of getting head in the third toss, and so on.
If X denotes the number of tosses in this experiment, then X is a function from the sample space S to the set of natural numbers, and is given by

Outcome   H   TH   TTH   ...
X = x     1    2     3   ...

So here the discrete random variable X assumes countably infinite values x = 1, 2, 3, . . . .
The pmf of X is given by

X = x              1      2        3      ...
f(x) = P(X = x)   1/2   (1/2)²   (1/2)³   ...

that is, f(x) = (1/2)^x, x = 1, 2, 3, ........
Notice that f(x) ≥ 0 for all x and
Σ_{x=1}^{∞} f(x) = Σ_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1  (∵ the sum of the infinite G.P. a + ar + ar² + . . . is a/(1 − r)).
The cumulative distribution function F of X is given by
F(x) = Σ_{y ≤ x} f(y) = (1/2)(1 − (1/2)^x)/(1 − 1/2) = 1 − (1/2)^x, where x = 1, 2, 3, .........
So the cdf is available in the closed form F(x) = 1 − 1/2^x.

Note: Determining the cdf could be very useful. For instance, in the above example, suppose it is required to calculate P(10 ≤ X ≤ 30). Here, one option is to sum all the probabilities from P(X = 10) to P(X = 30). Instead, we use the cdf to obtain
P(10 ≤ X ≤ 30) = F(30) − F(9) = (1 − 1/2³⁰) − (1 − 1/2⁹) = 1/2⁹ − 1/2³⁰.
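A minimal sketch in Python (standard library only; names are mine) that enumerates the sample space of the two-coin example above and tabulates the pmf and cdf of X:

from itertools import product
from fractions import Fraction

space = list(product("HT", repeat=2))     # HH, HT, TH, TT
X = {s: s.count("H") for s in space}      # number of heads in each outcome
pmf = {x: Fraction(sum(1 for s in space if X[s] == x), len(space))
       for x in (0, 1, 2)}
cdf = {x: sum(p for y, p in pmf.items() if y <= x) for x in (0, 1, 2)}
print(pmf)   # 1/4, 1/2, 1/4 for x = 0, 1, 2
print(cdf)   # 1/4, 3/4, 1   for x = 0, 1, 2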

Outcome H TH TTH ... sold by the agency. Simple calculations yield f (0) = 1/35, f (1) = 12/35, f (2) = 18/35, and f (3) = 4/35. Therefore,
3
X=x 1 2 3 ...
X 12
Sol. f (x) = 4

/16, x = 0, 1, 2, 3, 4. µ = E(X) = xf (x) = = 1.7.
x 7
x=0

X=x 1 2 3 ... 2.1.1 Expectation Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components
and 3 defective components, it will contain, on average, 1.7 good components.
1 1 2 1 3
 
f (x) = P (X = x) 2 2 2 ... Let X be a random variable with pmf p. Then, the expectation of X, denoted by E(X), is defined as
X Ex. A salesperson for a medical device company has two appointments on a given day. At the first
E(X) = xf (x).
appointment, he believes that he has a 70% chance to make the deal, from which he can earn $ 1000
X=x
The cumulative distribution function F of X is given by commission if successful. On the other hand, he thinks he only has a 40% chance to make the deal at
1
x   x More generally, if H(X) is function of the random variable X, then we define the second appointment, from which, if successful, he can make $1500. What is his expected commission
1 − 12

X 1 based on his own probability belief? Assume that the appointment results are independent of each other.
F (x) = f (x) = 2 =1− , where x = 1, 2, 3, ......... E(H(X)) =
X
H(x)f (x).
X≤x
1 − 12 2
X=x
Note. Determining cdf could be very useful. For instance, in the above example, suppose it is required
Sol. First, we know that the salesperson, for the two appointments, can have 4 possible commission
to calculate P (10 ≤ X ≤ 30). Here, one option is to sum all the probabilities from P (X = 10) to
totals: $0, $1000, $1500, and $2500. We then need to calculate their associated probabilities. By inde-
P (X = 30). Instead, we use the cdf to obtain Ex. Let X denotes the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and
pendence, we obtain
P (10 ≤ X ≤ 30) = F (30) − F (9) = 1 − 2130 − 1 − 219 = 219 − 2130 . 2 with probabilities 14 , 12 and 14 respectively. So E(X) = 0 × 14 + 1 × 12 + 2 × 14 = 1.
 
f (0) = (1 − 0.7)(1 − 0.4) = 0.18,
f (2500) = (0.7)(0.4) = 0.28,
Some more illustrative examples Note: (1) The expectation E(X) of the random variable X is the theoretical average or mean value of
f (1000) = (0.7)(1 − 0.4) = 0.42,
X. In a statistical setting, the average value, mean value1 and expected value are synonyms. The mean
Ex. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If a f (1500) = (1 − 0.7)(0.4) = 0.12.
value is demoted by µ. So E(X) = µ.
school makes a random purchase of 2 of these computers, find the probability distribution for the number
of defectives. Therefore, the expected commission for the salesperson is
(2) If X is a random variable, then it is easy to verify the following:
E(X) = (0)(0.18) + (1000)(0.42) + (1500)(0.12) + (2500)(0.28) = $1300.
(i) E(c) = c
Sol. f (0) = 68/95, f (1) = 51/190 and f (2) = 3/190. (ii) E(cX) = cE(X)
Ex. Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and 5:00 P.M.
(iii) E(cX + d) = cE(X) + d
Ex. Find the probability distribution of the number of heads in a toss of four coins. Also, plot the on any sunny Friday has the following probability distribution:
(iv) E(cH(X) + dG(X)) = cE(H(X)) + dE(G(X))
probability mass function and probability histogram. where c, d are constants, and H(X) and G(X) are functions of X. Thus, expectation respects the linearity
property. X=x 4 5 6 7 8 9
Sol. Total number ofpoints  in  the samplespace is 16. The number points in the sample space
 with 0, 1,
2, 3 and 4 heads are 40 , 41 , 42 , 43 and 44 , respectively. So f (0) = 40 /16 = 1/16, f (1) = 41 /16 = 1/4,
 1 1 1 1 1 1
(3) The expected or the mean value of the random variable X is a measure of the location of the center f (x) = P (X = x) 12 12 4 4 6 6
f (2) = 42 /16 = 3/8, f (3) = 43 /16 = 1/4 and f (4) = 44 /16 = 1/16.
  
of values of X.
Thus, f (x) = x4 /16, x = 0, 1, 2, 3, 4.


The probability mass function plot and probability histogram are shown in Figure 2.1. Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager.
Some illustrative examples
Find the attendant’s expected earnings for this particular time period.
Ex. A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good compo-
nents and 3 defective components. A sample of 3 is taken by the inspector. Find the expected value of
the number of good components in this sample. Sol. We find
9
X
E(g(X)) = E(2X − 1) = (2x − 1)f (x) = $12.67.
Sol. Let X represent
 3  the number of good components in the sample. Then probability distribution of
4 x=4
x 3−x
X is f (x) = 7
 , x = 0, 1, 2, 3.
3 2.1.2 Variance
1
From your high school mathematics, you know that if we have n distinct values x1 , x2 , ...., xn with frequencies f1 , f2 ,
Xn Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6. We observe that both
...., fn respectively and fi = N , then the mean value is the variables have the same mean values given by µX = µY = 5. However, we see that the values of X
i=1
are far away from the mean or the central value 5 in comparasion to the values of Y . Thus, the mean
n n   n
X fi xi X fi X value of a random variable does not account for its variability. In this regard, we define a new parameter
Figure 2.1: Probability mass function plot and probability histogram µ= = xi = f (xi )xi .
i=1
N i=1
N i=1 known as variance. It is defined as follows.
Ex. If a car agency sells 50% of its inventory of a certain foreign car equipped with side airbags, find where f (xi ) = fNi is the probability of occurrence of xi in the given data set. Obviously, the final expression for µ is the
a formula for the probability distribution of the number of cars with side airbags among the next 4 cars expectation of a random variable X assuming the values xi with probabilities f (xi ).

4 5 6
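A minimal Python sketch (standard library only; names are mine) for the car-wash example, computing E(2X − 1) both directly from the definition and via the linearity property:

from fractions import Fraction as F

pmf = {4: F(1, 12), 5: F(1, 12), 6: F(1, 4), 7: F(1, 4), 8: F(1, 6), 9: F(1, 6)}
assert sum(pmf.values()) == 1                       # valid pmf

EX = sum(x * p for x, p in pmf.items())             # E(X) = 41/6
Eg = sum((2 * x - 1) * p for x, p in pmf.items())   # E(2X - 1), by definition
print(float(Eg), float(2 * EX - 1))                 # both 12.666..., i.e. $12.67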
n n
2.1.2 Variance

Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6 (each with probability 1/2). We observe that both the variables have the same mean value, given by µ_X = µ_Y = 5. However, we see that the values of X are far away from the mean or the central value 5 in comparison to the values of Y. Thus, the mean value of a random variable does not account for its variability. In this regard, we define a new parameter known as variance. It is defined as follows.

If X is a random variable with mean µ, then its variance, denoted by V(X), is defined as the expectation of (X − µ)². So, we have
V(X) = E((X − µ)²) = E(X²) + µ² − 2µE(X) = E(X²) + E(X)² − 2E(X)E(X) = E(X²) − E(X)².

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4 respectively. So
E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,
E(X²) = (0)² × 1/4 + (1)² × 1/2 + (2)² × 1/4 = 3/2.
∴ V(X) = 3/2 − 1 = 1/2.

Note: (i) The variance V(X) of the random variable X is also denoted by σ². So V(X) = σ².
(ii) If X is a random variable and c is a constant, then it is easy to verify that V(c) = 0 and V(cX) = c²V(X).

Some illustrative examples

Ex. Let the random variable X represent the number of automobiles that are used for official business purposes on any given workday. The probability distribution for company A is

x      1    2    3
f(x)  0.3  0.4  0.3

and that for company B is

x      0    1    2    3    4
f(x)  0.2  0.1  0.3  0.3  0.1

Show that the variance of the probability distribution for company B is greater than that for company A.

Sol. µ_A = 2.0, σ_A² = 0.6, µ_B = 2.0 and σ_B² = 1.6.

Ex. Calculate the variance of g(X) = 2X + 3, where X is a random variable with probability distribution

x      0    1    2    3
f(x)  1/4  1/8  1/2  1/8

Sol. µ_{2X+3} = 6, σ²_{2X+3} = 4.

Ex. Find the mean and variance of a random variable X with the pmf given by
f(x) = cx, x = 1, 2, 3, ...., n,
where c is a constant and n is some fixed natural number.

Sol. Using the condition Σ_{x=1}^{n} f(x) = 1, we get c(1 + 2 + ...... + n) = 1 or c = 2/(n(n + 1)).
Now µ = E(X) = Σ_{x=1}^{n} x f(x) = Σ_{x=1}^{n} cx² = c · n(n + 1)(2n + 1)/6 = (2n + 1)/3.
E(X²) = Σ_{x=1}^{n} x² f(x) = Σ_{x=1}^{n} cx³ = c · n²(n + 1)²/4 = n(n + 1)/2.
∴ σ² = E(X²) − E(X)² = n(n + 1)/2 − ((2n + 1)/3)².

Ex. Consider a random variable X with the pmf given by
f(x) = c 2^{−|x|}, x = ±1, ±2, ±3, ....,
where c is a constant. If g(X) = (−1)^{|X|−1} 2^{|X|}/(2|X| − 1), then show that E(g(X)) exists but E(|g(X)|) does not exist.

Sol. Using the condition Σ_{x=±1,±2,...} f(x) = 1, we find c = 1/2. Now
E(g(X)) = Σ_{x=±1,±2,...} g(x) f(x) = Σ_{x=±1,±2,...} (−1)^{|x|−1} 1/(2(2|x| − 1)),
which is an alternating and convergent series. So E(g(X)) exists. But
E(|g(X)|) = Σ_{x=±1,±2,...} 1/(2(2|x| − 1))
is a divergent series, so E(|g(X)|) does not exist.

2.1.3 Standard Deviation

The variance of a random variable, by definition, is the sum of the squares of the differences of the values of the random variable from the mean value. So variance carries squared units of the original data, and hence is a pure number often without any physical meaning. To overcome this problem, a second measure of variability is employed, known as standard deviation, defined as follows.
Let X be a random variable with variance σ². Then the standard deviation of X, denoted by σ, is the non-negative square root of V(X), that is,
σ = √V(X).

Note: A large standard deviation implies that the random variable X is rather inconsistent and somewhat hard to predict. On the other hand, a small standard deviation is an indication of consistency and stability.

2.1.4 Moments and moment generating function

Let X be a random variable and k be any positive integer. Then E(X^k) defines the kth ordinary moment of X.
Obviously, E(X) = µ is the first ordinary moment, E(X²) is the second ordinary moment and so on.
Further, the ordinary moments can be obtained from the function E(e^{tX}). For, the ordinary moments E(X^k) are the coefficients of t^k/k! in the expansion
E(e^{tX}) = 1 + tE(X) + (t²/2!)E(X²) + ............
Also, we observe that
E(X^k) = [d^k/dt^k E(e^{tX})]_{t=0}.
Thus, the function E(e^{tX}) generates all the ordinary moments. That is why it is known as the moment generating function, denoted by m_X(t). Thus, m_X(t) = E(e^{tX}).
In general, the kth moment of a random variable X about any point a is defined as E((X − a)^k). Obviously, a = 0 for the ordinary moments. Further, E(X − µ_X) = 0 and E((X − µ_X)²) = σ_X². So the first moment about the mean is 0, while the second moment about the mean yields the variance.

2.2 Geometric Distribution

The geometric distribution arises under the following conditions:
(i) The random experiment consists of a series of independent trials.
(ii) Each trial results into two outcomes, namely success (S) and failure (F), which have constant probabilities p and q = 1 − p, respectively. (Such trials are called Bernoulli trials.)
(iii) X denotes the number of trials to obtain the first success.
Then the sample space of the random experiment is
S = {S, FS, FFS, ..........},
and X is a discrete random variable with countably infinite values X = 1, 2, 3, ......... such that

Outcome     S    FS    FFS   ...
X = x       1     2      3   ...
P(X = x)    p    qp    q²p   ...

Thus, the pmf of X, denoted by g(x; p), is given by
g(x; p) = q^{x−1} p, x = 1, 2, 3......
The random variable X with this pmf is called the geometric random variable. Here the name 'geometric' is used because the probabilities p, qp, q²p, .... in succession constitute a geometric progression. Given the value of the parameter p, the probability distribution of the geometric random variable X is uniquely described.
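The derivations that follow show E(X) = 1/p and V(X) = q/p². A minimal numerical check in plain Python (the series is truncated at a large cutoff, so the tail is negligible; names are mine):

p = 0.25
q = 1 - p
pmf = {x: q ** (x - 1) * p for x in range(1, 400)}   # g(x; p) = q^(x-1) p
EX = sum(x * f for x, f in pmf.items())
EX2 = sum(x * x * f for x, f in pmf.items())
print(EX, 1 / p)                  # both about 4.0
print(EX2 - EX ** 2, q / p ** 2)  # both about 12.0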

Mean, variance and mgf of geometric random variable
For the geometric random variable X, we have

(i) µ_X = E(X) = Σ_{x=1}^{∞} x g(x; p)
= Σ_{x=1}^{∞} x q^{x−1} p
= p Σ_{x=1}^{∞} x q^{x−1}
= p d/dq ( Σ_{x=1}^{∞} q^x )
(∵ term-by-term differentiation is permissible for the convergent power series Σ q^x within its interval of convergence |q| < 1)
= p d/dq [ q/(1 − q) ]
= p · 1/(1 − q)²
= 1/p.

(ii) σ_X² = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)²
= p Σ_{x=1}^{∞} x(x − 1) q^{x−1} + 1/p − 1/p²
= pq Σ_{x=1}^{∞} x(x − 1) q^{x−2} − q/p²
= pq d²/dq² ( Σ_{x=1}^{∞} q^x ) − q/p²
= pq d²/dq² [ q/(1 − q) ] − q/p²
= pq d/dq [ 1/(1 − q)² ] − q/p²
= pq · 2/(1 − q)³ − q/p²
= 2q/p² − q/p²
= q/p².

(iii) m_X(t) = E(e^{tX}) = Σ_{x=1}^{∞} e^{tx} g(x; p)
= p Σ_{x=1}^{∞} e^{tx} q^{x−1}
= (p/q) Σ_{x=1}^{∞} (qe^t)^x
= (p/q) · qe^t/(1 − qe^t)   (t < −ln q)
= pe^t/(1 − qe^t).

Remark: Note that we can easily obtain E(X) and E(X²) from the moment generating function m_X(t) by using
E(X^k) = [d^k/dt^k m_X(t)]_{t=0},
for k = 1 and k = 2 respectively. In other words, the first and second t-derivatives of m_X(t) at t = 0 provide us E(X) and E(X²), respectively. Hence we easily get the mean and variance from the moment generating function. Verify!

Some illustrative examples

Ex. A fair coin is tossed again and again till head appears. If X denotes the number of tosses in this experiment, then X is a geometric random variable with the pmf g(x) = (1/2)^x, x = 1, 2, 3, ......... Here p = 1/2.

Ex. For a certain manufacturing process, it is known that, on the average, 1 in every 100 items is defective. What is the probability that the fifth item inspected is the first defective item found?

Sol. Here p = 1/100 = 0.01 and x = 5. So the required probability is (0.01)(0.99)⁴ = 0.0096.

Ex. At a "busy time," a telephone exchange is very near capacity, so callers have difficulty placing their calls. It may be of interest to know the number of attempts necessary in order to make a connection. Suppose that we let p = 0.05 be the probability of a connection during a busy time. Find the probability of a successful call on the fifth attempt.

Sol. Here p = 0.05 and x = 5. So the required probability is (0.05)(0.95)⁴ = 0.041.

2.2.1 Negative Binomial Distribution

In the geometric distribution, X is the number of trials to obtain the first success. Its more general version is that we choose X as the number of trials to obtain the kth success. Then X is called a negative binomial random variable with the values X = k, k + 1, k + 2, . . . .. Since the final trial among the x trials would result in a success, the remaining k − 1 successes can occur in C(x−1, k−1) ways from the x − 1 trials. Hence, the pmf of the negative binomial random variable X, denoted by nb(x; k, p), is given by
nb(x; k, p) = C(x−1, k−1) p^k q^{x−k}, x = k, k + 1, k + 2, . . . .
If we make a change of variable via y = x − k, then
nb(y; k, p) = C(k+y−1, k−1) p^k q^y = C(k+y−1, y) p^k q^y, y = 0, 1, 2, . . . .
Here we have used the well-known result C(n, x) = C(n, n − x). Note that (1 − q)^{−k} = Σ_{y=0}^{∞} C(k+y−1, y) q^y is a negative binomial series.
Now, let us show that Σ_{x=k}^{∞} nb(x; k, p) = 1. For,
Σ_{x=k}^{∞} nb(x; k, p) = Σ_{x=k}^{∞} C(x−1, k−1) p^k q^{x−k}
= Σ_{y=0}^{∞} C(y+k−1, k−1) p^k q^y   (where y = x − k)
= p^k (1 − q)^{−k}
= p^k p^{−k}
= 1.

Mean, variance and mgf of negative binomial random variable
For the negative binomial random variable X, we have

(i) µ_X = E(X) = Σ_{x=k}^{∞} x nb(x; k, p)
= Σ_{x=k}^{∞} x C(x−1, k−1) p^k q^{x−k}
= Σ_{x=k}^{∞} k C(x, k) p^k q^{x−k}
= (k/p) Σ_{x=k}^{∞} C(x, k) p^{k+1} q^{x−k}
= (k/p) Σ_{y=k+1}^{∞} C(y−1, (k+1)−1) p^{k+1} q^{y−(k+1)}   (where x = y − 1)
= (k/p) Σ_{y=k+1}^{∞} nb(y; k+1, p)
= (k/p) · 1
= k/p.

(ii) E((X + 1)X) = Σ_{x=k}^{∞} (x + 1)x nb(x; k, p)
= Σ_{x=k}^{∞} (x + 1)x C(x−1, k−1) p^k q^{x−k}
= Σ_{x=k}^{∞} (k + 1)k C(x+1, k+1) p^k q^{x−k}
= (k(k + 1)/p²) Σ_{x=k}^{∞} C(x+1, (k+2)−1) p^{k+2} q^{x−k}
= (k(k + 1)/p²) Σ_{y=k+2}^{∞} nb(y; k+2, p)   (where x = y − 2)
= (k(k + 1)/p²) · 1
= k(k + 1)/p².
So V(X) = E((X + 1)X) − E(X) − E(X)² = k(k + 1)/p² − k/p − k²/p² = kq/p².

(iii) m_X(t) = E(e^{tX}) = Σ_{x=k}^{∞} e^{tx} nb(x; k, p)
= Σ_{x=k}^{∞} e^{tx} C(x−1, k−1) p^k q^{x−k}
= Σ_{y=0}^{∞} C(y+k−1, k−1) p^k q^y e^{t(y+k)}   (where y = x − k)
= p^k e^{tk} Σ_{y=0}^{∞} C(y+k−1, y) (qe^t)^y
= (pe^t)^k (1 − qe^t)^{−k}
= ( pe^t/(1 − qe^t) )^k.
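Assuming SciPy is available, scipy.stats can check both the geometric examples above and the negative binomial moments just derived; note that SciPy's nbinom models the number of failures before the kth success, i.e. Y = X − k, so the mean is shifted by k:

from scipy.stats import geom, nbinom

print(geom.pmf(5, 0.01))    # fifth item is the first defective: 0.0096
print(geom.pmf(5, 0.05))    # first successful call on attempt 5: about 0.041

k, p = 4, 0.55
print(nbinom.mean(k, p) + k, k / p)            # E(X) = k/p, about 7.27
print(nbinom.var(k, p), k * (1 - p) / p**2)    # V(X) = kq/p^2, about 5.95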
Ex. In an NBA (National Basketball Association) championship series, the team that wins four games out of seven is the winner. Suppose that teams A and B face each other in the championship games and that team A has probability 0.55 of winning a game over team B.
(a) What is the probability that team A will win the series in 6 games?
(b) What is the probability that team A will win the series?

Sol. (a) Here x = 6, k = 4, p = 0.55. So the required probability is
nb(6; 4, 0.55) = C(5, 3) (0.55)⁴ (1 − 0.55)² = 0.1853.
(b) Team A can win the championship series in the 4th or 5th or 6th or the 7th game. So the required probability is
nb(4; 4, 0.55) + nb(5; 4, 0.55) + nb(6; 4, 0.55) + nb(7; 4, 0.55) = 0.6083.

2.3 Binomial Distribution

The binomial distribution arises under the following conditions:
(i) The random experiment consists of a finite number n of independent trials.
(ii) Each trial results into two outcomes, namely success (S) and failure (F), which have constant probabilities p and q = 1 − p, respectively, in each trial.
(iii) X denotes the number of successes in n trials.
Then the sample space of the random experiment is
S = S₀ ∪ S₁ ∪ S₂ ∪ · · · ∪ Sₙ,
where the sets
S₀ = {FF···F},
S₁ = {SF···F, FS···F, ...., FF···S},
...........
Sₙ = {SS···S}
carry C(n,0), C(n,1), ......, C(n,n) numbers of elements (outcomes), since out of n trials no success can take place in C(n,0) ways, one success can take place in C(n,1) ways, and so on. Thus, the sample space S carries C(n,0) + C(n,1) + ...... + C(n,n) = (1 + 1)^n = 2^n outcomes.
The random variable X, being the number of successes in the n trials, takes the values X = 0, 1, 2, ...., n such that

Outcome     S₀          S₁              .....   Sₙ
X = x       0           1               .....   n
P(X = x)   C(n,0)q^n   C(n,1)q^{n−1}p   .....   C(n,n)p^n

So, the pmf of X, denoted by b(x; n, p), is given by
b(x; n, p) = C(n,x) q^{n−x} p^x, x = 0, 1, 2, 3......, n.
The random variable X with this pmf is called the binomial random variable. Here the name 'binomial' is used because the probabilities C(n,0)q^n, C(n,1)q^{n−1}p, ....., C(n,n)p^n in succession are the terms in the binomial expansion of (q + p)^n. Once the values of the parameters n and p are given/determined, the pmf uniquely describes the binomial distribution of X.

Note: In the particular case n = 1, the binomial distribution is called the Bernoulli distribution:
b(x; 1, p) = q^{1−x} p^x, x = 0, 1.

Mean, variance and mgf of binomial random variable
For the binomial random variable X, we have

(i) µ_X = E(X) = Σ_{x=0}^{n} x b(x; n, p)
= Σ_{x=0}^{n} x C(n,x) q^{n−x} p^x
= np Σ_{x=1}^{n} C(n−1, x−1) q^{n−x} p^{x−1}
= np Σ_{y=0}^{n−1} C(n−1, y) q^{n−1−y} p^y   (where y = x − 1)
= np (p + q)^{n−1}
= np.

(ii) E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) b(x; n, p)
= Σ_{x=0}^{n} x(x − 1) C(n,x) q^{n−x} p^x
= n(n − 1)p² Σ_{x=2}^{n} C(n−2, x−2) q^{n−x} p^{x−2}
= n(n − 1)p² Σ_{y=0}^{n−2} C(n−2, y) q^{n−2−y} p^y   (where y = x − 2)
= n(n − 1)p² (p + q)^{n−2}
= n(n − 1)p².
So σ_X² = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)² = n(n − 1)p² + np − n²p² = npq.

(iii) m_X(t) = E(e^{tX}) = Σ_{x=0}^{n} e^{tx} b(x; n, p)
= Σ_{x=0}^{n} e^{tx} C(n,x) q^{n−x} p^x
= Σ_{x=0}^{n} C(n,x) q^{n−x} (pe^t)^x
= (q + pe^t)^n.

Some illustrative examples

Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Sol. Here n = 5, p = 1/6, x = 2, and therefore
P(X = 2) = b(2; 5, 1/6) = C(5,2) (1 − 1/6)³ (1/6)² = 0.161.

Ex. The probability that a certain kind of component will survive a shock test is 3/4. Find the probability that exactly 2 of the next 4 components tested survive.

Sol. Here n = 4, p = 3/4, x = 2, and therefore
P(X = 2) = b(2; 4, 3/4) = C(4,2) (1 − 3/4)² (3/4)² = 27/128.

Ex. The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are known to have contracted this disease, what is the probability that (a) at least 10 survive, (b) from 3 to 8 survive, and (c) exactly 5 survive?

Sol. (a) 0.0338 (b) 0.8779 (c) 0.1859

Ex. A large chain retailer purchases a certain kind of electronic device from a manufacturer. The manufacturer indicates that the defective rate of the device is 3%.
(a) The inspector randomly picks 20 items from a shipment. What is the probability that there will be at least one defective item among these 20?
(b) Suppose that the retailer receives 10 shipments in a month and the inspector randomly tests 20 devices per shipment. What is the probability that there will be exactly 3 shipments each containing at least one defective device among the 20 that are selected and tested from the shipment?

Sol. (a) Denote by X the number of defective devices among the 20. Then X follows a binomial distribution with n = 20 and p = 0.03. Hence, P(X ≥ 1) = 1 − P(X = 0) = 0.4562.
(b) In this case, each shipment can either contain at least one defective item or not. Hence, the testing of each shipment can be viewed as a Bernoulli trial with p = 0.4562 from part (a). Assuming independence from shipment to shipment and denoting by Y the number of shipments containing at least one defective item, Y follows another binomial distribution with n = 10 and p = 0.4562. Therefore, P(Y = 3) = 0.1602.

Ex. In a bombing attack, there is a 50% chance that any bomb will strike the target. At least two direct hits are required to destroy the target. What is the minimum number of bombs that must be dropped so that the probability of hitting the target at least twice is more than 0.99?

Sol. Let n bombs be dropped so that there is at least a 99% chance to hit the target at least twice. Let X be the random variable representing the number of bombs striking the target. Then X = 0, 1, 2, ...., n follows a binomial distribution with p = 1/2, and therefore
P(X ≥ 2) ≥ 0.99, or 1 − P(X = 0) − P(X = 1) ≥ 0.99.
It can be simplified to get 2^n ≥ 100 + 100n. This inequality is satisfied if n ≥ 11. So at least 11 bombs must be dropped so that there is at least a 99% chance to hit the target at least twice.
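Assuming SciPy is available, scipy.stats.binom checks the binomial examples above; scipy.stats.nbinom (failures before the kth success) also recovers the NBA probability nb(6; 4, 0.55), since winning in game 6 means 2 losses before the 4th win:

from scipy.stats import binom, nbinom

print(binom.pmf(2, 5, 1/6))        # exactly 2 fours in 5 tosses: about 0.161
print(1 - binom.pmf(0, 20, 0.03))  # at least one defective among 20: 0.4562
print(nbinom.pmf(2, 4, 0.55))      # series won in game 6 = nb(6; 4, 0.55) = 0.1853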

2.3.1 Multinomial Distribution

The binomial experiment becomes a multinomial experiment if we let each trial have more than two possible outcomes. For example, the drawing of a card from a deck with replacement is a multinomial experiment if the 4 suits are the outcomes of interest.
In general, if a given trial can result in any one of k possible outcomes o₁, o₂, . . . , o_k with probabilities p₁, p₂, . . . , p_k, then the multinomial distribution gives the probability that o₁ occurs x₁ times, o₂ occurs x₂ times, . . . , and o_k occurs x_k times in n independent trials, as follows:
f(x₁, x₂, . . . , x_k) = [n!/(x₁! x₂! . . . x_k!)] p₁^{x₁} p₂^{x₂} . . . p_k^{x_k},
where x₁ + x₂ + · · · + x_k = n and p₁ + p₂ + · · · + p_k = 1.
Clearly, when k = 2, the multinomial distribution reduces to the binomial distribution.

Ex. The probabilities that a person goes to office by car, bus and train are 1/2, 1/4 and 1/4, respectively. Find the probability that the person will go to office 2 days by car, 3 days by bus and 1 day by train in the 6 days.

Sol. [6!/(2! 3! 1!)] (1/2)² (1/4)³ (1/4).

Ex. The complexity of arrivals and departures of planes at an airport is such that computer simulation is often used to model the "ideal" conditions. For a certain airport with three runways, it is known that in the ideal setting the following are the probabilities that the individual runways are accessed by a randomly arriving commercial jet:
Runway 1: p₁ = 2/9, Runway 2: p₂ = 1/6, Runway 3: p₃ = 11/18.
What is the probability that 6 randomly arriving airplanes are distributed in the following fashion?
Runway 1: 2 airplanes, Runway 2: 1 airplane, Runway 3: 3 airplanes.

Sol. [6!/(2! 1! 3!)] (2/9)² (1/6)¹ (11/18)³.

2.4 Hypergeometric Distribution

The hypergeometric distribution arises under the following conditions:
(i) The random experiment consists of choosing n objects without replacement from a lot of N objects, given that r objects possess a trait or property of our interest in the lot of N objects.
(ii) X denotes the number of objects possessing the trait or property in the selected sample of size n.
This splits the lot and the sample as in a Venn diagram: the lot of N objects consists of r objects with the trait and N − r objects without it, and the sample of n objects consists of x objects with the trait and n − x without it.
It is easy to see that the x objects with the trait (by definition of X) are to be chosen from the r objects in C(r,x) ways, while the remaining n − x objects are to be chosen from the N − r objects in C(N−r, n−x) ways. So the n objects carrying x items with the trait can be chosen from the N objects in C(r,x)C(N−r, n−x) ways, while C(N,n) is the total number of ways in which n objects can be chosen from N objects. Therefore, the pmf of X, denoted by h(x; N, r, n), is given by
h(x; N, r, n) = P(X = x) = C(r,x) C(N−r, n−x) / C(N,n).
The random variable X with this pmf is called the hypergeometric random variable. The hypergeometric distribution is characterized by the three parameters N, r and n. Note that X lies in the range max(0, n + r − N) ≤ x ≤ min(n, r). So the minimum value of x could be n + r − N instead of 0. To understand this, let N = 30, r = 20 and n = 15. Then the minimum value of x is n + r − N = 15 + 20 − 30 = 5. For, there are only N − r = 10 objects without the trait in the 30 items. So a sample of 15 items certainly contains at least 5 objects with the trait. So in this case, the random variable X takes the values 5, 6, ..., 15. Notice that the maximum value of x is min(n, r) = 15. Similarly, if we choose n = 25, the random variable X takes the values 15, 16, 17, 18, 19 and 20. In case we choose n = 8, the random variable X takes the values 0, 1, 2, ..., 8.

Next, let us check whether h(x; N, r, n) is a valid pmf. Note that x ∈ [max(0, n + r − N), min(n, r)]. But we can take x ∈ [0, n] because, in situations where this range is not [0, n], we have h(x; N, r, n) = 0. Also, know Vandermonde's identity:
Σ_{x=0}^{n} C(a,x) C(b, n−x) = C(a+b, n), or equivalently Σ_{x=0}^{n} C(a,x)C(b, n−x)/C(a+b, n) = 1.
This identity is understandable in view of the following example. Suppose a team of n persons is chosen from a group of a men and b women. The number of ways of choosing the team of n persons from the group of a + b persons is C(a+b, n), the right hand side of Vandermonde's identity. We can also count this number of ways by considering that, in the team of n persons, x persons are men and the remaining n − x persons are women; summing over x, we end up with the left hand side of Vandermonde's identity.
Now from Vandermonde's identity, it follows that
Σ_{x=0}^{n} h(x; N, r, n) = Σ_{x=0}^{n} C(r,x)C(N−r, n−x)/C(N,n) = 1.
Thus, h(x; N, r, n) is a valid pmf.

Mean, variance and mgf of hypergeometric random variable
For the hypergeometric random variable X, it can be shown that
µ_X = E(X) = n(r/N) and σ_X² = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).
For,
µ_X = E(X) = Σ_x x h(x; N, r, n)
= Σ_x x C(r,x) C(N−r, n−x) / C(N,n)
= n(r/N) Σ_x C(r−1, x−1) C((N−1)−(r−1), (n−1)−(x−1)) / C(N−1, n−1)
= n(r/N),
since Σ_x C(r−1, x−1) C((N−1)−(r−1), (n−1)−(x−1)) / C(N−1, n−1) = 1, being the sum of the probabilities for a hypergeometric distribution with parameters N − 1, r − 1 and n − 1.
Likewise, it is easy to find that E(X(X − 1)) = n(n − 1) (r/N) ((r − 1)/(N − 1)). Hence, we have
σ_X² = E(X(X − 1)) + E(X) − E(X)² = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

Just for the sake of completeness in line with the other distributions, some details of the moment generating function of the hypergeometric distribution are given below. It is given by
m_X(t) = E(e^{tX}) = Σ_x e^{tx} C(r,x) C(N−r, n−x) / C(N,n) = C(N−r, n) ₂F₁(−n, −r; N − r − n + 1; e^t) / C(N,n).
Here ₂F₁ is the hypergeometric function defined as the sum of the infinite series
₂F₁(a, b; c; z) = 1 + (ab/c)(z/1!) + (a(a − 1)b(b − 1)/(c(c − 1)))(z²/2!) + .......,
where a, b, c are constants, and z is the variable of the hypergeometric function.
Also, note that
d/dz [₂F₁(a, b; c; z)] = (ab/c) ₂F₁(a + 1, b + 1; c + 1; z),
d²/dz² [₂F₁(a, b; c; z)] = (a(a − 1)b(b − 1)/(c(c − 1))) ₂F₁(a + 2, b + 2; c + 2; z).
Following this, it can be shown that
µ_X = E(X) = [d/dt m_X(t)]_{t=0} = n(r/N).
Similarly, by calculating the second derivative of m_X(t) at t = 0, the variance can be found as
σ_X² = E(X²) − E(X)² = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).
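Assuming SciPy is available, the mean and variance formulas just derived can be verified against scipy.stats.hypergeom, whose parameters are ordered differently from h(x; N, r, n), as noted in the comments:

from scipy.stats import hypergeom

# SciPy's convention is hypergeom(M, n, N): population size M, n tagged
# objects, N draws; i.e. M = N here, n = r here, N = n here.
N_pop, r, n = 52, 26, 5
rv = hypergeom(N_pop, r, n)
print(rv.mean(), n * r / N_pop)   # both 2.5 = n r/N
print(rv.var(),
      n * (r / N_pop) * ((N_pop - r) / N_pop) * ((N_pop - n) / (N_pop - 1)))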
Some illustrative examples

Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards. What is the probability of getting exactly 2 red cards?

Sol. Here N = 52, r = 26, n = 5, x = 2, and therefore P(X = 2) = h(2; 52, 26, 5) = 0.3251.

Ex. Lots of 40 components each are deemed unacceptable if they contain 3 or more defectives. The procedure for sampling a lot is to select 5 components at random and to reject the lot if a defective is found. What is the probability that exactly 1 defective is found in the sample if there are 3 defectives in the entire lot?

Sol. Here N = 40, r = 3, n = 5, x = 1, and therefore P(X = 1) = h(1; 40, 3, 5) = 0.3011.

2.4.1 Binomial distribution as a limiting case of hypergeometric distribution

There is an interesting relationship between the hypergeometric and the binomial distributions. It can be shown that if the population size N → ∞ in such a way that the proportion of successes r/N → p, and n is held constant, then the hypergeometric probability mass function approaches the binomial probability mass function.
Proof: We have
h(x; N, r, n) = C(r,x) C(N−r, n−x) / C(N,n)
= [r!/(x!(r−x)!)] · [(N−r)!/((n−x)!(N−r−(n−x))!)] · [n!(N−n)!/N!]
= C(n,x) · ( [r!/(r−x)!] / [N!/(N−x)!] ) · ( [(N−r)!/(N−r−(n−x))!] / [(N−x)!/(N−n)!] )
= C(n,x) · Π_{k=1}^{x} (r−x+k)/(N−x+k) · Π_{m=1}^{n−x} (N−r−(n−x)+m)/(N−n+m).
Now taking the large N limit for fixed r/N, n and x, we get the binomial pmf
b(x; n, p) = C(n,x) p^x q^{n−x},
since
lim_{N→∞} (r−x+k)/(N−x+k) = lim_{N→∞} r/N = p
and
lim_{N→∞} (N−r−(n−x)+m)/(N−n+m) = lim_{N→∞} (N−r)/N = 1 − p = q.
In practice, this means that we can approximate the hypergeometric probabilities with binomial probabilities, provided N is much larger than n. As a rule of thumb, if the population size is more than 20 times the sample size (N > 20n, or N/n > 20), then we may use binomial probabilities in place of hypergeometric probabilities.

Ex. A manufacturer of automobile tires reports that among a shipment of 5000 sent to a local distributor, 1000 are slightly blemished. If one purchases 10 of these tires at random from the distributor, what is the probability that exactly 3 are blemished?

Sol. We find P(X = 3) = 0.2013 from the binomial distribution, and P(X = 3) = 0.2015 from the hypergeometric distribution.

2.4.2 Generalization of the hypergeometric distribution

Consider a lot of N objects given that r₁, r₂, ....., r_k objects possess different traits of our interest such that r₁ + r₂ + .... + r_k = N. Suppose a lot of n objects is randomly chosen (without replacement) where x₁, x₂, ..., x_k objects have the traits as in the r₁, r₂, ....., r_k objects, respectively, such that x₁ + x₂ + .... + x_k = n. Then the probability of the random selection is
f(x₁, x₂, ...., x_k) = C(r₁,x₁) C(r₂,x₂) .... C(r_k,x_k) / C(N,n).

Ex. Ten cards are randomly chosen without replacement from a deck of 52 playing cards. Find the probability of getting 2 spades, 3 clubs, 4 diamonds and 1 heart.

Sol. Here N = 52, r₁ = r₂ = r₃ = r₄ = 13, n = 10, x₁ = 2, x₂ = 3, x₃ = 4, x₄ = 1. So the required probability is
C(13,2) C(13,3) C(13,4) C(13,1) / C(52,10).

2.5 Poisson Distribution

Consider the pmf of the binomial random variable X:
b(x; n, p) = C(n,x) p^x (1 − p)^{n−x}, x = 0, 1, 2, · · · , n.
Let us calculate the limiting form of the binomial distribution as n → ∞, p → 0, and np = k is a constant. We have
b(x; n, p) = C(n,x) p^x (1 − p)^{n−x}
= [n!/(x!(n−x)!)] p^x (1 − p)^{n−x}
= [n(n−1)...(n−x+1)/x!] p^x (1 − p)^{n−x}
= [(np)(np − p)...(np − xp + p)/x!] (1 − p)^{−x} (1 − p)^n
= [(k)(k − p)...(k − xp + p)/x!] (1 − p)^{−x} (1 − p)^{k/p}   (using np = k).
Thus, in the limit p → 0, we get
p(x; k) = [(k)(k − 0)...(k − 0)/x!] (1 − 0)^{−x} e^{−k} = e^{−k} k^x / x!,
known as the pmf of the Poisson distribution.
Notice that the conditions n → ∞, p → 0 and np = k intuitively refer to a situation where the sample space of the random experiment is a continuous interval or medium (thus carrying infinitely many points, n → ∞); the probability p of discrete occurrences of an event of interest is very small (p → 0) such that the mean number of occurrences np of the event remains constant, k.
Thus, formally, the Poisson distribution arises under the following conditions:
(i) The random experiment consists of counting or observing discrete occurrences of an event in a continuous region or time interval of some given size s, called a Poisson process or Poisson experiment. (Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.) For example, counting the number of airplanes landing at Delhi airport between 9 am and 11 am, or observing the white blood cells in a sample of blood, are Poisson experiments.
(ii) λ denotes the number of occurrences of the event of interest per unit measurement of the given region of size s. Then k = λs is the expected or mean number of occurrences of the event in size s.
(iii) X denotes the number of occurrences of the event in the region of size s.
Then X is called a Poisson random variable, and its pmf can be proved to be
p(x; k) = e^{−k} k^x / x!, x = 0, 1, 2, ....
The Poisson distribution is characterized by the single parameter k.
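Assuming SciPy is available, both limiting relationships in this section can be seen numerically: the binomial pmf with np fixed approaches the Poisson pmf as n grows, and the hypergeometric probability in the tire example is nearly binomial since N/n = 500 > 20:

from scipy.stats import binom, hypergeom, poisson

# Binomial -> Poisson: fix k = np = 2 and let n grow.
for n in (10, 100, 10000):
    print(n, binom.pmf(3, n, 2 / n))   # approaches the Poisson value below
print(poisson.pmf(3, 2))               # e^-2 2^3/3!, about 0.1804

# Hypergeometric -> binomial (tire example above):
print(hypergeom(5000, 1000, 10).pmf(3))   # 0.2015
print(binom.pmf(3, 10, 0.2))              # 0.2013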

Mean, variance and mgf of Poisson random variable
For the Poisson random variable X, we have

(i) µ_X = E(X) = Σ_{x=0}^{∞} x p(x; k)
= Σ_{x=1}^{∞} x e^{−k} k^x / x!
= k e^{−k} Σ_{x=1}^{∞} k^{x−1}/(x−1)!
= k e^{−k} e^k
= k.

(ii) σ_X² = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)²
= Σ_{x=2}^{∞} x(x − 1) e^{−k} k^x / x! + k − k²
= k² e^{−k} Σ_{x=2}^{∞} k^{x−2}/(x−2)! + k − k²
= k² e^{−k} e^k + k − k²
= k.
We notice that µ_X = k = σ_X².

(iii) m_X(t) = E(e^{tX}) = Σ_{x=0}^{∞} e^{tx} p(x; k)
= Σ_{x=0}^{∞} e^{tx} e^{−k} k^x / x!
= e^{−k} Σ_{x=0}^{∞} (ke^t)^x / x!
= e^{−k} e^{ke^t}
= e^{k(e^t − 1)}.

Some illustrative examples

Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is tested for the white blood cell count by collecting a blood sample of size 0.001 ml. Find the probability that the collected blood sample will carry exactly 3 white blood cells.

Sol. Here λ = 6000, s = 0.001, k = λs = 6 and x = 3, and therefore P(X = 3) = p(3; 6) = e^{−6} 6³/3!.

Ex. In the last 5 years, 10 students of BITS-Pilani were placed with a package of more than one crore. Find the probability that exactly 7 students will be placed with a package of more than one crore in the next 3 years.

Sol. Here λ = 10/5 = 2, s = 3, k = λs = 6 and x = 7, and therefore P(X = 7) = p(7; 6) = e^{−6} 6⁷/7!.

Ex. During a laboratory experiment, the average number of radioactive particles passing through a counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?

Sol. Here k = 4 and x = 6, so P(X = 6) = p(6; 4) = e^{−4} 4⁶/6! ≈ 0.1042.

Ex. Ten is the average number of oil tankers arriving each day at a certain port. The facilities at the port can handle at most 15 tankers per day. What is the probability that on a given day tankers have to be turned away?

Sol. Here k = 10 and the required probability is
P(X > 15) = 1 − P(X ≤ 15) = 1 − Σ_{x=0}^{15} P(X = x) = 1 − 0.9513 = 0.0487.

Note: We proved that the Binomial distribution tends to the Poisson distribution as n → ∞, p → 0 and np = k remains constant. Thus, we may use the Poisson distribution to approximate binomial probabilities when n is large and p is small. As a rule of thumb, this approximation can safely be applied if n > 50 and np < 5.

Some illustrative examples

Ex. In a certain industrial facility, accidents occur infrequently. It is known that the probability of an accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on one day?
(b) What is the probability that there are at most three days with an accident?

Sol. Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2. Using the Poisson approximation,
(a) P(X = 1) = e^{−2} 2¹ = 0.271 and
(b) P(X ≤ 3) = Σ_{x=0}^{3} e^{−2} 2^x/x! = 0.857.

Ex. In a manufacturing process where glass products are made, defects or bubbles occur, occasionally rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these items produced has one or more bubbles. What is the probability that a random sample of 8000 will yield fewer than 7 items possessing bubbles?

Sol. This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to 0 and n is quite large, we shall approximate with the Poisson distribution using k = (8000)(0.001) = 8. Hence, if X represents the number of bubbles, the required probability is P(X < 7) = 0.3134.

2.6 Uniform Distribution

A random variable X is said to follow the uniform distribution if it assumes a finite number of values, all with the same chance of occurrence or equal probabilities. For instance, if the random variable X assumes n values x₁, x₂, ...., x_n with equal probabilities P(X = x_i) = 1/n, then it is a uniform random variable with pmf given by
u(x) = 1/n, x = x₁, x₂, ...., x_n.
The moment generating function, mean and variance of the uniform random variable respectively read as
m_X(t) = (1/n) Σ_{i=1}^{n} e^{t x_i},
µ = (1/n) Σ_{i=1}^{n} x_i,
σ² = (1/n) Σ_{i=1}^{n} x_i² − ( (1/n) Σ_{i=1}^{n} x_i )².

Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X is a discrete random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6. Thus, X is a uniform random variable.
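Assuming SciPy is available, scipy.stats.poisson checks the tanker example above and illustrates the Poisson approximation of the binomial used in the accident example:

from scipy.stats import binom, poisson

print(poisson.sf(15, 10))    # P(X > 15) for k = 10: about 0.0487

# Accident example: binomial n = 400, p = 0.005 versus Poisson k = np = 2
print(binom.pmf(1, 400, 0.005), poisson.pmf(1, 2))   # both about 0.271
print(binom.cdf(3, 400, 0.005), poisson.cdf(3, 2))   # both about 0.857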
Chapter 3
Continuous Probability Distributions

Contents
3 Continuous Probability Distributions
  3.1 Definitions
  3.2 Uniform or Rectangular Distribution
  3.3 Gamma Distribution
      3.3.1 Exponential Distribution
      3.3.2 Chi-Squared (χ²) Distribution
  3.4 Normal Distribution (Gaussian distribution)
      3.4.1 Standard Normal Distribution
      3.4.2 Seeing the values from normal distribution table
  3.5 Density of a dependent random variable
      3.5.1 Chebyshev's Inequality
      3.5.2 Approximation of Binomial distribution by Normal distribution
      3.5.3 Approximation of Poisson distribution by Normal distribution
  3.6 Student t-Distribution
      3.6.1 Symmetry of the t-distribution
  3.7 F-distribution

3.1 Definitions

Continuous Random Variable
A continuous random variable is a variable X that takes all values x in an interval or intervals of real numbers, and its probability for any particular value is 0.
For example, if X denotes the lifetime of a person, then it is a continuous random variable because lifetime happens to be continuous, no matter how small or big it is.

Probability Density Function (pdf)
A function f is called the probability density function (pdf) of a continuous random variable X if it satisfies the following conditions:
(i) f(x) ≥ 0 for all x.
(ii) P(a ≤ X ≤ b) = ∫_a^b f(x)dx, i.e., f(x) gives the probability of X lying in any given interval [a, b].
(iii) ∫_{−∞}^{∞} f(x)dx = 1.

We immediately notice the following points.
(1) The condition f(x) ≥ 0 implies that the graph of y = f(x) lies on or above the x-axis.
(2) The condition ∫_{−∞}^{∞} f(x)dx = 1 graphically implies that the total area under the curve y = f(x) is 1. Therefore, P(a ≤ X ≤ b) = ∫_a^b f(x)dx is the area under the curve y = f(x) from x = a to x = b, as shown in Figure 3.1.
(3) P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a < X < b), since P(X = a) = 0 and P(X = b) = 0.

Figure 3.1: The shaded golden region gives the probability P(a ≤ X ≤ b) = ∫_a^b f(x)dx.

Cumulative Distribution Function (cdf)
A function F defined by
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y)dy
is called the cumulative distribution function (cdf) of X. See Figure 3.2, where the shaded golden region gives the value of P(X ≤ b) = F(b).

Figure 3.2: The shaded golden region gives the probability P(X ≤ b) = F(b).

We notice that
(i) P(a ≤ X ≤ b) = ∫_{−∞}^{b} f(y)dy − ∫_{−∞}^{a} f(y)dy = F(b) − F(a).
(ii) F′(x) = f(x), provided the differentiation is permissible.

Expectation
The expectation of a random variable X having density f is defined as
E(X) = ∫_{−∞}^{∞} x f(x)dx.
In general, the expectation of H(X), a function of X, is defined as
E(H(X)) = ∫_{−∞}^{∞} H(x) f(x)dx.

Moment Generating Function (mgf), Moments and Variance
The moment generating function of X is defined as
m_X(t) = E(e^{Xt}) = ∫_{−∞}^{∞} e^{xt} f(x)dx.
The kth ordinary moment, mean and variance of X are respectively given by
E(X^k) = ∫_{−∞}^{∞} x^k f(x)dx = [d^k/dt^k m_X(t)]_{t=0},
µ = E(X) = ∫_{−∞}^{∞} x f(x)dx,
σ² = E(X²) − E(X)² = ∫_{−∞}^{∞} x² f(x)dx − ( ∫_{−∞}^{∞} x f(x)dx )².

Some illustrative examples

Ex. Verify whether the function
f(x) = 12.5x − 1.25 for 0.1 ≤ x ≤ 0.5, and 0 elsewhere,
is a density function of X. If so, find F(x), P(0.2 ≤ X ≤ 0.3), µ and σ².

Sol. Please try the detailed calculations yourself. You will find ∫_{−∞}^{∞} f(x)dx = ∫_{0.1}^{0.5} f(x)dx = 1. So f is a density function. Also, see the shaded golden triangular region under the plot of f(x) in Figure 3.3. The area of this right-angled triangle is (1/2)(0.5 − 0.1)(5) = 1, as expected.

Figure 3.3: The area of the shaded golden triangular region gives the total probability.

Further,
F(x) = 0 for x < 0.1;
F(x) = ∫_{0.1}^{x} (12.5y − 1.25)dy = 6.25x² − 1.25x + 0.0625 for 0.1 ≤ x ≤ 0.5;
F(x) = 1 for x > 0.5.
P(0.2 ≤ X ≤ 0.3) = ∫_{0.2}^{0.3} f(x)dx = 0.1875, which is the area of the shaded golden region shown in Figure 3.4.

Figure 3.4: The shaded golden region gives the probability P(0.2 ≤ X ≤ 0.3) = 0.1875.

µ = ∫_{0.1}^{0.5} x f(x)dx = 0.3667.
σ² = ∫_{0.1}^{0.5} x² f(x)dx − ( ∫_{0.1}^{0.5} x f(x)dx )² ≈ 0.0089.

Ex. Show that the mean of the random variable X with the density function given by
f(x) = x/2, 0 ≤ x ≤ 2,
is 4/3. Find P(0 ≤ X ≤ 4/3).

Sol. The mean of X is given by
µ = ∫_0^2 x (x/2) dx = 4/3.
P(0 ≤ X ≤ 4/3) = ∫_0^{4/3} (x/2) dx = 4/9 ≈ 0.44.

Ex. Consider a random variable X with the density function given by
f(x) = c/(1 + x²) for x ≥ 0, and f(x) = 0 for x < 0,
where c is a constant. Show that E(X) does not exist.

Sol. Using the condition ∫_{−∞}^{∞} f(x)dx = 1, we find c = 2/π.
Now ∫_{−∞}^{∞} x f(x)dx = (2/π) ∫_0^{∞} x/(1 + x²) dx = ∞. So E(X) does not exist.
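Assuming SciPy is available, scipy.integrate.quad reproduces every number in the first worked example above (total probability, P(0.2 ≤ X ≤ 0.3), µ and σ²):

from scipy.integrate import quad

f = lambda x: 12.5 * x - 1.25          # density on [0.1, 0.5]
total, _ = quad(f, 0.1, 0.5)
prob, _ = quad(f, 0.2, 0.3)
mu, _ = quad(lambda x: x * f(x), 0.1, 0.5)
m2, _ = quad(lambda x: x * x * f(x), 0.1, 0.5)
print(total)             # 1.0, so f is a valid density
print(prob)              # 0.1875 = P(0.2 <= X <= 0.3)
print(mu, m2 - mu**2)    # about 0.3667 and 0.0089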
3.2 Uniform or Rectangular Distribution

A random variable X is said to have the uniform distribution if its density function f(x) is constant for all values of x, say,
f(x) = k for a ≤ x ≤ b, and 0 elsewhere.
Then the normalizing constant k is given by the density function condition ∫_{−∞}^{∞} f(x)dx = 1, which leads to k = 1/(b − a). Thus the continuous random variable X has the following uniform distribution:
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 elsewhere.
In this case, the area under the curve is in the form of a rectangle, as shown in Figure 3.5. That is why the name rectangular is there.

Figure 3.5: The area of the shaded golden rectangular region gives the total probability 1.

You may easily derive the following for the uniform distribution:
µ = E(X) = (1/(b − a)) ∫_a^b x dx = (b + a)/2,
σ² = E(X²) − E(X)² = (1/(b − a)) ∫_a^b x² dx − ( (1/(b − a)) ∫_a^b x dx )² = (b − a)²/12,
m_X(t) = E(e^{tX}) = (1/(b − a)) ∫_a^b e^{tx} dx = (e^{bt} − e^{at})/((b − a)t),
F(x) = 0 for x < a; F(x) = ∫_a^x dy/(b − a) = (x − a)/(b − a) for a ≤ x ≤ b; F(x) = 1 for x > b.

Ex. Suppose that a large conference room at a certain company can be reserved for no more than 4 hours. Both long and short conferences occur quite often. Assume that the length X of a conference has a uniform distribution on the interval [0, 4].
(a) What is the probability density function?
(b) What is the probability that any given conference lasts at least 3 hours?

Sol. (a) f(x) = 1/4 for 0 ≤ x ≤ 4, and 0 elsewhere.
(b) P(X ≥ 3) = ∫_3^4 (1/4)dx = 1/4.

3.3 Gamma Distribution

Consider a Poisson process where Y is the Poisson random variable, and λ is the mean number of Poisson events per unit time. Suppose T is the waiting time for the occurrence of the first Poisson event. If no Poisson event occurs in the time interval [0, t], then T > t. It implies that
P(T > t) = P(Y = 0) = e^{−λt}(λt)⁰/0! = e^{−λt}.
Thus, the cumulative distribution function for T is given by
F(t) = P(0 ≤ T ≤ t) = 1 − P(T > t) = 1 − e^{−λt}.
Therefore, the density function of T reads
f(t) = F′(t) = λe^{−λt}.
Now, suppose T is the waiting time for the occurrence of two Poisson events. If at most one Poisson event occurs in the time interval [0, t], then T > t. It implies that
P(T > t) = P(Y = 0) + P(Y = 1) = e^{−λt}(λt)⁰/0! + e^{−λt}(λt)¹/1! = e^{−λt} + λte^{−λt}.
Then, the cdf of T is given by
F(t) = P(0 ≤ T ≤ t) = 1 − P(T > t) = 1 − e^{−λt} − λte^{−λt}.
Therefore, the density function of T reads
f(t) = F′(t) = λ²te^{−λt}.
Thus, in general, if T is the waiting time for α Poisson events, then the density function of T reads
f(t) = (1/(α − 1)!) λ^α t^{α−1} e^{−λt} = (λ^α/Γ(α)) t^{α−1} e^{−λt},
where Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx is the gamma function. (One should remember that Γ(1) = 1, Γ(α) = (α − 1)Γ(α − 1), Γ(1/2) = √π and Γ(α) = (α − 1)! when α is an integer.) Such a distribution is called the gamma distribution.
Formally, a continuous random variable X is said to have the gamma distribution with parameters α > 0 and β > 0 if its density function is of the form
f(x) = c x^{α−1} e^{−x/β} for x > 0, and 0 for x ≤ 0.
Then the normalizing constant c is given by the density function condition ∫_{−∞}^{∞} f(x)dx = 1, which leads to c = 1/(Γ(α)β^α). Thus, the density function of the gamma random variable X reads
f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β} for x > 0, and 0 for x ≤ 0.
Graphs of several gamma distributions are shown in Figure 3.6.

Figure 3.6: Graphs of several gamma distributions are shown for certain specified values of the parameters α and β (α = 1, β = 1; α = 1, β = 2; α = 2, β = 1; α = 4, β = 1). The special gamma distribution for which α = 1 is called the exponential distribution.

The moment generating function of the gamma random variable can be derived as follows:
m_X(t) = E(e^{tX}) = (1/(Γ(α)β^α)) ∫_0^∞ x^{α−1} e^{−x/β} e^{tx} dx = (1/(Γ(α)β^α)) ∫_0^∞ x^{α−1} e^{(t − 1/β)x} dx.
Substituting (1/β − t)x = y and simplifying while using the definition of the gamma function, it is easy to find
m_X(t) = (1 − βt)^{−α}, where t < 1/β.
Mean, variance and cdf are given by
µ = E(X) = [d/dt m_X(t)]_{t=0} = αβ,
σ² = E(X²) − E(X)² = [d²/dt² m_X(t)]_{t=0} − ([d/dt m_X(t)]_{t=0})² = αβ²(α + 1) − (αβ)² = αβ²,
F(x) = (1/(Γ(α)β^α)) ∫_0^x y^{α−1} e^{−y/β} dy.

Note: Comparing the gamma distribution with its Poisson process appearance, we find β = 1/λ. Therefore, β is the mean time between Poisson events, since λ is the mean number of occurrences of Poisson events in unit time. In reliability theory, where equipment failure often conforms to this Poisson process, β is called the mean time between failures. Many equipment breakdowns do follow the Poisson process, and thus the gamma distribution does apply. Other applications include survival times in biomedical experiments and computer response time.
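Assuming SciPy is available, scipy.stats.gamma (shape a = α, scale = β) evaluates the gamma cdf F(x) numerically and confirms µ = αβ and σ² = αβ²; this also previews the rat-survival example that follows:

from scipy.stats import gamma

print(gamma.cdf(60, a=5, scale=10))   # P(X <= 60), about 0.715
print(gamma.mean(a=5, scale=10))      # alpha*beta = 50
print(gamma.var(a=5, scale=10))       # alpha*beta^2 = 500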

Ex. In a biomedical study with rats, a dose-response investigation is used to determine the effect of the dose of a toxicant on their survival time. The toxicant is one that is frequently discharged into the atmosphere from jet fuel. For a certain dose of the toxicant, the study determines that the survival time, in weeks, has a gamma distribution with α = 5 and β = 10. What is the probability that a rat survives no longer than 60 weeks?

Sol. Let the random variable X be the survival time (time to death). The required probability is
P(X ≤ 60) = (1/β⁵) ∫_0^{60} (1/Γ(5)) x⁴ e^{−x/β} dx = ∫_0^6 (1/Γ(5)) y⁴ e^{−y} dy = 0.715.

Ex. It is known, from previous data, that the length of time in months between customer complaints about a certain product is a gamma distribution with α = 2 and β = 4. Changes were made to tighten quality control requirements. Following these changes, 20 months passed before the first complaint. Does it appear as if the quality control tightening was effective?

Sol. We find P(X ≥ 20) = 1 − P(X < 20) = 1 − 0.96 = 0.04. Thus, it is reasonable to conclude that the quality control work was effective.

Ex. Suppose that telephone calls arriving at a particular switchboard follow a Poisson process with an average of 5 calls coming per minute. What is the probability that up to a minute will elapse by the time 2 calls have come in to the switchboard?

Sol. Here the Poisson process applies, with the time until 2 Poisson events following a gamma distribution with β = 1/5 and α = 2. Denote by T the time in minutes that transpires before 2 calls come. The required probability is given by
P(T ≤ 1) = ∫_0^1 (1/β²) t e^{−t/β} dt = 25 ∫_0^1 t e^{−5t} dt = 1 − e^{−5}(1 + 5) = 0.96.

Note: While the origin of the gamma distribution deals in time (or space) until the occurrence of α Poisson events, there are many instances where a gamma distribution works very well even though there is no clear Poisson structure. This is particularly true for survival time problems in both engineering and biomedical applications.

3.3.1 Exponential Distribution

The special case of the gamma distribution with α = 1 (see Figure 3.6) is called the exponential distribution. Therefore, the density function of the exponential distribution reads as
f(x) = (1/β) e^{−x/β} for x > 0 (β > 0), and 0 for x ≤ 0.
Its mgf, mean, variance and cdf read as
m_X(t) = (1 − βt)^{−1},
µ = β,
σ² = β²,
F(x) = (1/β) ∫_0^x e^{−y/β} dy = 1 − e^{−x/β}.

The Memoryless Property of the Exponential Distribution
The types of applications of the exponential distribution in reliability and component or machine lifetime problems are influenced by the memoryless (or lack-of-memory) property of the exponential distribution. For example, in the case of, say, an electronic component where the lifetime has an exponential distribution, the probability that the component lasts, say, t hours, that is, P(T ≥ t), is the same as the conditional probability P(T ≥ t₀ + t | T ≥ t₀). For,
P(T ≥ t₀ + t | T ≥ t₀) = P((T ≥ t₀ + t) ∩ (T ≥ t₀)) / P(T ≥ t₀).
Notice that both the events (T ≥ t₀ + t) and (T ≥ t₀) can occur if and only if T ≥ t₀ + t. Therefore,
P(T ≥ t₀ + t | T ≥ t₀) = P(T ≥ t₀ + t)/P(T ≥ t₀) = (1 − F(t₀ + t))/(1 − F(t₀)) = e^{−λ(t₀+t)}/e^{−λt₀} = e^{−λt}.
This shows that the distribution of the remaining lifetime is independent of the current age. So if the component "makes it" to t₀ hours, the probability of lasting an additional t hours is the same as the probability of lasting t hours. There is no "punishment" through wear that may have ensued for lasting the first t₀ hours. Thus, the exponential distribution is more appropriate when the memoryless property is justified. But if the failure of the component is a result of gradual or slow wear (as in mechanical wear), then the exponential does not apply and either the gamma or the Weibull distribution may be more appropriate.

Ex. Based on extensive testing, it is determined that the time Y in years before a major repair is required for a certain washing machine is characterized by the density function
f(y) = (1/4) e^{−y/4}, y ≥ 0.
Note that Y is an exponential random variable with µ = 4 years. The machine is considered a bargain if it is unlikely to require a major repair before the sixth year.
(a) What is the probability P(Y > 6)?
(b) What is the probability that a major repair is required in the first year?

Sol. (a) P(Y > 6) = e^{−6/4} = 0.2231.
Thus, the probability that the washing machine will require major repair after year six is 0.223. Of course, it will require repair before year six with probability 0.777. Thus, one might conclude the machine is not really a bargain.
(b) The probability that a major repair is necessary in the first year is
P(Y < 1) = 1 − e^{−1/4} = 1 − 0.779 = 0.221.

Ex. Suppose that a system contains a certain type of component whose time, in years, to failure is given by T. The random variable T is modeled nicely by the exponential distribution with mean time to failure β = 5. If 5 of these components are installed in different systems, what is the probability that at least 2 are still functioning at the end of 8 years?

Sol. The probability that a given component is still functioning after 8 years is given by
P(T > 8) = (1/5) ∫_8^∞ e^{−t/5} dt = e^{−8/5} ≈ 0.2.
Let X represent the number of components functioning after 8 years. Then, using the binomial distribution with n = 5 and p = 0.2, we have
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 0.2627.
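A quick numerical check of the memoryless property, assuming SciPy is available (scipy.stats.expon uses scale = β, and sf is the survival function 1 − F):

from scipy.stats import expon

T = expon(scale=5)               # exponential lifetime with beta = 5
t0, t = 3.0, 8.0
print(T.sf(t0 + t) / T.sf(t0))   # P(T >= t0 + t | T >= t0)
print(T.sf(t))                   # P(T >= t); same value, about 0.2019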
3.3.2 Chi-Squared (χ²) Distribution

The special case of the gamma distribution with β = 2 and α = ν/2, ν being some positive integer (called the degrees of freedom), is named the Chi-Squared (χ²) distribution. The density function of the χ² random variable with ν degrees of freedom is given by

f(χ²) = 1/(2^(ν/2) Γ(ν/2)) (χ²)^(ν/2 − 1) e^(−χ²/2), χ² > 0.

Graphs of χ² distributions for certain specified values of the parameter ν are shown in Figure 3.7.

Figure 3.7: Graphs of χ² distributions are shown for certain specified values of the parameter ν (ν = 1, 2, 3, 4).

The mean and variance of the χ² distribution are µ_χ² = ν and σ²_χ² = 2ν.

The chi-squared distribution plays a vital role in statistical inference. It has considerable applications in both methodology and theory. It is an important component of statistical hypothesis testing and estimation. Topics dealing with sampling distributions, analysis of variance, and nonparametric statistics involve extensive use of the chi-squared distribution. We will see these in the later chapters.

How to see values from the χ²-distribution table

Table 3.22 gives values of χ²_α for various values of α and ν. The areas, α, are the column headings; the degrees of freedom, ν, are given in the first column; and the table entries are the χ² values. For example, the χ² value with 7 degrees of freedom, leaving an area of 0.1 to the right, is χ²_{0.1} = 12.017, as shown in Figure 3.8.

Figure 3.8: The shaded golden region area is α = 0.1. It is the area under the χ² curve with 7 degrees of freedom for χ² ≥ χ²_{0.1} = 12.017.

Ex. Find P(8.383 ≤ χ² ≤ 12.017) given the χ²-distribution with 7 degrees of freedom.

Sol. From Table 3.22, we see that χ²_{0.3} = 8.383 and χ²_{0.1} = 12.017. It follows that P(8.383 ≤ χ² ≤ 12.017) = P(χ²_{0.3} ≤ χ² ≤ χ²_{0.1}) = P(χ² ≥ χ²_{0.3}) − P(χ² ≥ χ²_{0.1}) = 0.3 − 0.1 = 0.2, as shown in Figure 3.9.

Figure 3.9: The shaded golden region area is P(8.383 ≤ χ² ≤ 12.017) = P(χ²_{0.3} ≤ χ² ≤ χ²_{0.1}) = P(χ² ≥ χ²_{0.3}) − P(χ² ≥ χ²_{0.1}) = 0.3 − 0.1 = 0.2.
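The tabulated χ² values can be reproduced numerically; a minimal sketch, assuming SciPy is available (χ²_α is the value with area α to the right, i.e., the (1 − α) quantile):

from scipy import stats

print(stats.chi2.ppf(0.9, df=7))   # chi^2_{0.1} for 7 df: ~12.017
print(stats.chi2.ppf(0.7, df=7))   # chi^2_{0.3} for 7 df: ~8.383

# P(8.383 <= chi^2 <= 12.017) for 7 df:
print(stats.chi2.cdf(12.017, df=7) - stats.chi2.cdf(8.383, df=7))   # ~0.2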
3.4 Normal Distribution (Gaussian distribution)

Remarkably, when n, np and nq are large, it can be shown that the binomial distribution is well approximated by a distribution of the form

C(n, x) pˣ qⁿ⁻ˣ ∼ 1/√(2πnpq) e^(−(x−np)²/(2npq)),

known as the normal distribution.

Formally, a continuous random variable X is said to follow the normal distribution with parameters µ and σ if its density function is given by

f(x) = 1/(σ√(2π)) e^(−(1/2)((x−µ)/σ)²), −∞ < x < ∞, −∞ < µ < ∞, σ > 0.

For the normal random variable X, we can verify the following:

∫_{−∞}^{∞} f(x) dx = 1,  m_X(t) = e^(µt + σ²t²/2),  Mean = µ,  Variance = σ².

Figure 3.10: The area of the shaded golden region under the normal probability curve gives the total probability 1. The normal probability curve is symmetrical about the vertical red line x = µ. Therefore, P(X ≤ µ) = 0.5 = P(X ≥ µ). Also, the maximum value of f(x) occurs at x = µ, and is given by f(µ) = 1/(σ√(2π)).

Proof of total probability: We have

∫_{−∞}^{∞} f(x) dx = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−(1/2)((x−µ)/σ)²) dx
 = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−y²/(2σ²)) dy, where y = x − µ
 = 2/(σ√(2π)) ∫₀^{∞} e^(−y²/(2σ²)) dy
 = 2/(σ√(2π)) ∫₀^{∞} e^(−r) r^(−1/2) (σ/√2) dr, where y²/(2σ²) = r
 = (1/√π) ∫₀^{∞} e^(−r) r^(−1/2) dr
 = Γ(1/2)/√π = √π/√π = 1.

Proof of mgf: We have

m_X(t) = E(e^(tX)) = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−(1/2)((x−µ)/σ)² + tx) dx
 = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−[(x−µ)² − 2σ²tx]/(2σ²)) dx
 = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−[x² + µ² − 2µx − 2σ²tx]/(2σ²)) dx
 = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−[x² − 2(µ+σ²t)x + µ²]/(2σ²)) dx
 = 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−[(x−(µ+σ²t))² − 2µσ²t − σ⁴t²]/(2σ²)) dx
 = e^(µt + σ²t²/2) · 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−(x−(µ+σ²t))²/(2σ²)) dx
 = e^(µt + σ²t²/2) · 1/(σ√(2π)) ∫_{−∞}^{∞} e^(−y²/(2σ²)) dy, where y = x − (µ + σ²t)
 = e^(µt + σ²t²/2) · (1/√π) ∫₀^{∞} e^(−r) r^(−1/2) dr, where y²/(2σ²) = r
 = e^(µt + σ²t²/2) · Γ(1/2)/√π
 = e^(µt + σ²t²/2).
It follows that

Mean = E(X) = [d/dt m_X(t)]_{t=0} = [e^(µt + σ²t²/2) (µ + σ²t)]_{t=0} = µ,

Variance = E(X²) − [E(X)]² = [d²/dt² m_X(t)]_{t=0} − ([d/dt m_X(t)]_{t=0})² = µ² + σ² − µ² = σ².

Thus, the two parameters µ and σ in the density function of the normal random variable X are its mean and standard deviation, respectively.

It deserves mention that the normal distribution is the most important continuous probability distribution in the entire field of statistics. It is also known as the Gaussian distribution. In fact, the normal distribution was first described by De Moivre in 1733 as the limiting case of the binomial distribution when the number of trials is infinite. This discovery did not get much attention. Around fifty years later, Laplace and Gauss rediscovered the normal distribution while dealing with astronomical data. They found that the errors in astronomical measurements are well described by a normal distribution. It approximately describes many phenomena that occur in nature, industry, and research. For example, physical measurements in areas such as meteorological experiments, rainfall studies, and measurements of manufactured parts are often more than adequately explained with a normal distribution. In addition, errors in scientific measurements are extremely well approximated by a normal distribution. The normal distribution also finds enormous application as a limiting distribution. For instance, under certain conditions, it provides a good continuous approximation to the binomial and hypergeometric distributions.

Note: If X is a normal random variable with mean µ and variance σ², then we write X ∼ N(µ, σ²).

3.4.1 Standard Normal Distribution

Let Z = (X − µ)/σ. Then E(Z) = 0 and Var(Z) = 1. We call Z the standard normal variate and we write Z ∼ N(0, 1). Its density function reads as

φ(z) = 1/√(2π) e^(−z²/2), −∞ < z < ∞.

The corresponding cumulative distribution function is given by

Φ(z) = ∫_{−∞}^{z} φ(u) du = 1/√(2π) ∫_{−∞}^{z} e^(−u²/2) du.

Figure 3.11: The area of the shaded golden region under the standard normal probability curve gives the total probability 1. The normal probability curve is symmetrical about the vertical red line z = 0. Therefore, P(Z ≤ 0) = 0.5 = P(Z ≥ 0). Also, the maximum value of φ(z) occurs at z = 0, and is given by φ(0) = 1/√(2π).

The normal probability curve is symmetric about the line X = µ (see Figure 3.10) or Z = 0 (see Figure 3.11). Therefore, we have

P(X < µ) = P(X > µ) = 0.5,  P(−a < Z < 0) = P(0 < Z < a).

The probabilities of the standard normal variable Z in the probability table of the normal distribution are given in terms of the cumulative distribution function Φ(z) = F(z) = P(Z ≤ z) (see the Normal Table 3.21). So we have

P(a < Z < b) = P(Z < b) − P(Z < a) = F(b) − F(a).

From the normal table, it can be found that

P(|X − µ| < σ) = P(µ − σ < X < µ + σ) = P(−1 < Z < 1) = F(1) − F(−1) = 0.8413 − 0.1587 = 0.6826.

This shows that there is approximately 68% probability that the normal variable X lies in the interval (µ − σ, µ + σ), as shown in Figure 3.12. We call this interval the 1σ confidence interval of X. Similarly, the probabilities of X in the 2σ and 3σ confidence intervals, respectively, are given by

P(|X − µ| < 2σ) = P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544,
P(|X − µ| < 3σ) = P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9973.

For geometrical clarity, see the left and right panels in Figure 3.13.

Figure 3.12: The area of the shaded golden region under the standard normal probability curve gives the probability corresponding to the 1σ confidence interval (µ − σ, µ + σ). So P(µ − σ < X < µ + σ) = P(−1 < Z < 1) = 0.6826.

Figure 3.13: Left panel: The area of the shaded golden region under the standard normal probability curve gives P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544. Right panel: The area of the shaded golden region under the standard normal probability curve gives P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9973.

3.4.2 Seeing the values from normal distribution table

As mentioned earlier, the Normal Table 3.21 provides values of the cdf F(z) of the standard normal variable Z. This table provides values of F(z) from z = −3.49 to z = 3.49, with F(−3.49) = 0.0002 and F(3.49) = 0.9998. Thus it covers almost 100% of the region under the normal probability curve. The normal table reads like the following table.

z | 0.00 | 0.01 | 0.02 | ... | 0.09
−3.4 | 0.0003 | 0.0003 | 0.0003 | ... | 0.0002
−3.3 | 0.0005 | 0.0005 | 0.0005 | ... | 0.0003
... | ... | ... | ... | ... | ...
0 | 0.5000 | 0.5040 | 0.5080 | ... | 0.5359
... | ... | ... | ... | ... | ...
3.3 | 0.9995 | 0.9995 | 0.9995 | ... | 0.9997
3.4 | 0.9997 | 0.9997 | 0.9997 | ... | 0.9998

Here the first column shows the z values up to the first decimal place: −3.4, −3.3, ..., 3.3, 3.4, and the column headings give the second decimal place. For instance, the entry in the row z = 3.4 under the column 0.02 is F(3.42) = 0.9997.

Note. In the following examples, we will refer to the Normal Table 3.21 whenever values of F(z) are needed.
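As a quick numerical companion to the table lookups used below, here is a minimal sketch assuming SciPy is available; stats.norm.cdf plays the role of F(z) and stats.norm.ppf inverts it:

from scipy import stats

print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # 1-sigma interval: ~0.6827
print(stats.norm.cdf(2) - stats.norm.cdf(-2))   # 2-sigma interval: ~0.9545
print(stats.norm.cdf(3) - stats.norm.cdf(-3))   # 3-sigma interval: ~0.9973
print(stats.norm.ppf(0.45))                     # z with area 0.45 to the left: ~ -0.13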
Ex. A random variable X is normally distributed with mean 9 and standard deviation 3. Find P(X ≥ 15), P(X ≤ 15) and P(0 ≤ X ≤ 9).

Sol. We have Z = (X − 9)/3.
∴ P(X ≥ 15) = P(Z ≥ 2) = 1 − F(2) = 1 − 0.9772 = 0.0228,
P(X ≤ 15) = 1 − 0.0228 = 0.9772,
P(0 ≤ X ≤ 9) = P(−3 ≤ Z ≤ 0) = F(0) − F(−3) = 0.5 − 0.0013 = 0.4987.

Ex. Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
(a) 45% of the area to the left and
(b) 14% of the area to the right.

Sol. (a) We require a z value that leaves an area of 0.45 to the left. From the Normal Table 3.21, we find P(Z < −0.13) = 0.45, so the desired z value is −0.13. Hence,
x = σz + µ = (6)(−0.13) + 40 = 39.22.
(b) This time we require a z value that leaves 0.14 of the area to the right and hence an area of 0.86 to the left. Again, from the Normal Table, we find P(Z < 1.08) = 0.86, so the desired z value is 1.08 and
x = σz + µ = (6)(1.08) + 40 = 46.48.

Ex. In a normal distribution, 12% of the items are under 30 and 85% are under 60. Find the mean and standard deviation of the distribution.

Sol. Let µ be the mean and σ the standard deviation of the distribution. Given that P(X < 30) = 0.12 and P(X < 60) = 0.85. Let z₁ and z₂ be the values of the standard normal variable Z corresponding to X = 30 and X = 60, respectively, so that P(Z < z₁) = 0.12 and P(Z < z₂) = 0.85. From the Normal Table 3.21, we find z₁ ≈ −1.17 and z₂ ≈ 1.04, since F(−1.17) = 0.121 and F(1.04) = 0.8508. Finally, solving the equations (30 − µ)/σ = −1.17 and (60 − µ)/σ = 1.04, we find µ = 45.93 and σ = 13.56.

Ex. A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year. Assuming that battery life is normally distributed, find the probability that a given battery will last less than 2.3 years.

Sol. P(X < 2.3) = P(Z < −1.4) = 0.0808.

Ex. An electrical firm manufactures light bulbs that have a life, before burn-out, that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

Sol. P(778 < X < 834) = P(−0.55 < Z < 0.85) = P(Z < 0.85) − P(Z < −0.55) = 0.8023 − 0.2912 = 0.5111.

Ex. In an industrial process, the diameter of a ball bearing is an important measurement. The buyer sets specifications for the diameter to be 3.0 ± 0.01 cm. The implication is that no part falling outside these specifications will be accepted. It is known that in the process the diameter of a ball bearing has a normal distribution with mean µ = 3 and standard deviation σ = 0.005. On average, how many manufactured ball bearings will be scrapped?

Sol. P(X < 2.99) + P(X > 3.01) = P(Z < −2) + P(Z > 2) = 2(0.0228) = 0.0456. As a result, it is anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.

Ex. Gauges are used to reject all components for which a certain dimension is not within the specification 1.5 ± d. It is known that this measurement is normally distributed with mean 1.5 and standard deviation 0.2. Determine the value d such that the specifications cover 95% of the measurements.

Sol. From the Normal Table 3.21, we notice that P(Z < −1.96) = 0.025. So by the symmetry of the normal distribution, it follows that P(−1.96 < Z < 1.96) = 0.95. Therefore
1.96 = ((1.5 + d) − 1.5)/0.2.
So we get d = (0.2)(1.96) = 0.392.

3.5 Density of a dependent random variable

Let X be a continuous random variable with density f_X. Let Y be some random variable dependent on X via the relation Y = g(X), where g is strictly monotonic and differentiable. Then it can be proved that the density f_Y of Y is given by

f_Y(y) = f_X(g⁻¹(y)) |dg⁻¹(y)/dy|.

Proof. Assuming that Y = g(X) is a decreasing function of X, we have

F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
 = P(g⁻¹(g(X)) ≥ g⁻¹(y)) (∵ g is a decreasing function, so is g⁻¹)
 = P(X ≥ g⁻¹(y))
 = 1 − P(X ≤ g⁻¹(y))
 = 1 − F_X(g⁻¹(y)).

Since the derivative of the cdf gives the density function, differentiating both sides with respect to y gives

f_Y(y) = −f_X(g⁻¹(y)) dg⁻¹(y)/dy = f_X(g⁻¹(y)) |dg⁻¹(y)/dy|,

where |dg⁻¹(y)/dy| = −dg⁻¹(y)/dy, g⁻¹ being a decreasing function.

Likewise, if Y = g(X) is an increasing function of X, we find

F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(g⁻¹(g(X)) ≤ g⁻¹(y)) = P(X ≤ g⁻¹(y)) (∵ g is an increasing function, so is g⁻¹),

∴ F_Y(y) = F_X(g⁻¹(y)). It leads to

f_Y(y) = f_X(g⁻¹(y)) |dg⁻¹(y)/dy|.

Ex. (Lognormal Distribution) If a random variable X follows a normal distribution, then the distribution of the random variable Y = e^X is called the lognormal distribution. Determine the cdf, pdf, mean and variance of Y. Also, find P(a ≤ Y ≤ b).

Sol. Here Y = e^X is an increasing function of X. So the cdf of Y is given by

F_Y(y) = F_X(ln y).

It implies that

F′_Y(y) = F′_X(ln y) · (1/y) ⟹ f_Y(y) = f_X(ln y) · (1/y).

Therefore the density function of the lognormal random variable Y = e^X reads

f_Y(y) = 1/(yσ√(2π)) e^(−(1/2)((ln y − µ)/σ)²) for y > 0, and f_Y(y) = 0 for y ≤ 0,

where µ and σ are the mean and standard deviation of the normal random variable X. The mean and variance of the lognormal random variable Y can be shown to be

E(Y) = e^(µ + σ²/2),  V(Y) = e^(2µ+σ²)(e^(σ²) − 1).

We can determine the probability of the lognormal random variable Y using the normal distribution table of X, since

P(a ≤ Y ≤ b) = P(a ≤ e^X ≤ b) = P(ln a ≤ X ≤ ln b).

Thus, the probability of Y in the interval [a, b] is given by the probability of the normal random variable X in the interval [ln a, ln b].
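The lognormal mean and variance formulas can be sanity-checked by simulation; a minimal sketch assuming NumPy, with µ = 1 and σ = 0.5 chosen only for illustration:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                                # illustrative values
y = np.exp(rng.normal(mu, sigma, size=1_000_000))   # Y = e^X with X ~ N(mu, sigma^2)

print(y.mean(), np.exp(mu + sigma**2 / 2))                            # both ~ 3.08
print(y.var(), np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))      # both ~ 2.69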

3.5.1 Chebyshev's Inequality

If X is a normal random variable with mean µ and variance σ², then P(|X − µ| < kσ) = P(|Z| < k) = F(k) − F(−k). However, if X is any random variable, then a rule of thumb for the required probability is given by Chebyshev's inequality, as stated below.

If X is a random variable with mean µ and variance σ², then

P(|X − µ| < kσ) ≥ 1 − 1/k².

Proof. By the definition of variance, we have

∫_{−∞}^{∞} (x − µ)² f(x) dx = σ²
⟹ ∫_{−∞}^{µ−kσ} (x − µ)² f(x) dx + ∫_{µ−kσ}^{µ+kσ} (x − µ)² f(x) dx + ∫_{µ+kσ}^{∞} (x − µ)² f(x) dx = σ²
⟹ ∫_{−∞}^{µ−kσ} (x − µ)² f(x) dx + ∫_{µ+kσ}^{∞} (x − µ)² f(x) dx ≤ σ² (∵ ∫_{µ−kσ}^{µ+kσ} (x − µ)² f(x) dx ≥ 0)
⟹ ∫_{−∞}^{µ−kσ} k²σ² f(x) dx + ∫_{µ+kσ}^{∞} k²σ² f(x) dx ≤ σ² (∵ (x − µ)² ≥ k²σ² for x ≤ µ − kσ or x ≥ µ + kσ)
⟹ ∫_{−∞}^{µ−kσ} f(x) dx + ∫_{µ+kσ}^{∞} f(x) dx ≤ 1/k²
⟹ 1 − ∫_{−∞}^{µ−kσ} f(x) dx − ∫_{µ+kσ}^{∞} f(x) dx ≥ 1 − 1/k² (∵ ∫_{−∞}^{∞} f(x) dx = 1)
⟹ ∫_{µ−kσ}^{µ+kσ} f(x) dx ≥ 1 − 1/k²
⟹ P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²
⟹ P(|X − µ| < kσ) ≥ 1 − 1/k².

Note that Chebyshev's inequality does not yield the exact probability of X lying in the interval (µ − kσ, µ + kσ); rather, it gives a minimum probability. However, in the case of a normal random variable, the probability obtained is exact. For example, consider the 2σ interval (µ − 2σ, µ + 2σ) for X. Chebyshev's inequality gives P(|X − µ| < 2σ) ≥ 1 − 1/4 = 0.75, whereas if X is a normal variable, we get the exact probability P(|X − µ| < 2σ) = 0.9544. The advantage of Chebyshev's inequality is that it applies to any random variable of known mean and variance. Also note that the above proof may be done for a discrete random variable as well, so Chebyshev's inequality is true for discrete random variables too.
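The gap between the Chebyshev bound and an exact probability is easy to see numerically; a minimal sketch assuming SciPy, using the normal distribution for the exact value:

from scipy import stats

for k in (2, 3, 4):
    bound = 1 - 1 / k**2                             # Chebyshev lower bound
    exact = stats.norm.cdf(k) - stats.norm.cdf(-k)   # exact value if X is normal
    print(k, bound, exact)   # e.g., k = 2: bound 0.75 vs exact ~0.9544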
Ex. A random variable X with unknown probability distribution has mean 8 and S.D. 3. Use Chebyshev's inequality to find a lower bound of P(−7 < X < 23).

Sol. Here µ = 8 and σ = 3. So by Chebyshev's inequality, we have
P(8 − 3k < X < 8 + 3k) ≥ 1 − 1/k².
In order to get a lower bound of P(−7 < X < 23), we choose k = 5. We get
P(−7 < X < 23) ≥ 1 − 1/25 = 0.96.

Ex. The number of students visiting a zoo on a weekend is a random variable with mean 18 and S.D. 2.5. Use Chebyshev's inequality to estimate the minimum probability that between 8 and 28 students will visit the zoo on a given weekend.

Sol. Let X be the number of students visiting the zoo on a weekend. Then the mean and S.D. of X are µ = 18 and σ = 2.5, respectively. So by Chebyshev's inequality, we have
P(18 − 2.5k < X < 18 + 2.5k) ≥ 1 − 1/k².
Choosing k = 4, we get
P(8 < X < 28) ≥ 1 − 1/16 = 0.9375.
So the required minimal probability is 0.9375.

Ex. If X is a geometric random variable with density f(x) = 1/2ˣ, (x = 1, 2, 3, ...), find P(|X − 2| < 2). Also, use Chebyshev's inequality to estimate P(|X − 2| < 2).

Sol. We have
P(|X − 2| < 2) = P(−2 < X − 2 < 2) = P(0 < X < 4)
 = P(X = 1) + P(X = 2) + P(X = 3) = 1/2 + 1/2² + 1/2³ = 7/8 = 0.875.
Now X is a geometric random variable with p = 1/2, so its mean is µ = 1/p = 2 and its S.D. is σ = √q/p = √2. So by Chebyshev's inequality, we have
P(2 − √2 k < X < 2 + √2 k) ≥ 1 − 1/k².
Choosing k = √2, we get
P(0 < X < 4) ≥ 1 − 1/2 = 0.5.

Ex. How many times should we toss a fair coin so that at least 0.99 probability is ensured that the proportion of heads lies between 0.45 and 0.55?

Sol. Let n be the number of tosses, and X the number of heads. Then the proportion of heads is X/n. So we need to determine n such that P(0.45 < X/n < 0.55) ≥ 0.99. Here X follows a binomial distribution with p = 0.5, mean µ = np = 0.5n and S.D. σ = √(npq) = 0.5√n. So by Chebyshev's inequality, we have
P(0.5n − k(0.5)√n < X < 0.5n + k(0.5)√n) ≥ 1 − 1/k².
Choosing k = 0.1√n, we get
P(0.45n < X < 0.55n) ≥ 1 − 100/n
⟹ P(0.45 < X/n < 0.55) ≥ 1 − 100/n.
So the required condition P(0.45 < X/n < 0.55) ≥ 0.99 is satisfied if 1 − 100/n ≥ 0.99, that is, if n ≥ 10000.

Ex. If X is a gamma random variable with α = 0.05 and β = 100, find an upper bound on P((X − 4)(X − 6) ≥ 999).

Sol. We have
P((X − 4)(X − 6) ≥ 999)
 = P((X − 5 + 1)(X − 5 − 1) ≥ 999)
 = P((X − 5)² − 1 ≥ 999)
 = P((X − 5)² ≥ 1000)
 = 1 − P((X − 5)² < 1000)
 = 1 − P(−10√10 < X − 5 < 10√10)
 = 1 − P(5 − 10√10 < X < 5 + 10√10).
Given that X is a gamma random variable with α = 0.05 and β = 100, its mean is µ = αβ = 0.05(100) = 5, and its S.D. is σ = √(αβ²) = √(0.05(100)²) = 10√5. So by Chebyshev's inequality, we have
P(5 − 10√5 k < X < 5 + 10√5 k) ≥ 1 − 1/k².
Choosing k = √2, we get
P(5 − 10√10 < X < 5 + 10√10) ≥ 1 − 1/2 = 0.5.
It follows that
P((X − 4)(X − 6) ≥ 999) = 1 − P(5 − 10√10 < X < 5 + 10√10) ≤ 1 − 0.5 = 0.5.

3.5.2 Approximation of Binomial distribution by Normal distribution

If X is a binomial random variable with parameters n and p, then X approximately follows a normal distribution with mean np and variance np(1 − p), provided n is large. Here the word "large" is quite vague. In the strict mathematical sense, large n means n → ∞. However, for most practical purposes, the approximation is acceptable if the values of n and p are such that either p ≤ 0.5 and np > 5, or p > 0.5 and n(1 − p) > 5.

It turns out that the normal distribution with µ = np and σ² = np(1 − p) not only provides a very accurate approximation to the binomial distribution when n is large and p is not extremely close to 0 or 1, but also provides a fairly good approximation even when n is small and p is reasonably close to 0.5.

Ex. To illustrate the normal approximation to the binomial distribution, in Figure 3.14 we first draw the histogram for a binomial distribution with n = 4 and p = 0.5 and then superimpose the particular normal curve having the same mean and variance as the binomial variable X. Hence, we draw a normal curve with µ = np = (4)(0.5) = 2 and σ² = npq = (4)(0.5)(0.5) = 1.

Now suppose we wish to calculate P(1 ≤ X ≤ 3). From the binomial distribution, we have

P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 4/16 + 6/16 + 4/16 = 14/16 = 0.875.

Note that geometrically in the histogram, 4/16 + 6/16 + 4/16 is the sum of the areas of the vertical bars, each of width unity, with centers at 1, 2, 3. We see that it can be approximated by the area under the blue curve from X = 0.5 to X = 3.5. So, using the normal distribution approximation, we have

P(0.5 ≤ X ≤ 3.5) = P(−1.5 ≤ Z ≤ 1.5) = F(1.5) − F(−1.5) = 0.9332 − 0.0668 = 0.8664,

which is a good approximation to the binomial probability 0.875. Note that while approximating the probability using the normal distribution, we make a half-unit correction on both sides of the given range of X.

Figure 3.14: Histogram of the binomial distribution with n = 4, p = 0.5, where X takes the values 0, 1, 2, 3, 4 with probabilities 1/16, 4/16, 6/16, 4/16 and 1/16, respectively. The blue curve is the normal probability curve with µ = np = (4)(0.5) = 2 and σ² = npq = (4)(0.5)(0.5) = 1.
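The n = 4 illustration can be verified directly; a minimal sketch assuming SciPy:

from scipy import stats

# Exact binomial probability P(1 <= X <= 3) for n = 4, p = 0.5.
print(stats.binom.pmf([1, 2, 3], n=4, p=0.5).sum())                  # 0.875

# Normal approximation with the half-unit correction (mu = 2, sigma = 1).
print(stats.norm.cdf(3.5, loc=2, scale=1) - stats.norm.cdf(0.5, loc=2, scale=1))   # ~0.8664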
Ex. The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known to have contracted this disease, what is the probability that fewer than 30 survive?

Sol. Here µ = np = (100)(0.4) = 40 and σ = √(npq) = √((100)(0.4)(0.6)) = 4.899. We need the area to the left of x = 29.5. The corresponding z value is z = (29.5 − 40)/4.899 = −2.14. Therefore, the required probability is P(X < 30) ≈ P(Z < −2.14) = 0.0162.

Note. In the above example, the binomial random variable X < 30 implies that X takes the values 0, 1, 2, ..., 29. So in the normal approximation, we should have chosen P(−0.5 ≤ X ≤ 29.5). But notice that P(−0.5 ≤ X ≤ 29.5) ≈ P(X < 30), since P(−8.27 ≤ Z ≤ −2.14) ≈ P(Z < −2.14).
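The blood-disease example can also be checked against the exact binomial probability; a minimal sketch assuming SciPy:

import math
from scipy import stats

n, p = 100, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))    # 40 and ~4.899

print(stats.binom.cdf(29, n, p))                 # exact P(X < 30): ~0.0148
print(stats.norm.cdf((29.5 - mu) / sigma))       # normal approx with correction: ~0.0162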
Ex. A multiple-choice quiz has 200 questions, each with 4 possible answers of which only 1 is correct. What is the probability that sheer guesswork yields from 25 to 30 correct answers for the 80 of the 200 problems about which the student has no knowledge?

Sol. Here µ = np = (80)(0.25) = 20 and σ = √(npq) = √((80)(0.25)(0.75)) = 3.873. We need the area between x₁ = 24.5 and x₂ = 30.5. The corresponding z values are z₁ = (24.5 − 20)/3.873 = 1.16 and z₂ = (30.5 − 20)/3.873 = 2.71. Therefore, the required probability is P(25 ≤ X ≤ 30) ≈ P(1.16 < Z < 2.71) = P(Z < 2.71) − P(Z < 1.16) = 0.9966 − 0.8770 = 0.1196.

3.5.3 Approximation of Poisson distribution by Normal distribution

Let X be a Poisson random variable with parameter λs. Then for large λs, X is approximately normal with mean λs and variance λs. It follows that the Poisson probabilities can be approximated by the normal distribution N(λs, λs), using the 0.5 unit correction on both sides of the given range of X, as we did in the binomial case.

3.6 Student t-Distribution

If Z is a standard normal variable and χ²_ν is an independent chi-squared random variable with ν degrees of freedom, then the random variable T_ν = Z/√(χ²_ν/ν) is said to follow the Student t-distribution² with ν degrees of freedom.

The density function of a T_ν random variable reads as

f(t) = Γ[(ν + 1)/2]/(Γ(ν/2)√(πν)) (1 + t²/ν)^(−(ν+1)/2), −∞ < t < ∞.

Since f(−t) = f(t), the graph of this density function is symmetric about the line t = 0 (see Figure 3.15). Further, its mean is µ_t = 0 and its variance is σ²_t = ν/(ν − 2), (ν > 2). So σ²_t tends to 1 as ν tends to ∞. Thus the t-distribution tends to the standard normal distribution as the number of degrees of freedom ν increases.

Figure 3.15: The Student t-distributions are plotted for some specific degrees of freedom (ν = 1, 2, 3, 4). We see that the t-distribution is symmetric about the vertical line t = 0.

The t-distribution is used extensively in problems that deal with inference about the population mean or in problems that involve comparative samples (i.e., in cases where one is trying to determine if the means from two samples are significantly different).

²The probability distribution of T was first published in 1908 in a paper written by W. S. Gosset. At the time, Gosset was employed by an Irish brewery that prohibited publication of research by members of its staff. To circumvent this restriction, he published his work secretly under the name "Student". Consequently, the distribution of T is usually called the Student t-distribution or simply the t-distribution.

How to see values from the t-distribution table

Table 3.23 gives values of t_α for various values of α and ν. The areas, α, are the column headings; the degrees of freedom, ν, are given in the first column; and the table entries are the t values. It is customary to let t_α represent the t-value above which we find an area equal to α. For example, the t-value with 10 degrees of freedom leaving an area of 0.1 to the right is t_{0.1} = 1.372, as shown in Figure 3.16.

Ex. Find P(0.26 ≤ T ≤ 1.812) given the t-distribution with 10 degrees of freedom.

Figure 3.16: The shaded golden region area is α = 0.1. It is the area under the t-distribution curve with 10 degrees of freedom for t ≥ t_{0.1} = 1.372.

Sol. From Table 3.23, we see that t_{0.4} = 0.26 and t_{0.05} = 1.812. It follows that P(0.26 ≤ T ≤ 1.812) = P(t_{0.4} ≤ T ≤ t_{0.05}) = P(T ≥ t_{0.4}) − P(T ≥ t_{0.05}) = 0.4 − 0.05 = 0.35, as shown in Figure 3.17.

Figure 3.17: The shaded golden region area is P(0.26 ≤ T ≤ 1.812) = P(t_{0.4} ≤ T ≤ t_{0.05}) = P(T ≥ t_{0.4}) − P(T ≥ t_{0.05}) = 0.4 − 0.05 = 0.35.

3.6.1 Symmetry of the t-distribution

Since the t-distribution is symmetric about the mean zero, we have t_{1−α} = −t_α; that is, the t-value leaving an area of 1 − α to the right, and therefore an area of α to the left, is equal to the negative t-value that leaves an area of α in the right tail of the distribution. For example, t_{0.95} = −t_{0.05}. In particular, t_{0.95} = −t_{0.05} = −1.812 for 10 degrees of freedom, as shown in Figure 3.18.

Figure 3.18: t_{0.95} = −t_{0.05} = −1.812 for 10 degrees of freedom.
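The t-table entries used above can be reproduced numerically; a minimal sketch assuming SciPy (t_α is the (1 − α) quantile):

from scipy import stats

print(stats.t.ppf(0.90, df=10))   # t_{0.1} for 10 df: ~1.372
print(stats.t.ppf(0.95, df=10))   # t_{0.05} for 10 df: ~1.812
print(stats.t.ppf(0.60, df=10))   # t_{0.4} for 10 df: ~0.260
print(stats.t.ppf(0.05, df=10))   # symmetry: t_{0.95} = -t_{0.05} ~ -1.812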
3.7 F-distribution

Let χ²₁ and χ²₂ be two independent random variables having chi-squared distributions with ν₁ and ν₂ degrees of freedom, respectively. Then the distribution of the random variable

F = (χ²₁/ν₁)/(χ²₂/ν₂)

is given by the density function

h(f) = Γ[(ν₁ + ν₂)/2] (ν₁/ν₂)^(ν₁/2) / (Γ(ν₁/2)Γ(ν₂/2)) · f^(ν₁/2 − 1)/(1 + ν₁f/ν₂)^((ν₁+ν₂)/2), f > 0,

and h(f) = 0 for f ≤ 0. This is known as the F-distribution with ν₁ and ν₂ degrees of freedom.

Figure 3.19: The F-distributions are plotted for some specific degrees of freedom ((ν₁, ν₂) = (2, 11), (6, 10), (10, 3)).

The F-distribution is used in two-sample situations to draw inferences about the population variances. However, the F-distribution can also be applied to many other types of problems involving sample variances. In fact, the F-distribution is called the variance ratio distribution.

Some properties of F-distribution

(i) The mean of the F-distribution is µ_f = ν₂/(ν₂ − 2), (ν₂ > 2). It is independent of ν₁.

(ii) The variance of the F-distribution is σ²_f = 2ν₂²(ν₁ + ν₂ − 2)/(ν₁(ν₂ − 2)²(ν₂ − 4)), (ν₂ > 4).

(iii) The t-distribution and F-distribution are related. If a statistic t follows the t-distribution with ν degrees of freedom, then t² follows the F-distribution with (1, ν) degrees of freedom.

(iv) For large values of the degrees of freedom ν₁ and ν₂, the F-distribution tends to the normal distribution with mean 1 and variance 2(1/ν₁ + 1/ν₂).

How to see values from the F-distribution table

Let f_α be the f-value above which we find an area equal to α, just as in the case of the t-distribution. Table 3.24 gives values of f_α only for α = 0.05, for various combinations of the degrees of freedom ν₁ and ν₂. Hence, the f-value with 6 and 10 degrees of freedom, leaving an area of 0.05 to the right, is f_{0.05} = 3.22, as shown in Figure 3.20.

Figure 3.20: The shaded golden region area is α = 0.05. It is the area under the F-distribution curve with 6 and 10 degrees of freedom for f ≥ f_{0.05} = 3.22.

Likewise, Table 3.25 gives values of f_α only for α = 0.01, for various combinations of the degrees of freedom ν₁ and ν₂.

By means of the following theorem, the F-distribution tables can also be used to find values of f_{0.95} and f_{0.99}:

f_{1−α}(ν₁, ν₂) = 1/f_α(ν₂, ν₁).

Thus, the f-value with 6 and 10 degrees of freedom, leaving an area of 0.95 to the right, can be calculated as

f_{0.95}(6, 10) = 1/f_{0.05}(10, 6) = 1/4.06 = 0.246.

Figure 3.21: The Normal Table. Figure 3.22: The χ² Table. Figure 3.23: The t-distribution Table. Figure 3.24: The F-distribution Table (α = 0.05). Figure 3.25: The F-distribution Table (α = 0.01).
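The reciprocal relation for F-values is easy to verify numerically; a minimal sketch assuming SciPy:

from scipy import stats

print(stats.f.ppf(0.95, 6, 10))        # f_{0.05}(6, 10): ~3.22
print(stats.f.ppf(0.05, 6, 10))        # f_{0.95}(6, 10): ~0.246
print(1 / stats.f.ppf(0.95, 10, 6))    # 1 / f_{0.05}(10, 6) = 1/4.06: ~0.246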
Chapter 4: Joint Probability Distributions

Note: These lecture notes aim to present a clear and crisp presentation of some topics in Probability and Statistics. Comments/suggestions are welcome via the e-mail: sukuyd@gmail.com to Dr. Suresh Kumar.

Contents
4 Joint Probability Distributions
  4.1 Discrete Bivariate Random Variable
    4.1.1 Joint probability mass function
    4.1.2 Cumulative distribution function
    4.1.3 Marginal density functions
    4.1.4 Expectation
    4.1.5 Covariance
    4.1.6 Independent random variables
  4.2 Continuous Bivariate Random Variable
    4.2.1 Joint probability density function
    4.2.2 Distribution function
    4.2.3 Marginal density functions
    4.2.4 Independent random variables
    4.2.5 Expectation
    4.2.6 Covariance
  4.3 Pearson coefficient of correlation

4.1 Discrete Bivariate Random Variable

So far we have studied a single random variable, either discrete or continuous. Such random variables are called univariate. Problems do arise where we need to study two random variables simultaneously. For example, we may wish to study the heights and weights of a group of students up to the age of 20 years. Typical questions to ask are, "What is the average height of students of age less than or equal to 18 years?" or, "Is the height independent of weight?". To answer this type of question, we need to study what are called two-dimensional or bivariate random variables.

Let X and Y be two discrete random variables. Then the ordered pair (X, Y) is called a two-dimensional or bivariate discrete random variable.

4.1.1 Joint probability mass function

A function f such that

f(x, y) ≥ 0,  f(x, y) = P[X = x, Y = y],  Σ_x Σ_y f(x, y) = 1

is called the joint probability mass function of (X, Y).

Suppose X assumes the m values x₁, x₂, ..., x_m and Y assumes the n values y₁, y₂, ..., y_n. Then it is convenient and informative to write the joint distribution in the following tabular form:

X/Y | y₁ | y₂ | ... | y_n
x₁ | f(x₁, y₁) | f(x₁, y₂) | ... | f(x₁, y_n)
x₂ | f(x₂, y₁) | f(x₂, y₂) | ... | f(x₂, y_n)
... | ... | ... | ... | ...
x_m | f(x_m, y₁) | f(x_m, y₂) | ... | f(x_m, y_n)

4.1.2 Cumulative distribution function

The cdf of (X, Y) is given by

F(x, y) = P[X ≤ x, Y ≤ y] = Σ_{X≤x} Σ_{Y≤y} f(x, y).

4.1.3 Marginal density functions

The marginal density of X, denoted by f_X, is defined as

f_X(x) = Σ_y f(x, y).

Similarly, the marginal density of Y, denoted by f_Y, is defined as

f_Y(y) = Σ_x f(x, y).

4.1.4 Expectation

The expectation or mean of X is defined as

µ_X = E[X] = Σ_x Σ_y x f(x, y) = Σ_x x Σ_y f(x, y) = Σ_x x f_X(x).

In general, the expectation of a function of X and Y, say H(X, Y), is defined as

E[H(X, Y)] = Σ_x Σ_y H(x, y) f(x, y).

4.1.5 Covariance

If µ_X and µ_Y are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − E[X]E[Y] = Σ_x Σ_y xy f(x, y) − Σ_x x f_X(x) Σ_y y f_Y(y).

4.1.6 Independent random variables

The discrete random variables X and Y are said to be independent if and only if

f(x, y) = f_X(x) f_Y(y) for all (x, y).

Ex. Suppose in a random experiment of tossing two fair coins, X denotes the number of heads and Y denotes the number of tails.
(i) Find the joint density f(x, y) of (X, Y).
(ii) Find the cdf F(x, y).
(iii) Find the marginal densities f_X(x) and f_Y(y).
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).

Sol. (i) The sample space of the given random experiment is

S = {HH, HT, TH, TT}.

Given that X denotes the number of heads and Y denotes the number of tails, X = 0, 1, 2 and Y = 0, 1, 2. The joint pmf f(x, y) is given by
f(0, 0) = P[X = 0, Y = 0] = 0,
f(0, 1) = P[X = 0, Y = 1] = 0,
f(0, 2) = P[X = 0, Y = 2] = 1/4,

f(1, 0) = P[X = 1, Y = 0] = 0,
f(1, 1) = P[X = 1, Y = 1] = 1/2,
f(1, 2) = P[X = 1, Y = 2] = 0,
f(2, 0) = P[X = 2, Y = 0] = 1/4,
f(2, 1) = P[X = 2, Y = 1] = 0,
f(2, 2) = P[X = 2, Y = 2] = 0.

So in tabular form, f(x, y) is given by

X/Y | 0 | 1 | 2
0 | 0 | 0 | 1/4
1 | 0 | 1/2 | 0
2 | 1/4 | 0 | 0

(ii) The cdf F(x, y) = P[X ≤ x, Y ≤ y] is given by
F(0, 0) = f(0, 0) = 0,
F(0, 1) = f(0, 0) + f(0, 1) = 0,
F(0, 2) = f(0, 0) + f(0, 1) + f(0, 2) = 1/4,
F(1, 0) = f(0, 0) + f(1, 0) = 0,
F(1, 1) = f(0, 0) + f(0, 1) + f(1, 0) + f(1, 1) = 1/2,
F(1, 2) = f(0, 0) + f(0, 1) + f(0, 2) + f(1, 0) + f(1, 1) + f(1, 2) = 3/4,
F(2, 0) = f(0, 0) + f(1, 0) + f(2, 0) = 1/4,
F(2, 1) = f(0, 0) + f(0, 1) + f(1, 0) + f(1, 1) + f(2, 0) + f(2, 1) = 3/4,
F(2, 2) = 1.

So in tabular form, F(x, y) is given by

X/Y | 0 | 1 | 2
0 | 0 | 0 | 1/4
1 | 0 | 1/2 | 3/4
2 | 1/4 | 3/4 | 1

(iii) The marginal density f_X(x) is given by
f_X(0) = f(0, 0) + f(0, 1) + f(0, 2) = 1/4,
f_X(1) = f(1, 0) + f(1, 1) + f(1, 2) = 1/2,
f_X(2) = f(2, 0) + f(2, 1) + f(2, 2) = 1/4.

The marginal density f_Y(y) is given by
f_Y(0) = f(0, 0) + f(1, 0) + f(2, 0) = 1/4,
f_Y(1) = f(0, 1) + f(1, 1) + f(2, 1) = 1/2,
f_Y(2) = f(0, 2) + f(1, 2) + f(2, 2) = 1/4.

The marginal densities f_X(x) and f_Y(y) are easier to calculate and display with the joint density f(x, y) as follows:

X/Y | 0 | 1 | 2 | f_X(x)
0 | 0 | 0 | 1/4 | 1/4
1 | 0 | 1/2 | 0 | 1/2
2 | 1/4 | 0 | 0 | 1/4
f_Y(y) | 1/4 | 1/2 | 1/4 | 1

(iv) From the table of joint and marginal densities, we find that f(0, 0) = 0, f_X(0) = 1/4 and f_Y(0) = 1/4. It follows that f(0, 0) ≠ f_X(0)f_Y(0). So X and Y are not independent.

(v) We find
E[X] = Σ_{x=0}^{2} x f_X(x) = 1,
E[Y] = Σ_{y=0}^{2} y f_Y(y) = 1,
E[XY] = Σ_{x=0}^{2} Σ_{y=0}^{2} xy f(x, y) = 1/2.

For an easy calculation of E[XY], write the table showing all the XY product values:

X/Y | 0 | 1 | 2
0 | 0 | 0 | 0
1 | 0 | 1 | 2
2 | 0 | 2 | 4

Multiply these values with the corresponding f(x, y) values and add to get E[XY].

Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 1/2 − (1)(1) = −1/2.
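All five parts of this example can be checked with a few array operations; a minimal sketch assuming NumPy:

import numpy as np

x = np.array([0, 1, 2])                # number of heads
y = np.array([0, 1, 2])                # number of tails
f = np.array([[0, 0, 1/4],
              [0, 1/2, 0],
              [1/4, 0, 0]])            # f[i, j] = P(X = x[i], Y = y[j])

fx, fy = f.sum(axis=1), f.sum(axis=0)  # marginals: both [1/4, 1/2, 1/4]
ex, ey = x @ fx, y @ fy                # E[X] = E[Y] = 1
exy = x @ f @ y                        # E[XY] = sum over i, j of x_i y_j f_ij = 1/2
print(exy - ex * ey)                   # Cov(X, Y) = -0.5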
Ex. Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens, and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens selected, find
(a) the joint probability mass function f(x, y),
(b) P[X + Y ≤ 1].

Sol. (a) We have

f(x, y) = C(3, x) C(2, y) C(3, 2 − x − y)/C(8, 2), x = 0, 1, 2; y = 0, 1, 2; 0 ≤ x + y ≤ 2.

(b) P[X + Y ≤ 1] = f(0, 0) + f(0, 1) + f(1, 0) = 3/28 + 6/28 + 9/28 = 18/28 = 9/14.

Ex. In an automobile plant, two tasks are performed by robots: the welding of two joints and the tightening of three bolts. Let X denote the number of defective joints and Y the number of improperly tightened bolts produced per car. The probabilities of (X, Y) are given in the following table.

X/Y | 0 | 1 | 2 | 3 | f_X(x)
0 | 0.84 | 0.03 | 0.02 | 0.01 | 0.9
1 | 0.06 | 0.01 | 0.008 | 0.002 | 0.08
2 | 0.01 | 0.005 | 0.004 | 0.001 | 0.02
f_Y(y) | 0.91 | 0.045 | 0.032 | 0.013 | 1

(i) Is it a joint pmf?
(ii) Find the probability that there would be exactly one error made by the robots.
(iii) Find the probability that there would be no improperly tightened bolts.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).

Sol. (i) We have

Σ_{x=0}^{2} Σ_{y=0}^{3} f(x, y) = 0.84 + 0.03 + 0.02 + 0.01 + 0.06 + 0.01 + 0.008 + 0.002 + 0.01 + 0.005 + 0.004 + 0.001 = 1.

This shows that f is a pmf.

(ii) The probability that there would be exactly one error made by the robots is given by

P[X = 1, Y = 0] + P[X = 0, Y = 1] = f(1, 0) + f(0, 1) = 0.06 + 0.03 = 0.09.

(iii) The probability that there would be no improperly tightened bolts reads as

P[Y = 0] = Σ_{x=0}^{2} f(x, 0) = f(0, 0) + f(1, 0) + f(2, 0) = 0.84 + 0.06 + 0.01 = 0.91.

It is the marginal density f_Y(y) of Y at y = 0, that is, f_Y(0) = 0.91.

(iv) From the given table, we notice that f(0, 0) = 0.84, f_X(0) = 0.9 and f_Y(0) = 0.91. So we have f_X(0)f_Y(0) = 0.819 ≠ f(0, 0).
This shows that X and Y are not independent.

(v) We find
E[X] = Σ_{x=0}^{2} x f_X(x) = 0.12,
E[Y] = Σ_{y=0}^{3} y f_Y(y) = 0.148,
E[XY] = Σ_{x=0}^{2} Σ_{y=0}^{3} xy f(x, y) = 0.064.
Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 0.046.

Ex. Suppose X and Y are two discrete random variables taking only integer values. The joint density function of (X, Y) is

f(x, y) = c/[n(n + 1)], 1 ≤ y ≤ x ≤ n, where n is some positive integer.

(i) Find the value of c.
(ii) Find the marginal densities.
(iii) Given that n = 5, evaluate P[X ≤ 3, Y ≤ 2].

Sol. (i) Using Σ_{x=1}^{n} Σ_{y=1}^{x} f(x, y) = 1, we find

Σ_{x=1}^{n} Σ_{y=1}^{x} c/[n(n + 1)] = 1
⟹ (c/[n(n + 1)]) Σ_{x=1}^{n} x = 1
⟹ (c/[n(n + 1)]) · n(n + 1)/2 = 1
⟹ c = 2.

(ii) We have

f_X(x) = Σ_{y=1}^{x} 2/[n(n + 1)] = 2x/[n(n + 1)], 1 ≤ x ≤ n,
f_Y(y) = Σ_{x=y}^{n} 2/[n(n + 1)] = 2(n − y + 1)/[n(n + 1)], 1 ≤ y ≤ n.

(iii) Given that n = 5, f(x, y) = 1/15, and therefore

P[X ≤ 3, Y ≤ 2] = Σ_{y=1}^{2} Σ_{x=y}^{3} 1/15 = (1/15) Σ_{y=1}^{2} (3 − y + 1) = (1/15)(3 + 2) = 1/3.

4.2 Continuous Bivariate Random Variable

Let X and Y be two continuous random variables. Then the ordered pair (X, Y) is called a two-dimensional or bivariate continuous random variable.

4.2.1 Joint probability density function

A function f such that

f(x, y) ≥ 0,  P[(X, Y) ∈ R] = ∬_R f(x, y) dx dy,  ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1,

where R is any region in the domain of f, is called the joint probability density function of (X, Y).

4.2.2 Distribution function

The distribution function of (X, Y) is given by

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du.

4.2.3 Marginal density functions

The marginal density of X, denoted by f_X, is defined as

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy.

Similarly, the marginal density of Y, denoted by f_Y, is defined as

f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx.

4.2.4 Independent random variables

The continuous random variables X and Y are said to be independent if and only if

f(x, y) = f_X(x) f_Y(y).

4.2.5 Expectation

The expectation or mean of X is defined as

E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy = µ_X.

In general, the expectation of a function of X and Y, say H(X, Y), is defined as

E[H(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) f(x, y) dx dy.

4.2.6 Covariance

If µ_X and µ_Y are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − E[X]E[Y].

Ex. Let X denote a person's blood calcium level and Y the blood cholesterol level. The joint density function of (X, Y) is

f(x, y) = k for 8.5 ≤ x ≤ 10.5, 120 ≤ y ≤ 240, and f(x, y) = 0 elsewhere.

(i) Find the value of k.
(ii) Find the marginal densities of X and Y.
(iii) Find the probability that a healthy person has a cholesterol level between 150 and 200.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).

Sol. (i) f(x, y) being a joint pdf, we have

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} k dx dy = 240k.

So k = 1/240 and f(x, y) = 1/240.

(ii) The marginal density of X is

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{120}^{240} (1/240) dy = 1/2, 8.5 ≤ x ≤ 10.5.

Similarly, the marginal density of Y is

f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{8.5}^{10.5} (1/240) dx = 1/120, 120 ≤ y ≤ 240.

(iii) The probability that a healthy person has a cholesterol level between 150 and 200 is

P[150 ≤ Y ≤ 200] = ∫_{150}^{200} f_Y(y) dy = 50/120 = 5/12.

(iv) We have

f_X(x) f_Y(y) = (1/2) × (1/120) = 1/240 = f(x, y).

This shows that X and Y are independent.

(v) We find

E[X] = ∫_{120}^{240} ∫_{8.5}^{10.5} (x/240) dx dy = 9.5,
E[Y] = ∫_{120}^{240} ∫_{8.5}^{10.5} (y/240) dx dy = 180,
E[XY] = ∫_{120}^{240} ∫_{8.5}^{10.5} (xy/240) dx dy = 1710.

Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 1710 − 9.5 × 180 = 0.
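The three double integrals above can be confirmed with numerical quadrature; a minimal sketch assuming SciPy (dblquad integrates over y first, then x):

from scipy import integrate

f = 1 / 240                                             # constant joint density
ex, _  = integrate.dblquad(lambda y, x: x * f, 8.5, 10.5, 120, 240)
ey, _  = integrate.dblquad(lambda y, x: y * f, 8.5, 10.5, 120, 240)
exy, _ = integrate.dblquad(lambda y, x: x * y * f, 8.5, 10.5, 120, 240)
print(ex, ey, exy - ex * ey)                            # 9.5, 180.0, Cov ~ 0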
Ex. The joint density function of (X, Y) is

f(x, y) = c/x for 27 ≤ y ≤ x ≤ 33, and f(x, y) = 0 elsewhere.

(i) Find the value of c.
(ii) Find the marginal densities and hence check the independence of X and Y.
(iii) Evaluate P[X ≤ 32, Y ≤ 30].

Sol. (i) Here the given range of (X, Y) is the triangular region common to the three regions given by the inequalities y ≥ 27, y ≤ x and x ≤ 33, as shown in Figure 4.1.

Figure 4.1: The shaded golden region is the triangular region given by the inequalities y ≥ 27, y ≤ x and x ≤ 33. The vertical ray enters the given region through the line y = 27 and leaves at the line y = x. The x-value at the leftmost point (27, 27) of the region is x = 27, and at the rightmost points (all points on the line x = 33) is x = 33.

Considering a vertical ray through the given region, we find that the x limits are from x = 27 to x = 33, and the y limits are from y = 27 to y = x. Therefore, to find c, we use

∫_{27}^{33} ∫_{27}^{x} f(x, y) dy dx = 1,

and we get

c = 1/(6 − 27 ln(33/27)).

(ii) f_X(x) = ∫_{y=27}^{y=x} (c/x) dy = c(1 − 27/x), 27 ≤ x ≤ 33,
f_Y(y) = ∫_{x=y}^{x=33} (c/x) dx = c(ln 33 − ln y), 27 ≤ y ≤ 33.

We observe that f(x, y) = c/x ≠ f_X(x) f_Y(y). So X and Y are not independent.

(iii) To calculate the probability P[X ≤ 32, Y ≤ 30], we need to integrate the joint density over the shaded golden region shown in Figure 4.2. Considering a horizontal ray through this region, we find that the x limits are from x = y to x = 32, and the y limits are from y = 27 to y = 30.

∴ P[X ≤ 32, Y ≤ 30] = ∫_{27}^{30} ∫_{y}^{32} (c/x) dx dy = c(3 ln 32 + 3 − 30 ln 30 + 27 ln 27).

Figure 4.2: The shaded golden region is given by X ≤ 32, Y ≤ 30. The horizontal ray enters this region through the line x = y and leaves at the line x = 32. The y-value at the bottommost points of the region is y = 27, and at the uppermost points is y = 30.

4.3 Pearson coefficient of correlation

If X and Y are two random variables with means µ_X, µ_Y and variances σ²_X and σ²_Y, then the correlation between X and Y is given by

ρ_XY = Cov(X, Y)/(σ_X σ_Y) = σ_XY/(σ_X σ_Y).

It can be proved that ρ_XY lies in the range [−1, 1]. Further, |ρ_XY| = 1 if and only if Y = a + bX for some real numbers a and b ≠ 0.

Figure 4.3: In case of large negative covariance, we have ρ_XY ≈ −1. In case of nearly zero covariance, ρ_XY ≈ 0, while in case of very large positive covariance, ρ_XY ≈ 1.

Theorem: If X and Y are two independent random variables with joint density f, then show that E[XY] = E[X]E[Y], that is, Cov(X, Y) = 0.

Proof. We have

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f_X(x) f_Y(y) dx dy (∵ f(x, y) = f_X(x)f_Y(y) as X and Y are given independent)
 = ∫_{−∞}^{∞} y f_Y(y) (∫_{−∞}^{∞} x f_X(x) dx) dy
 = ∫_{−∞}^{∞} y f_Y(y) E[X] dy
 = E[X] ∫_{−∞}^{∞} y f_Y(y) dy
 = E[X]E[Y].

Note. The converse of the above result need not be true; that is, if E[XY] = E[X]E[Y], then X and Y need not be independent. For instance, see the following table for the joint density function of a two-dimensional discrete random variable (X, Y):

X/Y | −2 | −1 | 1 | 2 | f_X(x)
1 | 0 | 1/4 | 1/4 | 0 | 1/2
4 | 1/4 | 0 | 0 | 1/4 | 1/2
f_Y(y) | 1/4 | 1/4 | 1/4 | 1/4 | 1

We find that E[X] = 5/2, E[Y] = 0 and E[XY] = 0. So E[XY] = E[X]E[Y]. Next, we see that f_X(1) = 1/2, f_Y(−2) = 1/4 and f(1, −2) = 0. So f_X(1)f_Y(−2) ≠ f(1, −2), and hence X and Y are not independent. We can easily observe the dependency X = Y². Thus, covariance does not describe the type or strength of the association between X and Y except the linear relationship, via a measure known as the Pearson coefficient of correlation.

Note that if ρ_XY = 0, we say that X and Y are uncorrelated (meaning no linear relationship). It does not imply that X and Y are unrelated. Of course, the relationship, if it exists, would not be linear.

In the robots example, σ²_X = 0.146, σ²_Y = 0.268, Cov(X, Y) = 0.046 and therefore ρ_XY = 0.23.

Remarks: (i) Covariance tells us how the two variables vary together. It can vary from −∞ to ∞. On the other hand, correlation tells about the degree of linear relationship between the two variables. It can vary from −1 to 1.

(ii) The covariance matrix of X and Y is written as

[ σ²_X  σ_XY ]
[ σ_YX  σ²_Y ]

Notice that the covariance matrix is always symmetric, since σ_XY = σ_YX. The correlation matrix is given by

[ 1     ρ_XY ]
[ ρ_YX  1    ]

It is also symmetric, since ρ_XY = ρ_YX. In addition, its diagonal elements are unity.
Practice Problems (with partial solution steps)

Q.1 Let X denote the number of times a photocopy machine will malfunction: 0, 1, 2 or 3 times, on any given month. Let Y denote the number of times (0, 1 or 2) a technician is called on an emergency service. The joint pmf is given as: f(0, 0) = 0.15, f(0, 1) = 0.05, f(0, 2) = 0, f(1, 0) = 0.30, f(1, 1) = 0.15, f(1, 2) = 0.05, f(2, 0) = 0.05, f(2, 1) = 0.05, f(2, 2) = 0.10, f(3, 0) = 0, f(3, 1) = 0.05, and f(3, 2) = 0.05. Find (i) P(X < Y), (ii) the marginal pmfs of X and Y, and (iii) Cov(X, Y).

Sol. The given joint pmf in tabular form is

X/Y | 0 | 1 | 2 | f_X(x)
0 | 0.15 | 0.05 | 0 | 0.20
1 | 0.30 | 0.15 | 0.05 | 0.50
2 | 0.05 | 0.05 | 0.10 | 0.20
3 | 0 | 0.05 | 0.05 | 0.10
f_Y(y) | 0.50 | 0.30 | 0.20 | 1

where the marginal pmfs f_X(x) and f_Y(y) are also shown.

(i) To find P(X < Y), sum the probabilities of the pairs (X, Y) where X < Y. Such pairs are (0, 1), (0, 2), (1, 2). So we have
P(X < Y) = f(0, 1) + f(0, 2) + f(1, 2) = 0.05 + 0 + 0.05 = 0.1.

(ii) The marginal pmfs f_X(x) and f_Y(y) are shown in the table above.

(iii) E(X) = 1.2, E(Y) = 0.7, E(XY) = 1.2 and Cov(X, Y) = 0.36.

Q.2 Consider two continuous random variables X and Y with pdf

f(x, y) = (2/81)x²y for 0 < x < k, 0 < y < k, and f(x, y) = 0 elsewhere.

Find (i) k, (ii) P(X > 3Y), (iii) P(X + Y > 3), and (iv) the marginal pdfs of X and Y. Are X and Y independent?

Sol. (i) To find k, put the double integral of the joint pdf f(x, y) over the square region 0 < x < k, 0 < y < k equal to 1, that is,

∫₀ᵏ ∫₀ᵏ (2/81)x²y dx dy = 1.

We find k = 3. The joint pdf is therefore

f(x, y) = (2/81)x²y for 0 < x < 3, 0 < y < 3, and f(x, y) = 0 elsewhere.

(ii) To find P(X > 3Y), do the double integral of f(x, y) over the portion of the region 0 < x < 3, 0 < y < 3 where y < x/3. It is the shaded golden region in Figure 4.4. Any ray parallel to the y-axis through this region enters the region at y = 0 and leaves the region at y = x/3. Also, the leftmost point of the region corresponds to x = 0, while the rightmost points correspond to x = 3. So we have

P(X > 3Y) = ∫₀³ ∫₀^{x/3} (2/81)x²y dy dx = 1/15.

Figure 4.4: The shaded golden region is the portion of the region 0 < x < 3, 0 < y < 3 where y < x/3.

(iii) To find P(X + Y > 3), do the double integral of f(x, y) over the portion of the region 0 < x < 3, 0 < y < 3 where y > 3 − x. It is the shaded golden region in Figure 4.5. Any ray parallel to the y-axis through this region enters the region at y = 3 − x and leaves the region at y = 3. Also, the leftmost point of the region corresponds to x = 0, while the rightmost points correspond to x = 3. So we have

P(X + Y > 3) = ∫₀³ ∫_{3−x}³ (2/81)x²y dy dx = 0.9.

Figure 4.5: The shaded golden region is the portion of the region 0 < x < 3, 0 < y < 3 where y > 3 − x.

(iv) The marginal pdfs are given by

f_X(x) = ∫₀³ (2/81)x²y dy = (1/9)x², 0 < x < 3,
f_Y(y) = ∫₀³ (2/81)x²y dx = (2/9)y, 0 < y < 3.

We see that f_X(x)f_Y(y) = (2/81)x²y = f(x, y) for all (x, y). So X and Y are independent.

Q.3 Consider two continuous random variables X and Y with pdf

f(x, y) = k(x + y) for x > 0, y > 0, 3x + y < 3, and f(x, y) = 0 elsewhere.

Find (i) k, (ii) P(X < Y), (iii) the marginal pdfs of X and Y, (iv) Cov(X + 2, Y − 3), (v) Corr(−2X + 3, 2Y + 7), and (vi) Cov(−2X + 3Y − 4, 4X + 7Y + 5).

Sol. (i) k = 1/2,
(ii) P(X < Y) = 27/32,
(iii) the marginal pdfs of X and Y are

f_X(x) = 9/4 − 3x + (3/4)x², 0 < x < 1,
f_Y(y) = 1/4 + (1/3)y − (5/36)y², 0 < y < 3.

For the remaining parts, we find E(X) = 5/16, E(Y) = 21/16, E(XY) = 3/10 and Cov(X, Y) = −0.11016. Also, find Cov(X, X), Cov(Y, Y) and Corr(X, Y). Then use the following results:
(1) Cov(aX + b, cY + d) = ac Cov(X, Y)
(2) Corr(aX + b, cY + d) = (ac/|a||c|) Corr(X, Y), a ≠ 0, c ≠ 0
(3) Cov(aX + bY + h, cX + dY + k) = ac Cov(X, X) + bd Cov(Y, Y) + (ad + bc) Cov(X, Y)

If you do not recall these rules, then do a direct simplification using the simple rules of expectation and variance. For example, let us solve part (iv):

Cov(X + 2, Y − 3) = E[(X + 2)(Y − 3)] − E(X + 2)E(Y − 3)
 = E(XY − 3X + 2Y − 6) − (E(X) + 2)(E(Y) − 3)
 = E(XY) − 3E(X) + 2E(Y) − 6 − (E(X) + 2)(E(Y) − 3).

Now you just need to put in the values E(X) = 5/16, E(Y) = 21/16, E(XY) = 3/10. Likewise, you can solve the remaining parts.

Q.4 Consider two continuous random variables X and Y with pdf

f(x, y) = k e^(−y) for −y < x < y, y > 0, and f(x, y) = 0 elsewhere.

Find (i) k, (ii) the marginal pdfs of X and Y, and (iii) the conditional pdfs of X and Y.

Sol. (i) k = 1/2.
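The covariance and correlation rules quoted in Q.3 can be illustrated by simulation; a minimal sketch assuming NumPy, with an arbitrary correlated pair standing in for (X, Y):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)     # some correlated pair (illustrative only)

def cov(u, v):
    return np.mean((u - u.mean()) * (v - v.mean()))

print(cov(x + 2, y - 3), cov(x, y))               # rule (1) with a = c = 1: equal
print(cov(-2*x + 3, 2*y + 7), -4 * cov(x, y))     # rule (1): ac = -4
print(cov(-2*x + 3*y - 4, 4*x + 7*y + 5),
      -8*cov(x, x) + 21*cov(y, y) - 2*cov(x, y))  # rule (3): ac = -8, bd = 21, ad + bc = -2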
Q.5 Two persons A and B have agreed to meet for lunch between noon (0:00 pm) and 1:00 pm. Denote A's arrival time by X, B's by Y, and suppose X and Y are independent with density functions

f_X(x) = 3x² for 0 < x < 1, and 0 elsewhere,
f_Y(y) = 2y for 0 < y < 1, and 0 elsewhere.

(i) Find the probability that A arrives before B, and hence compute the expected amount of time A would have to wait for B to arrive.
(ii) If they have pre-decided on a condition that whoever comes first will wait only 15 minutes for the other, what is the probability that they will meet for lunch?

Sol. Since X and Y are independent, their joint pdf is given by

f(x, y) = f_X(x)f_Y(y) = 6x²y for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

(i) P(A arrives before B) = P(X < Y) = 2/5.
Next, the expected amount of time A would have to wait for B to arrive is given by E(Y − X) provided Y > X. Solving the double integral of (y − x)f(x, y) over the region 0 < x < 1, 0 < y < 1, y > x, we get E(Y − X) = 1/12 hours. (Verify!)

(ii) They will meet for lunch only if the waiting time for either is less than 15 minutes, or 1/4 hour, that is, if |X − Y| < 1/4. So the required probability is P(|X − Y| < 1/4), which may be computed as the sum of P(0 ≤ Y − X < 1/4) and P(0 < X − Y < 1/4).

Q.6 The following table shows the quality and meal price ratings (1 lowest to 3 highest) of 300 restaurants in a metro city:

Quality/Meal Price | 1 | 2 | 3 | Total
1 | 42 | 39 | 3 | 84
2 | 33 | 63 | 54 | 150
3 | 3 | 15 | 48 | 66
Total | 78 | 117 | 105 | 300

Develop a bivariate probability distribution for quality X and meal price Y of a randomly selected restaurant in the metro city. Determine Cov(X, Y) and Corr(X, Y). Based on your results, do you suppose it is likely to find a low-cost restaurant with high meal quality?

Sol. Dividing the number of restaurants by the total number 300, the probability distribution of (X, Y) reads:

X/Y | 1 | 2 | 3 | f_X(x)
1 | 0.14 | 0.13 | 0.01 | 0.28
2 | 0.11 | 0.21 | 0.18 | 0.50
3 | 0.01 | 0.05 | 0.16 | 0.22
f_Y(y) | 0.26 | 0.39 | 0.35 | 1

E(X) = 1.94, E(Y) = 2.09, V(X) = 0.4964, V(Y) = 0.6019, Cov(X, Y) = 0.2854 and Corr(X, Y) = 0.5221. Because of the moderately positive correlation between X and Y, it is not very likely to find a restaurant with the lowest meal price and highest quality.

Q.7 Let T₁, T₂, ..., T_k be independent exponential random variables with mean values 1/λ₁, 1/λ₂, ..., 1/λ_k, respectively. Denote T_min = min(T₁, T₂, ..., T_k). Show that T_min has an exponential distribution. What is the mean of T_min?

Sol. The cdf of each exponential random variable T_i is

F_{T_i}(t) = P(T_i ≤ t) = 1 − e^(−λ_i t), (t > 0).

It implies that P(T_i > t) = e^(−λ_i t).

The cdf of T_min = min(T₁, T₂, ..., T_k) is given by

F_{T_min}(t) = P(T_min ≤ t)
 = 1 − P(T_min > t)
 = 1 − P(min(T₁, T₂, ..., T_k) > t)
 = 1 − P(T₁ > t, T₂ > t, ..., T_k > t)
 = 1 − P(T₁ > t)P(T₂ > t)···P(T_k > t) (∵ T₁, T₂, ..., T_k are independent)
 = 1 − e^(−λ₁t) e^(−λ₂t) ··· e^(−λ_k t)
 = 1 − e^(−(λ₁ + λ₂ + ... + λ_k)t).

We see that T_min has an exponential distribution with rate λ₁ + λ₂ + ... + λ_k, that is, with mean 1/(λ₁ + λ₂ + ... + λ_k).
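Q.7 can also be verified by simulation; a minimal sketch assuming NumPy, with three illustrative rates:

import numpy as np

rng = np.random.default_rng(2)
lam = np.array([0.5, 1.0, 2.0])                        # illustrative rates
t = rng.exponential(scale=1/lam, size=(1_000_000, 3))  # columns: T1, T2, T3
print(t.min(axis=1).mean())                            # sample mean of T_min: ~0.2857
print(1 / lam.sum())                                   # 1/(0.5 + 1 + 2) = 0.2857...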
