Interval (Confidence Interval) Estimation — Course No: MATH F113
• Usually, an interval estimate is obtained by adding and subtracting a margin of error to the point estimate:
Interval Estimate = Point Estimate ± Margin of Error
• Interval estimation gives us information about how close the point estimate is to the value of the parameter.
• Why do we use the term confidence interval? In this case, the end points of the interval are RVs, and we can talk about the probability (in the sense of the frequency definition) that the interval brackets the parameter value.
Confidence Interval: A 100(1 − α)% confidence interval for a parameter θ is a random interval [L1, L2] such that
P[L1 ≤ θ ≤ L2] = 1 − α, regardless of the value of θ.
Sol. (b) First check the two assumptions: (i) normality, (ii) known σ.
Step 1: Here n = 50, x̄ = 15.68, σ = 3.27, and α = 0.05. We need a CI for μ.
Step 2: As α = 0.05, we need to find zα/2 such that P(Z ≤ zα/2) = 0.975. From the cumulative normal distribution table, we see zα/2 = 1.96.
Hence, the 95% CI for μ is [x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n]; that is,
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 0.95.
Step 3: The CI for μ with known σ is [x̄ − z0.025 σ/√n, x̄ + z0.025 σ/√n] = [14.77, 16.59].
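The three steps above can be sketched as a minimal Python check, with the table value z0.025 = 1.96 hard-coded:

```python
from math import sqrt

def z_interval(xbar, sigma, n, z=1.96):
    """Two-sided CI for the mean when sigma is known: xbar +/- z*sigma/sqrt(n)."""
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval(15.68, 3.27, 50)   # Steps 1-3 above
print(round(lo, 2), round(hi, 2))      # 14.77 16.59
```

This reproduces the interval [14.77, 16.59] quoted in Step 3.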
Deriving a CI: Example 7.5
(https://en.wikipedia.org/wiki/Relationships_among_probability_distributions)
The sample proportion of a population that has some property is
p̂ = (number in sample with the trait (success)) / (sample size) = X/n.
Note that p̂ = (1/n) Σ Xi = X̄, where each Xi is an independent point binomial (Bernoulli) RV; that is, P(Xi = 1) = p and P(Xi = 0) = 1 − p:
xi:      1    0
f(xi):   p    1 − p
E[Xi] = 1(p) + 0(1 − p) = p
Var(Xi) = E[Xi²] − (E[Xi])² = p(1 − p)
Properties:
(i) As the sample size increases (n large), the sampling distribution of p̂ becomes approximately normal (WHY?)
(ii) The mean of p̂ is p, and the variance of p̂ is p(1 − p)/n (WHY?)
(iii) Can we get estimators of p? Point and interval estimators.
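Property (ii) can be checked numerically. The sketch below (with an arbitrary choice p = 0.3, n = 400, chosen only for illustration) simulates repeated sample proportions and compares their empirical mean and variance with p and p(1 − p)/n:

```python
import random

random.seed(42)          # fixed seed for a reproducible sketch
p, n, reps = 0.3, 400, 5000

def sample_phat():
    # p_hat = X/n, the mean of n independent Bernoulli(p) indicators
    return sum(random.random() < p for _ in range(n)) / n

phats = [sample_phat() for _ in range(reps)]
m = sum(phats) / reps
v = sum((ph - m) ** 2 for ph in phats) / reps
print(m, v, p * (1 - p) / n)   # empirical mean ~ p, empirical variance ~ p(1-p)/n
```

The empirical mean lands near p and the empirical variance near p(1 − p)/n = 0.000525, as property (ii) predicts.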
Practice Problems
T-Distribution
• A random variable T with ν degrees of freedom (the parameter) is a continuous r.v. with density
f(t) = Γ((ν + 1)/2) / (Γ(ν/2) √(νπ)) · (1 + t²/ν)^(−(ν+1)/2),  −∞ < t < ∞.
Examples
Ex.4. Seven laboratory experiments of the value of g (acceleration due to gravity, which follows a normal distribution) at Pilani gave a mean 977.51 cm/s² and an s.d. 4.42 cm/s². Find a 95% CI for the true value of g (i.e., the population mean).
Sol.
Step 1: Here n = 7, x̄ = 977.51, s = 4.42, and α = 0.05. We need a CI for μ; the population is known to be normally distributed and σ is unknown.
Step 2: Find tα/2 with n − 1 = 6 degrees of freedom such that P(T ≤ tα/2) = 0.975. From the t-distribution table (similar to Table A.5), we see t0.025,6 = 2.447.
Step 3: The CI for μ with unknown σ is [x̄ − t0.025 s/√n, x̄ + t0.025 s/√n] = [973.09, 981.93].
(https://en.wikipedia.org/wiki/Student%27s_t-distribution)
(t-table: entries tα/2 indexed by degrees of freedom and cumulative probability 1 − α/2; similar to Table A.5.)

Example 11 (Page 288)
• The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64–66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.
• Here is data on the modulus of rupture (psi; the article contained summary data expressed in MPa):
6807.99 7637.06 6663.28 6165.03 6991.41 6992.23
6981.46 7569.75 7437.88 6872.39 7663.18 6032.28
6906.04 6617.17 6984.12 7093.71 7659.50 7378.61
7295.54 6702.76 7440.17 8053.26 8284.75 7347.95
7422.69 7886.87 6316.67 7713.65 7503.33 7674.99
• Let’s now calculate a confidence interval for the true average MOR using a confidence level of 95%. The CI is based on n − 1 = 29 degrees of freedom, so the necessary t critical value is t.025,29 = 2.045. The interval estimate now follows.
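A quick Python sketch of this interval from the raw data, using the stated table value t.025,29 = 2.045 (here `statistics.stdev` computes the divisor-(n − 1) standard deviation):

```python
from math import sqrt
from statistics import mean, stdev

mor = [6807.99, 7637.06, 6663.28, 6165.03, 6991.41, 6992.23,
       6981.46, 7569.75, 7437.88, 6872.39, 7663.18, 6032.28,
       6906.04, 6617.17, 6984.12, 7093.71, 7659.50, 7378.61,
       7295.54, 6702.76, 7440.17, 8053.26, 8284.75, 7347.95,
       7422.69, 7886.87, 6316.67, 7713.65, 7503.33, 7674.99]

n = len(mor)            # 30 observations -> 29 degrees of freedom
t_crit = 2.045          # t_{.025,29} from the t-table
xbar, s = mean(mor), stdev(mor)
half = t_crit * s / sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))
```

The sample mean is x̄ ≈ 7203.19 psi, and the printed interval brackets it.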
Examples
An interval [L2, ∞) such that P(μ ≥ L2) = 1-α allows us Solution : x 41.05, s 2 98.61, s 9.93
to place bounds on the minimum value of population s 9.93
Lobs x - t / 2 41.05 - 1.734 37.10
n 19
mean L X -t2 S/ n
, n -1 CI [37.10, )
BITS Pilani, Pilani Campus BITS Pilani, Pilani Campus
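The one-sided lower bound can be verified with a few lines of Python (the one-sided table value t0.05,18 = 1.734 is hard-coded):

```python
from math import sqrt

xbar, s, n = 41.05, 9.93, 19
t_crit = 1.734                     # t_{0.05,18}, one-sided, from the t-table
L = xbar - t_crit * s / sqrt(n)    # lower confidence bound for the mean
print(round(L, 2))                 # 37.1
```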
7.4: Interval Estimation of Variability: Recall the Chi-squared Distribution
If S² is the sample variance of a random sample of size n from a normal population with variance σ², then (n − 1)S²/σ² = Σ_{i=1}^{n} (Xi − X̄)²/σ² has a chi-squared distribution with (n − 1) degrees of freedom.

Ex.5. A sample of size 9 from a normal population is given below. Find the 90% CI for the mean of the population. Also find the 90% CI for the variance σ² of the population, and the 90% CI for σ. Sample: 0, 1, −1, 1, 1, 0, −1, −2, 3.

HW.6. The heights in inches of 8 students of a college, chosen at random, were as follows: 62.2, 62.4, 63.1, 63.2, 65.5, 66.2, 66.3, 66.5. Compute 90% and 95% CI for the variance of the population of heights, assuming it to be normal. Also, find the length of the interval in each case.
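A Python sketch for the variance CI of Ex.5. The chi-squared quantiles χ²0.05,8 = 15.507 and χ²0.95,8 = 2.733 are hard-coded here as values read from a standard table:

```python
from statistics import variance

sample = [0, 1, -1, 1, 1, 0, -1, -2, 3]
n = len(sample)
s2 = variance(sample)                    # sample variance, divisor n-1
# 90% CI for sigma^2: [(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower]
chi2_upper, chi2_lower = 15.507, 2.733   # chi^2_{0.05,8}, chi^2_{0.95,8} (table values)
lo = (n - 1) * s2 / chi2_upper
hi = (n - 1) * s2 / chi2_lower
print(round(s2, 3), round(lo, 2), round(hi, 2))   # 2.194 1.13 6.42
```

Note that the interval is not symmetric about s², because the chi-squared distribution is skewed.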
(Chi-squared table, similar to Table A.5: rows indexed by degrees of freedom, columns by cumulative probability; entries are the quantiles χ²α with upper-tail area α.)
Supplementary HW

A criminal trial: In a trial, the jury must decide between two hypotheses. The null hypothesis (prior belief) is
H0: The defendant is innocent.
The alternative hypothesis, or research hypothesis, is
H1: The defendant is guilty.
The jury does not know which hypothesis is true. They must make a decision on the basis of the evidence presented.

In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).

If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis. Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. By the same logic, we do not say that we accept the null hypothesis; rather, we say that “we fail to reject the null hypothesis” on the basis of the available sample information.
• Type-I error: Rejecting the null hypothesis when it is true; Prob(Type-I) = α. This is also called the level of significance.
• Type-II error: Failing to reject the null hypothesis when it is false; Prob(Type-II) = β = P(H0 is accepted when H0 is false).
• Power of a test (1 − β): Probability of rejecting the null hypothesis when it is false.
Reducing both Type-I and Type-II errors together is not possible, although one can try to make either type of error reasonably small. In practice, we want α to be as small as possible and the power of the test to be as high as possible. This is usually achieved by choosing an appropriate sample size.
Step 1. Develop the null and alternative hypotheses; determine the appropriate statistical test. The hypotheses take one of three forms:
H0: μ = μ0 vs. H1: μ ≠ μ0 (two-tailed),
H0: μ ≤ μ0 vs. H1: μ > μ0 (right-tailed),
H0: μ ≥ μ0 vs. H1: μ < μ0 (left-tailed).
1. Test statistic for the population mean:
(a) when the population variance is known:
Z = (X̄ − μ0) / (σ/√n).
If the test statistic falls in the acceptance region, do not reject H0.
Ex.8.2. A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170. A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. It is known that the accounts are approximately normally distributed with an s.d. of $65. At the 5% significance level, can we conclude that the new system will be cost-effective?
Sol.
Step 1: H0: μ ≤ 170 vs. H1: μ > 170; a one-tailed (right-tailed) test for μ with σ known.
Step 2: From the sample data, zcalculated = (x̄ − μ0)/(σ/√n) = (178 − 170)/(65/√400) ≈ 2.46.
Step 3: At the 5% significance level, z0.05 = 1.645 from the one-tailed Z-table.
Step 4: As zcalculated > z0.05, reject the null hypothesis (i.e., accept H1): the new system will be cost-effective.
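Steps 2 through 4 of Ex.8.2 reduce to a two-line computation; a minimal Python sketch:

```python
from math import sqrt

xbar, mu0, sigma, n = 178, 170, 65, 400
z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic for known sigma
print(round(z, 2), z > 1.645)          # 2.46 True -> reject H0
```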
Examples: Hypothesis Testing
Ex.8.3. A drug is given to 10 patients, and the increments in their blood pressure were recorded as 3, 6, 2, 4, 4, 1, 6, 0, 0, 2. Is it reasonable to believe that the drug has no effect on the change of mean blood pressure? Test at the 5% significance level, assuming that the population is normal with variance 1.
Sol.
Step 1: Formulate the hypotheses: H0: μ = 0, H1: μ ≠ 0; a two-tailed test for μ with σ known.
Step 2: From the sample data, zcalculated = (x̄ − μ0)/(σ/√n) = (0.4 − 0)/(1/√10) ≈ 1.265.
Step 3: At the 5% significance level, the two-tailed critical values from the Z-table are −z0.025 = −1.96 and z0.025 = 1.96.
Step 4: As zcalculated does not fall in the rejection region, we fail to reject H0.
We can believe that the drug has no effect on the change of mean blood pressure.
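The two-tailed decision of Ex.8.3 can be sketched in Python, using the slide's summary value x̄ = 0.4 together with σ = 1 and n = 10:

```python
from math import sqrt

xbar, mu0, sigma, n = 0.4, 0, 1, 10
z = (xbar - mu0) / (sigma / sqrt(n))
print(round(z, 3), abs(z) > 1.96)   # 1.265 False -> fail to reject H0
```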
Examples: Hypothesis Testing
Ex.8.4. The mean weekly sales of a magazine was 146 units. After an advertisement campaign, the mean weekly sales in 22 stores for a typical week increased to 154, with a standard deviation of 15 units. Was the advertisement successful at the 5% significance level? It is given that the weekly sales of the magazine follow a normal distribution.
Sol.
Step 1: Formulate the hypotheses: H0: μ ≤ 146, H1: μ > 146.
This is a one-tailed test for μ with σ unknown and a small sample from a normal population.
Step 2: From the sample data, tcalculated = (x̄ − μ0)/(S/√n) = (154 − 146)/(15/√22) ≈ 2.50.
Step 3: For α = 0.05 and 21 d.o.f., t21,0.05 = 1.721 from the one-tailed t-table.
Step 4: As tcalculated > t21,0.05, reject the null hypothesis (i.e., accept H1).
We can conclude that the advertisement was successful.
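The one-sample t statistic of Ex.8.4, computed from the summary data (table value t21,0.05 = 1.721 hard-coded):

```python
from math import sqrt

xbar, mu0, s, n = 154, 146, 15, 22
t = (xbar - mu0) / (s / sqrt(n))   # t statistic, 21 degrees of freedom
print(round(t, 2), t > 1.721)      # 2.5 True -> reject H0
```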
Problem Solving: Testing for p (large samples)
Ex.8.8. In a golf course, over the past years, 20% of the players were women. In an effort to increase the proportion of women players, a special promotion was implemented. Now the manager would like to see whether the promotion helped to increase the proportion of women players. A random sample of 400 players was selected, and 100 of the players were women. Test the hypothesis at the 5% significance level.
Sol.
Step 1: Formulate the hypotheses: H0: p ≤ 0.20, H1: p > 0.20; a one-tailed test for p with a large sample size (n = 400).
Step 2: From the sample data, zcal = (p̂obs − p0)/√(p0(1 − p0)/n) = (0.25 − 0.20)/√((0.20)(0.80)/400) = 2.5.
Step 3: For α = 0.05, z0.05 = 1.645 from the one-tailed Z-table.
Step 4: As zcal > z0.05, reject the null hypothesis (i.e., accept H1).
We can conclude that the proportion of women players has increased.
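Ex.8.8 in a few lines of Python (the large-sample z statistic for a proportion):

```python
from math import sqrt

p_hat, p0, n = 100 / 400, 0.20, 400
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0 under H0
print(round(z, 2), z > 1.645)                # 2.5 True -> reject H0
```

Note that the standard error is computed with the hypothesized p0, not with p̂, because the distribution of the statistic is derived under H0.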
P-value and Significance Testing
• From the observed value u0 of the test statistic U, consider, under the assumption of H0, the probability that U lies to the extreme of this observed value (on both sides, on the left side, or on the right side, depending on the nature of the alternative hypothesis). This probability is called the P-value, or the descriptive level of significance.
• Significance testing: Reject H0 if the P-value is small. Here, no α level is pre-set; the decision is taken using only the P-value.
• The earlier procedure, testing with a critical region at a pre-set level of significance, is called hypothesis testing.
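As an illustration of the P-value idea, applied here to the right-tailed test of Ex.8.2 (Python's `statistics.NormalDist` supplies the standard normal cdf):

```python
from math import sqrt
from statistics import NormalDist

# Observed statistic from Ex.8.2
z_obs = (178 - 170) / (65 / sqrt(400))
# Right-tailed P-value: probability, under H0, of a value at least this extreme
p_value = 1 - NormalDist().cdf(z_obs)
print(round(p_value, 4))    # 0.0069, which is small -> reject H0
```

The P-value is well below any conventional α such as 0.05, matching the rejection reached earlier with the critical-region approach.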
Type-I and Type-II error (Ex. 8.1, page 304)
Course No: MATH F113, Probability and Statistics
Chapter 12: Simple Linear Regression Model (12.1 and 12.2)
Sumanta Pasari (sumanta.pasari@pilani.bits-pilani.ac.in)

When σ² is small, an observed point (x, y) will almost always fall quite close to the true regression line, whereas observations may deviate considerably from their expected values (corresponding to points far from the line) when σ² is large.
Linear Probabilistic Model

Scattergram: In a regression study, it is useful to plot the data points in the xy-plane. Such a plot is called a scattergram (scatter diagram). We do not expect the points to lie exactly on a straight line; however, if linear regression is applicable, then they should exhibit a linear trend.

Assumptions
Simple Linear Regression Model: Y = β0 + β1 x + ε
Simple Linear Regression Equation: E(Y|x) = β0 + β1 x
Estimated Simple Linear Regression Equation: ŷ = b0 + b1 x

Model assumptions:
1. E(ε) = 0.
2. V(ε) = σ², the same for all values of x.
3. The values of ε are independent.
4. ε ~ N(0, σ²); consequently, Y is also normally distributed.
x (tensile force, 1000s of pounds): 1  2  3  4  5  6
y (elongation):                    14 33 40 63 76 85
(a) Graph the data to verify that it is reasonable to assume that the regression of Y on X is linear.
(b) Find the equation of the least-squares line and use it to predict the elongation when the tensile force is 3.5 thousand pounds.
Sol. ŷ = 1.133 + 14.486x (interpretation of results?). Therefore, for tensile force 3.5, ŷ = 51.83 (meaning?).
Ex.3. Now, estimate β0 and β1. Find the residuals in each case and verify that, apart from round-off error, the residuals sum to 0.
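The least-squares fit, the prediction at x = 3.5, and the zero-sum residual check of Ex.3 can all be sketched from scratch:

```python
def least_squares(xs, ys):
    # b1 = Sxy/Sxx, b0 = ybar - b1*xbar
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5, 6]
y = [14, 33, 40, 63, 76, 85]
b0, b1 = least_squares(x, y)
print(round(b0, 3), round(b1, 3))   # 1.133 14.486
print(round(b0 + b1 * 3.5, 2))      # 51.83
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(sum(residuals)) < 1e-9)   # True: residuals sum to ~0
```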
Using the model to explain y variation: (a) data for which all variation is explained; (b) data for which most variation is explained; (c) data for which little variation is explained.

Total Sum of Squares (SST): SST = Σ_{i=1}^{n} (yi − ȳ)²
Sum of Squares due to Regression (SSR): SSR = Σ_{i=1}^{n} (ŷi − ȳ)²

HW: Data for 10 restaurants:
Restaurant:        1   2   3   4   5   6   7   8   9  10
Students (1000s):  2   6   8   8  12  16  20  20  22  26
Sales ($1000s):   58 105  88 118 117 137 157 169 149 202
(a) Find an estimated regression line of Y on X.
(b) Obtain a point estimate of the correlation coefficient.

HW: Data on temperature and number of passengers:
temperature:  42  37  46  30  50  43  43  46  46  49
passenger:   173 149 185 123 201 174 175 188 186 198
(a) Develop a scatter diagram. What does it indicate about the relationship between the two variables?
(b) Find an estimated regression line of Y on X.
(c) Obtain a point estimate of the correlation coefficient.
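For the restaurant data above, SST, SSR, and the coefficient of determination can be computed directly (a sketch; with this data the fitted line works out to ŷ = 60 + 5x):

```python
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]           # x, in 1000s
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]  # y, in $1000s

n = len(students)
xbar, ybar = sum(students) / n, sum(sales) / n
sxy = sum(x * y for x, y in zip(students, sales)) - n * xbar * ybar
sxx = sum(x * x for x in students) - n * xbar ** 2
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sst = sum((y - ybar) ** 2 for y in sales)                 # total variation
ssr = sum((b0 + b1 * x - ybar) ** 2 for x in students)    # explained variation
print(b0, b1)                           # 60.0 5.0
print(sst, ssr, round(ssr / sst, 4))    # 15730.0 14200.0 0.9027
```

An r² of about 0.90 means roughly 90% of the variation in sales is explained by the regression on student population.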
HW 4:
(a) Develop a scatter diagram with price as the independent variable. What does it indicate about the relationship between the two variables?
(b) Find an estimated regression line of Y on X. Obtain a point estimate of the correlation coefficient. Find SSR, SST, and the coefficient of determination. Find an estimate of σ².
Chapter 2: Discrete Probability Distributions

Note: These lecture notes aim to present a clear and crisp presentation of some topics in Probability and Statistics. Comments/suggestions are welcome via the e-mail: sukuyd@gmail.com to Dr. Suresh Kumar.

Contents
2 Discrete Probability Distributions
  2.1 Definitions
    2.1.1 Expectation
    2.1.2 Variance
    2.1.3 Standard Deviation
    2.1.4 Moments and moment generating function
  2.2 Geometric Distribution
    2.2.1 Negative Binomial Distribution
  2.3 Binomial Distribution
    2.3.1 Multinomial Distribution
  2.4 Hypergeometric Distribution
    2.4.1 Binomial distribution as a limiting case of hypergeometric distribution
    2.4.2 Generalization of the hypergeometric distribution
  2.5 Poisson Distribution
  2.6 Uniform Distribution

2.1 Definitions

Discrete Random Variable
Suppose a random experiment results in finitely or countably infinitely many outcomes, with sample space S. Then a variable X, taking real values x corresponding to each outcome of the random experiment (or each element of S), is called a discrete random variable. In other words, the discrete random variable X is a function from the sample space S to the set of real numbers. So, in principle, the discrete random variable X, being a function, could have any given definition.

Probability Mass Function (pmf)
A function f is said to be the probability mass function of a discrete random variable X if it satisfies the following three conditions:
(i) f(x) ≥ 0 for each value x of X.
(ii) f(x) = P(X = x); that is, f(x) provides the probability for each value x of X.
(iii) Σ_x f(x) = 1; that is, the sum of the probabilities of all values x of X is 1.

Cumulative Distribution Function (cdf)
A function F defined by
F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t)
is called the cumulative distribution function of X. Therefore, F(x) is the sum of the probabilities of all the values of X starting from its lowest value up to the value x. It can often also be written in closed form.

Example with finite sample space
Consider the random experiment of tossing two fair coins. Then the sample space is
S = {HH, HT, TH, TT}.
If X denotes the number of heads, then X(HH) = 2, X(HT) = 1, X(TH) = 1 and X(TT) = 0. In tabular form, it can be displayed as
Outcome:  HH  HT  TH  TT
X = x:     2   1   1   0
So here the discrete random variable X assumes only the three values x = 0, 1, 2. We find that P(X = 0) = 1/4, P(X = 1) = 1/2 and P(X = 2) = 1/4. It is easy to see that the function f given by
X = x:            0    1    2
f(x) = P(X = x): 1/4  1/2  1/4
is the pmf of X. It gives the probability distribution of X. The cumulative distribution function F of X is given by
X = x:            0    1    2
F(x) = P(X ≤ x): 1/4  3/4   1

Remark: Note that X is a function with domain the sample space S. So, in the above example, X could also be defined as the number of tails, and accordingly we could write its pmf and cdf.

Example with countably infinite sample space
Suppose a fair coin is tossed again and again till a head appears. Then the sample space is
S = {H, TH, TTH, TTTH, …}.
The outcome H corresponds to getting a head in the first toss. The outcome TH corresponds to getting a tail in the first toss and a head in the second toss. Likewise, TTH corresponds to getting a head in the third toss, and so on.
If X denotes the number of tosses in this experiment, then X is a function from the sample space S to the set of natural numbers, given by
Outcome:  H  TH  TTH  …
X = x:    1   2    3  …
So here the discrete random variable X assumes the countably infinite values X = 1, 2, 3, …. The pmf of X is given by
f(x) = (1/2)^x, x = 1, 2, 3, …
Notice that f(x) ≥ 0 for all x, and
Σ_{x=1}^{∞} f(x) = Σ_{x=1}^{∞} (1/2)^x = (1/2)/(1 − 1/2) = 1 (∵ the sum of the infinite G.P. a + ar + ar² + … is a/(1 − r)).
In tabular form, the probability distribution of X is
X = x:            1     2      3   …
f(x) = P(X = x): 1/2  (1/2)² (1/2)³ …
The cumulative distribution function F of X is given by
F(x) = Σ_{t=1}^{x} (1/2)^t = (1/2)(1 − (1/2)^x)/(1 − 1/2) = 1 − (1/2)^x, where x = 1, 2, 3, …

Note. Determining the cdf can be very useful. For instance, in the above example, suppose it is required to calculate P(10 ≤ X ≤ 30). Here, one option is to sum all the probabilities from P(X = 10) to P(X = 30). Instead, we use the cdf to obtain
P(10 ≤ X ≤ 30) = F(30) − F(9) = (1 − 1/2³⁰) − (1 − 1/2⁹) = 1/2⁹ − 1/2³⁰.

Some more illustrative examples

Ex. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If a school makes a random purchase of 2 of these computers, find the probability distribution for the number of defectives.
Sol. f(0) = 68/95, f(1) = 51/190 and f(2) = 3/190.

Ex. Find the probability distribution of the number of heads in a toss of four coins. Also, plot the probability mass function and probability histogram.
Sol. The total number of points in the sample space is 16. The numbers of points in the sample space with 0, 1, 2, 3 and 4 heads are C(4,0), C(4,1), C(4,2), C(4,3) and C(4,4), respectively. So f(0) = C(4,0)/16 = 1/16, f(1) = C(4,1)/16 = 1/4, f(2) = C(4,2)/16 = 3/8, f(3) = C(4,3)/16 = 1/4 and f(4) = C(4,4)/16 = 1/16.
Thus, f(x) = C(4,x)/16, x = 0, 1, 2, 3, 4.
The probability mass function plot and probability histogram are shown in Figure 2.1.
Figure 2.1: Probability mass function plot and probability histogram

2.1.1 Expectation

Let X be a random variable with pmf f. Then the expectation of X, denoted by E(X), is defined as
E(X) = Σ_x x f(x).
More generally, if H(X) is a function of the random variable X, then we define
E(H(X)) = Σ_x H(x) f(x).

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4 respectively. So E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1.

Note: (1) The expectation E(X) of the random variable X is the theoretical average or mean value of X. In a statistical setting, the average value, mean value and expected value are synonyms. The mean value is denoted by µ. So E(X) = µ.
[Footnote: From your high school mathematics, you know that if we have n distinct values x1, x2, …, xn with frequencies f1, f2, …, fn respectively, and Σ_{i=1}^{n} fi = N, then the mean value is
µ = Σ_{i=1}^{n} fi xi / N = Σ_{i=1}^{n} (fi/N) xi = Σ_{i=1}^{n} f(xi) xi,
where f(xi) = fi/N is the probability of occurrence of xi in the given data set. Obviously, the final expression for µ is the expectation of a random variable X assuming the values xi with probabilities f(xi).]

(2) If X is a random variable, then it is easy to verify the following:
(i) E(c) = c
(ii) E(cX) = cE(X)
(iii) E(cX + d) = cE(X) + d
(iv) E(cH(X) + dG(X)) = cE(H(X)) + dE(G(X))
where c, d are constants, and H(X) and G(X) are functions of X. Thus, expectation respects the linearity property.

(3) The expected or mean value of the random variable X is a measure of the location of the center of the values of X.

Some illustrative examples

Ex. A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good components and 3 defective components. A sample of 3 is taken by the inspector. Find the expected value of the number of good components in this sample.
Sol. Let X represent the number of good components in the sample. Then the probability distribution of X is
f(x) = C(4,x) C(3,3−x) / C(7,3), x = 0, 1, 2, 3.
Simple calculations yield f(0) = 1/35, f(1) = 12/35, f(2) = 18/35, and f(3) = 4/35. Therefore,
µ = E(X) = Σ_{x=0}^{3} x f(x) = 12/7 = 1.7.
Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components and 3 defective components, it will contain, on average, 1.7 good components.

Ex. A salesperson for a medical device company has two appointments on a given day. At the first appointment, he believes that he has a 70% chance to make the deal, from which he can earn $1000 commission if successful. On the other hand, he thinks he only has a 40% chance to make the deal at the second appointment, from which, if successful, he can make $1500. What is his expected commission based on his own probability belief? Assume that the appointment results are independent of each other.
Sol. First, we know that the salesperson, for the two appointments, can have 4 possible commission totals: $0, $1000, $1500, and $2500. We then need to calculate their associated probabilities. By independence, we obtain
f(0) = (1 − 0.7)(1 − 0.4) = 0.18,
f(1000) = (0.7)(1 − 0.4) = 0.42,
f(1500) = (1 − 0.7)(0.4) = 0.12,
f(2500) = (0.7)(0.4) = 0.28.
Therefore, the expected commission for the salesperson is
E(X) = (0)(0.18) + (1000)(0.42) + (1500)(0.12) + (2500)(0.28) = $1300.

Ex. Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and 5:00 P.M. on any sunny Friday has the following probability distribution:
X = x:            4     5    6    7    8    9
f(x) = P(X = x): 1/12  1/12 1/4  1/4  1/6  1/6
Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager. Find the attendant’s expected earnings for this particular time period.
Sol. We find
E(g(X)) = E(2X − 1) = Σ_{x=4}^{9} (2x − 1) f(x) = $12.67.

Ex. If a car agency sells 50% of its inventory of a certain foreign car equipped with side airbags, find a formula for the probability distribution of the number of cars with side airbags among the next 4 cars sold by the agency.
Sol. f(x) = C(4,x)/16, x = 0, 1, 2, 3, 4.

2.1.2 Variance

Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6. We observe that both variables have the same mean value, µX = µY = 5. However, the values of X are far away from the mean (the central value 5) in comparison to the values of Y. Thus, the mean value of a random variable does not account for its variability. In this regard, we define a new parameter known as variance. It is defined as follows.
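The salesperson example above can be verified with a tiny expectation helper (a sketch; the pmf dictionary encodes the four commission totals and their probabilities):

```python
# Commission totals and their probabilities (independent appointments:
# P(deal 1) = 0.7 pays $1000, P(deal 2) = 0.4 pays $1500)
pmf = {0: 0.3 * 0.6, 1000: 0.7 * 0.6, 1500: 0.3 * 0.4, 2500: 0.7 * 0.4}
assert abs(sum(pmf.values()) - 1) < 1e-12      # valid pmf: probabilities sum to 1
expected = sum(x * p for x, p in pmf.items())  # E(X) = sum of x * f(x)
print(round(expected, 2))                      # 1300.0
```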
If X is a random variable with mean µ, then its variance, denoted by V(X), is defined as the expectation of (X − µ)². So, we have
V(X) = E((X − µ)²) = E(X²) + µ² − 2µE(X) = E(X²) + E(X)² − 2E(X)E(X) = E(X²) − E(X)².

Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4 respectively. So
E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,
E(X²) = (0)² × 1/4 + (1)² × 1/2 + (2)² × 1/4 = 3/2.
∴ V(X) = 3/2 − 1 = 1/2.

Note: (i) The variance V(X) of the random variable X is also denoted by σ². So V(X) = σ².
(ii) If X is a random variable and c is a constant, then it is easy to verify that V(c) = 0 and V(cX) = c²V(X).

Some illustrative examples

Ex. Let the random variable X represent the number of automobiles that are used for official business purposes on any given workday. The probability distribution for company A is
x:     1    2    3
f(x): 0.3  0.4  0.3
and that for company B is
x:     0    1    2    3    4
f(x): 0.2  0.1  0.3  0.3  0.1
Show that the variance of the probability distribution for company B is greater than that for company A.
Sol. µA = 2.0, σ²A = 0.6; µB = 2.0, σ²B = 1.6.

Ex. Calculate the variance of g(X) = 2X + 3, where X is a random variable with probability distribution
x:     0    1    2    3
f(x): 1/4  1/8  1/2  1/8
Sol. µ2X+3 = 6, σ²2X+3 = 4.

Ex. Find the mean and variance of a random variable X with the pmf given by
f(x) = cx, x = 1, 2, 3, …, n,
where c is a constant and n is some fixed natural number.
Sol. Using the condition Σ_{x=1}^{n} f(x) = 1, we get c(1 + 2 + … + n) = 1, or c = 2/(n(n + 1)).
Now µ = E(X) = Σ_{x=1}^{n} x f(x) = Σ_{x=1}^{n} cx² = c · n(n + 1)(2n + 1)/6 = (2n + 1)/3.
E(X²) = Σ_{x=1}^{n} x² f(x) = Σ_{x=1}^{n} cx³ = c · n²(n + 1)²/4 = n(n + 1)/2.
σ² = E(X²) − E(X)² = n(n + 1)/2 − ((2n + 1)/3)².

Ex. Consider a random variable X with the pmf given by
f(x) = c · 2^(−|x|), x = ±1, ±2, ±3, …,
where c is a constant. If g(X) = (−1)^(|X|−1) · 2^|X| / (2|X| − 1), then show that E(g(X)) exists but E(|g(X)|) does not exist.
Sol. Using the condition Σ_{x=±1,±2,…} f(x) = 1, we find c = 1/2. Now
E(g(X)) = Σ_{x=±1,±2,…} g(x) f(x) = Σ_{x=±1,±2,…} (−1)^(|x|−1) · 1/(2(2|x| − 1)),
which is an alternating and convergent series. So E(g(X)) exists. But E(|g(X)|) = Σ_{x=±1,±2,…} 1/(2(2|x| − 1)) is a divergent series, so E(|g(X)|) does not exist.

2.1.3 Standard Deviation

The variance of a random variable, by definition, is the sum of the squares of the differences of the values of the random variable from the mean value, weighted by their probabilities. So variance carries squared units of the original data, and hence is a number often without any direct physical meaning. To overcome this problem, a second measure of variability is employed, known as the standard deviation. It is defined as follows.
Let X be a random variable with variance σ². Then the standard deviation of X, denoted by σ, is the non-negative square root of V(X); that is,
σ = √V(X).
Note: A large standard deviation implies that the random variable X is rather inconsistent and somewhat hard to predict. On the other hand, a small standard deviation is an indication of consistency and stability.

2.1.4 Moments and moment generating function

Let X be a random variable and k be any positive integer. Then E(X^k) defines the kth ordinary moment of X. Obviously, E(X) = µ is the first ordinary moment, E(X²) is the second ordinary moment, and so on.
Further, the ordinary moments can be obtained from the function E(e^(tX)), for the ordinary moments E(X^k) are the coefficients of t^k/k! in the expansion
E(e^(tX)) = 1 + tE(X) + (t²/2!)E(X²) + ….
Also, we observe that
E(X^k) = [d^k/dt^k E(e^(tX))]_{t=0}.
Thus, the function E(e^(tX)) generates all the ordinary moments. That is why it is known as the moment generating function and is denoted by mX(t). Thus, mX(t) = E(e^(tX)).
In general, the kth moment of a random variable X about any point a is defined as E((X − a)^k). Obviously, a = 0 for the ordinary moments. Further, E(X − µX) = 0 and E((X − µX)²) = σ²X. So the first moment about the mean is 0, while the second moment about the mean yields the variance.

2.2 Geometric Distribution

The geometric distribution arises under the following conditions:
(i) The random experiment consists of a series of independent trials. (Such trials are called Bernoulli trials.)
(ii) Each trial results in two outcomes, namely success (S) and failure (F), which have constant probabilities p and q = 1 − p, respectively.
(iii) X denotes the number of trials to obtain the first success.
Then the sample space of the random experiment is
S = {S, FS, FFS, …},
and X is a discrete random variable with countably infinite values X = 1, 2, 3, … such that
Outcome:    S   FS  FFS  …
X = x:      1    2    3  …
P(X = x):   p   qp  q²p  …
Thus, the pmf of X, denoted by g(x; p), is given by
g(x; p) = q^(x−1) p, x = 1, 2, 3, …
The random variable X with this pmf is called a geometric random variable. Here the name ‘geometric’ arises because the probabilities p, qp, q²p, … in succession constitute a geometric progression. Given the value of the parameter p, the probability distribution of the geometric random variable X is uniquely described.

Mean, variance and mgf of geometric random variable

For the geometric random variable X, we have
(i) µX = E(X) = Σ_{x=1}^{∞} x g(x; p)
= Σ_{x=1}^{∞} x q^(x−1) p
= p Σ_{x=1}^{∞} x q^(x−1)
= p · d/dq (Σ_{x=1}^{∞} q^x)
(∵ term-by-term differentiation is permissible for the convergent power series Σ q^x within its interval of convergence |q| < 1)
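The company A versus company B comparison above can be verified with a small helper computing µ and σ² = E(X²) − E(X)² from a pmf:

```python
def mean_var(pmf):
    # mu = E(X); sigma^2 = E(X^2) - E(X)^2
    mu = sum(x * p for x, p in pmf.items())
    return mu, sum(x * x * p for x, p in pmf.items()) - mu ** 2

company_a = {1: 0.3, 2: 0.4, 3: 0.3}
company_b = {0: 0.2, 1: 0.1, 2: 0.3, 3: 0.3, 4: 0.1}
mu_a, var_a = mean_var(company_a)
mu_b, var_b = mean_var(company_b)
print(round(mu_a, 1), round(var_a, 1))   # 2.0 0.6
print(round(mu_b, 1), round(var_b, 1))   # 2.0 1.6
```

Both distributions have mean 2.0, but company B's variance (1.6) exceeds company A's (0.6), exactly as the example asks us to show.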
= p · d/dq [q/(1 − q)]
= p · 1/(1 − q)²
= 1/p.
(ii) σ²X = E(X²) − E(X)² = E(X(X − 1)) + E(X) − E(X)²
= p Σ_{x=1}^{∞} x(x − 1) q^(x−1) + 1/p − 1/p²
= pq Σ_{x=1}^{∞} x(x − 1) q^(x−2) − q/p²
= pq · d²/dq² [q/(1 − q)] − q/p²
= pq · d/dq [1/(1 − q)²] − q/p²
= pq · 2/(1 − q)³ − q/p²
= 2q/p² − q/p²
= q/p².
(iii) mX(t) = E(e^(tX)) = Σ_{x=1}^{∞} e^(tx) g(x; p).

Some illustrative examples

Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in this experiment, then X is a geometric random variable with the pmf g(x) = (1/2)^x, x = 1, 2, 3, …. Here p = 1/2.

Ex. For a certain manufacturing process, it is known that, on the average, 1 in every 100 items is defective. What is the probability that the fifth item inspected is the first defective item found?
Sol. Here p = 1/100 = 0.01 and x = 5. So the required probability is (0.01)(0.99)⁴ = 0.0096.

Ex. At a “busy time,” a telephone exchange is very near capacity, so callers have difficulty placing their calls. It may be of interest to know the number of attempts necessary in order to make a connection. Suppose that we let p = 0.05 be the probability of a connection during a busy time. Find the probability of a successful call on the fifth attempt.
Sol. Here p = 0.05 and x = 5. So the required probability is (0.05)(0.95)⁴ = 0.041.

2.2.1 Negative Binomial Distribution

In the geometric distribution, X is the number of trials to obtain the first success. A more general version is to choose X as the number of trials to obtain the kth success. Then X is called a negative binomial random variable, with values X = k, k + 1, k + 2, …. Since the final trial among the x trials would result in a success, the remaining k − 1 successes can occur in C(x − 1, k − 1) ways from the x − 1 trials. Hence, the pmf of the negative binomial random variable X, denoted by nb(x; k, p), is given by
nb(x; k, p) = C(x − 1, k − 1) p^k q^(x−k), x = k, k + 1, k + 2, …
If we make a change of variable via y = x − k, then
nb(y; k, p) = C(k + y − 1, y) p^k q^y = C(k + y − 1, k − 1) p^k q^y, y = 0, 1, 2, …

Mean, variance and mgf of negative binomial random variable

For the negative binomial random variable X, we have
(i) µX = E(X) = Σ_{x=k}^{∞} x nb(x; k, p)
= Σ_{x=k}^{∞} x C(x − 1, k − 1) p^k q^(x−k)
= Σ_{x=k}^{∞} k C(x, k) p^k q^(x−k)
= (k/p) Σ_{x=k}^{∞} C(x, k) p^(k+1) q^(x−k)
= (k/p) Σ_{y=k+1}^{∞} C(y − 1, (k + 1) − 1) p^(k+1) q^(y−(k+1)), where x = y − 1,
= (k/p) Σ_{y=k+1}^{∞} nb(y; k + 1, p)
= (k/p) · 1
= k/p.
(ii) E((X + 1)X) = Σ_{x=k}^{∞} (x + 1)x nb(x; k, p)
= Σ_{x=k}^{∞} (x + 1)x C(x − 1, k − 1) p^k q^(x−k)
= Σ_{x=k}^{∞} (k + 1)k C(x + 1, k + 1) p^k q^(x−k)
x=k
X k−1 y ∞
=p etx q x−1
k(k + 1) X x + 1 k+2 x−k
x=1
= p q
p2
∞ n n k+1
pX t x Here we have used the well known result: = . x=k
∞
= (qe ) x n−x k(k + 1) X
y−1
q
x=1 ∞ = pk+2 q y−(k+2) , where x = y − 2,
X k + y − 1 p2 (k + 2) − 1
p qet Note that (1 − q)−k = q y is a negative binomial series. y=k+2
= (t < − ln q) y ∞
q 1 − qet y=0 k(k + 1) X
∞ = nb(y; k + 2, p)
pet X p2
= Now, let us show that nb(x; k, p) = 1. For, y=k+2
1 − qet k(k + 1)
x=k
∞ ∞ = .1
Remark: Note that we can easily obtain E(X) and E(X 2 ) from the moment generating function mX (t) p2
X X x − 1 k x−k
nb(x; k, p) = p q k(k + 1)
by using k−1 =
x=k x=k 2
∞ p
dk
X y+k−1 k y
E(X k ) = [mX (t)]t=0 , = p q , where y = x − k k(k + 1) k k 2 kq
dtk k−1 So V (X) = E((X + 1)X) − E(X) + E(X)2 = − − 2 = 2
y=0 p2 p p p
∞
for k = 1 and k = 2 respectively. In other words the first and second t-derivatives of mX (t) at t = 0
X y+k−1 y ∞ ∞
= pk q X X x − 1
provide us E(X) and E(X 2 ), respectively. Hence we easily get mean and variance from the moment y (iii) mX (t) = E(etX ) = etx nb(x; k, p) = etx pk q x−k
y=0 k−1
x=k x=k
generating function. Verify! = pk (1 − q)−k ∞
X x−1
= pk p−k = pk q x−k etx
k−1
= 1. x=k
10 11 12
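The geometric and negative binomial formulas above are easy to check numerically. The following Python sketch (standard library only; the function names are mine, not from the notes) evaluates the pmfs, reproduces the telephone-exchange example, and confirms the means and variances by truncating the series:

```python
import math

def geom_pmf(x, p):
    # g(x; p) = q^(x-1) p, x = 1, 2, 3, ...
    return (1 - p) ** (x - 1) * p

def nb_pmf(x, k, p):
    # nb(x; k, p) = C(x-1, k-1) p^k q^(x-k), x = k, k+1, ...
    return math.comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

# Telephone-exchange example: success on the fifth attempt with p = 0.05
print(round(geom_pmf(5, 0.05), 3))  # 0.041

# Geometric mean 1/p and variance q/p^2, via truncated series
p = 0.05
mean_g = sum(x * geom_pmf(x, p) for x in range(1, 3000))
var_g = sum(x ** 2 * geom_pmf(x, p) for x in range(1, 3000)) - mean_g ** 2
print(abs(mean_g - 1 / p) < 1e-6, abs(var_g - (1 - p) / p ** 2) < 1e-6)

# Negative binomial mean k/p and variance kq/p^2
k, p = 4, 0.55
mean_nb = sum(x * nb_pmf(x, k, p) for x in range(k, 500))
var_nb = sum(x ** 2 * nb_pmf(x, k, p) for x in range(k, 500)) - mean_nb ** 2
print(abs(mean_nb - k / p) < 1e-9, abs(var_nb - k * (1 - p) / p ** 2) < 1e-9)
```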
Ex. In an NBA (National Basketball Association) championship series, the team that wins four games out of seven is the winner. Suppose that teams A and B face each other in the championship games and that team A has probability 0.55 of winning a game over team B.
(a) What is the probability that team A will win the series in 6 games?
(b) What is the probability that team A will win the series?

Sol. (a) Here x = 6, k = 4, p = 0.55. So the required probability is
nb(6; 4, 0.55) = C(6−1, 4−1) (0.55)⁴ (1 − 0.55)^{6−4} = 0.1853.

(b) Team A can win the championship series in the 4th or 5th or 6th or 7th game. So the required probability is
nb(4; 4, 0.55) + nb(5; 4, 0.55) + nb(6; 4, 0.55) + nb(7; 4, 0.55) = 0.6083.

2.3 Binomial Distribution

The binomial distribution arises under the following conditions:
(i) The random experiment consists of a finite number n of independent trials.
(ii) Each trial results in two outcomes, namely success (S) and failure (F), which have constant probabilities p and q = 1 − p, respectively, in each trial.
(iii) X denotes the number of successes in the n trials.

Then the sample space of the random experiment is

S = S0 ∪ S1 ∪ S2 ∪ · · · ∪ Sn,

where the sets
S0 = {FF···F},
S1 = {SF···F, FS···F, ..., FF···S},
...
Sn = {SS···S}
carry C(n,0), C(n,1), ..., C(n,n) number of elements (outcomes), since out of n trials, no success can take place in C(n,0) ways, one success can take place in C(n,1) ways, and so on. Thus, the sample space S carries C(n,0) + C(n,1) + ... + C(n,n) = (1+1)^n = 2^n outcomes.

The random variable X, being the number of successes in the n trials, takes the values X = 0, 1, 2, ..., n such that

Outcome    S0           S1                .....  Sn
X = x      0            1                 .....  n
P(X = x)   C(n,0) q^n   C(n,1) q^{n−1} p  .....  C(n,n) p^n

So, the pmf of X, denoted by b(x; n, p), is given by

b(x; n, p) = C(n, x) q^{n−x} p^x, x = 0, 1, 2, ..., n.

The random variable X with this pmf is called binomial random variable. Here the name 'binomial' is used because the probabilities C(n,0)q^n, C(n,1)q^{n−1}p, ..., C(n,n)p^n in succession are the terms in the binomial expansion of (q + p)^n. Once the values of the parameters n and p are given/determined, the pmf uniquely describes the binomial distribution of X.

Note: In the particular case n = 1, the binomial distribution is called Bernoulli distribution:
b(x; 1, p) = q^{1−x} p^x, x = 0, 1.

Mean, variance and mgf of binomial random variable

For the binomial random variable X, we have

(i) µ_X = E(X) = Σ_{x=0}^n x b(x; n, p)
= Σ_{x=0}^n x C(n, x) q^{n−x} p^x
= np Σ_{x=1}^n C(n−1, x−1) q^{n−x} p^{x−1}
= np Σ_{y=0}^{n−1} C(n−1, y) q^{n−1−y} p^y, where y = x − 1
= np (p + q)^{n−1} = np.

(ii) E(X(X−1)) = Σ_{x=0}^n x(x−1) b(x; n, p)
= Σ_{x=0}^n x(x−1) C(n, x) q^{n−x} p^x
= n(n−1)p² Σ_{x=2}^n C(n−2, x−2) q^{n−x} p^{x−2}
= n(n−1)p² Σ_{y=0}^{n−2} C(n−2, y) q^{n−2−y} p^y, where y = x − 2
= n(n−1)p² (p + q)^{n−2} = n(n−1)p².

So σ²_X = E(X²) − E(X)² = E(X(X−1)) + E(X) − E(X)² = n(n−1)p² + np − n²p² = npq.

(iii) m_X(t) = E(e^{tX}) = Σ_{x=0}^n e^{tx} b(x; n, p)
= Σ_{x=0}^n e^{tx} C(n, x) q^{n−x} p^x
= Σ_{x=0}^n C(n, x) q^{n−x} (pe^t)^x
= (q + pe^t)^n.

Some illustrative examples

Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Sol. Here n = 5, p = 1/6, x = 2, and therefore
P(X = 2) = b(2; 5, 1/6) = C(5,2) (1 − 1/6)^{5−2} (1/6)² = 0.161.

Ex. The probability that a certain kind of component will survive a shock test is 3/4. Find the probability that exactly 2 of the next 4 components tested survive.

Sol. Here n = 4, p = 3/4, x = 2, and therefore
P(X = 2) = b(2; 4, 3/4) = C(4,2) (1 − 3/4)^{4−2} (3/4)² = 27/128.

Ex. The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are known to have contracted this disease, what is the probability that (a) at least 10 survive, (b) from 3 to 8 survive, and (c) exactly 5 survive?

Sol. (a) 0.0338 (b) 0.8779 (c) 0.1859

Ex. A large chain retailer purchases a certain kind of electronic device from a manufacturer. The manufacturer indicates that the defective rate of the device is 3%.
(a) The inspector randomly picks 20 items from a shipment. What is the probability that there will be at least one defective item among these 20?
(b) Suppose that the retailer receives 10 shipments in a month and the inspector randomly tests 20 devices per shipment. What is the probability that there will be exactly 3 shipments each containing at least one defective device among the 20 that are selected and tested from the shipment?

Sol. (a) Denote by X the number of defective devices among the 20. Then X follows a binomial distribution with n = 20 and p = 0.03. Hence, P(X ≥ 1) = 1 − P(X = 0) = 0.4562.

(b) In this case, each shipment can either contain at least one defective item or not. Hence, testing of each shipment can be viewed as a Bernoulli trial with p = 0.4562 from part (a). Assuming independence from shipment to shipment and denoting by Y the number of shipments containing at least one defective item, Y follows another binomial distribution with n = 10 and p = 0.4562. Therefore, P(Y = 3) = 0.1602.

Ex. In a bombing attack, there is a 50% chance that any bomb will strike the target. At least two direct hits are required to destroy the target. What minimum number of bombs must be dropped so that the probability of hitting the target at least twice is more than 0.99?

Sol. Suppose n bombs are dropped, and let X be the random variable representing the number of bombs striking the target. Then X follows a binomial distribution with p = 1/2, and we need
P(X ≥ 2) ≥ 0.99, i.e., 1 − P(X = 0) − P(X = 1) ≥ 0.99.
This simplifies to 2^n ≥ 100 + 100n, which is satisfied if n ≥ 11. So at least 11 bombs must be dropped so that there is at least a 99% chance of hitting the target at least twice.
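The binomial examples above can be reproduced with a few lines of Python (standard library only; the function name is mine, not from the notes):

```python
import math

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) p^x q^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Die example: exactly two 4s in five tosses
print(round(binom_pmf(2, 5, 1 / 6), 3))   # 0.161

# Retailer example: P(X >= 1) for n = 20, p = 0.03
p_a = 1 - binom_pmf(0, 20, 0.03)
print(round(p_a, 4))                      # 0.4562

# Bombing example: smallest n with P(X >= 2) >= 0.99 for p = 1/2
n = 2
while binom_pmf(0, n, 0.5) + binom_pmf(1, n, 0.5) > 0.01:
    n += 1
print(n)                                  # 11
```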
2.3.1 Multinomial Distribution

The binomial experiment becomes a multinomial experiment if we let each trial have more than two possible outcomes. For example, the drawing of a card from a deck with replacement is a multinomial experiment if the 4 suits are the outcomes of interest.

In general, if a given trial can result in any one of k possible outcomes o1, o2, ..., ok with probabilities p1, p2, ..., pk, then the multinomial distribution gives the probability that o1 occurs x1 times, o2 occurs x2 times, ..., and ok occurs xk times in n independent trials, as follows:

f(x1, x2, ..., xk) = C(n; x1, x2, ..., xk) p1^{x1} p2^{x2} ... pk^{xk},

where

C(n; x1, x2, ..., xk) = n!/(x1! x2! ... xk!),

x1 + x2 + ... + xk = n, and p1 + p2 + ... + pk = 1.

Clearly, when k = 2, the multinomial distribution reduces to the binomial distribution.

Ex. The probabilities that a person goes to office by car, bus and train are 1/2, 1/4 and 1/4, respectively. Find the probability that the person will go to office 2 days by car, 3 days by bus and 1 day by train in the 6 days.

Sol. (6!/(2! 3! 1!)) (1/2)² (1/4)³ (1/4)¹.

Ex. The complexity of arrivals and departures of planes at an airport is such that computer simulation is often used to model the "ideal" conditions. For a certain airport with three runways, it is known that in the ideal setting the following are the probabilities that the individual runways are accessed by a randomly arriving commercial jet:
Runway 1: p1 = 2/9,
Runway 2: p2 = 1/6,
Runway 3: p3 = 11/18.
What is the probability that 6 randomly arriving airplanes are distributed in the following fashion?
Runway 1: 2 airplanes,
Runway 2: 1 airplane,
Runway 3: 3 airplanes

Sol. (6!/(2! 1! 3!)) (2/9)² (1/6)¹ (11/18)³.

2.4 Hypergeometric Distribution

The hypergeometric distribution arises under the following conditions:
(i) The random experiment consists of choosing n objects without replacement from a lot of N objects, given that r objects possess a trait or property of our interest in the lot of N objects.
(ii) X denotes the number of objects possessing the trait or property in the selected sample of size n.

(Venn diagram: the lot of N objects contains a subset of r objects with the trait; the sample of size n contains x objects from this subset and n − x from the remaining N − r objects.)

It is easy to see that the x objects with the trait (by definition of X) are to be chosen from the r objects in C(r, x) ways, while the remaining n − x objects are to be chosen from the N − r objects in C(N−r, n−x) ways. So the n objects carrying x items with the trait can be chosen from the N objects in C(r, x) C(N−r, n−x) ways, while C(N, n) is the total number of ways in which n objects can be chosen from N objects. Therefore, the pmf of X, denoted by h(x; N, r, n), is given by

h(x; N, r, n) = P(X = x) = C(r, x) C(N−r, n−x) / C(N, n).

The random variable X with this pmf is called hypergeometric random variable. The hypergeometric distribution is characterized by the three parameters N, r and n. Note that X lies in the range max(0, n + r − N) ≤ x ≤ min(n, r). So the minimum value of x could be n + r − N instead of 0. To understand this, let N = 30, r = 20 and n = 15. Then the minimum value of x is n + r − N = 15 + 20 − 30 = 5. For, there are only N − r = 10 objects without the trait in the 30 items, so a sample of 15 items certainly contains at least 5 objects with the trait. So in this case, the random variable X takes the values 5, 6, ..., 15. Notice that the maximum value of x is min(n, r) = min(20, 15) = 15. Similarly, if we choose n = 25, the random variable X takes the values 15, 16, 17, 18, 19 and 20. In case we choose n = 8, the random variable X takes the values 0, 1, 2, ..., 8.

Next, let us check whether h(x; N, r, n) is a valid pmf. Note that x ∈ [max(0, n + r − N), min(n, r)]. But we can take x ∈ [0, n] because in situations where this range is not [0, n], we have h(x; N, r, n) = 0. Also, recall the Vandermonde identity:

Σ_{x=0}^n C(a, x) C(b, n−x) = C(a+b, n) or Σ_{x=0}^n C(a, x) C(b, n−x) / C(a+b, n) = 1.

This identity is understandable in view of the following example. Suppose a team of n persons is chosen from a group of a men and b women. The number of ways of choosing the team of n persons from the group of a + b persons is C(a+b, n), the right hand side of the Vandermonde identity. We can also count these ways by considering that in the team of n persons, x persons are men and the remaining n − x persons are women. Then we end up with the left hand side of the Vandermonde identity.

Now from the Vandermonde identity, it follows that

Σ_{x=0}^n h(x; N, r, n) = Σ_{x=0}^n C(r, x) C(N−r, n−x) / C(N, n) = 1. Thus, h(x; N, r, n) is a valid pmf.

Mean, variance and mgf of hypergeometric random variable

For the hypergeometric random variable X, it can be shown that µ_X = E(X) = n(r/N) and σ²_X = n (r/N) ((N−r)/N) ((N−n)/(N−1)).

For, µ_X = E(X) = Σ_x x h(x; N, r, n)
= Σ_x x C(r, x) C(N−r, n−x) / C(N, n)
= n (r/N) Σ_x C(r−1, x−1) C(N−1−(r−1), n−1−(x−1)) / C(N−1, n−1)
= n (r/N),

since Σ_x C(r−1, x−1) C(N−1−(r−1), n−1−(x−1)) / C(N−1, n−1) = 1, being the sum of the probabilities for a hypergeometric distribution with parameters N − 1, r − 1 and n − 1.

Likewise, it is easy to find that E(X(X−1)) = n(n−1) (r/N) ((r−1)/(N−1)). Hence, we have

σ²_X = E(X(X−1)) + E(X) − E(X)² = n (r/N) ((N−r)/N) ((N−n)/(N−1)).

Just for the sake of completeness in line with the other distributions, some details of the moment generating function of the hypergeometric distribution are given below. It is given by

m_X(t) = E(e^{tX}) = Σ_x e^{tx} C(r, x) C(N−r, n−x) / C(N, n) = [C(N−r, n) / C(N, n)] · 2F1(−n, −r; N − r − n + 1; e^t).

Here 2F1 is the hypergeometric function defined as the sum of the infinite series

2F1(a, b; c; z) = 1 + (ab/c)(z/1!) + (a(a+1)b(b+1)/(c(c+1)))(z²/2!) + ...,

where a, b, c are constants and z is the variable of the hypergeometric function.

Also, note that

d/dz [2F1(a, b; c; z)] = (ab/c) 2F1(a+1, b+1; c+1; z),

d²/dz² [2F1(a, b; c; z)] = (a(a+1)b(b+1)/(c(c+1))) 2F1(a+2, b+2; c+2; z).

Following this, it can be shown that

µ_X = E(X) = [d/dt m_X(t)]_{t=0} = n (r/N).
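The stated mean and variance can be verified directly by enumerating the pmf over its full range (a Python sketch with standard library only; the function name is mine):

```python
import math

def hyper_pmf(x, N, r, n):
    # h(x; N, r, n) = C(r, x) C(N-r, n-x) / C(N, n)
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)

# Check mean n r/N and variance n (r/N)(1 - r/N)(N - n)/(N - 1) by enumeration
N, r, n = 52, 26, 5
lo, hi = max(0, n + r - N), min(n, r)
mean = sum(x * hyper_pmf(x, N, r, n) for x in range(lo, hi + 1))
var = sum(x ** 2 * hyper_pmf(x, N, r, n) for x in range(lo, hi + 1)) - mean ** 2
print(abs(mean - n * r / N) < 1e-9)                                     # True
print(abs(var - n * (r / N) * (1 - r / N) * (N - n) / (N - 1)) < 1e-9)  # True

# Exactly 2 red cards among 5 drawn without replacement from a deck
print(round(hyper_pmf(2, 52, 26, 5), 4))                                # 0.3251
```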
Similarly, by calculating the second derivative of m_X(t) at t = 0, the variance can be found as

σ²_X = E(X²) − E(X)² = n (r/N) ((N−r)/N) ((N−n)/(N−1)).

Some illustrative examples

Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards. What is the probability of getting exactly 2 red cards?

Sol. Here N = 52, r = 26, n = 5, x = 2, and therefore P(X = 2) = h(2; 52, 26, 5) = 0.3251.

Ex. Lots of 40 components each are deemed unacceptable if they contain 3 or more defectives. The procedure for sampling a lot is to select 5 components at random and to reject the lot if a defective is found. What is the probability that exactly 1 defective is found in the sample if there are 3 defectives in the entire lot?

Sol. Here N = 40, r = 3, n = 5, x = 1, and therefore P(X = 1) = h(1; 40, 3, 5) = 0.3011.

2.4.1 Binomial distribution as a limiting case of hypergeometric distribution

There is an interesting relationship between the hypergeometric and the binomial distributions. It can be shown that if the population size N → ∞ in such a way that the proportion of successes r/N → p, and n is held constant, then the hypergeometric probability mass function approaches the binomial probability mass function.

Proof: We have

h(x; N, r, n) = C(r, x) C(N−r, n−x) / C(N, n)
= [r!/(x!(r−x)!)] · [(N−r)!/((n−x)!(N−r−(n−x))!)] · [n!(N−n)!/N!]
= C(n, x) · [r!/(r−x)!]/[N!/(N−x)!] · [(N−r)!/(N−r−(n−x))!]/[(N−x)!/(N−n)!]
= C(n, x) · Π_{k=1}^x (r−x+k)/(N−x+k) · Π_{m=1}^{n−x} (N−r−(n−x)+m)/(N−n+m).

Now taking the large N limit for fixed r/N, n and x, we get the binomial pmf

b(x; n, p) = C(n, x) p^x q^{n−x},

since

lim_{N→∞} (r−x+k)/(N−x+k) = lim_{N→∞} r/N = p

and

lim_{N→∞} (N−r−(n−x)+m)/(N−n+m) = lim_{N→∞} (N−r)/N = 1 − p = q.

In practice, this means that we can approximate the hypergeometric probabilities with binomial probabilities, provided N ≫ n. As a rule of thumb, if the population size is more than 20 times the sample size (N > 20n, or N/n > 20), then we may use binomial probabilities in place of hypergeometric probabilities.

Ex. A manufacturer of automobile tires reports that among a shipment of 5000 sent to a local distributor, 1000 are slightly blemished. If one purchases 10 of these tires at random from the distributor, what is the probability that exactly 3 are blemished?

Sol. We find P(X = 3) = 0.2013 from the binomial distribution, and P(X = 3) = 0.2015 from the hypergeometric distribution.

2.4.2 Generalization of the hypergeometric distribution

Consider a lot of N objects given that r1, r2, ..., rk objects possess different traits of our interest such that r1 + r2 + ... + rk = N. Suppose a lot of n objects is randomly chosen (without replacement) where x1, x2, ..., xk objects have the traits as in the r1, r2, ..., rk objects, respectively, such that x1 + x2 + ... + xk = n. Then the probability of the random selection is

f(x1, x2, ..., xk) = C(r1, x1) C(r2, x2) ... C(rk, xk) / C(N, n).

Ex. Ten cards are randomly chosen without replacement from a deck of 52 playing cards. Find the probability of getting 2 spades, 3 clubs, 4 diamonds and 1 heart.

Sol. Here N = 52, r1 = r2 = r3 = r4 = 13, n = 10, x1 = 2, x2 = 3, x3 = 4, x4 = 1. So the required probability is

C(13,2) C(13,3) C(13,4) C(13,1) / C(52,10).

2.5 Poisson Distribution

Consider the pmf of the binomial random variable X:

b(x; n, p) = C(n, x) p^x (1−p)^{n−x}, x = 0, 1, 2, ..., n.

Let us calculate the limiting form of the binomial distribution as n → ∞, p → 0, with np = k a constant. We have

b(x; n, p) = C(n, x) p^x (1−p)^{n−x}
= [n!/(x!(n−x)!)] p^x (1−p)^{n−x}
= [n(n−1)...(n−x+1)/x!] p^x (1−p)^{n−x}
= [(np)(np−p)...(np−xp+p)/x!] (1−p)^{n−x}
= [(np)(np−p)...(np−xp+p)/x!] (1−p)^{−x} (1−p)^n
= [(k)(k−p)...(k−xp+p)/x!] (1−p)^{−x} (1−p)^{k/p}   (using np = k).

Thus, in the limit p → 0 (noting that (1−p)^{k/p} → e^{−k}), we get

p(x; k) = [(k)(k−0)...(k−0)/x!] (1−0)^{−x} e^{−k} = e^{−k} k^x / x!,

known as the pmf of the Poisson distribution.

Notice that the conditions n → ∞, p → 0 and np = k intuitively refer to a situation where the sample space of the random experiment is a continuous interval or medium (thus carrying infinitely many points, n → ∞); the probability p of discrete occurrences of an event of interest is very small (p → 0) such that the mean number of occurrences np of the event remains a constant k.

Thus, formally the Poisson distribution arises under the following conditions:
(i) The random experiment consists of counting or observing discrete occurrences of an event in a continuous region or time interval of some given size s, called a Poisson process or Poisson experiment. (Note that the specified region could take many forms: a length, an area, a volume, a period of time, etc.) For example, counting the number of airplanes landing at Delhi airport between 9am and 11am, or observing the white blood cells in a sample of blood, are Poisson experiments.
(ii) λ denotes the number of occurrences of the event of interest per unit measurement of the given region of size s. Then k = λs is the expected or mean number of occurrences of the event in size s.
(iii) X denotes the number of occurrences of the event in the region of size s.

Then X is called a Poisson random variable, and its pmf can be proved to be

p(x; k) = e^{−k} k^x / x!, x = 0, 1, 2, ...

The Poisson distribution is characterized by the single parameter k.
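The tire example above (and the N/n > 20 rule of thumb) can be checked numerically; in this Python sketch (standard library only, helper names mine) the hypergeometric and binomial answers agree to about three decimal places:

```python
import math

def hyper_pmf(x, N, r, n):
    # h(x; N, r, n) = C(r, x) C(N-r, n-x) / C(N, n)
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) p^x q^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Tire example: N = 5000, r = 1000, n = 10, so N/n = 500 > 20
exact = hyper_pmf(3, 5000, 1000, 10)
approx = binom_pmf(3, 10, 1000 / 5000)
print(round(approx, 4))             # 0.2013
print(abs(exact - approx) < 1e-3)   # True: the approximation is very close
```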
Mean, variance and mgf of Poisson random variable

(i) µ_X = E(X) = Σ_{x=0}^∞ x p(x; k)
= Σ_{x=1}^∞ x e^{−k} k^x / x!
= k e^{−k} Σ_{x=1}^∞ k^{x−1}/(x−1)!
= k e^{−k} e^k
= k.

(ii) σ²_X = E(X²) − E(X)² = E(X(X−1)) + E(X) − E(X)²
= Σ_{x=2}^∞ x(x−1) e^{−k} k^x / x! + k − k²
= k² e^{−k} Σ_{x=2}^∞ k^{x−2}/(x−2)! + k − k²
= k² e^{−k} e^k + k − k²
= k.

We notice that µ_X = k = σ²_X.

(iii) m_X(t) = E(e^{tX}) = Σ_{x=0}^∞ e^{tx} p(x; k)
= Σ_{x=0}^∞ e^{tx} e^{−k} k^x / x!
= e^{−k} Σ_{x=0}^∞ (ke^t)^x / x!
= e^{−k} e^{ke^t}
= e^{k(e^t − 1)}.

Some illustrative examples

Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is tested for white blood cell count by collecting a blood sample of size 0.001 ml. Find the probability that the collected blood sample will carry exactly 3 white blood cells.

Sol. Here λ = 6000, s = 0.001, k = λs = 6 and x = 3, and therefore P(X = 3) = p(3; 6) = e^{−6} 6³/3!.

Ex. In the last 5 years, 10 students of BITS-Pilani were placed with a package of more than one crore. Find the probability that exactly 7 students will be placed with a package of more than one crore in the next 3 years.

Sol. Here λ = 10/5 = 2, s = 3, k = λs = 6 and x = 7, and therefore P(X = 7) = p(7; 6) = e^{−6} 6⁷/7!.

Ex. During a laboratory experiment, the average number of radioactive particles passing through a counter in 1 millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?

Sol. Here k = 4 and x = 6. So P(X = 6) = p(6; 4) = e^{−4} 4⁶/6! = 0.1042.

Ex. Ten is the average number of oil tankers arriving each day at a certain port. The facilities at the port can handle at most 15 tankers per day. What is the probability that on a given day tankers have to be turned away?

Sol. Here k = 10 and the required probability is

P(X > 15) = 1 − P(X ≤ 15) = 1 − Σ_{x=0}^{15} p(x; 10) = 1 − 0.9513 = 0.0487.

Note: We proved that the binomial distribution tends to the Poisson distribution as n → ∞, p → 0 and np = k remains constant. Thus, we may use the Poisson distribution to approximate binomial probabilities when n is large and p is small. As a rule of thumb, this approximation can safely be applied if n > 50 and np < 5.

Some illustrative examples

Ex. In a certain industrial facility, accidents occur infrequently. It is known that the probability of an accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on one day?
(b) What is the probability that there are at most three days with an accident?

Sol. Let X be a binomial random variable with n = 400 and p = 0.005. Thus, np = 2. Using the Poisson approximation,
(a) P(X = 1) = e^{−2} 2¹ = 0.271, and
(b) P(X ≤ 3) = Σ_{x=0}^3 e^{−2} 2^x/x! = 0.857.

Ex. In a manufacturing process where glass products are made, defects or bubbles occur, occasionally rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these items produced has one or more bubbles. What is the probability that a random sample of 8000 will yield fewer than 7 items possessing bubbles?

Sol. This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to 0 and n is quite large, we approximate with the Poisson distribution using k = (8000)(0.001) = 8. Hence, if X represents the number of bubbles, the required probability is P(X < 7) = 0.3134.

2.6 Uniform Distribution

A random variable X is said to follow the uniform distribution if it assumes a finite number of values, all with the same chance of occurrence, i.e., equal probabilities. For instance, if the random variable X assumes n values x1, x2, ..., xn with equal probabilities P(X = xi) = 1/n, then it is a uniform random variable with pmf given by

u(x) = 1/n, x = x1, x2, ..., xn.

The moment generating function, mean and variance of the uniform random variable respectively read as

m_X(t) = (1/n) Σ_{i=1}^n e^{t x_i},

µ = (1/n) Σ_{i=1}^n x_i,

σ² = (1/n) Σ_{i=1}^n x_i² − ((1/n) Σ_{i=1}^n x_i)².

Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X is a discrete random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = 1/6. Thus, X is a uniform random variable.
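The Poisson approximation in the accident example is easy to reproduce (Python, standard library only; function names mine). The exact binomial value for part (a) is computed alongside for comparison:

```python
import math

def poisson_pmf(x, k):
    # p(x; k) = e^(-k) k^x / x!
    return math.exp(-k) * k ** x / math.factorial(x)

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) p^x q^(n-x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Accident example: n = 400, p = 0.005, so k = np = 2
k = 2
p_one = poisson_pmf(1, k)
p_at_most_3 = sum(poisson_pmf(x, k) for x in range(4))
print(round(p_one, 3), round(p_at_most_3, 3))   # 0.271 0.857

# Exact binomial value for part (a), for comparison
print(round(binom_pmf(1, 400, 0.005), 3))       # 0.271
```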
Contents

3 Continuous Probability Distributions
  3.1 Definitions
  3.2 Uniform or Rectangular Distribution
  3.3 Gamma Distribution
    3.3.1 Exponential Distribution
    3.3.2 Chi-Squared (χ²) Distribution
  3.4 Normal Distribution (Gaussian distribution)
    3.4.1 Standard Normal Distribution
    3.4.2 Seeing the values from normal distribution table
  3.5 Density of a dependent random variable
    3.5.1 Chebyshev's Inequality
    3.5.2 Approximation of Binomial distribution by Normal distribution
    3.5.3 Approximation of Poisson distribution by Normal distribution
  3.6 Student t-Distribution
    3.6.1 Symmetry of the t-distribution
  3.7 F-distribution

Note: These lecture notes aim to present a clear and crisp presentation of some topics in Probability and Statistics. Comments/suggestions are welcome via the e-mail: sukuyd@gmail.com to Dr. Suresh Kumar.

Chapter 3

Continuous Probability Distributions

3.1 Definitions

Continuous Random Variable

A continuous random variable is a variable X that takes all values x in an interval or intervals of real numbers, and its probability for a particular value is 0.

For example, if X denotes the lifetime of a person, then it is a continuous random variable because lifetime happens to be continuous, no matter how small or big it is.

Probability Density Function (pdf)

A function f is called probability density function (pdf) of a continuous random variable X if it satisfies the following conditions:
(i) f(x) ≥ 0 for all x.
(ii) P(a ≤ X ≤ b) = ∫_a^b f(x) dx, i.e., f(x) gives the probability of X lying in any given interval [a, b].
(iii) ∫_{−∞}^∞ f(x) dx = 1.

(1) The condition f(x) ≥ 0 implies that the graph of y = f(x) lies on or above the x-axis.
(2) The condition ∫_{−∞}^∞ f(x) dx = 1 graphically implies that the total area under the curve y = f(x) is 1. Therefore, P(a ≤ X ≤ b) = ∫_a^b f(x) dx is the area under the curve y = f(x) from x = a to x = b, as shown in Figure 3.1.
(3) P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a < X < b), since P(X = a) = 0 and P(X = b) = 0.

Figure 3.1: The shaded golden region gives the probability P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

Cumulative Distribution Function (cdf)

A function F defined by

F(x) = P(X ≤ x) = ∫_{−∞}^x f(y) dy

is called cumulative distribution function (cdf) of X. See Figure 3.2, where the shaded golden region gives the value of P(X ≤ b) = F(b).

We notice that
(i) P(a ≤ X ≤ b) = ∫_{−∞}^b f(y) dy − ∫_{−∞}^a f(y) dy = F(b) − F(a).
(ii) F′(x) = f(x), provided the differentiation is permissible.

Figure 3.2: The shaded golden region gives the probability P(X ≤ b) = F(b).
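The relations P(a ≤ X ≤ b) = F(b) − F(a) and ∫ f = 1 can be illustrated numerically. In this Python sketch (standard library only), the density f(x) = 2x on [0, 1] is my own toy choice, not from the notes; for it F(x) = x² on [0, 1]:

```python
import math

# Hypothetical density f(x) = 2x on [0, 1], so F(x) = x^2 there
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, steps=100_000):
    # midpoint rule
    h = (b - a) / steps
    return math.fsum(g(a + (i + 0.5) * h) for i in range(steps)) * h

total = integrate(f, 0, 1)       # total probability, should be 1
prob = integrate(f, 0.2, 0.5)    # F(0.5) - F(0.2) = 0.25 - 0.04 = 0.21
print(round(total, 6), round(prob, 6))   # 1.0 0.21
```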
Ex. Verify whether the function

f(x) = 12.5x − 1.25, 0.1 ≤ x ≤ 0.5; 0, elsewhere

is a density function of X. If so, find F(x), P(0.2 ≤ X ≤ 0.3), µ and σ².

Sol. Please try the detailed calculations yourself. You will find ∫_{−∞}^∞ f(x) dx = ∫_{0.1}^{0.5} f(x) dx = 1. So f is a density function. Also, see the shaded golden triangular region under the plot of f(x) in Figure 3.3. The area of this right angled triangle is (1/2)(0.5 − 0.1)(5) = 1, as expected.

Further,

F(x) = 0 for x < 0.1; ∫_{0.1}^x (12.5y − 1.25) dy = 6.25x² − 1.25x + 0.0625 for 0.1 ≤ x ≤ 0.5; 1 for x > 0.5.

P(0.2 ≤ X ≤ 0.3) = F(0.3) − F(0.2) = 0.1875.

Figure 3.4: The shaded golden triangular region gives the probability P(0.2 ≤ X ≤ 0.3) = 0.1875.

µ = ∫_{0.1}^{0.5} x f(x) dx = 0.3667.

σ² = ∫_{0.1}^{0.5} x² f(x) dx − (∫_{0.1}^{0.5} x f(x) dx)² = 0.0089.

Ex. Show that the mean of the random variable X with the density function given by f(x) = x/2, 0 ≤ x ≤ 2, is 4/3. Find P(0 ≤ X ≤ 4/3).
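The numbers in the worked example above can be double-checked numerically (Python, standard library only):

```python
import math

def f(x):
    # the density from the worked example above
    return 12.5 * x - 1.25 if 0.1 <= x <= 0.5 else 0.0

def integrate(g, a, b, steps=200_000):
    # midpoint rule
    h = (b - a) / steps
    return math.fsum(g(a + (i + 0.5) * h) for i in range(steps)) * h

print(round(integrate(f, 0.1, 0.5), 6))   # 1.0 (valid density)
print(round(integrate(f, 0.2, 0.3), 6))   # 0.1875
mu = integrate(lambda x: x * f(x), 0.1, 0.5)
var = integrate(lambda x: x * x * f(x), 0.1, 0.5) - mu ** 2
print(round(mu, 4), round(var, 4))        # 0.3667 0.0089
```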
Z ∞
3.2 Uniform or Rectangular Distribution a uniform distribution on the interval [0, 4]. Then the normalizing constant c is given by the density function condition f (x)dx = 1, which leads
(a) What is the probability density function? −∞
A random variable X is said to have uniform distribution if its density function f (x) is constant for all (b) What is the probability that any given conference lasts at least 3 hours? 1
to c = . Thus, the density function of the gamma random variable X reads
values of x, say, Γ(α)β α
Sol. (a)
k, a ≤ x ≤ b 1
−x
f (x) = xα−1 e β , x > 0
0, elsewhere 1 f (x) = Γ(α)β α
f (x) = 4, 0 ≤ x ≤ 4
Z ∞ 0, elsewhere
0, x≤0
Then the normalizing constant k is given by the density function condition f (x)dx = 1, which leads R4 1 Graphs of several gamma distributions are shown in Figure 3.6.
leads to k = 1/(b−a). Thus the continuous random variable X has the following uniform distribution:

    f(x) = 1/(b−a),  a ≤ x ≤ b,    and    f(x) = 0, elsewhere.

In this case, the area under the curve has the shape of a rectangle, as shown in Figure 3.5. That is why the distribution is also known as the rectangular distribution.

Figure 3.5: The area of the shaded golden rectangular region, of height 1/(b−a) over the base [a, b], gives the total probability 1.

You may easily derive the following for the uniform distribution:

    µ = E(X) = ∫_a^b x/(b−a) dx = (b+a)/2,

    σ² = E(X²) − [E(X)]² = ∫_a^b x²/(b−a) dx − [∫_a^b x/(b−a) dx]² = (b−a)²/12,

    m_X(t) = E(e^{tX}) = ∫_a^b e^{tx}/(b−a) dx = (e^{bt} − e^{at}) / ((b−a)t),

    F(x) = 0 for x < a;    F(x) = ∫_a^x 1/(b−a) dy = (x−a)/(b−a) for a ≤ x ≤ b;    F(x) = 1 for x > b.

Ex. Suppose that a large conference room at a certain company can be reserved for no more than 4 hours. Both long and short conferences occur quite often. Assume that the length X of a conference has a uniform distribution on the interval [0, 4].
(a) What is the probability density function?
(b) What is the probability that any given conference lasts at least 3 hours?

Sol. (a) f(x) = 1/4 for 0 ≤ x ≤ 4, and f(x) = 0 elsewhere.
(b) P(X ≥ 3) = ∫_3^4 (1/4) dx = 1/4.

3.3 Gamma Distribution

Consider a Poisson process where Y is the Poisson random variable, and λ is the mean number of Poisson events per unit time. Suppose T is the waiting time for the occurrence of the first Poisson event. If no Poisson event occurs in the time interval [0, t], then T > t. It implies that

    P(T > t) = P(Y = 0) = e^{−λt}(λt)⁰/0! = e^{−λt}.

Thus, the cumulative distribution function of T is given by

    F(t) = P(0 ≤ T ≤ t) = 1 − P(T > t) = 1 − e^{−λt}.

Therefore, the density function of T reads

    f(t) = F′(t) = λ e^{−λt}.

Now, suppose T is the waiting time for the occurrence of two Poisson events. If at most one Poisson event occurs in the time interval [0, t], then T > t. It implies that

    P(T > t) = P(Y = 0) + P(Y = 1) = e^{−λt}(λt)⁰/0! + e^{−λt}(λt)¹/1! = e^{−λt} + λt e^{−λt}.

Then the cdf of T is given by

    F(t) = P(0 ≤ T ≤ t) = 1 − P(T > t) = 1 − e^{−λt} − λt e^{−λt}.

Therefore, the density function of T reads

    f(t) = F′(t) = λ² t e^{−λt}.

Thus, in general, if T is the waiting time for α Poisson events, then the density function of T reads

    f(t) = λ^α t^{α−1} e^{−λt}/(α−1)! = λ^α t^{α−1} e^{−λt}/Γ(α),

where Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx is the gamma function.¹ Such a distribution is called the gamma distribution.

Formally, a continuous random variable X is said to have the gamma distribution with parameters α > 0 and β > 0 if its density function is of the form

    f(x) = c x^{α−1} e^{−x/β} for x > 0,    and    f(x) = 0 for x ≤ 0,

where the constant c = 1/(Γ(α) β^α) makes the total probability equal to 1.

¹One should remember that Γ(1) = 1, Γ(α) = (α−1)Γ(α−1), Γ(1/2) = √π, and Γ(α) = (α−1)! when α is a positive integer.

Figure 3.6: Graphs of several gamma distributions are shown for certain specified values of the parameters α and β, namely (α, β) = (1, 1), (1, 2), (2, 1) and (4, 1). The special gamma distribution for which α = 1 is called the exponential distribution.

The moment generating function of the gamma random variable can be derived as follows:

    m_X(t) = E(e^{tX}) = 1/(Γ(α)β^α) ∫_0^∞ x^{α−1} e^{−x/β} e^{tx} dx = 1/(Γ(α)β^α) ∫_0^∞ x^{α−1} e^{−(1/β − t)x} dx.

Substituting (1/β − t)x = y and simplifying while using the definition of the gamma function, it is easy to find

    m_X(t) = (1 − βt)^{−α},    where t < 1/β.

The mean, variance and cdf are given by

    µ = E(X) = [d/dt m_X(t)]_{t=0} = αβ,

    σ² = E(X²) − [E(X)]² = [d²/dt² m_X(t)]_{t=0} − ([d/dt m_X(t)]_{t=0})² = αβ²(α+1) − (αβ)² = αβ²,

    F(x) = 1/(Γ(α)β^α) ∫_0^x y^{α−1} e^{−y/β} dy.

Note: Comparing the gamma distribution with its Poisson-process origin, we find β = 1/λ. Therefore, β is the mean time between Poisson events, since λ is the mean number of occurrences of Poisson events in unit time.
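The mean and variance just obtained from the mgf can be checked numerically by differentiating m_X(t) = (1 − βt)^{−α} at t = 0 with central differences. A minimal Python sketch (the helper names and the sample parameters α = 5, β = 10 are ours, chosen for illustration):

```python
import math

def gamma_mgf(t, alpha, beta):
    """Moment generating function of the gamma distribution, valid for t < 1/beta."""
    return (1 - beta * t) ** (-alpha)

# Recover the mean and variance from the mgf by numerical differentiation at t = 0.
alpha, beta = 5.0, 10.0
h = 1e-5
m_plus, m_zero, m_minus = (gamma_mgf(t, alpha, beta) for t in (h, 0.0, -h))

mean = (m_plus - m_minus) / (2 * h)                       # ≈ αβ = 50
second_moment = (m_plus - 2 * m_zero + m_minus) / h**2    # ≈ αβ²(α+1) = 3000
variance = second_moment - mean**2                        # ≈ αβ² = 500
print(mean, variance)
```

The finite-difference values agree with the closed forms µ = αβ and σ² = αβ² to several decimal places.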
In reliability theory, where equipment failure often conforms to this Poisson process, β is called the mean time between failures. Many equipment breakdowns do follow the Poisson process, and thus the gamma distribution does apply. Other applications include survival times in biomedical experiments and computer response time.

Ex. In a biomedical study with rats, a dose-response investigation is used to determine the effect of the dose of a toxicant on their survival time. The toxicant is one that is frequently discharged into the atmosphere from jet fuel. For a certain dose of the toxicant, the study determines that the survival time, in weeks, has a gamma distribution with α = 5 and β = 10. What is the probability that a rat survives no longer than 60 weeks?

Sol. Let the random variable X be the survival time (time to death). The required probability is

    P(X ≤ 60) = 1/(Γ(5)β⁵) ∫_0^60 x^{α−1} e^{−x/β} dx = 1/Γ(5) ∫_0^6 y⁴ e^{−y} dy = 0.715,

where we substituted y = x/β.

Ex. It is known, from previous data, that the length of time in months between customer complaints about a certain product has a gamma distribution with α = 2 and β = 4. Changes were made to tighten quality control requirements. Following these changes, 20 months passed before the first complaint. Does it appear as if the quality control tightening was effective?

Sol. We find P(X ≥ 20) = 1 − P(X < 20) = 1 − 0.96 = 0.04. Thus, it is reasonable to conclude that the quality control work was effective.

Ex. Suppose that telephone calls arriving at a particular switchboard follow a Poisson process with an average of 5 calls coming per minute. What is the probability that up to a minute will elapse by the time 2 calls have come in to the switchboard?

Sol. Here the Poisson process applies, with the time until 2 Poisson events following a gamma distribution with β = 1/5 and α = 2. Denote by T the time in minutes that transpires before 2 calls come. The required probability is given by

    P(T ≤ 1) = ∫_0^1 (1/β²) t e^{−t/β} dt = 25 ∫_0^1 t e^{−5t} dt = 1 − e^{−5}(1 + 5) = 0.96.

Note: While the origin of the gamma distribution deals in time (or space) until the occurrence of α Poisson events, there are many instances where a gamma distribution works very well even though there is no clear Poisson structure. This is particularly true for survival time problems in both engineering and biomedical applications.

3.3.1 Exponential Distribution

For the exponential distribution, that is, the gamma distribution with α = 1, the mean, variance and cdf reduce to

    µ = β,    σ² = β²,    F(x) = ∫_0^x (1/β) e^{−y/β} dy = 1 − e^{−x/β}.

The Memoryless Property of the Exponential Distribution

The types of applications of the exponential distribution in reliability and component or machine lifetime problems are influenced by the memoryless (or lack-of-memory) property of the exponential distribution. For example, in the case of, say, an electronic component whose lifetime has an exponential distribution, the probability that the component lasts, say, t hours, that is, P(T ≥ t), is the same as the conditional probability P(T ≥ t₀ + t | T ≥ t₀). For,

    P(T ≥ t₀ + t | T ≥ t₀) = P((T ≥ t₀ + t) ∩ (T ≥ t₀)) / P(T ≥ t₀).

Notice that both the events (T ≥ t₀ + t) and (T ≥ t₀) occur if and only if T ≥ t₀ + t. Therefore,

    P(T ≥ t₀ + t | T ≥ t₀) = P(T ≥ t₀ + t)/P(T ≥ t₀) = (1 − F(t₀ + t))/(1 − F(t₀)) = e^{−λ(t₀+t)}/e^{−λt₀} = e^{−λt}.

This shows that the distribution of the remaining lifetime is independent of the current age. So if the component "makes it" to t₀ hours, the probability of lasting an additional t hours is the same as the probability of lasting t hours. There is no "punishment" through wear that may have ensued for lasting the first t₀ hours. Thus, the exponential distribution is appropriate when the memoryless property is justified. But if the failure of the component is a result of gradual or slow wear (as in mechanical wear), then the exponential does not apply, and either the gamma or the Weibull distribution may be more appropriate.

Ex. Based on extensive testing, it is determined that the time Y in years before a major repair is required for a certain washing machine is characterized by the density function

    f(y) = (1/4) e^{−y/4},    y ≥ 0.

Note that Y is an exponential random variable with µ = 4 years. The machine is considered a bargain if it is unlikely to require a major repair before the sixth year.
(a) What is the probability P(Y > 6)?
(b) What is the probability that a major repair is required in the first year?

Sol. (a) P(Y > 6) = e^{−6/4} = 0.2231. Thus, the probability that the washing machine will require major repair after year six is 0.223. Of course, it will require repair before year six with probability 0.777. Thus, one might conclude the machine is not really a bargain.
(b) The probability that a major repair is necessary in the first year is

    P(Y < 1) = 1 − e^{−1/4} = 1 − 0.779 = 0.221.

Ex. Suppose that a system contains a certain type of component whose time, in years, to failure is given by T. The random variable T is modeled nicely by the exponential distribution with mean time to failure β = 5. If 5 of these components are installed in different systems, what is the probability that at least 2 are still functioning at the end of 8 years?

Sol. The probability that a given component is still functioning after 8 years is given by

    P(T > 8) = (1/5) ∫_8^∞ e^{−t/5} dt = e^{−8/5} ≈ 0.2.

Let X represent the number of components functioning after 8 years. Then, using the binomial distribution with n = 5 and p = 0.2, we have

    P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 0.2627.
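For integer α, the gamma cdf can be evaluated through the same Poisson identity used to derive the distribution: the waiting time for α events exceeds x exactly when fewer than α events occur in [0, x]. A small Python sketch checking the worked examples above (the function name is ours):

```python
import math

def gamma_cdf_int(x, alpha, beta):
    """P(X <= x) for a gamma RV with integer shape alpha and scale beta,
    via the Poisson identity: T > x iff fewer than alpha events occur in [0, x]."""
    lam_x = x / beta
    tail = sum(math.exp(-lam_x) * lam_x**k / math.factorial(k) for k in range(alpha))
    return 1.0 - tail

# Rat survival: alpha = 5, beta = 10, P(X <= 60) ≈ 0.715
p_rat = gamma_cdf_int(60, 5, 10)

# Switchboard: alpha = 2, beta = 1/5, P(T <= 1) = 1 − e^{−5}(1 + 5) ≈ 0.96
p_calls = gamma_cdf_int(1, 2, 0.2)

# Washing machine: exponential (alpha = 1) with beta = 4, P(Y > 6) ≈ 0.2231
p_repair = 1 - gamma_cdf_int(6, 1, 4)

# Memoryless check: P(T >= t0 + t | T >= t0) equals P(T >= t) for the exponential
t0, t, beta = 2.0, 3.0, 4.0
cond = (1 - gamma_cdf_int(t0 + t, 1, beta)) / (1 - gamma_cdf_int(t0, 1, beta))
uncond = 1 - gamma_cdf_int(t, 1, beta)
print(p_rat, p_calls, p_repair, cond, uncond)
```

The last two printed values coincide, which is exactly the memoryless property.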
3.3.2 Chi-Squared (χ²) Distribution

The special case of the gamma distribution with β = 2 and α = ν/2, ν being some positive integer (called the degrees of freedom), is named the chi-squared (χ²) distribution. The density function of the χ² random variable with ν degrees of freedom is given by

    f(χ²) = 1/(Γ(ν/2) 2^{ν/2}) (χ²)^{ν/2 − 1} e^{−χ²/2},    χ² > 0.

Graphs of χ² distributions for certain specified values of the parameter ν are shown in Figure 3.7.

Figure 3.7: Graphs of χ² distributions are shown for certain specified values of the parameter ν (ν = 1, 2, 3, 4).

The mean and variance of the χ² distribution are µ_{χ²} = ν and σ²_{χ²} = 2ν.

The chi-squared distribution plays a vital role in statistical inference. It has considerable applications in both methodology and theory. It is an important component of statistical hypothesis testing and estimation. Topics dealing with sampling distributions, analysis of variance, and nonparametric statistics involve extensive use of the chi-squared distribution. We will see these in the later chapters.

How to see values from the χ²-distribution table

Table 3.22 gives values of χ²_α for various values of α and ν. The areas, α, are the column headings; the degrees of freedom, ν, are given in the first column; and the table entries are the χ² values. For example, the χ² value with 7 degrees of freedom, leaving an area of 0.1 to the right, is χ²_{0.1} = 12.017, as shown in Figure 3.8.

Figure 3.8: The shaded golden region area is α = 0.1. It is the area under the χ² curve with 7 degrees of freedom for χ² ≥ χ²_{0.1} = 12.017.

Ex. Find P(8.383 ≤ χ² ≤ 12.017), given the χ²-distribution with 7 degrees of freedom.

Sol. From Table 3.22, we see that χ²_{0.3} = 8.383 and χ²_{0.1} = 12.017. It follows that P(8.383 ≤ χ² ≤ 12.017) = P(χ²_{0.3} ≤ χ² ≤ χ²_{0.1}) = P(χ² ≥ χ²_{0.3}) − P(χ² ≥ χ²_{0.1}) = 0.3 − 0.1 = 0.2, as shown in Figure 3.9.

Figure 3.9: The shaded golden region area is P(8.383 ≤ χ² ≤ 12.017) = P(χ²_{0.3} ≤ χ² ≤ χ²_{0.1}) = 0.3 − 0.1 = 0.2.

3.4 Normal Distribution (Gaussian Distribution)

Remarkably, when n, np and nq are large, it can be shown that the binomial distribution is well approximated by a distribution of the form

    C(n, x) p^x q^{n−x} ∼ 1/√(2πnpq) · e^{−(1/2)((x−np)/√(npq))²},

known as the normal distribution.

Formally, a continuous random variable X is said to follow the normal distribution with parameters µ and σ if its density function is given by

    f(x) = 1/(σ√(2π)) e^{−(1/2)((x−µ)/σ)²},    −∞ < x < ∞,  −∞ < µ < ∞,  σ > 0.

Figure 3.10: The area of the shaded golden region under the normal probability curve gives the total probability 1. The normal probability curve is symmetric about the vertical line x = µ. Therefore, P(X ≤ µ) = 0.5 = P(X ≥ µ). Also, the maximum value of f(x) occurs at x = µ, and is given by f(µ) = 1/(σ√(2π)).

We have

    ∫_{−∞}^∞ f(x) dx = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−(1/2)((x−µ)/σ)²} dx
      = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−y²/(2σ²)} dy,    where y = x − µ,
      = 2/(σ√(2π)) ∫_0^∞ e^{−y²/(2σ²)} dy
      = 2/(σ√(2π)) ∫_0^∞ e^{−r} r^{−1/2} (σ/√2) dr,    where y²/(2σ²) = r,
      = (1/√π) ∫_0^∞ e^{−r} r^{−1/2} dr
      = (1/√π) Γ(1/2) = (1/√π) √π = 1.

The mgf of the normal random variable is m_X(t) = e^{µt + σ²t²/2}.

Proof of the mgf: We have

    m_X(t) = E(e^{tX}) = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−(1/2)((x−µ)/σ)² + tx} dx
      = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−[(x−µ)² − 2σ²tx]/(2σ²)} dx
      = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−[x² + µ² − 2µx − 2σ²tx]/(2σ²)} dx
      = 1/(σ√(2π)) ∫_{−∞}^∞ e^{−[x² − 2(µ+σ²t)x + µ²]/(2σ²)} dx.
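The χ²-table values quoted above can be reproduced by numerically integrating the χ² density, since the χ² distribution is just a gamma distribution with β = 2 and α = ν/2. A small Python sketch using composite Simpson's rule (the helper names are ours, not from the text):

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom (gamma with alpha = nu/2, beta = 2)."""
    if x <= 0:
        return 0.0
    return x**(nu/2 - 1) * math.exp(-x/2) / (math.gamma(nu/2) * 2**(nu/2))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

nu = 7
p_below_12017 = simpson(lambda x: chi2_pdf(x, nu), 0, 12.017)   # ≈ 0.9, i.e. χ²_{0.1} = 12.017
p_below_8383 = simpson(lambda x: chi2_pdf(x, nu), 0, 8.383)     # ≈ 0.7, i.e. χ²_{0.3} = 8.383
p_between = p_below_12017 - p_below_8383                        # ≈ 0.2, as in the example
print(p_below_12017, p_below_8383, p_between)
```

This confirms the table look-ups χ²_{0.1} = 12.017 and χ²_{0.3} = 8.383 for ν = 7.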
Completing the square in the exponent, x² − 2(µ+σ²t)x + µ² = [x − (µ+σ²t)]² − 2µσ²t − σ⁴t², so that

    m_X(t) = e^{µt + σ²t²/2} · 1/(σ√(2π)) ∫_{−∞}^∞ e^{−[x − (µ+σ²t)]²/(2σ²)} dx
      = e^{µt + σ²t²/2} · 1/(σ√(2π)) ∫_{−∞}^∞ e^{−y²/(2σ²)} dy,    where y = x − (µ + σ²t),
      = e^{µt + σ²t²/2} · 2/(σ√(2π)) ∫_0^∞ e^{−r} r^{−1/2} (σ/√2) dr,    where y²/(2σ²) = r,
      = e^{µt + σ²t²/2} · (1/√π) Γ(1/2)
      = e^{µt + σ²t²/2}.

It follows that

    Mean = E(X) = [d/dt m_X(t)]_{t=0} = [e^{µt + σ²t²/2} (µ + σ²t)]_{t=0} = µ,

    Variance = E(X²) − [E(X)]² = [d²/dt² m_X(t)]_{t=0} − ([d/dt m_X(t)]_{t=0})² = µ² + σ² − µ² = σ².

Thus, the two parameters µ and σ in the density function of the normal random variable X are its mean and standard deviation, respectively.

It deserves mention that the normal distribution is the most important continuous probability distribution in the entire field of statistics. It is also known as the Gaussian distribution. In fact, the normal distribution was first described by De Moivre in 1733 as the limiting case of the binomial distribution when the number of trials is infinite. This discovery did not get much attention. Around fifty years later, Laplace and Gauss rediscovered the normal distribution while dealing with astronomical data. They found that the errors in astronomical measurements are well described by the normal distribution. It approximately describes many phenomena that occur in nature, industry, and research. For example, physical measurements in areas such as meteorological experiments, rainfall studies, and measurements of manufactured parts are often more than adequately explained with a normal distribution. In addition, errors in scientific measurements are extremely well approximated by a normal distribution. The normal distribution also finds enormous application as a limiting distribution. For instance, under certain conditions, the normal distribution provides a good continuous approximation to the binomial and hypergeometric distributions.

Note: If X is a normal random variable with mean µ and variance σ², then we write X ∼ N(µ, σ²).

3.4.1 Standard Normal Distribution

Let Z = (X − µ)/σ. Then E(Z) = 0 and Var(Z) = 1. We call Z the standard normal variate, and we write Z ∼ N(0, 1). Its density function reads

    φ(z) = 1/√(2π) e^{−z²/2},    −∞ < z < ∞.

The corresponding cumulative distribution function is given by

    Φ(z) = ∫_{−∞}^z φ(z) dz = 1/√(2π) ∫_{−∞}^z e^{−z²/2} dz.

Figure 3.11: The area of the shaded golden region under the standard normal probability curve gives the total probability 1. The normal probability curve is symmetric about the vertical line z = 0. Therefore, P(Z ≤ 0) = 0.5 = P(Z ≥ 0). Also, the maximum value of φ(z) occurs at z = 0, and is given by φ(0) = 1/√(2π).

The normal probability curve is symmetric about the line x = µ (see Figure 3.10) or z = 0 (see Figure 3.11). Therefore, we have

    P(X < µ) = P(X > µ) = 0.5,    P(−a < Z < 0) = P(0 < Z < a).

The probabilities of the standard normal variable Z in the probability table of the normal distribution are given in terms of the cumulative distribution function Φ(z) = F(z) = P(Z ≤ z) (see the Normal Table 3.21). So we have

    P(a < Z < b) = P(Z < b) − P(Z < a) = F(b) − F(a).

From the normal table, it can be found that

    P(|X − µ| < σ) = P(µ − σ < X < µ + σ) = P(−1 < Z < 1) = F(1) − F(−1) = 0.8413 − 0.1587 = 0.6826.

This shows that there is approximately 68% probability that the normal variable X lies in the interval (µ − σ, µ + σ), as shown in Figure 3.12. We call this interval the 1σ confidence interval of X.

Figure 3.12: The area of the shaded golden region under the standard normal probability curve gives the probability corresponding to the 1σ confidence interval (µ − σ, µ + σ). So P(µ − σ < X < µ + σ) = P(−1 < Z < 1) = 0.6826 (about 68.26%).

Similarly, the probabilities of X in the 2σ and 3σ confidence intervals are given by

    P(|X − µ| < 2σ) = P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544,
    P(|X − µ| < 3σ) = P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9973.

For geometrical clarity, see the left and right panels in Figure 3.13.

Figure 3.13: Left panel: the area of the shaded golden region under the standard normal probability curve gives P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544. Right panel: the shaded area gives P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9973.

3.4.2 Seeing the values from the normal distribution table

As mentioned earlier, the Normal Table 3.21 provides values of the cdf F(z) of the standard normal variable Z. The table provides values of F(z) from z = −3.49 to z = 3.49, with F(−3.49) = 0.0002 and F(3.49) = 0.9998. Thus it covers almost 100% of the region under the normal probability curve. The normal table reads like the following table: the first column shows the z values up to the first decimal place (−3.4, −3.3, ..., 3.3, 3.4), and the remaining columns, headed 0.00, 0.01, ..., 0.09, give the values of F(z) at the corresponding second decimal place of z. For instance, F(3.42) = 0.9997.

Note. In the following examples, we will refer to the Normal Table 3.21 whenever the values of F(z) are needed.
    z    | 0.00   0.01   0.02   ...  0.09
    −3.4 | 0.0003 0.0003 0.0003 ...  0.0002
    −3.3 | 0.0005 0.0005 0.0005 ...  0.0003
    ...  | ...    ...    ...    ...  ...
    0.0  | 0.5000 0.5040 0.5080 ...  0.5359
    ...  | ...    ...    ...    ...  ...
    3.3  | 0.9995 0.9995 0.9995 ...  0.9997
    3.4  | 0.9997 0.9997 0.9997 ...  0.9998

Ex. A random variable X is normally distributed with mean 9 and standard deviation 3. Find P(X ≥ 15), P(X ≤ 15) and P(0 ≤ X ≤ 9).

Sol. We have Z = (X − 9)/3. Therefore,

    P(X ≥ 15) = P(Z ≥ 2) = 1 − F(2) = 1 − 0.9772 = 0.0228,
    P(X ≤ 15) = 1 − 0.0228 = 0.9772,
    P(0 ≤ X ≤ 9) = P(−3 ≤ Z ≤ 0) = F(0) − F(−3) = 0.5 − 0.0013 = 0.4987.

Ex. Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
(a) 45% of the area to the left, and
(b) 14% of the area to the right.

Sol. (a) We require a z value that leaves an area of 0.45 to the left. From the Normal Table 3.21, we find P(Z < −0.13) = 0.45, so the desired z value is −0.13. Hence,

    x = σz + µ = (6)(−0.13) + 40 = 39.22.

(b) This time we require a z value that leaves 0.14 of the area to the right, and hence an area of 0.86 to the left. Again, from the Normal Table, we find P(Z < 1.08) = 0.86, so the desired z value is 1.08 and

    x = σz + µ = (6)(1.08) + 40 = 46.48.

Ex. In a normal distribution, 12% of the items are under 30 and 85% are under 60. Find the mean and standard deviation of the distribution.

Sol. Let µ be the mean and σ the standard deviation of the distribution. Given that P(X < 30) = 0.12 and P(X < 60) = 0.85. Let z₁ and z₂ be the values of the standard normal variable Z corresponding to X = 30 and X = 60, respectively, so that P(Z < z₁) = 0.12 and P(Z < z₂) = 0.85. From the Normal Table 3.21, we find z₁ ≈ −1.17 and z₂ ≈ 1.04, since F(−1.17) = 0.121 and F(1.04) = 0.8508. Finally, solving the equations (30 − µ)/σ = −1.17 and (60 − µ)/σ = 1.04, we find µ = 45.93 and σ = 13.56.

Ex. A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year. Assuming that battery life is normally distributed, find the probability that a given battery will last less than 2.3 years.

Sol. P(X < 2.3) = P(Z < −1.4) = 0.0808.

Ex. An electrical firm manufactures light bulbs that have a life, before burn-out, that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

Sol. P(778 < X < 834) = P(−0.55 < Z < 0.85) = P(Z < 0.85) − P(Z < −0.55) = 0.8023 − 0.2912 = 0.5111.

Ex. In an industrial process, the diameter of a ball bearing is an important measurement. The buyer sets specifications for the diameter to be 3.0 ± 0.01 cm. The implication is that no part falling outside these specifications will be accepted. It is known that in the process the diameter of a ball bearing has a normal distribution with mean µ = 3 and standard deviation σ = 0.005. On average, how many manufactured ball bearings will be scrapped?

Sol. P(X < 2.99) + P(X > 3.01) = P(Z < −2) + P(Z > 2) = 2(0.0228) = 0.0456. As a result, it is anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.

Ex. Gauges are used to reject all components for which a certain dimension is not within the specification 1.5 ± d. It is known that this measurement is normally distributed with mean 1.5 and standard deviation 0.2. Determine the value d such that the specifications cover 95% of the measurements.

Sol. From the Normal Table 3.21, we notice that P(Z < −1.96) = 0.025. So, by the symmetry of the normal distribution, it follows that P(−1.96 < Z < 1.96) = 0.95. Therefore,

    1.96 = ((1.5 + d) − 1.5)/0.2.

So we get d = (0.2)(1.96) = 0.392.

3.5 Density of a dependent random variable

Let X be a continuous random variable with density f_X. Let Y be some random variable dependent on X via the relation Y = g(X), where g is strictly monotonic and differentiable. Then it can be proved that the density f_Y of Y is given by

    f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.

Proof. Assuming that Y = g(X) is a decreasing function of X, we have

    F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
      = P(g⁻¹(g(X)) ≥ g⁻¹(y))    (∵ g is a decreasing function, so is g⁻¹)
      = P(X ≥ g⁻¹(y))
      = 1 − P(X ≤ g⁻¹(y))
      = 1 − F_X(g⁻¹(y)).

Since the derivative of the cdf gives the density function, differentiating both sides with respect to y gives

    f_Y(y) = −f_X(g⁻¹(y)) d g⁻¹(y)/dy = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|,

where d g⁻¹(y)/dy = −|d g⁻¹(y)/dy|, g⁻¹ being a decreasing function.

Likewise, if Y = g(X) is an increasing function of X, we find

    F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)
      = P(g⁻¹(g(X)) ≤ g⁻¹(y))    (∵ g is an increasing function, so is g⁻¹)
      = P(X ≤ g⁻¹(y))
      = F_X(g⁻¹(y)).

It leads to

    f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹(y)/dy|.

Ex. (Lognormal Distribution) If a random variable X follows the normal distribution, then the distribution of the random variable Y = e^X is called the lognormal distribution. Determine the cdf, pdf, mean and variance of Y. Also, find P(a ≤ Y ≤ b).

Sol. Here Y = e^X is an increasing function of X. So the cdf of Y is given by

    F_Y(y) = F_X(ln y).

It implies that

    F′_Y(y) = F′_X(ln y) · (1/y),    that is,    f_Y(y) = f_X(ln y) · (1/y).

Therefore, the density function of the lognormal random variable Y = e^X reads

    f_Y(y) = 1/(yσ√(2π)) e^{−(1/2)((ln y − µ)/σ)²} for y > 0,    and    f_Y(y) = 0 for y ≤ 0,

where µ and σ are the mean and standard deviation of the normal random variable X. The mean and variance of the lognormal random variable Y can be shown to be

    E(Y) = e^{µ + σ²/2},    V(Y) = e^{2µ + σ²}(e^{σ²} − 1).

We can determine the probability of the lognormal random variable Y using the normal distribution table of X, since

    P(a ≤ Y ≤ b) = P(a ≤ e^X ≤ b) = P(ln a ≤ X ≤ ln b).

Thus, the probability of Y in the interval [a, b] is given by the probability of the normal random variable X in the interval [ln a, ln b].
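The table-based answers in the worked examples above can be reproduced with the error function from the standard library. A minimal sketch (the helper name and tolerances are ours; small discrepancies against the text come from four-decimal table rounding):

```python
import math

def Phi(z):
    """Standard normal cdf, Φ(z) = (1 + erf(z/√2))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Light bulbs: X ~ N(800, 40²); P(778 < X < 834)
p_bulb = Phi((834 - 800) / 40) - Phi((778 - 800) / 40)    # ≈ 0.5111

# Ball bearings: X ~ N(3, 0.005²); fraction outside 3 ± 0.01
p_scrap = Phi(-2) + (1 - Phi(2))                           # ≈ 0.0456

# Gauges: specifications 1.5 ± d covering 95% of measurements ⇒ d = 1.96 σ
d = 1.96 * 0.2                                             # = 0.392
coverage = Phi(1.96) - Phi(-1.96)                          # ≈ 0.95
print(p_bulb, p_scrap, d, coverage)
```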
3.5.1 Chebyshev's Inequality

If X is a normal random variable with mean µ and variance σ², then P(|X − µ| < kσ) = P(|Z| < k) = F(k) − F(−k). However, if X is any random variable, then a general lower bound for this probability is given by Chebyshev's inequality, stated below.

If X is a random variable with mean µ and variance σ², then

    P(|X − µ| < kσ) ≥ 1 − 1/k².

Proof. By definition of variance, we have

    ∫_{−∞}^∞ (x − µ)² f(x) dx = σ²
    ⇒ ∫_{−∞}^{µ−kσ} (x−µ)² f(x) dx + ∫_{µ−kσ}^{µ+kσ} (x−µ)² f(x) dx + ∫_{µ+kσ}^∞ (x−µ)² f(x) dx = σ²
    ⇒ ∫_{−∞}^{µ−kσ} (x−µ)² f(x) dx + ∫_{µ+kσ}^∞ (x−µ)² f(x) dx ≤ σ²    (∵ ∫_{µ−kσ}^{µ+kσ} (x−µ)² f(x) dx ≥ 0)
    ⇒ ∫_{−∞}^{µ−kσ} k²σ² f(x) dx + ∫_{µ+kσ}^∞ k²σ² f(x) dx ≤ σ²    (∵ (x−µ)² ≥ k²σ² for x ≤ µ−kσ or x ≥ µ+kσ)
    ⇒ ∫_{−∞}^{µ−kσ} f(x) dx + ∫_{µ+kσ}^∞ f(x) dx ≤ 1/k²
    ⇒ 1 − ∫_{−∞}^{µ−kσ} f(x) dx − ∫_{µ+kσ}^∞ f(x) dx ≥ 1 − 1/k²
    ⇒ ∫_{µ−kσ}^{µ+kσ} f(x) dx ≥ 1 − 1/k²    (∵ ∫_{−∞}^∞ f(x) dx = 1)
    ⇒ P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²
    ⇒ P(|X − µ| < kσ) ≥ 1 − 1/k².

Note that Chebyshev's inequality does not yield the exact probability that X lies in the interval (µ − kσ, µ + kσ); rather, it gives the minimum probability for the same. However, in the case of a normal random variable, the exact probability can be obtained. For example, consider the 2σ interval (µ − 2σ, µ + 2σ) for X. Chebyshev's inequality gives P(|X − µ| < 2σ) ≥ 1 − 1/4 = 0.75. In case X is a normal variable, we get the exact probability P(|X − µ| < 2σ) = 0.9544. However, the advantage of Chebyshev's inequality is that it applies to any random variable with known mean and variance. Also note that the above proof may be carried out for a discrete random variable as well, so Chebyshev's inequality holds for discrete random variables too.

Ex. A random variable X with unknown probability distribution has mean 8 and S.D. 3. Use Chebyshev's inequality to find a lower bound of P(−7 < X < 23).

Sol. Here µ = 8 and σ = 3. So by Chebyshev's inequality, we have

    P(8 − 3k < X < 8 + 3k) ≥ 1 − 1/k².

In order to get a lower bound of P(−7 < X < 23), we choose k = 5. We get

    P(−7 < X < 23) ≥ 1 − 1/25 = 0.96.

Ex. The number of students visiting a zoo on a weekend is a random variable with mean 18 and S.D. 2.5. Use Chebyshev's inequality to estimate the minimum probability that between 8 and 28 students will visit the zoo on a given weekend.

Sol. Let X be the number of students visiting the zoo on a weekend. Then the mean and S.D. of X are µ = 18 and σ = 2.5, respectively. So by Chebyshev's inequality, we have

    P(18 − 2.5k < X < 18 + 2.5k) ≥ 1 − 1/k².

Choosing k = 4, we get

    P(8 < X < 28) ≥ 1 − 1/16 = 0.9375.

So the required minimal probability is 0.9375.

Ex. If X is a geometric random variable with density f(x) = 1/2^x (x = 1, 2, 3, ...), find P(|X − 2| < 2). Also, use Chebyshev's inequality to estimate P(|X − 2| < 2).

Sol. We have

    P(|X − 2| < 2) = P(−2 < X − 2 < 2) = P(0 < X < 4)
      = P(X = 1) + P(X = 2) + P(X = 3) = 1/2 + 1/4 + 1/8 = 7/8 = 0.875.

Now X is a geometric random variable with mean µ = 1/p = 2 and S.D. σ = √q/p = √2. So by Chebyshev's inequality, we have

    P(2 − √2 k < X < 2 + √2 k) ≥ 1 − 1/k².

Choosing k = √2, we get

    P(0 < X < 4) ≥ 1 − 1/2 = 0.5.

Ex. How many times should we toss a fair coin so that at least 0.99 probability is ensured that the proportion of heads lies between 0.45 and 0.55?

Sol. Let n be the number of tosses, and X the number of heads. Then the proportion of heads is X/n. So we need to determine n such that P(0.45 < X/n < 0.55) ≥ 0.99. Here X follows the binomial distribution with p = 0.5, mean µ = np = 0.5n and S.D. σ = √(npq) = 0.5√n. So by Chebyshev's inequality, we have

    P(0.5n − k(0.5)√n < X < 0.5n + k(0.5)√n) ≥ 1 − 1/k².

Choosing k = 0.1√n, we get

    P(0.45n < X < 0.55n) ≥ 1 − 100/n,    that is,    P(0.45 < X/n < 0.55) ≥ 1 − 100/n.

So the required condition P(0.45 < X/n < 0.55) ≥ 0.99 is satisfied if 1 − 100/n ≥ 0.99, that is, if n ≥ 10000.

Ex. If X is a gamma random variable with α = 0.05 and β = 100, find an upper bound on P((X − 4)(X − 6) ≥ 999).

Sol. We have

    P((X − 4)(X − 6) ≥ 999) = P((X − 5 + 1)(X − 5 − 1) ≥ 999)
      = P((X − 5)² − 1 ≥ 999) = P((X − 5)² ≥ 1000)
      = 1 − P((X − 5)² < 1000)
      = 1 − P(−10√10 < X − 5 < 10√10)
      = 1 − P(5 − 10√10 < X < 5 + 10√10).

Given that X is a gamma random variable with α = 0.05 and β = 100, its mean is µ = αβ = (0.05)(100) = 5, and its S.D. is σ = √(αβ²) = √((0.05)(100)²) = 10√5. So by Chebyshev's inequality, we have

    P(5 − 10√5 k < X < 5 + 10√5 k) ≥ 1 − 1/k².

Choosing k = √2, we get

    P(5 − 10√10 < X < 5 + 10√10) ≥ 1 − 1/2 = 0.5.

It follows that

    P((X − 4)(X − 6) ≥ 999) = 1 − P(5 − 10√10 < X < 5 + 10√10) ≤ 1 − 0.5 = 0.5.

3.5.2 Approximation of the Binomial distribution by the Normal distribution

If X is a binomial random variable with parameters n and p, then X approximately follows a normal distribution with mean np and variance np(1−p), provided n is large. Here the word "large" is quite vague. In the strict mathematical sense, large n means n → ∞. However, for most practical purposes, the approximation is acceptable if the values of n and p are such that either p ≤ 0.5 and np > 5, or p > 0.5 and n(1−p) > 5.

It turns out that the normal distribution with µ = np and σ² = np(1−p) not only provides a very accurate approximation to the binomial distribution when n is large and p is not extremely close to 0 or 1, but also provides a fairly good approximation even when n is small and p is reasonably close to 0.5.

Ex. To illustrate the normal approximation to the binomial distribution, in Figure 3.14 we first draw the histogram of a binomial distribution with n = 4 and p = 0.5 and then superimpose the particular normal curve having the same mean and variance as the binomial variable X. Hence, we draw a normal curve with µ = np = (4)(0.5) = 2 and σ² = npq = (4)(0.5)(0.5) = 1.

Now suppose we wish to calculate P(1 ≤ X ≤ 3). From the binomial distribution, we have

    P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 4/16 + 6/16 + 4/16 = 14/16 = 0.875.

Note that geometrically, in the histogram, 4/16 + 6/16 + 4/16 is the sum of the areas of the vertical bars, each of width unity, with centers at 1, 2, 3. We see that it can be approximated by the area under the blue curve from X = 0.5 to X = 3.5. So, using the normal distribution approximation, we have

    P(0.5 ≤ X ≤ 3.5) = P(−1.5 ≤ Z ≤ 1.5) = F(1.5) − F(−1.5) = 0.9332 − 0.0668 = 0.8664.
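The exact binomial probability and its continuity-corrected normal approximation from the example above can be computed side by side; a short Python sketch (helper names are ours):

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exact binomial probability P(1 <= X <= 3) for n = 4, p = 0.5
n, p = 4, 0.5
exact = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(1, 4))  # 14/16

# Normal approximation with half-unit continuity correction
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = Phi((3.5 - mu) / sigma) - Phi((0.5 - mu) / sigma)
print(exact, approx)
```

Even at n = 4 the approximation (≈ 0.8664) lands close to the exact value 0.875.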
3.5.3 Approximation of Poisson distribution by Normal distribution 3.6 Student t-Distribution
Let X be a Poisson random variable with parameter λs. Then for large λs, X is approximately normal If Z is a standard normal variable and χ2ν is anpindependent chi-squared random variable with ν degrees
with mean λs and variance λs. It follows that the Poisson probabilities can be approximated by normal of freedom, then the random variable Tν = Z/ χ2ν /ν is said to follow the Student t-distribution2 with ν
distribution N (λs, λs), by using the 0.5 unit correction on both sides of the given range of X as we did degrees of freedom.
in the binomial case. The density function of a Tν random variable reads as
−(ν+1)/2
t2
Γ[(ν + 1)/2]
f (t) = √ 1+ , −∞ < t < ∞.
Γ(ν/2) πν ν
Since f (−t) = f (t), the graph of this density function is symmetric about the line t = 0 (see Figure 3.15).
ν
Further, its mean is µt = 0 and variance is σt2 = ν−2 , (ν > 2). So σt2 tends to 1 as ν tends to ∞. Thus the
t-distribution tends to the standard normal distribution as the number of degrees of freedom ν increases.
0.40
ν =1
0.35
ν =2
0.30 ν =3
ν =4
0.25
f (t)
0.20
Figure 3.14: Histogram of the binomial distribution with n = 4, p = 0.5 where X takes the values
0.15
0, 1, 2, 3, 4 with probabilities 1/16, 4/16, 6/16, 4/16 and 1/16, respectively. The blue curve is the normal
probability curve with µ = np = (4)(0.5) = 2 and σ 2 = npq = (4)(0.5)(0.5) = 1. 0.10
0.05
which is a good approximation to the binomial probability 0.875. Note that while approximating the 0.00
probability using the normal distribution, we make half-unit correction on both sides of the given range −6 −4 −2 0 2 4 6
of X. t
Ex. The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known to Figure 3.15: The Student t-distributions are plotted for some specific degrees of freedom. We see that
have contracted this disease, what is the probability that fewer than 30 survive? the t-distribution is symmetric about the vertical line t = 0.
√ p
Sol. Here µ = np = (100)(0.4) = 40, σ = √(npq) = √((100)(0.4)(0.6)) = 4.899. We need the area to the left of x = 29.5. The corresponding z value is z = (29.5 − 40)/4.899 = −2.14. Therefore, the required probability is P(X < 30) ≈ P(Z < −2.14) = 0.0162.

Note. In the above example, the binomial random variable X < 30 implies that X takes the values 0, 1, 2, ..., 29. So in the normal approximation, we should have chosen P(−0.5 ≤ X ≤ 29.5). But notice that P(−0.5 ≤ X ≤ 29.5) ≈ P(X < 30), since P(−8.27 ≤ Z ≤ −2.14) ≈ P(Z < −2.14).

Ex. A multiple-choice quiz has 200 questions, each with 4 possible answers of which only 1 is correct. What is the probability that sheer guesswork yields from 25 to 30 correct answers for the 80 of the 200 problems about which the student has no knowledge?

Sol. Here µ = np = (80)(0.25) = 20, σ = √(npq) = √((80)(0.25)(0.75)) = 3.873. We need the area between x1 = 24.5 and x2 = 30.5. The corresponding z values are z1 = (24.5 − 20)/3.873 = 1.16 and z2 = (30.5 − 20)/3.873 = 2.71. Therefore, the required probability is
P(25 ≤ X ≤ 30) ≈ P(1.16 < Z < 2.71) = P(Z < 2.71) − P(Z < 1.16) = 0.9966 − 0.8770 = 0.1196.

The t-distribution is used extensively in problems that deal with inference about the population mean, or in problems that involve comparative samples (i.e., in cases where one is trying to determine if the means from two samples are significantly different).²

²The probability distribution of T was first published in 1908 in a paper written by W. S. Gosset. At the time, Gosset was employed by an Irish brewery that prohibited publication of research by members of its staff. To circumvent this restriction, he published his work secretly under the name "Student". Consequently, the distribution of T is usually called the Student t-distribution, or simply the t-distribution.

How to see values from the t-distribution table

Table 3.23 gives values of tα for various values of α and ν. The areas, α, are the column headings; the degrees of freedom, ν, are given in the first column; and the table entries are the t-values. Therefore it is customary to let tα represent the t-value above which we find an area equal to α. For example, the t-value with 10 degrees of freedom leaving an area of 0.1 to the right is t0.1 = 1.372, as shown in Figure 3.16.

Ex. Find P(0.26 ≤ t ≤ 1.812) given the t-distribution with 7 degrees of freedom.
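The continuity-correction arithmetic in the first example is easy to check numerically. The sketch below (Python, standard library only; the helper names are mine, not from the notes) compares the exact binomial probability P(X < 30) for n = 100, p = 0.4 against the normal approximation evaluated at x = 29.5.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.4
mu = n * p                              # 40
sigma = math.sqrt(n * p * (1 - p))      # 4.899
exact = binom_cdf(29, n, p)             # P(X < 30) = P(X <= 29)
approx = phi((29.5 - mu) / sigma)       # continuity-corrected approximation
print(exact, approx)
```

The two values agree to about two decimal places, which is the accuracy the table-based computation above claims.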
[Figure 3.16: The t-distribution curve with ν = 10 degrees of freedom; the shaded area α = 0.1 lies to the right of t0.1 = 1.372.]

[Figure 3.18: t0.95 = −t0.05 = −1.812 for 10 degrees of freedom.]

The F-distribution with ν1 and ν2 degrees of freedom is given by the density function

h(f) = Γ[(ν1 + ν2)/2] (ν1/ν2)^(ν1/2) f^((ν1/2)−1) / { Γ(ν1/2) Γ(ν2/2) (1 + ν1 f/ν2)^((ν1+ν2)/2) },  f > 0,

and h(f) = 0 for f ≤ 0.

[Figure 3.19: The F-distributions plotted for some specific degrees of freedom.]

The F-distribution is used in two-sample situations to draw inferences about the population variances. However, it can also be applied to many other types of problems involving sample variances. In fact, the F-distribution is called the variance ratio distribution.
How to see values from the F -distribution table
Let fα be the f -value above which we find an area equal to α just as in case of t-distribution. Table 3.24
gives values of fα only for α = 0.05 for various combinations of the degrees of freedom ν1 and ν2 . Hence,
the f -value with 6 and 10 degrees of freedom, leaving an area of 0.05 to the right, is f0.05 = 3.22, as shown
in Figure 3.20.
[Figure 3.20: The F-distribution curve with ν1 = 6 and ν2 = 10 degrees of freedom. The shaded golden region of area α = 0.05 lies under the curve for f ≥ f0.05 = 3.22.]
Likewise, Table 3.25 gives values of fα only for α = 0.01 for various combinations of the degrees of
freedom ν1 and ν2 .
By means of the following theorem, the F-distribution tables can also be used to find values of f0.95 and f0.99.

Theorem: Writing fα(ν1, ν2) for the f-value with ν1 and ν2 degrees of freedom,

f1−α(ν1, ν2) = 1 / fα(ν2, ν1).

Note that the degrees of freedom are interchanged on the two sides. Thus, the f-value with 6 and 10 degrees of freedom, leaving an area of 0.95 to the right, can be calculated as

f0.95(6, 10) = 1/f0.05(10, 6) = 1/4.06 = 0.246.
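The reciprocal relation can be sanity-checked by simulation: if F ~ F(ν2, ν1), then 1/F ~ F(ν1, ν2), so the 5th percentile of F(6, 10) should match the reciprocal of the 95th percentile of F(10, 6). A minimal sketch (Python, standard library only; sample sizes, seed, and helper names are my choices) draws F variates as ratios of scaled chi-square variates:

```python
import random

def f_sample(d1, d2, rng):
    """One draw from F(d1, d2) as a ratio of scaled chi-square variates
    (each chi-square built as a sum of squared standard normals)."""
    chi1 = sum(rng.gauss(0, 1) ** 2 for _ in range(d1))
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(d2))
    return (chi1 / d1) / (chi2 / d2)

rng = random.Random(42)
n = 100_000

samples_10_6 = sorted(f_sample(10, 6, rng) for _ in range(n))
q95 = samples_10_6[int(0.95 * n)]      # empirical f0.05(10, 6), tabled as 4.06

samples_6_10 = sorted(f_sample(6, 10, rng) for _ in range(n))
q05 = samples_6_10[int(0.05 * n)]      # empirical f0.95(6, 10), tabled as 0.246

print(q95, q05, 1.0 / q95)
```

Up to Monte Carlo noise, q05 ≈ 1/q95, which is exactly the theorem with α = 0.05, ν1 = 6, ν2 = 10.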
is called the joint probability mass function of (X, Y).

Suppose X assumes the m values x1, x2, ..., xm and Y assumes the n values y1, y2, ..., yn. Then it is convenient and informative to write the joint distribution in the following tabular form:

X\Y    y1          y2          ...   yn
x1     f(x1, y1)   f(x1, y2)   ...   f(x1, yn)
x2     f(x2, y1)   f(x2, y2)   ...   f(x2, yn)
...    ...         ...         ...   ...
xm     f(xm, y1)   f(xm, y2)   ...   f(xm, yn)

4.1.2 Cumulative distribution function

The cdf of (X, Y) is given by

F(x, y) = P[X ≤ x, Y ≤ y] = ∑_{xi ≤ x} ∑_{yj ≤ y} f(xi, yj).

4.1.3 Marginal density functions

The marginal density of X, denoted by fX, is defined as

fX(x) = ∑_y f(x, y).

Ex. Suppose in a random experiment of toss of two fair coins, X denotes the number of heads and Y denotes the number of tails.
(i) Find the joint density f(x, y) of (X, Y).
(ii) Find the cdf F(x, y).
(iii) Find the marginal densities fX(x) and fY(y).
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).

Sol. (i) The sample space of the given random experiment is
S = {HH, HT, TH, TT}.
Given that X denotes the number of heads and Y the number of tails, we have X = 0, 1, 2 and Y = 0, 1, 2. The joint pmf f(x, y) is given by
f(0, 0) = P[X = 0, Y = 0] = 0,
f(0, 1) = P[X = 0, Y = 1] = 0,
f(0, 2) = P[X = 0, Y = 2] = 1/4,
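The coin example can be verified by brute-force enumeration of the four equally likely outcomes. A small sketch (Python, exact arithmetic via fractions; the variable names are mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))      # HH, HT, TH, TT — equally likely
p = Fraction(1, len(outcomes))

# joint pmf of (X, Y) = (number of heads, number of tails)
f = {}
for w in outcomes:
    x, y = w.count("H"), w.count("T")
    f[(x, y)] = f.get((x, y), Fraction(0)) + p

fX = {x: sum(v for (a, _), v in f.items() if a == x) for x in range(3)}
fY = {y: sum(v for (_, b), v in f.items() if b == y) for y in range(3)}

EX = sum(x * v for x, v in fX.items())
EY = sum(y * v for y, v in fY.items())
EXY = sum(x * y * v for (x, y), v in f.items())
cov = EXY - EX * EY

independent = all(f.get((x, y), Fraction(0)) == fX[x] * fY[y]
                  for x in range(3) for y in range(3))
print(f, cov, independent)
```

Since X + Y = 2 on every outcome, the variables are perfectly negatively related: the enumeration confirms f(0, 2) = 1/4, Cov(X, Y) = −1/2, and that X and Y are not independent.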
This shows that X and Y are not independent.

(v) We find
E[X] = ∑_{x=0}^{2} x fX(x) = 0.12,
E[Y] = ∑_{y=0}^{3} y fY(y) = 0.148,
E[XY] = ∑_{x=0}^{2} ∑_{y=0}^{3} xy f(x, y) = 0.064.
Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 0.046.

Ex. Suppose X and Y are two discrete random variables taking only integer values. The joint density function of (X, Y) is
f(x, y) = c/[n(n + 1)], 1 ≤ y ≤ x ≤ n, where n is some positive integer.
(i) Find the value of c.
(ii) Find the marginal densities.
(iii) Given that n = 5, evaluate P[X ≤ 3, Y ≤ 2].

Sol. (i) Using ∑_x ∑_y f(x, y) = 1, we find
∑_{x=1}^{n} ∑_{y=1}^{x} c/[n(n + 1)] = 1
⟹ (c/[n(n + 1)]) ∑_{x=1}^{n} x = 1
⟹ (c/[n(n + 1)]) · n(n + 1)/2 = 1
⟹ c = 2.

(ii) We have
fX(x) = ∑_{y=1}^{x} f(x, y) = ∑_{y=1}^{x} 2/[n(n + 1)] = 2x/[n(n + 1)], 1 ≤ x ≤ n,
fY(y) = ∑_{x=y}^{n} f(x, y) = ∑_{x=y}^{n} 2/[n(n + 1)] = 2(n − y + 1)/[n(n + 1)], 1 ≤ y ≤ n.

(iii) Given that n = 5, we have f(x, y) = 1/15, and therefore
P[X ≤ 3, Y ≤ 2] = ∑_{y=1}^{2} ∑_{x=y}^{3} 1/15 = (1/15) ∑_{y=1}^{2} (3 − y + 1) = (1/15)(3 + 2) = 1/3.

4.2 Continuous Bivariate Random Variable

Let X and Y be two continuous random variables. Then the ordered pair (X, Y) is called a two dimensional or bivariate continuous random variable.

4.2.1 Joint probability density function

A function f such that

f(x, y) ≥ 0,  P[(X, Y) ∈ R] = ∬_R f(x, y) dx dy,  ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1,

where R is any region in the domain of f, is called the joint probability density function of (X, Y).

4.2.2 Distribution function

The distribution function of (X, Y) is given by

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(x, y) dy dx.

4.2.3 Marginal density functions

The marginal density of X, denoted by fX, is defined as

fX(x) = ∫_{−∞}^{∞} f(x, y) dy.

Similarly, the marginal density of Y, denoted by fY, is defined as

fY(y) = ∫_{−∞}^{∞} f(x, y) dx.

4.2.4 Independent random variables

The continuous random variables X and Y are said to be independent if and only if

f(x, y) = fX(x) fY(y).

4.2.5 Expectation

The expectation or mean of X is defined as

E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy = µX.

In general, the expectation of a function of X and Y, say H(X, Y), is defined as

E[H(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) f(x, y) dx dy.

4.2.6 Covariance

If µX and µY are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − E[X]E[Y].

Ex. Let X denote a person's blood calcium level and Y the blood cholesterol level. The joint density function of (X, Y) is

f(x, y) = k, 8.5 ≤ x ≤ 10.5, 120 ≤ y ≤ 240; 0, elsewhere.

(i) Find the value of k.
(ii) Find the marginal densities of X and Y.
(iii) Find the probability that a healthy person has a cholesterol level between 150 and 200.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).

Sol. (i) f(x, y) being a joint pdf, we have

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} k dx dy = 240k.

So k = 1/240 and f(x, y) = 1/240.

(ii) The marginal density of X is

fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{120}^{240} (1/240) dy = 1/2, 8.5 ≤ x ≤ 10.5.

Similarly, the marginal density of Y is

fY(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{8.5}^{10.5} (1/240) dx = 1/120, 120 ≤ y ≤ 240.

(iii) The probability that a healthy person has a cholesterol level between 150 and 200 is

P[150 ≤ Y ≤ 200] = ∫_{150}^{200} fY(y) dy = 5/12.

(iv) We have

fX(x) fY(y) = (1/2) × (1/120) = 1/240 = f(x, y).

This shows that X and Y are independent.

(v) We find

E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (x/240) dx dy = 9.5,

E[Y] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (y/240) dx dy = 180,
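The discrete example with f(x, y) = c/[n(n + 1)] can be double-checked by enumerating the support. A sketch (Python, exact rational arithmetic; the function name is mine):

```python
from fractions import Fraction

def check(n):
    """Verify the normalizing constant and P[X <= 3, Y <= 2] by enumeration."""
    # support: integer pairs with 1 <= y <= x <= n
    support = [(x, y) for x in range(1, n + 1) for y in range(1, x + 1)]
    # each point has mass c/(n(n+1)); the total mass must be 1, so
    # c = n(n+1) / |support|, and |support| = n(n+1)/2 forces c = 2
    c = Fraction(n * (n + 1), len(support))
    f = c / (n * (n + 1))                     # common mass of each point
    prob = sum(f for (x, y) in support if x <= 3 and y <= 2)
    return c, prob

c, prob = check(5)
print(c, prob)
```

For n = 5 this reproduces c = 2 and P[X ≤ 3, Y ≤ 2] = 1/3, matching the hand computation.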
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy = ∫_{120}^{240} ∫_{8.5}^{10.5} (xy/240) dx dy = 1710.

Hence, Cov(X, Y) = E[XY] − E[X]E[Y] = 1710 − 9.5 × 180 = 0.

Ex. The joint density function of (X, Y) is

f(x, y) = c/x, 27 ≤ y ≤ x ≤ 33; 0, elsewhere.

(i) Find the value of c.
(ii) Find the marginal densities and hence check the independence of X and Y.
(iii) Evaluate P[X ≤ 32, Y ≤ 30].

Sol. (i) Here the given range of (X, Y) is the triangular region common to the three regions given by the inequalities y ≥ 27, y ≤ x and x ≤ 33, as shown in Figure 4.1.

[Figure 4.1: The shaded golden region is the triangular region given by the inequalities y ≥ 27, y ≤ x and x ≤ 33. The vertical ray enters the region through the line y = 27 and leaves at the line y = x.]

The x-value at the leftmost point (27, 27) of the region is x = 27, and at the rightmost points (all points on the line x = 33) it is x = 33. Considering a vertical ray through the given region, we find that the y limits are from y = 27 to y = x, and the x limits are from x = 27 to x = 33. Therefore, to find c, we use

∫_{27}^{33} ∫_{27}^{x} f(x, y) dy dx = 1,

and we get

c = 1/[6 − 27 ln(33/27)].

(ii) fX(x) = ∫_{y=27}^{y=x} (c/x) dy = c(1 − 27/x), 27 ≤ x ≤ 33,
fY(y) = ∫_{x=y}^{x=33} (c/x) dx = c(ln 33 − ln y), 27 ≤ y ≤ 33.
We observe that f(x, y) = c/x ≠ fX(x) fY(y). So X and Y are not independent.

(iii) To calculate the probability P[X ≤ 32, Y ≤ 30], we need to integrate the joint density over the shaded golden region shown in Figure 4.2. Considering a horizontal ray through this region, we find that the x limits are from x = y to x = 32, and the y limits are from y = 27 to y = 30.

∴ P[X ≤ 32, Y ≤ 30] = ∫_{27}^{30} ∫_{y}^{32} (c/x) dx dy = c(3 ln 32 + 3 − 30 ln 30 + 27 ln 27).

[Figure 4.2: The shaded golden region is given by X ≤ 32, Y ≤ 30. The horizontal ray enters this region through the line x = y and leaves at the line x = 32. The y-value at the bottommost point of the region is y = 27, and at the uppermost points it is y = 30.]

Theorem: If X and Y are two independent random variables with joint density f, then E[XY] = E[X]E[Y], that is, Cov(X, Y) = 0.

Proof. We have
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX(x) fY(y) dx dy   (∵ f(x, y) = fX(x)fY(y), as X and Y are given independent)
      = ∫_{−∞}^{∞} y fY(y) ( ∫_{−∞}^{∞} x fX(x) dx ) dy
      = ∫_{−∞}^{∞} y fY(y) E[X] dy
      = E[X] ∫_{−∞}^{∞} y fY(y) dy
      = E[X]E[Y].

Note. The converse of the above result need not be true, that is, if E[XY] = E[X]E[Y], then X and Y need not be independent. For instance, consider the following table for the joint density function of a two dimensional discrete random variable (X, Y):

X\Y     −2     −1     1      2      fX(x)
1       0      1/4    1/4    0      1/2
4       1/4    0      0      1/4    1/2
fY(y)   1/4    1/4    1/4    1/4    1

We find that E[X] = 5/2, E[Y] = 0 and E[XY] = 0. So E[XY] = E[X]E[Y]. Next, we see that fX(1) = 1/2, fY(−2) = 1/4 and f(1, −2) = 0. So fX(1)fY(−2) ≠ f(1, −2), and hence X and Y are not independent. We can easily observe the dependency X = Y². Thus, covariance does not describe the type or strength of the association between X and Y except the linear relationship, via a measure known as the Pearson coefficient of correlation.

4.3 Pearson coefficient of correlation

If X and Y are two random variables with means µX, µY and variances σX², σY², then the correlation between X and Y is given by

ρXY = Cov(X, Y)/(σX σY) = σXY/(σX σY).

It can be proved that ρXY lies in the range [−1, 1]. Further, |ρXY| = 1 if and only if Y = a + bX for some real numbers a and b ≠ 0.

[Figure 4.3: In case of large negative covariance, we have ρXY ≈ −1; in case of nearly zero covariance, ρXY ≈ 0; while in case of very large positive covariance, ρXY ≈ 1.]

Note that if ρXY = 0, we say that X and Y are uncorrelated (meaning no linear relationship). It does not imply that X and Y are unrelated. Of course, the relationship, if it exists, would not be linear.

In the Robot's example, σX² = 0.146, σY² = 0.268, Cov(X, Y) = 0.046 and therefore ρXY = 0.23.

Remarks: (i) Covariance tells us how the two variables vary together. It can vary from −∞ to ∞. On the other hand, correlation tells us about the degree of linear relationship between the two variables. It can vary from −1 to 1.

(ii) The covariance matrix of X and Y is written as

[ σX²   σXY ]
[ σYX   σY² ]

Notice that the covariance matrix is always symmetric since σXY = σYX. The correlation matrix is given by

[ 1     ρXY ]
[ ρYX   1   ]

It is also symmetric since ρXY = ρYX. In addition, its diagonal elements are unity.
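The closed-form answers for the density f(x, y) = c/x can be sanity-checked with a crude midpoint-rule double integral. A sketch (Python, standard library only; the grid size and tolerances are my choices, and the diagonal boundary y = x introduces a small discretization error):

```python
import math

# constant from part (i) of the worked example
c = 1.0 / (6.0 - 27.0 * math.log(33.0 / 27.0))

N = 600
dx = 6.0 / N          # both x and y run over [27, 33]
total = 0.0           # integral of f over the triangle; should be ~1
prob = 0.0            # P[X <= 32, Y <= 30]
for i in range(N):
    x = 27.0 + (i + 0.5) * dx
    for j in range(N):
        y = 27.0 + (j + 0.5) * dx
        if y <= x:                     # triangular support 27 <= y <= x <= 33
            w = (c / x) * dx * dx
            total += w
            if x <= 32.0 and y <= 30.0:
                prob += w

# closed form from part (iii)
closed_form = c * (3 * math.log(32) + 3 - 30 * math.log(30) + 27 * math.log(27))
print(total, prob, closed_form)
```

The numerical integral reproduces the normalization and agrees with the closed-form probability, which comes out to roughly 0.6.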
Practice Problems (with partial solution steps)

Q.1 Let X denote the number of times a photocopy machine will malfunction: 0, 1, 2 or 3 times, on any given month. Let Y denote the number of times (0, 1 or 2) a technician is called on an emergency service. The joint pmf is given as: f(0, 0) = 0.15, f(0, 1) = 0.05, f(0, 2) = 0, f(1, 0) = 0.30, f(1, 1) = 0.15, f(1, 2) = 0.05, f(2, 0) = 0.05, f(2, 1) = 0.05, f(2, 2) = 0.10, f(3, 0) = 0, f(3, 1) = 0.05, and f(3, 2) = 0.05. Find (i) P(X < Y), (ii) the marginal pmfs of X and Y, and (iii) Cov(X, Y).

Q.3 Consider two continuous random variables X and Y with pdf

f(x, y) = k(x + y), x > 0, y > 0, 3x + y < 3; 0, elsewhere.

Find (i) k, (ii) P(X < Y), (iii) the marginal pdfs of X and Y, (iv) Cov(X + 2, Y − 3), (v) Corr(−2X + 3, 2Y + 7), and (vi) Cov(−2X + 3Y − 4, 4X + 7Y + 5).
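For Q.1, every requested quantity is a direct summation over the 12 table cells. A sketch (Python; the printed values are computed here by enumeration, not taken from the notes):

```python
# joint pmf from Q.1, keyed by (x, y)
f = {(0, 0): 0.15, (0, 1): 0.05, (0, 2): 0.00,
     (1, 0): 0.30, (1, 1): 0.15, (1, 2): 0.05,
     (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.10,
     (3, 0): 0.00, (3, 1): 0.05, (3, 2): 0.05}

# (i) P(X < Y): sum the cells strictly above the diagonal x = y
p_x_lt_y = sum(p for (x, y), p in f.items() if x < y)

# (ii) marginal pmfs
fX = {x: sum(p for (a, _), p in f.items() if a == x) for x in range(4)}
fY = {y: sum(p for (_, b), p in f.items() if b == y) for y in range(3)}

# (iii) Cov(X, Y) = E[XY] - E[X]E[Y]
EX = sum(x * p for x, p in fX.items())
EY = sum(y * p for y, p in fY.items())
EXY = sum(x * y * p for (x, y), p in f.items())
cov = EXY - EX * EY
print(p_x_lt_y, fX, fY, cov)
```

This pattern — build the dictionary, sum over predicates — works unchanged for any finite joint pmf, so it also serves as a template for the other discrete practice problems.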
(i) Find the probability that A arrives before B, and hence compute the expected amount of time A would have to wait for B to arrive.
(ii) If they have pre-decided on a condition that whoever comes first will wait only 15 minutes for the other, what is the probability that they will meet for lunch?

Sol. Since X and Y are independent, their joint pdf is given by

f(x, y) = fX(x)fY(y) = 6x²y, 0 < x < 1, 0 < y < 1; 0, elsewhere.

(i) P(A arrives before B) = P(X < Y) = 2/5.
Next, the expected amount of time A would have to wait for B to arrive is given by E(Y − X) over the region Y > X. Solving the double integral of (y − x)f(x, y) over the region 0 < x < 1, 0 < y < 1, y > x, we get 1/12 hours. (Verify!)

(ii) They will meet for lunch if the waiting time of either is less than 15 minutes, that is, 1/4 hour. So the required probability is P(|X − Y| < 1/4), the sum of P(0 ≤ Y − X < 1/4) and P(0 < X − Y < 1/4).

Q.6 The following table shows the quality and meal price ratings (1 lowest to 3 highest) of 300 restaurants in a metro city:
Develop a bivariate probability distribution for quality X and meal price Y of a randomly selected restaurant in the metro city. Determine Cov(X, Y) and Corr(X, Y). Based on your results, do you suppose it is likely to find a low-cost restaurant with high meal quality?

Sol. Dividing the number of restaurants by the total number 300, the probability distribution of (X, Y) reads:

X\Y     1      2      3      fX(x)
1       0.14   0.13   0.01   0.28
2       0.11   0.21   0.18   0.50
3       0.01   0.05   0.16   0.22
fY(y)   0.26   0.39   0.35   1

E(X) = 1.94, E(Y) = 2.09, V(X) = 0.4964, V(Y) = 0.6019, Cov(X, Y) = 0.2854 and Corr(X, Y) = 0.5221. Because of the moderately positive correlation between X and Y, it is not very likely to find a restaurant with the lowest meal price and highest quality.

Q.7 Let T1, T2, ..., Tk be independent exponential random variables with mean values 1/λ1, 1/λ2, ..., 1/λk, respectively. Denote Tmin = min(T1, T2, ..., Tk). Show that Tmin has an exponential distribution. What is the mean of Tmin?

Sol. The cdf of Ti is FTi(t) = 1 − e^(−λi t), t > 0. It implies that P(Ti > t) = e^(−λi t).

The cdf of Tmin = min(T1, T2, ..., Tk) is given by

FTmin(t) = P(Tmin ≤ t)
         = 1 − P(Tmin > t)
         = 1 − P(min(T1, T2, ..., Tk) > t)
         = 1 − P(T1 > t, T2 > t, ..., Tk > t)
         = 1 − P(T1 > t)P(T2 > t)...P(Tk > t)   (∵ T1, T2, ..., Tk are independent)
         = 1 − e^(−λ1 t) e^(−λ2 t) ... e^(−λk t)
         = 1 − e^(−(λ1 + λ2 + ... + λk)t).

We see that Tmin has an exponential distribution with parameter λ1 + λ2 + ... + λk, and hence with mean 1/(λ1 + λ2 + ... + λk).
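The conclusion of Q.7 — Tmin is exponential with rate λ1 + ... + λk, hence mean 1/(λ1 + ... + λk) — is easy to corroborate by simulation. A sketch (Python, standard library only; the specific rates and seed are arbitrary choices of mine):

```python
import random

rates = [1.0, 2.0, 3.0]        # lambda_i; the true mean of T_min is 1/6
rng = random.Random(0)
n = 200_000

total = 0.0
for _ in range(n):
    # draw one T_i per rate and keep the minimum
    tmin = min(rng.expovariate(lam) for lam in rates)
    total += tmin
mean_tmin = total / n
print(mean_tmin)   # should be close to 1 / sum(rates)
```

With 200,000 trials the sample mean lands within a fraction of a percent of 1/6, consistent with the derivation above.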