Department of Mathematics

1 Statistical Intervals for a Single Sample
2 Confidence Interval on the Mean of a Normal Distribution,
Variance Known
3 Confidence Interval on the Mean of a Normal Distribution,
Variance Unknown
4 Confidence Interval on the Variance and Standard Deviation of
a Normal Distribution
5 Hypothesis testing
6 Types of errors
7 P-values in Hypothesis Tests
8 Connection between Hypothesis Tests and Confidence Intervals
9 Tests on the Mean of a Normal Distribution, Variance Known
10 Tests on the Mean of a Normal Distribution, Variance Unknown
11 Tests on the Variance and Standard Deviation of a Normal
Distribution
12 Goodness of fit
13 Simple Linear Regression and Correlation
Confidence interval
An interval estimate for a population parameter is called a
confidence interval. Information about the precision of estimation
is conveyed by the length of the interval. A short interval implies
precise estimation.
Confidence Interval on the Mean of a Normal Distribution,
Variance Known

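If x̄ is the sample mean of a random sample of size n from a normal
population with known variance σ², a 100(1 − α)% confidence interval
on µ is
x̄ − z_{α/2} σ/√n ≤ µ ≤ x̄ + z_{α/2} σ/√n,
where z_{α/2} is the upper 100α/2 percentage point of the standard
normal distribution. 1 − α is called the confidence coefficient.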

A confidence interval estimate is desired for the gain in a circuit on
a semiconductor device. Assume that gain is normally distributed
with standard deviation σ = 20. Find a 95% CI for µ when n = 10
and x̄ = 1000.

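Solution: Given n = 10, σ = 20 and x̄ = 1000.

95% CI ⇒ 100(1 − α)% = 95% ⇒ α = 0.05 ⇒ α/2 = 0.025.
z_{α/2} is the upper 100(α/2)% point, i.e., the upper 100 × 0.025% = 2.5% point.
Now, P(0 ≤ Z ≤ z_{α/2}) = 0.5 − 0.025 = 0.475. From the standard
normal table, we get z_{α/2} = z_{0.025} = 1.96.
Hence the 95% CI is x̄ ± z_{α/2} σ/√n = 1000 ± 1.96 × 20/√10 = 1000 ± 12.4,
that is, 987.6 ≤ µ ≤ 1012.4.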
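The circuit-gain calculation can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name z_confidence_interval is a hypothetical helper, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI on the mean of a normal population with known sigma."""
    alpha = 1 - confidence
    # z_{alpha/2}: upper alpha/2 point of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from the circuit-gain example: n = 10, sigma = 20, xbar = 1000
lower, upper = z_confidence_interval(xbar=1000, sigma=20, n=10)
print(f"{lower:.1f} <= mu <= {upper:.1f}")  # 987.6 <= mu <= 1012.4
```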
A random sample has been taken from a normal distribution and
the following confidence intervals constructed using the same data:
(38.02, 61.98) and (39.95, 60.05)
1 What is the value of the sample mean?
2 One of these intervals is a 95% CI and the other is a 90% CI.
Which one is the 95% CI and why?

1 We have x̄ − z_{α/2} σ/√n ≤ µ ≤ x̄ + z_{α/2} σ/√n.
Here, 38.02 ≤ µ ≤ 61.98.
Equating the left-hand sides of these two inequalities, we have
x̄ − z_{α/2} σ/√n = 38.02, and equating the right-hand sides, we have
x̄ + z_{α/2} σ/√n = 61.98.
Adding the two equations, 2x̄ = 100 ⇒ x̄ = 50.
2 The 95% CI is (38.02, 61.98) and the 90% CI is (39.95,
60.05). The higher the confidence level, the wider the CI, because
z_{α/2} increases as α decreases (z_{0.025} = 1.96 > z_{0.05} = 1.645).
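As a quick check of both answers, the sketch below recovers the sample mean as the midpoint of each interval and compares the ratio of the half-widths with the ratio z_{0.025}/z_{0.05}. The variable names are illustrative only.

```python
from statistics import NormalDist

intervals = [(38.02, 61.98), (39.95, 60.05)]

# Each interval is centred on the sample mean, so the midpoint recovers it.
midpoints = [(lo + hi) / 2 for lo, hi in intervals]
# Half-width of each interval is z_{alpha/2} * sigma / sqrt(n).
half_widths = [(hi - lo) / 2 for lo, hi in intervals]

z95 = NormalDist().inv_cdf(0.975)  # z_{0.025}
z90 = NormalDist().inv_cdf(0.95)   # z_{0.05}

# If the wider interval is the 95% CI, the half-width ratio should be
# close to z95 / z90, since sigma / sqrt(n) is the same for both.
width_ratio = half_widths[0] / half_widths[1]
z_ratio = z95 / z90
print(midpoints, round(width_ratio, 3), round(z_ratio, 3))
```

The two ratios agree to within the rounding of the interval endpoints, which supports identifying (38.02, 61.98) as the 95% CI.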
Sample Size for Specified Error on the Mean, Variance Known

If x̄ is used as an estimate of µ, we can be 100(1 − α)% confident
that the error |x̄ − µ| will not exceed a specified amount E when the
sample size is
n = (z_{α/2} σ / E)²,
rounded up to the next integer.
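A minimal sketch of the sample-size rule n = (z_{α/2} σ / E)². The slides give no numbers for this formula, so the values below are assumptions: σ = 20 is borrowed from the circuit-gain example and E = 5 is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_error(sigma, error, confidence=0.95):
    """Smallest n for which |xbar - mu| <= error with the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    # Round up, since n must be an integer and rounding down would
    # give an error bound larger than requested.
    return ceil((z * sigma / error) ** 2)

# Assumed inputs: sigma = 20 (circuit-gain example), E = 5 (hypothetical)
n_required = sample_size_for_error(sigma=20, error=5)
print(n_required)  # 62
```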
One-Sided Confidence Bounds on the Mean, Variance Known

PROBLEM: Given z_α = z_{0.05} = 1.64, n = 10, σ = 1, and x̄ = 64.46,
find the lower one-sided 95% confidence bound on µ.
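A lower one-sided 100(1 − α)% bound replaces z_{α/2} by z_α and keeps only the lower limit, x̄ − z_α σ/√n ≤ µ. The sketch below applies this to the numbers given in the problem, using the stated table value 1.64.

```python
from math import sqrt

z_alpha = 1.64              # z_{0.05}, as given in the problem
n, sigma, xbar = 10, 1, 64.46

# Lower one-sided bound: xbar - z_alpha * sigma / sqrt(n) <= mu
lower_bound = xbar - z_alpha * sigma / sqrt(n)
print(f"{lower_bound:.2f} <= mu")  # 63.94 <= mu
```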
Large-Sample Confidence Interval for µ

When n is large, replacing σ by the sample standard deviation S has
little effect on the distribution of Z. This leads to the following
useful result: when n is large,
x̄ − z_{α/2} s/√n ≤ µ ≤ x̄ + z_{α/2} s/√n
is a large-sample confidence interval for µ, with confidence level
approximately 100(1 − α)%.

Generally, n should be at least 40 to use this result reliably. The
central limit theorem generally holds for n ≥ 30, but the larger sample
size is recommended here.
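Confidence Interval on the Mean of a Normal Distribution,
Variance Unknown
t confidence interval on µ
If x̄ and s are the mean and standard deviation of a random sample of
size n from a normal distribution with unknown variance σ², a
100(1 − α)% confidence interval on µ is
x̄ − t_{α/2,n−1} s/√n ≤ µ ≤ x̄ + t_{α/2,n−1} s/√n,
where t_{α/2,n−1} is the upper 100α/2 percentage point of the t
distribution with n − 1 degrees of freedom.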
PROBLEM: A research engineer for a tire manufacturer is
investigating tire life for a new rubber compound and has built 16
tires and tested them to end-of-life in a road test. The sample
mean and standard deviation are 60,139.7 and 3645.94 kilometers,
respectively. Find a 95% confidence interval on mean tire life.
(Given t_{0.025,15} = 2.131)
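The tire-life interval follows from x̄ ± t_{α/2,n−1} s/√n. The sketch below uses the critical value 2.131 supplied in the problem, because the Python standard library has no t quantile function.

```python
from math import sqrt

n = 16
xbar = 60139.7      # sample mean (km)
s = 3645.94         # sample standard deviation (km)
t_crit = 2.131      # t_{0.025,15}, as given in the problem

half_width = t_crit * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

This gives roughly 58,197.3 ≤ µ ≤ 62,082.1 kilometers.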
Confidence Interval on the Variance and Standard
Deviation of a Normal Distribution
A rivet is to be inserted into a hole. A random sample of n = 15
parts is selected, and the hole diameter is measured. The sample
standard deviation of the hole diameter measurements is s = 0.008
millimeters. Construct a 99% lower confidence bound for σ².
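A 100(1 − α)% lower confidence bound on σ² is (n − 1)s²/χ²_{α,n−1} ≤ σ². The sketch below applies it to the rivet data. The critical value χ²_{0.01,14} = 29.14 is not given in the slides; it is assumed here from a standard chi-square table.

```python
n = 15
s = 0.008                 # sample standard deviation (mm)
chi2_crit = 29.14         # chi^2_{0.01,14}: assumed standard table value

# Lower bound: (n - 1) s^2 / chi^2_{alpha, n-1} <= sigma^2
lower_bound = (n - 1) * s**2 / chi2_crit
print(f"{lower_bound:.3e} <= sigma^2")
```

The bound is about 3.07 × 10⁻⁵ mm²; taking the square root gives the corresponding lower bound on σ.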

Statistical Hypothesis
A statistical hypothesis is a statement about the parameters of one
or more populations.

Null Hypothesis H0
A null hypothesis is a claim, usually an equality, about a certain
parameter of the population.

Alternative Hypothesis H1
A statement that contradicts the null hypothesis.
A machine was producing chocolate bars with an average weight of 100 g.
After maintenance, a worker claims that the machine no longer
produces chocolate bars of 100 g.

H0 : µ = 100 g
H1 : µ ≠ 100 g

Because the alternative hypothesis specifies values of µ that could
be either greater or less than 100 g, it is called a two-sided
alternative hypothesis. In some situations, we may wish to
formulate a one-sided alternative hypothesis, such as H1 : µ < 100 g
or H1 : µ > 100 g.
Test of a hypothesis
A procedure leading to a decision about the null hypothesis is
called a test of a hypothesis.
The sample mean can take on many different values. For example,
if 98.5 ≤ x̄ ≤ 101.5, we will not reject the null hypothesis
H0 : µ = 100, and if either x̄ < 98.5 or x̄ > 101.5, we will reject the
null hypothesis in favor of the alternative hypothesis H1 : µ ≠ 100.
The values of x̄ that are less than 98.5 or greater than 101.5
constitute the critical region for the test; all values in the
interval 98.5 ≤ x̄ ≤ 101.5 form a region for which we will fail to
reject the null hypothesis. By convention, this is usually called the
acceptance region. The boundaries between the critical and
acceptance regions are called the critical values.
Types of errors

Type I Error
Rejecting the null hypothesis H0 when it is true is defined as a type
I error.

Type II Error
Failing to reject (Accept) the null hypothesis when it is false is
defined as a type II error.
Probability of Type I Error (α)

α =P(type I error) = P(reject H0 when H0 is true)

Sometimes the type I error probability is called the significance
level, the α-error, or the size of the test.
Probability of Type II Error (β)
β= P(type II error) = P(fail to reject H0 when H0 is false)

Increasing the sample size reduces both α and β when the critical
values are held constant.

Calculating Type I Error

Let X̄ be the sample mean of a sample of size n.

Let the acceptance region be x1 ≤ X̄ ≤ x2 .
Standardize the random variable using the formula
Z = (X̄ − µ)/(σ/√n), with µ set to its value under H0 .
Let zl and zr be the corresponding critical values after standardization.
The area beyond zl and zr (to the left of zl and to the right of zr )
is the probability of type I error.
Solution: Given n = 9, σ = 2
H0 : µ = 100
H1 : µ ≠ 100
a) α = P(type I error) = P(reject H0 | H0 is true)

b) β = P(type II error) = P(accept H0 | H0 is false)
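The two probabilities can be computed from the sampling distribution of X̄. The sketch below rests on assumptions, because the problem statement is not reproduced here: it pairs n = 9 and σ = 2 with the acceptance region 98.5 ≤ x̄ ≤ 101.5 used earlier for H0 : µ = 100, and it evaluates β at a hypothetical true mean of 103.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_probability(mu, sigma, n, lower, upper):
    """P(lower <= Xbar <= upper) when the true mean is mu."""
    xbar_dist = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of Xbar
    return xbar_dist.cdf(upper) - xbar_dist.cdf(lower)

n, sigma, mu0 = 9, 2, 100
lower, upper = 98.5, 101.5   # assumed acceptance region

# alpha: probability of landing outside the acceptance region when H0 is true
alpha = 1 - acceptance_probability(mu0, sigma, n, lower, upper)

# beta: probability of landing inside the acceptance region when H0 is false.
# The true mean 103 is a hypothetical value chosen for illustration.
true_mu = 103
beta = acceptance_probability(true_mu, sigma, n, lower, upper)

print(round(alpha, 4), round(beta, 4))
```

Under these assumptions α is about 0.024 and β is about 0.012; β changes with the assumed true mean, whereas α depends only on H0 and the acceptance region.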

Power of a statistical test

The power of a statistical test is the probability of rejecting the
null hypothesis H0 when the alternative hypothesis is true.

The power is computed as 1 − β, and it can be interpreted as
the probability of correctly rejecting a false null hypothesis.
P-values in Hypothesis Tests

The P-value is the smallest level of significance that would lead to
rejection of the null hypothesis H0 with the given data.


Consider the two-sided hypothesis test for burning rate

H0 : µ = 50, H1 : µ ≠ 50

with n = 16 and σ = 2.5. Suppose that the observed sample mean
is x̄ = 51.3 centimeters per second.
Consider the acceptance region 48.7 ≤ x̄ ≤ 51.3, whose boundaries are
symmetric about 50. The P-value is the area in the two tails of the
distribution of X̄ outside this region when x̄ = 51.3:
z0 = (51.3 − 50)/(2.5/√16) = 2.08, so P = 2[1 − Φ(2.08)] = 0.038.
The null hypothesis H0 : µ = 50 would be rejected at any level of
significance greater than or equal to 0.038.
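The P-value of 0.038 can be reproduced with the standard normal CDF. This is a minimal sketch using the numbers of the burning-rate example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 50, 2.5, 16, 51.3

# Standardized test statistic under H0
z0 = (xbar - mu0) / (sigma / sqrt(n))

# Two-sided P-value: area in both tails beyond |z0|
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
print(round(z0, 2), round(p_value, 3))  # 2.08 0.038
```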
Connection between Hypothesis Tests and Confidence Intervals

If [l, u] is a 100(1 − α)% confidence interval for the parameter θ,
the test with level of significance α of the hypothesis

H0 : θ = θ0 , H1 : θ ≠ θ0

will lead to rejection of H0 if and only if θ0 is not in the
100(1 − α)% CI [l, u].
For the problem with x̄ = 51.3, σ = 2.5 and n = 16, if we calculate the
95% CI for µ, we get 50.075 ≤ µ ≤ 52.525. Hence µ = 50 lies
outside the CI. Thus we can reject H0 at α = 0.05.
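The equivalence between the 95% CI and the α = 0.05 test can be checked directly. The sketch below computes the interval for x̄ = 51.3, σ = 2.5, n = 16 and tests whether µ0 = 50 falls inside it.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, mu0, alpha = 51.3, 2.5, 16, 50, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96
half_width = z * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

# H0: mu = mu0 is rejected at level alpha exactly when mu0 is outside the CI
reject_h0 = not (lower <= mu0 <= upper)
print(f"{lower:.3f} <= mu <= {upper:.3f}, reject H0: {reject_h0}")
```

The lower limit is about 50.075, so 50 lies just outside the interval and H0 is rejected.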
General Procedure for Hypothesis Tests

Tests on the Mean of a Normal Distribution, Variance Known

Air crew escape systems are powered by a solid propellant. The
burning rate of this propellant is an important product
characteristic. Specifications require that the mean burning rate
must be 50 centimeters per second. We know that the standard
deviation of burning rate is σ = 2 centimeters per second. The
experimenter decides to specify a type I error probability or
significance level of α = 0.05, selects a random sample of n =
25, and obtains a sample average burning rate of x̄ = 51.3
centimeters per second. What conclusions should be drawn?
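One way to work this problem is the fixed-level z-test: compute z0 = (x̄ − µ0)/(σ/√n) and reject H0 : µ = 50 when |z0| > z_{α/2}. The sketch below follows that procedure; the function name z_test_two_sided is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Return (z0, critical value, reject?) for H0: mu = mu0 vs H1: mu != mu0."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return z0, z_crit, abs(z0) > z_crit

z0, z_crit, reject_h0 = z_test_two_sided(xbar=51.3, mu0=50, sigma=2, n=25)
print(round(z0, 2), round(z_crit, 2), reject_h0)  # 3.25 1.96 True
```

Since z0 = 3.25 exceeds 1.96, H0 is rejected at the 0.05 level: the data indicate that the mean burning rate differs from 50 centimeters per second.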
Probability of Type II Error

Failing to reject the null hypothesis when it is false is defined as a
type II error.
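Suppose H0 : µ = µ0 , H1 : µ ≠ µ0 .
Suppose the null hypothesis is false and the true value of the mean
is µ = µ0 + δ, where δ ≠ 0. Then the probability of type II error is
β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ).
Suppose that the true burning rate of a rocket propellant is 49
centimetres per second. Specifications require that the mean
burning rate must be 50 centimetres per second. What is β for the
two-sided test with α = 0.05, σ = 2, and n = 25?
Here, µ = 49, so δ = µ − µ0 = −1; by the symmetry of the two-sided
test, β is the same as for δ = 1. With z_{α/2} = 1.96 and δ√n/σ = 2.5,
β = Φ(1.96 − 2.5) − Φ(−1.96 − 2.5) = Φ(−0.54) − Φ(−4.46) ≈ 0.295.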
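The type II error probability for the two-sided z-test can be evaluated with the standard formula β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ). The sketch below applies it to the propellant example; beta_two_sided is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_sided(delta, sigma, n, alpha=0.05):
    """Type II error probability of the two-sided z-test when mu = mu0 + delta."""
    phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    shift = delta * sqrt(n) / sigma
    return phi(z - shift) - phi(-z - shift)

# True mean 49 against mu0 = 50, so delta = -1
beta = beta_two_sided(delta=49 - 50, sigma=2, n=25)
# The formula is symmetric in delta, so delta = +1 gives the same value
beta_positive = beta_two_sided(delta=1, sigma=2, n=25)
print(round(beta, 3))  # 0.295
```

The corresponding power is 1 − β, about 0.705.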
The heat evolved in calories per gram of a cement mixture is
approximately normally distributed. The mean is thought to be
100, and the standard deviation is 2. You wish to test
H0 : µ = 100 versus H1 : µ ≠ 100 with a sample of n =
9 specimens. Calculate the P-value if the observed statistic is x̄ = 98.
Here z0 = (98 − 100)/(2/√9) = −3, so P = 2[1 − Φ(3)] = 0.0027.
Tests on the Mean of a Normal Distribution, Variance Unknown

The test statistic is T0 = (X̄ − µ0)/(S/√n).
If the null hypothesis is true, T0 has a t distribution with n − 1
degrees of freedom. When we know the distribution of the test
statistic when H0 is true (this is often called the reference
distribution or the null distribution), we can calculate the P-value
from this distribution.
P-value for a two-sided test

To test H0 : µ = µ0 against the two-sided alternative
H1 : µ ≠ µ0 , the value of the test statistic t0 is calculated.
The P-value is found from the t distribution with n − 1 degrees of
freedom (denoted by T_{n−1}).
Because the test is two-tailed, the P-value is the sum of the
probabilities in the two tails of the t distribution:
P = 2P(T_{n−1} > |t0|),
where t0 = (x̄ − µ0)/(s/√n) is the observed value of the test statistic.
Reject H0 if t0 > t_{α/2,n−1} or t0 < −t_{α/2,n−1} for a fixed
significance level α.
P-value for a one-tailed test

For the one-sided alternative hypothesis
H0 : µ = µ0 , H1 : µ > µ0 ,
P = P(T_{n−1} > t0).
Reject H0 if t0 > t_{α,n−1}.

For H0 : µ = µ0 , H1 : µ < µ0 ,
P = P(T_{n−1} < t0),
where t0 is the observed value of the test statistic. Reject H0 if
t0 < −t_{α,n−1}.
Given body temperatures of 25 females: 97.8, 97.2, 97.4, 97.6,
97.8, 97.9, 98.0, 98.0, 98.0, 98.1, 98.2, 98.3, 98.3, 98.4, 98.4,
98.4, 98.5, 98.6, 98.6, 98.7, 98.8, 98.8, 98.9, 98.9, and 99.0.
Test the hypothesis H0 : µ = 98.6 versus H1 : µ ≠ 98.6, using
α = 0.05. Find the P-value.
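The body-temperature test can be sketched as follows. Two assumptions are made here. The critical value t_{0.025,24} = 2.064 is taken from a standard t table rather than from the slides. The P-value is approximated by integrating the t density numerically with Simpson's rule, since the Python standard library has no t distribution; the helper names t_pdf and t_upper_tail are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

temps = [97.8, 97.2, 97.4, 97.6, 97.8, 97.9, 98.0, 98.0, 98.0, 98.1,
         98.2, 98.3, 98.3, 98.4, 98.4, 98.4, 98.5, 98.6, 98.6, 98.7,
         98.8, 98.8, 98.9, 98.9, 99.0]
mu0 = 98.6
n = len(temps)

# Observed test statistic t0 = (xbar - mu0) / (s / sqrt(n))
t0 = (mean(temps) - mu0) / (stdev(temps) / sqrt(n))

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    coef = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """P(T_nu > t) for t >= 0: 0.5 minus the Simpson integral over [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

p_value = 2 * t_upper_tail(abs(t0), n - 1)   # two-sided P-value

t_crit = 2.064   # t_{0.025,24}: assumed standard table value
reject_h0 = abs(t0) > t_crit
print(round(t0, 2), reject_h0)
```

Here t0 is about −3.48, well beyond −2.064, so H0 : µ = 98.6 is rejected at α = 0.05, and the approximate P-value is below 0.05 as expected.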
Tests on the Variance and Standard Deviation of a Normal
Distribution

An automated filling machine is used to fill bottles with liquid
detergent. A random sample of 20 bottles results in a sample
variance of fill volume of s² = 0.0153 (fluid ounces)². If the
variance of fill volume exceeds 0.01 (fluid ounces)², an
unacceptable proportion of bottles will be underfilled or overfilled.
Is there evidence in the sample data to suggest that the
manufacturer has a problem with underfilled or overfilled bottles?
Use α = 0.05, and assume that fill volume has a normal distribution.
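This is an upper-tailed test of H0 : σ² = 0.01 against H1 : σ² > 0.01, with test statistic χ0² = (n − 1)s²/σ0². The sketch below evaluates it. The critical value χ²_{0.05,19} = 30.14 is not given in the slides; it is assumed here from a standard chi-square table.

```python
n = 20
s2 = 0.0153          # sample variance, (fluid ounces)^2
sigma0_sq = 0.01     # variance under H0, (fluid ounces)^2

# Test statistic: chi0^2 = (n - 1) s^2 / sigma0^2
chi0_sq = (n - 1) * s2 / sigma0_sq

chi2_crit = 30.14    # chi^2_{0.05,19}: assumed standard table value
reject_h0 = chi0_sq > chi2_crit
print(round(chi0_sq, 2), reject_h0)  # 29.07 False
```

Because 29.07 < 30.14, H0 is not rejected: the sample does not give strong evidence that the variance of fill volume exceeds 0.01 (fluid ounces)².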
Testing for Goodness of fit

It is used when the population distribution is unknown.

The test procedure requires a random sample of size n from
the population whose probability distribution is unknown.
These n observations are arranged in a frequency histogram
having k bins or class intervals.
Let Oi be the observed frequency in the ith class interval.
Compute the expected frequency in the ith class interval,
denoted Ei , from the hypothesized distribution.
The test statistic is χ0² = Σ (Oi − Ei)²/Ei , summed over i = 1, ..., k.
If the population follows the hypothesized distribution, χ0² has,
approximately, a chi-square distribution with k − p − 1 degrees
of freedom, where p represents the number of parameters of
the hypothesized distribution estimated by sample statistics.
For a fixed-level test, we would reject the hypothesis that the
distribution of the population is the hypothesized distribution
if the calculated value of the test statistic χ0² > χ²_{α,k−p−1}.
P-value = P(χ²_{k−p−1} > χ0²).
Test the goodness of fit against a Poisson distribution

The estimate of the mean
number of defects per board is the sample average, that is,
(32 × 0 + 15 × 1 + 9 × 2 + 4 × 3)/60 = 0.75.
Hence the estimate of the Poisson parameter is λ = 0.75.
Because each class interval corresponds to a particular number of
defects, we may find the pi as follows:
pi = P(X = i) = e^(−λ) λ^i / i! with λ = 0.75, and the last class
collects the remaining probability.
The expected frequencies are computed by multiplying the sample
size n = 60 times the probabilities pi . That is, Ei = n × pi .
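The whole goodness-of-fit calculation can be sketched as below. The observed frequencies (32, 15, 9, 4 boards with 0, 1, 2, 3 defects) are read off the sample-average computation above. Merging the last two classes into "2 or more defects" is an assumption made here, following the common practice of combining cells whose expected frequency is small. With k = 3 classes and p = 1 estimated parameter there is 1 degree of freedom, and for 1 degree of freedom the chi-square tail probability equals 2[1 − Φ(√χ0²)].

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

observed_counts = {0: 32, 1: 15, 2: 9, 3: 4}   # defects -> number of boards
n = sum(observed_counts.values())
lam = sum(k * f for k, f in observed_counts.items()) / n   # estimate of lambda

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

# Classes: 0 defects, 1 defect, 2 or more defects (assumed merge)
probs = [poisson_pmf(0, lam), poisson_pmf(1, lam)]
probs.append(1 - sum(probs))                    # P(X >= 2)
observed = [32, 15, 9 + 4]
expected = [n * p for p in probs]               # E_i = n * p_i

# Test statistic: sum of (O_i - E_i)^2 / E_i
chi0_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1 - 1                      # k - p - 1 = 1
# Valid only because df == 1: P(chi2_1 > x) = 2 * (1 - Phi(sqrt(x)))
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi0_sq)))
print(lam, [round(e, 2) for e in expected], round(chi0_sq, 2))
```

Under these assumptions the statistic is a little under 3 and the P-value is above 0.05, so the Poisson hypothesis would not be rejected at the 0.05 level.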
Simple Linear Regression and Correlation

Regression Analysis
The collection of statistical tools that are used to model and
explore relationships between variables that are related in a
nondeterministic manner is called regression analysis.

It is probably reasonable to assume that the mean of the random
variable Y is related to x by the following straight-line relationship:

E (Y |x) = µY |x = β0 + β1 x

β0 and β1 are called regression coefficients.

Although the mean of Y is a linear function of x, the actual
observed value y does not fall exactly on a straight line. The
appropriate way to generalize this to a probabilistic linear model is
to assume that the expected value of Y is a linear function of x
but that for a fixed value of x, the actual value of Y is determined
by the mean value function (the linear model) plus a random error
term, say

Y = β0 + β1 x + ϵ

where ϵ is a random error with mean zero and (unknown) variance
σ². We call this model the simple linear regression model because it
has only one independent variable or regressor.
Suppose that we have n pairs of observations

(x1 , y1 ), (x2 , y2 ), ..., (xn , yn ). The estimates of β0 and β1 should
result in a line that is (in some sense) a “best fit” to the data. The
criterion for estimating the regression coefficients is called the
method of least squares.
Fit a least-squares line to the data

The required least-squares line is y = 6/11 + (7/11)x.
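The least-squares estimates minimise the sum of squared residuals and are given by b1 = Sxy/Sxx and b0 = ȳ − b1 x̄, where Sxy = Σ(xi − x̄)(yi − ȳ) and Sxx = Σ(xi − x̄)². The sketch below implements these formulas. The data table for this exercise is not reproduced here, so the data set in the code is an illustrative assumption, chosen because its fit comes out to the same line, y = 6/11 + (7/11)x.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx                # slope estimate
    b0 = y_mean - b1 * x_mean     # intercept estimate
    return b0, b1

# Illustrative data (an assumption; not taken from the source's table)
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 0.545 0.636
```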
Fit a least-squares line to the data

y = 35.82 + 0.476x.

