
Chapter 1 Introduction to Statistics

Statistics is the branch of mathematics that deals with the collection,
organization, analysis, interpretation and presentation of data,
including summarizing data in tables or graphs.
Statistics is often divided into descriptive statistics and
inferential statistics.
Descriptive Statistics
Descriptive statistics involves describing, summarizing and
organizing data so that it can be easily understood.
Note that:
• Descriptive statistics consists of methods for describing the
characteristics of a data set.
• It is useful because it allows you to make sense of the data.
• It helps in exploring the data and drawing conclusions from it in
order to make decisions.
• It provides graphical summaries of the data (pie charts, bar
charts, histograms, polygons, ...).
• It includes calculating quantities such as the average of the data,
its spread and the shape of its distribution.
For example, we may be interested in describing:
1- The weight of a product on a production line.
2- The time taken to process an application.
Inferential Statistics
Inferential statistics makes inferences and predictions about a
population's parameters based on a sample of data taken from
that population.
• A population is the collection of all possible individuals, objects,
or measurements of interest.
• A sample is a portion, or part, of the population of interest.
• A variable is a function defined on a population (or sample)
that takes values in an arbitrary set.
A variable measures a characteristic, feature or factor that
varies from one individual to another in the population.
Types of variables and levels of measurement
Variables are classified in two ways:
1- According to the type of values they take
2- According to the number of values they take
Qualitative (Categorical) Variables
These take non-numeric values, or numeric values that only
indicate an attribute or property.
Variable            Example values
Gender              Male, Female
Subject             English, Physics, Arts
Eye color           Black, Brown, Green
Country or city     Egypt, USA
Quantitative Variables
These take numerical values on which mathematical operations
can be performed.
Discrete variables take a finite or countably infinite number of values.
"They assume only certain values, and there are usually gaps
between the values."
• Number of apples on a tree
• Number of students in a class
• Number of accidents in a particular city
Continuous variables take an uncountable number of values.
"They can assume any value within a specified range."
• Weight of a person
• Distance between cities
[Figure: Summary of types of variables]

[Figure: Summary of the characteristics of the levels of measurement]
Example
Classify the following variables.

Variable               Categorical   Quantitative   Quantitative
                                     (discrete)     (continuous)
1. Number of books                        √
2. Brands of cars           √
3. Temperatures                                          √
4. Heights of trees                                      √
5. Number of cars                         √
6. Amount of water                                       √

 Statistic
Any function of the random variables constituting a random
sample is called a statistic.

Frequency Tables, Frequency Distributions
and Graphic Presentation
 Frequency Distributions
Frequency distribution is a grouping of data into mutually
exclusive categories showing the number of observations in
each class.
 Frequency Table
Frequency table is a grouping of qualitative data into mutually
exclusive classes showing the number of observations in each
class.
For example,

Selling prices   Frequency   Cumulative               Relative
($1000)                      frequency                frequency
15 up to 18          8       8                        8/80 = 0.1
18 up to 21         23       8+23 = 31                23/80 = 0.2875
21 up to 24         17       8+23+17 = 48             17/80 = 0.2125
24 up to 27         18       8+23+17+18 = 66          18/80 = 0.225
27 up to 30          8       8+23+17+18+8 = 74        8/80 = 0.1
30 up to 33          4       8+23+17+18+8+4 = 78      4/80 = 0.05
33 up to 36          2       8+23+17+18+8+4+2 = 80    2/80 = 0.025
Total               80                                1

Relative frequency is the ratio of a class frequency to the
total number of observations.
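The cumulative and relative frequencies in the selling-price table can be reproduced with a few lines of Python. This is only a sketch; the variable names (`classes`, `freqs`, `rows`) are illustrative and not part of the notes.

```python
# Class limits (in $1000) and frequencies copied from the selling-price table.
classes = [(15, 18), (18, 21), (21, 24), (24, 27), (27, 30), (30, 33), (33, 36)]
freqs = [8, 23, 17, 18, 8, 4, 2]

n = sum(freqs)  # total number of observations

rows = []
running = 0
for (low, high), f in zip(classes, freqs):
    running += f                                # cumulative frequency so far
    rows.append((low, high, f, running, f / n)) # f / n is the relative frequency

for low, high, f, cum, rel in rows:
    print(f"{low} up to {high}: f={f:2d}  cumulative={cum:2d}  relative={rel:.4f}")
```

The last cumulative frequency equals the total number of observations, and the relative frequencies add up to 1, which is a quick check on any frequency table.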

 Graphic Presentation
The three commonly used graphic forms are:
• Histograms
• Frequency polygons
• Cumulative frequency distributions
Numerical Measures
 Measure of central tendency
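Mean: the average of all the values in a sample.

For raw data,

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

For grouped data,

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $x_i$ is the midpoint of each class and $f_i$ is its frequency.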

Properties
• All values are used.
• It is quick and easy to compute.
• It is unique.
• The sum of the deviations from the mean is 0.
Disadvantages
• The mean is not defined for qualitative data.
• It is affected by extreme values.
 Example 1
The lengths of time, in minutes, that 10 patients waited in a
doctor’s office before receiving treatment were recorded as
follows: 5, 11, 9, 5, 9, 15, 6, 10, 5, and 10. Treating the data as a
random sample, find the mean.

Solution: $\sum x_i = 85$, so $\bar{x} = \frac{85}{10} = 8.5$ minutes.
 Example 2
Find the mean for the following grouped data.

Selling prices   Frequency   Midpoint   $f_i x_i$
($1000)          $f_i$       $x_i$
15 up to 18          8        16.5       132
18 up to 21         23        19.5       448.5
21 up to 24         17        22.5       382.5
24 up to 27         18        25.5       459
27 up to 30          8        28.5       228
30 up to 33          4        31.5       126
33 up to 36          2        34.5        69
Total               80                  1845

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1845}{80} \approx 23.06$$

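Median: the central (middle) value of the sample after the values
are arranged in ascending or descending order.

For raw data with ordered values $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$,

$$\check{x} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$

• In Example 1
Find the median of the raw data: 5, 11, 9, 5, 9, 15, 6, 10, 5, 10.
Solution

Rearranged: 5, 5, 5, 6, 9, 9, 10, 10, 11, 15

Since $n = 10$ is even, Median $= \frac{1}{2}\left(x_{(5)} + x_{(6)}\right) = \frac{9 + 9}{2} = 9$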
For grouped data,

$$\check{x} = L + \left(\frac{\frac{n}{2} - F}{\tilde{f}}\right) c$$

where $L$ is the lower bound of the median class,
$n = \sum f$ is the total number of observations,
$F$ is the cumulative frequency before the median class,
$\tilde{f}$ is the frequency of the median class, and
$c$ is the class length of the median class.

Properties
• It is not affected by extremely large or small values.
• It is quick and easy to compute.
Disadvantages
• The median does not take all values into account.
• It cannot be determined for qualitative (nominal) data.
 In Example 2
Find the median for the following grouped data.

Selling prices   Frequency   Cumulative
($1000)          f           frequency
15 up to 18          8            8
18 up to 21         23           31
21 up to 24         17           48    (median class: L = 21, f̃ = 17, n/2 = 40)
24 up to 27         18           66
27 up to 30          8           74
30 up to 33          4           78
33 up to 36          2           80
Total               80

Solution

$\frac{n}{2} = 40$, $L = 21$, $F = 31$, $\tilde{f} = 17$ and $c = 3$

$$\check{x} = 21 + \left(\frac{40 - 31}{17}\right)(3) \approx 22.59$$

Note that the median class is the first class whose cumulative frequency
is greater than or equal to half the sum of the frequencies.
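The grouped-median rule can be written as a small function. This is a minimal sketch assuming equal class widths; the function name `grouped_median` and its parameters are illustrative choices, not notation from the notes.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data: L + ((n/2 - F) / f) * c."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class (F)
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

# Selling-price data from Example 2 (class width 3, in $1000).
lower_bounds = [15, 18, 21, 24, 27, 30, 33]
freqs = [8, 23, 17, 18, 8, 4, 2]
median = grouped_median(lower_bounds, freqs, width=3)
print(round(median, 2))  # 22.59
```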

Mode: the value that occurs most often in the sample.

For raw data,
the mode is the value with the highest frequency.
For grouped data,

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$$

where $L$ is the lower bound of the modal class,
$d_1$ is the difference between the frequency of the modal
class and the previous class,
$d_2$ is the difference between the frequency of the modal
class and the next class, and
$c$ is the class length of the modal class.

Properties
• It is quick and easy to compute.
• It can be evaluated for both quantitative and qualitative
data.
• It is not affected by extreme values.
Disadvantages
• Sometimes there is no mode or more than one mode.

 In Example 1
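Find the mode of the raw data: 5, 11, 9, 5, 9, 15, 6, 10, 5, 10.
Solution: the mode is 5, which occurs three times.
• In Example 2
Find the mode for the following grouped data.

Selling prices   Frequency
($1000)          f
15 up to 18          8
18 up to 21         23
21 up to 24         17
24 up to 27         18
27 up to 30          8
30 up to 33          4
33 up to 36          2
Total               80

The modal class is 18 up to 21, so $L = 18$, $d_1 = 23 - 8 = 15$, $d_2 = 23 - 17 = 6$ and $c = 3$

$$\text{Mode} = 18 + \left(\frac{15}{15 + 6}\right)(3) \approx 20.14$$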
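The grouped-mode formula can be sketched in Python as follows. The function name is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption made here for the edge classes; the notes do not discuss that case.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data: L + d1 / (d1 + d2) * c for the modal class."""
    i = freqs.index(max(freqs))                          # modal class (first one if tied)
    before = freqs[i - 1] if i > 0 else 0                # assumption: no class before -> 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0    # assumption: no class after -> 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower_bounds[i] + d1 / (d1 + d2) * width

# Selling-price data from Example 2 (class width 3, in $1000).
lower_bounds = [15, 18, 21, 24, 27, 30, 33]
freqs = [8, 23, 17, 18, 8, 4, 2]
mode = grouped_mode(lower_bounds, freqs, width=3)
print(round(mode, 2))  # 20.14
```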
The relationships between the Mean, Median and Mode

Measure of Dispersion
A measure of location, such as the mean or median, only describes
the center of the data. It is valuable from that standpoint, but it
does not tell us anything about the spread of the data.
Range: a measure of how spread apart the values in a data set are.

For raw data,

$$\text{Range} = \text{largest value} - \text{smallest value}$$

For grouped data,

$$\text{Range} = \text{midpoint of last class} - \text{midpoint of first class}$$

Deviation: the difference between an observed value and the mean.
Variance: the average squared deviation from the mean. For a sample,
the sum of squared deviations is divided by $n - 1$.

For raw data,

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

For grouped data,

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\left(\sum f_i\right) - 1}$$

where $x_i$ is the midpoint of each class.

Standard deviation: the positive square root of the variance, $s = \sqrt{s^2}$.


 In Example 1
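Find the range, variance and standard deviation of the raw data:
5, 11, 9, 5, 9, 15, 6, 10, 5, 10.
Solution
- Range = 15 - 5 = 10

- Variance $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{96.5}{9} \approx 10.72$, using $\bar{x} = 8.5$

- Standard deviation is $s = \sqrt{10.72} \approx 3.27$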
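The range, variance and standard deviation of the Example 1 data can be checked with Python's standard `statistics` module. Note the assumption: `statistics.variance` and `statistics.stdev` compute the sample versions, which divide the sum of squared deviations by n - 1.

```python
import statistics

waiting_times = [5, 11, 9, 5, 9, 15, 6, 10, 5, 10]  # Example 1 data (minutes)

data_range = max(waiting_times) - min(waiting_times)
mean = statistics.mean(waiting_times)
variance = statistics.variance(waiting_times)  # sample variance, divides by n - 1
std_dev = statistics.stdev(waiting_times)      # square root of the sample variance

print(data_range, mean, round(variance, 2), round(std_dev, 2))
```

If the population versions (dividing by n) are wanted instead, `statistics.pvariance` and `statistics.pstdev` can be used.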


 In Example 2
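Find the variance and standard deviation for the following
grouped data.
Solution
Since $\bar{x} = \frac{1845}{80} \approx 23.1$,

Selling prices   Frequency   Midpoint   $x - \bar{x}$   $(x - \bar{x})^2$   $f (x - \bar{x})^2$
($1000)          f           x
15 up to 18          8        16.5        -6.6            43.56              348.48
18 up to 21         23        19.5        -3.6            12.96              298.08
21 up to 24         17        22.5        -0.6             0.36                6.12
24 up to 27         18        25.5         2.4             5.76              103.68
27 up to 30          8        28.5         5.4            29.16              233.28
30 up to 33          4        31.5         8.4            70.56              282.24
33 up to 36          2        34.5        11.4           129.96              259.92
Total               80                                                      1531.8

$$s^2 = \frac{\sum f (x - \bar{x})^2}{\left(\sum f\right) - 1} = \frac{1531.8}{79} \approx 19.39, \qquad s = \sqrt{19.39} \approx 4.40$$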

Quartiles
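The standard deviation is the most widely used measure of
dispersion. Alternative ways of describing the spread of data
include determining the location of values that divide a set of
observations into equal parts, such as quartiles.

For grouped data,

The first quartile $Q_1 = L + \left(\frac{\frac{n}{4} - F}{f}\right) c$

where $L$ is the lower bound of the class containing $Q_1$,
$F$ is the cumulative frequency before the $Q_1$ class,
$f$ is the frequency of the $Q_1$ class, and
$c$ is the class length.

The third quartile $Q_3 = L + \left(\frac{\frac{3n}{4} - F}{f}\right) c$

where $L$ is the lower bound of the class containing $Q_3$,
$F$ is the cumulative frequency before the $Q_3$ class,
$f$ is the frequency of the $Q_3$ class, and
$c$ is the class length.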

 In Example 2
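Find the quartiles for the following grouped data.

Selling prices   Frequency   Cumulative
($1000)          f           frequency
15 up to 18          8            8
18 up to 21         23           31    (contains n/4 = 20)
21 up to 24         17           48
24 up to 27         18           66    (contains 3n/4 = 60)
27 up to 30          8           74
30 up to 33          4           78
33 up to 36          2           80
Total               80

For $Q_1$: $\frac{n}{4} = 20$, $L = 18$, $F = 8$, $f = 23$ and $c = 3$

$$Q_1 = 18 + \left(\frac{20 - 8}{23}\right)(3) \approx 19.57$$

For $Q_3$: $\frac{3n}{4} = 60$, $L = 24$, $F = 48$, $f = 18$ and $c = 3$

$$Q_3 = 24 + \left(\frac{60 - 48}{18}\right)(3) = 26$$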
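Both quartile formulas are the same interpolation with a different target position (n/4 or 3n/4), so one function can cover them. This is a sketch assuming equal class widths; `grouped_quantile` and its `fraction` parameter are illustrative names.

```python
def grouped_quantile(lower_bounds, freqs, width, fraction):
    """Grouped-data quantile: L + ((fraction * n - F) / f) * c."""
    n = sum(freqs)
    target = fraction * n
    cumulative = 0  # cumulative frequency before the current class (F)
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:  # class that contains the target position
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Selling-price data from Example 2 (class width 3, in $1000).
lower_bounds = [15, 18, 21, 24, 27, 30, 33]
freqs = [8, 23, 17, 18, 8, 4, 2]
q1 = grouped_quantile(lower_bounds, freqs, 3, 0.25)
q3 = grouped_quantile(lower_bounds, freqs, 3, 0.75)
print(round(q1, 2), round(q3, 2))  # 19.57 26.0
```

Calling the same function with `fraction=0.5` gives the grouped median.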
Chapter 2 Sampling Distribution
 Sampling
The process of selecting the sample from a given population is
called sampling.
 Sampling distribution
The probability distribution of a statistic is called a sampling
distribution.
For Example:
 Sampling distribution of Mean
 Sampling Distribution of difference between two means
 Sampling distribution of Proportion
 Sampling distribution of Variance
Sampling Distribution of Means and the Central Limit
Theorem
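Suppose that a random sample of n observations $X_1, X_2, \dots, X_n$ is taken from a
normal population with mean μ and variance $\sigma^2$. Then
$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$ has a normal distribution with mean

$$\mu_{\bar{X}} = \frac{1}{n}\underbrace{(\mu + \mu + \dots + \mu)}_{n \text{ times}} = \mu$$

and variance

$$\sigma^2_{\bar{X}} = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \dots + \sigma^2) = \frac{\sigma^2}{n}$$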
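Central Limit Theorem
If $\bar{X}$ is the mean of a random sample of size n taken from a
population with mean μ and finite variance $\sigma^2$, then the
sampling distribution of the mean $\bar{X}$ is approximately
normal, and the standardized variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is approximately standard normal, n(z; 0, 1).

If $\sigma$ is known, use $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.

If $\sigma$ is unknown and $n < 30$ (normal population), use $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, which has a t-distribution with $n - 1$ degrees of freedom.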

 Example 3
An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal
to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an
average life of less than 775 hours.
Solution:
$n = 16$, $\mu = 800$, $\sigma = 40$

$$P(\bar{X} < 775) = P\left(Z < \frac{775 - 800}{40/\sqrt{16}}\right) = P(Z < -2.5) = 0.0062$$
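The probability in Example 3 can be checked without tables using `statistics.NormalDist` from the Python standard library. The helper name `prob_sample_mean_below` is illustrative; the sketch assumes the sample mean is normal with mean μ and standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar) when the sample mean is N(mu, sigma^2 / n)."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x_bar)

# Example 3: bulbs with mu = 800 hours, sigma = 40 hours, sample of 16.
p = prob_sample_mean_below(775, mu=800, sigma=40, n=16)
print(round(p, 4))  # 0.0062
```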
 Example 4
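A tobacco company claims that the amount of nicotine in its
cigarettes is a random variable with mean 2.2 mg and standard
deviation 0.5 mg. What is the probability that the total
nicotine content in a sample of 100 cigarettes is
greater than 230 mg?
Solution

$$P\left(\sum X_i > 230\right) = P\left(\bar{X} > \frac{230}{100}\right) = P\left(Z > \frac{2.3 - 2.2}{0.5/\sqrt{100}}\right) = P(Z > 2)$$

$$= 1 - P(Z < 2) = 1 - 0.9772 = 0.0228$$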
 Example 5
A random sample of size 169 is taken from normal population
with variance 400. What is the probability that the absolute
difference between the sample mean and the population mean is
greater than 4?
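Solution
Here $n = 169$ and $\sigma^2 = 400$, so $\sigma/\sqrt{n} = 20/13$.

$$\begin{aligned}
P(|\bar{X} - \mu| > 4) &= P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| > \frac{4}{20/13}\right) \\
&= P(|Z| > 2.6) \\
&= 1 - P(-2.6 < Z < 2.6) \\
&= 1 - [P(Z < 2.6) - P(Z < -2.6)] \\
&= 1 - [0.9953 - 0.0047] = 0.0094
\end{aligned}$$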
 Example 6
A large restaurant reports its outstanding bills to suppliers are
approximately normally distributed with a mean of $1200. The
standard deviation is unknown. A random sample of 9 accounts
is taken with standard deviation 210. What is the probability that
the sample mean will be greater than $1300?
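Solution:
$n = 9$, $\mu = 1200$, $s = 210$

$$P(\bar{X} > 1300) = P\left(T > \frac{1300 - 1200}{210/\sqrt{9}}\right) = P(T > 1.43)$$

from the t-table with $n - 1 = 8$ degrees of freedom. Since $t_{0.10} = 1.397$ and $t_{0.05} = 1.860$ for 8 degrees of freedom, this probability is slightly less than 0.10.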

Sampling Distribution of the Difference between two Means


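If independent samples of size $n_1$ and $n_2$ are drawn at random
from two populations, discrete or continuous, with means $\mu_1$ and
$\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively, then the sampling
distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is
approximately normally distributed with mean and variance
given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

is approximately a standard normal variable.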
 Example 7
The television picture tubes of manufacturer A have a mean
lifetime of 6.5 years and a standard deviation of 0.9 year, while
those of manufacturer B have a mean lifetime of 6.0 years and a
standard deviation of 0.8 year. What is the probability that a
random sample of 36 tubes from manufacturer A will have a
mean lifetime that is at least 1 year more than the mean lifetime
of a sample of 49 tubes from manufacturer B?
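Solution:

Manufacturer A (Population 1)    Manufacturer B (Population 2)
$n_1 = 36$                       $n_2 = 49$
$\mu_1 = 6.5$                    $\mu_2 = 6.0$
$\sigma_1 = 0.9$                 $\sigma_2 = 0.8$

$$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P\left(Z \ge \frac{1.0 - (6.5 - 6.0)}{\sqrt{\frac{0.81}{36} + \frac{0.64}{49}}}\right) = P(Z \ge 2.65)$$

$$= 1 - P(Z < 2.65) \text{ (from table)} = 1 - 0.9960 = 0.0040$$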
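The calculation in Example 7 follows directly from the sampling distribution of the difference between two means. The sketch below uses `statistics.NormalDist`; the function name and argument order are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_difference_at_least(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (z, P(X1bar - X2bar >= d)) under the normal approximation."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (d - (mu1 - mu2)) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Example 7: manufacturer A (6.5, 0.9, n=36) versus manufacturer B (6.0, 0.8, n=49).
z, p = prob_mean_difference_at_least(1.0, 6.5, 0.9, 36, 6.0, 0.8, 49)
print(round(z, 2), round(p, 3))  # 2.65 0.004
```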
 Example 8
Two independent experiments are run in which two different
types of paint are compared. Eighteen specimens are painted
using type A, and the drying time, in hours, is recorded for each.
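The same is done with type B. The population standard
deviations are both known to be 1.0. Assuming that the mean
drying time is equal for the two types of paint,
find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average drying
times for samples of size $n_A = n_B = 18$.

Solution

$$P(\bar{X}_A - \bar{X}_B > 1.0) = P\left(Z > \frac{1.0 - 0}{\sqrt{\frac{1}{18} + \frac{1}{18}}}\right) = P(Z > 3.0) = 1 - 0.9987 = 0.0013$$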

Sampling Distribution of Proportion


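Suppose that a random sample of n observations is taken from a
population in which the proportion of successes is $p$ (with $q = 1 - p$),
and let $X$ be the number of successes in the sample.

Then, for large n, $\hat{P} = \frac{X}{n}$ has approximately a normal distribution with mean

$$\mu_{\hat{P}} = E(\hat{P}) = E\left(\frac{X}{n}\right) = p$$

and variance

$$\sigma^2_{\hat{P}} = \frac{pq}{n}$$

Then, the sampling distribution of the proportion is described by

$$Z = \frac{\hat{P} - p}{\sqrt{\frac{pq}{n}}}$$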
 Example 9
Suppose that in a certain human population 0.08 are colorblind.
If a random sample of 150 individuals from this population is
selected. What is the probability that the proportion in the
sample who are colorblind will be greater than 0.15?
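Solution
$p = 0.08$, $q = 0.92$, $n = 150$

$$P(\hat{P} > 0.15) = P\left(Z > \frac{0.15 - 0.08}{\sqrt{\frac{(0.08)(0.92)}{150}}}\right) = P(Z > 3.16) \approx 0.0008$$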
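The colorblindness example can be verified numerically with the normal approximation to the sample proportion. This is a sketch; the helper name `prob_proportion_above` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_above(p_hat, p, n):
    """Return (z, P(sample proportion > p_hat)) under the normal approximation."""
    standard_error = sqrt(p * (1 - p) / n)
    z = (p_hat - p) / standard_error
    return z, 1 - NormalDist().cdf(z)

# Colorblindness example: p = 0.08, n = 150, threshold 0.15.
z, prob = prob_proportion_above(0.15, p=0.08, n=150)
print(round(z, 2), round(prob, 4))  # 3.16 0.0008
```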

Sampling Distribution of Variance


When we draw a sample of size n from a normal population with
variance $\sigma^2$ and the sample variance $s^2$ is computed for each
sample, we obtain a value of the statistic $S^2$.
Consider the distribution of the random variable $\chi^2$
whose values are calculated from each sample by the
formula

$$\chi^2 = \frac{(n - 1) s^2}{\sigma^2}$$

with $\nu = n - 1$ degrees of freedom.


 Example 9
Use the Chi-squared distribution to solve the following: Assume
the population variance equal 9, and the sample is of size 11.
What is the probability that ?

Solution

( )

( )

Chapter 3
One and Two Sample Estimation Problems
Point Estimate
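A point estimate of some population parameter $\theta$ is a single
value $\hat{\theta}$ of a statistic $\hat{\Theta}$.
For example

Parameter ($\theta$)     Statistic ($\hat{\Theta}$)     Point estimate ($\hat{\theta}$)
Mean $\mu$               $\bar{X}$                      $\bar{x}$
Proportion $p$           $\hat{P}$                      $\hat{p}$
Variance $\sigma^2$      $S^2$                          $s^2$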

Properties of Estimators
We would like to ensure that the estimator is "close" to
the true unknown parameter, so we need a way
to measure the distance between the two.
Since the estimator is a function of the data, it is a random
quantity, and this must be taken into account.
To formalize these notions we state a number of properties
that are desirable for estimators to have, such as
unbiasedness, small mean squared error, and low variance.
Properties of good estimator
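1- Unbiased
An estimator $\hat{\Theta}$ is unbiased when its expected value equals the value of the
parameter being estimated:

$$E(\hat{\Theta}) = \theta$$

The bias of an estimator is $\text{Bias}(\hat{\Theta}) = E(\hat{\Theta}) - \theta$.

Mean Squared Error

$$MSE(\hat{\Theta}) = E\left[(\hat{\Theta} - \theta)^2\right] = Var(\hat{\Theta}) + \left[\text{Bias}(\hat{\Theta})\right]^2$$

If the estimator is unbiased, then

$$MSE(\hat{\Theta}) = Var(\hat{\Theta})$$

2- Efficiency
The most efficient point estimator is the one with the smallest
variance among all the unbiased and consistent estimators.

If $Var(\hat{\Theta}_1) < Var(\hat{\Theta}_2)$,

then $\hat{\Theta}_1$ is more efficient than $\hat{\Theta}_2$.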


 Example 1
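Let $X_1, X_2, \dots, X_n$ be a random sample from a population with
mean $\mu$ and variance $\sigma^2$. Show that $\bar{X}$ is an unbiased estimator of
the population mean $\mu$.
Solution

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$$

$$\text{Bias} = E(\hat{\theta}) - \theta = \mu - \mu = 0$$

so $\bar{X}$ is an unbiased estimator of $\mu$.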

 Example 2
Let be a random sample from a population with
mean and variance Find the estimator for
Solution

̅ * ∑ + ∑

(∑ ) =

( ) (̂ ) - 0

this estimator is biased.


 Example 3

Show that ̂ and ̂ are unbiased


estimators and determine which of them is more efficient
Solution

̂ ( )

̂ ( ) ( )

( ̂ ) = =0.58

̂ is the most efficient estimator

Interval Estimation
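An interval estimate of a population parameter θ is an interval of
the form $\hat{\theta}_L < \theta < \hat{\theta}_U$, where the endpoints
$\hat{\theta}_L$ and $\hat{\theta}_U$ are values of
corresponding random variables $\hat{\Theta}_L$ and $\hat{\Theta}_U$.

From the sampling distribution of $\hat{\Theta}$ we can
determine $\hat{\Theta}_L$ and $\hat{\Theta}_U$ such that

$$P(\hat{\Theta}_L < \theta < \hat{\Theta}_U) = 1 - \alpha$$

The interval $\hat{\theta}_L < \theta < \hat{\theta}_U$, computed from the selected sample,
is called a 100(1 − α)% confidence interval.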
Single Sample: Estimating the Mean

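Confidence Interval on Mean ($\sigma$ known)

If $\bar{x}$ is the mean of a random sample of size n from a population
with known variance $\sigma^2$, a 100(1 − α)% confidence interval for
μ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.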

 Example 9
The average zinc concentration recovered from a sample of
measurements taken in 36 different locations in a river is found
to be 2.6 grams per milliliter. Find
the 95% confidence intervals for the mean zinc concentration in
the river. Assume that the population standard deviation is 0.3
gram per milliliter.
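Solution:

$\bar{x} = 2.6$, $\sigma = 0.3$, $n = 36$ and $z_{\alpha/2} = z_{0.025} = 1.96$

Hence, the 95% confidence interval is

$$2.6 - (1.96)\frac{0.3}{\sqrt{36}} < \mu < 2.6 + (1.96)\frac{0.3}{\sqrt{36}}$$

$$2.50 < \mu < 2.70$$

How to calculate $z_{\alpha/2}$ for a 95% confidence interval:

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1 - \alpha/2 = 0.975$

$z_{\alpha/2} = z_{0.025} = 1.96$, the value for which $P(Z < 1.96) = 0.975$ (from table)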
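A z-based confidence interval for the mean can be computed as follows. The sketch obtains $z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf` instead of a table; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (or n is large)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-value leaving alpha/2 to the right
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Zinc concentration example: x_bar = 2.6, sigma = 0.3, n = 36, 95% confidence.
low, high = z_confidence_interval(2.6, 0.3, 36, 0.95)
print(round(low, 2), round(high, 2))  # 2.5 2.7
```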
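Confidence Interval on Mean ($\sigma$ unknown, $n \ge 30$)

If $\bar{x}$ and $s$ are the mean and standard deviation of a random
sample of size n from a population with unknown variance $\sigma^2$
and $n \ge 30$, a 100(1 − α)% confidence interval for μ is given by

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.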

 Example 10
An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed with a standard
deviation of 40 hours. If a sample of 30 bulbs has an average life
of 780 hours. Find 95% confidence interval for the population
mean of all bulbs produced by this firm?
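Solution:

$\bar{x} = 780$, $\sigma = 40$, $n = 30$ and

$z_{\alpha/2} = z_{0.025} = 1.96$

Hence, the 95% confidence interval is

$$780 - (1.96)\frac{40}{\sqrt{30}} < \mu < 780 + (1.96)\frac{40}{\sqrt{30}}$$

$$765.69 < \mu < 794.31$$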

 Example 11
The heights of a random sample of 50 college students showed a
mean of 174.5 cm. and a standard deviation of 6.9 cm. Construct
a 98% confidence interval for the mean height of all college
students.
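Solution:

$\bar{x} = 174.5$, $s = 6.9$, $n = 50$ and $\alpha = 0.02$

$z_{\alpha/2} = z_{0.01} = 2.33$

Hence, the 98% confidence interval is

$$174.5 - (2.33)\frac{6.9}{\sqrt{50}} < \mu < 174.5 + (2.33)\frac{6.9}{\sqrt{50}}$$

$$172.23 < \mu < 176.77$$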

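Confidence Interval on Mean ($\sigma$ unknown, $n < 30$)

If $\bar{x}$ and $s$ are the mean and standard deviation of a random
sample of size n from a normal population with unknown variance $\sigma^2$
and $n < 30$, a 100(1 − α)% confidence interval for μ is given
by

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the t-value with $\nu = n - 1$ degrees of freedom,
leaving an area of α/2 to the right.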

 Example 12
The contents of seven similar containers of sulfuric acid are 9.8,
10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence
interval for the mean contents of all such containers, assuming
an approximately normal distribution.
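Solution:

$\bar{x} = 10$, $s = 0.283$, $n = 7$ and $t_{\alpha/2} = t_{0.025} = 2.447$ with 6 degrees of freedom

Hence, the 95% confidence interval is

$$10 - (2.447)\frac{0.283}{\sqrt{7}} < \mu < 10 + (2.447)\frac{0.283}{\sqrt{7}}$$

$$9.74 < \mu < 10.26$$

How to calculate $t_{\alpha/2}$ for a 95% confidence interval and n = 7:

$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $\nu = n - 1 = 6$

$t_{\alpha/2} = t_{0.025} = 2.447$ (from table)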

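Two Samples: Estimating the Difference between Two
Means ($\sigma_1$, $\sigma_2$ known)
If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of sizes
$n_1$ and $n_2$ from populations with known variances $\sigma_1^2$
and $\sigma_2^2$, respectively, a 100(1 − α)% confidence interval for
$\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.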

 Example 13
A study was conducted in which two types of engines, A and B,
were compared. Gas mileage, in miles per gallon, was
measured. Fifty experiments were conducted using engine type
A and 75 experiments were done with engine type B. The
gasoline used and other conditions were held constant. The
average gas mileage was 36 miles per gallon for engine A and
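42 miles per gallon for engine B. Find a 96% confidence interval
on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the population mean gas
mileages for engines A and B, respectively. Assume that the
population standard deviations are 6 and 8 for engines A and B,
respectively.
Solution

Engine A             Engine B
$n_A = 50$           $n_B = 75$
$\bar{x}_A = 36$     $\bar{x}_B = 42$
$\sigma_A = 6$       $\sigma_B = 8$

$z_{\alpha/2} = z_{0.02} = 2.05$. The 96% confidence interval is

$$(42 - 36) - 2.05\sqrt{\frac{64}{75} + \frac{36}{50}} < \mu_B - \mu_A < (42 - 36) + 2.05\sqrt{\frac{64}{75} + \frac{36}{50}}$$

$$3.43 < \mu_B - \mu_A < 8.57$$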

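Two Samples: Estimating the Difference between Two
Means ($\sigma_1 = \sigma_2$ unknown)
If $\bar{x}_1$ and $\bar{x}_2$ are means of independent random samples of sizes
$n_1$ and $n_2$ from approximately normal populations with
unknown but equal variances, a 100(1 − α)% confidence
interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $t_{\alpha/2}$ is the t-value with $\nu = n_1 + n_2 - 2$ degrees of
freedom, leaving an area of α/2 to the right, and $s_p$ is the pooled
estimate of the population standard deviation,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$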

 Example 14
Students may choose between a 3-semester-hour physics course
without labs and a 4-semester-hour course with labs. The final
written examination is the same for each section. If 12 students
in the section with labs made an average grade of 84 with a
standard deviation of 4, and 18 students in the section without
labs made an average grade of 77 with a standard deviation of 6,
find a 99% confidence interval for the difference between the
average grades for the two courses. Assume the populations to
be approximately normally distributed with equal variances.
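Solution

With lab             Without lab
$n_1 = 12$           $n_2 = 18$
$\bar{x}_1 = 84$     $\bar{x}_2 = 77$
$s_1 = 4$            $s_2 = 6$

$$s_p = \sqrt{\frac{(11)(16) + (17)(36)}{12 + 18 - 2}} = 5.30$$

$t_{\alpha/2} = t_{0.005} = 2.763$ with 28 degrees of freedom.
The 99% confidence interval is

$$(84 - 77) - (2.763)(5.30)\sqrt{\frac{1}{12} + \frac{1}{18}} < \mu_1 - \mu_2 < (84 - 77) + (2.763)(5.30)\sqrt{\frac{1}{12} + \frac{1}{18}}$$

$$1.5 < \mu_1 - \mu_2 < 12.5$$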
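The pooled-variance interval of Example 14 can be sketched as below. The Python standard library has no t-distribution, so the critical value 2.763 (28 degrees of freedom, area 0.005 to the right) is passed in as a table value; the function name and parameters are illustrative.

```python
from math import sqrt

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal variances; t_crit comes from a t-table."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled std. dev.
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp, diff - margin, diff + margin

# Example 14: with lab (84, 4, n=12) versus without lab (77, 6, n=18), 99% confidence.
sp, low, high = pooled_t_interval(84, 4, 12, 77, 6, 18, t_crit=2.763)
print(round(sp, 2), round(low, 2), round(high, 2))
```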
Paired Observation
A paired t-test is used to compare two population means where
you have two samples in which observations in one sample can
be paired with observations in the other sample (Before-and-
after observations on the same subjects).
Let $x$ = score before the module, $y$ = score after the module.
The procedure is as follows:
1. Calculate the difference ($d_i = y_i - x_i$) between the two
observations on each pair, making sure you distinguish between
positive and negative differences.

2. Calculate the mean difference $\bar{d}$.

3. Calculate the standard deviation of the differences, $s_d$, and
use this to calculate the standard error of the mean difference,
$SE(\bar{d}) = \frac{s_d}{\sqrt{n}}$.

A 100(1 − α)% confidence interval for the mean difference $\mu_D$ is given by

$$\bar{d} - t_{\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_D < \bar{d} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the t-value with $\nu = n - 1$ degrees of freedom,
leaving an area of α/2 to the right.
 Example 15
Suppose a sample of 20 students were given a diagnostic test
before studying a particular module and then again after
completing the module, the following results were obtained as
follows:

Find a 95% confidence interval for the mean difference $\mu_D$.

Solution
̅ and

̅ ̅
√ √

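Estimating a Proportion

If $\hat{p}$ is the proportion of successes in a random sample of size n
and $\hat{q} = 1 - \hat{p}$, an approximate 100(1 − α)% confidence
interval for the binomial parameter p is given by

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.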
 Example 16
In a random sample of 1000 homes in a certain city, it is found
that 228 are heated by oil. Find 99% confidence intervals for the
proportion of homes in this city that are heated by oil?
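Solution:
$\hat{p} = \frac{228}{1000} = 0.228$, $\hat{q} = 0.772$, $n = 1000$,

$z_{\alpha/2} = z_{0.005} = 2.57$

$$0.228 - (2.57)\sqrt{\frac{(0.228)(0.772)}{1000}} < p < 0.228 + (2.57)\sqrt{\frac{(0.228)(0.772)}{1000}}$$

$$0.194 < p < 0.262$$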
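The large-sample interval for a proportion can be sketched in the same way. Here $z_{\alpha/2}$ is computed with `statistics.NormalDist().inv_cdf` (about 2.576 for 99%) rather than read from a table; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Approximate large-sample confidence interval for a binomial proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Example 16: 228 of 1000 homes heated by oil, 99% confidence.
low, high = proportion_confidence_interval(228, 1000, 0.99)
print(round(low, 3), round(high, 3))  # 0.194 0.262
```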
Chapter 4
One and Two Sample Tests of Hypotheses
Statistical hypotheses
A statistical hypothesis is a claim about the value of a parameter or population
characteristic.
Components of Hypothesis Test
1. State the null and alternative hypotheses:
the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
2. Choose the level of significance $\alpha$.
3. Choose an appropriate test statistic and calculate it using
the sample data.
4. Establish the critical region based on $\alpha$.
5. Compare the test statistic with the critical region to
draw initial conclusions.
6. Decision.
The objective of hypothesis testing is to decide, based on
sample information, whether the alternative hypothesis is actually
supported by the data.
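Single sample: Test of single mean
Suppose $X_1, X_2, \dots, X_n$ represent a random sample from a
distribution with mean μ and (known) variance $\sigma^2$. If $\bar{x}$ is the
sample mean,

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

If $\sigma$ is unknown and $n \ge 30$, it is estimated by $s$,

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

If $\sigma$ is unknown and $n < 30$,

$$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \nu = n - 1$$

Test procedure is the following:

1- $H_0: \mu = \mu_0$ against one of $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \ne \mu_0$

2- Choose the level of significance $\alpha$
3- Test statistic (Z computation):
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if $\sigma$ is known;
if $\sigma$ is unknown, $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ when $n \ge 30$ and $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ when $n < 30$

4- Critical region

Alternative $H_1$      Rejection region
$\mu > \mu_0$          $z > z_{\alpha}$
$\mu < \mu_0$          $z < -z_{\alpha}$
$\mu \ne \mu_0$        $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
(when the T statistic applies, use t-values with $n - 1$ degrees of freedom in place of z-values)

5- Decision: reject $H_0$ if the test statistic falls in the rejection region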
Example 1
A random sample of 100 recorded deaths in the United States
during the past year showed an average life span of 71.8 years.
Assuming a population standard deviation of 8.9 years, does this
seem to indicate that the mean life span today is greater than 70
years? Use a 0.05 level of significance.
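Solution
1- $H_0: \mu = 70$, $H_1: \mu > 70$
2- $\alpha = 0.05$
3- Computation, $\bar{x} = 71.8$, $\sigma = 8.9$ and $n = 100$

$$z = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$$

4- Critical region

Reject $H_0$ if $z > z_{0.05} = 1.645$
5- Decision
Since $z = 2.02 > 1.645$, reject $H_0$ and accept $H_1$: the mean life span today is greater than 70 years.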

Example 2
A manufacturer of sports equipment has developed a new
synthetic fishing line that the company claims has a mean
breaking strength of 8 kilograms with a standard deviation of
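0.5 kilogram. Test the hypothesis that μ = 8 kilograms against
the alternative that μ ≠ 8 kilograms if a random sample of 50
lines is tested and found to have a mean breaking strength of 7.8
kilograms. Use a 0.01 level of significance.
Solution
1- $H_0: \mu = 8$, $H_1: \mu \ne 8$
2- $\alpha = 0.01$
3- Computation, $\bar{x} = 7.8$, $\sigma = 0.5$ and $n = 50$

$$z = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$

4- Critical region

Reject $H_0$ if $z < -z_{0.005} = -2.575$ or $z > z_{0.005} = 2.575$
5- Decision
Since $z = -2.83 < -2.575$,
reject $H_0$ and accept $H_1$: the mean breaking strength is not equal to 8 kilograms.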
Example 3
The Edison Electric Institute has published figures on the
number of kilowatt hours used annually by various home
appliances. It is claimed that a vacuum cleaner uses an average
of 46 kilowatt hours per year. If a random sample of 12 homes
included in a planned study indicates that vacuum cleaners use
an average of 42 kilowatt hours per year with a standard
deviation of 11.9 kilowatt hours, does this suggest at the 0.05
level of significance that vacuum cleaners use, on average, less
than 46 kilowatt hours annually? Assume the population of
kilowatt hours to be normal.
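Solution
1- $H_0: \mu = 46$, $H_1: \mu < 46$
2- $\alpha = 0.05$
3- Computation, $\bar{x} = 42$, $s = 11.9$ and $n = 12$

$$t = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16$$

4- Critical region

Reject $H_0$ if $t < -t_{0.05} = -1.796$ with $\nu = 11$ degrees of freedom
5- Decision
Since $t = -1.16 > -1.796$,
do not reject $H_0$ (accept $H_0$ and reject $H_1$): the data do not show that the average is less than 46 kilowatt hours.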
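The test statistics of Examples 1 to 3 all have the same form, (sample mean minus hypothesized mean) divided by the standard error, so one helper covers them. This is a sketch: the function name is illustrative, and the critical values 1.645, 2.575 and 1.796 are standard table values entered by hand.

```python
from math import sqrt

def one_sample_statistic(x_bar, mu0, sd, n):
    """(x_bar - mu0) / (sd / sqrt(n)); read as z or t depending on the case."""
    return (x_bar - mu0) / (sd / sqrt(n))

z1 = one_sample_statistic(71.8, 70, 8.9, 100)  # Example 1, H1: mu > 70
z2 = one_sample_statistic(7.8, 8, 0.5, 50)     # Example 2, H1: mu != 8
t3 = one_sample_statistic(42, 46, 11.9, 12)    # Example 3, H1: mu < 46, 11 d.f.

reject1 = z1 > 1.645       # right-tailed test at alpha = 0.05
reject2 = abs(z2) > 2.575  # two-tailed test at alpha = 0.01
reject3 = t3 < -1.796      # left-tailed t-test at alpha = 0.05

print(round(z1, 2), reject1)  # 2.02 True
print(round(z2, 2), reject2)  # -2.83 True
print(round(t3, 2), reject3)  # -1.16 False
```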
Two samples: Test of two means
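For two independent samples, if $\sigma_1$ and $\sigma_2$ are known, then we
have

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

If $\sigma_1$ and $\sigma_2$ are unknown and $n_1, n_2 \ge 30$, they are estimated by $s_1$ and
$s_2$, then we have

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

If $\sigma_1 = \sigma_2$ is unknown and $n_1, n_2 < 30$, it is estimated by the pooled
standard deviation $s_p$, then we have

$$T = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad \nu = n_1 + n_2 - 2$$

Test procedure is the following:

1- $H_0: \mu_1 = \mu_2$ against one of $H_1: \mu_1 > \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_1: \mu_1 \ne \mu_2$

2- Choose the level of significance $\alpha$
3- Test statistic (Z computation): use the $Z$ statistic if $\sigma_1, \sigma_2$ are known
or if they are unknown and $n_1, n_2 \ge 30$; use the $T$ statistic if they are unknown
and $n_1, n_2 < 30$. Under $H_0$, $\mu_1 - \mu_2 = 0$.
4- Critical region

Alternative $H_1$       Rejection region
$\mu_1 > \mu_2$         $z > z_{\alpha}$
$\mu_1 < \mu_2$         $z < -z_{\alpha}$
$\mu_1 \ne \mu_2$       $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
(when the T statistic applies, use t-values with $n_1 + n_2 - 2$ degrees of freedom in place of z-values)

5- Decision: reject $H_0$ if the test statistic falls in the rejection region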

Example 4

Solution
1-
2-
3- Test statistic
̅ ̅

√ √

4- Critical region

6- Decision
Since

and accept

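Single sample: Test of single Proportion
Suppose $p$ is the proportion of successes in the population, $x$ is the number of
successes in a sample of size $n$, and $\hat{p} = \frac{x}{n}$.

Test procedure is the following:

1- $H_0: p = p_0$ against one of $H_1: p > p_0$, $H_1: p < p_0$ or $H_1: p \ne p_0$

2- Choose the level of significance $\alpha$
3- Test statistic (Z computation):

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}, \qquad q_0 = 1 - p_0$$

4- Critical region

Alternative $H_1$    Rejection region
$p > p_0$            $z > z_{\alpha}$
$p < p_0$            $z < -z_{\alpha}$
$p \ne p_0$          $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

5- Decision: reject $H_0$ if $z$ falls in the rejection region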
Example 5
A builder claims that heat pumps are installed in 70% of all
homes being constructed today in the city of Richmond,
Virginia. Would you agree with this claim if a random survey of
new homes in this city showed that 8 out of 15 had heat pumps
installed? Use a 0.10 level of significance.
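Solution
1- $H_0: p = 0.7$, $H_1: p \ne 0.7$
2- $\alpha = 0.10$
3- Computation, $n = 15$ and $\hat{p} = \frac{8}{15} = 0.533$

$$z = \frac{0.533 - 0.7}{\sqrt{\frac{(0.7)(0.3)}{15}}} = -1.41$$

4- Critical region

Reject $H_0$ if $z < -z_{0.05} = -1.645$ or $z > z_{0.05} = 1.645$

5- Decision
Since $-1.645 < z = -1.41 < 1.645$,
do not reject $H_0$ (accept $H_0$ and reject $H_1$): there is no strong reason to doubt the builder's claim.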
Example 6
A commonly prescribed drug for relieving nervous tension is
believed to be only 60% effective. Experimental results with a
new drug administered to a random sample of 100 adults who
were suffering from nervous tension show that 70 received
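relief. Is this sufficient evidence to conclude that the new drug is
superior to the one commonly prescribed? Use a 0.05 level of
significance.
Solution
1- $H_0: p = 0.6$, $H_1: p > 0.6$
2- $\alpha = 0.05$

3- Computation, $n = 100$ and $\hat{p} = \frac{70}{100} = 0.7$

$$z = \frac{0.7 - 0.6}{\sqrt{\frac{(0.6)(0.4)}{100}}} = 2.04$$

4- Critical region

Reject $H_0$ if $z > z_{0.05} = 1.645$
5- Decision
Since $z = 2.04 > 1.645$,
reject $H_0$ and accept $H_1$: the new drug is superior to the one commonly prescribed.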
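The z statistic for a single proportion can be computed with a short helper. This is a sketch with an illustrative function name; the critical value 1.645 is the usual table value for an upper-tail area of 0.05.

```python
from math import sqrt

def proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * q0 / n) for a one-sample proportion test."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z5 = proportion_z(8, 15, 0.7)    # Example 5, two-sided test at alpha = 0.10
z6 = proportion_z(70, 100, 0.6)  # Example 6, right-tailed test at alpha = 0.05

reject5 = abs(z5) > 1.645
reject6 = z6 > 1.645

print(round(z5, 2), reject5)  # -1.41 False
print(round(z6, 2), reject6)  # 2.04 True
```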
Goodness-of-fit test
Now we shall consider a test to determine if a population has a
specified theoretical distribution. The test is based on how good
a fit we have between the frequency of occurrence of
observations in an observed sample and the expected
frequencies obtained from the hypothesized distribution.
Example 7
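Suppose that a die is tossed 120 times and each outcome is
recorded. The results are given in the following table.
Face        1    2    3    4    5    6
Observed   20   22   17   18   19   24
Expected   20   20   20   20   20   20
Test whether the die is balanced, using a level of significance $\alpha$.
Solution
1- $H_0$: the die is balanced (each face has probability 1/6)
   $H_1$: the die is not balanced

2- Level of significance $\alpha$
3- Computation,

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i} = \frac{0 + 4 + 9 + 4 + 1 + 16}{20} = 1.7$$

4- Critical region

With $\nu = k - 1 = 5$ degrees of freedom, reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ (for $\alpha = 0.05$, $\chi^2_{0.05} = 11.070$)

5- Decision: since $\chi^2 = 1.7$ is below the critical value, accept $H_0$ and reject $H_1$; there is no evidence that the die is unbalanced.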
Example 8
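The grades in a statistics course for a particular semester were as
follows:
Grade       A    B    C    D    F    Total
Observed   14   18   32   20   16    100
Expected   20   20   20   20   20    100
Test whether the distribution of the grades is uniform, using a
level of significance $\alpha$.
Solution
1- $H_0$: the grades are uniformly distributed
   $H_1$: the grades are not uniformly distributed

2- Level of significance $\alpha$
3- Computation,

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i} = \frac{36 + 4 + 144 + 0 + 16}{20} = 10$$

4- Critical region

With $\nu = k - 1 = 4$ degrees of freedom, reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ (for $\alpha = 0.05$, $\chi^2_{0.05} = 9.488$)

5- Decision: compare $\chi^2 = 10$ with the critical value for the chosen $\alpha$; at $\alpha = 0.05$, $10 > 9.488$, so $H_0$ is rejected.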
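The goodness-of-fit statistic is the sum of (observed − expected)² / expected over all categories. The sketch below computes it for the die and the grades data; the function name is illustrative, and the critical value still has to be read from a chi-squared table with k − 1 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 7: die tossed 120 times; Example 8: grades of 100 students.
die = chi_square_statistic([20, 22, 17, 18, 19, 24], [20] * 6)
grades = chi_square_statistic([14, 18, 32, 20, 16], [20] * 5)

print(round(die, 2), round(grades, 2))  # 1.7 10.0
```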
Test of Independence
Example 9
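A random sample of 30 adults is classified according to gender
and the number of hours they watch T.V. during a week.

                  Male   Female
Over 25 hours       5      9
Under 25 hours      9      7

Using a level of significance $\alpha$, test the hypothesis that
a person's gender and the number of hours they watch T.V.
are independent.
Solution
1- $H_0$: a person's gender and the number of hours they watch
T.V. are independent.
$H_1$: a person's gender and the number of hours they watch
T.V. are dependent.

2- Level of significance $\alpha$. Observed frequencies with expected frequencies in parentheses:

                  Male        Female      Total
Over 25 hours     5 (6.53)    9 (7.47)    14
Under 25 hours    9 (7.47)    7 (8.53)    16
Total             14          16          N=30

3- Computation, with $r = 2$ the number of rows and $c = 2$ the number of columns

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \approx 1.27$$

where $e_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{N}$.

4- Critical region
with $\nu = (r - 1)(c - 1) = 1$ degree of freedom,

reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ (for $\alpha = 0.05$, $\chi^2_{0.05} = 3.841$)

5- Decision: since $\chi^2 = 1.27$ is below the critical value, do not reject $H_0$,
i.e. a person's gender and the number of hours they watch T.V. are
independent.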
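For a test of independence, the expected frequency of each cell is (row total × column total) / N. The sketch below computes the expected table, the chi-squared statistic and the degrees of freedom for the gender and T.V. data without rounding the expected values; the function name is illustrative.

```python
def independence_chi_square(table):
    """Return (expected, chi2, dof) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof

# Rows: over / under 25 hours; columns: male / female.
observed = [[5, 9], [9, 7]]
expected, chi2, dof = independence_chi_square(observed)
print([[round(e, 2) for e in row] for row in expected], round(chi2, 2), dof)
```

The statistic is compared with the table value for `dof` degrees of freedom; with one degree of freedom the 0.05 critical value is 3.841.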
