
BITS Pilani

Pilani Campus

[BA ZG524/MBA ZG538/PDBA ZG538] Advanced Statistical Methods
Lecture No: 2 [20-01-24]
Having a good understanding of the different data types,
also called measurement scales, is a crucial
prerequisite for doing Exploratory Data Analysis (EDA),
since you can use certain statistical measurements only
for specific data types. ...
Data Types:
• Nominal Data
• Ordinal Data
• Discrete Data
• Continuous Data

BITS Pilani, Pilani Campus


Data Sources

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


Data Insight:



A drug is used to maintain a steady heart rate in patients who have suffered a mild heart attack. Let X denote the number of heartbeats per minute obtained per patient. Consider the hypothetical density given in the table.

Suppose that we wish to compare a new drug to the one above, and let Y denote the number of heartbeats per minute obtained with the new drug. The hypothetical density table is given below.

We will focus on:
• Brief review of the Normal distribution
• Sampling distribution
• Estimation, confidence intervals
• Hypothesis testing
• p-value and its significance
• Refer: Chapters 6, 7, 8, 9

BA ZG524 Advanced Statistical Methods, BITS-Pilani


Karl Friedrich Gauss [1777-1855]
“God loves the average-looking; that is why he makes so many of them.”
“Two extremes are always rare, while middlings are common.”

The Normal Distribution - Families

The standard normal distribution is a normal distribution with a mean of 0
and a standard deviation of 1.
It is also called the z distribution.
A z-value is the distance between a selected value, designated X, and the
population mean µ, divided by the population standard deviation, σ.
The formula is:
$$z = \frac{X - \mu}{\sigma}$$



Given that z is a standard normal random variable, compute the following probabilities:
a. P(z ≤ 1.5)
b. P(z ≤ 1)
c. P(1 ≤ z ≤ 1.5)
d. P(0 ≤ z ≤ 2.5)
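The four probabilities can be checked numerically instead of reading them off a z table. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` as the stand-in for the table (the variable names are illustrative, not from the slides):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_a = z.cdf(1.5)               # a. P(z <= 1.5)
p_b = z.cdf(1.0)               # b. P(z <= 1)
p_c = z.cdf(1.5) - z.cdf(1.0)  # c. P(1 <= z <= 1.5)
p_d = z.cdf(2.5) - z.cdf(0.0)  # d. P(0 <= z <= 2.5)

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}. {p:.4f}")
```

Interval probabilities such as (c) and (d) are differences of two cumulative probabilities, which is the same subtraction one performs with the printed table.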



BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA ZG538] Advanced Statistical Methods
Lecture No: 3 [27-01-24]
• The problem with collecting data is that you generally do not
know what distribution the data follow. You have a sample, but
no distribution to help interpret it. The true distribution is
generally not knowable.
• General assumption: the data follow the omnipresent and
omnipotent Normal distribution.
• The normal distribution is important because it makes
statistics a lot easier and more feasible.
• For a person from a non-statistical background, the most
confusing aspect of statistics is always the fundamental
statistical tests, and when to use which.

• Outliers:
• A person, thing, or fact that is very different from other
people, things, or facts.

“It is a good idea to check for outliers before making
decisions based on data analysis. Errors are often made
in recording data and entering data into the computer.
Outliers should not necessarily be deleted, but their
accuracy and appropriateness should be verified.”



Observation!
Some financial data representing jobs and productivity for the 16 largest
publishing firms appeared in an article in Forbes magazine on April 30, 1990.
The data for the pair of variables
X1 = employees (jobs)
X2 = profits per employee (productivity)
are graphed in the figure below:



• Some important techniques to detect outliers, to name a few:
• Box plots
• Numeric outlier via the interquartile range (IQR)
• Z-score using the empirical rule
• Residual analysis in a linear regression model
• Mahalanobis distance
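As an illustration of the interquartile-range technique in the list above, here is a minimal sketch. The 1.5 × IQR fences are the usual box-plot convention and are an assumption here, as are the helper name `iqr_outliers` and the made-up data; none of them come from the slides.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical data, made up for illustration only.
data = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(data))
```

Different software packages compute quartiles slightly differently, so points close to a fence may be flagged by one tool and not by another.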

Empirical rule
For data having a bell-shaped distribution:

1. The standard Z score can be used to identify outliers.
2. The empirical rule allows us to conclude that, for data
with a bell-shaped distribution, almost all the data lie
within 3 SD of the mean.
3. An observation whose Z score is less than -3 or greater
than 3 is treated as an outlier.
4. Z score: $z_i = \frac{x_i - \bar{x}}{s}$



Q: Many families in California are using backyard structures for home
offices, art studios and hobby areas as well as for additional storage.
Suppose that the mean price for a customized wooden, shingled backyard
structure is $3100. Assume that the standard deviation is $1200.
a. What is the z-score for a backyard structure costing $2300?
b. What is the z-score for a backyard structure costing $4900?
c. Interpret the z-scores in parts (a) and (b). Comment on whether
either should be considered an outlier.
d. If the cost for a backyard shed-office in California is $13,000,
should this structure be considered an outlier? Explain.
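The z-scores in this exercise follow directly from the mean and standard deviation stated in the question. A minimal sketch, assuming the |z| > 3 cut-off from the empirical rule; the helper name `z_score` is illustrative:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

MEAN, SD = 3100, 1200  # values given in the exercise

for cost in (2300, 4900, 13000):
    z = z_score(cost, MEAN, SD)
    verdict = "outlier" if abs(z) > 3 else "not an outlier"
    print(f"${cost}: z = {z:.2f} -> {verdict}")
```

Only the $13,000 structure lies more than three standard deviations from the mean, so it is the only one flagged under this rule.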



Sample mean(x)=29.1
Sample sd(x) = 16.60
Sample mean(y) =17.2
Sample sd(y)=9.28

• Is the outlier due to a measurement error or an incorrect entry?
Then it is noise and should be dropped (or corrected, if you know the
real value of the data).
• Does the outlier leave the results unchanged but affect the
assumptions? In this case, you may or may not drop the outlier.
• Does the outlier affect both the statistical results and the
assumptions? In this case, we cannot merely drop the outlier.

• To summarize our discussion:
• Why do you want to find the outlier? You might want to see the
outlier because you are interested in the abnormality. Think about
what your question is.
• Is the outlier “actually” causing any problems with the result,
influence, or assumptions?
• Where did the outlier come from? This might take in-depth analysis
and domain expertise.
• Moreover, you can’t always tell where it came from, but try to
consider different possibilities because it can help inform the best
way to proceed.

[6/288/17]
The mean cost of domestic airfares in the USA rose to an all-time high
of $385 per ticket. Airfares were based on the total ticket value, which
consisted of the price charged by the airlines plus any additional taxes
and fees. Assume domestic airfares are normally distributed with a
standard deviation of $110.
a. What is the probability that a domestic airfare is $550 or more?
b. What is the probability that a domestic airfare is $250 or less?
c. What is the probability that a domestic airfare is between $300
and $550?
d. What is the cost for the 3% highest domestic airfares?
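Each part of this exercise is a normal-distribution calculation with mean 385 and standard deviation 110. A minimal sketch using the standard-library `statistics.NormalDist` (variable names are illustrative):

```python
from statistics import NormalDist

fare = NormalDist(mu=385, sigma=110)  # parameters given in the exercise

p_a = 1 - fare.cdf(550)              # a. P(X >= 550)
p_b = fare.cdf(250)                  # b. P(X <= 250)
p_c = fare.cdf(550) - fare.cdf(300)  # c. P(300 <= X <= 550)
cutoff = fare.inv_cdf(0.97)          # d. fare exceeded by only the top 3%

print(f"a. {p_a:.4f}  b. {p_b:.4f}  c. {p_c:.4f}  d. ${cutoff:.2f}")
```

Part (d) runs the calculation in reverse: instead of converting a fare to a probability, it converts the cumulative probability 0.97 back to a fare.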




The process of drawing inferences about a population on the basis of
information contained in a sample taken from the population is called
statistical inference.
Statistical inference is divided into two major areas:
• Estimation of parameters
• Testing of hypotheses
[8/355/2]
A simple random sample of 50 items from a population with σ = 6
resulted in a sample mean of 32.
b. Provide a 90% confidence interval for the population mean.

c. Provide a 99% confidence interval for the population mean.
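With σ known, the interval is the sample mean plus or minus z(α/2) · σ/√n. A minimal sketch of the 90% and 99% intervals for this exercise; the helper name `z_interval` is illustrative:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin

ci_90 = z_interval(32, 6, 50, 0.90)
ci_99 = z_interval(32, 6, 50, 0.99)
print(f"90%: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"99%: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```

The 99% interval is wider than the 90% interval: more confidence is bought with less precision when the sample size is fixed.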

Population Mean: σ known (or) unknown


[8/363/13]
The following sample data are from a normal population:
10, 8, 12, 15, 13, 11, 6, 5
a. What is the point estimate of the population mean?
b. What is the point estimate of the population standard deviation?
c. With 95% confidence, what is the margin of error for the estimation
of the population mean?
d. What is the 95% confidence interval for the population mean?
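Because σ is unknown and the sample is small, this exercise calls for the t distribution with n − 1 = 7 degrees of freedom. A minimal sketch; the critical value 2.365 is taken from a standard t table rather than computed, and is hard-coded as an assumption:

```python
from math import sqrt
from statistics import mean, stdev

data = [10, 8, 12, 15, 13, 11, 6, 5]
n = len(data)

xbar = mean(data)  # a. point estimate of the population mean
s = stdev(data)    # b. point estimate of the population standard deviation

T_CRIT = 2.365     # t_{0.025} with 7 degrees of freedom (t table value)
margin = T_CRIT * s / sqrt(n)        # c. margin of error
ci = (xbar - margin, xbar + margin)  # d. 95% confidence interval

print(f"mean = {xbar}, s = {s:.4f}, margin = {margin:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```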



• Situation 1:
• “Will teach basic hypothesis testing for just $5”
• Situation 2: Criminal trial example:



• Situation 3:
• Evaluating a question in the Comprehensive Examination.

Hypothesis testing is the process used to evaluate the
strength of evidence from the sample and provides a
framework for making determinations related to the
population, i.e., it provides a method for understanding how
reliably one can extrapolate observed findings in a sample
under study to the larger population from which the sample
was drawn.

• Hypothesis testing
• Generally, in hypothesis testing we begin by making a tentative
assumption about a population parameter.
• This assumption is called the null hypothesis (H0).
• Then we define another hypothesis, called the alternative hypothesis
(Ha), which is the opposite of what is stated in the null hypothesis.
• The testing procedure uses data from a sample to test the two
competing statements.
• In some situations it is easier to identify the alternative hypothesis
first and then develop the null hypothesis. In others it is the other
way around.
• Many applications of hypothesis testing involve an attempt to gather
evidence in support of a research hypothesis. In these situations, it is
often best to begin with the alternative hypothesis and make it the
conclusion that the researcher hopes to support.

Guidelines:
1. When testing a hypothesis concerning the value of some parameter θ,
the statement of equality will always be included in H0 (the null
hypothesis). In this way H0 pinpoints a specific numerical value that
could be the actual value of θ. This value is called the null value and
is denoted by θ0.
2. Whatever is to be detected or supported is the alternative
hypothesis (Ha).
3. Since our research hypothesis is Ha, it is hoped that the evidence
leads us to reject H0 and thereby to accept Ha.

Tests concerning the population mean:
Z-test and t-test [large- and small-sample tests]
Tests concerning variances:
Chi-square distribution and F-distribution
We will discuss the test concerning the population mean [Z-test].
The rest follow along similar lines.

[Construct Logical Statements]
1. The maximum acceptable level for exposure to microwave radiation in
the United States is an average of 10 microwatts per square centimeter.
It is feared that a large television transmitter may be polluting the
air nearby, pushing the level of microwave radiation above the safe
limit.
2. Design engineers are working on a low-effort steering system that can
be used in vans modified to fit the needs of disabled drivers. The
old-type steering system required a force of 54 ounces to turn the vans’
15-inch diameter steering wheel. It is hoped that the new design will
reduce the average force required to turn the wheel.
3. A computer system currently has 10 terminals and uses a single
printer. The average turnaround time for the system is 15 minutes. Ten
new terminals and a second printer are added to the system. We want to
determine whether or not the mean turnaround time is affected.
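One way to write the hypotheses for the three scenarios above, following the guideline that the equality goes in H0 and the statement to be detected goes in Ha. This is a sketch of one reasonable formulation, with μ denoting the relevant population mean in each case; it is not taken from the slides.

```latex
\begin{align*}
\text{Microwave radiation (above the safe limit?):} \quad
  & H_0\colon \mu \le 10 && H_a\colon \mu > 10 \\
\text{Steering force (reduced?):} \quad
  & H_0\colon \mu \ge 54 && H_a\colon \mu < 54 \\
\text{Turnaround time (affected?):} \quad
  & H_0\colon \mu = 15 && H_a\colon \mu \ne 15
\end{align*}
```

The first two lead to one-tailed tests (right and left tail respectively) and the third to a two-tailed test.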


BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA ZG538] Advanced Statistical Methods
Lecture No: 4 [03-01-24]
The process of drawing inferences about a population on the basis of
information contained in a sample taken from the population is called
statistical inference.
Statistical inference is divided into two major areas:
• Estimation of parameters
• Testing of hypotheses
Sampling distribution of Mean
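A small simulation can make the sampling distribution of the mean concrete: draw many samples of size n, record each sample mean, and compare the spread of those means with the theoretical standard error σ/√n. A minimal sketch; the population values (μ = 32, σ = 6), the sample size (n = 50), the number of repetitions and the random seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population and sampling setup, for illustration only.
MU, SIGMA, N, REPS = 32, 6, 50, 5000

# One sample mean per repetition.
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

empirical_se = stdev(sample_means)  # spread of the simulated sample means
theoretical_se = SIGMA / sqrt(N)    # standard error of the mean

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"empirical SE: {empirical_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

The sample means cluster around μ, and their standard deviation should be close to σ/√n, which is the quantity used in the confidence-interval and z-test formulas.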

A simple random sample of 50 items from a population with σ = 6
resulted in a sample mean of 32.
b. Provide a 90% confidence interval for the population mean.

c. Provide a 99% confidence interval for the population mean.

Population Mean: σ known (or) unknown


[8/363/13]
The following sample data are from a normal population:
10, 8, 12, 15, 13, 11, 6, 5
a. What is the point estimate of the population mean?
b. What is the point estimate of the population standard deviation?
c. With 95% confidence, what is the margin of error for the estimation
of the population mean?
d. What is the 95% confidence interval for the population mean?



• Situation 1:
• Situation 2: Criminal trial example:



• Situation 3:
Hypothesis testing is the process used to evaluate the
strength of evidence from the sample and provides a
framework for making determinations related to the
population, i.e., it provides a method for understanding how
reliably one can extrapolate observed findings in a sample
under study to the larger population from which the sample
was drawn.

• Hypothesis testing
• Generally, in hypothesis testing we begin by making a tentative
assumption about a population parameter.
• This assumption is called the null hypothesis (H0).
• Then we define another hypothesis, called the alternative hypothesis
(Ha), which is the opposite of what is stated in the null hypothesis.
• The testing procedure uses data from a sample to test the two
competing statements.
• In some situations it is easier to identify the alternative hypothesis
first and then develop the null hypothesis. In others it is the other
way around.
• Many applications of hypothesis testing involve an attempt to gather
evidence in support of a research hypothesis. In these situations, it is
often best to begin with the alternative hypothesis and make it the
conclusion that the researcher hopes to support.

Guidelines:
1. When testing a hypothesis concerning the value of some parameter θ,
the statement of equality will always be included in H0 (the null
hypothesis). In this way H0 pinpoints a specific numerical value that
could be the actual value of θ. This value is called the null value and
is denoted by θ0.
2. Whatever is to be detected or supported is the alternative
hypothesis (Ha).
3. Since our research hypothesis is Ha, it is hoped that the evidence
leads us to reject H0 and thereby to accept Ha.
Tests concerning the population mean:
Z-test and t-test [large- and small-sample tests]
Tests concerning variances:
Chi-square distribution and F-distribution
We will discuss the test concerning the population mean [Z-test].
The rest follow along similar lines.

[Construct Logical Statements]
1. The maximum acceptable level for exposure to microwave radiation in
the United States is an average of 10 microwatts per square centimeter.
It is feared that a large television transmitter may be polluting the
air nearby, pushing the level of microwave radiation above the safe
limit.
2. Design engineers are working on a low-effort steering system that can
be used in vans modified to fit the needs of disabled drivers. The
old-type steering system required a force of 54 ounces to turn the vans’
15-inch diameter steering wheel. It is hoped that the new design will
reduce the average force required to turn the wheel.
3. A computer system currently has 10 terminals and uses a single
printer. The average turnaround time for the system is 15 minutes. Ten
new terminals and a second printer are added to the system. We want to
determine whether or not the mean turnaround time is affected.
Q) Suppose, for instance, that we want to test, on the basis of n = 35
determinations and at the 0.05 level of significance, whether the
thermal conductivity of a certain kind of cement brick is 0.340, as has
been claimed. From information gathered in similar studies, the
variability of such determinations is given by σ = 0.010. Suppose the
mean of the 35 determinations is 0.343.
1) Set up the appropriate null and alternative hypotheses.
2) Test the above hypothesis using (a) the hypothesis testing method.

• Hypothesis testing:
1. Set up the null and alternative hypotheses.
2. The nature of Ha decides whether a one-tailed (right or left tail)
test or a two-tailed test is to be applied.
3. Choose an appropriate level of significance (α) if it is not given in
the problem. (Usually we choose the 5% level of significance.)
4. Choose an appropriate test statistic and compute its value.
5. Compare the computed value with the critical value at the given level
of significance, and make the decision accordingly.

• Hypothesis testing method (or) rejection region method:
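The rejection-region approach for the cement-brick problem can be sketched as follows. It assumes a two-tailed test (H0: μ = 0.340 against Ha: μ ≠ 0.340), which matches the wording "whether the thermal conductivity is 0.340"; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the problem statement.
mu0, sigma, n, xbar, alpha = 0.340, 0.010, 35, 0.343, 0.05

# Test statistic: how many standard errors the sample mean is from mu0.
z = (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed critical value z_{alpha/2}; reject H0 when |z| exceeds it.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = ±{z_crit:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The computed statistic falls inside the interval (−1.96, 1.96), so at the 0.05 level the data do not contradict the claimed value of 0.340.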

Problem:
Test the above hypothesis (thermal conductivity of cement brick:
claimed value 0.340, n = 35, σ = 0.010, sample mean 0.343, 0.05 level of
significance) using the p-value method.
• Significance testing (p-value method):
1. If P ≤ α, we reject H0 at the stated level of significance.
2. If P > α, we fail to reject H0.
3. P-value: the probability, computed assuming H0 is true, of a test
statistic at least as extreme as the observed value z.
Upper-tailed test: p-value = P(Z ≥ z)
Lower-tailed test (negative test statistic): p-value = P(Z ≤ z)
Two-tailed test: p-value = 2P(Z ≥ |z|)
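The p-value for the cement-brick problem can be computed the same way. A minimal sketch, again assuming the two-tailed alternative Ha: μ ≠ 0.340, so the tail area is doubled; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the problem statement.
mu0, sigma, n, xbar, alpha = 0.340, 0.010, 35, 0.343, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Since the p-value is larger than α = 0.05, the decision agrees with the rejection-region method: H0 is not rejected.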

Problem

• Glycerol is a major by-product of ethanol fermentation in wine production and contributes to
the sweetness, body, and fullness of wines.

• The article “A Rapid and Simple Method for Simultaneous Determination of Glycerol,
Fructose, and Glucose in Wine” (American J. of Enology and Viticulture, 2007: 279–283)
includes the following observations on glycerol concentration (mg/mL) for samples of
standard-quality (uncertified) white wines: 2.67, 4.62, 4.14, 3.81, 3.83. Suppose the desired
concentration value is 4.

• Does the sample data suggest that the true average concentration is something other than the
desired value?
• Let’s carry out a test of appropriate hypotheses using the one-sample t test with a
significance level of .05.

The accompanying Minitab output from a request to perform a two-tailed one-sample t test shows
identical calculated values to those just obtained.

The fact that the last number on the output, the “P-value,” exceeds .05 (and any other reasonable
significance level) implies that the null hypothesis can’t be rejected.
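The one-sample t statistic for the glycerol data can be reproduced by hand. A minimal sketch of the two-tailed test of H0: μ = 4 against Ha: μ ≠ 4; the critical value 2.776 (t table, 4 degrees of freedom, two-tailed α = .05) is hard-coded as an assumption, and the Minitab output itself is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev

glycerol = [2.67, 4.62, 4.14, 3.81, 3.83]  # mg/mL, from the problem
mu0 = 4.0                                  # desired concentration
n = len(glycerol)

xbar = mean(glycerol)
s = stdev(glycerol)
t = (xbar - mu0) / (s / sqrt(n))  # one-sample t statistic, n - 1 = 4 df

T_CRIT = 2.776                    # t_{0.025} with 4 df (t table value)
reject = abs(t) > T_CRIT

print(f"mean = {xbar:.3f}, s = {s:.3f}, t = {t:.2f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The statistic is well inside the critical bounds, consistent with the slide's conclusion that the null hypothesis cannot be rejected.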



“Variance can provide important decision-making information.”
In the preceding sessions we discussed statistical
inference involving population means.
We now extend the discussion to inferences about the
population variance.

Q. It is observed that for a sample of size 9, the sample standard
deviation is 2.81.
Develop 90% and 95% confidence intervals for the population
variance (σ²).
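The interval for a population variance uses the chi-square distribution with n − 1 degrees of freedom: (n − 1)s²/χ²(upper) ≤ σ² ≤ (n − 1)s²/χ²(lower). A minimal sketch, assuming a normal population; the critical values for 8 degrees of freedom are hard-coded from a standard chi-square table, and the helper name is illustrative.

```python
n, s = 9, 2.81  # sample size and sample standard deviation from the question

# Chi-square table values for 8 degrees of freedom:
# (upper-tail critical value, lower-tail critical value) per confidence level.
CHI2_8DF = {
    0.90: (15.507, 2.733),  # chi2_{0.05}, chi2_{0.95}
    0.95: (17.535, 2.180),  # chi2_{0.025}, chi2_{0.975}
}

def variance_interval(n, s, upper_crit, lower_crit):
    """Confidence interval for sigma^2 based on the chi-square distribution."""
    sum_sq = (n - 1) * s ** 2
    return sum_sq / upper_crit, sum_sq / lower_crit

ci_90 = variance_interval(n, s, *CHI2_8DF[0.90])
ci_95 = variance_interval(n, s, *CHI2_8DF[0.95])
print(f"90%: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"95%: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
```

Unlike intervals for the mean, these intervals are not symmetric around s², because the chi-square distribution is skewed.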



Upper tail test
• Hypotheses
• Test statistic
• Rejection region (p-value approach)
• Rejection region (critical value approach)
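The entries for these rows did not survive in the text. Assuming the slide refers to the test about a population variance introduced above, the standard upper tail version can be sketched as follows, with σ0² the hypothesized variance and n − 1 degrees of freedom:

```latex
\begin{align*}
\text{Hypotheses:} \quad
  & H_0\colon \sigma^2 \le \sigma_0^2, \qquad H_a\colon \sigma^2 > \sigma_0^2 \\
\text{Test statistic:} \quad
  & \chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2} \\
\text{Rejection rule (p-value approach):} \quad
  & \text{reject } H_0 \text{ if } p\text{-value} \le \alpha \\
\text{Rejection rule (critical value approach):} \quad
  & \text{reject } H_0 \text{ if } \chi^2 \ge \chi^2_{\alpha}
\end{align*}
```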

A few applications:

• 12.2 Chi-square test for independence
• 12.3 Chi-square test for goodness of fit



Consider the example:
A beer industry association conducts a survey to determine the
preferences of beer drinkers for light, regular, and dark beers.
A sample of 200 beer drinkers is taken, with each person in the sample
asked to indicate a preference for one of the three types of beer:
light, regular, or dark.
At the end of the survey questionnaire, the respondent is asked to
provide information on a variety of demographics, including gender:
male or female.
A research question of interest to the association is whether
preference for the three types of beer is independent of the gender of
the beer drinker.
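The test of independence compares observed cell counts with the counts expected if preference and gender were unrelated (row total × column total / grand total). A minimal sketch; the survey counts are not given in the text, so the table below is hypothetical, chosen only so that it sums to 200. The critical value 5.991 is the chi-square table value for 2 degrees of freedom at α = 0.05.

```python
# Hypothetical counts (light, regular, dark), for illustration only.
observed = {
    "male":   [51, 56, 25],
    "female": [39, 21, 8],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
total = sum(row_totals)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for count, col_total in zip(row, col_totals):
        expected = row_total * col_total / total
        chi2 += (count - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)  # (rows - 1)(columns - 1)
CHI2_CRIT = 5.991                             # chi2_{0.05} with 2 df (table value)
reject = chi2 > CHI2_CRIT

print(f"chi-square = {chi2:.3f}, df = {df}")
print("Reject independence" if reject else "Fail to reject independence")
```

With these illustrative counts the statistic exceeds the critical value, which would suggest that preference depends on gender; real survey data could of course lead to a different conclusion.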

Chi-square test for Goodness of fit

Another application of the chi-square test is to determine whether a
population being sampled has a specific distribution.
Look at the example below:
Chemline hires approximately 400 new employees annually for its four
plants located throughout the United States. The personnel director asks
whether a normal distribution applies for the population of test scores.
If such a distribution can be used, the distribution would be helpful in
evaluating specific test scores; that is, scores in the upper 20%, lower
40%, and so on could be identified quickly.
Hence we want to test the null hypothesis that the population of test
scores has a normal distribution.

Source: Chemline Employee Aptitude Test Scores for 50 Randomly Chosen
Job Applicants:
71 66 61 65 54 93 60 86
70 70 73 73 55 63 56 62
76 54 82 79 76 68 53 58
85 80 56 61 61 64 65 62
90 69 76 79 77 54 64 74
65 65 61 56 63 80 56 71
79 84
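A goodness-of-fit test for normality can be sketched on the 50 scores above. The choices below are assumptions, not taken from the slides: 10 equal-probability classes (expected count 5 each), class boundaries from a normal distribution fitted with the sample mean and standard deviation, α = 0.05, and the table critical value 14.067 for 10 − 2 − 1 = 7 degrees of freedom (two parameters are estimated from the data).

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev

scores = [
    71, 66, 61, 65, 54, 93, 60, 86,
    70, 70, 73, 73, 55, 63, 56, 62,
    76, 54, 82, 79, 76, 68, 53, 58,
    85, 80, 56, 61, 61, 64, 65, 62,
    90, 69, 76, 79, 77, 54, 64, 74,
    65, 65, 61, 56, 63, 80, 56, 71,
    79, 84,
]

k = 10  # number of equal-probability classes (assumed)
fitted = NormalDist(mean(scores), stdev(scores))

# Class boundaries: the 10%, 20%, ..., 90% points of the fitted normal.
cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]

# Observed frequency in each class.
observed = [0] * k
for x in scores:
    observed[bisect_right(cuts, x)] += 1

expected = len(scores) / k  # 5 per class under H0
chi2 = sum((o - expected) ** 2 / expected for o in observed)

df = k - 2 - 1        # classes minus estimated parameters minus 1
CHI2_CRIT = 14.067    # chi2_{0.05} with 7 df (table value)
reject = chi2 > CHI2_CRIT

print("observed:", observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject normality" if reject else "Fail to reject normality")
```

Because several scores sit very close to class boundaries, a hand calculation with z values rounded to two decimals can assign a few scores to neighbouring classes and give a slightly different statistic; the decision is unlikely to change.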

BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA ZG538] Advanced Statistical Methods
Lecture No: 5 [10-02-24]
• Hypothesis testing
• Generally, in hypothesis testing we begin by making a tentative
assumption about a population parameter.
• This assumption is called the null hypothesis (H0).
• Then we define another hypothesis, called the alternative hypothesis
(Ha), which is the opposite of what is stated in the null hypothesis.
• The testing procedure uses data from a sample to test the two
competing statements.
• In some situations it is easier to identify the alternative hypothesis
first and then develop the null hypothesis. In others it is the other
way around.
• Many applications of hypothesis testing involve an attempt to gather
evidence in support of a research hypothesis. In these situations, it is
often best to begin with the alternative hypothesis and make it the
conclusion that the researcher hopes to support.

Guidelines:
1. When testing a hypothesis concerning the value of some parameter θ,
the statement of equality will always be included in H0 (the null
hypothesis). In this way H0 pinpoints a specific numerical value that
could be the actual value of θ. This value is called the null value and
is denoted by θ0.
2. Whatever is to be detected or supported is the alternative
hypothesis (Ha).
3. Since our research hypothesis is Ha, it is hoped that the evidence
leads us to reject H0 and thereby to accept Ha.
[Construct Logical Statements]
1. The maximum acceptable level for exposure to microwave radiation in
the United States is an average of 10 microwatts per square centimeter.
It is feared that a large television transmitter may be polluting the
air nearby, pushing the level of microwave radiation above the safe
limit.
2. Design engineers are working on a low-effort steering system that can
be used in vans modified to fit the needs of disabled drivers. The
old-type steering system required a force of 54 ounces to turn the vans’
15-inch diameter steering wheel. It is hoped that the new design will
reduce the average force required to turn the wheel.
3. A computer system currently has 10 terminals and uses a single
printer. The average turnaround time for the system is 15 minutes. Ten
new terminals and a second printer are added to the system. We want to
determine whether or not the mean turnaround time is affected.
Q) Suppose, for instance, that we want to test, on the basis of n = 35
determinations and at the 0.05 level of significance, whether the true
average thermal conductivity of a certain kind of cement brick is 0.340,
as has been claimed. From information gathered in similar studies, the
variability of such determinations is given by σ = 0.010. Suppose the
mean of the 35 determinations is 0.343.
1) Set up the appropriate null and alternative hypotheses.
2) Test the above hypothesis using (a) the hypothesis testing method.

• Hypothesis testing:
1. Set up the null and alternative hypotheses.
2. The nature of Ha decides whether a one-tailed (right or left tail)
test or a two-tailed test is to be applied.
3. Choose an appropriate level of significance (α) if it is not given in
the problem. (Usually we choose the 5% level of significance.)
4. Choose an appropriate test statistic and compute its value.
5. Compare the computed value with the critical value at the given level
of significance, and make the decision accordingly.

• Hypothesis testing method (or) rejection region method:


Problem:
Q) Suppose, for instance, that we want to test on the basis of
n = 35 determinations and at the 0.05 level of significance
whether the true average thermal conductivity of a certain
kind of cement brick is 0.340, as has been claimed. From
information gathered in similar studies, the variability of
such determinations is given by σ = 0.010. Suppose the
mean of the 35 determinations is 0.343.
Test the above hypothesis using the p-value method.
• Significance testing (p-value method):
1. If P ≤ α, we reject H0 at the stated level of significance.
2. If P > α, we fail to reject H0.
3. P-value: the probability, computed assuming H0 is true, of a test
statistic at least as extreme as the observed value z, in the
direction(s) specified by Ha:
p-value = P(Z ≥ z)      [right-tailed test]
p-value = P(Z ≤ z)      [left-tailed test; z is typically negative]
p-value = 2P(Z ≥ |z|)   [two-sided test]
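The p-value rules can be sketched for the same problem as follows. This is an illustration only, assuming a two-sided alternative; the helper `z_p_value` and its `tail` argument are hypothetical names introduced here.

```python
from math import sqrt
from statistics import NormalDist


def z_p_value(z, tail="two-sided"):
    """p-value of an observed z statistic for the given alternative."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)          # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)              # area to the left of z
    return 2 * (1 - std_normal.cdf(abs(z)))   # both tails beyond |z|


alpha = 0.05
z = (0.343 - 0.340) / (0.010 / sqrt(35))      # observed test statistic
p = z_p_value(z)                              # two-sided p-value
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p:.4f}: {decision}")
```

Both approaches lead to the same decision, since P ≤ α exactly when the statistic falls in the rejection region.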

Problem

• Glycerol is a major by-product of ethanol fermentation in wine production and contributes to
the sweetness, body, and fullness of wines.

• The article “A Rapid and Simple Method for Simultaneous Determination of Glycerol,
Fructose, and Glucose in Wine” (American J. of Enology and Viticulture, 2007: 279–283)
includes the following observations on glycerol concentration (mg/mL) for samples of
standard-quality (uncertified) white wines: 2.67, 4.62, 4.14, 3.81, 3.83. Suppose the desired
concentration value is 4.

• Does the sample data suggest that the true average concentration is something other than the
desired value?
• Let’s carry out a test of appropriate hypotheses using the one-sample t test with a
significance level of .05.

The accompanying Minitab output from a request to perform a two-tailed one-sample t test
shows identical calculated values to those just obtained.

The fact that the last number on the output, the “P-value,” exceeds .05 (and any other
reasonable significance level) implies that the null hypothesis can’t be rejected.
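The calculation behind this conclusion can be sketched as follows. The critical value 2.776 is the standard t-table entry for 4 degrees of freedom and a two-tailed α of 0.05; it is hard-coded here as an assumption rather than computed, to keep the sketch within the standard library.

```python
from math import sqrt
from statistics import mean, stdev

glycerol = [2.67, 4.62, 4.14, 3.81, 3.83]   # mg/mL, from the problem statement
mu0 = 4.0                                   # desired concentration under H0

n = len(glycerol)
x_bar = mean(glycerol)
s = stdev(glycerol)                         # sample standard deviation (n - 1)
t = (x_bar - mu0) / (s / sqrt(n))           # one-sample t statistic

T_CRIT = 2.776                              # t table: df = 4, two-tailed alpha = 0.05
reject = abs(t) >= T_CRIT
print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```

Because |t| is far below the critical value, the data do not contradict a true average concentration of 4.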

BITS Pilani, Pilani Campus


Consider the situation:
A drug is used to maintain a steady heart rate in
patients who have suffered a mild heart
attack. Let X denote the number of
heartbeats per minute obtained per patient.
Consider the hypothetical density given in the
table.

Suppose that we wish to compare a new drug
with the one above, where X is the number of
heartbeats per minute obtained using the old
drug and Y is the number per minute obtained
with the new drug. The hypothetical density
table for Y is given below.

BITS Pilani, Pilani Campus


• “Variance can provide important decision-making information.”
• In the preceding sessions we discussed
statistical inference involving population
means.
• We now extend the discussion to
inferences about a population variance.
• We will use the chi-square distribution.

Q. It is observed that for a sample of
size 9, the sample standard deviation
is 2.81.
Develop 90% and 95% confidence intervals for the
population variance (σ²).
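A sketch of the interval calculation follows, using the standard result that (n − 1)s²/σ² follows a chi-square distribution with n − 1 degrees of freedom when the population is normal. The chi-square values are standard table entries for 8 degrees of freedom, hard-coded as an assumption; the function name `variance_ci` is hypothetical.

```python
# Chi-square table values for 8 degrees of freedom, keyed by confidence level:
# (upper-tail alpha/2 value, upper-tail 1 - alpha/2 value).
CHI2_DF8 = {
    0.90: (15.507, 2.733),   # chi2_0.05, chi2_0.95
    0.95: (17.535, 2.180),   # chi2_0.025, chi2_0.975
}


def variance_ci(n, s, confidence):
    """Confidence interval for sigma^2 (only n = 9 is tabulated here)."""
    if n != 9:
        raise ValueError("table values are hard-coded for n = 9 (df = 8)")
    chi2_upper, chi2_lower = CHI2_DF8[confidence]
    ss = (n - 1) * s ** 2            # (n - 1) * s^2
    return ss / chi2_upper, ss / chi2_lower


for level in (0.90, 0.95):
    low, high = variance_ci(9, 2.81, level)
    print(f"{level:.0%} CI for the variance: ({low:.2f}, {high:.2f})")
```

Taking square roots of the end points gives the corresponding interval for the standard deviation σ.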

BITS Pilani, Pilani Campus


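Upper tail test (for a population variance)

Hypotheses:                               H0: σ² ≤ σ0²   versus   Ha: σ² > σ0²
Test statistic:                           χ² = (n − 1)s² / σ0²
Rejection rule (p-value approach):        reject H0 if p-value ≤ α
Rejection rule (critical value approach): reject H0 if χ² ≥ χ²α, with n − 1 degrees of freedom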

BITS Pilani, Pilani Campus


Few applications:

• 12.2 Chi-square test for independence


• 12.3 Chi-square test for goodness of fit.

BITS Pilani, Pilani Campus


Consider the example:
A beer industry association conducts a
survey to determine the preferences of
beer drinkers for light, regular, and dark
beers.
A sample of 200 beer drinkers is taken, with
each person in the sample asked to
indicate a preference for one of the three
types of beer: light, regular, or dark.
At the end of the survey questionnaire, the
respondent is asked to provide information
on a variety of demographics, including
gender: male or female.
A research question of interest to the
association is whether preference for the
three types of beer is independent of the
gender of the beer drinker.
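The test of independence for this kind of survey can be sketched as follows. The cell counts below are hypothetical, chosen only so that they total 200 respondents; they are not taken from the lecture. Expected counts are (row total × column total) / n, and the statistic has (rows − 1)(columns − 1) degrees of freedom.

```python
from math import exp

# HYPOTHETICAL counts for illustration; columns are light, regular, dark.
observed = {
    "male":   [51, 56, 25],
    "female": [39, 21, 8],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(rows):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n   # count expected under independence
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)
# For df = 2 only, the chi-square upper-tail area has the closed form exp(-x/2).
p_value = exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With these illustrative counts the p-value is below 0.05, so independence of beer preference and gender would be rejected; real survey counts may of course lead to a different conclusion.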

Goodness of fit

Chemline hires approximately 400 new employees annually
for its four plants located throughout the United States.
The personnel director asks whether a normal
distribution applies for the population of test scores. If
such a distribution can be used, the distribution would be
helpful in evaluating specific test scores; that is, scores
in the upper 20%, lower 40%, and so on could be
identified quickly.
Hence we want to test the null hypothesis that the
population of test scores has a normal distribution.

BITS Pilani, Pilani Campus


Source: Chemline Employee Aptitude Test Scores for 50
Randomly Chosen Job Applicants:
71 66 61 65 54 93 60 86 70 70 73 73 55 63 56 62
76 54 82 79 76 68 53 58 85 80 56 61 61 64 65
62 90 69 76 79 77 54 64 74 65 65 61 56 63 80
56 71 79 84
First step: Develop estimates of the mean and standard
deviation of the normal distribution.
Sample mean: 68.42
Sample standard deviation: 10.41
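The two estimates can be checked directly from the 50 scores. This is a small sketch using only the standard library; `stdev` uses the n − 1 divisor, which is the usual sample standard deviation.

```python
from statistics import mean, stdev

scores = [
    71, 66, 61, 65, 54, 93, 60, 86, 70, 70, 73, 73, 55, 63, 56, 62,
    76, 54, 82, 79, 76, 68, 53, 58, 85, 80, 56, 61, 61, 64, 65,
    62, 90, 69, 76, 79, 77, 54, 64, 74, 65, 65, 61, 56, 63, 80,
    56, 71, 79, 84,
]

x_bar = mean(scores)    # estimate of the normal mean
s = stdev(scores)       # estimate of the normal standard deviation
print(f"n = {len(scores)}, sample mean = {x_bar:.2f}, sample sd = {s:.2f}")
```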

Percentage    z        Test score
10%          -1.28     68.42 - 1.28(10.41) = 55.10
20%          -0.84     59.68
30%          -0.52     63.01
40%          -0.25     65.82
50%           0.00     68.42
60%           0.25     71.02
70%           0.52     73.83
80%           0.84     77.16
90%           1.28     81.74
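The cut points in the table come from test score = mean + z × (standard deviation). A minimal sketch, assuming the rounded estimates 68.42 and 10.41 and the two-decimal z values used on the slide:

```python
MEAN, SD = 68.42, 10.41   # sample estimates used on the slide

# z values (rounded to two decimals) cutting off the lower 10%, 20%, ..., 90%.
Z_TABLE = {10: -1.28, 20: -0.84, 30: -0.52, 40: -0.25, 50: 0.00,
           60: 0.25, 70: 0.52, 80: 0.84, 90: 1.28}

# Upper limit of each class: ten classes of equal probability 0.10.
cut_points = {pct: MEAN + z * SD for pct, z in Z_TABLE.items()}

for pct, score in cut_points.items():
    print(f"{pct:>3}%  z = {Z_TABLE[pct]:+.2f}  test score = {score:.2f}")
```

Using equal-probability classes makes every expected frequency the same (50 × 0.10 = 5), which satisfies the usual requirement of at least 5 expected observations per class.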

Test score interval   Observed frequency   Expected frequency   (O - E)²/E
Less than 55.10               5                    5               0.0
55.10 to 59.68                5                    5               0.0
59.68 to 63.01                9                    5               3.2
63.01 to 65.82                6                    5               0.2
65.82 to 68.42                2                    5               1.8
68.42 to 71.02                5                    5               0.0
71.02 to 73.83                2                    5               1.8
73.83 to 77.16                5                    5               0.0
77.16 to 81.74                5                    5               0.0
81.74 and over                6                    5               0.2
Total                        50                   50               7.2   (chi-square test statistic)

BITS Pilani, Pilani Campus


Note: degrees of freedom = k − p − 1
where
p = number of parameters of the distribution
estimated from the sample = 2
k = number of categories = 10
Degrees of freedom = 10 − 2 − 1 = 7
P-value > 0.10, so we fail to reject H0,
i.e. the data give no evidence against the assumption that the test scores come from a normal distribution.
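The whole goodness-of-fit calculation can be sketched as follows. The class limits are the cut points from the table above; the value 12.017 is the standard chi-square table entry for 7 degrees of freedom and an upper-tail area of 0.10, hard-coded here as an assumption.

```python
from bisect import bisect_right

scores = [
    71, 66, 61, 65, 54, 93, 60, 86, 70, 70, 73, 73, 55, 63, 56, 62,
    76, 54, 82, 79, 76, 68, 53, 58, 85, 80, 56, 61, 61, 64, 65,
    62, 90, 69, 76, 79, 77, 54, 64, 74, 65, 65, 61, 56, 63, 80,
    56, 71, 79, 84,
]
cut_points = [55.10, 59.68, 63.01, 65.82, 68.42, 71.02, 73.83, 77.16, 81.74]

# Observed frequency per class: bisect_right gives the index of the class
# whose lower limit is the largest cut point not exceeding the score.
observed = [0] * (len(cut_points) + 1)
for x in scores:
    observed[bisect_right(cut_points, x)] += 1

expected = len(scores) / len(observed)          # 50 * 0.10 = 5 per class
chi2 = sum((o - expected) ** 2 / expected for o in observed)

k, p = len(observed), 2                         # 10 classes, 2 estimated parameters
df = k - p - 1
CHI2_010_DF7 = 12.017                           # chi-square table: df = 7, area 0.10
print(f"observed = {observed}, chi-square = {chi2:.1f}, df = {df}")
print("p-value > 0.10" if chi2 < CHI2_010_DF7 else "p-value <= 0.10")
```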

BITS Pilani, Pilani Campus


Situation:
Recall the independent-samples t-test, illustrated by an
experiment aimed at determining whether two types of
music have different effects on the performance of a
mental task.

Suppose that we were instead interested in assessing
the relative effects of three types of music. In this case,
the experimental procedure is the same in every detail,
except that now we carry it out with three groups, one for
each of the three types of music.

BITS Pilani, Pilani Campus


The analysis of variance, commonly referred to by the
acronym ANOVA, was first developed as a strategy for
dealing with this sort of complication. At its lowest level it
is essentially an extension of the logic of t-tests to those
situations where we wish to compare the means of three
or more samples concurrently.

BITS Pilani, Pilani Campus


#####
BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA


ZG538] Advanced Statistical
Methods
Lecture No : 6[16-02-24]
• Significance testing (p-value method):
1. If P ≤ α, we reject H0 at the stated
level of significance.
2. If P > α, we fail to reject H0.
3. P-value: for a right-tailed test, the P-value is the area to the
right of the observed value z of the
test statistic, i.e. p-value = P(Z ≥ z).
For a left-tailed test (z typically negative), it is the area to the
left, i.e. p-value = P(Z ≤ z).

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
Solution



######
BITS Pilani, Pilani Campus
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA


ZG538] Advanced Statistical
Methods
Lecture No : 7[24-02-24]
A drug is used to maintain a steady heart rate in patients who have
suffered a mild heart attack. Let X denote the number of heartbeats
per minute obtained per patient. Consider the hypothetical density
given in the table.

Suppose that we wish to compare a new drug with the one above, and
let Y be the number of heartbeats per minute obtained with the new
drug. The hypothetical density table is given below.

Recall the independent-samples t-test, illustrated by an experiment
aimed at determining whether two types of music have different
effects on the mean performance of a mental task.

Suppose that we were instead interested in assessing the relative
effects of three types of music. In this case, the experimental
procedure is the same in every detail, except that now we carry it
out with three groups, one for each of the three types of music.

The analysis of variance, commonly referred to by the acronym ANOVA,
was first developed as a strategy for dealing with this sort of
complication. At its lowest level it is essentially an extension of the
logic of t-tests to those situations where we wish to compare the
means of three or more samples concurrently.

BITS Pilani, Pilani Campus


An Introduction to Experimental Design and Analysis of variance

As an example of an experimental statistical
study, let us consider the problem facing
Chemitech, Inc.
Chemitech developed a new filtration system
for municipal water supplies. The
components for the new filtration system
will be purchased from several suppliers,
and Chemitech will assemble the
components at its plant in Columbia, South
Carolina. The industrial engineering group
is responsible for determining the best
assembly method for the new filtration
system. After considering a variety of
possible approaches, the group narrows
the alternatives to three: method A,
method B, and method C.
These methods differ in the sequence of steps
used to assemble the system.
Managers at Chemitech want to determine
which assembly method can produce the
greatest number of filtration systems per
week.

BITS Pilani, Pilani Campus


Data:
The number of units assembled by each employee during one week is
shown in the table:

                    Method A   Method B   Method C
                       58         58         48
                       64         69         57
                       55         71         59
                       66         64         47
                       67         68         49
Sample mean            62         66         52
Sample variance        27.5       26.5       31

The real objective is to determine whether the three sample means
observed are different enough for us to conclude that the means of the
populations corresponding to the three methods of assembly
are different.
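The comparison of the three methods can be sketched as a one-way ANOVA on the data above. SSTR measures variation between the method means, SSE measures variation within methods, and F = MSTR/MSE. The critical value 3.89 is the standard F-table entry for (2, 12) degrees of freedom at α = 0.05, hard-coded here as an assumption.

```python
from statistics import mean, variance

units = {
    "A": [58, 64, 55, 66, 67],
    "B": [58, 69, 71, 64, 68],
    "C": [48, 57, 59, 47, 49],
}

k = len(units)                                        # number of treatments
n_total = sum(len(v) for v in units.values())         # total observations
grand_mean = mean([x for v in units.values() for x in v])

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in units.values())
sse = sum((len(v) - 1) * variance(v) for v in units.values())

mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

F_CRIT = 3.89                                         # F table: df = (2, 12), alpha = 0.05
print(f"SSTR = {sstr:.0f}, SSE = {sse:.0f}, MSTR = {mstr:.0f}, MSE = {mse:.2f}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat >= F_CRIT}")
```

A large F indicates that the between-method variation is too big to be explained by within-method variation alone, which is evidence that the population means differ.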

BITS Pilani, Pilani Campus


Analysis of variance: A conceptual Overview

When Ho is true

BITS Pilani, Pilani Campus


Analysis of variance: A conceptual Overview(contd..)

When Ho is false

BITS Pilani, Pilani Campus


All results can be displayed conveniently in a
table referred to as the ANOVA table:

BITS Pilani, Pilani Campus


Managerial decisions often are based on the relationship between two or
more variables.
• Considering the relationship between advertising expenditures and sales, a
marketing manager might attempt to predict sales for a given level of
advertising expenditure.
• In another case, a public utility might use the relationship between the daily
high temperature and the demand for electricity to predict electricity usage
on the basis of the next month’s anticipated daily high temperatures.
• Sometimes a manager will rely on intuition to judge how two variables are
related.

BITS Pilani, Pilani Campus


Consider the example:

Reed Auto periodically has a
special week-long sale. As part
of the advertising campaign,
Reed runs one or more
television commercials during
the weekend preceding the sale.
Data from a sample of 5
previous sales are shown
in the table.

BITS Pilani, Pilani Campus


Example:
Armand’s Pizza Parlors is a chain of Italian-food restaurants located in a
five-state area. Armand’s most successful locations are near college
campuses. The managers believe that quarterly sales (in $1000s) for
these restaurants (denoted by y) are related positively to the size of
the student population (in 1000s) (denoted by x); that is, restaurants
near campuses with a large student population tend to generate more
sales than those near campuses with a small student population. Using
regression analysis, we can describe how the dependent variable y is
related to the independent variable x.
To illustrate, suppose data were collected from a sample of 10
restaurants located near college campuses, as shown below:

Restaurant                       1    2    3    4    5    6    7    8    9   10
Student population (1000s), x    2    6    8    8   12   16   20   20   22   26
Quarterly sales ($1000s), y     58  105   88  118  117  137  157  169  149  202
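A least-squares line for these ten restaurants can be sketched as follows, using the usual formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The function name `fit_line` and the prediction at x = 16 are illustrative choices, not part of the lecture.

```python
from statistics import mean

population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]          # x, in 1000s of students
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]   # y, in $1000s per quarter


def fit_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(population, sales)
predicted = b0 + b1 * 16            # predicted sales for a campus of 16,000 students
print(f"estimated regression line: y = {b0:.0f} + {b1:.0f}x")
print(f"predicted quarterly sales at x = 16: {predicted:.0f} ($1000s)")
```

The positive slope agrees with the managers' belief that sales increase with the size of the student population.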

BITS Pilani, Pilani Campus


Consider a problem faced by the Butler Trucking Company (refer to
page 687), an independent trucking company in southern California. A
major portion of Butler’s business involves deliveries throughout its
local area. To develop better work schedules, the managers want to
predict the total daily travel time for their drivers.
Initial observation by the manager: he believed that the total daily
travel time would be closely related to the number of miles traveled in
making the daily deliveries.
A simple random sample of 10 driving assignments provided the data
shown below (y = travel time in hours, x1 = miles traveled).

Further, the manager felt that the number of deliveries could also
contribute to the total travel time.
The updated Butler Trucking data are shown below:

BITS Pilani, Pilani Campus


#######
BITS Pilani
Pilani Campus

[BA ZG524/MBA ZG538/PDBA


ZG538] Advanced Statistical
Methods
Lecture No : 8[09-03-24]
Test for Independence

BITS Pilani, Pilani Campus


BITS Pilani, Pilani Campus
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• In the chi-square goodness of fit test, the
term “goodness of fit” refers to how well
the observed sample distribution agrees with
the expected (hypothesized) probability distribution.

BITS Pilani, Pilani Campus


The independent-samples t-test
with the example of an experiment
aimed at determining whether two
types of music have different
effects on the mean performance The analysis of variance,
of a mental task. commonly referred to by the
acronym ANOVA, was first
developed as a strategy for
Suppose that we were instead dealing with this sort of
interested in assessing the relative complication. At its lowest level it
effects of three types of music. In is essentially an extension of the
this case, the experimental logic of t-tests to those situations
procedure is the same in every where we wish to compare the
detail, except that now we carry it means of three or more samples
out with three groups, one for concurrently.
each of the three types of music.

BITS Pilani, Pilani Campus


An Introduction to Experimental Design and Analysis of variance

As an example of an experimental statistical


study, let us consider the problem facing
Chemitech,Inc.
Chemitech developed a new filtration system
of municipal water supplies. The
components for the new filtration system
will be purchased from several supplers,
and Chemitech will assemble the
components at its plant in Columbia,South
Carolina. The industrial engineering group
is responsible for determining the best
assembly method for the new filtration
system. After considering a variety of
possible approaches, the group narrows
the alternatives to three: method A,
method B, method C.
These methods differ in the sequence of steps
used to assemble the system
Managers at Chemitech want to determine
which assemble method can produce the
greatest number of filtration systems per
week.

BITS Pilani, Pilani Campus


Method
Data: A B C
58 58 48
The number of units assembled by 64 69 57
each employee during one 55 71 59
66 64 47
week is shown in the given 67 68 49
table: sample mean
sample variance
62
27.5
66
26.5
52
31

The real objective is to determine whether the three sample means observed are different enough for us to conclude that the means of the populations corresponding to the three methods of assembly are different.



Analysis of Variance: A Conceptual Overview

When H0 is true

Analysis of Variance: A Conceptual Overview (contd.)

When H0 is false


All results can be displayed conveniently in a table referred to as the ANOVA table:



Incomplete ANOVA table:

SoV          S.S    DF    MS     F    P-value
Treatments           2    260         0.004
Error        340
Total

Note: SST = SSTR+SSE
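
The blank cells can be recomputed from the Chemitech sample. The following is a minimal Python sketch, using only the standard library; the function and variable names are illustrative and not part of the lecture. It computes the sums of squares, degrees of freedom, mean squares and F for the three assembly methods.

```python
from statistics import mean

# Units assembled per week, by assembly method (Chemitech sample data).
samples = {
    "A": [58, 64, 55, 66, 67],
    "B": [58, 69, 71, 64, 68],
    "C": [48, 57, 59, 47, 49],
}


def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a dict of samples."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    k, n_t = len(groups), len(values)

    # Between-treatments and within-treatments (error) sums of squares.
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    sse = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

    mstr = sstr / (k - 1)
    mse = sse / (n_t - k)
    return {
        "SSTR": sstr, "SSE": sse, "SST": sstr + sse,
        "df_treatments": k - 1, "df_error": n_t - k, "df_total": n_t - 1,
        "MSTR": mstr, "MSE": mse, "F": mstr / mse,
    }


table = one_way_anova(samples)
for name, value in table.items():
    print(f"{name:>14}: {value:.4f}")
```

With these formulas the sketch gives SSTR = 520, SSE = 340, SST = 860 and F of about 9.18, which agrees with the DF = 2, MS = 260 and SSE = 340 entries already shown.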



One-Way ANOVA Example
The call center ran three 4-hour shifts from 8 AM to 8 PM. Every month the operators would be rotated out of the current shift and into the next shift.
The call center wanted to study whether there is any difference in productivity across the 3 shifts. 15 operators were randomly selected and randomly assigned to the 3 shifts. Productivity was measured as the number of tickets closed by the operator. The data are given below.

Shift 1   Shift 2   Shift 3
   27        32        29
   30        29        28
   29        31        30
   28        29        32
   31        30        31
Example: Preliminary Remarks

What is the Factor: Shift


Describe the treatments: Shift 1, Shift 2, Shift 3
What are the Experimental units: Operators
What is the Response variable: Number of tickets closed
What is the Statistical Design: Completely Randomized Design

Set up the Test:

H0: µ1 = µ2 = µ3
Ha: Not all the means are equal
where:
µ1 = mean number of tickets closed by operators in the 1st shift
µ2 = mean number of tickets closed by operators in the 2nd shift
µ3 = mean number of tickets closed by operators in the 3rd shift
        Shift 1   Shift 2   Shift 3
           27        32        29
           30        29        28
           29        31        30
           28        29        32
           31        30        31
Mean       29        30.2      30
s2         2.5       1.7       2.5
ANOVA Table for a Completely Randomized Design
Source of    Sum of     Degrees of   Mean Square          F          p-value
Variation    Squares    Freedom
Treatment    SSTR       k-1          MSTR = SSTR/(k-1)    MSTR/MSE
Error        SSE        nT-k         MSE = SSE/(nT-k)
Total        SST        nT-1
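
The same table can be filled in from summary statistics alone. The sketch below is a hedged illustration with assumed variable names. It uses the per-shift sample sizes, means and variances of the call-center data, with SSTR = sum of n_j (mean_j - grand mean)^2 and SSE = sum of (n_j - 1) s_j^2.

```python
# (n_j, sample mean, sample variance) for each shift, from the call-center data.
shifts = {
    "Shift 1": (5, 29.0, 2.5),
    "Shift 2": (5, 30.2, 1.7),
    "Shift 3": (5, 30.0, 2.5),
}

k = len(shifts)                                   # number of treatments
n_t = sum(n for n, _, _ in shifts.values())       # total sample size
grand_mean = sum(n * m for n, m, _ in shifts.values()) / n_t

# Sums of squares built from the summary statistics only.
sstr = sum(n * (m - grand_mean) ** 2 for n, m, _ in shifts.values())
sse = sum((n - 1) * s2 for n, _, s2 in shifts.values())

mstr = sstr / (k - 1)
mse = sse / (n_t - k)
f_stat = mstr / mse

print(f"SSTR={sstr:.4f}  SSE={sse:.4f}  MSTR={mstr:.4f}  MSE={mse:.4f}  F={f_stat:.4f}")
```

Here F comes out below 1, meaning the variation between shift means is smaller than the variation within shifts, so this sample gives no grounds for rejecting H0.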
Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). An F test will be conducted using α = .05.
Comment on your results using the ANOVA table below.

S.O.V     S.S    df    MS          F    P-value
Plants    490    ?     ?           ?
Error     ?      ?     25.66667
Total     ?      ?

Observe the Problem:
WARTA, the Warren Area Regional Transit Authority, is expanding bus service from the suburb of Starbrick into the central business district of Warren. There are four routes being considered from Starbrick to downtown Warren:
Via (1) U.S. 6 (2) West End (3) Hickory Street (4) Route 59
WARTA conducted several tests to determine whether there was a difference in the mean travel times along the four routes. Because there will be many different drivers, the test was set up so each driver drove along each of the four routes. Below is the travel time (in minutes) for each driver-route combination.

Travel time (min)
Driver   U.S. 6   West End   Hickory St   Rte 59
A          18        17          21          22
B          16        23          23          22
C          21        21          26          22
D          23        22          29          25
E          25        24          28          28



Anova: Two-Factor Without Replication

SUMMARY       Count   Sum   Average   Variance
A               4      78    19.5     5.666667
B               4      84    21       11.33333
C               4      90    22.5     5.666667
D               4      99    24.75    9.583333
E               4     105    26.25    4.25
U.S 6           5     103    20.6     13.3
West End        5     107    21.4     7.3
Hickory St      5     127    25.4     11.3
Rte 59          5     119    23.8     7.2

ANOVA
Source of Variation   SS      df   MS         F          P-value    F crit
Rows                  119.7    4   29.925     9.784741   0.000934   3.259167
Columns               72.8     3   24.26667   7.934605   0.003508   3.490295
Error                 36.7    12   3.058333
Total                 229.2   19

Anova: Single Factor

SUMMARY
Groups        Count   Sum   Average   Variance
U.S 6           5     103    20.6     13.3
West End        5     107    21.4     7.3
Hickory St      5     127    25.4     11.3
Rte 59          5     119    23.8     7.2

ANOVA
Source of Variation   SS      df   MS         F          P-value    F crit
Between Groups        72.8     3   24.26667   2.482523   0.098105   3.238872
Within Groups         156.4   16   9.775
Total                 229.2   19
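
To see where the two outputs differ, the sums of squares can be recomputed from the WARTA travel times. The following is a minimal Python sketch with assumed variable names. It treats drivers as blocks (rows) and routes as treatments (columns), then repeats the route comparison while ignoring the blocks.

```python
# Travel times in minutes; each row is a driver, columns follow `routes`.
routes = ["U.S. 6", "West End", "Hickory St", "Rte 59"]
times = {
    "A": [18, 17, 21, 22],
    "B": [16, 23, 23, 22],
    "C": [21, 21, 26, 22],
    "D": [23, 22, 29, 25],
    "E": [25, 24, 28, 28],
}

rows = list(times.values())
b, k = len(rows), len(routes)            # b blocks (drivers), k treatments (routes)
grand_mean = sum(map(sum, rows)) / (b * k)

row_means = [sum(r) / k for r in rows]
col_means = [sum(r[j] for r in rows) / b for j in range(k)]

sst = sum((x - grand_mean) ** 2 for r in rows for x in r)
ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)   # drivers
ss_cols = b * sum((m - grand_mean) ** 2 for m in col_means)   # routes
ss_err = sst - ss_rows - ss_cols

# Two-factor analysis: driver variation is removed from the error term.
ms_err = ss_err / ((b - 1) * (k - 1))
f_rows = (ss_rows / (b - 1)) / ms_err
f_cols = (ss_cols / (k - 1)) / ms_err

# Single-factor analysis: driver variation stays inside the error term.
ms_within = (sst - ss_cols) / (b * k - k)
f_one_way = (ss_cols / (k - 1)) / ms_within

print(f"two-factor: F(drivers)={f_rows:.4f}, F(routes)={f_cols:.4f}")
print(f"single-factor: F(routes)={f_one_way:.4f}")
```

Compared with the F crit values in the Excel output, the route F of about 7.93 exceeds 3.49 once drivers are treated as blocks, whereas the single-factor F of about 2.48 stays below 3.24. The route sum of squares (72.8) is identical in both analyses; only the error term changes.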



Managerial decisions often are based on the relationship between two or more variables.
• Considering the relationship between advertising expenditures and sales, a marketing manager might attempt to predict sales for a given level of advertising expenditure.
• In another case, a public utility might use the relationship between the daily high temperature and the demand for electricity to predict electricity usage on the basis of the next month’s anticipated daily high temperature.
• Sometimes a manager will rely on intuition to judge how two variables are related.



Consider the example:

Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown in the table.
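
A relationship of this kind is usually summarised by a fitted straight line. The sketch below shows the least-squares estimates b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄ in plain Python. The Reed Auto table values are not reproduced in this text, so the numbers used here are placeholders chosen only to make the arithmetic easy to check, and the function name is an assumption.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Placeholder values only; these are NOT the Reed Auto observations.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # lies exactly on y = 1 + 2x, so the fit is easy to verify

b0, b1 = fit_line(x, y)
prediction = b0 + b1 * 6   # predicted y for a new x value of 6
print(f"b0={b0:.2f}, b1={b1:.2f}, prediction at x=6: {prediction:.2f}")
```

Replacing `x` and `y` with the five observed pairs from the slide's table would give the estimated regression line for that sample.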



Consider a problem faced by the Butler Trucking Company (refer to page 687), an independent trucking company in southern California. A major portion of Butler’s business involves deliveries throughout its local area. To develop better work schedules, the managers want to predict the total daily travel time for their drivers.
Initial observation by the manager: he believed that the total daily travel time would be closely related to the number of miles traveled in making the daily deliveries.
A simple random sample of 10 driving assignments provided the data shown below (y = travel time in hours, X1 = miles traveled).

Further, the manager felt that the number of deliveries could also contribute to the total travel time.
The updated Butler Trucking data are shown below:



Johnson Filtration, Inc., provides maintenance service for water filtration systems throughout southern Florida. Customers contact Johnson with requests for maintenance service on their water-filtration systems. To estimate the service time and the service cost, Johnson’s managers want to predict the repair time necessary for each maintenance request.
[Refer to page 711, 15.7]
Hence, repair time in hours is the dependent variable. Repair time is believed to be related to two factors: the number of months since the last maintenance service and the type of repair problem (mechanical or electrical). Data for a sample of 10 service calls are reported in the table below:
So far, we have discussed quantitative independent variables such as months since the last service (MSLS).
However, this problem also involves a categorical independent variable, the type of repair.
The purpose of this discussion is to show how categorical variables are handled in regression analysis.
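
One common way to bring a two-level categorical variable into a regression is a 0/1 dummy variable. The sketch below illustrates the coding step only. The records are hypothetical (they are not the 10 service calls from the table), and the convention 0 = mechanical, 1 = electrical is an assumption made for the illustration.

```python
# Hypothetical service-call records; NOT the observations from the slide's table.
calls = [
    {"months_since_service": 2, "repair_type": "electrical"},
    {"months_since_service": 6, "repair_type": "mechanical"},
    {"months_since_service": 8, "repair_type": "electrical"},
]

# Assumed coding convention: mechanical is the baseline level.
REPAIR_CODES = {"mechanical": 0, "electrical": 1}


def encode_repair_type(repair_type):
    """Return the 0/1 dummy value for a repair type string."""
    return REPAIR_CODES[repair_type.strip().lower()]


# Each design row is (1 for the intercept, x1 = months, x2 = dummy for repair type).
design_rows = [
    (1, c["months_since_service"], encode_repair_type(c["repair_type"]))
    for c in calls
]

for row in design_rows:
    print(row)
```

In a model of the form y = b0 + b1·x1 + b2·x2 with this coding, b2 is the estimated difference in mean repair time between an electrical and a mechanical repair for the same number of months since the last service.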



Situation 1: A bank might want to develop an estimated regression equation for predicting whether a person will be approved for a credit card.
The dependent variable can be coded as
Y=1 if the bank approves the request for a credit card and
Y=0 if the bank rejects the request for a credit card.
Using logistic regression we can estimate the probability that the bank will approve the request for a credit card given a particular set of values for the chosen independent variables.
Situation 2: Spam detection is a binary classification problem where we are given an email and we need to classify whether or not it is spam. If the email is spam, we label it 1; if it is not spam, we label it 0.

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis.
Logistic Regression

Consider an application of logistic regression involving a direct mail promotion being used by Simmons Stores. Simmons owns and operates a national chain of women’s apparel stores. Five thousand copies of an expensive four-colour sales catalog have been printed, and each catalog includes a coupon that provides a $50 discount on purchases of $200 or more. The catalogs are expensive and Simmons would like to send them to only those customers who have a high probability of using the coupon.
Management believes that annual spending at Simmons Stores and whether a customer has a Simmons credit card are two variables that might be helpful in predicting whether a customer who receives the catalog will use the coupon.
Refer to page 727.



Simmons conducted a pilot study using a random sample of 50 Simmons credit card customers and 50 other customers who do not have a Simmons credit card. Simmons sent the catalog to each of the 100 customers selected. At the end of a test period, Simmons noted whether each customer had used her or his coupon.
The sample data for the first 10 catalog recipients are shown in the table.
Note: variables involved
X1 = amount spent, in thousands of dollars
X2 = credit card indicator, coded as 1 or 0
Y = coupon indicator: 1 if the customer used the coupon, 0 if not
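
Once coefficients have been estimated, a logistic regression model turns X1 and X2 into a probability through P(Y=1) = 1 / (1 + e^-(b0 + b1·X1 + b2·X2)). The sketch below only evaluates that formula. The coefficient values are hypothetical placeholders rather than estimates from the Simmons data, and X2 = 1 is assumed to mean that the customer holds a Simmons credit card.

```python
import math


def coupon_probability(spending_k, has_card, b0, b1, b2):
    """Estimated P(Y = 1) from the logistic function for given X1, X2."""
    z = b0 + b1 * spending_k + b2 * has_card
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only (not fitted to any data).
b0, b1, b2 = -2.0, 0.3, 1.0

# Two customers who each spend $2,000 a year (X1 = 2), with and without a card.
p_card = coupon_probability(2.0, 1, b0, b1, b2)
p_no_card = coupon_probability(2.0, 0, b0, b1, b2)

print(f"with card: {p_card:.4f}, without card: {p_no_card:.4f}")
```

Because the output always lies between 0 and 1, it can be read as a probability, and the catalog could be sent only to customers whose estimated probability exceeds a chosen cut-off.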
